From 23a364774e39345bbbd24140a19fa13ecae88fab Mon Sep 17 00:00:00 2001
From: James Turk <dev@jpt.sh>
Date: Mon, 30 Sep 2024 00:23:37 -0500
Subject: [PATCH] 01 and 02

---
 01.basics-1.ipynb |  20 ++-
 02.basics-2.ipynb | 357 +++++++++++++++++++++-------------------------
 data/animals.txt  |   4 +
 data/cnetids.txt  |   3 +
 data/emails.txt   |   3 +
 data/names.txt    |   2 +
 6 files changed, 183 insertions(+), 206 deletions(-)
 create mode 100644 data/animals.txt
 create mode 100644 data/cnetids.txt
 create mode 100644 data/emails.txt
 create mode 100644 data/names.txt

diff --git a/01.basics-1.ipynb b/01.basics-1.ipynb
index 1a3d5d4..21ffbb2 100644
--- a/01.basics-1.ipynb
+++ b/01.basics-1.ipynb
@@ -1327,7 +1327,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": 1,
    "id": "f3b01815",
    "metadata": {
     "slideshow": {
@@ -1353,12 +1353,12 @@
     "\n",
     "bad_tuple = (1+492)\n",
     "\n",
-    "print(bad_tuple)"
+    "print(bad_tuple)  # why is this not a tuple?"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 2,
    "id": "f6c54b13",
    "metadata": {
     "slideshow": {
@@ -1368,16 +1368,12 @@
    },
    "outputs": [],
    "source": [
-    "multi_item = (1, 2.0, \"three\")\n",
-    "\n",
-    "# parentheses are optional\n",
-    "\n",
-    "multi_item2 = 1, 2.0"
+    "multi_item = (1, 2.0, \"three\")"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 4,
    "id": "310a20b3-ef99-43cf-a9eb-60674beb8967",
    "metadata": {
     "tags": []
@@ -1387,12 +1383,12 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "(1, 2.0)\n"
+      "(1, 2.0, 'three')\n"
      ]
     }
    ],
    "source": [
-    "print(multi_item2)"
+    "print(multi_item)"
    ]
   },
   {
@@ -2551,7 +2547,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.6"
+   "version": "3.10.15"
   },
   "toc": {
    "base_numbering": 1,
diff --git a/02.basics-2.ipynb b/02.basics-2.ipynb
index c13347c..5130135 100644
--- a/02.basics-2.ipynb
+++ b/02.basics-2.ipynb
@@ -29,7 +29,7 @@
    "source": [
     "## Iteration\n",
     "\n",
-    "Last week we ended on `for` loops.\n",
+    "Last week we introduced `for` loops.\n",
     "\n",
     "```\n",
     "for var_name in iterable:\n",
@@ -38,6 +38,8 @@
     "\n",
     "What is an **iterable**?  Why not just say **sequence**?\n",
     "\n",
+    "What **sequences** have we seen?\n",
+    "\n",
     "### More Iterables"
    ]
   },
@@ -60,7 +62,7 @@
     "\n",
     "Same rules as slice, always **inclusive** of start, **exclusive** of stop.\n",
     "\n",
-    "*or as you'd write mathematically:* ```[start, stop)``` -- we've seen this before with slicing"
+    "or as you might write: ```[start, stop)``` -- we've seen this before with slicing"
    ]
   },
   {
@@ -154,14 +156,23 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 1,
    "id": "7ed88d5c-e848-46f8-8fc5-8d48fadc303e",
    "metadata": {
     "tags": []
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "0 A\n",
+      "1 B\n",
+      "2 C\n"
+     ]
+    }
+   ],
    "source": [
-    "\n",
     "i = 0\n",
     "for x in [\"A\", \"B\", \"C\"]:\n",
     "    print(i, x)\n",
@@ -2602,7 +2613,7 @@
     "toc-hr-collapsed": true
    },
    "source": [
-    "## Functions\n",
+    "## Functions Revisited\n",
     "\n",
     "A function is a set of statements that can be called more than once.\n",
     "\n",
@@ -2667,84 +2678,6 @@
     "This means mutability determines whether or not a function can modify a parameter in the outer scope."
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "d9f602cd-d726-47fa-b3d6-ec51a86f760d",
-   "metadata": {
-    "slideshow": {
-     "slide_type": "subslide"
-    }
-   },
-   "source": [
-    "### return\n",
-    "\n",
-    "- `return` may appear anywhere in a function body, including multiple times.\n",
-    "\n",
-    "- The first `return` encountered exits the function.\n",
-    "\n",
-    "- Every function in python returns a value. \n",
-    "\n",
-    "- If no `return` statement is present, `None` is implicitly returned."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3da80043-16b9-47e7-9aa2-157b5fa29ea0",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def is_even(num):\n",
-    "    return num % 2 == 0\n",
-    "\n",
-    "\n",
-    "print(is_even(3))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "31cd6b0d-6932-43cb-94ea-183fbe3491b4",
-   "metadata": {
-    "slideshow": {
-     "slide_type": "slide"
-    }
-   },
-   "source": [
-    "###  `pass` statement\n",
-    "\n",
-    "Can be used whenever you need to leave a block empty.  Usually temporarily.\n",
-    "\n",
-    "```python\n",
-    "\n",
-    "if x < 0:\n",
-    "    pass # TODO: figure this out later\n",
-    "\n",
-    "\n",
-    "def func():\n",
-    "    pass\n",
-    "```\n",
-    "\n",
-    "**What does func return?**"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "61fdaebf-2b45-4c83-a4c0-f46bbeced5c1",
-   "metadata": {
-    "slideshow": {
-     "slide_type": "slide"
-    }
-   },
-   "source": [
-    "### docstrings\n",
-    "\n",
-    "Functions should provide docstrings, which are strings declared as the first statement within the function body.\n",
-    "\n",
-    "Almost always use triple-quotes to allow multi-line formatting.\n",
-    "\n",
-    "The style guide & assignments show examples of the format we expect."
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "4729a9a2-66af-4d9c-af2e-441c8486a92c",
@@ -2753,7 +2686,7 @@
     "toc-hr-collapsed": true
    },
    "source": [
-    "# I/O"
+    "## I/O"
    ]
   },
   {
@@ -2761,7 +2694,7 @@
    "id": "59141a54-daf9-4a55-ad87-b80d34260378",
    "metadata": {},
    "source": [
-    "## `print()`\n",
+    "### `print()`\n",
     "\n",
     "`print(*objects, sep=' ', end='\\n', file=sys.stdout, flush=False)`\n",
     "\n",
@@ -2807,27 +2740,67 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2bc22aaa-6192-4d5d-a01f-4c36e8e41ac0",
+   "id": "d07f8320-b048-477b-8b59-f171b5dbecd3",
    "metadata": {},
    "source": [
-    "## Files\n",
+    "### pathlib\n",
     "\n",
-    "Another built in type in Python.\n",
+    "There are a few ways of working with files in Python, mostly due to improvements over time.\n",
     "\n",
-    "Requires us to understand a bit more about how files & memory work.\n",
+    "You'll still sometimes see code that uses the older method with `open`, but there's almost no reason to write code in that style now that `pathlib` is widely available.\n",
     "\n",
-    "### Typical workflow:\n",
+    "To use `pathlib`, you'll need to import the `Path` class. (We'll discuss these imports more soon.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "118ba56e-91c3-4ad9-935e-2a44d1fd064c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9c24256b-e9df-44da-884c-db0670bda68b",
+   "metadata": {},
+   "source": [
+    "Imports like this should be at the top of the file.\n",
+    "\n",
+    "To use this type you'll create objects with file paths, for example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "c07cd9ee-48bc-4ae1-b813-20380bb2733d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# this looks like a function call\n",
+    "# but the capital letter denotes that this is instead a class\n",
+    "file_path = Path(\"data/names.txt\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cfec4041-a186-40ed-9967-e096d4f11ffb",
+   "metadata": {},
+   "source": [
+    "#### Typical workflow:\n",
     "\n",
     "- Read contents of file(s) from disk into working memory.\n",
     "- Parse and/or manipulate data as needed.\n",
     "- (Optional) Write data back to disk with modifications.\n",
     "\n",
-    "### Other Workflows\n",
+    "#### Other Workflows\n",
     "\n",
     "- Append-only (e.g. logging)\n",
     "- Streaming data (needed for large files where we can't fit into memory)\n",
     "\n",
-    "### Text vs. Binary\n",
+    "#### Text vs. Binary\n",
     "\n",
     "We're opening our files in the default, text mode. It is also possible to open files in a binary mode where it isn't assumed we're reading strings."
    ]
@@ -2850,74 +2823,23 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 7,
    "id": "e205aaba",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "data/emails.txt\n"
+     ]
+    }
+   ],
    "source": [
-    "# to access a file's contents, we need to open it\n",
-    "fd = open(\"emails.txt\")\n",
-    "\n",
-    "print(fd)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9a4277ac",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# fd is a `file` object, we can use methods to read from the file\n",
-    "emails = fd.read()\n",
-    "print(type(emails))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fd55b361",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# read() got all the data at once, split with \\n newlines\n",
-    "\n",
-    "# We can also iterate over the lines in the file\n",
-    "\n",
-    "fd.readlines()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "86dc9c0c-0712-4100-8da1-58bc084ed08c",
-   "metadata": {},
-   "source": [
-    "Open files have a 'cursor', we've reached the end of the file (EOF) so there isn't more to read."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f3691c99",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# if we use 'seek' we can rewind to the beginning of the file\n",
-    "fd.seek(0)\n",
-    "fd.readlines()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0e12a4c5",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# we can also iterate over the file\n",
-    "f = open(\"emails.txt\")\n",
-    "for email in f.readlines():\n",
-    "    print(email.strip())   # extra newline?"
+    "# to access a file's contents, we create the path, and then\n",
+    "# use read_text()\n",
+    "emails_path = Path(\"data/emails.txt\")\n",
+    "emails = emails_path.read_text()"
    ]
   },
   {
@@ -2932,48 +2854,55 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 22,
    "id": "d0aa3bf0",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\u001b[38;5;8m───────┬────────────────────────────────────────────────────────────────────────\u001b[0m\n",
+      "       \u001b[38;5;8m│ \u001b[0mFile: \u001b[1mdata/animals.txt\u001b[0m   <EMPTY>\n",
+      "\u001b[38;5;8m───────┴────────────────────────────────────────────────────────────────────────\u001b[0m\n"
+     ]
+    }
+   ],
    "source": [
-    "!rm names.txt\n",
+    "names_file = Path(\"data/animals.txt\").open(\"w\")\n",
+    "names_file.write(\"Aardvark\\nChimpanzee\\nElephant\\n\")\n",
     "\n",
-    "f = open(\"names.txt\", \"w\")\n",
-    "f.write(\"Bob\\nPhil\\n\")\n",
-    "f.write(\"Sally\\n\")\n",
-    "f.write(\"Rebecca\\n\")\n",
-    "f.write(\"Joan\\n\")\n",
-    "f.close()\n",
-    "\n",
-    "!cat names.txt"
+    "# (the ! indicates this is a shell command, not Python)\n",
+    "!cat data/animals.txt"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 23,
    "id": "d9e2b317",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "\u001b[38;5;8m───────┬────────────────────────────────────────────────────────────────────────\u001b[0m\n",
+      "       \u001b[38;5;8m│ \u001b[0mFile: \u001b[1mdata/animals.txt\u001b[0m\n",
+      "\u001b[38;5;8m───────┼────────────────────────────────────────────────────────────────────────\u001b[0m\n",
+      "\u001b[38;5;8m   1\u001b[0m   \u001b[38;5;8m│\u001b[0m \u001b[37mAardvark\u001b[0m\n",
+      "\u001b[38;5;8m   2\u001b[0m   \u001b[38;5;8m│\u001b[0m \u001b[37mChimpanzee\u001b[0m\n",
+      "\u001b[38;5;8m   3\u001b[0m   \u001b[38;5;8m│\u001b[0m \u001b[37mElephant\u001b[0m\n",
+      "\u001b[38;5;8m   4\u001b[0m   \u001b[38;5;8m│\u001b[0m \u001b[37mKangaroo\u001b[0m\n",
+      "\u001b[38;5;8m───────┴────────────────────────────────────────────────────────────────────────\u001b[0m\n"
+     ]
+    }
+   ],
    "source": [
-    "f = open(\"names.txt\", \"a\")\n",
-    "f.write(\"Hector\\n\")\n",
-    "f.flush()\n",
-    "!cat names.txt"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d8b46e54-4aca-4587-9a02-899bc04bbf5c",
-   "metadata": {},
-   "source": [
-    "**Important:** Opening in write mode clears the contents of the file.\n",
-    "\n",
-    "\"r\" : Read (default)\n",
-    "\n",
-    "\"w\" : Write\n",
-    "\n",
-    "\"a\" : Append"
+    "# open(\"w\") erases the file, use \"a\" if you want to append\n",
+    "names_file = Path(\"data/animals.txt\").open(\"a\")\n",
+    "names_file.write(\"Kangaroo\\n\")\n",
+    "names_file.flush()\n",
+    "!cat data/animals.txt"
    ]
   },
   {
@@ -2981,9 +2910,13 @@
    "id": "554800e9-bc9e-4c03-94f8-34da10d205fe",
    "metadata": {},
    "source": [
-    "#### `close`\n",
+    "#### `flush` and `close`\n",
     "\n",
-    "Very important to close a file.\n",
+    "`flush` ensures that buffered, in-memory contents are written out to the file, so they are actually saved.\n",
+    "\n",
+    "(Analogy: a program crashes and you lose your unsaved work.)\n",
+    "\n",
+    "At the end, it is important to `close` the file.\n",
     "\n",
     "- Frees resources.\n",
     "- Allows other programs to access file contents.\n",
@@ -3003,7 +2936,7 @@
     "\n",
     "```python\n",
     "\n",
-    "with open(filename) as variable:\n",
+    "with path.open() as variable:\n",
     "    statement1\n",
     "    statement2\n",
     "```\n",
@@ -3126,10 +3059,46 @@
    "outputs": [],
    "source": []
   },
+  {
+   "cell_type": "markdown",
+   "id": "02eb6878-7a2d-46b0-a0f3-e7aab18ebe11",
+   "metadata": {},
+   "source": [
+    "### Note: Relative Paths\n",
+    "\n",
+    "You may find that if you are running your code from, for example, the homework1 directory instead of homework1/problem3, you'd need to modify this path to be `Path(\"problem3/towing.csv\")`.\n",
+    "\n",
+    "That is because by default, paths are *relative*, meaning that they are assumed to start in the directory that you are running your code from.\n",
+    "\n",
+    "This can be frustrating at first: you want your code to work the same regardless of what directory you are in.\n",
+    "\n",
+    "### Building an absolute path\n",
+    "\n",
+    "To get around this, you can construct an absolute path:\n",
+    "\n",
+    "First, you can use the special `__file__` variable, which contains the path to the current file when your code is run from a `.py` file.\n",
+    "\n",
+    "Then you can use that as the \"anchor\" of your path, and navigate from there.\n",
+    "\n",
+    "A common pattern then is to get the current file's parent, and navigate from there:\n",
+    "\n",
+    "```python\n",
+    "from pathlib import Path\n",
+    "\n",
+    "path = Path(__file__).parent / \"towing.csv\"\n",
+    "```\n",
+    "\n",
+    "This line uses the special module-level variable `__file__` to get the path of the Python file itself.\n",
+    "It then gets this file's parent directory (`.parent`) and appends the filename \"towing.csv\" to it.\n",
+    "\n",
+    "Using this technique in your code allows you to set paths that don't depend on the current working directory.\n",
+    "\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "d0ab6c18-cf27-41a9-96fb-bf1fe78dfdca",
+   "id": "5a9bf4ab-dbcd-4dd3-a928-89d33ce88273",
    "metadata": {},
    "outputs": [],
    "source": []
@@ -3151,7 +3120,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.12.6"
+   "version": "3.10.15"
   },
   "toc": {
    "base_numbering": 1,
diff --git a/data/animals.txt b/data/animals.txt
new file mode 100644
index 0000000..f269a06
--- /dev/null
+++ b/data/animals.txt
@@ -0,0 +1,4 @@
+Aardvark
+Chimpanzee
+Elephant
+Kangaroo
diff --git a/data/cnetids.txt b/data/cnetids.txt
new file mode 100644
index 0000000..573d2a4
--- /dev/null
+++ b/data/cnetids.txt
@@ -0,0 +1,3 @@
+borja
+jturk
+lamonts
diff --git a/data/emails.txt b/data/emails.txt
new file mode 100644
index 0000000..dce532e
--- /dev/null
+++ b/data/emails.txt
@@ -0,0 +1,3 @@
+borja@cs.uchicago.edu
+jturk@uchicago.edu
+lamonts@uchicago.edu
diff --git a/data/names.txt b/data/names.txt
new file mode 100644
index 0000000..d44c9a9
--- /dev/null
+++ b/data/names.txt
@@ -0,0 +1,2 @@
+Bob
+Phil