From 23a364774e39345bbbd24140a19fa13ecae88fab Mon Sep 17 00:00:00 2001 From: James Turk Date: Mon, 30 Sep 2024 00:23:37 -0500 Subject: [PATCH] 01 and 02 --- 01.basics-1.ipynb | 20 ++- 02.basics-2.ipynb | 357 +++++++++++++++++++++------------------------- data/animals.txt | 4 + data/cnetids.txt | 3 + data/emails.txt | 3 + data/names.txt | 2 + 6 files changed, 183 insertions(+), 206 deletions(-) create mode 100644 data/animals.txt create mode 100644 data/cnetids.txt create mode 100644 data/emails.txt create mode 100644 data/names.txt diff --git a/01.basics-1.ipynb b/01.basics-1.ipynb index 1a3d5d4..21ffbb2 100644 --- a/01.basics-1.ipynb +++ b/01.basics-1.ipynb @@ -1327,7 +1327,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 1, "id": "f3b01815", "metadata": { "slideshow": { @@ -1353,12 +1353,12 @@ "\n", "bad_tuple = (1+492)\n", "\n", - "print(bad_tuple)" + "print(bad_tuple) # why is this not a tuple?" ] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 2, "id": "f6c54b13", "metadata": { "slideshow": { @@ -1368,16 +1368,12 @@ }, "outputs": [], "source": [ - "multi_item = (1, 2.0, \"three\")\n", - "\n", - "# parentheses are optional\n", - "\n", - "multi_item2 = 1, 2.0" + "multi_item = (1, 2.0, \"three\")" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 4, "id": "310a20b3-ef99-43cf-a9eb-60674beb8967", "metadata": { "tags": [] @@ -1387,12 +1383,12 @@ "name": "stdout", "output_type": "stream", "text": [ - "(1, 2.0)\n" + "(1, 2.0, 'three')\n" ] } ], "source": [ - "print(multi_item2)" + "print(multi_item)" ] }, { @@ -2551,7 +2547,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.6" + "version": "3.10.15" }, "toc": { "base_numbering": 1, diff --git a/02.basics-2.ipynb b/02.basics-2.ipynb index c13347c..5130135 100644 --- a/02.basics-2.ipynb +++ b/02.basics-2.ipynb @@ -29,7 +29,7 @@ "source": [ "## Iteration\n", "\n", - "Last week we ended on `for` 
loops.\n", + "Last week we introduced `for` loops.\n", "\n", "```\n", "for var_name in iterable:\n", @@ -38,6 +38,8 @@ "\n", "What is an **iterable**? Why not just say **sequence**?\n", "\n", + "What **sequences** have we seen?\n", + "\n", "### More Iterables" ] }, @@ -60,7 +62,7 @@ "\n", "Same rules as slice, always **inclusive** of start, **exclusive** of stop.\n", "\n", - "*or as you'd write mathematically:* ```[start, stop)``` -- we've seen this before with slicing" + "or as you might write: ```[start, stop)``` -- we've seen this before with slicing" ] }, { @@ -154,14 +156,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "id": "7ed88d5c-e848-46f8-8fc5-8d48fadc303e", "metadata": { "tags": [] }, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0 A\n", + "1 B\n", + "2 C\n" + ] + } + ], "source": [ - "\n", "i = 0\n", "for x in [\"A\", \"B\", \"C\"]:\n", " print(i, x)\n", @@ -2602,7 +2613,7 @@ "toc-hr-collapsed": true }, "source": [ - "## Functions\n", + "## Functions Revisited\n", "\n", "A function is a set of statements that can be called more than once.\n", "\n", @@ -2667,84 +2678,6 @@ "This means mutability determines whether or not a function can modify a parameter in the outer scope." ] }, - { - "cell_type": "markdown", - "id": "d9f602cd-d726-47fa-b3d6-ec51a86f760d", - "metadata": { - "slideshow": { - "slide_type": "subslide" - } - }, - "source": [ - "### return\n", - "\n", - "- `return` may appear anywhere in a function body, including multiple times.\n", - "\n", - "- The first `return` encountered exits the function.\n", - "\n", - "- Every function in python returns a value. \n", - "\n", - "- If no `return` statement is present, `None` is implicitly returned." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "3da80043-16b9-47e7-9aa2-157b5fa29ea0", - "metadata": {}, - "outputs": [], - "source": [ - "def is_even(num):\n", - " return num % 2 == 0\n", - "\n", - "\n", - "print(is_even(3))" - ] - }, - { - "cell_type": "markdown", - "id": "31cd6b0d-6932-43cb-94ea-183fbe3491b4", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### `pass` statement\n", - "\n", - "Can be used whenever you need to leave a block empty. Usually temporarily.\n", - "\n", - "```python\n", - "\n", - "if x < 0:\n", - " pass # TODO: figure this out later\n", - "\n", - "\n", - "def func():\n", - " pass\n", - "```\n", - "\n", - "**What does func return?**" - ] - }, - { - "cell_type": "markdown", - "id": "61fdaebf-2b45-4c83-a4c0-f46bbeced5c1", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "### docstrings\n", - "\n", - "Functions should provide docstrings, which are strings declared as the first statement within the function body.\n", - "\n", - "Almost always use triple-quotes to allow multi-line formatting.\n", - "\n", - "The style guide & assignments show examples of the format we expect." 
- ] - }, { "cell_type": "markdown", "id": "4729a9a2-66af-4d9c-af2e-441c8486a92c", @@ -2753,7 +2686,7 @@ "toc-hr-collapsed": true }, "source": [ - "# I/O" + "## I/O" ] }, { @@ -2761,7 +2694,7 @@ "id": "59141a54-daf9-4a55-ad87-b80d34260378", "metadata": {}, "source": [ - "## `print()`\n", + "### `print()`\n", "\n", "`print(*objects, sep=' ', end='\\n', file=sys.stdout, flush=False)`\n", "\n", @@ -2807,27 +2740,67 @@ }, { "cell_type": "markdown", - "id": "2bc22aaa-6192-4d5d-a01f-4c36e8e41ac0", + "id": "d07f8320-b048-477b-8b59-f171b5dbecd3", "metadata": {}, "source": [ - "## Files\n", + "### pathlib\n", "\n", - "Another built in type in Python.\n", + "There are a few ways of working with files in Python, mostly due to improvements over time.\n", "\n", - "Requires us to understand a bit more about how files & memory work.\n", + "You'll still sometimes see code that uses the older method with `open`, but there's almost no reason to write code in that style now that `pathlib` is widely available.\n", "\n", - "### Typical workflow:\n", + "To use `pathlib`, you'll need to import the `Path` object. 
(We'll discuss these imports more soon.)" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "118ba56e-91c3-4ad9-935e-2a44d1fd064c", + "metadata": {}, + "outputs": [], + "source": [ + "from pathlib import Path" + ] + }, + { + "cell_type": "markdown", + "id": "9c24256b-e9df-44da-884c-db0670bda68b", + "metadata": {}, + "source": [ + "Imports like this should be at the top of the file.\n", + "\n", + "To use this type you'll create objects with file paths, for example:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "c07cd9ee-48bc-4ae1-b813-20380bb2733d", + "metadata": {}, + "outputs": [], + "source": [ + "# this looks like a function call\n", + "# but the capital letter denotes that this is instead a class\n", + "file_path = Path(\"data/names.txt\")" + ] + }, + { + "cell_type": "markdown", + "id": "cfec4041-a186-40ed-9967-e096d4f11ffb", + "metadata": {}, + "source": [ + "#### Typical workflow:\n", "\n", "- Read contents of file(s) from disk into working memory.\n", "- Parse and/or manipulate data as needed.\n", "- (Optional) Write data back to disk with modifications.\n", "\n", - "### Other Workflows\n", + "#### Other Workflows\n", "\n", "- Append-only (e.g. logging)\n", "- Streaming data (needed for large files where we can't fit into memory)\n", "\n", - "### Text vs. Binary\n", + "#### Text vs. Binary\n", "\n", "We're opening our files in the default, text mode. It is also possible to open files in a binary mode where it isn't assumed we're reading strings." 
] @@ -2850,74 +2823,23 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "id": "e205aaba", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "data/emails.txt\n" + ] + } + ], "source": [ - "# to access a file's contents, we need to open it\n", - "fd = open(\"emails.txt\")\n", - "\n", - "print(fd)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "9a4277ac", - "metadata": {}, - "outputs": [], - "source": [ - "# fd is a `file` object, we can use methods to read from the file\n", - "emails = fd.read()\n", - "print(type(emails))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "fd55b361", - "metadata": {}, - "outputs": [], - "source": [ - "# read() got all the data at once, split with \\n newlines\n", - "\n", - "# We can also iterate over the lines in the file\n", - "\n", - "fd.readlines()" - ] - }, - { - "cell_type": "markdown", - "id": "86dc9c0c-0712-4100-8da1-58bc084ed08c", - "metadata": {}, - "source": [ - "Open files have a 'cursor', we've reached the end of the file (EOF) so there isn't more to read." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "f3691c99", - "metadata": {}, - "outputs": [], - "source": [ - "# if we use 'seek' we can rewind to the beginning of the file\n", - "fd.seek(0)\n", - "fd.readlines()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "0e12a4c5", - "metadata": {}, - "outputs": [], - "source": [ - "# we can also iterate over the file\n", - "f = open(\"emails.txt\")\n", - "for email in f.readlines():\n", - " print(email.strip()) # extra newline?" 
+ "# to access a file's contents, we create the path, and then\n", + "# use read_text()\n", + "emails_path = Path(\"data/emails.txt\")\n", + "emails = emails_path.read_text()" ] }, { @@ -2932,48 +2854,55 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 22, "id": "d0aa3bf0", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;8m───────┬────────────────────────────────────────────────────────────────────────\u001b[0m\n", + " \u001b[38;5;8m│ \u001b[0mFile: \u001b[1mdata/animals.txt\u001b[0m \n", + "\u001b[38;5;8m───────┴────────────────────────────────────────────────────────────────────────\u001b[0m\n" + ] + } + ], "source": [ - "!rm names.txt\n", + "names_file = Path(\"data/animals.txt\").open(\"w\")\n", + "names_file.write(\"Aardvark\\nChimpanzee\\nElephant\\n\")\n", "\n", - "f = open(\"names.txt\", \"w\")\n", - "f.write(\"Bob\\nPhil\\n\")\n", - "f.write(\"Sally\\n\")\n", - "f.write(\"Rebecca\\n\")\n", - "f.write(\"Joan\\n\")\n", - "f.close()\n", - "\n", - "!cat names.txt" + "# (the ! 
indicates this is a shell command, not Python)\n", + "!cat data/animals.txt" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 23, "id": "d9e2b317", "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;8m───────┬────────────────────────────────────────────────────────────────────────\u001b[0m\n", + " \u001b[38;5;8m│ \u001b[0mFile: \u001b[1mdata/animals.txt\u001b[0m\n", + "\u001b[38;5;8m───────┼────────────────────────────────────────────────────────────────────────\u001b[0m\n", + "\u001b[38;5;8m 1\u001b[0m \u001b[38;5;8m│\u001b[0m \u001b[37mAardvark\u001b[0m\n", + "\u001b[38;5;8m 2\u001b[0m \u001b[38;5;8m│\u001b[0m \u001b[37mChimpanzee\u001b[0m\n", + "\u001b[38;5;8m 3\u001b[0m \u001b[38;5;8m│\u001b[0m \u001b[37mElephant\u001b[0m\n", + "\u001b[38;5;8m 4\u001b[0m \u001b[38;5;8m│\u001b[0m \u001b[37mKangaroo\u001b[0m\n", + "\u001b[38;5;8m───────┴────────────────────────────────────────────────────────────────────────\u001b[0m\n" + ] + } + ], "source": [ - "f = open(\"names.txt\", \"a\")\n", - "f.write(\"Hector\\n\")\n", - "f.flush()\n", - "!cat names.txt" - ] - }, - { - "cell_type": "markdown", - "id": "d8b46e54-4aca-4587-9a02-899bc04bbf5c", - "metadata": {}, - "source": [ - "**Important:** Opening in write mode clears the contents of the file.\n", - "\n", - "\"r\" : Read (default)\n", - "\n", - "\"w\" : Write\n", - "\n", - "\"a\" : Append" + "# open(\"w\") erases the file, use \"a\" if you want to append\n", + "names_file = Path(\"data/animals.txt\").open(\"a\")\n", + "names_file.write(\"Kangaroo\\n\")\n", + "names_file.flush()\n", + "!cat data/animals.txt" ] }, { @@ -2981,9 +2910,13 @@ "id": "554800e9-bc9e-4c03-94f8-34da10d205fe", "metadata": {}, "source": [ - "#### `close`\n", + "#### `flush` and `close`\n", "\n", - "Very important to close a file.\n", + "`flush` ensures that the in-memory contents get written to disk, actually saved.\n", + "\n", + "(Analogy: 
program crashes and you lose your unsaved work)\n", + "\n", + "At the end, important to `close` the file.\n", "\n", "- Frees resources.\n", "- Allows other programs to access file contents.\n", @@ -3003,7 +2936,7 @@ "\n", "```python\n", "\n", - "with open(filename) as variable:\n", + "with path.open() as variable:\n", " statement1\n", " statement2\n", "```\n", @@ -3126,10 +3059,46 @@ "outputs": [], "source": [] }, + { + "cell_type": "markdown", + "id": "02eb6878-7a2d-46b0-a0f3-e7aab18ebe11", + "metadata": {}, + "source": [ + "### Note: Relative Paths\n", + "\n", + "You may find that if you are running your code from, for example, the homework1 directory instead of homework1/problem3, you'd need to modify this path to be `Path(\"problem3/towing.csv\")`.\n", + "\n", + "That is because by default, paths are *relative*, meaning that they are assumed to start in the directory that you are running your code from.\n", + "\n", + "This can be frustrating at first, you want your code to work the same regardless of what directory you are in.\n", + "\n", + "### Building an absolute path\n", + "\n", + "To get around this, you can construct an absolute path:\n", + "\n", + "First you can use the special `__file__` variable which always contains the path to the current file.\n", + "\n", + "Then you can use that as the \"anchor\" of your path, and navigate from there.\n", + "\n", + "A common pattern then is to get the current file's parent, and navigate from there:\n", + "\n", + "```python\n", + "from pathlib import Path\n", + "\n", + "path = Path(__file__).parent / \"towing.csv\"\n", + "```\n", + "\n", + "This line uses the special built-in variable `__file__` to get the path of the Python file itself.\n", + "It then gets this file's parent directory (`.parent`) and appends the filename \"towing.csv\" to it.\n", + "\n", + "Using this technique in your code allows you to set paths that don't depend on the current working directory.\n", + "\n" + ] + }, { "cell_type": "code", 
"execution_count": null, - "id": "d0ab6c18-cf27-41a9-96fb-bf1fe78dfdca", + "id": "5a9bf4ab-dbcd-4dd3-a928-89d33ce88273", "metadata": {}, "outputs": [], "source": [] @@ -3151,7 +3120,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.6" + "version": "3.10.15" }, "toc": { "base_numbering": 1, diff --git a/data/animals.txt b/data/animals.txt new file mode 100644 index 0000000..f269a06 --- /dev/null +++ b/data/animals.txt @@ -0,0 +1,4 @@ +Aardvark +Chimpanzee +Elephant +Kangaroo diff --git a/data/cnetids.txt b/data/cnetids.txt new file mode 100644 index 0000000..573d2a4 --- /dev/null +++ b/data/cnetids.txt @@ -0,0 +1,3 @@ +borja +jturk +lamonts diff --git a/data/emails.txt b/data/emails.txt new file mode 100644 index 0000000..dce532e --- /dev/null +++ b/data/emails.txt @@ -0,0 +1,3 @@ +borja@cs.uchicago.edu +jturk@uchicago.edu +lamonts@uchicago.edu diff --git a/data/names.txt b/data/names.txt new file mode 100644 index 0000000..d44c9a9 --- /dev/null +++ b/data/names.txt @@ -0,0 +1,2 @@ +Bob +Phil