{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "907bb05d-99ef-400b-93ec-9fd2bf8b619c",
   "metadata": {
    "tags": [],
    "toc-hr-collapsed": true
   },
   "source": [
    "## I/O"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff711218-4d4c-4999-a378-3a03ae5d6222",
   "metadata": {},
   "source": [
    "### `print()`\n",
    "\n",
    "`print(*objects, sep=' ', end='\\n', file=sys.stdout, flush=False)`\n",
    "\n",
    "https://docs.python.org/3/library/functions.html#print"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "894ed224-a63a-44d7-b3a5-3dd4233830b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"Can\", \"pass\", \"multiple\", {\"objects\": True})\n",
    "print(\"Hello\", \"World\", sep=\"~~~~\", end=\"!\")\n",
    "print(\"Same line\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d05db43-1062-4a51-a205-b91e37d7d9c1",
   "metadata": {},
   "source": [
    "### `input()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "302e8ee6-9f25-4202-99e0-a3556cd53da7",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "name = input(\"What is your name: \")\n",
    "print(f\"Hello {name}\")\n",
    "\n",
    "# input() always returns a string, even when the user types digits\n",
    "year = input(\"What year is it: \")\n",
    "print(year, type(year))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e12f7814-b77d-4c57-9857-7dd5238e5d08",
   "metadata": {},
   "source": [
    "### pathlib\n",
    "\n",
    "Python offers a few ways of working with files, mostly because the standard library has improved over time.\n",
    "\n",
    "You'll still sometimes see code that uses the older style built around the `open` function, but there's rarely a reason to write new code that way now that `pathlib` is widely available.\n",
    "\n",
    "To use `pathlib`, you'll need to import the `Path` class. (We'll discuss imports in more detail soon.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d64c104-bf39-4932-9178-922bb2ecb43d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3be8ec33-dcea-4834-8fc4-0bf71b19b72d",
   "metadata": {},
   "source": [
    "Imports like this should be at the top of the file.\n",
    "\n",
    "To use this type, create a `Path` object from a file path, for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25f2bbab-d317-404c-be35-c31df5180a5a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# this looks like a function call, but by convention\n",
    "# the capital letter signals that Path is a class\n",
    "file_path = Path(\"data/names.txt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad9a7bb9-e8ff-4929-a50d-5bc4e1aeb9f4",
   "metadata": {},
   "source": [
    "#### Typical Workflow\n",
    "\n",
    "- Read contents of file(s) from disk into working memory.\n",
    "- Parse and/or manipulate data as needed.\n",
    "- (Optional) Write data back to disk with modifications.\n",
    "\n",
    "#### Other Workflows\n",
    "\n",
    "- Append-only (e.g. logging)\n",
    "- Streaming data (needed for large files that don't fit into memory)\n",
    "\n",
    "#### Text vs. Binary\n",
    "\n",
    "We're opening our files in the default text mode. It is also possible to open files in binary mode, which reads and writes raw `bytes` instead of assuming the contents are strings."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31b06442-f771-48cd-b821-0ec6f54b1188",
   "metadata": {},
   "source": [
    "### Reading From a File\n",
    "\n",
    "**emails.txt**\n",
    "\n",
    "```\n",
    "borja@cs.uchicago.edu\n",
    "jturk@uchicago.edu\n",
    "lamonts@uchicago.edu\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b6afd77-85ac-4be1-82d5-39273e7df035",
   "metadata": {},
   "outputs": [],
   "source": [
    "# to access a file's contents, we create the path, and then\n",
    "# use read_text()\n",
    "emails_path = Path(\"data/emails.txt\")\n",
    "emails = emails_path.read_text()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97883212-61e5-4c22-9d4b-dbfee04bf382",
   "metadata": {},
   "source": [
    "### Writing to a File\n",
    "\n",
    "We need to open the file in write (`\"w\"`) or append (`\"a\"`) mode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51000430-0b55-41a5-8573-759b76529494",
   "metadata": {},
   "outputs": [],
   "source": [
    "names_file = Path(\"data/animals.txt\").open(\"w\")\n",
    "names_file.write(\"Aardvark\\nChimpanzee\\nElephant\\n\")\n",
    "\n",
    "# (the ! indicates this is a shell command, not Python)\n",
    "!cat data/animals.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a379fae3-a5e8-4917-898d-e4ec6c3e91c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# open(\"w\") erases the file; use \"a\" if you want to append\n",
    "names_file = Path(\"data/animals.txt\").open(\"a\")\n",
    "names_file.write(\"Kangaroo\\n\")\n",
    "names_file.flush()\n",
    "!cat data/animals.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "64cd8142-037d-4005-b591-4a1cda922b07",
   "metadata": {},
   "source": [
    "#### `flush` and `close`\n",
    "\n",
    "`flush` pushes any writes still buffered in memory out to the file, so they are actually saved.\n",
    "\n",
    "(Analogy: a program crashes and you lose your unsaved work.)\n",
    "\n",
    "When you are done with a file, it is important to `close` it.\n",
    "\n",
    "- Frees resources.\n",
    "- Allows other programs to access file contents.\n",
    "- Ensures edits are written to disk."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb6aa734-ea51-4582-9d4a-aea652d9dec0",
   "metadata": {},
   "source": [
    "### `with`\n",
    "\n",
    "The file object is a \"context manager\"; we'll cover those in more detail in a few weeks.\n",
    "\n",
    "The `with` statement allows us to safely use files without fear of leaving them open.\n",
    "\n",
    "```python\n",
    "with path.open() as variable:\n",
    "    statement1\n",
    "    statement2\n",
    "```\n",
    "\n",
    "No matter what happens inside the `with` block, the file will be closed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1428cd7-53ac-4138-8b43-fe7b477ccb24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# without `with`: the error on the last line means close() is never\n",
    "# called, so the buffered writes may not have reached the file yet\n",
    "f = open(\"names.txt\", \"w\")\n",
    "f.write(\"Bob\\n\")\n",
    "f.write(\"Phil\\n\")\n",
    "1 / 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a632c3d-9d35-441d-a123-49a6ba44c432",
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat names.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c62751d-6f85-4bfb-8e15-788d2d928089",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Full Example\n",
    "\n",
    "# load data into our chosen data structure\n",
    "emails = []\n",
    "with open(\"data/emails.txt\") as f:\n",
    "    for email in f:\n",
    "        # each line ends with a newline; strip() removes it\n",
    "        emails.append(email.strip())\n",
    "print(emails)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "81df15e5-da33-4367-ae08-dea08cfc2cf6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# transform data\n",
    "cnet_ids = []\n",
    "for email in emails:\n",
    "    cnet_id, domain = email.split(\"@\")\n",
    "    cnet_ids.append(cnet_id)\n",
    "print(cnet_ids)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b5c5988-90ea-4464-8c11-c51518f0657c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# write new data\n",
    "with open(\"data/cnetids.txt\", \"w\") as f:\n",
    "    for cnet_id in cnet_ids:\n",
    "        # print() adds newlines by default\n",
    "        print(cnet_id, file=f)\n",
    "        # or\n",
    "        # f.write(cnet_id + \"\\n\")\n",
    "\n",
    "!cat data/cnetids.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eb00e0ce-ff29-40d2-aa72-fa34315fa9a5",
   "metadata": {},
   "source": [
    "#### Useful `file` Methods\n",
    "\n",
    "| Operation | Purpose |\n",
    "|-----------|---------|\n",
    "| `f.read()` | Read entire file & return contents. |\n",
    "| `f.read(N)` | Read N characters (or N bytes in binary mode). |\n",
    "| `f.readline()` | Read up to (and including) next newline. |\n",
    "| `f.readlines()` | Read entire file split into list of lines. |\n",
    "| `f.write(aStr)` | Write string `aStr` into file. |\n",
    "| `f.writelines(lines)` | Write list of strings into file (no newlines are added). |\n",
    "| `f.close()` | Close file; prefer `with open()` instead. |\n",
    "| `f.flush()` | Manually flush output to disk without closing. |\n",
    "| `f.seek(N)` | Move cursor to position N. |\n",
    "\n",
    "-- Table based on Learning Python 2013"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f152aaaf-0c07-41c4-a90a-75891606c14e",
   "metadata": {},
   "source": [
    "### Common Gotchas\n",
    "\n",
    "- Relative paths: use `pathlib` (https://docs.python.org/3/library/pathlib.html)\n",
    "- File permissions\n",
    "- Mind the file mode (read/write/append)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b01a89d7-1419-40bf-8005-78cbedad82b8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "cc364347-7e0a-47ef-a90d-e7b3d91e3fc1",
   "metadata": {},
   "source": [
    "### Note: Relative Paths\n",
    "\n",
    "Suppose your code opens `Path(\"towing.csv\")`. If you run it from, for example, the `homework1` directory instead of `homework1/problem3`, you'd need to change the path to `Path(\"problem3/towing.csv\")`.\n",
    "\n",
    "That is because paths are *relative* by default, meaning they are resolved starting from the directory you are running your code from (the current working directory).\n",
    "\n",
    "This can be frustrating at first, since you want your code to work the same regardless of what directory you are in.\n",
    "\n",
    "### Building an Absolute Path\n",
    "\n",
    "To get around this, you can construct an absolute path:\n",
    "\n",
    "First, use the special `__file__` variable, which contains the path to the current Python file. (It is set when code runs from a `.py` file; it is not defined inside a notebook cell or an interactive session.)\n",
    "\n",
    "Then you can use that as the \"anchor\" of your path.\n",
    "\n",
    "A common pattern is to get the current file's parent directory and navigate from there:\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "path = Path(__file__).parent / \"towing.csv\"\n",
    "```\n",
    "\n",
    "This line uses `__file__` to get the path of the Python file itself.\n",
    "It then takes that file's parent directory (`.parent`) and appends the filename `towing.csv` to it.\n",
    "\n",
    "Using this technique in your code allows you to set paths that don't depend on the current working directory."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}