51042-notes/02.basics-2.ipynb
2024-10-09 22:26:21 -05:00

{
"cells": [
{
"cell_type": "markdown",
"id": "4444d9dd",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Compound Data Types"
]
},
{
"cell_type": "markdown",
"id": "a5918cff-1c93-41bd-811a-69d97c797f49",
"metadata": {
"tags": [],
"toc-hr-collapsed": true
},
"source": [
"## Iteration\n",
"\n",
"Last week we introduced `for` loops.\n",
"\n",
"```\n",
"for var_name in iterable:\n",
" statement # presumably using var_name\n",
"```\n",
"\n",
"What is an **iterable**? Why not just say **sequence**?\n",
"\n",
"What **sequences** have we seen?\n",
"\n",
"### More Iterables"
]
},
{
"cell_type": "markdown",
"id": "c0f1720d-937e-4030-b3a4-18c1382fb3ec",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### range\n",
"\n",
"Another iterable!\n",
"\n",
"`range(stop)` # goes from 0 to (stop-1)\n",
"\n",
"`range(start, stop)` # goes from start to (stop-1)\n",
"\n",
"Same rules as slice, always **inclusive** of start, **exclusive** of stop.\n",
"\n",
"or as you might write: ```[start, stop)``` -- we've seen this before with slicing"
]
},
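{
"cell_type": "markdown",
"id": "sketch-range-half-open",
"metadata": {},
"source": [
"A minimal sketch of the half-open rule, using only built-ins (the variable names are illustrative, not from the lecture). Wrapping a `range` in `list()` shows exactly which values it produces, and the same bounds select the same positions in a slice.\n",
"\n",
"```python\n",
"# range(stop): starts at 0 and stops before stop\n",
"first_three = list(range(3))            # [0, 1, 2]\n",
"\n",
"# range(start, stop): includes start, excludes stop\n",
"eight_to_eleven = list(range(8, 12))    # [8, 9, 10, 11]\n",
"\n",
"# the number of values is stop - start\n",
"count = len(range(8, 12))               # 4\n",
"\n",
"# slicing with the same bounds picks indices 8, 9, 10, 11\n",
"letters = 'abcdefghijkl'\n",
"sliced = letters[8:12]                  # 'ijkl'\n",
"```"
]
},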
{
"cell_type": "code",
"execution_count": null,
"id": "66d82d83-b8b8-4ad4-9f5a-b237d8bbe1d8",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"for x in range(12):\n",
" print(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f6334e0-eeaa-45f7-a3c0-f975911f5ddb",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"for x in range(8, 12):\n",
" print(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8280f1b2-1935-496c-b372-3ccc3d1bc7f2",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"z = range(12) # hmm\n",
"print(type(z))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58dc9fc1-9056-4be4-9f91-8747aa7e7925",
"metadata": {
"tags": []
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ed88d5c-e848-46f8-8fc5-8d48fadc303e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"i = 0\n",
"for x in [\"A\", \"B\", \"C\"]:\n",
" print(i, x)\n",
" i += 1"
]
},
{
"cell_type": "markdown",
"id": "c241b5fd-d0b6-4d72-a554-238931d28d36",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### `enumerate`\n",
"\n",
"Another function that returns an iterable, for when we need the index along with the object.\n",
"\n",
"`enumerate(original_iterable)` yields two element tuples: `(index, element)` for every item in the original."
]
},
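{
"cell_type": "markdown",
"id": "sketch-enumerate-pairs",
"metadata": {},
"source": [
"As a small illustration (the list below is made up for this example), each item produced by `enumerate` is an `(index, element)` tuple, which a `for` statement can unpack into two names.\n",
"\n",
"```python\n",
"letters = ['A', 'B', 'C']\n",
"\n",
"# each item is an (index, element) tuple\n",
"pairs = list(enumerate(letters))    # [(0, 'A'), (1, 'B'), (2, 'C')]\n",
"\n",
"# unpack the tuple directly in the for statement\n",
"labels = []\n",
"for i, letter in enumerate(letters):\n",
"    labels.append(f'{i}: {letter}')\n",
"# labels is ['0: A', '1: B', '2: C']\n",
"```"
]
},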
{
"cell_type": "code",
"execution_count": null,
"id": "f8a79062-4012-4700-9611-83a4cdcd641c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# \"incorrect\" example\n",
"# find using range/len - as you might think to write it based on past experience\n",
"def find_r(s, letter_to_find):\n",
" for i in range(len(s)):\n",
" if s[i] == letter_to_find:\n",
" return i\n",
" return -1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "844ce2af-b8f2-4dd3-b1fe-46ff050b2664",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"find_r(\"Hello World\", \"W\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21715b8f-4ce2-4c63-879b-69e06e9cef00",
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# find using enumerate - Pythonic, more efficient\n",
"def find_e(s, letter_to_find):\n",
" for i, letter in enumerate(s): # tuple unpacking\n",
" print(i, letter)\n",
" if letter == letter_to_find:\n",
" return i\n",
" return -1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b37d43a-bb10-420f-b86e-c70a8c56c55e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"find_e(\"Hello world\", \"w\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75e6a9da-e5a9-4f8a-93f2-ee0964a6efb4",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"find_r(\"Hello world\", \"?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "04a6fdb6-f136-47a8-b6dc-c32c87a40542",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"s = \"Hello world\"\n",
"s.find(\"w\") # built-ins are best"
]
},
{
"cell_type": "markdown",
"id": "527d4283-34bb-4a44-86c8-d1bee46be555",
"metadata": {},
"source": [
"Note: For HW#0 it is OK to use range for iteration, for future HWs if you are using the index & value, `enumerate` is the Pythonic way to do this."
]
},
{
"cell_type": "markdown",
"id": "13234aa7-ec30-44f1-8453-778db6ecd6ce",
"metadata": {
"tags": []
},
"source": [
"### aside: sequence unpacking\n",
"\n",
"When you know exactly how many elements are in a sequence, you can use this syntax to \"unpack\" them into variables:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bedcf76f-d5f9-42d3-bc01-a845dd2e75b1",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"tup = (1, 2, 3)\n",
"lst = [\"a\", \"b\", \"c\"]\n",
"\n",
"x, y, z = tup\n",
"print(x, y, z)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1bdb25b-f6fc-4a93-b8a0-14b7e3e5eaab",
"metadata": {},
"outputs": [],
"source": [
"for idx, elem in enumerate(iterable):\n",
" pass"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "dc82927d-c336-4eff-abde-4d9dc1507bc5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"x = 7\n",
"y = 8"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f1e0714f-5d7b-426b-9748-a82b5c179251",
"metadata": {},
"outputs": [],
"source": [
"x, y = y, x"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7646739a-8b0e-45cb-b63d-f1e54a9aa05b",
"metadata": {
"tags": []
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 3,
"id": "e1d328df-5016-4f82-9f8c-8c425f47146c",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"8 7\n"
]
}
],
"source": [
"print(x, y)"
]
},
{
"cell_type": "markdown",
"id": "2b894ba7",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": [],
"toc-hr-collapsed": true
},
"source": [
"## `dict`\n",
"\n",
"A collection of key-value pairs. (aka map/hashmap in other languages)\n",
"\n",
"- Keys must be hashable. `tuple`, `str`, scalars -- why?\n",
"- Values are references, can be any type.\n",
"- Dynamically resizable\n",
"- Implemented using a hashtable, lookup is constant-time. **O(1)**\n",
"\n",
"- Iterable? Yes\n",
"- Mutable? Yes\n",
"- Sequence? No. (Why not?)"
]
},
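{
"cell_type": "markdown",
"id": "sketch-dict-keys-not-positions",
"metadata": {},
"source": [
"A rough sketch of two of the points above, with made-up names and ages: lookup goes by key rather than by position, and an unhashable object such as a `list` is rejected as a key.\n",
"\n",
"```python\n",
"ages = {'Anna': 42, 'James': 37}    # hypothetical data\n",
"\n",
"# lookup is by key, not by position\n",
"anna_age = ages['Anna']             # 42\n",
"\n",
"# a dict is not a sequence: 0 is treated as a key, and that key is missing\n",
"try:\n",
"    ages[0]\n",
"    lookup_result = 'found'\n",
"except KeyError:\n",
"    lookup_result = 'KeyError'\n",
"\n",
"# a list is mutable and unhashable, so it cannot be used as a key\n",
"try:\n",
"    bad = {[1, 2]: 'x'}\n",
"    key_result = 'ok'\n",
"except TypeError:\n",
"    key_result = 'TypeError'\n",
"```"
]
},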
{
"cell_type": "code",
"execution_count": 10,
"id": "7aec887f-2f55-4f8f-80ce-ea3324fde2c6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'name': 'Anna', 2024: 42, 2023: 12}\n"
]
}
],
"source": [
"record1 = {\n",
" \"name\": \"Anna\",\n",
" 2024: 42,\n",
" 2023: 12,\n",
"}\n",
"print(record1)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "b5123db7",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"False"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# declaration\n",
"record1 = {\n",
" \"name\": \"Anna\",\n",
" \"age\": 42,\n",
"}\n",
"record1[\"name\"] = \"James\"\n",
"\n",
"empty = {}\n",
"\n",
"# alternate form\n",
"record2 = dict(age=42, name=\"Anna\")\n",
"# list(\"a\", \"b\")\n",
"\n",
"# can also construct from sequence of tuples\n",
"\n",
"record3 = dict(\n",
" [\n",
" (\"name\", \"Anna\"),\n",
" (\"age\", 42)\n",
" ]\n",
")\n",
"\n",
"# can compare for equality\n",
"record1 == record2"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "9948e31b-aff8-4ce4-8a28-50d0d07dc18a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'name': 'Anna', 'age': 42} {'age': 42, 'name': 'Anna'}\n"
]
}
],
"source": [
"print(record1, record2)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "af8c64ca",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"James\n"
]
}
],
"source": [
"# indexing by key\n",
"print(record1[\"name\"])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "903268d7-73e0-4a63-9404-0c52936c6e9c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"record1[\"name\"] = \"Anne\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7f1c685a",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'name': 'Anne', 'age': 42}\n",
"True\n",
"False\n"
]
}
],
"source": [
"# 'in' tests if a key exists (not a value!)\n",
"print(record1)\n",
"print(\"name\" in record1)\n",
"print(42 in record1)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "41b1f5a1",
"metadata": {
"scrolled": true,
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dict_keys(['name', 'age'])\n",
"dict_values(['Anna', 42])\n",
"dict_items([('name', 'Anna'), ('age', 42)])\n"
]
}
],
"source": [
"# keys, values, items\n",
"print(record1.keys())\n",
"print(record1.values())\n",
"print((record1.items()))"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "b73dbddb-5e64-4340-9037-d6400fca8218",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"name Anne\n",
"age 42\n"
]
}
],
"source": [
"for k, v in record1.items():\n",
" print(k, v)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "300408f0-e9f2-4cdc-96ba-7db185154f16",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"name Anna\n",
"age 42\n"
]
}
],
"source": [
"for k,v in record1.items():\n",
" print(k, v)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "0f6a9550-84b0-4f2d-8cb6-e48272be69a7",
"metadata": {
"tags": []
},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'dict'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[16], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28;43mhash\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m{\u001b[49m\u001b[43m}\u001b[49m\u001b[43m)\u001b[49m\n",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'dict'"
]
}
],
"source": [
"hash({})"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "033855c4-ec3a-4ea9-ba67-9abee0715840",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(1)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "0566050d-86ae-4961-983c-bcea78960580",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{(1, 2, 3): 4}\n"
]
}
],
"source": [
"d = {}\n",
"d[(1, 2, 3)] = 4\n",
"print(d)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "37c96ad5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"hash('abc')=-7376796221354515387\n",
"hash(1234.3)=691752902764004562\n",
"hash((1,2,3))=529344067295497451\n"
]
},
{
"ename": "TypeError",
"evalue": "unhashable type: 'list'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[25], line 7\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mhash\u001b[39m(\u001b[38;5;241m1234.3\u001b[39m)\u001b[38;5;132;01m=}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[1;32m 5\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28mhash\u001b[39m((\u001b[38;5;241m1\u001b[39m,\u001b[38;5;241m2\u001b[39m,\u001b[38;5;241m3\u001b[39m))\u001b[38;5;132;01m=}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n\u001b[0;32m----> 7\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00m\u001b[38;5;28;43mhash\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m,\u001b[49m\u001b[38;5;241;43m3\u001b[39;49m\u001b[43m,\u001b[49m\u001b[38;5;241;43m4\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m)\u001b[49m\u001b[38;5;132;01m=}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'"
]
}
],
"source": [
"## hashable?\n",
"\n",
"print(f\"{hash('abc')=}\")\n",
"print(f\"{hash(1234.3)=}\")\n",
"print(f\"{hash((1,2,3))=}\")\n",
"\n",
"print(f\"{hash([1,2,3,4])=}\")"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "007eeb5c-550b-4ede-9c87-a411d75d1dcd",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"-4894370073748428294"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(\"abc\")"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "98e1b75f-f9a2-47f4-822a-57c2053295ad",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"8446955659539365509"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(\"abd\")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "d9939e43",
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "unhashable type: 'list'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[30], line 2\u001b[0m\n\u001b[1;32m 1\u001b[0m d2 \u001b[38;5;241m=\u001b[39m {}\n\u001b[0;32m----> 2\u001b[0m \u001b[43md2\u001b[49m\u001b[43m[\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m1\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m2\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m3\u001b[39;49m\u001b[43m]\u001b[49m\u001b[43m]\u001b[49m \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mOK\u001b[39m\u001b[38;5;124m\"\u001b[39m\n",
"\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'"
]
}
],
"source": [
"d2 = {}\n",
"d2[[1, 2, 3]] = \"OK\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90ea61ba",
"metadata": {},
"outputs": [],
"source": [
"hash(\"Python\")"
]
},
{
"cell_type": "markdown",
"id": "9b3ffe31",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Mutability\n",
"\n",
"Dictionaries are *mutable*, you can change, expand, and shrink them in place.\n",
"\n",
"This means we aren't copying/creating new dictionaries on every edit."
]
},
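{
"cell_type": "markdown",
"id": "sketch-dict-alias-vs-copy",
"metadata": {},
"source": [
"One consequence worth seeing, sketched here with illustrative variable names: assigning a dict to a second name does not copy it, so an in-place change is visible through both names. An explicit `copy()` is needed to get an independent dict.\n",
"\n",
"```python\n",
"order = {'spam': 1, 'eggs': 2}\n",
"same_order = order                  # a second name for the same dict, not a copy\n",
"\n",
"same_order['coffee'] = 1            # modify in place through one name...\n",
"shared = order is same_order        # True: both names refer to one object\n",
"has_coffee = 'coffee' in order      # True: ...and the change shows up in the other\n",
"\n",
"separate = order.copy()             # an explicit (shallow) copy is independent\n",
"separate['bagel'] = 1\n",
"has_bagel = 'bagel' in order        # False: the original was not touched\n",
"```"
]
},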
{
"cell_type": "code",
"execution_count": 20,
"id": "56ada375",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'spam': 1, 'eggs': 2, 'coffee': 1, 'sausage': 1}\n"
]
}
],
"source": [
"order = {\"spam\": 1, \"eggs\": 2, \"coffee\": 1}\n",
"\n",
"order[\"sausage\"] = 1\n",
"print(order)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "aa6d1aed",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'spam': 5, 'coffee': 1}\n"
]
}
],
"source": [
"del order[\"eggs\"]\n",
"print(order)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "8450549b",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'spam': 5, 'coffee': 1, 'bagel': 1}\n"
]
}
],
"source": [
"order[\"bagel\"] = 1\n",
"print(order)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "ae853627",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"(3611625396340438220, -2119394878459364811)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"hash(\"bagel\"), hash(\"Bagel\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f5d1307f",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"spam\n",
"coffee\n",
"bagel\n"
]
}
],
"source": [
"## dictionaries are iterable\n",
"\n",
"for key in order:\n",
" print(key)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96ece38d",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [],
"source": [
"# can use .items() or .values() to loop over non-keys\n",
"for key, value in order.items():\n",
" print(f\"{key=} {value=}\")\n",
"\n",
"\n",
"print(order.items())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "182bd434",
"metadata": {},
"outputs": [],
"source": [
"# can use .items() or .values() to loop over non-keys\n",
"for a_tuple in order.items():\n",
" print(a_tuple[0], a_tuple[1])"
]
},
{
"cell_type": "markdown",
"id": "01a7d316",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### common dictionary methods\n",
"\n",
"| Operation | Meaning |\n",
"|-----------|---------|\n",
"| `d.keys()` | View of all keys. |\n",
"| `d.values()` | View of all values. |\n",
"| `d.items()` | View of key, value tuples. |\n",
"| `d.copy()` | Make a (shallow) copy. |\n",
"| `d.clear()` | Remove all items. |\n",
"| `d.get(key, default=None)` | Same as d[key] except if item isn't present, default will be returned. |\n",
"| `d.pop(key, default=None)` | Fetch item & remove it from dict. |\n",
"| `len(d)` | Number of stored entries. |\n",
"\n",
"See all at https://docs.python.org/3/library/stdtypes.html#dict"
]
},
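{
"cell_type": "markdown",
"id": "sketch-dict-get-and-pop",
"metadata": {},
"source": [
"A short sketch of how `get` and `pop` handle missing keys, using a made-up word list. Counting with `get(word, 0)` is a common pattern, shown here only as one possible use.\n",
"\n",
"```python\n",
"words = ['spam', 'eggs', 'spam', 'spam']\n",
"\n",
"# get(key, default) avoids a KeyError the first time a word is seen\n",
"counts = {}\n",
"for word in words:\n",
"    counts[word] = counts.get(word, 0) + 1\n",
"counted = dict(counts)              # {'spam': 3, 'eggs': 1}\n",
"\n",
"# pop(key, default) removes the key if present, else returns the default\n",
"eggs = counts.pop('eggs', 0)        # 1, and 'eggs' is removed\n",
"toast = counts.pop('toast', 0)      # 0, nothing to remove\n",
"\n",
"# without a default, popping a missing key raises KeyError\n",
"try:\n",
"    counts.pop('toast')\n",
"    raised = False\n",
"except KeyError:\n",
"    raised = True\n",
"```"
]
},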
{
"cell_type": "code",
"execution_count": 12,
"id": "d5801652-9284-4a55-8f94-53f4a1cce0cf",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"james ordered 0 fish\n"
]
}
],
"source": [
"d = order\n",
"#print(order)\n",
"key = \"fish\"\n",
"\n",
"print(\"james ordered\", d.get(key, 0), key)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "c8d24981-0150-483e-a338-a617bb7e92f8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'spam': 5, 'coffee': 1, 'bagel': 1}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ca4b8660-8307-4f52-a9e9-154d51c7ab94",
"metadata": {
"tags": []
},
"outputs": [
{
"ename": "KeyError",
"evalue": "'coffee'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[17], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43md\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mpop\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mcoffee\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m)\n",
"\u001b[0;31mKeyError\u001b[0m: 'coffee'"
]
}
],
"source": [
"\n",
"print(d.pop(\"coffee\"))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "2f3dcdac-5818-4d84-87a5-d3375bfb35b2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'spam': 5, 'bagel': 1}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "462515ae-a3e3-4886-bfab-d95b6c847b84",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"len(record1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "567de037-1184-4814-9be1-15a7b66db504",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"record1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bef0fcac",
"metadata": {},
"outputs": [],
"source": [
"order\n",
"\n",
"number_ordered = order.pop(\"spam\", 0)\n",
"print(number_ordered)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd5364b0",
"metadata": {},
"outputs": [],
"source": [
"print(order)"
]
},
{
"cell_type": "markdown",
"id": "6cf39963",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Dictionary View Objects\n",
"\n",
"As noted above, `keys(), values() and items()` return \"view objects.\"\n",
"\n",
"The returned object is a dynamic view, so when the dictionary changes, the view changes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0f1881a1",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"dishes = {\"eggs\": 2, \"sausage\": 1, \"bacon\": 1, \"spam\": 500}\n",
"\n",
"# Keys is a view object of the keys from the dishes dictionary\n",
"keys = dishes.keys()\n",
"values = dishes.values()\n",
"items = dishes.items()\n",
"\n",
"print(keys)\n",
"print(values)\n",
"print(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "674cc686",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [],
"source": [
"# View objects are dynamic and reflect dictionary changes\n",
"\n",
"# Lets delete the 'eggs' entry\n",
"del dishes[\"eggs\"]\n",
"\n",
"# Notice the both the views have removed key and its value\n",
"print(keys)\n",
"print(values)\n",
"print(items)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "b6658174",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'BLT': 3.99, 'Chicken': 5.99, 'Salad': 4.5}\n",
"4.5\n"
]
}
],
"source": [
"# Nested Dictionaries Example\n",
"\n",
"menu = {\n",
" \"Breakfast\": {\"Eggs\": 2.19, \"Toast\": 0.99, \"Orange Juice\": 1.99},\n",
" \"Lunch\": {\"BLT\": 3.99, \"Chicken\": 5.99, \"Salad\": 4.50},\n",
" \"Dinner\": {\"Cheeseburger\": 9.99, \"Salad\": 7.50, \"Special\": 8.49},\n",
"}\n",
"\n",
"print(menu[\"Lunch\"])\n",
"\n",
"print(menu[\"Lunch\"][\"Salad\"])"
]
},
{
"cell_type": "markdown",
"id": "da42d6b5",
"metadata": {},
"source": [
"### Caveats\n",
"\n",
"- Downsides of mutables?\n",
"- Modifying a `dict` while iterating through it."
]
},
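{
"cell_type": "markdown",
"id": "sketch-dict-modify-while-iterating",
"metadata": {},
"source": [
"To make the second caveat concrete, here is a sketch (with made-up scores) of what happens in CPython when keys are deleted during a loop over the same dict, followed by one possible workaround: looping over a snapshot of the keys.\n",
"\n",
"```python\n",
"scores = {'A': 100, 'B': 20, 'C': 48}\n",
"\n",
"# deleting keys while looping over the same dict fails\n",
"try:\n",
"    for name in scores:\n",
"        if scores[name] < 50:\n",
"            del scores[name]\n",
"    error = None\n",
"except RuntimeError as exc:\n",
"    error = str(exc)    # dictionary changed size during iteration\n",
"\n",
"# workaround: list(scores) copies the keys first, so the loop is unaffected\n",
"scores = {'A': 100, 'B': 20, 'C': 48}\n",
"for name in list(scores):\n",
"    if scores[name] < 50:\n",
"        del scores[name]\n",
"# scores is now {'A': 100}\n",
"```"
]
},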
{
"cell_type": "code",
"execution_count": 23,
"id": "ecb495c8-ac3a-4b70-9e74-1bfed1d4b466",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'A': 100}\n"
]
}
],
"source": [
"def something(d):\n",
" to_remove = []\n",
"\n",
" d_copy = d.copy()\n",
" for k, v in d.items():\n",
" if v < 50:\n",
" d_copy.pop(k)\n",
" #to_remove.append(k)\n",
"\n",
" #for item in to_remove:\n",
" # d.pop(item)\n",
" # ...\n",
" return d_copy\n",
"\n",
"\n",
"scores = {\"A\": 100, \"B\": 20, \"C\": 48}\n",
"something(scores)\n",
"print(scores)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c988c56",
"metadata": {},
"outputs": [],
"source": [
"# iteration example\n",
"d = {\"A\": 1, \"B\": 2, \"C\": 3}\n",
"to_remove = []\n",
"for key, value in d.items():\n",
" if value == 2:\n",
" to_remove.append(key)\n",
"for key in to_remove:\n",
" d.pop(key)\n",
"\n",
"print(d)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "410613ac",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'Anne': 98, 'Zach': 65}\n"
]
}
],
"source": [
"students = {\n",
" \"Anne\": 98,\n",
" \"Mitch\": 13,\n",
" \"Zach\": 65,\n",
"}\n",
"\n",
"below_60 = []\n",
"\n",
"for student in students:\n",
" grade = students[student]\n",
" if grade < 60:\n",
" below_60.append(student)\n",
"\n",
"for name in below_60:\n",
" students.pop(name)\n",
"\n",
"print(students)"
]
},
{
"cell_type": "markdown",
"id": "976e988f",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"toc-hr-collapsed": true
},
"source": [
"## `set`\n",
"\n",
"Sets contain an unordered collection of *unique* & *immutable* values.\n",
"\n",
" - Unique: no duplicates\n",
"\n",
" - Immutable: values cannot be `dict`, `set`, `list`.\n",
"\n",
"\n",
"Sets themselves are *mutable*."
]
},
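{
"cell_type": "markdown",
"id": "sketch-set-unique-hashable",
"metadata": {},
"source": [
"A brief sketch of these properties, using made-up color names: duplicates collapse, the set itself can grow, and a mutable `list` is rejected as a member while a `tuple` of strings is accepted.\n",
"\n",
"```python\n",
"# duplicates collapse: a set keeps one copy of each value\n",
"colors = {'red', 'green', 'red'}\n",
"size = len(colors)                  # 2\n",
"\n",
"# the set itself is mutable\n",
"colors.add('blue')\n",
"\n",
"# a list is unhashable, so it cannot be a member...\n",
"try:\n",
"    colors.add(['cyan', 'teal'])\n",
"    outcome = 'added'\n",
"except TypeError:\n",
"    outcome = 'TypeError'\n",
"\n",
"# ...but a tuple of hashable values can\n",
"colors.add(('cyan', 'teal'))\n",
"```"
]
},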
{
"cell_type": "code",
"execution_count": 23,
"id": "3a3db482",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'panda', 'ostrich', 'llama'}\n",
"{'panda', 'llama', 'ostrich'}\n"
]
}
],
"source": [
"# defining a set\n",
"animals = {\"llama\", \"panda\", \"ostrich\"}\n",
"print(animals)\n",
"\n",
"# or can be made from an iterable\n",
"animals = set([\"llama\", \"panda\", \"ostrich\"])\n",
"print(animals)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b240ba8b-2e54-464c-8003-bca5b66c532e",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 24,
"id": "2bb720e8-5830-4fd3-a7a5-cf50af5a0868",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"s = set()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "4628529f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'panda', 'llama', 'ostrich'}\n"
]
}
],
"source": [
"# no duplicates\n",
"animals = set([\"llama\", \"panda\", \"ostrich\", \"ostrich\", \"panda\"])\n",
"print(animals)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "bbe43b2e-d95d-4e16-be98-ac03f227003d",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"lst = [1, 23, 4920, 2091, 4920, 4920, 4920, 23]"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "1fd3e328-c0f5-4fd0-8407-b1cd33010f90",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[4920, 1, 2091, 23]\n"
]
}
],
"source": [
"deduped = list(set(lst))\n",
"print(deduped)"
]
},
{
"cell_type": "markdown",
"id": "56cb6bab",
"metadata": {},
"source": [
"\n",
"### Set Theory Operations\n",
"\n",
"Sets are fundamentally mathematical in nature and contain operations based on set theory. They allow the following operations:\n",
"\n",
" - Union (`union()` or `|`}: A set containing all elements that are in both sets\n",
"\n",
" - Difference (`difference()` or `-`): A set that consists of elements that are in one set but not the other.\n",
"\n",
" - Intersection (`intersection` or `&`): A set that consists of all elements that are in both sets.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "9ef87b91",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"A = {'a', 'c', 'b', 'e', 'd'}\n",
"\n",
"B = {'y', 'z', 'b', 'x', 'd'}\n"
]
}
],
"source": [
"# The following creates a set of single strings 'a','b','c','d','e'\n",
"# and another set of single strings 'b','d','x','y','z'\n",
"A = set(\"abcde\")\n",
"B = set([\"b\", \"d\", \"x\", \"y\", \"z\"])\n",
"\n",
"print(\"A = \", A)\n",
"print()\n",
"print(\"B = \", B)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "cd7cd6d7",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'y', 'a', 'z', 'c', 'b', 'x', 'e', 'd'}\n",
"---\n",
"{'y', 'a', 'z', 'c', 'b', 'x', 'e', 'd'}\n"
]
}
],
"source": [
"# Union Operation\n",
"new_set = A | B\n",
"print(new_set)\n",
"print(\"---\")\n",
"new_set = A.union(B) # Same operation as above but using method\n",
"print(new_set)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "cb6bd2f9",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'c', 'a', 'e'}\n",
"---\n",
"{'y', 'z', 'x'}\n"
]
}
],
"source": [
"# Difference Operation\n",
"new_set = A - B\n",
"print(new_set)\n",
"print(\"---\")\n",
"new_set = B.difference(A) # note that order matters for difference\n",
"print(new_set)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "4e516175",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'d', 'b'}\n",
"---\n",
"{'d', 'b'}\n"
]
}
],
"source": [
"# Intersection Operation\n",
"new_set = A & B\n",
"print(new_set)\n",
"print(\"---\")\n",
"new_set = A.intersection(B) # same operation as above but using method\n",
"print(new_set)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "6d6fcff7",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'z', 'y', 'e', 'c', 'x', 'a'}\n",
"---\n",
"{'z', 'y', 'e', 'c', 'x', 'a'}\n"
]
}
],
"source": [
"# Symmetric Difference Operation\n",
"new_set = A ^ B\n",
"print(new_set)\n",
"print(\"---\")\n",
"new_set = A.symmetric_difference(B) # same operation as above but using method\n",
"print(new_set)"
]
},
{
"cell_type": "markdown",
"id": "2558302f",
"metadata": {},
"source": [
"### Other Set Methods\n",
"\n",
"| Method | Purpose | \n",
"|--------|---------|\n",
"| `s.add(item)` | Adds an item to set. |\n",
"| `s.update(iterable)` | Adds all items from iterable to the set. |\n",
"| `s.remove(item)` | Remove an item from set. |\n",
"| `s.discard(item)` | Remove an item from set if it is present, fail silently if not. |\n",
"| `s.pop()` | Remove an arbitrary item from the set. |\n",
"| `s.clear()` | Remove all items from the set. |"
]
},
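{
"cell_type": "markdown",
"id": "sketch-set-remove-vs-discard",
"metadata": {},
"source": [
"The difference between `remove` and `discard` only shows up when the item is absent. A minimal sketch with an arbitrary three-element set:\n",
"\n",
"```python\n",
"s = {1, 2, 3}\n",
"\n",
"# 4 is absent: discard does nothing and returns None\n",
"discard_result = s.discard(4)\n",
"\n",
"# 4 is absent: remove raises KeyError\n",
"try:\n",
"    s.remove(4)\n",
"    remove_failed = False\n",
"except KeyError:\n",
"    remove_failed = True\n",
"\n",
"# 3 is present, so remove succeeds\n",
"s.remove(3)\n",
"# s is now {1, 2}\n",
"```"
]
},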
{
"cell_type": "code",
"execution_count": 32,
"id": "e7be6cfb-5778-4106-a454-ab13cbd96e30",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"None\n"
]
}
],
"source": [
"s = {1, 2, 3}\n",
"print(s.remove(4))\n",
"#print(s)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "322c8f1a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Removed Ace\n",
"{'J', '5', '8', '4', '6', '9', 'Q', 'K', '2', '3', '7'}\n"
]
}
],
"source": [
"s = set() # why not {}?\n",
"\n",
"s.update([\"A\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7\", \"8\", \"9\", \"J\", \"Q\", \"K\"])\n",
"\n",
"s.remove(\"A\")\n",
"print(\"Removed Ace\")\n",
"print(s)"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "41a8ea98-1ffe-43b9-9548-fa43197cad94",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"'J'"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"s.pop()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2555b568",
"metadata": {},
"outputs": [],
"source": [
"s.discard(\"9\")\n",
"# print(\"Discarded Ace\")\n",
"print(s)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ccc84afc",
"metadata": {},
"outputs": [],
"source": [
"card = s.pop()\n",
"print(\"Popped\", card)\n",
"print(s)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb3ed63b",
"metadata": {},
"outputs": [],
"source": [
"print(\"---\")\n",
"s.add(\"Joker\")\n",
"print(s)\n",
"\n",
"\n",
"\"Honda Civic\" in [\n",
" \"Honda Civic\",\n",
" \"Ford Focus\",\n",
" \"Honda Civic\",\n",
" \"Honda Civic\",\n",
" \"Honda Civic\",\n",
" \"Honda Civic\",\n",
" \"Honda Civic\",\n",
" \"Escalade\",\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "c1ec32b4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All 3 ordered: {'eggs'}\n",
"Only ordered by #1: {'juice', 'pancakes'}\n"
]
}
],
"source": [
"d1 = {\"eggs\": 2, \"pancakes\": 100, \"juice\": 1}\n",
"d2 = {\"eggs\": 3, \"waffles\": 1, \"coffee\": 1}\n",
"d3 = {\"eggs\": 1, \"fruit salad\": 1}\n",
"\n",
"print(\"All 3 ordered:\", set(d1) & set(d2) & set(d3))\n",
"print(\"Only ordered by #1:\", set(d1) - set(d2))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "434fc861",
"metadata": {},
"outputs": [],
"source": [
"set(d1.items())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "32718b95",
"metadata": {},
"outputs": [],
"source": [
"s = {\"one\", \"two\", \"three\", \"four\"}\n",
"for x in s:\n",
" print(x)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "c86b847d",
"metadata": {},
"outputs": [],
"source": [
"students = [\n",
"    {\"name\": \"adam\", \"num\": 123},\n",
"    {\"name\": \"quynh\", \"num\": 456},\n",
"    {\"name\": \"quynh\", \"num\": 456},\n",
"    {\"name\": \"adam\", \"num\": 999},\n",
"]\n",
"\n",
"s = set()\n",
"for student in students:\n",
"    # a dict is mutable and unhashable, so s.add(student) would raise TypeError;\n",
"    # a tuple of its (key, value) pairs is hashable and can be stored in a set\n",
"    s.add(tuple(student.items()))\n",
"deduplicated = s"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "236f3706",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'name': 'adam', 'num': 123}\n",
"{'name': 'adam', 'num': 999}\n",
"{'name': 'quynh', 'num': 456}\n"
]
}
],
"source": [
"for student in deduplicated:\n",
"    print(dict(student))"
]
},
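{
"cell_type": "markdown",
"id": "e5c3a8b6",
"metadata": {},
"source": [
"One caveat with `tuple(student.items())`: two dictionaries with the same contents but a different key order produce different tuples. A possible alternative, sketched here with made-up records, is `frozenset(student.items())`, which ignores order. Both approaches assume every value in the dictionary is hashable.\n",
"\n",
"```python\n",
"# hypothetical records: same data, keys inserted in a different order\n",
"a = {\"name\": \"adam\", \"num\": 123}\n",
"b = {\"num\": 123, \"name\": \"adam\"}\n",
"\n",
"print(a == b)  # dict equality ignores key order\n",
"print(tuple(a.items()) == tuple(b.items()))  # tuples compare position by position\n",
"print(frozenset(a.items()) == frozenset(b.items()))  # frozensets ignore order\n",
"\n",
"unique = set()\n",
"for record in [a, b]:\n",
"    unique.add(frozenset(record.items()))\n",
"print(len(unique))\n",
"```"
]
},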
{
"cell_type": "markdown",
"id": "5ddacfcf",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"toc-hr-collapsed": true
},
"source": [
"## Discussion\n",
"\n",
"#### Are sets sequences?\n",
"\n",
"#### Why do set members need to be immutable?\n",
"\n",
"#### How can we store compound values in sets?\n",
"\n",
"#### Why do dictionary keys have the same restrictions?"
]
},
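{
"cell_type": "markdown",
"id": "f6a2d4c8",
"metadata": {},
"source": [
"A short experiment that may help with the questions above. Sets and dictionaries locate items by hash, so they only accept hashable values, and the built-in mutable containers are not hashable. The variable names are arbitrary:\n",
"\n",
"```python\n",
"point = (1, 2)  # tuple: immutable and hashable\n",
"coords = [1, 2]  # list: mutable and not hashable\n",
"\n",
"seen = set()\n",
"seen.add(point)\n",
"\n",
"message = None\n",
"try:\n",
"    seen.add(coords)\n",
"except TypeError as err:\n",
"    message = str(err)\n",
"\n",
"print(seen)\n",
"print(message)\n",
"```"
]
},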
{
"cell_type": "code",
"execution_count": 46,
"id": "cdf70650-cd3f-440c-a9e1-2d4d27dfca99",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"frozenset({1, 2, 3})\n"
]
}
],
"source": [
"# frozenset demo\n",
"nums = [1, 2, 2, 2, 3, 3]\n",
"frozen_nums = frozenset(nums)\n",
"print(frozen_nums)"
]
},
{
"cell_type": "code",
"execution_count": 50,
"id": "f3a752f5-c01e-4b7e-83f9-560f2db27559",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{frozenset({1, 2, 3}), frozenset({'B', 'C', 'A'})}\n"
]
},
{
"ename": "AttributeError",
"evalue": "'frozenset' object has no attribute 'add'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[50], line 5\u001b[0m\n\u001b[1;32m 1\u001b[0m nested \u001b[38;5;241m=\u001b[39m {frozen_nums, \u001b[38;5;28mfrozenset\u001b[39m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mABC\u001b[39m\u001b[38;5;124m\"\u001b[39m)}\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28mprint\u001b[39m(nested)\n\u001b[0;32m----> 5\u001b[0m \u001b[43mfrozen_nums\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43madd\u001b[49m(\u001b[38;5;241m4\u001b[39m)\n",
"\u001b[0;31mAttributeError\u001b[0m: 'frozenset' object has no attribute 'add'"
]
}
],
"source": [
"nested = {frozen_nums, frozenset(\"ABC\")}\n",
"\n",
"print(nested)\n",
"\n",
"frozen_nums.add(4)"
]
},
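{
"cell_type": "markdown",
"id": "a8d7b3e9",
"metadata": {},
"source": [
"Because a `frozenset` is hashable, it can also serve as a dictionary key. A small sketch with made-up data: looking up a value by an unordered pair of labels.\n",
"\n",
"```python\n",
"# hypothetical costs between pairs of labels, for illustration only\n",
"cost = {\n",
"    frozenset({\"a\", \"b\"}): 5,\n",
"    frozenset({\"a\", \"c\"}): 7,\n",
"}\n",
"\n",
"# order does not matter when building the key\n",
"print(cost[frozenset({\"b\", \"a\"})])\n",
"print(frozenset({\"c\", \"a\"}) in cost)\n",
"```"
]
},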
{
"cell_type": "code",
"execution_count": 56,
"id": "690909f7-5bd4-4040-94ee-fad99d4e6c85",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"xx = set(\"hello\")"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "72ed3c15-7a07-40a1-b5bd-af08213283a9",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'e', 'a', 'u', 'o', 'i'}\n"
]
}
],
"source": [
"vowels = set(\"aeiou\")\n",
"print(vowels)"
]
},
{
"cell_type": "code",
"execution_count": 58,
"id": "b3c19f35-ced1-44a9-b82a-c6ef116a7a63",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'h', 'l'}"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xx - vowels"
]
},
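{
"cell_type": "markdown",
"id": "b4e9c1f7",
"metadata": {},
"source": [
"Difference (`-`) is one of four set operators. The sketch below applies all of them to the same two sets; `sorted()` is used only to make the printed order predictable.\n",
"\n",
"```python\n",
"xx = set(\"hello\")\n",
"vowels = set(\"aeiou\")\n",
"\n",
"print(sorted(xx - vowels))  # difference: in xx but not in vowels\n",
"print(sorted(xx & vowels))  # intersection: in both\n",
"print(sorted(xx | vowels))  # union: in either\n",
"print(sorted(xx ^ vowels))  # symmetric difference: in exactly one\n",
"```"
]
},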
{
"cell_type": "markdown",
"id": "3037023e",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"toc-hr-collapsed": true
},
"source": [
"## Mutability\n",
"\n",
"Mutable values can be changed in place. Immutable values cannot: any operation that appears to modify one creates a new object instead.\n",
"\n",
"We've seen that `list` is mutable; `dict` and `set` are mutable as well.\n",
"\n",
"#### Mutable\n",
" - `list`\n",
" - `dict`\n",
" - `set`\n",
"\n",
"#### Immutable\n",
" - `str`\n",
" - `tuple`\n",
" - `frozenset`\n",
" - scalars: `int`, `float`, `complex`, `bool`, `None`"
]
},
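{
"cell_type": "markdown",
"id": "c7f2a5d3",
"metadata": {},
"source": [
"One way to see what \"changed in place\" means is to give the same object two names. This is a minimal sketch; the variable names are arbitrary.\n",
"\n",
"```python\n",
"a = [1, 2, 3]\n",
"b = a  # b is a second name for the same list\n",
"a.append(4)  # mutates the list in place\n",
"print(b)  # the change is visible through b\n",
"print(a is b)\n",
"\n",
"t = (1, 2, 3)\n",
"u = t\n",
"t = t + (4,)  # builds a new tuple and rebinds t\n",
"print(u)  # u still refers to the original tuple\n",
"print(t is u)\n",
"```"
]
},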
{
"cell_type": "code",
"execution_count": null,
"id": "38a9d1bf",
"metadata": {},
"outputs": [],
"source": [
"# list\n",
"d = [1, 2, 3]\n",
"d.append(4)\n",
"print(d)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62198c82",
"metadata": {},
"outputs": [],
"source": [
"# str\n",
"s = \"Hello\"\n",
"s = s + \" World\"\n",
"s\n",
"\n",
"# how did s change?"
]
},
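{
"cell_type": "markdown",
"id": "d1b6e8a4",
"metadata": {},
"source": [
"Strings offer no way to modify a character in place; an operation that appears to change a string builds a new one. A minimal sketch, with arbitrary variable names:\n",
"\n",
"```python\n",
"greeting = \"Hello\"\n",
"\n",
"try:\n",
"    greeting[0] = \"J\"  # strings do not support item assignment\n",
"except TypeError:\n",
"    changed_in_place = False\n",
"else:\n",
"    changed_in_place = True\n",
"\n",
"jello = \"J\" + greeting[1:]  # build a new string instead\n",
"print(changed_in_place)\n",
"print(greeting, jello)\n",
"```"
]
},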
{
"cell_type": "code",
"execution_count": null,
"id": "b52f9779-83e9-4781-a785-3c8fff6341cf",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"s = \"Hello World\"\n",
"t = s.lower()\n",
"print(s)\n",
"print(t)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a9bf4ab-dbcd-4dd3-a928-89d33ce88273",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "305.8px"
},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 5
}