51042-notes/13.python-data-model.ipynb
2024-11-18 09:45:21 -06:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Python Data Model\n",
"\n",
"## Python Data Model\n",
"\n",
"Reference: https://docs.python.org/3/reference/datamodel.html\n",
"\n",
"### StaticArray Example\n",
"\n",
"To demonstrate operator overloading, we'll implement a sequence type seen in other languages known as a *static array*:\n",
"\n",
"- A static array is a sequence type (review: what makes a sequence type) where there is a fixed capacity to number of items the collection can hold.\n",
"\n",
"- Resizing of the array is not allowed after initialization. \n",
"\n",
"- We will define a class ``StaticArray`` that will allow use to use built-in python operators.\n",
"\n",
"We'll be able to use it like this:\n",
"\n",
"```python\n",
">>> from static_array import StaticArray\n",
">>> sa = StaticArray([1, 2, 3])\n",
">>> print(sa * 2)\n",
"# should produce the following output:\n",
"# [1, 2, 3, 1, 2, 3]\n",
">>> print(sa[1])\n",
"2\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"from collections.abc import Iterable\n",
"\n",
"\n",
"class StaticArray:\n",
"    def __init__(self, init_val, capacity=5):\n",
"        if isinstance(init_val, Iterable):\n",
"            # copy the items; the capacity is fixed at their count\n",
"            self.items = list(init_val)\n",
"            self.capacity = len(self.items)\n",
"        else:\n",
"            # fill every slot with the same initial value\n",
"            self.items = [init_val] * capacity\n",
"            self.capacity = capacity\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sa = StaticArray([1, 2, 3])\n",
"print(sa)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"sa = StaticArray(0, 5)\n",
"print(sa)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# we can fix that by defining a __repr__ method\n",
"\n",
"\n",
"class StaticArray:\n",
"    def __init__(self, init_val, capacity=5):\n",
"        if isinstance(init_val, Iterable):\n",
"            self.items = list(init_val)\n",
"            self.capacity = len(self.items)\n",
"        else:\n",
"            self.items = [init_val] * capacity\n",
"            self.capacity = capacity\n",
"\n",
"    def __repr__(self):\n",
"        return f\"StaticArray({self.items})\"\n",
"\n",
"\n",
"sa = StaticArray([1, 2, 3])\n",
"print(sa)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"sa = StaticArray(0, 5)\n",
"print(sa)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Emulating Collections & Sequences\n",
"\n",
"**Collections**\n",
"\n",
"- Have a length: `len(obj)`\n",
"- Can be iterated over: `for item in obj`\n",
"- Can query for membership: `item in obj`\n",
"\n",
"**Sequences**\n",
"\n",
"- Everything a collection can do\n",
"- Can be indexed: `obj[0]`\n",
"\n",
"| You Write... | Python Calls... |\n",
"|--------------|-----------------|\n",
"| `len(obj)` | `obj.__len__()` |\n",
"| `for item in obj` | `obj.__iter__()` |\n",
"| `item in obj` | `obj.__contains__(item)` |\n",
"| `obj[i]` | `obj.__getitem__(i)` |\n",
"| `obj[i] = x` | `obj.__setitem__(i, x)` |\n",
"| `del obj[i]` | `obj.__delitem__(i)` |\n"
]
},
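{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the table above, the hypothetical `Deck` class below (an illustration only, not part of `StaticArray`) defines just `__len__` and `__getitem__`. Python can still iterate over it and test membership: without `__iter__` or `__contains__`, it falls back to calling `__getitem__` with 0, 1, 2, ... until `IndexError` is raised.\n",
"\n",
"```python\n",
"class Deck:\n",
"    # hypothetical example class, used only for illustration\n",
"    def __init__(self, cards):\n",
"        self._cards = list(cards)\n",
"\n",
"    def __len__(self):\n",
"        return len(self._cards)\n",
"\n",
"    def __getitem__(self, i):\n",
"        # list indexing raises IndexError past the end,\n",
"        # which is what ends the fallback iteration\n",
"        return self._cards[i]\n",
"\n",
"\n",
"deck = Deck(['ace', 'king', 'queen'])\n",
"print(len(deck))          # calls deck.__len__()\n",
"print(deck[1])            # calls deck.__getitem__(1)\n",
"print('king' in deck)     # no __contains__, so Python scans via __getitem__\n",
"print([c for c in deck])  # no __iter__, so Python indexes from 0 upward\n",
"```\n",
"\n",
"Defining `__contains__` (and later `__iter__`) explicitly makes the intended behavior visible instead of relying on this fallback."
]
},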
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class StaticArray:\n",
"    def __init__(self, init_val, capacity=5):\n",
"        if isinstance(init_val, Iterable):\n",
"            self.items = list(init_val)\n",
"            self.capacity = len(self.items)\n",
"        else:\n",
"            self.items = [init_val] * capacity\n",
"            self.capacity = capacity\n",
"\n",
"    def __repr__(self):\n",
"        return f\"StaticArray({self.items})\"\n",
"\n",
"    def __str__(self):\n",
"        return f\"StaticArray({self.items})\"\n",
"\n",
"    def __len__(self):\n",
"        return self.capacity\n",
"\n",
"    def __contains__(self, item):\n",
"        return item in self.items\n",
"\n",
"    def __getitem__(self, index):\n",
"        if index >= self.capacity or index < -self.capacity:\n",
"            raise IndexError(\"Index out of range\")\n",
"        return self.items[index]\n",
"\n",
"    def __setitem__(self, index, val):\n",
"        if index >= self.capacity or index < -self.capacity:\n",
"            raise IndexError(\"Index out of range\")\n",
"        self.items[index] = val\n",
"\n",
"    def __delitem__(self, index):\n",
"        raise NotImplementedError(\"StaticArray does not support deletion\")\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sa = StaticArray([1, \"hi\", 3.14, True])\n",
"len(sa)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"42 in sa\n",
"\"hi\" in sa"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sa[3]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sa[42] = \"hello\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Numeric Operators\n",
"\n",
"| You Write... | Python Calls... |\n",
"|--------------|-----------------|\n",
"| `x + y` | `x.__add__(y)` |\n",
"| `x - y` | `x.__sub__(y)` |\n",
"| `x * y` | `x.__mul__(y)` |\n",
"| `x / y` | `x.__truediv__(y)` |\n",
"| `x // y` | `x.__floordiv__(y)` |\n",
"| `x % y` | `x.__mod__(y)` |\n",
"| `x ** y` | `x.__pow__(y)` |\n",
"| `x @ y` | `x.__matmul__(y)` |\n",
"\n",
"### Reverse / Reflected / Right Operators\n",
"\n",
"| You Write... | Python Calls... |\n",
"|--------------|-----------------|\n",
"| `x + y` | `y.__radd__(x)` |\n",
"| `x - y` | `y.__rsub__(x)` |\n",
"| `x * y` | `y.__rmul__(x)` |\n",
"| `x / y` | `y.__rtruediv__(x)` |\n",
"| `x // y` | `y.__rfloordiv__(x)` |\n",
"| `x % y` | `y.__rmod__(x)` |\n",
"| `x ** y` | `y.__rpow__(x)` |\n",
"| `x @ y` | `y.__rmatmul__(x)` |\n",
"\n",
"![](images/reverse_operators.png)\n",
"\n",
"### Comparison Operators\n",
"\n",
"| You Write... | Python Calls... |\n",
"|--------------|-----------------| \n",
"| `x < y` | `x.__lt__(y)` |\n",
"| `x <= y` | `x.__le__(y)` |\n",
"| `x > y` | `x.__gt__(y)` |\n",
"| `x >= y` | `x.__ge__(y)` |\n",
"| `x == y` | `x.__eq__(y)` |\n",
"| `x != y` | `x.__ne__(y)` |\n",
"\n",
"\n",
"### Iteration\n",
"\n",
"Iterable vs. Iterator (Review)\n",
"\n",
"Objects like lists, tuples, and strings are *iterable*.\n",
"\n",
"To keep track of the position within a given iteration (for loop, calls to `next`), Python uses an *iterator*."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ll = [1, 2, 3, 4]\n",
"iterator = iter(ll)\n",
"print(\"iterator 1 next()\", next(iterator))\n",
"print(\"iterator 1 next()\", next(iterator))\n",
"iterator2 = iter(ll)\n",
"print(\"iterator 2 next()\", next(iterator2))\n",
"print(\"iterator 1 next()\", next(iterator))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To be iterable, a class needs an `__iter__` method that returns an iterator.\n",
"\n",
"An iterator is an object with a `__next__` method that returns the next item in the iteration. It should raise `StopIteration` when there are no more items.\n",
"\n",
"Common Pattern: If a class only needs to be iterable once, it can return itself as the iterator, thus fulfilling both roles."
]
},
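{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"iterable = [1, 2, 3]\n",
"for i in iterable:\n",
"    print(i)\n",
"\n",
"# roughly what the for loop above does behind the scenes\n",
"iterator = iter(iterable)\n",
"while True:\n",
"    try:\n",
"        print(next(iterator))\n",
"    except StopIteration:\n",
"        break"
]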
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"class SimpleRange:\n",
"    def __init__(self, n):\n",
"        self.current = 0\n",
"        self.n = n\n",
"\n",
"    def __iter__(self):\n",
"        print(\"iter has been called\")\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.current >= self.n:\n",
"            print(\"at the end\")\n",
"            raise StopIteration\n",
"        else:\n",
"            print(f\"next was called, moving {self.current} to {self.current+1}\")\n",
"            self.current += 1\n",
"            return self.current - 1\n",
"\n",
"    def __repr__(self):\n",
"        return f\"SimpleRange({self.n}, current={self.current})\"\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"iter has been called\n",
"next was called, moving 0 to 1\n",
"iter has been called\n",
"next was called, moving 1 to 2\n",
"0 1\n",
"next was called, moving 2 to 3\n",
"0 2\n",
"at the end\n",
"at the end\n"
]
}
],
"source": [
"sr = SimpleRange(3)\n",
"for i in sr:\n",
" for j in sr:\n",
" print(i, j)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"sr = SimpleRange(5)\n",
"siter = iter(sr)\n",
"print(siter)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"siter is sr"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"next(siter)\n",
"print(siter)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Iteration Advice \n",
"\n",
"1. Do not implement the ``__next__()`` in a class that should only be an iterable. \n",
"2. In order to support multiple traversals, the iterator must be a seperate object. \n",
"3. A common design pattern is to delegate iteration to a seperate class that is iterable.\n",
"\n",
"For example, defining an ``StaticArrayIterator`` class that is in charge iterating through the objects within an ``StaticArray`` object. "
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# Adding __iter__ to StaticArray\n",
"class StaticArrayIterator:\n",
"    def __init__(self, values):\n",
"        self.values = values\n",
"        self.position = 0\n",
"\n",
"    def __iter__(self):\n",
"        # iterators are themselves iterable\n",
"        return self\n",
"\n",
"    def __next__(self):\n",
"        if self.position >= len(self.values):\n",
"            raise StopIteration\n",
"        item = self.values[self.position]\n",
"        self.position += 1\n",
"        return item\n",
"\n",
"    def __repr__(self):\n",
"        return f\"iterating over {self.values}, at position {self.position}\"\n",
"\n",
"\n",
"class StaticArray:\n",
"    def __init__(self, capacity, initial=None):\n",
"        self._items = [initial] * capacity\n",
"        self._capacity = capacity\n",
"\n",
"    @classmethod\n",
"    def from_iterable(cls, iterable):\n",
"        # materialize first so any iterable works, not only sized ones\n",
"        items = list(iterable)\n",
"        new_array = cls(len(items))\n",
"        new_array._items = items\n",
"        return new_array\n",
"\n",
"    def __repr__(self):\n",
"        # __repr__ is the unambiguous string representation\n",
"        # of an object\n",
"        return f\"StaticArray({self._capacity}, {self._items})\"\n",
"\n",
"    def __str__(self):\n",
"        return repr(self._items)\n",
"\n",
"    # Sequence Operations ####\n",
"\n",
"    def __len__(self):\n",
"        return self._capacity\n",
"\n",
"    def __contains__(self, x):\n",
"        return x in self._items\n",
"\n",
"    def __getitem__(self, i):\n",
"        if i >= self._capacity or i < -self._capacity:\n",
"            raise IndexError  # an invalid index\n",
"        return self._items[i]\n",
"\n",
"    def __setitem__(self, i, x):\n",
"        if i >= self._capacity or i < -self._capacity:\n",
"            raise IndexError  # an invalid index\n",
"        self._items[i] = x\n",
"\n",
"    def __delitem__(self, i):\n",
"        raise NotImplementedError(\"Cannot delete from a static array\")\n",
"\n",
"    # Iterable Operations ####\n",
"    def __iter__(self):\n",
"        # every call returns a fresh iterator over a snapshot of the items\n",
"        return StaticArrayIterator(self._items.copy())\n"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1, 2, 3, 4, 5]\n",
"1 1\n",
"1 2\n",
"1 3\n",
"1 4\n",
"1 5\n",
"2 1\n",
"2 2\n",
"2 3\n",
"2 4\n",
"2 5\n",
"3 1\n",
"3 2\n",
"3 3\n",
"3 4\n",
"3 5\n",
"4 1\n",
"4 2\n",
"4 3\n",
"4 4\n",
"4 5\n",
"5 1\n",
"5 2\n",
"5 3\n",
"5 4\n",
"5 5\n"
]
}
],
"source": [
"sa = StaticArray(5, 2)\n",
"sa[0] = 1\n",
"sa[1] = 2\n",
"sa[2] = 3\n",
"sa[3] = 4\n",
"sa[4] = 5\n",
"print(sa)\n",
"for x in sa:\n",
" for y in sa:\n",
" print(x, y)"
]
},
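{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative to writing a separate iterator class is to make `__iter__` a generator function. The sketch below uses a hypothetical `Countdown` class (not part of the notebook) to show that each call to `__iter__` then produces a new, independent iterator, so nested loops work as expected.\n",
"\n",
"```python\n",
"class Countdown:\n",
"    # hypothetical iterable, for illustration only\n",
"    def __init__(self, start):\n",
"        self.start = start\n",
"\n",
"    def __iter__(self):\n",
"        # a generator function: every call creates a new iterator\n",
"        n = self.start\n",
"        while n > 0:\n",
"            yield n\n",
"            n -= 1\n",
"\n",
"\n",
"c = Countdown(2)\n",
"pairs = [(x, y) for x in c for y in c]\n",
"print(pairs)  # the inner and outer loops each get their own iterator\n",
"```\n",
"\n",
"The iteration state lives in the generator object rather than on `Countdown` itself, which is the same separation the iteration advice above recommends."
]
},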
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Context Managers / `with`\n",
"\n",
"We also saw this idea of needing to clean up after ourselves when we used `with` to open files.\n",
"\n",
"```python\n",
"\n",
"with open(filename) as f:\n",
" # do things with f\n",
" g(f)\n",
"# f is guaranteed to be closed even if \n",
"# exceptions are raised within with block\n",
"```\n",
"\n",
"### Yet another set of dunder methods..."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class DatabaseConnection:\n",
"    def __init__(self, username, password):\n",
"        # connect to database\n",
"        self.username = username\n",
"        self.password = password\n",
"        self.connected = True\n",
"\n",
"    def __enter__(self):\n",
"        print(\"__enter__\")\n",
"        # the return value is bound to the name after `as`,\n",
"        # so return self to use the connection inside the block\n",
"        return self\n",
"\n",
"    def __exit__(self, exc_type, exc_val, exc_traceback):\n",
"        print(\"__exit__\")\n",
"        if exc_type:\n",
"            print(\"rolling back changes\")\n",
"        self.connected = False\n",
"\n",
"    def query(self, sql):\n",
"        print(\"ran query\", sql)\n",
"\n",
"    def __repr__(self):\n",
"        return f\"Connection connected={self.connected}\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"db = DatabaseConnection(\"hello\", \"world\")\n",
"db.query(\"SELECT * FROM users;\")\n",
"# do something dangerous\n",
"1 / 0"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# our connection is possibly left in a broken state\n",
"print(db)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"with DatabaseConnection(\"hello\", \"world\") as db:\n",
"    # __enter__\n",
"    db.query(\"SELECT * from users;\")\n",
"    1 / 0\n",
"    # __exit__"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# changes were rolled back, and our connection is safe\n",
"db.connected\n"
]
},
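{
"cell_type": "markdown",
"metadata": {},
"source": [
"Roughly, a `with` block behaves like the hand-written `try` statement below. This is a simplified sketch (the real protocol looks the methods up on the type and covers a few more cases), and `Resource` is a hypothetical class used only for illustration.\n",
"\n",
"```python\n",
"import sys\n",
"\n",
"\n",
"class Resource:\n",
"    # hypothetical context manager, for illustration only\n",
"    def __init__(self):\n",
"        self.log = []\n",
"\n",
"    def __enter__(self):\n",
"        self.log.append('enter')\n",
"        return self\n",
"\n",
"    def __exit__(self, exc_type, exc_val, exc_traceback):\n",
"        self.log.append(f'exit, exception={exc_type is not None}')\n",
"        return False  # a falsy return lets any exception propagate\n",
"\n",
"\n",
"# approximately what `with Resource() as r: r.log.append('body')` does\n",
"manager = Resource()\n",
"r = manager.__enter__()\n",
"try:\n",
"    r.log.append('body')\n",
"except BaseException:\n",
"    # a truthy return value from __exit__ would suppress the exception\n",
"    if not manager.__exit__(*sys.exc_info()):\n",
"        raise\n",
"else:\n",
"    manager.__exit__(None, None, None)\n",
"\n",
"print(r.log)\n",
"```\n",
"\n",
"Because `__exit__` runs on both paths, cleanup happens whether or not the body raises."
]
},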
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Callable Objects Examples\n",
"\n",
"Functions have a few attributes like `__name__` and `__doc__` that we can use to introspect on them."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"add\n",
"Adds two numbers\n"
]
}
],
"source": [
"def add(x, y):\n",
" \"\"\"Adds two numbers\"\"\"\n",
" return x + y\n",
"\n",
"\n",
"print(add.__name__)\n",
"print(add.__doc__)\n",
"\n",
"x = add"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"x.__name__"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"class Example:\n",
"    def __init__(self, name):\n",
"        self.name = name\n",
"        self.num_calls = 0\n",
"\n",
"    def __call__(self, *args):\n",
"        # *args collects positional arguments, e.g. example(1, 2, 3)\n",
"        print(self.num_calls)\n",
"        self.num_calls += 1\n",
"        print(self.name, \"got\", args)\n",
"\n",
"\n",
"example = Example(\"one\")\n",
"two = Example(\"two\")"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"7\n",
"one got (1, 2, 3)\n"
]
}
],
"source": [
"example(1, 2, 3)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2\n",
"two got ()\n"
]
}
],
"source": [
"two()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"They also have a `__call__` method that allows us to make our own objects callable. For example:"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [],
"source": [
"class Memoized:\n",
"    def __init__(self, func):\n",
"        self.cache = {}\n",
"        self.wrapped_func = func\n",
"\n",
"    def __call__(self, *args):\n",
"        if args not in self.cache:\n",
"            self.cache[args] = self.wrapped_func(*args)\n",
"        return self.cache[args]\n"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"running expensive_func\n",
"6\n",
"6\n"
]
}
],
"source": [
"@Memoized\n",
"def expensive_func(a, b, c):\n",
" print(\"running expensive_func\")\n",
" return a + b + c\n",
"\n",
"#expensive_func = Memoized(expensive_func)\n",
"\n",
"print(expensive_func(1, 2, 3))\n",
"print(expensive_func(1, 2, 3))\n"
]
},
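{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, the standard library provides a similar caching decorator, `functools.lru_cache`. The sketch below is only an illustration; `slow_add` and the `calls` list are hypothetical names used to show that the function body runs once per distinct set of arguments.\n",
"\n",
"```python\n",
"from functools import lru_cache\n",
"\n",
"calls = []  # records each time the function body actually runs\n",
"\n",
"\n",
"@lru_cache(maxsize=None)\n",
"def slow_add(a, b, c):\n",
"    calls.append((a, b, c))\n",
"    return a + b + c\n",
"\n",
"\n",
"print(slow_add(1, 2, 3))\n",
"print(slow_add(1, 2, 3))  # served from the cache\n",
"print(len(calls))         # the body ran only once\n",
"```\n",
"\n",
"Like `Memoized`, this relies on the arguments being hashable, since they are used as cache keys."
]
},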
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"class PartialFunc:\n",
"    # simplified functools.partial\n",
"\n",
"    def __init__(self, func, *args, **kwargs):\n",
"        self.func = func\n",
"        self.args = args\n",
"        self.kwargs = kwargs\n",
"\n",
"    def __call__(self, *args, **kwargs):\n",
"        temp_kwargs = self.kwargs.copy()\n",
"        temp_kwargs.update(kwargs)\n",
"        return self.func(*(self.args + args), **temp_kwargs)\n",
"\n",
"    @property\n",
"    def __name__(self):\n",
"        return f\"{self.func.__name__}(args={self.args} kwargs={self.kwargs})\"\n",
"\n",
"    @property\n",
"    def __doc__(self):\n",
"        return self.func.__doc__\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def add(x, y):\n",
"    \"\"\"Adds two numbers\"\"\"\n",
"    return x + y\n",
"\n",
"add_5 = PartialFunc(add, 5)\n",
"print(add_5(10))\n",
"\n",
"print(add_5.__name__)\n",
"print(add_5.__doc__)"
]
},
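{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, here is the same idea using the real `functools.partial`, which `PartialFunc` simplifies. This is a brief sketch; the function is named `add_two` here (a hypothetical name) so it does not clash with the notebook's `add`.\n",
"\n",
"```python\n",
"from functools import partial\n",
"\n",
"\n",
"def add_two(x, y):\n",
"    '''Adds two numbers'''\n",
"    return x + y\n",
"\n",
"\n",
"add_5_builtin = partial(add_two, 5)\n",
"print(add_5_builtin(10))               # 15\n",
"print(add_5_builtin.func is add_two)   # the wrapped function is kept on .func\n",
"print(add_5_builtin.args)              # (5,)\n",
"```\n",
"\n",
"A `partial` object exposes the stored pieces as `.func`, `.args`, and `.keywords`, much like the attributes `PartialFunc` sets in `__init__`."
]
},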
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.15"
}
},
"nbformat": 4,
"nbformat_minor": 4
}