{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c13b000e",
|
|
"metadata": {},
|
|
"source": [
|
|
"## modules\n",
|
|
"\n",
|
|
"Why do we use modules?\n",
|
|
"\n",
|
|
"- Code reuse: allows code to be shared & reused.\n",
|
|
"- Namespace partitioning: Avoid namespace clashes among different parts of your program.\n",
|
|
"\n",
|
|
"e.g.\n",
|
|
"```\n",
|
|
"math.isclose(a, b)  # compares two floats (math.isclose(0.1+0.2, 0.3) == True)\n",
"directions.isclose(point1, location)  # same function name in a different module, so no clash\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Terminology\n",
|
|
"\n",
|
|
"Python files can either be:\n",
|
|
"\n",
|
|
"**Top Level Files**\n",
"\n",
"Sometimes called a \"script\". Contains the main control flow of the program and typically imports modules.\n",
"\n",
"**Modules**\n",
"\n",
"Define a set of variables, functions, classes, etc. that can be used by other programs or modules.\n",
"\n",
"**Application**\n",
"\n",
"A top-level file that uses other modules.\n",
"\n",
"**Library**\n",
"\n",
"A collection of one or more modules with no top-level file.\n",
|
|
"\n",
|
|
"\n",
|
|
"### Import Syntax\n",
|
|
"\n",
|
|
"```python\n",
|
|
"# bring `modulename` into current scope\n",
"import modulename\n",
"\n",
"# bring `sin` and `cos` into current scope\n",
"from math import sin, cos\n",
"\n",
"# bring `thing1` into current scope, but with `new_name`\n",
"from modulename import thing1 as new_name\n",
|
|
"\n",
|
|
"# import everything from `modulename` into scope (DO NOT USE)\n",
|
|
"from modulename import *\n",
|
|
"```\n",
|
|
"\n",
|
|
"When an `import` statement is run (either form), the following happens:\n",
|
|
"\n",
|
|
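"- Python searches for the module, trying each location in `sys.path` in order (`PYTHONPATH` adds entries to that list).\n",
"- Once found, the file is executed from top to bottom (only the first time it is imported; later imports reuse the already-loaded module).\n",
"- If `import modname`, then the name `modname` is bound in the current namespace, and all top-level definitions are available as attributes of the module (e.g. `modname.thing1`).\n",
"- If `from modname import ...`, then only the named definitions are added to the current namespace.\n",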
|
|
"\n",
|
|
"Note: `print` calls & other top-level code in the module will run when it is imported."
|
|
]
|
|
},
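{
"cell_type": "markdown",
"id": "import-steps-sketch",
"metadata": {},
"source": [
"The import steps above can be sketched with a throwaway module. This is a minimal illustration under stated assumptions, not part of the original notes: `demo_mod` and `greet` are hypothetical names, and the module file is written to a temporary directory that is then added to `sys.path`.\n",
"\n",
"```python\n",
"import sys\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"# Hypothetical module source, used only for this illustration.\n",
"module_source = '''\n",
"print('demo_mod is being executed')\n",
"\n",
"def greet():\n",
"    return 'hello'\n",
"'''\n",
"\n",
"# Write the module to a temporary directory and put that directory on sys.path.\n",
"tmp_dir = tempfile.mkdtemp()\n",
"Path(tmp_dir, 'demo_mod.py').write_text(module_source)\n",
"sys.path.insert(0, tmp_dir)\n",
"\n",
"import demo_mod               # first import: the file runs, so its print fires\n",
"import demo_mod               # second import: reuses sys.modules, nothing runs again\n",
"\n",
"print(demo_mod.greet())       # access through the module namespace\n",
"from demo_mod import greet    # binds only `greet` in the current namespace\n",
"print(greet is demo_mod.greet)\n",
"```\n",
"\n",
"Running this prints the module's message once, followed by `hello` and `True`: the second `import` does not re-execute the file, and both import forms refer to the same function object."
]
},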
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "de09d942",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "4d5f6e18",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Help on module statistics:\n",
|
|
"\n",
|
|
"NAME\n",
|
|
" statistics - Basic statistics module.\n",
|
|
"\n",
|
|
"MODULE REFERENCE\n",
|
|
" https://docs.python.org/3.10/library/statistics.html\n",
|
|
" \n",
|
|
" The following documentation is automatically generated from the Python\n",
|
|
" source files. It may be incomplete, incorrect or include features that\n",
|
|
" are considered implementation detail and may vary between Python\n",
|
|
" implementations. When in doubt, consult the module reference at the\n",
|
|
" location listed above.\n",
|
|
"\n",
|
|
"DESCRIPTION\n",
|
|
" This module provides functions for calculating statistics of data, including\n",
|
|
" averages, variance, and standard deviation.\n",
|
|
" \n",
|
|
" Calculating averages\n",
|
|
" --------------------\n",
|
|
" \n",
|
|
" ================== ==================================================\n",
|
|
" Function Description\n",
|
|
" ================== ==================================================\n",
|
|
" mean Arithmetic mean (average) of data.\n",
|
|
" fmean Fast, floating point arithmetic mean.\n",
|
|
" geometric_mean Geometric mean of data.\n",
|
|
" harmonic_mean Harmonic mean of data.\n",
|
|
" median Median (middle value) of data.\n",
|
|
" median_low Low median of data.\n",
|
|
" median_high High median of data.\n",
|
|
" median_grouped Median, or 50th percentile, of grouped data.\n",
|
|
" mode Mode (most common value) of data.\n",
|
|
" multimode List of modes (most common values of data).\n",
|
|
" quantiles Divide data into intervals with equal probability.\n",
|
|
" ================== ==================================================\n",
|
|
" \n",
|
|
" Calculate the arithmetic mean (\"the average\") of data:\n",
|
|
" \n",
|
|
" >>> mean([-1.0, 2.5, 3.25, 5.75])\n",
|
|
" 2.625\n",
|
|
" \n",
|
|
" \n",
|
|
" Calculate the standard median of discrete data:\n",
|
|
" \n",
|
|
" >>> median([2, 3, 4, 5])\n",
|
|
" 3.5\n",
|
|
" \n",
|
|
" \n",
|
|
" Calculate the median, or 50th percentile, of data grouped into class intervals\n",
|
|
" centred on the data values provided. E.g. if your data points are rounded to\n",
|
|
" the nearest whole number:\n",
|
|
" \n",
|
|
" >>> median_grouped([2, 2, 3, 3, 3, 4]) #doctest: +ELLIPSIS\n",
|
|
" 2.8333333333...\n",
|
|
" \n",
|
|
" This should be interpreted in this way: you have two data points in the class\n",
|
|
" interval 1.5-2.5, three data points in the class interval 2.5-3.5, and one in\n",
|
|
" the class interval 3.5-4.5. The median of these data points is 2.8333...\n",
|
|
" \n",
|
|
" \n",
|
|
" Calculating variability or spread\n",
|
|
" ---------------------------------\n",
|
|
" \n",
|
|
" ================== =============================================\n",
|
|
" Function Description\n",
|
|
" ================== =============================================\n",
|
|
" pvariance Population variance of data.\n",
|
|
" variance Sample variance of data.\n",
|
|
" pstdev Population standard deviation of data.\n",
|
|
" stdev Sample standard deviation of data.\n",
|
|
" ================== =============================================\n",
|
|
" \n",
|
|
" Calculate the standard deviation of sample data:\n",
|
|
" \n",
|
|
" >>> stdev([2.5, 3.25, 5.5, 11.25, 11.75]) #doctest: +ELLIPSIS\n",
|
|
" 4.38961843444...\n",
|
|
" \n",
|
|
" If you have previously calculated the mean, you can pass it as the optional\n",
|
|
" second argument to the four \"spread\" functions to avoid recalculating it:\n",
|
|
" \n",
|
|
" >>> data = [1, 2, 2, 4, 4, 4, 5, 6]\n",
|
|
" >>> mu = mean(data)\n",
|
|
" >>> pvariance(data, mu)\n",
|
|
" 2.5\n",
|
|
" \n",
|
|
" \n",
|
|
" Statistics for relations between two inputs\n",
|
|
" -------------------------------------------\n",
|
|
" \n",
|
|
" ================== ====================================================\n",
|
|
" Function Description\n",
|
|
" ================== ====================================================\n",
|
|
" covariance Sample covariance for two variables.\n",
|
|
" correlation Pearson's correlation coefficient for two variables.\n",
|
|
" linear_regression Intercept and slope for simple linear regression.\n",
|
|
" ================== ====================================================\n",
|
|
" \n",
|
|
" Calculate covariance, Pearson's correlation, and simple linear regression\n",
|
|
" for two inputs:\n",
|
|
" \n",
|
|
" >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
|
|
" >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]\n",
|
|
" >>> covariance(x, y)\n",
|
|
" 0.75\n",
|
|
" >>> correlation(x, y) #doctest: +ELLIPSIS\n",
|
|
" 0.31622776601...\n",
|
|
" >>> linear_regression(x, y) #doctest:\n",
|
|
" LinearRegression(slope=0.1, intercept=1.5)\n",
|
|
" \n",
|
|
" \n",
|
|
" Exceptions\n",
|
|
" ----------\n",
|
|
" \n",
|
|
" A single exception is defined: StatisticsError is a subclass of ValueError.\n",
|
|
"\n",
|
|
"CLASSES\n",
|
|
" builtins.ValueError(builtins.Exception)\n",
|
|
" StatisticsError\n",
|
|
" builtins.object\n",
|
|
" NormalDist\n",
|
|
" \n",
|
|
" class NormalDist(builtins.object)\n",
|
|
" | NormalDist(mu=0.0, sigma=1.0)\n",
|
|
" | \n",
|
|
" | Normal distribution of a random variable\n",
|
|
" | \n",
|
|
" | Methods defined here:\n",
|
|
" | \n",
|
|
" | __add__(x1, x2)\n",
|
|
" | Add a constant or another NormalDist instance.\n",
|
|
" | \n",
|
|
" | If *other* is a constant, translate mu by the constant,\n",
|
|
" | leaving sigma unchanged.\n",
|
|
" | \n",
|
|
" | If *other* is a NormalDist, add both the means and the variances.\n",
|
|
" | Mathematically, this works only if the two distributions are\n",
|
|
" | independent or if they are jointly normally distributed.\n",
|
|
" | \n",
|
|
" | __eq__(x1, x2)\n",
|
|
" | Two NormalDist objects are equal if their mu and sigma are both equal.\n",
|
|
" | \n",
|
|
" | __getstate__(self)\n",
|
|
" | \n",
|
|
" | __hash__(self)\n",
|
|
" | NormalDist objects hash equal if their mu and sigma are both equal.\n",
|
|
" | \n",
|
|
" | __init__(self, mu=0.0, sigma=1.0)\n",
|
|
" | NormalDist where mu is the mean and sigma is the standard deviation.\n",
|
|
" | \n",
|
|
" | __mul__(x1, x2)\n",
|
|
" | Multiply both mu and sigma by a constant.\n",
|
|
" | \n",
|
|
" | Used for rescaling, perhaps to change measurement units.\n",
|
|
" | Sigma is scaled with the absolute value of the constant.\n",
|
|
" | \n",
|
|
" | __neg__(x1)\n",
|
|
" | Negates mu while keeping sigma the same.\n",
|
|
" | \n",
|
|
" | __pos__(x1)\n",
|
|
" | Return a copy of the instance.\n",
|
|
" | \n",
|
|
" | __radd__ = __add__(x1, x2)\n",
|
|
" | \n",
|
|
" | __repr__(self)\n",
|
|
" | Return repr(self).\n",
|
|
" | \n",
|
|
" | __rmul__ = __mul__(x1, x2)\n",
|
|
" | \n",
|
|
" | __rsub__(x1, x2)\n",
|
|
" | Subtract a NormalDist from a constant or another NormalDist.\n",
|
|
" | \n",
|
|
" | __setstate__(self, state)\n",
|
|
" | \n",
|
|
" | __sub__(x1, x2)\n",
|
|
" | Subtract a constant or another NormalDist instance.\n",
|
|
" | \n",
|
|
" | If *other* is a constant, translate by the constant mu,\n",
|
|
" | leaving sigma unchanged.\n",
|
|
" | \n",
|
|
" | If *other* is a NormalDist, subtract the means and add the variances.\n",
|
|
" | Mathematically, this works only if the two distributions are\n",
|
|
" | independent or if they are jointly normally distributed.\n",
|
|
" | \n",
|
|
" | __truediv__(x1, x2)\n",
|
|
" | Divide both mu and sigma by a constant.\n",
|
|
" | \n",
|
|
" | Used for rescaling, perhaps to change measurement units.\n",
|
|
" | Sigma is scaled with the absolute value of the constant.\n",
|
|
" | \n",
|
|
" | cdf(self, x)\n",
|
|
" | Cumulative distribution function. P(X <= x)\n",
|
|
" | \n",
|
|
" | inv_cdf(self, p)\n",
|
|
" | Inverse cumulative distribution function. x : P(X <= x) = p\n",
|
|
" | \n",
|
|
" | Finds the value of the random variable such that the probability of\n",
|
|
" | the variable being less than or equal to that value equals the given\n",
|
|
" | probability.\n",
|
|
" | \n",
|
|
" | This function is also called the percent point function or quantile\n",
|
|
" | function.\n",
|
|
" | \n",
|
|
" | overlap(self, other)\n",
|
|
" | Compute the overlapping coefficient (OVL) between two normal distributions.\n",
|
|
" | \n",
|
|
" | Measures the agreement between two normal probability distributions.\n",
|
|
" | Returns a value between 0.0 and 1.0 giving the overlapping area in\n",
|
|
" | the two underlying probability density functions.\n",
|
|
" | \n",
|
|
" | >>> N1 = NormalDist(2.4, 1.6)\n",
|
|
" | >>> N2 = NormalDist(3.2, 2.0)\n",
|
|
" | >>> N1.overlap(N2)\n",
|
|
" | 0.8035050657330205\n",
|
|
" | \n",
|
|
" | pdf(self, x)\n",
|
|
" | Probability density function. P(x <= X < x+dx) / dx\n",
|
|
" | \n",
|
|
" | quantiles(self, n=4)\n",
|
|
" | Divide into *n* continuous intervals with equal probability.\n",
|
|
" | \n",
|
|
" | Returns a list of (n - 1) cut points separating the intervals.\n",
|
|
" | \n",
|
|
" | Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles.\n",
|
|
" | Set *n* to 100 for percentiles which gives the 99 cuts points that\n",
|
|
" | separate the normal distribution in to 100 equal sized groups.\n",
|
|
" | \n",
|
|
" | samples(self, n, *, seed=None)\n",
|
|
" | Generate *n* samples for a given mean and standard deviation.\n",
|
|
" | \n",
|
|
" | zscore(self, x)\n",
|
|
" | Compute the Standard Score. (x - mean) / stdev\n",
|
|
" | \n",
|
|
" | Describes *x* in terms of the number of standard deviations\n",
|
|
" | above or below the mean of the normal distribution.\n",
|
|
" | \n",
|
|
" | ----------------------------------------------------------------------\n",
|
|
" | Class methods defined here:\n",
|
|
" | \n",
|
|
" | from_samples(data) from builtins.type\n",
|
|
" | Make a normal distribution instance from sample data.\n",
|
|
" | \n",
|
|
" | ----------------------------------------------------------------------\n",
|
|
" | Readonly properties defined here:\n",
|
|
" | \n",
|
|
" | mean\n",
|
|
" | Arithmetic mean of the normal distribution.\n",
|
|
" | \n",
|
|
" | median\n",
|
|
" | Return the median of the normal distribution\n",
|
|
" | \n",
|
|
" | mode\n",
|
|
" | Return the mode of the normal distribution\n",
|
|
" | \n",
|
|
" | The mode is the value x where which the probability density\n",
|
|
" | function (pdf) takes its maximum value.\n",
|
|
" | \n",
|
|
" | stdev\n",
|
|
" | Standard deviation of the normal distribution.\n",
|
|
" | \n",
|
|
" | variance\n",
|
|
" | Square of the standard deviation.\n",
|
|
" \n",
|
|
" class StatisticsError(builtins.ValueError)\n",
|
|
" | Method resolution order:\n",
|
|
" | StatisticsError\n",
|
|
" | builtins.ValueError\n",
|
|
" | builtins.Exception\n",
|
|
" | builtins.BaseException\n",
|
|
" | builtins.object\n",
|
|
" | \n",
|
|
" | Data descriptors defined here:\n",
|
|
" | \n",
|
|
" | __weakref__\n",
|
|
" | list of weak references to the object (if defined)\n",
|
|
" | \n",
|
|
" | ----------------------------------------------------------------------\n",
|
|
" | Methods inherited from builtins.ValueError:\n",
|
|
" | \n",
|
|
" | __init__(self, /, *args, **kwargs)\n",
|
|
" | Initialize self. See help(type(self)) for accurate signature.\n",
|
|
" | \n",
|
|
" | ----------------------------------------------------------------------\n",
|
|
" | Static methods inherited from builtins.ValueError:\n",
|
|
" | \n",
|
|
" | __new__(*args, **kwargs) from builtins.type\n",
|
|
" | Create and return a new object. See help(type) for accurate signature.\n",
|
|
" | \n",
|
|
" | ----------------------------------------------------------------------\n",
|
|
" | Methods inherited from builtins.BaseException:\n",
|
|
" | \n",
|
|
" | __delattr__(self, name, /)\n",
|
|
" | Implement delattr(self, name).\n",
|
|
" | \n",
|
|
" | __getattribute__(self, name, /)\n",
|
|
" | Return getattr(self, name).\n",
|
|
" | \n",
|
|
" | __reduce__(...)\n",
|
|
" | Helper for pickle.\n",
|
|
" | \n",
|
|
" | __repr__(self, /)\n",
|
|
" | Return repr(self).\n",
|
|
" | \n",
|
|
" | __setattr__(self, name, value, /)\n",
|
|
" | Implement setattr(self, name, value).\n",
|
|
" | \n",
|
|
" | __setstate__(...)\n",
|
|
" | \n",
|
|
" | __str__(self, /)\n",
|
|
" | Return str(self).\n",
|
|
" | \n",
|
|
" | with_traceback(...)\n",
|
|
" | Exception.with_traceback(tb) --\n",
|
|
" | set self.__traceback__ to tb and return self.\n",
|
|
" | \n",
|
|
" | ----------------------------------------------------------------------\n",
|
|
" | Data descriptors inherited from builtins.BaseException:\n",
|
|
" | \n",
|
|
" | __cause__\n",
|
|
" | exception cause\n",
|
|
" | \n",
|
|
" | __context__\n",
|
|
" | exception context\n",
|
|
" | \n",
|
|
" | __dict__\n",
|
|
" | \n",
|
|
" | __suppress_context__\n",
|
|
" | \n",
|
|
" | __traceback__\n",
|
|
" | \n",
|
|
" | args\n",
|
|
"\n",
|
|
"FUNCTIONS\n",
|
|
" correlation(x, y, /)\n",
|
|
" Pearson's correlation coefficient\n",
|
|
" \n",
|
|
" Return the Pearson's correlation coefficient for two inputs. Pearson's\n",
|
|
" correlation coefficient *r* takes values between -1 and +1. It measures the\n",
|
|
" strength and direction of the linear relationship, where +1 means very\n",
|
|
" strong, positive linear relationship, -1 very strong, negative linear\n",
|
|
" relationship, and 0 no linear relationship.\n",
|
|
" \n",
|
|
" >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
|
|
" >>> y = [9, 8, 7, 6, 5, 4, 3, 2, 1]\n",
|
|
" >>> correlation(x, x)\n",
|
|
" 1.0\n",
|
|
" >>> correlation(x, y)\n",
|
|
" -1.0\n",
|
|
" \n",
|
|
" covariance(x, y, /)\n",
|
|
" Covariance\n",
|
|
" \n",
|
|
" Return the sample covariance of two inputs *x* and *y*. Covariance\n",
|
|
" is a measure of the joint variability of two inputs.\n",
|
|
" \n",
|
|
" >>> x = [1, 2, 3, 4, 5, 6, 7, 8, 9]\n",
|
|
" >>> y = [1, 2, 3, 1, 2, 3, 1, 2, 3]\n",
|
|
" >>> covariance(x, y)\n",
|
|
" 0.75\n",
|
|
" >>> z = [9, 8, 7, 6, 5, 4, 3, 2, 1]\n",
|
|
" >>> covariance(x, z)\n",
|
|
" -7.5\n",
|
|
" >>> covariance(z, x)\n",
|
|
" -7.5\n",
|
|
" \n",
|
|
" fmean(data)\n",
|
|
" Convert data to floats and compute the arithmetic mean.\n",
|
|
" \n",
|
|
" This runs faster than the mean() function and it always returns a float.\n",
|
|
" If the input dataset is empty, it raises a StatisticsError.\n",
|
|
" \n",
|
|
" >>> fmean([3.5, 4.0, 5.25])\n",
|
|
" 4.25\n",
|
|
" \n",
|
|
" geometric_mean(data)\n",
|
|
" Convert data to floats and compute the geometric mean.\n",
|
|
" \n",
|
|
" Raises a StatisticsError if the input dataset is empty,\n",
|
|
" if it contains a zero, or if it contains a negative value.\n",
|
|
" \n",
|
|
" No special efforts are made to achieve exact results.\n",
|
|
" (However, this may change in the future.)\n",
|
|
" \n",
|
|
" >>> round(geometric_mean([54, 24, 36]), 9)\n",
|
|
" 36.0\n",
|
|
" \n",
|
|
" harmonic_mean(data, weights=None)\n",
|
|
" Return the harmonic mean of data.\n",
|
|
" \n",
|
|
" The harmonic mean is the reciprocal of the arithmetic mean of the\n",
|
|
" reciprocals of the data. It can be used for averaging ratios or\n",
|
|
" rates, for example speeds.\n",
|
|
" \n",
|
|
" Suppose a car travels 40 km/hr for 5 km and then speeds-up to\n",
|
|
" 60 km/hr for another 5 km. What is the average speed?\n",
|
|
" \n",
|
|
" >>> harmonic_mean([40, 60])\n",
|
|
" 48.0\n",
|
|
" \n",
|
|
" Suppose a car travels 40 km/hr for 5 km, and when traffic clears,\n",
|
|
" speeds-up to 60 km/hr for the remaining 30 km of the journey. What\n",
|
|
" is the average speed?\n",
|
|
" \n",
|
|
" >>> harmonic_mean([40, 60], weights=[5, 30])\n",
|
|
" 56.0\n",
|
|
" \n",
|
|
" If ``data`` is empty, or any element is less than zero,\n",
|
|
" ``harmonic_mean`` will raise ``StatisticsError``.\n",
|
|
" \n",
|
|
" linear_regression(x, y, /)\n",
|
|
" Slope and intercept for simple linear regression.\n",
|
|
" \n",
|
|
" Return the slope and intercept of simple linear regression\n",
|
|
" parameters estimated using ordinary least squares. Simple linear\n",
|
|
" regression describes relationship between an independent variable\n",
|
|
" *x* and a dependent variable *y* in terms of linear function:\n",
|
|
" \n",
|
|
" y = slope * x + intercept + noise\n",
|
|
" \n",
|
|
" where *slope* and *intercept* are the regression parameters that are\n",
|
|
" estimated, and noise represents the variability of the data that was\n",
|
|
" not explained by the linear regression (it is equal to the\n",
|
|
" difference between predicted and actual values of the dependent\n",
|
|
" variable).\n",
|
|
" \n",
|
|
" The parameters are returned as a named tuple.\n",
|
|
" \n",
|
|
" >>> x = [1, 2, 3, 4, 5]\n",
|
|
" >>> noise = NormalDist().samples(5, seed=42)\n",
|
|
" >>> y = [3 * x[i] + 2 + noise[i] for i in range(5)]\n",
|
|
" >>> linear_regression(x, y) #doctest: +ELLIPSIS\n",
|
|
" LinearRegression(slope=3.09078914170..., intercept=1.75684970486...)\n",
|
|
" \n",
|
|
" mean(data)\n",
|
|
" Return the sample arithmetic mean of data.\n",
|
|
" \n",
|
|
" >>> mean([1, 2, 3, 4, 4])\n",
|
|
" 2.8\n",
|
|
" \n",
|
|
" >>> from fractions import Fraction as F\n",
|
|
" >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)])\n",
|
|
" Fraction(13, 21)\n",
|
|
" \n",
|
|
" >>> from decimal import Decimal as D\n",
|
|
" >>> mean([D(\"0.5\"), D(\"0.75\"), D(\"0.625\"), D(\"0.375\")])\n",
|
|
" Decimal('0.5625')\n",
|
|
" \n",
|
|
" If ``data`` is empty, StatisticsError will be raised.\n",
|
|
" \n",
|
|
" median(data)\n",
|
|
" Return the median (middle value) of numeric data.\n",
|
|
" \n",
|
|
" When the number of data points is odd, return the middle data point.\n",
|
|
" When the number of data points is even, the median is interpolated by\n",
|
|
" taking the average of the two middle values:\n",
|
|
" \n",
|
|
" >>> median([1, 3, 5])\n",
|
|
" 3\n",
|
|
" >>> median([1, 3, 5, 7])\n",
|
|
" 4.0\n",
|
|
" \n",
|
|
" median_grouped(data, interval=1)\n",
|
|
" Return the 50th percentile (median) of grouped continuous data.\n",
|
|
" \n",
|
|
" >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])\n",
|
|
" 3.7\n",
|
|
" >>> median_grouped([52, 52, 53, 54])\n",
|
|
" 52.5\n",
|
|
" \n",
|
|
" This calculates the median as the 50th percentile, and should be\n",
|
|
" used when your data is continuous and grouped. In the above example,\n",
|
|
" the values 1, 2, 3, etc. actually represent the midpoint of classes\n",
|
|
" 0.5-1.5, 1.5-2.5, 2.5-3.5, etc. The middle value falls somewhere in\n",
|
|
" class 3.5-4.5, and interpolation is used to estimate it.\n",
|
|
" \n",
|
|
" Optional argument ``interval`` represents the class interval, and\n",
|
|
" defaults to 1. Changing the class interval naturally will change the\n",
|
|
" interpolated 50th percentile value:\n",
|
|
" \n",
|
|
" >>> median_grouped([1, 3, 3, 5, 7], interval=1)\n",
|
|
" 3.25\n",
|
|
" >>> median_grouped([1, 3, 3, 5, 7], interval=2)\n",
|
|
" 3.5\n",
|
|
" \n",
|
|
" This function does not check whether the data points are at least\n",
|
|
" ``interval`` apart.\n",
|
|
" \n",
|
|
" median_high(data)\n",
|
|
" Return the high median of data.\n",
|
|
" \n",
|
|
" When the number of data points is odd, the middle value is returned.\n",
|
|
" When it is even, the larger of the two middle values is returned.\n",
|
|
" \n",
|
|
" >>> median_high([1, 3, 5])\n",
|
|
" 3\n",
|
|
" >>> median_high([1, 3, 5, 7])\n",
|
|
" 5\n",
|
|
" \n",
|
|
" median_low(data)\n",
|
|
" Return the low median of numeric data.\n",
|
|
" \n",
|
|
" When the number of data points is odd, the middle value is returned.\n",
|
|
" When it is even, the smaller of the two middle values is returned.\n",
|
|
" \n",
|
|
" >>> median_low([1, 3, 5])\n",
|
|
" 3\n",
|
|
" >>> median_low([1, 3, 5, 7])\n",
|
|
" 3\n",
|
|
" \n",
|
|
" mode(data)\n",
|
|
" Return the most common data point from discrete or nominal data.\n",
|
|
" \n",
|
|
" ``mode`` assumes discrete data, and returns a single value. This is the\n",
|
|
" standard treatment of the mode as commonly taught in schools:\n",
|
|
" \n",
|
|
" >>> mode([1, 1, 2, 3, 3, 3, 3, 4])\n",
|
|
" 3\n",
|
|
" \n",
|
|
" This also works with nominal (non-numeric) data:\n",
|
|
" \n",
|
|
" >>> mode([\"red\", \"blue\", \"blue\", \"red\", \"green\", \"red\", \"red\"])\n",
|
|
" 'red'\n",
|
|
" \n",
|
|
" If there are multiple modes with same frequency, return the first one\n",
|
|
" encountered:\n",
|
|
" \n",
|
|
" >>> mode(['red', 'red', 'green', 'blue', 'blue'])\n",
|
|
" 'red'\n",
|
|
" \n",
|
|
" If *data* is empty, ``mode``, raises StatisticsError.\n",
|
|
" \n",
|
|
" multimode(data)\n",
|
|
" Return a list of the most frequently occurring values.\n",
|
|
" \n",
|
|
" Will return more than one result if there are multiple modes\n",
|
|
" or an empty list if *data* is empty.\n",
|
|
" \n",
|
|
" >>> multimode('aabbbbbbbbcc')\n",
|
|
" ['b']\n",
|
|
" >>> multimode('aabbbbccddddeeffffgg')\n",
|
|
" ['b', 'd', 'f']\n",
|
|
" >>> multimode('')\n",
|
|
" []\n",
|
|
" \n",
|
|
" pstdev(data, mu=None)\n",
|
|
" Return the square root of the population variance.\n",
|
|
" \n",
|
|
" See ``pvariance`` for arguments and other details.\n",
|
|
" \n",
|
|
" >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])\n",
|
|
" 0.986893273527251\n",
|
|
" \n",
|
|
" pvariance(data, mu=None)\n",
|
|
" Return the population variance of ``data``.\n",
|
|
" \n",
|
|
" data should be a sequence or iterable of Real-valued numbers, with at least one\n",
|
|
" value. The optional argument mu, if given, should be the mean of\n",
|
|
" the data. If it is missing or None, the mean is automatically calculated.\n",
|
|
" \n",
|
|
" Use this function to calculate the variance from the entire population.\n",
|
|
" To estimate the variance from a sample, the ``variance`` function is\n",
|
|
" usually a better choice.\n",
|
|
" \n",
|
|
" Examples:\n",
|
|
" \n",
|
|
" >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]\n",
|
|
" >>> pvariance(data)\n",
|
|
" 1.25\n",
|
|
" \n",
|
|
" If you have already calculated the mean of the data, you can pass it as\n",
|
|
" the optional second argument to avoid recalculating it:\n",
|
|
" \n",
|
|
" >>> mu = mean(data)\n",
|
|
" >>> pvariance(data, mu)\n",
|
|
" 1.25\n",
|
|
" \n",
|
|
" Decimals and Fractions are supported:\n",
|
|
" \n",
|
|
" >>> from decimal import Decimal as D\n",
|
|
" >>> pvariance([D(\"27.5\"), D(\"30.25\"), D(\"30.25\"), D(\"34.5\"), D(\"41.75\")])\n",
|
|
" Decimal('24.815')\n",
|
|
" \n",
|
|
" >>> from fractions import Fraction as F\n",
|
|
" >>> pvariance([F(1, 4), F(5, 4), F(1, 2)])\n",
|
|
" Fraction(13, 72)\n",
|
|
" \n",
|
|
" quantiles(data, *, n=4, method='exclusive')\n",
|
|
" Divide *data* into *n* continuous intervals with equal probability.\n",
|
|
" \n",
|
|
" Returns a list of (n - 1) cut points separating the intervals.\n",
|
|
" \n",
|
|
" Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles.\n",
|
|
" Set *n* to 100 for percentiles which gives the 99 cuts points that\n",
|
|
" separate *data* in to 100 equal sized groups.\n",
|
|
" \n",
|
|
" The *data* can be any iterable containing sample.\n",
|
|
" The cut points are linearly interpolated between data points.\n",
|
|
" \n",
|
|
" If *method* is set to *inclusive*, *data* is treated as population\n",
|
|
" data. The minimum value is treated as the 0th percentile and the\n",
|
|
" maximum value is treated as the 100th percentile.\n",
|
|
" \n",
|
|
" stdev(data, xbar=None)\n",
|
|
" Return the square root of the sample variance.\n",
|
|
" \n",
|
|
" See ``variance`` for arguments and other details.\n",
|
|
" \n",
|
|
" >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75])\n",
|
|
" 1.0810874155219827\n",
|
|
" \n",
|
|
" variance(data, xbar=None)\n",
|
|
" Return the sample variance of data.\n",
|
|
" \n",
|
|
" data should be an iterable of Real-valued numbers, with at least two\n",
|
|
" values. The optional argument xbar, if given, should be the mean of\n",
|
|
" the data. If it is missing or None, the mean is automatically calculated.\n",
|
|
" \n",
|
|
" Use this function when your data is a sample from a population. To\n",
|
|
" calculate the variance from the entire population, see ``pvariance``.\n",
|
|
" \n",
|
|
" Examples:\n",
|
|
" \n",
|
|
" >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]\n",
|
|
" >>> variance(data)\n",
|
|
" 1.3720238095238095\n",
|
|
" \n",
|
|
" If you have already calculated the mean of your data, you can pass it as\n",
|
|
" the optional second argument ``xbar`` to avoid recalculating it:\n",
|
|
" \n",
|
|
" >>> m = mean(data)\n",
|
|
" >>> variance(data, m)\n",
|
|
" 1.3720238095238095\n",
|
|
" \n",
|
|
" This function does not check that ``xbar`` is actually the mean of\n",
|
|
" ``data``. Giving arbitrary values for ``xbar`` may lead to invalid or\n",
|
|
" impossible results.\n",
|
|
" \n",
|
|
" Decimals and Fractions are supported:\n",
|
|
" \n",
|
|
" >>> from decimal import Decimal as D\n",
|
|
" >>> variance([D(\"27.5\"), D(\"30.25\"), D(\"30.25\"), D(\"34.5\"), D(\"41.75\")])\n",
|
|
" Decimal('31.01875')\n",
|
|
" \n",
|
|
" >>> from fractions import Fraction as F\n",
|
|
" >>> variance([F(1, 6), F(1, 2), F(5, 3)])\n",
|
|
" Fraction(67, 108)\n",
|
|
"\n",
|
|
"DATA\n",
|
|
" __all__ = ['NormalDist', 'StatisticsError', 'correlation', 'covariance...\n",
|
|
"\n",
|
|
"FILE\n",
|
|
" /Users/jamesturk/.local/share/uv/python/cpython-3.10.15-macos-aarch64-none/lib/python3.10/statistics.py\n",
|
|
"\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import statistics\n",
|
|
"help(statistics)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ce952a10",
|
|
"metadata": {},
|
|
"source": [
|
|
"### import `modulename`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "5db28bf9-2ff6-4847-bd18-ae46ea99b57c",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"['Counter',\n",
|
|
" 'Decimal',\n",
|
|
" 'Fraction',\n",
|
|
" 'LinearRegression',\n",
|
|
" 'NormalDist',\n",
|
|
" 'StatisticsError',\n",
|
|
" '__all__',\n",
|
|
" '__builtins__',\n",
|
|
" '__cached__',\n",
|
|
" '__doc__',\n",
|
|
" '__file__',\n",
|
|
" '__loader__',\n",
|
|
" '__name__',\n",
|
|
" '__package__',\n",
|
|
" '__spec__',\n",
|
|
" '_coerce',\n",
|
|
" '_convert',\n",
|
|
" '_exact_ratio',\n",
|
|
" '_fail_neg',\n",
|
|
" '_find_lteq',\n",
|
|
" '_find_rteq',\n",
|
|
" '_isfinite',\n",
|
|
" '_normal_dist_inv_cdf',\n",
|
|
" '_ss',\n",
|
|
" '_sum',\n",
|
|
" 'bisect_left',\n",
|
|
" 'bisect_right',\n",
|
|
" 'correlation',\n",
|
|
" 'covariance',\n",
|
|
" 'erf',\n",
|
|
" 'exp',\n",
|
|
" 'fabs',\n",
|
|
" 'fmean',\n",
|
|
" 'fsum',\n",
|
|
" 'geometric_mean',\n",
|
|
" 'groupby',\n",
|
|
" 'harmonic_mean',\n",
|
|
" 'hypot',\n",
|
|
" 'itemgetter',\n",
|
|
" 'linear_regression',\n",
|
|
" 'log',\n",
|
|
" 'math',\n",
|
|
" 'mean',\n",
|
|
" 'median',\n",
|
|
" 'median_grouped',\n",
|
|
" 'median_high',\n",
|
|
" 'median_low',\n",
|
|
" 'mode',\n",
|
|
" 'multimode',\n",
|
|
" 'namedtuple',\n",
|
|
" 'numbers',\n",
|
|
" 'pstdev',\n",
|
|
" 'pvariance',\n",
|
|
" 'quantiles',\n",
|
|
" 'random',\n",
|
|
" 'repeat',\n",
|
|
" 'sqrt',\n",
|
|
" 'stdev',\n",
|
|
" 'tau',\n",
|
|
" 'variance']"
|
|
]
|
|
},
|
|
"execution_count": 3,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"dir(statistics)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "09c27388",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Help on built-in function cos in module math:\n",
|
|
"\n",
|
|
"cos(x, /)\n",
|
|
" Return the cosine of x (measured in radians).\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"-1.0"
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"import math\n",
|
|
"\n",
|
|
"help(math.cos)\n",
|
|
"math.cos(math.pi)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 4,
|
|
"id": "4b76400a-82af-4ce6-91a9-2b4c4e125e5b",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Help on class repeat in module itertools:\n",
|
|
"\n",
|
|
"class repeat(builtins.object)\n",
|
|
" | repeat(object [,times]) -> create an iterator which returns the object\n",
|
|
" | for the specified number of times. If not specified, returns the object\n",
|
|
" | endlessly.\n",
|
|
" | \n",
|
|
" | Methods defined here:\n",
|
|
" | \n",
|
|
" | __getattribute__(self, name, /)\n",
|
|
" | Return getattr(self, name).\n",
|
|
" | \n",
|
|
" | __iter__(self, /)\n",
|
|
" | Implement iter(self).\n",
|
|
" | \n",
|
|
" | __length_hint__(...)\n",
|
|
" | Private method returning an estimate of len(list(it)).\n",
|
|
" | \n",
|
|
" | __next__(self, /)\n",
|
|
" | Implement next(self).\n",
|
|
" | \n",
|
|
" | __reduce__(...)\n",
|
|
" | Return state information for pickling.\n",
|
|
" | \n",
|
|
" | __repr__(self, /)\n",
|
|
" | Return repr(self).\n",
|
|
" | \n",
|
|
" | ----------------------------------------------------------------------\n",
|
|
" | Static methods defined here:\n",
|
|
" | \n",
|
|
" | __new__(*args, **kwargs) from builtins.type\n",
|
|
" | Create and return a new object. See help(type) for accurate signature.\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"help(statistics.repeat)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "9d5f6a9f",
|
|
"metadata": {},
|
|
"source": [
|
|
"### `help` and `dir`\n",
|
|
"\n",
|
|
"`help` can be called on functions or modules and returns their docstring\n",
|
|
"\n",
|
|
"`dir` can be called on any object and returns all properties"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "164caf5b",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Help on built-in function cos in module math:\n",
|
|
"\n",
|
|
"cos(x, /)\n",
|
|
" Return the cosine of x (measured in radians).\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"#help(math)\n",
|
|
"help(math.cos)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"id": "58c6b06f",
|
|
"metadata": {
|
|
"tags": []
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#dir(math)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "0128b8f4",
|
|
"metadata": {},
|
|
"source": [
|
|
"### from `modulename` import `thing`"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 8,
|
|
"id": "87bef5f0",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"39.4"
|
|
]
|
|
},
|
|
"execution_count": 8,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"from statistics import mean\n",
|
|
"\n",
|
|
"mean([34, 44, 16, 21, 82])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"id": "1604f2d7",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Help on function mode in module statistics:\n",
|
|
"\n",
|
|
"mode(data)\n",
|
|
" Return the most common data point from discrete or nominal data.\n",
|
|
" \n",
|
|
" ``mode`` assumes discrete data, and returns a single value. This is the\n",
|
|
" standard treatment of the mode as commonly taught in schools:\n",
|
|
" \n",
|
|
" >>> mode([1, 1, 2, 3, 3, 3, 3, 4])\n",
|
|
" 3\n",
|
|
" \n",
|
|
" This also works with nominal (non-numeric) data:\n",
|
|
" \n",
|
|
" >>> mode([\"red\", \"blue\", \"blue\", \"red\", \"green\", \"red\", \"red\"])\n",
|
|
" 'red'\n",
|
|
" \n",
|
|
" If there are multiple modes with same frequency, return the first one\n",
|
|
" encountered:\n",
|
|
" \n",
|
|
" >>> mode(['red', 'red', 'green', 'blue', 'blue'])\n",
|
|
" 'red'\n",
|
|
" \n",
|
|
" If *data* is empty, ``mode``, raises StatisticsError.\n",
|
|
"\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"import statistics\n",
|
|
"help(statistics.mode)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "47280629-2406-4730-b04e-b2df88c37b14",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Ed Post on importing your code from IPython: \n",
|
|
"\n",
|
|
"https://edstem.org/us/courses/68016/discussion/5533114\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 3,
|
|
"id": "1b80b290",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#dir(__builtins__)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "361acaa7",
|
|
"metadata": {},
|
|
"source": [
|
|
"## module conventions\n",
|
|
"\n",
|
|
"Named in snake_case, typically concise.\n",
|
|
"\n",
|
|
"Convention is to use underscore prefix for modules intended to be internal:\n",
|
|
"\n",
|
|
"`import _util`\n",
|
|
"\n",
|
|
"Avoid built-in module names, `fast_math` not `math`."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 7,
|
|
"id": "57d95a79",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#from .. import symbol"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "29519f78",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3 (ipykernel)",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.10.15"
|
|
},
|
|
"toc": {
|
|
"base_numbering": 1,
|
|
"nav_menu": {},
|
|
"number_sections": false,
|
|
"sideBar": true,
|
|
"skip_h1_title": false,
|
|
"title_cell": "Table of Contents",
|
|
"title_sidebar": "Contents",
|
|
"toc_cell": false,
|
|
"toc_position": {},
|
|
"toc_section_display": true,
|
|
"toc_window_display": true
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|