## Python Ecosystem

One of the reasons for Python's early success was its "batteries included" philosophy.

Remember being a kid and getting a new toy, only to have to purchase (or charge) its batteries before playing? The analogy here is that Python comes with everything you need to play (be productive) out of the box.

We've seen this with some built-in modules:

- datetime
- math
- csv
- json
- functools
- itertools
- collections
- abc
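
As a quick reminder of how far these modules get you on their own, here is a minimal sketch that uses only the standard library. The sample data and the `summarize` function are made up for illustration.

```python
import csv
import io
import json
from collections import Counter
from datetime import date

# Hypothetical sample data, invented for this sketch.
CSV_TEXT = """name,signup_date,plan
ada,2024-01-15,free
grace,2024-02-01,pro
linus,2024-02-20,free
"""


def summarize(csv_text: str) -> str:
    """Parse CSV text and return a JSON summary, using only built-in modules."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    plan_counts = Counter(row["plan"] for row in rows)
    earliest = min(date.fromisoformat(row["signup_date"]) for row in rows)
    summary = {
        "users": len(rows),
        "plans": dict(plan_counts),
        "earliest_signup": earliest.isoformat(),
    }
    return json.dumps(summary, sort_keys=True)


print(summarize(CSV_TEXT))
```

Nothing here needed to be installed: `csv`, `json`, `collections`, and `datetime` all ship with Python.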


The standard library contains tools for common (and less-common) file types, data structures, dealing with data from the web, a simple UI toolkit, and much more:

<https://docs.python.org/3/library/index.html>

These modules are written in Python or C, and distributed with Python itself. 

The standard library is great for many things but in practice it has a few shortcomings:

- can't possibly include *everything*
- typically only one way of doing things, and if an alternate approach is needed, an alternate library should be used (example: streaming JSON)
- once a package is in the standard library it is in effect *frozen* due to Python's strict backwards compatibility rules

## Beyond the Standard Library

It is helpful to have a wider ecosystem that all Python developers can share. Enter the Python Package Index, or PyPI (typically pronounced pie-pee-eye to avoid confusion with PyPy, pronounced pie-pie).

<https://pypi.org>

Over half a million projects have been released on PyPI, with billions of downloads a day.

We've seen packages that come from PyPI:

- rich
- httpx
- pandas
- polars
- networkx
- matplotlib
- altair
- Pillow
- Flask
- Django

### Important Note: Anyone can publish to PyPI

Installing a package means trusting its authors to some extent. 

Packages I ask you to install are packages I trust, and which have established reputations & teams.

If you find a small package you should be mindful of the fact that letting someone run code on your system means letting that person do whatever their code does. 

If in doubt:
- look for signs of activity on GitHub/etc.: a popular library with dozens of tutorials is one thing; an obscure library that only one person has used may also be fine, but is worth a bit of vetting
- look at the source code!
- ask someone! (James, TAs, etc.)

### Licensing

Code that is published & open source comes with a license, a set of rules saying what you may and may not do with it. Typically this prohibits redistribution of the work without the license, but in some cases may mean that your own work needs to be open source to use it. **Using open source code without following the license is plagiarism/theft and can come with serious consequences here and in any workplace, since your employer would carry the legal burden.**

Make sure that the code that you are using is under a license that allows you to use it in the environment that you are in. That isn't much of an issue here in class, but in companies you may not be able to use code released under certain licenses.
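
One way to get a first look at what you have installed is to read the license that each package declares in its own metadata. This is a rough sketch using the standard library's `importlib.metadata`. The `installed_licenses` helper is a name made up for this example, and the declared value is only what the author filled in, so treat it as a starting point rather than legal advice.

```python
from importlib import metadata


def installed_licenses() -> dict[str, str]:
    """Map each installed distribution to the license it declares (or 'UNKNOWN')."""
    licenses = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip broken or partial installs
        # Newer packages declare "License-Expression"; older ones use "License".
        declared = dist.metadata.get("License-Expression") or dist.metadata.get("License")
        text = str(declared).strip() if declared else ""
        # Some packages paste the whole license text here, so keep the first line only.
        licenses[str(name)] = text.splitlines()[0] if text else "UNKNOWN"
    return licenses


for name, declared in sorted(installed_licenses().items()):
    print(f"{name}: {declared}")
```

Many packages leave these fields empty and only state the license in a `LICENSE` file, so "UNKNOWN" means "go look", not "no license".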

## Python's Worst Flaw

A key part of Python's general philosophy is that there should be an obvious way to do things. It is generally acknowledged that we have fallen short when it comes to package management. 

The situation is improving, and you have already been using the latest & greatest tool. But as you venture out into the wider world of packages, you'll notice some rough edges for sure.

When you look at instructions, some libraries will tell you to:

- `pip install matplotlib`

You may also have used `conda` before, and done something similar.

In this class, we've been using `uv`, and you may also encounter:

- `pdm`
- `poetry`
- `pip-compile`
- `pipx`
- `pip-tools`

And probably a dozen other solutions to installing & managing packages.

*What's going on?*

### How Python Packages Work

Recall that Python packages are just directories. When you install a package you are getting something that resembles:

```
baking-pkg
├── baking
│   ├── __init__.py
│   ├── cli.py
│   ├── ingredients.py
│   ├── sizes.py
│   ├── units.py
│   └── utils.py
├── LICENSE
├── README.md
└── tests
    ├── test_baking.py
    ├── test_units.py
    └── test_utils.py
```

If I gave you these files and you put them on your desktop, you would find that you could only use them if you were in the same directory. That is, if you put your own files in `baking-pkg` you would be able to `import baking`, but otherwise it would fail.

When you type an `import` statement, Python searches a list of directories (visible inside Python as `sys.path`). That list is built from the interpreter's own defaults plus `PYTHONPATH`, an **environment variable**: a variable set at the operating system level, not specific to Python.

The resulting list of directories might resemble:

- /home/user/james/.poetry/global/
- /usr/lib/python3.10
- /usr/lib/python3.10/lib-dynload
- /usr/local/lib/python3.10/dist-packages
- /usr/lib/python3/dist-packages
- /home/user/james/.pipx/ruff/
- /home/user/james/.pipx/jupyter/
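
To see the real list on your own machine, here is a small sketch. `sys.path` is the list Python actually consults on `import`, and `PYTHONPATH` (if set) contributes entries to it. The `search_path_report` helper is a name made up for this example.

```python
import os
import sys


def search_path_report() -> dict[str, list[str]]:
    """Return the PYTHONPATH entries and the full import search list."""
    raw = os.environ.get("PYTHONPATH", "")
    from_env = [entry for entry in raw.split(os.pathsep) if entry]
    return {"PYTHONPATH": from_env, "sys.path": list(sys.path)}


report = search_path_report()
print("PYTHONPATH entries:", report["PYTHONPATH"] or "(not set)")
print("Directories searched on import, in order:")
for directory in report["sys.path"]:
    # An empty string stands for the current working directory.
    print("  ", directory or "(current directory)")
```

On most machines `PYTHONPATH` is not set at all, and everything you see comes from the interpreter's defaults.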

These directories are searched in order, so if I have a `math` library in `/home/user/james/.poetry/global/` it will supersede the built-in math library in `/usr/lib/python3.10/`.

This means that you can only have one library of a given name that is importable.
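
The "first match wins" behavior can be demonstrated with a throwaway experiment. This sketch creates two temporary directories that each contain a module with the same name, puts both on the search list, and reports which one Python imports. The module name `baking_demo` and the `first_match_wins` helper are invented for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def first_match_wins(module_name: str = "baking_demo") -> str:
    """Create two same-named modules and report which directory's copy is imported."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first"
        second = Path(tmp) / "second"
        for directory in (first, second):
            directory.mkdir()
            # Each copy records the directory it lives in.
            (directory / f"{module_name}.py").write_text(
                f"SOURCE = {directory.name!r}\n"
            )

        original_path = list(sys.path)
        # Put both directories at the front of the search list, "first" before "second".
        sys.path[:0] = [str(first), str(second)]
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(module_name)
            return module.SOURCE
        finally:
            # Undo our changes so the experiment leaves no trace.
            sys.path[:] = original_path
            sys.modules.pop(module_name, None)


print(first_match_wins())  # first
```

The copy in `second` is never loaded: once Python finds a match, it stops looking.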

### Version Conflicts

In practice though, you might need to have multiple copies of a library installed on your system. Imagine the following:


**Project 1 (MPCS 51042)**
- polars==1.14
- seaborn==2.0
    - matplotlib==3.0


**Project 2 (MPCS 52000)**
- polars==0.44
- matplotlib==2.0



**Project 3 (Work)**
- altair==5.0
- customlibrary
    - matplotlib==2.4
 
Your system would need 3 versions of `matplotlib` to be able to correctly run the code in question.
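
The counting above can be sketched in a few lines. The dictionary just restates the three hypothetical projects (with nested dependencies flattened, and `customlibrary` left out because it has no pinned version), and `conflicts` is a helper name made up for this example.

```python
from collections import defaultdict

# The three hypothetical projects from above, with nested dependencies flattened.
PROJECT_REQUIREMENTS = {
    "Project 1 (MPCS 51042)": {"polars": "1.14", "seaborn": "2.0", "matplotlib": "3.0"},
    "Project 2 (MPCS 52000)": {"polars": "0.44", "matplotlib": "2.0"},
    "Project 3 (Work)": {"altair": "5.0", "matplotlib": "2.4"},
}


def conflicts(requirements: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return packages that are needed in more than one version across projects."""
    needed = defaultdict(set)
    for pins in requirements.values():
        for package, version in pins.items():
            needed[package].add(version)
    return {pkg: sorted(versions) for pkg, versions in needed.items() if len(versions) > 1}


for package, versions in conflicts(PROJECT_REQUIREMENTS).items():
    print(f"{package}: needs {len(versions)} versions ({', '.join(versions)})")
```

Note that `polars` has the same problem as `matplotlib`: two projects pin two different versions.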

### Installing Packages (the wrong way)

If you use `pip` on its own, each time you install a set of packages it will uninstall any conflicting versions & replace them with the versions the new install asks for (the latest, by default).

By default `pip` installs to the `/usr/lib/python3.10/dist-packages/` directory (or equivalent), but wherever it installs on your PYTHONPATH would have the same issue: only one version of a given package can be installed there at a time.
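
To find the "or equivalent" directory for the interpreter you are running, the standard library's `sysconfig` module can report it. This is a minimal sketch, and `install_location` is a name made up for this example. Depending on how `pip` is invoked and configured (for example a per-user install), packages may land somewhere else.

```python
import sysconfig


def install_location() -> str:
    """Return the directory where pure-Python packages are installed for this interpreter."""
    return sysconfig.get_paths()["purelib"]


print("Packages for this interpreter are installed to:", install_location())
```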

### Virtualenvs

For this reason, Python has the concept of `virtualenvs`.  These are directories of Python packages that are meant to only be added to PYTHONPATH when a given project is being used.

If we installed all of the packages for our three projects to three different directories:

```
.
├── proj1-venv
│   ├── matplotlib             # v3.0
│   ├── polars
│   └── seaborn
├── proj2-venv
│   ├── matplotlib             # v2.0
│   └── polars
└── proj3-venv
    ├── altair
    ├── customlibrary
    └── matplotlib             # v2.4
```

Now we can add the correct `venv` directory to our PYTHONPATH before running the appropriate command.

This is what `venv` does for us:
<https://docs.python.org/3/library/venv.html>
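
You can check from inside Python whether a virtualenv is in use. In a virtualenv, `sys.prefix` points at the environment's directory while `sys.base_prefix` points at the Python installation it was created from. A minimal sketch, where `in_virtualenv` is a name made up for this example:

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("Running inside a virtualenv:", in_virtualenv())
print("Environment directory:", sys.prefix)
print("Base Python installation:", sys.base_prefix)
```

Running this under the system Python and again inside a project's environment should show the environment directory change while the base installation stays the same.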

## `uv` (and other package managers)

In practice, managing a venv can be tedious and error-prone, which is why a suite of tools exists to help manage them for you.

In this class we have opted for `uv`, one of the newest of these tools and (IMO) the easiest to use yet.

`uv` demo:
- uv creates `.venv`
- `uv add` updates lockfile and venv
- `uv run` ensures that the correct venv is activated before Python starts

<https://mpcs51042.netlify.app/guides/uv/>

## Recommendation: Always start new projects with `uv init` (or similar!)

Never use `pip` or `venv` directly!