Python packaging, environment and dependency management
“import antigravity” raises an exception if you want CUDA antigravity
April 18, 2011 — February 18, 2025
How do I install the right versions of everything for some Python code I am developing? How do I deploy it robustly and reliably? How do I share it with others? How do I minimise the tedious manual labour of doing the above?
As with many other vibrant, dynamic languages, maintaining consistent project dependencies in Python is a complicated mess. See also software package managers, software dependency managers and intractable standards battles…
Python is especially terrible for this—IMO, worse even than the notoriously bad `node.js`. While `node.js` has been criticised for settling on a bad package manager, Python settled on several bad package managers. It is confusing and chaotic, inconvenient and dangerous. In my experience, this fact is the single biggest barrier to entry for new Python users, both technically and socio-culturally, because all the systems are plagued by long-running arguments, some of which are highly personal. To use a language, you should not have to develop opinions about so many community disputes. Here is one angry overview of the situation.
Although Python is one of the world’s most popular programming languages, there are no easy answers to basic questions about how to install it, and sometimes even the hard answers are insane.
Harm reduction tips follow.
1 Things we can safely ignore
In the before-times, many Python packaging standards existed. AFAICT, unless migrating extremely old code or performing digital archaeology, I should ignore everything about these.
The following are all deprecated or irrelevant: `distutils`, `easy_install`, `virtualenv` (superseded by `venv`)… Any HOWTOs that include them are probably not going to be useful.
Are you considering `rye`? Rye is kinda merging with `uv`, so maybe you should consider `uv` instead.
Next decision point: Are you doing machine learning on the GPU? If so, go to the GPU quagmire section, because that is some stuff you cannot ignore.
If you are CUDA-free, some of the more horrible complexities can be ignored. If I were you, o blessed innocent, I would simply use uv if I wanted to be modern. I should be modern, indeed, because tradition has even less to recommend it than the awkward and confusing modernity.
That said, I would not mindlessly follow the latest fad because that strategy is dangerous too. If I took the latest fad every time I started a new project, I would have migrated between conda, flit, poetry, uv, mamba, pipenv, spack, robocorp, pdm and would not have had any time to write code. If something is new, consider ignoring it for a year or two.
2 pip-like Systems
`pip` is the default Python package installer. It is best invoked as:
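```bash
# run pip as a module of the specific interpreter we mean to target,
# rather than trusting whichever `pip` happens to be on the PATH
python3 -m pip install <some_package>
```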
Note that we can use `pip` to install packages outside of a virtual environment; it will happily execute outside an active environment without complaint.
But don’t do that. Doing so does extremely weird things, installing possibly-conflicting packages into a global Python. I have no idea what it would be good for, apart from creating confusing errors and introducing bugs. Depending on which platform I am on, it will either work, fail, or introduce subtle problems that I will notice later when other things break. I have not found a use case for installing packages outside a contained virtual environment in more than a decade of Python programming. Maybe in a Docker container or something?
Anyway, read on for the important bit.
2.1 pip + venv
`venv` is the built-in Python virtual environment system in Python 3, replacing `virtualenv`, which we still find documented about the place. It creates self-contained environments that do not interfere with each other, which is what most people want.
venv works, for all that I would like it to work more smoothly. While it doesn’t support Python 2 (but, also, let Python 2 go unless someone is paying you money to keep a grip on it), it does fix various problems; e.g. it supports framework Python on macOS, which is important for GUIs, and it is covered by the Python docs in the Python virtual environment introduction. `venv` is a good default choice: widely supported and an adequate, if not awesome, workflow.
# Create venv in the current folder
python3 -m venv ./venv --prompt some_arbitrary_name
# or if we want to use system packages:
python3 -m venv ./venv --prompt some_arbitrary_name --system-site-packages
# Use venv from fish OR
source ./venv/bin/activate.fish
# Use venv from bash
source ./venv/bin/activate
Hereafter, I assume we are in an active `venv`. Now we use `pip`. I always begin by upgrading `pip` itself and installing `wheel`, which is some bit of installation infrastructure that is helpful in practice. Thereafter, everything else should install more correctly, except when it doesn’t.
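Concretely, something like:

```bash
# the ritual opening moves in any fresh venv
pip install --upgrade pip wheel
```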
To snapshot dependencies into `requirements.txt`:
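```bash
# dump everything currently installed, pinned to exact versions
pip freeze > requirements.txt
```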
I do not recommend using the `freeze` command except as a first draft. It is too specific: it includes very precise version numbers and obscure, locally specific sub-dependencies. Best keep a tally of the actual hard dependencies and let `pip` sort out the details.
To restore dependencies from a `requirements.txt`:
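```bash
pip install -r requirements.txt
```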
Version specification in `requirements.txt` looks something like this:
MyProject
YourProject == 1.3
SomeProject >= 1.2, < 2.0
SomeOtherProject[foo, bar]
OurProject ~= 1.4.2
TheirProject == 5.4 ; python_version < '3.8'
HerProject ; sys_platform == 'win32'
requests [security] >= 2.8.1, == 2.8.* ; python_version < "2.7"
The `~=` is a handy lazy shortcut; it permits point releases, but not minor releases, so e.g. `~=1.3.0` will also satisfy itself with version `1.3.9` but not `1.4.0`.
Gotcha: `pip`’s `requirements.txt` does not actually specify the version of Python itself when you install from it, although you might think it would from the `python_version` specifier. See Python versions to see how to stipulate the Python version at package development time.
2.2 uv
uv is “an extremely fast Python package and project manager, written in Rust.” It is getting strong reviews, e.g. Loopwerk: Revisiting uv. Source at astral-sh/uv.
Claimed highlights:
- 🚀 A single tool to replace `pip`, `pip-tools`, `pipx`, `poetry`, `pyenv`, `twine`, `virtualenv`, and more.
- ⚡️ 10–100x faster than `pip`.
- 🐍 Installs and manages Python versions.
- 🛠️ Runs and installs Python applications.
- ❇️ Runs scripts, with support for inline dependency metadata.
- 🗂️ Provides comprehensive project management, with a universal lockfile.
- 🔩 Includes a pip-compatible interface for a performance boost with a familiar CLI.
- 🏢 Supports Cargo-style workspaces for scalable projects.
- 💾 Disk-space efficient, with a global cache for dependency deduplication.
- ⏬ Installable without Rust or Python via `curl` or `pip`.
- 🖥️ Supports macOS, Linux, and Windows.
uv is backed by Astral, the creators of Ruff.
Notably, `uv` has deep, highly specific pytorch support, so if you are doing ML on the GPU, this might be a good choice.
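For example, uv’s PyTorch guide describes pinning `torch` to a CUDA-specific index from `pyproject.toml`. A sketch (the index name and CUDA version here are placeholders; check the uv docs for your configuration):

```toml
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cu121" }
```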
One thing it does not provide is a build backend; we need to choose one. Who knew that was a thing? I did not know there was a build-frontend/build-backend distinction.
The uvx command invokes a tool without installing it.[…] Tools are installed into temporary, isolated environments when using uvx.
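For instance:

```bash
# run a linter in a temporary, isolated environment, installing nothing permanently
uvx ruff check .
```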
2.2.1 uv installation
We can install `uv` many ways. I prefer `pipx` on Linux and homebrew on macOS.
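That is, something like:

```bash
pipx install uv   # Linux
brew install uv   # macOS
```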
Shell completions require manual intervention:
# Determine your shell (e.g., with `echo $SHELL`), then run one of:
echo 'eval "$(uv generate-shell-completion bash)"' >> ~/.bashrc
echo 'eval "$(uv generate-shell-completion zsh)"' >> ~/.zshrc
echo 'uv generate-shell-completion fish | source' >> ~/.config/fish/config.fish
Annoyingly, `uv` will not autocomplete project entry points, so e.g. `uv run` will not autocomplete with the list of permissible scripts.
2.2.2 uv project setup
`uv` requires us to specify a build system in the `pyproject.toml` file if we want to “install” the package we are currently working on (e.g. do relative imports, have sub-folders in the source tree…). It does not provide one per default. To get one, create the project as one of the following:
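```bash
# either of these configures a build backend in the generated pyproject.toml;
# the project names here are placeholders
uv init --package my_project   # a packaged application
uv init --lib my_library       # a library
```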
If the env is already created, we need to add the build system manually into `pyproject.toml`; a useful one follows.
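For example, Hatchling, the current uv default (see the Build systems section below for alternatives):

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```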
Alternatively, we can set `tool.uv.package = true` in `pyproject.toml`.
2.2.3 Migrating to uv
- mkniewallner/migrate-to-uv: Migrate a project from Poetry/Pipenv/pip-tools/pip to uv package manager
- poetry-to-uv is a script which automates (some of) the process of migrating from poetry. (This one was a little fragile for me, but it might work for you.)
- uv-migrator: A New Tool to Easily Migrate Your Python Projects to UV Package Manager : r/Python
- python - How to migrate from Poetry to UV package manager? - Stack Overflow
- Sebastián Ramírez: “BTW, you can migrate from Poetry to uv in 5-10 min 🚀 uv uses the standard pyproject.toml format used by almost all others, Hatch, PDM, Flit… (except Poetry) 🤓 You can use another tool (PDM) to migrate the configs to this standard 📜 It’s 4 steps.
- Loopwerk: How to migrate your Poetry project to uv.
- Migrating to uv - Instructor.
2.3 Poetry
No! Wait! The new new new hipness is `poetry`. All the other previous hipnesses were not the real eternal ultimate hipness that transcends time. I know we said this every previous time a new Python packaging system came out, but this time it’s real and our love will last forever ONO.
Surprise twist: it turns out this love was not actually eternal and my ardour for poetry has cooled. Poetry no longer has an edge over other similar projects in terms of function and has a problematic history of getting logjammed; see Issue #4595: Governance—or, “What do we do with all these pull requests?”.
It might be usable if your needs are modest or you are prepared to jump into the project discord, which seems to be where the poetry hobbyists organise, but since I want to use this project merely incidentally, as a tool to develop something else, hobbyist levels of engagement are not something I can participate in. `poetry` is not ready for prime-time, at least for my use-case.
Note also that poetry is having difficulty staying current with the (admittedly annoying) local versions made famous by CUDA-supporting packages. Below is an example of the kind of antics required to make it work.
Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on, and it will manage (install/update) them for you.
From the introduction:
Packaging systems and dependency management in Python are rather convoluted and hard to understand for newcomers. Even for seasoned developers, it might be cumbersome at times to create all files needed in a Python project: `setup.py`, `requirements.txt`, `setup.cfg`, `MANIFEST.in` and the newly added `Pipfile`.

So I wanted a tool that would limit everything to a single configuration file to do: dependency management, packaging and publishing.

It takes inspiration from tools that exist in other languages, like `composer` (PHP) or `cargo` (Rust).

And, finally, I started `poetry` to bring another exhaustive dependency resolver to the Python community apart from Conda’s.

What about Pipenv?

In short: I do not like the CLI it provides, or some of the decisions made, and I think we can make a better and more intuitive one.
Low-key dissing on similarly-dysfunctional competitors is an important part of Python packaging.
Lazy install is via this terrifying command line (do not run if you do not know what this does):
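At the time of writing, that was the official installer, i.e.:

```bash
# pipe a script straight off the internet into your interpreter; hence "terrifying"
curl -sSL https://install.python-poetry.org | python3 -
```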
Poetry could be regarded as a similar tool to `pipenv`, in that it (by default, but not necessarily) manages the dependencies in a local `venv`. It has a much more full-service approach than systems built on `pip`. For example, it has its own dependency resolver, which uses modern dependency metadata but also works with previous dependency specifications by brute force if needed. It separates specified dependencies from the ones that it resolves in practice, which means dependencies seem to transport much better than conda, which generally requires you to hand-maintain a special dependency file full of just the stuff you actually wanted. In practice, its many small conveniences and thoughtful workflow are helpful. For example, it sets up the current package for development by default, so that imports work as similarly as possible across this local environment and when it is distributed to users.
poetry shell finds the wrong venv
Yes, it does this for me sometimes too. It is not consistent, though, and seems to be a particular shell environment that causes this glitch.
Force it to use the correct venv with:
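One way (I believe) is to point poetry at an explicit interpreter:

```bash
# tell poetry which interpreter's venv we actually want
poetry env use $(which python3)
```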
In fact, they have removed `poetry shell` as of 2.0.0 because it is no good.
The new way is something like:
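```bash
# poetry >= 2.0: `poetry env activate` prints the activation command; eval it
eval $(poetry env activate)
```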
2.3.1 CUDA and other local versions in poetry
As mentioned below, poetry does not support installing build variants/profiles, which means I cannot install GPU software, and thus in practice it is burdensome to use for machine learning applications. There are workarounds: Instructions for installing PyTorch shows a representative installation specification for PyTorch.
[tool.poetry.dependencies]
python = "^3.10"
numpy = "^1.23.2"
torch = { version = "1.12.1", source="torch"}
torchaudio = { version = "0.12.1", source="torch"}
torchvision = { version = "0.13.1", source="torch"}
[[tool.poetry.source]]
name = "torch"
url = "https://download.pytorch.org/whl/cu116"
secondary = true
Note that this produces various errors and downloads gigabytes of supporting files unnecessarily, but it eventually works. It was too burdensome for my workflow, so I switched back to pip.
There is a new way for torch 2.0 and later:
poetry source add --priority=supplemental torch https://download.pytorch.org/whl/cu118
poetry add torch==2.0.1+cu118 --source torch
or:
poetry add "https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl"
I have not tried it.
poetry and PyTorch notionally play nice in PyTorch 2.1, in the sense that PyTorch 2.1 is supposed to be installable with poetry, with CUDA. It is not yet clear to me how we would set up PyTorch so it works either with or without CUDA.
2.3.2 Jupyter kernels from my poetry env
Easy:
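Something like the following, assuming `ipykernel` is a dependency of the env (the kernel name is arbitrary):

```bash
# register the poetry env as a Jupyter kernel
poetry run python -m ipykernel install --user --name my-poetry-env
```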
2.3.3 Dev dependencies
Poetry does not specifically support dev dependencies. What they do support are generic dependency groups which might happen to be dev dependencies but would say “don’t label me, man.”
[tool.poetry.group.dev] # This part can be left out
optional = true
[tool.poetry.group.dev.dependencies]
ipdb = "~0.13.13"
ipykernel = "~6.29.4"
scalene = "~1.5.41"
Now, we install:
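```bash
# the group is marked optional above, so we must ask for it explicitly
poetry install --with dev
```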
2.4 pipenv
⛔️⛔️UPDATE⛔️⛔️: Note that the `pipenv` system does not support “local versions” and is therefore unusable for machine learning applications. This project is dead to me. (Bear in mind that my opinions will become increasingly outdated depending on when you read this.)
`venv` has a higher-level, er, …wrapper (?) interface called pipenv.
Pipenv is a production-ready tool that aims to bring the best of all packaging worlds to the Python world. It harnesses Pipfile, pip, and virtualenv into one single command.
I switched to pipenv from poetry because it looked less chaotic than poetry. I think it is, although not by much.
HOWEVER, it is still pretty awful for my use-case. To be honest, I’d just use plain pip and `requirements.txt`, which, while primitive and broken, are at least broken and primitive in a well-understood way.
At the time of writing, the pipenv website was 3 weeks into an outage, because dependency management is a quagmire of sadness and comically broken management with terrible Bus factor. However, the backup docs site is semi-functional, albeit too curt to be useful and, as far as I can tell, outdated. The documentation site inside GitHub is readable. See also an introduction showing pipenv and venv used together.
The dependency resolver is, as the poetry devs point out, broken in its own special ways. The procedure to install modern ML frameworks, for example, is gruelling.
For my system, the important setting is the one that puts the venv inside the project (required for sanity on my HPC):
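I believe the relevant incantation is:

```bash
# keep the virtualenv in ./.venv inside the project directory
export PIPENV_VENV_IN_PROJECT=1
```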
Pipenv will automatically load dotenv files, which is a nice touch.
2.5 Zipapp/.pyz
Python supports zipped executable code bundles, under the name zipapp. Does that mean the problems are all solved with regard to Python packaging? No, because not only can it not bundle binary dependencies, but it still needs the system Python interpreter to exist and be compatible with the target system. Still, a cool trick that is occasionally useful for people who are not me.
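For the record, a minimal sketch (the package and entry-point names are hypothetical):

```bash
# bundle a directory of pure-Python code into a single runnable archive
python3 -m zipapp myapp_dir -m "myapp.cli:main" -o myapp.pyz
python3 myapp.pyz
```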
2.6 pipx
Pro tip: pipx:
pipx is made specifically for application installation, as it adds isolation yet still makes the apps available in your shell: pipx creates an isolated environment for each application and its associated packages.
That is, pipx is an application that installs global applications for you.
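e.g.:

```bash
# each app gets its own isolated venv; the entry point lands on your PATH
pipx install ruff
```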
2.7 Rye
As noted above, Rye is being merged into uv; if you are tempted by Rye, you probably want uv instead.
2.8 PDM
PDM is a modern Python package and dependency manager supporting the latest PEP standards. But it is more than a package manager. It boosts your development workflow in various aspects. The most significant benefit is it installs and manages packages in a similar way to `npm` that doesn’t need to create a virtualenv at all!

Feature highlights:
2.9 Flit
Make the easy things easy and the hard things possible is an old motto from the Perl community. Flit is focused on the easy things part of that, and leaves the hard things up to other tools.
Specifically, the easy things are pure Python packages with no build steps (neither compiling C code, nor bundling Javascript, etc.). The vast majority of packages on PyPI are like this: plain Python code, with maybe some static data files like icons included.
It’s easy to underestimate the challenges involved in distributing and installing code, because it seems like you just need to copy some files into the right place. There’s a whole lot of metadata and tooling that has to work together around that fundamental step. But with the right tooling, a developer who wants to release their code doesn’t need to know about most of that.
What, specifically, does Flit make easy?
- `flit init` helps you set up the information Flit needs about your package.
- Subpackages are automatically included: you only need to specify the top-level package.
- Data files within a package directory are automatically included. Missing data files have been a common packaging mistake with other tools.
- The version number is taken from your package’s `__version__` attribute, so it always matches the version that tools like pip see.
- `flit publish` uploads a package to PyPI, so you don’t need a separate tool to do this.

Setuptools, the most common tool for Python packaging, now has shortcuts for many of the same things. But it has to stay compatible with projects published many years ago, which limits what it can do by default.
2.10 Hatch
Hatch is a modern, extensible Python project manager.
Features:
- Standardised build system with reproducible builds by default
- Robust environment management with support for custom scripts
- Easy publishing to PyPI or other indexes
- Version management
- Configurable project generation with sane defaults
- Responsive CLI, ~2-3x faster than equivalent tools
3 Conda-like systems
A system parallel to pip, which it generalises and with which it is kinda-sorta compatible.
3.1 conda
Designed to handle the heavy lifting of installing Python software with hefty compiled dependencies.
There are two parts here with two separate licences:
- the Anaconda Python distribution
- the conda Python package manager
I am slightly confused about how these two relate. The distinction is important since licensing Anaconda proper can be expensive. See, e.g.
- Anaconda is not free for commercial use (any more) so what are the alternatives?
- Conda/Anaconda no longer free to use?
- See also mamba or Pixi below, which aim to reduce licensing risk by reimplementing the more licensing-vulnerable parts of the Anaconda ecosystem and improve in other ways.
- Can I install an entirely non-Anaconda Python distribution through the conda package manager?
Some things that are (or were?) painful to install by pip are painless via conda. Conversely, some things that are painful to install by conda are easy by pip.
I recommend figuring out through trial and error which pain points are worse in your setting. If conda does not bring substantial value, choose pip. Using conda can sometimes be worth the trouble of understanding its current licensing and future licensing risks, but it has to be a strong win.
This is an updated recommendation; previously I preferred conda — pip used to be much worse, and Anaconda’s licensing used to be less restrictive.
3.1.1 Setup
Download, e.g. Linux x64 Miniconda from the download page.
bash Miniconda3-latest-Linux-x86_64.sh
# login/logout here
# or do something like `exec bash -` if you are fancy
# Less aggressive conda
conda config --set auto_activate_base false
# conda for fish users
conda init fish
Alternatively, try miniforge: A conda-forge distribution or fastchan, fast.ai’s conda mini-distribution.
curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
bash Mambaforge-$(uname)-$(uname -m).sh
It is worth installing one of these minimalist distros rather than the default Anaconda distro: the Anaconda default is gigantic and yet does not have what I need, so it simply wastes space. Some of these might have less onerous licensing than the mainline? I am not sure.
To install something with tricky dependencies like ViTables, I do this:
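For me that is something like (assuming the package lives on conda-forge):

```bash
# conda-forge usually has the fiddly compiled dependencies sorted
conda install -c conda-forge vitables
```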
Aside: I use fish shell, so I need to do some extra setup. Specifically, I source conda’s fish integration (something like `source ~/miniconda3/etc/fish/conf.d/conda.fish`) from `~/.config/fish/config.fish`. This is now automated by `conda init fish`.
For Jupyter compatibility, I need:
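which, as best I recall, amounts to registering the env as a kernel (the env name is a placeholder):

```bash
# make the active conda env visible to Jupyter
conda install ipykernel
python -m ipykernel install --user --name my_conda_env
```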
3.1.2 Dependencies
The main selling point of conda is that specifying dependencies for ad hoc Python scripts or packages is easy.
Conda has a slightly different dependency management and packaging workflow from the pip ecosystem. See, e.g. Tim Hopper’s explanation of this `environment.yml` malarkey, or the creators’ rationale and manual.
One exports the current conda environment config, by convention, into `environment.yml`:
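```bash
# dump the active environment's spec (alas, platform-specific)
conda env export > environment.yml
```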
Which to use out of `conda env create` and `conda create`? If it involves `.yaml` environment configs, then `conda env create`. Confusing these two is a quagmire of opaque errors, poor documentation and sadness.
One point of friction I quickly encountered is that the automatically created environments are not generic. I might specify from the command line a package that I know will install sanely on any platform (e.g., matplotlib
), but the version stored in the environment file is specific to where I installed it (macOS, Linux, Windows…) and architecture (x64, ARM…). For GPU software, there are even more incompatibilities because there are more choices of architecture. So to share environments with collaborators on different platforms, I need to… be them, I guess? Buy them new laptops that match my laptop? I don’t know, this seems weird. Maybe I’m missing something.
Conda will fill up my hard disk if not regularly disciplined via conda clean.
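```bash
# purge caches, unused packages and tarballs
conda clean --all
```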
If I have limited space in my home directory, I might need to move the package cache by configuring `pkgs_dirs` in `~/.condarc`:
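```bash
# the path is a placeholder; point the package cache at a bigger disk
conda config --add pkgs_dirs /big_disk/conda/pkgs
```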
Possibly also required?
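```bash
# the equivalent environment variable, in case the .condarc setting is not enough
export CONDA_PKGS_DIRS=/big_disk/conda/pkgs
```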
I might also want to avoid installing the gigantic MKL library, not being a fan. It comes baked in by default for most Anaconda installs. I can usually disable it by request:
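```bash
# the `nomkl` metapackage requests non-MKL (OpenBLAS) variants of the numeric stack
conda install nomkl numpy scipy
```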
Clearly, the packagers do not test this configuration often because it fails sometimes even for packages that notionally do not need MKL. Worth attempting, however. Between the various versions and installed copies, MKL alone used about 10GB on my Mac when I last checked. I also try to reduce the number of copies of MKL by starting from miniconda as my base Anaconda distribution, cautiously adding things as I need them.
3.1.3 Local environment
A local environment folder is more isolated, keeping packages in a local folder rather than keeping all environments somewhere global, where I need to remember what I named them all.
conda config --set env_prompt '({name})'
conda env create --prefix ./env/myenv --file environment_linux.yml
conda activate ./env/myenv
Gotcha: in fish shell the first line needs slightly different quoting; I am not sure why, since as far as I know fish command substitution does not happen inside strings. Either way, this will add the line `env_prompt: ({name})` to `.condarc`.
3.2 Mamba
Mamba is a fully compatible drop-in replacement for conda. It was started in 2019 by Wolf Vollprecht.
The introductory blog post is an enlightening read, which also explains conda better than conda does. The mamba 1.0 release announcement is also very good. See mamba-org/mamba for more. The fact that the authors of this system can articulate their ideas is a major selling point in my opinion.
It explicitly targets package installation for less mainstream configurations such as R and vscode development environments. In fact, it is not even Python-specific.
Provide a convenient way to install developer tools in VSCode workspaces from conda-forge with micromamba. Get NodeJS, Go, Rust, Python or JupyterLab installed by running a single command.
It also inherits some of the debilities of conda, e.g. that dependencies are platform- and architecture-specific.
3.3 Pixi
7 Reasons to Switch from Conda to Pixi | prefix.dev:
…we’re solving conda users’ pain points with pixi – a package manager that’s 10x faster than conda, integrates with PyPI package world much more deeply, eliminates unnecessary steps, allows you to use tasks for collaboration, and more.
The goal of mamba was always to be a drop-in replacement for conda. With pixi, we are going a step further with a more opinionated workflow. Pixi is written in Rust, is up to 4x faster than micromamba, and natively supports lockfiles and cross-platform tasks.
3.4 Magic
Magic is a Pixi derivative that targets Modular’s Python derivative, Mojo.
- blog post: Why Magic?
3.5 Robocorp
Robocorp tools claim to make conda install more generic.
RCC is a command-line tool that allows you to create, manage, and distribute Python-based self-contained automation packages - or robots 🤖 as we call them.
Together with the robot.yaml configuration file, `rcc` provides the foundation to build and share automation with ease.

In short, the RCC toolchain helps you to get rid of the phrase: “Works on my machine” so that you can actually build and run your robots more freely.
4 Build systems
`pyproject.toml` now includes build system declarations. The Python Packaging User Guide explains this:
Tools like pip and build do not actually convert your sources into a distribution package (like a wheel); that job is performed by a build backend. The build backend determines how your project will specify its configuration, including metadata (information about the project, for example, the name and tags that are displayed on PyPI) and input files. Build backends have different levels of functionality, such as whether they support building extension modules, and you should choose one that suits your needs and preferences.
You can choose from a number of backends; this tutorial uses Hatchling by default, but it will work identically with Setuptools, Flit, PDM, and others that support the `[project]` table for metadata.
Setuptools. Old. Popular for historical reasons.
Hatchling, the build backend from Hatch, seems to be the `uv` default at the moment:
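```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
```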
Poetry’s build system:
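```toml
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```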
PDM’s build system seems popular for people with heavy C/C++ dependencies:
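```toml
[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"
```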
Flit’s build system (looks like it only handles pure Python?):
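```toml
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"
```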
We presumably need to also install whichever build system we use.
5 Non-python-specific dependency managers
Does Python’s slapstick ongoing shambles of a failed consensus on dependency management fill you with distrust? Do you have the vague feeling that perhaps you should use something else to manage Python, since Python cannot manage itself? See generic dependency managers for an overview. Some of these are known to be an OK means of managing Python specifically, even though they are more general.
5.1 Spack
Supercomputing dep manager spack has Python-specific support.
PyPI has hundreds of thousands of packages that are not yet in Spack, and `pip` may be a perfectly valid alternative to using Spack. The main advantage of Spack over `pip` is its ability to compile non-Python dependencies. It can also build cythonized versions of a package or link to an optimised BLAS/LAPACK library like MKL, resulting in calculations that run orders of magnitudes faster. Spack does not offer a significant advantage over other Python-management systems for installing and using tools like flake8 and sphinx. But if you need packages with non-Python dependencies like numpy and scipy, Spack will be very valuable to you.

Anaconda is another great alternative to Spack and comes with its own `conda` package manager. Like Spack, Anaconda is capable of compiling non-Python dependencies. Anaconda contains many Python packages not yet in Spack, and Spack contains many Python packages not yet in Anaconda. The main advantage of Spack over Anaconda is its ability to choose a specific compiler and BLAS/LAPACK or MPI library. Spack also has better platform support for supercomputers and can build optimised binaries for your specific microarchitecture.
5.2 Meson
- meson-python uses The Meson Build system for Python. Is that… good?
6 Writing a package
Least nerdview guide: Vicki Boykis, Alice in Python projectland.
Simplest readable guide is python-packaging
PyPI Quick and Dirty, includes good tips such as using twine to make it more automatic.
Official docs are no longer awful but are slightly stale, and are especially perfunctory for compilation.
There is a community effort to document the issues of compiled packages in pypackaging-native (tldr it is hard)
Kenneth Reitz shows rather than tells with a heavily documented setup.py
Try Zed Shaw’s signature aggressively cynical and reasonably practical explanation of project structure, with bonus explication of how you should expect much time-wasting yak shaving if you want to do software.
- Or copy pyskel.
- Or generate a project structure with a templating/scaffolding system.
Updated: What the heck is `pyproject.toml`?
6.1 Scaffolding and templating
Generating all those files is boring. The Python packaging ecosystem has several tools to automate it.
6.1.1 Copier
copier is a newer, more flexible alternative to cookiecutter for generating project templates.
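Usage is roughly as follows (the template URL is hypothetical):

```bash
pipx install copier
# stamp out a new project from a template repository
copier copy https://github.com/example/python-template.git ./my_project
```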
6.2 Documenting my package
Here are two famous options. I have used Sphinx; it is adequate and well-integrated, but it defaults to its own markup language (reStructuredText), which the world outside of Python does not use. MkDocs seems less blessed by the Python mainstream, but it uses markdown, which is more widely adopted and has a rich ecosystem of tools.
7 Python versions
If we are using conda, then the Python version is handled for us, as it is with the generic dependency managers. With `pip`, we need to manage it ourselves. (Poetry is in between — it knows about Python versions but cannot install Python for us.)
7.1 pyenv
I find `pyenv` baffling, as it interacts with all the other tools in the Python packaging ecosystem in a way that is not immediately obvious to me.
I prefer to avoid it completely and let `uv` handle Python versions, which it does in a relatively seamless way, without me needing to consider local and global versions or remember which sub-sub-version of Python I compiled with what for which.
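e.g.:

```bash
# uv fetches and manages interpreters itself
uv python install 3.12
uv venv --python 3.12
```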
pyenv is the core tool of an ecosystem that eases and automates switching between Python versions. It manages Python and thus implicitly can be used as a manager for all the other managers.
BUT WHO MANAGES THE VIRTUALENV MANAGER MANAGER? Also, what is going on in this ecosystem of bits? Logan Jones explains:
- pyenv manages multiple versions of Python itself.
- virtualenv/venv manages virtual environments for a specific Python version.
- pyenv-virtualenv manages virtual environments across varying versions of Python.
Anyway, pyenv compiles a custom version of Python and as such is extremely isolated from everything else. An introduction with emphasis on my area: Intro to Pyenv for Machine Learning.
# initialize pyenv
pyenv init
# install a specific Python version
pyenv install 3.8.13
# ensure we can find that version
pyenv rehash
# switch to that version
pyenv shell 3.8.13
Of course, because this is adjacent to Python packaging, it is infected with the same brainworms. Everything immediately becomes complicated and confusing when I try to interact with the rest of the ecosystem, e.g.,
pyenv-virtualenvwrapper is different from `pyenv-virtualenv`, which provides extended commands like `pyenv virtualenv 3.4.1 project_name` to directly help out with managing virtualenvs. `pyenv-virtualenvwrapper` helps in interacting with `virtualenvwrapper`, but `pyenv-virtualenv` provides more convenient commands, where virtualenvs are first-class pyenv versions, that can be (de)activated. That’s to say, `pyenv` and `virtualenvwrapper` are still separated while `pyenv-virtualenv` is a nice combination.
Huh. I am already too bored to think. However, I did work out a command which installed a pyenv tensorflow with an isolated virtualenv:
brew install pyenv pyenv-virtualenv
pyenv install 3.8.6
pyenv virtualenv 3.8.6 tf2.4
pyenv activate tf2.4
pip install --upgrade pip wheel
pip install 'tensorflow-probability>=0.12' 'tensorflow<2.5' jupyter
For fish shell I needed to add some special lines to `config.fish`:
set -x PYENV_ROOT $HOME/.pyenv
set -x PATH $PYENV_ROOT/bin $PATH
## fish <3.1
# status --is-interactive; and . (pyenv init -|psub)
# status --is-interactive; and . (pyenv virtualenv-init -|psub)
## fish >=3.1
status --is-interactive; and pyenv init - | source
status --is-interactive; and pyenv virtualenv-init - | source
For bash/zsh (resp. `.bashrc`/`.zshrc`) it is as follows:
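```bash
# the standard pyenv shell setup for bash/zsh
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
```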
8 Sorry, GPU/TPU/etc users
Users of GPUs must ignore most other options, no matter how attractive they might seem at first glance. The stupid drudge work of `venv` is the price of hardware support for now. Only pip and conda support hardware specification in practice.
UPDATE: `poetry` now supports Pytorch with CUDA. `uv` has had a crack at it too.
Terminology you need to learn: many packages specify local versions for particular architectures as a part of their functionality. For example, pytorch comes in various flavours which, when using `pip`, can be selected in the following fashion:
# CPU flavour
pip install torch==1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
# GPU flavour
pip install torch==1.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
The local version is given by the `+cpu` or `+cu113` bit, and it changes what code will be executed when using these packages. Specifying a GPU version is essential for many machine learning projects (essential, that is, if I do not want my code to run orders of magnitude slower). The details of how this can be controlled with regard to the Python packaging ecosystem are somewhat contentious and complicated, and thus not supported by any of the new wave options like `poetry` or `pipenv`. Brian Wilson argues,
During my dive into the open-source abyss that is ML packages and `+localVersions`, I discovered lots of people have strong opinions about what it should not be and like to tell other people they’re wrong. Other people with opinions about what it could be are too afraid of voicing them lest there be some unintended consequence. PSF has asserted what they believe to be the intended state in PEP-440 (no local versions published) but the solution (PEP-459) is not an ML Model friendly solution because the installation providers (pip, pipenv, poetry) don’t have enough standardised hooks into the underlying hardware (cpu vs gpu vs cuda lib stack) to even understand which version to pull, let alone the Herculean effort it would take to get even just pytorch to update their package metadata.
There is no evidence that this logjam will resolve any time soon. However, it turns out that this machine learning thing is not going away, and ML projects use GPUs. It turns out that packaging projects with GPU code is hard. Since I do neural network stuff and thus use GPU/CPU versions of packages, this means that I can effectively ignore most of the Python environment alternatives on this page. The two that work are conda and pip. Both support, de facto, a minimum viable local-version package system which does what we want. If you want something fancier, try containerization using a GPU-compatible system such as apptainer.