Python Environment in 2021 - Part 2, Python Package Management

Welcome back to this series. The previous post introduced tools for managing Python runtimes, and this post focuses on tools that help you manage the Python packages in your projects.

Background

pip is the package installer for Python, and many tutorials use pip with requirements.txt to create a Python environment. That is fine if you only use a few packages or don't need to maintain the environment in the long term. But once your project grows, you'll find it very challenging to manage packages with only pip and requirements.txt.

Main problem: Reproducibility

Some people list only the packages they need in requirements.txt, without the underlying dependencies. However, open-source packages evolve all the time, so it's hard to guarantee that you can create an identical environment next time.

A common solution is to add versions to requirements.txt or to use pip freeze to dump the environment. But another problem comes up: how do you add or delete a package without breaking the environment? pip doesn't do this for you. Running pip install for a new package might update dependencies that already-installed packages rely on and break your environment.
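To make the "dump the environment" idea concrete, here is a minimal sketch that produces pip freeze-style lines with the standard library's importlib.metadata. The function name freeze_lines is my own, and the real pip freeze handles more cases (editable installs, packages installed from URLs), so treat this as an approximation only.

```python
from importlib import metadata


def freeze_lines():
    """Return sorted 'name==version' lines for the installed distributions."""
    lines = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:  # skip entries with broken metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze_lines()))
```

Note that the output mixes direct and underlying dependencies in one flat list, which is exactly why it is hard to edit by hand later.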

So what is the solution? If you have experience in package management for other languages, you might know most modern package managers use two separate files to handle this:

A file with all direct dependencies
A lock file that contains all the dependencies (direct dependencies and all underlying dependencies)

We need to put these two files into the version control system, and then we can reproduce the environment anytime.
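The relationship between the two files can be sketched as a simple check: every direct dependency should show up as an exact pin in the lock file. The helper names below (requirement_names, unpinned_direct_dependencies) are hypothetical, and the parsing is deliberately simplified. It only understands plain "name" and "name<specifier>" lines, not extras, markers, or URLs.

```python
import re

NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text):
    """Extract normalized package names from requirements-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = NAME_RE.match(line)
        if match:
            # Lowercase and collapse runs of "-", "_", "." into a single "-"
            names.add(re.sub(r"[-_.]+", "-", match.group(1)).lower())
    return names


def unpinned_direct_dependencies(direct_text, lock_text):
    """Return the direct dependencies that have no '==' pin in the lock text."""
    pinned = set()
    for line in lock_text.splitlines():
        if "==" in line.split("#", 1)[0]:
            pinned |= requirement_names(line)
    return sorted(requirement_names(direct_text) - pinned)


if __name__ == "__main__":
    direct = "django\ncelery<5\n"
    lock = "celery==4.4.7\nkombu==4.6.11\n"
    print(unpinned_direct_dependencies(direct, lock))
```

An empty result means the lock file covers everything you declared. Anything else means the two files have drifted apart.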

In the Python world, you can find many tools to achieve similar goals. In this post, I'll introduce pip-tools, Pipenv, Poetry, and conda-lock.

pip-tools

pip-tools is an open-source project maintained by a Python community called Jazzband. It provides two commands to keep your Python dependencies up-to-date:

pip-compile compiles your direct dependencies into a requirements.txt file.
pip-sync updates your virtualenv to match the packages in the compiled requirements.txt file.

pip-tools works together with virtualenv and, like pip, must be installed in each virtualenv.

Usage

As written above, we need to install pip-tools in each virtual environment. This is different from other tools introduced in this post.

# If you are using pyenv (with the pyenv-virtualenv plugin)
# Create a virtualenv
pyenv virtualenv 3.9.1 pip-tools-test
# Activate it
pyenv shell pip-tools-test
# Install pip-tools
python -m pip install pip-tools

You can also follow their installation instructions.

After installation, the commands are available inside the virtualenv. By default, pip-compile reads your packages from setup.py or requirements.in and generates a requirements.txt that stores all your dependencies. Let's say we need Django and Celery in our project, and we don't want to use Celery 5 yet. We can put these requirements in requirements.in.

# requirements.in
django
celery<5

If we execute pip-compile, we'll get requirements.txt like this:

#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile
#
amqp==2.6.1
    # via kombu
asgiref==3.3.1
    # via django
billiard==3.6.3.0
    # via celery
celery==4.4.7
    # via -r requirements.in
django==3.1.7
    # via -r requirements.in
importlib-metadata==3.7.3
    # via kombu
kombu==4.6.11
    # via celery
pytz==2021.1
    # via
    #   celery
    #   django
sqlparse==0.4.1
    # via django
typing-extensions==3.7.4.3
    # via importlib-metadata
vine==1.3.0
    # via
    #   amqp
    #   celery
zipp==3.4.1
    # via importlib-metadata

It resolves our requirements and pins every package version. You can also see where each dependency comes from in the comments. When we need to add or delete a package, we only need to edit requirements.in and regenerate requirements.txt.
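Those "# via" comments are regular enough to read programmatically. Here is a rough sketch that maps each pinned package to whatever required it, assuming the comment layout shown in the output above. The function name parse_via is hypothetical, and this is not part of pip-tools itself.

```python
def parse_via(compiled_text):
    """Map each pinned package to the list of things that required it."""
    sources = {}
    current = None
    for raw in compiled_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith("#"):
            # A pin such as "kombu==4.6.11" starts a new entry
            current = line.split("==", 1)[0]
            sources[current] = []
            continue
        if current is None:
            continue  # header comments before the first pin
        note = line.lstrip("#").strip()
        if note == "via":
            continue  # multi-line form: the sources follow on the next lines
        if note.startswith("via "):
            note = note[len("via "):].strip()
        if note:
            sources[current].append(note)
    return sources


def direct_dependencies(compiled_text, marker="-r requirements.in"):
    """Return the packages that were requested directly in requirements.in."""
    parsed = parse_via(compiled_text)
    return sorted(name for name, via in parsed.items() if marker in via)
```

Run against the output above, direct_dependencies would pick out celery and django, since those are the only entries marked "-r requirements.in".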

After requirements.txt is ready, we can run pip-sync to update our environment. It will add the new packages and remove packages if they are not in requirements.txt anymore.
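Conceptually, the sync step is a comparison between what is installed and what the compiled file asks for. The sketch below illustrates that idea with plain dictionaries. The function plan_sync is hypothetical and is not how pip-sync is actually implemented, since the real command deals with many details this ignores.

```python
def plan_sync(installed, required):
    """Work out what a sync step would need to change.

    Both arguments map a package name to a version string.
    Returns (to_install, to_uninstall).
    """
    # Install anything missing or pinned to a different version
    to_install = {
        name: version
        for name, version in required.items()
        if installed.get(name) != version
    }
    # Remove anything that is no longer listed
    to_uninstall = sorted(set(installed) - set(required))
    return to_install, to_uninstall


if __name__ == "__main__":
    installed = {"django": "3.1.7", "celery": "4.4.7", "requests": "2.25.1"}
    required = {"django": "3.1.7", "celery": "4.4.7", "kombu": "4.6.11"}
    print(plan_sync(installed, required))
```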

In addition, pip-tools can help you upgrade packages and manage development dependencies. pip-tools is simple and compatible with the traditional Python setup. I think it should be enough for most Python projects.

Pipenv

Pipenv was started by a famous Python developer, Kenneth Reitz, and it is currently owned by the PyPA community, which also maintains the core libraries for Python packaging.

Pipenv provides more features than pip-tools. It creates and manages a virtualenv for you, so you don't need to think about what to name the virtualenv or where to put it. Pipenv uses two files to keep track of your environment: Pipfile for your declared dependencies and Pipfile.lock for all dependencies, including the underlying ones.
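Pipfile.lock is a JSON file, so it is easy to inspect with the standard library. The sketch below reads the pinned versions from a hand-written, trimmed sample (the hashes and most metadata are left out on purpose). The function name locked_versions is my own.

```python
import json

# Trimmed, hand-written sample: a real Pipfile.lock also stores hashes
# and source metadata for every package.
SAMPLE_LOCK = """
{
    "_meta": {"pipfile-spec": 6},
    "default": {
        "celery": {"version": "==4.4.7"},
        "django": {"version": "==3.1.7"}
    },
    "develop": {}
}
"""


def locked_versions(lock_text, section="default"):
    """Return {package: version} for one section of a Pipfile.lock."""
    data = json.loads(lock_text)
    versions = {}
    for name, entry in data.get(section, {}).items():
        version = entry.get("version")
        if version:  # entries such as VCS dependencies may have no version
            if version.startswith("=="):
                version = version[2:]
            versions[name] = version
    return versions


if __name__ == "__main__":
    print(locked_versions(SAMPLE_LOCK))
```

The "default" section holds the regular dependencies and "develop" holds the development ones.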

Usage

Here I assume you've installed Pipenv on your machine (official guide). Using Pipenv is very straightforward:

# Add a package
pipenv install django "celery<5"
# Remove a package
pipenv uninstall celery
# Activate the virtualenv
pipenv shell
# Or run a script named foobar.py using the virtualenv
pipenv run python foobar.py

Poetry

Like Pipenv's author, the author of Poetry, Sébastien Eustace, had already built some well-known Python packages such as Pendulum. He built Poetry because he wanted a single tool to manage his projects from start to finish. One big difference between Pipenv and Poetry is that Poetry can also help you package your Python projects and publish them to PyPI.

Usage

Like Pipenv, Poetry has to be installed as a CLI tool (official guide). Most Poetry usage is similar to Pipenv, but Poetry needs you to set up your project first:

# Create a project called foobar
# It'll create a foobar directory with some basic files
poetry new foobar
# Or you can also create a project in an existing directory
poetry init

After the project is ready, you can:

# Add a package
poetry add django "celery<5"
# Remove a package
poetry remove celery
# Activate the virtualenv
poetry shell
# Or run a script named foobar.py using the virtualenv
poetry run python foobar.py

Which should I use? Pipenv or Poetry?

One important Poetry feature is that it can also help you publish your Python package, while Pipenv focuses on making the process of building applications easier. You can see Pipenv's explanation here. So you should use Poetry if you need to build a library and want one tool to manage everything.

Other differences come from dependency resolution. For example, the two tools use different dependency specifications. Pipenv follows the version specifiers in PEP 440, while Poetry uses semantic-versioning-style constraints like those in npm and Cargo. They also have different resolver implementations. Therefore, their outputs are not always the same even when the inputs are the same.
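To illustrate the difference in specifications, here is a small sketch that translates a caret constraint (the npm/Cargo-style form that Poetry accepts, such as ^1.2.3) into an equivalent PEP 440 range. The function caret_to_range is hypothetical and only handles plain numeric versions with up to three components. It follows the rule as I understand it from Poetry's documentation: an update is allowed as long as it doesn't change the leftmost non-zero component.

```python
def caret_to_range(constraint):
    """Translate a caret constraint such as '^1.2.3' into a PEP 440 range."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(piece) for piece in constraint[1:].split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 version components: {constraint!r}")

    # Component to bump: the leftmost non-zero one, or the last one given
    # when every component is zero (for example '^0.0').
    bump = next((i for i, part in enumerate(parts) if part != 0), len(parts) - 1)

    lower = parts + [0] * (3 - len(parts))
    upper = lower[:bump] + [lower[bump] + 1] + [0] * (2 - bump)

    def fmt(version):
        return ".".join(str(part) for part in version)

    return f">={fmt(lower)},<{fmt(upper)}"


if __name__ == "__main__":
    for example in ("^1.2.3", "^0.2.3", "^0.0.3", "^1.2"):
        print(example, "->", caret_to_range(example))
```

So a Poetry constraint of ^1.2.3 corresponds to >=1.2.3,<2.0.0 in PEP 440 terms, which is what you would have to write out by hand in a Pipfile.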

conda-lock

If you work on data science or machine learning projects, you might wonder whether it is possible to produce lock files for conda environments. conda-lock might be the tool you need. Its usage is quite simple: you provide an environment.yml and it generates lock files for multiple platforms. For example:

# Generate the lock files for macOS and Linux
conda-lock -f environment.yml -p osx-64 -p linux-64
# We can use the lock file to create a new conda environment 
conda-lock install -p foobar conda-linux-64.lock  # default file name
# Or use conda command
conda create -n foobar --file conda-linux-64.lock

It also supports some other file formats, such as meta.yaml (conda-build) and the specifications used in flit and Poetry. Please note that conda-lock focuses on only one simple feature, locking the environment, and it lacks some useful features such as maintaining existing environments. You will need to reinstall or recreate the environment every time you add or remove dependencies. Another important missing feature is that conda-lock doesn't support packages from PyPI (issue). You will need to run pip install to add PyPI packages to your conda environment after the conda environment is ready, but this might break your environment.

Summary

Python doesn't have an official package management tool yet, so new tools keep appearing to solve existing problems. When you need to pick a tool for a long-term project, please read the manual carefully before you decide, because there might be some unexpected behaviors. My choices for package management tools in 2021 are:

conda-lock if I need conda
pip-tools if the project is small
Poetry otherwise

The toolchain is evolving all the time, and I will probably introduce other new tools in the future. Stay tuned, and feel free to leave any feedback!
