Welcome back to this series. The last post introduced tools for Python runtime management, and this post focuses on tools that help you manage Python packages for your projects.
pip is the package installer for Python, and you can find a lot of tutorials that use pip with requirements.txt to create a Python environment. That works fine if you only use a few packages or don't need to maintain the environment in the long term. But once your project grows bigger and bigger, you'll find it very challenging to manage packages with only pip and requirements.txt.
Main problem: Reproducibility
Some people only include the packages they need in requirements.txt but not all the underlying dependencies. However, open-source packages evolve all the time, so it's hard to guarantee you can create an identical environment next time.
A common solution is to pin versions in requirements.txt or use pip freeze to dump the environment. But then another problem comes up: how do you add or delete a package without breaking the environment? pip doesn't do this for you. Running pip install for a new package might update dependencies used by already-installed packages and break your environment.
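To make this concrete, here is a hypothetical sketch of how installing one package can break another (package-a, package-b, and shared-lib are made-up names; only pip check is a real command that reports broken dependencies):

```shell
# package-a is already installed and requires shared-lib<2
pip install package-b   # package-b requires shared-lib>=2, so pip upgrades it
pip check               # reports that package-a now has an incompatible dependency
```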
So what is the solution? If you have experience with package management in other languages, you might know that most modern package managers use two separate files to handle this:
- A file with all direct dependencies
- A lock file that contains all the dependencies (direct dependencies and all underlying dependencies)
We need to put these two files into the version control system, and then we can reproduce the environment anytime.
In the Python world, you can find many tools to achieve similar goals. In this post, I'll introduce pip-tools, Pipenv, Poetry, and conda-lock.
pip-tools provides two commands:
- pip-compile compiles your direct dependencies into a pinned requirements.txt
- pip-sync updates your virtualenv to match the packages in the compiled requirements.txt
pip-tools needs to work with virtualenv, and similar to pip, it must be installed in each virtual environment. This is different from the other tools introduced in this post.
```shell
# If you are using pyenv
# Create a virtualenv
pyenv virtualenv 3.9.1 pip-tools-test
# Activate
pyenv shell pip-tools-test
# Install!
python -m pip install pip-tools
```
You can also follow their installation instructions.
After installation, we can use the commands inside the virtualenv. By default, pip-compile reads your packages from requirements.in, compiles them, and generates a requirements.txt that stores all your dependencies. Let's say we need Django and Celery in our project, and we don't want to use Celery 5 yet. We can add our needs to requirements.in:
```
# requirements.in
django
celery<5
```
If we execute pip-compile, we'll get a requirements.txt like this:
```
#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile
#
amqp==2.6.1
    # via kombu
asgiref==3.3.1
    # via django
billiard==3.6.4.0
    # via celery
celery==4.4.7
    # via -r requirements.in
django==3.1.7
    # via -r requirements.in
importlib-metadata==3.7.3
    # via kombu
kombu==4.6.11
    # via celery
pytz==2021.1
    # via
    #   celery
    #   django
sqlparse==0.4.1
    # via django
typing-extensions==3.7.4.3
    # via importlib-metadata
vine==1.3.0
    # via
    #   amqp
    #   celery
zipp==3.4.1
    # via importlib-metadata
```
It converts our needs and pins the package versions, and the comments show where each dependency comes from. When we need to add or delete a package, we only need to edit requirements.in and regenerate the requirements.txt. Once the new requirements.txt is ready, we can run pip-sync to update our environment: it installs the new packages and removes the packages that are no longer in requirements.txt.
In addition, pip-tools can also help you upgrade packages and manage development dependencies. pip-tools is simple and compatible with the traditional Python setup, and I think it should be enough for most Python projects.
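For example, upgrades and development dependencies can be handled with a few extra commands (dev-requirements.in is just a conventional file name here, not something pip-tools mandates):

```shell
# Upgrade all pinned packages to the latest versions allowed by requirements.in
pip-compile --upgrade
# Upgrade only a single package
pip-compile --upgrade-package django
# Compile a separate file for development dependencies
pip-compile dev-requirements.in
# Sync the virtualenv against both compiled files at once
pip-sync requirements.txt dev-requirements.txt
```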
Pipenv provides more features than pip-tools. It creates and manages a virtualenv for you, so you don't need to think about naming it or where to put it. Pipenv uses two files to keep track of your environment: Pipfile for your declared dependencies and Pipfile.lock for all the underlying dependencies.
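For reference, a minimal Pipfile for the same Django/Celery example might look like this (Pipenv generates and edits this file for you, so treat it as an illustrative sketch):

```toml
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[packages]
django = "*"
celery = "<5"

[dev-packages]

[requires]
python_version = "3.9"
```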
Here I assume you've installed Pipenv on your machine (official guide). The usage of Pipenv is very straightforward:
```shell
# Add a package
pipenv install django "celery<5"
# Remove a package
pipenv uninstall celery
# Activate the virtualenv
pipenv shell
# Or run a script named foobar.py using the virtualenv
pipenv run python foobar.py
```
Like Pipenv's author, Poetry's author, Sebastien Eustace, had already built some famous Python packages, like Pendulum, before. He built Poetry because he wanted a single tool to manage his projects from start to finish. One big difference between Pipenv and Poetry is that Poetry can also help you package your Python projects and publish them to PyPI.
Like Pipenv, you have to install Poetry as a CLI tool (official guide). Most Poetry usage is similar to Pipenv, but Poetry needs you to set up your project first:
```shell
# Create a project called foobar
# It'll create a foobar directory with some basic files
poetry new foobar
# Or you can create a project in an existing directory
poetry init
```
After the project is ready, you can:
```shell
# Add a package
poetry add django "celery<5"
# Remove a package
poetry remove celery
# Activate the virtualenv
poetry shell
# Or run a script named foobar.py using the virtualenv
poetry run python foobar.py
```
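After poetry new and poetry add, Poetry records everything in pyproject.toml. A sketch of what that file might contain (the version numbers and author are illustrative):

```toml
[tool.poetry]
name = "foobar"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.9"
django = "^3.1.7"
celery = "<5"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```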
Which should I use? Pipenv or Poetry?
One important Poetry feature is that it can also help you publish your Python package, while Pipenv focuses on making the process of building applications easier. You can see Pipenv's explanation here. So you should use Poetry if you need to build a library and want one tool to manage everything.
Other differences come from dependency resolution. For example, they use different dependency specifications: Pipenv follows the version specifiers in PEP 440, but Poetry uses semantic versioning, which is also used in npm and Cargo. They also have different resolver implementations. Therefore, their outputs are not always the same even when the inputs are.
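A small illustration of the difference: the same "any compatible 3.x release" intent is written differently in the two ecosystems (django is just an example package here):

```toml
# Poetry: semver caret requirement
django = "^3.1"   # allows >=3.1.0, <4.0.0

# Pipenv: PEP 440 compatible-release specifier
django = "~=3.1"  # allows >=3.1, <4.0
```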
If you work on data science or machine learning projects, you might wonder if it's possible to produce lock files for conda environments. conda-lock might be the tool you need. Its usage is quite simple: you provide an environment.yml, and it generates lock files for multiple platforms. For example:
```shell
# Generate the lock files for macOS and Linux
conda-lock -f environment.yml -p osx-64 -p linux-64
# We can use the lock file to create a new conda environment
conda-lock install -p foobar conda-linux-64.lock  # default file name
# Or use the conda command
conda create -n foobar --file conda-linux-64.lock
```
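The environment.yml fed into conda-lock is a regular conda environment file; a minimal example might look like this (the name, channel, and packages are illustrative):

```yaml
# environment.yml
name: foobar
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pandas
```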
It also supports some other file formats, like meta.yaml (conda-build) and the specifications used in flit and Poetry. Please note that conda-lock focuses on one simple feature, locking the environment, and it lacks some useful features like maintaining existing environments. You will need to recreate the environment every time you add or remove a dependency. Another important missing feature is that conda-lock doesn't support packages from PyPI (issue). You will need to run pip install to add PyPI packages to your conda environment after the conda environment is ready, but this might break your environment.
Python doesn't have an official package management tool yet, so you can always find someone building a new tool to solve existing problems. When you pick a tool for a long-term project, please read the manual carefully before deciding, because there might be some unexpected behaviors. My choices for a package management tool in 2021 are:
- conda-lock if I need conda
- pip-tools if the project is small
- Poetry otherwise
The toolchain is evolving all the time, and I will probably introduce some other new tools in the future. Stay tuned and feel free to leave feedback!