Secure yourself against Python dependency confusion


Dependency confusion is a tricky business in Python land, especially if your organisation maintains a private Python package repository. There are not many options. The obvious one first: if you specify --extra-index-url, pip queries both the extra index and the official index on pypi.org, compares the available versions and installs the »highest« one. Anyone who uploads a package with the same name and a higher version number to pypi.org therefore wins. I did not check what happens when the same version is available on both the private and the public index, but I suspect it is not what you want or expect. So what options are left?
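To make the »highest version wins« behaviour concrete, here is a deliberately simplified Python model of it. It is not pip's actual resolver: it assumes plain dotted-integer versions (no pre-releases or other PEP 440 details), and the package versions and index URLs are hypothetical.

```python
# Simplified model of pip with --index-url plus --extra-index-url:
# candidates from all indexes are pooled and the highest version wins.
# NOT pip's real resolver; versions are plain dotted integers here.


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_candidate(indexes: dict[str, list[str]]) -> tuple[str, str]:
    """Return (index_url, version) of the highest version across all indexes.

    No index has priority over another; only the version number counts.
    """
    pooled = [
        (version, url)
        for url, versions in indexes.items()
        for version in versions
    ]
    version, url = max(pooled, key=lambda item: parse_version(item[0]))
    return url, version


if __name__ == "__main__":
    # Hypothetical package: the private index has 1.4.2, an attacker
    # uploaded 99.0.0 under the same name to the public index.
    indexes = {
        "https://my.index.org/simple": ["1.4.0", "1.4.2"],
        "https://pypi.org/simple": ["99.0.0"],
    }
    print(pick_candidate(indexes))  # prints ('https://pypi.org/simple', '99.0.0')
```

The attacker's upload beats the legitimate internal release purely on its version number; nothing about where it came from is taken into account.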

Use your own repository/index

Specify the index in your pip.conf:

# pip.conf
[global]
index-url = https://my.index.org/...

or use the --index-url parameter (not to be confused with --extra-index-url).
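Because pip.conf is an INI file, a small check in CI can catch a missing index-url or a stray extra-index-url before any install runs. A minimal sketch using Python's configparser, assuming the settings live in the [global] section as above; the lint_pip_conf helper and the example URL are hypothetical, and real setups may spread settings over several config files.

```python
import configparser


def lint_pip_conf(text: str) -> list[str]:
    """Return a list of problems found in the given pip.conf content."""
    # interpolation=None: URLs may contain '%' (e.g. encoded credentials),
    # which configparser's default interpolation would reject.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    problems = []
    index_url = parser.get("global", "index-url", fallback=None)
    if index_url is None:
        problems.append("no index-url in [global]: pip will fall back to pypi.org")
    elif not index_url.startswith("https://"):
        problems.append(f"index-url does not use HTTPS: {index_url}")

    # Any extra index reintroduces the "highest version wins" problem.
    for section in parser.sections():
        if parser.has_option(section, "extra-index-url"):
            problems.append(f"extra-index-url is set in [{section}]")
    return problems


if __name__ == "__main__":
    config = """
[global]
index-url = https://my.index.org/simple
"""
    print(lint_pip_conf(config))  # prints []
```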

This setup gives you the advantage of being fully in control. However, you need to set up and maintain everything yourself. And the most important issue: a single forgotten --index-url parameter or pip.conf entry is enough, and pip will silently fall back to pypi.org.
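One way to guard against the silent fallback is to fail a CI job early when the effective index is not the private one. The sketch below models pip's precedence for the index setting (command-line option over the PIP_INDEX_URL environment variable over the config file, with pypi.org as the built-in default). It is a simplification that ignores multiple config files, and the private index prefix is a hypothetical placeholder.

```python
from typing import Mapping, Optional

PYPI_SIMPLE = "https://pypi.org/simple"


def effective_index_url(
    cli_value: Optional[str],
    env: Mapping[str, str],
    config_value: Optional[str],
) -> tuple[str, str]:
    """Return (url, source), mimicking pip's precedence for the index URL:
    command line > environment variable > config file > built-in default."""
    if cli_value:
        return cli_value, "command line (--index-url)"
    if env.get("PIP_INDEX_URL"):
        return env["PIP_INDEX_URL"], "environment (PIP_INDEX_URL)"
    if config_value:
        return config_value, "pip.conf"
    return PYPI_SIMPLE, "built-in default"


def require_private_index(url: str, source: str, private_prefix: str) -> None:
    """Raise if pip would not talk to the private index."""
    if not url.startswith(private_prefix):
        raise RuntimeError(f"pip would use {url} (from {source}), not {private_prefix}")


if __name__ == "__main__":
    # Nothing configured anywhere: the forgotten-setting scenario.
    url, source = effective_index_url(None, {}, None)
    try:
        require_private_index(url, source, "https://my.index.org/")  # hypothetical
    except RuntimeError as err:
        print("refusing to install:", err)
```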

Use hashes

Somewhat counter-intuitive, but unfortunately reality: you cannot specify hashes in pyproject.toml. With the help of pip-tools, however, you can compile your dependencies into a requirements file in which every package is pinned to an exact version and hash:

pip-compile --generate-hashes --output-file=requirements.txt pyproject.toml

Add this file to your git repo. Install using pip install --require-hashes -r requirements.txt.
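In --require-hashes mode, pip refuses to install any downloaded file whose digest does not match one of the hashes listed for it. A lookalike package from another index differs in content and therefore in digest, so it fails this check. The sketch below reproduces the idea with hashlib; it illustrates the concept rather than pip's implementation, and verify_artifact and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks and return it in the 'sha256:<hex>' form used by --hash."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def verify_artifact(path: Path, allowed: list[str]) -> str:
    """Return the digest if it is in the allowed list, otherwise raise."""
    actual = sha256_of(path)
    if actual not in allowed:
        raise ValueError(f"hash mismatch for {path.name}: got {actual}")
    return actual


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wheel = Path(tmp) / "acme_utils-1.4.2-py3-none-any.whl"  # hypothetical file
        wheel.write_bytes(b"legitimate contents")
        pinned = [sha256_of(wheel)]            # what requirements.txt would record
        print(verify_artifact(wheel, pinned))  # passes

        wheel.write_bytes(b"tampered contents")
        try:
            verify_artifact(wheel, pinned)
        except ValueError as err:
            print("rejected:", err)
```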

Hashes give you a relatively secure setup, as long as you install your dependencies from requirements.txt (and not from pyproject.toml). Of course, you still need to make sure that you resolved against the right indices to begin with. A big disadvantage is the initial setup and maintenance effort: hashes have to be regenerated whenever a dependency changes, which requires time and care.
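The protection only holds if every entry in requirements.txt really is pinned and hashed. A small parser can check that in CI and also shows what pip-compile --generate-hashes produces. A minimal sketch, assuming the usual pip-compile layout (one name==version line continued with backslashes by --hash=sha256:... options); it does not handle extras, environment markers or URL requirements properly, and the helper name is hypothetical.

```python
import re

# Roughly pip's rule: '#' starts a comment at line start or after whitespace.
COMMENT_RE = re.compile(r"(^|\s+)#.*$")


def parse_hashed_requirements(text: str) -> dict[str, tuple[str, list[str]]]:
    """Map package name -> (pinned version, ['sha256:...', ...]).

    Raises ValueError for entries that are not pinned with == or lack hashes.
    """
    entries: dict[str, tuple[str, list[str]]] = {}
    logical = ""
    for raw in text.splitlines():
        line = COMMENT_RE.sub("", raw).rstrip()
        if line.endswith("\\"):  # continuation: keep collecting
            logical += line[:-1] + " "
            continue
        tokens = (logical + line).split()
        logical = ""
        if not tokens or tokens[0].startswith("-"):
            continue  # blank line or a global option such as --index-url
        requirement, options = tokens[0], tokens[1:]
        name, sep, version = requirement.partition("==")
        if not sep:
            raise ValueError(f"{requirement!r} is not pinned with ==")
        hashes = [opt.removeprefix("--hash=") for opt in options if opt.startswith("--hash=")]
        if not hashes:
            raise ValueError(f"{name} has no --hash entries")
        entries[name.lower()] = (version, hashes)
    return entries


if __name__ == "__main__":
    # Shape of pip-compile output; package name and hash values are placeholders.
    sample = """\
acme-utils==1.4.2 \\
    --hash=sha256:1111 \\
    --hash=sha256:2222
    # via app
"""
    print(parse_hashed_requirements(sample))
    # prints {'acme-utils': ('1.4.2', ['sha256:1111', 'sha256:2222'])}
```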

PyPI organisations

In April 2023, PyPI started to support organisation accounts. For organisations and bigger companies in particular, this is a desperately needed feature: namespaces can be reserved on the official PyPI index. With a reserved namespace, you can be sure that you will not install a malicious package by accident. Maintenance is comparatively low: you can safely add your private package index via --extra-index-url, and projects that use your private packages need no further changes.

Still, you probably need to have a certain (organisational) size to request an »organisational account«.
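What makes --extra-index-url safe in this scenario is that every internal package name is already taken on pypi.org by you, so nobody else can upload a higher version under it. A sketch for auditing this, assuming PyPI's JSON API at https://pypi.org/pypi/<name>/json, which answers 404 for names that do not exist; the helper names are hypothetical. A 200 only means the name is taken, not that your organisation owns it, so ownership still needs a separate check.

```python
import re
import urllib.error
import urllib.request
from typing import Callable, Iterable


def normalize(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', lower-cased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_status(name: str) -> int:
    """HTTP status of PyPI's JSON API for a project (200 = exists, 404 = free)."""
    url = f"https://pypi.org/pypi/{normalize(name)}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def unclaimed_on_pypi(
    names: Iterable[str], status: Callable[[str], int] = pypi_status
) -> list[str]:
    """Internal names nobody has registered on PyPI yet, i.e. open to squatting."""
    return sorted(normalize(name) for name in names if status(name) == 404)
```

Calling unclaimed_on_pypi with the list of your internal package names (which requires network access) returns the names an attacker could still register; the status parameter lets you swap in a stub for offline tests.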

Conclusion

These are the options, but in my opinion dependency confusion caused by pip selecting the wrong index is a structural issue of pip (or of the Python ecosystem). The first step is to assess potential threats and their impact (also known as threat modelling) and then decide what to do. You might conclude that you don't need any additional steps, which is fine, as long as it is a conscious decision.

See also