Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pip overwrites existing files unconditionally during installation #4625

Open
davidedelvento opened this issue Jul 20, 2017 · 36 comments
Open
Labels
resolution: deferred till PR Further discussion will happen when a PR is made type: bug A confirmed bug or unintended behavior UX User experience related

Comments

@davidedelvento
Copy link

  • Pip version: any (tested on 1.5.4 and 9.0.1)

  • Python version: any (tested on 2.7.6 and 3.4.3)

  • Operating system: Should be OS independent (tested on Ubuntu 14.04.5 and Mint 17.3)

Description:

I'm installing packages with with pip, and it happens that some of these packages (with different names) have files with name clashes. Unfortunately, pip silently ignores the issue and the results depend on the order in which the installs happens. In an ideal world, I would expect this to be impossible, i.e. to force the package maintainer to use a unique namespace. As a (far worse, but better than current behavior) second choice, I would expect pip to warn the user before silently overwriting existing files, especially if pip itself installed those files.

What I've run:

Create a clean slate to work on:

/tmp $ virtualenv NAMECLASH
New python executable in NAMECLASH/bin/python
Installing setuptools, pip...done.
/tmp $ cd NAMECLASH
/tmp/NAMECLASH $ source bin/activate
(NAMECLASH)/tmp/NAMECLASH $ pip show -f pyjwt
(NAMECLASH)/tmp/NAMECLASH $ pip show -f jwt
(NAMECLASH)/tmp/NAMECLASH $ 

Install packages:

(NAMECLASH)/tmp/NAMECLASH $ pip install jwt
Downloading/unpacking jwt
  Downloading jwt-0.5.2.tar.gz
  Running setup.py (path:/tmp/NAMECLASH/build/jwt/setup.py) egg_info for package jwt
    
Downloading/unpacking cryptography==1.7.2 (from jwt)
  Downloading cryptography-1.7.2.tar.gz (420kB): 420kB downloaded
  Running setup.py (path:/tmp/NAMECLASH/build/cryptography/setup.py) egg_info for package cryptography
...
Successfully installed jwt cryptography typing idna pyasn1 six setuptools enum34 ipaddress cffi pycparser
Cleaning up...
(NAMECLASH)/tmp/NAMECLASH $ 

Lots of irrelevant stuff removed from log above (for the sake of making this issue easier to read). Analyzing what has been installed:

(NAMECLASH)/tmp/NAMECLASH $ pip show -f jwt
---
Name: jwt
Version: 0.5.2
Location: /tmp/NAMECLASH/lib/python2.7/site-packages
Requires: cryptography, typing
Files:
  ../jwt/jws.py
  ../jwt/jwk.py
  ../jwt/exceptions.py
  ../jwt/jwkset.py
  ../jwt/utils.py
  ../jwt/jwa.py
  ../jwt/jwt.py
  ../jwt/__init__.py
  ../jwt/jws.pyc
  ../jwt/jwk.pyc
  ../jwt/exceptions.pyc
  ../jwt/jwkset.pyc
  ../jwt/utils.pyc
  ../jwt/jwa.pyc
  ../jwt/jwt.pyc
  ../jwt/__init__.pyc
  ./
  PKG-INFO
  requires.txt
  SOURCES.txt
  dependency_links.txt
  top_level.txt

(NAMECLASH)/tmp/NAMECLASH $ grep Invalid /tmp/NAMECLASH/lib/python2.7/site-packages/jwt/exceptions.py
class InvalidKeyTypeError(JWTException):

Now install silently conflicting package, and analyzing what happened:

(NAMECLASH)/tmp/NAMECLASH $ pip install pyjwt
Downloading/unpacking pyjwt
  Downloading PyJWT-1.5.2-py2.py3-none-any.whl
Installing collected packages: pyjwt
Successfully installed pyjwt
Cleaning up...
(NAMECLASH)/tmp/NAMECLASH $ pip show -f pyjwt
---
Name: PyJWT
Version: 1.5.2
Location: /tmp/NAMECLASH/lib/python2.7/site-packages
Requires: 
Files:
Cannot locate installed-files.txt
(NAMECLASH)/tmp/NAMECLASH $ grep Invalid /tmp/NAMECLASH/lib/python2.7/site-packages/jwt/exceptions.py
class InvalidTokenError(Exception):
class DecodeError(InvalidTokenError):
class ExpiredSignatureError(InvalidTokenError):
class InvalidAudienceError(InvalidTokenError):
class InvalidIssuerError(InvalidTokenError):
class InvalidIssuedAtError(InvalidTokenError):
class ImmatureSignatureError(InvalidTokenError):
class InvalidKeyError(Exception):
class InvalidAlgorithmError(InvalidTokenError):
class MissingRequiredClaimError(InvalidTokenError):
InvalidAudience = InvalidAudienceError
InvalidIssuer = InvalidIssuerError

As you can see, pip silently allow the two packages to overwrite each other's jwt/exceptions.py file (and in particular the InvalidKeyTypeError is now gone, creating a whole lot of problem, depending on the order in which pip install ran)

This was referenced Jul 20, 2017
@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Aug 4, 2017
@darr1s
Copy link

darr1s commented Aug 6, 2017

Is there a workaround for now?

@RonnyPfannschmidt
Copy link
Contributor

the name clash pretty much is the fault ofthe package authors ^^ pip cant help bad packagers and pypi is not curated

@davidedelvento
Copy link
Author

@RonnyPfannschmidt So pip is so dumb that it cannot even check if it's overwriting files (better if it checked the directory/egg/package exist)? You are correct that pip can't help with name clashes "in the wild", but in my opinion it must warn the user when a name clash is happening on the user machine.

Especially because pip itself, during automatic dependencies resolving, could install a package that the user did not explicitly request and clash with another one.

@pradyunsg
Copy link
Member

The fact that the names clash is a packaging issue. The fact that pip installs such packages is (arguably) a pip bug.

I agree that pip should be printing a warning; I won't want pip to refuse to overwrite files and fail... Off the top of my head, I imagine this would break namespace package installation (or at least need them to special cased).


Is there a workaround for now?

I imagine the only "fix" to this issue is that the conflicting packages resolve such a conflict.


To me, this issue as a tracker for a warning being added to pip, that's printed when pip overwrites an existing file, when it doesn't expect it to happen.

@RonnyPfannschmidt
Copy link
Contributor

Any case of overwriting a file that is not a namespace declaration is likely a packaging bug

@pradyunsg
Copy link
Member

Any case of overwriting a file that is not a namespace declaration is likely a packaging bug

Agreed.

pip warning about the fact that it's overwriting is nicer than pip silently breaking the installation; even though the flaw is in the package... pip is the one taking the action that breaks the environment, even if it's because of the flawed packaging.

@pradyunsg
Copy link
Member

FTR; The fix for this issue IMO would be that pip starts printing warnings when it overwrites non-namespace declaration files.

I feel that until a PR comes around, it doesn't make sense to discuss the ramifications of this change.

@pradyunsg pradyunsg added the resolution: deferred till PR Further discussion will happen when a PR is made label Aug 6, 2017
@pfmoore
Copy link
Member

pfmoore commented Aug 6, 2017

Agreed. There's a whole load of corner cases / unusual usages that would have to be considered (upgrades, ``--target``` installs, for example) and it's much easier to debate details like that with actual code.

@tlandschoff-scale
Copy link

tlandschoff-scale commented Oct 11, 2017

We just got bitten by this - again. I finally researched for an issue for this as I expected that this is just an oversight and will be fixed anyway, if I report it or not.

Consider using two library packages using ipaddress but because we are still at Python 2 a backport is used. There is py2-ipaddress and ipaddress on PyPI. So one library package uses the first and the second the latter (different google filter bubbles, I guess).

Usually this worked out like this (without using the intermediate packages):

torsten.landschoff@horatio:~$ virtualenv --python python2 pip-nameclash
[...]
Installing setuptools, pip, wheel...done.
torsten.landschoff@horatio:~$ . pip-nameclash/bin/activate
(pip-nameclash) torsten.landschoff@horatio:~$ pip install ipaddress
[...]
Successfully installed ipaddress-1.0.18
(pip-nameclash) torsten.landschoff@horatio:~$ pip install py2-ipaddress
[...]
Successfully installed py2-ipaddress-3.4.1
(pip-nameclash) torsten.landschoff@horatio:~$ python -c "import ipaddress;print(ipaddress.ip_address('127.0.0.1'))"
127.0.0.1

However, with less luck this is the result:

torsten.landschoff@horatio:~$ . pip-nameclash/bin/activate
(pip-nameclash) torsten.landschoff@horatio:~$ pip install -q py2-ipaddress
(pip-nameclash) torsten.landschoff@horatio:~$ pip install -q ipaddress
(pip-nameclash) torsten.landschoff@horatio:~$ python -c "import ipaddress;print(ipaddress.ip_address('127.0.0.1'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/torsten.landschoff/pip-nameclash/lib/python2.7/site-packages/ipaddress.py", line 165, in ip_address
    ' a unicode object?' % address)
ipaddress.AddressValueError: '127.0.0.1' does not appear to be an IPv4 or IPv6 address. Did you pass in a bytes (str in Python 2) instead of a unicode object?

In reality we build using fresh virtualenvs each time and it seems like the installation order is deterministic. So our customers weren't affected. 😌

However we spent quite some time to find out what as going on when running the application on my machine, because somehow I got the dependencies installed in the wrong order.

I could understand that this feature is missing if pip did not track the installed files. So I looked if maybe pip does already record this information:

(pip-nameclash) torsten.landschoff@horatio:~/pip-nameclash$ find -type f -print0|xargs -0 grep ipaddress.py
Binary file ./lib/python2.7/site-packages/pip/_vendor/ipaddress.pyc matches
Binary file ./lib/python2.7/site-packages/ipaddress.pyc matches
./lib/python2.7/site-packages/pip-9.0.1.dist-info/RECORD:pip/_vendor/ipaddress.py,sha256=wimbqcE7rwwETlucn8A_4Qd_-NKXPOBcNxJHarUoXng,80176
./lib/python2.7/site-packages/pip-9.0.1.dist-info/RECORD:pip/_vendor/ipaddress.pyc,,
./lib/python2.7/site-packages/ipaddress-1.0.18.dist-info/RECORD:ipaddress.py,sha256=0U5GvYMuyiZrV5jScIRHYxFWbwLzM44HNiaY8x57qUA,80156
./lib/python2.7/site-packages/ipaddress-1.0.18.dist-info/RECORD:ipaddress.pyc,,
./lib/python2.7/site-packages/py2_ipaddress-3.4.1.dist-info/RECORD:ipaddress.py,sha256=lCp63yE4Z1-ia3T3qEiljD1CcG10UartIx3nYXXoRzE,74200
./lib/python2.7/site-packages/py2_ipaddress-3.4.1.dist-info/RECORD:ipaddress.pyc,,

Surprise, it is recorded - in a file named RECORD :-)

Question is: Would a merge request be accepted that adds a check for conflicting files? I'd be interested in implementing this as I carry a bit of fear that we could run into this in production.

For now I will be trying to come up with a script that checks the installed files vs. the RECORD information.

@pfmoore
Copy link
Member

pfmoore commented Oct 11, 2017

Question is: Would a merge request be accepted that adds a check for conflicting files?

Yes, a PR would be welcomed. As noted earlier in the thread, please make sure to cater for namespace packages (I don't know much about them myself, but a lot of people use them and they need to be supported).

@tlandschoff-scale
Copy link

As noted earlier in the thread, please make sure to cater for namespace packages

Good point. And probably non-trivial. Naive solution would be to ensure that all __init__.py have the same content (or overall allow conflicting files with identical content).

But I bet that will often not be the case.

@RonnyPfannschmidt
Copy link
Contributor

oh, and as a extra fun bit, pip will have to keep track of double installed files
if 2 namespace packages install a __init__ and one is uninstalled, the __init__ shouldnt disappear

@pfmoore
Copy link
Member

pfmoore commented Oct 11, 2017

... and at the moment, pip doesn't actually keep track of what's installed. That's done in the RECORD file which is technically package metadata (and so can't be modified for pip's private purposes). There's also the need to deal with packages that weren't installed by pip (e.g. raw setup.py install was used, or conda).

Sorry @tlandschoff-scale - I don't mean to dump loads of objections on you. It would still be good to have a PR for this, but as you can see, it's not as simple as it first seems. Unfortunately, that's far too common a situation in Python packaging 😞

@davidedelvento
Copy link
Author

@pfmoore I think packages installed in other ways (e.g. python setup.py install) are totally out of scope here. The issue here is: pip happily overwrites packages it installed, that's the bug. Tracking things installed in other ways could be a nice feature to have, but it's not as dramatically wrong as pip overwriting packages it installed without noticing!

@pfmoore
Copy link
Member

pfmoore commented Oct 12, 2017

Understood, but my point is that pip doesn't do any tracking of what it installed. It relies purely on standardised metadata to tell what files are installed and where (specifically the RECORD file). And that's essentially by design - we want package data (such as what is installed) to be introspectable by anyone, using well-defined standard metadata.

@pfmoore
Copy link
Member

pfmoore commented Oct 12, 2017

... anyway, it's not worth debating the details just yet. Let's wait for someone to submit a PR and we can discuss in the context of actual code.

@tlandschoff-scale
Copy link

From what I have found the RECORD file is actually rewritten during installation - but only when a wheel is installed. Which would probably account for 99% of all packages we are installing.

Actually this is done inside the (vendored) distlib package in src/pip/_vendor/distlib/wheel.py. So pip may not do any tracking but some tracking information is (usually) available.

@pradyunsg
Copy link
Member

I agree with @pfmoore here. I think it's best to not discuss the details now and to discuss them in the context of some actual implementation, when a PR is made for this by someone.

@mhsmith
Copy link

mhsmith commented Mar 25, 2018

There's a whole load of corner cases / unusual usages that would have to be considered (upgrades, ``--target``` installs, for example)

In the case of --target, things are even more confusing: it doesn't treat existing files and directories the same way as a normal install, because it only works at the granularity of top-level files and directories (#4230).

Like most of the other issues with --target, this behaviour seems to exist, not because it makes sense, but because it was the quickest solution to a problem someone had in the past. As it says in #1489 (comment):

Multiple installs don't say "this is already there" because it isn't. Upgrade won't work because it's not there to upgrade. Editable would probably be a mess, I have no idea how it would work. "Fragile" is the most polite term I can think of for the behaviour...

@anntzer
Copy link
Contributor

anntzer commented Mar 25, 2018

Just an observer's comment regarding

the name clash pretty much is the fault ofthe package authors ^^ pip cant help bad packagers and pypi is not curated

Arch Linux's packages are split in two parts: the "official" packages (https://www.archlinux.org/packages/) and the user-provided & uncurated packages (the Arch User Repositories, https://aur.archlinux.org/ https://wiki.archlinux.org/index.php/Arch_User_Repository). The package manager, pacman, can install packages from either list, and does refuse, by default, to let one package overwrite files already present in the system. Of course, upgrades are handled fine.

So lack of curation is not really an argument there IMO.

@mhsmith
Copy link

mhsmith commented Apr 13, 2018

The package manager, pacman, [...] does refuse, by default, to let one package overwrite files already present in the system.

Conda has the same policy.

@ncoghlan
Copy link
Member

(Arriving from pypa/packaging.python.org#513)

As noted there, I think more desirable behaviour here would be for pip to:

  1. When installing from a wheel, check whether or not there's an existing local RECORD for the package it is installing
  2. If there is, read it, and emit FutureWarning if it needs to overwrite a file not listed in RECORD (as that's a sign the file actually belongs to a different package)
  3. In a later release (after the assorted edge cases have been worked through), switch that warning to a hard failure of the installation attempt and require a --force option to override it

As a UX enhancement, pip could potentially start caching an SQLite DB somewhere with the reverse lookup from filenames to the corresponding RECORD file, and use that to report which packages are responsible for a file path conflict, but that wouldn't be essential to the base feature of conflict detection.

(Such a DB would also let pip more gracefully handle the shared __init__ case for namespace packages)

@pradyunsg pradyunsg changed the title Serious name clash issue pip overwrites existing files unconditionally during installation Jun 18, 2018
@pradyunsg pradyunsg removed the type: enhancement Improvements to functionality label Jun 18, 2018
@pradyunsg
Copy link
Member

Here's a different idea: The only legitimate usecase we know of, for this, is non-native namespace packages.

In that case, we can reasonably restrict this to only allow overwriting __init__.py files (with the relevant deprecation period) and introduce an error if the file being overwritten isn't an __init__.py file. If we really want, we could go further and restrict the file lengths and also enforce an import of pkgutil or pkg_resources and nothing else via AST parsing.

@uranusjr
Copy link
Member

But this still leaves the problem that when one of the packages gets uninstalled, it would break other packages that share the same __init__.py.

@PythonCHB
Copy link

Coming to this very late (from: https://discuss.python.org/t/python-module-conflict-discussion/23659)

https://discuss.python.org/t/python-module-conflict-discussion/23659

I think it could be kept fairly simple, taking into account that there may be other sources than PyPi for packages, and there may be tools other than pip for installing.

AFAICT, there are two different potential use cases here:

  1. Two different distributions that have nothing to do with each-other use the same package name / same filename. This is fundamentally a problem with uncurated package sources, but at least the user should know something’s up, and have to use a --force flag or some such to continue the install.

  2. Cooperating distributions, such as namespace packages – then maybe pip could know that it’s OK to overwrite certain files -- in that case, it would be part of the distribution being installed saying: "I expect that I might be overwriting this file -- and that's OK". This, of course, would open a door to malevolent behavior, but I don't think that's what we're trying to prevent at this point.

NOTE: namespace packages may not be the only use-case -- one I can think of off the top of my head are optional "accelerated" versions of packages. I might want the accelerated version to overwrite the "regular" version in some way.

@zooba
Copy link
Contributor

zooba commented Feb 9, 2023

Any regular upgrade process is going to remove the old package first. There's literally no reason for a package install to need to overwrite any existing file, and the patch ought to be as simple as banning overwrites when extracting.1

The only exception - and it would be part of error handling the failed overwrite, not an eager test - is to allow overwrites for identical contents. The only somewhat legitimate case right now is multiple cooperating packages trying to form a namespace that want to include the __init__.py, and all of those ought to have identical contents (ideally empty, but likely still using the command popularised during setuptools's per-distro install directories). These ought to be identical, and that's the only reasonable property we can test anyway.

So, in the absence of the --force option:

  • if a file being extracted already exists in the destination location, fail
  • if a failure occurs, check to see if the file is identical. If so, overwrite and warn

We don't need to fix uninstalls here - they're already busted. Most people I know don't ever uninstall stuff anymore, they just blow away the whole environment and start fresh. In any case, there are workarounds for it, and if pip starts warning on overwrites it'll be even more clear that it's because uninstalls won't work properly.

Footnotes

  1. Bearing in mind that the extract/copy process is supposedly going to be completely rearchitected, which is why I stopped my work on optimising it and am not intending to write a patch for this right now.

@pfmoore
Copy link
Member

pfmoore commented Feb 9, 2023

There's --ignore-installed, and while you may disagree with its existence, we have to be aware of it 🙁

And I disagree on uninstalls, as a general concept. If you're simply talking about uninstalls of namespace packages, then fair enough (I don't care much either way) but uninstalls in general are definitely an important thing to have, and "just blow away the environment" isn't a reasonable response to brokenness of uninstalls in general.

@zooba
Copy link
Contributor

zooba commented Feb 9, 2023

There's --ignore-installed, and while you may disagree with its existence, we have to be aware of it 🙁

I wasn't even aware 😄 But I see no reason to not treat that like --force in this context.

uninstalls in general are definitely an important thing to have, and "just blow away the environment" isn't a reasonable response to brokenness of uninstalls in general.

Maybe unreasonable, but it's what people do. That's the only point I was making. I never suggested further breaking or removing uninstall.

All I was really pointing out is that this change doesn't break uninstalls further, so there's no reason to block it on those grounds. (Also worth noting that to properly fix uninstalls, we need to solve the conflict problem first, so this one comes first anyway.)

@pradyunsg
Copy link
Member

pradyunsg commented Oct 27, 2023

From a conversation with @jaimergp at PackagingCon today, the way Conda has a flag for how to handle PathConflict:

We can probably add an option like that.

@pradyunsg
Copy link
Member

I'm gonna do a bit of a rework of the wheel installation logic, once pypa/installer#153 is completed.

@chinaexpert1
Copy link

chinaexpert1 commented Feb 18, 2024

Sorry I'm a bit green but how would one know this has occurred prior to some behavior? Is there something I can run before installation to avoid this? And is the environment ruined if I don't? Thanks

@bezborodow
Copy link

Is this a security risk in that a malicious package can purposefully overwrite a file in another package?

@zooba
Copy link
Contributor

zooba commented Aug 27, 2024

It's a security risk if you don't validate the contents of the package you're intending to install.

Whether it then installs malicious code, runs a malicious build-time script, or overwrites existing files is equivalent risk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
resolution: deferred till PR Further discussion will happen when a PR is made type: bug A confirmed bug or unintended behavior UX User experience related
Projects
None yet
Development

No branches or pull requests