Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Choose an approach for coherent path handling #1067

Open
AndydeCleyre opened this issue Feb 12, 2020 · 7 comments
Open

Choose an approach for coherent path handling #1067

AndydeCleyre opened this issue Feb 12, 2020 · 7 comments
Labels
enhancement Improvements to functionality needs discussion Need some more discussion

Comments

@AndydeCleyre
Copy link
Contributor

AndydeCleyre commented Feb 12, 2020

What's the problem this feature will solve?

pip-tools currently handles paths in general a bit weakly:

  • using string parsing and manipulation, achieving incorrect results depending on the path names, potentially overwriting unrelated user files
  • producing different compilation output depending on the form given for equivalent relative paths (./requirements.in vs requirements.in)
  • naively forwarding relative paths (relative to a reqsin file) into the output file annotations, which may be in a different folder
  • not yet annotating setup.py sources with relative paths if appropriate
  • occasionally issues are raised about transformation between absolute and relative paths for various purposes
  • sometimes flops when trying to write to un-writable cache paths

We need to reliably inspect and transform paths, absolute and relative, for issues present and future.

Describe the solution you'd like

I'd like to build a consensus on a high level approach to more robust and consistent path handling (which modules to use and how it will affect the packaging of pip-tools itself), and then use that approach to fix up and improve path operations.

Here's an example problem in the current code:

$ echo '# real important data' > some.txt
$ mkdir some.folder
$ echo 'requests' > some.folder/reqsin
$ pip-compile --no-header some.folder/reqsin
$ ls some.folder
reqsin
$ cat some.txt
certifi==2019.11.28       # via requests
chardet==3.0.4            # via requests
idna==2.8                 # via requests
requests==2.22.0          # via -r some.folder/reqsin (line 1)
urllib3==1.25.8           # via requests

Similarly, if some.txt content is uncommented, pip-compile fails because it tries to parse that for existing requirements.

See also: #1061 #204 #966 #616 #395 #1084 and discussion at #1058

Alternatives and Considerations

  • We're still supporting python2
  • pathlib (stdlib) + pathlib2 (pypi or vendored)
  • better use of os.path
  • plumbum (which could also supplant click)
@AndydeCleyre
Copy link
Contributor Author

AndydeCleyre commented Feb 12, 2020

Some things I like about plumbum's path objects, relative to pathlib:

  • subclass str
  • always absolute, with a different type RelativePath (glorified list of path parts) which can be generated from a path's .relative_to function
  • because of the above, there's none of this pathlib nonsense: p.parent == p
  • the 'absolute' paths do not auto-resolve symlinks like pathlib does -- AFAIK it's impossible to get pathlib to resolve relative to absolute paths without also resolving symlinks
  • no divergent handling for python2
  • can potentially replace click, meaning that we could keep the dep count equal, while using pathlib2 would increase that number. IMO the plumbum syntax for that is more readable
  • can also optionally replace current usage of subprocess, shutil, TemporaryDirectory, and os very neatly, if desired

@atugushev atugushev added enhancement Improvements to functionality needs discussion Need some more discussion labels Feb 14, 2020
AndydeCleyre added a commit to AndydeCleyre/pip-tools that referenced this issue Feb 19, 2020
…y and VCS reqs; fixes jazzband#881 and jazzband#293

Change Notes
------------

- Annotations now include reqsin/setup.py/stdin info when relevant
- Always send reverse_dependencies to writer._format_requirement as a dict, even if empty
- Invert checks for primary requirements not getting annotations, as they now do
- Bypass annotation noise in tests which don't target annotations anyway, by passing --no-annotate
- ireqs returned by get_best_match now have the parameter ireq's _source_ireqs, if any
- reverse_dependencies are no longer passed around independently outside the
  resolver/cache, but rather determined from an ireq's comes_from and _source_ireqs
- primary_packages are no longer passed around independently, but rather determined by
  an ireq's constraint
- In tests, reverse dependencies for fake ireqs are now specified by setting the ireq's
  comes_from
- These changes do _not_ improve handling of relative paths to reqsin files in annotations (see jazzband#1067)

Co-authored-by: Albert Tugushev <albert@tugushev.ru>
@atugushev
Copy link
Member

I'm in favor of pathlib (stdlib) + pathlib2 (pypi).

@AndydeCleyre
Copy link
Contributor Author

@atugushev alright, can we make that pathlib + os.path and pathlib2 + os.path, to avoid pathlib's forced symlink resolution?

I still don't prefer pathlib here for the above reasons (especially that this will have us using 2 PyPI modules and 5 different stdlib modules where 1 PyPI module could do, and will create extra python2-specific baggage, and pathlib2 will never get any updates/fixes), but am happy to implement it this way, as improving path handling from our current state is important to me.

@atugushev
Copy link
Member

@AndydeCleyre

I still don't prefer pathlib here for the above reasons (especially that this will have us using 2 PyPI modules and 5 different stdlib modules where 1 PyPI module could do, and will create extra python2-specific baggage, and pathlib2 will never get any updates/fixes

Yeah, that's unfortunate, but hopefully, pip (i.e. pip-tools) drops py2 support someday.

BTW, could you elaborate on better use of os.path alternative?

@atugushev

This comment has been minimized.

@AndydeCleyre
Copy link
Contributor Author

BTW, could you elaborate on better use of os.path alternative?

Probably we can get much safer and more reliable behavior than we have without using anything other than os.path, even though more convenient options are now available. Looking at the example problem in this issue description, we might handle it like:

diff --git a/piptools/scripts/compile.py b/piptools/scripts/compile.py
index 0572cd3..670c215 100755
--- a/piptools/scripts/compile.py
+++ b/piptools/scripts/compile.py
@@ -235,7 +235,7 @@ def cli(
             )
         # Otherwise derive the output file from the source file
         else:
-            base_name = src_files[0].rsplit(".", 1)[0]
+            base_name = os.path.splitext(src_files[0])[0]
             file_name = base_name + ".txt"
 
         output_file = click.open_file(file_name, "w+b", atomic=True, lazy=True)

I may be more tempted to go an all-os.path route here if feasible, since pathlib doesn't handle symlinks sufficiently IMO.

@atugushev
Copy link
Member

I may be more tempted to go an all-os.path route here if feasible, since pathlib doesn't handle symlinks sufficiently IMO.

Oh, I see. That's great if os.path would be enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improvements to functionality needs discussion Need some more discussion
Projects
None yet
Development

No branches or pull requests

2 participants