Regular expression catastrophic backtracking in Git URL parsing #1902

Closed
3 tasks done
mschwager opened this issue Jan 16, 2020 · 3 comments · Fixed by #1913
Labels: kind/bug (Something isn't working as expected)

Comments

mschwager commented Jan 16, 2020

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • OS version and name: Linux 4.15.0-74-generic x86_64 GNU/Linux
  • Poetry version: Github master
  • Link of a Gist with the contents of your pyproject.toml file: N/A

Issue

Hi there! I've been working on a new Python static analysis tool called Dlint. Most recently I've been working on a rule that searches for regular expression denial-of-service: DUO138. When running this rule against your codebase I found a few violations:

$ python -m flake8 --select=DUO138 poetry
poetry/vcs/git.py:22:5: DUO138 catastrophic "re" usage - denial-of-service possible
poetry/vcs/git.py:33:5: DUO138 catastrophic "re" usage - denial-of-service possible
poetry/version/version.py:14:19: DUO138 catastrophic "re" usage - denial-of-service possible

Note that DUO138 hasn't been released to PyPI yet, so if you want to run the rule yourself you'll have to install from GitHub: python -m pip install https://github.com/dlint-py/dlint/tarball/master.

After further investigation, it appears the violations in poetry/vcs/git.py are true positives, and the violation in poetry/version/version.py is a false positive.

If we dig into the Git parsing violations:

re.compile(
    r"^(git\+)?"
    r"(?P<protocol>https?|git|ssh|rsync|file)://"
    r"(?:(?P<user>.+)@)*"
    r"(?P<resource>[a-z0-9_.-]*)"
    r"(:?P<port>[\d]+)?"
    r"(?P<pathname>[:/]((?P<owner>[\w\-]+)/(?P<projects>([\w\-/]+)/)?)?"
    r"((?P<name>[\w\-.]+?)(\.git|/)?)?)"
    r"([@#](?P<rev>[^@#]+))?"
    r"$"
),
re.compile(
    r"^(?:(?P<user>.+)@)*"
    r"(?P<resource>[a-z0-9_.-]*)[:]*"
    r"(?P<port>[\d]+)?"
    r"(?P<pathname>/?(?P<owner>.+)/(?P<projects>([\w\-/]+)/)?(?P<name>.+).git)"
    r"([@#](?P<rev>[^@#]+))?"
    r"$"
),

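The violations occur due to r"(?:(?P<user>.+)@)*" in both expressions. The outer * repeats a group that contains .+, and . also matches @, so the nested quantifiers overlap in the characters they accept. On a long run of @ characters that ultimately fails to match, the engine tries a rapidly growing number of ways to split the run between iterations. We can confirm the bugs with the following code: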

from poetry.vcs import git
git.ParsedUrl.parse("https://" + "@" * 64 + "!")
...Spins...
from poetry.vcs import git
git.ParsedUrl.parse("@" * 64 + "!")
...Spins...

To fix the issue you should be able to change both violations to r"(?:(?P<user>[^@]+)@)*", which avoids the overlapping character space. Note that Dlint will still flag these lines due to nested quantifiers - these are false positives that are still being ironed out.
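
For anyone who wants to see the effect without hanging their interpreter, here is a minimal sketch. It assumes simplified stand-in patterns that keep only the user and resource parts (they are not the real Poetry regexes), and the helper name time_match is made up for illustration. It times the original and the suggested user group on short payloads:

```python
import re
import time

# Simplified stand-ins for the second pattern in poetry/vcs/git.py. Only the
# user and resource parts are kept, so these are NOT the real Poetry regexes.
VULNERABLE = re.compile(r"^(?:(?P<user>.+)@)*(?P<resource>[a-z0-9_.-]*)$")
FIXED = re.compile(r"^(?:(?P<user>[^@]+)@)*(?P<resource>[a-z0-9_.-]*)$")


def time_match(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Keep n small: the vulnerable pattern's running time grows very quickly,
    # and n = 64 (as in the report above) does not finish in practice.
    for n in (8, 12, 16, 20):
        payload = "@" * n + "!"
        _, slow = time_match(VULNERABLE, payload)
        _, fast = time_match(FIXED, payload)
        print(f"n={n:2d} vulnerable={slow:.6f}s fixed={fast:.6f}s")
```

With [^@]+ the inner quantifier can no longer consume the @ that the group needs as its terminator, so each position in the input can only be split one way and the failing match returns quickly.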

Hope this is helpful, let me know if you have any questions!

@mschwager mschwager added the kind/bug Something isn't working as expected label Jan 16, 2020
@finswimmer (Member)

Hello @mschwager,

Thanks a lot for pointing this out! 👍

Would this be fixed by changing it to r"^(?:(?P<user>\w+)@)*" like it is in the other regexes?

fin swimmer

@mschwager (Author)

Yes, that should work! \w and @ don't overlap so there should be no ReDoS.
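
One trade-off worth keeping in mind: \w is narrower than [^@], so it rejects some user names that the original .+ accepted. The sketch below assumes the same kind of simplified user + resource pattern (not Poetry's real regexes) and a hypothetical helper parse_user, and compares the two candidates on a user name containing a dot:

```python
import re

# Simplified user + resource patterns; assumptions for illustration only,
# not the regexes that ship in poetry/vcs/git.py.
WORD_USER = re.compile(r"^(?:(?P<user>\w+)@)*(?P<resource>[a-z0-9_.-]*)$")
NOT_AT_USER = re.compile(r"^(?:(?P<user>[^@]+)@)*(?P<resource>[a-z0-9_.-]*)$")


def parse_user(pattern, text):
    """Return the captured user name, or None if the text does not match."""
    match = pattern.match(text)
    return match.group("user") if match else None


if __name__ == "__main__":
    for text in ("git@github.com", "first.last@example.com"):
        print(text, parse_user(WORD_USER, text), parse_user(NOT_AT_USER, text))
```

Both variants avoid the overlap with @, so neither should backtrack catastrophically. The difference is only in which user names they accept: \w+ stops at characters such as "." and "-", while [^@]+ allows anything up to the next @.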

@finswimmer finswimmer self-assigned this Jan 17, 2020
sdispater pushed a commit that referenced this issue Jan 22, 2020
…parsing (#1902 (#1913)

* fix (git): match for `\w` instead of `.` for getting user

* change (vcs.git): hold pattern of the regex parts in a dictionary to be consistent over all regexs

* new (vcs.git): test for `parse_url` and some fixes for the regex pattern

* new (vcs.git): test for `parse_url` with string that should fail

* fix (test.vcs.git): make flake8 happy
sdispater added a commit that referenced this issue Feb 21, 2020
* Fix Github actions cache issues (#1908)

* Fix case of `-f` flag

* Make it clearer what options to pass to `--format`

* fix (masonry.api): `get_requires_for_build_wheel` must return additional list of requirements for building a package, not listed in `pyproject.toml` and not dependencies for the package itself (#1875)

fix (tests): adopted tests

* Lazy Keyring initialization for PasswordManager (#1892)

* Fix Github Actions cache issues (#1928)

* Avoid nested quantifiers with overlapping character space on git url parsing (#1902 (#1913)

* fix (git): match for `\w` instead of `.` for getting user

* change (vcs.git): hold pattern of the regex parts in a dictionary to be consistent over all regexs

* new (vcs.git): test for `parse_url` and some fixes for the regex pattern

* new (vcs.git): test for `parse_url` with string that should fail

* fix (test.vcs.git): make flake8 happy

* fix: correct parsing of wheel version with regex. (#1932)

The previous regexp was only taking the first integer of the version number,
this presented problems when the major version number reached double digits.

Poetry would determine that the version of the dependency is '1', rather than,
ie: '14'. This caused failures to solve versions.

* Fix errors when using the --help option (#1910)

* Fix how repository credentials are retrieved from env vars (#1909)

# Conflicts:
#	poetry/utils/password_manager.py

* Fix downloading packages from Simplepypi (#1851)

* fix downloading packages from simplepypi

* unused code removed

* remove unused imports

* Upgrade dependencies for the 1.0.3 release (#1965)

* Bump version to 1.0.3 (#1966)

* Fix non-compliant Git URL matching

RFC 3986 § 2.3 permits more characters in a URL than were matched. This
corrects that, though there may be other deficiencies. This was a
regression from v1.0.2, where at least “.” was matched without error.

* Update README.md "Updating Poetry"

Currently the note in "Updating Poetry" is different from the one below in "Enable tab completion for Bash, Fish, or Zsh". This MR is to make them more consistent.

* init: change dev dependency prompt

* Fix CI issues (#2069)

Co-authored-by: brandonaut <brandon@hubermx.com>
Co-authored-by: finswimmer <finswimmer77@gmail.com>
Co-authored-by: Yannick PÉROUX <yannick.peroux@gmail.com>
Co-authored-by: Edward George <edwardgeorge@gmail.com>
Co-authored-by: Jan Škoda <skoda@jskoda.cz>
Co-authored-by: Andrew Marshall <andrew@johnandrewmarshall.com>
Co-authored-by: Andrew Selzer <andrewfselzer@gmail.com>
Co-authored-by: Andrii Maletskyi <andrii.maletskyi@gmail.com>
sdispater added a commit that referenced this issue Mar 20, 2020
* export: fix exporting extras sub-dependencies (#1294)

* Support POETRY_HOME for install (#794)

Allow the `POETRY_HOME` environment variable to be passed during installation to change the default installation directory of `~/.poetry`:

```
POETRY_HOME=/etc/poetry python get-poetry.py
```

* check if relative filename is in excluded file list (#1459)

* check if relative filename is in excluded file list

* removed find_excluded_files() method from wheel.py

* added test for excluding files in wheels

* creating an own test data folder, for testing excluding files by pyproject.toml

* use as_posix() to respect windows file path delimiters

* Exclude nested items (#784) (#1464)

* This PR implements the feature request #784.

When a folder is explicitly defined in `pyproject.toml` as excluded, all nested data, including subfolders, is excluded. It is no longer necessary to use the glob `folder/**/*`.

* use `Path` instead of `os.path.join` to create string for globbing

* try to fix linting error

* create glob pattern string by concatenating and not using Path

* use `os.path.isdir()` to check whether an explicitly excluded name is a folder, because pathlib's `is_dir()` raises an exception on Windows if the name contains globbing characters

* Remove nested data when wildcards were used.

Steps to do this are:
1. expand any wildcard used
2. if expanded path is a folder append  **/* and expand again

* fix linting

* only glob a second time if path is dir

* implement @sdispater 's suggestion for better readability

* fix glob for windows?

* On Windows, testing if a path with a glob is a directory will raise an OSError

* pathlib's glob function doesn't return the correct case (https://bugs.python.org/issue26655), so switch back to glob.glob()

* removing obsolete imports

* Update dependencies

* Deprecate allows-prereleases in favor of allow-prereleases for consistency

* Fix tests for Python 2.7

* Fix linting

* Fix linting

* Fix linting

* Fix typing import

* Correct a couple typos in get-poetry.py (#573)

* Docs: `self:update` changed to `self update` (#1588)

* Fix GitHub actions cache issues on develop (#1918)

* Fix Github actions cache issues

* Fix Github Actions cache issues (#1928)

* Add --source option to "poetry add" (#1912)

* Add --source option to 'poetry add'

* Add tests for 'poetry add --source'

* Merge master into develop (#2070)

* pre-commit: replace isort mirror with isort upstream (#2118)

The isort pre-commit mirror has been deprecated. This change updates
configuration to use the upstream package repository instead of the
mirror.

* Add cache list command (#1187)

* Add poetry.locations.REPOSITORY_CACHE_DIR

The repository cache directory is used in multiple places in the
codebase. This change ensures that the value is reused.

* Add cache list command

This introduces a new cache sub-command that lists all available
caches.

Relates-to: #1162

Co-authored-by: Tom Milligan <tommilligan@users.noreply.github.com>
Co-authored-by: David Cramer <dcramer@users.noreply.github.com>
Co-authored-by: finswimmer <finswimmer77@gmail.com>
Co-authored-by: Kyle Altendorf <sda@fstab.net>
Co-authored-by: Justin Mayer <entroP@gmail.com>
Co-authored-by: Yannick PÉROUX <yannick.peroux@gmail.com>
Co-authored-by: brandonaut <brandon@hubermx.com>
Co-authored-by: Edward George <edwardgeorge@gmail.com>
Co-authored-by: Jan Škoda <skoda@jskoda.cz>
Co-authored-by: Andrew Marshall <andrew@johnandrewmarshall.com>
Co-authored-by: Andrew Selzer <andrewfselzer@gmail.com>
Co-authored-by: Andrii Maletskyi <andrii.maletskyi@gmail.com>
Co-authored-by: Arun Babu Neelicattu <arun.neelicattu@gmail.com>
sdispater added a commit that referenced this issue Mar 20, 2020
* fix (setup_reader): check if `func.value` has attr `id` (#2041)

* fix(git): get commit sha of git commit from annotated tags (#1948)

* fix(git): have annotated tags resolve to the commit sha

* fix(git): fix quote

* fix(git): change to rev-parse

* fix: use correct badge on README (#2065)

* Fix #1791: Load repository URL from config (#2061)

* Fix #1791: Load repository URL from config

* Ran black to fix linting errors

* Add test for repo URL env variable

* Changed schema to support url in multi dependencies (#2035)

* Fix handling of forward slashes and url encoding in credentials (#1911)

* Add support for forward slashes and url encoding in credentials

* Remove extra newline

* Remove unquote

* Bump actions/checkout from v1 to v2 (#2075)

* Update release.yml

* Update main.yml

* Fix vendor package as installed package (#1883) (#1981)

* Fix vendor package as installed package (#1883)

* import from

Co-Authored-By: Sébastien Eustace <sebastien.eustace@gmail.com>

* test vendor package as installed

* refactor

* remove blank line

Co-authored-by: Sébastien Eustace <sebastien.eustace@gmail.com>

* fix(utils.env): import cli_run from virtualenv (#2096)

* fix(utils.env): import cli_run from virtualenv if create_environment import fails

* fix (utils.env): added accidentally removed code

* list .venv when it exists (#1762)

* list .venv when it exists

* only list when in-project is true

* missing config

* move logic to manager.list

* Add .venv when it exists

* fix: exclude subpackage from `setup.py` if `__init__.py` is excluded (#1009) (#1626)

* fix: exclude subpackage from `setup.py` if `__init__.py` is excluded

Fixes: #1009

* fix: added missing test data

* fix: lint test data

* change (sdist.git): exclude folders with no python file

* fix (sdist.git): make black happy

* get_vcs starts searching git folder from tmp dir instead of project (#1946) (#1947)

* fix (builder): take `self._original_path` if available to find `.git` folder

* change (vcs): use `git rev-parse --show-toplevel` to find git root folder

* fix (vcs): change back to original working dir after finding vcs

* change (builder): introduce self._original_path to keep original path
if(vcs): resolve directory for `get_vcs`

* Normalize author name unicode before matching (#2006)

* Fix accented characters not being matched in author name

Fixes #2004

* Normalized the strings instead of modifying the pattern

* Applied isort & black

* Fix the url used for installation when falling back on PyPI (#2099)

* Upgrade dependencies before the 1.0.4 release (#2100)

* Upgrade dependencies before the 1.0.4 release (#2103)

* Release 1.0.4 (#2101)

* Update release script

* Bump version to 1.0.4

* Fix release script (#2104)

* Fix VCS when git is not in PATH

* Upgrade dependencies before the 1.0.5 release (#2111)

* Bump version to 1.0.5 (#2112)

* Fix GitHub URL for black

Black is now officially supported by the Python Software Foundation

* Update Contributing.md

* Fix markdown formatting

* Update link to official website FAQ

* Update managing-environments.md

Co-authored-by: brandonaut <brandon@hubermx.com>
Co-authored-by: finswimmer <finswimmer77@gmail.com>
Co-authored-by: Yannick PÉROUX <yannick.peroux@gmail.com>
Co-authored-by: Edward George <edwardgeorge@gmail.com>
Co-authored-by: Jan Škoda <skoda@jskoda.cz>
Co-authored-by: Andrew Marshall <andrew@johnandrewmarshall.com>
Co-authored-by: Andrew Selzer <andrewfselzer@gmail.com>
Co-authored-by: Andriy Maletsky <andriy.maletsky@gmail.com>
Co-authored-by: Julien Lhermitte <705366+jrmlhermitte@users.noreply.github.com>
Co-authored-by: Michael Aquilina <michaelaquilina@gmail.com>
Co-authored-by: Joshua Cannon <joshdcannon@gmail.com>
Co-authored-by: László Velinszky <laszlo.velinszky@meltwater.com>
Co-authored-by: Lu Zhu <misterzhu@gmail.com>
Co-authored-by: BSKY <git@bsky.moe>
Co-authored-by: Trim21 <github@trim21.me>
Co-authored-by: Frost Ming <frostming@tencent.com>
Co-authored-by: Raphael Yancey <raphael@badfile.net>
Co-authored-by: adisbladis <adisbladis@gmail.com>
Co-authored-by: Dimitri Merejkowsky <dimitri.merejkowsky@tanker.io>
Co-authored-by: Jules Chéron <jules.cheron@gmail.com>
Co-authored-by: Alex Povel <48824213+alexpovel@users.noreply.github.com>

github-actions bot commented Mar 3, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 3, 2024