Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC3986 compatible URL.join honoring empty segments #1039

Merged
merged 36 commits into from
Aug 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
778b5ac
rfc3986 - URL.join
commonism Jun 15, 2024
aa570ff
URL.join - align tests with empty parameters
commonism Jul 18, 2024
eeacc09
tests - URL.join with empty segments
commonism Jul 19, 2024
efea232
CHANGES
commonism Jul 19, 2024
14a0b18
CHANGES - links
commonism Jul 19, 2024
0fcb819
CHANGES - intersphinx ref
commonism Jul 19, 2024
d9acbe2
CHANGES - intersphinx
commonism Jul 19, 2024
b33a959
CHANGES - intersphinx
commonism Jul 19, 2024
66b7f7f
CHANGES - no intersphinx
commonism Jul 19, 2024
d379d8f
CHANGES - intersphinx
commonism Jul 19, 2024
4979bd6
URL.join - uncrustify joining other.path
commonism Jul 20, 2024
d4b689a
tests - adding cpython urljoin cases not covered yet
commonism Jul 22, 2024
68f2e55
import at the top
commonism Jul 23, 2024
f4c52dd
query parameters - the other way around
commonism Jul 24, 2024
c1b8a54
query - using query_string instead …
commonism Jul 24, 2024
e650ccf
CHANGES - no changes to parameters
commonism Jul 24, 2024
f9d24eb
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
40bf571
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
6275035
Update CHANGES/1039.bugfix.rst
commonism Jul 25, 2024
6c0966f
tests - …
commonism Jul 25, 2024
537fe57
fix URL.join for URL without authority
commonism Aug 8, 2024
bb7881b
Merge branch 'master' into rfc3986-urljoin
commonism Aug 8, 2024
c98cd58
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 14, 2024
10cf27d
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 14, 2024
e8c5829
Merge branch 'master' into rfc3986-urljoin
commonism Aug 14, 2024
e136d40
codecov was right with this one
commonism Aug 14, 2024
cb0d678
codecov was right with this one … as well
commonism Aug 14, 2024
1b1c078
Update yarl/_url.py
commonism Aug 15, 2024
46daaa0
Update yarl/_url.py
commonism Aug 15, 2024
4729518
lost change
commonism Aug 15, 2024
a93abd8
as suggested
commonism Aug 15, 2024
531ff90
pycon test
commonism Aug 15, 2024
46bc760
pycon - expected result
commonism Aug 16, 2024
5ef4396
Update tests/test_url.py
bdraco Aug 30, 2024
defbc90
Merge branch 'master' into rfc3986-urljoin
bdraco Aug 30, 2024
f94589c
lint
bdraco Aug 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions CHANGES/1039.bugfix.rst
bdraco marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
:meth:`URL.join() <yarl.URL.join>` has been changed to match
:rfc:`3986` and align with
:meth:`/ operation <yarl.URL.__truediv__>` and :meth:`URL.joinpath() <yarl.URL.joinpath>`
when joining URLs with empty segments.
Previously :py:func:`urllib.parse.urljoin` was used,
which has known issues with empty segments
(`python/cpython#84774 <https://github.com/python/cpython/issues/84774>`_).

Due to the semantics of :meth:`URL.join() <yarl.URL.join>`, joining an
URL with scheme requires making it relative, prefixing with ``./``.

.. code-block:: pycon

>>> URL("https://web.archive.org/web/").join(URL("./https://github.com/aio-libs/yarl"))
URL('https://web.archive.org/web/https://github.com/aio-libs/yarl')


Empty segments are honored in the base as well as the joined part.

.. code-block:: pycon

>>> URL("https://web.archive.org/web/https://").join(URL("github.com/aio-libs/yarl"))
URL('https://web.archive.org/web/https://github.com/aio-libs/yarl')



-- by :user:`commonism`
63 changes: 63 additions & 0 deletions tests/test_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -1664,6 +1664,69 @@
assert base.join(url) == expected


EMPTY_SEGMENTS = [
(
"https://web.archive.org/web/",
"./https://github.com/aio-libs/yarl",
"https://web.archive.org/web/https://github.com/aio-libs/yarl",
),
(
"https://web.archive.org/web/https://github.com/",
"aio-libs/yarl",
"https://web.archive.org/web/https://github.com/aio-libs/yarl",
),
]


@pytest.mark.parametrize("base,url,expected", EMPTY_SEGMENTS)
def test_join_empty_segments(base, url, expected):
base = URL(base)
url = URL(url)
expected = URL(expected)
joined = base.join(url)
assert joined == expected

Check warning on line 1687 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1682-L1687

Added lines #L1682 - L1687 were not covered by tests


SIMPLE_BASE = "http://a/b/c/d"
URLLIB_URLJOIN = [
("", "http://a/b/c/g?y/./x", "http://a/b/c/g?y/./x"),
("", "http://a/./g", "http://a/./g"),
("svn://pathtorepo/dir1", "dir2", "svn://pathtorepo/dir2"),
("svn+ssh://pathtorepo/dir1", "dir2", "svn+ssh://pathtorepo/dir2"),
("ws://a/b", "g", "ws://a/g"),
("wss://a/b", "g", "wss://a/g"),
# test for issue22118 duplicate slashes
(SIMPLE_BASE + "/", "foo", SIMPLE_BASE + "/foo"),
# Non-RFC-defined tests, covering variations of base and trailing
# slashes
("http://a/b/c/d/e/", "../../f/g/", "http://a/b/c/f/g/"),
("http://a/b/c/d/e", "../../f/g/", "http://a/b/f/g/"),
("http://a/b/c/d/e/", "/../../f/g/", "http://a/f/g/"),
("http://a/b/c/d/e", "/../../f/g/", "http://a/f/g/"),
("http://a/b/c/d/e/", "../../f/g", "http://a/b/c/f/g"),
("http://a/b/", "../../f/g/", "http://a/f/g/"),
("a", "b", "b"),
]


@pytest.mark.parametrize("base,url,expected", URLLIB_URLJOIN)
def test_join_cpython_urljoin(base, url, expected):

Check warning on line 1713 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1713

Added line #L1713 was not covered by tests
# tests from cpython urljoin
base = URL(base)
url = URL(url)
expected = URL(expected)
joined = base.join(url)
assert joined == expected

Check warning on line 1719 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1715-L1719

Added lines #L1715 - L1719 were not covered by tests


def test_join_cpython_urljoin_fail():
with pytest.raises(
TypeError, match=r"unsupported operand type\(s\) for \+: 'NoneType' and 'str'"

Check warning on line 1724 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1722-L1724

Added lines #L1722 - L1724 were not covered by tests
):
URL("http:///").join(URL(".."))
pytest.xfail("Shouldn't raise TypeError on empty host name")

Check warning on line 1727 in tests/test_url.py

View check run for this annotation

Codecov / codecov/patch

tests/test_url.py#L1726-L1727

Added lines #L1726 - L1727 were not covered by tests


def test_split_result_non_decoded():
with pytest.raises(ValueError):
URL(SplitResult("http", "example.com", "path", "qs", "frag"))
Expand Down
68 changes: 58 additions & 10 deletions yarl/_url.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,9 @@
from contextlib import suppress
from ipaddress import ip_address
from typing import Union
from urllib.parse import SplitResult, parse_qsl, quote, urljoin, urlsplit, urlunsplit
from urllib.parse import SplitResult, parse_qsl, quote, urlsplit, urlunsplit
from urllib.parse import uses_netloc as uses_authority
from urllib.parse import uses_relative

import idna
from multidict import MultiDict, MultiDictProxy
Expand Down Expand Up @@ -259,12 +261,14 @@
netloc = authority
else:
tmp = SplitResult("", authority, "", "", "")
port = None if tmp.port == cls._default_port(scheme) else tmp.port
netloc = cls._make_netloc(
tmp.username, tmp.password, tmp.hostname, tmp.port, encode=True
tmp.username, tmp.password, tmp.hostname, port, encode=True

Check warning on line 266 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L266

Added line #L266 was not covered by tests
)
elif not user and not password and not host and not port:
netloc = ""
else:
port = None if port == cls._default_port(scheme) else port
netloc = cls._make_netloc(
user, password, host, port, encode=not encoded, encode_host=not encoded
)
Expand Down Expand Up @@ -455,12 +459,15 @@
def _get_default_port(self) -> Union[int, None]:
if not self.scheme:
return None
return self._default_port(self.scheme)

@staticmethod
def _default_port(scheme: str) -> Union[int, None]:
with suppress(KeyError):
return DEFAULT_PORTS[self.scheme]
return DEFAULT_PORTS[scheme]

with suppress(OSError):
return socket.getservbyname(self.scheme)
return socket.getservbyname(scheme)

return None

Expand Down Expand Up @@ -762,11 +769,7 @@
f"Appending path {path!r} starting from slash is forbidden"
)
path = path if encoded else self._PATH_QUOTER(path)
segments = [
segment for segment in reversed(path.split("/")) if segment != "."
]
if not segments:
continue
segments = list(reversed(path.split("/")))
# remove trailing empty segment for all but the last path
segment_slice_start = int(not last and segments[0] == "")
parsed += segments[segment_slice_start:]
Expand Down Expand Up @@ -1153,7 +1156,52 @@
# See docs for urllib.parse.urljoin
if not isinstance(url, URL):
raise TypeError("url should be URL")
return URL(urljoin(str(self), str(url)), encoded=True)
other: URL = url
scheme = other.scheme or self.scheme
parts = {
k: getattr(self, k) or ""
for k in ("authority", "path", "query_string", "fragment")

Check warning on line 1163 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1162-L1163

Added lines #L1162 - L1163 were not covered by tests
}
parts["scheme"] = scheme

if scheme != self.scheme or scheme not in uses_relative:
return URL(str(other))

# scheme is in uses_authority as uses_authority is a superset of uses_relative
commonism marked this conversation as resolved.
Show resolved Hide resolved
if scheme in uses_authority and other.authority:
parts.update(
{
k: getattr(other, k) or ""
for k in ("authority", "path", "query_string", "fragment")

Check warning on line 1175 in yarl/_url.py

View check run for this annotation

Codecov / codecov/patch

yarl/_url.py#L1174-L1175

Added lines #L1174 - L1175 were not covered by tests
}
)
return URL.build(**parts)

if other.path or other.fragment:
parts["fragment"] = other.fragment or ""
if other.path or other.query:
parts["query_string"] = other.query_string or ""

if not other.path:
return URL.build(**parts)

if other.path[0] == "/":
parts["path"] = other.path
return URL.build(**parts)

if self.path[-1] == "/":
# using an intermediate to avoid URL.joinpath dropping query & fragment
parts["path"] = URL(self.path).joinpath(other.path).path
else:
# …
# and relativizing ".."
path = URL("/".join([*self.parts[1:-1], ""])).joinpath(other.path).path
if parts["authority"]:
parts["path"] = "/" + path
else:
parts["path"] = path

return URL.build(**parts)

def joinpath(self, *other, encoded=False):
"""Return a new URL with the elements in other appended to the path."""
Expand Down
Loading