Fixing filter of mixed case hostnames #158

Merged 1 commit on Feb 26, 2024
Fixing filter of mixed case hostnames
lipoja committed Feb 26, 2024
commit 83c2e2046d483bf39a7df95b52a06554108e34d1
2 changes: 2 additions & 0 deletions tests/unit/test_mixed_case.py
@@ -37,6 +37,8 @@ def test_mixed_case_defaults(urlextract, text, expected):
 @pytest.mark.parametrize(
     "text, expected",
     [
+        ("main_data_site.group.popular_data_desc", []),
+        ("144.2 MB", []),
         ("ample.com", ["ample.com"]),
         ("http://Example.COM", []),
         ("www.example.com", ["www.example.com"]),
5 changes: 3 additions & 2 deletions urlextract/urlextract_core.py
@@ -673,9 +673,10 @@ def _is_domain_valid(

         if not self.allow_mixed_case_hostname:
             # we have to take url_parts.host instead of host variable because url_parts.host is not normalized
-            return all(s.islower() for s in url_parts.host if s.isalpha()) or all(
+            if not (all(s.islower() for s in url_parts.host if s.isalpha()) or all(
                 s.isupper() for s in url_parts.host if s.isalpha()
-            )
+            )):
+                return False

         if self._permit_list and host not in self._permit_list:
             return False
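
The change above can be illustrated with a minimal sketch. It is not the library's code: `has_uniform_case`, `is_domain_valid_before`, and `is_domain_valid_after` are hypothetical stand-ins, and the permit list is the only later check modeled here. The point it tries to show is that returning the case check directly ends validation early, so a uniformly cased host is accepted without reaching the checks that follow.

```python
def has_uniform_case(host: str) -> bool:
    """True when the letters in host are all lowercase or all uppercase."""
    letters = [ch for ch in host if ch.isalpha()]
    return all(ch.islower() for ch in letters) or all(ch.isupper() for ch in letters)


def is_domain_valid_before(host: str, permit_list: set, allow_mixed_case: bool = False) -> bool:
    """Hypothetical pre-fix shape: the case check result is returned directly."""
    if not allow_mixed_case:
        # Early return: the permit-list check below is never reached on this path.
        return has_uniform_case(host)
    if permit_list and host.lower() not in permit_list:
        return False
    return True


def is_domain_valid_after(host: str, permit_list: set, allow_mixed_case: bool = False) -> bool:
    """Hypothetical post-fix shape: only a failed case check ends validation early."""
    if not allow_mixed_case:
        if not has_uniform_case(host):
            return False
    # Hosts that pass the case check still go through the remaining checks.
    if permit_list and host.lower() not in permit_list:
        return False
    return True
```

Under these assumptions, `is_domain_valid_before("example.com", {"allowed.org"})` accepts a host that is not on the permit list, while `is_domain_valid_after` rejects it. Both versions reject `"Example.COM"` when mixed case is disallowed.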