//www.google.com cannot find such type of links #94

Closed
akshayanandraut opened this issue Aug 22, 2021 · 3 comments

Comments

@akshayanandraut

No description provided.

@akshayanandraut
Author

I am trying to fetch all URLs from an HTML response, and some script src attributes have values like <script src="//www.google.com/somejsfile.js">.

Links of this type are not identified.

In HTML this scheme-relative (protocol-relative) format is quite common and is accepted by browsers.
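To illustrate why such links are valid, here is a minimal sketch using only Python's standard library (not urlextract itself). It shows that a link starting with '//' parses with an empty scheme but a proper host, and that it resolves against the scheme of the page it came from. The base URL https://example.com/page.html is an assumption made up for the example.

```python
from urllib.parse import urljoin, urlsplit

# The link from the report: no scheme, but a host and a path.
src = "//www.google.com/somejsfile.js"

parts = urlsplit(src)
print(repr(parts.scheme))  # empty string: the scheme is inherited from the page
print(parts.netloc)        # the host part of the link
print(parts.path)          # the path part of the link

# Hypothetical URL of the page the HTML was fetched from.
base = "https://example.com/page.html"

# urljoin fills in the missing scheme from the base URL.
resolved = urljoin(base, src)
print(resolved)
```

The same link found on a page served over http would resolve to an http URL instead, which is the point of the format.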

I would rather not pre-process the entire response to rewrite these links into a format that the urlextract lib recognizes. That would mean parsing the whole response once to correct the link formats and then calling the find method, which scans it again. The extra pass would hurt the performance of what I am doing.
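One possible way to avoid the extra pass, when the input is known to be HTML, is to collect attribute values while parsing and resolve them in the same step. The following is only a sketch with the standard library, not how urlextract works or how the issue was fixed. The class name SrcCollector, the choice of the src and href attributes, and the base URL https://example.com/ are all assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SrcCollector(HTMLParser):
    """Hypothetical helper: gathers src/href values in one parsing pass."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # assumed URL of the fetched page
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("src", "href") and value:
                # urljoin resolves '//host/path' against the base scheme.
                self.urls.append(urljoin(self.base_url, value))


html = '<script src="//www.google.com/somejsfile.js"></script>'
collector = SrcCollector("https://example.com/")
collector.feed(html)
urls = collector.urls
print(urls)
```

This only sees URLs that appear as attribute values, so links embedded in plain text or inside script bodies would still need a text-based extractor.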

Let's work on this, since many people on Stack Overflow face similar issues when a link starts with a character before '//'.

lipoja added a commit that referenced this issue Oct 4, 2021
@lipoja
Owner

lipoja commented Oct 4, 2021

@akshayanandraut Thank you for reporting this issue.
It should be fixed in the next release (probably this week).

@lipoja lipoja closed this as completed Oct 5, 2021
@akshayanandraut
Author

akshayanandraut commented Oct 8, 2021 via email
