Crashing when checking the external hashes of URLs #787

Closed
stevecheckoway opened this issue Jan 8, 2023 · 0 comments · Fixed by #789
stevecheckoway commented Jan 8, 2023

To check whether a URL's fragment is valid, html-proofer uses Nokogiri to parse the document at that URL. Nokogiri's HTML5 parser has a default maximum tree depth of 400. (This can be changed via the max_tree_depth parsing option.)

Since the content of external documents is outside the control of the user of html-proofer, it makes sense to catch any exceptions that result from parsing those (and probably all) documents with Nokogiri.
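As a rough illustration of the "catch any exceptions" idea, here is a minimal sketch. The helper name parse_external and the injected parser callable are hypothetical and not part of html-proofer. In html-proofer the callable would presumably wrap the Nokogiri::HTML5 call made in create_nokogiri; here it is stubbed so the sketch runs without any gems.

```ruby
# Hypothetical helper (not html-proofer's API): run a parser over untrusted
# markup and turn any parser error into nil instead of crashing the run.
# `parser` is any callable taking a String. In html-proofer it would
# presumably be something like ->(s) { Nokogiri::HTML5(s) } (assumption).
def parse_external(body, parser:)
  parser.call(body)
rescue StandardError => e
  # ArgumentError (the class reported in this issue) is a StandardError.
  warn "Skipping fragment check: #{e.class}: #{e.message}"
  nil
end

# Stand-in parsers for demonstration only.
failing = ->(_body) { raise ArgumentError, "Document tree depth limit exceeded" }
working = ->(body) { body.upcase }

parse_external("<p>hi</p>", parser: failing) # logs a warning, returns nil
parse_external("<p>hi</p>", parser: working) # returns the parser's result
```

A caller that gets nil back could then report the fragment as unverifiable rather than aborting the whole check. Whether to rescue StandardError broadly or only ArgumentError is a design choice left open here.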

I don't have a URL offhand that demonstrates this, as it is one of 900 on my website. However, I did confirm that htmlproofer --check-external-hash fails with the error

/path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `parse': Document tree depth limit exceeded (ArgumentError)
        from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `do_parse'
        from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:43:in `parse'
        from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5.rb:31:in `HTML5'
        from /path/vendor/bundle/ruby/3.1.0/gems/html-proofer-5.0.3/lib/html_proofer/utils.rb:22:in `create_nokogiri'
...

and htmlproofer --no-check-external-hash completes successfully.

Update: I realized I could change the log level to figure out which URL was causing the problem. The culprit was a link to a PDF with a URL fragment (#page=25), similar to #663.
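One possible guard for this case, sketched under assumptions: only attempt the HTML fragment lookup when the response is actually HTML. The helper name fragment_checkable? and the list of media types are hypothetical. This is not necessarily what the eventual fix does, and the "application/pdf" content type below is assumed rather than taken from the server's real response.

```ruby
require "uri"

# Hypothetical guard (not html-proofer's API): a fragment check only makes
# sense when the URL has a fragment and the body is an HTML document.
def fragment_checkable?(url, content_type)
  fragment = URI.parse(url).fragment
  return false if fragment.nil? || fragment.empty?

  # Drop parameters such as "; charset=utf-8" and normalize case.
  media_type = content_type.to_s.split(";").first.to_s.strip.downcase
  ["text/html", "application/xhtml+xml"].include?(media_type)
end

pdf_url = "https://checkoway.net/teaching/cs210/2021-fall/slides/Lec19%20ClocksandTiming.pdf#page=25"

# Assumed content type for the PDF link from this issue.
fragment_checkable?(pdf_url, "application/pdf")
# An ordinary HTML page with an anchor (example.com is a placeholder).
fragment_checkable?("https://example.com/a.html#intro", "text/html; charset=utf-8")
```

With a guard like this, the PDF body would never reach the HTML parser, so the depth limit would not come into play for that link.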

To reproduce, put

<!DOCTYPE html>
<a href='https://checkoway.net/teaching/cs210/2021-fall/slides/Lec19%20ClocksandTiming.pdf#page=25'>X</a>

in a.html and run htmlproofer --log-level :debug a.html

Trimmed output:

Running 3 checks (Images, Links, Scripts) in a.html on *.html files ...


Running Images in a.html
Running Links in a.html
Running Scripts in a.html
Checking 1 external link
ETHON: started MULTI
Received a 200 for https://checkoway.net/teaching/cs210/2021-fall/slides/Lec19%20ClocksandTiming.pdf#page=25
bundler: failed to load command: htmlproofer (/path/vendor/bundle/ruby/3.1.0/bin/htmlproofer)
/path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `parse': Document tree depth limit exceeded (ArgumentError)
        from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `do_parse'
        from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:43:in `parse'
        from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5.rb:31:in `HTML5'
        from /path/vendor/bundle/ruby/3.1.0/gems/html-proofer-5.0.3/lib/html_proofer/utils.rb:22:in `create_nokogiri'