In order to check if a URL's fragment is valid, html-proofer uses Nokogiri to parse the document at the URL. Nokogiri has a default maximum tree depth of 400. (This can be changed.)
Since the content of external documents is outside the control of the user of html-proofer, it makes sense to catch any exceptions that result from parsing those (and probably all) documents with Nokogiri.
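As a rough illustration of the suggestion above, here is a minimal sketch of a wrapper that rescues the `ArgumentError` Nokogiri raises and returns `nil` instead of crashing. The names `parse_or_nil` and `DEFAULT_PARSER` are hypothetical and are not part of html-proofer. The parser is passed in as a parameter only so that the rescue logic can be exercised without fetching a real document.

```ruby
# Hypothetical helper, not html-proofer's actual code.
# The default parser loads Nokogiri lazily and parses with its HTML5 parser.
DEFAULT_PARSER = lambda do |html|
  require "nokogiri"
  Nokogiri::HTML5(html)
end

# Returns the parsed document, or nil if the parser raises ArgumentError
# (for example "Document tree depth limit exceeded").
# Note: rescuing ArgumentError is deliberately broad in this sketch; a real
# fix might inspect the message or rescue a narrower set of errors.
def parse_or_nil(body, parser: DEFAULT_PARSER)
  parser.call(body)
rescue ArgumentError => e
  warn "Skipping fragment check: #{e.message}"
  nil
end
```

A caller could then treat a `nil` result as "fragment could not be verified" rather than aborting the whole run. Alternatively, Nokogiri's HTML5 parser accepts a `max_tree_depth:` option for raising the limit, though that would not help when the response is not HTML at all.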
I don't have a URL off-hand that demonstrates this, as it is one of 900 on my website. But I did confirm that htmlproofer --check-external-hash fails with the error:
/path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `parse': Document tree depth limit exceeded (ArgumentError)
from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `do_parse'
from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:43:in `parse'
from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5.rb:31:in `HTML5'
from /path/vendor/bundle/ruby/3.1.0/gems/html-proofer-5.0.3/lib/html_proofer/utils.rb:22:in `create_nokogiri'
...
and htmlproofer --no-check-external-hash completes successfully.
Update: I realized I could change the log level to figure out which URL was causing the problem. In fact, the issue was a link to a PDF with a query fragment, similar to #663.
To reproduce, put a link to the PDF (its URL appears in the output below) in a.html and run $ htmlproofer --log-level :debug a.html.
Trimmed output:
Running 3 checks (Images, Links, Scripts) in a.html on *.html files ...
Running Images in a.html
Running Links in a.html
Running Scripts in a.html
Checking 1 external link
ETHON: started MULTI
Received a 200 for https://checkoway.net/teaching/cs210/2021-fall/slides/Lec19%20ClocksandTiming.pdf#page=25
bundler: failed to load command: htmlproofer (/path/vendor/bundle/ruby/3.1.0/bin/htmlproofer)
/path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `parse': Document tree depth limit exceeded (ArgumentError)
from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:85:in `do_parse'
from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5/document.rb:43:in `parse'
from /path/vendor/bundle/ruby/3.1.0/gems/nokogiri-1.13.10-arm64-darwin/lib/nokogiri/html5.rb:31:in `HTML5'
from /path/vendor/bundle/ruby/3.1.0/gems/html-proofer-5.0.3/lib/html_proofer/utils.rb:22:in `create_nokogiri'