Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new tool to check HTML #84480

Closed
wants to merge 5 commits into from

Conversation

GuillaumeGomez
Copy link
Member

@GuillaumeGomez GuillaumeGomez commented Apr 23, 2021

This PR adds a new tool to validate the HTML generated by rustdoc. To be noted, for now it has a lot of errors so opening it as a draft until I fix all reported issues.

Helps with #84356.

r? @jyn514

@GuillaumeGomez GuillaumeGomez added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Apr 23, 2021
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 23, 2021
Comment on lines 171 to 181
{
eprintln!("not running HTML-check tool because `tidy` is missing");
return;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check if this is running in CI and give a hard error instead? You can do that with crate::CiEnv::current() != crate::CiEnv::None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this was skipped in the PR run:

Set({"src/tools/html-checker"}) not skipped for "bootstrap::test::HtmlCheck" -- not in ["src/tools/tidy"]
not running HTML-check tool because `tidy` is missing

Copy link
Member Author

@GuillaumeGomez GuillaumeGomez Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes since the command is not installed. I'll do something something similar to my eslint check that I added a while ago or for rustdog-gui.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of a check that only works locally? We should test this in CI or it will constantly regress.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need it to work on at least one CI, no need to run it everywhere.

Copy link
Member

@jyn514 jyn514 Apr 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not testing if it runs on at least one CI. If you forget to install tidy, CI will silently pass. I really think it's easier to just require it unconditionally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is likely going to want the same resolution we're (coming to) on the browser-ui-test thread, which is that we don't run it by default unless tidy is available (e.g., this check passes). We'd still run it if tidy is available, and if it is explicitly requested.

In the future we'll want to develop a solution that works better for these 'opt-in' style tests, but this should be OK for now.

src/bootstrap/test.rs Outdated Show resolved Hide resolved
src/tools/html-checker/main.rs Outdated Show resolved Hide resolved
src/tools/html-checker/main.rs Show resolved Hide resolved
src/tools/html-checker/main.rs Outdated Show resolved Hide resolved
src/tools/html-checker/main.rs Outdated Show resolved Hide resolved
src/tools/html-checker/main.rs Outdated Show resolved Hide resolved
src/tools/html-checker/main.rs Outdated Show resolved Hide resolved
src/tools/html-checker/main.rs Outdated Show resolved Hide resolved
src/tools/html-checker/main.rs Outdated Show resolved Hide resolved
@jyn514 jyn514 added the T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) label Apr 23, 2021
@jyn514
Copy link
Member

jyn514 commented Apr 23, 2021

cc @Mark-Simulacrum - this is still pretty far from ready to merge but I imagine you'll want to take a look once it's closer to finished.

@jyn514 jyn514 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 23, 2021
@Mark-Simulacrum
Copy link
Member

Can the PR description be expanded to indicate what kind of bugs this is trying to address? Additionally, it would be good to provide some indication of the expected error message quality and how many people will hit them (just rustdoc developers changing templates? all rustc developers? random book authors?)...

Also, what tool are we using? Tidy is a pretty generic name -- is that available in distros (including e.g. homebrew)? On Windows?

I admit that I have no context here, so maybe this is all extensively documented; I just don't know where to look.

@GuillaumeGomez
Copy link
Member Author

I have to admit that I didn't expect to get this much attention on a draft so I documented to the strict minimum...

The goal here is to check that the HTML generated by rustdoc is valid (no unclosed tags or equivalent for example). As for tidy (or html-tidy for its full name), I use it here as an HTML linter/error detector.

It'll be useful only for rustdoc contributors who change the output of the HTML layout (or if someone editing documentation is using plain HTML and does bad stuff).

@jyn514
Copy link
Member

jyn514 commented Apr 23, 2021

@Mark-Simulacrum This is addressing #84356. I don't know if tidy is available on windows, but it's a very old command and I would expect basically every linux distro to have it.

> apt show tidy
Package: tidy
Version: 2:5.6.0-11
Priority: optional
Section: universe/web
Source: tidy-html5
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Tidy HTML5 <tidy-html5@packages.debian.org>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 86.0 kB
Depends: libtidy5deb1 (= 2:5.6.0-11), libc6 (>= 2.14)
Homepage: https://www.html-tidy.org/
Download-Size: 28.5 kB
APT-Manual-Installed: yes
APT-Sources: http://us.archive.ubuntu.com/ubuntu focal/universe amd64 Packages
Description: HTML/XML syntax checker and reformatter
 Tidy corrects and cleans up HTML and XML documents by fixing
 markup errors and upgrading legacy code to modern standards.
 .
 This package contains a command line tool 'tidy'.

Additionally, it would be good to provide some indication of the expected error message quality and how many people will hit them (just rustdoc developers changing templates? all rustc developers? random book authors?)...

Only rustdoc developers should hit them unless people are using raw HTML in their markdown, in which case it's a good thing they see the errors. Here's some example error (from docs generated by rustdoc):

> fd .html doc | xargs tidy --mute-id yes -quiet -o /dev/null
line 1 column 562 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)
line 1 column 630 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)
line 4 column 234 - Warning: <a> attribute "href" lacks value (MISSING_ATTR_VALUE)
line 1 column 558 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)
line 1 column 626 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)
line 4 column 472 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 5 column 131 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 5 column 323 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 5 column 530 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 5 column 739 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 5 column 920 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 5 column 1120 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 5 column 1302 - Warning: trimming empty <span> (TRIM_EMPTY_ELEMENT)
line 1 column 532 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)
line 1 column 599 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)
line 4 column 628 - Warning: <select> proprietary attribute "autocomplete" (PROPRIETARY_ATTRIBUTE)
line 4 column 951 - Warning: <select> proprietary attribute "autocomplete" (PROPRIETARY_ATTRIBUTE)
line 1 column 557 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)
line 1 column 628 - Warning: <link> proprietary attribute "disabled" (PROPRIETARY_ATTRIBUTE)

@rust-log-analyzer

This comment has been minimized.

@GuillaumeGomez GuillaumeGomez force-pushed the html-checker branch 2 times, most recently from 6d0f56c to 5654e86 Compare April 23, 2021 20:21
@rust-log-analyzer

This comment has been minimized.

@bors

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@GuillaumeGomez
Copy link
Member Author

I fixed some HTML issues, and ignored some others that seem to require a bit more work. I also added the installation of tidy on a target so the HTML check will run on it.

cc @Mark-Simulacrum (at least checking if bootstrap and Dockerfile changes are good for you)

@bors

This comment has been minimized.

@JohnCSimon
Copy link
Member

Ping from triage:
@GuillaumeGomez can you please address the merge conflicts and address the comment?

@GuillaumeGomez
Copy link
Member Author

@Mark-Simulacrum The HTML changes are being done in #84703.

@JohnCSimon It's waiting for #84703 to be merged.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 2, 2021
Clean up dom

The commits come from rust-lang#84480.

They were errors reported by the `tidy` script that we will use to ensure that the HTML generated by rustdoc is valid.

I checked carefully that there were no difference so in principle it should be exactly the same rendering but a double-check would be very appreciated in case I missed something.

Extra note: `<h4>` and some `<h3>` tags were replaced by `<div>` because they're not supposed to contain tags as they currently do.

r? `@jsha`
@GuillaumeGomez
Copy link
Member Author

Since #84703 was merged, I used this opportunity to rebase and clean things up a bit by removing some checks and only run tidy on files generated by rustdoc (whereas it was running on all items in the doc folder before, including books). We still have some errors as you can see below. I'll fix them in another PR and then rebase this one and finish to fix things up.

Running HTML checker...
=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/alloc/str/pattern/struct.StrSearcher.html` (error code: 1) <=
line 1 column 227 - Warning: unescaped & or unknown entity "&str" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/core/arch/arm/fn.vbsl_s8.html` (error code: 1) <=
line 1 column 288 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)
line 1 column 352 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)
line 1 column 443 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/core/arch/aarch64/fn.vbsl_f64.html` (error code: 1) <=
line 1 column 288 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)
line 1 column 352 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)
line 1 column 443 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/core/arch/aarch64/fn.vbsl_s8.html` (error code: 1) <=
line 1 column 288 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)
line 1 column 352 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)
line 1 column 443 - Warning: unescaped & or unknown entity "&FP" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/core/mem/fn.transmute_copy.html` (error code: 1) <=
line 1 column 238 - Warning: unescaped & or unknown entity "&U" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/core/str/pattern/struct.StrSearcher.html` (error code: 1) <=
line 1 column 227 - Warning: unescaped & or unknown entity "&str" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/core/raw/struct.TraitObject.html` (error code: 1) <=
line 1 column 248 - Warning: unescaped & or unknown entity "&dyn" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/core/all.html` (error code: 1) <=
line 6 column 1290563 - Warning: <h3> attribute "id" has invalid value "Trait Aliases" (BAD_ATTRIBUTE_VALUE)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/std/mem/fn.transmute_copy.html` (error code: 1) <=
line 1 column 238 - Warning: unescaped & or unknown entity "&U" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/std/time/struct.SystemTime.html` (error code: 1) <=
line 46 column 19 - Warning: <th> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 46 column 51 - Warning: <th> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 47 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 47 column 32 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 48 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 48 column 33 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 49 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 49 column 35 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 50 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 50 column 36 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 51 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 51 column 33 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 52 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 52 column 36 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/std/time/struct.Instant.html` (error code: 1) <=
line 48 column 19 - Warning: <th> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 48 column 51 - Warning: <th> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 49 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 49 column 32 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 50 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 50 column 33 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 51 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 51 column 35 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 52 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 52 column 36 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 53 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 53 column 33 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 54 column 5 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)
line 54 column 36 - Warning: <td> attribute "align" not allowed for HTML5 (MISMATCHED_ATTRIBUTE_WARN)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/std/raw/struct.TraitObject.html` (error code: 1) <=
line 1 column 248 - Warning: unescaped & or unknown entity "&dyn" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/std/str/pattern/struct.StrSearcher.html` (error code: 1) <=
line 1 column 227 - Warning: unescaped & or unknown entity "&str" (UNKNOWN_ENTITY)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/proc_macro/macro.quote.html` (error code: 1) <=
line 1 column 171 - Warning: <meta> attribute "a")," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "punct('+'," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "alone)," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute with missing trailing quote mark (MISSING_QUOTEMARK)

=> Errors for `/home/imperio/rust/rust/build/x86_64-unknown-linux-gnu/doc/proc_macro/struct.Literal.html` (error code: 1) <=
line 1 column 171 - Warning: <meta> attribute "hello"`)," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "(`b"hello"`)," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "(`'a'`)," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "(`b'a'`)," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "(`1`," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "`1u8`," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "`2.3`," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "`2.3f32`)." lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "`true`" lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "`false`" lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute "here," lacks value (MISSING_ATTR_VALUE)
line 1 column 171 - Warning: <meta> attribute with missing trailing quote mark (MISSING_QUOTEMARK)
line 1 column 171 - Warning: <meta> dropping value "NULL" for repeated attribute "byte" (REPEATED_ATTRIBUTE)
line 1 column 171 - Warning: <meta> dropping value "NULL" for repeated attribute "character" (REPEATED_ATTRIBUTE)
line 1 column 171 - Warning: <meta> dropping value "NULL" for repeated attribute "or" (REPEATED_ATTRIBUTE)

Done! Read 26270 files...
Error: "HTML check failed: 64 errors"

@@ -58,12 +46,21 @@ fn check_html_file(file: &Path) -> usize {
}
}

const DOCS_TO_CHECK: &[&str] =
&["alloc", "core", "proc_macro", "implementors", "src", "std", "test"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than only checking specific crates, could you instead filter out the books you don't want to check?

Copy link
Member Author

@GuillaumeGomez GuillaumeGomez Jun 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are A LOT of different ones and I can't know for sure which one will be added. I initially filtered out but it was a huge mess. We don't generate that many crates in the end so I preferred going this way.

EDIT: and some of them don't have "book" in their name, making things funnier, like "rustdoc". :3

@GuillaumeGomez
Copy link
Member Author

Ah and the check was supposed to fail so I guess it isn't run by default, gonna need to fix that too.

@rust-log-analyzer

This comment has been minimized.

@GuillaumeGomez GuillaumeGomez added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 5, 2021
@GuillaumeGomez
Copy link
Member Author

@Mark-Simulacrum I made a change suggested by @jyn514, but that would force all CI checks on linux to have tidy installed. If that sounds good to you, I'll add this dependency installation to our images.

@Mark-Simulacrum
Copy link
Member

I would expect this to use the same method used by the rustdoc-gui tests: we run by default if tidy is installed in the environment and explicitly request a run on one of the builders. If the html check is explicitly requested we would assert tidy is installed.

@GuillaumeGomez
Copy link
Member Author

Ok, gonna use the same mechanism then.

@GuillaumeGomez
Copy link
Member Author

Updated it, but with this change, it's not run anymore on the three "fast" CIs. Considering that we want to run this check on every PR, I suppose it's not an issue?

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jun 5, 2021
…r=jsha

Rustdoc html fixes

rust-lang#84480 latest update allowed me to fix the remaining issues. The last one is coming from `pulldown-cmark` so I'll send them a fix soon.

r? `@jsha`
@bors bors closed this in 302f3dc Jun 6, 2021
@GuillaumeGomez
Copy link
Member Author

No idea why it was closed and I can't re-open this one so I'm simply gonna open another one...

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Jun 28, 2021
…k-Simulacrum

Add new tool to check HTML

Re-opening of rust-lang#84480.

r? `@Mark-Simulacrum`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants