Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Drop submission validation via libmagic #6500

Merged
merged 12 commits into from
Oct 23, 2023

Conversation

jennifer-richards
Copy link
Member

This drops content type detection that ultimately depends on the system's libmagic to guess the content type of an uploaded submission. This has proven unreliable on some platforms (notably production). Instead, this relies on the text / XML parsers to reject files that do not conform to their formats.

The detection had a side effect of guessing the character encoding used by the uploaded file, which was used to determine whether the file was "us-ascii" or UTF-8. As far as I can tell, the two were handled separately only to account for libmagic's preference to report "us-ascii" when possible. I've implemented a validator that rejects submissions that include invalid UTF-8 encodings, reporting the first problematic byte and its position in the file back to the user.

At least some invalid files that were previously screened out by the content type detection cause xml2rfc to raise an XMLSyntaxError exception that was unhandled. These changes catch that exception and turn it into an error message hinting that the wrong file format may have been submitted. In particular, attempting to upload a text-format draft where the form expects XML hits this.

@codecov
Copy link

codecov bot commented Oct 20, 2023

Codecov Report

Merging #6500 (0fd6348) into main (f6a2d8c) will decrease coverage by 0.01%.
Report is 20 commits behind head on main.
The diff coverage is 95.83%.

@@            Coverage Diff             @@
##             main    #6500      +/-   ##
==========================================
- Coverage   88.70%   88.70%   -0.01%     
==========================================
  Files         290      290              
  Lines       40440    40429      -11     
==========================================
- Hits        35874    35862      -12     
- Misses       4566     4567       +1     
Files Coverage Δ
ietf/doc/views_conflict_review.py 97.05% <100.00%> (-0.07%) ⬇️
ietf/submit/parsers/base.py 96.72% <100.00%> (+1.56%) ⬆️
ietf/submit/parsers/xml_parser.py 100.00% <ø> (ø)
ietf/utils/xmldraft.py 93.18% <100.00%> (+0.26%) ⬆️
ietf/submit/forms.py 79.10% <80.00%> (+0.07%) ⬆️
ietf/submit/parsers/plain_parser.py 92.85% <91.66%> (+0.70%) ⬆️

... and 3 files with indirect coverage changes

@rjsparks rjsparks merged commit dc14308 into ietf-tools:main Oct 23, 2023
9 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 27, 2023
@jennifer-richards jennifer-richards deleted the less-magic branch October 27, 2023 03:16
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants