Fix parsing an UTF-8 file without BOM and ISO-8859-1 encoding #1

Merged
merged 5 commits into from
Apr 7, 2023

@gnodet gnodet commented Apr 7, 2023

  • Fix parsing an UTF-8 file without BOM and ISO-8859-1 encoding (#242)
  • Fix BOM / encoding problems

belingueres and others added 2 commits April 7, 2023 09:36
* Deleted most code handling encoding (leaving that job to the XmlReader)
* Fixed tests exercising encoding checks. Unsupported tests were skipped
* Simplified test-encoding-ISO-8859-1.xml test file

Skipped even more tests that pass on Linux but fail on Windows.
* enable testhst_lhs_007, testhst_lhs_008 and testhst_lhs_009 for InputStream
* disable those tests on readers, as readers bypass any encoding
* do not try to discover the encoding used when the input is given as a Reader
* add an ISO-8859-1 encoded comment in the test XML (the testEncodingISO_8859_1_newReader and testEncodingISO_8859_1_InputStream_encoded tests decode it wrongly because they use UTF-8)
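
The last commit notes hinge on one point: a Reader hands the parser characters that are already decoded, so only an InputStream leaves the choice of charset open. A minimal sketch of why an ISO-8859-1 comment decodes wrongly as UTF-8, using only the JDK. The class EncodingSketch and its decode helper are hypothetical names for illustration and are not part of this pull request or of the parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSketch {
    // Hypothetical helper: decode raw bytes with an explicit charset,
    // which is the decision a parser faces when given an InputStream.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // A comment containing a non-ASCII character (e with acute accent).
        String original = "<!-- caf\u00e9 -->";

        // In ISO-8859-1 that character is the single byte 0xE9.
        byte[] latin1Bytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the charset the bytes were written in round-trips.
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        // Decoding the same bytes as UTF-8 fails: 0xE9 starts a multi-byte
        // sequence that is never completed, so the decoder substitutes
        // the replacement character U+FFFD.
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        System.out.println("round trip intact: " + right.equals(original));
        System.out.println("UTF-8 decode damaged: " + wrong.contains("\uFFFD"));
    }
}
```

Once the bytes have been turned into a String or Reader with the wrong charset, the original byte is gone, which is consistent with the note above that encoding discovery is skipped for Reader input.
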
@gnodet gnodet merged commit 300a4e4 into master Apr 7, 2023
@slawekjaranowski slawekjaranowski deleted the ISSUE-242 branch May 20, 2024 22:24
3 participants