One may reproduce by prepending the string 'Garbage text\n' to e.g. the beginning of tests/repo1/data/hafez/divan/hafez.divan.perseus-eng1.xml.
The XMLSyntaxError is hidden by the imap_unordered call through the threadpool and presents instead as a MaybeEncodingError because lxml.etree can't pickle its _ListErrorLog. Flattening the parallel iterator to a serial one reveals the underlying issue.
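One possible way to keep the real error visible, sketched here under assumptions and not taken from the hooktest codebase: catch the exception inside the worker and return only plain strings, so nothing unpicklable has to travel back through the pool. The names `UnpicklableError`, `parse` and `safe_parse` are hypothetical; `UnpicklableError` stands in for an exception that, like the lxml one described above, carries state that pickle rejects.

```python
import pickle
import threading


class UnpicklableError(Exception):
    """Hypothetical stand-in for an exception that carries unpicklable state."""

    def __init__(self, message):
        super().__init__(message)
        # A lock cannot be pickled, so neither can this exception instance.
        self.error_log = threading.Lock()


def parse(path):
    """Hypothetical worker that always fails, as if the file were invalid XML."""
    raise UnpicklableError(f"{path}: syntax error at line 1")


def safe_parse(path):
    """Run parse() and return a tuple made only of picklable values.

    On failure the exception is flattened to a string inside the worker,
    so the pool never has to pickle the exception object itself.
    """
    try:
        return (path, True, parse(path))
    except Exception as exc:
        return (path, False, f"{type(exc).__name__}: {exc}")
```

A module-level function like `safe_parse` could then be passed to `imap_unordered` in place of the raw worker, and the caller would report the string for every result whose second field is `False`.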
The problem occurs with XML parsing failures in general, e.g. the unrecognized &sect; entity on line 776 of tlg0004.tlg001.perseus-eng1.xml from canonical-greekLit.
rillian changed the title from "Exception on garbage at the start of an xml file." to "Exception on invalid xml." on Jun 4, 2019
Yes, this seems like something that would need work. The XML parsing vs. Capitains Parsing is something that has remained in the codebase for a long time. Feel free to propose a fix, including by creating a new exception :)
Some logging output got into my TEI files, and hooktest asserts rather than reporting the error.