Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong line for XML event location in elements following DTD #67

Closed
m-g-sonar opened this issue Dec 17, 2018 · 2 comments
Closed

Wrong line for XML event location in elements following DTD #67

m-g-sonar opened this issue Dec 17, 2018 · 2 comments
Milestone

Comments

@m-g-sonar
Copy link

m-g-sonar commented Dec 17, 2018

Hello,

While updating our static code analyzer, we realized that in some scenario, the locations corresponding to events are wrong. It seems to be a bug from the woodstox engine.

In order to reproduce the issue, use the following XML file:

<?xml version=\"1.0\"?>
<!DOCTYPE menu [
<!--
Some comment
-->
<!ELEMENT menu (modulo)* >
]>
<menu value=\"foo\"></menu>

Important note: On line 3, there is NO spaces between the comment start and the new line (<!--\n).
If you add a space between the last - and the new line (<!-- \n), locations are then correct.

Code I used to reproduce the issue:

package org.foo;

import java.io.Reader;
import java.io.StringReader;

import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class A {

  public static void main(String[] args) throws Exception {
    String input =
        "<?xml version=\"1.0\"?>\n" +
        "<!DOCTYPE menu [\n" +
        "<!--\n" +
        "Some comment\n" +
        "-->\n" +
        "<!ELEMENT menu (modulo)* >\n" +
        "]>\n" +
        "<menu value=\"foo\"></menu>";

    XMLStreamReader xmlReader = getXmlStreamReader(input);

    printLocation(xmlReader);
    while (xmlReader.hasNext()) {
      xmlReader.next();
      printLocation(xmlReader);
    }
  }

  private static XMLStreamReader getXmlStreamReader(String content) throws XMLStreamException {
    Reader reader = new StringReader(content);

    // Force woodstox factory instance
    XMLInputFactory factory = new com.ctc.wstx.stax.WstxInputFactory();

    factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
    factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
    factory.setProperty(XMLInputFactory.IS_VALIDATING, false);
    return factory.createXMLStreamReader(reader);
  }

  private static void printLocation(XMLStreamReader xmlReader) {
    System.out.println(toString(xmlReader.getEventType(), xmlReader.getLocation()));
  }

  private static String toString(int eventType, Location location) {
    return "[" + eventType + " - " + eventType(eventType) + "] "
        + "{"
        + "L=" + location.getLineNumber() + "; "
        + "C=" + location.getColumnNumber() + "; "
        + "O=" + location.getCharacterOffset()
        + "}";
  }

  private static String eventType(int eventType) {
    switch (eventType) {
    case XMLStreamReader.ENTITY_REFERENCE:
      return "ENTITY_REFERENCE";
    case XMLStreamReader.COMMENT:
      return "COMMENT";
    case XMLStreamReader.PROCESSING_INSTRUCTION:
      return "PROCESSING_INSTRUCTION";
    case XMLStreamReader.CHARACTERS:
      return "CHARACTERS";
    case XMLStreamReader.START_ELEMENT:
      return "START_ELEMENT";
    case XMLStreamConstants.END_ELEMENT:
      return "END_ELEMENT";
    case XMLStreamConstants.CDATA:
      return "CDATA";
    case XMLStreamConstants.DTD:
      return "DTD";
    case XMLStreamReader.END_DOCUMENT:
      return "END_DOCUMENT";
    case XMLStreamReader.START_DOCUMENT:
      return "START_DOCUMENT";
    default:
      return "unknown (" + eventType + ")";
    }
  }
}

When executing the main method, the following lines are printed:

[7 - START_DOCUMENT] {L=1; C=1; O=0}
[11 - DTD] {L=2; C=1; O=22}
[1 - START_ELEMENT] {L=7; C=1; O=91}
[2 - END_ELEMENT] {L=7; C=19; O=109}
[8 - END_DOCUMENT] {L=7; C=26; O=116}

At this point, the 3 last event are wrong, while offset is correct. Line should be 8.
I would have expected:

[7 - START_DOCUMENT] {L=1; C=1; O=0}
[11 - DTD] {L=2; C=1; O=22}
[1 - START_ELEMENT] {L=8; C=1; O=91}
[2 - END_ELEMENT] {L=8; C=19; O=109}
[8 - END_DOCUMENT] {L=8; C=26; O=116}

Related issue: SONARXML-73

I did not manage to identify the issue in the code, being not that much familiar with the XML API. However, if you give me a hint, I may give a try to fix it.

Edit: If I set property XMLInputFactory.SUPPORT_DTD to true instead of false, then locations are correct. Now, I'm not sure if what it means. Does support of DTD means that they are validated?

Cheers,
Michael

@cowtowncoder
Copy link
Member

Thank you for the report! I suspect there may be an issue with linefeed handling within comments inside DTD; location counters need explicit handling (due to multiple LFs XML allows), and it could be possible there is an update missing. Tracking specific location down may be trickier, but it would be somewhere where comments are skipped (within internal DTD subset, neither comments nor PIs can be reported).

@cowtowncoder cowtowncoder added the active Issue being actively investigated label Dec 17, 2018
@cowtowncoder cowtowncoder added test-needed and removed active Issue being actively investigated labels Apr 23, 2020
@cowtowncoder
Copy link
Member

(added "test-needed" label just because automated test would be great -- I realize instructions for writing test are included above)

Orbisman added a commit to Orbisman/woodstox_contrib that referenced this issue Dec 16, 2023
Wrong line for XML event location in elements following DTD
- Added additional comment coverage in TestComments.java
- Added test per Issue FasterXML#67 writeup to TestLocation.java
- Carriage was not being checked so skipped, added check and call to
skipCRLF(c) to advance line number.
@cowtowncoder cowtowncoder added this to the 6.6.0 milestone Dec 21, 2023
cowtowncoder pushed a commit that referenced this issue Dec 21, 2023
cowtowncoder added a commit that referenced this issue Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants