Fix a parser bug where some data before DOCTYPE may be ignored
HackerOne: HO-1104077

For example, "x<?x y" in "x<?x y\n<!--..." is ignored.

Reported by Juho Nurminen. Thanks!!!
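
The difference between the two anchors can be reproduced outside REXML. The sketch below is not the parser's code path: it applies the old and new patterns from `pull_event` to a plain Ruby string with `String#match`, which stands in for the source buffer (an assumption), and it uses the input from the test added in this commit. In Ruby, `^` matches at the start of any line, while `\A` matches only at the start of the string.

```ruby
# Stand-in for the parser's buffer (assumption: a plain String instead of
# REXML's source object).
buffer = "x<?x y\n<!--?><?x -->?>\n<r/>\n"

old_pattern = /^((?:\s+)|(?:<[^>]*>))/um
new_pattern = /\A((?:\s+)|(?:<[^>]*>))/um

old_match = buffer.match(old_pattern)
new_match = buffer.match(new_pattern)

# `^` also matches right after the first newline, so the old pattern finds
# "<!--?>" on the second line and everything before it is skipped.
p old_match[1]         # "<!--?>"
p old_match.pre_match  # "x<?x y\n"

# `\A` only matches at offset 0, where the buffer starts with "x".
p new_match            # nil
```

With the old pattern the leading `x<?x y` never reaches the parser's event handling; with the new one the match fails and the leading data is left in the buffer for the later branches to handle.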
kou authored and mame committed Apr 5, 2021
1 parent 9b311e5 commit 3c137eb
Showing 3 changed files with 27 additions and 8 deletions.
15 changes: 8 additions & 7 deletions lib/rexml/parsers/baseparser.rb
@@ -195,11 +195,9 @@ def pull_event
return [ :end_document ] if empty?
return @stack.shift if @stack.size > 0
#STDERR.puts @source.encoding
- @source.read if @source.buffer.size<2
#STDERR.puts "BUFFER = #{@source.buffer.inspect}"
if @document_status == nil
- #@source.consume( /^\s*/um )
- word = @source.match( /^((?:\s+)|(?:<[^>]*>))/um )
+ word = @source.match( /\A((?:\s+)|(?:<[^>]*>))/um )
word = word[1] unless word.nil?
#STDERR.puts "WORD = #{word.inspect}"
case word
@@ -257,18 +255,16 @@ def pull_event
@stack << [ :end_doctype ]
end
return args
- when /^\s+/
+ when /\A\s+/
else
@document_status = :after_doctype
- @source.read if @source.buffer.size<2
- md = @source.match(/\s*/um, true)
if @source.encoding == "UTF-8"
@source.buffer.force_encoding(::Encoding::UTF_8)
end
end
end
if @document_status == :in_doctype
- md = @source.match(/\s*(.*?>)/um)
+ md = @source.match(/\A\s*(.*?>)/um)
case md[1]
when SYSTEMENTITY
match = @source.match( SYSTEMENTITY, true )[1]
@@ -349,7 +345,11 @@ def pull_event
return [ :end_doctype ]
end
end
+ if @document_status == :after_doctype
+ @source.match(/\A\s*/um, true)
+ end
begin
+ @source.read if @source.buffer.size<2
if @source.buffer[0] == ?<
if @source.buffer[1] == ?/
@nsstack.shift
@@ -392,6 +392,7 @@ def pull_event
unless md
raise REXML::ParseException.new("malformed XML: missing tag start", @source)
end
+ @document_status = :in_element
prefixes = Set.new
prefixes << md[2] if md[2]
@nsstack.unshift(curr_ns=Set.new)
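
The `when /^\s+/` to `when /\A\s+/` change in this file rests on the same Ruby anchor semantics. As a rough illustration only, the sketch below uses a hypothetical `classify` helper (not part of REXML) to show how a tag that spans two lines can be mistaken for whitespace when the pattern is anchored with `^`.

```ruby
# Hypothetical helper for illustration; REXML has no such method.
def classify(word, whitespace_pattern)
  case word
  when whitespace_pattern then :whitespace
  else :other
  end
end

# A start tag whose second line begins with a space.
tag = "<r\n a='1'>"

line_anchored   = classify(tag, /^\s+/)      # `^` matches after the newline
string_anchored = classify(tag, /\A\s+/)     # `\A` matches only at offset 0
real_whitespace = classify("  \n", /\A\s+/)  # genuine leading whitespace

p line_anchored    # :whitespace
p string_anchored  # :other
p real_whitespace  # :whitespace
```

Anchoring with `\A` keeps the whitespace branch limited to input that actually starts with whitespace.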
19 changes: 19 additions & 0 deletions test/parse/test_processing_instruction.rb
@@ -20,6 +20,25 @@ def test_no_name
<??>
DETAIL
end
+
+ def test_garbage_text
+ # TODO: This should be parse error.
+ # Create test/parse/test_document.rb or something and move this to it.
+ doc = parse(<<-XML)
+ x<?x y
+ <!--?><?x -->?>
+ <r/>
+ XML
+ pi = doc.children[1]
+ assert_equal([
+ "x",
+ "y\n<!--",
+ ],
+ [
+ pi.target,
+ pi.content,
+ ])
+ end
end
end
end
1 change: 0 additions & 1 deletion test/parser/test_ultra_light.rb
@@ -16,7 +16,6 @@ def test_entity_declaration
nil,
[:entitydecl, "name", "value"]
],
- [:text, "\n"],
[:start_element, :parent, "root", {}],
[:text, "\n"],
],