[FIX] tools: remove control characters xml
Steps to reproduce:
[account_edi_ubl_cii]
- create an invoice and add a line containing a control character from https://unicode-explorer.com/b/0000
- confirm it
- try to print it

Issue:
Printing fails with an unhandled stack trace.

Cause:
XML does not accept such characters
```
        The characters to be escaped are the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML)
        [...] XML processors must accept any character in the range specified for Char:
        `Char	   ::=   	#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`
        source:https://www.w3.org/TR/xml/
```
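
The `Char` production quoted above can be read as a plain range check on code points. A minimal sketch, assuming a hypothetical helper `is_xml_char` and a hypothetical constant `XML_CHAR_RANGES`, neither of which is part of this commit:

```python
# Hypothetical helper, not part of the commit: checks one code point
# against the XML 1.0 `Char` production quoted in the commit message.
XML_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point: int) -> bool:
    """Return True if the code point is allowed by the `Char` production."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)


print(is_xml_char(0x0))  # False: NUL is outside every range
print(is_xml_char(0x9))  # True: tab is explicitly allowed
```

Note that `#x7F` falls inside `[#x20-#xD7FF]`, so the production itself permits it even though the quoted text lists it among the characters to escape.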

opw-3773808

closes odoo#163433

X-original-commit: d06a229
Signed-off-by: William André (wan) <wan@odoo.com>
yosa-odoo committed Apr 26, 2024
1 parent 6a90fee commit f244c5c
Showing 1 changed file with 23 additions and 1 deletion.
24 changes: 23 additions & 1 deletion odoo/tools/xml_utils.py
@@ -2,6 +2,7 @@
 """Utilities for generating, parsing and checking XML/XSD files on top of the lxml.etree module."""
 
 import logging
+import re
 import requests
 import zipfile
 from io import BytesIO
@@ -14,6 +15,27 @@
 _logger = logging.getLogger(__name__)
 
 
+def remove_control_characters(byte_node):
+    """
+    The characters to be escaped are the control characters #x0 to #x1F and #x7F (most of which cannot appear in XML)
+    [...] XML processors must accept any character in the range specified for Char:
+    `Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`
+    source:https://www.w3.org/TR/xml/
+    """
+    return re.sub(
+        '[^'
+        '\u0009'
+        '\u000A'
+        '\u000D'
+        '\u0020-\uD7FF'
+        '\uE000-\uFFFD'
+        '\U00010000-\U0010FFFF'
+        ']'.encode(),
+        b'',
+        byte_node,
+    )
+
+
 class odoo_resolver(etree.Resolver):
     """Odoo specific file resolver that can be added to the XML Parser.
@@ -118,7 +140,7 @@ def cleanup_xml_node(xml_node_or_string, remove_blank_text=True, remove_blank_no
     if isinstance(xml_node, str):
         xml_node = xml_node.encode()  # misnomer: fromstring actually reads bytes
     if isinstance(xml_node, bytes):
-        xml_node = etree.fromstring(xml_node)
+        xml_node = etree.fromstring(remove_control_characters(xml_node))
 
     # Process leaf nodes iteratively
     # Depth-first, so any inner node may become a leaf too (if children are removed)
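
To see the effect of the change outside Odoo, the sketch below copies `remove_control_characters` from the diff and feeds it a payload containing a vertical tab and a NUL byte. It uses the standard library's `xml.etree.ElementTree` as a stand-in for `lxml.etree` (an assumption made to keep the example dependency-free; the commit itself calls `etree.fromstring` from lxml), and the sample payload is invented for illustration.

```python
import re
import xml.etree.ElementTree as ET  # stand-in for lxml.etree (assumption)


def remove_control_characters(byte_node):
    # Copied from the diff: the pattern is built as str, encoded to UTF-8,
    # and then matched against bytes.
    return re.sub(
        '[^'
        '\u0009'
        '\u000A'
        '\u000D'
        '\u0020-\uD7FF'
        '\uE000-\uFFFD'
        '\U00010000-\U0010FFFF'
        ']'.encode(),
        b'',
        byte_node,
    )


# Invented sample payload with two characters that XML forbids.
raw = b"<name>Desk\x0b Lamp\x00</name>"

try:
    ET.fromstring(raw)
    parsed_raw = True
except ET.ParseError:
    parsed_raw = False  # the parser rejects the control characters

cleaned = remove_control_characters(raw)
node = ET.fromstring(cleaned)

print(parsed_raw)  # False
print(cleaned)     # b'<name>Desk Lamp</name>'
print(node.text)   # Desk Lamp
```

Because the encoded pattern is applied to bytes, the character class works byte by byte. For UTF-8 input it strips the C0 control bytes other than tab, line feed, and carriage return, and leaves multi-byte sequences such as `é` intact.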
