This repository has been archived by the owner on Nov 3, 2023. It is now read-only.
Problem
When an unrelated tool's configuration (e.g. semantic_release) in pyproject.toml contains a multi-line literal string delimited by ''' (such as a regular expression), pydocstyle throws a toml.decoder.TomlDecodeError for an unterminated string. This likely does not happen with every literal string, but it does when the regexp itself contains quote characters.
My offending config:
# pyproject.toml
[tool.semantic_release]
version_pattern = [
    # regular expression to find version value in `_version.py` file
    '''src/pkg1/_version.py:__version__[ ]*[:=][ ]*["'](\d+\.\d+\.\d+)["']'''
]
[tool.pydocstyle]
convention = 'pep257'
Log
(venv) $ pydocstyle scripts/prepare.py
Traceback (most recent call last):
  File "/workspaces/py-rpm/venv/bin/pydocstyle", line 8, in <module>
    sys.exit(main())
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/cli.py", line 75, in main
    sys.exit(run_pydocstyle())
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/cli.py", line 41, in run_pydocstyle
    for (
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 288, in get_files_to_check
    config = self._get_config(os.path.abspath(name))
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 369, in _get_config
    config = self._get_config_by_discovery(node)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 318, in _get_config_by_discovery
    config = self._get_config(parent_dir)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 369, in _get_config
    config = self._get_config_by_discovery(node)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 312, in _get_config_by_discovery
    config_file = self._get_config_file_in_folder(path)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 555, in _get_config_file_in_folder
    if config.read(full_path) and cls._get_section_name(config):
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/pydocstyle/config.py", line 70, in read
    self._config.update(toml.load(fp))
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/toml/decoder.py", line 156, in load
    return loads(f.read(), _dict, decoder)
  File "/workspaces/py-rpm/venv/lib/python3.8/site-packages/toml/decoder.py", line 362, in loads
    raise TomlDecodeError("Unterminated string found."
toml.decoder.TomlDecodeError: Unterminated string found. Reached end of file. (line 121 column 1 char 2619)
Investigation
This seems to be a limitation of the parser implementation and the TOML version it targets. I looked at the dependency tree of semantic_release and found that it uses the tomlkit library instead of toml because tomlkit supports v1.0.0 of the TOML standard rather than v0.5.0. Under the hood, there appear to be a few flaws in the parser in toml==0.5.0: changing the regular expression in various ways produces different results, none of them obvious or expected. One oddity: two double quotes (") anywhere inside the triple single quotes (''') cause an Unterminated string error, whereas a single double quote is fine. Another variation that should not work but does is escaping the double quotes (i.e. \").
I also found that the toml library itself is stale and has not received any updates since Oct 2020, whereas tomlkit and its competitor tomli both received updates in the first half of 2022. Furthermore, the Python 3.11 docs highlight these two frontrunners as the libraries to use for reading/writing TOML. In the future you may be able to use tomllib, the library built into Python 3.11, but it will be unavailable on older supported Python versions for a few years.
Additional discussion on TOML support for raw/literal strings: toml-lang/toml#80
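For comparison, here is a minimal sketch of how a TOML v1.0.0 parser is expected to treat the same value: everything between the ''' delimiters is taken verbatim, so neither quote character needs escaping. It uses tomllib (falling back to tomli, assumed to be installed on older interpreters). Splitting the value on the first ":" into a path and a pattern, and the sample `__version__` line, are assumptions made purely for illustration.

```python
import re

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli is installed on older versions

# Raw Python string so the backslashes reach the TOML parser unchanged.
CONFIG = r"""
[tool.semantic_release]
version_pattern = [
    # regular expression to find version value in `_version.py` file
    '''src/pkg1/_version.py:__version__[ ]*[:=][ ]*["'](\d+\.\d+\.\d+)["']'''
]
"""

data = tomllib.loads(CONFIG)
value = data["tool"]["semantic_release"]["version_pattern"][0]

# Illustrative assumption: the value has the shape "<path>:<regex>".
path, _, pattern = value.partition(":")

# Hypothetical line from a _version.py file, used only to exercise the regex.
match = re.search(pattern, '__version__ = "1.2.3"')
print(path)
print(match.group(1))
```

If the parser handled the literal string incorrectly, either `tomllib.loads` would raise or the recovered pattern would fail to compile or match.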
Recommendation
Switch the toml dependency to tomlkit or tomli.
I tested both tomli==2.0.1 and tomlkit==0.10.2, and both parse my pyproject.toml configuration file (as provided above), including the regex, without error. tomlkit seems to lead in popularity, but the tomli documentation is a bit better. Also of note: tomli.load() requires the file to be opened in binary mode ("rb") rather than in text mode with a specified encoding.
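The binary-mode requirement can be sketched as follows. This is a small illustration rather than pydocstyle code: the temporary file and its contents are made up for the example, and the import falls back to tomli (assumed installed) when tomllib is unavailable. As far as I know, both tomllib and tomli 2.x reject a text-mode file object with a TypeError.

```python
import os
import tempfile

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: tomli 2.x is installed

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway config file created only for this demonstration.
    path = os.path.join(tmp, "pyproject.toml")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("[tool.pydocstyle]\nconvention = 'pep257'\n")

    # Binary mode is what load() expects; the parser decodes UTF-8 itself.
    with open(path, "rb") as fp:
        config = tomllib.load(fp)

    # Text mode is rejected instead of being silently accepted.
    text_mode_error = None
    try:
        with open(path, "r", encoding="utf-8") as fp:
            tomllib.load(fp)
    except TypeError as exc:
        text_mode_error = exc

print(config["tool"]["pydocstyle"]["convention"])
print(type(text_mode_error).__name__)
```

Code that previously called toml.load() on a text-mode file object would therefore need its open() call changed as part of the switch.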
Use the built-in `tomllib` module in Python 3.11 and the modern `tomli`
package in older Python versions to read .toml configs instead of
the unmaintained and broken `toml` package.
Fixes PyCQA#599
Fixes PyCQA#600
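The approach described in this commit message can be sketched roughly as below. This is not the code from the commit: `read_tool_section` and the sample `PYPROJECT` text are hypothetical names introduced for illustration, and the `else` branch assumes tomli is installed on interpreters older than 3.11.

```python
import sys

# Version-gated import: tomllib ships with Python 3.11+, tomli is its backport.
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib  # assumption: tomli is declared as a dependency


def read_tool_section(toml_text, tool_name):
    """Return the [tool.<tool_name>] table from pyproject-style text, or {}."""
    data = tomllib.loads(toml_text)
    return data.get("tool", {}).get(tool_name, {})


# Hypothetical pyproject.toml content with an unrelated tool section.
PYPROJECT = """
[tool.semantic_release]
version_pattern = ["src/pkg1/_version.py:__version__"]

[tool.pydocstyle]
convention = 'pep257'
"""

print(read_tool_section(PYPROJECT, "pydocstyle"))
```

Because tomli exposes the same `load`/`loads` interface as tomllib, aliasing it on import keeps the rest of the code identical across Python versions.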
mgorny added a commit to mgorny/pydocstyle that referenced this issue on Jan 3, 2023
Related: #599