Skip to content

Latest commit

 

History

History
120 lines (91 loc) · 3.21 KB

DUO138.md

File metadata and controls

120 lines (91 loc) · 3.21 KB

DUO138

This linter searches for regular expressions that may, under certain inputs, exhibit catastrophic backtracking in the Python re module. For more information on catastrophic backtracking see:

Problematic code

There are many ways that ReDoS can occur. The following examples show ReDoS via nested quantifier and mutually inclusive alternation, respectively:

import re

subject = 'x' * 64
re.search(r'(x+x+)+y', subject)  # Boom
import re

subject = 'a' * 64
re.search(r'(.|[abc])+z', subject)  # Boom

Correct code

The above examples can be corrected with the following changes:

import re

subject = 'x' * 64
re.search(r'xx+y', subject)
import re

subject = 'a' * 64
re.search(r'.+z', subject)

Rarely will there be one-size-fits-all solutions to catastrophic backtracking. Developing a deeper understanding of the issue, and regular expressions in general, is often the best solution. Whenenever you're working with regular expressions you must always ask yourself if this is a potentially catastrophic expression.

Rationale

Catastrophic backtracking will often lead to denial-of-service. Catastrophic cases may take days, weeks, or years to complete which may leave your service degraded or unusable.

Exceptions

  • Nested quantifiers with small maximums may be okay (e.g. {1,3}). However, sensible runtimes for your code are application-dependent. Even with small maximums the runtime will depend on many factors including subject length, machine hardware, and repetition size. Proceed with nested quantifiers at your own risk.

Debugging

The Dlint ReDoS module (dlint.redos) comes with a CLI interface for quick debugging and testing. E.g.

$ python -m dlint.redos --pattern '(.|[abc])+z'
('(.|[abc])+z', True)
$ python -m dlint.redos --pattern '.+z'
('.+z', False)

You can dump the pattern in a more digestible format:

$ python -m dlint.redos --pattern '(.|[abc])+z' --dump
MAX_REPEAT 1 MAXREPEAT
  SUBPATTERN 1 0 0
    BRANCH
      ANY None
    OR
      IN
        LITERAL 97
        LITERAL 98
        LITERAL 99
LITERAL 122

Or dump the internal tree representation that Dlint uses:

$ python -m dlint.redos --pattern '(.|[abc])+z' --dump-tree
None: ()
  MAX_REPEAT: (1, MAXREPEAT)
    SUBPATTERN: (1, 0, 0)
      BRANCH: ()
        ANY: (None,)
        IN: ((LITERAL, 97), (LITERAL, 98), (LITERAL, 99))
  LITERAL: (122,)

You can also extend this functionality to analyze patterns from other regular expression modules or languages by piping a pattern to this interface:

$ echo -n '(.|[abc])+z' | python -m dlint.redos --pattern -
('(.|[abc])+z', True)