Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NonBacktracking engine is pathologically slow on patterns like "((((((((a)*)*)*)*)*)*)*" and input like "aaaaaaa" which other engines handle fine #84188

Open
stephentoub opened this issue Mar 31, 2023 · 7 comments
Assignees
Labels
area-System.Text.RegularExpressions disabled-test The test is disabled in source code against the issue
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Mar 31, 2023

Build Information

Build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=369767
Build error leg or test failing: System.Text.RegularExpressions.Tests.RegexMatchTests.StressTestDeepNestingOfLoops
Pull request: #90318

Known Issue Error Message

{
  "BuildRetry": false,
  "ErrorMessage": "System.Text.RegularExpressions.Tests.RegexMatchTests.StressTestDeepNestingOfLoops(engine: NonBacktracking",
  "ExcludeConsoleLog": false
}

Report

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 0

Known issue validation

Build: 🔎
Result validation: ⚠️ Provided build not found. Provide a valid build in the "Build: 🔎" line.
Validation performed at: 8/22/2023 7:37:08 PM UTC

@stephentoub stephentoub added area-System.Text.RegularExpressions test-bug Problem in test source code (most likely) labels Mar 31, 2023
@stephentoub stephentoub added this to the 8.0.0 milestone Mar 31, 2023
@ghost
Copy link

ghost commented Mar 31, 2023

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

At least it is for me locally.

Author: stephentoub
Assignees: olsaarik
Labels:

area-System.Text.RegularExpressions, test-bug

Milestone: 8.0.0

@danmoseley danmoseley changed the title Outerloop StressTestDeepNestingOfLoops NonBacktracking test is deterministically timing out NonBacktracking engine is pathologically slow on patterns like "((((((((a)*)*)*)*)*)*)*" and input like "aaaaaaa" which other engines handle fine Jun 11, 2023
@danmoseley
Copy link
Member

danmoseley commented Jun 11, 2023

from PR:

It's just yield return new object[] { engine, "(", "a", ")*", "a", 2000, 1000 }; that is pathological.

pattern repetition input repetition Interpreted Compiled NonBacktracking
1 1000 0.01 0.18 0.29
10 1000 0.019 0.32
15 1000 0.01 0.019 1.62
17 1000 0.011 0.019 5.93
20 1000 0.011 0.0012 > 60s
40 1000 0.011 0.001 > 60s
2000 1000 0.39 0.37 > 60s
2000 10 0.39 0.36 10.8
2000 1 0.38 0.37 10.7

On the other two variants, NonBacktracking takes less than a second (but still slower than the others)

@danmoseley
Copy link
Member

cc @olsaarik

@danmoseley
Copy link
Member

danmoseley commented Jun 11, 2023

top of the stack looks something like

                        Thread #7 (OS 0x4650) [Thread pool worker] [Background] [MTA]
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexInfo.Create(Boolean, Boolean, Bool
  ean, Boolean, Boolean, Boolean, Boolean, Boolean)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexInfo.Concat(System.Text.RegularExp
  ressions.Symbolic.SymbolicRegexInfo, System.Text.RegularExpressions.Symbolic.SymbolicRegexInfo)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].CreateConcat(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>, System.Text.Regula
  rExpressions.Symbolic.SymbolicRegexNode`1<UInt64>, System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1<UInt64
  >)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1[[System.UInt64, System.P
  rivate.CoreLib]].CreateConcat(System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1<UInt64>, System.Text.Regula
  rExpressions.Symbolic.SymbolicRegexNode`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)

.... lots of frames ...

                                System.Text.RegularExpressions.Symbolic.SymbolicRegexNode`1[[System.UInt64, System.Priv
  ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder`1<UInt64>)
                                System.Threading.StackHelper+<>c__DisplayClass8_0`2[[System.__Canon, System.Private.Cor
  eLib],[System.__Canon, System.Private.CoreLib]].<CallOnEmptyStack>b__0()
                                System.Threading.Tasks.Task`1[[System.__Canon, System.Private.CoreLib]].InnerInvoke()
                                System.Threading.Tasks.Task+<>c.<.cctor>b__278_0(System.Object)
                                System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread
  , System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
                                System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, S
  ystem.Threading.Thread)
                                System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread)
                                System.Threading.Tasks.Task.ExecuteFromThreadPool(System.Threading.Thread)
                                System.Threading.ThreadPoolWorkQueue.DispatchWorkItem(System.Object, System.Threading.T
  hread)
                                System.Threading.ThreadPoolWorkQueue.Dispatch()
                                System.Threading.PortableThreadPool+WorkerThread.WorkerDoWork(System.Threading.Portable
  ThreadPool, Boolean ByRef)
                                System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
                                System.Threading.Thread+StartHelper.RunWorker()
                                System.Threading.Thread+StartHelper.Run()
                                System.Threading.Thread.StartCallback()
                                [DebuggerU2MCatchHandlerFrame]

Note that stack helper has switched threads to avoid a stack overflow. There appears to be recursion in System.Text.RegularExpressions.Symbolic.SymbolicRegexNode``1[[System.UInt64, System.Priv ate.CoreLib]].StripEffects(System.Text.RegularExpressions.Symbolic.SymbolicRegexBuilder``1<UInt64>) that we should avoid for efficiency's sake, and that may possibly fix the performance problem on its own.

StripEffects seems to be going down the SymbolicRegexNodeKind.Concat branch of the switch. There are other branches. Can the whole thing be made non recursive?

Aside, there are ~20 places where we're protected with StackHelper.CallOnEmptyStack. Do any of those trigger on a reasonable regex of some kind?

@danmoseley danmoseley removed the test-bug Problem in test source code (most likely) label Jun 11, 2023
@danmoseley
Copy link
Member

danmoseley commented Jun 11, 2023

BTW, I think I'd expect to see it in case SymbolicRegexNodeKind.Loop: so perhaps I'm misreading the stack. The structure of the RegexTree is thus:

((((((((((((((((((((a)*)*)*)*)*)*)*)*)*)*)*)*)*)*)*)*)*)*)*)* 

Capture index = 0
  Atomic
    Loop*
      Capture index = 1
        Loop*
          Capture index = 2
            Loop*
              Capture index = 3
                Loop*
                  Capture index = 4
                    Loop*
....
                      Capture index = nnn
                        One 'a'

I'll let the area expert take this further 😄

@stephentoub stephentoub added the disabled-test The test is disabled in source code against the issue label Jun 13, 2023
@steveharter steveharter added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab labels Aug 10, 2023
@steveharter
Copy link
Member

steveharter commented Aug 10, 2023

This is still occurring; https://github.com/dotnet/runtime/pull/87369/files did not address all cases.

Adding known issue template to offload from the too-general-purpose error message in #84323.

@steveharter steveharter modified the milestones: 8.0.0, 9.0.0 Aug 14, 2023
@build-analysis build-analysis bot removed this from the 9.0.0 milestone Nov 15, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 15, 2023
@riarenas riarenas added this to the 9.0.0 milestone Nov 17, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Nov 17, 2023
@jeffschwMSFT jeffschwMSFT removed the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Feb 16, 2024
@jeffschwMSFT
Copy link
Member

removing blocking-clean-ci as it has not failed in 30 days

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 0 0

@steveharter steveharter removed the Known Build Error Use this to report build issues in the .NET Helix tab label Mar 20, 2024
@stephentoub stephentoub removed this from the 9.0.0 milestone Jun 27, 2024
@stephentoub stephentoub added this to the Future milestone Jun 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Text.RegularExpressions disabled-test The test is disabled in source code against the issue
Projects
None yet
Development

No branches or pull requests

6 participants