Prevent SqlConnection from being held past an async boundary #22658
Conversation
I think the pool must still be upper bounded. There is no reason for it to be unbounded and cache 100 connections when most of the time 80 of them are just sitting there doing nothing.
Oops, button clicked by mistake, sorry.
By the way, how many connections were there? I'm surprised that even though connections were held across async boundaries, it led to that many concurrent connections being needed.
The pool has a soft upper bound of the number of managed threads. I was actually originally planning to add a second commit to add a hard limit on the pool itself, but realized there's no longer a code path to do anything interesting there. The specific limit is the number of threads attempting to access the database concurrently. Given the threading mode, this is a good limit. We could further reduce resources over time by using an MRU and cache eviction, but I don't believe it will get us any further than we made it so far. Plus there seem to be other areas of the SQLite cache that could be improved first.
The issue was found in a heap that had 2740 connections. The screenshot for the issue was from another heap with 3429. I've seen OOM heap dumps with ~900 all the way up to a winner of ~7000.
I still don't see why an upper limit is bad. If you really don't want one, put a crash there, so that if the limit goes higher than we expect, we explicitly crash VS.
Hmm... I still can't understand how there can be 7000 pooled connections. Are we having a thread starvation issue again?
@heejaechang There is no need to crash VS since it typically manifests as UI delays, etc. The telemetry team has new red flags in place and will alert us if they see low-memory conditions with a large number of these instances. Also, the use of the shared cache flag (which we should talk about separately) should prevent the cache from growing even if the number of connections does.
We don't allow reads while writes are pending. Previously the read operations would claim a connection before waiting (asynchronously) for pending operations to complete. Now read operations return the connection to the pool before waiting.
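The "return the connection before waiting" idea can be sketched outside of Roslyn. Below is a minimal Python/asyncio illustration (the `Pool` class and all names are hypothetical, not Roslyn's actual code): when readers wait for pending writes *before* acquiring a connection, the pool's growth tracks actual concurrent use rather than the number of waiters.

```python
import asyncio

class Pool:
    """Toy connection pool: tracks how many 'connections' were ever created."""
    def __init__(self):
        self.free = []      # idle connections, ready for reuse
        self.created = 0    # total connections ever opened

    def acquire(self):
        if self.free:
            return self.free.pop()
        self.created += 1
        return object()     # stand-in for a real database connection

    def release(self, conn):
        self.free.append(conn)

async def read(pool, writes_done):
    # Old behavior: acquire() here, THEN await -> every waiting reader
    # holds a connection across the async boundary.
    # New behavior: await first, acquire only for the synchronous read.
    await writes_done.wait()    # no connection held while waiting
    conn = pool.acquire()
    try:
        pass                    # the actual (synchronous) read would go here
    finally:
        pool.release(conn)      # returned to the pool immediately

async def main():
    pool = Pool()
    writes_done = asyncio.Event()
    readers = [asyncio.create_task(read(pool, writes_done)) for _ in range(100)]
    await asyncio.sleep(0)      # let every reader start waiting
    writes_done.set()           # "pending writes" complete
    await asyncio.gather(*readers)
    return pool.created

created = asyncio.run(main())
print(created)  # far fewer than the 100 connections the old pattern would pin
```

With the old ordering, all 100 readers would hold a connection for the full duration of the wait, so the pool would grow to 100.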
In other words, when FAR starts a per-document search, if the SyntaxTreeIndex is being written, then there will be a connection per document?
I don't understand. If we are sure connections won't be pooled to 7000, then we should put that assumption in the code; if we are not sure, we should guard against it. Otherwise, we are just hiding the problem.
```diff
@@ -53,7 +53,7 @@ public static SqlConnection Create(IPersistentStorageFaultInjector faultInjector

     // Enable shared cache so that multiple connections inside of same process share cache
     // see https://sqlite.org/threadsafe.html for more detail
-    var flags = OpenFlags.SQLITE_OPEN_CREATE | OpenFlags.SQLITE_OPEN_READWRITE | OpenFlags.SQLITE_OPEN_SHAREDCACHE;
+    var flags = OpenFlags.SQLITE_OPEN_CREATE | OpenFlags.SQLITE_OPEN_READWRITE | OpenFlags.SQLITE_OPEN_NOMUTEX | OpenFlags.SQLITE_OPEN_SHAREDCACHE;
```
Why this flag?
This flag says we are only using connection instances on one thread at a time. We were already doing this, but since we didn't specify the flag the database fell back to SQLITE_OPEN_FULLMUTEX which added a second layer of locks.
With the current implementation, if we hit 100 at one point and never hit more than 20 again, the pool still holds onto 100 connections, wasting 80. Eviction and so on are improvements to the pool; we can remove the upper limit when we implement them. There is no reason to skip the limit now just because we might do eviction in the future, especially when putting in an upper limit is so simple. ... Also, I still can't understand how there are 7000 pooled connections. Are we sure we are not leaking something?
I don't know enough about the reasons why we read from the database to answer this directly. However, I do know two things:
There are some edge cases which make it very difficult to predict the hard upper bound. I'm willing to implement telemetry and perhaps a time-based pool eviction policy so we are aware of regressions, but to me it's not worth causing a user to crash (coded assertions) or severe performance degradation (SQL connections getting created due to exceeding pool limits).
Are you sure this causes severe performance degradation? Currently we have evidence that pooling causes severe degradation (OOM), but not the other way around. Especially with the change you made here, I think an upper limit won't affect perf as much.
By the way, any pool or cache that doesn't have an upper limit is, I think, a code smell.
I agree. But one step at a time is good, especially on a rushed change like this.
As for the original case: when developing #22340 I had file logging enabled, and I saw a large chunk of connections running amok (I had a log file for each pid+thread_id combo on connection open, so I noticed lots of files quickly appearing), but it appears that having a fast CPU/disk prevents process death. I didn't report it in #22340 because I believed it was the "normal" mode of operation, and I saw it even without the proposed locking changes. As for the connection pool limit, I'd put a cap on it. Crashing is of course not an option, but logging (not sure what policy Roslyn has for these kinds of errors) plus a wait queue or reverting to a non-pooled approach looks fine to me (yes, it will slow things down, and it would be another case to solve, but it's still an improvement over crashing).
Given that we use more than one connection object, it appears
There is a soft cap in place equivalent to the maximum number of active threads concurrently accessing the database. Given that the threads are generally within the .NET thread pool, this is loosely related to the number of logical processors on the machine, but can vary a bit from that. I would not expect this to ever exceed the number of logical processors times two, and plan to verify this through telemetry.
Shared cache is beneficial to us since it keeps each connection from being so expensive.
Shared cache also causes concurrent write attempts on separate connection objects to block on each other. I'm hoping this is rare, but we should be watching to see if performance reports deviate from expectation. For now it is acting as a secondary safeguard against runaway memory problems from the connection pool growing (and for the 15.5 Preview 1 release, it will be the only remediation in place).
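The effect of shared cache can be observed with Python's built-in `sqlite3` module (this is an illustration of the SQLite feature, not the Roslyn code; the database name `demo` is arbitrary): two connections opened with `cache=shared` share one cache and one set of tables instead of each holding a private page cache.

```python
import sqlite3

# Two connections to one named in-memory database: with cache=shared in
# the URI, both connections share a single cache (and see the same data)
# instead of each holding its own private page cache.
a = sqlite3.connect("file:demo?mode=memory&cache=shared", uri=True)
b = sqlite3.connect("file:demo?mode=memory&cache=shared", uri=True)

a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (42)")
a.commit()

rows = b.execute("SELECT x FROM t").fetchall()
print(rows)  # [(42,)]
```

The same sharing is what makes concurrent writers on separate connections contend with each other, as noted above.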
By the way, with exclusive mode and shared cache, we might no longer need to wait for the writer, since only one process uses the db and it doesn't require the file system or a mutex to communicate the db's state with other processes or other connections.
I'm not sure how writes can fail to be serialized; I think WAL basically does that for us. It was an issue before, since multiple processes could write to the db, but I believe that is no longer possible?
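For reference, enabling WAL is a one-line pragma, observable with Python's `sqlite3` (illustration only; the path is a throwaway temp file):

```python
import os
import sqlite3
import tempfile

# WAL journaling serializes writes inside SQLite itself while allowing
# readers to proceed concurrently with a writer on the same file.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # "wal"
```

Note that WAL requires a file-backed database; in-memory databases stay in `memory` journal mode.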
Anyway, okay, if other people are okay with not having an upper limit.
The problem was simply that it was effectively hundreds of thousands of transactions. Each write by some component into the persistence layer needed the full cost of being handled by sqlite properly. A single transaction that inserts 10k rows is way better than 10k transactions that insert one row each. Even with WAL that remains true. If there's a way to configure sqlite to make it the same, then by all means switch :) But I could find nothing that would give the same perf. -- To repeat: I would love to remove the buffering layer :) It's just added complexity for a perf win. If we can get the perf win without it, be my guest :D
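The batching point above can be demonstrated with any SQLite binding; a Python sketch (illustrative only, using the stdlib `sqlite3` module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")

# Insert 10k rows inside ONE transaction: a single commit at the end,
# instead of paying full per-transaction overhead 10,000 times.
with conn:  # the context manager wraps the block in one transaction
    conn.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(10_000)))

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 10000
```

Committing once per row would force SQLite to sync the journal for each statement, which is the per-transaction cost the buffering layer avoids.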
I searched a bit, and it looks like what you did is the recommendation. :)
Force-pushed (… dotnet/dev15.4.x) from 3db065c to 6c66464
@heejaechang Rebased this on dev15.4.x, and included #21532. @jasonmalinowski This should be the first PR we were after.
I think we should bring this in too (#22340)
@heejaechang After this is merged, we will rebase #22340 and bring it in as a separate change.
A word of caution. It appears that the problems I encountered with #22340 are related not to exclusive locking but to shared cache. With shared cache enabled, VS sometimes crashes (stack trace below), with SQLite returning
@onyxmaster Sounds like I need to bring #21800 back as well, and then update it to treat BUSY and LOCKED the same way.
Force-pushed (…to dotnet/dev15.4.x; "Be resilient to getting busy messages back from sqlite while retrieving data") from 6c66464 to 8c63dc6
This is now updated to include #21800. I then modified the two locations which handle SQLITE_BUSY to also handle SQLITE_LOCKED as a second return value with the same impact on our code.
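The busy/locked handling described here can be sketched with Python's `sqlite3`, where both conditions surface as `OperationalError`. The `execute_with_retry` helper below is a hypothetical illustration of treating the two conditions the same way, not the Roslyn implementation:

```python
import os
import sqlite3
import tempfile
import time

def execute_with_retry(conn, sql, params=(), retries=5, delay=0.05):
    """Hypothetical helper: treat 'busy' and 'locked' errors the same way."""
    last_error = None
    for _ in range(retries):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as e:
            message = str(e)
            if "locked" not in message and "busy" not in message:
                raise  # unrelated error: don't swallow it
            last_error = e
            time.sleep(delay)
    raise last_error

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, timeout=0)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

reader = sqlite3.connect(path, timeout=0)

writer.execute("BEGIN IMMEDIATE")  # writer holds the write lock
try:
    reader.execute("INSERT INTO t VALUES (1)")
    blocked = False
except sqlite3.OperationalError:
    blocked = True  # SQLITE_BUSY/LOCKED surfaces as OperationalError
print(blocked)  # True

reader.rollback()   # drop the reader's failed transaction (and its locks)
writer.commit()     # release the writer's lock
execute_with_retry(reader, "INSERT INTO t VALUES (1)")
reader.commit()
```

With a `timeout` of 0, the second connection fails immediately instead of invoking SQLite's built-in busy handler, which makes the contention visible.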
```csharp
    return true;
}
catch (SqlException e) when (e.Result == Result.BUSY || e.Result == Result.LOCKED)
```
I'm a little nervous about this. Do we need to make the change that causes these 'locked' errors to start happening? What gain are we getting to warrant it?
My specific concern is that this now means that many more SQL calls may have to be protected from locked errors, where we didn't have to worry about that before. There may be more places that are missing this protection and may end up leading to actual application errors in the wild.
My understanding is that BUSY and LOCKED are similar, and the one getting reported depends on the current connection configuration. The original configuration produced BUSY errors (which we saw in Watson), but the change to use shared and exclusive locks (required for follow-up work) changed a subset of those cases to return LOCKED instead.
@CyrusNajmabadi I'd like to note that `busy` states are almost the same as `locked`, so the argument about unknown places in code that may fail is relevant to `busy` as well.
Update: didn't see Sam's comment before posting, I agree with it.
If they are effectively 1:1, then I'm OK with things. My concern was that 'locked' might show up in different places where 'busy' would not have. If you guys are convinced that that's not an issue, then I'm definitely OK with this. Thanks!
Minor note: #22340 no longer proposes the use of exclusive locking (and the dependence on shared cache), so at least part of the discussion might need reconsideration.
Approved to merge via 507562 |
Fixes #22650
Rather than place a hard cap on the number of connections, I updated the code to ensure database connections are only held for the duration where they are actually in use.
CC: @dotnet/roslyn-ide @heejaechang @CyrusNajmabadi @onyxmaster
Customer scenario
A customer opens a large solution with 15.3 or 15.4. The IDE is at high risk for encountering OOM errors.
Bugs this fixes:
#22650
DevDiv 507560
Workarounds, if any
None
Risk
Relatively low.
Performance impact
For scenarios where database writes to disk are slow, performance is improved by allowing database connection sharing.
Is this a regression from a previous update?
Yes.
Root cause analysis:
As a best guess: treat `SqlConnection` and `sqlite3` objects as heavier to account for this, similar to what we already did for `MemoryMappedView`.
How was the bug found?
The 425600 byte allocation pattern was discovered by Lee Culver on the .NET Fundamentals team. Text contents in some of these blocks made me suspicious of SQLite, leading me to search for "425600 SQLite". The value appeared twice in interesting results:
https://forums.plex.tv/discussion/256008/pms-crashes-randomly-under-load
http://iamoracleadmin.blogspot.com/p/sqlite-cheat-sheet.html
I then used PerfView to review the number of connections present in the managed heap, and found it exactly matched the number of native allocations.
Test documentation updated?
N/A