Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: deadlock while racing ipfs dag import and ipfs repo gc #9755

Merged
merged 1 commit into from
Mar 26, 2023

Conversation

Jorropo
Copy link
Contributor

@Jorropo Jorropo commented Mar 25, 2023

This fixes a deadlock introduced in 1457b4f.

We can't use the coreapi here because it will try to take the PinLock (RLock) again, so revert this small part of 1457b4f.

This used cause a deadlock when concurrently running ipfs dag import concurrently with the GC.

The bug is that ipfs dag import takes an RLock with the PinLock. the cars are imported, leaving a wide window of time Then GC Takes a Lock on that same RWMutex while taking the GC Lock (it blocks because it waits for the RLock to be released). Then the car imports are finished and ipfs dag import tries to aqcuire the PinLock (doing an RLock) again in Api().Pin.

However at this point the RWMutex is starved, the runtime put a fence in front of RLocks if a Lock has been waiting for too lock (else you could have an endless stream of RLock / RUnlock forever delaying a Lock to ever go through).

The issue is that ipfs dag import's original RLock which is blocking everyone will be released once it returns, which only happens when Api().Pin completes.

So we have a deadlock (ABA kind ?), because ipfs dag import waits on the GC Lock, which waits on ipfs dag import.

Calling the Pinner directly does not acquire the PinLock again, and thus does not have this issue.

@Jorropo Jorropo added the P0 Critical: Tackled by core team ASAP label Mar 25, 2023
@Jorropo Jorropo self-assigned this Mar 25, 2023
@Jorropo Jorropo added the kind/bug A bug in existing code (including security flaws) label Mar 25, 2023
This fixes a deadlock introduced in 1457b4f.

We can't use the coreapi here because it will try to take the PinLock (RLock) again, so revert this small part of 1457b4f.

This used cause a deadlock when concurrently running `ipfs dag import` concurrently with the GC.

The bug is that `ipfs dag import` takes an RLock with the PinLock.
*the cars are imported, leaving a wide window of time*
Then GC Takes a Lock on that same RWMutex while taking the GC Lock (it  blocks because it waits for the RLock to be released).
Then the car imports are finished and `ipfs dag import` tries to aqcuire the PinLock (doing an RLock) again in `Api().Pin`.

However at this point the RWMutex is starved, the runtime put a fence in front of RLocks if a Lock has been waiting for too lock (else you could have an endless stream of RLock / RUnlock forever delaying a Lock to ever go through).

The issue is that `ipfs dag import`'s original RLock which is blocking everyone will be released once it returns, which only happens when `Api().Pin` completes.

So we have a deadlock (ABA kind ?), because `ipfs dag import` waits on the GC Lock, which waits on `ipfs dag import`.

Calling the Pinner directly does not acquire the PinLock again, and thus does not have this issue.
@Jorropo Jorropo requested review from a team and removed request for a team March 25, 2023 10:38
Copy link
Contributor

@MichaelMure MichaelMure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me as a fix.
Imho though, the real way would be to change CoreApi to allow more control over those things. The current API is quite limiting.

@Jorropo
Copy link
Contributor Author

Jorropo commented Mar 26, 2023

Imho though, the real way would be to change CoreApi to allow more control over those things. The current API is quite limiting.

Without seeing code I think this is a bad idea, how will that manifest takeLock bool argument ?

I think it make sense to have multiple APIs for internal and external things.

There very well could be some better solution I'm missing here tho.

@Jorropo Jorropo merged commit bb020ea into master Mar 26, 2023
@Jorropo Jorropo deleted the fix/dag-import-deadlock branch March 26, 2023 03:40
@MichaelMure
Copy link
Contributor

@Jorropo alternatively, that dag/import code should be integrated inside the CoreApi, kinda like unixfs/adder ? From a quick code search, it seems that dag/import is the only one that stands out outside of the CoreApi perimeter while acting on the pinner (ignoring things like initializeIpnsKeyspace).

Fixing that would allow using properly the internal APIs, take the lock properly ... and let me override pinner and blockstore properly :-D

@Jorropo
Copy link
Contributor Author

Jorropo commented Mar 27, 2023

Make sense, make it take a []io.Reader or some non go-cmds-ipfs iterable interface instead, feel free to send a PR (I would wait until ipfs/boxo#220 and #9746 are merged tho).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug A bug in existing code (including security flaws) P0 Critical: Tackled by core team ASAP
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants