The Zulip implementation of tusd involves using a pre-finish hook to create some data structures in our database (used to manage quota, track the messages referencing the file, etc.), verify that image files are thumbnailable, and kick off a worker job to begin thumbnailing images.
The problem I'm seeing is that the tusd protocol allows the client to delete the upload via a DELETE request at any time, including after an upload has completed, and tusd implements that by first removing the file from the backing storage and then notifying the application via the post-terminate hook. A specific case where this occurred involved nginx deciding the client had gone away when tusd was trying to write the response in the finish stage, serving a 499, which appears to have caused uppy to ask to delete/terminate the upload? At least, that's what we can piece together from logs. The end result is that our worker job for thumbnailing images sees a corrupt state where the application database thinks the file exists, but it's been deleted from the backing store by tusd.
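To make the ordering concrete, here is a deterministic Python simulation of the sequence described above: the file is removed first, and the application is notified afterwards. It is only a sketch. The temporary directory standing in for the backing store, the dict standing in for our database, and all function names are hypothetical. None of this is tusd or Zulip code.

```python
import os
import tempfile

# Hypothetical stand-ins: a temp directory plays the backing store and a
# dict plays the application database (upload ID -> file path).
store_dir = tempfile.mkdtemp()
db: dict[str, str] = {}


def finish_upload(upload_id: str, data: bytes) -> None:
    """Upload completes; the pre-finish hook records it in the database."""
    path = os.path.join(store_dir, upload_id)
    with open(path, "wb") as f:
        f.write(data)
    db[upload_id] = path


def tusd_delete_from_store(upload_id: str) -> None:
    """Termination, step 1: the file is removed from the backing store."""
    os.remove(os.path.join(store_dir, upload_id))


def post_terminate_hook(upload_id: str) -> None:
    """Termination, step 2: only now is the application notified."""
    db.pop(upload_id, None)


def thumbnail_worker(upload_id: str) -> str:
    """Worker job queued by pre-finish; it trusts the database."""
    path = db.get(upload_id)
    if path is None:
        return "skipped: upload no longer in database"
    try:
        with open(path, "rb") as f:
            f.read()
    except FileNotFoundError:
        return "inconsistent: database row exists but file is gone"
    return "thumbnailed"


finish_upload("abc", b"fake image bytes")
tusd_delete_from_store("abc")   # DELETE request handled by tusd
print(thumbnail_worker("abc"))  # worker runs inside the race window
post_terminate_hook("abc")
print(thumbnail_worker("abc"))
```

The first worker call lands in the window between the two termination steps and prints the "inconsistent" message, which is exactly the state we observed. The second call, after post-terminate has run, is skipped cleanly.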
I don't see a good way to implement this aspect of the protocol in a way that is safe from races where the application database is in a corrupt state. Options that we've considered include:
- Enabling the `--disable-terminate` option on the tusd command line. This is not ideal: I think it would prevent all terminations, including those of partially uploaded files? I'm also not sure, after some source-diving, whether clients like uppy can be configured to work with that setting.
- Having the post-terminate hook clean up the database structures associated with the upload after tusd has deleted the file. The problem is that this happens after the file has been removed from the backing storage, so there is a fundamental race: the database thinks the file exists, but it is missing when other application code (for thumbnailing or whatever) tries to process it. As a result, file-not-found exceptions, which would normally be very scary, become routine.
- Deferring visible side effects, like queuing the file for thumbnailing, to the post-finish hook. This might limit our exposure to the race, but it doesn't solve the problem that tusd might, at any time, delete already-uploaded files from the backing storage before our data structures have been updated.
I see the following options for how tusd could support a race-free application implementation:
- Recommended: Add a pre-terminate hook that the application can use either to reject the deletion request (say, if the user doesn't have permission to delete the file for whatever reason) or to tell tusd that it has already deleted the file from the backing store as part of its own processing, so tusd does not need to do so itself. This would let that hook use our existing application-layer code to delete uploaded files safely (using two-phase commit or other standard techniques to avoid invariant violations).
- Add an option to disallow tusd from deleting/terminating an upload after the pre-finish hook has completed successfully.
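A minimal sketch of how the application side of the recommended pre-terminate hook could look. Because the hook is only being proposed here, the `TerminateDecision` fields, the `pre_terminate` signature, and the `delete_via_app` callback are all hypothetical. The point is the three-way decision: reject, handled by the application, or let tusd delete.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class TerminateDecision:
    """Hypothetical pre-terminate hook result (field names are made up)."""
    reject: bool = False                  # refuse the DELETE request
    handled_by_application: bool = False  # app deleted the file; tusd should skip it
    status: int = 204                     # HTTP status to return to the client


def pre_terminate(
    upload_id: str,
    requester_id: int,
    owners: dict[str, int],
    finished: set[str],
    delete_via_app: Callable[[str], None],
) -> TerminateDecision:
    # Reject deletions by anyone other than the uploader.
    if owners.get(upload_id) != requester_id:
        return TerminateDecision(reject=True, status=403)
    if upload_id in finished:
        # Completed upload: the application owns database rows and queued
        # jobs for it, so route the deletion through its own safe code path.
        delete_via_app(upload_id)
        return TerminateDecision(handled_by_application=True)
    # Partial upload: no application state exists yet, so tusd can delete it.
    return TerminateDecision()
```

The second option (refusing to terminate once pre-finish has succeeded) would correspond to returning a rejection in the `finished` branch instead of calling `delete_via_app`.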
But maybe I'm misunderstanding something about the protocol -- is there a way that we can close this race?
That's a valid point! Access to uploaded files during the pre-finish hook is protected by tusd's internal locking system. The pre-finish hook runs as part of handling the PATCH request, and the lock is only released once that request has been fully handled. Deleting an upload requires acquiring the corresponding lock, which prevents races between pre-finish and termination requests.
However, post-finish hooks and worker jobs started by pre-finish are, of course, not protected by these locks. This leads to the situation you observe, where a user can delete an upload while your application is still processing it. Uppy does send a termination request in a few situations. For example, as far as I know, when the user cancels an upload, Uppy terminates the tus upload. You might want to double-check with the Uppy project whether that's correct and whether it can be configured. I don't know, though, how a 499 response from Nginx could trigger this.
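The protection described above can be modelled with one lock per upload ID. Pre-finish runs while the PATCH request still holds the lock, so a DELETE cannot interleave with it, but a worker that runs after the request completes has no such protection. This is a simplified Python model, not tusd's actual locker implementation, and the function names are hypothetical.

```python
import threading
from collections import defaultdict

# Simplified model, not tusd's real locker: one lock per upload ID.
locks = defaultdict(threading.Lock)
events = []


def handle_patch_final_chunk(upload_id, pre_finish):
    """Final PATCH request: the lock is held for the whole request."""
    with locks[upload_id]:
        events.append("write final chunk")
        pre_finish(upload_id)  # runs while the lock is still held
    # Lock released here; post-finish hooks and queued workers run unprotected.


def try_terminate(upload_id):
    """DELETE request: must acquire the upload's lock before deleting."""
    lock = locks[upload_id]
    # Non-blocking so the simulation is deterministic; a real server would
    # typically wait for the lock rather than fail immediately.
    if not lock.acquire(blocking=False):
        return False
    try:
        events.append("delete from store")
        return True
    finally:
        lock.release()


results = {}


def pre_finish(upload_id):
    # Simulate a DELETE arriving while pre-finish is still running. It is
    # called on the same thread here; the non-reentrant lock still counts as held.
    results["during_pre_finish"] = try_terminate(upload_id)


handle_patch_final_chunk("abc", pre_finish)
# Later, e.g. while a thumbnailing worker is reading the file:
results["during_worker"] = try_terminate("abc")
print(results)  # {'during_pre_finish': False, 'during_worker': True}
```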
With all of that in mind, we still need a solution. A pre-terminate hook that gives the user control over file termination is a useful tool, but it also shifts the responsibility for preventing data races onto the user, which is not ideal. A more general solution might be better suited to situations where the application is accessing uploaded files and needs to prevent concurrent access by tusd. I can think of three typical situations where this happens:
1) The application processes an uploaded file, and tusd is instructed to terminate the upload (the case you describe here).
2) The application modifies or deletes an uploaded file, and tusd attempts to read the upload for a concatenation operation.
3) The application wants to clean up an unfinished upload, but the user wants to upload to it at the same time.
With a pre-terminate hook (and custom logic) or an option to disable termination of completed uploads, you can prevent 1), but not the others.
I wonder if it's sensible to allow applications to interact with tusd's locks. For example, before an application accesses an uploaded file, it asks tusd to acquire the lock/lease for the corresponding upload resource. This prevents concurrent access by tusd, so no data races are possible. Once the application is done processing, the lock/lease can be released again. This might be a bit more involved, but it should cover all cases. What do you think?
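A minimal sketch of the proposed lease semantics, assuming both tusd's request handlers and the application's workers take the same per-upload lease before touching the file. `UploadLeases`, `lease()` and `terminate()` are hypothetical names used to illustrate the idea; they are not an existing tusd API.

```python
import threading
from collections.abc import Iterator
from contextlib import contextmanager


class UploadLeases:
    """Hypothetical per-upload lease registry (not an existing tusd API)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, threading.Lock] = {}

    def _lock_for(self, upload_id: str) -> threading.Lock:
        with self._guard:  # protect the registry itself
            if upload_id not in self._locks:
                self._locks[upload_id] = threading.Lock()
            return self._locks[upload_id]

    @contextmanager
    def lease(self, upload_id: str, timeout: float = 5.0) -> Iterator[None]:
        """Hold the upload's lease for the duration of the with-block."""
        lock = self._lock_for(upload_id)
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"upload {upload_id} is busy")
        try:
            yield
        finally:
            lock.release()


leases = UploadLeases()
deleted = []


def terminate(upload_id: str) -> bool:
    """What a DELETE handler would do under this proposal."""
    try:
        with leases.lease(upload_id, timeout=0.05):
            deleted.append(upload_id)
            return True
    except TimeoutError:
        return False  # e.g. respond with an error and let the client retry


with leases.lease("abc"):
    # Application (e.g. the thumbnail worker) is processing the file here.
    busy = terminate("abc")  # concurrent DELETE cannot proceed
done = terminate("abc")      # lease released, deletion goes ahead
print(busy, done, deleted)   # False True ['abc']
```

In a real deployment tusd and the application's workers run in separate processes, so a lease like this would have to be exposed across process boundaries (for example over HTTP or through a shared lock store). The in-process lock here only illustrates the semantics.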