Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't swallow errors coming from the shareSession call #2908

Closed
wants to merge 8 commits into from

Conversation

poljar
Copy link
Contributor

@poljar poljar commented Nov 24, 2022

This PR intends to fix element-hq/element-web#23792.

What should have been an easy fix sadly turned out to be a deciphering of the logic here. It seems to me that the ensureOutboundSession() has a bug which would wedge the outbound group session in the case of an error.

This wasn't problematic until now, since almost no errors happen in the prepareSession() call, but as we found out may happen in the shareSession call.

The first patch fixes the mentioned bug, aligning the code with the comments.
The second patch now propagates errors from the shareSession call to the caller.

The individual commits have even more info around what each of them does and means.

I tested this in Element Web, if there's an error in shareSession we get prompted to retry the sending of the event. If the error happened to be transient, a press of the retry button will successfully share the outbound session and send the event.

This closes: element-hq/element-web#23792.


Here's what your changelog entry will look like:

🐛 Bug Fixes

@poljar poljar requested a review from a team as a code owner November 24, 2022 14:15
@richvdh richvdh changed the title Don't swallow errors comming from the shareSession call Don't swallow errors coming from the shareSession call Nov 25, 2022
src/crypto/algorithms/megolm.ts Outdated Show resolved Hide resolved
src/crypto/algorithms/megolm.ts Outdated Show resolved Hide resolved
src/crypto/algorithms/megolm.ts Outdated Show resolved Hide resolved
src/crypto/algorithms/megolm.ts Outdated Show resolved Hide resolved
@poljar poljar requested a review from richvdh November 30, 2022 10:12
@richvdh richvdh removed the request for review from alunturner November 30, 2022 12:56
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@poljar
Copy link
Contributor Author

poljar commented Dec 1, 2022

Force pushed to get rid of the fixup commits.

@github-actions github-actions bot deployed to PR Documentation Preview December 1, 2022 15:18 Active
The ensureOutboundSession users and modifies the setupPromise of the
MegolmEncryption class. Some comments suggest that setupPromise will
always resolve, in other words it should never contain a promise that
will get rejected.

Other comments also seem to suggest that the return value of
ensureOutboundSession, a promise as well, may fail.

The critical error here is that the promise that gets set as
the next setupPromise, as well as the promise that ensureOutboundSession
returns, is the same promise.

It seems that the intention was for setupPromise to contain a promise
that will always resolve to either `null` or `OutboundSessionInfo`.

We can see that a couple of lines before we set setupPromise to its new
value we construct a promise that logs and discards errors using the
`Promise.catch()` method.

The `Promise.catch()` method does not mutate the promise, instead it
returns a new promise. The intention of the original author might have
been to set the next setupPromise to the promise which `Promise.catch()`
produces.

This patch modifies the updating of setupPromise in the
ensureOutboundSession so that setupPromise discards errors correctly.

Using `>>=` to represent the promise chaining operation, setupPromise is
now updated using the following logic:

    setupPromise = previousSetupPromise >>= setup >>= discardErrors
A call to ensureSession() has two steps:
    1. prepareSession(), where an outbound group session might get created
       or rotated
    2. shareSession(), where an outbound group session might get
       encrypted and queued up to be sent to other devices

Both of those calls may mostly fail due to storage errors, yet only the
errors from prepareSession get propagated to the caller.

Errors from prepareSession will mean that you can't get an
outbound group session so you can't encrypt an event.

Errors from shareSession, especially if the error happens in the part
where the to-device requests are queued up to be sent out, mean that
other people will not be able to decrypt the events that will get
encrypted using the outbound group session.

Both of those cases are catastrophic, the second case is just much
harder to debug, since the error happens on another device at some
arbitrary point in the future.

Let's just return the error instead, people can then retry and the
storage issue might have been resolved, or at least the error becomes
visible when it happens.
@poljar
Copy link
Contributor Author

poljar commented Dec 8, 2022

Force pushed to resolve a merge conflict.

What is needed to get this merged? I don't have write access in this repo, if the expectations is that I'm going to do that.

@richvdh
Copy link
Member

richvdh commented Dec 8, 2022

Oh, sorry, I didn't realise you didn't have write access. And now we have lint failures :(.

Force pushed to resolve a merge conflict.

It should be possible to resolve a conflict without force-pushing?

@poljar
Copy link
Contributor Author

poljar commented Dec 8, 2022

It should be possible to resolve a conflict without force-pushing?

Sure, but I posted some fixup commits that required a rebase/autosquash workflow so I continued it. Are rebases frowned upon in this repo, should I stop using them as well as fixup commits?

@richvdh
Copy link
Member

richvdh commented Dec 8, 2022

Rebases after a review make it hard to see what has been changed since the review, so one tends to have to do the review all over again.

@poljar
Copy link
Contributor Author

poljar commented Dec 8, 2022

Rebases after a review make it hard to see what has been changed since the review, so one tends to have to do the review all over again.

I understand this, that's why the fixup commits were used before the approval. Are you re-reviewing my merge-conflict resolution? Should I not use fixup commits for this lint issue now?

@richvdh
Copy link
Member

richvdh commented Dec 8, 2022

@poljar unfortunately this is failing the test-coverage gate :(

@poljar
Copy link
Contributor Author

poljar commented Dec 9, 2022

This has now a unrelated change that I'm trying to merge with #2959.

I'm closing this and going to reopen two separate PRs, one for the fix for setupPromise becoming wedged, and one to stop swallowing the errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failures when sharing outbound session keys are swallowed
2 participants