Setup RTDB connection before auth token is fetched #2349

Open
AaronO wants to merge 1 commit into master

Conversation


@AaronO commented Nov 9, 2019


A lot of (web) apps follow this pattern (a minimal sketch follows the list):
1. App starts / initializes Firebase SDK (app instance)
2. Authenticates the user (e.g. from a session persisted in IndexedDB or otherwise)
3. Queries the RTDB to fetch data for the now authenticated user
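
A minimal sketch of that pattern with the 2019-era web SDK (config values and paths below are placeholders, not taken from any real app):

```ts
import * as firebase from "firebase/app";
import "firebase/auth";
import "firebase/database";

// 1. Initialize the SDK (placeholder config)
firebase.initializeApp({
  apiKey: "…",
  databaseURL: "https://<project-id>.firebaseio.com",
});

// 2. Wait for the persisted session (IndexedDB) to be restored
firebase.auth().onAuthStateChanged((user) => {
  if (!user) return;
  // 3. Query the RTDB for the now-authenticated user's data
  firebase
    .database()
    .ref(`users/${user.uid}`)
    .once("value")
    .then((snapshot) => console.log(snapshot.val()));
});
```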

In the current implementation, here's what happens (sketched below):
1. SDK reads persisted auth from IndexedDB
2. Makes an API request to Google Identity Toolkit (`getAccountInfo`) (technically two HTTPS requests, because of the CORS preflight)
3. Once the auth token is received, the SDK connects to the RTDB (usually via a WebSocket)
4. Once the connection is established, any pending queries are sent
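
As a sketch, the startup today is strictly sequential (all names below are illustrative placeholders, not real firebase-js-sdk internals, stubbed just to show the ordering):

```ts
const readPersistedAuth = async () => ({ uid: "…" });  // 1. IndexedDB read
const fetchAuthToken = async () => "<id-token>";       // 2. getAccountInfo roundtrip(s)
const openWebSocket = async () => {};                  // 3. ~4 * RTT wss handshake
const flushPendingQueries = (_token: string) => {};    // 4. queued listens/writes

async function connectSequentially(): Promise<void> {
  await readPersistedAuth();
  const token = await fetchAuthToken();  // the wss handshake hasn't started yet
  await openWebSocket();                 // only begins after the auth roundtrips finish
  flushPendingQueries(token);            // pending queries finally go out
}
```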

Step `3.` is actually quite expensive in practice (in my measurements it takes ~500ms from Europe).
Establishing a `wss` connection takes 4 roundtrips (1 for the TCP SYN/ACK, 2 for the TLS handshake, 1 for the HTTP upgrade).

So the time required to establish a (WebSocket) connection with the RTDB is `4 * RTT`, where RTT is the round-trip time between the Firebase RTDB node (which appears to be located in us-east/us-central) and the client (in my case `RTT ~= 125ms`, which explains the `~500ms`).
(And because each WebSocket requires its own TCP socket, it doesn't seem possible to accelerate this by using `<link rel="preconnect" ...>`.)

This latency negatively affects the end-user experience because it delays meaningful TTI (time to interactive).

From my debugging and a quick read of the code, there's no obvious reason why we wait for auth to finish before starting to set up the WebSocket connection (given that it takes time). I patched our copy of the Firebase SDK with the above change and there's no immediately obvious downside or issue.

Am I missing something? Is there a strong reason why we don't establish connections until we obtain auth tokens?

---

## Note

The time to establish RTDB connections (to `s-usc1c-nss-{x}.firebaseio.com`) could be reduced without modifying the SDK, by improvements to your backend.

Right now it appears that:
1. `*.firebaseio.com` resolves to an anycast IP (`35.201.97.85`), which is a "dumb" set of TCP load balancers/routers
2. All SSL termination (and the rest) happens on the RTDB node(s) in us-central

Above we defined `time = 4 * RTT` (1 TCP, 2 TLS, 1 HTTP upgrade). Of those 4 roundtrips, the first 3 could easily be terminated by the load balancer (if it had a wildcard cert for `*.firebaseio.com`); the 4th RTT (HTTP request/response) is more application-specific and probably out of scope for a load balancer.

Moving the first 3 roundtrips to the load balancers would reduce the overall time to `time = 3 * cheap_RTT + 1 * expensive_RTT`.
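
For example, assuming a ~20ms RTT to the nearest anycast edge (an illustrative figure, not a measurement) and the ~125ms RTT to us-central measured above, the handshake cost would drop from `4 * 125 ≈ 500ms` to roughly `3 * 20 + 125 ≈ 185ms`.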

These improvements would benefit all Firebase (RTDB) users.
@mikelehen (Contributor) commented:

@AaronO Thanks for bringing this up and taking the initiative to put a PR together.

For a bit of context, when the RTDB code was originally written, auth would typically provide us a cached token immediately and so there were no auth roundtrips we had to wait for. But you're right that today these auth roundtrips are causing unnecessary latency and your suggestion of starting up the WebSocket connection immediately is probably a good idea.

Unfortunately I think your PR will likely introduce a race condition where we may finish connecting to the backend and start sending requests before we have gotten the token back from auth (since this is now happening in parallel), leading to us accidentally sending the requests unauthenticated.

We would essentially need to delay most of our `PersistentConnection.onReady()` code until the connection is ready and we have received the auth token. In particular, we must not call `restoreState()` [since it'll send our auth token and then send any queued requests] or set `connected_` to true [since that will allow subsequent requests to be immediately sent to the backend].

I'm not sure of the best way to accomplish this while minimizing code churn (in order to minimize the risk of breaking something). It is a little tempting to introduce an "onReady" promise that we resolve when the Connection `onReady` callback is called and then do `Promise.all([onReadyPromise, authTokenPromise]).then(...)` or something. But we'd need to be careful to get all the error handling (`onDisconnect`, `onKill`, auth token failure, etc.) right. If you want to take another stab at it, feel free. Otherwise we can add this to the backlog, though I'm not sure when we will tackle it. Most of our SDK development effort is going into the Cloud Firestore SDK at present.
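
A rough sketch of that `Promise.all` approach (every identifier below is an illustrative placeholder, not an actual firebase-js-sdk symbol, and the error handling mentioned above is omitted):

```ts
const openWebSocket = async (): Promise<void> => {};           // resolves when Connection fires onReady
const fetchAuthToken = async (): Promise<string | null> => null;
const sendAuthAndRestoreState = (_token: string | null) => {}; // sends auth, replays queued requests
const markConnected = () => {};                                 // the `connected_ = true` step

async function establishConnection(): Promise<void> {
  // Start the wss handshake and the token fetch in parallel.
  const [, token] = await Promise.all([openWebSocket(), fetchAuthToken()]);
  // Only once BOTH have completed do we send auth, restore state, and
  // let queued requests flow, so nothing goes out unauthenticated.
  sendAuthAndRestoreState(token);
  markConnected();
}
```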

Thanks again!

@mikelehen left a comment:

Getting this out of my "Review requests" list. 😄

Note: I don't work for Google anymore, so somebody else will need to help with this if it moves forward...
