Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent issue receiving server events on client when switching cellular network #417

Closed
Naomarik opened this issue Feb 21, 2023 · 9 comments
Assignees
Milestone

Comments

@Naomarik
Copy link

Naomarik commented Feb 21, 2023

I've been unreliably able to replicate an issue that causes client to no longer receive messages from server until page is refreshed.

This has occurred to me intermittently when my iPhone network goes from switching states between:

  • Wifi <-> Cellular
  • 5G <-> 4G (triggered by changing settings in menu)

This has been intermittent for me in that I can get client to stop receiving server messages several times consecutively, but it will also have multiple rounds of working fine.

This problem also occurs on my PC when tethered to iPhone for internet and switching between 4G/5G.

I've also tried switching states on my PC from ethernet <-> wifi but that has worked reliably.

Other info:

  • Sente 1.17.0
  • UAE etisalat network
  • user-id-fn using user id from database
  • site behind cloudflare (update: direct connection to server has same issues)
  • kit using io.undertow/undertow-core 2.3.0.Final
  • I've tested completely disconnecting from internet and reconnecting and it works fine
@Naomarik
Copy link
Author

Some more information from a recently broken connection after swapping between 4G/5G many, many times.

Adding a watch to chsk-state atom to conj state over time ordered by oldest first:

[{:type :ws,
  :open? true,
  :ever-opened? true,
  :csrf-token
  "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
  :uid 17592186045486,
  :handshake-data nil,
  :first-open? true,
  :time 1676982835857}
 {:first-open? false,
  :uid 17592186045486,
  :time 1676982960860,
  :type :ws,
  :handshake-data nil,
  :ever-opened? true,
  :open? false,
  :csrf-token
  "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
  :last-close {:udt 1676982960860, :reason :unexpected},
  :last-ws-close
  {:udt 1676982960860,
   :ev #object[CloseEvent [object CloseEvent]],
   :clean? false,
   :code 1006,
   :reason ""}}
 {:first-open? false,
  :uid 17592186045486,
  :time 1676982960865,
  :udt-next-reconnect 1676982962437,
  :type :ws,
  :handshake-data nil,
  :ever-opened? true,
  :open? false,
  :csrf-token
  "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
  :last-close {:udt 1676982960860, :reason :unexpected},
  :last-ws-close
  {:udt 1676982960860,
   :ev #object[CloseEvent [object CloseEvent]],
   :clean? false,
   :code 1006,
   :reason ""}}
 {:first-open? false,
  :uid 17592186045486,
  :time 1676982965376,
  :type :ws,
  :handshake-data nil,
  :ever-opened? true,
  :open? true,
  :csrf-token
  "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
  :last-close {:udt 1676982960860, :reason :unexpected},
  :last-ws-close
  {:udt 1676982960860,
   :ev #object[CloseEvent [object CloseEvent]],
   :clean? false, 
   :code 1006, 
   :reason ""}}]

Here is the series of events from monitoring :chsk/state

[{:ch-recv #object[CW [object Object]],
  :send-fn #object[g],
  :state
  #object[cljs.core.Atom {:val {:first-open? false, :uid 17592186045486, :type :ws, :handshake-data nil, :ever-opened? true, :open? true, :csrf-token "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW", :last-close {:udt 1676982960860, :reason :unexpected}, :last-ws-close {:udt 1676982960860, :ev #object[CloseEvent [object CloseEvent]], :clean? false, :code 1006, :reason ""}}}],
  :event
  [:chsk/state
   [{:type :auto,
     :open? false,
     :ever-opened? false,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW"}
    {:type :ws,
     :open? true,
     :ever-opened? true,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
     :uid 17592186045486,
     :handshake-data nil,
     :first-open? true}]],
  :id :chsk/state,
  :?data
  [{:type :auto,
    :open? false,
    :ever-opened? false,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW"}
   {:type :ws,
    :open? true,
    :ever-opened? true,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
    :uid 17592186045486,
    :handshake-data nil,
    :first-open? true}]}
 {:ch-recv #object[CW [object Object]],
  :send-fn #object[g],
  :state
  #object[cljs.core.Atom {:val {:first-open? false, :uid 17592186045486, :type :ws, :handshake-data nil, :ever-opened? true, :open? true, :csrf-token "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW", :last-close {:udt 1676982960860, :reason :unexpected}, :last-ws-close {:udt 1676982960860, :ev #object[CloseEvent [object CloseEvent]], :clean? false, :code 1006, :reason ""}}}],
  :event
  [:chsk/state
   [{:type :ws,
     :open? true,
     :ever-opened? true,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
     :uid 17592186045486,
     :handshake-data nil,
     :first-open? true}
    {:first-open? false,
     :uid 17592186045486,
     :type :ws,
     :handshake-data nil,
     :ever-opened? true,
     :open? false,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
     :last-close {:udt 1676982960860, :reason :unexpected},
     :last-ws-close
     {:udt 1676982960860,
      :ev #object[CloseEvent [object CloseEvent]],
      :clean? false,
      :code 1006,
      :reason ""}}]],
  :id :chsk/state,
  :?data
  [{:type :ws,
    :open? true,
    :ever-opened? true,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
    :uid 17592186045486,
    :handshake-data nil,
    :first-open? true}
   {:first-open? false,
    :uid 17592186045486,
    :type :ws,
    :handshake-data nil,
    :ever-opened? true,
    :open? false,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
    :last-close {:udt 1676982960860, :reason :unexpected},
    :last-ws-close
    {:udt 1676982960860,
     :ev #object[CloseEvent [object CloseEvent]],
     :clean? false,
     :code 1006,
     :reason ""}}]}
 {:ch-recv #object[CW [object Object]],
  :send-fn #object[g],
  :state
  #object[cljs.core.Atom {:val {:first-open? false, :uid 17592186045486, :type :ws, :handshake-data nil, :ever-opened? true, :open? true, :csrf-token "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW", :last-close {:udt 1676982960860, :reason :unexpected}, :last-ws-close {:udt 1676982960860, :ev #object[CloseEvent [object CloseEvent]], :clean? false, :code 1006, :reason ""}}}],
  :event
  [:chsk/state
   [{:first-open? false,
     :uid 17592186045486,
     :type :ws,
     :handshake-data nil,
     :ever-opened? true,
     :open? false,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
     :last-close {:udt 1676982960860, :reason :unexpected},
     :last-ws-close
     {:udt 1676982960860,
      :ev #object[CloseEvent [object CloseEvent]],
      :clean? false,
      :code 1006,
      :reason ""}}
    {:first-open? false,
     :uid 17592186045486,
     :udt-next-reconnect 1676982962437,
     :type :ws,
     :handshake-data nil,
     :ever-opened? true,
     :open? false,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
     :last-close {:udt 1676982960860, :reason :unexpected},
     :last-ws-close
     {:udt 1676982960860,
      :ev #object[CloseEvent [object CloseEvent]],
      :clean? false,
      :code 1006,
      :reason ""}}]],
  :id :chsk/state,
  :?data
  [{:first-open? false,
    :uid 17592186045486,
    :type :ws,
    :handshake-data nil,
    :ever-opened? true,
    :open? false,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
    :last-close {:udt 1676982960860, :reason :unexpected},
    :last-ws-close
    {:udt 1676982960860,
     :ev #object[CloseEvent [object CloseEvent]],
     :clean? false,
     :code 1006,
     :reason ""}}
   {:first-open? false,
    :uid 17592186045486,
    :udt-next-reconnect 1676982962437,
    :type :ws,
    :handshake-data nil,
    :ever-opened? true,
    :open? false,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
    :last-close {:udt 1676982960860, :reason :unexpected},
    :last-ws-close
    {:udt 1676982960860,
     :ev #object[CloseEvent [object CloseEvent]],
     :clean? false,
     :code 1006,
     :reason ""}}]}
 {:ch-recv #object[CW [object Object]],
  :send-fn #object[g],
  :state
  #object[cljs.core.Atom {:val {:first-open? false, :uid 17592186045486, :type :ws, :handshake-data nil, :ever-opened? true, :open? true, :csrf-token "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW", :last-close {:udt 1676982960860, :reason :unexpected}, :last-ws-close {:udt 1676982960860, :ev #object[CloseEvent [object CloseEvent]], :clean? false, :code 1006, :reason ""}}}],
  :event
  [:chsk/state
   [{:first-open? false,
     :uid 17592186045486,
     :udt-next-reconnect 1676982962437,
     :type :ws,
     :handshake-data nil,
     :ever-opened? true,
     :open? false,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
     :last-close {:udt 1676982960860, :reason :unexpected},
     :last-ws-close
     {:udt 1676982960860,
      :ev #object[CloseEvent [object CloseEvent]],
      :clean? false,
      :code 1006,
      :reason ""}}
    {:first-open? false,
     :uid 17592186045486,
     :type :ws,
     :handshake-data nil,
     :ever-opened? true,
     :open? true,
     :csrf-token
     "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
     :last-close {:udt 1676982960860, :reason :unexpected},
     :last-ws-close
     {:udt 1676982960860,
      :ev #object[CloseEvent [object CloseEvent]],
      :clean? false,
      :code 1006,
      :reason ""}}]],
  :id :chsk/state,
  :?data
  [{:first-open? false,
    :uid 17592186045486,
    :udt-next-reconnect 1676982962437,
    :type :ws,
    :handshake-data nil,
    :ever-opened? true,
    :open? false,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
    :last-close {:udt 1676982960860, :reason :unexpected},
    :last-ws-close
    {:udt 1676982960860,
     :ev #object[CloseEvent [object CloseEvent]],
     :clean? false,
     :code 1006,
     :reason ""}}
   {:first-open? false,
    :uid 17592186045486,
    :type :ws,
    :handshake-data nil,
    :ever-opened? true,
    :open? true,
    :csrf-token
    "rvHxShhbjWaa4NGg1gQPTXHwyteF2d5Qvxx/UaH4kSODiyH5/D5iJ0XBzLUnPwuUucf6mHgf0maYM+mW",
    :last-close {:udt 1676982960860, :reason :unexpected},
    :last-ws-close
    {:udt 1676982960860,
     :ev #object[CloseEvent [object CloseEvent]],
     :clean? false, 
     :code 1006, 
     :reason ""}}]}]

@ptaoussanis ptaoussanis self-assigned this Feb 22, 2023
@ptaoussanis
Copy link
Member

@Naomarik Hi Omar, thanks for the detailed info!

Haven't had the opportunity to look at this closely yet, but would like to confirm my understanding of what you've shared so long.

Is the following an accurate description of an example of the behaviour you're seeing?:

  1. Sente is initially working fine. Messages from client->server and server->client are received without issue.

  2. This working state persists until the client device (iPhone) switches networks (e.g. Wifi->Cellular).

  3. At that point it may happen that:
    3a. Messages from client->server stop being received.
    3b. Messages from server->client stop being received.

  4. This broken state persists indefinitely, until the page is refreshed on the client.

My questions:

  • Is the above accurate so far?
  • Does broken state (3) always occur, or just sometimes?
  • Once in broken state (3):
    • What is the client's perception of the connection state? Open? Closed?
    • Do messages sent from the client trigger their timeout as expected?
    • If the client's perception is that the connection is open, does trying (and failing) to send a client->server message change this perception?
    • Do any auto reconnection attempts appear to occur? If so, are they successful or unsuccessful?

There's at least a few different things that could be going on, and the above info should help lead us to the right general area.

@Naomarik
Copy link
Author

Naomarik commented Feb 22, 2023

1-4 is accurate except for client->server DOES continue working.

Once in broken state

  • The client says it's connected
  • Messages from client DO work still, in particular I can get replies back with the ?repl-fn while in broken state.
  • Sending messages from client do NOT fix the server->client issue.
  • Reconnection attempt happens and succeeds every single time, just stop receiving messages from server.

Some additional notes

  • I can force it most often by swapping between 4G/5G until my internet has a minor drop out as monitored by pinging a server while this is going on.
  • This only happens sometimes. After a dropout, it will reconnect and either might resume receiving server messages or fail to.

So to summarize in particular what seems to be broken is the server's send-fn made by make-channel-socket! when choosing a uid to send to. I have already tested both setting an actual user's ID from database and leaving it to the default client id.

@Naomarik
Copy link
Author

Quick update:
I exposed the conns_ atom to a GET request and found this. The 3 clients that have nils all do not receive messages from server. The 4th one is a new tab and receiving messages correctly.

 {:ws
  {17592186046201
   {"e4bbe8b8-82f6-4371-bb7f-65486bd1e40d" "[nil 1677102871278]",
    "53b5c7f9-6249-4157-b661-a515ee1f15dc" "[nil 1677102836257]",
    "2ceff709-7269-4349-bd0f-326763d9a013" "[nil 1677102878342]",
    "5166004f-87cc-499b-a0dd-229561a54277"
    "[#object[io.undertow.websockets.core.protocol.version13.WebSocket13Channel 0x65d0947c \"WebSocket13Channel peer /127.0.0.1:55998 local /127.0.0.1:4000[ No Receiver [] -- [] -- []\"] 1677102883344]"}},
  :ajax {}}

@ptaoussanis
Copy link
Member

ptaoussanis commented Feb 23, 2023

Thanks a lot for the extra info, busy going through your info from here and Telegram.

A work-in-progress summary so far (will keep updated):

  1. Sente initially working fine.
    client->server, server->client calls work fine.

  2. Working state persists until the client device (iPhone) switches networks
    (e.g. Wifi->Cellular).

  3. At that point it MAY (sometimes, not always) enter a broken state.

  4. Broken state persists indefinitely, until the page is refreshed on the client.

While in broken state:

  • server->client calls aren't received by client.
  • But client->server calls continue to work fine.
  • Successful client->server calls don't repair the broken state.
  • Client believe's it's connected.
  • It appears the server's conns_ atom shows nil server channel vals for affected client-ids.

@ptaoussanis ptaoussanis added this to the v1.18 milestone Feb 23, 2023
@Naomarik
Copy link
Author

So this is happening easily by just switching IP.

Connection on first IP. Notice websocket peer address is 49342

2023-02-23 17:43:24,796 [XNIO-3 task-2] INFO  naomarik.proj.web.routes.ws - [:ip-addr 2001:8f8:1d66:c751:4405:3f94:70fb:*7d8]
2023-02-23 17:43:24,798 [XNIO-3 task-2] INFO  proj.sentee - [:debug/on-open 17592186046209 fb891a6e-bf82-4c16-a94d-87e34ee36440 #object[io.undertow.websockets.core.protocol.version13.WebSocket13Channel 0x31892166 WebSocket13Channel peer /[0:0:0:0:0:0:0:1]:**49342** local /[0:0:0:0:0:0:0:1]:4000[ No Receiver [] -- [] -- []]]

Browser tab is still open, computer's internet switches from phone tethering -> home wifi.

Connection on second IP. Notice websocket peer address is 54406

2023-02-23 17:44:39,424 [XNIO-3 task-2] INFO  naomarik.proj.web.routes.ws - [:ip-addr *6.17.244.29]
2023-02-23 17:44:39,427 [XNIO-3 task-2] INFO  proj.sentee - [:debug/on-open 17592186046209 fb891a6e-bf82-4c16-a94d-87e34ee36440 #object[io.undertow.websockets.core.protocol.version13.WebSocket13Channel 0x2a6f60fc WebSocket13Channel peer /127.0.0.1:54406 local /127.0.0.1:4000[ No Receiver [] -- [] -- []]]

Disconnection from client from IP address a bit over a minute later, peer address from first IP 49342

2023-02-23 17:45:44,224 [async-dispatch-1] INFO  proj.sentee - [:DISCONNECTING [fb891a6e-bf82-4c16-a94d-87e34ee36440 #object[io.undertow.websockets.core.protocol.version13.WebSocket13Channel 0x31892166 WebSocket13Channel peer /[0:0:0:0:0:0:0:1]:**49342** local 0.0.0.0/0.0.0.0:4000[ No Receiver [] -- [] -- []]]]

This nils the sch in the [sch udt] vector from the conns_ atom and server->client no longer works. Client->server works fine.

@ptaoussanis
Copy link
Member

Excellent summary and debugging work by @Naomarik:

Client connects to server with a unique client ID. That is used as the primary key for that specific browser tab to identify it. The websocket channel (or sch in sente code) and udt are the key's value, so we have {<client-id [<websocket channel (port assigned to first IP)> udt]}.

The websocket channel maps onto a port on the server of which 65,535 are available. So the websocket channel has a distinct port assigned to each connection. When the IP address changes suddenly, there is no immediate on-close event that gets sent. Instead it times out after about 60 seconds.

At some point in the future, client sente reconnects with the SAME CLIENT ID as before, but with a new ip address. Now we have {<client-id [<websocket channel (port assigned to second IP)> udt]}.

When timeout occurs, the client ID gets dissoc'd, removing the reconnected websocket channel and any hope for the server to initiate contact to its beloved client, but vice versa works fine. Later on, upd-conn! occurs to update the udt in the first arity, so there is a {<client-id [nil <udt>]}.


I'll prepare a fix 👍

@ptaoussanis
Copy link
Member

Will be fixed in forthcoming v1.18 👍 Leaving issue open until the fix is confirmed.

@ptaoussanis
Copy link
Member

Fix has been confirmed by reporter, closing 👍

ptaoussanis added a commit that referenced this issue Mar 7, 2023
…Naomarik)

Huge thanks to @Naomarik for reporting and diagnosing this issue!

BEFORE THIS COMMIT
  The following scenario was possible:

  t0. Client WebSocket connects to server with unique client ID.
      State in `conns_` atom: `{<client-id1> [<ws-sch:port-for-ip1> <udt>]}`.

  t1. Client's IP changes (for example, by switching from wifi to cellular network).

  t2. No `:on-close` event is triggered on server.

  t3. New client WebSocket connects to server with the SAME client ID but new IP.
      State in `conns_` atom: `{<client-id1> [<ws-sch:port-for-ip2> <udt>]}`.

  t4. Server-side connection timeout triggers for ORIGINAL connection (IP1).
      This unintentionally removes sch for the NEW connection (IP2).
      State in `conns_` atom: `{<client-id1> [nil <udt>]}`.

  t5. At this point:
    - The client has a working WebSocket connection to server.
    - But the server `conns_` state is bad (has a nil sch for client).
    - Which means that server->client broadcasts all fail.
    - This broken state persists until if/when something causes the client
      to reconnect. But note that this may not happen anytime soon since
      the client believes (accurately) that it IS successfully connected.

IMPLEMENTATION DETAILS

   Each connection to server establishes the following:
     - An `:on-open`  handler that modifies state in `conns_` for <client-id>
     - An `:on-close` handler that modifies state in `conns_` for <client-id>

   The bad behaviour occurs when:

     - Conn1's delayed `:on-close` triggers after Conn2's `:on-open`.
       Because Conn1 and Conn2 share the same client-id, they're mutating
       the same state.

AFTER THIS COMMIT

  We introduce a simple compare-and-swap (CAS) mechanism in the `:on-close`
  handler so that its state mutations will noop if the current server-ch does
  not = the server-ch added by the corresponding `:on-open` handler.

  I.e. a given `:on-close` will now only remove the SAME server-ch added by
  its corresponding `:on-open`.

  In the example above, this means that the timeout at t4 will noop.
ptaoussanis added a commit that referenced this issue Mar 7, 2023
…Naomarik)

Huge thanks to @Naomarik for reporting and diagnosing this issue!

BEFORE THIS COMMIT
  The following scenario was possible:

  t0. Client WebSocket connects to server with unique client ID.
      State in `conns_` atom: `{<client-id1> [<ws-sch:port-for-ip1> <udt>]}`.

  t1. Client's IP changes (for example, by switching from wifi to cellular network).

  t2. No `:on-close` event is triggered on server.

  t3. New client WebSocket connects to server with the SAME client ID but new IP.
      State in `conns_` atom: `{<client-id1> [<ws-sch:port-for-ip2> <udt>]}`.

  t4. Server-side connection timeout triggers for ORIGINAL connection (IP1).
      This unintentionally removes sch for the NEW connection (IP2).
      State in `conns_` atom: `{<client-id1> [nil <udt>]}`.

  t5. At this point:
    - The client has a working WebSocket connection to server.
    - But the server `conns_` state is bad (has a nil sch for client).
    - Which means that server->client broadcasts all fail.
    - This broken state persists until if/when something causes the client
      to reconnect. But note that this may not happen anytime soon since
      the client believes (accurately) that it IS successfully connected.

IMPLEMENTATION DETAILS

   Each connection to server establishes the following:
     - An `:on-open`  handler that modifies state in `conns_` for <client-id>
     - An `:on-close` handler that modifies state in `conns_` for <client-id>

   The bad behaviour occurs when:

     - Conn1's delayed `:on-close` triggers after Conn2's `:on-open`.
       Because Conn1 and Conn2 share the same client-id, they're mutating
       the same state.

AFTER THIS COMMIT

  We introduce a simple compare-and-swap (CAS) mechanism in the `:on-close`
  handler so that its state mutations will noop if the current server-ch does
  not = the server-ch added by the corresponding `:on-open` handler.

  I.e. a given `:on-close` will now only remove the SAME server-ch added by
  its corresponding `:on-open`.

  In the example above, this means that the timeout at t4 will noop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants