Timeouts can not reconnect if event in requested index has been cleared #31
Oh, that's not good. I wonder what the best way to solve this is. We could go directly to the index given in the error, but then we risk missing some updates. It seems fleet just reconnects with index + 1 until it resyncs, which could cause thousands of requests. I guess we could use (error.index - 999); that might work.
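The (error.index - 999) idea can be sketched as a small helper (the function name is illustrative, not part of the node-etcd API). Since etcd v2 keeps a history of the last 1000 events, the oldest index still available is the current index minus 999:

```javascript
// Sketch: pick a watch index that is still inside etcd's 1000-event
// history window after a 401 "event index has been cleared" error.
// `errorIndex` is the current etcd index reported in the error response.
function resyncIndex(errorIndex) {
  // etcd retains the last 1000 events, so the oldest index still
  // available is errorIndex - 999 (but never below 1).
  return Math.max(errorIndex - 999, 1);
}
```

For example, `resyncIndex(5000)` would restart the watch at index 4001, the earliest event etcd can still serve.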
Sorry, I was away for the weekend. Yeah, I wondered how they were dealing with it. I think your (-999) approach would work, but only if nothing else happens before you get the next request out. I almost feel as though there are two separate cases here: one where you don't care about the events in between and just need to get the watcher back in place, and one where you are trying to watch every event as it happens. The second case is sort of a lost cause once the watcher is in this state. Maybe it's best to just make sure the connection is reestablished and then emit a warning event of some kind?
@stianeikeland Why would this approach miss any updates? If any changes had been made up to the index specified in the error, etcd would respond with those changes, so it should be safe to restart polling from that index. Below is a workaround I applied in my application: if I detect that etcd provided a new index in the error response, I issue a new poll immediately by substituting that index.
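A minimal sketch of that workaround in plain JavaScript (the function name and error shape are assumptions for illustration, not the node-etcd API): on a 401 "event index cleared" error, resume polling from the index etcd reports; on any other error, retry the same index:

```javascript
// Sketch: decide which index to poll next after a watch error.
// `err` is assumed to carry etcd's errorCode and current index,
// as seen in the 401 error response body.
function nextWatchIndex(currentIndex, err) {
  if (err && err.errorCode === 401 && typeof err.index === 'number') {
    // etcd told us its current index; resume from there instead of
    // retrying an index that has been cleared from history.
    return err.index;
  }
  // For other errors (e.g. network issues), retry the same index.
  return currentIndex;
}
```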
I've now added basic resync to trunk. @tjanczuk I'm thinking about something like the following timeline:
Client A is now resynced, but it missed the key='value 2' update. For now I just emit a 'resync' event from the watcher, so you can check manually for missed values if needed. Maybe we could do the following on an outdated or cleared index:
Or am I overthinking things? :)
Published; could you try out version 2.1.5?
I think you are right; what I proposed above would not address the case of a race condition between broken network connections and updates on the server. I would suggest a slight optimization over what you describe above:
As an aside, a useful feature in node-etcd would be a
Actually, I think I am overthinking it now. If there was an error due to network issues and the index has been cleared in the meantime, the next successful request will fail with a 401 code, and your case here: https://github.com/stianeikeland/node-etcd/blob/master/src/watcher.coffee#L71-L72 should cover it. So looking at your code, everything should work correctly. One slight optimization that can save a round trip to etcd is to also advance the index on an empty long-poll response (https://github.com/stianeikeland/node-etcd/blob/master/src/watcher.coffee#L75-L76); this avoids an occasional 401 for infrequently changing values.
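The empty-response optimization might be sketched roughly like this (illustrative names, not the actual watcher.coffee code); etcd reports its current index in the X-Etcd-Index response header even when a long poll times out with no changes:

```javascript
// Sketch: after an empty long-poll response, advance the watch index
// to just past etcd's current index (from the X-Etcd-Index header) so
// a later reconnect does not start from a long-cleared index.
function indexAfterEmptyPoll(currentIndex, headers) {
  const etcdIndex = parseInt(headers['x-etcd-index'], 10);
  if (!isNaN(etcdIndex) && etcdIndex >= currentIndex) {
    // Nothing changed on our key up to etcdIndex, so the next event
    // of interest can only appear after it.
    return etcdIndex + 1;
  }
  return currentIndex; // header missing or stale: keep the old index
}
```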
Version 2.1.5 appears to perform well for me in a long-running, infrequently-changing scenario. Thanks for the quick turnaround!
Thanks! I also agree with your optimizations.
If there are no changes in the directory you are watching and the etcd index has advanced more than 1000 times, reconnection will fail continuously. This can happen quite often when using fleet, since it writes heartbeats frequently.
I think the reconnection should use the index returned in the 401 error.