Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pre-mature exit callback or delayed data callback #3676

Closed
gireeshpunathil opened this issue Nov 5, 2015 · 8 comments
Closed

Pre-mature exit callback or delayed data callback #3676

gireeshpunathil opened this issue Nov 5, 2015 · 8 comments
Labels
child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests.

Comments

@gireeshpunathil
Copy link
Member

My bug report below may be inaccurate / irrelevant if I know the background of this test case. Can you please tell me the nomenclature of regression test cases? How do I find out the original problem (GH-1899) for which this test case was developed? #1899 seems to be not related?

The test case test/sequential/test-regress-GH-1899.js implements a simple asynchronous child spawn, which prints a hello world and exits. The parent has a child data callback and exit callback. The essence of the test case is to validate that the child exits gracefully, and the data is indeed received in the parent.

This fails in AIX intermittently with assertion failure:

AssertionError: '' == 'hello, world!\n'
at ChildProcess. (test/sequential/test-regress-GH-1899.js:18:10)

The root cause is that the exit callback is called before the data callback.

I would think that in asynchronous mode, the child's data event and exit event may not receive in the parent in the same order. As both data and exit events arrive as async events in the parent's uv loop, the order can be arbitrary, dictated by the OS based on factors outside the control of node.

If I print the fds appearing in the uv loop, I see that they are not in order always (AIX or Linux):
10 12 14 6
10 14 6 12
Here, fd 6 is for the async child exit event. As you can see, it is not deterministic.

Also, with artificial delay in the code, I am able to make the test fail in Linux.

My point is that while child exit happens chronologically later than the data event in the child, this order may not be maintained in the parent: it can be reversed, or it may trigger together, but depending on the order in which they appear in the resultant array of the epoll call, the execution order can differ, and no program should assume the order as it happen in the child.

Please let me know your views.

@bnoordhuis
Copy link
Member

I'm inclined to agree that the test erroneously assumes a particular ordering of events but on the other hand, maybe it's a quality-of-implementation issue that we can address without too much difficulty. Does the patch below improve matters?

diff --git a/deps/uv/src/unix/aix.c b/deps/uv/src/unix/aix.c
index c90b7e5..b129845 100644
--- a/deps/uv/src/unix/aix.c
+++ b/deps/uv/src/unix/aix.c
@@ -100,6 +100,7 @@ void uv__io_poll(uv_loop_t* loop, int timeout) {
   uv__io_t* w;
   uint64_t base;
   uint64_t diff;
+  int have_signals;
   int nevents;
   int count;
   int nfds;
@@ -207,6 +208,7 @@ void uv__io_poll(uv_loop_t* loop, int timeout) {
       goto update_timeout;
     }

+    have_signals = 0;
     nevents = 0;

     assert(loop->watchers != NULL);
@@ -237,10 +239,20 @@ void uv__io_poll(uv_loop_t* loop, int timeout) {
         continue;
       }

-      w->cb(loop, w, pe->revents);
+      /* Run signal watchers last.  This also affects child process watchers
+       * because those are implemented in terms of signal watchers.
+       */
+      if (w == &loop->signal_io_watcher)
+        have_signals = 1;
+      else
+        w->cb(loop, w, pe->revents);
+
       nevents++;
     }

+    if (have_signals != 0)
+      loop->signal_io_watcher.cb(loop, &loop->signal_io_watcher, UV__POLLIN);
+
     loop->watchers[loop->nwatchers] = NULL;
     loop->watchers[loop->nwatchers + 1] = NULL;

@bnoordhuis bnoordhuis added the test Issues and PRs related to the tests. label Nov 5, 2015
@gireeshpunathil
Copy link
Member Author

@bnoordhuis , thanks! this makes perfect sense. I will try this and get back.

@gireeshpunathil
Copy link
Member Author

@bnoordhuis, yes - that resolves the issue: 19 failures out of 1000 runs without the fix, and 2000 runs with no failures with the fix. Thanks for the quick review and patch.

@Fishrock123
Copy link
Contributor

@gireeshpunathil Some of those tests are old enough that they point to issues in the old repo. Specifically the formatting of the naming gives it away, to us that is. Maybe add comments with links pointing to the old repo?

nodejs/node-v0.x-archive#1899

@Fishrock123
Copy link
Contributor

test-regress-GH-<issue#>, without io.

@gireeshpunathil
Copy link
Member Author

@Fishrock123, thanks - that explains!

@mscdex mscdex added the child_process Issues and PRs related to the child_process subsystem. label Nov 5, 2015
@bnoordhuis
Copy link
Member

libuv/libuv#611

@Trott
Copy link
Member

Trott commented Feb 11, 2016

This looks like it's a libuv issue and that there is a pending libuv PR to fix it. I'm going to close this issue here, but if someone feels that it ought to remain open until the libuv fix happens, by all means either re-open it or comment here. (Just doing some issue-tidying for issue-tidying sake.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
child_process Issues and PRs related to the child_process subsystem. test Issues and PRs related to the tests.
Projects
None yet
Development

No branches or pull requests

5 participants