Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: libsubprocess: support flow control on writes via credits #6353

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Oct 8, 2024

Per flux-framework/rfc#427 and discussion in #4572.

Some minor tweaks to the add-credit RPC. I think the design in the RFC assumed multiple channels would return credits at the same time, but I think that is impractical, so it's just credits for each channel.

Haven't tested that this works by adding support in flux-exec yet. Just thought I'd post this initial work. May split off the cleanup into another PR.

Copy link

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 88.60759% with 9 lines in your changes missing coverage. Please review.

Project coverage is 83.31%. Comparing base (d4cdf62) to head (d29b711).
Report is 10 commits behind head on master.

Files with missing lines Patch % Lines
src/common/libsubprocess/server.c 64.28% 5 Missing ⚠️
src/common/libsubprocess/local.c 83.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6353      +/-   ##
==========================================
- Coverage   83.32%   83.31%   -0.01%     
==========================================
  Files         523      523              
  Lines       86189    86258      +69     
==========================================
+ Hits        71816    71865      +49     
- Misses      14373    14393      +20     
Files with missing lines Coverage Δ
src/common/libsubprocess/client.c 81.25% <100.00%> (+1.55%) ⬆️
src/common/libsubprocess/fbuf.c 97.07% <100.00%> (+0.18%) ⬆️
src/common/libsubprocess/remote.c 78.92% <100.00%> (+0.58%) ⬆️
src/common/libsubprocess/subprocess.c 88.92% <100.00%> (+0.01%) ⬆️
src/common/libsubprocess/util.c 90.47% <100.00%> (+0.73%) ⬆️
src/common/libsubprocess/local.c 82.80% <83.33%> (+0.06%) ⬆️
src/common/libsubprocess/server.c 78.11% <64.28%> (-0.62%) ⬇️

... and 12 files with indirect coverage changes

@garlick
Copy link
Member

garlick commented Oct 8, 2024

I think the design in the RFC assumed multiple channels would return credits at the same time, but I think that is impractical, so it's just credits for each channel.

In that PR I proposed that an add-credit response payload would be

{
  "type":"add-credit",
  "channels":{
    "stdin":42
  }
}

while I think you are using

{
  "type":"add-credit",
  "channel":{
    "stream":"stdin",
    "bytes":42
  }
}

The former is fewer bytes on the wire and leaves the door open to combine messages for multiple streams, should that be useful. What issue did you run into?

BTW the main use case I was thinking of for combining messages was at initialization (server sends {"channel":BUFSIZE} for all input buffers).

@chu11
Copy link
Member Author

chu11 commented Oct 8, 2024

BTW the main use case I was thinking of for combining messages was at initialization (server sends {"channel":BUFSIZE} for all input buffers).

Lemme see if I can refactor it that way.

@chu11
Copy link
Member Author

chu11 commented Oct 8, 2024

BTW the main use case I was thinking of for combining messages was at initialization (server sends {"channel":BUFSIZE} for all input buffers).

Ahhh tried to refactor and realized why it initially didn't work out. The on_credit() callback is

typedef void (*flux_subprocess_credit_f) (flux_subprocess_t *p,
                                          const char *stream,
                                          int bytes);

i.e. only one stream and "bytes" is reported at a time. For remote subprocesses I just report the initial credits that the local subprocess reports.

Could have a special case flag where local subprocess doesn't report initial credits and remote subprocess could "collate" them in one RPC instead?

@garlick
Copy link
Member

garlick commented Oct 8, 2024

It's a bit unfortunate if we have to wait for the reactor loop to tell us the initial buffer size before sending it to the remote user since we should already know what it is and that adds latency.

Still, is there a reason not to support multiple streams in the protocol even if this server doesn't use it yet? There are other subprocess servers (sdexec).

@chu11
Copy link
Member Author

chu11 commented Oct 9, 2024

It's a bit unfortunate if we have to wait for the reactor loop to tell us the initial buffer size before sending it to the remote user since we should already know what it is and that adds latency.

Perhaps some special casing should be done here. Let me think about it.

Still, is there a reason not to support multiple streams in the protocol even if this server doesn't use it yet? There are other subprocess servers (sdexec).

Fair enough.

@chu11
Copy link
Member Author

chu11 commented Oct 9, 2024

added tweaks to flux-exec that appear to handle the reproducer from #4572, (cat /tmp/achu/100mfile | flux exec -r 0 sed -n 'w /tmp/achu/foo.out') hooray. Gotta work on the edges of all this though (including comments from above).

@garlick
Copy link
Member

garlick commented Oct 9, 2024

Ooh nice, does it work for an even smaller buffer like 4K?

@chu11
Copy link
Member Author

chu11 commented Oct 9, 2024

Ooh nice, does it work for an even smaller buffer like 4K?

Yup, seems to work!

Problem: libsubprocess iostress tests only support "API" or
"direct" writes in tests.  Currently a simple boolean is used in the
tests to determine which write to use.  But in the near future other
options may be available.

Refactor the iostress tests to use an enum rather than boolean to
indicate how subprocess writes should happen.
Problem: In libsubprocess iostress tests, stdin input for tests
is generated in a prep watcher.  However, in future tests the input
will be needed in other callbacks.

Generate the stdin input earlier during test launch time.  It will
be available for all potential tests and callbacks.
Problem: The notify callback variables are all named "cb".  This will
be confusing when a future callback is added.

Rename all notify callback variables from "cb" to "notify_cb".
Problem: It would be useful for developers to have a defined macro
indicating the default stdio or channel buffer size.  But the
macro for the default is hidden in a private header file.

Move SUBPROCESS_DEFAULT_BUFSIZE from a private header to a public
one.
Problem: It would be convenient to be notified when data has
been read from a buffer, so that more can be written into it.

Support a credit callback.  The callback will be called after some
data has been read out of the buffer and inform the caller of
the number of bytes read.

Add unit tests.
Problem: Watcher priorities are only configurable in check
watchers, but would be useful in prepare watchers as well.

Add support for prepare watcher priorities.  Add unit tests.
Problem: It would be convenient for libsubprocess users to know when
write buffers have more space available for writing.  However, there
is no mechanism for that.

Support a new on_credit callback that will inform the user that
space is available in the write buffer.

Fixes flux-framework#6291
Problem: There is no coverage for the new libsubprocess on_credit
callback.

Add unit tests in libsubprocess/test/stdio.c and libsubprocess/test/iostress.c.
Problem: libsubprocess now supports stdin flow control via credits,
but that is not used in flux-exec.

Support credits and flow control in flux-exec to avoid overflowing
the stdin buffer.

Fixes flux-framework#4572
@chu11
Copy link
Member Author

chu11 commented Oct 11, 2024

re-pushed, fixing up a lot of things based on comments above. Most notably, the remote subprocess sends out a single "initial credits" RPC back to the client for all channels. Could use some tests that have more than just the single stdin channel. That's a TODO. But I think I can split out an obvious commit from here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants