lib: enhance client reader resume + rewind
- update client reader documentation
- client reader, add rewind capabilities
    - tell creader to rewind on next start
    - Curl_client_reset() will keep reader for future rewind if requested
    - add Curl_client_cleanup() for freeing all resources independent of
      rewinds
    - add Curl_client_start() to trigger rewinds
    - move rewind code from multi.c to sendf.c and make part of
      "cr-in"'s implementation
- http, move the "resume_from" handling into the client readers
    - the setup of a HTTP request is reshuffled to follow:
      * determine method, target, auth negotiation
      * install the client reader(s) for the request, including crlf
        conversions and "chunked" encoding
      * apply ranges to client reader
      * concat request headers, upgrades, cookies, etc.
      * complete request by determining Content-Length of installed
        readers in combination with method
      * send
    - add methods for client readers to
      * return the overall length they will generate (or -1 when unknown)
      * return the amount of data on the CLIENT level, so that
        expect-100 can decide if it wants to apply itself
      * set a "resume_from" offset or fail if unsupported
    - struct HTTP has become largely empty now
- rename `Client_reader_*` to `Curl_creader_*`

Closes curl#13026
icing authored and bagder committed Mar 5, 2024
1 parent 9c7768c commit 14bcea0
Showing 15 changed files with 751 additions and 473 deletions.
37 changes: 35 additions & 2 deletions docs/CLIENT-READERS.md
@@ -31,6 +31,11 @@ struct Curl_crtype {
char *buf, size_t blen, size_t *nread, bool *eos);
void (*do_close)(struct Curl_easy *data, struct Curl_creader *reader);
bool (*needs_rewind)(struct Curl_easy *data, struct Curl_creader *reader);
curl_off_t (*total_length)(struct Curl_easy *data,
struct Curl_creader *reader);
CURLcode (*resume_from)(struct Curl_easy *data,
struct Curl_creader *reader, curl_off_t offset);
CURLcode (*rewind)(struct Curl_easy *data, struct Curl_creader *reader);
};
struct Curl_creader {
@@ -80,14 +85,42 @@ Implemented in `sendf.c` for phase `CURL_CR_CLIENT`, this reader get a buffer po

Sometimes it is necessary to send a request with client data again. Transfer handling can inquire via `Curl_client_read_needs_rewind()` if a rewind (i.e. a reset of the client data) is necessary. This asks all installed readers if they need it and gives `FALSE` if none does.

## Upload Size

Many protocols need to know the number of bytes delivered by the client readers in advance. They may invoke `Curl_creader_total_length(data)` to retrieve that. However, not all reader chains know the exact value beforehand. In that case, the call returns `-1` for "unknown".

Even if the length of the "raw" data is known, the length that is sent may not be. Example: with option `--crlf` the uploaded content undergoes line-end conversion. The line converting reader does not know in advance how many newlines it may encounter. Therefore it must return `-1` for any positive raw content length.

In HTTP, once the correct client readers are installed, the protocol asks the readers for the total length. If that is known, it can set `Content-Length:` accordingly. If not, it may choose to add an HTTP "chunked" reader.

In addition, there is `Curl_creader_client_length(data)` which gives the total length as reported by the reader in phase `CURL_CR_CLIENT` without asking other readers that may transform the raw data. This is useful in estimating the size of an upload. The HTTP protocol uses this to determine if `Expect: 100-continue` shall be done.
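
As an illustration of the two length queries, here is a minimal sketch. It is not curl's implementation: `demo_reader`, `demo_total_length()`, `demo_client_length()` and `demo_can_send_content_length()` are hypothetical names that only model the rules described above, with a "crlf" reader stacked on top of a client-phase reader.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not curl's real types or functions. */
typedef long long demo_off_t;

enum demo_kind { DEMO_CLIENT, DEMO_CRLF };

struct demo_reader {
  enum demo_kind kind;
  demo_off_t raw_len;        /* only used by DEMO_CLIENT, -1 if unknown */
  struct demo_reader *next;  /* the reader this one pulls data from */
};

/* Length this reader will emit, or -1 when it cannot know. */
static demo_off_t demo_total_length(const struct demo_reader *r)
{
  demo_off_t below;

  if(!r)
    return 0;
  if(r->kind == DEMO_CLIENT) {
    assert(r->raw_len >= -1);
    return r->raw_len;
  }
  below = demo_total_length(r->next);
  /* line-end conversion: any positive input length gives an
   * unknown output length; 0 and -1 pass through unchanged */
  return (below > 0) ? -1 : below;
}

/* Length reported by the client-phase reader alone, ignoring
 * any transforming readers stacked on top of it. */
static demo_off_t demo_client_length(const struct demo_reader *r)
{
  while(r && r->kind != DEMO_CLIENT)
    r = r->next;
  return r ? r->raw_len : 0;
}

/* 1 if a Content-Length header could be set, 0 if the length
 * is unknown and something like "chunked" would be needed. */
static int demo_can_send_content_length(const struct demo_reader *top)
{
  return demo_total_length(top) >= 0;
}
```

In this model, a 100 byte client reader alone reports 100. With the converting reader on top, the total becomes `-1`, while the client length stays 100 and remains usable as an estimate of the upload size.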

## Resuming

Uploads can start at a specific offset, if so requested. They "resume from" that offset. This applies to the reader in phase `CURL_CR_CLIENT` that delivers the "raw" content. Resumption can fail if the installed reader does not support it or if the offset is too large.

The total length reported by the reader changes when resuming. Example: resuming an upload of 100 bytes at offset 25 reports a total length of 75 afterwards.

If `resume_from()` is invoked twice, it is additive. There is currently no way to undo a resume.
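
The additive behaviour can be sketched as follows. This is a simplified model under stated assumptions, not the code in `sendf.c`: `demo_upload` and `demo_resume_from()` are hypothetical, failure is reported as `-1` rather than a `CURLcode`, and the exact boundary at which an offset counts as "too large" is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical model, not curl's real types or functions. */
typedef long long demo_off_t;

struct demo_upload {
  demo_off_t total_len;  /* bytes still to deliver, -1 when unknown */
  demo_off_t start;      /* offset into the raw content to read from */
  int can_seek;          /* 0 models a reader without resume support */
};

/* Returns 0 on success, -1 if resuming is unsupported or the
 * offset lies beyond the remaining content (assumed boundary). */
static int demo_resume_from(struct demo_upload *u, demo_off_t offset)
{
  assert(u);
  if(offset < 0 || !u->can_seek)
    return -1;
  if(u->total_len >= 0) {
    if(offset > u->total_len)
      return -1;
    u->total_len -= offset;  /* less data remains to be sent */
  }
  u->start += offset;        /* repeated calls add up */
  return 0;
}
```

Starting from 100 bytes, one call with offset 25 leaves a total length of 75. A second call with 25 leaves 50 and a start offset of 50, which is the additive behaviour described above. A failed call leaves the state untouched.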

## Rewinding

When a request is retried, installed client readers are discarded and replaced by new ones. This works only if the new readers upload the same data. For many readers, this is not an issue. The "null" reader always does the same. Also the `buf` reader, initialized with the same buffer, does this.

Readers operating on callbacks to the application need to "rewind" the underlying content. For example, when reading from a `FILE*`, the reader needs to `fseek()` to the beginning. The following methods are used:

1. `Curl_creader_needs_rewind(data)`: tells if a rewind is necessary, given the current state of the reader chain. If nothing really has been read so far, this returns `FALSE`.
2. `Curl_creader_will_rewind(data)`: tells if the reader chain rewinds at the start of the next request.
3. `Curl_creader_set_rewind(data, TRUE)`: marks the reader chain for rewinding at the start of the next request.
4. `Curl_client_start(data)`: tells the readers that a new request starts and they need to rewind if requested.
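
The interplay of the four calls listed above can be sketched with a small state machine. The names below (`demo_chain`, `demo_read()`, `demo_needs_rewind()`, `demo_will_rewind()`, `demo_set_rewind()`, `demo_start()`) are hypothetical stand-ins, and the content is an in-memory buffer. A reader backed by a `FILE*` would `fseek()` where this sketch resets its position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not curl's real types or functions. */
struct demo_chain {
  const char *content;   /* the "raw" upload data */
  size_t len;            /* number of bytes in content */
  size_t pos;            /* how much has been read so far */
  int rewind_requested;  /* rewind at the start of the next request */
};

/* Copy up to blen bytes into buf, return how many were copied. */
static size_t demo_read(struct demo_chain *c, char *buf, size_t blen)
{
  size_t n = c->len - c->pos;

  if(n > blen)
    n = blen;
  memcpy(buf, c->content + c->pos, n);
  c->pos += n;
  return n;
}

/* A rewind is only needed once something has been read. */
static int demo_needs_rewind(const struct demo_chain *c)
{
  return c->pos > 0;
}

static int demo_will_rewind(const struct demo_chain *c)
{
  return c->rewind_requested;
}

static void demo_set_rewind(struct demo_chain *c, int enable)
{
  c->rewind_requested = enable;
}

/* A new request starts: rewind if that was requested before. */
static int demo_start(struct demo_chain *c)
{
  assert(c->pos <= c->len);
  if(c->rewind_requested) {
    c->pos = 0;
    c->rewind_requested = 0;
  }
  return 0;
}
```

A retry in this model would check `demo_needs_rewind()`, call `demo_set_rewind(&chain, 1)` if it is true, and rely on `demo_start()` to reset the content before the next request reads from it again.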


## Summary and Outlook

By adding the client reader interface, any protocol can control how/if it wants the curl transfer to send bytes for a request. The transfer loop then becomes blissfully ignorant of the specifics.

The protocols on the other hand no longer have to care to package data most efficiently. At any time, should more data be needed, it can be read from the client. This is used when sending HTTP request headers to add as much request body data to the initial sending as there is room for.

Future enhancements based on the client readers:
* delegate the actual "rewinding" to the readers. They should know how it is done, eliminating the `readrewind.c` protocol specifics in `multi.c`.
* `expect-100` handling: place that into a HTTP specific reader at `CURL_CR_PROTOCOL` and eliminate the checks in the generic transfer parts.
* `eos` detection: `upload_done` is partly triggered now by comparing the number of bytes sent to a known size. This is no longer necessary since the core readers obey length restrictions.
* `eos forwarding`: transfer should forward an `eos` flag to the connection filters. Filters like HTTP/2 and HTTP/3 can make use of that, terminating streams early. This would also eliminate length checks in stream handling.
14 changes: 6 additions & 8 deletions lib/c-hyper.c
@@ -363,7 +363,7 @@ CURLcode Curl_hyper_stream(struct Curl_easy *data,
k->exp100 = EXP100_SEND_DATA;
k->keepon |= KEEP_SEND;
Curl_expire_done(data, EXPIRE_100_TIMEOUT);
infof(data, "Done waiting for 100-continue");
infof(data, "Done waiting for 100-continue after %ldms", (long)ms);
if(data->hyp.exp100_waker) {
hyper_waker_wake(data->hyp.exp100_waker);
data->hyp.exp100_waker = NULL;
@@ -848,7 +848,9 @@ CURLcode Curl_http(struct Curl_easy *data, bool *done)
may be parts of the request that is not yet sent, since we can deal with
the rest of the request in the PERFORM phase. */
*done = TRUE;
Curl_client_reset(data);
result = Curl_client_start(data);
if(result)
return result;

/* Add collecting of headers written to client. For a new connection,
* we might have done that already, but reuse
@@ -883,9 +885,9 @@ CURLcode Curl_http(struct Curl_easy *data, bool *done)
return result;
}

result = Curl_http_resume(data, conn, httpreq);
result = Curl_http_req_set_reader(data, httpreq, &te);
if(result)
return result;
goto error;

result = Curl_http_range(data, httpreq);
if(result)
@@ -1006,10 +1008,6 @@ CURLcode Curl_http(struct Curl_easy *data, bool *done)
goto error;
}

result = Curl_http_body(data, conn, httpreq, &te);
if(result)
goto error;

if(data->state.aptr.host) {
result = Curl_hyper_header(data, headers, data->state.aptr.host);
if(result)