Commit

libcurl/url.md: split into sub-pages
bagder committed Dec 16, 2021
1 parent ecb9a6a commit da7b7ca
Showing 12 changed files with 298 additions and 141 deletions.
9 changes: 9 additions & 0 deletions SUMMARY.md
Original file line number Diff line number Diff line change
@@ -194,6 +194,15 @@
* [Post transfer info](libcurl/getinfo.md)
* [Share data between handles](libcurl/sharing.md)
* [URL API](libcurl/url.md)
* [Include files](libcurl/url/include.md)
* [Create, cleanup, duplicate](libcurl/url/init.md)
* [Parse a URL](libcurl/url/parse.md)
* [Redirect to a relative URL](libcurl/url/redirect.md)
* [Get a URL](libcurl/url/get.md)
* [Get individual URL parts](libcurl/url/get-part.md)
* [Set individual URL parts](libcurl/url/set-part.md)
* [Append to the query](libcurl/url/append-query.md)
* [`CURLOPT_CURLU`](libcurl/url/setopt.md)
* [API compatibility](libcurl/api.md)
* [--libcurl](libcurl/--libcurl.md)
* [Header files](libcurl/headers.md)
56 changes: 28 additions & 28 deletions bookindex.md

Large diffs are not rendered by default.

127 changes: 14 additions & 113 deletions libcurl/url.md
@@ -1,115 +1,16 @@
# URL API

Since version 7.62.0, libcurl offers an API for parsing, updating and
generating URLs. Using this, applications can take advantage of using
libcurl's URL parser for its own purposes. By using the same parser, security
problems due to different interpretations can be avoided.

## Include files

You'd still only include `<curl/curl.h>` in your code.

## Create, cleanup, duplicate

Create a handle that holds URL info and resources:

CURLU *h = curl_url();

When done with it, clean it up:

curl_url_cleanup(h);

When you need a copy of a handle, just duplicate it:

CURLU *nh = curl_url_dup(h);

## Parse a URL

rc = curl_url_set(h, CURLUPART_URL, "https://example.com:449/foo/bar?name=moo", 0);

(The zero in the function call is bitmask for changing specific features.)

If successful, this stores the URL in its individual parts within the handle.

## Redirect to a relative URL

When the handle already has parsed a URL, setting a relative URL will make it
"redirect" to adapt to it.

rc = curl_url_set(h, CURLUPART_URL, "../test?another", 0);

## Get a URL

The `CURLU` handle represents a URL and you can easily extract that:

char *url;
rc = curl_url_get(h, CURLUPART_URL, &url, 0);
curl_free(url);

(The zero in the function call is bitmask for changing specific features.)

## Get individual URL parts

When a URL has been parsed or parts have been set, you can extract those pieces from the handle at any time.

rc = curl_url_get(h, CURLUPART_HOST, &host, 0);
rc = curl_url_get(h, CURLUPART_SCHEME, &scheme, 0);
rc = curl_url_get(h, CURLUPART_USER, &user, 0);
rc = curl_url_get(h, CURLUPART_PASSWORD, &password, 0);
rc = curl_url_get(h, CURLUPART_PORT, &port, 0);
rc = curl_url_get(h, CURLUPART_PATH, &path, 0);
rc = curl_url_get(h, CURLUPART_QUERY, &query, 0);
rc = curl_url_get(h, CURLUPART_FRAGMENT, &fragment, 0);

Extracted parts are not URL decoded unless the user asks for it with the
`CURLU_URLDECODE` flag.

Remember to free the returned string with `curl_free` when you are done with
it!

## Set individual URL parts

A user can opt to set individual parts, either after having parsed a full URL
or instead of parsing such.

rc = curl_url_set(urlp, CURLUPART_HOST, "www.example.com", 0);
rc = curl_url_set(urlp, CURLUPART_SCHEME, "https", 0);
rc = curl_url_set(urlp, CURLUPART_USER, "john", 0);
rc = curl_url_set(urlp, CURLUPART_PASSWORD, "doe", 0);
rc = curl_url_set(urlp, CURLUPART_PORT, "443", 0);
rc = curl_url_set(urlp, CURLUPART_PATH, "/index.html", 0);
rc = curl_url_set(urlp, CURLUPART_QUERY, "name=john", 0);
rc = curl_url_set(urlp, CURLUPART_FRAGMENT, "anchor", 0);

Set parts are not URL encoded unless the user asks for it with the
`CURLU_URLENCODE` flag.

## Append to the query

An application can append a string to the right end of the query part with the
`CURLU_APPENDQUERY` flag.

Imagine a handle that holds the URL `https://example.com/?shoes=2`. An
application can then add the string `hat=1` to the query part like this:

rc = curl_url_set(urlp, CURLUPART_QUERY, "hat=1", CURLU_APPENDQUERY);

It will even notice the lack of an ampersand (`&`) separator so it will inject
one too, and the handle's full URL would then equal
`https://example.com/?shoes=2&hat=1`.

The appended string can of course also get URL encoded on add, and if asked,
the encoding will skip the '=' character. For example, append `candy=M&M` to
what we already have, and URL encode it to deal with the ampersand in the
data:

rc = curl_url_set(urlp, CURLUPART_QUERY, "candy=M&M", CURLU_APPENDQUERY | CURLU_URLENCODE);

Now the URL looks like `https://example.com/?shoes=2&hat=1&candy=M%26M`.

## CURLOPT_CURLU

libcurl 7.63.0 or later allows applications to pass in a `CURLU` handle
instead of a URL string to tell curl what to transfer to or from. This is
particularly convenient for applications that already parse the URL and might
have it stored in such a handle already.
libcurl offers an API for parsing, updating and generating URLs. With it,
applications can use libcurl's URL parser for their own purposes. Using the
same parser that libcurl uses for transfers avoids security problems caused by
different interpretations of the same URL.

* [Include files](url/include.md)
* [Create, cleanup, duplicate](url/init.md)
* [Parse a URL](url/parse.md)
* [Redirect to a relative URL](url/redirect.md)
* [Get a URL](url/get.md)
* [Get individual URL parts](url/get-part.md)
* [Set individual URL parts](url/set-part.md)
* [Append to the query](url/append-query.md)
* [`CURLOPT_CURLU`](url/setopt.md)
23 changes: 23 additions & 0 deletions libcurl/url/append-query.md
@@ -0,0 +1,23 @@
# Append to the query

An application can append a string to the right end of the existing query part
with the `CURLU_APPENDQUERY` flag.

Consider a handle that holds the URL `https://example.com/?shoes=2`. An
application can then add the string `hat=1` to the query part like this:

rc = curl_url_set(urlp, CURLUPART_QUERY, "hat=1", CURLU_APPENDQUERY);

It will even notice the lack of an ampersand (`&`) separator so it will inject
one too, and the handle's full URL would then equal
`https://example.com/?shoes=2&hat=1`.

The appended string can of course also get URL encoded on add, and if asked,
the encoding will skip the '=' character. For example, append `candy=M&M` to
what we already have, and URL encode it to deal with the ampersand in the
data:

rc = curl_url_set(urlp, CURLUPART_QUERY, "candy=M&M",
CURLU_APPENDQUERY | CURLU_URLENCODE);

Now the URL looks like `https://example.com/?shoes=2&hat=1&candy=M%26M`.
38 changes: 38 additions & 0 deletions libcurl/url/get-part.md
@@ -0,0 +1,38 @@
# Get individual URL parts

When a URL has been parsed or individual parts have been set in the `CURLU`
handle, you can extract those pieces again from the handle at any time.

The second argument to `curl_url_get()` specifies which part you want
extracted. They are all extracted as null-terminated `char *` data, so you
pass a pointer to such a variable.

char *host;
rc = curl_url_get(h, CURLUPART_HOST, &host, 0);

char *scheme;
rc = curl_url_get(h, CURLUPART_SCHEME, &scheme, 0);

char *user;
rc = curl_url_get(h, CURLUPART_USER, &user, 0);

char *password;
rc = curl_url_get(h, CURLUPART_PASSWORD, &password, 0);

char *port;
rc = curl_url_get(h, CURLUPART_PORT, &port, 0);

char *path;
rc = curl_url_get(h, CURLUPART_PATH, &path, 0);

char *query;
rc = curl_url_get(h, CURLUPART_QUERY, &query, 0);

char *fragment;
rc = curl_url_get(h, CURLUPART_FRAGMENT, &fragment, 0);

Remember to free the returned string with `curl_free` when you are done with
it!

Extracted parts are not URL decoded unless the user asks for it with the
`CURLU_URLDECODE` flag.
44 changes: 44 additions & 0 deletions libcurl/url/get.md
@@ -0,0 +1,44 @@
# Get a URL

The `CURLU *` handle represents a URL, or at least parts of a URL, and you can
easily extract that URL at any point:

char *url;
rc = curl_url_get(h, CURLUPART_URL, &url, CURLU_NO_DEFAULT_PORT);
curl_free(url);

If the handle doesn't have enough information to extract a full URL, it will
return an error.

The returned string must be freed with `curl_free()` after you are done with
it.

The function call's fourth argument is a flag bitmask for changing specific
features. The example above passes `CURLU_NO_DEFAULT_PORT`; pass zero to use
no flags at all.

## `CURLU_DEFAULT_PORT`

If the URL handle has no port number stored, this option will make
`curl_url_get()` return the default port for the used scheme.

## `CURLU_DEFAULT_SCHEME`

If the handle has no scheme stored, this option will make `curl_url_get()`
return the default scheme instead of error.

## `CURLU_NO_DEFAULT_PORT`

Instructs `curl_url_get()` to *not* use a port number in the generated URL if
that port number matches the default port used for the scheme. For example, if
port number 443 is set and the scheme is `https`, the extracted URL will not
include the port number.

## `CURLU_URLENCODE`

If set, will make `curl_url_get()` URL encode the host name part when a full
URL is retrieved. If not set (default), libcurl returns the URL with the host
name "raw" so that IDN names appear as-is. IDN host names typically use
non-ASCII bytes, which would otherwise be percent-encoded.

Note that even when not asking for URL encoding, the `%` (byte 37) will be URL
encoded in host names to make sure the host name remains valid.
8 changes: 8 additions & 0 deletions libcurl/url/include.md
@@ -0,0 +1,8 @@
# Include files

You include `<curl/curl.h>` in your code when you want to use the URL API.

#include <curl/curl.h>

CURLU *h = curl_url();
rc = curl_url_set(h, CURLUPART_URL, "ftp://example.com/no/where", 0);
20 changes: 20 additions & 0 deletions libcurl/url/init.md
@@ -0,0 +1,20 @@
# Create, cleanup, duplicate

The first step when using this API is to create a `CURLU *` handle that holds
URL info and resources. The handle is a reference to an associated data object
that holds information about a single URL and all its different components.

The API allows you to set or get each URL component separately or as a full
URL.

Create a URL handle like this:

CURLU *h = curl_url();

When you are done with it, clean it up:

curl_url_cleanup(h);

When you need a copy of a handle, just duplicate it:

CURLU *nh = curl_url_dup(h);
66 changes: 66 additions & 0 deletions libcurl/url/parse.md
@@ -0,0 +1,66 @@
# Parse a URL

You parse a full URL by *setting* the `CURLUPART_URL` part in the handle:

CURLU *h = curl_url();
rc = curl_url_set(h, CURLUPART_URL,
"https://example.com:449/foo/bar?name=moo", 0);

If successful, `rc` contains `CURLUE_OK` and the different URL components are
held in the handle. It means that the URL was valid as far as libcurl is
concerned.

The function call's fourth argument is a bitmask for changing specific
features. You can set none, one or more bits in that to alter the parser's
behavior:

## `CURLU_NON_SUPPORT_SCHEME`

Makes `curl_url_set()` accept a non-supported scheme. If not set, the only
acceptable schemes are for the protocols libcurl knows and has built-in
support for.

## `CURLU_URLENCODE`

Makes the function URL encode the path part if any bytes in it would benefit
from that: like spaces or "control characters".

## `CURLU_DEFAULT_SCHEME`

If the passed in string doesn't use a scheme, assume that the default one was
intended. The default scheme is HTTPS. If this is not set, a URL without a
scheme part will not be accepted as valid. Overrides the `CURLU_GUESS_SCHEME`
option if both are set.

## `CURLU_GUESS_SCHEME`

Makes libcurl allow the URL to be set without a scheme and instead "guess"
which scheme was intended based on the host name. If the outermost sub-domain
name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP then that scheme will be
used, otherwise it picks HTTP. Conflicts with the `CURLU_DEFAULT_SCHEME`
option which takes precedence if both are set.

## `CURLU_NO_AUTHORITY`

Skips authority checks. The RFC allows individual schemes to omit the host
part (normally the only mandatory part of the authority), but libcurl cannot
know whether this is permitted for custom schemes. Specifying the flag permits
empty authority sections, similar to how the `file` scheme is handled. Really
only usable in combination with `CURLU_NON_SUPPORT_SCHEME`.

## `CURLU_PATH_AS_IS`

Makes libcurl skip the normalization of the path. That is the procedure where
curl otherwise removes sequences of dot-slash and dot-dot etc. The same option
used for transfers is called `CURLOPT_PATH_AS_IS`.

## `CURLU_ALLOW_SPACE`

Makes the URL parser allow space (ASCII 32) where possible. The URL syntax
normally does not allow spaces anywhere; they should be encoded as `%20` or
`+`. Even when spaces are allowed, they are still not allowed in the
scheme. When space is used and allowed in a URL, it will be stored as-is
unless `CURLU_URLENCODE` is also set, which then makes libcurl URL-encode the
space before it is stored. This affects how the URL will be constructed when
`curl_url_get()` is subsequently used to extract the full URL or individual
parts.
12 changes: 12 additions & 0 deletions libcurl/url/redirect.md
@@ -0,0 +1,12 @@
# Redirect to a relative URL

When the handle already has parsed a URL, setting a second relative URL will
make it "redirect" to adapt to it.

For example, first set the original URL, then set the one we "redirect" to:

CURLU *h = curl_url();
rc = curl_url_set(h, CURLUPART_URL,
"https://example.com/foo/bar?name=moo", 0);

rc = curl_url_set(h, CURLUPART_URL, "../test?another", 0);
21 changes: 21 additions & 0 deletions libcurl/url/set-part.md
@@ -0,0 +1,21 @@
# Set individual URL parts

The API allows the application to set individual parts of a URL held in the
`CURLU` handle, either after having parsed a full URL or instead of parsing
one.

rc = curl_url_set(urlp, CURLUPART_HOST, "www.example.com", 0);
rc = curl_url_set(urlp, CURLUPART_SCHEME, "https", 0);
rc = curl_url_set(urlp, CURLUPART_USER, "john", 0);
rc = curl_url_set(urlp, CURLUPART_PASSWORD, "doe", 0);
rc = curl_url_set(urlp, CURLUPART_PORT, "443", 0);
rc = curl_url_set(urlp, CURLUPART_PATH, "/index.html", 0);
rc = curl_url_set(urlp, CURLUPART_QUERY, "name=john", 0);
rc = curl_url_set(urlp, CURLUPART_FRAGMENT, "anchor", 0);

The API always expects a null-terminated `char *` string in the third
argument, or NULL to clear the field. Note that the port number is also
provided as a string this way.

Set parts are not URL encoded unless the user asks for it with the
`CURLU_URLENCODE` flag in the fourth argument.