forked from curl/everything-curl
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
libcurl/url.md: split into sub-pages
Showing 12 changed files with 298 additions and 141 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,115 +1,16 @@ | ||
# URL API | ||
|
||
Since version 7.62.0, libcurl offers an API for parsing, updating and | ||
generating URLs. With it, applications can use libcurl's URL parser for | ||
their own purposes. By using the same parser, security | ||
problems due to different interpretations can be avoided. | ||
|
||
## Include files | ||
|
||
You'd still only include `<curl/curl.h>` in your code. | ||
|
||
## Create, cleanup, duplicate | ||
|
||
Create a handle that holds URL info and resources: | ||
|
||
CURLU *h = curl_url(); | ||
|
||
When done with it, clean it up: | ||
|
||
curl_url_cleanup(h); | ||
|
||
When you need a copy of a handle, just duplicate it: | ||
|
||
CURLU *nh = curl_url_dup(h); | ||
|
||
## Parse a URL | ||
|
||
rc = curl_url_set(h, CURLUPART_URL, "https://example.com:449/foo/bar?name=moo", 0); | ||
|
||
(The zero in the function call is a bitmask for changing specific features.) | ||
|
||
If successful, this stores the URL in its individual parts within the handle. | ||
|
||
## Redirect to a relative URL | ||
|
||
When the handle already has parsed a URL, setting a relative URL will make it | ||
"redirect" to adapt to it. | ||
|
||
rc = curl_url_set(h, CURLUPART_URL, "../test?another", 0); | ||
|
||
## Get a URL | ||
|
||
The `CURLU` handle represents a URL and you can easily extract that: | ||
|
||
char *url; | ||
rc = curl_url_get(h, CURLUPART_URL, &url, 0); | ||
curl_free(url); | ||
|
||
(The zero in the function call is a bitmask for changing specific features.) | ||
|
||
## Get individual URL parts | ||
|
||
When a URL has been parsed or parts have been set, you can extract those pieces from the handle at any time. | ||
|
||
rc = curl_url_get(h, CURLUPART_HOST, &host, 0); | ||
rc = curl_url_get(h, CURLUPART_SCHEME, &scheme, 0); | ||
rc = curl_url_get(h, CURLUPART_USER, &user, 0); | ||
rc = curl_url_get(h, CURLUPART_PASSWORD, &password, 0); | ||
rc = curl_url_get(h, CURLUPART_PORT, &port, 0); | ||
rc = curl_url_get(h, CURLUPART_PATH, &path, 0); | ||
rc = curl_url_get(h, CURLUPART_QUERY, &query, 0); | ||
rc = curl_url_get(h, CURLUPART_FRAGMENT, &fragment, 0); | ||
|
||
Extracted parts are not URL decoded unless the user asks for it with the | ||
`CURLU_URLDECODE` flag. | ||
|
||
Remember to free the returned string with `curl_free` when you are done with | ||
it! | ||
|
||
## Set individual URL parts | ||
|
||
A user can opt to set individual parts, either after having parsed a full URL | ||
or instead of parsing such. | ||
|
||
rc = curl_url_set(urlp, CURLUPART_HOST, "www.example.com", 0); | ||
rc = curl_url_set(urlp, CURLUPART_SCHEME, "https", 0); | ||
rc = curl_url_set(urlp, CURLUPART_USER, "john", 0); | ||
rc = curl_url_set(urlp, CURLUPART_PASSWORD, "doe", 0); | ||
rc = curl_url_set(urlp, CURLUPART_PORT, "443", 0); | ||
rc = curl_url_set(urlp, CURLUPART_PATH, "/index.html", 0); | ||
rc = curl_url_set(urlp, CURLUPART_QUERY, "name=john", 0); | ||
rc = curl_url_set(urlp, CURLUPART_FRAGMENT, "anchor", 0); | ||
|
||
Set parts are not URL encoded unless the user asks for it with the | ||
`CURLU_URLENCODE` flag. | ||
|
||
## Append to the query | ||
|
||
An application can append a string to the right end of the query part with the | ||
`CURLU_APPENDQUERY` flag. | ||
|
||
Imagine a handle that holds the URL `https://example.com/?shoes=2`. An | ||
application can then add the string `hat=1` to the query part like this: | ||
|
||
rc = curl_url_set(urlp, CURLUPART_QUERY, "hat=1", CURLU_APPENDQUERY); | ||
|
||
It will even notice the lack of an ampersand (`&`) separator so it will inject | ||
one too, and the handle's full URL would then equal | ||
`https://example.com/?shoes=2&hat=1`. | ||
|
||
The appended string can of course also get URL encoded on add, and if asked, | ||
the encoding will skip the '=' character. For example, append `candy=M&M` to | ||
what we already have, and URL encode it to deal with the ampersand in the | ||
data: | ||
|
||
rc = curl_url_set(urlp, CURLUPART_QUERY, "candy=M&M", CURLU_APPENDQUERY | CURLU_URLENCODE); | ||
|
||
Now the URL looks like `https://example.com/?shoes=2&hat=1&candy=M%26M`. | ||
|
||
## CURLOPT_CURLU | ||
|
||
libcurl 7.63.0 or later allows applications to pass in a `CURLU` handle | ||
instead of a URL string to tell curl what to transfer to or from. This is | ||
particularly convenient for applications that already parse the URL and might | ||
have it stored in such a handle already. | ||
libcurl offers an API for parsing, updating and generating URLs. With it, | ||
applications can use libcurl's URL parser for their own purposes. By using | ||
the same parser, security problems due to different interpretations can be | ||
avoided. | ||
|
||
* [Include files](url/include.md) | ||
* [Create, cleanup, duplicate](url/init.md) | ||
* [Parse a URL](url/parse.md) | ||
* [Redirect to a relative URL](url/redirect.md) | ||
* [Get a URL](url/get.md) | ||
* [Get individual URL parts](url/get-part.md) | ||
* [Set individual URL parts](url/set-part.md) | ||
* [Append to the query](url/append-query.md) | ||
* [`CURLOPT_CURLU`](url/setopt.md) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
# Append to the query | ||
|
||
An application can append a string to the right end of the existing query part | ||
with the `CURLU_APPENDQUERY` flag. | ||
|
||
Consider a handle that holds the URL `https://example.com/?shoes=2`. An | ||
application can then add the string `hat=1` to the query part like this: | ||
|
||
rc = curl_url_set(urlp, CURLUPART_QUERY, "hat=1", CURLU_APPENDQUERY); | ||
|
||
It will even notice the lack of an ampersand (`&`) separator so it will inject | ||
one too, and the handle's full URL would then equal | ||
`https://example.com/?shoes=2&hat=1`. | ||
|
||
The appended string can of course also get URL encoded on add, and if asked, | ||
the encoding will skip the '=' character. For example, append `candy=M&M` to | ||
what we already have, and URL encode it to deal with the ampersand in the | ||
data: | ||
|
||
rc = curl_url_set(urlp, CURLUPART_QUERY, "candy=M&M", | ||
CURLU_APPENDQUERY | CURLU_URLENCODE); | ||
|
||
Now the URL looks like `https://example.com/?shoes=2&hat=1&candy=M%26M`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Get individual URL parts | ||
|
||
When a URL has been parsed or individual parts have been set in the `CURLU` | ||
handle, you can extract those pieces again from the handle at any time. | ||
|
||
The second argument to `curl_url_get()` specifies which part you want | ||
extracted. They are all extracted as null-terminated `char *` data, so you | ||
pass a pointer to such a variable. | ||
|
||
char *host; | ||
rc = curl_url_get(h, CURLUPART_HOST, &host, 0); | ||
|
||
char *scheme; | ||
rc = curl_url_get(h, CURLUPART_SCHEME, &scheme, 0); | ||
|
||
char *user; | ||
rc = curl_url_get(h, CURLUPART_USER, &user, 0); | ||
|
||
char *password; | ||
rc = curl_url_get(h, CURLUPART_PASSWORD, &password, 0); | ||
|
||
char *port; | ||
rc = curl_url_get(h, CURLUPART_PORT, &port, 0); | ||
|
||
char *path; | ||
rc = curl_url_get(h, CURLUPART_PATH, &path, 0); | ||
|
||
char *query; | ||
rc = curl_url_get(h, CURLUPART_QUERY, &query, 0); | ||
|
||
char *fragment; | ||
rc = curl_url_get(h, CURLUPART_FRAGMENT, &fragment, 0); | ||
|
||
Remember to free the returned string with `curl_free` when you are done with | ||
it! | ||
|
||
Extracted parts are not URL decoded unless the user asks for it with the | ||
`CURLU_URLDECODE` flag. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Get a URL | ||
|
||
The `CURLU *` handle represents a URL, or at least parts of a URL, and you can | ||
easily extract that URL at any point: | ||
|
||
char *url; | ||
rc = curl_url_get(h, CURLUPART_URL, &url, CURLU_NO_DEFAULT_PORT); | ||
curl_free(url); | ||
|
||
If the handle doesn't have enough information to extract a full URL, it | ||
returns an error. | ||
|
||
The returned string must be freed with `curl_free()` after you are done with | ||
it. | ||
|
||
The function call's fourth argument is a flag bitmask for changing specific | ||
features. The example above passes `CURLU_NO_DEFAULT_PORT`; pass zero to use the defaults. | ||
|
||
## `CURLU_DEFAULT_PORT` | ||
|
||
If the URL handle has no port number stored, this option will make | ||
`curl_url_get()` return the default port for the used scheme. | ||
|
||
## `CURLU_DEFAULT_SCHEME` | ||
|
||
If the handle has no scheme stored, this option will make `curl_url_get()` | ||
return the default scheme instead of an error. | ||
|
||
## `CURLU_NO_DEFAULT_PORT` | ||
|
||
Instructs `curl_url_get()` to *not* use a port number in the generated URL if | ||
that port number matches the default port used for the scheme. For example, if | ||
port number 443 is set and the scheme is `https`, the extracted URL will not | ||
include the port number. | ||
|
||
## `CURLU_URLENCODE` | ||
|
||
If set, makes `curl_url_get()` URL encode the host name part when a full | ||
URL is retrieved. If not set (default), libcurl returns the URL with the host | ||
name "raw" so that IDN names appear as-is. IDN host names typically | ||
use non-ASCII bytes that would otherwise be percent-encoded. | ||
|
||
Note that even when not asking for URL encoding, the `%` (byte 37) will be URL | ||
encoded in host names to make sure the host name remains valid. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# Include files | ||
|
||
You include `<curl/curl.h>` in your code when you want to use the URL API. | ||
|
||
#include <curl/curl.h> | ||
|
||
CURLU *h = curl_url(); | ||
rc = curl_url_set(h, CURLUPART_URL, "ftp://example.com/no/where", 0); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Create, cleanup, duplicate | ||
|
||
The first step when using this API is to create a `CURLU *` handle that holds | ||
URL info and resources. The handle is a reference to an associated data object | ||
that holds information about a single URL and all its different components. | ||
|
||
The API allows you to set or get each URL component separately or as a full | ||
URL. | ||
|
||
Create a URL handle like this: | ||
|
||
CURLU *h = curl_url(); | ||
|
||
When done with it, clean it up: | ||
|
||
curl_url_cleanup(h); | ||
|
||
When you need a copy of a handle, just duplicate it: | ||
|
||
CURLU *nh = curl_url_dup(h); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# Parse a URL | ||
|
||
You parse a full URL by *setting* the `CURLUPART_URL` part in the handle: | ||
|
||
CURLU *h = curl_url(); | ||
rc = curl_url_set(h, CURLUPART_URL, | ||
"https://example.com:449/foo/bar?name=moo", 0); | ||
|
||
If successful, `rc` contains `CURLUE_OK` and the different URL components are | ||
held in the handle. It means that the URL was valid as far as libcurl is | ||
concerned. | ||
|
||
The function call's fourth argument is a bitmask for changing specific | ||
features. You can set none, one or more bits in that to alter the parser's | ||
behavior: | ||
|
||
## `CURLU_NON_SUPPORT_SCHEME` | ||
|
||
Makes `curl_url_set()` accept a non-supported scheme. If not set, the only | ||
acceptable schemes are for the protocols libcurl knows and has built-in | ||
support for. | ||
|
||
## `CURLU_URLENCODE` | ||
|
||
Makes the function URL encode the path part if any bytes in it would benefit | ||
from that: like spaces or "control characters". | ||
|
||
## `CURLU_DEFAULT_SCHEME` | ||
|
||
If the passed in string doesn't use a scheme, assume that the default one was | ||
intended. The default scheme is HTTPS. If this is not set, a URL without a | ||
scheme part will not be accepted as valid. Overrides the `CURLU_GUESS_SCHEME` | ||
option if both are set. | ||
|
||
## `CURLU_GUESS_SCHEME` | ||
|
||
Makes libcurl allow the URL to be set without a scheme and it instead | ||
"guesses" which scheme that was intended based on the host name. If the | ||
outermost sub-domain name matches DICT, FTP, IMAP, LDAP, POP3 or SMTP then | ||
that scheme will be used, otherwise it picks HTTP. Conflicts with the | ||
`CURLU_DEFAULT_SCHEME` option which takes precedence if both are set. | ||
|
||
## `CURLU_NO_AUTHORITY` | ||
|
||
Skips authority checks. The RFC allows individual schemes to omit the host | ||
part (normally the only mandatory part of the authority), but libcurl cannot | ||
know whether this is permitted for custom schemes. Specifying the flag permits | ||
empty authority sections, similar to how the file scheme is handled. Really only | ||
usable in combination with `CURLU_NON_SUPPORT_SCHEME`. | ||
|
||
## `CURLU_PATH_AS_IS` | ||
|
||
Makes libcurl skip the normalization of the path. That is the procedure where | ||
curl otherwise removes sequences of dot-slash and dot-dot etc. The same option | ||
used for transfers is called `CURLOPT_PATH_AS_IS`. | ||
|
||
## `CURLU_ALLOW_SPACE` | ||
|
||
Makes the URL parser allow space (ASCII 32) where possible. The URL syntax | ||
normally does not allow spaces anywhere; they should instead be encoded as `%20` | ||
or `+`. Even when spaces are allowed, they are still not allowed in the | ||
scheme. When a space is used and allowed in a URL, it is stored as-is | ||
unless `CURLU_URLENCODE` is also set, which makes libcurl URL-encode the | ||
space before it is stored. This affects how the URL is constructed when | ||
`curl_url_get()` is subsequently used to extract the full URL or individual | ||
parts. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
# Redirect to a relative URL | ||
|
||
When the handle already has parsed a URL, setting a second relative URL will | ||
make it "redirect" to adapt to it. | ||
|
||
Example, first set the original URL then set the one we "redirect" to: | ||
|
||
CURLU *h = curl_url(); | ||
rc = curl_url_set(h, CURLUPART_URL, | ||
"https://example.com/foo/bar?name=moo", 0); | ||
|
||
rc = curl_url_set(h, CURLUPART_URL, "../test?another", 0); |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Set individual URL parts | ||
|
||
The API allows the application to set individual parts of a URL held in the | ||
`CURLU` handle, either after having parsed a full URL or instead of parsing | ||
such. | ||
|
||
rc = curl_url_set(urlp, CURLUPART_HOST, "www.example.com", 0); | ||
rc = curl_url_set(urlp, CURLUPART_SCHEME, "https", 0); | ||
rc = curl_url_set(urlp, CURLUPART_USER, "john", 0); | ||
rc = curl_url_set(urlp, CURLUPART_PASSWORD, "doe", 0); | ||
rc = curl_url_set(urlp, CURLUPART_PORT, "443", 0); | ||
rc = curl_url_set(urlp, CURLUPART_PATH, "/index.html", 0); | ||
rc = curl_url_set(urlp, CURLUPART_QUERY, "name=john", 0); | ||
rc = curl_url_set(urlp, CURLUPART_FRAGMENT, "anchor", 0); | ||
|
||
The API always expects a null-terminated `char *` string in the third | ||
argument, or NULL to clear the field. Note that the port number is also | ||
provided as a string this way. | ||
|
||
Set parts are not URL encoded unless the user asks for it with the | ||
`CURLU_URLENCODE` flag in the fourth argument. |