diff --git a/vignettes/api-packages.Rmd b/vignettes/api-packages.Rmd index 8cd5d1f1..8fb3b48b 100644 --- a/vignettes/api-packages.Rmd +++ b/vignettes/api-packages.Rmd @@ -419,69 +419,3 @@ rate_limit <- function() { rate_limit() ``` - -## Packaging your code - -### Documentation - -Like any R package, an API client needs clear and complete documentation of all functions. Examples are particularly useful but may need to be wrapped in `\donttest{}` to avoid challenges of authentication, rate limiting, lack of network access, or occasional API server down time. - -One other challenge unique to web API packages is the need to have documentation that clarifies any distinctions between the R implementation and the API generally (e.g., in the "details" section). It can be useful to add reference links to the API documentation to clarify the mapping between R functions and API endpoints. - -### Vignettes - -Vignettes pose additional challenges when an API requires authentication, because you don't want to bundle your own credentials with the package! However, you can take advantage of the fact that the vignette is built locally, and only checked by CRAN with this idea from Jenny Bryan: - -In a setup chunk, do: - -```{r} -NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") -knitr::opts_chunk$set(purl = NOT_CRAN) -``` - -And then use `eval = NOT_CRAN` as needed as a chunk option. - -### Testing - -The same issues and solutions can also be applied to test suites for API clients that require authenticated credentials. For these packages, you may want to have a test suite that is only run locally but not on CRAN. Another solution is to use a continuous integration service such as Travis-CI or Appveyor, which allows the inclusion of encrypted environment variables or encrypted token files. This enables continuous testing in a relatively secure way. - -## Appendix: Storing API Authentication Keys/Tokens - -If your package supports an authentication workflow based on an API key or token, encourage users to store it in an environment variable. We illustrate this using the [`github` R package](https://github.com/cscheid/rgithub), which wraps the Github v3 API. Tailor this template to your API + package and include in `README.md` or a vignette. - -1. Create a personal access token in the [Personal access tokens area](https://github.com/settings/tokens) of your GitHub personal settings. Copy token to the clipboard. -2. Identify your home directory. Not sure? Enter `normalizePath("~/")` in the R console. -3. Create a new text file. If in RStudio, do *File > New File > Text File.* -4. Create a line like this: - -```bash -GITHUB_PAT=blahblahblahblahblahblah -``` - -where the name `GITHUB_PAT` reminds you which API this is for and `blahblahblahblahblahblah` is your personal access token, pasted from the clipboard. - -1. Make sure the last line in the file is empty (if it isn't R will - __silently__ fail to load the file. If you're using an editor that shows - line numbers, there should be two lines, where the second one is empty. - -1. Save in your home directory with the filename `.Renviron`. If questioned, - YES you do want to use a filename that begins with a dot `.`. - - Note that by default [dotfiles](http://linux.about.com/cs/linux101/g/dot_file.htm) - are usually hidden. But within RStudio, the file browser will make - `.Renviron` visible and therefore easy to edit in the future. - -1. Restart R. `.Renviron` is processed only [at the start of an R session](http://stat.ethz.ch/R-manual/R-patched/library/base/html/Startup.html). - -1. Use `Sys.getenv()` to access your token. For example, here's how to use - your `GITHUB_PAT` with the `github` package: - - ```R - library(github) - ctx <- create.github.context(access_token = Sys.getenv("GITHUB_PAT")) - # ... proceed to use other package functions to open issues, etc. - ``` - -FAQ: Why define this environment variable via `.Renviron` instead of in `.bash_profile` or `.bashrc`? - -Because there are many combinations of OS and ways of running R where the `.Renviron` approach "just works" and the bash stuff does not. When R is a child process of, say, Emacs or RStudio, you can't always count on environment variables being passed to R. Put them in an R-specific start-up file and save yourself some grief. diff --git a/vignettes/secrets.Rmd b/vignettes/secrets.Rmd new file mode 100644 index 00000000..ca76d96a --- /dev/null +++ b/vignettes/secrets.Rmd @@ -0,0 +1,217 @@ +--- +title: "Managing secrets" +author: "Hadley Wickham" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Vignette Title} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +This document gives you the basics running on securely managing secrets. Relatively little is directly related to httr, but whenever you're working with an API you likely to have some secrets to handle. + +What is a secret? Some secrets are short alphanumeric sequences: + +* Passwords (e.g. the second argument to `authenticate()`) are + clearly secrets. Passwords are particularly important because people + often use the same password in multiple places. + +* Personal access tokens used by websites like GitHub should be kept secret: + they're equivalent to a user name password combination, but are slightly + safer because you can have multiple tokens for different purposes and + it's easy to invalidate one token without affecting the others. + +* Suprisingly, the "client secret" in an `oauth_app()` is not a secret. + It's not equivalent to a password, and if you're are writing an API wrapper + package, it should be included in the package. (If you don't believe me, + here are [google's comments on the topic][google-secret].) + +Other secrets are files: + +* The JSON web token (jwt) used for server-to-server OAuth + (e.g. [by google](https://developers.google.com/identity/protocols/OAuth2ServiceAccount)) + is a secret because it's equivalent to a personal access token. + +* The `.httr-oauth` file is a secret because it stores OAuth access keys. + +This vignette discusses the challenges of managing these secrets in locally + +This document assumes that the main threat model is accidentally sharing your secrets when you don't want to. Protecting against a committed attacker is much harder and requires more set up at the operating system level (e.g. using full disk encryption) and is outside the scope of this document. Similarly, it's exceedingly difficult to protect secrets if someone can run code on your computer, and this is also out of scope. + +Tradeoffs of convenience vs. security. + +## Locally + +It's usually fine to store secret files in the project directory as long as you take two precautions: + +* Make sure they are listed in `.gitignore` so they don't accidentally + get included in a public repositry. + +* Make sure they are listed in `.Rbuildignore` so they don't accidentally + get included in a public R package. + +The main remaining risk is that you zip up the entire directory to share it with someone else. If you're worried about this scenario, store your secret files outside of the project directory. + +Storing short secrets is harder because it's tempting to record them as a variable as in your R script. This is a bad idea, and instead you either store in an environment variable, or use the keyring package as described below. + +### Ask every time + +For rarely used scripts, one simple solution is to simply ask for the password each time. If you use RStudio an easy and secure way to request a password is: + +```{r, eval = FALSE} +password <- rstudioapi::askForPassword() +``` + +You should __never__ type your password into the R console: this will typically be stored in the `.Rhistory` file, and it's easy to accidentally share without realising it. + +For long or complicated passwords, it's fine to request the password in a regular text box: + +```{r, eval = FALSE} +password <- rstudioapi::showPrompt("Password", "Your API password") +``` + +The only thing that the stars protect you from is someone shoulder surfing. This is generally not worth worrying about. It might be a concern if your user is working in an cafe, but even then if someone really wants to get their password they can video record your fingers and carefully reconstruct. + +If you don't use RStudio, you can use a more general solution like the [getPass](https://github.com/wrathematics/getPass) package. + +### Environment variables + +Environment variables, or __env vars__ for short, are a cross platform way of storing strings outside of the running R process. Env vars are name value pairs stored in a file called `.Renviron` in your home directory. The easiest way to edit it is to run: + +```{r, eval = FALSE} +file.edit("~/.Renviron") +``` + +The file looks something like + +``` +VAR1 = value1 +VAR2 = value2 +``` + +```{r, include = FALSE} +Sys.setenv("VAR1" = "value1") +``` + +```{r} +Sys.getenv("VAR1") +``` + +Note that `.Renviron` is only processed on startup, so you'll need to restart R for any changes to be reflected in your current session. + +Note that these environment variables will be easily accessible by any R process. It's also possible for any other program running on your computer to access that file directly. You can get a little more security by using the system keyring. + +### Keyring + +The [keyring](https://github.com/r-lib/keyring) package provides a cross-platform way to store data in your operation system's secure secure store. + +It has a simple API: + +```{r, eval = FALSE} +keyring::key_set("MY_SECRET") +keyring::key_get("MY_SECRET") +``` + +By default, keyring will use the system keyring which is unlocked. This means that while the password is stored securely pretty much running any process can access it. + +If you really want to be even more secure, you can custom keyring and lock it. Note that accessing the key always unlocks the keyring, so if you're being really careful, make sure to lock it again afterwards. + +```{r, eval = FALSE} +keyring::keyring_create("httr") +keyring::key_set("MY_SECRET", keyring = "httr") +keyring::keyring_lock("httr") +keyring::key_get("MY_SECRET", keyring = "httr") +keyring::keyring_lock("httr") +``` + +## Sharing with others + +By and large, managing secrets on your own computer is straightforward. The challenge comes when you need to share it with selected others. All of the approaches use __public key cryptography__. This is a type of assymetric encryption where you use a public key to produce content that can only be decrypted if you have the private key. + +### Reprexes + +The most common place you might need to share a secret is to generate a reprex. First, do everything you can do eliminate the need to share a secret: + +* If it is an http problem, make sure to run all request with `verbose()`. +* If you get an R error, make sure to include `traceback()`. + +If you're lucky, that will be sufficient information to fix the problem. + +Otherwise, you'll need to encrypt the secret so you can share it with me. The easiest way to do so is with the following snippet: + +```{r} +library(openssl) +library(jsonlite) +library(curl) + +encrypt <- function(secret, username) { + url <- paste("https://api.github.com/users", username, "keys", sep = "/") + + resp <- curl::curl_fetch_memory(url) + pubkey <- jsonlite::fromJSON(rawToChar(resp$content))$key[1] + + opubkey <- openssl::read_pubkey(pubkey) + cipher <- openssl::rsa_encrypt(charToRaw(secret), opubkey) + jsonlite::base64_enc(cipher) +} + +cipher <- encrypt("username,password", "hadley") +cipher +``` + +Then I can run the following code on my computer to access it: + +```{r, eval = FALSE} +decrypt <- function(cipher, key = openssl::my_key()) { + cipherraw <- jsonlite::base64_dec(cipher) + rawToChar(openssl::rsa_decrypt(cipherraw, key = key)) +} + +decrypt(cipher) +#> username, password +``` + +I'll use the credentials carefully and won't share with anyone else, but I still recommend changing your password temporarily. + +### GitHub + +If you want to share secrets with a group of other people on GitHub, try the [secret](https://github.com/gaborcsardi/secret) package. + +### Travis + +Encrypted environment variables + +https://docs.travis-ci.com/user/encryption-keys/ + +To run secret files on travis, see . Basically you will encrypt the file locally and check in to git. Then you'll add a decryption step to your `.travis.yml` which makes it decrypts it for each run. + +Be careful to not accidentally expose the secret on travis. An easy way to accidentally expose the secret is to print it out so that it's captured in the log. Don't do that! + +(Note that encrypted data is not available in pull requests in forks. Typically you'll need to check PRs locally once you've confirmed that the code isn't actively malicious.) + +## CRAN + +There is no way to securely share information with arbitrary R users. This means that when developing a package you need to make sure `R CMD check` still passes without any problems when authentication is not available. The challenge is making sure this works while ensuring your code is still tested locally where authentication is available. + +### Documentation + +Like any R package, an API client needs clear and complete documentation of all functions. Examples are particularly useful but may need to be wrapped in `\donttest{}` to avoid challenges of authentication, rate limiting, lack of network access, or occasional API server down time. + +### Vignettes + +Vignettes pose additional challenges when an API requires authentication, because you don't want to bundle your own credentials with the package! However, you can take advantage of the fact that the vignette is built locally, and only checked by CRAN. + +In a setup chunk, do: + +```{r} +NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") +knitr::opts_chunk$set(purl = NOT_CRAN) +``` + +And then use `eval = NOT_CRAN` in any chunk that requires access to a secret. + +### Testing + +Use `testthat::skip_on_cran()` (preferrably wrapped inside a function like `auth_available()`) to skip all tests that require authentication. + +[google-secret]: https://developers.google.com/identity/protocols/OAuth2#installed