nomisr #190

evanodell · 2018-01-30T21:49:32Z

Summary

What does this package do? (explain in 50 words or less):

Downloads UK official statistics from the Nomis API and converts to R objects, primarily tibbles. The API is based around different statistical geographies.

Paste the full DESCRIPTION file inside a code block below:

Package: nomisr
Type: Package
Title: Access Nomis UK Labour Market Data with R
Version: 0.0.2
Date: 2018-01-30
Authors@R: person("Evan Odell", email="evanodell91@gmail.com",
  role=c("aut", "cre"),
  comment = c(ORCID='0000-0003-1845-808X'))
Author: Evan Odell [aut, cre]
Maintainer: Evan Odell <evanodell91@gmail.com>
Description: Access UK official statistics from the Nomis database through R. 
    Nomis includes data from the Census, the Labour Force Survey, DWP benefit 
    statistics and other economic and demographic data from the Office for 
    National Statistics, based around statistical geographies. See 
    <https://www.nomisweb.co.uk/api/v01/help> for full API documentation.
URL: https://github.com/EvanOdell/nomisr
BugReports: https://github.com/EvanOdell/nomisr/issues
License: MIT + file LICENSE
Imports: 
  jsonlite,
  tibble,
  readr,
  dplyr,
  curl
RoxygenNote: 6.0.1
Suggests: 
  knitr,
  rmarkdown,
  testthat
VignetteBuilder: knitr

URL for the package (the development repository, not a stylized html page):

https://github.com/EvanOdell/nomisr

Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

[e.g., "data extraction, because the package parses a scientific data file format"]

Data retrieval, because the package assists the downloading of a data from an API into R.

Who is the target audience and what are scientific applications of this package?

Demographers, economists, geographers, public health researchers, any other social scientists who are interested in geographic factors. The package will aid reproducibility, reduce the need to manually download area profiles, and allow easy linking of different datasets covering the same geographic area.

Are there other R packages that accomplish the same thing? If so, how does
yours differ or meet our criteria for best-in-category?

There are no other R packages specifically for retrieving this data. Some of the data on Nomis is also available through the UK Data Service with the ukds package, but other ONS data, including labour force, benefits, mortality and business statistics are not otherwise available through an R package. The UK Data Service also requires users registration, while Nomis does not, and does not, as far as I can tell, provide the same linking with statistical geographies as Nomis.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

does not violate the Terms of Service of any service it interacts with.
has a CRAN and OSI accepted license.
contains a README with instructions for installing the development version.
includes documentation with examples for all functions.
contains a vignette with examples of its essential functions and uses.
has a test suite.
has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

Do you intend for this package to go on CRAN?
Do you wish to automatically submit to the Journal of Open Source Software? If so:
- The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal.
- (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

The text was updated successfully, but these errors were encountered:

evanodell · 2018-01-30T21:50:36Z

Hi all, submitting this as a presubmission because I'm not sure how much it fits with rOpenSci's remit.

Best,

Evan

karthik · 2018-01-30T22:43:08Z

👋 @evanodell
Thanks for this submission. It fits our suite, and since you used the full template (which isn't required for a pre-submission inquiry), I can just treat this as a submission and proceed as such. I'll edit your title to remove pre-submission.

I will also be the editor on your submission, so stay tuned for next steps.

karthik · 2018-01-30T22:49:08Z

nomisr

Editor checks:

Fit: The package meets criteria for fit and overlap
Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
License: The package has a CRAN or OSI accepted license
Repository: The repository link resolves correctly
Archive (JOSS only, may be post-review): The repository DOI resolves correctly
Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Here are results of a check on the good practice package on some best practices. You can re-run this yourself (as you fix issues) by running goodpractice::gp() on your package. Please fix relevant issues before reviewers begin to take a look at your package.

── GP nomisr ───────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 59% of code lines are covered by test cases.

    R/data_download.R:103:NA
    R/data_download.R:137:NA
    R/data_download.R:139:NA
    R/data_download.R:141:NA
    R/data_download.R:143:NA
    ... and 35 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.
  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/data_download.R:118:1
    R/data_download.R:135:1
    R/data_download.R:157:1
    R/nomis_codes.R:52:1
    R/search.R:29:1
    ... and 3 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/data_download.R:143:14

  ✖ not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the
    specific functions you need.
  ✖ fix this R CMD check WARNING: LaTeX errors when creating
    PDF version. This typically indicates Rd problems.

Reviewer 1: @cderv
Due date: February 23rd

Reviewer 2: @pegeler
Due date: February 25th

cderv · 2018-02-10T17:45:32Z

Package Review for `nomisr`

Onboarding submission : #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software

Authors: A list of authors with their affiliations

A statement of need clearly stating problems the software is designed to solve and its target audience.

References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8

Review Comments

General comments

First of all, great package ! Very well done, Everything is good!
Well done really ! There was still medium coverage at the submission, it is 100% now and all run perfectly locally. Know that goodpractice::gp praises you right :

♥ Whee! Grand package! Keep up the fabulous work!

Everything is based on tibble class that makes it perfect to work inside the tidyverse. It is really great! However, maybe this could be more explained for those who are not used to tibbles and tidyverse ecosystem.
I find it kind of necessary because you also used list-columns in tibbles. This is not something easy for some newcomers to handles. Even for some dplyr users who do not used that a lot, it is not something easy. Adding some example in the vignettes to show how to retrieve embedded data.frame inside some tibble's list column could be a good idea. To illustrate:

library(nomisr)
library(dplyr, warn.conflicts = F)
y <- nomis_data_info("NM_893_1")
y$annotations.annotation %>% class()
#> [1] "list"
y$annotations.annotation[[1]] %>% class()
#> [1] "data.frame"
y %>% pull(annotations.annotation) %>% class()
#> [1] "list"
y %>% pull(annotations.annotation)%>% .[[1]] %>% class()
#> [1] "data.frame"
y %>% pull(annotations.annotation) %>% purrr::pluck(1) %>% class()
#> [1] "data.frame"
y %>% pull(annotations.annotation) %>% purrr::map(1) %>% class()
#> [1] "list"
y %>% tidyr::unnest(annotations.annotation) %>% glimpse()
#> Observations: 14
#> Variables: 11
#> $ agencyid                             <chr> "NOMIS", "NOMIS", "NOMIS"...
#> $ id                                   <chr> "NM_893_1", "NM_893_1", "...
#> $ uri                                  <chr> "Nm-893d1", "Nm-893d1", "...
#> $ version                              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1...
#> $ components.primarymeasure.conceptref <chr> "OBS_VALUE", "OBS_VALUE",...
#> $ components.timedimension.codelist    <chr> "CL_893_1_TIME", "CL_893_...
#> $ components.timedimension.conceptref  <chr> "TIME", "TIME", "TIME", "...
#> $ name.value                           <chr> "LC4408EW - Tenure by num...
#> $ name.lang                            <chr> "en", "en", "en", "en", "...
#> $ annotationtext                       <chr> "Current (being actively ...
#> $ annotationtitle                      <chr> "Status", "Keywords", "Un...

Following are my review comments, for improvement:

About Community Guidelines

There is no contribution guidelines in the README or other CONTRIBUTING.md.
DESCRIPTION needs some improvement : no need to use Author and Maintainer fields if you are using Authors@R. You can just use the latest. You can find some good example in R package book.

Improvement for the packaging guidelines

see below parts for details

Adding a code of conduct
Improve README considering it is the first entry point for discovering NOMIS API
Improve DESCRIPTION for authorship
[minor] follow versionning advice with *.*.*.9000 for in development version
[question] I did not find why you need curl package in suggest...

README improvement

I would be more precise on what the NOMIS database is and for who it is. For example, I found that this submission issue mentioned UK database but not the README. You help file for the package as pretty clear description of what it is for. While reviewing, I had a doubt and went to check on NOMIS. This is why I think the statement of need is not complete.
It feels like the README could get more examples on what your package can do. Only one of your functions is shown in the readme.
Put package documentation link in the title of the repo like in the reprex repo. By the way, you can add a more descriptive description on your github repo and some tags.
could precise that it uses the tibble format (from the tidyverse)

Vignettes improvement

Vignette was not built when I installed the package with devtools::install_github. Did not know if this intended behavior or not.
I tried to build the vignette locally then: it worked.
- However, there is a problem on result formatting with the tibble. (I think because you use result = 'asis' in code chunks) and because every results are not tables (y$annotations.annotation is a list for example.) so it does not print nicely.
- The vignette do not show any code to get the results - I would show them for someone to learn how to use your function, as examples. (with echo = TRUE)
- The formatting issue is visible on your website too
I think the vignette contains necessary information but does not show enough examples on how to retrieve data in the objects you return. You mentioned the three list columns with nested data.frame but not show how correctly extracts the info.
It is pretty clear that you package allows to retrieve a large amount of data from the NOMIS API. it seems necessary to always use nomis_data_info, then nomis_codes then nomis_get_data to be sure to not retrieve to much. The vignette helps you understand that. It could appears to in your readme, and it could be more clear how to construct a pipeline for good practice with this API.

These improvement are the reason why I did not check the vignettes box, even if there is one in the github repo and doc.
If you need help on all this, do not hesitate to ask.

About documentation of functions and examples

All functions are well documented.
I encountered some issues running the examples because of a timeout. I believe this is what you meant in your by "there are some limits to the amount of data that can be retrieved within a certain period of time, although those are not published.". The maybe some improvement to do on this. (see under)
Is this the reason why you enclosed each example in \dontrun{} statement ? am I not sure on what is the recommendation for API package like this one for example ? \dontrun{} does allow anyone to run examples with something like example("nomis_codes", package = "nomisr"). Idea : Maybe \donttest{} is better here: does show in help and runs with example, but not run by R CMD check (so not run by CRAN ?) - see rd vignette from roxygen2
I ran all tests with devtools::run_examples(run = FALSE) (in several time, due to timeout constraint). Everything is working. It works well for almost all, but your example assign systematically to a variable (named x or y - room for improvement) but never print to show head of results. I feel that is missing.
I said "almost all" because there are some very long request in nomis_get_data. The warning that you put in Details states that it can be very slow process if calling significant amount of data. I would emphasize on this, saying this warning in Description part of the help file. And these example should definitely not be ran.
I saw you made a pkgdown website. This is great !! and nice theme!
I did not find the source code of this website. I would have thought it leaved with the package source code. But no docs folder or gh-pages. I find it useful to have access for letting people PR your some typos or other thing and all the site from your package source code. However, you may have good reason.

Various comments on the code

[minor] I would put the url of the API in its own variable. I think it is better practice and more efficient to maintain in case of change in the url (e.g V0.2). The url is used several time and a change would mean copy pasting. another utils function could help building the url (paste0 is used three times), with the same base url def.sdmx.json). Know that it exists some tools to help build some strings with parameters like glue, and specific one to handle url like urltools or those in httr. You may not need them here but it is worth knowing.
suppressMessage is used to hide the message of readr::read_csv. However, the message appears because column types are guessed and not set. It is good practice to set column type in readr with column_types = cols(<something>). In an API package like this one, it make even more sense because it could helps be aware of any change in the format. (using stop_for_problems() or just see a message appear) - see readr vignettes
[minor] I don't know what are the good practice on this one. I just feel that you have a dplyr dependency in import for three functions bind_rows, case_when and if_else than can be easily replace with rbind or if.else. I feel that, if the former are very useful interactively in an analysis, using them in a package can be unnecessary dependency to a "big" package that can cause you some effort later on if anything change in dplyr. More of a personal point of view on this one.
However, it relates on how you are build your query url

  query <- dplyr::if_else(keywords==FALSE,
                           paste0("/def.sdmx.json?search=*", 
                                  search, "*"),
                           paste0("/def.sdmx.json?search=keywords-", 
                                  search, "*")

could be rewritten without dplyr::if_else

base_query <- "/def.sdmx.json?search="
if (keyword) {
    keyword_term <- "keywords-")
} else {
    keyword_term <- "*"
}
# or keyword_term <- if.else(keyword, "keywords-", "*")
query <- paste0(base_query, key_word_term, search, "*")

It would suppres a non essential dependency.

The style in code could be improve in some places. You can follow some style guide to help you.
like the one in ROpensci Guidelines linking to Advanced R or more recent the one used in the tidyverse styleguide. They even build some tools to help like styler and lintr. Example of improvement:
- put spaces after foror if and also around =, ==, +, >, ...
- indentation of code (in RStudio CTRL+i i useful). for example in if () { } else { }, lines of code that are inside brackets should be indented.
- use {} with if when it does not fit on one line
- More importantly, be consistent in all your script with the guideline you choose.
Know that when you want to stop on a condition, there is also stopifnot. For user experience, you can also return clearer error sometimes with stop("API request did not return any results", call. = FALSE) to not show the call for example.
A good practice is also to use meaningful name inside your internal code, even for temporary variable. There are a lot of x, a, q, ... This will improve readability and the ease for others to collaborate on your code (including future you)

About the timeout and size of data

If session is interactive(), it may be a good idea to warn clearly and even ask if we want to continue when lots of data are being requested. I find myself playing with the example trying to retrieve 208888 of your pages

Retrieving additional pages 1 of 208888
Retrieving additional pages 2 of 208888

I don't know if some other API packages are doing something to help user or if it is all the user responsibility to know what is he doing. (users will surely be already NOMIS data users).

As another improvement, I think you should try to create custom error message for the timeout issue as you know it is an issue that will impact user experience but cannot really prevent. I encountered:

a <- nomis_codes("NM_1_1")
# Error in open.connection(con, "rb") : 
#  Timeout was reached: Resolving timed out after 10000 milliseconds

You could use some tryCatch mechanism (some useful info in Advanced R).

[thoughts] You may need more control on the request mechanism to deal with timeout. First, it seems for the website one need to contact NOMIS help-desk to know more about limitation. Then , your are relying on readr connection mechanism, and may need to use httr, crul or curl to better handle request to API with the specificity of NOMIS

About SMDX format on NOMIS

Do you know about the rsdmx package ? it is on github and CRAN, there is WIKI. I did not find NOMIS in their listing, so I did not know if it works for you and how it could overlap with your package.
rsdmx is definitely more a generic package than yours. I just wanted to point it to you in case you did not know.

Last words

Thank you very much letting me review this package. It is pretty simple one, yet powerful as it brings to R users lots of new open data for UK. And it does it well allowing to work with tidyverse tools directly thanks to the tibble format.

All my remarks were toward improvement. The package in its currents state is functional, is well tested and as the minimal documentation to get start with it.

@evanodell, if you have some questions or you want some help / advice on anything, feel free to reach to me, I will be glad to help you.

@karthik, I hope I am doing fine for my first review with Ropensci, please tell me if miss something and need to review further.

maelle · 2018-02-11T08:03:56Z

I'm not the editor handling this review but I find that it is a very good review @cderv, thank you!

karthik · 2018-02-13T20:08:47Z

Quick note: I’ll be away from the internet for about a week and will be back online around 02/21. So I will be back to editorial activities shortly.

pegeler · 2018-02-23T06:30:02Z

Package Review for `nomisr`

Onboarding submission: #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software

Authors: A list of authors with their affiliations

A statement of need clearly stating problems the software is designed to solve and its target audience.

References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10

Review Comments

Excellent work! This is a fine example of the level of quality and professionalism that I have
come to expect from rOpenSci contributions. This package is a great addition to the
complement of R functionality and provides a facile portal to a useful data source.

Many of the comments that I had have been addressed as a result of @cderv's prompt and thorough review.
In addition, development has continued to be very active over the past few weeks! Issues are
fixed as quickly as I find them! Below are a few more comments that I would like to add.

Basic Package Structure Overview

README.md

The README.md does not contain citation information, per rOpenSci packaging guide.
I am not sure how necessary that is since there is a CITATION file in the inst/ directory,
however, it should be an easy add.

I would concur with @cderv in suggesting additional usage examples.

DESCRIPTION

The DESCRIPTION file does not currently specify a minimum version of R. I would
recommend doing so. A minor release is preferred over a patch release, e.g.
```
Depends: R (>= 3.4.0)
```
is better than
```
Depends: R (>= 3.4.1)
```
I am not seeing where curl is used even though it is in the Suggests list.

CITATION

The CITATION file uses meta$Date to dynamically fill in date for the citation.
However, the Date entry was removed from the DESCRIPTION file per gp suggestion so
the date is NULL.

Since it is considered good practice to omit the Date entry because it may be prone to
falling out of date, it seems that hard-coding the date in the citation would be silly.
I briefly considered the use of a different field to dynamically update the citation date,
such as Date/Publication or Packaged. However, those will only work for users who are downloading
from a CRAN repos. Therefore I'm a little bit at a loss as to what the most elegant solution might be.
I might suggest breaking with good practice here and adding the Date field back in
since there is a reason to do so.
Suggestions are welcome.

The CITATION file should also end in a newline character :)

Documentation

The man page for the package is named nmisr, not nomisr. This appears to be a typo.
@seealso can link to the other package functions using \link{}.
It appears that some descriptions may be intended to be multi-paragraph. Without explicitly
using the @description and @details tags, the third code block is assumed to be the 'details'
block by default. Therefore, using explicit roxygen tags might be helpful here.
Nomis data comes packaged in very complex data structures. Please don't be afraid of 'over-explaining'.
The more information in the documentation and vignettes, the better for the user, especially if they do
not have prior experience with Nomis data.
'Nomis' in the title for nomis_search() should be capitalized to be consistent with the rest of the package.

Examples

The example for nomis_get_data failed due to lack of data. I recommend finding a specific
date range that will return results and then hard-code that into the example.
Athough \donttest{} used to be perfect for API packages because it would not be run by
R CMD check but would be run by example() and would be included in the man page, it appears
that the functionality has changed.
Now, according to Hadley's book, it appears that code enclosed
in \donttest{} is indeed checked (though I have yet to confirm). Therefore,
for long-running API calls, it might make sense to change back to \dontrun{} if you are
concerned about time-out. I recall
reading some best practice advice on CRAN's website at some point, but cannot find the reference
anymore.
The examples often show multiple ways to use the function, which is excellent. However,
it might be a nice enhancement to add a little extra code to glimpse the data for the users to get
a better feel for what they are getting out of the API call. This is especially helpful since
the data that is returned can differ greatly depending on what args are passed.

Vignettes

I like the switch to echo = TRUE for vignette code chunks. This is the user's opportunity
to see the code in action for the first time. Including the code used to generate the results can
aid in understanding and readability.
Vignettes are not currently built. This is not a big deal. However, it would
be a convenience for those installing from GitHub or local repos. This is especially true since
the API calls can cause vignettes to take a long time to build. (Currently I cannot build vignettes
because I keep getting hung up).
The addition of the theme and TOC is a nice touch.
It looks like the vignette might still be a little incomplete. I see that the final paragraph of
Querying data variables is not finished.
It's unnecessary to hard-code the results in code chunks. The results will be output when the vignette is knit.

Functionality

nomis_search()
- [suggestion] Currently, users can either search based on generic search terms or search for specific keywords.
  Looking at the API docs, I think there is an opportunity to improve functionality by taking advantage
  of some of the other search categories (e.g., name, description, content type, units). This would
  be doubly useful since content type and units are not searched when performing a generic search.
- [suggestion] Rather than automatically inserting wildcards (*), I would recommend informing the user of the wildcard
  character so that they can have finer control over how wildcards are used in their search.
- It would be nice to mention in the documentation that several terms can be searched at once using
  a comma. Alternatively, the parameter could accept a character vector that could be
  collapsed with commas.
- [suggestion] The function could potentially accept character vectors from each of the categories (as well as
  a generic search term field) and then the appropriate search string could be created.
nomis_codes()
- Generally, it is helpful to mirror the language of the source material when possible.
  The nomis_codes() function accepts a dimension (concept) and the function returns codes for that dimension.
  Therefore, the argument code might be renamed to dimension or concept, since the
  function searches for codes availalbe within a dimension.
- Related to above, I think the @title could be more descriptive.
- [suggestion] The query is currently built using a hierarchal structure. E.g., /api/v01/dataset/<ID>/<DIMENSION>/<CODE>/def.sdmx.json. However,
  I think there is an opportunity to make this query a little more flexible and extensible. Instead of enforcing the hierarchy,
  the funciton could use the GET verb to provide a little more control over how the data is filtered. A user could
  query on criteria from different dimensions than the one which they are interested in results from. For example,
  even if the user is interested in the 'item' dimension, they may want to filter on 'geography' and 'sex'. Using GET would allow
  for this. As an example: /api/v01/dataset/NM_1_1/item.def.sdmx.json?geography=2038432081&sex=7. This would also expose the user
  to options for specifying times and dates, which cannot otherwise be done. I fully acknowledge that this might be over-building,
  since this function only provides a support role and will probably not be used that way.
- This is a minor concern, but I feel it's cleaner to specify value if a test function returns TRUE first when using ifelse().
  The code ifelse(is.null(type), "", paste0("/", type)) reads cleaner than ifelse(is.null(type) == FALSE, paste0("/", type), "") IMO.
nomis_data_info()
- The documentation could be more explicit in telling the user that the data returned by this function is metadata.
- [suggestion] Relating to the comment above, it may be a little clearer to rename the function something to the effect of nomis_get_metadata().
  This naming scheme is more in line with the pattern set forth by nomis_get_data naming scheme.
nomis_get_data()
- I see that the API documentation states, "you have a 25000 cell limit on the number of data values downloaded in these formats."
  It appears that the code currently breaks the fetch into multiple steps if the record count is greater than or equal to 25k records.
  I am curious if this might be an oversight. It might be useful to test the edge case where the query returns more than 25k cells but
  fewer than 25k records to see if everything still works.
- Including a select parameter may be a good enhancement for the future. Right now, that can be accomplished with
  additional_queries but is probably important enough to be standalone. This would allow for additional documentation
  and sugar to be added for it.

Conclusion

This is a great package that will be valuable to anyone interested in analysis of this data. Nice work!

I appreciate the opportunity to review this package and hope that my comments are constructive and useful. This package needs just a few
tweaks (namely getting the examples up and running) and then I will gladly approve!

evanodell · 2018-03-09T14:38:03Z

@hi @cderv and @pegeler, thank you both of you very much for your helpful comments and suggestions, apologies for not being able to respond to them sooner.

Response to @cderv

List-columns

I find it kind of necessary because you also used list-columns in tibbles. This is not something easy for some newcomers to handles. Even for some dplyr users who do not used that a lot, it is not something easy. Adding some example in the vignettes to show how to retrieve embedded data.frame inside some tibble's list column could be a good idea. To illustrate:

> library(nomisr)
> library(dplyr, warn.conflicts = F)
> y <- nomis_data_info("NM_893_1")
> y$annotations.annotation %>% class()
> #> [1] "list"
> y$annotations.annotation[[1]] %>% class()
> #> [1] "data.frame"
> y %>% pull(annotations.annotation) %>% class()
> #> [1] "list"
> y %>% pull(annotations.annotation)%>% .[[1]] %>% class()
> #> [1] "data.frame"
> y %>% pull(annotations.annotation) %>% purrr::pluck(1) %>% class()
> #> [1] "data.frame"
> y %>% pull(annotations.annotation) %>% purrr::map(1) %>% class()
> #> [1] "list"
> y %>% tidyr::unnest(annotations.annotation) %>% glimpse()
> #> Observations: 14
> #> Variables: 11
> #> $ agencyid                             <chr> "NOMIS", "NOMIS", "NOMIS"...
> #> $ id                                   <chr> "NM_893_1", "NM_893_1", "...
> #> $ uri                                  <chr> "Nm-893d1", "Nm-893d1", "...
> #> $ version                              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1...
> #> $ components.primarymeasure.conceptref <chr> "OBS_VALUE", "OBS_VALUE",...
> #> $ components.timedimension.codelist    <chr> "CL_893_1_TIME", "CL_893_...
> #> $ components.timedimension.conceptref  <chr> "TIME", "TIME", "TIME", "...
> #> $ name.value                           <chr> "LC4408EW - Tenure by num...
> #> $ name.lang                            <chr> "en", "en", "en", "en", "...
> #> $ annotationtext                       <chr> "Current (being actively ...
> #> $ annotationtitle                      <chr> "Status", "Keywords", "Un...
>

Response: I've added a similar example to the vignette, which is similar enough to your example that I feel I should cite you as a contributor if that is okay.

Following are my review comments, for improvement:

About Community Guidelines

There is no contribution guidelines in the README or other CONTRIBUTING.md.

Response: Added a CONTRIBUTING.md in ropensci/nomisr@e4bbb02

DESCRIPTION needs some improvement : no need to use Author and Maintainer fields if you are using Authors@R. You can just use the latest. You can find some good example in R package book.

Response: Now only using Authors@R

Improvement for the packaging guidelines

see below parts for details

Adding a code of conduct

Response: I a code of conduct in ropensci/nomisr@ede3558 and

Improve README considering it is the first entry point for discovering NOMIS API

Improve DESCRIPTION for authorship

[minor] follow versionning advice with ..*.9000 for in development version

Response: Now following versioning best practice in development version

[question] I did not find why you need curl package in suggest...
Response: For reasons I'm not entirely clear on, the package was failing on Travis without the curl suggestion, I think due to a bug in jsonlite.

README improvement

I would be more precise on what the NOMIS database is and for who it is. For example, I found that this submission issue mentioned UK database but not the README. You help file for the package as pretty clear description of what it is for. While reviewing, I had a doubt and went to check on NOMIS. This is why I think the statement of need is not complete.

It feels like the README could get more examples on what your package can do. Only one of your functions is shown in the readme.

Put package documentation link in the title of the repo like in the reprex repo. By the way, you can add a more descriptive description on your github repo and some tags.

could precise that it uses the tibble format (from the tidyverse)

Response: I've added more details to the README, including more details on Nomis itself, a simple example and citation information.

Vignettes improvement

Vignette was not built when I installed the package with devtools::install_github. Did not know if this intended behavior or not.

I tried to build the vignette locally then: it worked.

However, there is a problem on result formatting with the tibble. (I think because you use result = 'asis' in code chunks) and because every results are not tables (y$annotations.annotation is a list for example.) so it does not print nicely.

The vignette do not show any code to get the results - I would show them for someone to learn how to use your function, as examples. (with echo = TRUE)

The formatting issue is visible on your website too

I think the vignette contains necessary information but does not show enough examples on how to retrieve data in the objects you return. You mentioned the three list columns with nested data.frame but not show how correctly extracts the info.

It is pretty clear that you package allows to retrieve a large amount of data from the NOMIS API. it seems necessary to always use nomis_data_info, then nomis_codes then nomis_get_data to be sure to not retrieve to much. The vignette helps you understand that. It could appears to in your readme, and it could be more clear how to construct a pipeline for good practice with this API.

These improvement are the reason why I did not check the vignettes box, even if there is one in the github repo and doc.
If you need help on all this, do not hesitate to ask.

Response: As mentioned above, I've added your suggestions on exploring list-columns to the vignette, and fixed the online formatting issues. I've changed echo to TRUE to show code, pre built vignette in ropensci/nomisr@1c9cd3f

About documentation of functions and examples

All functions are well documented.

I encountered some issues running the examples because of a timeout. I believe this is what you meant in your by "there are some limits to the amount of data that can be retrieved within a certain period of time, although those are not published.". The maybe some improvement to do on this. (see under)

Is this the reason why you enclosed each example in \dontrun{} statement ? am I not sure on what is the recommendation for API package like this one for example ? \dontrun{} does allow anyone to run examples with something like example("nomis_codes", package = "nomisr"). Idea : Maybe \donttest{} is better here: does show in help and runs with example, but not run by R CMD check (so not run by CRAN ?) - see rd vignette from roxygen2

Response: I've changed \dontrun{} to \donttest{} in ropensci/nomisr@ac3e5a4

I ran all tests with devtools::run_examples(run = FALSE) (in several time, due to timeout constraint). Everything is working. It works well for almost all, but your example assign systematically to a variable (named x or y - room for improvement) but never print to show head of results. I feel that is missing.

Response: I have changed the names in examples in ropensci/nomisr@ac3e5a4 to be more descriptive of the data they are actually retrieving, and added a print call so the head is actually displayed.

I said "almost all" because there are some very long request in nomis_get_data. The warning that you put in Details states that it can be very slow process if calling significant amount of data. I would emphasize on this, saying this warning in Description part of the help file. And these example should definitely not be ran.

Response: I've removed the very long examples from nomis_get_data ropensci/nomisr@ac3e5a4 - I considered wrapping them in \dontrun{} but I think dropping them is the better option.

I saw you made a pkgdown website. This is great !! and nice theme!

Response: Thank you!

*I did not find the source code of this website. I would have thought it leaved with the package source code. But no docs folder or gh-pages. I find it useful to have access for letting people PR your some typos or other thing and all the site from your package source code. However, you may have good reason.

Response: I keep all the source code for my documentation website in my docs repo. I did this so I could host on netlify with a custom domain and https encryption and not have to use separate subdomains for each package. I'm not sure if that is the best option or not.

Various comments on the code

[minor] I would put the url of the API in its own variable. I think it is better practice and more efficient to maintain in case of change in the url (e.g V0.2). The url is used several time and a change would mean copy pasting. another utils function could help building the url (paste0 is used three times), with the same base url def.sdmx.json). Know that it exists some tools to help build some strings with parameters like glue, and specific one to handle url like urltools or those in httr. You may not need them here but it is worth knowing.

Response: I've turned the API URL into its own variable in ropensci/nomisr@ac3e5a4

suppressMessage is used to hide the message of readr::read_csv. However, the message appears because column types are guessed and not set. It is good practice to set column type in readr with column_types = cols(). In an API package like this one, it make even more sense because it could helps be aware of any change in the format. (using stop_for_problems() or just see a message appear) - see readr vignettes

Response: I've dropped suppressMessage and used col_types = readr::cols(.default = readr::col_character()) to make columns characters by default in ropensci/nomisr@ac3e5a4

[minor] I don't know what are the good practice on this one. I just feel that you have a dplyr dependency in import for three functions bind_rows, case_when and if_else than can be easily replace with rbind or if.else. I feel that, if the former are very useful interactively in an analysis, using them in a package can be unnecessary dependency to a "big" package that can cause you some effort later on if anything change in dplyr. More of a personal point of view on this one.

However, it relates on how you are build your query url
  query <- dplyr::if_else(keywords==FALSE,
                           paste0("/def.sdmx.json?search=*", 
                                  search, "*"),
                           paste0("/def.sdmx.json?search=keywords-", 
                                  search, "*")
could be rewritten without dplyr::if_else
base_query <- "/def.sdmx.json?search="
if (keyword) {
    keyword_term <- "keywords-")
} else {
    keyword_term <- "*"
}
# or keyword_term <- if.else(keyword, "keywords-", "*")
query <- paste0(base_query, key_word_term, search, "*")
It would suppres a non essential dependency.

Response: I've dropped the use of case_when and if_else in favour the the style you described above in ropensci/nomisr@ac3e5a4. bind_rows is used to combine together a single tibble with a list of one or more tibbles, and using rbind in place of bind_rows would likely require recreating the bind_rows function. If there are changes to bind_rows in the future it is only in one location so should be easy to adjust to.

The style in code could be improve in some places. You can follow some style guide to help you. like the one in ROpensci Guidelines linking to Advanced R or more recent the one used in the tidyverse styleguide. They even build some tools to help like styler and lintr. Example of improvement:
* put spaces after foror if and also around =, ==, +, >, ...
* indentation of code (in RStudio CTRL+i i useful). for example in if () { } else { }, lines of code that are inside brackets should be indented.
* use {} with if when it does not fit on one line
* More importantly, be consistent in all your script with the guideline you choose.

Know that when you want to stop on a condition, there is also stopifnot. For user experience, you can also return clearer error sometimes with stop("API request did not return any results", call. = FALSE) to not show the call for example.

A good practice is also to use meaningful name inside your internal code, even for temporary variable. There are a lot of x, a, q, ... This will improve readability and the ease for others to collaborate on your code (including future you)

Response: I've added call. = FALSE to the various stop messages.

Response: I've gone through the code and changed temporary variable names to be more meaningful, and formatted the code to improve indentation, spaces around if and for operators, ==, etc.

About the timeout and size of data

*If session is interactive(), it may be a good idea to warn clearly and even ask if we want to continue when lots of data are being requested. I find myself playing with the example trying to retrieve 208888 of your pages
Retrieving additional pages 1 of 208888
Retrieving additional pages 2 of 208888
I don't know if some other API packages are doing something to help user or if it is all the user responsibility to know what is he doing. (users will surely be already NOMIS data users).
As another improvement, I think you should try to create custom error message for the timeout issue as you know it is an issue that will impact user experience but cannot really prevent. I encountered:
a <- nomis_codes("NM_1_1")
# Error in open.connection(con, "rb") : 
#  Timeout was reached: Resolving timed out after 10000 milliseconds
You could use some tryCatch mechanism (some useful info in Advanced R).

[thoughts] You may need more control on the request mechanism to deal with timeout. First, it seems for the website one need to contact NOMIS help-desk to know more about limitation. Then , your are relying on readr connection mechanism, and may need to use httr, crul or curl to better handle request to API with the specificity of NOMIS

Response: I've used tryCatch to provide more useful information in the case of time outs. I've also written, but not implemented, a warning and yes/no option in interactive sesssions with requests over 350,000 cells which are likely to cause rate limiting issues. Implemented in ropensci/nomisr@ac3e5a4

About SMDX format on NOMIS

Do you know about the rsdmx package ? it is on github and CRAN, there is WIKI. I did not find NOMIS in their listing, so I did not know if it works for you and how it could overlap with your package.
rsdmx is definitely more a generic package than yours. I just wanted to point it to you in case you did not know.

Response: I was not aware of it, but it is something I should explore for the future, thank you for pointing me towards it.

Response to pegeler

Package Review for nomisr

Onboarding submission: #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation

The package includes all the following forms of documentation:
A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

    The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

    A short summary describing the high-level functionality of the software
    Authors: A list of authors with their affiliations
    A statement of need clearly stating problems the software is designed to solve and its target audience.
    References: with DOIs for all those that have one (e.g. papers, datasets, software).
Functionality
Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Final approval (post-review)
The author has responded to my review and made changes to my satisfaction. I recommend approving this package.
Estimated hours spent reviewing: 10
Review Comments

Excellent work! This is a fine example of the level of quality and professionalism that I have
come to expect from rOpenSci contributions. This package is a great addition to the
complement of R functionality and provides a facile portal to a useful data source.

Many of the comments that I had have been addressed as a result of @cderv's prompt and thorough review.
In addition, development has continued to be very active over the past few weeks! Issues are
fixed as quickly as I find them! Below are a few more comments that I would like to add.
Basic Package Structure Overview
README.md

The README.md does not contain citation information, per rOpenSci packaging guide.
I am not sure how necessary that is since there is a CITATION file in the inst/ directory,
however, it should be an easy add.

Reponse: Added in ropensci/nomisr@b75a47d

I would concur with @cderv in suggesting additional usage examples.
DESCRIPTION

The DESCRIPTION file does not currently specify a minimum version of R. I would
recommend doing so. A minor release is preferred over a patch release, e.g.

Depends: R (>= 3.4.0)

is better than

Depends: R (>= 3.4.1)

I am not seeing where curl is used even though it is in the Suggests list.

Response: I've added a minimum version of R (>= 3.4.0) to the DESCRIPTION in ropensci/nomisr@b75a47d

Response: Not including curl was causing errors with jsonlite on travis - I'm not sure why, as jsonlite is supposed to import curl itself.

CITATION

The CITATION file uses meta$Date to dynamically fill in date for the citation. However, the Date entry was removed from the DESCRIPTION file per gp suggestion so the date is NULL.

Since it is considered good practice to omit the Date entry because it may be prone to falling out of date, it seems that hard-coding the date in the citation would be silly. I briefly considered the use of a different field to dynamically update the citation date, such as Date/Publication or Packaged. However, those will only work for users who are downloading from a CRAN repos. Therefore I'm a little bit at a loss as to what the most elegant solution might be. I might suggest breaking with good practice here and adding the Date field back in since there is a reason to do so. Suggestions are welcome.

The CITATION file should also end in a newline character :)

Response: I've hardcoded 2018 into the CITATION file in ropensci/nomisr@b75a47d, which I think is the easiest solution, as its the year the package was first released, although I'm open to other suggestions as well.

Documentation

The man page for the package is named nmisr, not nomisr. This appears to be a typo.
@Seealso can link to the other package functions using \link{}.
It appears that some descriptions may be intended to be multi-paragraph. Without explicitly
using the @description and @details tags, the third code block is assumed to be the 'details'
block by default. Therefore, using explicit roxygen tags might be helpful here.
Nomis data comes packaged in very complex data structures. Please don't be afraid of 'over-explaining'.
The more information in the documentation and vignettes, the better for the user, especially if they do
not have prior experience with Nomis data.
'Nomis' in the title for nomis_search() should be capitalized to be consistent with the rest of the package.

Response: I've expanded the documentation and used roxygen tags to improve organisation. I've also added an expanded explanation to the package vignette, all in ropensci/nomisr@b75a47d

Examples

The example for nomis_get_data failed due to lack of data. I recommend finding a specific
date range that will return results and then hard-code that into the example.
Athough \donttest{} used to be perfect for API packages because it would not be run by
R CMD check but would be run by example() and would be included in the man page, it appears
that the functionality has changed.
Now, according to Hadley's book, it appears that code enclosed
in \donttest{} is indeed checked (though I have yet to confirm). Therefore,
for long-running API calls, it might make sense to change back to \dontrun{} if you are
concerned about time-out. I recall
reading some best practice advice on CRAN's website at some point, but cannot find the reference
anymore.
The examples often show multiple ways to use the function, which is excellent. However,
it might be a nice enhancement to add a little extra code to glimpse the data for the users to get
a better feel for what they are getting out of the API call. This is especially helpful since
the data that is returned can differ greatly depending on what args are passed.

Response: I've added glimpsing to the examples in different functions, and removed long-running API calls, which aren't good practice due to rate limiting, as of ropensci/nomisr@b75a47d

Vignettes

I like the switch to echo = TRUE for vignette code chunks. This is the user's opportunity
to see the code in action for the first time. Including the code used to generate the results can
aid in understanding and readability.
Vignettes are not currently built. This is not a big deal. However, it would
be a convenience for those installing from GitHub or local repos. This is especially true since
the API calls can cause vignettes to take a long time to build. (Currently I cannot build vignettes
because I keep getting hung up).
The addition of the theme and TOC is a nice touch.
It looks like the vignette might still be a little incomplete. I see that the final paragraph of
Querying data variables is not finished.
It's unnecessary to hard-code the results in code chunks. The results will be output when the vignette is knit.

Response: I've now pre-built the vignette, and expanded on the various sections in ropensci/nomisr@1c9cd3f

Functionality

nomis_search()
    [suggestion] Currently, users can either search based on generic search terms or search for specific keywords.
    Looking at the API docs, I think there is an opportunity to improve functionality by taking advantage
    of some of the other search categories (e.g., name, description, content type, units). This would
    be doubly useful since content type and units are not searched when performing a generic search.
    
    [suggestion] Rather than automatically inserting wildcards (*), I would recommend informing the user of the wildcard character so that they can have finer control over how wildcards are used in their search.  It would be nice to mention in the documentation that several terms can be searched at once using a comma. Alternatively, the parameter could accept a character vector that could be collapsed with commas.
  
    [suggestion] The function could potentially accept character vectors from each of the categories (as well as a generic search term field) and then the appropriate search string could be created.

Response: I have re-written nomis_search in keeping with your suggestions. It now accepts five parameters - name, description, keywords, content_type, units - and can search in any or all of them at the same time, with or without wildcards, and each parameter accepts character vectors of multiple queries. I've also added a nomis_content_type() function to assist in looking up specific content types, which may be used with nomis_search(). All in ropensci/nomisr@b75a47d

nomis_codes()
Generally, it is helpful to mirror the language of the source material when possible.
The nomis_codes() function accepts a dimension (concept) and the function returns codes for that dimension.
Therefore, the argument code might be renamed to dimension or concept, since the
function searches for codes availalbe within a dimension.
Related to above, I think the @title could be more descriptive.

[suggestion] The query is currently built using a hierarchal structure. E.g., /api/v01/dataset////def.sdmx.json. However, I think there is an opportunity to make this query a little more flexible and extensible. Instead of enforcing the hierarchy, the funciton could use the GET verb to provide a little more control over how the data is filtered. A user could query on criteria from different dimensions than the one which they are interested in results from. For example, even if the user is interested in the 'item' dimension, they may want to filter on 'geography' and 'sex'. Using GET would allow for this. As an example: /api/v01/dataset/NM_1_1/item.def.sdmx.json?geography=2038432081&sex=7. This would also expose the user to options for specifying times and dates, which cannot otherwise be done. I fully acknowledge that this might be over-building, since this function only provides a support role and will probably not be used that way. This is a minor concern, but I feel it's cleaner to specify value if a test function returns TRUE first when using ifelse(). The code ifelse(is.null(type), "", paste0("/", type)) reads cleaner than ifelse(is.null(type) == FALSE, paste0("/", type), "") IMO. nomis_data_info() The documentation could be more explicit in telling the user that the data returned by this function is metadata.


Response: I've updated the documentation to be more explicit about what the function actually does, and improved the code readability. I changed the code parameter to concept, as you suggested, to be more in keeping with the syntax used in the original API. I've also added a specific search parameter, and a generic additional_queries parameter, as used in nomis_get_data(), in ropensci/nomisr@b75a47d

[suggestion] Relating to the comment above, it may be a little clearer to rename the function something to the effect of nomis_get_metadata().

This naming scheme is more in line with the pattern set forth by nomis_get_data naming scheme. nomis_get_data()

Response: I've renamed the nomis_codes() function to nomis_get_metadata(), which I agree is more descriptive of what the function actually does. I've also added the nomis_overview() function, which returns a more general overview of all available metadata, without the flexibility and specificity offered by nomis_get_metadata() ropensci/nomisr@b75a47d

I see that the API documentation states, "you have a 25000 cell limit on the number of data values downloaded in these formats."

It appears that the code currently breaks the fetch into multiple steps if the record count is greater than or equal to 25k records.

I am curious if this might be an oversight. It might be useful to test the edge case where the query returns more than 25k cells but fewer than 25k records to see if everything still works.

Response: The 25k limit refers to the number of observation values, the official documentation is rather confusing on that point, as Nomis only treats one of the columns as containing the data you are querying, I hope my changes in ropensci/nomisr@b75a47d have made it clearer.

Including a select parameter may be a good enhancement for the future. Right now, that can be accomplished with additional_queries but is probably important enough to be standalone. This would allow for additional documentation and sugar to be added for it.

Response: I've added a select parameter as an enhancement, as it is straightforward to do and reduces the amount of information that could possibly need to be included under additional_queries.

Conclusion
This is a great package that will be valuable to anyone interested in analysis of this data. Nice work!
I appreciate the opportunity to review this package and hope that my comments are constructive and useful. This package needs just a few

tweaks (namely getting the examples up and running) and then I will gladly approve!

cderv · 2018-03-10T17:38:48Z

Hi @evanodell

Great work ! And thanks a lot for your answer formatting just above. It is very easy to follow how you took our comments into account. Nice.

I updated my first post by checking some boxes ✅ and ⏲ of review.

I still have a few remarks after taking another pass on your updated code

Package Website

Nice idea to centralize all your documentation. However, there is still a few things to adjust as I get 404 error navigating through doc. I can only view the reference page of the package but not the vignette for example. Must be something to do with how pkgdown deals with path when hosting in another site.
I do not have the solution for your https request, but you may find some help in the #rstats community.

Contributing.md

Great addition. Two things:

I would add it to .Rbuildignore as you get a R CMD CHECK note otherwise
I would add a link to this doc in the README. (Example in another great Ropensci 📦 )

Readme

Great improvement !! I find it now a good introduction to your 📦 with nice examples.
You may find me sticky on this but the mention to UK is still missing in your introduction. Your DESCRIPTION is good, so why not add the first line in your readme ?

Access UK official statistics from the Nomis database through R

About formatting, take care : links won't works when between single quote ``.

`[tibbles](http://tibble.tidyverse.org/)`

must be changed to

[`tibbles`](http://tibble.tidyverse.org/)

in order to get tibbles

Also, you need to keep in mind to synchronise your README.Rmd modification and README.md generation. If you use usethis::use_readme_rmd, it puts a git pre-commit hook that prevent you to commit if md is older than Rmd. That way you'll never forget to rerender. This comes from devtools::use_readme_rmd now in usethis. (BTW, a pre-commit hook is a little .sh script that is executed before every commit. You can see it there)

Vignettes

It looks good ! No problem to take the example I gave you. It was just to illustrate, it could be rework but if you found it useful to you, it could be useful to other.
The html vignette is now included, but not up to date. Before installing the package, after a git pull I needed to do a devtools::build_vignette. Not sure if the vignette are to be built in source package. I found re-reading R package book that they can be built at installation with devtools::install_github("evanodell/nomisr", build_vignettes = TRUE), so i guess it is not an issue if not present in the package.

about curl and travis

I read something on that but I did not found it. Sorry. It may worth asking on ropencsi forum, rstudio community.

About download data limit

I saw you wanted to do something with interactive() in your code, but it is still commented out. You may try to put this code in separate feature branch to clean your master if it is still work in progress.

I think your 📦 is ready ! Very nice work, I still find it a simple one but powerful one as always with data packages. Thanks!

evanodell · 2018-03-22T19:01:37Z

Thank you @cderv

I did not know about usethis::use_readme_rmd, that has made things much easier. I've made the other tweaks you've suggested as well.

I've split off the the interactive() bits into a dev branch.

The curl and travis issue seems to stem from the internal loadpkg function in jsonlite. Using httr to check for http response codes seems to have solved the problem. I'm not sure why but at least the solution provides more informative error messages.

Regarding pkgdown, path in build_site() seems to have stopped working, but I think I now have a suitable workaround.

Enhancements to documentation as part of rOpenSci review ropensci/software-review#190

pegeler · 2018-03-30T06:01:01Z

Package Review for `nomisr`

Onboarding submission: #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

A short summary describing the high-level functionality of the software

Authors: A list of authors with their affiliations

A statement of need clearly stating problems the software is designed to solve and its target audience.

References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 14

Review Comments

Overall this package is well put together.

The data structures returned by the Nomis API are very complex and this package does a good job of helping the user along to find the data they need. I was able to get metadata and data out without too much trouble. Existing Nomis users should find this package facilitates their workflow and users new to Nomis data will likely struggle a bit, but less so because of the package's enhanced vignettes, documentation, and README.

As development continues, I would suggest continuing to consider how to make the data more accessible to users who may not be familiar with data structures returned by the Nomis API. In doing so, the package user-base will continue to expand.

It appears that functionality is fairly complete. I am able to navigate metadata several ways and can pull data with relative ease. Great work!! I think this package is ready!!!

NEWS.md

The NEWS.md will need to be updated when 0.0.3 is released. Don't forget to git tag it as well 😀!

Conclusion

From the time of initial review to now, I have seen this package transformed from an already good package to something really awesome. I appreciate the way that the package author has taken reviewer comments and turned them into opportunities to rethink existing code. These changes represent a very thoughtful approach that is reflected in package usability and performance.

Thanks again for the chance to review!

cderv · 2018-03-31T13:40:14Z

@evanodell Well done! I am really impress with the evolution of this really a great 📦 from the beginning of the review and now. Awesome work ! Thanks for having taken into account our advices, I really appreciated it like @pegeler.

It was really a pleasure to review your package and it gave me the chance to discover the NOMIS database. I think your package is a great example of using a simple api and prepare the data return to an easier and practical format for the users.

I agree with @pegeler, this package is more than ready to ✨ in ropensci 📦 collections!

@karthik, what are the following steps ?

A few minor edits prior to submitting second round of review. ropensci/software-review#190

karthik · 2018-04-16T20:15:44Z

@cderv Thanks for this fantastic, thorough and thoughtful review! Since you have signed off, nothing further required from your side. Now just waiting on @pegeler to provide the final sign off (edit: which I noticed is done now)

karthik · 2018-04-16T20:18:19Z

Congrats @evanodell! Your submission is now approved and accepted! 🎉 Thank you for submitting and many thanks to @cderv and @pegeler for thorough and timely reviews. To-dos:

Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
Add the rOpenSci footer to the bottom of your README

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.

evanodell · 2018-04-16T21:10:37Z

Thanks @karthik! I'm more than happy to write something up on the package, whichever format you prefer.

stefaniebutland · 2018-04-17T18:09:47Z

Hi @evanodell. Great to hear you're interested in contributing a post about nomisr. The format is really up to you, but you could take publication dates into account. Right now we have a full schedule of longer narrative posts up to ~June so if you would like it up quickly, a technote is the way to go. We'll tweet from @ropensci either way.

You can read example narrative posts about onboarded packages here: https://ropensci.org/tags/onboarding/ and technotes are here: https://ropensci.org/technotes/

Technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post. The only difference in YAML for technote is
categories:

technotes

I will add the topicid before publishing.

Happy to answer questions here.

karthik · 2018-04-17T20:58:26Z

@evanodell Final suggestion. If you found feedback from the reviewers valuable, consider adding them as reviewers in the description of your package (this is entirely up to you and not a requirement). If you decide to, you'll need:

The ORCID for both reviewers (if the reviewers don't have it, they should sign up for one here: https://orcid.org/)
Follow this format: https://github.com/ropensci/rdflib/blob/master/DESCRIPTION#L8-L10
More context for why such attribution matters: https://ropensci.org/blog/2018/03/16/thanking-reviewers-in-metadata/

Changed the way API keys are handled ropensci/software-review#190

evanodell · 2018-04-17T21:40:12Z

@cderv do you have an ORCID I can include to list you as a reviewer in the package description?

evanodell · 2018-04-17T21:46:48Z

@stefaniebutland A technote seems good to me. I can send it to you by the end of the week if that works?

cderv · 2018-04-18T05:57:38Z

@evanodell Yes I have one : https://orcid.org/0000-0003-4474-2498
Thank you for this offer to add me, I appreciate!

karthik · 2018-04-18T17:16:22Z

@evanodell Thanks again for the submission. I'll close the issue since the review is complete but please feel free to continue the conversation with Stefanie re the blog post.

karthik self-assigned this Jan 30, 2018

karthik added package topic:data-access topic:data-extraction 1/editor-checks labels Jan 30, 2018

karthik changed the title ~~nomisr - presubmission~~ nomisr Jan 30, 2018

karthik added 3/reviewer(s)-assigned and removed 1/editor-checks labels Feb 2, 2018

evanodell added a commit to ropensci/nomisr that referenced this issue Mar 28, 2018

Merge pull request #1 from pegeler/review2

cd0beb6

Enhancements to documentation as part of rOpenSci review ropensci/software-review#190

evanodell added a commit to ropensci/nomisr that referenced this issue Apr 2, 2018

Merge pull request #3 from pegeler/review2

d8c942c

A few minor edits prior to submitting second round of review. ropensci/software-review#190

karthik added 6/approved and removed 3/reviewer(s)-assigned labels Apr 16, 2018

evanodell added a commit to ropensci/nomisr that referenced this issue Apr 17, 2018

Merge pull request #5 from pegeler/master

a3fb066

Changed the way API keys are handled ropensci/software-review#190

karthik closed this as completed Apr 18, 2018

stefaniebutland mentioned this issue Apr 20, 2018

Technotes for the nomisr package ropensci-archive/roweb2#175

Merged

arfon mentioned this issue Jul 28, 2018

[REVIEW]: nomisr: Access 'Nomis' UK Labour Market Data openjournals/joss-reviews#859

Closed

18 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nomisr #190

nomisr #190

evanodell commented Jan 30, 2018 •

edited

Loading

evanodell commented Jan 30, 2018

karthik commented Jan 30, 2018

karthik commented Jan 30, 2018 •

edited

Loading

cderv commented Feb 10, 2018 •

edited

Loading

For packages co-submitting to JOSS

maelle commented Feb 11, 2018

karthik commented Feb 13, 2018

pegeler commented Feb 23, 2018

For packages co-submitting to JOSS

evanodell commented Mar 9, 2018 •

edited

Loading

List-columns

About Community Guidelines

Improvement for the packaging guidelines

README improvement

Vignettes improvement

About documentation of functions and examples

Various comments on the code

About the timeout and size of data

About SMDX format on NOMIS

CITATION

cderv commented Mar 10, 2018

evanodell commented Mar 22, 2018

pegeler commented Mar 30, 2018

For packages co-submitting to JOSS

cderv commented Mar 31, 2018

karthik commented Apr 16, 2018 •

edited

Loading

karthik commented Apr 16, 2018

evanodell commented Apr 16, 2018

stefaniebutland commented Apr 17, 2018

karthik commented Apr 17, 2018

evanodell commented Apr 17, 2018

evanodell commented Apr 17, 2018

cderv commented Apr 18, 2018

karthik commented Apr 18, 2018

nomisr #190

nomisr #190

Comments

evanodell commented Jan 30, 2018 • edited Loading

Summary

Requirements

Publication options

Detail

evanodell commented Jan 30, 2018

karthik commented Jan 30, 2018

karthik commented Jan 30, 2018 • edited Loading

nomisr

Editor checks:

Editor comments

cderv commented Feb 10, 2018 • edited Loading

Package Review for nomisr

Documentation

For packages co-submitting to JOSS

Functionality

Final approval (post-review)

Review Comments

General comments

About Community Guidelines

Improvement for the packaging guidelines

README improvement

Vignettes improvement

About documentation of functions and examples

Various comments on the code

About the timeout and size of data

About SMDX format on NOMIS

Last words

maelle commented Feb 11, 2018

karthik commented Feb 13, 2018

pegeler commented Feb 23, 2018

Package Review for nomisr

Documentation

For packages co-submitting to JOSS

Functionality

Final approval (post-review)

Review Comments

Basic Package Structure Overview

README.md

DESCRIPTION

CITATION

Documentation

Examples

Vignettes

Functionality

Conclusion

evanodell commented Mar 9, 2018 • edited Loading

Response to @cderv

List-columns

About Community Guidelines

Improvement for the packaging guidelines

README improvement

Vignettes improvement

About documentation of functions and examples

Various comments on the code

About the timeout and size of data

About SMDX format on NOMIS

Response to pegeler

CITATION

Documentation

cderv commented Mar 10, 2018

Package Website

Contributing.md

Readme

Vignettes

about curl and travis

About download data limit

evanodell commented Mar 22, 2018

pegeler commented Mar 30, 2018

Package Review for nomisr

Documentation

For packages co-submitting to JOSS

Functionality

Final approval (post-review)

Review Comments

NEWS.md

Conclusion

evanodell commented Jan 30, 2018 •

edited

Loading

karthik commented Jan 30, 2018 •

edited

Loading

cderv commented Feb 10, 2018 •

edited

Loading

Package Review for `nomisr`

Package Review for `nomisr`

evanodell commented Mar 9, 2018 •

edited

Loading

Package Review for `nomisr`

karthik commented Apr 16, 2018 •

edited

Loading