Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nomisr #190

Closed
12 of 19 tasks
evanodell opened this issue Jan 30, 2018 · 21 comments
Closed
12 of 19 tasks

nomisr #190

evanodell opened this issue Jan 30, 2018 · 21 comments

Comments

@evanodell
Copy link

evanodell commented Jan 30, 2018

Summary

  • What does this package do? (explain in 50 words or less):

Downloads UK official statistics from the Nomis API and converts to R objects, primarily tibbles. The API is based around different statistical geographies.

  • Paste the full DESCRIPTION file inside a code block below:
Package: nomisr
Type: Package
Title: Access Nomis UK Labour Market Data with R
Version: 0.0.2
Date: 2018-01-30
Authors@R: person("Evan Odell", email="evanodell91@gmail.com",
  role=c("aut", "cre"),
  comment = c(ORCID='0000-0003-1845-808X'))
Author: Evan Odell [aut, cre]
Maintainer: Evan Odell <evanodell91@gmail.com>
Description: Access UK official statistics from the Nomis database through R. 
    Nomis includes data from the Census, the Labour Force Survey, DWP benefit 
    statistics and other economic and demographic data from the Office for 
    National Statistics, based around statistical geographies. See 
    <https://www.nomisweb.co.uk/api/v01/help> for full API documentation.
URL: https://github.com/EvanOdell/nomisr
BugReports: https://github.com/EvanOdell/nomisr/issues
License: MIT + file LICENSE
Imports: 
  jsonlite,
  tibble,
  readr,
  dplyr,
  curl
RoxygenNote: 6.0.1
Suggests: 
  knitr,
  rmarkdown,
  testthat
VignetteBuilder: knitr
  • URL for the package (the development repository, not a stylized html page):

https://github.com/EvanOdell/nomisr

  • Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

[e.g., "data extraction, because the package parses a scientific data file format"]

Data retrieval, because the package assists the downloading of a data from an API into R.

  • Who is the target audience and what are scientific applications of this package?

Demographers, economists, geographers, public health researchers, any other social scientists who are interested in geographic factors. The package will aid reproducibility, reduce the need to manually download area profiles, and allow easy linking of different datasets covering the same geographic area.

There are no other R packages specifically for retrieving this data. Some of the data on Nomis is also available through the UK Data Service with the ukds package, but other ONS data, including labour force, benefits, mortality and business statistics are not otherwise available through an R package. The UK Data Service also requires users registration, while Nomis does not, and does not, as far as I can tell, provide the same linking with statistical geographies as Nomis.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN?
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal.
    • (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:

  • If this is a resubmission following rejection, please explain the change in circumstances:

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

@evanodell
Copy link
Author

Hi all, submitting this as a presubmission because I'm not sure how much it fits with rOpenSci's remit.

Best,

Evan

@karthik
Copy link
Member

karthik commented Jan 30, 2018

👋 @evanodell
Thanks for this submission. It fits our suite, and since you used the full template (which isn't required for a pre-submission inquiry), I can just treat this as a submission and proceed as such. I'll edit your title to remove pre-submission.

I will also be the editor on your submission, so stay tuned for next steps.

@karthik
Copy link
Member

karthik commented Jan 30, 2018

nomisr

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Here are results of a check on the good practice package on some best practices. You can re-run this yourself (as you fix issues) by running goodpractice::gp() on your package. Please fix relevant issues before reviewers begin to take a look at your package.

── GP nomisr ───────────────────────────────────────────────────────────────────

It is good practice to

  ✖ write unit tests for all functions, and all package code
    in general. 59% of code lines are covered by test cases.

    R/data_download.R:103:NA
    R/data_download.R:137:NA
    R/data_download.R:139:NA
    R/data_download.R:141:NA
    R/data_download.R:143:NA
    ... and 35 more lines

  ✖ omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the package
    when you perform `R CMD build` on it.
  ✖ avoid long code lines, it is bad for readability. Also,
    many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/data_download.R:118:1
    R/data_download.R:135:1
    R/data_download.R:157:1
    R/nomis_codes.R:52:1
    R/search.R:29:1
    ... and 3 more lines

  ✖ avoid 1:length(...), 1:nrow(...), 1:ncol(...),
    1:NROW(...) and 1:NCOL(...) expressions. They are error prone and
    result 1:0 if the expression on the right hand side is zero. Use
    seq_len() or seq_along() instead.

    R/data_download.R:143:14

  ✖ not import packages as a whole, as this can cause name
    clashes between the imported packages. Instead, import only the
    specific functions you need.
  ✖ fix this R CMD check WARNING: LaTeX errors when creating
    PDF version. This typically indicates Rd problems.

Reviewer 1: @cderv
Due date: February 23rd

Reviewer 2: @pegeler
Due date: February 25th

@cderv
Copy link
Contributor

cderv commented Feb 10, 2018

Package Review for nomisr

Onboarding submission : #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8


Review Comments

General comments

First of all, great package ! Very well done, Everything is good!
Well done really ! There was still medium coverage at the submission, it is 100% now and all run perfectly locally. Know that goodpractice::gp praises you right :

♥ Whee! Grand package! Keep up the fabulous work!

Everything is based on tibble class that makes it perfect to work inside the tidyverse. It is really great! However, maybe this could be more explained for those who are not used to tibbles and tidyverse ecosystem.
I find it kind of necessary because you also used list-columns in tibbles. This is not something easy for some newcomers to handles. Even for some dplyr users who do not used that a lot, it is not something easy. Adding some example in the vignettes to show how to retrieve embedded data.frame inside some tibble's list column could be a good idea. To illustrate:

library(nomisr)
library(dplyr, warn.conflicts = F)
y <- nomis_data_info("NM_893_1")
y$annotations.annotation %>% class()
#> [1] "list"
y$annotations.annotation[[1]] %>% class()
#> [1] "data.frame"
y %>% pull(annotations.annotation) %>% class()
#> [1] "list"
y %>% pull(annotations.annotation)%>% .[[1]] %>% class()
#> [1] "data.frame"
y %>% pull(annotations.annotation) %>% purrr::pluck(1) %>% class()
#> [1] "data.frame"
y %>% pull(annotations.annotation) %>% purrr::map(1) %>% class()
#> [1] "list"
y %>% tidyr::unnest(annotations.annotation) %>% glimpse()
#> Observations: 14
#> Variables: 11
#> $ agencyid                             <chr> "NOMIS", "NOMIS", "NOMIS"...
#> $ id                                   <chr> "NM_893_1", "NM_893_1", "...
#> $ uri                                  <chr> "Nm-893d1", "Nm-893d1", "...
#> $ version                              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1...
#> $ components.primarymeasure.conceptref <chr> "OBS_VALUE", "OBS_VALUE",...
#> $ components.timedimension.codelist    <chr> "CL_893_1_TIME", "CL_893_...
#> $ components.timedimension.conceptref  <chr> "TIME", "TIME", "TIME", "...
#> $ name.value                           <chr> "LC4408EW - Tenure by num...
#> $ name.lang                            <chr> "en", "en", "en", "en", "...
#> $ annotationtext                       <chr> "Current (being actively ...
#> $ annotationtitle                      <chr> "Status", "Keywords", "Un...

Following are my review comments, for improvement:

About Community Guidelines

  • There is no contribution guidelines in the README or other CONTRIBUTING.md.
  • DESCRIPTION needs some improvement : no need to use Author and Maintainer fields if you are using Authors@R. You can just use the latest. You can find some good example in R package book.

Improvement for the packaging guidelines

see below parts for details

  • Adding a code of conduct
  • Improve README considering it is the first entry point for discovering NOMIS API
  • Improve DESCRIPTION for authorship
  • [minor] follow versionning advice with *.*.*.9000 for in development version
  • [question] I did not find why you need curl package in suggest...

README improvement

  • I would be more precise on what the NOMIS database is and for who it is. For example, I found that this submission issue mentioned UK database but not the README. You help file for the package as pretty clear description of what it is for. While reviewing, I had a doubt and went to check on NOMIS. This is why I think the statement of need is not complete.
  • It feels like the README could get more examples on what your package can do. Only one of your functions is shown in the readme.
  • Put package documentation link in the title of the repo like in the reprex repo. By the way, you can add a more descriptive description on your github repo and some tags.
  • could precise that it uses the tibble format (from the tidyverse)

Vignettes improvement

  • Vignette was not built when I installed the package with devtools::install_github. Did not know if this intended behavior or not.
  • I tried to build the vignette locally then: it worked.
    • However, there is a problem on result formatting with the tibble. (I think because you use result = 'asis' in code chunks) and because every results are not tables (y$annotations.annotation is a list for example.) so it does not print nicely.
    • The vignette do not show any code to get the results - I would show them for someone to learn how to use your function, as examples. (with echo = TRUE)
    • The formatting issue is visible on your website too
  • I think the vignette contains necessary information but does not show enough examples on how to retrieve data in the objects you return. You mentioned the three list columns with nested data.frame but not show how correctly extracts the info.
  • It is pretty clear that you package allows to retrieve a large amount of data from the NOMIS API. it seems necessary to always use nomis_data_info, then nomis_codes then nomis_get_data to be sure to not retrieve to much. The vignette helps you understand that. It could appears to in your readme, and it could be more clear how to construct a pipeline for good practice with this API.

These improvement are the reason why I did not check the vignettes box, even if there is one in the github repo and doc.
If you need help on all this, do not hesitate to ask.

About documentation of functions and examples

  • All functions are well documented.
  • I encountered some issues running the examples because of a timeout. I believe this is what you meant in your by "there are some limits to the amount of data that can be retrieved within a certain period of time, although those are not published.". The maybe some improvement to do on this. (see under)
  • Is this the reason why you enclosed each example in \dontrun{} statement ? am I not sure on what is the recommendation for API package like this one for example ? \dontrun{} does allow anyone to run examples with something like example("nomis_codes", package = "nomisr"). Idea : Maybe \donttest{} is better here: does show in help and runs with example, but not run by R CMD check (so not run by CRAN ?) - see rd vignette from roxygen2
  • I ran all tests with devtools::run_examples(run = FALSE) (in several time, due to timeout constraint). Everything is working. It works well for almost all, but your example assign systematically to a variable (named x or y - room for improvement) but never print to show head of results. I feel that is missing.
  • I said "almost all" because there are some very long request in nomis_get_data. The warning that you put in Details states that it can be very slow process if calling significant amount of data. I would emphasize on this, saying this warning in Description part of the help file. And these example should definitely not be ran.
  • I saw you made a pkgdown website. This is great !! and nice theme!
    I did not find the source code of this website. I would have thought it leaved with the package source code. But no docs folder or gh-pages. I find it useful to have access for letting people PR your some typos or other thing and all the site from your package source code. However, you may have good reason.

Various comments on the code

  • [minor] I would put the url of the API in its own variable. I think it is better practice and more efficient to maintain in case of change in the url (e.g V0.2). The url is used several time and a change would mean copy pasting. another utils function could help building the url (paste0 is used three times), with the same base url def.sdmx.json). Know that it exists some tools to help build some strings with parameters like glue, and specific one to handle url like urltools or those in httr. You may not need them here but it is worth knowing.

  • suppressMessage is used to hide the message of readr::read_csv. However, the message appears because column types are guessed and not set. It is good practice to set column type in readr with column_types = cols(<something>). In an API package like this one, it make even more sense because it could helps be aware of any change in the format. (using stop_for_problems() or just see a message appear) - see readr vignettes

  • [minor] I don't know what are the good practice on this one. I just feel that you have a dplyr dependency in import for three functions bind_rows, case_when and if_else than can be easily replace with rbind or if.else. I feel that, if the former are very useful interactively in an analysis, using them in a package can be unnecessary dependency to a "big" package that can cause you some effort later on if anything change in dplyr. More of a personal point of view on this one.

  • However, it relates on how you are build your query url

  query <- dplyr::if_else(keywords==FALSE,
                           paste0("/def.sdmx.json?search=*", 
                                  search, "*"),
                           paste0("/def.sdmx.json?search=keywords-", 
                                  search, "*")

could be rewritten without dplyr::if_else

base_query <- "/def.sdmx.json?search="
if (keyword) {
    keyword_term <- "keywords-")
} else {
    keyword_term <- "*"
}
# or keyword_term <- if.else(keyword, "keywords-", "*")
query <- paste0(base_query, key_word_term, search, "*")

It would suppres a non essential dependency.

  • The style in code could be improve in some places. You can follow some style guide to help you.
    like the one in ROpensci Guidelines linking to Advanced R or more recent the one used in the tidyverse styleguide. They even build some tools to help like styler and lintr. Example of improvement:
    • put spaces after foror if and also around =, ==, +, >, ...
    • indentation of code (in RStudio CTRL+i i useful). for example in if () { } else { }, lines of code that are inside brackets should be indented.
    • use {} with if when it does not fit on one line
    • More importantly, be consistent in all your script with the guideline you choose.
  • Know that when you want to stop on a condition, there is also stopifnot. For user experience, you can also return clearer error sometimes with stop("API request did not return any results", call. = FALSE) to not show the call for example.
  • A good practice is also to use meaningful name inside your internal code, even for temporary variable. There are a lot of x, a, q, ... This will improve readability and the ease for others to collaborate on your code (including future you)

About the timeout and size of data

  • If session is interactive(), it may be a good idea to warn clearly and even ask if we want to continue when lots of data are being requested. I find myself playing with the example trying to retrieve 208888 of your pages
Retrieving additional pages 1 of 208888
Retrieving additional pages 2 of 208888

I don't know if some other API packages are doing something to help user or if it is all the user responsibility to know what is he doing. (users will surely be already NOMIS data users).

  • As another improvement, I think you should try to create custom error message for the timeout issue as you know it is an issue that will impact user experience but cannot really prevent. I encountered:
a <- nomis_codes("NM_1_1")
# Error in open.connection(con, "rb") : 
#  Timeout was reached: Resolving timed out after 10000 milliseconds

You could use some tryCatch mechanism (some useful info in Advanced R).

  • [thoughts] You may need more control on the request mechanism to deal with timeout. First, it seems for the website one need to contact NOMIS help-desk to know more about limitation. Then , your are relying on readr connection mechanism, and may need to use httr, crul or curl to better handle request to API with the specificity of NOMIS

About SMDX format on NOMIS

  • Do you know about the rsdmx package ? it is on github and CRAN, there is WIKI. I did not find NOMIS in their listing, so I did not know if it works for you and how it could overlap with your package.
    rsdmx is definitely more a generic package than yours. I just wanted to point it to you in case you did not know.

Last words

Thank you very much letting me review this package. It is pretty simple one, yet powerful as it brings to R users lots of new open data for UK. And it does it well allowing to work with tidyverse tools directly thanks to the tibble format.

All my remarks were toward improvement. The package in its currents state is functional, is well tested and as the minimal documentation to get start with it.

@evanodell, if you have some questions or you want some help / advice on anything, feel free to reach to me, I will be glad to help you.

@karthik, I hope I am doing fine for my first review with Ropensci, please tell me if miss something and need to review further.

@maelle
Copy link
Member

maelle commented Feb 11, 2018

I'm not the editor handling this review but I find that it is a very good review @cderv, thank you!

@karthik
Copy link
Member

karthik commented Feb 13, 2018

Quick note: I’ll be away from the internet for about a week and will be back online around 02/21. So I will be back to editorial activities shortly.

@pegeler
Copy link

pegeler commented Feb 23, 2018

Package Review for nomisr

Onboarding submission: #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10


Review Comments

Excellent work! This is a fine example of the level of quality and professionalism that I have
come to expect from rOpenSci contributions. This package is a great addition to the
complement of R functionality and provides a facile portal to a useful data source.

Many of the comments that I had have been addressed as a result of @cderv's prompt and thorough review.
In addition, development has continued to be very active over the past few weeks! Issues are
fixed as quickly as I find them! Below are a few more comments that I would like to add.

Basic Package Structure Overview

README.md

The README.md does not contain citation information, per rOpenSci packaging guide.
I am not sure how necessary that is since there is a CITATION file in the inst/ directory,
however, it should be an easy add.

I would concur with @cderv in suggesting additional usage examples.

DESCRIPTION
  • The DESCRIPTION file does not currently specify a minimum version of R. I would
    recommend doing so. A minor release is preferred over a patch release, e.g.

    Depends: R (>= 3.4.0)
    

    is better than

    Depends: R (>= 3.4.1)
    
  • I am not seeing where curl is used even though it is in the Suggests list.

CITATION

The CITATION file uses meta$Date to dynamically fill in date for the citation.
However, the Date entry was removed from the DESCRIPTION file per gp suggestion so
the date is NULL.

Since it is considered good practice to omit the Date entry because it may be prone to
falling out of date, it seems that hard-coding the date in the citation would be silly.
I briefly considered the use of a different field to dynamically update the citation date,
such as Date/Publication or Packaged. However, those will only work for users who are downloading
from a CRAN repos. Therefore I'm a little bit at a loss as to what the most elegant solution might be.
I might suggest breaking with good practice here and adding the Date field back in
since there is a reason to do so.
Suggestions are welcome.

The CITATION file should also end in a newline character :)

Documentation
  • The man page for the package is named nmisr, not nomisr. This appears to be a typo.
  • @seealso can link to the other package functions using \link{}.
  • It appears that some descriptions may be intended to be multi-paragraph. Without explicitly
    using the @description and @details tags, the third code block is assumed to be the 'details'
    block by default. Therefore, using explicit roxygen tags might be helpful here.
  • Nomis data comes packaged in very complex data structures. Please don't be afraid of 'over-explaining'.
    The more information in the documentation and vignettes, the better for the user, especially if they do
    not have prior experience with Nomis data.
  • 'Nomis' in the title for nomis_search() should be capitalized to be consistent with the rest of the package.

Examples

  • The example for nomis_get_data failed due to lack of data. I recommend finding a specific
    date range that will return results and then hard-code that into the example.
  • Athough \donttest{} used to be perfect for API packages because it would not be run by
    R CMD check but would be run by example() and would be included in the man page, it appears
    that the functionality has changed
    .
    Now, according to Hadley's book, it appears that code enclosed
    in \donttest{} is indeed checked (though I have yet to confirm). Therefore,
    for long-running API calls, it might make sense to change back to \dontrun{} if you are
    concerned about time-out. I recall
    reading some best practice advice on CRAN's website at some point, but cannot find the reference
    anymore.
  • The examples often show multiple ways to use the function, which is excellent. However,
    it might be a nice enhancement to add a little extra code to glimpse the data for the users to get
    a better feel for what they are getting out of the API call. This is especially helpful since
    the data that is returned can differ greatly depending on what args are passed.

Vignettes

  • I like the switch to echo = TRUE for vignette code chunks. This is the user's opportunity
    to see the code in action for the first time. Including the code used to generate the results can
    aid in understanding and readability.
  • Vignettes are not currently built. This is not a big deal. However, it would
    be a convenience for those installing from GitHub or local repos. This is especially true since
    the API calls can cause vignettes to take a long time to build. (Currently I cannot build vignettes
    because I keep getting hung up).
  • The addition of the theme and TOC is a nice touch.
  • It looks like the vignette might still be a little incomplete. I see that the final paragraph of
    Querying data variables is not finished.
  • It's unnecessary to hard-code the results in code chunks. The results will be output when the vignette is knit.

Functionality

  • nomis_search()
    • [suggestion] Currently, users can either search based on generic search terms or search for specific keywords.
      Looking at the API docs, I think there is an opportunity to improve functionality by taking advantage
      of some of the other search categories (e.g., name, description, content type, units). This would
      be doubly useful since content type and units are not searched when performing a generic search.
    • [suggestion] Rather than automatically inserting wildcards (*), I would recommend informing the user of the wildcard
      character so that they can have finer control over how wildcards are used in their search.
    • It would be nice to mention in the documentation that several terms can be searched at once using
      a comma. Alternatively, the parameter could accept a character vector that could be
      collapsed with commas.
    • [suggestion] The function could potentially accept character vectors from each of the categories (as well as
      a generic search term field) and then the appropriate search string could be created.
  • nomis_codes()
    • Generally, it is helpful to mirror the language of the source material when possible.
      The nomis_codes() function accepts a dimension (concept) and the function returns codes for that dimension.
      Therefore, the argument code might be renamed to dimension or concept, since the
      function searches for codes availalbe within a dimension.
    • Related to above, I think the @title could be more descriptive.
    • [suggestion] The query is currently built using a hierarchal structure. E.g., /api/v01/dataset/<ID>/<DIMENSION>/<CODE>/def.sdmx.json. However,
      I think there is an opportunity to make this query a little more flexible and extensible. Instead of enforcing the hierarchy,
      the funciton could use the GET verb to provide a little more control over how the data is filtered. A user could
      query on criteria from different dimensions than the one which they are interested in results from. For example,
      even if the user is interested in the 'item' dimension, they may want to filter on 'geography' and 'sex'. Using GET would allow
      for this. As an example: /api/v01/dataset/NM_1_1/item.def.sdmx.json?geography=2038432081&sex=7. This would also expose the user
      to options for specifying times and dates, which cannot otherwise be done. I fully acknowledge that this might be over-building,
      since this function only provides a support role and will probably not be used that way.
    • This is a minor concern, but I feel it's cleaner to specify value if a test function returns TRUE first when using ifelse().
      The code ifelse(is.null(type), "", paste0("/", type)) reads cleaner than ifelse(is.null(type) == FALSE, paste0("/", type), "") IMO.
  • nomis_data_info()
    • The documentation could be more explicit in telling the user that the data returned by this function is metadata.
    • [suggestion] Relating to the comment above, it may be a little clearer to rename the function something to the effect of nomis_get_metadata().
      This naming scheme is more in line with the pattern set forth by nomis_get_data naming scheme.
  • nomis_get_data()
    • I see that the API documentation states, "you have a 25000 cell limit on the number of data values downloaded in these formats."
      It appears that the code currently breaks the fetch into multiple steps if the record count is greater than or equal to 25k records.
      I am curious if this might be an oversight. It might be useful to test the edge case where the query returns more than 25k cells but
      fewer than 25k records to see if everything still works.
    • Including a select parameter may be a good enhancement for the future. Right now, that can be accomplished with
      additional_queries but is probably important enough to be standalone. This would allow for additional documentation
      and sugar to be added for it.

Conclusion

This is a great package that will be valuable to anyone interested in analysis of this data. Nice work!

I appreciate the opportunity to review this package and hope that my comments are constructive and useful. This package needs just a few
tweaks (namely getting the examples up and running) and then I will gladly approve!

@evanodell
Copy link
Author

evanodell commented Mar 9, 2018

@hi @cderv and @pegeler, thank you both of you very much for your helpful comments and suggestions, apologies for not being able to respond to them sooner.

Response to @cderv

List-columns

I find it kind of necessary because you also used list-columns in tibbles. This is not something easy for some newcomers to handles. Even for some dplyr users who do not used that a lot, it is not something easy. Adding some example in the vignettes to show how to retrieve embedded data.frame inside some tibble's list column could be a good idea. To illustrate:

> library(nomisr)
> library(dplyr, warn.conflicts = F)
> y <- nomis_data_info("NM_893_1")
> y$annotations.annotation %>% class()
> #> [1] "list"
> y$annotations.annotation[[1]] %>% class()
> #> [1] "data.frame"
> y %>% pull(annotations.annotation) %>% class()
> #> [1] "list"
> y %>% pull(annotations.annotation)%>% .[[1]] %>% class()
> #> [1] "data.frame"
> y %>% pull(annotations.annotation) %>% purrr::pluck(1) %>% class()
> #> [1] "data.frame"
> y %>% pull(annotations.annotation) %>% purrr::map(1) %>% class()
> #> [1] "list"
> y %>% tidyr::unnest(annotations.annotation) %>% glimpse()
> #> Observations: 14
> #> Variables: 11
> #> $ agencyid                             <chr> "NOMIS", "NOMIS", "NOMIS"...
> #> $ id                                   <chr> "NM_893_1", "NM_893_1", "...
> #> $ uri                                  <chr> "Nm-893d1", "Nm-893d1", "...
> #> $ version                              <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1...
> #> $ components.primarymeasure.conceptref <chr> "OBS_VALUE", "OBS_VALUE",...
> #> $ components.timedimension.codelist    <chr> "CL_893_1_TIME", "CL_893_...
> #> $ components.timedimension.conceptref  <chr> "TIME", "TIME", "TIME", "...
> #> $ name.value                           <chr> "LC4408EW - Tenure by num...
> #> $ name.lang                            <chr> "en", "en", "en", "en", "...
> #> $ annotationtext                       <chr> "Current (being actively ...
> #> $ annotationtitle                      <chr> "Status", "Keywords", "Un...
> 

Response: I've added a similar example to the vignette, which is similar enough to your example that I feel I should cite you as a contributor if that is okay.

Following are my review comments, for improvement:

About Community Guidelines

There is no contribution guidelines in the README or other CONTRIBUTING.md.

Response: Added a CONTRIBUTING.md in ropensci/nomisr@e4bbb02

DESCRIPTION needs some improvement : no need to use Author and Maintainer fields if you are using Authors@R. You can just use the latest. You can find some good example in R package book.

Response: Now only using Authors@R

Improvement for the packaging guidelines

see below parts for details

  • Adding a code of conduct

Response: I a code of conduct in ropensci/nomisr@ede3558 and

  • Improve README considering it is the first entry point for discovering NOMIS API
  • Improve DESCRIPTION for authorship
  • [minor] follow versionning advice with ..*.9000 for in development version

Response: Now following versioning best practice in development version

  • [question] I did not find why you need curl package in suggest...
    Response: For reasons I'm not entirely clear on, the package was failing on Travis without the curl suggestion, I think due to a bug in jsonlite.

README improvement

  • I would be more precise on what the NOMIS database is and for who it is. For example, I found that this submission issue mentioned UK database but not the README. You help file for the package as pretty clear description of what it is for. While reviewing, I had a doubt and went to check on NOMIS. This is why I think the statement of need is not complete.
  • It feels like the README could get more examples on what your package can do. Only one of your functions is shown in the readme.
  • Put package documentation link in the title of the repo like in the reprex repo. By the way, you can add a more descriptive description on your github repo and some tags.
  • could precise that it uses the tibble format (from the tidyverse)

Response: I've added more details to the README, including more details on Nomis itself, a simple example and citation information.

Vignettes improvement

  • Vignette was not built when I installed the package with devtools::install_github. Did not know if this intended behavior or not.
  • I tried to build the vignette locally then: it worked.
    • However, there is a problem on result formatting with the tibble. (I think because you use result = 'asis' in code chunks) and because every results are not tables (y$annotations.annotation is a list for example.) so it does not print nicely.
      • The vignette do not show any code to get the results - I would show them for someone to learn how to use your function, as examples. (with echo = TRUE)
      • The formatting issue is visible on your website too
  • I think the vignette contains necessary information but does not show enough examples on how to retrieve data in the objects you return. You mentioned the three list columns with nested data.frame but not show how correctly extracts the info.
  • It is pretty clear that you package allows to retrieve a large amount of data from the NOMIS API. it seems necessary to always use nomis_data_info, then nomis_codes then nomis_get_data to be sure to not retrieve to much. The vignette helps you understand that. It could appears to in your readme, and it could be more clear how to construct a pipeline for good practice with this API.

These improvement are the reason why I did not check the vignettes box, even if there is one in the github repo and doc.
If you need help on all this, do not hesitate to ask.

Response: As mentioned above, I've added your suggestions on exploring list-columns to the vignette, and fixed the online formatting issues. I've changed echo to TRUE to show code, pre built vignette in ropensci/nomisr@1c9cd3f

About documentation of functions and examples

  • All functions are well documented.
  • I encountered some issues running the examples because of a timeout. I believe this is what you meant in your by "there are some limits to the amount of data that can be retrieved within a certain period of time, although those are not published.". The maybe some improvement to do on this. (see under)
  • Is this the reason why you enclosed each example in \dontrun{} statement ? am I not sure on what is the recommendation for API package like this one for example ? \dontrun{} does allow anyone to run examples with something like example("nomis_codes", package = "nomisr"). Idea : Maybe \donttest{} is better here: does show in help and runs with example, but not run by R CMD check (so not run by CRAN ?) - see rd vignette from roxygen2

Response: I've changed \dontrun{} to \donttest{} in ropensci/nomisr@ac3e5a4

  • I ran all tests with devtools::run_examples(run = FALSE) (in several time, due to timeout constraint). Everything is working. It works well for almost all, but your example assign systematically to a variable (named x or y - room for improvement) but never print to show head of results. I feel that is missing.

Response: I have changed the names in examples in ropensci/nomisr@ac3e5a4 to be more descriptive of the data they are actually retrieving, and added a print call so the head is actually displayed.

  • I said "almost all" because there are some very long request in nomis_get_data. The warning that you put in Details states that it can be very slow process if calling significant amount of data. I would emphasize on this, saying this warning in Description part of the help file. And these example should definitely not be ran.

Response: I've removed the very long examples from nomis_get_data ropensci/nomisr@ac3e5a4 - I considered wrapping them in \dontrun{} but I think dropping them is the better option.

  • I saw you made a pkgdown website. This is great !! and nice theme!

Response: Thank you!

*I did not find the source code of this website. I would have thought it leaved with the package source code. But no docs folder or gh-pages. I find it useful to have access for letting people PR your some typos or other thing and all the site from your package source code. However, you may have good reason.

Response: I keep all the source code for my documentation website in my docs repo. I did this so I could host on netlify with a custom domain and https encryption and not have to use separate subdomains for each package. I'm not sure if that is the best option or not.

Various comments on the code

  • [minor] I would put the url of the API in its own variable. I think it is better practice and more efficient to maintain in case of change in the url (e.g V0.2). The url is used several time and a change would mean copy pasting. another utils function could help building the url (paste0 is used three times), with the same base url def.sdmx.json). Know that it exists some tools to help build some strings with parameters like glue, and specific one to handle url like urltools or those in httr. You may not need them here but it is worth knowing.

Response: I've turned the API URL into its own variable in ropensci/nomisr@ac3e5a4

  • suppressMessage is used to hide the message of readr::read_csv. However, the message appears because column types are guessed and not set. It is good practice to set column type in readr with column_types = cols(). In an API package like this one, it make even more sense because it could helps be aware of any change in the format. (using stop_for_problems() or just see a message appear) - see readr vignettes

Response: I've dropped suppressMessage and used col_types = readr::cols(.default = readr::col_character()) to make columns characters by default in ropensci/nomisr@ac3e5a4

  • [minor] I don't know what are the good practice on this one. I just feel that you have a dplyr dependency in import for three functions bind_rows, case_when and if_else than can be easily replace with rbind or if.else. I feel that, if the former are very useful interactively in an analysis, using them in a package can be unnecessary dependency to a "big" package that can cause you some effort later on if anything change in dplyr. More of a personal point of view on this one.

However, it relates on how you are build your query url

  query <- dplyr::if_else(keywords==FALSE,
                           paste0("/def.sdmx.json?search=*", 
                                  search, "*"),
                           paste0("/def.sdmx.json?search=keywords-", 
                                  search, "*")

could be rewritten without dplyr::if_else

base_query <- "/def.sdmx.json?search="
if (keyword) {
    keyword_term <- "keywords-")
} else {
    keyword_term <- "*"
}
# or keyword_term <- if.else(keyword, "keywords-", "*")
query <- paste0(base_query, key_word_term, search, "*")

It would suppres a non essential dependency.

Response: I've dropped the use of case_when and if_else in favour the the style you described above in ropensci/nomisr@ac3e5a4. bind_rows is used to combine together a single tibble with a list of one or more tibbles, and using rbind in place of bind_rows would likely require recreating the bind_rows function. If there are changes to bind_rows in the future it is only in one location so should be easy to adjust to.

  • The style in code could be improve in some places. You can follow some style guide to help you. like the one in ROpensci Guidelines linking to Advanced R or more recent the one used in the tidyverse styleguide. They even build some tools to help like styler and lintr. Example of improvement:
    * put spaces after foror if and also around =, ==, +, >, ...
    * indentation of code (in RStudio CTRL+i i useful). for example in if () { } else { }, lines of code that are inside brackets should be indented.
    * use {} with if when it does not fit on one line
    * More importantly, be consistent in all your script with the guideline you choose.
  • Know that when you want to stop on a condition, there is also stopifnot. For user experience, you can also return clearer error sometimes with stop("API request did not return any results", call. = FALSE) to not show the call for example.
  • A good practice is also to use meaningful name inside your internal code, even for temporary variable. There are a lot of x, a, q, ... This will improve readability and the ease for others to collaborate on your code (including future you)

Response: I've added call. = FALSE to the various stop messages.

Response: I've gone through the code and changed temporary variable names to be more meaningful, and formatted the code to improve indentation, spaces around if and for operators, ==, etc.

About the timeout and size of data

*If session is interactive(), it may be a good idea to warn clearly and even ask if we want to continue when lots of data are being requested. I find myself playing with the example trying to retrieve 208888 of your pages

Retrieving additional pages 1 of 208888
Retrieving additional pages 2 of 208888

I don't know if some other API packages are doing something to help user or if it is all the user responsibility to know what is he doing. (users will surely be already NOMIS data users).

As another improvement, I think you should try to create custom error message for the timeout issue as you know it is an issue that will impact user experience but cannot really prevent. I encountered:
a <- nomis_codes("NM_1_1")
# Error in open.connection(con, "rb") : 
#  Timeout was reached: Resolving timed out after 10000 milliseconds

You could use some tryCatch mechanism (some useful info in Advanced R).

  • [thoughts] You may need more control on the request mechanism to deal with timeout. First, it seems for the website one need to contact NOMIS help-desk to know more about limitation. Then , your are relying on readr connection mechanism, and may need to use httr, crul or curl to better handle request to API with the specificity of NOMIS

Response: I've used tryCatch to provide more useful information in the case of time outs. I've also written, but not implemented, a warning and yes/no option in interactive sesssions with requests over 350,000 cells which are likely to cause rate limiting issues. Implemented in ropensci/nomisr@ac3e5a4

About SMDX format on NOMIS

Do you know about the rsdmx package ? it is on github and CRAN, there is WIKI. I did not find NOMIS in their listing, so I did not know if it works for you and how it could overlap with your package.
rsdmx is definitely more a generic package than yours. I just wanted to point it to you in case you did not know.

Response: I was not aware of it, but it is something I should explore for the future, thank you for pointing me towards it.

Response to pegeler

Package Review for nomisr

Onboarding submission: #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

A statement of need clearly stating problems the software is designed to solve and its target audience in README
Installation instructions: for the development version of package and any non-standard dependencies in README
Vignette(s) demonstrating major functionality that runs successfully locally
Function Documentation: for all exported functions in R help
Examples for all exported functions in R Help that run successfully locally
Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

For packages co-submitting to JOSS

    The package has an obvious research application according to JOSS's definition

The package contains a paper.md matching JOSS's requirements with:

    A short summary describing the high-level functionality of the software
    Authors: A list of authors with their affiliations
    A statement of need clearly stating problems the software is designed to solve and its target audience.
    References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Installation: Installation succeeds as documented.
Functionality: Any functional claims of the software been confirmed.
Performance: Any performance claims of the software been confirmed.
Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 10
Review Comments

Excellent work! This is a fine example of the level of quality and professionalism that I have
come to expect from rOpenSci contributions. This package is a great addition to the
complement of R functionality and provides a facile portal to a useful data source.

Many of the comments that I had have been addressed as a result of @cderv's prompt and thorough review.
In addition, development has continued to be very active over the past few weeks! Issues are
fixed as quickly as I find them! Below are a few more comments that I would like to add.
Basic Package Structure Overview
README.md

The README.md does not contain citation information, per rOpenSci packaging guide.
I am not sure how necessary that is since there is a CITATION file in the inst/ directory,
however, it should be an easy add.

Reponse: Added in ropensci/nomisr@b75a47d

I would concur with @cderv in suggesting additional usage examples.
DESCRIPTION

The DESCRIPTION file does not currently specify a minimum version of R. I would
recommend doing so. A minor release is preferred over a patch release, e.g.

Depends: R (>= 3.4.0)

is better than

Depends: R (>= 3.4.1)

I am not seeing where curl is used even though it is in the Suggests list.

Response: I've added a minimum version of R (>= 3.4.0) to the DESCRIPTION in ropensci/nomisr@b75a47d

Response: Not including curl was causing errors with jsonlite on travis - I'm not sure why, as jsonlite is supposed to import curl itself.

CITATION

The CITATION file uses meta$Date to dynamically fill in date for the citation. However, the Date entry was removed from the DESCRIPTION file per gp suggestion so the date is NULL.

Since it is considered good practice to omit the Date entry because it may be prone to falling out of date, it seems that hard-coding the date in the citation would be silly. I briefly considered the use of a different field to dynamically update the citation date, such as Date/Publication or Packaged. However, those will only work for users who are downloading from a CRAN repos. Therefore I'm a little bit at a loss as to what the most elegant solution might be. I might suggest breaking with good practice here and adding the Date field back in since there is a reason to do so. Suggestions are welcome.

The CITATION file should also end in a newline character :)

Response: I've hardcoded 2018 into the CITATION file in ropensci/nomisr@b75a47d, which I think is the easiest solution, as its the year the package was first released, although I'm open to other suggestions as well.

Documentation

The man page for the package is named nmisr, not nomisr. This appears to be a typo.
@Seealso can link to the other package functions using \link{}.
It appears that some descriptions may be intended to be multi-paragraph. Without explicitly
using the @description and @details tags, the third code block is assumed to be the 'details'
block by default. Therefore, using explicit roxygen tags might be helpful here.
Nomis data comes packaged in very complex data structures. Please don't be afraid of 'over-explaining'.
The more information in the documentation and vignettes, the better for the user, especially if they do
not have prior experience with Nomis data.
'Nomis' in the title for nomis_search() should be capitalized to be consistent with the rest of the package.

Response: I've expanded the documentation and used roxygen tags to improve organisation. I've also added an expanded explanation to the package vignette, all in ropensci/nomisr@b75a47d

Examples

The example for nomis_get_data failed due to lack of data. I recommend finding a specific
date range that will return results and then hard-code that into the example.
Athough \donttest{} used to be perfect for API packages because it would not be run by
R CMD check but would be run by example() and would be included in the man page, it appears
that the functionality has changed.
Now, according to Hadley's book, it appears that code enclosed
in \donttest{} is indeed checked (though I have yet to confirm). Therefore,
for long-running API calls, it might make sense to change back to \dontrun{} if you are
concerned about time-out. I recall
reading some best practice advice on CRAN's website at some point, but cannot find the reference
anymore.
The examples often show multiple ways to use the function, which is excellent. However,
it might be a nice enhancement to add a little extra code to glimpse the data for the users to get
a better feel for what they are getting out of the API call. This is especially helpful since
the data that is returned can differ greatly depending on what args are passed.

Response: I've added glimpsing to the examples in different functions, and removed long-running API calls, which aren't good practice due to rate limiting, as of ropensci/nomisr@b75a47d

Vignettes

I like the switch to echo = TRUE for vignette code chunks. This is the user's opportunity
to see the code in action for the first time. Including the code used to generate the results can
aid in understanding and readability.
Vignettes are not currently built. This is not a big deal. However, it would
be a convenience for those installing from GitHub or local repos. This is especially true since
the API calls can cause vignettes to take a long time to build. (Currently I cannot build vignettes
because I keep getting hung up).
The addition of the theme and TOC is a nice touch.
It looks like the vignette might still be a little incomplete. I see that the final paragraph of
Querying data variables is not finished.
It's unnecessary to hard-code the results in code chunks. The results will be output when the vignette is knit.

Response: I've now pre-built the vignette, and expanded on the various sections in ropensci/nomisr@1c9cd3f

Functionality

nomis_search()
    [suggestion] Currently, users can either search based on generic search terms or search for specific keywords.
    Looking at the API docs, I think there is an opportunity to improve functionality by taking advantage
    of some of the other search categories (e.g., name, description, content type, units). This would
    be doubly useful since content type and units are not searched when performing a generic search.
    
    [suggestion] Rather than automatically inserting wildcards (*), I would recommend informing the user of the wildcard character so that they can have finer control over how wildcards are used in their search.  It would be nice to mention in the documentation that several terms can be searched at once using a comma. Alternatively, the parameter could accept a character vector that could be collapsed with commas.
  
    [suggestion] The function could potentially accept character vectors from each of the categories (as well as a generic search term field) and then the appropriate search string could be created.

Response: I have re-written nomis_search in keeping with your suggestions. It now accepts five parameters - name, description, keywords, content_type, units - and can search in any or all of them at the same time, with or without wildcards, and each parameter accepts character vectors of multiple queries. I've also added a nomis_content_type() function to assist in looking up specific content types, which may be used with nomis_search(). All in ropensci/nomisr@b75a47d

nomis_codes()
Generally, it is helpful to mirror the language of the source material when possible.
The nomis_codes() function accepts a dimension (concept) and the function returns codes for that dimension.
Therefore, the argument code might be renamed to dimension or concept, since the
function searches for codes availalbe within a dimension.
Related to above, I think the @title could be more descriptive.

[suggestion] The query is currently built using a hierarchal structure. E.g., /api/v01/dataset////def.sdmx.json. However, I think there is an opportunity to make this query a little more flexible and extensible. Instead of enforcing the hierarchy, the funciton could use the GET verb to provide a little more control over how the data is filtered. A user could query on criteria from different dimensions than the one which they are interested in results from. For example, even if the user is interested in the 'item' dimension, they may want to filter on 'geography' and 'sex'. Using GET would allow for this. As an example: /api/v01/dataset/NM_1_1/item.def.sdmx.json?geography=2038432081&sex=7. This would also expose the user to options for specifying times and dates, which cannot otherwise be done. I fully acknowledge that this might be over-building, since this function only provides a support role and will probably not be used that way.
This is a minor concern, but I feel it's cleaner to specify value if a test function returns TRUE first when using ifelse(). The code ifelse(is.null(type), "", paste0("/", type)) reads cleaner than ifelse(is.null(type) == FALSE, paste0("/", type), "") IMO.
nomis_data_info()
The documentation could be more explicit in telling the user that the data returned by this function is metadata.

Response: I've updated the documentation to be more explicit about what the function actually does, and improved the code readability. I changed the code parameter to concept, as you suggested, to be more in keeping with the syntax used in the original API. I've also added a specific search parameter, and a generic additional_queries parameter, as used in nomis_get_data(), in ropensci/nomisr@b75a47d

[suggestion] Relating to the comment above, it may be a little clearer to rename the function something to the effect of nomis_get_metadata().
This naming scheme is more in line with the pattern set forth by nomis_get_data naming scheme. nomis_get_data()

Response: I've renamed the nomis_codes() function to nomis_get_metadata(), which I agree is more descriptive of what the function actually does. I've also added the nomis_overview() function, which returns a more general overview of all available metadata, without the flexibility and specificity offered by nomis_get_metadata() ropensci/nomisr@b75a47d

I see that the API documentation states, "you have a 25000 cell limit on the number of data values downloaded in these formats."
It appears that the code currently breaks the fetch into multiple steps if the record count is greater than or equal to 25k records.
I am curious if this might be an oversight. It might be useful to test the edge case where the query returns more than 25k cells but fewer than 25k records to see if everything still works.

Response: The 25k limit refers to the number of observation values, the official documentation is rather confusing on that point, as Nomis only treats one of the columns as containing the data you are querying, I hope my changes in ropensci/nomisr@b75a47d have made it clearer.

Including a select parameter may be a good enhancement for the future. Right now, that can be accomplished with additional_queries but is probably important enough to be standalone. This would allow for additional documentation and sugar to be added for it.

Response: I've added a select parameter as an enhancement, as it is straightforward to do and reduces the amount of information that could possibly need to be included under additional_queries.

Conclusion

This is a great package that will be valuable to anyone interested in analysis of this data. Nice work!

I appreciate the opportunity to review this package and hope that my comments are constructive and useful. This package needs just a few
tweaks (namely getting the examples up and running) and then I will gladly approve!

@cderv
Copy link
Contributor

cderv commented Mar 10, 2018

Hi @evanodell

Great work ! And thanks a lot for your answer formatting just above. It is very easy to follow how you took our comments into account. Nice.

I updated my first post by checking some boxes ✅ and ⏲ of review.

I still have a few remarks after taking another pass on your updated code

Package Website

Nice idea to centralize all your documentation. However, there is still a few things to adjust as I get 404 error navigating through doc. I can only view the reference page of the package but not the vignette for example. Must be something to do with how pkgdown deals with path when hosting in another site.
I do not have the solution for your https request, but you may find some help in the #rstats community.

Contributing.md

Great addition. Two things:

Readme

Great improvement !! I find it now a good introduction to your 📦 with nice examples.
You may find me sticky on this but the mention to UK is still missing in your introduction. Your DESCRIPTION is good, so why not add the first line in your readme ?

Access UK official statistics from the Nomis database through R

About formatting, take care : links won't works when between single quote ``.

`[tibbles](http://tibble.tidyverse.org/)`

must be changed to

[`tibbles`](http://tibble.tidyverse.org/)

in order to get tibbles

Also, you need to keep in mind to synchronise your README.Rmd modification and README.md generation. If you use usethis::use_readme_rmd, it puts a git pre-commit hook that prevent you to commit if md is older than Rmd. That way you'll never forget to rerender. This comes from devtools::use_readme_rmd now in usethis. (BTW, a pre-commit hook is a little .sh script that is executed before every commit. You can see it there)

Vignettes

It looks good ! No problem to take the example I gave you. It was just to illustrate, it could be rework but if you found it useful to you, it could be useful to other.
The html vignette is now included, but not up to date. Before installing the package, after a git pull I needed to do a devtools::build_vignette. Not sure if the vignette are to be built in source package. I found re-reading R package book that they can be built at installation with devtools::install_github("evanodell/nomisr", build_vignettes = TRUE), so i guess it is not an issue if not present in the package.

about curl and travis

I read something on that but I did not found it. Sorry. It may worth asking on ropencsi forum, rstudio community.

About download data limit

I saw you wanted to do something with interactive() in your code, but it is still commented out. You may try to put this code in separate feature branch to clean your master if it is still work in progress.

I think your 📦 is ready ! Very nice work, I still find it a simple one but powerful one as always with data packages. Thanks!

@evanodell
Copy link
Author

Thank you @cderv

I did not know about usethis::use_readme_rmd, that has made things much easier. I've made the other tweaks you've suggested as well.

I've split off the the interactive() bits into a dev branch.

The curl and travis issue seems to stem from the internal loadpkg function in jsonlite. Using httr to check for http response codes seems to have solved the problem. I'm not sure why but at least the solution provides more informative error messages.

Regarding pkgdown, path in build_site() seems to have stopped working, but I think I now have a suitable workaround.

evanodell added a commit to ropensci/nomisr that referenced this issue Mar 28, 2018
Enhancements to documentation as part of rOpenSci review ropensci/software-review#190
@pegeler
Copy link

pegeler commented Mar 30, 2018

Package Review for nomisr

Onboarding submission: #190

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 14


Review Comments

Overall this package is well put together.

The data structures returned by the Nomis API are very complex and this package does a good job of helping the user along to find the data they need. I was able to get metadata and data out without too much trouble. Existing Nomis users should find this package facilitates their workflow and users new to Nomis data will likely struggle a bit, but less so because of the package's enhanced vignettes, documentation, and README.

As development continues, I would suggest continuing to consider how to make the data more accessible to users who may not be familiar with data structures returned by the Nomis API. In doing so, the package user-base will continue to expand.

It appears that functionality is fairly complete. I am able to navigate metadata several ways and can pull data with relative ease. Great work!! I think this package is ready!!!

NEWS.md

The NEWS.md will need to be updated when 0.0.3 is released. Don't forget to git tag it as well 😀!

Conclusion

From the time of initial review to now, I have seen this package transformed from an already good package to something really awesome. I appreciate the way that the package author has taken reviewer comments and turned them into opportunities to rethink existing code. These changes represent a very thoughtful approach that is reflected in package usability and performance.

Thanks again for the chance to review!

@cderv
Copy link
Contributor

cderv commented Mar 31, 2018

@evanodell Well done! I am really impress with the evolution of this really a great 📦 from the beginning of the review and now. Awesome work ! Thanks for having taken into account our advices, I really appreciated it like @pegeler.

It was really a pleasure to review your package and it gave me the chance to discover the NOMIS database. I think your package is a great example of using a simple api and prepare the data return to an easier and practical format for the users.

I agree with @pegeler, this package is more than ready to ✨ in ropensci 📦 collections!

@karthik, what are the following steps ?

evanodell added a commit to ropensci/nomisr that referenced this issue Apr 2, 2018
A few minor edits prior to submitting second round of review. ropensci/software-review#190
@karthik
Copy link
Member

karthik commented Apr 16, 2018

@cderv Thanks for this fantastic, thorough and thoughtful review! Since you have signed off, nothing further required from your side. Now just waiting on @pegeler to provide the final sign off (edit: which I noticed is done now)

@karthik
Copy link
Member

karthik commented Apr 16, 2018

Congrats @evanodell! Your submission is now approved and accepted! 🎉 Thank you for submitting and many thanks to @cderv and @pegeler for thorough and timely reviews. To-dos:

  • Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.

  • Add the rOpenSci footer to the bottom of your README

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

  • Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed)

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). If you are, @stefaniebutland will be in touch about content and timing.

@evanodell
Copy link
Author

Thanks @karthik! I'm more than happy to write something up on the package, whichever format you prefer.

@stefaniebutland
Copy link
Member

Hi @evanodell. Great to hear you're interested in contributing a post about nomisr. The format is really up to you, but you could take publication dates into account. Right now we have a full schedule of longer narrative posts up to ~June so if you would like it up quickly, a technote is the way to go. We'll tweet from @ropensci either way.

You can read example narrative posts about onboarded packages here: https://ropensci.org/tags/onboarding/ and technotes are here: https://ropensci.org/technotes/

Technical and editorial guidelines: https://github.com/ropensci/roweb2#contributing-a-blog-post. The only difference in YAML for technote is
categories:

  • technotes

I will add the topicid before publishing.

Happy to answer questions here.

@karthik
Copy link
Member

karthik commented Apr 17, 2018

@evanodell Final suggestion. If you found feedback from the reviewers valuable, consider adding them as reviewers in the description of your package (this is entirely up to you and not a requirement). If you decide to, you'll need:

evanodell added a commit to ropensci/nomisr that referenced this issue Apr 17, 2018
@evanodell
Copy link
Author

@cderv do you have an ORCID I can include to list you as a reviewer in the package description?

@evanodell
Copy link
Author

@stefaniebutland A technote seems good to me. I can send it to you by the end of the week if that works?

@cderv
Copy link
Contributor

cderv commented Apr 18, 2018

@evanodell Yes I have one : https://orcid.org/0000-0003-4474-2498
Thank you for this offer to add me, I appreciate!

@karthik
Copy link
Member

karthik commented Apr 18, 2018

@evanodell Thanks again for the submission. I'll close the issue since the review is complete but please feel free to continue the conversation with Stefanie re the blog post.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants