Patch performance #16

Merged on Feb 20, 2018 (29 commits)
Commits (29); the diff below shows changes from 1 commit
8eabf8a  timing notes (cboettig, Feb 16, 2018)
21e50ae  Much faster, cleaner parsing of SPARQL returns (cboettig, Feb 17, 2018)
e0e90b1  testing (cboettig, Feb 17, 2018)
3b4c5f9  tweaking (cboettig, Feb 17, 2018)
c11eb84  Successful & fast rdf-join :rocket: :sparkles: (cboettig, Feb 17, 2018)
86376d8  move ex notebook to notebook/ (cboettig, Feb 17, 2018)
702dac3  datalake (cboettig, Feb 17, 2018)
47875f4  data lake showing gh api example (cboettig, Feb 17, 2018)
b2d07ad  clean up tmp (cboettig, Feb 17, 2018)
9fed41c  run results using full lake (cboettig, Feb 17, 2018)
8d0a4b8  be better about cleaning up temp files (cboettig, Feb 17, 2018)
0d5c02b  add libs, run full data ex (cboettig, Feb 18, 2018)
06170d4  make 'data-lake.Rmd' into vignette (cboettig, Feb 19, 2018)
3bf5216  data lake example (cboettig, Feb 19, 2018)
26f1beb  suggest nycflights13 data (cboettig, Feb 19, 2018)
a49fb6a  rdf_add can handle NA as a blank node (cboettig, Feb 20, 2018)
e8f2927  c() method use turtle to save disk space (cboettig, Feb 20, 2018)
887294a  parser and serializer will guess format (cboettig, Feb 20, 2018)
39074cf  cleaning up as_rdf methods (cboettig, Feb 20, 2018)
481a329  datatype should not be assigned to blank nodes (cboettig, Feb 20, 2018)
5574938  use rdflib_base_uri throughout (cboettig, Feb 20, 2018)
3c3aceb  avoid c() by passing rdf arg (cboettig, Feb 20, 2018)
38598ea  option to reconnect to an existing database (cboettig, Feb 20, 2018)
7a02e04  indicate storage type in rdf() constructor instead (cboettig, Feb 20, 2018)
8d55f78  tests (cboettig, Feb 20, 2018)
7fe8fb6  good practice (cboettig, Feb 20, 2018)
4332540  newline (cboettig, Feb 20, 2018)
9efce89  update pkgdown (cboettig, Feb 20, 2018)
ec81511  skip has_bdb on appveyor (cboettig, Feb 20, 2018)
parser and serializer will guess format
based on file extension, closes #4

serializer also sets explicit base option

serializer defaults to print to character string if doc is NULL.
cboettig committed Feb 20, 2018
commit 887294a792ccd563206081f40534e3bee9c91a2d
10 changes: 8 additions & 2 deletions R/rdf_parse.R
@@ -3,7 +3,8 @@
 #' @param doc path, URL, or literal string of the rdf document to parse
 #' @param format rdf serialization format of the doc,
 #' one of "rdfxml", "nquads", "ntriples", "turtle"
-#' or "jsonld"
+#' or "jsonld". If not provided, will try to guess based
+#' on file extension and fall back on rdfxml.
 #' @param rdf an existing rdf triplestore to extend with triples from
 #' the parsed file. Default will create a new rdf object.
 #' @param ... additional parameters (not implemented)
@@ -20,14 +21,19 @@
 #' rdf <- rdf_parse(doc)
 #'
 rdf_parse <- function(doc,
-                      format = c("rdfxml",
+                      format = c("guess",
+                                 "rdfxml",
                                  "nquads",
                                  "ntriples",
                                  "turtle",
                                  "jsonld"),
                       rdf = NULL,
                       ...){
+
   format <- match.arg(format)
+  if(format == "guess"){
+    format <- guess_format(doc)
+  }

   ## if we get a string as input, we'll store it in tmp file here
   ## which we can later be sure to clean up.
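A note on the argument handling in this hunk: because `"guess"` is now the first element of the `format` vector, `match.arg()` selects it whenever the caller omits `format`. The sketch below shows that base R behavior in isolation. `pick_format` is a hypothetical stand-in used only for illustration; it is not a function in rdflib.

```r
# Hypothetical stand-in: only the argument handling mirrors the diff above.
pick_format <- function(format = c("guess", "rdfxml", "nquads",
                                   "ntriples", "turtle", "jsonld")) {
  # With no argument supplied, match.arg() returns the first choice
  match.arg(format)
}

default_choice <- pick_format()          # falls back to "guess"
explicit       <- pick_format("turtle")  # exact match
partial        <- pick_format("nq")      # unambiguous partial match: "nquads"

# Anything that is not a (partial) match for a choice is rejected
rejected <- tryCatch(pick_format("ttl"), error = function(e) "error")
```

Since `match.arg()` accepts unambiguous partial matches, a value such as "nq" resolves to "nquads", while a file extension such as "ttl" is rejected rather than mapped to "turtle". Mapping extensions is left to `guess_format()`.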
24 changes: 18 additions & 6 deletions R/rdf_serialize.R
@@ -2,7 +2,7 @@
 #'
 #' @inheritParams rdf_parse
 #' @inheritParams rdf_query
-#' @param doc file path to write out to
+#' @param doc file path to write out to. If null, will write to character.
 #' @param namespace string giving the namespace to set
 #' @param prefix string giving the prefix associated with the namespace
 #' @param ... additional arguments to \code{redland::serializeToFile}
@@ -29,17 +29,22 @@
 #' prefix = "dc")
 #'
 rdf_serialize <- function(rdf,
-                          doc,
-                          format = c("rdfxml",
+                          doc = NULL,
+                          format = c("guess",
+                                     "rdfxml",
                                      "nquads",
                                      "ntriples",
                                      "turtle",
                                      "jsonld"),
                           namespace = NULL,
                           prefix = NULL,
+                          base = getOption("rdflib_base_uri", as.character(NA)),
                           ...){

   format <- match.arg(format)
+  if(format == "guess"){
+    format <- guess_format(doc)
+  }


   ## redlands doesn't support jsonld. So write as nquads and then transform
@@ -61,13 +66,20 @@ rdf_serialize <- function(rdf,
       prefix = prefix)
   }

-  status <-
-    redland::serializeToFile(serializer, rdf$world, rdf$model, doc, ...)
+  if(is.null(doc)){
+    doc <- redland::serializeToCharacter(serializer, rdf$world, rdf$model, baseUri = base, ...)
+  } else {
+    status <-
+      redland::serializeToFile(serializer, rdf$world, rdf$model, doc, baseUri = base, ...)
+  }

   if(jsonld_output){
     txt <- paste(readLines(doc), collapse = "\n")
     if(length(txt) > 0){ ## don't attempt to write empty file into json
-      json <- jsonld::jsonld_from_rdf(txt)
+      json <- jsonld::jsonld_from_rdf(txt,
+                                      options = list(
+                                        base = base,
+                                        format = "application/nquads"))
       compact_json <- jsonld_compact(json, "{}")
       writeLines(compact_json, doc)
     }
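The two patterns introduced in this file, a `base` default read from an option and a `doc = NULL` branch that returns text instead of writing a file, can be sketched with base R alone. `emit` below is a hypothetical toy, not rdflib's serializer, and it makes no redland calls; the `@base` line it prepends is only an assumed way of making the effect of `base` visible.

```r
# Toy stand-in for a serializer (an assumption for illustration only).
emit <- function(lines, doc = NULL,
                 base = getOption("rdflib_base_uri", as.character(NA))) {
  # Prepend a Turtle-style @base line only when a base URI is known
  if (!is.na(base)) lines <- c(paste0("@base <", base, "> ."), lines)
  if (is.null(doc)) {
    # No path given: hand the text back to the caller
    return(paste(lines, collapse = "\n"))
  }
  # Path given: write the file and return the path invisibly
  writeLines(lines, doc)
  invisible(doc)
}

old <- options(rdflib_base_uri = NULL)           # start from an unset option
txt <- emit("<s> <p> <o> .")                     # character string, no @base line

options(rdflib_base_uri = "http://schema.org/")
txt_with_base <- emit("<s> <p> <o> .")           # the option now supplies the base

tmp <- tempfile(fileext = ".ttl")
emit("<s> <p> <o> .", doc = tmp, base = "http://example.org/")  # explicit base wins
file_lines <- readLines(tmp)

unlink(tmp)
options(old)                                     # restore the previous setting
```

Because default arguments are evaluated lazily, the option is read at call time, so changing `rdflib_base_uri` between calls changes the default without redefining the function.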
14 changes: 14 additions & 0 deletions R/utilities.R
@@ -1,3 +1,17 @@
+guess_format <- function(doc){
+  switch(gsub(".*\\.(\\w+)$", "\\1", basename(doc)),
+         "xml" = "rdfxml",
+         "rdf" = "rdfxml",
+         "json" = "jsonld",
+         "nq" = "nquads",
+         "nt" = "ntriples",
+         "ttl" = "turtle",
+         "jsonld" = "jsonld",
+         "quads" = "nquads",
+         "turtle" = "turtle",
+         "rdfxml")
+}
+

 ## Don't explicitly type characters as strings, since this is default
 xs_class <- function(x, explicit_strings = FALSE){
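To make the behavior of `guess_format()` concrete, the sketch below copies the function from the hunk above and exercises it on a few paths. The match is on the final extension and is case-sensitive, and anything unrecognized falls through to "rdfxml". In base R, `basename(NULL)` raises an error, so a call that leaves both `doc` and `format` at their defaults in `rdf_serialize()` would appear to fail at this commit. `guess_format_safe` is a hypothetical wrapper showing one possible guard; it is not part of the PR.

```r
# Copied from the diff above so the example is self-contained
guess_format <- function(doc){
  switch(gsub(".*\\.(\\w+)$", "\\1", basename(doc)),
         "xml" = "rdfxml",
         "rdf" = "rdfxml",
         "json" = "jsonld",
         "nq" = "nquads",
         "nt" = "ntriples",
         "ttl" = "turtle",
         "jsonld" = "jsonld",
         "quads" = "nquads",
         "turtle" = "turtle",
         "rdfxml")
}

known     <- guess_format("data/person.nq")              # "nquads"
from_url  <- guess_format("http://example.org/foo.ttl")  # "turtle"
uppercase <- guess_format("notes.TTL")                   # no match, so "rdfxml"
no_ext    <- guess_format("README")                      # no extension, so "rdfxml"

# basename(NULL) errors, so the guess cannot run on a NULL doc
null_fails <- inherits(try(guess_format(NULL), silent = TRUE), "try-error")

# Hypothetical guard (an assumption, not the author's code)
guess_format_safe <- function(doc, default = "rdfxml"){
  if (is.null(doc)) return(default)
  guess_format(doc)
}
null_ok <- guess_format_safe(NULL)
```

Whether "rdfxml" is the right default for a NULL `doc` is a design choice; the wrapper simply reuses the fallback the diff already applies to unknown extensions.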
7 changes: 4 additions & 3 deletions man/rdf_parse.Rd
10 changes: 6 additions & 4 deletions man/rdf_serialize.Rd
22 changes: 21 additions & 1 deletion man/rdflib-package.Rd

Some generated files are not rendered by default.

24 changes: 23 additions & 1 deletion tests/testthat/test-parse-serialize.R
@@ -4,6 +4,16 @@ doc <- system.file("extdata/example.rdf", package="redland")
 out <- "testing.rdf"


+
+testthat::test_that("we can serialize to character", {
+  rdf <- rdf_parse(doc)
+  txt <- rdf_serialize(rdf, format = "nquads")
+  testthat::expect_is(txt, "character")
+  testthat::expect_match(txt, "John Smith")
+  rdf_free(rdf)
+})
+
+
 testthat::test_that("we can parse (in rdfxml)
 and serialize (in nquads) a simple rdf graph", {
   rdf <- rdf_parse(doc)
@@ -73,10 +83,22 @@ testthat::test_that("we can parse and serialize rdfxml", {

 ################################################################

+
+testthat::test_that("we can parse by guessing on the file extension", {
+  ex <- system.file("extdata/person.nq", package="rdflib")
+  rdf <- rdf_parse(ex)
+  rdf_serialize(rdf, "tmp.nq", base = "http://schema.org/")
+  roundtrip <- rdf_parse("tmp.nq", "turtle")
+  testthat::expect_is(roundtrip, "rdf")
+  unlink("tmp.nq")
+  rdf_free(rdf)
+})
+
+
 testthat::test_that("we can serialize turtle with a baseUri", {
   ex <- system.file("extdata/person.nq", package="rdflib")
   rdf <- rdf_parse(ex, "nquads")
-  rdf_serialize(rdf, out, "turtle", baseUri = "http://schema.org/")
+  rdf_serialize(rdf, out, "turtle", base = "http://schema.org/")
   roundtrip <- rdf_parse(out, "turtle")
   testthat::expect_is(roundtrip, "rdf")
   rdf_free(rdf)