Patch performance #16

Merged: 29 commits, Feb 20, 2018
8eabf8a  timing notes (cboettig, Feb 16, 2018)
21e50ae  Much faster, cleaner parsing of SPARQL returns (cboettig, Feb 17, 2018)
e0e90b1  testing (cboettig, Feb 17, 2018)
3b4c5f9  tweaking (cboettig, Feb 17, 2018)
c11eb84  Successful & fast rdf-join :rocket: :sparkles: (cboettig, Feb 17, 2018)
86376d8  move ex notebook to notebook/ (cboettig, Feb 17, 2018)
702dac3  datalake (cboettig, Feb 17, 2018)
47875f4  data lake showing gh api example (cboettig, Feb 17, 2018)
b2d07ad  clean up tmp (cboettig, Feb 17, 2018)
9fed41c  run results using full lake (cboettig, Feb 17, 2018)
8d0a4b8  be better about cleaning up temp files (cboettig, Feb 17, 2018)
0d5c02b  add libs, run full data ex (cboettig, Feb 18, 2018)
06170d4  make 'data-lake.Rmd' into vignette (cboettig, Feb 19, 2018)
3bf5216  data lake example (cboettig, Feb 19, 2018)
26f1beb  suggest nycflights13 data (cboettig, Feb 19, 2018)
a49fb6a  rdf_add can handle NA as a blank node (cboettig, Feb 20, 2018)
e8f2927  c() method use turtle to save disk space (cboettig, Feb 20, 2018)
887294a  parser and serializer will guess format (cboettig, Feb 20, 2018)
39074cf  cleaning up as_rdf methods (cboettig, Feb 20, 2018)
481a329  datatype should not be assigned to blank nodes (cboettig, Feb 20, 2018)
5574938  use rdflib_base_uri throughout (cboettig, Feb 20, 2018)
3c3aceb  avoid c() by passing rdf arg (cboettig, Feb 20, 2018)
38598ea  option to reconnect to an existing database (cboettig, Feb 20, 2018)
7a02e04  indicate storage type in rdf() constructor instead (cboettig, Feb 20, 2018)
8d55f78  tests (cboettig, Feb 20, 2018)
7fe8fb6  good practice (cboettig, Feb 20, 2018)
4332540  newline (cboettig, Feb 20, 2018)
9efce89  update pkgdown (cboettig, Feb 20, 2018)
ec81511  skip has_bdb on appveyor (cboettig, Feb 20, 2018)
be better about cleaning up temp files
cboettig committed Feb 17, 2018
commit 8d0a4b87d7b52b47aec18bd92d8686e95945c3ec
2 changes: 2 additions & 0 deletions R/rdf_methods.R
@@ -14,7 +14,9 @@ c.rdf <- function(...){
f <- file.path(loc,paste0(i, ".rdf"))
rdf_serialize(rdfs[[i]],f)
rdf_parse(f, rdf = rdf)
file.remove(f)
}
unlink(loc)
rdf
}
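
A note on the `unlink(loc)` call in this hunk: in base R, `unlink()` leaves directories in place unless `recursive = TRUE` is passed. The hunk does not show how `loc` is created, but `file.path(loc, ...)` suggests it is a directory, in which case the call as written would likely not remove it. A minimal sketch of the pattern follows, assuming a scratch directory under `tempdir()`; the helper `round_trip_via_tempdir()` is hypothetical and not part of rdflib.

```r
# Hypothetical helper (not part of rdflib): write each text to its own file
# in a scratch directory, read it back, then remove the scratch directory.
round_trip_via_tempdir <- function(texts) {
  loc <- tempfile("scratch_")   # a fresh path; nothing exists there yet
  dir.create(loc)
  # unlink() only deletes a directory when recursive = TRUE;
  # on.exit() makes the cleanup run even if a step below fails.
  on.exit(unlink(loc, recursive = TRUE), add = TRUE)

  out <- character(length(texts))
  for (i in seq_along(texts)) {
    f <- file.path(loc, paste0(i, ".txt"))
    writeLines(texts[[i]], f)
    out[i] <- paste(readLines(f), collapse = "\n")
    file.remove(f)              # per-file cleanup, as in the hunk above
  }
  attr(out, "scratch_dir") <- loc   # kept only so callers can inspect the path
  out
}

res <- round_trip_via_tempdir(c("a", "b"))
dir.exists(attr(res, "scratch_dir"))   # FALSE: the directory is gone
```

With `recursive = TRUE` the per-file `file.remove()` becomes optional, since the whole directory is deleted on exit.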

18 changes: 13 additions & 5 deletions R/rdf_parse.R
@@ -29,22 +29,26 @@ rdf_parse <- function(doc,
...){
format <- match.arg(format)

## if we get a string as input, we'll store it in tmp file here
## which we can later be sure to clean up.
tmp_string <- tempfile()
## if we get json-ld, we'll need a temp location to serialize that too:
tmp_json <- tempfile()

# convert string input or url to local file
doc <- text_or_url_to_doc(doc)
doc <- text_or_url_to_doc(doc, tmp_string)

## redlands doesn't support jsonld. So rewrite as nquads using jsonld package
## We use tmp to avoid altering input doc, since parsing a local file should
## be a read-only task!
if(format == "jsonld"){
tmp <- tempfile()
#tmp <- add_base_uri(doc, tmp)
x <- jsonld::jsonld_to_rdf(doc,
options =
list(base = getOption("rdflib_base_uri", "localhost://"),
format = "application/nquads"))
writeLines(x, tmp)
writeLines(x, tmp_json)
format <- "nquads"
doc <- tmp
doc <- tmp_json
}

if(is.null(rdf)){
@@ -54,8 +58,12 @@ rdf_parse <- function(doc,
mimetype <- unname(rdf_mimetypes[format])
parser <- new("Parser", rdf$world, name = format, mimeType = mimetype)
redland::parseFileIntoModel(parser, rdf$world, doc, rdf$model)

redland::freeParser(parser)
unlink(tmp_string)
unlink(tmp_json)

## return rdf object (pointer)
rdf
}
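
The hunks above allocate `tmp_string` and `tmp_json` up front and call `unlink()` on both after parsing. One limitation of that layout is that the `unlink()` calls are skipped if the parse step raises an error. A common base R alternative is to register the cleanup with `on.exit()` right after calling `tempfile()`. The sketch below illustrates that idea only; `count_lines()` is a hypothetical stand-in for a parse step and is not rdflib's `rdf_parse()`.

```r
# Hypothetical stand-in for a parse step (not rdflib's rdf_parse()).
count_lines <- function(text, fail = FALSE) {
  tmp <- tempfile(fileext = ".txt")
  # Registered before any work is done, so it runs on normal return
  # and on error alike. unlink() on a path that was never created is harmless.
  on.exit(unlink(tmp), add = TRUE)

  writeLines(text, tmp)
  if (fail) stop("simulated parser failure")
  list(n = length(readLines(tmp)), path = tmp)
}

ok <- count_lines(c("a", "b", "c"))
ok$n                  # 3
file.exists(ok$path)  # FALSE: removed when the function exited
```

Using `add = TRUE` keeps any earlier `on.exit()` handlers in place, which matters when a function manages more than one temporary file.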
