Patch performance #16

Merged
cboettig merged 29 commits into master from patch-performance
Feb 20, 2018

@cboettig cboettig commented Feb 20, 2018

A variety of tweaks to improve performance and memory handling when working with large triplestores. Tested with over 6 million triples using disk-based storage in the data-lake vignette.

Key changes:

  • rdf_query no longer iterates over getNextResult, which was very slow; it now uses an internal redland function call to retrieve all results at once in CSV format.
  • The experimental as_rdf method now uses a poor-man's nquad serializer to generate RDF rapidly (instead of slowly iterating over rdf_add).
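
To illustrate the idea behind a "poor-man's" nquad serializer, here is a minimal sketch in Python (not the package's R code). It writes one N-Quads line per field of each record by plain string formatting instead of adding triples one at a time. The names rows_to_nquads, format_object, and escape_literal are hypothetical, and the treatment of base, vocab, and key is an assumption based on the commit notes in this PR, not the actual as_rdf implementation.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(text):
    """Escape characters that may not appear raw inside a quoted N-Quads literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def format_object(value):
    """Render a Python value as an N-Quads object term (assumed typing rules)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    # Assumption: strings that look like HTTP(S) URLs are treated as IRIs.
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return f"<{value}>"
    return f'"{escape_literal(str(value))}"'


def rows_to_nquads(rows, base, vocab, graph=None):
    """Serialize (key, record) pairs to an N-Quads string.

    base prefixes each key to form the subject IRI; vocab prefixes each
    field name to form the predicate IRI. Missing values (None) are skipped.
    """
    lines = []
    for key, record in rows:
        subject = f"<{base}{key}>"
        for name, value in record.items():
            if value is None:
                continue
            parts = [subject, f"<{vocab}{name}>", format_object(value)]
            if graph is not None:
                parts.append(f"<{graph}>")
            lines.append(" ".join(parts) + " .")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    demo = [("1", {"name": "Ann", "age": 30})]
    print(rows_to_nquads(demo, base="http://example.org/", vocab="http://schema.org/"), end="")
```

The resulting text can be handed to a parser in a single call, which is where the speedup over per-triple insertion would come from.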

Uses the getResults() method from the redland package internals. This is much faster for returning large numbers of results. It also sidesteps the need to rectangularize query results and manually coerce types; readr can handle that for us instead (as well as one can duck-type from strings).
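
As a rough illustration of what "duck typing from strings" means for CSV-formatted results, the following Python sketch (not the R code in this PR, and not how readr works internally) parses a CSV string into columns and tries progressively looser types per column. The function names parse_csv_results and coerce_column are hypothetical.

```python
import csv
import io


def coerce_column(values):
    """Guess a column type: try int, then float, else keep strings.

    Empty fields become None. A column is converted only if every
    non-empty value parses under the candidate type.
    """
    for cast in (int, float):
        try:
            return [cast(v) if v != "" else None for v in values]
        except ValueError:
            continue
    return [v if v != "" else None for v in values]


def parse_csv_results(text):
    """Parse CSV text with a header row into a dict of typed columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = list(reader)
    columns = {}
    for i, name in enumerate(header):
        # Short rows are padded with empty strings (an assumption of this sketch).
        raw = [row[i] if i < len(row) else "" for row in rows]
        columns[name] = coerce_column(raw)
    return columns


if __name__ == "__main__":
    sample = "name,age,score\r\nAnn,30,1.5\r\nBob,,2\r\n"
    print(parse_csv_results(sample))
```

Because the whole result set arrives as one CSV string, type inference runs once per column rather than once per value fetched from an iterator.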
Merge branch 'master' into patch-performance

# Conflicts:
#	inst/examples/profile_performance.Rmd
based on file extension, closes #4

serializer also sets explicit base option

serializer defaults to print to character string if doc is NULL.
methods take vocab, base, and key
@cboettig cboettig merged commit 01f59bb into master Feb 20, 2018
@cboettig cboettig deleted the patch-performance branch March 3, 2018 05:24