Modelling data with JSON-LD, Turtle, SHACL

guscht/shapiro

 
 

Shapiro

What is Shapiro

Shapiro is a simple ontology/vocabulary server that serves Turtle, JSON-LD, HTML or JSON-SCHEMA, as requested by the client in the accept header. It provides a simple approach to serving up an organization's ontologies/vocabularies.

Motivation - Model as Code

Why would one need something like Shapiro? The basic idea is to model data as knowledge graphs using Turtle or JSON-LD and use these models in API definitions/implementations and all other code consuming data based on these models.

Make the use of these machine-readable model definitions pervasive throughout all phases of the software lifecycle (design, implement, test, release) and the lifecycle of the data originating from software built using these models.

Express non-functional requirements like security, traceability/lineage, data quality in the models and bind them to the instances of data wherever the data is distributed to and used.

Drive all documentation (model diagrams, documents, graph visualizations, etc.) from the same RDF-based model definition (a.k.a. ontology/knowledge graph).

Start out with providing a toolset from developers for developers for formulating such models and using them in source code, gradually extending towards tools, editors, UIs, transformations making this modelling approach accessible to non-technical actors like business analysts, domain data owners, etc.

In order to do so, you need a way to serve the models - this is where Shapiro comes in.

Serving Schemas

Shapiro serves schemas from a directory hierarchy in the file system (specified by the content_dir parameter at startup). Shapiro regularly checks new or modified schemas for syntax errors and excludes such "bad schemas" from being served. Schemas can be moved into Shapiro's content_dir while it is running. This decouples the lifecycle of schemas from the lifecycle of Shapiro - the basic idea being that schemas are managed in some code repository, from which changes get pushed into Shapiro's content directory without Shapiro having to be restarted.

Content Negotiation

Shapiro uses the accept header of the GET request for a schema to determine the mime type of its response, regardless of the format in which the schema is stored on its file system:

Request Accept Header & Response Mime Type    Implementation Status
application/ld+json                           implemented
text/turtle                                   implemented
text/html                                     implemented
application/schema+json                       implemented
application/json                              implemented (will return JSON-SCHEMA)

If no accept header is specified, Shapiro will assume application/schema+json as default, because many JSON-SCHEMA processors/validators do not properly set the accept header when resolving $ref URLs.
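As a minimal sketch of content negotiation from the client side, the following builds a GET request for a schema in a chosen format using only the Python standard library. The function name build_schema_request and the schema name are hypothetical. The header name is a parameter because the curl examples in this README pass the mime type under the name accept-header, while the standard HTTP header is Accept; use whichever your Shapiro instance expects.

```python
from urllib.request import Request

# Mime types listed in the table above.
SUPPORTED_MIME_TYPES = (
    "application/ld+json",
    "text/turtle",
    "text/html",
    "application/schema+json",
    "application/json",
)


def build_schema_request(base_url, schema_name, mime_type=None, header_name="Accept"):
    """Build (but do not send) a GET request for a schema in a given format."""
    if mime_type is not None and mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported mime type: {mime_type}")
    url = f"{base_url.rstrip('/')}/{schema_name.strip('/')}"
    # Without a mime type no header is set, so the server falls back to its default.
    headers = {header_name: mime_type} if mime_type else {}
    return Request(url, headers=headers, method="GET")


# Hypothetical schema name, requested as Turtle.
request = build_schema_request(
    "http://localhost:8000", "org/example/myschemas/person", "text/turtle"
)
```

Passing the resulting request to urllib.request.urlopen would send it to a running Shapiro instance.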

Integration with OpenAPI & JSON-Schema

Shapiro converts SHACL NodeShapes into JSON-Schema and thereby integrates with JSON-Schema validation. Based on this, you can use the semantic data models served by Shapiro in your OpenAPI definitions (by way of $ref). An end-to-end example based on this OpenAPI tutorial can be found in test/openapi/tutorial.yaml, with the corresponding semantic model at test/openapi/tutorial/artist.ttl.

Markdown in RDFS Comments/SKOS Definitions/DCT Descriptions

When rendering for mime type text/html, Shapiro interprets Markdown in RDFS comments, SKOS definitions and DCT descriptions to improve the readability of the documentation.

No URL fragments

Shapiro is opinionated about URL fragments for referring to terms in a schema - it plainly does not support them (here's why). So when writing your schema a.k.a. model a.k.a. vocabulary a.k.a. ontology, please ensure you refer to the individual terms it defines using the regular forward slash: e.g. http://myserver.com/myontology/term instead of http://myserver.com/myontology#term
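A small helper can catch fragment-style term IRIs before a model is published. This is only a sketch using the Python standard library; has_fragment and to_slash_iri are hypothetical names and not part of Shapiro.

```python
from urllib.parse import urlsplit


def has_fragment(term_iri):
    """Return True if the IRI addresses a term via '#', e.g. .../myontology#term."""
    return urlsplit(term_iri).fragment != ""


def to_slash_iri(term_iri):
    """Rewrite '<namespace>#<term>' as '<namespace>/<term>'.

    IRIs without '#' are returned unchanged; a trailing empty '#' is dropped.
    """
    namespace, separator, term = term_iri.partition("#")
    if not separator or not term:
        return namespace
    return f"{namespace.rstrip('/')}/{term}"
```

For example, to_slash_iri("http://myserver.com/myontology#term") yields the slash form recommended above.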

Hierarchical Namespaces

Shapiro allows you to keep schemas/ontologies in arbitrary namespace hierarchies - simply by reflecting namespaces as a directory hierarchy. This allows organizations to separate their schemas/ontologies across a hierarchical namespace and avoid any clashes. This also means you can have a more relaxed governance around the various ontologies/schemas across a collaborating community. The assumption is that you manage your schemas/ontologies in a code repository (GitHub, etc.) and manage releases from there onto a Shapiro instance serving these schemas in a specific environment (dev/test/prod).
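To illustrate how a namespace hierarchy maps onto directories, here is a sketch that turns a schema name into a path below the content directory. The function schema_location and the content directory shown are hypothetical, and the sketch leaves out the file suffix (such as .ttl), since how Shapiro resolves suffixes is not described here.

```python
from pathlib import PurePosixPath


def schema_location(content_dir, schema_name):
    """Map a schema name such as 'org/example/myschemas/person' to a path below content_dir."""
    parts = [part for part in schema_name.split("/") if part]
    # Reject empty names and attempts to step out of the content directory.
    if not parts or ".." in parts:
        raise ValueError(f"invalid schema name: {schema_name!r}")
    return PurePosixPath(content_dir).joinpath(*parts)


# Hypothetical content directory; each namespace level becomes one directory level.
location = schema_location("/var/shapiro/content", "org/example/myschemas/person")
```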

Querying the combined Graph of all Schemas served by Shapiro

Shapiro keeps the complete graph of all schemas combined in memory. The graph can be queried by sending a POST request to /query with a SPARQL query (no updates) in the request body. That way you can query and mine the combined graph of all models.
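A query against the combined graph could be prepared roughly as follows. This sketch only builds the POST request with the Python standard library; the query itself is an arbitrary example, build_query_request is a hypothetical helper, and no content type is set because the one Shapiro expects is not stated here.

```python
from urllib.request import Request

# Example query: list up to ten RDFS classes together with their labels.
SPARQL_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label
WHERE { ?cls a rdfs:Class ; rdfs:label ?label . }
LIMIT 10
"""


def build_query_request(base_url, sparql):
    """Build (but do not send) a POST request carrying a SPARQL query in its body."""
    return Request(
        f"{base_url.rstrip('/')}/query",
        data=sparql.strip().encode("utf-8"),
        method="POST",
    )


query_request = build_query_request("http://localhost:8000", SPARQL_QUERY)
```

Sending query_request with urllib.request.urlopen requires a running Shapiro instance at the given address.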

Validation with Shapiro

Validation is a bit more involved, in particular since Shapiro allows you to enable/disable API features (serving schemas and validating data against schemas). If both serving schemas and validation are activated, you can validate against schemas residing on the same Shapiro instance offering the validation:

http://localhost:8000/validate/org/example/myschemas/person

Posting to this URL (with a request body containing the data to be validated) makes Shapiro fetch the schema named org/example/myschemas/person from localhost:8000 and validate the data against it. Obviously, this will not work if you've switched off the 'serve' feature on localhost:8000.

If you want to validate your data against a schema hosted on a different schema server, you can use:

http://localhost:8000/validate/www.w3.org/ms/shacl/something

This would validate the data provided in the body of the POST request against the schema served at http://www.w3.org/ms/shacl under the name of something.

Assume you want to use one instance of Shapiro to just serve schemas, and another instance of Shapiro to just validate data. Assume the instance serving schemas sits under localhost:8000 and the validating instance sits under localhost:3333. You would run your POST request against the following URL:

http://localhost:3333/validate/localhost:8000/org/example/myschemas/person

This would look up the schema named org/example/myschemas/person on localhost:8000 (the instance that just serves schemas) and validate the data provided in the body of the request against the schema obtained from there.
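The three URL patterns above follow one scheme: the validating instance, then /validate/, then optionally the host serving the schema, then the schema name. A hypothetical helper (validation_url is not part of Shapiro) makes that explicit:

```python
def validation_url(validator_base, schema_name, schema_host=None):
    """Compose the URL to POST data to for validation against a named schema.

    schema_host is the host (and optional port or path) serving the schema;
    leave it as None when the schema lives on the validating instance itself.
    """
    segments = [validator_base.rstrip("/"), "validate"]
    if schema_host:
        segments.append(schema_host.strip("/"))
    segments.append(schema_name.strip("/"))
    return "/".join(segments)


# Schema and validation on the same instance.
same_instance = validation_url("http://localhost:8000", "org/example/myschemas/person")
# Schema served by a separate instance on localhost:8000, validation on localhost:3333.
split_instances = validation_url(
    "http://localhost:3333", "org/example/myschemas/person", "localhost:8000"
)
```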

You don't need to specify an explicit schema to validate data against. If you specify no schema, Shapiro will infer the schemas to validate the data against from the prefix IRIs used in the data graph. The algorithm uses a configurable list of namespaces to ignore when inferring the schemas - this list can be set using the command line parameter --ignore_namespaces and defaults to ['schema.org', 'w3.org', 'example.org']. This means that prefixes pointing to these namespaces are assumed never to contain SHACL constraints to validate a given data graph against.
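The idea behind this inference can be sketched as a filter over the prefixes declared in a data graph. This is an illustration only and not Shapiro's actual implementation: the function name, the prefix dictionary and the rule of matching on the host name (including subdomains) are all assumptions.

```python
from urllib.parse import urlsplit

# Default ignore list as documented for --ignore_namespaces.
DEFAULT_IGNORED = ("schema.org", "w3.org", "example.org")


def candidate_schema_iris(prefixes, ignore_namespaces=DEFAULT_IGNORED):
    """Return the prefix IRIs that may hold SHACL constraints for a data graph.

    prefixes maps prefix labels to namespace IRIs, as declared in the data graph.
    """
    candidates = []
    for iri in prefixes.values():
        host = urlsplit(iri).hostname or ""
        # Assumption: a namespace is ignored if its host equals an ignored
        # entry or is a subdomain of one.
        ignored = any(
            host == namespace or host.endswith("." + namespace)
            for namespace in ignore_namespaces
        )
        if not ignored and iri not in candidates:
            candidates.append(iri)
    return candidates


# Hypothetical prefixes taken from a data graph.
example_prefixes = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "https://schema.org/",
    "person": "http://localhost:8000/org/example/myschemas/person/",
    "ex": "http://example.org/data/",
}
```

With these prefixes, only the namespace on localhost:8000 remains as a candidate to fetch SHACL constraints from.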

Searching Shapiro

Shapiro uses Whoosh Full-text-search to index all schemas it serves. Shapiro regularly checks for modified or new schemas in its content directory and indexes them.

Shapiro UI

Shapiro provides a minimal UI available at /welcome/. Any GET request to / without a schema name to retrieve will also redirect to the UI. The UI lists all schemas served by Shapiro at a given point in time and allows full-text search of schema content. The Shapiro UI also renders models/schemas/ontologies as HTML.

Writing Semantic Models to be served by Shapiro

Given the number of possibilities to use ontologies & vocabularies for your models, Shapiro can't anticipate them all. While I'm trying to keep Shapiro as open as possible and while Shapiro can serve any kind of ontology or vocabulary, things like validation, HTML rendering of models and JSON-SCHEMA rendering of models work best if you keep the following in mind:

  • Use RDFS for modelling your classes and properties. HTML rendering will work best with this vocabulary.
  • Use RDFS labels that are acceptable object names or property names in programming languages (specifically when you use JSON-SCHEMA and OpenAPI in conjunction with schemas hosted by Shapiro).
  • Use SHACL for constraints. Shapiro cannot validate your data against models if you use anything else, nor can it convert such models to JSON-SCHEMA. Of course, Shapiro will also render SHACL constraints in HTML.
  • JSON-SCHEMA conversion requires your model to define NodeShapes with the appropriate SHACL properties and constraints. Shapiro will render empty schemas if you ask for JSON-SCHEMA of an RDFS class.
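For the recommendation on RDFS labels, a quick check can flag labels that would make awkward property names in generated code. The sketch below uses Python's identifier rules as a stand-in; other target languages have different rules, and is_code_friendly_label is a hypothetical helper.

```python
import keyword


def is_code_friendly_label(label):
    """Return True if an RDFS label can double as a Python identifier."""
    # Reject labels with spaces or leading digits as well as reserved words.
    return label.isidentifier() and not keyword.iskeyword(label)
```

A label such as birth_date passes this check, while date of birth or class does not.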

Installing and running Shapiro

  1. Clone the Shapiro repository.
  2. Install dependencies: pip install -r requirements.txt
  3. Run Shapiro Server: python shapiro_server.py with optional command line parameters --host (default 127.0.0.1), --port (default 8000), --content_dir (default ./), --log_level (default info), --features (default all), --ignore_namespaces (default ['schema.org', 'w3.org', 'example.org']) and --index_dir (default ./fts_index/).
  4. Access the UI at http://localhost:8000/welcome/
  5. Access the API docs at http://localhost:8000/docs
  6. Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'accept-header: application/ld+json' to get JSON-LD from a schema in the content dir
  7. Try curl -X 'GET' 'http://localhost:8000/<SCHEMANAME HERE>' -H 'accept-header: text/turtle' to get Turtle from a schema in the content dir.

Make sure you run python shapiro_server.py --help for a full reference of command line parameters (host, port, domain, content dir, log level, default mime type, features, ignore namespaces, index directory and, if needed, SSL parameters).
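If you start Shapiro from a script, the command line can be assembled as in the following sketch. It uses only the parameters and defaults named above and assumes that each option takes its value as the next argument; shapiro_command is a hypothetical helper, so check --help for the exact syntax.

```python
import sys


def shapiro_command(host="127.0.0.1", port=8000, content_dir="./", log_level="info"):
    """Build the argument list for starting shapiro_server.py with explicit options."""
    return [
        sys.executable, "shapiro_server.py",
        "--host", host,
        "--port", str(port),
        "--content_dir", content_dir,
        "--log_level", log_level,
    ]


# Hypothetical content directory holding the schemas to serve.
command = shapiro_command(content_dir="./schemas")
```

The resulting list can be passed to subprocess.Popen from within a checkout of the repository.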
