Skip to content
This repository has been archived by the owner on Nov 3, 2020. It is now read-only.

Latest commit

 

History

History
2024 lines (1727 loc) · 79.3 KB

query.org

File metadata and controls

2024 lines (1727 loc) · 79.3 KB

Contents

Namespace: thi.ng.trio.query

Triplestores not just provide high flexibility in terms of data organization and storage, but also for querying purposes. The query engine implemented in this namespace is conceptually based on W3C SPARQL and also boroughs some of its terminology, however does not entail any RDF specifics or semantics. Datalog (as used by Datomic) too is similar in approach and both are essentially based on describing & matching patterns in a directed graph.

Query engine features

FeatureStatus
Query planningINPROGRESS
Query compilationDONE
Sub-queriesDONE
Multi-graph queriesDONE
Direct query of clj triple seqsDONE
Filtering (per sub-query)DONE
Joins (fail fast)DONE
Optional joinsDONE
Union (match alternatives)DONE
Subtraction (negation)DONE
Async / parallel sub-query executionTODO
Arbitrary-length paths queriesINPROGRESS
Result graph constructionDONE
Result graph extractionINPROGRESS
Result var binding injectionDONE
Result var aliasing/transformationDONE
Result ordering on arbitrary varsDONE
Result groupingDONE
Result aggregate expressionsDONE
Aggregate filtersDONE
Compiler error reportingINPROGRESS

More generally, the overall engine design was guided by:

  • no macro magic, completely functional
  • use of plain hash maps to formulate and dynamically compose query specs
  • easy to read syntax, no cryptic symbols
  • use of vanilla Clojure fns for predicates, binding & aggregate fns

Highlevel query processing

Public query API

Currently, the query engine’s public API consist of only these two functions:

  • compile-query compiles a query spec
  • query executes a query and accepts both pre-compiled queries or raw query specs.

For both functions an exception is thrown if there’re any compile errors. The query compiler itself is described in detail below.

(defn compile-query
  "Pre-compiles a query spec hashmap into a query function to be
  executed with `query`. Throws an exception if compilation fails
  due to illegal args in spec."
  [q]
  (if-let [type (some #{:select :ask :construct :describe} (keys q))]
    (compile-query-step identity q type)
    (err/illegal-arg! "Unsupported query type")))

(defn query
  "Takes a single query spec (raw or compiled) and executes it.
  Returned result format depends on query type. A failed query
  either results in nil or an empty collection."
  [q]
  (let [res ((if (fn? q) q (compile-query q)) ::empty)]
    (if (seq res) res)))

Introductory examples

To introduce some of the query engine features, let’s define a small graph of relationships:

SubjectPredicateObject
alicefriendbob
aliceage34
bobfriendcarl
bobnickbobby
carlspousealice
carlfatherdonald
carlage42
bobfatheremily
emilyage23
freyamotheremily
emilynickem
donaldfriendemily

We can visualize this table with the following Graphviz snippet (utilizing some Elisp helper functions defined in libraryofbabel.org:

../assets/query-example01.png

In Clojure this graph could be defined like this (using a hashmap is just one option (see Clojure collection conversion in trio.core namespace).

(require '[thi.ng.trio.core :as trio])
(require '[thi.ng.trio.query :as q])
(def g
  (trio/as-model
   '{alice  {friend bob, age 34}
     bob    {friend carl, father emily, nick bobby}
     carl   {father donald, spouse alice, age 42}
     donald {friend emily}
     emily  {age 23, nick em}
     freya  {mother emily}}))

Selection

Let’s query this graph to find some information about all parents & children:

  • perform alternative queries to bind ?p to fathers or mothers of any child ?c
  • optionally retrieve child’s age ?a
  • inject result var binding/mapper to format age as descriptive string
  • sort results by child name
(q/query
 {:select :*
  :from g
  :query [{:where '[[?p father ?c]]}
          {:union '[[?p mother ?c]]}
          {:optional '[[?c age ?a]]}]
  :bind {'?a (fn [res a] (if a (str (res '?c) " is " a " years old")))}
  :order-asc '?c})

;; => ({?p carl, ?c donald} {?a "emily is 23 years old", ?p freya, ?c emily} {?a "emily is 23 years old", ?p bob, ?c emily})

Graph construction / inference

We know Carl’s spouse is Alice, but we don’t have explicity stated her as Donald’s mother and therefore she didn’t appear in the results. However, we could create a :construct query that produces a number of inferred facts based on the following rules (of course these wouldn’t stand up in the real world, but let’s ignore that point on this occasion):

  1. for every ?f father of ?c and where ?m spouse of ?f, we declare ?m mother of ?c
  2. for every ?m mother of ?c and where ?f spouse of ?m, we declare ?f father of ?c
  3. for every ?f father of ?c and ?m mother of ?c, we declare ?f spouse ?m (and vice versa)
  4. declare a new parent relationship for each father/mother
  5. assign gender information for each parent
  6. declare a new child-of relationship for each child

In order to create these new facts we can use a :construct type query and we will also compile it first:

(def infer-parents
  (q/compile-query
   {:from g
    :construct '[[?f father ?c] [?m mother ?c]       ;; rule 1 + 2
                 [?f spouse ?m] [?m spouse ?f]       ;; rule 3
                 [?f parent ?c] [?m parent ?c]       ;; rule 4
                 [?f gender male] [?m gender female] ;; rule 5
                 [?c child-of ?f] [?c child-of ?m]]  ;; rule 6
    :query [{:where '[[?f father ?c] [?f spouse ?m]]}
            {:union '[[?m mother ?c] [?m spouse ?f]]}
            {:union '[[?f father ?c] [?m mother ?c]]}]}))
(q/query infer-parents)

;; #{[freya spouse bob] [emily child-of bob] [carl parent donald] [bob gender male]
;;   [alice parent donald] [carl father donald] [freya mother emily] [donald child-of alice]
;;   [alice gender female] [alice mother donald] [bob spouse freya] [emily child-of freya]
;;   [freya gender female] [bob parent emily] [carl spouse alice] [donald child-of carl]
;;   [bob father emily] [alice spouse carl] [carl gender male] [freya parent emily]}

Now we can add these inferred triples to our graph and test it.

Note: Graphs have set-like semantics and only store unique triples. So even though our :construct query produces some triples which are already in the graph, no duplicates will be added.

;; build new graph with inferred triples
(def g2 (trio/add-triples g (q/query infer-parents)))
;; select all parents from new graph and group by child
(q/query {:select :* :from g2 :query [{:where '[[?p parent ?c]]}] :group '?c})

;; => {emily [{?p bob, ?c emily} {?p freya, ?c emily}], donald [{?p carl, ?c donald} {?p alice, ?c donald}]}

The new graph with inferred triples marked in red:

../assets/query-example02.png

Subgraph extraction

As a final introductory example, let’s demonstrate how to extract anything we know about subjects matched via an initial query/search, in this case people with nicknames and who are younger than 40. A :describe query returns all triples in which a bound result var is either a subject or object.

(q/query {:describe '?p :from g2 :query [{:where '[[?p nick ?n] [?p age ?a]] :filter {'?a #(< % 40)}}]})

;; => #{[emily child-of bob] [emily child-of freya] [emily nick em] [emily age 23]
;;      [donald friend emily] [freya mother emily] [freya parent emily] [bob parent emily] [bob father emily]}

The extracted sub-graph defined by the result triples (edges in red are the matched query terms):

../assets/query-example03.png

Query compiler

The role of the query compiler is to validate, analyze and translate query specs (plain hashmaps) into actual working code, as well as to reduce pre-processing and optimize code paths for repetitive queries.

As shown in the above examples, the query engine supports several query types, all of which are described in further detail below. Common to all query types is the concept of using one or more subqueries (each of which describes one or more graph patterns with variables to be matched). Each sub-query also defines a set operation how its individual result set is to be merged with already existing results to produce the final solution(s).

Query compilation is realized via the compile-query-step multimethod in order to cleanly separate logic for each query element and to keep the code legible & steps composable.

(defmulti compile-query-step
  (fn [qfn q type] type))

Essentially, each compile step is using functional composition to wrap its predecessor and combine that result with its own. To allow fail-fast behaviors for some phases, the namespaced ::empty keyword is used internally as sentinel to distinguish the initially empty result set from an earlier failed search (where results are nil or an empty seq).

Before discussing the compilation steps in detail, we first define and take a look at the various top-level query types and their options:

Main query types

Query type: SELECT

Purpose: Matches patterns in graph, returns sequence of all found solutions. Each solution is a map of unique qvar bindings.

Use cases:

  • General graph queries / pattern matching
  • Result transformations
Query spec keys
KeyValue(s)DescriptionRequired
:select:*, qvar(s), mapDefines which qvars appear in solutionY
(See Result var projection)
:fromtrio.core/PModelSelect instanceDefault triplestore to use for all sub-queriesY/N
(only mandatory if any sub-query doesn’t provide its own :from)
:query[{sub-query} …]Seq of any number of sub-queries (see Sub-queries)Y
:asyncbooleanIf true, execute sub-queries asynchronously in parallelN
Note: Not yet implemented
:bindfn or bindings mapInject/transform qvars in final solutionsN
(see Query options)
:values’{?v1 #{a b} ?v2 c ..}Pre-bind qvars to given values for all subqueriesN
in order to constrain search.
If a value set is specified for a qvar, attempts all options given
Note: The same effect can be achieved with filters,
though pre-binding will produce optimized queries
:distinctbooleanIf true, results only include distinct bindings
:aggregate{?v fn …}Compute aggregate values and inject into solutionN
(See Result aggregation & grouping)
:filter(fn [res] ..) or {?v fn..}Filter final solutionsN
:order-asc‘?varSort solutions by qvar (ascending)N
’[?v1 ?v2 ..]Nested sort by first ?v1, ?v2 …
:order-descsame as :order-ascSort solutions by qvar (descending)N
:group‘?varGroup solutions by qvarN
(fn [res] group-key)Group solutions by return value of fn
Note: returns solutions as map (not as seq)
Order of processing steps

The order of processing steps of a :select query is as follows:

  1. Compile query spec
  2. Produce solution seq via iterative sub-query processing
  3. Inject qvars given via :bind option in all results
  4. Sort solution seq via :order-asc or :order-desc options
  5. Project variables
    1. Perform grouping, if enabled
    2. Compute & inject aggregations (if grouped, apply aggregation fns for each key’s solution set)
    3. Final result filtering
Query spec validation

The query compiler uses thi.ng/validate to verify all query specs and provide useful error messages. The follow spec serves as formal (but non-exhaustive) definition of a valid top-level :select query. Further validation occurs for certain query elements (e.g. during sub-query dispatch, see below).

(def validator-select
  {:select    (v/alts
               [(v/symbol) (v/keyword) (v/sequential)]
               "must be a qvar, keyword or seq")
   #?@(:clj
        [:from (v/optional
                (v/alts
                 [(v/instance clojure.lang.IDeref)
                  (v/satisfies api/PModelSelect) (v/satisfies api/PModelConvert)]
                 "must be an instance of IDeref or satisfy the PModelSelect or PModelConvert protocols"))])
   :query     {:* (v/map)}
   :bind      (v/optional
               (v/alts
                [(v/map) (v/function)]
                "must be a map or fn"))
   :group     (v/optional
               (v/alts
                [(v/symbol) (v/sequential) (v/function)]
                "must be a qvar, qvar seq or fn"))
   :aggregate (v/optional (v/map))
   :filter    (v/optional
               (v/alts
                [(v/map) (v/function)]
                "must be a map or fn"))})
Implementation
(defmethod compile-query-step :select
  [qfn {:keys [select bind order-asc order-desc distinct] :as q} _]
  (debug :select select)
  (validate {:from (collect-source-graphs q)} {:from (v/required)})
  (validate q validator-select)
  (cond->
   (compile-query-step qfn q :query-dispatch)
   bind       (compile-query-step bind :bind)
   order-asc  (compile-query-step order-asc :order-asc)
   order-desc (compile-query-step order-desc :order-desc)
   true       (compile-query-step q :project-vars)
   distinct   (compile-query-step q :distinct-bindings)))

Query type: CONSTRUCT

Purpose: Matches patterns in graph(s) and binds values to qvars, then constructs new triples using bound qvars solutions. Returns set of constructed triples.

Use cases:

  • Rule-based inference
  • Triple mapping/updating
  • Reification
  • Annotation
  • Adding of provenance data
  • Graph transformation (in combination w/ :describe)
Query spec keys
KeyValue(s)DescriptionRequired
:construct[pattern …]solutions contain all used qvarsyes
:fromtrio.core/PModelSelect instancedefault triplestore to use for all sub-queriesyes
:query[{sub-query} …]seq of any number of sub-queries (see Sub-queries)yes
:bindfn or bindings mapinject/transform qvars in final solutionsno
(see Query options)

Any other options supported by :select will be removed from the query spec.

Implementation
(defn construct-triples
  [patterns res]
  (->> res
       (mapcat
        (fn [r]
          (map
           (fn [[s p o]]
             (let [s (if (qvar? s) (r s) s)
                   p (if (qvar? p) (r p) p)
                   o (if (qvar? o) (r o) o)]
               (if (and s p o) (api/triple s p o))))
           patterns)))
       (filter identity)
       (set)))

(defmethod compile-query-step :construct-triples
  [qfn {:keys [construct] :as q} _]
  (fn [res] (->> res qfn (construct-triples construct))))

(defmethod compile-query-step :construct
  [qfn q _]
  (let [q (-> q
              (dissoc :order-asc :order-desc :group)
              (assoc :select :*))]
    (-> qfn
        (compile-query-step q :select)
        (compile-query-step q :construct-triples))))

Query type: ASK

(defmethod compile-query-step :ask
  [qfn q _]
  (let [q (-> q
              (dissoc :order-asc :order-desc :bind :group :aggregate)
              (assoc :select :*))
        qfn (compile-query-step qfn q :select)]
    (fn [res] (if (seq (qfn res)) true false))))

Query type: DESCRIBE

Purpose: Extracts sub-graph for nodes matched via query. Returns sub-graph as set of triples. If used with multiple graphs (defined via :from in sub-queries), :describe queries will return a combined result set from all graphs.

Use cases:

  • REST APIs
  • Sub-graph extraction
  • Selective graph merging
  • Graph partitioning/segmentation
  • Complete removal of resources from graph(s)

Options:

KeyValue(s)DescriptionRequired
:describe‘?varselect sub-graphs of nodes in the value set of single qvaryes
’[?v1 ?v2 …]select sub-graphs of given qvar value sets
:fromtrio.core/PModelSelect instancedefault triplestore to use for all sub-queriesyes
:query[{sub-query} …]seq of any number of sub-queries (see Sub queries)yes
:bindfn or bindings mapinject/transform qvars in final solutionsno
(see Query options)

Any other options supported by :select will be removed from the query spec.

Implementation
(defmethod compile-query-step :describe
  [qfn {:keys [describe] :as q} _]
  (let [q        (-> q
                     (dissoc :order-asc :order-desc :group :aggregate)
                     (assoc :select :*))
        qfn      (compile-query-step qfn q :select)
        describe (if (sequential? describe) describe [describe])
        graphs   (collect-source-graphs q)]
    (fn [res]
      (let [vars (select-keys (accumulate-result-vars (qfn res)) describe)]
        (if (seq vars)
          (reduce-kv
           (fn [acc _ vals]
             (reduce
              (fn [acc from]
                (-> acc
                    (into (mapcat #(api/select from % nil nil) vals))
                    (into (mapcat #(api/select from nil nil %) vals))))
              acc graphs))
           #{} vars))))))

Sub-queries

Every main query spec must contain a :query key which is a vector or seq of subqueries and which themselves are maps. The engine currently supports the following sub-query types:

TypeKeyOptionsDescription
join:where:from :filter :bind :valuesjoin/intersection of given patterns (fail fast)
join optional:optional:from :filter :bind :valuesnatural join of given patterns
union:union:from :filter :bind :valuesmerge results with existing
negation:minus:from :filter :bind :valuesremove results from existing

Sub-query ordering

IMPORTANT: Each sub-query must specify one or more search patterns which are always joined in order and are fail fast (i.e. if one pattern fails the entire sub-query fails). The specified set operation (e.g. optional join) is only applied to merge a sub-query’s result set with the already computed solution. That also means, any query type other than a :where is potentially meaningless as first (or only) sub-query: For example, a :minus sub-query stated as first sub-query would attempt to subtract its results from a still empty result set and therefore constitute a no-op…

Options

Each sub-query can define its own filters & variable injections and other options. These are passed via the options keys listed in the above table.

The :filter & :bind options both assume a function which is applied to each unique solution map. However, for most use cases it is more convenient to only specify these functions for individual query vars and the following helpers allow us to do so.

Example

For example this sub-query asks for artists under 25 and injects a new qvar ?fullname into each result:

{:where '[[?s :type :artist] [?s :age ?a] [?s :fname ?fn] [?s :surname ?sn]]
 :filter {'?a #(< % 25)}
 :bind   {'?fullname (fn [res _] (str (res '?fn) " " (res '?sn)))}}
:filter

If specified as map with qvars as keys, :filter functions must accept a single arg, i.e. the value of the qvar binding to check. Any truthy return value will keep the result in the set of solutions.

(defn compile-filter-fn
  "Takes a map of qvar bindings as keys and filter fns as values.
  Returns fn which accepts a single result map and returns true if all
  given filter fns succeed. Filter fns must accept a single arg, i.e.
  the qvar's value in the result map. Filter fns are only called if
  the qvar is bound. If a filter is defined for a qvar which does not
  ever appear in a solution, the related query will produce no
  results."
  [filters]
  (if filters
    (if (fn? filters)
      filters
      (fn [res]
        (every?
         (fn [[k f]] (let [rv (res k)] (if-not (nil? rv) (f rv) false)))
         filters)))))
:bind

On the other hand, :bind fns must accept two args:

  1. the full result map and
  2. the related qvar’s value.

The return value will be bound to the specified qvar in each result/solution, unless the function returns nil.

Important: For subqueries, qvar injection via :bind is only applied post-filtering.

(defn compile-bind-fn
  "Takes a map of qvar bindings as keys and bind fns as values.
  Returns a fn which accepts a single result map and returns updated
  map with injected/updated qvars. Bind fns must accept two args: the
  full result map and the value of the fn's associated qvars. The fn's
  return value is used as new binding for the qvar unless nil. Bind
  fns are called regardless if qvar is currently bound (non-nil). This
  is in order to allow for the injection of new vars into the
  solution."
  [bindings]
  (if bindings
    (if (fn? bindings)
      bindings
      (fn [res]
        (reduce-kv
         (fn [acc k f]
           (let [rv (f res (res k))]
             (if (nil? rv) acc (assoc acc k rv))))
         res bindings)))))
:from (Multi-graph queries)

In order to support multi-graph queries, each sub-query can specify its own graph to act on using the :from option. If not specified, the sub-query inherits the value stated in the main spec.

;; Multigraph example, define two graphs:
;; world-graph contains stats of countries & cities
;; uk-graph contains UK specifics

(def world-graph
  (trio/as-model
   '{usa        {type country}
     brazil     {type country}
     uk         {type country}
     nyc        {type city, population 8.4e6, located-in usa}
     sao-paulo  {type city, population 20.3e6, located-in brazil}
     london     {type city, population 9.8e6, located-in uk}
     manchester {type city, population 2.55e6, located-in uk}
     cambridge  {type city, population 1.24e5, located-in uk}}))

(def uk-graph
  (trio/as-model
   '[[LDN name london]
     [MAN name manchester]
     [CAM name cambridge]
     [[CAM LDN MAN] type city]
     [[brixton camden westminster] part-of LDN]
     [[ardwick bradford hulme] part-of MAN]
     [[abbey castle market] part-of CAM]
     [[brixton camden westminster] type borough]
     [[abbey ardwick bradford castle hulme market] type ward]]))

Visualization of the two graphs combined, with the UK graph shown in red:

../assets/multigraph.png

;; find all cities w/ population > 1 mil
;; optionally find city regions for any city in UK graph
;; return sorted by ?country -> ?city -> ?region
(q/query
  {:select '[?country ?city ?region]
   :query [{:where '[[?country type country]
                     [?city located-in ?country]
                     [?city population ?pop]]
            :filter {'?pop #(> % 1e6)}
            :from world-graph}
           {:optional '[[?c name ?city] [?region part-of ?c]]
            :from uk-graph}]
   :order-asc '[?country ?city ?region]})

;; => ({?country brazil, ?city sao-paulo} {?country uk, ?city london, ?region brixton}
;;     {?country uk, ?city london, ?region camden} {?country uk, ?city london, ?region westminster}
;;     {?country uk, ?city manchester, ?region ardwick} {?country uk, ?city manchester, ?region bradford}
;;     {?country uk, ?city manchester, ?region hulme} {?country usa, ?city nyc})
:values (Pre-bound values / search restrictions)

As discussed above, specifying a :filter allows us to constrain the solution space of a query, however a filter is always applied after each sub-query and so might have to deal with a large amount of extraneous results. Whereas a filter can specify arbitrary criteria, for some use cases we’re only interested in certain qvars taking a number of possible constants/values and it would be more efficient to pre-constrain the search space. This is achieved via the :values option:

;; Example graph of randomly generated log messages
(def g
  (->> #(vector
         (rand-nth [:server :queue :scheduler :render :db :api])
         (rand-nth [:debug :info :warning])
         (rand-nth ["m1" "m2" "m3" "m4" "m5" "m6" "m7" "m8" "m9"]))
       (repeatedly 100)
       (trio/as-model)))

;; using :filter - select all messages for :server or :db process
;; note: the pattern [?p ?log ?msg] matches *all* triples in the graph
(q/query
 {:select :*
  :from g
  :query [{:where '[[?p ?log ?msg]] :filter {'?p #{:server :db}}}]})

;; => ({?p :server, ?log :debug, ?msg "m7"} {?p :db, ?log :warn, ?msg "m9"} ...)

;; using :values - constrain ?p to either :server or :db *prior* to querying
;; produces same result, but with reduced effort
(q/query
 {:select :*
  :from g
  :query [{:where '[[?p ?log ?msg]] :values {'?p #{:server :db}}}]})

Important: If :values are also defined for the top-level query, then they’re merged with the :values bindings of the sub-query (using merge-with and combining both value sets).

Update query options map
(defn query-opts
  "Takes a query options map and injects compiled versions for
  :filter & :bind keys."
  [opts]
  (-> opts
      (update-in [:filter] compile-filter-fn)
      (update-in [:bind] compile-bind-fn)))

Implementations

(defn- maybe-deref
  [x] (try @x (catch #?(:clj Exception :cljs js/Error) e x)))

(defn- unpack-model
  [model]
  (let [model (maybe-deref model)]
    (if (satisfies? api/PModelSelect model)
      model
      (if (satisfies? api/PModelConvert model)
        (api/as-model model)
        (err/illegal-arg! "Referenced data model doesn't implement PModelSelect")))))

(defmethod compile-query-step :join
  [qfn {:keys [from values where] :as q} _]
  (debug :join where values)
  (let [opts (query-opts q)]
    (fn [res]
      (let [a (qfn res)]
        (if (or (= ::empty a) (seq a))
          (let [b (select-join (unpack-model from) values opts where)
                res' (if (= ::empty a) b (join a b))]
            res')
          a)))))

(defmethod compile-query-step :minus
  [qfn {:keys [from values minus] :as q} _]
  (debug :minus minus values)
  (let [opts (query-opts q)]
    (fn [res]
      (let [a (qfn res)]
        (if (seq a)
          (thi.ng.trio.query/minus a (select-join (unpack-model from) values opts minus))
          a)))))

(defmethod compile-query-step :optional
  [qfn {:keys [from values optional] :as q} _]
  (debug :join-opt optional values)
  (let [opts (query-opts q)]
    (fn [res]
      (let [a (qfn res)]
        (if (or (= ::empty a) (seq a))
          (let [b (select-join (unpack-model from) values opts optional)
                res' (if (= ::empty a) b (join-optional a b))]
            res')
          a)))))

(defmethod compile-query-step :union
  [qfn {:keys [from values union] :as q} _]
  (debug :union union values)
  (let [opts (query-opts q)]
    (fn [res]
      (let [a (qfn res)
            b (select-join (unpack-model from) values opts union)
            res' (if (= ::empty a) b (thi.ng.trio.query/union a b))]
        res'))))

Dispatch

(defmethod compile-query-step :query-dispatch
  [qfn {:keys [from values query]} _]
  (debug :query-dispatch query)
  (loop [qfn qfn, query query, first? true]
    (if query
      (let [q (first query)
            from (or (:from q) from)
            from' (maybe-deref from)
            q (if (or (satisfies? api/PModelSelect from')
                      (satisfies? api/PModelConvert from'))
                (assoc q :from from))
            q (if values
                (assoc q :values (merge-with (fnil into #{}) values (:values q)))
                (update-in q [:values] (fnil identity {})))
            _ (when (and first? (seq (select-keys q [:optional :minus :union])))
                (err/illegal-arg! "First sub-query type must be of type :where"))
            qfn (cond
                 (:where q)    (compile-query-step qfn q :join)
                 (:optional q) (compile-query-step qfn q :optional)
                 (:minus q)    (compile-query-step qfn q :minus)
                 (:union q)    (compile-query-step qfn q :union))]
        (recur qfn (next query) false))
      qfn)))

Result processing & modifiers

(defmethod compile-query-step :bind
  [qfn bind _]
  (debug :bindings bind)
  (let [bind (compile-bind-fn bind)]
    (fn [res] (map bind (qfn res)))))

(defmethod compile-query-step :order-asc
  [qfn order _]
  (debug :order-asc order)
  (fn [res] (order-asc order (qfn res))))

(defmethod compile-query-step :order-desc
  [qfn order _]
  (debug :order-desc order)
  (fn [res] (order-desc order (qfn res))))

(defmethod compile-query-step :distinct-bindings
  [qfn spec _]
  (if (:group spec)
    (fn [res]
      (reduce-kv (fn [acc k v] (assoc acc k (into #{} v))) {} (qfn res)))
    (fn [res]
      (into #{} (qfn res)))))

Result var projection & final filtering

All top-level query types are based on an initial :select query and result variable projection & filtering is the final transformation step of a :select. During this phase the engine does the following (in this order):

  1. result seq is transformed into a map when :group is enabled (optional)
  2. aggregation functions are run (optional)
  3. results are passed through top-level filter (optional)
  4. any qvars not requested by the user (but which are present in the solutions) are removed
  5. any qvars with aliasing or transformation fns are renamed/processed

Important: This processing order implies that any top-level filters can only be applied to “normal” or aggregated qvars (i.e. not their renamed/transformed versions).

Result var aliasing & transformation

For :select queries, the engine supports the implicit renaming of qvars in the set of solutions, as well as the option to transform a qvar’s value during the final projection.

:select valueDescription
:*Keep all bound qvars in result maps
‘?vResult maps only contain given qvar
’[?v1 ?v2 …]Result maps only contain given qvars
{‘?v (fn [res] (* 2 (res ‘?v)))}Result maps only contain qvars used as keys in the map
and their values as transformed by fn
Note: The fn receives the full result map
{‘?alias ?orig}Result maps only contain ?alias but values from ?orig
{‘?alias {:use ‘?v :fn (fn [v] …)}}Result maps only contain qvars used as keys in the map
but take (optionally transformed) value of qvar given for :use
Note: the :fn key is optional
[?v1 {?v2 #(…) ?v3 {:use … :fn …}}]Combination of all above options is possible too

As shown in the table, in order to rename and/or transform a qvar binding during projection we need to use a map in one of these formats:

;; solution contains ?a and ?b-new (as alias for ?b)
:select ['?a {'?b-new '?b}]

;; solution contains ?a and ?b-new (as alias for ?b and transformed ?b values)
:select ['?a {'?b-new {:use '?b :fn #(* 100.0 %)}}]

;; solution contains ?a and ?b (transformation fn receives full result map)
:select ['?a {'?b (fn [res] (* (res '?b) (inc (res '?tax))))}]

;; same as above, but only ?b (don't need to wrap map in vector)
:select {'?b (fn [res] (* (res '?b) (inc (res '?tax))))}
Example

The following example graph maps authors to their books & prices:

../assets/bookauthors.png

;; data store of authors, books & prices
(def ds
  (trio/as-model
   [[:a1 :author [:b1 :b2]]
    [:a2 :author [:b1 :b3]]
    [:a3 :author :b4]
    [:b1 :price 10]
    [[:b2 :b4] :price 20]
    [:b3 :price 5]]))

Let’s query this graph to return all books with price < 20 and calculate their retail price incl. taxes:

(q/query
 {:select ['?b {'?net '?p, '?retail {:use '?p :fn #(* 1.2 %)}}]
  :from ds
  :query '[{:where [[?a :author ?b] [?b :price ?p]]}]
  :filter {'?p #(< % 20)}})

;; => ({?b :b3, ?net 5, ?retail 6.0}
;;     {?b :b1, ?net 10, ?retail 12.0}
;;     {?b :b1, ?net 10, ?retail 12.0})
Implementation
(defn renamed-vars
  [binding-map]
  (reduce-kv
   (fn [vars k v]
     (if (and (map? v) (:as v))
       (assoc vars (:as v) v)
       (assoc vars k v)))
   {} binding-map))

(defn selected-var-keys
  [qvars]
  (reduce
   (fn [vars v]
     (if (map? v)
       (merge vars v)
       (assoc vars v v)))
   {} qvars))

(defn parse-var-bind-spec
  [v spec] (if (map? spec) [(or (:use spec) v) (:fn spec)] [nil spec]))

(defn compute-select-bindings
  [res svars]
  (reduce-kv
   (fn [acc v spec]
     (let [[k f] (parse-var-bind-spec v spec)
           v' (if k
                (if (fn? f) (f (res k)) (res k))
                (f res))]
       (if v'
         (assoc acc v v')
         acc)))
   {} svars))

(defn validate-selected-vars
  [svars]
  (if (svars :*)
    (if (> (count svars) 1)
      (err/illegal-arg! (str "can't use :* selector with other vars: " (vals svars)))
      (dissoc svars :*))
    svars))

(defmethod compile-query-step :project-vars
  [qfn {:keys [select group aggregate filter] :as q} _]
  (debug :project-vars select)
  (let [select     (if (sequential? select) select [select])
        svars      (selected-var-keys select)
        catch-all? (svars :*)
        svars      (validate-selected-vars svars)
        agg-only?  (every? (set (keys aggregate)) (keys svars))
        filter     (compile-filter-fn filter)]
    (if group
      (compile-query-step qfn [group aggregate svars filter agg-only? catch-all?] :project-groups)
      (if (seq aggregate)
        (if (and agg-only? (not catch-all?))
          (compile-query-step qfn [aggregate svars filter] :project-vars-aggregates)
          (compile-query-step qfn [aggregate svars catch-all? filter] :project-vars-mixed))
        (if catch-all?
          (compile-query-step qfn filter :project-vars-all)
          (compile-query-step qfn [svars filter] :project-vars-simple))))))

Result aggregation & grouping

The query engine supports the computation of aggregate values (reductions) for both grouped and ungrouped result sets. Aggregated qvars are defined via the top-level :aggregate key which is a map with qvars as keys and their reduction functions as values.

There are differences in how such aggregated qvars appear in the solution, based on if grouping is enabled or not. If no group criteria are given, aggregate qvars and their values are injected into every single result map of the solution seq. This table gives a better overview of the expected outcomes when aggregation is used:

GroupingAggregated vars selectedNon-aggregated vars selectedSolution format
YYYmap keyed by group, each group a seq of result maps w/ agg. qvars injected in each
YYNmap keyed by group, each key only single map of agg. qvars
YNYmap keyed by group, each group a seq of result maps, no agg. qvars (pointless)
NYYseq of result maps, each w/ selected qvars and agg. qvars injected in each
NYNmap w/ only agg. qvars
NNYseq of result maps, each w/ selected qvars

Note: Aggregated qvars will not automatically appear in the solution unless stated in the list of :select qvars. They can be further aliased & transformed during projection just as “normal” qvars…

Whereas aggregation is supported for all query types, result grouping is only supported for :select type queries and causes the solutions to be returned as map instead of a flat sequence. The returned map is keyed by the result value of the stated qvars or function given for the :group option in the query spec.

Multi-value clustering can be achieved by specifying several qvars as sequence. In this case, the keys of the solution map are unique tuples of the given qvar values. If a function is used to compute the grouping criteria, that function will be called for each individual result map and the return value is used to define grouping.

Example

Let’s use once more the above example graph of book authors. We will now query this graph using grouping and compute some aggregate stats for each author. (Here we use some predefined aggregation functions, see section further below)

;; Aggregate query example

;; retrieve book prices, group by author
;; compute avg. & max price, group size per author
;; filter by avg.
(q/query
 {:select '[?avg ?max ?num]
  :from ds
  :query '[{:where [[?a :author ?b] [?b :price ?p]]}]
  :group '?a
  :aggregate {'?avg {:use '?p :fn q/mean}
              '?max {:use '?p :fn q/max}
              '?num count}
  :filter {'?avg #(> % 10)}})

;; => {:a3 {?avg 20.0, ?max 20, ?num 1}, :a1 {?avg 15.0, ?max 20, ?num 2}}
;; produce map of authors and their collected books
(q/query
 {:select :books
  :from ds
  :query '[{:where [[?a :author ?b]]}]
  :group '?a
  :aggregate {:books {:use '?b :fn #(into #{} %)}}})

;; => {:a1 {:books #{:b2 :b1}}, :a2 {:books #{:b1 :b3}}, :a3 {:books #{:b4}}}

Implementation

The following functions and compiler steps are used to produce optimized code paths based on the analysis of qvars selected for projection, both for grouped & ungrouped results.

(defn compile-group-fn
  [group]
  (if (fn? group)
    group
    (if (sequential? group)
      (fn [res] (reduce #(conj % (res %2)) [] group))
      (fn [res] (res group)))))

(defn compute-aggregates
  [vars vals]
  (reduce-kv
   (fn [inner v spec]
     (let [[k f] (parse-var-bind-spec v spec)]
       (if k
         (assoc inner v (f (map #(get % k) vals)))
         (assoc inner v (f vals)))))
   {} vars))

(defn project-aggregates-only
  [avars svars filter]
  (fn [vals]
    (let [agg (compute-aggregates avars vals)]
      (if (or (nil? filter) (filter agg))
        (compute-select-bindings agg svars)))))

(defn project-aggregates-mixed
  [avars svars all? flt]
  (fn [vals]
    (let [agg  (compute-aggregates avars vals)
          vals (map #(merge % agg) vals)
          vals (if flt (filter flt vals) vals)]
      (if (seq vals)
        (if all?
          vals
          (map #(compute-select-bindings % svars) vals))))))

(defn project-aggregates-simple
  [vars flt]
  (fn [vals]
    (let [vals (if flt (filter flt vals) vals)]
      (if (seq vals)
        (map #(compute-select-bindings % vars) vals)))))

(defn project-aggregates-all
  [flt]
  (fn [vals]
    (let [vals (if flt (filter flt vals) vals)]
      (if (seq vals) vals))))

(defmethod compile-query-step :project-vars-aggregates
  [qfn [avars svars filter] _]
  (debug :aggregates-only avars filter)
  (let [project-fn (project-aggregates-only avars svars filter)]
    (fn [res] (->> (qfn res) (project-fn)))))

(defmethod compile-query-step :project-vars-mixed
  [qfn [avars svars all? flt] _]
  (debug :mixed-aggregates avars svars all? flt)
  (let [project-fn (project-aggregates-mixed avars svars all? flt)]
    (fn [res] (->> (qfn res) (project-fn)))))

(defmethod compile-query-step :project-vars-simple
  [qfn [vars flt] _]
  (debug :simple-aggregates vars flt)
  (let [project-fn (project-aggregates-simple vars flt)]
    (fn [res] (->> (qfn res) (project-fn)))))

(defmethod compile-query-step :project-vars-all
  [qfn flt _]
  (debug :all-aggregates flt)
  (let [project-fn (project-aggregates-all flt)]
    (fn [res] (->> (qfn res) (project-fn)))))

(defn project-groups-with
  [project-fn res]
  (reduce-kv
   (fn [acc k vals]
     (if-let [v (project-fn vals)]
       (assoc acc k v)
       acc))
   {} res))

(defmethod compile-query-step :group-aggregates-only
  [qfn [avars svars filter] _]
  (debug :aggregates-only avars filter)
  (let [project-fn (project-aggregates-only avars svars filter)]
    (fn [res] (->> (qfn res) (project-groups-with project-fn)))))

(defmethod compile-query-step :group-aggregates-mixed
  [qfn [avars svars all? flt] _]
  (debug :mixed-aggregates avars svars all? flt)
  (let [project-fn (project-aggregates-mixed avars svars all? flt)]
    (fn [res] (->> (qfn res) (project-groups-with project-fn)))))

(defmethod compile-query-step :group-aggregates-simple
  [qfn [vars flt] _]
  (debug :simple-aggregates vars flt)
  (let [project-fn (project-aggregates-simple vars flt)]
    (fn [res] (->> (qfn res) (project-groups-with project-fn)))))

(defmethod compile-query-step :group-aggregates-all
  [qfn flt _]
  (debug :all-aggregates flt)
  (let [project-fn (project-aggregates-all flt)]
    (fn [res] (->> (qfn res) (project-groups-with project-fn)))))

(defmethod compile-query-step :project-groups
  [qfn [group aggregate svars filter agg-only? catch-all?] _]
  (debug :project-groups group aggregate)
  (let [group (compile-group-fn group)
        qfn   (fn [res] (group-by group (qfn res)))]
    (if (seq aggregate)
      (if (and agg-only? (not catch-all?))
        (compile-query-step qfn [aggregate svars filter] :group-aggregates-only)
        (compile-query-step qfn [aggregate svars catch-all? filter] :group-aggregates-mixed))
      (if catch-all?
        (compile-query-step qfn filter :group-aggregates-all)
        (compile-query-step qfn [svars filter] :group-aggregates-simple)))))

Aggregation & grouping operations

Statistics
(defn mean
  "Aggregation function, computes mean of given values (as double)."
  [vals] (double (/ (reduce + vals) (count vals))))

(defn median
  "Aggregation function, computes median of given values."
  [vals]
  (nth (sort vals) (/ (count vals) 2)))

(defn min
  "Aggregation function, computes minimum of given values."
  [vals] (reduce clojure.core/min vals))

(defn max
  "Aggregation function, computes maximum of given values."
  [vals] (reduce clojure.core/max vals))

(defn percentile
  "Aggregation HOF, takes percentile value (0-100), returns aggration
  function accepting a seq of query var results and computes requested
  percentile."
  [n]
  {:pre [(integer? n) (m/in-range? 0 100 n)]}
  (fn [vals]
    (m/percentile n (sort vals))))

(defn quartile
  "Aggregation HOF, takes quartile value (1-4), returns aggration
  function accepting a seq of query var results and computes requested
  quartile subset."
  [n]
  {:pre [(integer? n) (m/in-range? 1 4 n)]}
  (fn [vals]
    (m/quartile n (sort vals))))
Binning
(defn bins-of
  "Grouping HOF, accepts a bin size and query var, returns grouping
  function binning values of qvar to steps of bin size."
  [n qvar]
  {:pre [(number? n) (pos? n) (qvar? qvar)]}
  (fn [res] (* (m/floor (/ (res qvar) n)) n)))

Query validation

We have seen how various phases of the query compiler define their own query spec validators. The following helper functions are used to apply them. If validation fails, query compilation halts and an error/exception is thrown with the error message containing a stringified map of explanations/reasons for each failed key.

;; failed validation example
(q/query {:select :* :query [{:where '[[?a author ?b]]}]})

;; => IllegalArgumentException error compiling query: {:from ("is required")}
(defn validate
  [q spec]
  (try
    (let [[ret err] (v/validate q spec)]
      (if-not err
        ret
        (err/throw! (str "Error compiling query: " err))))
    (catch #?(:clj Exception :cljs js/Error) e
      (err/throw! #?(:clj (.getMessage e) :cljs e)))))

Validation helpers

  

Lowlevel query functions

The functions in this section do the actual heavy-lifting and together with the select method of a triplestore’s PModelSelect protocol implementation provide the pattern matching functionality the query engine relies on.

Helpers

Query variables

(def ^:dynamic *auto-qvar-prefix* "?__q")

(def qvar?
  "Returns true, if x is a qvar (a symbol prefixed with '?')"
  (fn [x] (and (symbol? x) (= \? (.charAt ^String (name x) 0)))))

(defn auto-qvar?
  "Returns true, if x is an auto-generated qvar
  (a symbol prefixed with *auto-qvar-prefix*)"
  [x] (and (symbol? x) (zero? (.indexOf ^String (name x) ^String *auto-qvar-prefix*))))

(defn auto-qvar
  "Creates a new auto-named qvar (symbol)."
  [] (gensym *auto-qvar-prefix*))

(defn qvar-name
  [x] (-> x name (subs 1)))

Data sources

(defn collect-source-graphs
  [q] (->> q :query (into #{(:from q)} (map :from)) (filter identity)))

Query pattern resolution & handling

Specifying variable length paths

The table below provides an overview of possible property paths which can be defined for each search pattern. In principle these are based on the options defined for SPARQL, albeit using different syntax.

Path patternDescriptionMeaning
[?s [p1 p2] ?o]fixed length path between ?s, ?oequivalent to: {:where '[[?s p1 ?x] [?x p2 ?o]}
[?s [:? p1 p2] ?o]fixed length path, optionalif path exists fully, bind ?s, ?o
[?s [:+ p1 p2] ?o]full path, cyclicif path exists fully one or more times, bind ?s, ?o
[?s [:* p1 p2] ?o]partial path, cyclicif path exists partially one or more times, bind ?s, ?o
[?s [:not p] ?o]single edge not via pequivalent to: {:where [?s ?pp ?o]} {:minus [?s p ?o]}
[?s [:or p1 p2] ?o]single edge p1 or p2equivalent to: {:where [?s p1 ?o]} {:union [?s p2 ?o]}

Important: Please note, that property paths are not yet supported by the highlevel query engine, but are actively being worked on. So far, only the first four property path styles can be directly queried (in isolation) using the low-level select-transitive function described further below.

(defn resolve-path-pattern
  "Takes a path triple pattern where the predicate is a seq of preds.
  Returns seq of query patterns with injected temp qvars for inbetween
  patterns. E.g.

      [?s [p1 p2 p3] ?o]
      => ([?s p1 ?__q0] [?__q0 p2 ?__q1] [?__q1 p3 ?o])"
  [[s p o]]
  (let [vars (->> auto-qvar
                  (repeatedly (dec (count p)))
                  (cons s))]
    (->> (concat (interleave vars p) [o])
         (d/successive-nth 3 2))))

(defn resolve-patterns
  [patterns]
  (mapcat
   (fn [[_ p :as t]]
     (if (vector? p)
       (resolve-path-pattern t)
       [t]))
   patterns))
Pattern generation
(defn produce-patterns-with-bound-vars
  "Takes a triple pattern (possibly with variables) and a map of
  possible value sets (each *must* be a set or single value) for each var.
  Produces lazy seq of resulting triple query patterns using cartesian
  product of all values.

      (produce-patterns-with-bound-vars
        [?a :type ?b]
        {?a #{\"me\" \"you\"} ?b #{\"foo\" \"bar\"})
      => ((\"me\" :type \"foo\") (\"me\" :type \"bar\")
          (\"you\" :type \"foo\") (\"you\" :type \"bar\"))"
  [[s p o] bindings]
  (let [s (or (bindings s) s)
        p (or (bindings p) p)
        o (or (bindings o) o)]
    (if (some set? [s p o])
      (d/cartesian-product
       (if (set? s) s #{s}) (if (set? p) p #{p}) (if (set? o) o #{o}))
      [[s p o]])))
Query planning

Proper query planning is still an active TODO item. The only step towards this so far is the following function, which sorts the patterns of an single sub-query based on an heuristic prioritizing patterns with the least qvars and therefore producing the most constrained, initial result set.

(defn sort-patterns
  "Sorts a seq of triple patterns in dependency order using any
  re-occuring vars. Triples with least qvars will be in head
  position."
  [patterns]
  (let [q (map #(let [v (d/filter-tree qvar? %)] [(count v) v %]) patterns)
        singles (->> q (filter #(= 1 (first %))) (mapcat second) set)]
    (->> q
         (sort-by (fn [[c v]] (- (* c 4) (count (filter singles v)))))
         (map peek))))
Pattern based triple verification
(defn triple-verifier
  "Takes a triple pattern (potentially with vars) and 3 booleans to
  indicate which SPO is a var. Returns fn which accepts a result triple
  and returns false if any of the vars clash (e.g. a qvar is used multiple
  times but result has different values in each position or likewise, if
  different vars relate to same values)."
  [[ts tp to] vars varp varo]
  (cond
   (and vars varp varo) (cond
                         (= ts tp to) (fn [r] (= (r 0) (r 1) (r 2)))
                         (= ts tp) (fn [r] (and (= (r 0) (r 1)) (not= (r 0) (r 2))))
                         (= ts to) (fn [r] (and (= (r 0) (r 2)) (not= (r 0) (r 1))))
                         (= tp to) (fn [r] (and (= (r 1) (r 2)) (not= (r 0) (r 1))))
                         :else (constantly true))
   (and vars varp)      (if (= ts tp)
                          (fn [r] (= (r 0) (r 1)))
                          (fn [r] (not= (r 0) (r 1))))
   (and vars varo)      (if (= ts to)
                          (fn [r] (= (r 0) (r 2)))
                          (fn [r] (not= (r 0) (r 2))))
   (and varp varo)      (if (= tp to)
                          (fn [r] (= (r 1) (r 2)))
                          (fn [r] (not= (r 1) (r 2))))
   :else                (constantly true)))

Result handling

(defn unique-bindings?
  "Returns true if all values in the given map are unique, i.e.
  no two keys are mapped to the same value."
  [map] (== (count (into #{} (vals map))) (count map)))

(defn accumulate-result-vars
  "Takes a seq of query result maps and returns a map of all found
  qvars as keys and their value sets."
  ([res] (accumulate-result-vars {} nil res))
  ([acc vars res]
     (let [acc (reduce
                (fn [acc b]
                  (merge-with
                   (fn [a b] (if (set? a) (conj a b) (hash-set a b)))
                   acc (if vars (select-keys b vars) b)))
                acc res)]
       (reduce-kv
        (fn [acc k v] (if (set? v) acc (assoc acc k #{v})))
        acc acc))))

(defn select-renamed-keys
  "Similar to clojure.core/select-keys, but instead of key seq takes a
  map of keys to be renamed (keys in that map are the original keys to
  be selected, their values the renamed keys in returned map)."
  [ret map aliases]
  (loop [ret ret, keys aliases]
    (if keys
      (let [kv (first keys)
            entry (find map (key kv))]
        (recur
         (if entry
           (assoc ret (val kv) (val entry))
           ret)
         (next keys)))
      ret)))

(defn order-asc
  [vars res]
  (if (coll? vars)
    (sort-by (fn [r] (reduce #(conj % (r %2)) [] vars)) res)
    (sort-by (fn [r] (get r vars)) res)))

(defn order-desc
  [vars res]
  (if (coll? vars)
    (sort-by
     (fn [r] (reduce #(conj % (r %2)) [] vars))
     #(- (compare % %2))
     res)
    (sort-by #(get % vars) #(- (compare % %2)) res)))

(defn distinct-result-set
  [res]
  (->> res
       (reduce
        (fn [acc r]
          (let [vs (set (vals r))]
            (if (acc vs) acc (assoc acc vs r))))
        {})
       (vals)))

(defn keywordize-result-vars
  [res]
  (map
   (fn [r] (into {} (map (fn [[k v]] [(-> k qvar-name keyword) v]) r)))
   res))

Graphviz export

move to separate ns
(defn triples->dot
  "Takes a seq of triples and returns them as digraph spec in
  Graphviz .dot format."
  [triples]
  (apply
   str
   (concat
    "digraph G {\n"
    "node[color=\"black\",style=\"filled\",fontname=\"Inconsolata\",fontcolor=\"white\"];\n"
    "edge[fontname=\"Inconsolata\",fontsize=\"9\"];\n"
    (map
     (fn [t]
       (let [[s p o] (map #(if (string? %) % (pr-str %)) t)]
         (str "\"" s "\" -> \"" o "\" [label=\"" p "\"];\n")))
     triples)
    "}")))

Joins

(defn join
  [a b]
  (->> (set/join a b)
       (seq)
       #_(mapcat
        (fn [k]
          (if (unique-bindings? k)
            [k])))))

(defn join-optional
  [a b]
  (loop [old (transient #{}), new (transient #{}), kb b]
    (if kb
      (let [kb' [(first kb)]
            [old new] (loop [old old, new new, ka a]
                        (if ka
                          (let [ka' (first ka)
                                j (first (set/join [ka'] kb'))]
                            (if j
                              (recur (conj! old ka') (conj! new j) (next ka))
                              (recur old new (next ka))))
                          [old new]))]
        (recur old new (next kb)))
      (let [new (persistent! new)]
        (if (seq new)
          (into (apply disj (set a) (persistent! old)) new)
          a)))))

(defn minus
  [a b]
  (let [vars (accumulate-result-vars b)]
    (reduce-kv
     (fn [res k v]
       (let [v (if (coll? v) v [v])]
         (reduce
          (fn [res v] (vec (remove (fn [r] (= (r k) v)) res))) res v)))
     a vars)))

(defn union
  [a b]
  (debug :union-ab a b)
  (concat a b))

Select

(defn unbound-var?
  [bindings v? x] (and v? (not (bindings x))))

(defn bind-translator
  [vs? vp? vo? [s p o]]
  (if vs?
    (if vp?
      (if vo?
        (fn [r] {s (r 0) p (r 1) o (r 2)})
        (fn [r] {s (r 0) p (r 1)}))
      (if vo?
        (fn [r] {s (r 0) o (r 2)})
        (fn [r] {s (r 0)})))
    (if vp?
      (if vo?
        (fn [r] {p (r 1) o (r 2)})
        (fn [r] {p (r 1)}))
      (if vo?
        (fn [r] {o (r 2)})
        (fn [_] {})))))

(defn unbound-vars-translator
  [bindings vs? vp? vo? [s p o]]
  (if (unbound-var? bindings vs? s)
    (if (unbound-var? bindings vp? p)
      (if (unbound-var? bindings vo? o)
        (fn [p] [nil nil nil])
        (fn [p] [nil nil (p 2)]))
      (if (unbound-var? bindings vo? o)
        (fn [p] [nil (p 1) nil])
        (fn [p] (assoc p 0 nil))))
    (if (unbound-var? bindings vp? p)
      (if (unbound-var? bindings vo? o)
        (fn [p] [(p 0) nil nil])
        (fn [p] (assoc p 1 nil)))
      (if (unbound-var? bindings vo? o)
        (fn [p] (assoc p 2 nil))
        identity))))

(defn select-with-bindings-alts
  [ds bindings opts [s p o :as t]]
  (let [vs?      (qvar? s), vp? (qvar? p), vo? (qvar? o)
        vmap     (bind-translator vs? vp? vo? t)
        verify   (triple-verifier t vs? vp? vo?)
        flt      (compile-filter-fn (opts :filter))
        res-fn   (if (:triples opts)
                   (if flt
                     #(if (verify %)
                        (let [vbinds (vmap %)]
                          (if (flt vbinds) [vbinds %])))
                     #(if (verify %) [(vmap %) %]))
                   (if flt
                     #(if (verify %)
                        (let [vbinds (vmap %)]
                          (if (flt vbinds) vbinds)))
                     #(if (verify %) (vmap %))))
        qs (if vs? (bindings s) s)
        qp (if vp? (bindings p) p)
        qo (if vo? (bindings o) o)
        _ (debug :select-alt qs qp qo)
        res (api/select-with-alts ds qs qp qo)]
    (if (seq res)
      (loop [acc (transient []), res res]
        (if res
          (let [r (res-fn (first res))]
            (if r
              (recur (conj! acc r) (next res))
              (recur acc (next res))))
          (persistent! acc))))))

(defn select-with-bindings
  [ds bindings opts [s p o :as t]]
  (let [patterns (produce-patterns-with-bound-vars t bindings)
        vs?      (qvar? s), vp? (qvar? p), vo? (qvar? o)
        vmap     (bind-translator vs? vp? vo? t)
        pmap     (unbound-vars-translator bindings vs? vp? vo? t)
        verify   (triple-verifier t vs? vp? vo?)
        flt      (compile-filter-fn (opts :filter))
        res-fn   (if (:triples opts)
                   (if flt
                     #(if (verify %)
                        (let [vbinds (vmap %)]
                          (if (flt vbinds) [vbinds %])))
                     #(if (verify %) [(vmap %) %]))
                   (if flt
                     #(if (verify %)
                        (let [vbinds (vmap %)]
                          (if (flt vbinds) vbinds)))
                     #(if (verify %) (vmap %))))]
    (debug :patterns patterns)
    (loop [acc (transient []), ps patterns]
      (if ps
        (let [[qs qp qo] (pmap (vec (first ps)))
              _ (debug :select qs qp qo)
              res (api/select ds qs qp qo)]
          (if (seq res)
            (recur
             (loop [acc acc, res res]
               (if res
                 (let [r (res-fn (first res))]
                   (if r
                     (recur (conj! acc r) (next res))
                     (recur acc (next res))))
                 acc))
             (next ps))
            (recur acc (next ps))))
        (persistent! acc)))))

(defn select-join
  ([ds patterns]
     (select-join ds {} {} patterns))
  ([ds bindings {bind :bind flt :filter opt? :optional?} patterns]
     (let [[p & ps] (sort-patterns patterns)
           res      (select-with-bindings ds bindings {} p)
           join-fn  (if opt? join-optional join)
           flt      (compile-filter-fn flt)
           bind     (compile-bind-fn bind)]
       (if (seq res)
         (loop [res res, ps ps]
           (if ps
             (let [binds (merge-with into (accumulate-result-vars res) bindings)
                   r' (select-with-bindings-alts ds binds {} (first ps))
                   res (if (seq r') (join-fn res r'))]
               (if (seq res)
                 (recur res (next ps))))
             (cond->> res
                      flt (filter flt)
                      bind (map bind))))))))

Transitive properties & path queries

If using equal min/max search depth, resort to select-join with expanded pattern. In that case the predicate can be a seq [pred1 pred2 ..] which is expanded cyclically for depth hops, e.g:

(select-transitive ds '[?s [mother father] ?o] 3 3)

Else the fn takes a standard pattern with the subject or object either a given constant or qvar:

Select all ‘mother’ rels from 1 - 3 hops (i.e. from mother -> great-grandmother)

(select-transitive ds '[?s mother ?c] 1 3)

If the object is not a qvar, the search is executed in reverse direction:

(select-transitive ds '[?m mother 'toxi])

Select only grandchildren for given person:

(select-transitive ds '[john father ?c] 2 2)
;; same as...
(select-join ds '[[john father ?p] [?p father ?c]])
(defn subvec-slices
  "Takes a min & max count and returns function accepting a vector as
  single arg. When called, returns vector of subvec slices each starting
  at index 0 and with an increasing length from min to max."
  [n1 n2]
  (fn [path]
    (mapv #(subvec path 0 %) (range n1 (inc (clojure.core/min (count path) n2))))))

(defn dfs-forward*
  [ds s p sv acc min max]
  (if (<= (count acc) max)
    (let [acc (conj acc sv)
          o (auto-qvar)
          r (select-with-bindings ds {s sv} {} [s p o])]
      (if (seq r)
        (let [visited (set acc)
              ovals (filter (comp not visited) (set (map o r)))]
          (if (seq ovals)
            (#?(:clj r/mapcat :cljs mapcat)
               (fn [ov] (dfs-forward* ds o p ov acc min max))
               ovals)
            [acc]))
        (if (> (count acc) min) [acc])))
    [acc]))

(defn dfs-backward*
  [ds o p ov acc min max]
  (if (<= (count acc) max)
    (let [acc (conj acc ov)
          s (auto-qvar)
          r (select-with-bindings ds {o ov} {} [s p o])]
      (if (seq r)
        (let [visited (set acc)
              svals (filter (comp not visited) (set (map s r)))]
          (if (seq svals)
            (#?(:clj r/mapcat :cljs mapcat)
               (fn [sv] (dfs-backward* ds s p sv acc min max))
               svals)
            [acc]))
        (if (> (count acc) min) [acc])))
    [acc]))

(defn select-transitive
  ([ds [s p o :as t]]
   (if (vector? p)
     (select-transitive ds t (count p) (count p))
     (select-transitive ds t 1 1000000)))
  ([ds [s p o] mind maxd]
   (let [mind (clojure.core/max mind 1)
         maxd (clojure.core/max maxd 1)]
     (if (= mind maxd)
       (let [p (if (vector? p) p [p])
             p (take mind (cycle p))]
         (select-join ds {} {} (resolve-path-pattern [s p o])))
       (let [vs? (qvar? s)
             vo? (qvar? o)
             v (auto-qvar)
             conf (cond
                    (and vs? vo?) {:s s ;; [?s p ?o]
                                   :o v
                                   :bmap {}
                                   :lookup (fn [b] [(b v) (b s)])
                                   :bind (fn [p] {s (first p) o (peek p)})
                                   :search dfs-forward*}
                    vo?           {:s v ;; [x p ?o]
                                   :o o
                                   :bmap {v s}
                                   :lookup (fn [b] [(b o) (b v)])
                                   :bind (fn [p] {o (peek p)})
                                   :search dfs-forward*}
                    vs?           {:s s ;; [?s p x]
                                   :o v
                                   :bmap {v o}
                                   :lookup (fn [b] [(b s) (b v)])
                                   :bind (fn [p] {s (peek p)})
                                   :search dfs-backward*})
             {:keys [bind search]} conf
             slices (subvec-slices (inc mind) (inc maxd))]
         (->> (select-with-bindings ds (:bmap conf) {} [(:s conf) p (:o conf)])
              (r/map    (:lookup conf))
              (r/mapcat #(search ds v p (% 0) [(% 1)] mind maxd))
              (r/mapcat slices)
              (r/map    bind)
              (into #{})))))))

Complete namespace definitions

(ns thi.ng.trio.query
  (:refer-clojure :exclude [min max])
  #?(:cljs
     (:require-macros
      [cljs-log.core :refer [debug info warn severe]]))
  (:require
   [thi.ng.trio.core :as api]
   [thi.ng.dstruct.core :as d]
   [thi.ng.dstruct.unionfind :as uf]
   [thi.ng.math.core :as m]
   [thi.ng.xerror.core :as err]
   [thi.ng.validate.core :as v]
   [clojure.set :as set]
   [clojure.core.reducers :as r]
   #?(:clj
      [taoensso.timbre :refer [debug info warn error]])))

<<helpers>>

<<agg-helpers>>

<<joins>>

<<selectors>>

<<dfs>>

<<validators>>

<<compiler>>

<<compiler-api>>

Autogenerated tests from examples in this file

(ns thi.ng.trio.test.query-ex1
  (:require
   [thi.ng.trio.core :as trio]
   [thi.ng.trio.query :as q]
   #?(:clj
      [clojure.test :refer :all]
      :cljs
      [cemerick.cljs.test :refer-macros [is deftest]])))

<<test-ex1-a>>

<<test-ex1-c>>

<<test-ex1-d>>

(deftest test-example
  (is (= '#{{?p carl ?c donald}
            {?a "emily is 23 years old" ?p freya, ?c emily}
            {?a "emily is 23 years old" ?p bob, ?c emily}}
         (set
          <<test-ex1-b>>
          )))
  (is (= '#{[freya spouse bob] [emily child-of bob]
            [carl parent donald] [bob gender male]
            [alice parent donald] [carl father donald]
            [freya mother emily] [donald child-of alice]
            [alice gender female] [alice mother donald]
            [bob spouse freya] [emily child-of freya]
            [freya gender female] [bob parent emily]
            [carl spouse alice] [donald child-of carl]
            [bob father emily] [alice spouse carl]
            [carl gender male] [freya parent emily]}
         (set (q/query infer-parents))))
  (is (= '{emily [{?p bob ?c emily} {?p freya ?c emily}]
           donald [{?p carl ?c donald} {?p alice ?c donald}]}
          <<test-ex1-e>>
          ))
  (is (= '#{[emily child-of bob] [emily child-of freya] [emily nick em] [emily age 23]
            [donald friend emily] [freya mother emily] [freya parent emily] [bob parent emily] [bob father emily]}
         (set
          <<test-ex1-f>>
          ))))
(ns thi.ng.trio.test.query-ex2
  (:require
   [thi.ng.trio.core :as trio]
   [thi.ng.trio.query :as q]
   #?(:clj
      [clojure.test :refer :all]
      :cljs
      [cemerick.cljs.test :refer-macros [is deftest]])))

<<test-ex2-a>>

(deftest text-example
  (is (= '#{{?country brazil, ?city sao-paulo} {?country uk, ?city london, ?region brixton}
            {?country uk, ?city london, ?region camden} {?country uk, ?city london, ?region westminster}
            {?country uk, ?city manchester, ?region ardwick} {?country uk, ?city manchester, ?region bradford}
            {?country uk, ?city manchester, ?region hulme} {?country usa, ?city nyc}}
         (set
          <<test-ex2-b>>
          ))))
(ns thi.ng.trio.test.query-ex3
  (:require
   [thi.ng.trio.core :as trio]
   [thi.ng.trio.query :as q]
   #?(:clj
      [clojure.test :refer :all]
      :cljs
      [cemerick.cljs.test :refer-macros [is deftest]])))

<<test-ex3-a>>

(deftest test-example
  (is (= '#{{?b :b3, ?net 5, ?retail 6.0}
            {?b :b1, ?net 10, ?retail 12.0}}
         (set
          <<test-ex3-b>>
          )))
  (is (= '{:a3 {?avg 20.0, ?max 20, ?num 1}, :a1 {?avg 15.0, ?max 20, ?num 2}}
         <<test-ex3-c>>
         ))
  (is (= '{:a1 {:books #{:b2 :b1}}, :a2 {:books #{:b1 :b3}}, :a3 {:books #{:b4}}}
         <<test-ex3-d>>
         )))