Commit

Fixed up some issue around regex predicate #269
spmallette committed Jan 29, 2024
1 parent 8d9cac6 commit d78ce30
Showing 3 changed files with 53 additions and 186 deletions.
65 changes: 0 additions & 65 deletions book/Section-Beyond-Basic-Queries.adoc
@@ -4273,9 +4273,6 @@ g.V().hasLabel('airport').as('a').values('desc').
[EUX,F. D. Roosevelt Airport]
----

You will see more examples of how to use Lambda expressions with the 'filter' step in
the "<<fuzzyregs>>" section.

[[mapstep]]
Introducing the 'Map' step
^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -4726,68 +4723,6 @@ When run, this time the results do indeed include the edges as well as the verti
[v[3],e[5162][3-route->49],v[49],e[8454][49-route->71],v[71]]
----

[[fuzzyregs]]
Using regular expressions to do fuzzy searches
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's take a look at one case where use of closures might be helpful. It is a
common requirement when working with any kind of database to want to do some
sort of fuzzy text search or even to search using a regular expression. Gremlin
offers a series of text related predicates for these types of searches. Where
standard predicates are a part of the 'P' enum, the text specific predicates can be
found on the 'TextP' enum.

NOTE: Most TinkerPop enabled graph stores that you are likely to use for any sort of
serious deployment will also be backed by an indexing technology like Solr or
Elasticsearch. In those cases some amount of more sophisticated search methods will
likely be made available to you. You should always check the documentation for the
system you are using to see what is recommended.

So let's look at some examples. First of all, every airport in the air routes
graph contains a description which will be something like 'Dallas Fort Worth
International Airport' in the case of DFW. If we wanted to search the vertices in
the graph for any airport that starts with the letter "D" we could use the
'startingWith' predicate.

[source,groovy]
----
// Airport descriptions starting with 'D' - this is case sensitive
g.V().has('airport', 'desc', TextP.startingWith('D'))
----

NOTE: There is an analogous 'endingWith' predicate for testing the end of a string.

If we wanted to search the vertices in the graph for any airport that has the word
'Dallas' in the description we could use 'TextP.containing'.

[source,groovy]
----
// Airport descriptions containing the word 'Dallas'
g.V().has('airport', 'desc', TextP.containing('Dallas'))
----

Where things get even more interesting is when you want to use a regular
expression as part of a query. Note that the first example below could also be
achieved using a Gremlin 'within' step as it is still really doing exact
string comparisons but it gives us a template for how to write any query
containing a regular expression. The example that follows finds all airports
in cities with names that begin with 'Dal' so it will find Dallas, Dalaman, Dalian,
Dalcahue, Dalat and Dalanzadgad.

[source,groovy]
----
// Using a filter to search using a regular expression
g.V().has('airport','type','airport').has('city', TextP.regex('~/Dallas|Austin/')).values('code')
// A regular expression to find any airport with a city name that begins with "Dal"
g.V().has('airport','type','airport').has('city', TextP.regex('~/^Dal\w*/')).values('city')
----

Gremlin adheres to the regex syntax prescribed by the Java `Pattern` class, which is
documented at
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html.
The Java regular expression syntax may be different from the one you are used to, so
it is worth taking a few minutes to study the documentation at that URL.

[[graphvars]]
Using graph variables to associate metadata with a graph
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion book/Section-Janus-Graph.adoc
@@ -1370,7 +1370,7 @@ before TinkerPop added them to Gremlin in version 3.6.0. As a result, there are
text predicates that are JanusGraph specific which have similar functionality to the
ones officially exposed by Gremlin itself. This section describes the
JanusGraph-specific text predicates. You can learn more about the official Gremlin
text predicates in the "<<fuzzyregs>>" section.
text predicates in the "<<regex>>" section.

The JanusGraph regular expression predicates recognize the syntax defined as part of
the Java 1.8 Pattern class that is documented at
172 changes: 52 additions & 120 deletions book/Section-Writing-Gremlin-Queries.adoc
@@ -798,7 +798,7 @@ representing the number of runways the second airport has.

It is also possible to use a traversal inside of a 'by' modulator. Such traversals
are known as '"anonymous traversals"' and they are discussed in greater detail in
the "<<anonymoustraversals>>" section.
the "<<deepdivetraversals>>" section.

For now, it is enough to know that they allow us to do things like combine multiple
values together as part of a path result. The example below finds five routes that
@@ -3031,122 +3031,6 @@ g.V().hasId(within(1,2,3))
g.V().hasId(within([1,2,3]))
----

You will find more examples of these types of queries in the next two sections.

[[startswith]]
Using 'between' to simulate 'startsWith'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
One thing that may not be obvious is that when using string values with the 'between'
predicate the values do not have to specify exact matches. Take a look at the query
below. This will find any airports in cities whose names start with '"Dal"' as it
looks for strings between '"Dal"' and '"Dam"' in an inclusive/exclusive fashion. The
rest of the characters following '"Dal"' in the strings being tested are ignored.
Note that this is a case sensitive comparison. In other words '"Dal"' and '"dal"' are
different strings in this context.

TIP: The 'between' predicate can be used to simulate a string 'startsWith' method.

As discussed more in the "<<fuzzyregs>>" section, Gremlin does not currently support
any methods for applying regular expressions or even more basic text analysis
operators to strings. This use of the 'between' predicate can at least be used to
simulate a 'startsWith' type of operator. It is likely that support for additional
text search predicates will appear in future Apache TinkerPop releases.

[source,groovy]
----
g.V().hasLabel('airport').
has('city',between('Dal','Dam')).
values('city')
----

Here are the results from running the query. As you can see every city name starts
with the characters '"Dal"'.

[source,groovy]
----
Dallas
Dallas
Dalaman
Dalian
Dalcahue
Dalat
Dalanzadgad
----

You will notice from the results above that '"Dallas"' appears twice as there are two
airports with that city name. We could add a 'dedup' step to our query to only return
unique matches.

[source,groovy]
----
g.V().hasLabel('airport').
has('city',between('Dal','Dam')).
values('city').dedup()
----

Here are the modified results.

[source,groovy]
----
Dallas
Dalaman
Dalian
Dalcahue
Dalat
Dalanzadgad
----

Here is one more example where the range of values being compared is expanded a
little. This query will find any cities that start with '"Dal"' through '"Dar"'.

[source,groovy]
----
g.V().hasLabel('airport').
has('city',between('Dal','Dat')).
values('city').order().dedup()
----

As you can see this time, more cities met our search criteria.

[source,groovy]
----
Dalaman
Dalanzadgad
Dalat
Dalcahue
Dalian
Dallas
Damascus
Dandong
Dangriga
Daocheng
Daqing Shi
Dar es Salaam
Daru
Darwin
----

If you wanted to find strings that begin with a single character you can achieve
that as follows.

[source,groovy]
----
g.V().has('airport','code',between('X','Xa')).
values('code').fold()
----

When run, the query returns all airports with codes that start with the letter '"X"'.

[source,groovy]
----
[XNA,XMN,XRY,XIY,XUZ,XSB,XCH,XIL,XFN,XNN,XGR,XFW,XCR,XSC,XQP,XMH,XBJ,XAP,XMS,XKH,XIC,XTG,XKS,XBE,XTO]
----

While Gremlin does not currently provide any advanced text searching capabilities,
graph systems such as JanusGraph do offer such capabilities. Those features are
discussed in the "<<janpred>>" section.


[[winout]]
Refining flight routes analysis using 'not', 'neq', 'within' and 'without'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -3468,9 +3352,11 @@ Apache TinkerPop documentation here: http://tinkerpop.apache.org/docs/current/re
|startingWith | Match text that starts with the given character(s)
|endingWith | Match text that ends with the given character(s)
|containing | Match text that contains the given character(s)
|regex | Match text using a regular expression
|notStartingWith | Match text that does not start with the given character(s)
|notEndingWith | Match text that does not end with the given character(s)
|notContaining | Match text that does not contain the given character(s)
|notRegex | Match text that does not match a regular expression
|==============================================================================

In the sections below you will find examples of each predicate being used. Each
@@ -3479,6 +3365,12 @@ insensitive search you can chain multiple steps together combined by an 'or' step

NOTE: All of these predicates are *_case sensitive_*.
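
For the 'regex' predicate specifically, the Java `Pattern` syntax that it follows includes an inline `(?i)` flag that switches off case sensitivity inside the pattern itself. The plain Java sketch below only demonstrates that flag outside of Gremlin. The names `CaseCheck` and `found` are hypothetical, "Dallas" is just a sample string, and whether your graph provider passes the flag through unchanged is an assumption you should verify.

```java
import java.util.regex.Pattern;

public class CaseCheck {
    // Partial match using java.util.regex; returns true if the pattern
    // occurs anywhere in the text.
    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Case sensitive by default: lowercase "dal" is not found at the start of "Dallas".
        System.out.println(found("^dal", "Dallas"));
        // The inline (?i) flag makes the same pattern ignore case.
        System.out.println(found("(?i)^dal", "Dallas"));
    }
}
```

If the flag is honored by your graph system, the same pattern string could be handed to 'TextP.regex' in place of chaining several case variants together.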

NOTE: Most TinkerPop enabled graph stores that you are likely to use for any sort of
serious deployment will also be backed by an indexing technology like Solr or
Elasticsearch. In those cases some amount of more sophisticated search methods will
likely be made available to you. You should always check the documentation for the
system you are using to see what is recommended.

These predicates add to the existing Gremlin predicates that we looked at in the
"<<tranges>>" section.

@@ -3607,6 +3499,34 @@ Yongzhou
Yangzho
----

[[regex]]
regex
^^^^^

For more advanced text matching scenarios you can use regular expressions as part of
a query. Note that the first example below could also be written using the Gremlin
'within' predicate, as it is really just comparing against two fixed strings, but it
gives us a template for how to write any query containing a regular expression. The
example that follows finds all airports in cities with names that begin with 'Dal', so
it will find Dallas, Dalaman, Dalian, Dalcahue, Dalat and Dalanzadgad.

[source,groovy]
----
// A regular expression to find airports in Dallas or Austin
g.V().has('airport','type','airport').
has('city', TextP.regex('Dallas|Austin')).values('code')
// A regular expression to find any airport with a city name that begins with "Dal"
g.V().has('airport','type','airport').
has('city', TextP.regex('^Dal\\w*')).values('city')
----

Gremlin adheres to the regex syntax prescribed by the Java `Pattern` class, which is
documented at
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html.
The Java regular expression syntax may be different from the one you are used to, so
it is worth taking a few minutes to study the documentation at that URL.
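
Because the syntax is that of the Java `Pattern` class, you can try a pattern out in plain Java before placing it in a traversal. The sketch below is only an illustration. The names `RegexCheck`, `found` and `wholeMatch` are hypothetical, and the city names are sample strings rather than query results. It shows both partial (`find`) and whole-string (`matches`) matching, because it is worth confirming in your graph system's documentation which of the two its 'regex' predicate applies.

```java
import java.util.regex.Pattern;

public class RegexCheck {
    // True if the pattern occurs anywhere in the text (partial match).
    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // True only if the pattern matches the entire text.
    public static boolean wholeMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        // In Java source the backslash is doubled, as it is in the Groovy string.
        String startsWithDal = "^Dal\\w*";
        for (String city : new String[] {"Dallas", "Dalaman", "Vidalia", "Austin"}) {
            System.out.println(city + " find=" + found(startsWithDal, city)
                    + " matches=" + wholeMatch(startsWithDal, city));
        }
        // With alternation, partial matching also accepts a longer string
        // that merely contains one of the alternatives.
        System.out.println(found("Dallas|Austin", "Dallas Center"));
        System.out.println(wholeMatch("Dallas|Austin", "Dallas Center"));
    }
}
```

Under partial matching, an unanchored pattern such as 'Dallas|Austin' is not equivalent to an exact comparison, so anchoring it with `^` and `$` is the safer choice when you want whole-value matches only.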

[[notStartingWith]]
notStartingWith
^^^^^^^^^^^^^^^
@@ -3635,7 +3555,6 @@ g.V().hasLabel('airport').
3367
----


[[notEndingWith]]
notEndingWith
^^^^^^^^^^^^^
@@ -3651,12 +3570,10 @@ g.V().hasLabel('airport').
3373
----


[[notContaining]]
notContaining
^^^^^^^^^^^^^


The query below counts the number of cities that do not contain the string "berg" in
their name.

@@ -3697,6 +3614,21 @@ Osh
Kyzyl
----

[[notRegex]]
notRegex
^^^^^^^^

The 'notRegex' predicate is mostly present for Gremlin language symmetry, as regular
expressions can naturally express negation themselves.

[source,groovy]
----
// A regular expression to find airports not in Dallas or Austin
g.V().has('airport','type','airport').
has('city', TextP.notRegex('Dallas|Austin')).values('code')
----
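
The remark that a regular expression can express negation on its own can be illustrated with a negative lookahead. The plain Java sketch below is an illustration rather than one of the book's queries. The names `NegationCheck`, `notFound` and `lookaheadFound` are hypothetical, the city names are sample strings, and it assumes partial (`find`) matching.

```java
import java.util.regex.Pattern;

public class NegationCheck {
    // Negation applied outside the pattern, in the spirit of a "not" predicate.
    public static boolean notFound(String regex, String text) {
        return !Pattern.compile(regex).matcher(text).find();
    }

    // Negation written into the pattern itself: the negative lookahead
    // succeeds only if the original pattern occurs nowhere in the text.
    public static boolean lookaheadFound(String regex, String text) {
        String negated = "^(?!.*(?:" + regex + "))";
        return Pattern.compile(negated, Pattern.DOTALL).matcher(text).find();
    }

    public static void main(String[] args) {
        String regex = "Dallas|Austin";
        for (String city : new String[] {"Dallas", "Austin", "Boston", "Fort Dallas"}) {
            // Both columns should agree for every sample string.
            System.out.println(city + " notFound=" + notFound(regex, city)
                    + " lookahead=" + lookaheadFound(regex, city));
        }
    }
}
```

The lookahead form is noticeably harder to read, which is a practical argument for using 'notRegex' when it is available.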

[[sort]]
Sorting things - introducing 'order'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~