Update and add new public articles
jexp committed Sep 15, 2020
1 parent 65f30ad commit 95268f0
Showing 19 changed files with 1,140 additions and 95 deletions.
36 changes: 36 additions & 0 deletions articles/add-a-neo4j-instance-to-an-embedded-ha-application.adoc
= Add a Neo4j instance to a running embedded HA application
:slug: add-a-neo4j-instance-to-an-embedded-ha-application
:author: Vivek Saran
:neo4j-versions: 3.4,3.5
:tags: embedded, ha
:category: cluster

There are situations when we would like to use the Neo4j Browser to access an embedded HA cluster.

The documented approach to accomplish that goal requires changing the embedded application code as described in the Neo4j documentation:

https://neo4j.com/docs/java-reference/3.4/tutorials-java-embedded/#tutorials-java-embedded-bolt/[Accessing Neo4j embedded via the Bolt protocol]

An alternative approach does not require changing application code: add an additional server-mode instance to the cluster. Here are the steps:

- Install a new Neo4j Enterprise server mode instance (using tarball/zip or other means) with the same version of Neo4j that is operational in the embedded application.
- Edit the `neo4j.conf` file and update the following parameters:
[source,conf]
----
dbms.mode=HA
ha.server_id=20 # this is just a high number to easily identify that it is a server instance
ha.slave_only=true
ha.initial_hosts=<initial hosts to only include the members of the embedded HA cluster>
----

The new server instance does not need to be added to `ha.initial_hosts` unless it will be a permanent fixture in the cluster and required for the cluster to start up. We set `ha.slave_only=true` to prevent this instance from ever being elected Master.
It will, however, still be able to accept writes.

- Start Neo4j on the new instance.
During startup, the new instance will connect to one of the initial hosts and request to join the cluster. Part of the startup process is discovering who else is in the cluster, including instances that joined later and may not be listed in `initial_hosts`, as well as learning whether any members from `initial_hosts` have failed.

We now have a Neo4j instance where we can use the Neo4j Browser to communicate with the graph database as we do in a conventional server mode HA cluster.
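Once connected with the Browser, you can confirm the instance's HA-related settings directly from Cypher. `dbms.listConfig` accepts a search string, so filtering on the `ha.` prefix shows only the parameters configured above:

[source,cypher]
----
// list all HA-related settings of the new server-mode instance
CALL dbms.listConfig('ha.') YIELD name, value
RETURN name, value
----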


91 changes: 91 additions & 0 deletions articles/arcgis-geocoding.adoc
= Geocoding with ArcGIS
:slug: geocoding-with-arcgis
:author: Davids Pecollet
:neo4j-versions: 3.5, 4.0
:tags: cypher, configuration
:category: geospatial


== Prerequisites

* Create or obtain an ArcGIS account.
* Create an application within your account. The application will be assigned a `client_id` and `secret`.

=== APOC
The APOC library provides an `apoc.spatial.geocode('address')` procedure (as well as `reverseGeocode`) that supports geocoding against OpenStreetMap and Google Maps.
It also supports other providers (e.g. OpenCage) with a more explicit configuration of the API call in `neo4j.conf`:

[source,conf]
----
apoc.spatial.geocode.provider=opencage
apoc.spatial.geocode.opencage.key=<api_key>
apoc.spatial.geocode.opencage.url=http://api.opencagedata.com/geocode/v1/json?q=PLACE&key=KEY
apoc.spatial.geocode.opencage.reverse.url=http://api.opencagedata.com/geocode/v1/json?q=LAT+LNG&key=KEY
----

where *KEY* is replaced by the API key and *PLACE* by the address to geocode at run time (and *LAT*/*LNG* by the coordinates to reverse geocode, respectively).
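With a provider configured this way, geocoding can be invoked directly from Cypher; for example, using the single-result variant with the sample address from the APOC documentation:

[source,cypher]
----
// geocode a single address using the configured provider
CALL apoc.spatial.geocodeOnce('21 rue Paul Bellamy 44000 NANTES FRANCE') YIELD location
RETURN location.latitude, location.longitude, location.description
----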

For ArcGIS, the key would be the application token, and the URL would look like this (this is the public ArcGIS API endpoint):

[source,conf]
----
apoc.spatial.geocode.arcgis.url=https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&outFields=Match_addr,Addr_typ&singleLine=PLACE&token=KEY
----


Unfortunately, the `apoc.spatial` procedures expect the JSON response from the provider to contain a list named "results".
That is not the case for the ArcGIS API endpoints:

* The 'findAddressCandidates' endpoint returns a list of "candidates".
* The bulk geocoding endpoint 'geocodeAddresses' returns a list of "locations".

So the `apoc.spatial` procedures can't help here.

=== Workaround using apoc.load.json

The `apoc.load.json` procedure lets you call any HTTP/REST API and process the response directly in Cypher.

You can use the `apoc.static` procedures to read the API key and URL from `neo4j.conf`, similarly to what `apoc.spatial.geocode` does.
The two following properties would be required for geocoding (this uses the public ArcGIS server; replace with your own ArcGIS server hostname if necessary):

[source,conf]
----
apoc.static.arcgis.key=<arcgis_token>
apoc.static.arcgis.geocode_url=https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/findAddressCandidates?f=json&outFields=Match_addr,Addr_typ&singleLine=
----

Then run the following Cypher query:

[source,cypher]
----
WITH 'Statue of Liberty, Liberty Island New York, NY 10004' AS address
// get the configuration properties
CALL apoc.static.getAll("arcgis") YIELD value AS arcgis
// build the URL
WITH arcgis.geocode_url + apoc.text.urlencode(address) + '&token=' + arcgis.key AS url
// extract the top result
CALL apoc.load.json(url, '$.candidates[0].location') YIELD value AS location
RETURN location.x, location.y
----
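If you need more than the coordinates, the same JSON-path technique can pull other candidate fields. The field names `address` and `score` below follow the documented `findAddressCandidates` response shape; verify them against your API version:

[source,cypher]
----
WITH 'Statue of Liberty, Liberty Island New York, NY 10004' AS address
CALL apoc.static.getAll("arcgis") YIELD value AS arcgis
WITH arcgis.geocode_url + apoc.text.urlencode(address) + '&token=' + arcgis.key AS url
// extract the whole top candidate instead of just its location
CALL apoc.load.json(url, '$.candidates[0]') YIELD value AS candidate
RETURN candidate.address, candidate.score, candidate.location.x, candidate.location.y
----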

== Temporary Tokens

Arcgis application token may be temporary (by default 2h). That means you may not be able to hardcode a token in your neo4j.conf.
To obtain a new token, you're supposed to call the Authentication API with your application credentials.
You can use `apoc.load.json` again to do that in cypher.

In `neo4j.conf`, add the building blocks of the token API call:

[source,conf]
----
apoc.static.arcgis.client_id=<application_client_id>
apoc.static.arcgis.client_secret=<secret>
apoc.static.arcgis.token_url=https://www.arcgis.com/sharing/rest/oauth2/token?grant_type=client_credentials
----

Then run the following Cypher query:

[source,cypher]
----
CALL apoc.static.getAll("arcgis") YIELD value AS arcgis
WITH arcgis.token_url + '&client_id=' + arcgis.client_id + '&client_secret=' + arcgis.client_secret AS tokenUrl
CALL apoc.load.json(tokenUrl) YIELD value AS tokenResponse
WITH tokenResponse.access_token AS token
// proceed with geocoding using 'token'
RETURN token
----
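The two steps can also be combined, fetching a fresh token and geocoding in a single query. This sketch assumes the `apoc.static.arcgis.*` properties from both sections above are configured:

[source,cypher]
----
WITH 'Statue of Liberty, Liberty Island New York, NY 10004' AS address
CALL apoc.static.getAll("arcgis") YIELD value AS arcgis
// fetch a fresh temporary token
WITH arcgis, address,
     arcgis.token_url + '&client_id=' + arcgis.client_id + '&client_secret=' + arcgis.client_secret AS tokenUrl
CALL apoc.load.json(tokenUrl) YIELD value AS tokenResponse
// geocode using the fresh token
WITH arcgis.geocode_url + apoc.text.urlencode(address) + '&token=' + tokenResponse.access_token AS url
CALL apoc.load.json(url, '$.candidates[0].location') YIELD value AS location
RETURN location.x, location.y
----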
:neo4j-versions: 3.0, 3.1, 3.2, 3.3, 3.4, 3.5
:tags: storage, disk, filesystem, unix, operations

The short answer is no.
Although this may seem harmless, the reason for this is not performance related, but rather for control over locking files.

NFS and other filesystems that don't offer locking should not be used to install Neo4j or store the datastore.
If we can't lock the store files, others can concurrently access them, resulting in corruption.

Refer to the Neo4j documentation on recommended filesystem storage:

https://neo4j.com/docs/operations-manual/current/installation/requirements/#deployment-requirements-software
126 changes: 120 additions & 6 deletions articles/conditional-cypher-execution.adoc
:author: Andrew Bowman
:category: cypher
:tags: cypher, conditional, apoc
:neo4j-versions: 3.1, 3.2, 3.3, 3.4, 3.5, 4.1

At some point you're going to write a Cypher query requiring some conditional logic, where you want different Cypher statements executed depending on the case.

Cypher does not currently include native conditional execution to address this case, but there are some workarounds that can be used.

This article covers the ways you can perform conditional Cypher execution.

== First a note on CASE

The CASE expression performs conditional logic, but that logic can only be used to output an expression. It cannot be used to conditionally execute Cypher clauses.
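For example, CASE can choose a per-row value, but not which clauses run. The `isOrphan` property follows the examples used later in this article:

[source,cypher]
----
// CASE selects a value per row; it cannot trigger MERGE, SET, etc.
MATCH (p:Person)
RETURN p.name,
       CASE WHEN p.isOrphan THEN 'orphan' ELSE 'has parents' END AS status
----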


== Using correlated subqueries in 4.1

Neo4j 4.1 allows correlated subqueries, letting us perform a subquery using variables present mid-query.
By combining subquery usage with filtering, we can use subqueries to implement conditional Cypher execution.

This requires the use of `WITH` as the first clause within the subquery CALL block, for the purpose of importing variables to the subquery.

This import usage has some special restrictions that do not normally apply to `WITH` usage:

1. You may only include variables from the outer query and no others.
You cannot perform calculations, aggregations, or introduction of new variables in the initial `WITH`.
2. You cannot alias any variables within this initial `WITH`.
3. You cannot follow the initial `WITH` with a `WHERE` clause for filtering.

If you try any of these, you will be met with some kind of error, such as:

[source]
----
Importing WITH should consist only of simple references to outside variables. Aliasing or expressions are not supported.
----

or, more cryptically, if you try to use a `WHERE` clause after the initial `WITH`:

[source]
----
Variable `x` not defined
----

(where the variable is the first one present in the `WITH` clause)

You can get around all of these restrictions by simply introducing an additional `WITH` clause after the importing `WITH`, like so:

[source,cypher]
----
MATCH (bruce:Person {name:'Bruce Wayne'})
CALL {
WITH bruce
WITH bruce
WHERE bruce.isOrphan
MERGE (batman:Hero {name:'Batman'})
CREATE (bruce)-[:SuperheroPersona]->(batman)
WITH count(batman) as count
RETURN count = 1 as isBatman
}
RETURN isBatman
----

This demonstrates the ability to filter on imported variables to the subquery by adding a second `WITH` clause, which is not restricted in the same way as the initial `WITH` used for the import into the subquery.

=== The subquery must return a row for the outer query to continue

Subqueries are not independent of the outer query, and if they don't yield any rows, the outer query won't have any rows to continue execution.

This can be a problem with conditional Cypher, since by definition you are evaluating a condition as a filter to figure out whether to do something or not.

If that conditional evaluates to false, then the row is wiped out, which is often fine within the subquery itself (you don't want to create Batman if Bruce isn't an orphan yet),
but you usually want to continue execution no matter what happened in the subquery, and maybe return some boolean value for whether or not the conditional succeeded.

There are some workarounds to avoid having the row wiped out.

=== Use a standalone aggregation to restore a row before the subquery return

An aggregation (such as `count()`), when there are no other non-aggregation variables present to act as a grouping key,
can restore a row even if the row has been wiped out.

This is because it is valid to get the `count()` of 0 rows, or to do a `collect()` over 0 rows to produce an empty collection.

Again, there must be no other non-aggregation variables present when you perform this aggregation.

In the above example, we are using this technique in the subquery so that the outer query can continue no matter how the conditional evaluates:

[source,cypher]
----
WITH count(batman) as count
RETURN count = 1 as isBatman
----

With that `count()` we will get 0 or 1 no matter how the query evaluated, allowing us a row to continue execution when the subquery finishes.



=== Use a UNION subquery to cover all possible conditionals

We can instead use a UNION within a subquery, where the set of all the unioned queries covers all possible conditional outcomes.
This ensures there will be an execution path that succeeds and will return a row, allowing the outer query to continue.

This is also useful for keeping the equivalent of if/else or case logic together, as otherwise you would have to use separate subqueries per conditional block.

With this approach you no longer have to use aggregation to ensure rows remain, you just need to make sure at least one of the UNIONed queries will succeed no matter what.


[source,cypher]
----
MATCH (bruce:Person {name:'Bruce Wayne'})
CALL {
WITH bruce
WITH bruce
WHERE bruce.isOrphan
MERGE (batman:Hero {name:'Batman'})
CREATE (bruce)-[:SuperheroPersona]->(batman)
RETURN true as isBatman
UNION
WITH bruce
WITH bruce
WHERE NOT coalesce(bruce.isOrphan, false)
SET bruce.name = 'Bruce NOT BATMAN Wayne'
RETURN false as isBatman
}
RETURN isBatman
----

Note that we have to use the import `WITH` for each of the UNIONed queries, to ensure each of them imports variables from the outer query,
and we still must use a second `WITH` to allow us to filter.

Since there is no limit to the number of queries that can be unioned together, you can use this approach to handle multiple conditional evaluations.

== Using FOREACH for write-only Cypher

The FOREACH clause can be used to perform the equivalent of an IF conditional, with the restriction that only write clauses are used (MERGE, CREATE, DELETE, SET, REMOVE).
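The usual trick is to turn the condition into a list of either one element or zero elements, so the write clauses inside FOREACH run either once or not at all:

[source,cypher]
----
MATCH (bruce:Person {name:'Bruce Wayne'})
// the list is [1] when the condition holds, [] otherwise
FOREACH (_ IN CASE WHEN bruce.isOrphan THEN [1] ELSE [] END |
  MERGE (batman:Hero {name:'Batman'})
  CREATE (bruce)-[:SuperheroPersona]->(batman)
)
----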
There are two types of procedures:
Only the first condition that evaluates to true will execute its associated query. If no condition is true, then an else query can be supplied as a default.
Cannot write to the graph.

=== Read and write variations

The procedures shown above are read-only; they are not allowed to write to the graph, so if there are any write operations
in the conditional Cypher within, the query will error out.
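A minimal sketch of the write variant `apoc.do.when` (which the parameter-passing example later in this article also uses), consistent with the Batman examples above:

[source,cypher]
----
MATCH (bruce:Person {name:'Bruce Wayne'})
// apoc.do.when(condition, ifQuery, elseQuery, params) may write to the graph
CALL apoc.do.when(
  bruce.isOrphan,
  'MERGE (batman:Hero {name:"Batman"}) RETURN batman.name AS name',
  'RETURN null AS name',
  {}
) YIELD value
RETURN value.name
----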
Expand Down Expand Up @@ -89,14 +203,14 @@ As such, be careful to properly handle quotes within your query string. If the q

Using these procedures can be tricky. Here are some more tips to help avoid the most common tripping points.

=== Dealing with quotes/escapes in complex nested queries

For more complicated queries (such as nested queries that must handle quotes at multiple levels),
consider either defining the query string as a variable first and then passing the variable into the procedure,
or alternately passing the conditional queries as parameters to the query itself.
This might save you from the headaches of dealing with escape characters within Java strings.

=== Pass parameters that must be visible within the conditional queries

When executed, the conditional Cypher queries do not have visibility to the variables outside of the CALL.

CALL apoc.do.when(bruceWayne.isOrphan, "MERGE (batman:Hero {name:'Batman'}) CREA
The params map is the last argument of the call: `{bruce:bruceWayne}`, and allows the `bruceWayne` variable to be visible to any of the conditional queries as `bruce`.
Additional parameters can be added to the params map if needed.

=== Conditional queries must RETURN something if you want to keep executing the query after the CALL

Currently, when a (non-empty) conditional query is executed, and the query doesn't RETURN anything, nothing is YIELDed for the row,
wiping out the row. For that original row, anything after the CALL is now a no-op, since there is no longer a row to execute upon (Cypher operations execute per row).