Docs: Drop inline callouts (#1270)
Drops the inline callouts from the docs. An inline callout is a `<1>` marker
written anywhere but at the end of a line. Asciidoctor doesn't support them,
and we'd very much like to move to Asciidoctor to generate the docs because
it is actively maintained.
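
As a minimal AsciiDoc sketch of the difference (hypothetical snippet, not taken
from the diff below): Asciidoctor only matches a callout marker at the very end
of a line, so it treats an inline form like this as literal text:

[source,java]
----
Tap out = new EsTap("radio/artists" <1>, new Fields("name") <2>);
----

whereas the equivalent end-of-line form renders in both toolchains:

[source,java]
----
Tap out = new EsTap("radio/artists", <1>
    new Fields("name")); <2>
----
<1> target resource (hypothetical label)
<2> document fields (hypothetical label)

That is the mechanical rewrite applied in every hunk below: break the line at
each callout so the marker lands at a line end, or fold several callout
descriptions into a single end-of-line callout.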
nik9000 authored and jbaiera committed Apr 5, 2019
1 parent cc01f8d commit fcecc7b
Showing 6 changed files with 149 additions and 104 deletions.
15 changes: 9 additions & 6 deletions docs/src/reference/asciidoc/core/cascading.adoc
@@ -95,7 +95,8 @@ Simply hook `EsTap` into the Cascading flow:
 ----
 Tap in = new Lfs(new TextDelimited(new Fields("id", "name", "url", "picture")),
     "/resources/artists.dat");
-Tap out = new EsTap("radio/artists" <1>, new Fields("name", "url", "picture") <2>);
+Tap out = new EsTap("radio/artists", <1>
+    new Fields("name", "url", "picture")); <2>
 new HadoopFlowConnector().connect(in, out, new Pipe("write-to-Es")).complete();
 ----

@@ -140,8 +141,8 @@ One can index the data to a different resource, depending on the tuple being read

 [source,java]
 ----
-Tap out = new EsTap("my-collection-{media.type}/doc" <1>,
-    new Fields("name", "media.type", "year") <2>);
+Tap out = new EsTap("my-collection-{media.type}/doc", <1>
+    new Fields("name", "media.type", "year")); <2>
 ----

 <1> Resource pattern using field `media.type`
@@ -154,7 +155,7 @@ The functionality is available when dealing with raw JSON as well - in this case
 [source,js]
 ----
 {
-    "media_type":"book",<1>
+    "media_type":"book", <1>
     "title":"Harry Potter",
     "year":"2010"
 }
@@ -167,7 +168,8 @@ the `Tap` declaration can be as follows:
 ----
 props.setProperty("es.input.json", "true");
 Tap in = new Lfs(new TextLine(new Fields("line")),"/archives/collection.json");
-Tap out = new EsTap("my-collection-{media_type}/doc" <1>, new Fields("line") <2>);
+Tap out = new EsTap("my-collection-{media_type}/doc", <1>
+    new Fields("line")); <2>
 ----

 <1> Resource pattern relying on fields _within_ the JSON document and _not_ on the `Tap` schema
@@ -180,7 +182,8 @@ Just the same, add `EsTap` on the other end of a pipe, to read (instead of writing)

 [source,java]
 ----
-Tap in = new EsTap("radio/artists/"<1>,"?q=me*"<2>);
+Tap in = new EsTap("radio/artists/", <1>
+    "?q=me*"); <2>
 Tap out = new StdOut(new TextLine());
 new LocalFlowConnector().connect(in, out, new Pipe("read-from-Es")).complete();
 ----

23 changes: 11 additions & 12 deletions docs/src/reference/asciidoc/core/hive.adoc
@@ -59,7 +59,7 @@ When using Hive, one can use `TBLPROPERTIES` to specify the <<configuration,configuration>> properties
 CREATE EXTERNAL TABLE artists (...)
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
 TBLPROPERTIES('es.resource' = 'radio/artists',
-              'es.index.auto.create' = 'false') <1>;
+              'es.index.auto.create' = 'false'); <1>
 ----

 <1> {eh} setting
@@ -78,12 +78,10 @@ To wit:
 CREATE EXTERNAL TABLE artists (...)
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
 TBLPROPERTIES('es.resource' = 'radio/artists',
-              <1>'es.mapping.names' = 'date:@timestamp <2>, url:url_123 <3>');
+              'es.mapping.names' = 'date:@timestamp, url:url_123'); <1>
 ----

-<1> name mapping for two fields
-<2> Hive column `date` mapped in {es} to `@timestamp`
-<3> Hive column `url` mapped in {es} to `url_123`
+<1> Hive column `date` mapped in {es} to `@timestamp`; Hive column `url` mapped in {es} to `url_123`

 TIP: Hive is case **insensitive** while {es} is not. The loss of information can create invalid queries (as the column in Hive might not match the one in {es}). To avoid this, {eh} will always convert Hive column names to lower-case.
 This being said, it is recommended to use the default Hive style and use upper-case names only for Hive commands and avoid mixed-case names.
@@ -102,7 +100,7 @@ CREATE EXTERNAL TABLE artists (
     name STRING,
     links STRUCT<url:STRING, picture:STRING>)
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'<1>
-TBLPROPERTIES('es.resource' = 'radio/artists'<2>);
+TBLPROPERTIES('es.resource' = 'radio/artists'); <2>
 -- insert data to Elasticsearch from another table called 'source'
 INSERT OVERWRITE TABLE artists
@@ -148,10 +146,10 @@ IMPORTANT: Make sure the data is properly encoded, in `UTF-8`.

 [source,java]
 ----
-CREATE EXTERNAL TABLE json (data STRING<1>)
+CREATE EXTERNAL TABLE json (data STRING) <1>
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
 TBLPROPERTIES('es.resource' = '...',
-              'es.input.json` = 'yes'<2>);
+              'es.input.json` = 'yes'); <2>
 ...
 ----

@@ -170,7 +168,7 @@ CREATE EXTERNAL TABLE media (
     type STRING,<1>
     year STRING,
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
-TBLPROPERTIES('es.resource' = 'my-collection-{type}/doc'<2>);
+TBLPROPERTIES('es.resource' = 'my-collection-{type}/doc'); <2>
 ----

 <1> Table field used by the resource pattern. Any of the declared fields can be used.
@@ -195,9 +193,9 @@ the table declaration can be as follows:

 [source,sql]
 ----
-CREATE EXTERNAL TABLE json (data STRING<1>)
+CREATE EXTERNAL TABLE json (data STRING) <1>
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
-TBLPROPERTIES('es.resource' = 'my-collection-{media_type}/doc'<2>,
+TBLPROPERTIES('es.resource' = 'my-collection-{media_type}/doc', <2>
               'es.input.json` = 'yes');
 ----

@@ -216,7 +214,8 @@ CREATE EXTERNAL TABLE artists (
     name STRING,
     links STRUCT<url:STRING, picture:STRING>)
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'<1>
-TBLPROPERTIES('es.resource' = 'radio/artists'<2>, 'es.query' = '?q=me*'<3>);
+TBLPROPERTIES('es.resource' = 'radio/artists', <2>
+              'es.query' = '?q=me*'); <3>
 -- stream data from Elasticsearch
 SELECT * FROM artists;

19 changes: 11 additions & 8 deletions docs/src/reference/asciidoc/core/intro/download.adoc
@@ -28,7 +28,7 @@ These are available under the same `groupId`, using an `artifactId` with the pattern
 ----
 <dependency>
   <groupId>org.elasticsearch</groupId>
-  <artifactId>elasticsearch-hadoop-mr<1></artifactId>
+  <artifactId>elasticsearch-hadoop-mr</artifactId> <1>
   <version>{ver}</version>
 </dependency>
 ----
@@ -40,7 +40,7 @@ These are available under the same `groupId`, using an `artifactId` with the pattern
 ----
 <dependency>
   <groupId>org.elasticsearch</groupId>
-  <artifactId>elasticsearch-hadoop-hive<1></artifactId>
+  <artifactId>elasticsearch-hadoop-hive</artifactId> <1>
   <version>{ver}</version>
 </dependency>
 ----
@@ -52,7 +52,7 @@ These are available under the same `groupId`, using an `artifactId` with the pattern
 ----
 <dependency>
   <groupId>org.elasticsearch</groupId>
-  <artifactId>elasticsearch-hadoop-pig<1></artifactId>
+  <artifactId>elasticsearch-hadoop-pig</artifactId> <1>
   <version>{ver}</version>
 </dependency>
 ----
@@ -64,13 +64,16 @@ These are available under the same `groupId`, using an `artifactId` with the pattern
 ----
 <dependency>
   <groupId>org.elasticsearch</groupId>
-  <artifactId>elasticsearch-spark-20<1>_2.10<2></artifactId>
+  <artifactId>elasticsearch-spark-20_2.10</artifactId> <1>
   <version>{ver}</version>
 </dependency>
 ----

-<1> 'spark' artifact. Notice the `-20` part of the suffix which indicates the Spark version compatible with the artifact. Use `20` for Spark 2.0+ and `13` for Spark 1.3-1.6.
-<2> Notice the `_2.10` suffix which indicates the Scala version compatible with the artifact. Currently it is the same as the version used by Spark itself.
+<1> 'spark' artifact. Notice the `-20` part of the suffix which indicates the
+Spark version compatible with the artifact. Use `20` for Spark 2.0+ and `13` for
+Spark 1.3-1.6. Notice the `_2.10` suffix which indicates the Scala version
+compatible with the artifact. Currently it is the same as the version used by
+Spark itself.

 The Spark connector framework is the most sensitive to version incompatibilities. For your convenience, a version compatibility matrix has been provided below:
 [cols="2,2,10",options="header",]
@@ -89,7 +92,7 @@ The Spark connector framework is the most sensitive to version incompatibilities
 ----
 <dependency>
   <groupId>org.elasticsearch</groupId>
-  <artifactId>elasticsearch-hadoop-cascading<1></artifactId>
+  <artifactId>elasticsearch-hadoop-cascading</artifactId> <1>
   <version>{ver}</version>
 </dependency>
 ----
@@ -114,7 +117,7 @@ in order for the Cascading dependencies to be properly resolved:
 ----
 <dependency>
   <groupId>org.elasticsearch</groupId>
-  <artifactId>elasticsearch-storm<1></artifactId>
+  <artifactId>elasticsearch-storm</artifactId> <1>
   <version>{ver}</version>
 </dependency>
 ----

36 changes: 19 additions & 17 deletions docs/src/reference/asciidoc/core/pig.adoc
@@ -44,9 +44,10 @@ With Pig, one can specify the <<configuration,configuration>> properties

 [source,sql]
 ----
-STORE B INTO 'radio/artists'<1> USING org.elasticsearch.hadoop.pig.EsStorage
-             ('es.http.timeout = 5m<2>',
-              'es.index.auto.create = false' <3>);
+STORE B INTO 'radio/artists' <1>
+    USING org.elasticsearch.hadoop.pig.EsStorage
+          ('es.http.timeout = 5m', <2>
+           'es.index.auto.create = false'); <3>
 ----

 <1> {eh} configuration (target resource)
@@ -163,12 +164,10 @@ For example:
 [source,sql]
 ----
 STORE B INTO '...' USING org.elasticsearch.hadoop.pig.EsStorage(
-    '<1>es.mapping.names=date:@timestamp<2>, uRL:url<3>')
+    'es.mapping.names=date:@timestamp, uRL:url') <1>
 ----

-<1> name mapping for two fields
-<2> Pig column `date` mapped in {es} to `@timestamp`
-<3> Pig column `uRL` mapped in {es} to `url`
+<1> Pig column `date` mapped in {es} to `@timestamp`; Pig column `uRL` mapped in {es} to `url`

 TIP: Since {eh} 2.1, the Pig schema case sensitivity is preserved to {es} and back.

@@ -185,11 +184,13 @@ A = LOAD 'src/test/resources/artists.dat' USING PigStorage()
 -- transform data
 B = FOREACH A GENERATE name, TOTUPLE(url, picture) AS links;
 -- save the result to Elasticsearch
-STORE B INTO 'radio/artists'<1> USING org.elasticsearch.hadoop.pig.EsStorage(<2>);
+STORE B INTO 'radio/artists'<1>
+    USING org.elasticsearch.hadoop.pig.EsStorage(); <2>
 ----

 <1> {es} resource (index and type) associated with the given storage
-<2> additional configuration parameters can be passed here - in this case the defaults are used
+<2> additional configuration parameters can be passed inside the `()` - in this
+case the defaults are used

 For cases where the id (or other metadata fields like +ttl+ or +timestamp+) of the document needs to be specified, one can do so by setting the appropriate <<cfg-mapping, mapping>>, namely +es.mapping.id+. Following the previous example, to indicate to {es} to use the field +id+ as the document id, update the +Storage+ configuration:

@@ -219,9 +220,9 @@ IMPORTANT: Make sure the data is properly encoded, in `UTF-8`.

 [source,sql]
 ----
-A = LOAD '/resources/artists.json' USING PigStorage() AS (json:chararray<1>);"
+A = LOAD '/resources/artists.json' USING PigStorage() AS (json:chararray);" <1>
 STORE B INTO 'radio/artists'
-    USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true'<2>...);
+    USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true'...); <2>
 ----

 <1> Load the (JSON) data as a single field (`json`)
@@ -235,8 +236,9 @@ One can index the data to a different resource, depending on the 'row' being read
 [source,sql]
 ----
 A = LOAD 'src/test/resources/media.dat' USING PigStorage()
-        AS (name:chararray, type:chararray <1>, year: chararray);
-STORE B INTO 'my-collection-{type}/doc'<2> USING org.elasticsearch.hadoop.pig.EsStorage();
+        AS (name:chararray, type:chararray, year: chararray); <1>
+STORE B INTO 'my-collection-{type}/doc' <2>
+    USING org.elasticsearch.hadoop.pig.EsStorage();
 ----

 <1> Tuple field used by the resource pattern. Any of the declared fields can be used.
@@ -262,8 +264,8 @@ the table declaration can be as follows:

 [source,sql]
 ----
-A = LOAD '/resources/media.json' USING PigStorage() AS (json:chararray<1>);"
-STORE B INTO 'my-collection-{media_type}/doc'<2>
+A = LOAD '/resources/media.json' USING PigStorage() AS (json:chararray);" <1>
+STORE B INTO 'my-collection-{media_type}/doc' <2>
     USING org.elasticsearch.hadoop.pig.EsStorage('es.input.json=true');
 ----

@@ -278,8 +280,8 @@ As you would expect, loading the data is straightforward:
 [source,sql]
 ----
 -- execute Elasticsearch query and load data into Pig
-A = LOAD 'radio/artists'<1>
-    USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?me*'<2>);
+A = LOAD 'radio/artists' <1>
+    USING org.elasticsearch.hadoop.pig.EsStorage('es.query=?me*'); <2>
 DUMP A;
 ----