update(site): fix typo and add processing filter examples
fhussonnois committed May 24, 2020
1 parent be0d45a commit eac35f6
Showing 4 changed files with 195 additions and 86 deletions.
147 changes: 80 additions & 67 deletions site/content/en/docs/Developer Guide/accessing-data-and-metadata.md
@@ -11,91 +11,104 @@ Some filters (e.g : [AppendFilter](#appendfilter)) can be configured using *Simp

*Simple Connect Expression Language* (ScEL for short) is an expression language based on regex that allows quick access to, and manipulation of, record fields and metadata.

The syntax to define an expression is of the form : "`{{ <expression string> }}`".

{{% alert title="Note" color="info" %}}
In some situations, the double brackets can be omitted if the expression is used to write a value into a target field.
{{% /alert %}}

ScEL supports the following capabilities :

* **Field Selector**
* **Nested Navigation**
* **String substitution**
* **Functions**

## Field Selector

The expression language can be used to easily select one field from the input record :

"`{{ username }}`"

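As an illustration, the selector can be used as the value of a processing filter. The following is a minimal sketch, not taken from the FilePulse reference: it reuses the `AppendFilter` properties that appear in the filter examples of this documentation (`type`, `field`, `values`), while the alias `CopyUsername` and the target field `user` are hypothetical. The alias presumably also needs to be declared in the connector's filter chain. Note that the target field is written without double brackets.

```
# Hypothetical filter: appends the value of the `username` field to a `user` field
filters.CopyUsername.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.CopyUsername.field=user
filters.CopyUsername.values={{ username }}
```
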
## Nested Navigation

To navigate down a struct value, just use a period to indicate a nested field value :

"`{{ address.city }}`"

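In the same kind of hypothetical `AppendFilter` configuration (the alias and field names are illustrative only), a nested selector reads the `city` field from inside the `address` struct and appends it to a top-level `city` field:

```
# Hypothetical filter: reads the nested `address.city` value into a top-level `city` field
filters.SetCity.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.SetCity.field=city
filters.SetCity.values={{ address.city }}
```
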
## String substitution

The expression language can be used to easily build a new string field that concatenates multiple fields :

"`{{ <expression one> }}-{{ <expression two>}}`"

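A minimal sketch of string substitution in a filter, assuming the input record contains the string fields `firstname` and `lastname` (hypothetical names, as are the alias and the target field `fullname`). For a record with `firstname` set to `John` and `lastname` set to `Doe`, the `fullname` field would be expected to receive `John-Doe`.

```
# Hypothetical filter: joins two string fields with a hyphen
filters.FullName.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.FullName.field=fullname
filters.FullName.values={{ firstname }}-{{ lastname }}
```
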
## Built-in Functions

ScEL supports a number of predefined functions that can be used to apply a single transformation on a field.

| Function | Description | Syntax |
| ---------------| --------------|-----------|
| `contains` | Returns `true` if an array field's value contains the specified value | `{{ contains(array, value) }}` |
| `converts` | Converts a field's value into the specified type | `{{ converts(field, INTEGER) }}` |
| `ends_with` | Returns `true` if a string field's value ends with the specified string suffix | `{{ ends_with(field, suffix) }}` |
| `equals` | Returns `true` if a string or number field's value equals the specified value | `{{ equals(field, value) }}` |
| `exists` | Returns `true` if the specified field exists | `{{ exists(struct, field) }}` |
| `extract_array`| Returns the element at the specified position of the specified array | `{{ extract_array(array, 0) }}` |
| `is_null` | Returns `true` if a field's value is null | `{{ is_null(field) }}` |
| `length` | Returns the number of elements in an array, or the length of a string field | `{{ length(array) }}` |
| `lowercase` | Converts all of the characters in a string field's value to lower case | `{{ lowercase(field) }}` |
| `matches` | Returns `true` if a field's value matches the specified regex | `{{ matches(field, regex) }}` |
| `nlv` | Sets a default value if a field's value is null | `{{ nlv(field, default) }}` |
| `replace_all` | Replaces every subsequence of the field's value that matches the given pattern with the given replacement string | `{{ replace_all(field, regex, replacement) }}` |
| `starts_with` | Returns `true` if a string field's value starts with the specified string prefix | `{{ starts_with(field, prefix) }}` |
| `trim` | Trims the spaces from the beginning and end of a string | `{{ trim(field) }}` |
| `uppercase` | Converts all of the characters in a string field's value to upper case | `{{ uppercase(field) }}` |
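
Functions can compute the value written by a filter, and boolean functions can serve as a filter condition through the filter's `if` property. The sketch below is hypothetical (the alias, the field names, and the `admin` prefix are illustrative) and writes literal arguments without quotes, following the condition examples in the filter documentation:

```
# Hypothetical filter: applied only to records whose `username` starts with "admin";
# it appends the upper-cased username to a `username_upper` field
filters.NormalizeAdmin.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.NormalizeAdmin.if={{ starts_with(username, admin) }}
filters.NormalizeAdmin.field=username_upper
filters.NormalizeAdmin.values={{ uppercase(username) }}
```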


In addition, ScEL supports nested functions.

For example, the following expression converts the field's value to lowercase and then replaces every whitespace character with a hyphen (`-`).

```
{{ replace_all(lowercase(field), \\s, -)}}
```

{{% alert title="Limitation" color="warning" %}}
Currently, FilePulse does not support user-defined functions (UDFs), so you cannot register your own functions to enrich the expression language.
{{% /alert %}}

## Scopes

In the previous section, we have shown how to use the expression language to select a specific field.
The selected field was part of the current record being processed.

In addition, ScEL gives you access to additional fields through the use of scopes.
Basically, a scope defines the root object against which a selector expression is evaluated.

The syntax to define an expression with a scope is of the form : "`{{ $<scope>.<selector expression string> }}`".

By default, if no scope is defined in the expression, the scope `$value` is implicitly used.

ScEL supports a number of predefined scopes that can be used, for example :

- **To override the output topic.**
- **To define the record key to be used.**
- **To get access to the source file metadata.**
- Etc.

| Scope | Description | Type |
|--- | --- |--- |
| `{{ $headers }}` | The record headers | - |
| `{{ $key }}` | The record key | `string` |
| `{{ $metadata }}` | The file metadata | `struct` |
| `{{ $offset }}` | The offset information of this record within the source file | `struct` |
| `{{ $system }}` | The system environment variables and runtime properties | `struct` |
| `{{ $timestamp }}` | The record timestamp | `long` |
| `{{ $topic }}` | The output topic | `string` |
| `{{ $value }}` | The record value| `struct` |
| `{{ $variables }}` | The contextual filter-chain variables| `map[string, object]` |

Note that, in case of failures, more fields are added to the current filter context (see : [Handling Failures](./handling-failures)).
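
Scopes are what make it possible to write into parts of the record other than its value. A minimal sketch, assuming that `AppendFilter` accepts a scope such as `$key` or `$topic` as its target `field` (written without double brackets) and that the record value contains the hypothetical fields `username` and `country`; the aliases are illustrative too:

```
# Hypothetical filter: uses the `username` field of the record value as the record key
filters.SetKey.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.SetKey.field=$key
filters.SetKey.values={{ $value.username }}

# Hypothetical filter: routes each record to a topic named after its `country` field
filters.SetTopic.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.SetTopic.field=$topic
filters.SetTopic.values={{ $value.country }}
```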

@@ -113,55 +126,55 @@ The scope `metadata` allows read access to information about the file being proc

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $metadata.name }}` | The file name | `string` |
| `{{ $metadata.path }}` | The file directory path | `string` |
| `{{ $metadata.absolutePath }}` | The file absolute path | `string` |
| `{{ $metadata.hash }}` | The file CRC32 hash | `int` |
| `{{ $metadata.lastModified }}` | The file last modified time. | `long` |
| `{{ $metadata.size }}` | The file size | `long` |
| `{{ $metadata.inode }}` | The file Unix inode | `long` |
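
For example, a filter can tag each record with the name of the file it was read from. A minimal sketch; the alias `AddSourceFile` and the target field `source_file` are hypothetical:

```
# Hypothetical filter: appends the source file name to a `source_file` field
filters.AddSourceFile.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.AddSourceFile.field=source_file
filters.AddSourceFile.values={{ $metadata.name }}
```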

## Record Offset

The scope `offset` allows read access to information about the original position of the record within the source file.
The available fields depend on the configured `FileInputReader`.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.timestamp }}` | The creation time of the record (in milliseconds) | `long` |

Information only available if `RowFileInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.startPosition }}` | The start position of the record within the source file | `long` |
| `{{ $offset.endPosition }}` | The end position of the record within the source file | `long` |
| `{{ $offset.size }}` | The size of the record in bytes | `long` |
| `{{ $offset.row }}` | The row number of the record within the source file | `long` |
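
Similarly, the row number can be stored alongside the data, for instance to trace a record back to its line in the source file. A minimal sketch, assuming a reader that exposes `$offset.row` (see the table above) is configured; the alias and the target field are hypothetical:

```
# Hypothetical filter: appends the source row number to a `line_number` field
filters.AddLineNumber.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.AddLineNumber.field=line_number
filters.AddLineNumber.values={{ $offset.row }}
```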

Information only available if `BytesArrayInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.startPosition }}` | The start position of the record within the source file (always 0) | `long` |
| `{{ $offset.endPosition }}` | The end position of the record within the source file (equal to the file size) | `long` |

Information only available if `AvroFileInputReader` is configured.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $offset.blockStart }}` | The start position of the current block | `long` |
| `{{ $offset.position }}` | The position within the current block | `long` |
| `{{ $offset.records }}` | The number of records read from the current block | `long` |

## System

The scope `system` allows read access to system environment variables and runtime properties.

| Predefined Fields (ScEL) | Description | Type |
|--- | --- |--- |
| `{{ $system.env }}` | The system environment variables. | `map[string, string]` |
| `{{ $system.props }}` | The runtime (system) properties. | `map[string, string]` |

## Timestamp

@@ -21,7 +21,7 @@ For example, the below filter will only be applied on message having a log messa

```
filters.TagSecurityException.type=io.streamthoughts.kafka.connect.filepulse.filter.AppendFilter
filters.TagSecurityException.if={{ contains(data.logmessage, BadCredentialsException) }}
filters.TagSecurityException.invert=false
filters.TagSecurityException.field=tags
filters.TagSecurityException.values=SecurityAlert
```