feat(mappers): Stream name can now be accessed in stream maps #2699

46 changes: 26 additions & 20 deletions docs/stream_maps.md
@@ -228,30 +228,29 @@ can be referenced directly by mapping expressions.

#### Built-In Functions

- [`md5()`](inv:python:py:module:#hashlib) - returns an inline MD5 hash of any string, outputting
the string representation of the hash's hex digest.
- This is defined by the SDK internally with native python:
[`hashlib.md5(<input>.encode("utf-8")).hexdigest()`](inv:python:py:method:#hashlib.hash.hexdigest).
- [`datetime`](inv:python:py:module:#datetime) - This is the datetime module object from the Python
standard library. You can access [`datetime.datetime`](inv:python:py:class:#datetime.datetime),
[`datetime.timedelta`](inv:python:py:class:#datetime.timedelta), etc.
- [`json`](inv:python:py:module:#json) - This is the json module object from the Python standard
library. Primarily used for calling [`json.dumps()`](inv:python:py:function:#json.dumps)
and [`json.loads()`](inv:python:py:function:#json.loads).
The following functions and namespaces are available for use in mapping expressions:

| Function | Description |
| :------------------------------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`md5()`](inv:python:py:module:#hashlib) | Returns an inline MD5 hash of any string, outputting the string representation of the hash's hex digest. This is defined by the SDK internally with native python: [`hashlib.md5(<input>.encode("utf-8")).hexdigest()`](inv:python:py:method:#hashlib.hash.hexdigest). |
| [`datetime`](inv:python:py:module:#datetime) | This is the datetime module object from the Python standard library. You can access [`datetime.datetime`](inv:python:py:class:#datetime.datetime), [`datetime.timedelta`](inv:python:py:class:#datetime.timedelta), etc. |
| [`json`](inv:python:py:module:#json) | This is the json module object from the Python standard library. Primarily used for calling [`json.dumps()`](inv:python:py:function:#json.dumps) and [`json.loads()`](inv:python:py:function:#json.loads). |

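As a minimal sketch of the `md5()` helper described above, the function below follows the expression quoted in the docs (`hashlib.md5(<input>.encode("utf-8")).hexdigest()`). The function name and signature are assumptions for illustration and may not match the SDK's internal definition exactly.

```python
import hashlib


def md5(value: str) -> str:
    """Return the hex digest of the MD5 hash of a UTF-8 encoded string.

    Illustrative stand-in only; the SDK's own helper may differ in detail.
    """
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# The digest is always a 32-character lowercase hex string.
digest = md5("hello")
```

In a mapping expression this would typically be called on a record value, for example `md5(email)`.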
#### Built-in Variable Names

- `config` - a dictionary with the `stream_map_config` values from settings. This can be used
to provide a secret hash seed, for instance.
- `record` - an alias for the record values dictionary in the current stream.
- `_` - same as `record` but shorter to type
- `self` - the existing property value if the property already exists
- `fake` - a [`Faker`](inv:faker:std:doc#index) instance, configurable via `faker_config`
(see previous example) - see the built-in [standard providers](inv:faker:std:doc#providers)
for available methods
The following variables are available in the context of a mapping expression:

| Variable | Description |
| :---------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `config` | A dictionary with the `stream_map_config` values from settings. This can be used to provide a secret hash seed, for instance. |
| `record` | An alias for the record values dictionary in the current stream. |
| `_` | Same as `record` but shorter to type. |
| `self` | The existing property value if the property already exists. |
| `fake` | A [`Faker`](inv:faker:std:doc#index) instance, configurable via `faker_config` (see previous example) - see the built-in [standard providers](inv:faker:std:doc#providers) for available methods. |
| `__stream_name__` | The name of the stream. Useful when [applying the same transformation to multiple streams](#applying-a-mapping-across-two-or-more-streams). |

```{tip}
The `fake` object is only available if the plugin specifies `faker` as an additional dependency (through the `singer-sdk` `faker` extra, or directly).
To use the `fake` object, the `faker` library must be installed.
```

:::{versionadded} 0.35.0
@@ -266,10 +265,17 @@ The `Faker` class.
The `Faker` class was deprecated in favor of instance methods on the `fake` object.
:::

:::{versionadded} 0.42.0
The `__stream_name__` variable.
:::

#### Built-in Alias Variable Names

The following variables are available in the context of the `__alias__` expression:
- `__stream_name__` - the existing stream name

| Variable | Description |
| :---------------- | :----------------------- |
| `__stream_name__` | The existing stream name |

:::{versionadded} 0.42.0
The `__stream_name__` variable.
1 change: 1 addition & 0 deletions singer_sdk/mapper.py
@@ -337,6 +337,7 @@ def _eval(
names["_"] = record # Add a shorthand alias in case of reserved words in names
names["record"] = record # ...and a longhand alias
names["config"] = self.map_config # Allow map config access within transform
names["__stream_name__"] = self.stream_alias # Access stream name in transform

if self.fake:
from faker import Faker # noqa: PLC0415
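The hunk above adds `__stream_name__` to the names that a mapping expression can see. The sketch below illustrates the idea with a hypothetical `eval_mapping` helper that uses Python's built-in `eval` with builtins disabled. This is an assumption made to keep the example self-contained: it is not the SDK's evaluator and is not safe for untrusted input.

```python
from typing import Any, Optional


def eval_mapping(
    expr: str,
    record: dict,
    stream_name: str,
    map_config: dict,
    prop_name: Optional[str] = None,
) -> Any:
    """Evaluate a mapping expression against a names dictionary.

    Hypothetical helper: mirrors the names set in the diff, nothing more.
    """
    names = {
        "_": record,  # shorthand alias for the record
        "record": record,  # longhand alias
        "config": map_config,  # stream_map_config values
        "__stream_name__": stream_name,  # the diff passes self.stream_alias here
    }
    # `self` is only defined when the property already exists in the record.
    if prop_name is not None and prop_name in record:
        names["self"] = record[prop_name]
    # Plain eval is a stand-in for the SDK's expression evaluator.
    return eval(expr, {"__builtins__": {}}, names)


record = {"email": "alice@example.com"}
source_table = eval_mapping("__stream_name__", record, "mystream", {})
```

With this setup, `"self.upper()"` (given `prop_name="email"`) and `"_['email'].upper()"` evaluate to the same value, which matches the two new test cases in this PR.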
28 changes: 28 additions & 0 deletions tests/core/test_mapper.py
@@ -780,6 +780,12 @@ def discover_streams(self):
"aliased_stream_quoted.jsonl",
id="aliased_stream_quoted",
),
pytest.param(
{"mystream": {"source_table": "__stream_name__"}},
{"flattening_enabled": False, "flattening_max_depth": 0},
"builtin_variable_stream_name.jsonl",
id="builtin_variable_stream_name",
),
pytest.param(
{"mystream": {"__alias__": "'aliased_' + __stream_name__"}},
{"flattening_enabled": False, "flattening_max_depth": 0},
@@ -792,6 +798,28 @@ def discover_streams(self):
"builtin_variable_stream_name_alias_expr.jsonl",
id="builtin_variable_stream_name_alias_expr",
),
pytest.param(
{
"mystream": {
"email": "self.upper()",
"__else__": None,
}
},
{"flattening_enabled": False, "flattening_max_depth": 0},
"builtin_variable_self.jsonl",
id="builtin_variable_self",
),
pytest.param(
{
"mystream": {
"email": "_['email'].upper()",
"__else__": None,
}
},
{"flattening_enabled": False, "flattening_max_depth": 0},
"builtin_variable_underscore.jsonl",
id="builtin_variable_underscore",
),
pytest.param(
{},
{"flattening_enabled": True, "flattening_max_depth": 0},
6 changes: 6 additions & 0 deletions tests/snapshots/mapped_stream/builtin_variable_self.jsonl
@@ -0,0 +1,6 @@
{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"mystream","schema":{"type":"object","properties":{"email":{"type":["string","null"]}}},"key_properties":[]}
{"type":"RECORD","stream":"mystream","record":{"email":"ALICE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"BOB@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"CHARLIE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"STATE","value":{"bookmarks":{"mystream":{}}}}
@@ -0,0 +1,6 @@
{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"mystream","schema":{"properties":{"email":{"type":["string"]},"count":{"type":["integer","null"]},"user":{"properties":{"id":{"type":["integer","null"]},"sub":{"properties":{"num":{"type":["integer","null"]},"custom_obj":{"type":["string","null"]}},"type":["object","null"]},"some_numbers":{"items":{"type":["number"]},"type":["array","null"]}},"type":["object","null"]},"source_table":{"type":["string","null"]}},"type":"object","required":["email"]},"key_properties":[]}
{"type":"RECORD","stream":"mystream","record":{"email":"alice@example.com","count":21,"user":{"id":1,"sub":{"num":1,"custom_obj":"obj-hello"},"some_numbers":[3.14,2.718]},"source_table":"mystream"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"bob@example.com","count":13,"user":{"id":2,"sub":{"num":2,"custom_obj":"obj-world"},"some_numbers":[10.32,1.618]},"source_table":"mystream"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"charlie@example.com","count":19,"user":{"id":3,"sub":{"num":3,"custom_obj":"obj-hello"},"some_numbers":[1.414,1.732]},"source_table":"mystream"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"STATE","value":{"bookmarks":{"mystream":{}}}}
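The snapshot above shows each record carrying a `source_table` value equal to its stream name. As a rough sketch, a check like the one below could confirm that property for a JSONL snapshot. The `mismatched_records` helper and the abridged `SNAPSHOT` sample are assumptions for illustration and are not part of the PR's test suite.

```python
import json

# Abridged sample in the shape of the snapshot above (fields trimmed for brevity).
SNAPSHOT = """\
{"type":"STATE","value":{}}
{"type":"RECORD","stream":"mystream","record":{"email":"alice@example.com","source_table":"mystream"}}
{"type":"RECORD","stream":"mystream","record":{"email":"bob@example.com","source_table":"mystream"}}
"""


def mismatched_records(jsonl: str, field: str = "source_table") -> list:
    """Return RECORD messages whose `field` differs from the message's stream name."""
    bad = []
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") != "RECORD":
            continue  # STATE and SCHEMA messages carry no record payload
        if message["record"].get(field) != message["stream"]:
            bad.append(message)
    return bad


problems = mismatched_records(SNAPSHOT)
```

An empty result means every record's `source_table` was filled from `__stream_name__` as expected.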
@@ -0,0 +1,6 @@
{"type":"STATE","value":{}}
{"type":"SCHEMA","stream":"mystream","schema":{"type":"object","properties":{"email":{"type":["string","null"]}}},"key_properties":[]}
{"type":"RECORD","stream":"mystream","record":{"email":"ALICE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"BOB@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"RECORD","stream":"mystream","record":{"email":"CHARLIE@EXAMPLE.COM"},"time_extracted":"2022-01-01T00:00:00+00:00"}
{"type":"STATE","value":{"bookmarks":{"mystream":{}}}}