[pkg/ottl] Add function for parsing key value pairs #30998

Closed
dpaasman00 opened this issue Feb 1, 2024 · 10 comments · Fixed by #31035
Assignees
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium processor/transform Transform processor

Comments

@dpaasman00
Contributor

Component(s)

pkg/ottl, processor/transform

Is your feature request related to a problem? Please describe.

I'd like to be able to parse key value pairs using OTTL similar to the Stanza key value parser operator, but currently there is no function in OTTL that achieves this.

Describe the solution you'd like

Add a new converter function that returns a pcommon.Map of the key value pairs parsed from a target string.

ParseKeyValuePairs(target, Optional[delimiter], Optional[pair_delimiter])

The function parameters would be as follows:

  • target: StringGetter. Returns the string containing the key value pairs to parse, such as "pkg=ottl func=keyvalue".
  • delimiter: Optional[string]. The delimiter used to split each pair into its key and value, such as "=". Proposed default: "=".
  • pair_delimiter: Optional[string]. The delimiter used to split the target string into pairs. Proposed default: white space.

Implementation can follow the Stanza package key value parser operator and use strings.FieldsFunc() when white space is the pair_delimiter, strings.Split() for non white space pair_delimiters, and strings.SplitN() for splitting the key and value in a given pair.
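As a rough illustration of the approach described above, here is a minimal Go sketch. The helper name `parseKeyValue`, its plain-string signature, the trimming of keys and values, and the error returned for a pair without a delimiter are all assumptions made for this example; it is not the Stanza operator or the eventual OTTL function.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// parseKeyValue is a hypothetical helper for illustration only.
// An empty delimiter falls back to "=", and an empty (or all white space)
// pairDelimiter means "split on any run of white space".
func parseKeyValue(target, delimiter, pairDelimiter string) (map[string]string, error) {
	if delimiter == "" {
		delimiter = "="
	}

	var pairs []string
	if strings.TrimSpace(pairDelimiter) == "" {
		// White space pair delimiter: FieldsFunc collapses repeated spaces.
		pairs = strings.FieldsFunc(target, unicode.IsSpace)
	} else {
		// Any other pair delimiter: a plain Split.
		pairs = strings.Split(target, pairDelimiter)
	}

	result := make(map[string]string, len(pairs))
	for _, pair := range pairs {
		if strings.TrimSpace(pair) == "" {
			continue // skip empty segments, e.g. after a trailing delimiter
		}
		// SplitN with n=2 keeps any further delimiters inside the value.
		kv := strings.SplitN(pair, delimiter, 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("cannot split %q into a key and a value", pair)
		}
		result[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
	}
	return result, nil
}

func main() {
	parsed, err := parseKeyValue("pkg=ottl func=keyvalue", "", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parsed)
}
```

Using `SplitN` with a limit of 2 means a value such as `a=b=c` keeps everything after the first delimiter (`b=c`) rather than being truncated.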

Describe alternatives you've considered

A possible alternative is implementing this function as an editor function. This option would allow users to immediately overwrite the source of the key value pairs with the newly parsed version. However, there is a precedent already set with the ParseJSON converter function to just return the parsed value. While it may be useful for this function to also replace the source string, I don't think it's necessary for it to do so and just returning a map of the pairs is enough.
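The difference between the two styles can be sketched in plain Go. Everything here is hypothetical: `record` is a stand-in struct and not a Collector type, and `parsePairs` and `parsePairsInPlace` are illustrative names that only mimic the converter and editor roles.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical stand-in for a log record, not a Collector type.
type record struct {
	Body       any
	Attributes map[string]any
}

// parsePairs plays the converter role: it returns the parsed map
// and leaves its input untouched.
func parsePairs(s string) map[string]any {
	out := make(map[string]any)
	for _, pair := range strings.Fields(s) {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) == 2 {
			out[kv[0]] = kv[1]
		}
	}
	return out
}

// parsePairsInPlace plays the editor role: it overwrites the source
// string on the record with the parsed map.
func parsePairsInPlace(r *record) {
	if s, ok := r.Body.(string); ok {
		r.Body = parsePairs(s)
	}
}

func main() {
	r := &record{Body: "pkg=ottl func=keyvalue", Attributes: map[string]any{}}

	// Converter style: the caller decides where the result goes.
	r.Attributes["parsed"] = parsePairs(r.Body.(string))
	fmt.Println(r.Body) // the body is still the raw string

	// Editor style: the source is replaced by the parsed map.
	parsePairsInPlace(r)
	fmt.Println(r.Body)
}
```

With the converter style, replacing the source remains possible: the caller simply assigns the returned map back to the field it came from.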

Additional context

No response

@dpaasman00 dpaasman00 added enhancement New feature or request needs triage New item requiring triage labels Feb 1, 2024
@github-actions github-actions bot added pkg/ottl processor/transform Transform processor labels Feb 1, 2024
Contributor

github-actions bot commented Feb 1, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@dpaasman00
Contributor Author

I'd like to implement these functions once design is agreed on.

@TylerHelmuth TylerHelmuth added priority:p2 Medium and removed needs triage New item requiring triage labels Feb 1, 2024
@TylerHelmuth
Member

@dpaasman00 can you share an example payload you'd be parsing?

@dpaasman00
Contributor Author

dpaasman00 commented Feb 1, 2024

Sure, here's an example payload where the delimiter is = and the pair delimiter would be ;:

CMD=0; VER=10; MSGType= RECONNECT;              LEN=15; ERR=0; FLG=0 SPID=3040; SCH=112; SPH=10709; SM=0 RPID=2; RCH=0; RPH=0; RM=0

After parsing, the function would output these values in a map like:

{"CMD": "0", "VER": "10", "MSGType": "RECONNECT", "LEN": "15", ...}
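To make the splitting concrete, the sketch below runs a shortened excerpt of the payload above through a naive splitter. The helper `splitPayload` is hypothetical, and trimming white space around keys and values is an assumption of this sketch, not a confirmed behavior of the proposed function.

```go
package main

import (
	"fmt"
	"strings"
)

// splitPayload is a hypothetical helper for illustration only.
// It splits on pairDelimiter, then splits each pair once on delimiter,
// trimming surrounding white space from keys and values.
func splitPayload(payload, delimiter, pairDelimiter string) map[string]string {
	out := make(map[string]string)
	for _, pair := range strings.Split(payload, pairDelimiter) {
		kv := strings.SplitN(pair, delimiter, 2)
		if len(kv) != 2 {
			continue // ignore segments without a key value delimiter
		}
		out[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
	}
	return out
}

func main() {
	// Shortened excerpt of the example payload.
	payload := "CMD=0; VER=10; MSGType= RECONNECT;              LEN=15; ERR=0; FLG=0 SPID=3040; SCH=112"
	parsed := splitPayload(payload, "=", ";")

	fmt.Printf("%q\n", parsed["MSGType"]) // "RECONNECT"
	fmt.Printf("%q\n", parsed["LEN"])     // "15"
	fmt.Printf("%q\n", parsed["FLG"])     // "0 SPID=3040"
}
```

Note that the payload has no `;` between `FLG=0` and `SPID=3040`, so a splitter like this one keeps `SPID=3040` inside the value of `FLG`. Whether that is acceptable would depend on the data source.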

Some other examples and potential use cases can be seen in the test cases for the Stanza Parser here.

Let me know if I can expand on any of this at all!

@TylerHelmuth
Member

@dpaasman00 are these values in the log body or somewhere else? I assume the data is coming from a receiver that is not the filelogreceiver?

@dpaasman00
Contributor Author

@TylerHelmuth It depends entirely on the situation. Typically we'd expect the raw string to be in the log body, but it could be in the attributes as well. In terms of receivers, the data can come from the filelogreceiver (especially when the pairs are located in the body), but this is also known to occur with the TCP, Splunk TCP, and Fluent Forward receivers.

Something we occasionally see in customer environments is very low-powered machines at the edge, where the customer wants to do as much parsing as possible at the gateway/aggregator level, where there is more compute power. So we'd like to support parsing key value pairs even when the data arrives through receivers like the OTLP receiver.

@TylerHelmuth
Member

I ask because I want to make sure the use case exists somewhere that Stanza is not used. The TCP receiver can use Stanza operators, so you wouldn't need the transformprocessor there, but data from the fluentforward and OTLP receivers would need the transformprocessor.

@TylerHelmuth
Member

I agree with the function parameters. Could we shorten the name to be ParseKeyValue?

@evan-bradley please take a look at this proposal.

@dpaasman00
Contributor Author

Totally understand and no problem changing the name.

@evan-bradley
Contributor

This all sounds good to me. I'm fine with having the name as ParseKeyValue so it lines up with the name of the Stanza operator.

evan-bradley pushed a commit that referenced this issue Feb 15, 2024
**Description:** <Describe what has changed.>
Adds a `ParseKeyValue` converter function that parses key value
pairs into a `pcommon.Map`. It takes a `StringGetter` target argument
and 2 optional arguments for the pair delimiter and key value delimiter.
This is an adaptation of the Stanza Key Value Parser operator to provide
feature parity.

Given the following input string `"k1=v1 k2=v2 k3=v3"`, the function
would return the following map:
```
{ "k1": "v1", "k2": "v2", "k3": "v3" }
```

**Link to tracking Issue:** <Issue number if applicable>
Closes #30998 

**Testing:** <Describe what testing was performed and which tests were
added.>
Added unit tests and e2e test.

**Documentation:** <Describe the documentation added.>
Added function documentation.