feat: remove default exist condition for groupby values in favour of … #5420

Open · wants to merge 2 commits into base: develop

Conversation

nityanandagohain
Member

Fixes #5136

It removes the default exists clauses added for the group by values.

@github-actions github-actions bot added the enhancement New feature or request label Jul 3, 2024
@srikanthccv
Member

LGTM, can you make the changes for traces as well?

@nityanandagohain
Member Author

Have removed it from traces as well.

srikanthccv previously approved these changes Jul 3, 2024
Member

Thanks

@ankitnayan
Collaborator

The PR should have included these 2 points too IMO

  1. Why was this there in the 1st place? I guess perf would be impacted now. If so, we must have some numbers with the PR
  2. Is the _exist column not needed anymore? I see them being dereferenced

@srikanthccv
Member

Why was this there in the 1st place?

It was an incorrect assumption when the original code was written that group by keys must exist #2571 (comment). However, it returns results that don't make sense. For example, if we group by service.name and see the result in the table, all those logs that don't have the service.name set are not shown in the result table which is misleading.

Is the _exist column not needed anymore? I see them being dereferenced

These columns mainly exist to support the EXISTS and NOT EXISTS operators.
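
To make this concrete, here is a hedged sketch of how an EXISTS filter can translate at the SQL level. The first form uses the key arrays visible in the queries later in this thread; the `code_function_exists` column name in the comment is hypothetical, standing in for the per-attribute `_exists` columns mentioned above:

```sql
-- Illustration only: one way a `code.function EXISTS` filter can translate.
-- For attributes stored in the key/value arrays, existence is a membership
-- check on the key array:
SELECT count(*)
FROM signoz_logs.distributed_logs
WHERE has(attributes_string_key, 'code.function');

-- For an attribute materialized into its own column, a companion boolean
-- exists-column distinguishes "key absent" from a legitimate zero value:
--   WHERE code_function_exists = true
```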

@srikanthccv srikanthccv dismissed their stale review July 5, 2024 03:47

can produce incorrect results for legitimate zero values, which the exists condition was originally intended to handle

@srikanthccv
Member

I see two things we can do to address the problem. This is only an issue for COUNT aggregations, since COUNT doesn't require an aggregate attribute.

  1. Run two queries
    • the first one will be the same as we currently do
    • the second one will query for the data excluding the first one (the actual query details can be figured out) and show them under something like "n/a" or "empty" to indicate there are entries in the result that don't account for any of the selected group by combinations.

This will ensure we don't overcount things for legitimate zero-value scenarios and at the same time don't produce misleading results when the group key doesn't exist.

  2. Also, nudge users in the product to make clear that COUNT without an aggregate attribute is COUNT(*), which is different from COUNT(key).
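
One possible shape of the two-query approach (a sketch only; it reuses the log schema from the queries quoted later in this thread, and the exact query details still need to be figured out):

```sql
-- Query 1: the current behaviour, restricted to rows where the
-- group-by key exists:
SELECT
    attributes_string_value[indexOf(attributes_string_key, 'code.function')] AS `code.function`,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs
WHERE severity_text = 'ERROR'
  AND has(attributes_string_key, 'code.function')
GROUP BY `code.function`;

-- Query 2: everything the first query excluded, reported as a single
-- "n/a" bucket so no rows silently disappear from the result table:
SELECT
    'n/a' AS `code.function`,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs
WHERE severity_text = 'ERROR'
  AND NOT has(attributes_string_key, 'code.function');
```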

@ankitnayan
Collaborator

ankitnayan commented Jul 7, 2024

Run two queries

Inefficient?

Also nudging users in the product

Would it still be problematic for logs?

What about having a default integer value and removing that in the where clause of the query?

@nityanandagohain
Member Author

nityanandagohain commented Jul 8, 2024

@ankitnayan this is why we didn't go with default values last time

#2594 (comment)

#2594 (comment)

@srikanthccv
Member

I summarized it below.

Here's a scenario where our current query logic falls short. Imagine an application service that logs data with various attributes, including code and service resource information. The service always includes a 'service.name' by default, but 'code.function' only appears in certain parts of the codebase.

Now, when a user tries to COUNT logs WHERE severity_text = ERROR, grouped by service.name and code.function, we run into issues. The query looks something like this:

```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), toIntervalSecond(60)) AS ts,
    resources_string_value[indexOf(resources_string_key, 'service.name')] AS `service.name`,
    attributes_string_value[indexOf(attributes_string_key, 'code.function')] AS `code.function`,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs
WHERE ((timestamp >= 1720511534000000000) AND (timestamp <= 1720513334000000000)) AND severity_text = 'ERROR' AND has(attributes_string_key, 'service.name') AND has(attributes_string_key, 'code.function')
GROUP BY
    `service.name`,
    `code.function`,
    ts
ORDER BY value DESC
```

The problem? The WHERE clause filters out all logs missing either service.name or code.function in their attribute keys. This leads to inaccurate results, because some ERROR logs might lack the 'code.function' attribute due to incomplete instrumentation or other reasons. Users expect to see these logs with the missing attribute as well; how best to surface them can be discussed.

So, does it make sense to exclude logs without certain attribute keys? Not really. This default behaviour skews results in common use cases, and there's no easy workaround except writing a custom ClickHouse query.

Would it be better to include these logs under a special 'no-value' (or similarly named) group? I'd argue yes. It shows users that some error logs exist without a code.function, giving a more complete picture.
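
For string attributes, this bucketing can even be done in a single query; a minimal sketch (the '<no-value>' label is a placeholder, not a settled name):

```sql
-- Sketch: bucket logs missing the attribute under a '<no-value>' label.
-- Works for string attributes; a label like this cannot be mixed into a
-- numeric or boolean column, which is the complication discussed below.
SELECT
    if(has(attributes_string_key, 'code.function'),
       attributes_string_value[indexOf(attributes_string_key, 'code.function')],
       '<no-value>') AS `code.function`,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs
WHERE severity_text = 'ERROR'
GROUP BY `code.function`;
```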

Is this always the right approach? Probably not. There might be cases where a missing attribute means the log isn't relevant. In such scenarios, we'd need a way to explicitly exclude these logs using a <key> EXISTS filter.

Now, let's consider numeric and boolean attributes. Take HTTP server logs with an http.status_code or grpc.code for the response. When grouping by http.status_code, users typically only care about logs with this attribute. Logs without it are usually internal and irrelevant. Here, our current approach actually gives the expected result.

Clearly, we can't make a one-size-fits-all decision. The context and nature of the data being queried matter. So, removing the default has conditions from the WHERE clause seems logical.

But simply removing these conditions changes our query to:

```sql
SELECT
    toStartOfInterval(fromUnixTimestamp64Nano(timestamp), toIntervalSecond(60)) AS ts,
    resources_string_value[indexOf(resources_string_key, 'service.name')] AS `service.name`,
    attributes_string_value[indexOf(attributes_string_key, 'code.function')] AS `code.function`,
    toFloat64(count(*)) AS value
FROM signoz_logs.distributed_logs
WHERE ((timestamp >= 1720511534000000000) AND (timestamp <= 1720513334000000000)) AND severity_text = 'ERROR'
GROUP BY
    `service.name`,
    `code.function`,
    ts
ORDER BY value DESC
```

The consequence? For logs without a key in the attribute array, we get zero-values: an empty string for strings, 0 for numbers, and false for booleans. This creates a new problem: legitimate zero values in user attributes (like grpc.code) get mixed up with our placeholder zeros. We end up overcounting when trying to show the count for spans without grpc.code.
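
This is standard ClickHouse behaviour, not something specific to our schema: `indexOf` returns 0 when the element is not found, and indexing an array at 0 yields the default value of the element type. A tiny standalone illustration:

```sql
-- indexOf(..., 'missing') is 0, and arr[0] returns the type's default:
SELECT
    ['a', 'b'][indexOf(['k1', 'k2'], 'missing')] AS str_val,  -- '' (empty string)
    [1, 2][indexOf(['k1', 'k2'], 'missing')]     AS num_val   -- 0
```

So a log whose grpc.code attribute is genuinely 0 and a log with no grpc.code at all become indistinguishable in the grouped result.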

Can we just add our own default values for missing keys? Nope, because we risk colliding with users' actual values for numbers or booleans. We can't predict what values people might use in their attributes.

One of the suggestions was to run two queries, and the concern is that it reads twice the data. We can avoid two queries by doing conditional counting, something like the query below. We should check if there is any performance difference between this and the two-queries approach.

```sql
SELECT
    countIf(mapContains(numberTagMap, 'http.status_code')) AS real_count,
    count() AS total_count,
    numberTagMap['http.status_code'],
    total_count - real_count AS no_value_count
FROM signoz_traces.signoz_index_v2
WHERE timestamp >= (now() - toIntervalHour(10))
GROUP BY numberTagMap['http.status_code']
```

How do we make this backwards compatible? Since people have already set up dashboards/alerts, we mustn't change their existing behaviour. We need to write a data migration that adds a filter item to the panel WHERE clause for each group by key. For every new panel, users, who have more context, can make the judgement and add the <key> EXISTS filter as needed.

Overall, I believe the fix in general should be implemented but how we handle the legitimate zero-values is the only concern.

@nityanandagohain nityanandagohain mentioned this pull request Jul 25, 2024
Successfully merging this pull request may close these issues.

Remove the default exists condition for group by keys?
3 participants