Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the default exists condition for group by keys? #5136

Open
srikanthccv opened this issue Jun 3, 2024 · 5 comments · May be fixed by #5420
Open

Remove the default exists condition for group by keys? #5136

srikanthccv opened this issue Jun 3, 2024 · 5 comments · May be fixed by #5420
Assignees
Labels
bug Something isn't working query-service Regarding query service

Comments

@srikanthccv
Copy link
Member

Bug description

For every group by clause, we are adding a condition to make sure that the key exists

// add group by conditions to filter out log lines which doesn't have the key
for _, attr := range groupBy {
if !attr.IsColumn {
columnType := getClickhouseLogsColumnType(attr.Type)
columnDataType := getClickhouseLogsColumnDataType(attr.DataType)
conditions = append(conditions, fmt.Sprintf("has(%s_%s_key, '%s')", columnType, columnDataType, attr.Key))
} else if attr.Type != v3.AttributeKeyTypeUnspecified {
// for materialzied columns
conditions = append(conditions, fmt.Sprintf("%s_exists`=true", strings.TrimSuffix(getClickhouseColumnName(attr), "`")))
}
}
. This might not be correct in all cases. Say there are two keys service.name and ecs.service.name which are mutually exclusive i.e a log record has either one of them in attributes but not both. If I group by service.name and ecs.service.name it could return no result. Perhaps we should not add key exists condition ourselves and let the user do it if necessary.

@srikanthccv srikanthccv added the bug Something isn't working label Jun 3, 2024
@nityanandagohain
Copy link
Member

So you are saying that if the user doesn't explicitly use the exists then empty values is fine ?

So are with fine with:- If there I group by service.name. And there are n logs with no service name then count of n will be added to empty service name. So it will be difficult to differentiate between empty service name and logs where service.name doesn't exists.

@srikanthccv
Copy link
Member Author

So it will be difficult to differentiate between empty service name and logs where service.name doesn't exists.

In practice, this doesn't happen where service.name (or any attribute) is deliberately set to empty, and it should be distinguished from the attribute's existence. On occasions when it is important, users will need to add the EXISTS filter.

So you are saying that if the user doesn't explicitly use the exists then empty values is fine ?

The current result doesn't display anything and just discards the data, as we observed when trying to group by service.name (the total of each value in the group by doesn't match the total number of log lines). Therefore, showing an empty result is accurate and preferable.

@srikanthccv
Copy link
Member Author

This should be fixed otherwise the charts are just lying.

@srikanthccv srikanthccv added the query-service Regarding query service label Jul 1, 2024
@nityanandagohain
Copy link
Member

sure prioritising this

@srikanthccv
Copy link
Member Author

So it will be difficult to differentiate between empty service name and logs where service.name doesn't exists.

In practice, this doesn't happen where service.name (or any attribute) is deliberately set to empty, and it should be distinguished from the attribute's existence. On occasions when it is important, users will need to add the EXISTS filter.

This is a lazy assumption to make for any data type specifically expecting users to add the filter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working query-service Regarding query service
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants