Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clp-s: Add support for using "?" in wildcard queries on single-word values (resolves #392). #409

Merged
merged 7 commits into from
May 24, 2024

Conversation

wraymo
Copy link
Contributor

@wraymo wraymo commented May 15, 2024

References

#392

Description

This PR adds the feature requested in #392. Now we can use ? to match an arbitrary single character on a variable field. However, there are still two limitations.

Validation performed

Compressed the logs below

{"a": "Hello world?", "b": "wwoorr\lldd", "c":"*world"}
{"a": "Hello *", "b": "?world"}

The queries below all returned correct results

b: \?world
b: ?world
b: ?wo?ld
b: wwoorr\\lldd
c: ?world
c: \*world

@wraymo wraymo changed the title clp-s: Add support for fix wildcard issues clp-s: Add support for "?" wildcard character in a variable field May 15, 2024
@wraymo wraymo marked this pull request as draft May 15, 2024 21:06
@wraymo wraymo changed the title clp-s: Add support for "?" wildcard character in a variable field clp-s: Add support for queries on a variable field with "?" wildcard character. May 15, 2024
@wraymo wraymo changed the title clp-s: Add support for queries on a variable field with "?" wildcard character. clp-s: Add support for wildcard queries using "?" on a variable field. May 15, 2024
@wraymo wraymo marked this pull request as ready for review May 16, 2024 13:46
@@ -68,7 +68,7 @@ class StringLiteral : public Literal {
m_string_type = LiteralType::VarStringT;
}

if (m_v.find('*') != std::string::npos) {
if (m_v.find('*') != std::string::npos || m_v.find('?') != std::string::npos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's wrong in the original code as well, but could we fix this to handle escaped * and ? properly as well? This should really be checking just for unescaped * or ?, and ignoring them when they're escaped. E.g. var\? is just a variable string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right

@@ -915,7 +915,9 @@ void Output::populate_string_queries(std::shared_ptr<Expression> const& expr) {
}

std::unordered_set<int64_t>& matching_vars = m_string_var_match_map[query_string];
if (query_string.find('*') == std::string::npos) {
if (query_string.find('*') == std::string::npos
&& query_string.find('?') == std::string::npos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my other comment we should be handling escaping here as well. Specifically if there are no unescaped * or ? we should enter this code block, process the string to unescape escaped characters as necessary, then do the precise search.

E.g. \\var\? should enter this code block and get unescaped to become \var? before performing exact match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the function wildcard_match_unsafe_case_sensitive can handle escape characters correctly. The corner case was mentioned above and will be fixed later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but this code block uses m_var_dict->get_entry_matching_value() which doesn't respect escape characters.

Copy link
Contributor

@gibber9809 gibber9809 May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for the condition here we can add a method like

bool has_unescaped_wildchards(std::string const& query) {
    bool escaped{false};
    for (size_t i = 0; i < query.size(); ++i) {
        if ('*' == query[i] || '?' == query[i]) {
            return true;
        }
        if ('\' == query[i]) {
            ++i;
        }
    }
    return false;
}

and replace the if condition with false == has_unescaped_wildcards(query)?

@wraymo wraymo requested a review from gibber9809 May 16, 2024 20:39
Copy link
Contributor

@gibber9809 gibber9809 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Maybe just add (fixes #392) to the commit message -- besides that title is good as commit message.

@kirkrodrigues
Copy link
Member

kirkrodrigues commented May 24, 2024

How about:

clp-s: Add support for using "?" in wildcard queries on single-word values (fixes #392).

Or since #392 is a feature request rather than a bug, we could use "resolves #392".

@wraymo wraymo merged commit fa60ca5 into y-scope:main May 24, 2024
11 checks passed
@wraymo wraymo changed the title clp-s: Add support for wildcard queries using "?" on a variable field. clp-s: Add support for using "?" in wildcard queries on single-word values (resolves #392). May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants