-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clp-s: Add support for using "?" in wildcard queries on single-word values (resolves #392). #409
Conversation
@@ -68,7 +68,7 @@ class StringLiteral : public Literal { | |||
m_string_type = LiteralType::VarStringT; | |||
} | |||
|
|||
if (m_v.find('*') != std::string::npos) { | |||
if (m_v.find('*') != std::string::npos || m_v.find('?') != std::string::npos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's wrong in the original code as well, but could we fix this to handle escaped *
and ?
properly as well? This should really be checking just for unescaped *
or ?
, and ignoring them when they're escaped. E.g. var\?
is just a variable string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
@@ -915,7 +915,9 @@ void Output::populate_string_queries(std::shared_ptr<Expression> const& expr) { | |||
} | |||
|
|||
std::unordered_set<int64_t>& matching_vars = m_string_var_match_map[query_string]; | |||
if (query_string.find('*') == std::string::npos) { | |||
if (query_string.find('*') == std::string::npos | |||
&& query_string.find('?') == std::string::npos) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my other comment we should be handling escaping here as well. Specifically if there are no unescaped *
or ?
we should enter this code block, process the string to unescape escaped characters as necessary, then do the precise search.
E.g. \\var\?
should enter this code block and get unescaped to become \var?
before performing exact match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the function wildcard_match_unsafe_case_sensitive
can handle escape characters correctly. The corner case was mentioned above and will be fixed later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but this code block uses m_var_dict->get_entry_matching_value()
which doesn't respect escape characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for the condition here we can add a method like
bool has_unescaped_wildchards(std::string const& query) {
bool escaped{false};
for (size_t i = 0; i < query.size(); ++i) {
if ('*' == query[i] || '?' == query[i]) {
return true;
}
if ('\' == query[i]) {
++i;
}
}
return false;
}
and replace the if condition with false == has_unescaped_wildcards(query)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Maybe just add (fixes #392) to the commit message -- besides that title is good as commit message.
References
#392
Description
This PR adds the feature requested in #392. Now we can use
?
to match an arbitrary single character on a variable field. However, there are still two limitations.?
in the middle return empty results. This should be fixes by clp-s: Handle cases where clp-string query-parsing falls back to decompress + match (fixes #403). #407.?
(*
) and\?
(\*
) cannot work well. This will be fixed later.Validation performed
Compressed the logs below
The queries below all returned correct results