Add query prefixes :~ and := #4251

Merged: 2 commits merged on Jan 26, 2022

17 changes: 17 additions & 0 deletions beets/dbcore/query.py
@@ -177,6 +177,23 @@ def string_match(cls, pattern, value):
        raise NotImplementedError()


class StringQuery(StringFieldQuery):
    """A query that matches a whole string in a specific item field."""

    def col_clause(self):
        search = (self.pattern
                  .replace('\\', '\\\\')
                  .replace('%', '\\%')
                  .replace('_', '\\_'))
        clause = self.field + " like ? escape '\\'"
Comment on lines +184 to +188
Member

Could you elaborate a little bit on why this uses SQLite's LIKE operator instead of just plain =? Maybe I'm missing something, but it seems like that should work without any escaping…

Contributor Author

I am using LIKE to perform a case-insensitive string match, though you are right that there are other ways to achieve that. These are the ones I can think of:

  1. $field LIKE $value
    • Requires value to be escaped when building the query
    • Match can be satisfied by a COLLATE NOCASE index
  2. $field = $value COLLATE NOCASE
    • Can be applied on a per-clause basis, so WHERE artist = 'braid' COLLATE NOCASE AND album = 'No Coast' works as expected
    • Match can be satisfied by a COLLATE NOCASE index
  3. UPPER($field) = UPPER($value)
    • Simple to understand
    • Not sure how to get SQLite to use an index for this match
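The three options can be compared side by side with Python's standard `sqlite3` module. This is a minimal sketch, assuming an in-memory table with made-up `artist` and `album` rows (not beets' schema); `like_escape` is a hypothetical helper that copies the escaping chain from `col_clause`.

```python
import sqlite3


def like_escape(pattern):
    # Same escaping chain as StringQuery.col_clause: the escape character
    # itself first, then LIKE's two wildcards.
    return (pattern
            .replace('\\', '\\\\')
            .replace('%', '\\%')
            .replace('_', '\\_'))


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (artist TEXT, album TEXT)')
conn.executemany('INSERT INTO items VALUES (?, ?)', [
    ('Braid', 'No Coast'),
    ('BRAID', 'no coast'),
    ('Braid Tribute', 'No Coast'),
])

# 1. LIKE with an escaped parameter: whole-string, ASCII case-insensitive.
opt1 = conn.execute(
    "SELECT artist, album FROM items "
    "WHERE artist LIKE ? ESCAPE '\\' ORDER BY rowid",
    (like_escape('braid'),)).fetchall()

# 2. = with a per-clause COLLATE NOCASE; the album comparison has no
#    explicit collation, so it stays case-sensitive (BINARY).
opt2 = conn.execute(
    "SELECT artist, album FROM items "
    "WHERE artist = ? COLLATE NOCASE AND album = ? ORDER BY rowid",
    ('braid', 'No Coast')).fetchall()

# 3. Compare upper-cased values on both sides.
opt3 = conn.execute(
    "SELECT artist, album FROM items "
    "WHERE UPPER(artist) = UPPER(?) ORDER BY rowid",
    ('braid',)).fetchall()

print(opt1)
print(opt2)
print(opt3)
```

Options 1 and 3 return both spellings of Braid, while option 2 also requires the album to match `No Coast` exactly, which illustrates the per-clause behaviour described for `COLLATE NOCASE`.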

It seems SQLite does not know how to change the case of non-ASCII characters out of the box, so none of these methods will do the right thing for those strings. I'm not sure whether that is a deal-breaker; if it is, I am sure we can come up with some hack that will work reasonably well.
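The non-ASCII limitation can be seen by asking SQLite directly. This sketch assumes a stock SQLite build without the ICU extension. On such builds, built-in case folding covers only ASCII letters, so the second query typically returns 0 and `UPPER` leaves the `ö` untouched, while Python's `str.lower()` folds both spellings to the same value.

```python
import sqlite3

conn = sqlite3.connect(':memory:')

# ASCII letters: SQLite's built-in LIKE ignores case.
ascii_like = conn.execute("SELECT 'air' LIKE 'AIR'").fetchone()[0]

# Non-ASCII letters: on builds without the ICU extension, SQLite only
# folds ASCII, so neither this LIKE nor UPPER() handles the 'ö'.
non_ascii_like = conn.execute("SELECT 'björk' LIKE 'BJÖRK'").fetchone()[0]
non_ascii_upper = conn.execute("SELECT UPPER('björk')").fetchone()[0]

# Python's str.lower() is Unicode-aware, so a slow-path comparison
# treats the two spellings as equal.
python_match = 'björk'.lower() == 'BJÖRK'.lower()

print(ascii_like, non_ascii_like, non_ascii_upper, python_match)
```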

Member

Ohhhh, sorry for the misunderstanding! I got it backward and thought this was the case-sensitive version; of course, that's just plain old MatchQuery in this PR. In that case, you're absolutely right, and I don't think there's a strong reason to prefer either of the first two (good point about the index for option 3).

But in that case, is it a bug that string_match uses pattern == value? It should perhaps use pattern.lower() == value.lower() or something similar, so that slow queries behave the same way.
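The concern is that a query evaluated on the slow path (in Python) should agree with what the SQL clause would have decided. The sketch below uses hypothetical helpers (`sql_match`, `slow_match_old`, `slow_match_fixed`) and asks an in-memory SQLite database to evaluate the escaped `LIKE` for a single pair of strings.

```python
import sqlite3


def sql_match(pattern, value):
    # Fast path: what the escaped LIKE clause decides inside SQLite.
    escaped = (pattern.replace('\\', '\\\\')
               .replace('%', '\\%')
               .replace('_', '\\_'))
    conn = sqlite3.connect(':memory:')
    row = conn.execute("SELECT ? LIKE ? ESCAPE '\\'",
                       (value, escaped)).fetchone()
    conn.close()
    return bool(row[0])


def slow_match_old(pattern, value):
    # Original slow-path comparison: case-sensitive.
    return pattern == value


def slow_match_fixed(pattern, value):
    # Suggested fix: fold case on both sides.
    return pattern.lower() == value.lower()


cases = [('rock', 'Rock'), ('rock', 'rock'), ('rock', 'Hard Rock')]
for pattern, value in cases:
    print(pattern, value, sql_match(pattern, value),
          slow_match_old(pattern, value), slow_match_fixed(pattern, value))
```

For `('rock', 'Rock')` the SQL clause matches but the old comparison does not, so the same query could return different items depending on which path evaluates it.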

Member

I think we will just have to live with the consequences for non-ASCII characters. This is already the case for SubstringQuery for similar reasons. It's not great, but the complexity of working around it is also not terribly attractive.

Contributor Author

Yep, good call on string_match being screwed up. I fixed that (and the test I added, which was asserting the incorrect behavior...) in a second commit just now.

Looks like I was the one who got it backwards!

        subvals = [search]
        return clause, subvals

    @classmethod
    def string_match(cls, pattern, value):
        return pattern.lower() == value.lower()


class SubstringQuery(StringFieldQuery):
    """A query that matches a substring in a specific item field."""

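The escaping in `col_clause` matters because `%` and `_` are wildcards in `LIKE`. This is a minimal sketch, assuming a throwaway in-memory table and a hypothetical `whole_string` helper, that compares an unescaped pattern with one escaped the same way as `StringQuery.col_clause`:

```python
import sqlite3


def escape_like(pattern):
    # Mirrors StringQuery.col_clause: backslash first, so the escapes
    # added in front of % and _ are not doubled afterwards.
    return (pattern
            .replace('\\', '\\\\')
            .replace('%', '\\%')
            .replace('_', '\\_'))


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (artist TEXT)')
conn.executemany('INSERT INTO items VALUES (?)',
                 [('AC/DC',), ('AC_DC',), ('ac_dc',), ('100%',), ('1000',)])


def whole_string(pattern, escape=True):
    # Run a whole-string LIKE query, with or without escaping.
    if escape:
        sql = ("SELECT artist FROM items "
               "WHERE artist LIKE ? ESCAPE '\\' ORDER BY rowid")
        args = (escape_like(pattern),)
    else:
        sql = "SELECT artist FROM items WHERE artist LIKE ? ORDER BY rowid"
        args = (pattern,)
    return [row[0] for row in conn.execute(sql, args)]


print(whole_string('AC_DC', escape=False))  # _ acts as a wildcard
print(whole_string('AC_DC'))                # _ is a literal underscore
print(whole_string('100%', escape=False))   # % acts as a wildcard
print(whole_string('100%'))                 # % is a literal percent sign
```

Escaping the backslash first is deliberate: doing it last would turn the `\%` and `\_` just produced into `\\%` and `\\_`, which LIKE reads as a literal backslash followed by a live wildcard.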
6 changes: 5 additions & 1 deletion beets/library.py
@@ -1385,7 +1385,11 @@ def parse_query_parts(parts, model_cls):
     special path query detection.
     """
     # Get query types and their prefix characters.
-    prefixes = {':': dbcore.query.RegexpQuery}
+    prefixes = {
+        ':': dbcore.query.RegexpQuery,
+        '~': dbcore.query.StringQuery,
+        '=': dbcore.query.MatchQuery,
+    }
     prefixes.update(plugins.queries())
 
     # Special-case path-like queries, which are non-field queries
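With the new entries, the first character of a term's pattern selects the query class. The following is a simplified, hypothetical parser, not beets' actual parsing code, that sketches this dispatch. The stand-in classes, the `key:` regex and `parse_part` are all assumptions, and negation and path queries are left out.

```python
import re


class SubstringQuery:
    """Stand-in for dbcore.query.SubstringQuery (the default)."""


class RegexpQuery:
    """Stand-in for dbcore.query.RegexpQuery."""


class StringQuery:
    """Stand-in for dbcore.query.StringQuery."""


class MatchQuery:
    """Stand-in for dbcore.query.MatchQuery."""


PREFIXES = {
    ':': RegexpQuery,
    '~': StringQuery,
    '=': MatchQuery,
}

# Optional "key:" followed by the rest of the term.
PART_RE = re.compile(r'(?:(\w+):)?(.*)')


def parse_part(part, prefixes=PREFIXES, default=SubstringQuery):
    """Return (key, pattern, query_class) for one query term."""
    key, term = PART_RE.match(part).groups()
    for prefix, query_class in prefixes.items():
        if term.startswith(prefix):
            # Strip the prefix from the pattern passed to the query.
            return key, term[len(prefix):], query_class
    return key, term, default


for part in ['artist:air', 'artist:~air', 'genre:=rock', '=rock',
             'artist::Ann(a|ie)', ':x$']:
    print(part, parse_part(part))
```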
1 change: 1 addition & 0 deletions docs/changelog.rst
@@ -11,6 +11,7 @@ New features:
* :doc:`/plugins/kodiupdate`: Now supports multiple kodi instances
  :bug:`4101`
* Add the item fields ``bitrate_mode``, ``encoder_info`` and ``encoder_settings``.
* Add query prefixes ``=`` and ``~``.

Bug fixes:

37 changes: 34 additions & 3 deletions docs/reference/query.rst
@@ -93,14 +93,45 @@ backslashes are not part of beets' syntax; I'm just using the escaping
functionality of my shell (bash or zsh, for instance) to pass ``the rebel`` as a
single argument instead of two.

Exact Matches
-------------

While ordinary queries perform *substring* matches, beets can also match whole
strings by adding either ``=`` (case-sensitive) or ``~`` (ignore case) after the
field name's colon and before the expression::

    $ beet list artist:air
    $ beet list artist:~air
    $ beet list artist:=AIR

The first query is a simple substring one that returns tracks by Air, AIR, and
Air Supply. The second query returns tracks by Air and AIR, since both are a
case-insensitive match for the entire expression, but does not return anything
by Air Supply. The third query, which requires a case-sensitive exact match,
returns tracks by AIR only.

Exact matches may be performed on phrases as well::

    $ beet list artist:~"dave matthews"
    $ beet list artist:="Dave Matthews"

Both of these queries return tracks by Dave Matthews, but not by Dave Matthews
Band.

To search for exact matches across *all* fields, just prefix the expression with
a single ``=`` or ``~``::

    $ beet list ~crash
    $ beet list ="American Football"

.. _regex:

Regular Expressions
-------------------

-While ordinary keywords perform simple substring matches, beets also supports
-regular expression matching for more advanced queries. To run a regex query, use
-an additional ``:`` between the field name and the expression::
+In addition to simple substring and exact matches, beets also supports regular
+expression matching for more advanced queries. To run a regex query, use an
+additional ``:`` between the field name and the expression::

    $ beet list "artist::Ann(a|ie)"

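The results promised in the new "Exact Matches" section can be checked against a plain-Python model of the three match modes. This is a sketch only. The function names are hypothetical, and the model ignores SQLite's ASCII-only case folding, which the real `~` query inherits.

```python
ARTISTS = ['Air', 'AIR', 'Air Supply', 'Dave Matthews', 'Dave Matthews Band']


def substring(pattern, value):
    # Plain keyword: case-insensitive substring match.
    return pattern.lower() in value.lower()


def exact_nocase(pattern, value):
    # "~" prefix: whole-string match, ignoring case.
    return pattern.lower() == value.lower()


def exact(pattern, value):
    # "=" prefix: whole-string, case-sensitive match.
    return pattern == value


def select(matcher, pattern):
    return [a for a in ARTISTS if matcher(pattern, a)]


print(select(substring, 'air'))                # artist:air
print(select(exact_nocase, 'air'))             # artist:~air
print(select(exact, 'AIR'))                    # artist:=AIR
print(select(exact_nocase, 'dave matthews'))   # artist:~"dave matthews"
print(select(exact, 'Dave Matthews'))          # artist:="Dave Matthews"
```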
52 changes: 52 additions & 0 deletions test/test_query.py
@@ -94,16 +94,19 @@ def setUp(self):
        items[0].album = 'baz'
        items[0].year = 2001
        items[0].comp = True
        items[0].genre = 'rock'
        items[1].title = 'baz qux'
        items[1].artist = 'two'
        items[1].album = 'baz'
        items[1].year = 2002
        items[1].comp = True
        items[1].genre = 'Rock'
        items[2].title = 'beets 4 eva'
        items[2].artist = 'three'
        items[2].album = 'foo'
        items[2].year = 2003
        items[2].comp = False
        items[2].genre = 'Hard Rock'
        for item in items:
            self.lib.add(item)
        self.album = self.lib.add_album(items[:2])
@@ -132,6 +135,22 @@ def test_get_one_keyed_term(self):
        results = self.lib.items(q)
        self.assert_items_matched(results, ['baz qux'])

    def test_get_one_keyed_exact(self):
        q = 'genre:=rock'
        results = self.lib.items(q)
        self.assert_items_matched(results, ['foo bar'])
        q = 'genre:=Rock'
        results = self.lib.items(q)
        self.assert_items_matched(results, ['baz qux'])
        q = 'genre:="Hard Rock"'
        results = self.lib.items(q)
        self.assert_items_matched(results, ['beets 4 eva'])

    def test_get_one_keyed_exact_nocase(self):
        q = 'genre:~"hard rock"'
        results = self.lib.items(q)
        self.assert_items_matched(results, ['beets 4 eva'])

    def test_get_one_keyed_regexp(self):
        q = 'artist::t.+r'
        results = self.lib.items(q)
@@ -142,6 +161,16 @@ def test_get_one_unkeyed_term(self):
        results = self.lib.items(q)
        self.assert_items_matched(results, ['beets 4 eva'])

    def test_get_one_unkeyed_exact(self):
        q = '=rock'
        results = self.lib.items(q)
        self.assert_items_matched(results, ['foo bar'])

    def test_get_one_unkeyed_exact_nocase(self):
        q = '~"hard rock"'
        results = self.lib.items(q)
        self.assert_items_matched(results, ['beets 4 eva'])

    def test_get_one_unkeyed_regexp(self):
        q = ':x$'
        results = self.lib.items(q)
@@ -159,6 +188,11 @@ def test_invalid_key(self):
        # objects.
        self.assert_items_matched(results, [])

    def test_get_no_matches_exact(self):
        q = 'genre:="hard rock"'
        results = self.lib.items(q)
        self.assert_items_matched(results, [])

    def test_term_case_insensitive(self):
        q = 'oNE'
        results = self.lib.items(q)
@@ -182,6 +216,14 @@ def test_key_case_insensitive(self):
        results = self.lib.items(q)
        self.assert_items_matched(results, ['beets 4 eva'])

    def test_keyed_matches_exact_nocase(self):
        q = 'genre:~rock'
        results = self.lib.items(q)
        self.assert_items_matched(results, [
            'foo bar',
            'baz qux',
        ])

    def test_unkeyed_term_matches_multiple_columns(self):
        q = 'baz'
        results = self.lib.items(q)
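The keyed and unkeyed tests above assume that a keyed query looks at one field, while an unkeyed query matches an item when any of its fields matches. The sketch below models that with a reduced copy of the fixture (only the `title`, `album` and `genre` values visible in this diff) and hypothetical matcher functions. beets itself decides which fields an unkeyed query searches; this sketch simply checks every field in each dict.

```python
# Reduced copy of the fixture: only string fields visible in the diff.
ITEMS = [
    {'title': 'foo bar', 'album': 'baz', 'genre': 'rock'},
    {'title': 'baz qux', 'album': 'baz', 'genre': 'Rock'},
    {'title': 'beets 4 eva', 'album': 'foo', 'genre': 'Hard Rock'},
]


def exact(pattern, value):
    # "=" prefix: case-sensitive whole-string match.
    return pattern == value


def exact_nocase(pattern, value):
    # "~" prefix: whole-string match ignoring case.
    return pattern.lower() == value.lower()


def keyed(matcher, field, pattern):
    # A keyed query only looks at the named field.
    return [i['title'] for i in ITEMS if matcher(pattern, i[field])]


def unkeyed(matcher, pattern):
    # An unkeyed query matches if any field of the item matches.
    return [i['title'] for i in ITEMS
            if any(matcher(pattern, v) for v in i.values())]


print(unkeyed(exact, 'rock'))               # =rock
print(unkeyed(exact_nocase, 'hard rock'))   # ~"hard rock"
print(keyed(exact_nocase, 'genre', 'rock')) # genre:~rock
print(keyed(exact, 'genre', 'hard rock'))   # genre:="hard rock"
```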
@@ -350,6 +392,16 @@ def test_substring_match_non_string_value(self):
        q = dbcore.query.SubstringQuery('disc', '6')
        self.assertTrue(q.match(self.item))

    def test_exact_match_nocase_positive(self):
        q = dbcore.query.StringQuery('genre', 'the genre')
        self.assertTrue(q.match(self.item))
        q = dbcore.query.StringQuery('genre', 'THE GENRE')
        self.assertTrue(q.match(self.item))

    def test_exact_match_nocase_negative(self):
        q = dbcore.query.StringQuery('genre', 'genre')
        self.assertFalse(q.match(self.item))

    def test_year_match_positive(self):
        q = dbcore.query.NumericQuery('year', '1')
        self.assertTrue(q.match(self.item))