Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query prefixes :~ and := #4251

Merged
merged 2 commits into from
Jan 26, 2022
Merged

Conversation

rcrowell
Copy link
Contributor

Description

Adds 2 new query prefixes to specify exact matches for strings; = for case-sensitive matches and ~ for case-insensitive. For example, the queries ~braid and =Braid both match "Braid" but not "Braids".

I have often wished for a way to do this without having to write a regex on the command line. Perhaps there is a better way than what I am suggesting in this PR, but I wanted to get the ball rolling on this (imo) handy feature.

To Do

  • Documentation. (If you've add a new command-line flag, for example, find the appropriate page under docs/ to describe it.)
  • Changelog. (Add an entry to docs/changelog.rst near the top of the document.)
  • Tests. (Encouraged but not strictly required.)

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truly awesome!! I have thought for a long time that we needed something simple like this, and requests for it come up fairly frequently. (I can't seem to find an issue for such a feature request, but it does come up all the time informally.) The docs are especially clear; I really liked the use of examples.

I have one small technical question within; otherwise, I think this should be good to go.

Comment on lines +184 to +188
search = (self.pattern
.replace('\\', '\\\\')
.replace('%', '\\%')
.replace('_', '\\_'))
clause = self.field + " like ? escape '\\'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate a little bit on why this uses SQLite's LIKE operator instead of just plain =? Maybe I'm missing something, but it seems like that should work without any escaping…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using LIKE to perform a case-insensitive string match, though you are right that there are several ways to achieve that which I can think of:

  1. $field LIKE $value
    • Requires value to be escaped when building the query
    • Match can be satisfied by a COLLATE NOCASE index
  2. $field = $value COLLATE NOCASE
    • Can be applied on a per-clause basis, so WHERE artist = 'braid' COLLATE NOCASE AND album = 'No Coast' works as expected
    • Match can be satisfied by a COLLATE NOCASE index
  3. UPPER($field) = UPPER($value).
    • Simple to understand
    • Not sure how to get sqlite to use an index for this match

Seems like sqlite does not understand how to change the case of non-ascii characters out of the box, so none of these methods will do the right thing for those strings... not sure if this is a deal-breaker? If so, I am sure we can come up with some hack that will work reasonably well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh, sorry for misunderstanding! I got it backward and thought this was the case-sensitive version; of course, that's just plain old MatchQuery in this PR. In that case, you're absolutely right and I don't think there's a strong reason to prefer either of the first two (good point about the index for option 3).

But in that case, is it a bug that string_match uses pattern == value? It should perhaps use pattern.lower() == value.lower() or similar for a similar effect in slow queries.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will just have to live with the consequences for non-ASCII characters. This is already the case for SubstringQuery for similar reasons. It's not great, but the complexity of working around it is also not terribly attractive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, good call on string_match being screwed up. Fixed that (and the test I added which was asserting the incorrect behavior...) in a second commit just now.

Looks like I was the one who got it backwards!

@sampsyo
Copy link
Member

sampsyo commented Jan 26, 2022

Looks perfect. Thanks so much for addressing this with style!

@sampsyo sampsyo merged commit 19e4f41 into beetbox:master Jan 26, 2022
@kergoth
Copy link
Contributor

kergoth commented Mar 16, 2022

@sampsyo Does this now conflict with the fuzzy plugin? Just checking my understanding.

@sampsyo
Copy link
Member

sampsyo commented Mar 16, 2022

Ack, yes, you're absolutely right, @kergoth—at least with the default prefix. (Fortunately, the plugin uses a configurable prefix.) It might be nice to at least document this, although I don't think it's the end of the world in itself.

@kergoth
Copy link
Contributor

kergoth commented Mar 16, 2022

Ack, yes, you're absolutely right, @kergoth—at least with the default prefix. (Fortunately, the plugin uses a configurable prefix.) It might be nice to at least document this, although I don't think it's the end of the world in itself.

Agreed, thanks!

@wisp3rwind
Copy link
Member

This hasn't seen a release yet, so maybe would could still change the prefix? In my opinion, that's preferable compared to the conflict. I don't really have a great suggestion, though... Maybe =~?

@sampsyo
Copy link
Member

sampsyo commented Mar 19, 2022

Hmm, yeah, =~ is a pretty good idea… it makes it more obviously related to the = prefix for exact-match queries. Let's do this!

sampsyo added a commit that referenced this pull request Aug 17, 2022
PR #4251 added exact match queries, which are great, but it was
subsequently pointed out that the `~` query prefix was already in use:
#4251 (comment)

So this changes the prefix from `~` to `=~`. A little longer, but
hopefully it makes the relationship to the similarly-new `=` prefix obvious.
arsaboo added a commit to arsaboo/beets that referenced this pull request Aug 26, 2022
commit e584b04
Merge: 7467bc3 2ebc28d
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 10:44:31 2022 -0700

    Merge pull request beetbox#4199 from jcassette/duplicate

    Allow to configure which fields are used to find duplicates

commit 2ebc28d
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 10:36:40 2022 -0700

    Improve changelog for beetbox#4199

commit 1054b72
Merge: 3c945cb 6e0f7a1
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 10:34:15 2022 -0700

    Merge branch 'master' into duplicate

commit 3c945cb
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 10:31:45 2022 -0700

    Change config key from "single" to "item"

    For consistency with the rest of the terminology in the docs/config.
    Also, correct the documentation (which previously only covered albums).

commit bcc8903
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 10:27:31 2022 -0700

    Refactor query utilities

    We now use somewhat more general query constructors in `dbcore`,
    avoiding the need for somewhat special-purpose `duplicates` methods on
    the model objects.

commit ca38486
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 10:12:47 2022 -0700

    Clarify some control flow

commit 7467bc3
Merge: 6e0f7a1 8cb3143
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 10:01:37 2022 -0700

    Merge pull request beetbox#4450 from beetbox/deprecations

    Resolve some deprecation warnings

commit 8cb3143
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 09:50:53 2022 -0700

    Avoid BeautifulSoup deprecation warning

    The `text` parameter to `SoupStrainer` was renamed to `string` in 2015
    (4.4.0) and started producing a warning this year (4.11.0).
    https://bazaar.launchpad.net/%7Eleonardr/beautifulsoup/bs4/view/head:/CHANGELOG

commit 8c84bae
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 08:18:49 2022 -0700

    Remove `match_querystring` in `responses`

    Quoth the responses documentation:

    > querystring is matched by default

    Not sure how recent this is, unfortunately---but probably 0.17.0, since
    that's the version where `match_querystring` was deprecated.

commit 63b7595
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 08:13:07 2022 -0700

    Remove use of `imp`

    The replacements in `importlib.util` have been available since Python
    3.5.

commit 2c9f699
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 08:06:10 2022 -0700

    Use non-deprecated name for `notify_all`

    `notifyAll` was deprecated in:
    python/cpython#87889

    The new name, `notify_all`, has been available since Python 3.0.

commit 6e0f7a1
Merge: f0a6bbb bf8fbed
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 07:09:12 2022 -0700

    Merge pull request beetbox#4412 from beetbox/album-items

    Document Album.items() / LibModel.items() conflict

commit f0a6bbb
Merge: 40d7fa6 fafddce
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sun Aug 21 07:07:23 2022 -0700

    Merge pull request beetbox#4447 from wisp3rwind/pr_version_regex

    release.py: fix version regex (remove u'' string prefix)

commit bf8fbed
Author: Callum Brown <callum@calcuode.com>
Date:   Sun Aug 21 14:34:18 2022 +0100

    Clarify Album.items() conflict

commit 40d7fa6
Merge: 4761c35 fb9e95b
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sat Aug 20 17:14:02 2022 -0700

    Merge pull request beetbox#4095 from Duncaen/formatted-modify

    Formatted modify and import --set-field.

commit fb9e95b
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sat Aug 20 16:50:20 2022 -0700

    Fix some long lines

commit b207224
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sat Aug 20 16:47:01 2022 -0700

    Further document formatted modify with examples

    I think these can make it clearer why someone would want to use this
    feature. (Part of beetbox#4095.)

commit dad918e
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sat Aug 20 16:43:55 2022 -0700

    Out-of-date changelog fixes

commit 7af40db
Merge: 0456c8f 4761c35
Author: Adrian Sampson <adrian@radbox.org>
Date:   Sat Aug 20 16:37:52 2022 -0700

    Merge branch 'master' into formatted-modify

commit 4761c35
Merge: 18ab441 b7ff616
Author: Benedikt <wisp3rwind@posteo.eu>
Date:   Sat Aug 20 07:33:23 2022 +0200

    Merge pull request beetbox#4395 from clach04/patch-1

    Version bump to 1.6.1

commit fafddce
Author: wisp3rwind <17089248+wisp3rwind@users.noreply.github.com>
Date:   Sat Aug 20 07:30:15 2022 +0200

    release.py: fix version regex (remove u'' string prefix)

commit 18ab441
Merge: 0ae7d66 93725c4
Author: Adrian Sampson <adrian@radbox.org>
Date:   Fri Aug 19 17:54:52 2022 -0700

    Merge pull request beetbox#4444 from BinaryBrain/master

    Add Beetstream in the plugin list

commit 93725c4
Author: Sacha Bron <me@sachabron.ch>
Date:   Sat Aug 20 01:30:38 2022 +0200

    Add Beetstream in the plugin list

commit 0ae7d66
Merge: e995019 32ce44f
Author: Benedikt <wisp3rwind@posteo.eu>
Date:   Thu Aug 18 18:11:03 2022 +0200

    Merge pull request beetbox#4441 from beetbox/exact-prefix

    Change the prefix for exact match queries

commit 32ce44f
Author: Adrian Sampson <adrian@radbox.org>
Date:   Wed Aug 17 16:25:17 2022 -0700

    One more test fix

commit 495c8ac
Author: Adrian Sampson <adrian@radbox.org>
Date:   Wed Aug 17 16:11:16 2022 -0700

    Update exact query prefix tests

commit f71e503
Author: Adrian Sampson <adrian@radbox.org>
Date:   Wed Aug 17 16:05:33 2022 -0700

    Change the prefix for exact match queries

    PR beetbox#4251 added exact match queries, which are great, but it was
    subsequently pointed out that the `~` query prefix was already in use:
    beetbox#4251 (comment)

    So this changes the prefix from `~` to `=~`. A little longer, but
    hopefully it makes the relationship to the similarly-new `=` prefix obvious.

commit e995019
Author: Adrian Sampson <adrian@radbox.org>
Date:   Wed Aug 17 15:55:25 2022 -0700

    Doc tweaks for beetbox#4438

commit fa81d6c
Merge: 6eec17c 6aa9804
Author: Adrian Sampson <adrian@radbox.org>
Date:   Wed Aug 17 15:54:43 2022 -0700

    Merge pull request beetbox#4438 from jaimeMF/singleton_unique_paths

    Add path template "sunique" to disambiguate between singleton tracks

commit 6aa9804
Author: Jaime Marquínez Ferrándiz <jaime.marquinez.ferrandiz@fastmail.net>
Date:   Wed Aug 17 17:03:16 2022 +0200

    Document the %sunique template

commit f641df0
Author: Jaime Marquínez Ferrándiz <jaime.marquinez.ferrandiz@fastmail.net>
Date:   Tue Aug 16 17:54:12 2022 +0200

    Encapsulate common code for the aunique and sunique templates in a single method

commit 8d957f3
Author: Jaime Marquínez Ferrándiz <jaime.marquinez.ferrandiz@fastmail.net>
Date:   Fri Aug 12 14:19:52 2022 +0200

    Add path template "sunique" to disambiguate between singleton tracks

commit 6eec17c
Merge: 1dddcb8 6803ef3
Author: Adrian Sampson <adrian@radbox.org>
Date:   Fri Aug 5 09:15:00 2022 -0400

    Merge pull request beetbox#4433 from vicholp/master

    Fix get item file in web plugin

commit 6803ef3
Author: vicholp <vlinerospardo@gmail.com>
Date:   Wed Aug 3 01:22:45 2022 -0400

    add test to get item file of web plugin

commit fde2ad3
Author: vicholp <vlinerospardo@gmail.com>
Date:   Wed Aug 3 01:22:35 2022 -0400

    fix get item file of web plugin

commit 1cde938
Author: Callum Brown <callum@calcuode.com>
Date:   Tue Jul 12 11:21:52 2022 +0100

    Document Album.items() / LibModel.items() conflict

    Closes: beetbox#4404

commit b7ff616
Author: clach04 <clach04@gmail.com>
Date:   Fri Jul 1 17:51:54 2022 -0700

    Version bump to 1.6.1

    Matche setup.py (package) version

commit bf9bf48
Merge: bcf2e15 10338c2
Author: Julien Cassette <jcassette@users.noreply.github.com>
Date:   Sun Jan 30 16:47:44 2022 +0100

    Merge branch 'master' into duplicate

    # Conflicts:
    #	docs/changelog.rst

commit bcf2e15
Author: Julien Cassette <jcassette@users.noreply.github.com>
Date:   Sun Jan 30 16:38:34 2022 +0100

    Move construct_match_queries() to dbcore.Model

commit 7633465
Author: Julien Cassette <jcassette@users.noreply.github.com>
Date:   Sat Jan 22 22:36:47 2022 +0100

    Add duplicate_keys feature for singletons

commit f50d250
Author: Julien Cassette <jcassette@users.noreply.github.com>
Date:   Sun Jan 2 17:25:30 2022 +0100

    Review duplicate_keys feature

commit 6ce29a6
Author: Julien Cassette <jcassette@users.noreply.github.com>
Date:   Sat Nov 27 14:36:59 2021 +0100

    Allow to use flexible attributes in duplicate_keys

commit 3fdfaaa
Author: Julien Cassette <jcassette@users.noreply.github.com>
Date:   Sun Nov 21 18:41:06 2021 +0100

    Allow to configure which fields are used to find duplicates

commit 0456c8f
Author: Duncan Overbruck <mail@duncano.de>
Date:   Wed Dec 15 14:32:11 2021 +0100

    test multiple items in test_modify_formatted

commit 795bc2e
Author: Duncan Overbruck <mail@duncano.de>
Date:   Wed Dec 15 14:31:15 2021 +0100

    compile modify templates only once

commit a2030d1
Author: Duncan Overbruck <mail@duncano.de>
Date:   Wed Oct 6 15:52:08 2021 +0200

    changelog: import/modify field formatting

commit 5824d46
Author: Duncan Overbruck <mail@duncano.de>
Date:   Wed Oct 6 15:44:12 2021 +0200

    changelog: rewrite permissions cover art change

commit 819ba73
Author: Duncan Overbruck <mail@duncano.de>
Date:   Wed Oct 6 15:40:03 2021 +0200

    allow templates/formatting of set_fields on import

commit 636e36e
Author: Duncan Overbruck <mail@duncano.de>
Date:   Wed Oct 6 15:14:34 2021 +0200

    allow templates/formatting when setting fields with modify
JOJ0 added a commit to JOJ0/beets that referenced this pull request Oct 30, 2022
PR beetbox#4251 added exact match queries, which are great, but it was
subsequently pointed out that the `~` query prefix was already in use:
beetbox#4251 (comment)

So this changes the prefix from `~` to `=~`. A little longer, but
hopefully it makes the relationship to the similarly-new `=` prefix obvious.

# Conflicts:
#	docs/changelog.rst
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants