
Implement parallel replaygain analysis #3478

Merged: 20 commits, Dec 14, 2020

Conversation

@ybnd (Contributor) commented Jan 28, 2020

Implements #2224

  • Add `--jobs` or `-j` to `replaygain` to set the pool size

  • Single-threaded execution by default, if `--jobs` is unset

  • If multithreaded, calls `Backend.compute_album_gain` or `Backend.compute_track_gain` asynchronously, with metadata storing/writing in the callback

@sampsyo (Member) left a comment

Awesome; thanks for tackling this!!

I have a few comments in here about how we can reduce some code duplication by introducing a "maybe run in parallel" utility.

Could I ask you to please also add to the documentation for the plugin (and add a changelog entry)?

beetsplug/replaygain.py (two resolved review threads)
```python
self._log.debug(u'done analyzing {0}', item)

try:
    if hasattr(self, 'pool'):
```
@sampsyo (Member):

Perhaps we could simplify these conditionals by introducing a utility like _apply that takes an optional pool argument? If the pool is None, then this utility would just call the function directly. Otherwise, it would use a proper call to run the function asynchronously. Then we can centralize the "maybe do this parallel" logic rather than replicating it for the individual calls.
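
For illustration, a minimal sketch of such a utility, assuming a `multiprocessing.pool.ThreadPool` and the `apply_async` callback style discussed in this thread (the name `_apply` and its exact signature are taken from the suggestion above, not from the PR's final code):

```python
def _apply(pool, func, args, kwds, callback):
    """Run func directly if pool is None; otherwise run it asynchronously.

    Centralizes the "maybe do this in parallel" decision so callers
    don't need their own conditionals.
    """
    if pool is None:
        callback(func(*args, **kwds))
    else:
        pool.apply_async(func, args, kwds, callback)
```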

@ybnd (Contributor, author):

Yeah, that would be much better. Thanks!

@ybnd (Contributor, author):

Implemented in 388d2d2, except for the optional pool argument; the "pool or no pool" check is done in _has_pool instead. The plugin should still have pool as an attribute at import time for parallel execution to work ~ 79c5535.

@@ -1381,17 +1433,30 @@ def commands(self):

```python
def func(lib, opts, args):
    write = ui.should_write(opts.write)
    force = opts.force
    jobs = opts.jobs
```
@sampsyo (Member):

Instead of defaulting to off when jobs = 0, let's use our cpu_count utility:

def cpu_count():
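
A hedged sketch of what a `cpu_count()` helper typically looks like (the real `beets.util.cpu_count` may differ in detail):

```python
import multiprocessing


def cpu_count():
    """Return the number of available CPUs, or 1 if it can't be determined."""
    try:
        return multiprocessing.cpu_count()
    except NotImplementedError:
        return 1
```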

@ybnd (Contributor, author) commented Jan 29, 2020:

Should we do something like `default = math.ceil(cpu_count() / 4)` to keep CPU usage reasonable by default for large queries?

@ybnd (Contributor, author):

And maybe have it default to no parallelism if configured with `threaded: no`?

@sampsyo (Member):

Maybe the best policy would be to match the behavior of the convert plugin:
https://github.com/beetbox/beets/blob/master/beetsplug/convert.py#L119

The option is named threads there, and it does default to the CPU count. I think it's probably OK to default to that? If people want to run in the background without disruption, it's easy to turn down the count…
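
For reference, a hedged sketch of that policy on the plugin side; the option name `threads` and the CPU-count default follow the convert plugin as described above, and `ExamplePlugin` is a placeholder name:

```python
from beets import util
from beets.plugins import BeetsPlugin


class ExamplePlugin(BeetsPlugin):
    def __init__(self):
        super().__init__()
        self.config.add({
            # Default to one worker per CPU, as convert does; users who
            # want an unobtrusive background run can turn this down.
            'threads': util.cpu_count(),
        })
```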

beetsplug/replaygain.py (outdated, resolved review thread)
@ybnd (Contributor, author) commented Jan 30, 2020

Ok, so I remembered why I made it default to non-parallel :)

While it does work with my on-disk library, it fails in test_replaygain.py on my machine, maybe because the test library there is :memory:? For some reason it works in CI though.

The Travis build fails because I'm not propagating exceptions to the main thread. Apparently that's a common issue; I didn't know that...

@sampsyo (Member) left a comment

Looking good! A couple more comments within.

beetsplug/replaygain.py (resolved review thread)
```python
write = ui.should_write(opts.write)
force = opts.force

if opts.threads != 0:
```
@sampsyo (Member):

It seems like this condition might be supposed to be != 1? Or better yet, the condition should come after checking the config file too?

@ybnd (Contributor, author):

  • I wanted to include the option to completely bypass ThreadPool from the CLI with --threads 0; that seems like a useful feature down the line, in case we ever suspect the parallel processing is messing something up.
  • ThreadPool(1) still provides one worker thread that can work asynchronously, which may be useful in some cases? I don't know.
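
A sketch of the thread-count handling being discussed, assuming `--threads 0` means "bypass the pool" (illustrative names, not the PR's exact code):

```python
from multiprocessing.pool import ThreadPool


def open_pool(threads):
    """Return a ThreadPool, or None when parallelism is bypassed.

    threads == 0 skips the pool entirely; threads == 1 still creates a
    single worker thread that runs jobs asynchronously.
    """
    if threads == 0:
        return None
    return ThreadPool(threads)
```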

beetsplug/replaygain.py (outdated, resolved review thread)
@sampsyo (Member) commented Jan 31, 2020

Looking pretty good! I assume we should probably figure out the exception logging thing before merging?

* ExceptionWatcher instance running in parallel to the pool, monitoring a queue for exceptions
* Pooled threads push exceptions to this queue; non-fatal exceptions are logged
* Application exits on a fatal exception in a pooled thread

* More front-end info logs in the CLI
@ybnd (Contributor, author) commented Feb 4, 2020

Added an exception handling thread.

ReplayGainError exceptions raised in the worker threads just get logged, while FatalReplayGainError and other exceptions are re-raised in the main thread.
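
A hedged sketch of that pattern; class and method names are illustrative and may not match the PR's actual ExceptionWatcher:

```python
import queue
import threading


class ExceptionWatcher(threading.Thread):
    """Watch a queue for exceptions pushed by worker threads.

    Non-fatal exceptions can simply be logged by the callback; fatal
    ones can be stored and re-raised in the main thread after join().
    """

    def __init__(self, exc_queue, callback):
        super().__init__(daemon=True)
        self._queue = exc_queue
        self._callback = callback
        self._stop = threading.Event()

    def run(self):
        while not self._stop.is_set():
            try:
                exc = self._queue.get(timeout=0.5)
            except queue.Empty:
                continue
            self._callback(exc)

    def join(self, timeout=None):
        self._stop.set()
        super().join(timeout)
```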

@jtpavlock (Contributor) commented:
Pinging @sampsyo in case he didn't see the last request for review.

Did you have anything else to add to this?

@ybnd (Contributor, author) commented Aug 12, 2020

Fixed the conflicts on my end.

Some tests are failing though, fixing those now.

Comment on lines 1312 to 1319:

```python
def _store(self, item):
    try:
        item.store()
    except OperationalError:
        # test_replaygain.py :memory: library can fail with
        # `sqlite3.OperationalError: no such table: items`
        # but the second attempt succeeds
        item.store()
```
@ybnd (Contributor, author):

As far as I can tell, the exception is never thrown when working with an 'actual' database.

stale bot commented Dec 11, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Dec 11, 2020
@sampsyo (Member) commented Dec 11, 2020

So sorry for the delay on this. Everything looks awesome, for the most part—I have no specific code-level comments on the actual parallelization.

However, I do think we need to do something about the OperationalError handler that @ybnd highlighted. OperationalError is a very generic exception and can arise in lots of cases where we really do want to know something went wrong with the database—for example, if we make an error in generating the SQL query. Is there any way we can move this handler/suppression to the test itself instead of putting it here in the plugin?

stale bot removed the stale label on Dec 11, 2020
@ybnd (Contributor, author) commented Dec 12, 2020

@sampsyo No problem, I know you've been super busy :)

Yeah, that would be way more sensible. I'll look into it!

@ybnd (Contributor, author) left a comment

I've pulled the sqlite3.OperationalError thing into test_replaygain.py as suggested.

Somewhat annoyingly, this whole workaround is not "necessary" for the CI builds, since they just skip all of the real tests...
I double-checked with the master branch and it's this PR that's causing this, for sure.

How do you think I should proceed?

  • Leave it like this and submit an issue (which you could assign to me so I could keep looking into it)
  • Look into it before going forward with this PR

```python
    in test_replaygain.py, so that it can still fail appropriately
    outside of these tests.
    """
    item.store()
```
@ybnd (Contributor, author):

Removed the OperationalError handler and added an explanation of why this seemingly useless method is there.

```python
    item.store()


@patch.object(ReplayGainPlugin, '_store', _store_retry_once)
```
@ybnd (Contributor, author):

As far as I can see there's no way to try/except on a higher level; the "false alarm" happens at every item, so it looks like we still need to ignore it for every item...
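
For context, a hedged reconstruction of the patched-in helper: the name `_store_retry_once` comes from the decorator above, and its body is assumed to mirror the retry logic quoted earlier in the thread:

```python
from sqlite3 import OperationalError


def _store_retry_once(self, item):
    """Store an item, retrying once on a spurious OperationalError.

    The :memory: test library can transiently fail with
    `sqlite3.OperationalError: no such table: items`; the second
    attempt succeeds.
    """
    try:
        item.store()
    except OperationalError:
        item.store()
```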

@sampsyo (Member) commented Dec 14, 2020

Awesome. This is really mysterious, but I like your temporary workaround to make the tests go through. Since you're comfortable with it, I'd like to forge ahead and merge this. Would you mind opening a new issue to get to the bottom of what's going on?

Many thanks once again!!

@skapazzo commented:
This seems to give me problems: replaygain metadata doesn't get written to file on import.

If I import an album with `replaygain: auto: yes` and any backend that has `do_parallel = True`, the replaygain values get calculated and saved in the database, but not written to the files.
If I set `do_parallel = False` for the backend I'm using, the metadata gets written as expected; setting `threads = 0` doesn't help.

@jackwilsdon (Member) commented:
@skapazzo could you please open a new issue for this?

@skapazzo commented:

> @skapazzo could you please open a new issue for this?

Sure, I'll do it straight away.

sampsyo added a commit that referenced this pull request Oct 1, 2022
The parallelism strategy in #3478, in retrospect, used a pretty funky
way to deal with exceptions in the asynchronous work---since
`apply_async` has an `error_callback` parameter that's meant for exactly
this. The problem is that the wrapped function would correctly log the
exception *and then return `None`*, confusing any downstream code.
Instead of just adding `None`-awareness to the callback, let's just
avoid running the callback altogether in the case of an error.
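
A minimal sketch of the `error_callback` approach the commit message describes; `analyze`, `on_success`, and `on_error` are illustrative names:

```python
from multiprocessing.pool import ThreadPool


def analyze(n):
    if n < 0:
        raise ValueError('bad input')
    return n * n


def on_success(value):
    # Only runs when analyze() succeeded, so it never sees a None
    # produced by a swallowed exception.
    print('done:', value)


def on_error(exc):
    # Runs instead of on_success when analyze() raised.
    print('worker failed:', exc)


with ThreadPool(2) as pool:
    pool.apply_async(analyze, (3,), callback=on_success,
                     error_callback=on_error)
    pool.apply_async(analyze, (-1,), callback=on_success,
                     error_callback=on_error)
    pool.close()
    pool.join()
```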
JOJ0 pushed a commit to JOJ0/beets that referenced this pull request Oct 30, 2022, with the same commit message.