Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571

andyross · 2023-12-05T02:54:33Z

This series updates the Google RTC Audio Processing module, which has
always been deployed out of platform branches, to run on SOF main.
The tactic taken is a little different: previous integrations have
relied on getting pipeline dependencies to work, where this does a
little trickery at the component level (both in
google_rtc_audio_processing and demux) to control propagation of
command triggers across pipeline boundaries. This should evolve
better with IPC4, which in (almost?) all cases refuses to propagate
those commands anyway.

The main advantage is that the pipelines become decoupled: you can
start an AEC'd capture stream without playback running and launch the
playback DMA later which then integrates with AEC dynamically; you can
shut either down in either order, start the playback before
capture, etc...

A secondary advantage is that, except for one logic fix with reset
propagation that I'm all but certain was a bug, no behavior changes
are needed to the pipeline state logic.

It also ends up a little cleaner and a little faster.

This work was done against a mt8195 device. I'm going to look at MTL
immediately, hopefully nothing will be needed except the format
conversion.

src/audio/google/google_rtc_audio_processing.c

src/audio/mux/mux.c

src/audio/google/google_rtc_audio_processing.c

lgirdwood · 2023-12-05T16:58:34Z

@thesofproject/mediatek pls review.

src/audio/pipeline/pipeline-graph.c

src/audio/google/google_rtc_audio_processing.c

marcinszkudlinski · 2023-12-08T13:43:28Z

I understand this change is soley for IPC3 and legacy platforms.

To merge changes from 007 branch, where all IPC3 have been removed completely from google_rtc_audio_processing.c, we need to rename the current file to _ipc3 OR (maybe better) merge 007 changes to a new file google_rtc_audio_processing_dp.c

andyross · 2023-12-08T18:16:59Z

@marcinszkudlinski I'd like to keep this in one implementation, actually. Other platforms are otherwise trapped in some very old topic branches that will be a pain to maintain. (Also some of the state fixes here are turning out be be needed even with IPC4 I think). Working on merging the two forks right now, should have a new version up over the weekend. I'll submit to both branches, but this will be the one with clean commit messages.

andyross · 2023-12-11T05:17:57Z

Many updates and fork merges from mtl-007-drop-stable, including source/sink rework, s32_le support, DP scheduler support, more cleanup, and a handful of bugs fixed that were present in the earlier versions. Please re-review.

The code as it stands here works for me to get working audio through the full AEC stack on mt8195 and (with one workaround in the last patch) Intel MTL. I'll push a separate patch to the mtl-007-drop-stable branch containing these changes in squashed form.

Squashed fixups to this code from thesofproject#8571 Signed-off-by: Andy Ross <andyross@google.com>

src/audio/pipeline/pipeline-params.c

src/audio/google/google_rtc_audio_processing.c

src/audio/google/google_rtc_audio_processing_mock.c

src/audio/google/google_rtc_audio_processing.c

andyross · 2024-01-10T23:22:13Z

Another spin, now that almost all the split-out PRs have been merged. This is down to one patch now, albeit a rather large one. This will work out of the box to get an AEC build working on MTL. If you want to run it with mt8195 you also need #8691.

I'm pretty sure existing review comments have been addressed, the main ones outstanding are:

Static vs. dynamic allocation in the giant buffer: I feel really strongly that using the heap for this is dangerous. If we need it someday, it's easy to get back. But right now I'm maybe 60% certain that AEC can be made to fail just by doing a lot of pipeline start/stop operations that fragment the heap.
Floating point conversion routines. As mentioned upthread, there just isn't anything in SOF that matches the performance characteristics here. The code here is designed to optimize down to single saturation and conversion instructions in 8-10 instruction inner loops that include de/interleaving inline. The two existing tools (designed for high precision fixed point constant generation and for flat array conversion) just aren't a fit. And AEC is, after all, a performance case.

Anyway, please re-review. It would be good to get this in soon, some more work is starting to back up behind it. (And yes, there's some drama about merge and release strategy vis a vis #8722. But regardless of which merges first, we need to get to a go/no-go decision on this PR).

marc-hb · 2024-01-11T02:58:56Z

@lrudyX , @wszypelt 2 different errors in https://sof-ci.01.org/sof-pr-viewer/#/build/PR8571/build13389086

bad permissions on .ssh/config
all TGL tests failed with "connection refused"

None seem relevant here.

andyross · 2024-01-11T03:57:12Z

Split out one patch that can be fairly clearly explained in isolation.

andyross · 2024-01-11T03:59:09Z

And indeed, the checkpatch issue is a false positive, it doesn't like #define ALWAYS_INLINE inline __attribute__((always_inline)) because it wants us to use __always_inline, which is Linux kernel API that neither SOF nor Zephyr use. In fact Zephyr provides ALWAYS_INLINE for this, which is exactly why this line is here for xtos builds, so they can use the "most reasonably official" API.

wszypelt · 2024-01-11T10:02:49Z

@marc-hb TGL was repaired -> new results should be available within 40 minutes
bad permissions on .ssh/config, -> I need to discuss this with the process team

marcinszkudlinski

One general problem; missing

CONFIG_COMP_GOOGLE_RTC_AUDIO_PROCESSING=y
CONFIG_COMP_STUBS=y

in app/boards/intel_adsp_ace15_mtpm.conf

that means - IPC4 code is never compiled in CI checks

src/audio/google/google_rtc_audio_processing.c

lkoenig · 2024-01-16T08:05:16Z

Static vs. dynamic allocation in the giant buffer: I feel really strongly that using the heap for this is dangerous. If we need it someday, it's easy to get back. But right now I'm maybe 60% certain that AEC can be made to fail just by doing a lot of pipeline start/stop operations that fragment the heap.

I agree with you @andyross . Furthermore I do not see any short or medium term situation where one would want to run 2 AEC. If someone can come up with a valid use-case then maybe we should revisit the static allocation otherwise I do think it is safer.

andyross

Sorry for delays. My ~~lab~~ home has been without power for four days now, so I haven't had access to devices to test with. The power company's web site says they're down to the last 12k/160k of failures though, so hopefully today I'll be able to get back to defrost things. Will have a respin up as soon as possible.

Regarding the static allocation, I'll admit this has been a lot more controversial than I expected. Here's the full argument:

AEC wants an extremely large buffer in a contiguous block, the current one is 200k and Lionel wants more.
Experimentally, that corresponds to about half of the available heap memory on the first stream startup (I could get a ~430k block allocated, IIRC).
Dynamic heaps just aren't going to be reliable when faced with those numbers. I can basically guarantee that after enough overlapping stream operations (especially ones that cross use cases, like opening an HDMI output before closing your headset capture, etc...), you can get the heap into a state where capture won't start because AEC gets a heap failure.
Such an error is fatal, but preventable by just making sure the memory is there to begin with.

Longer term, yes, we will surely want to support more than one AEC instance. But longer term we have better options, like for example integrating the internal AEC allocators with the SOF/Zephyr heap, so it doesn't require the big block. Or alternatively in the MMU future where such a feature will probably be deployed, we'll be able to map a module's .bss virtually, which would be an objectively better choice than fine-grained heap allocation of large blocks.

src/audio/google/google_rtc_audio_processing.c

The pipeline code is complicated, and some trigger events (for example reset of a playback pipeline that was started before capture) cross pipelines from the playback/reference stream into the AEC component unexpectedly. Ignore/halt triggers we get from the reference pipeline. Also work around a bug with some stale code in IPC3 builds that detects the lack of an active reference source as an "error" incorrectly. Signed-off-by: Andy Ross <andyross@google.com>

Lots of work on AEC, with an eye to getting it working on main, where it had bitrotten, and pulling in various features (IPC3 pipeline state management and DP scheduling) that had merged in other branches. Dynamically configure stream formats (sample format/rate and channel count) from the connected streams at prepare() time instead of relying on build-time tuning. Port the code to use the source/sink API, in as sophisticated a manner as I can find. Copies unroll cleanly into just a few instructions per sample, including integer/float conversions and de-/interleaving. Support both 16 and 32 bit sample formats, with a fairly clever inlining scheme to share as much code as possible between them. The component will select "copy" function pointers at prepare() time. This works by using the _float32 variant of the AEC API (which is actually the core internal implementation) instead of the _int16 one which involves a conversion. The large buffers required by this component (input and output staging and an internally-managed pool/heap block) are now static symbols instead of dynamic memory from the heap. These are very large, taking up about half of what is available to the linker on MTL. Relying on heap allocation is just dangerous in this context. This fully decouples AEC from the playback stream. It will run without an active reference happily, feeding zeros to the processing, and pick up in stride when the pipeline starts. Note that this feature is unexercisable on IPC4, where the kernel automatically starts up connected pipelines. Fixes a few bugs and misfeatures also: + Chunk the copies by full buffer strides between AEC processing calls instead of testing at each copied frame. + Copy the reference and mic streams in tandem, preventing them from becoming out of sync if the devices weren't themselves synchronized. + Copy the AEC results to the output stream after the call to ProcessCapture() instead of before. This was a hidden latency bug in the original code, I think. + Cleans up the Kconfig to remove stale variables and guard all the component-specific tunables under the top-level component variable. Also uses a default instead of a select to couple to CONFIG_STUBS, allowing AEC to be manually tuned. Signed-off-by: Andy Ross <andyross@google.com>

Cleaner API. Also avoids the need to specify cache line padding at the end. Signed-off-by: Andy Ross <andyross@google.com>

This feature is already enabled in topology, enable it in the build to match. Signed-off-by: Andy Ross <andyross@google.com>

andyross · 2024-01-18T13:46:51Z

Rebase/respin. Should have addressed the existing review comments (other than the argument about static buffers, obviously)

lkoenig · 2024-02-01T10:33:48Z

Longer term, yes, we will surely want to support more than one AEC instance.

I've thought about that and I can not see a use case for running more than one AEC at the same time so I do not think this is something we should plan to support.

Hence I am arguing that having the AEC memory pre-allocated is good solution as it'll avoid memory fragmentation, it guarantees that one can use the AEC.

cujomalainey · 2024-02-02T18:37:32Z

Longer term, yes, we will surely want to support more than one AEC instance.

I've thought about that and I can not see a use case for running more than one AEC at the same time so I do not think this is something we should plan to support.

Hence I am arguing that having the AEC memory pre-allocated is good solution as it'll avoid memory fragmentation, it guarantees that one can use the AEC.

We should talk about this. I think there are cases we need to consider that might be important. I agree that in the general set of use cases this is true though.

andyross · 2024-03-07T17:33:21Z

This is hopelessly conflicted vs. current main. Closing this PR and opening a new one with re-split changes that can merge over main. That should have addressed existing review comments, but please check.

andyross requested review from cujomalainey and johnylin76 December 5, 2023 02:54

andyross requested review from ranj063, yaochunhung, kuanhsuncheng, a team, fkwasowi, lgirdwood, plbossart, mmaka1, lbetlej, dbaluta and kv2019i as code owners December 5, 2023 02:54

andyross requested a review from bzhg December 5, 2023 03:00

andyross commented Dec 5, 2023

View reviewed changes

src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved

lyakh reviewed Dec 5, 2023

View reviewed changes

src/audio/mux/mux.c Outdated Show resolved Hide resolved

src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved

lkoenig reviewed Dec 5, 2023

View reviewed changes

src/audio/google/google_rtc_audio_processing.c Show resolved Hide resolved

andyross commented Dec 5, 2023

View reviewed changes

src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved

johnylin76 reviewed Dec 5, 2023

View reviewed changes

src/audio/pipeline/pipeline-graph.c Outdated Show resolved Hide resolved

btian1 reviewed Dec 8, 2023

View reviewed changes

src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved

andyross force-pushed the mt8195-aec branch from 1ffe199 to cf1ab40 Compare December 11, 2023 05:13

andyross added a commit to andyross/sof that referenced this pull request Dec 11, 2023

google_rtc_audio_processing: Fixes for MTL branch

259edac

Squashed fixups to this code from thesofproject#8571 Signed-off-by: Andy Ross <andyross@google.com>

andyross mentioned this pull request Dec 11, 2023

[mtl-007] AEC fixes from main branch submission #8603

Closed

lyakh reviewed Dec 11, 2023

View reviewed changes

lkoenig reviewed Dec 11, 2023

View reviewed changes

andyross mentioned this pull request Dec 12, 2023

Update GOOGLE_RTC_AUDIO_PROCESSING for Zephyr #7202

Closed

andyross mentioned this pull request Jan 10, 2024

Google aec sink src dp #8722

Merged

andyross force-pushed the mt8195-aec branch from 9df2893 to bd12891 Compare January 10, 2024 23:22

andyross requested review from lgirdwood, marcinszkudlinski and cujomalainey January 10, 2024 23:22

andyross force-pushed the mt8195-aec branch from bd12891 to 6978c7c Compare January 10, 2024 23:47

andyross force-pushed the mt8195-aec branch from 6978c7c to 0c88ae0 Compare January 11, 2024 03:55

andyross changed the title ~~Google AEC updates with mt8195 integration~~ Google AEC rework: unbreak/resurrect, mt8195, DP integration Jan 11, 2024

marc-hb mentioned this pull request Jan 11, 2024

Firmware recovery thesofproject/linux#1675

Open

marcinszkudlinski reviewed Jan 12, 2024

View reviewed changes

johnylin76 reviewed Jan 15, 2024

View reviewed changes

src/audio/google/google_rtc_audio_processing.c Show resolved Hide resolved

src/audio/google/google_rtc_audio_processing.c Show resolved Hide resolved

andyross commented Jan 17, 2024

View reviewed changes

andyross added 4 commits January 18, 2024 04:47

google_rtc_audio_processing: Specify main buffer size in kb

a9fc0c3

Cleaner API. Also avoids the need to specify cache line padding at the end. Signed-off-by: Andy Ross <andyross@google.com>

boards/intel_adsp_ace15_mtpm: Enable Google AEC

584bfcc

This feature is already enabled in topology, enable it in the build to match. Signed-off-by: Andy Ross <andyross@google.com>

andyross force-pushed the mt8195-aec branch from 0c88ae0 to 584bfcc Compare January 18, 2024 13:46

marc-hb mentioned this pull request Jan 19, 2024

[BUG] FW reported error: 104 - Other failure of module instance initialization request thesofproject/linux#4832

Closed

andyross mentioned this pull request Mar 7, 2024

Google AEC rework: dynamic formats, legacy platform support #8919

Merged

andyross closed this Mar 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571

Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571

andyross commented Dec 5, 2023 •

edited

Loading

lgirdwood commented Dec 5, 2023

marcinszkudlinski commented Dec 8, 2023

andyross commented Dec 8, 2023

andyross commented Dec 11, 2023

andyross commented Jan 10, 2024

marc-hb commented Jan 11, 2024

andyross commented Jan 11, 2024

andyross commented Jan 11, 2024

wszypelt commented Jan 11, 2024

marcinszkudlinski left a comment

lkoenig commented Jan 16, 2024

andyross left a comment

andyross commented Jan 18, 2024

lkoenig commented Feb 1, 2024

cujomalainey commented Feb 2, 2024

andyross commented Mar 7, 2024

Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571

Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571

Conversation

andyross commented Dec 5, 2023 • edited Loading

lgirdwood commented Dec 5, 2023

marcinszkudlinski commented Dec 8, 2023

andyross commented Dec 8, 2023

andyross commented Dec 11, 2023

andyross commented Jan 10, 2024

marc-hb commented Jan 11, 2024

andyross commented Jan 11, 2024

andyross commented Jan 11, 2024

wszypelt commented Jan 11, 2024

marcinszkudlinski left a comment

Choose a reason for hiding this comment

lkoenig commented Jan 16, 2024

andyross left a comment

Choose a reason for hiding this comment

andyross commented Jan 18, 2024

lkoenig commented Feb 1, 2024

cujomalainey commented Feb 2, 2024

andyross commented Mar 7, 2024

andyross commented Dec 5, 2023 •

edited

Loading