nv2a: Implement BLEND_AND image blit operation. #570

abaire · 2021-11-24T16:11:44Z

CPU-based implementation of BLEND_AND hardware op.

Fixes #324 for opcode 0x02 (which appears to be the case for all known failures)

mborgerson · 2021-11-24T22:43:25Z

hw/xbox/nv2a/pgraph.c

+            for (int x = 0; x < bytes_per_row; ++x, ++source_row, ++dest_row) {
+                uint32_t a = *source_row * beta_mult;
+                uint32_t b = *dest_row * inv_beta_mult;
+                *dest_row = (a + b) / max_beta_mult;


I believe you have a test case for image blitting, right? Does your test confirm this behavior? I'm not fully confident about this blending logic and would like to double check if you have it handy

The macro level behavior does match the hardware (i.e., this is a simple blend with the beta as the source factor). See response to your comment below wrt. the capping, which would change the math if more than 8 bits are utilized.

I agree this does simple blending, but what I'm not sure about is, for instance:

whether or not this must take into account source and/or destination alpha component

whether or not negative values are actually supported, contrary to previous assumptions

Can you please test this and find out

Some initial findings:

As far as I can see, the hardware throws an exception when trying to configure the surface context with an alpha-enabled color format (NV062_SET_COLOR_FORMAT_LE_A8R8G8B8) when using operation BLEND_AND. I've tried both blitting directly to channel 11 (framebuffer) and into an interim memory buffer, both fail with a "format exception error" as reported by the pbkit handler. Interestingly/annoyingly the exception is only raised when the NV09F_SIZE parameter is pushed, but this matches the comment in xemu's pgraph.c - I'm guessing the failure is only detected once the operation is actually kicked off.

SRCCOPY seems to ignore the source alpha but works in both XRGB and ARGB surface modes as well as when bouncing through the compositing buffer I tried for the BLEND_AND testing.

Using a negative beta (high bit set) does result in a fully transparent image on HW.

I still need to do a bit more work to verify whether the HW actually supports more than 8 bits, will try to get back around to that later today.

Found a bit more time, it looks like anything past 8 bits is indeed ignored. Results from hardware are at https://github.com/abaire/nxdk_pgraph_tests_golden_results

Key tests:

beta 0x007FFFFF - all pixels are the same color (the clear color).

beta 0x00800000 - most of the boxes from the test image (note, only a subimage from this image is used, slightly larger than the green grid) have slightly different color from the background.

0x44400000 and 0x444FFFFF have 0 pixel differences.

The differences can be confirmed w/ Gimp or imagemagick's compare utility (or presumably any image diff that does pixel comparison and not perceptual).

E.g.: compare -verbose -metric AE ImageBlt_BLENDAND_XRGB_B44400000.png ImageBlt_BLENDAND_XRGB_B444FFFFF.png difference.png

Did a bunch more testing while looking into #578.

It looks like X8R8G8B8 always sets alpha to 0xFF on hardware. This is true for BLEND_AND and SRCCOPY operations. This means that SRCCOPY is slightly incorrect (I'll just roll that fix into this PR unless you feel strongly about pulling it out).

I still have not been able to get a BLEND_AND with an ARGB surface context working, so I'll skip the blend on the alpha channel since it'll be set to 0xFF (or 0x00 for ZRGB, will do that in a followup).

Updated. I patch up SRCCOPY after the memmove but added a separate loop for the BLEND_AND to avoid useless divides. Let me know if you'd prefer to just have one patch-after path instead. Given that these ops don't seem to be used often I guess it's arguably premature optimization.

hw/xbox/nv2a/pgraph.c

mborgerson · 2021-11-24T23:02:42Z

hw/xbox/nv2a/pgraph.c

+            // The parameter is a signed fixed-point number with a sign bit and
+            // 31 fractional bits. Note that negative values are clamped to 0,
+            // and only 8 fractional bits are actually implemented in hardware.
+            beta->beta = parameter & 0x7f800000;


Does your test confirm these claims? I would like to double check that:

Indeed only 8 fractional bits are available

Negatives are indeed clamped

Sign bit is actually a sign bit or a signed integer component (+/- or 0,-1)

I actually capped the values in the test code as well as I assumed the envytools doc was correct. I'll adjust and re-test (once I'm back with my hardware) and update if necessary.

Also as some evidence that the doc may be correct, Dr. Muto uses 0x7d400000 as beta.

nv2a: pgraph method (4): 0x12 -> 0x0300 (0x7d400000)

0x7d400000 suggests at least 9 fractional bits

Ha, good call. I'm going to WIP this until I can update the tests on HW.

hw/xbox/nv2a/pgraph.c

mborgerson · 2021-11-24T23:39:08Z

hw/xbox/nv2a/pgraph.c

+    // not actually dirty and miss regions that are.
+    memory_region_set_client_dirty(d->vram,
+                                   (dest - d->vram_ptr),
+                                   image_blit->height


Because this moved, please update these calls to reflect requested change from your other PR, w.r.t. size of dirtied region and comment, when you rebase this second patch

Yup, will do as soon as the first PR is merged.

mborgerson · 2022-01-08T09:47:04Z

Thanks!

Merged with #624

mborgerson reviewed Nov 24, 2021

View reviewed changes

abaire force-pushed the support_blendand_blit branch from d5a50f9 to 2ca24c2 Compare November 25, 2021 00:01

abaire mentioned this pull request Nov 25, 2021

Verify clamping behavior for beta factor in BLEND_AND abaire/nxdk_pgraph_tests#2

Closed

abaire marked this pull request as draft November 25, 2021 01:45

abaire marked this pull request as ready for review November 29, 2021 22:46

abaire force-pushed the support_blendand_blit branch 2 times, most recently from 246cee7 to 6443b2a Compare November 30, 2021 14:02

abaire marked this pull request as draft December 2, 2021 03:07

abaire force-pushed the support_blendand_blit branch from 6443b2a to 1ba5cfb Compare December 2, 2021 03:31

abaire marked this pull request as ready for review December 2, 2021 03:33

nv2a: Download destination surface for partial SRCCOPY blits

e4e91ce

abaire force-pushed the support_blendand_blit branch from 1ba5cfb to fc00fe8 Compare January 8, 2022 02:35

nv2a: Implement BLEND_AND image blit operation

e755d93

abaire force-pushed the support_blendand_blit branch from fc00fe8 to e755d93 Compare January 8, 2022 02:42

mborgerson closed this Jan 8, 2022

abaire deleted the support_blendand_blit branch January 8, 2022 16:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nv2a: Implement BLEND_AND image blit operation. #570

nv2a: Implement BLEND_AND image blit operation. #570

abaire commented Nov 24, 2021

mborgerson Nov 24, 2021

abaire Nov 24, 2021

mborgerson Nov 25, 2021 •

edited

Loading

abaire Nov 29, 2021

abaire Nov 29, 2021

abaire Dec 2, 2021

abaire Dec 2, 2021

mborgerson Nov 24, 2021 •

edited

Loading

abaire Nov 24, 2021

abaire Nov 25, 2021

mborgerson Nov 25, 2021

abaire Nov 25, 2021

mborgerson Nov 24, 2021 •

edited

Loading

abaire Nov 24, 2021

mborgerson commented Jan 8, 2022

nv2a: Implement BLEND_AND image blit operation. #570

nv2a: Implement BLEND_AND image blit operation. #570

Conversation

abaire commented Nov 24, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mborgerson Nov 25, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mborgerson Nov 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mborgerson Nov 24, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mborgerson commented Jan 8, 2022

mborgerson Nov 25, 2021 •

edited

Loading

mborgerson Nov 24, 2021 •

edited

Loading

mborgerson Nov 24, 2021 •

edited

Loading