Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nv2a: Implement BLEND_AND image blit operation. #570

Closed
wants to merge 2 commits into from

Conversation

abaire
Copy link
Contributor

@abaire abaire commented Nov 24, 2021

CPU-based implementation of BLEND_AND hardware op.

Fixes #324 for opcode 0x02 (which appears to be the case for all known failures)

for (int x = 0; x < bytes_per_row; ++x, ++source_row, ++dest_row) {
uint32_t a = *source_row * beta_mult;
uint32_t b = *dest_row * inv_beta_mult;
*dest_row = (a + b) / max_beta_mult;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you have a test case for image blitting, right? Does your test confirm this behavior? I'm not fully confident about this blending logic and would like to double check if you have it handy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro level behavior does match the hardware (i.e., this is a simple blend with the beta as the source factor). See response to your comment below wrt. the capping, which would change the math if more than 8 bits are utilized.

Copy link
Member

@mborgerson mborgerson Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this does simple blending, but what I'm not sure about is, for instance:

  • whether or not this must take into account source and/or destination alpha component
  • whether or not negative values are actually supported, contrary to previous assumptions

Can you please test this and find out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial findings:

  • As far as I can see, the hardware throws an exception when trying to configure the surface context with an alpha-enabled color format (NV062_SET_COLOR_FORMAT_LE_A8R8G8B8) when using operation BLEND_AND. I've tried both blitting directly to channel 11 (framebuffer) and into an interim memory buffer, both fail with a "format exception error" as reported by the pbkit handler. Interestingly/annoyingly the exception is only raised when the NV09F_SIZE parameter is pushed, but this matches the comment in xemu's pgraph.c - I'm guessing the failure is only detected once the operation is actually kicked off.
  • SRCCOPY seems to ignore the source alpha but works in both XRGB and ARGB surface modes as well as when bouncing through the compositing buffer I tried for the BLEND_AND testing.
  • Using a negative beta (high bit set) does result in a fully transparent image on HW.

I still need to do a bit more work to verify whether the HW actually supports more than 8 bits, will try to get back around to that later today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a bit more time, it looks like anything past 8 bits is indeed ignored. Results from hardware are at https://github.com/abaire/nxdk_pgraph_tests_golden_results

Key tests:

  • beta 0x007FFFFF - all pixels are the same color (the clear color).
  • beta 0x00800000 - most of the boxes from the test image (note, only a subimage from this image is used, slightly larger than the green grid) have slightly different color from the background.
  • 0x44400000 and 0x444FFFFF have 0 pixel differences.

The differences can be confirmed w/ Gimp or imagemagick's compare utility (or presumably any image diff that does pixel comparison and not perceptual).

E.g.: compare -verbose -metric AE ImageBlt_BLENDAND_XRGB_B44400000.png ImageBlt_BLENDAND_XRGB_B444FFFFF.png difference.png

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a bunch more testing while looking into #578.

  • It looks like X8R8G8B8 always sets alpha to 0xFF on hardware. This is true for BLEND_AND and SRCCOPY operations. This means that SRCCOPY is slightly incorrect (I'll just roll that fix into this PR unless you feel strongly about pulling it out).
  • I still have not been able to get a BLEND_AND with an ARGB surface context working, so I'll skip the blend on the alpha channel since it'll be set to 0xFF (or 0x00 for ZRGB, will do that in a followup).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I patch up SRCCOPY after the memmove but added a separate loop for the BLEND_AND to avoid useless divides. Let me know if you'd prefer to just have one patch-after path instead. Given that these ops don't seem to be used often I guess it's arguably premature optimization.

hw/xbox/nv2a/pgraph.c Outdated Show resolved Hide resolved
hw/xbox/nv2a/pgraph.c Outdated Show resolved Hide resolved
hw/xbox/nv2a/pgraph.c Outdated Show resolved Hide resolved
// The parameter is a signed fixed-point number with a sign bit and
// 31 fractional bits. Note that negative values are clamped to 0,
// and only 8 fractional bits are actually implemented in hardware.
beta->beta = parameter & 0x7f800000;
Copy link
Member

@mborgerson mborgerson Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your test confirm these claims? I would like to double check that:

  • Indeed only 8 fractional bits are available
  • Negatives are indeed clamped
  • Sign bit is actually a sign bit or a signed integer component (+/- or 0,-1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually capped the values in the test code as well as I assumed the envytools doc was correct. I'll adjust and re-test (once I'm back with my hardware) and update if necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also as some evidence that the doc may be correct, Dr. Muto uses 0x7d400000 as beta.

nv2a: pgraph method (4): 0x12 -> 0x0300 (0x7d400000)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x7d400000 suggests at least 9 fractional bits

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, good call. I'm going to WIP this until I can update the tests on HW.

hw/xbox/nv2a/pgraph.c Outdated Show resolved Hide resolved
hw/xbox/nv2a/pgraph.c Outdated Show resolved Hide resolved
// not actually dirty and miss regions that are.
memory_region_set_client_dirty(d->vram,
(dest - d->vram_ptr),
image_blit->height
Copy link
Member

@mborgerson mborgerson Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this moved, please update these calls to reflect requested change from your other PR, w.r.t. size of dirtied region and comment, when you rebase this second patch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, will do as soon as the first PR is merged.

@abaire abaire marked this pull request as draft November 25, 2021 01:45
@abaire abaire marked this pull request as ready for review November 29, 2021 22:46
@abaire abaire force-pushed the support_blendand_blit branch 2 times, most recently from 246cee7 to 6443b2a Compare November 30, 2021 14:02
@abaire abaire marked this pull request as draft December 2, 2021 03:07
@abaire abaire marked this pull request as ready for review December 2, 2021 03:33
@mborgerson
Copy link
Member

Thanks!

Merged with #624

@mborgerson mborgerson closed this Jan 8, 2022
@abaire abaire deleted the support_blendand_blit branch January 8, 2022 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unhandled NV_IMAGE_BLIT methods
2 participants