-
-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nv2a: Implement BLEND_AND image blit operation. #570
Conversation
hw/xbox/nv2a/pgraph.c
Outdated
for (int x = 0; x < bytes_per_row; ++x, ++source_row, ++dest_row) { | ||
uint32_t a = *source_row * beta_mult; | ||
uint32_t b = *dest_row * inv_beta_mult; | ||
*dest_row = (a + b) / max_beta_mult; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you have a test case for image blitting, right? Does your test confirm this behavior? I'm not fully confident about this blending logic and would like to double check if you have it handy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro level behavior does match the hardware (i.e., this is a simple blend with the beta as the source factor). See response to your comment below wrt. the capping, which would change the math if more than 8 bits are utilized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this does simple blending, but what I'm not sure about is, for instance:
- whether or not this must take into account source and/or destination alpha component
- whether or not negative values are actually supported, contrary to previous assumptions
Can you please test this and find out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial findings:
- As far as I can see, the hardware throws an exception when trying to configure the surface context with an alpha-enabled color format (
NV062_SET_COLOR_FORMAT_LE_A8R8G8B8
) when using operationBLEND_AND
. I've tried both blitting directly to channel 11 (framebuffer) and into an interim memory buffer, both fail with a "format exception error" as reported by the pbkit handler. Interestingly/annoyingly the exception is only raised when theNV09F_SIZE
parameter is pushed, but this matches the comment in xemu's pgraph.c - I'm guessing the failure is only detected once the operation is actually kicked off. SRCCOPY
seems to ignore the source alpha but works in both XRGB and ARGB surface modes as well as when bouncing through the compositing buffer I tried for theBLEND_AND
testing.- Using a negative beta (high bit set) does result in a fully transparent image on HW.
I still need to do a bit more work to verify whether the HW actually supports more than 8 bits, will try to get back around to that later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a bit more time, it looks like anything past 8 bits is indeed ignored. Results from hardware are at https://github.com/abaire/nxdk_pgraph_tests_golden_results
Key tests:
- beta
0x007FFFFF
- all pixels are the same color (the clear color). - beta
0x00800000
- most of the boxes from the test image (note, only a subimage from this image is used, slightly larger than the green grid) have slightly different color from the background. 0x44400000
and0x444FFFFF
have 0 pixel differences.
The differences can be confirmed w/ Gimp or imagemagick's compare
utility (or presumably any image diff that does pixel comparison and not perceptual).
E.g.: compare -verbose -metric AE ImageBlt_BLENDAND_XRGB_B44400000.png ImageBlt_BLENDAND_XRGB_B444FFFFF.png difference.png
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a bunch more testing while looking into #578.
- It looks like
X8R8G8B8
always sets alpha to0xFF
on hardware. This is true forBLEND_AND
andSRCCOPY
operations. This means thatSRCCOPY
is slightly incorrect (I'll just roll that fix into this PR unless you feel strongly about pulling it out). - I still have not been able to get a
BLEND_AND
with anARGB
surface context working, so I'll skip the blend on the alpha channel since it'll be set to 0xFF (or 0x00 forZRGB
, will do that in a followup).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. I patch up SRCCOPY
after the memmove
but added a separate loop for the BLEND_AND
to avoid useless divides. Let me know if you'd prefer to just have one patch-after path instead. Given that these ops don't seem to be used often I guess it's arguably premature optimization.
// The parameter is a signed fixed-point number with a sign bit and | ||
// 31 fractional bits. Note that negative values are clamped to 0, | ||
// and only 8 fractional bits are actually implemented in hardware. | ||
beta->beta = parameter & 0x7f800000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your test confirm these claims? I would like to double check that:
- Indeed only 8 fractional bits are available
- Negatives are indeed clamped
- Sign bit is actually a sign bit or a signed integer component (+/- or 0,-1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually capped the values in the test code as well as I assumed the envytools doc was correct. I'll adjust and re-test (once I'm back with my hardware) and update if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also as some evidence that the doc may be correct, Dr. Muto uses 0x7d400000
as beta.
nv2a: pgraph method (4): 0x12 -> 0x0300 (0x7d400000)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0x7d400000
suggests at least 9 fractional bits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha, good call. I'm going to WIP this until I can update the tests on HW.
hw/xbox/nv2a/pgraph.c
Outdated
// not actually dirty and miss regions that are. | ||
memory_region_set_client_dirty(d->vram, | ||
(dest - d->vram_ptr), | ||
image_blit->height |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this moved, please update these calls to reflect requested change from your other PR, w.r.t. size of dirtied region and comment, when you rebase this second patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, will do as soon as the first PR is merged.
d5a50f9
to
2ca24c2
Compare
246cee7
to
6443b2a
Compare
6443b2a
to
1ba5cfb
Compare
1ba5cfb
to
fc00fe8
Compare
fc00fe8
to
e755d93
Compare
Thanks! Merged with #624 |
CPU-based implementation of BLEND_AND hardware op.
Fixes #324 for opcode 0x02 (which appears to be the case for all known failures)