Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible speedups #53

Closed
wants to merge 23 commits into from
Closed

Possible speedups #53

wants to merge 23 commits into from

Conversation

JoeMatt
Copy link
Collaborator

@JoeMatt JoeMatt commented Sep 14, 2021

I'm making a PR for some old optimizations I made for an iOS fork.

This PR is more for the devs to test and cherry-pick, I don't expect this code to be "merge quality", but certainly buildable/testable at least.

For instance, probably don't want these hacks that were specific to iOS loading,
811e73f

The blitter really is the slowest part. I wanted to write something with SIMD or better vectorization or improved memcpy or something. That was by far the slowest part when I profiled this pretty heavily a while ago (at least it was excruciatingly slow on ARM without my C struct hacks to help the compiler optimize better.


Update:
Forgot to mention I have these macros defined in my build (in XCode) that need to be added to a header or C compile flags

C

#define INLINE=inline 
#define USE_STRUCTS=1

flags

-DINLINE=inline -DUSE_STRUCTS=1 -D__GCCUNIX__  

Inline macro may need to be customized based on arch. Mac OS is using Clang/C99/gnu++11 syntax for inline.

I also have unroll loops aka -qunroll=no but i forget why.

@lgtm-com
Copy link

lgtm-com bot commented Sep 14, 2021

This pull request fixes 3 alerts when merging 1e881fa into 2069160 - view on LGTM.com

fixed alerts:

  • 2 for Comparison result is always the same
  • 1 for Wrong type of arguments to formatting function

@inactive123
Copy link
Contributor

Both Wolfenstein 3D and Doom don't start up for me with this PR on Windows 10 (64bit build).

@inactive123
Copy link
Contributor

inactive123 commented Sep 19, 2021

I don't understand why audio_batch_cb is commented out too. This means there will be no sound at all in the libretro core.

if this is something you ported from Provenance and did not test in RetroArch with the libretro core, that isn't going to do us much good. We'd need a backport of the changes you made then to the core and to ensure things still work.

@inactive123
Copy link
Contributor

That being said, there are some good ideas in here.

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Sep 19, 2021

That being said, there are some good ideas in here.

Thanks!

This was being used in Provenance and we have our own video and audio pipelines.

This PR was never intended for mass consumption but rather for me to get it to work well in Prov and back port to upstream.

But as you know, there really isn't a true upstream (my original work wasn't from retroarch's fork but the original .Zip source).

So any things that don't make sense are due to that history. I later back ported to your fork but a lot of hacks and such for my specific purposes still exist, which is why i wanted to state that in the original description.

I felt bad keeping all these changes to myself after seeing other wanting this core to make progress so as much as I myself hate when people do a dump of code "here's my amazing but difficult to read changes, bye", at the same time, I have very little time to focus on this as much as I would love to so I figured it was better to submit in the present state and hopefully someone else more talented and with the time can take some value from this work.

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Sep 19, 2021

Both Wolfenstein 3D and Doom don't start up for me with this PR on Windows 10 (64bit build).

I'm shocked this builds on Windows and you got it to load at least!

Some ROMs work better with internal or provided BIOSs and other flags in my experience, you may already probably know that if you're familiar with the issues with this core already.

I think my hacks on the blitter broke the ID Roms though because they were coded better than most and used some hardware hacks for performance so I'm not suprised those are broken. I bet a little debugging and I or someone else could figure out what's wrong there.

This core could use some unit tests to figure out what about the cpu emu is different with my changes but that's probably a stretch at this point.

@@ -2031,7 +2061,7 @@ static void DSP_jr(void)
}//*/
dsp_pc += 2; // For DSP_DIS_* accuracy
DSPOpcode[pipeline[plPtrExec].opcode]();
dsp_opcode_use[pipeline[plPtrExec].opcode]++;
// dsp_opcode_use[pipeline[plPtrExec].opcode]++;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is a cause of breaking @twinaphex
i'll re-add and repush

@inactive123
Copy link
Contributor

Alright, thanks.

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Sep 19, 2021

Alright, thanks.

also i'm using pre-processor macros in XCode,
-DINLINE=inline -DUSE_STRUCTS=1

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Sep 19, 2021

I don't understand why audio_batch_cb is commented out too. This means there will be no sound at all in the libretro core.

if this is something you ported from Provenance and did not test in RetroArch with the libretro core, that isn't going to do us much good. We'd need a backport of the changes you made then to the core and to ensure things still work.

Pushed update to un-delete this.

The bit shifting and masking is expensive on ARM64 for some reason.
The unions seem to greatly reduce the perfomance hit of these common calls.
The values for offset0 and offset1 were coming out to 63 when they should be no more than 3. I think the devide should have beena modulus? I wrote out the code with more vars to figure ouit what was going on
looking at the offical shamusworld repo the division was moved after the 0xFF checks
GPU_RUNNING running macro was pretty slow on ARM for some reason. Bitswise structs are faster in my testing
Signed-off-by: Joe Mattiello <mail@joemattiello.com>
Signed-off-by: Joe Mattiello <mail@joemattiello.com>
Signed-off-by: Joe Mattiello <mail@joemattiello.com>
@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Sep 19, 2021

Re-based and force pushed off of upstream b44e2e2

@lgtm-com
Copy link

lgtm-com bot commented Sep 19, 2021

This pull request fixes 2 alerts when merging ada5c15 into b44e2e2 - view on LGTM.com

fixed alerts:

  • 2 for Comparison result is always the same

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Sep 19, 2021

Both Wolfenstein 3D and Doom don't start up for me with this PR on Windows 10 (64bit build).

FYI, libretro wiki says Wolf 3D already doesn't boot,
https://docs.libretro.com/library/compatibility/jaguar/

Doom needs a hack which I don't have a menu UI for in Prov, not sure if you tested with that in retroarch.

Another issue is that the wiki for retroarch doesn't mention Jaguar BIOS, but it's been known since the old forums on this core's original author that some games only run or run better with the emulation bios or the Atari official bios.

The wiki for Jaguar makes no mention of BIOS so that should probably be amended and the core option for the bios be mentioned.

"virtualjaguar_bios",

https://atariage.com/forums/topic/76369-jaguar-bios/

@JoeMatt
Copy link
Collaborator Author

JoeMatt commented Oct 8, 2021

Closing for #55 with rebase

@JoeMatt JoeMatt closed this Oct 8, 2021
JoeMatt added a commit to Provenance-Emu/virtualjaguar-libretro that referenced this pull request Oct 20, 2021
JoeMatt added a commit to Provenance-Emu/virtualjaguar-libretro that referenced this pull request Nov 4, 2021
JoeMatt added a commit to Provenance-Emu/virtualjaguar-libretro that referenced this pull request Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants