Status update from Elad
Added 2024-03-29 16:40:11 +0000 UTC
Hi,
It's EladAsh with an update on the ongoing CELL emulation improvements in the last two months and future plans.
SPU Cache And Code Compilation Performance Enhancements:
1. New techniques have been implemented to significantly reduce bogus code compilation by analyzing SPU instruction branch validity requirements (a theory proven by real-world code observation and testing).
The theory was: although all branches are technically valid, some of them wrap around the SPU memory bounds.
And, although wrapping around the SPU memory bounds is the only way for a relative branch to reach code in far regions, this technique is hacky and would not stay compatible if Sony ever decided to increase the SPU memory size.
So, to do it properly and stay future-proof against a possible memory size extension (that never actually came), SPU code must use indirect branches: manually adding or subtracting large offsets from the current address in code and jumping to the exact address.
2. SPU LLVM blocks are now visible to and usable by SPU threads at an early stage of SPU block compilation, by postponing cache writes to disk. This greatly reduces SPU starvation of code and reduces game stutter during in-game compilation.
3. If you are using a CPU with fewer than 12 threads, RPCS3 has always used a single thread to compile SPU code in-game. This is because CPU time is consumed by other emulation threads and not much is left for SPU compilation.
So, I added another thread with a trick: hooking SPU event sleeping so that it is extended a bit while SPU blocks are being compiled. This allows CPU time to be available for all threads at this critical time, when SPU blocks are being compiled.
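To illustrate the idea in item 1, here is a minimal sketch of how a relative branch that wraps around the SPU memory bounds could be detected. It is not RPCS3's actual analyzer: the names `resolve_relative_branch` and `rel_branch_target` are hypothetical, and it only assumes the 256 KiB SPU local storage and 4-byte instructions.

```cpp
#include <cassert>
#include <cstdint>

// SPU local storage is 256 KiB and every instruction is 4 bytes wide.
constexpr std::uint32_t SPU_LS_SIZE = 0x40000;

// Hypothetical result type (not an RPCS3 structure)
struct rel_branch_target
{
    std::uint32_t address; // Target address after wraparound is applied
    bool wraps;            // True if pc + offset left the [0, SPU_LS_SIZE) range
};

// Hypothetical helper: resolve a relative branch given the branch address and
// its signed offset in words, and report whether it only reaches its target
// by wrapping around the local storage bounds.
inline rel_branch_target resolve_relative_branch(std::uint32_t pc, std::int32_t word_offset)
{
    assert(pc % 4 == 0 && pc < SPU_LS_SIZE);

    // Compute in 64 bits so the unwrapped target cannot overflow
    const std::int64_t raw = static_cast<std::int64_t>(pc) + static_cast<std::int64_t>(word_offset) * 4;
    const bool wraps = raw < 0 || raw >= static_cast<std::int64_t>(SPU_LS_SIZE);

    // Masking with (size - 1) applies the wraparound, since the size is a power of two
    const std::uint32_t address = static_cast<std::uint32_t>(raw) & (SPU_LS_SIZE - 1);
    return {address, wraps};
}
```

Under the theory described above, an analyzer could treat a target with `wraps == true` as a hint that the bytes being decoded are probably not real code. How that hint is actually used is an assumption of this sketch.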
You can read more about these improvements here:
https://github.com/RPCS3/rpcs3/pull/15282
https://github.com/RPCS3/rpcs3/pull/15285
https://github.com/RPCS3/rpcs3/pull/15284
Game Loading performance boost:
For a while now, game boot performance with PPU LLVM has been sluggish due to the function initialization step.
The LLVM API, the host application interface of our PPU/SPU decoders, is notoriously slow at resolving symbol references externally.
I have tried to tackle this issue from multiple angles, from reducing the amount of PPU functions to avoiding writing some which may not ever be referenced.
But ultimately, I decided to opt for internal symbol resolving, which was faster in theory.
So, I have integrated into every PPU LLVM executable module an LLVM-based function that references all relevant PPU function symbols and writes them down into the PPU jump-table.
This indeed made symbol resolving an instantaneous stage!
The pull request: https://github.com/RPCS3/rpcs3/pull/15333
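The difference between external and internal symbol resolving can be sketched in plain C++. This is only an analogy under stated assumptions: the `jump_table` layout (one slot per 4-byte guest address), `module_register_functions` and the `func_*` stand-ins are all hypothetical and do not reflect the real generated code. The point is that the module itself knows the addresses of its functions, so it can fill the table in one call instead of the host looking up every symbol by name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using ppu_function_t = void (*)();

// Hypothetical jump table: one slot per 4-byte guest instruction address
struct jump_table
{
    std::uint32_t base;
    std::vector<ppu_function_t> slots;

    void set(std::uint32_t guest_addr, ppu_function_t fn)
    {
        assert(guest_addr >= base);
        slots.at((guest_addr - base) / 4) = fn;
    }

    ppu_function_t get(std::uint32_t guest_addr) const
    {
        assert(guest_addr >= base);
        return slots.at((guest_addr - base) / 4);
    }
};

// Stand-ins for compiled PPU functions living inside one module
static int g_calls = 0;
static void func_10000() { g_calls += 1; }
static void func_10010() { g_calls += 10; }

// The single entry point the host needs to resolve for this module.
// It references every function internally and writes it into the table.
void module_register_functions(jump_table& table)
{
    struct entry
    {
        std::uint32_t addr;
        ppu_function_t fn;
    };

    static constexpr entry entries[] = {
        {0x10000, &func_10000},
        {0x10010, &func_10010},
    };

    for (const entry& e : entries)
    {
        table.set(e.addr, e.fn);
    }
}
```

With this shape, the host resolves one symbol per module and calls it once, regardless of how many functions the module contains.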
SaveStates are no longer experimental!
Through an extensive series of bug-fixes, I am glad to announce that savestates are now compatible with nearly every game, and consistently so! (You can create a savestate a thousand times in sequence and it still won't break.)
The initial bug-fixes were nothing significant, fixing remaining issues with the newly introduced savestate compression.
The next bugs have been more interesting:
1. Improving some SPU-Compatible Savestates Mode functionality and importing it into the main code.
Now, what was changed and why is there an SPU-Compatible mode?
Let's recap: in the SPU LLVM decoder, there exist two optimizations that affect how the SPU register state is perceived externally.
The first is store-elimination by successor chunks. What this optimization does is detect if a register store is "not needed" by examining further register stores in further SPU chunks.
For example, let's say that a register store is made with value X, but I know that it is in fact going to be replaced, with 100% probability, by another store of value Y in the following code.
If this assumption is true, the older register store can be safely eliminated because, following the SPU block operation, the register is going to have value Y anyway.
All of the above assumes that the SPU block's execution is not stopped between store X and store Y.
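The store X / store Y example can be sketched as a tiny dead-store elimination pass. This is a simplified illustration over one linear sequence of operations, whereas the optimization described here works across successor SPU chunks inside the LLVM decoder. The `reg_op` type and `eliminate_dead_stores` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class op_kind { store_reg, load_reg };

// Hypothetical, simplified operation on the externally visible register state
struct reg_op
{
    op_kind kind;
    int reg;             // SPU register index (the SPU has 128 registers)
    std::uint32_t value; // Value written; unused for loads
};

// Drop every store that is overwritten by a later store to the same register
// before anything reads it. This is only valid if execution can never be
// observed between the two stores.
std::vector<reg_op> eliminate_dead_stores(const std::vector<reg_op>& code)
{
    std::vector<reg_op> result;

    for (std::size_t i = 0; i < code.size(); i++)
    {
        assert(code[i].reg >= 0 && code[i].reg < 128);

        bool dead = false;

        if (code[i].kind == op_kind::store_reg)
        {
            // Find the next operation touching the same register
            for (std::size_t j = i + 1; j < code.size(); j++)
            {
                if (code[j].reg != code[i].reg)
                {
                    continue;
                }

                // A later store makes this one dead; a load keeps it alive
                dead = code[j].kind == op_kind::store_reg;
                break;
            }
        }

        if (!dead)
        {
            result.push_back(code[i]);
        }
    }

    return result;
}
```

If a savestate is taken at the point where the eliminated store of X would have happened, the saved register holds neither X nor Y, which is exactly the problem discussed next.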
When savestates are involved it gets tricky, because savestates require the current SPU register values to be exactly their true values before saving.
The old solution was to simply wait until the SPU block execution is complete before saving.
Now, this solution breaks when external-condition based loops are involved.
What are external-condition based loops? Let's simplify how most compilers define loops:
There are constant-condition based loops, argument-based loops and external-state based loops.
Constant-condition based means that the compiler knows exactly how many times the loop is going to execute: it is very predictable and known to quit the loop.
Argument-based means that the number of times the loop executes is defined by a value (or the result of some operation) that was passed to the function or evaluated in earlier code. It is not very predictable, but in normal code it should quit the loop after some time.
External-state condition loops are where things get complicated. Let's say a function waits for memory at address X to become some value.
It can happen now, in a minute or even after 3 months, but as long as it does not happen the loop never quits.
External-state condition loops are usually the loop type in the function main(), waiting indefinitely for the user to signal the program to exit.
With the above loop type used, there is no guarantee that the code will reach the point that the store of Y occurs.
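The three loop types can be sketched in ordinary C++ as follows. The function names are made up for illustration, and the external-state loop uses a `std::atomic` flag as a stand-in for "memory at address X".

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Constant-condition loop: the trip count (16) is known at compile time
int sum_fixed()
{
    int sum = 0;
    for (int i = 0; i < 16; i++)
    {
        sum += i;
    }
    return sum;
}

// Argument-based loop: the trip count depends on a value passed by the caller
int sum_up_to(int n)
{
    assert(n >= 0);

    int sum = 0;
    for (int i = 0; i < n; i++)
    {
        sum += i;
    }
    return sum;
}

// External-state loop: quits only when something else (another thread, a
// device) changes the flag. Nothing inside the loop guarantees that it ends.
std::uint64_t wait_for_flag(const std::atomic<std::uint32_t>& flag, std::uint32_t expected)
{
    std::uint64_t spins = 0;
    while (flag.load(std::memory_order_acquire) != expected)
    {
        spins++;
    }
    return spins;
}
```

Waiting for the first two functions to finish before saving always terminates. Waiting for `wait_for_flag` only terminates if some other thread eventually writes the expected value.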
Thus, when saving with this optimization enabled, you may see a "failed to lock SPUs threads" message and the savestate request is ignored.
So, I implemented some testers which test the safety of the loop by looking for possible external conditions, such as the usage of SPU channels and reads of their availability.
SPU channels are meant, for the most part, for communication with the PPUs and thus are likely to be involved in external-condition loops.
Because breaking the optimization on the read of one specific SPU channel is known to reduce the performance of Red Dead Redemption, I made this exclusive to SPU-Compatible Savestates Mode for that channel.
But in other common cases, such as the use of RawSPU channels, I partially integrated into the main code a safe return for savestates with the SPU register state intact.
In the case of RawSPU channels, because of their tendency to be involved in long loops, performance has barely been compromised.
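A tester of this kind could look roughly like the sketch below. It is a heavy simplification and not the actual RPCS3 code: `spu_op`, `spu_inst` and `loop_may_wait_on_external_state` are hypothetical, and the only facts it relies on are the SPU channel instructions `rdch` (read channel), `rchcnt` (read channel count, i.e. availability) and `wrch` (write channel).

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified view of a decoded SPU instruction
enum class spu_op { other, rdch, rchcnt, wrch };

struct spu_inst
{
    spu_op op;
};

// Returns true if the loop body reads a channel or queries its availability,
// which suggests the loop condition may depend on state outside the SPU.
bool loop_may_wait_on_external_state(const std::vector<spu_inst>& loop_body)
{
    for (const spu_inst& inst : loop_body)
    {
        if (inst.op == spu_op::rdch || inst.op == spu_op::rchcnt)
        {
            return true;
        }
    }

    return false;
}
```

When such a loop is detected, the register stores before it would have to be kept, so that a savestate taken while the loop is waiting still sees the exact register state.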
The second bug-fix corrects the detection of when it is possible to postpone register stores (which is an optimization), so that they are postponed to a point not beyond a possible savestate return from the SPU block.
Pull requests:
https://github.com/RPCS3/rpcs3/pull/15364
https://github.com/RPCS3/rpcs3/pull/15356
Many more bug-fixes and improvements have been made to savestates, such as:
* Fix PRX function write-out, fixing PPU performance when using savestates.
* Saving of the PPU threads' running and sleep queue order, so that the threads that were running just before saving are also the ones to run when loading the savestate.
The need for this technically goes against all programming practices known to mankind, but some games coded for the PS3 are coded a bit differently.
Because, in normal code, whether you have, let's say, 12 threads on your CPU or 6 threads, programs are not meant to assume which threads are currently running on the hardware CPU threads.
* Making the SPU STOP instruction a safe return point for savestates in SPU Mega block mode.
* Waiting longer for SPUs to lock in the state.
* Early detection of an SPU state which is unlikely to be able to quit the SPU loop.
This prompts the user to use SPU-Compatible Savestates mode, without making the game even stutter, by not pausing SPUs at all.
* Saving pending RSX flips and performing them on savestate load.
* A progress dialog for savestates that indicates the progress of saving.
* Protection against creating a savestate during a game-saving operation.
* Compression speed has been improved.
* Fix of a crash on savestate load if saved during the early stages of a draw call. (thanks to kd-11)
Future plans:
I am planning to return to the SPU Analyzer extensions that I started a year ago, initially for the work-in-progress SPU "PUTLLC16" optimization, that would enable SPU optimizations far more advanced than what was possible until now:
integrating advanced compiler-grade code analysis and pattern detection in an SPU-friendly manner.
Comments
insane work buddy
Lenny Lelennski
2024-12-30 07:57:50 +0000 UTC
I will always support emulating older games or games out of print. Sure, if the games gets a remastered or HD re-release I'll buy it, but most PS3 games are pretty much stuck in limbo.
Vulgar Tongue Official
2024-07-26 01:56:30 +0000 UTC
I thank you and the rest of the team for your efforts on RPCS3, I hope my small contribution can help in freeing some great games from the clutches of being PS3 exclusive and through emulation be lifted to heights simply not possible on the original hardware.
Dimitri Remkes
2024-06-11 23:50:20 +0000 UTC
Thank-you! Great to see progress.
Alexander (Sasha) Wait Zaranek
2024-04-03 02:22:36 +0000 UTC
Thank you so much for your hard work!
Andreas
2024-03-31 16:09:43 +0000 UTC
this is insanely cool, huge respect to you with how far you've been able to take this in just the last two months. this is the kind of shit that the term "code wizardry" was coined for
Dom Portera
2024-03-30 15:34:23 +0000 UTC
Amazing work!
DSonk42145
2024-03-29 23:36:50 +0000 UTC
THANK YOU ♥ ♥ ♥
Joey Keilholz
2024-03-29 18:28:29 +0000 UTC