Status update from kd-11 (26-09-2023)

Hi all,

It's kd-11 here with another RSX progress update. This one is long overdue, sorry for that, but I have some interesting updates to talk about.

My last update was about my lengthy experiments with executing SPU on GPU and coming to the conclusion that modern GPUs are still not at a level where that is a feasible approach. Once that work wrapped up, I took a short break and got right back into it. Here's a quick breakdown of all the work that went into the project since then:

1. A cheaper way of emulating RSX attribute interpolation on NVIDIA cards was implemented that removes the fp64 requirement. I'd like to thank Triang3l (Xenia) for the heads-up.

2. A fix for blit engine tiling was added, which repaired broken graphics in some 2D titles (e.g. Child of Light).

3. OpenGL texture format conversion (bitcast) routines were overhauled, which fixed artifacts in some games such as Skate 3.

4. Vulkan memory allocation and spilling behaviour under low memory conditions was rewritten. This greatly improved stability of RPCS3 on memory-constrained systems when upscaling is in use.

5. A texture cache bug in surface clipping was fixed, which resolved device lost errors in the newer R&C titles, specifically Tools of Destruction and Quest for Booty.

6. Support for VK_EXT_custom_border_color was implemented. Previously it was thought to be unnecessary, since most PS3 games just use a border color of 0 (the default), which matches the Vulkan enum TRANSPARENT_BLACK. However, we discovered that Insomniac's engine makes use of custom colors, and this was the cause of some rare artifacts that were observed in the R&C titles.

7. Descriptor lifetime management was completely rewritten. Our Vulkan backend still used the basic framework I added in early 2016, which was mostly copied from the original Vulkan tutorials. The examples tied descriptors to the concept of a "frame", and every frame allocated everything it needed inside it. Since RPCS3 is not a game, we don't actually know beforehand how many things we'll need, how many draw calls or shaders need to be loaded, etc. Instead, we had a hardcoded limit for every type of thing and we manually kept increasing the numbers any time someone reported a crash. This was not scalable, so I finally threw it all away and implemented something more sane.

8. Surface cache trimming was fixed. A bug in our surface cache led to a corner case where we would "leak" around 250MB of VRAM every minute. This wasn't a real leak: we still held references to everything but were not releasing them in time, leading to out-of-memory situations every few minutes. These would successfully recover but came with a big "hitch" or even a crash on some GPUs with low VRAM.

9. Surface cache cleanup was fixed. This was a long-known issue where sometimes a random error would appear in the logs with a message saying "Resource was destroyed whilst holding a resource reference" and VRAM would creep upwards. This is a different issue from the one described above and happened in all games. This was a true reference leak with resources being "lost" in VRAM.

10. Some work went into fixing graphical glitches exclusive to Apple Silicon. The root cause here is that Apple uses a tile-based deferred renderer, which basically splits up the framebuffer into tiles, renders them into a special tile memory area and transfers the results out explicitly when we issue barriers or subpass directives (I'm grossly simplifying here, but you get the idea). We had a lot of speedhacks that abused the fact that "normal", a.k.a. immediate-mode, GPUs have their outputs come in real-time (i.e. you can see changes in VRAM as the command stream is being executed even without subpasses or barriers). These speedups do not work with tile-based GPUs and we had to scale back on aggressive optimizations in some areas such as occlusion handling. This made some games like Skate 3 playable again on M1 and M2.

11. Our vertex cache was reimplemented to be much faster and with much better hit-rates. In some draw-call-heavy titles like inFAMOUS, we got a healthy 10% bump in performance. Enabling RSX multithreading also no longer disables the cache, as it previously did.

12. A fix was added for some problems observed on Intel GPUs where sometimes an error VK_ERROR_FRAGMENTATION was thrown by the driver. This fix was critical to getting things working well on Arc and getting them to be competitive.

13. A fix was added for an occlusion problem observed on NVIDIA GPUs where issuing a high number of query commands could drop a 4090 to single-digit framerates in some games. The solution was to buffer very briefly and batch requests together. Not great for latency, but it makes the difference between 5fps and 60, so it is worth it.

14. Support for the synchronization2 extension was added. We don't need it for most of what we do but the WCB/Blit path relied on events which were very poorly defined in Vulkan 1.0. This work eliminated some artifacts on AMD and Intel GPUs. NVIDIA uses a very different synchronization model that is more lenient and was thus unaffected.

15. A workaround was included for Apple's C++ compiler which is not C++20 compliant.

16. Virtualized subimage views. This was a big one and was prompted by profiling Gran Turismo games, which had a notoriously high GPU load on RPCS3. The goal with this was to achieve something that doesn't exist in the Vulkan spec: observing a 2D subregion of a 2D/3D/Cube texture that is not anchored at (0,0). For example, you can have a 1280x720 image but need to crop a 360p section starting at offset (640, 360). Previously this was handled through the subimage pipeline of the texture cache, which would make copies to make sure things like tiling (repeating) worked correctly. With this changeset, we handle this in the shaders with some coordinate math instead wherever possible. This greatly reduced GPU burden on the copy engine in GT titles and gave a nice speedup. Many AAA first-party titles were affected by this.

17. A fix was added to our MSAA emulation to clamp sampling weights used in filtering emulation - this is implemented in SW by default for compatibility and performance reasons. This fixed flickering and black artifacts in some games.

18. Another fix was added to the MSAA emulation to add coordinate wrapping (repeat/tiling) in the filter emulation. This fixed the infamous black headlights that affected specific car models in the Gran Turismo games.

19. A major rewrite of projected texturing was done to work in conjunction with the changes introduced with the subimage stuff (see no. 16). This fixed a lot of problems with dynamic local lights in mostly AAA titles, especially the Naughty Dog titles.

20. Multiple fixes were made to the texture cache for minor problems such as crashes and shaders failing to compile. None of them seem major enough to mention on their own.

21. Fixed another device lost crash on NVIDIA that happened when loading textures asynchronously. The async loader uses a dedicated transfer+compute queue, but depth textures must always be loaded on the graphics queue.

22. Fixed a random crash observed in "The Evil Within". This one was interesting: the code was swapping out textures with different ones to match formats in a way that the shader cache was not able to observe. This led to a drift between the two, and random crashes would occur when textures ended up being too different.

23. Image "reconstruction" routines were rewritten. This has nothing to do with DLSS/FSR, just how we rebuild some images from other images in the texture cache, e.g. games will render many small 2D regions then load them all at once with a descriptor that suddenly decides all that memory was a dynamic cubemap with mipmaps all along. Previously some corner cases would fail when upscaling was introduced into the picture. This is now fixed.

24. Fixed a bunch of debug-mode crashes (with AUDIT macro enabled) caused by outdated texture cache code. This has been cleaned up and many games work fine if you compile RSX in full debug mode now.

25. Fixed another NVIDIA device lost error that occurred in the GTA:V prologue mission.
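
As a rough illustration of the "coordinate math" mentioned in item 16, here is a minimal sketch written in Python rather than shader code. The function name, its parameters and the two wrap modes are assumptions made for this example and are not taken from RPCS3's shaders. It maps a normalized coordinate inside a virtual subimage onto the full image, applying repeat or clamp to the subregion itself so that no copy is needed.

```python
import math

def remap_to_subregion(u, v, region, image_size, wrap="repeat"):
    """Map normalized (u, v) inside a virtual subimage to normalized
    coordinates of the full image. Hypothetical helper for illustration.

    region     -- (x, y, width, height) of the subimage, in texels
    image_size -- (width, height) of the full image, in texels
    wrap       -- "repeat" (tiling) or "clamp", applied to the subimage
    """
    x0, y0, w, h = region
    img_w, img_h = image_size

    def wrap_coord(t):
        if wrap == "repeat":
            # Keep only the fractional part so the subregion tiles.
            return t - math.floor(t)
        if wrap == "clamp":
            return min(max(t, 0.0), 1.0)
        raise ValueError(f"unsupported wrap mode: {wrap}")

    su, sv = wrap_coord(u), wrap_coord(v)
    # Scale into the subregion, offset by its origin, then normalize
    # against the full image dimensions.
    return ((x0 + su * w) / img_w, (y0 + sv * h) / img_h)

# The example from item 16, assuming "360p" means a 640x360 region.
REGION = (640, 360, 640, 360)
IMAGE = (1280, 720)
```

With these values, the center of the subimage, `remap_to_subregion(0.5, 0.5, REGION, IMAGE)`, lands at (0.75, 0.75) in the full image, and an out-of-range coordinate such as u = 1.25 wraps back inside the subregion instead of sampling the neighbouring texels.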

And now for the major one. For years I've been mentioning the RSX detiler work and that is now in a PR for merge. For a quick rundown, this has nothing to do with traditional texture tiling/swizzling and instead is just a memory area addressed directly in a raw manner. To read memory in such a location, you don't just fetch it. Instead you build an address based on the physical memory layout. The address you specify contains the row, column, bank and partition to read from, all encoded into the address value. Obviously this is a huge pain to deal with, as normally you have a memory controller that hides this stuff from the OS and lets you just specify a linear value. However, there are tricks that make this raw memory setup faster on real hardware and some AAA titles were using this addressing mode. The work is still not perfect, there are a bunch of corner cases left, but it will get merged very soon. The improvements are significant across the board, especially when it comes to support for SPU MLAA. For those interested, the PR is open at https://github.com/RPCS3/rpcs3/pull/14647 pending some final fixes for reported regressions. Work is far from over with this, but it is a step toward getting some of my ancient local code upstreamed.
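
To show what "row, column, bank and partition all encoded into the address value" can look like in practice, here is a small Python sketch. The field widths and their bit order are purely hypothetical and chosen only for illustration. They are not the real RSX layout and not what the PR implements.

```python
from typing import NamedTuple

# Hypothetical field widths (NOT the real RSX layout), lowest bits first.
COLUMN_BITS = 8
PARTITION_BITS = 1
BANK_BITS = 2
ROW_BITS = 12

COLUMN_SHIFT = 0
PARTITION_SHIFT = COLUMN_SHIFT + COLUMN_BITS
BANK_SHIFT = PARTITION_SHIFT + PARTITION_BITS
ROW_SHIFT = BANK_SHIFT + BANK_BITS


class RawAddress(NamedTuple):
    row: int
    column: int
    bank: int
    partition: int


def _check(name, value, bits):
    # Reject values that would spill into a neighbouring field.
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")


def encode_raw_address(row, column, bank, partition):
    """Pack the physical coordinates into a single address value."""
    _check("row", row, ROW_BITS)
    _check("column", column, COLUMN_BITS)
    _check("bank", bank, BANK_BITS)
    _check("partition", partition, PARTITION_BITS)
    return ((row << ROW_SHIFT) | (bank << BANK_SHIFT)
            | (partition << PARTITION_SHIFT) | (column << COLUMN_SHIFT))


def decode_raw_address(addr):
    """Split an address value back into its physical coordinates."""
    return RawAddress(
        row=(addr >> ROW_SHIFT) & ((1 << ROW_BITS) - 1),
        column=(addr >> COLUMN_SHIFT) & ((1 << COLUMN_BITS) - 1),
        bank=(addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1),
        partition=(addr >> PARTITION_SHIFT) & ((1 << PARTITION_BITS) - 1),
    )
```

A detiler in this style would decode each raw address into its fields and then compute where that location lives in an ordinary linear buffer. That second mapping depends on the real hardware layout and is deliberately left out of this sketch.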

So, what's next? Well, there are a few key areas that I want to improve, starting with the performance of SPU-heavy titles when using the WCB option. The core problem is understood already, so I just have to implement the fixes. There's also still a lot to do with the surface cache; I haven't forgotten about that either.

I would like to thank all our patrons for their continued support.

Regards,

kd-11


Comments

Thanks for all the hard work! 2024 will be an important year for game emulation. (And making sure it remains possible and legal.)

Alexander (Sasha) Wait Zaranek

Thank you all for the hard work. Hopefully 2024 will be your year.

Ali Ali

Great work!!

DSonk42145

Great Work! :)

Chad ThunderSocks

GOAT

Dormant_Hero

Thanks for the lengthy update, always appreciated.

polytoad

Love the detail, keep up the good work!

Frostymm

Wow that's amazing, good job!

Walter Huf

