Status update from kd-11 (04-04-2022)

Hi,

It's kd-11 with another update on RPCS3 development. A lot of improvements have been made in the past 8 weeks. I'll review them in two sections, bugfixes first and optimizations second. Let's start with the list of bugfixes:

1. Fixed an out-of-bounds crash when running Insomniac titles on the emulator. The Resistance series and the Ratchet & Clank series were impacted by this issue.

2. Fixed some Vulkan crashes caused by incorrect resource cleanup when exiting the emulator.

3. Fixed broken visuals in some games when using NVIDIA GPUs due to memory contents being uninitialized. This was discovered when reintroducing async compute for NVIDIA GPUs.

4. Implemented native GPU synchronization as an experimental feature. This allows your host GPU to communicate directly with CELL without going through any RSX code, which is both more efficient and more accurate. It is required to fix glitches in some titles such as Avatar. I'm keeping an eye on this option and trying to find ways to make it the default. It is currently optional because it relies on some driver extensions that are not supported across the board.

5. Fixed a bug affecting some popular mods where textures were cut off and incomplete.

6. Fixed a bug affecting games when MSAA was enabled. Flickering would occur randomly which could be very distracting.

7. Fixed surface cache reference leakage when sharing data with the texture cache using the "Write Color Buffers" option. The random crashes with a vk::as_rtt() assert message are now fixed.

8. Added handling for externally injected tools leaking error codes into the emulator. This was reportedly happening with some streaming software and causing random crashes due to bugs in these tools. The emulator will now try to handle unexpected error codes coming from the driver layer.

9. MSAA was rewritten from the ground up to improve performance and allow complex interactions to work without creating a GPU bottleneck. The majority of the work is done, but another PR is currently in review to implement some missing features (texture filtering).

And now, for what has taken up most of my time the past few weeks, optimizations.

1. Removed redundant calls to read the system clock in the hot path among other micro-optimizations. The goal was to simplify and cache data that is read every draw call instead of doing long calculations on the fly. These functions are very basic but can be called millions of times per second, so optimizing can have a small but tangible benefit. Around 2-5% performance uplift was observed, especially in games with many textures or ZCULL commands.

2. An image cache was introduced to the Vulkan texture cache, which drastically lowers the number of new images created by the driver. This reduces CPU pressure when running games with dense 3D scenes. A similar cache existed previously but was broken and did not work as intended most of the time.

3. A spec-compliant version of asynchronous compute for texture streaming was written. I mentioned this in the last update; the core has since been completed and merged. Some performance considerations remain and it can sometimes be slower than expected, but I am working on improving this to make better use of the GPU. The primary concern now is the long queue depth that can build up in games where thousands of uploads happen per frame. This is unfortunately not a rare occurrence with some AAA engines, so I will have to tune the code to handle this down the line. Those with fast GPUs can already try this feature and enjoy a performance bump in some titles.

4. The surface cache was rewritten to be segmented. This allows for very fast traversal of the storage structures which can improve texture upload times significantly. A minor uplift in performance was measured, but it is difficult to isolate this component by itself as it relies on data from other sources.

5. The vertex program decompiler was rewritten to allow packing of local data into much smaller structures. The gains here can be impressive, with up to 8KB of data saved per draw call. Since the gains are per draw call, the performance uplift can vary from barely any to over 20% better fps in games.
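
To picture the idea behind the image cache in item 2, here is a minimal sketch of a recycling pool. It is not the actual RPCS3 code: `Image`, `ImagePool`, `acquire` and `release` are hypothetical names, and the "image" is a plain struct standing in for a driver object. The assumption is simply that a released image with matching format and dimensions can be handed back out instead of asking the driver for a new one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU image; a real backend would wrap a driver handle.
struct Image {
    std::uint32_t format;
    std::uint32_t width;
    std::uint32_t height;
};

// Hypothetical recycling pool: released images are kept around and handed back
// to later requests with identical properties instead of being recreated.
class ImagePool {
    using Key = std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>;

public:
    std::unique_ptr<Image> acquire(std::uint32_t format, std::uint32_t width, std::uint32_t height) {
        auto it = free_.find(Key{format, width, height});
        if (it != free_.end() && !it->second.empty()) {
            // Reuse path: no new object is created.
            auto img = std::move(it->second.back());
            it->second.pop_back();
            return img;
        }
        // Miss path: this is where the expensive driver allocation would happen.
        ++created_;
        return std::make_unique<Image>(Image{format, width, height});
    }

    void release(std::unique_ptr<Image> img) {
        assert(img != nullptr);
        Key key{img->format, img->width, img->height};
        free_[key].push_back(std::move(img));
    }

    // Number of images that had to be created from scratch.
    std::size_t created() const { return created_; }

private:
    std::map<Key, std::vector<std::unique_ptr<Image>>> free_;
    std::size_t created_ = 0;
};
```

In this sketch, acquiring, releasing and re-acquiring an image with the same format and size creates only one object. A real cache would also need an eviction policy so that unused images do not pile up in memory.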

What's next? Well, Vulkan 1.3 integration is still high on my list, but I first want to eliminate any obvious CPU bottlenecks so that switching to a different API has the maximum impact. For now I will keep taking profile captures and optimizing, as well as looking at bug reports on GitHub to ensure a good end-user experience even as we experiment with optimizations.

Thank you all for your continued support,

Regards, kd-11

Comments

Great work, can't wait on The Last Of Us being optimized

Koen

You gave me the ability to relive some memories. With love and support.

Aelzaire

Excellent work!

Eugene J Gimblet Jr.

Always amazed at your work, thanks for the status update!

Andreas

Awesome work as always. I’m really inspired by this project to jump into emulator development!

distributedfox

Great work!

DSonk42145

Incredible work! Thank you for the status updates.

polytoad

Thank you for the report boss 🥳

Dormant_Hero

I've noticed the performance inching upwards lately, great work guys!

Lagahan

I'll make a report later, it will be pretty big. Sorry about this.

Nekotekina

I'm wondering what Nekotekina has been doing recently. Haven't seen his report for over a year. Is he still the core developer of RPCS3?

Master