Nekotekina

Status update from kd-11 (21-02-2021)

Hi,

It's kd-11 with another RSX development update for RPCS3.

Over the past month, I've been doing some more research on how to improve the RPCS3 texture pipeline. This work started a few weeks ago with the implementation of passthrough DMA and is still ongoing. But before I return to updates on this issue, let us begin with a quick recap of what has been happening so far:

1. The Vulkan renderer saw some significant restructuring to help with maintainability. This largely just involved breaking some massive headers into smaller chunks and removing code from headers where it made sense to do so.

2. Fixed a regression in GOW3 and MGS4 related to depth buffers and stippled rendering.

3. Fixed a light bleed bug in select scenes in MGS4 related to shadow rendering. This bug was only visible when running the game with an NVIDIA GPU.

4. Fixed broken rendering of games on NVIDIA GPUs where random flashing triangles would appear on the screen, e.g. Ratchet & Clank: A Crack in Time.

5. Fixed Vulkan crashing with VK_ERROR_DEVICE_LOST on NVIDIA GPUs in the Turing and Ampere families when running Killzone games.

6. Fixed texture upload times creeping upwards in SSX due to the game having too many duplicated sections in a small area of memory.

Now onto a big change that got merged:

As a follow-up to the R&D discussion in the last update, I implemented several minor adjustments to texture cache operation that boosted performance in titles that move a large number of textures from CPU to GPU every frame. Among the most affected were the Killzone 2 and Killzone 3 engines, which could stream around 600-700 textures per frame in busy scenes. Highlights of this work include:

1. A patch to improve the GPUOpen VMA allocator when numerous small allocations are present. Previously, the time to allocate a new block would grow with the number of live allocations to the point where it became ridiculously slow. This patch has also been accepted upstream by AMD.

2. Force-align memory requests to the GPU to the minimum hardware granularity supported. Not really an issue for AMD but saves a lot of time on NVIDIA with their weird alignment requirements. This leverages the GPUOpen change mentioned previously.

3. Rearrange texture cache internal structures for faster searching. A lot of time was being wasted iterating structures previously.

4. Re-use textures when possible instead of creating new ones to avoid driver overhead.

5. Move garbage collection to the offloader thread as it is not a time-critical task. Making it asynchronous also helps smooth frametimes and improve performance.

6. When using small textures, just hash the memory range instead of using page protection. This avoids the process of unlocking and relocking memory ranges in case of a fault.

7. Other minor optimizations.
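As an illustration of item 6 above, here is a minimal sketch of content hashing as a change detector for small textures. It is not RPCS3 code: RPCS3 is written in C++, Python is used here only for brevity, and the names `SmallTextureTracker`, `SMALL_TEXTURE_LIMIT` and `needs_upload` are hypothetical. The 4 KiB threshold and the use of CRC32 are also assumptions; the post does not say which size cutoff or hash function is used.

```python
import zlib

# Hypothetical cutoff in bytes; the real threshold is not stated in the post.
SMALL_TEXTURE_LIMIT = 4096


class SmallTextureTracker:
    """Detects changes to small memory ranges by hashing their contents.

    No page protection is involved, so a write to the range never triggers
    a fault that would require unlocking and relocking memory.
    """

    def __init__(self, guest_memory: bytearray):
        self.memory = guest_memory
        self.hashes = {}  # (address, size) -> checksum seen at last upload

    def _checksum(self, address: int, size: int) -> int:
        # CRC32 is a stand-in hash chosen for this sketch only.
        # memoryview avoids copying the range before hashing it.
        return zlib.crc32(memoryview(self.memory)[address:address + size])

    def needs_upload(self, address: int, size: int) -> bool:
        """Return True if the range is new or changed since the last call."""
        if size > SMALL_TEXTURE_LIMIT:
            raise ValueError("large ranges would use page protection instead")
        key = (address, size)
        current = self._checksum(address, size)
        if self.hashes.get(key) == current:
            return False  # contents unchanged, cached texture is still valid
        self.hashes[key] = current
        return True


memory = bytearray(64 * 1024)
tracker = SmallTextureTracker(memory)

first = tracker.needs_upload(0x1000, 256)   # range has never been seen
second = tracker.needs_upload(0x1000, 256)  # nothing was written in between
memory[0x1010] = 0xFF                       # simulate a CPU write to the texture
third = tracker.needs_upload(0x1000, 256)   # contents differ from the stored hash
```

The trade-off in this sketch is that every lookup pays for hashing the range, which is why it only makes sense for small textures; for large ranges, the cost of a rare page fault is lower than rehashing the data each time.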

So, what about since then?

Even after this set of changes, I was not satisfied with the texture upload performance. Most users' GPUs have extra headroom and we can utilize that to handle texture streaming. This is what I have been experimenting with for some time now and I have observed some interesting results. There is definitely an improvement, although it will be difficult to fully integrate the whole set of changes as-is. With proof-of-concept builds, testers and I have observed massive performance uplifts of up to 50% with an RTX 3090. This improvement, however, does not come cheap. The extra work at this time will fully saturate a low-to-mid-range GPU. On an RX470, I was getting over 90% utilization in my benchmark scenes, compared to around 50% before.

Screenshot master: https://cdn.discordapp.com/attachments/442667232489897997/810976735692324934/unknown.png 

Screenshot proof-of-concept build: https://cdn.discordapp.com/attachments/442667232489897997/810978559471517752/unknown.png 

(Settings: MTRSX + WCB, default everything else)

However, there are downsides. For one, efficiency is questionable: we're using almost double the GPU grunt for about 50% more performance. But even worse are the flickering and other minor glitches that prevent me from simply submitting this work as-is right now. These are due to the difficulty of maintaining efficient synchronization between the GPU and CPU at high framerates. The GPU is also working harder now, which means that on a weaker system, communication latency worsens slightly. The changeset is also massive, touching almost every file in the Vulkan backend and totalling thousands of changed lines. This is not manageable right now, so I have decided to commit smaller chunks of it over the next week or two to at least lay a solid foundation before enabling the functionality required. I will provide an update once the work is completed.
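To put the efficiency remark above into numbers, here is a small worked calculation. It is only a rough sketch: the helper `relative_efficiency` is a hypothetical name, and the second figure assumes the roughly 50% uplift also applies on the RX470, which the post does not state (the uplift was reported on an RTX 3090).

```python
def relative_efficiency(perf_gain: float, gpu_work_ratio: float) -> float:
    """Performance per unit of GPU work, relative to the old build (1.0)."""
    return (1.0 + perf_gain) / gpu_work_ratio


# "Almost double the GPU grunt for about 50% more performance".
rough = relative_efficiency(0.50, 2.0)

# RX470 utilization quoted above: about 50% before, over 90% after.
# Assumption: the same 50% uplift applies to this card.
rx470 = relative_efficiency(0.50, 90 / 50)

print(f"{rough:.2f} {rx470:.2f}")
```

Under these assumptions, each unit of GPU work delivers roughly 75% to 83% of the performance it did before, so the speedup is bought with spare GPU headroom rather than with better efficiency.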

Thank you all for your continued support,

Regards,

kd-11

Comments

1080ti:D

JiaWen Li

50% more performance using more GPU sounds amazing. Would love to try that.

Dormant_Hero

Nice. Will definitely try out the new changes with my RTX 3090. Not sure how my 9700K will handle the additional threads. My system could really use a CPU upgrade.

Povilas Staniulis

Amazing update. It's really super interesting. 😁 Thanks for the deep dive

Faviann Di Tullio

