February is here which must mean that January has slid by once again.
While this time of year isn’t usually so hot for those Twitter-melting AAA games, we all sometimes need less action-packed schedules. Ubisoft proved they still have some ability to make a compelling video game, and a potential game of the year, Turnip Boy Robs a Bank, proves that just about any idea can make it!
Opinions on games aside, it’s time for the usual agenda.
As we’ve perhaps highlighted before, macOS as a platform is a little bit of a nightmare as far as 3D graphics go. Everything for Ryujinx needs to go through MoltenVK which, while a life-saver, is not immune from frustration. On all platforms, if a shader compilation fails (usually caused by an invalid output from the game) we can attempt to skip the draw to avoid a complete crash of the program. There is however a weird case on macOS where some pipeline variants are A-OK but some will fail; it’s this final case that could still cause a consistent crash in games like Fire Emblem: Three Houses.
Adding conditions to return in more failed cases does not 100% fix the graphical rendering, but it does avoid the crash and actually allow users to provide us logs for future pipeline issues!
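For the curious, the "skip the draw instead of crashing" idea can be sketched in a few lines. This is only an illustration of the pattern, not our actual backend code: `PipelineCompileError`, `draw_or_skip` and `fake_build` are hypothetical names, and the failing variant is made up.

```python
class PipelineCompileError(Exception):
    """Raised by the hypothetical backend when a pipeline variant fails to build."""


def draw_or_skip(build_pipeline, variant, submit_draw, log):
    """Try to build the pipeline for this variant; skip the draw if it fails."""
    try:
        pipeline = build_pipeline(variant)
    except PipelineCompileError as err:
        # Record the failure so it ends up in a log a user can share.
        log.append(f"skipped draw, variant {variant!r} failed: {err}")
        return False
    submit_draw(pipeline)
    return True


def fake_build(variant):
    # Pretend only one variant of the same shader set is rejected by the driver.
    if variant == "msaa4":
        raise PipelineCompileError("driver rejected pipeline")
    return f"pipeline<{variant}>"


log, drawn = [], []
results = [draw_or_skip(fake_build, v, drawn.append, log) for v in ("msaa1", "msaa4")]
```

The skipped draw still leaves a hole in the image, which is why the rendering is not fully fixed, but the program survives and the log keeps a record of what went wrong.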

As proved above, GPU drivers are often heavily overlooked in how much they actually impact nearly everything. Many people assume in general it’s relatively similar outside of certain features whereas in reality they are some of the largest and most complicated programs ever written. They all have different and individual paths for operations that may or may not be designed around specific hardware or tailored for specific software. One of the reasons that we, and in general a lot of 3D application developers, give Nvidia a lot of praise is that their driver is ridiculously resilient to the developer taking almost any route. Because of this, it is often easy to spot large discrepancies in rendering costs when, on paper, two devices may be equally matched.
Thus, there are a few in-flight changes that attempt to reduce the cost that other drivers place on operations, the first of which is the use of templates for descriptor updates.
The main beneficiary with template use is the RADV Mesa driver which can see global improvements of up to 5% in framerate, and about the same drop in Vulkan backend time%. For comparison, Nvidia saw no improvement in framerate and a backend time cost reduction of 0.6%. They were already handling the suboptimal path extremely well.
Citizens Unite!: Earth x Space was problematic to play prior to January unless wearing an eyepatch, or something similar that would impair a chunk of your vision. When some draw commands trigger, they can pass parameters “inline” which does not update any of the 3D engine state. If another draw is requested that utilizes state variables, then the draw would have incorrect data from which to construct the image. Forcing a vertex buffer update whenever the switch between drawing types occurs ensures the draw is built from the correct data.
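As a rough sketch of that kind of fix (with a hypothetical `DrawState` class and a simple counter standing in for the real rebinding work), the trick is to remember what kind of draw came last and treat the vertex buffers as stale whenever the kind changes:

```python
class DrawState:
    def __init__(self):
        self.last_kind = None          # "inline" or "state"
        self.vertex_buffers_dirty = False
        self.rebinds = 0

    def draw(self, kind):
        # Switching between inline-parameter draws and state-driven draws
        # means the cached vertex buffer bindings can no longer be trusted.
        if self.last_kind is not None and kind != self.last_kind:
            self.vertex_buffers_dirty = True
        if self.vertex_buffers_dirty:
            self.rebinds += 1          # stand-in for re-reading state and rebinding
            self.vertex_buffers_dirty = False
        self.last_kind = kind


state = DrawState()
for kind in ("state", "inline", "inline", "state"):
    state.draw(kind)
```

In this toy sequence the buffers are refreshed twice, once on each switch, and never between two draws of the same kind.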
Before:

After:

Unusually, this abruptly concludes our section on GPU updates.
However, the resources saved here allowed a pretty major piece of work to be completed by project lead, gdkchan.
ARM devices are having their own renaissance period at the moment after Apple made a statement piece with their M1 line of devices. No more is the future looking like the architecture will purely be the domain of mobile and small form-factor products. Microsoft seems to finally be stepping up with the Windows on ARM efforts, Linux is already in a great spot, Apple has been all-in since 2020 and we’re starting to see chip designers like Qualcomm enter the PC marketplace.
Ryujinx already works on any ARM device that runs a supported OS via our JIT compiler but this is often extremely wasteful when you consider that the Switch itself is also ARM-based. On macOS devices we currently use the Hypervisor services that Apple provides in order to, almost, natively execute Switch code with near zero overhead but this is not a particularly global solution to the problem. We cannot make use of the Apple hypervisor anywhere but macOS and while we could implement more solutions targeting Windows and Linux, the code bloat would be massive.
We decided on a hybrid approach of still JITing the code, but with new ‘lightning’ paths.
All Switch code is still being passed to our JIT compiler but, when running on an ARM CPU, now with the ability to look at a code block and check if anything actually needs re-compiling. In the best case scenarios it can act as a zero-cost passthrough similar to a hypervisor, and at worst it will still do a fraction of the work an x86 system would need.
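To give a feel for the idea, here is a toy sketch of deciding, per block, what can be copied untouched and what has to be rewritten. The set of instructions needing a rewrite is an assumption made purely for illustration, as are the names `plan_block` and `is_passthrough`; a real translator has to care about far more than this (branches and memory accesses, for example).

```python
# Hypothetical set of guest instructions that cannot run unmodified on the host.
NEEDS_REWRITE = {"svc", "mrs", "msr"}


def plan_block(instructions):
    """Return (op, instruction) pairs: 'copy' as-is or 'rewrite' via the JIT."""
    plan = []
    for ins in instructions:
        mnemonic = ins.split()[0]
        op = "rewrite" if mnemonic in NEEDS_REWRITE else "copy"
        plan.append((op, ins))
    return plan


def is_passthrough(plan):
    """True when every instruction in the block can be copied untouched."""
    return all(op == "copy" for op, _ in plan)


math_block = ["add x0, x1, x2", "mul x3, x0, x0", "ret"]
syscall_block = ["mov x0, #1", "svc #0x1f", "ret"]
```

The first block is the best case (nothing to do), while the second only needs work on its single service call rather than on every instruction.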

The graph above plots the time it would take for an Apple M1 chip to re-compile all of the code in a game binary under certain conditions.
Pink = Full recompilation with focus on speed over code quality.
Red = Full recompilation with focus on code quality over speed.
Green = Lightning recompilation where needed.
Okay so speed is absolutely not a problem. But as the Old JIT had high and low code quality modes, how does code quality itself stack up (think of this as binary size or “number of instructions”)?

The takeaway is that the new JIT can produce better code, and do so much faster than the old JIT. The most dramatic difference is seen on Neptunia, which has the largest code size. It took 4.5 minutes to compile with the old JIT on HighCq mode, while the new JIT took only 4.64 seconds. The old JIT produced 633MB of code in the HighCq mode, while the new one produced 348MB. This is almost half the size, while taking a fraction of the time.
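For anyone who wants to check the Neptunia numbers themselves, the arithmetic is short (the variable names are just for this snippet):

```python
old_seconds = 4.5 * 60      # 4.5 minutes on the old JIT (HighCq)
new_seconds = 4.64
old_mb, new_mb = 633, 348

speedup = old_seconds / new_seconds   # about 58x faster
size_ratio = new_mb / old_mb          # about 0.55 of the old size
print(f"{speedup:.1f}x faster, {size_ratio:.0%} of the old code size")
```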
Many may recognise that New Super Mario Bros. U Deluxe (NSMBUD in the above graphs) is actually a 32-bit game and cannot be executed natively via hypervisor or any other method without a JIT. By focusing on an approach that not only considers 64-bit titles, we can significantly reduce the overhead on games like NSMBUD or Mario Kart 8 Deluxe that have been a major pain point for ARM64 devices with the easiest show-case being boot times.
Old JIT (no PPTC):
https://streamable.com/qt4mw9
Old JIT (PPTC):
https://streamable.com/w8kpun
New JIT (no PPTC):
https://streamable.com/0yr45e
With this, we should have an excellent foundation for future ARM devices on our supported platforms that is low maintenance and not dependent on any specific frameworks (Apple users just got lucky this time!).
We’re sure that some of you are asking why we aren’t pursuing an alternative approach dubbed “Native Code Execution” or “NCE” which has been massively popularized by Switch emulators on Android devices. There are a few reasons why NCE is not our preferred response to tapping the potential of ARM devices:
- It is impossible to run Switch system instructions like service calls directly even on an ARM device as they are specific to the Switch OS & Kernel. As such an NCE approach requires patching of the game ROM to redirect these instructions into host emulator calls. These modifications give the game access to the host and are fully visible to the guest program.
- Interrupting the guest threads becomes somewhat complicated, since we can't insert "interruption points" in the code. We can use pthread_kill on Unix-like OSes, but Windows has no such thing.
- The code will access the emulator address space directly, so we need to set it up in a way that makes the game happy with the allocated guest region being contained entirely inside the guest address space.
The outcome is that 36-bit games likely wouldn’t work, as we probably won't be able to allocate 36-bit worth of virtual memory right at the start of the emulator address space. There are not many of those, so while not a huge deal, something to consider.
For platforms with 16KB page size, there are additional challenges as we can't map the text segment and data segments as RX and RW respectively, since they are 4KB aligned, not 16KB aligned. We also can't just map them all as RWX if the platform has W^X. Making this work would require even further patching of the executable.
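The page size problem is easy to see with a little arithmetic. The sketch below uses a made-up segment layout and hypothetical helper names; it simply checks whether any host page would have to carry two different permission sets.

```python
def pages_touched(start, size, page_size):
    """Set of host page indices covered by [start, start + size)."""
    first = start // page_size
    last = (start + size - 1) // page_size
    return set(range(first, last + 1))


def can_map_separately(segments, page_size):
    """True if no host page is shared by two segments with different permissions."""
    owner = {}
    for start, size, perms in segments:
        for page in pages_touched(start, size, page_size):
            if owner.setdefault(page, perms) != perms:
                return False
    return True


# Hypothetical layout: text ends on a 4KB boundary, data begins right after it.
segments = [
    (0x0000, 0x5000, "RX"),   # text: 20KB
    (0x5000, 0x3000, "RW"),   # data: 12KB
]
```

With 4KB host pages the two segments never share a page. With 16KB pages, the end of the text segment and the start of the data segment land in the same page, which would need to be both RX and RW at once.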
However, given that the approach will have its advantages in certain titles, it is potentially worth looking into again in the future. Right now however, we valued a consistent approach that integrated cleanly into the JIT project, offers near native execution speeds and very importantly, improves 32-bit game execution immensely as well.
After that heavy section we’ll blitz through a few service changes that were finalized in January:
GUI side we have seen a few major additions to our Avalonia frontend.
Right-to-left language support was added and after many years and countless requests, a mod manager is now implemented which will allow users to add any number of game mods, and actually be able to turn them off!

Additionally the HLE controller applet was re-worked a little to offer something more visual as a guide for supported players and controllers.

The applet will now show icons of which controller types the game is requesting and how many players it will accept. In general this was one of the more confusing applets due to the “wall of text” effect and the fact that older consoles simply didn’t care as much as the Switch does about player/controller combos.
On the topic of player/controller combos, a better system of identifying controllers on disconnect/reconnect was implemented. Previously the system relied on the controller ID (standardized for a specific controller) and also the global index which is assigned at connect time. Unfortunately this global index would change depending on when and how the device was connected, resulting in devices that were not assigned the correct profile, or not assigned at all on connection. The system now uses a separate index in addition to the GUID to track devices; this results in much higher consistency when adding a controller and also reconnected controllers actually getting their profiles loaded correctly.
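One plausible way to build such a key, shown purely as an illustration (the function name and the GUID strings are made up, and the real implementation may derive its index differently), is to count devices per GUID rather than across all devices:

```python
from collections import defaultdict


def assign_ids(connected_guids):
    """Give each device a (guid, n) key, n counting only devices sharing that GUID."""
    seen = defaultdict(int)
    keys = []
    for guid in connected_guids:
        keys.append((guid, seen[guid]))
        seen[guid] += 1
    return keys


order_a = assign_ids(["pro-guid", "joycon-guid", "pro-guid"])
order_b = assign_ids(["joycon-guid", "pro-guid", "pro-guid"])
```

Both connection orders produce the same set of keys, whereas a global index would have labelled the first Pro Controller 0 in one case and 1 in the other.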
For those who are, very patiently, waiting on further updates from us on the UI development side, we last mentioned that one of the major blockers for our Avalonia switch was on the Linux side, for Steam Deck users. We thought we’d found the root problem and opened a PR to Avalonia directly to fix it. Unfortunately, as with all things Linux, it is rarely that simple.
The drag on this is as frustrating for us as it is for everyone else, but the swap within the auto-updater needs to be synchronized, and doing it before the Deck issue is resolved would break a lot of people’s installs. We’ll continue to relay advancements on this subject as they arrive.
Closing Words
As we enter 2024 en-masse, we are all incredibly thankful for everyone’s support towards this project. It’s been 6 years so far and whether it was through Patreon, reporting bugs, or code contributions on GitHub, or assisting other users in our Discord, because of all of you, the motivation and goal list is still just as high as it was in 2018. We are truly in awe of how far this project has come, so once again thank you!
2024-02-12 01:15:15 +0000 UTC
Happy New Year!
We hope everyone is having a great start to 2024 and that you’re all rejuvenated for another year of listening to these (mostly) regular rambles. We’ve got a fair bit to go through including the usual sprinkling of graphical fixes, a nice meaty section that users of battery-powered devices will want to read and, of course, our usual yearly wrap up on a few different topics, including compatibility and performance across 2023.
We’ll start out with a certified retro title in Monster Hunter Rise (MHR). Released in the antiquated year of 2021, the game has received several updates and DLC through its life, the newest of which has been a pain to emulate since its release. From the ‘Sunbreak’ DLC and update onwards, MHR would crash on boot before reaching the title screen with an error that was very complicated to solve within the buffer cache.
The old implementation made the assumption that all the memory regions where buffers could be located are contiguous (they’re next to each other). In most cases this assumption is correct, but if you can guess a title where it isn’t, then you probably have average pattern recognition skills. To solve this, support was implemented for Vulkan’s sparse mapping feature, which allows multi-range buffers to be created from multiple physical buffers.
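To picture what “not contiguous” means here, the sketch below walks a toy page table and reports which physical runs back a virtual range. The `translate` function, the dictionary page table and the addresses are all hypothetical; this illustrates the concept rather than our buffer cache.

```python
def translate(page_table, va, size, page_size=0x1000):
    """Translate a guest virtual range into a list of (physical_address, length) runs."""
    runs = []
    offset = 0
    while offset < size:
        addr = va + offset
        page_base = page_table[addr // page_size]
        in_page = addr % page_size
        chunk = min(page_size - in_page, size - offset)
        pa = page_base + in_page
        if runs and runs[-1][0] + runs[-1][1] == pa:
            runs[-1] = (runs[-1][0], runs[-1][1] + chunk)   # extend contiguous run
        else:
            runs.append((pa, chunk))
        offset += chunk
    return runs


contiguous = {0: 0x10000, 1: 0x11000}   # virtual page -> physical base
split = {0: 0x10000, 1: 0x40000}

one_run = translate(contiguous, 0, 0x2000)
two_runs = translate(split, 0, 0x2000)
```

A range that comes back as a single run can be served by one ordinary buffer. One that comes back as several runs is the case that needs a multi-range buffer.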

Perhaps the obvious downside here is that the feature is limited to Vulkan. OpenGL does technically support sparse mappings, but doesn’t allow you to choose where the buffer will be mapped, making it effectively useless for our use-case. Metal on macOS is in the same boat as OpenGL: while sparse mapping is supported, it does not provide enough control over the buffer mapping to be viable, and the limitation thus extends to MoltenVK too.
For devices and drivers that fully support Vulkan (Nvidia, Intel and AMD) however, Sunbreak and onwards is finally playable!
Fashion Designer, a game which we’re sure will be topping Game of the Year charts across the globe, was exhibiting a particularly strange glitch.

While we’re all huugggge fans of socks (having received around 30 pairs at Christmas), this seems like a few too many. On closer inspection, there are a few too many of everything.
If you’re one of the few folks left to play Fashion Designer, then each one of these icons is meant to be a different item of clothing. Duplicates are bad.
Our culprit resides within the texture cache where everything has a specific lifespan before the texture is flushed. Certain flags on each texture get set when the texture is first accessed, and when it is finally swapped out for a new texture. Fashion Designer appeared to be rendering different objects to the same texture, and as such only the first use of the texture was correctly setting the cache flags. When the game went on to request more draws of different objects, the same object texture was being copied multiple times. By resolving this edge case, our full arrangement of clothing items can be viewed.

Remaining on quirky uses of textures for the time being, when you catch a Cicada in Yo-Kai Watch 1, a nice 2D image of the bug is meant to take up a large part of the screen. Unfortunately, due to a questionable if-statement silencing the one log warning that would have told us immediately where the issue was, it has taken a fair while to track down the… bug.

For whatever reason, the team developing Yo-Kai Watch 1 decided to perform an image store on a texture that is a quarter of the width of the base format, but stores four times the data per pixel as an RGBA32 texture. If this sounds pointless, it is, because what is done with that image straight away? It’s accessed as an RGBA8 image which is an incompatible format conversion!
Adding a copy dependency to these formats resolves the bug and restores the bug.

This was also the cause of major graphical corruption in Wet Steps. The game still has a few other issues, but the difference is major.
Before:

After:

Super Mario RPG had impressively few bugs when it released in late November, but our eagle-eyed users instantly noticed that Mario himself, and a few environmental objects were slightly dull. While Mario is getting on in years and has probably lost the sparkle of his Gamecube youth, it turned out that bindless elimination was not working correctly in a couple of cases. In the event that a shader handle is assigned twice via different paths, bindless elimination was unable to be extended as it is unable to find the handle operation. However, even if different paths exist, the value is actually always the same as any relevant data is unable to be modified once inside the shader pass. By fixing this to simply pick the first value if multiple routes exist, Super Mario RPG renders correctly.
Before:

After:

Another game that highlighted an unhandled edge case in bindless elimination was Detective Pikachu Returns. This one is more subtle, but extending through shuffle resolves cubemap reflections throughout the game.
Before:

After:

Let’s take a short interim to talk through some of the smaller, but perhaps interesting changes that have taken place over the last couple of months.
Alright strap in, it’s time for the “Ryujinx blog teaches you computer science” segment.
Sleeping
We’re all aware that sleep is important right? Unfortunately, as is probably relatable to many, it’s actually more difficult than it may appear to sleep for the correct amount of time. Undersleeping and then feeling exhausted, oversleeping and missing your train is an all-too common experience in the modern world. Computers are not so different.
When they’ve finished their tasks, they’d like nothing better than to pull up their duvet and tuck in for a solid 6ms nap before work the following nanosecond.
Ryujinx needs to operate on a very tight schedule and the best way to do this is actually not to sleep. No desktop kernel is truly “real time” in the sense that it is impossible for us to sleep in one instant and wake whenever asked. There is always variability and delay to our requests (pretty graphs later).
Alternatives to sleeping
The alternative to sleeping is called “spin-waiting”, in which the program retains control of the CPU and continually asks it to “spin” until an event is triggered which will release the CPU from this spin cycle. Spinning can be used to take very granular control over when and how the CPU stops and starts execution of a thread within a program but comes with a major downside. It’s still active and using power the whole time. Consider sleeping vs spinwaits to be the difference between sleeping and sitting in a waiting room. You’re ready quicker if requested from a waiting room, but you’re less rested than if someone called you from sleep.
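If you want to feel the difference yourself, here is a small sketch of the two styles of waiting. It uses Python’s standard `time` module as a stand-in (our own code is C#), and the function names are made up for this example.

```python
import time


def sleep_wait(duration_ns):
    """Yield the CPU to the OS; cheap on power, but wake-up time is up to the scheduler."""
    start = time.perf_counter_ns()
    time.sleep(duration_ns / 1e9)
    return time.perf_counter_ns() - start


def spin_wait(duration_ns):
    """Keep the CPU busy checking the clock; precise, but burns power the whole time."""
    start = time.perf_counter_ns()
    while True:
        elapsed = time.perf_counter_ns() - start
        if elapsed >= duration_ns:
            return elapsed


slept = sleep_wait(1_000_000)   # ask for 1ms
spun = spin_wait(1_000_000)
print(f"sleep overshoot: {slept - 1_000_000} ns, spin overshoot: {spun - 1_000_000} ns")
```

The exact numbers depend heavily on the OS and machine, but the spin version typically overshoots by far less, at the price of pinning a core for the full millisecond.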
The problem
Now that we have the terminology out of the way, let’s look at the problem. If we have a single thread, monitoring a single event that wants to wait 1ms at a time, we have no problem at all.

Pretty much all three major operating systems will allow us to sleep with 1ms granularity, and be able to wake at the right time. However, consider this same scenario but with 10 different wait events on the same thread, all out of sync with each other by 0.1ms.

Now we start to run into issues. The solution we used up until the end of 2023 was to simply spin through these waits, but as you can see, this means spinning the entire time as we need to be awake to handle the thread that is about to wake every 0.1ms.
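The effect of staggering is easy to reproduce on paper. The sketch below (hypothetical helper names, times in microseconds) merges the wake-up times of every event on the thread and reports the longest stretch in which the thread could actually rest.

```python
def wake_times(periods_and_offsets_us, horizon_us):
    """All wake-up instants (in microseconds) within [0, horizon) across every event."""
    times = set()
    for period, offset in periods_and_offsets_us:
        times.update(range(offset, horizon_us, period))
    return sorted(times)


def largest_gap(times):
    """Longest interval between two consecutive wake-ups."""
    return max(b - a for a, b in zip(times, times[1:]))


single = wake_times([(1000, 0)], 5000)
staggered = wake_times([(1000, 100 * i) for i in range(10)], 5000)
```

One event leaves a full 1000µs between wake-ups. Ten events, each waiting 1ms but offset by 0.1ms, leave only 100µs, which is far below what a 1ms-granularity sleep can handle.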
We spoke in our last report about a change to `ServerBase` which stopped polling every 1ms. This reduced CPU utilization and power usage dramatically due to the problem above. ServerBase was a major contributor to the huge stack of concurrent waits we had to deal with. Unfortunately it is only one of many thread types that request constant waits; game threads are just as large a problem and were not dealt with by that change.
So how do we move forward? The CS nerds reading have been shouting “nanosleep!” at their screens for the last couple of minutes and they’re half right.
Linux/macOS
Linux and macOS both provide a `nanosleep` syscall to wait a precise number of nanoseconds. Nanosecond precision is more than capable of handling our above scenario so let’s give it a whirl.

Upon testing, we see that nanosleep is not quite as precise as its name claims. At very low nanosecond values we see very consistent (and small) wake error values, but once we reach the threshold of 0.5ms a huge spike in error occurs which eventually levels off at around 1.5ms.

macOS curiously again has different behavior. At tiny wait requests the syscall is remarkably accurate with minimal error on sleep requests down to the nanosecond. Unfortunately the error is directly proportional to the wait time requested and keeps climbing until we see almost 0.5ms of delay when asking for a 3ms sleep, and over 3ms delay when asking for 20ms of sleep.
While these sound bad, the use case we wanted was already for those smaller sleep times, so both macOS and Linux have an efficient and easy way out here via nanosleep.
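A measurement along these lines can be approximated with a few lines of Python. Treat this as a rough stand-in rather than the harness behind the graphs: it goes through `time.sleep`, which recent CPython versions on Unix build on the system sleep calls, and interpreter overhead will inflate the smallest values. The function name and sample counts are arbitrary.

```python
import time


def wake_error_ns(request_ns, samples=5):
    """Average difference between the time asked for and the time actually slept."""
    total = 0
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(request_ns / 1e9)
        total += (time.perf_counter_ns() - start) - request_ns
    return total // samples


for request in (50_000, 500_000, 3_000_000):
    print(f"requested {request} ns, average error {wake_error_ns(request)} ns")
```

Sweeping the request size like this is what exposes the shape of the error curve on each OS.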
Let’s talk about Windows…
Windows
The Windows NT kernel is, by far, the least “real time” of the three major options presented. It has no nanosleep equivalent and as such is highly limited in how to deal with our sleep problem. By default you can sleep to an accuracy of 1ms which, as we hope to have reiterated, is not good enough for as low as two concurrent waits (each 0.5ms apart) let alone 10. Is all hope lost then? Is 1ms really the best we can do? We’re pretty smart so the answer is: hell no.
On most x86_64 systems you can perform a query to the clock resolution and discover that there is actually a 0.5ms resolution “base clock” and that perhaps more interestingly, by default, any waits you perform will align to the nearest “base clock tick” automatically. If you sleep with no thought, this means your thread may wake late due to alignment with the next base tick. However, if you have this information, and make a very smart guess about when the next tick will occur, you can time your sleep to 0.5ms precision and nearly always wake right before it.
None of this solves our 10 concurrent 1ms wait issues though. If we detect that a wait event is base-clock aligned or very close to, we can allow the system spoken about to align the waits and wake just before the next clock tick. If the wait is not base-clock aligned (or extremely precise like 0.01ms) then there is unfortunately nothing that the NT kernel provides us to solve this beyond continuing to use spinwaits where needed.
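The alignment trick boils down to some integer arithmetic. The sketch below makes no Windows API calls at all; it assumes we already know when some earlier base-clock tick happened and only shows how a wait could be split into a tick-aligned sleep plus a short spin. The function names and the exact policy are assumptions for illustration.

```python
TICK_NS = 500_000  # 0.5ms base clock, as described above


def next_tick(now_ns, last_tick_ns, tick_ns=TICK_NS):
    """First base-clock tick strictly after now, given any known earlier tick."""
    ticks_elapsed = (now_ns - last_tick_ns) // tick_ns
    return last_tick_ns + (ticks_elapsed + 1) * tick_ns


def plan_wait(now_ns, target_ns, last_tick_ns, tick_ns=TICK_NS):
    """Split a wait into a tick-aligned sleep and a final spin.

    Returns (sleep_until_ns, spin_ns). sleep_until_ns is the last tick at or
    before the target, or None when no tick falls inside the wait.
    """
    first = next_tick(now_ns, last_tick_ns, tick_ns)
    if first > target_ns:
        return None, target_ns - now_ns          # too short to sleep: spin it all
    ticks_fit = (target_ns - first) // tick_ns
    sleep_until = first + ticks_fit * tick_ns
    return sleep_until, target_ns - sleep_until


aligned = plan_wait(1_200_000, 2_000_000, last_tick_ns=1_000_000)
unaligned = plan_wait(1_200_000, 2_100_000, last_tick_ns=1_000_000)
too_short = plan_wait(1_200_000, 1_300_000, last_tick_ns=1_000_000)
```

A target that sits on a tick needs no spinning, one that sits just past a tick needs a small spin, and one that ends before the next tick cannot use the sleep at all.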
Class dismissed, let’s look at what all of this actually does. We said before this whole thing is about CPU usage and power draw, so some more graphs seem to be in order.

Apple and Linux devices will be seeing the largest benefit here with some seriously impressive efficiency gains at equal frame rate and resolution. Tears of the Kingdom is being slashed by almost 40%, Red Dead Redemption and Breath of the Wild both see very healthy 15-20% shifts and Pokemon Violet practically sheds everything with a 75% reduction. For perspective, that is an M2 MacBook Air emulating Pokemon Violet more efficiently than the Switch plays it natively.
Areas of games that don’t really do much like title screens, or when emulation is paused, see some wild reductions on devices like the Steam Deck.
Breath of the Wild before (14.2 W):

After (7.9 W):

These changes can make a difference of multiple hours of battery life, and let devices such as the MacBook Air avoid thermal throttling entirely, or at least for far longer. They also allow devices to reach and maintain their boost clocks during the times they’re actually needed, rather than being continually tricked into boosting on menus and when paused. If none of that is cool then we saved you some money on your next energy bill, take it or leave it!
Lastly, for those of you who find it interesting that we decided to write this thing in C#/.NET, we had a new and shiny .NET version to update to in November: .NET 8 (they grow up so fast…). Microsoft always provides an absolutely huge document on all of the performance improvements they’ve brought to the table each year but for us, they can usually be distilled into our patented Super Mario Odyssey pole benchmark.
.NET 7

.NET 8

This very blog also got a very cool mention during .NET conference 2023 as part of their excellent talk on Dynamic PGO. Check that out here. Look ma! We’re on TV!

Our 2023 round-up
So how did 2023 go for us overall? Well let’s take a look shall we?
Hope you all aren’t bored with graphs yet.

We added over 700 games to our compatibility list during 2023 which brings the total for games tested (and reported) on Ryujinx to 4255. Over 83% of those are reported as having no graphical, technical or stability issues at all, with a further 12% of titles having at least one problem. This category is mostly filled with titles that have minor graphical glitches or stability issues. In other systems of marking, they may be considered playable also. The remaining 4.5% of titles only progress as far as the menus, if at all.
As far as performance goes, we had a year on year average improvement of 36% in our own usual suite of games. As usual this is highly game and hardware-dependent.

We’re not quite sure what happened to Persona 5 and NieR, but they have seen 61% and 89% improvements respectively. Our two Zelda games tested jumped by 20% and 35% (Tears of the Kingdom does indeed run on our January builds), and our classic benchmark of Super Mario Odyssey continues to keep climbing no matter what we do.
All in all, a great result and as usual, all of this just works. With no need to fiddle with settings for different games, the out of box experience is the best it’s ever been.
2023 sure had plenty of game releases so let’s jot up just how many Ryujinx was right in the action for, running on day 1!
- The Legend of Zelda: Tears of the Kingdom ✓
- Super Mario Wonder ✓
- Super Mario RPG ✓
- Pikmin 4 ✓
- Metroid Prime Remastered ✓
- Sea of Stars ✓
- Octopath Traveler II ✓
- Fire Emblem Engage ✓
- Kirby’s Return to Dream Land Deluxe ✓
- Advance Wars 1+2: Re-boot Camp ✓
A ridiculous list of titles, first-party and otherwise, being playable with no changes always fills us with pride. It would be wrong of us to omit that a couple of these games (Tears of the Kingdom among them) did need some additional love to reach our standards, but the majority of the year has gone without a hitch.
Closing words
That’s all from us this fine January. We hope you’ll stick around for another year of this madness because we’re sure that 2024 will not be dull!
Onto our scheduled sales pitch: if you would like to contribute to Ryujinx then there are many ways in which you can assist us. Knowledgeable in emulation, graphics development or even just use C#/.NET in your day job and want something cool to stat-pad your resume? We’re always looking for more folks to check out our GitHub. Fix bugs, add features, or our personal favourite: stare blankly at Visual Studio while imposter syndrome slowly creeps over your shoulder.
If you couldn’t write “Hello World” in English, let alone a programming language, you can still help us fund our development time and other costs monetarily via our Patreon, or help us with testing games and helping fellow users figure out this whole emulation thing on our Discord.
Until next time!
2024-01-10 17:57:59 +0000 UTC
Had your fill of candy, sweets, or whatever your localized version of ‘edible items that are actually quite bad for you but they taste great’? We hope so, because the end of a month signals not only the start of the next, but also another progress report from yours truly.
In what was quite possibly the most spooky game launch of all time, Super Mario Wonder graced both our Switches and our PCs on exactly the same date… Curious how that keeps happening.
Either way let’s fill our October goody bags with just a few more sweet treats!
The first item on the agenda starts with an adequately thematic title in the form of Luigi’s Mansion 3. As one of the visual masterclasses of the Switch, LM3 loves to be a little jank in a lot of places: rendering the lobby interlaced and using an extremely aggressive dynamic resolution mode being just a couple of the annoyances we’ve had to contend with over the years. We spoke last month about a couple of fixes for AMD GPUs in regard to certain objects and shadows, but there was still a single case where everyone could very easily notice that Luigi was not having a good time.

We can save him some dignity here and confirm that he is in excellent control of his bladder. The issue lies in the texture formats. LM3 renders this shadow as an R16Unorm (color format), but then proceeds to sample it as a D16Unorm (depth format). Adding support for a copy dependency between these formats restores the shadow to its appropriate shape and longevity.

While checking Cocoon, we isolated further shader instructions on the GPU that we did not yet support. While this usually isn’t too big of a deal (just a warning in the console log), this particular instruction was actively generating invalid code and not just failing gracefully. Implementing proper support for querying the number of samples on a multisampled texture kills both birds with one stone. Shader instruction implemented, garbage output gone!
Cocoon before:

Cocoon after:

Coming off the back of such a simple name like Cocoon, our next target is the even more concise: Neptunia GameMaker R:Evolution.
Much like flushing a toilet, GPUs need a method to remove data from their memory (VRAM) after the game is no longer using it. One of the core rules around flushing textures however, is that you should not attempt to do it once the texture itself has become unmapped. This is because once unmapped, the GPU has no idea what is now actually stored at its memory location, the CPU could have put something else there already. If anyone was guessing at what could happen if the GPU attempted to flush data that wasn’t actually what it thought it was… well that’s how you get data corruption!
In the case of our GameMaker friend above, this is precisely the scenario occurring. Unfortunately for this game, the resulting data corruption simply resulted in a hard crash rather than any pretty digital art. Skipping these invalid flushes allows the title to proceed in fine fashion.
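In sketch form, the rule is simply “check the mapping before writing anything back”. The `Texture` class and `flush_all` function below are hypothetical and far simpler than a real texture cache.

```python
class Texture:
    def __init__(self, name, mapped=True, modified=True):
        self.name = name
        self.mapped = mapped
        self.modified = modified


def flush_all(textures, write_back):
    """Write modified textures back to guest memory, skipping unmapped ones."""
    skipped = []
    for tex in textures:
        if not tex.modified:
            continue
        if not tex.mapped:
            # The backing memory may already hold something else: do not touch it.
            skipped.append(tex.name)
            continue
        write_back(tex)
        tex.modified = False
    return skipped


written = []
textures = [Texture("hud"), Texture("stale", mapped=False), Texture("clean", modified=False)]
skipped = flush_all(textures, lambda t: written.append(t.name))
```

Only the texture that is both modified and still mapped gets written back; the stale one is left alone instead of stomping over whatever now lives at its old address.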

AMD served us a fresh, and steaming, dish of frustration with the release of their 23.2.x line of Radeon drivers. Imagine the horror, waking up, updating your GPU drivers, then…

In our SPIR-V backend, we previously attempted to re-use function parameters across multiple calls to the same function. For whatever reason, this now completely baffles the AMD SPIR-V compiler, resulting in the abstract line art you can view above.
Giving each function its own set of temporary values seems to resolve the issues.

NativeAOT is a rather new-ish feature of .NET which we’ve mentioned a couple of times before in these reports. It effectively tries to bridge the gap between a fully JITed runtime like usual C# or Java, and a fully compiled language such as C/C++. Compiling C# to direct machine code Ahead of Time (AOT) has some excellent benefits in terms of boot times and portability to platforms where you may not have access to the full .NET runtime and JIT.
You do lose some features of .NET when doing this though, namely the ability to generate code during program runtime. Ryujinx currently uses a JIT for some Switch GPU macros which tries to directly emit .NET IL (Intermediate language) to avoid going via a slower interpreter route. Simply enabling NativeAOT and forcing the interpreter reduces performance in Super Mario Odyssey from 90FPS down to 75FPS, a huge 17% dip.

The solution to this is to implement HLE macros that attempt to match the lower level NVN macro directly, instead of leaving it up to the emitted IL. Under NAoT this brings the performance of SMO back up to 85FPS. Still short of the 90 mentioned prior, but that 5FPS gap is made up by additional factors unrelated to these NVN macros.

Onto other news, a fallback has been added for GPU drivers that do not support the OpenGL equivalent of `textureGatherOffsets`. MoltenVK technically reports that it does support such an extension, but sometime between the initial macos1 release and today, an update to SPIRV-cross (a library MVK uses to convert Vulkan shaders to Metal shaders) has made it attempt to use said feature… which crashes the Metal compiler because Metal doesn’t support it!
This fallback once again allows Xenoblade Chronicles: Definitive Edition to render (although there are still an array of other issues, especially on newer versions of MoltenVK).

Sifu, a roguelike with a very short name, has been a frustration to play on Ryujinx since its release many months ago. While visually and mechanically sound, we’ve heard that time-dependent random crashes are not many users' favorite issue to deal with.
The cause was the game problematically calling a function which was replacing buffers in the surface flinger (this is basically the service that makes one big buffer of data from lots of smaller buffers). While the eventual result of a large chain reaction of issues was a memory unmap crash, the root was a small issue in a single basic counter not being decremented properly. By making sure this counter is decremented if these problematic cases are hit, Sifu players no longer need to hold their breath.
Moving away from all that graphical stuff, let’s talk multiplayer!
For those unaware, while one upstreaming effort has been completed in the macOS changes, another was also started. Getting all the LDN functionality we’ve been working on over the last 3 years into our main releases has finally become a focus now that more time has opened up to work on the cleanup and reverse-engineering aspect of the service.
Last month we added support for the actual service implementation with the caveat that while a lot of the ‘Switch’-side stuff is now in place, we still need some way to make it useful. This is currently done in two ways in our LDN builds:
- A custom, over the internet implementation running on our own servers, called “RyuLDN”. This allows Ryujinx users to connect to each other from around the world, but has the downside of being limited to Ryujinx users only.
- Ldn_mitm is an alternative that transforms the functionality of any game that has LDN functionality, into one which has LAN functionality. This means that any real Switch with the ldn_mitm sysmodule can connect to any other equivalent Switch, and additionally to Ryujinx. The downside is that for this method to work, all systems must be on the same network, whether real or virtual.
As the second approach is ultimately simpler for the time being, it is the first to become available in our main releases.

A change which many users on integrated and lower power systems (or anyone who stares at their power usage instead of playing their games!) may enjoy, was an adjustment to how we signal for a session to be added to a given ServerBase. In the past, we were simply polling for 1ms which resulted in a fair amount of ‘fake’ CPU usage. While it was fake in that the thread would yield if any other task required it, what wasn’t fake was that it forced the thread into constant real use, inflating its presence when profiling, and having a large influence on power consumption.
By adjusting this logic to signal an event on session addition instead of polling, general CPU usage (especially when emulation is paused) sees considerable reductions. However, the real gains can be seen on battery-powered devices such as the Steam Deck. Mario Kart 8 Deluxe, while simply sitting on the character select screen, has its power consumption cut from 15.8W, all the way to 9.1W.
MK8D before:

MK8D after:
We’d like to reiterate that we do not expect this to have much influence on raw performance (as mentioned above the usage hog wasn’t ‘real’), but equal performance at a 42% reduction in wattage is certainly a win in our book. This won’t be the case in every game, but there are still further improvements to how we perform submillisecond waits that can be made in future.
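To make the polling versus signalling difference a little more concrete, here is a minimal Python sketch. It is not Ryujinx's actual C# code: the `ServerBase` class below is a toy, and `add_session` / `wait_for_sessions` are hypothetical names chosen for illustration. The point is only that the waiting thread sleeps on an event until a session arrives, instead of waking up every millisecond to check.

```python
import threading


class ServerBase:
    """Toy stand-in for a server loop that picks up newly added sessions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []                        # sessions waiting to be picked up
        self._session_added = threading.Event()   # set only when there is work

    def add_session(self, session):
        # Called from any thread: queue the session and wake the server thread.
        with self._lock:
            self._pending.append(session)
            self._session_added.set()

    def wait_for_sessions(self, timeout=None):
        # Blocks without spinning. The old approach would instead loop here,
        # checking the list roughly once per millisecond.
        if not self._session_added.wait(timeout):
            return []
        with self._lock:
            self._session_added.clear()
            taken, self._pending = self._pending, []
        return taken
```

With nothing queued, `wait_for_sessions` simply parks the thread, which is where the idle power saving would come from in a design like this.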
As for some other quality of life features that were added this month, the long requested ability to add game shortcuts to your desktop via a simple click finally materialized. Right-click any title and look to the very bottom of the context menu; from there, a shortcut to the desired game (complete with game icon) will be created on the current active desktop.
Windows:

Linux:

macOS:

Secondly, aspect ratio can now be changed from the bottom bar if you use our Avalonia frontend. This allows hot swapping between them if, for whatever reason, 16:9 just isn’t cutting it mid-session, and you just need the game to be 4:3.

Last but certainly not least, the library we use to emulate most of the file system services - LibHac, was updated to version 0.19.0 and continues the trend of open-source developers hating naming anything version 1.x.x!
New version, new stuff. `IFileSystem.GetFileSystemAttribute`, a new filesystem service added in firmware 16.0.0, is now fully supported, allowing newer titles such as Tiny Thor, Cassette Beats and DeepOne to head in-game.
DeepOne:

Cassette Beats:

Tiny Thor:

That last one clearly still needs a bit of work!
Closing Words
If you make it this far in these reports then you deserve a gold star, or a cookie, or something that more generally triggers dopamine.
We’d like to once again thank everyone who supports us every month on Patreon, contributes code to us on GitHub and those who help other users out with troubleshooting and bug reporting in our Discord! We couldn’t do it without you and with that, we hope to see you again next month.
2023-11-10 23:42:16 +0000 UTC
Gather once more at the three-quarter mark, for beyond this point, your days become dark.
September was an unusually quiet month for our friends publishing games with the only games of note being a rather underwhelming Baten Kaitos remaster, alongside a questionable Mortal Kombat 1. At least the latter managed to bring some comedy to its otherwise haunting visuals.
Beyond those we’ve got stuff to chat about from all over the project from LDN, Mac improvements and a boat load of service work.
Hop on down.
Our first stop is at the port of Delfino, where shines lag is the name of the game. The title screen of Super Mario Sunshine (part of the 3D All-Stars collection) heavily stresses a couple of our buffer conversion shaders, specifically those converting the stride; stride being effectively the gap between elements of the same ‘type’ in a vertex specification or other dataset. If you made a shopping list such as: Tomatoes, 1, Bread, 1, Apples, 6, the ‘stride’ between your items is one, or in a computer, however much memory the number of each item takes up!
Either way, to not bore everyone with theory, we need to convert some buffer formats that SMS uses into something a little more sensible for your real GPU via compute shaders. Nvidia does not need these conversions and works perfectly fine without them, but AMD struggles significantly (as does Nvidia when forced to use the conversion).
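For the curious, a stride conversion can be sketched on the CPU in a few lines of Python. This is only an illustration of the idea, not the compute shader we ship: the function name `change_stride` is hypothetical, and the 3-byte to 4-byte example is an assumed case, since the exact formats SMS uses are not covered here.

```python
def change_stride(data: bytes, src_stride: int, dst_stride: int, element_size: int) -> bytes:
    """Copy fixed-size elements from a buffer with one stride into a buffer with another.

    Strides are measured in bytes from the start of one element to the start of the next.
    Any padding in the destination is left as zero bytes.
    """
    if element_size > min(src_stride, dst_stride):
        raise ValueError("element does not fit inside the given strides")

    count = len(data) // src_stride          # number of whole elements available
    out = bytearray(count * dst_stride)
    for i in range(count):
        src = i * src_stride
        dst = i * dst_stride
        out[dst:dst + element_size] = data[src:src + element_size]
    return bytes(out)
```

For example, `change_stride(b"abcdef", 3, 4, 3)` repacks two tightly packed 3-byte elements so each starts on a 4-byte boundary. Doing this for hundreds of megabytes per conversion is where the cost comes from.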
https://user-images.githubusercontent.com/6294155/264176691-8083b12e-8602-487f-95fb-a505fb1441b3.mp4
This is what writing 230MB in compute each time a buffer needs conversion looks like!
To reduce the impact of this insanity, we can instead device map the converted vertex buffers (as they’re only ever accessed from GPU) and also allow the conversion shaders themselves to scale the work-group size. This lends itself well to most dedicated GPUs that have more cores to work with. Even together the problem is not entirely eliminated, but the difference is still stark.
https://user-images.githubusercontent.com/6294155/264176768-9596f440-1d87-404d-9369-7dece7bb0a72.mp4
On the topic of rough performance, Mortal Kombat 1 was an unexpected thorn early into September as even our users with the highest end of systems were struggling to reach the native frame cap of 60FPS.
Further inspection and profiling revealed that MK1 was creating over 100 buffer textures that would all overlap at once. MK1 exposed a corner case in the buffer cache implementation where many buffer textures could be created as a view of fellow overlapping buffer ranges. If all of this is jargon, the basic outcome is that the scenario that the buffer cache was checking is fundamentally impossible and as such just a waste of its, and your, time.


A nice 46% improvement can be seen once this has been corrected. The game also seems to have a maximum framerate cap, so the value here is probably higher!
Not content with just the one game being affected, a second issue where the texture lookup array was being resized on every lookup return was corrected, yielding some nice gains in some FIFO limited UE4 titles and coincidentally improving frametime stability in Mortal Kombat 11.
The real winner though is R-TYPE FINAL 2 which sees a staggering 7.5x improvement, from 8FPS all the way to its engine cap of 60FPS. If a side-scrolling space shooter is just what you were dying to play, then now is the time.

September also marked the arrival of a Baten Kaitos I&II remaster, the first of which, if you’re interested in trivia, holds the longest 100% speedrun world record clocking in at fourteen real-world days.
As is apparently only fitting, it brought with it a whole new service class: `ngc`. This was a bit of a mystery for a couple of days as no one could actually tell what it did. BKI&II seemed to register the service but never actually made any calls to it. Upon further inspection however, it seems that in firmware 16.0.0 Nintendo have moved their profanity and general input filtering checks into a service of their own.
NGC, “No Good Content”, seems to have taken over the role that used to be provided by the general firmware word blacklist that has been used since the 3DS/Wii U days and comes in at close to 5,000 lines for us.
There are four parts to this new service:
- GetContentVersion - Simply grabs the version of the bad word dictionary to use from a firmware file version.dat.
- Check - These methods actually perform the heuristics on any text to determine words or strings to flag. There is a common dictionary of terms to always flag, and then a per-region specific dictionary that can check specific strings that are problematic in certain regions.
- Mask - This method replaces any bad words within a string with asterisks (*), but only within the first 512 characters; beyond this the string will not be processed. Other than that there is a rather crude email-address check, and new abilities to both ‘normalize’ text according to Unicode standards and transform a string into canonical format.
- Reload - What it says on the tin. Unmounts and remounts the system archives. Unknown use, possibly just a failsafe.
On the whole, quite a lot for basic word checking. We can’t show the list of generated terms and sub-strings in the various dictionaries for obvious reasons, but some of them are… imaginative!
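As a rough illustration of the Mask behaviour described in the list above, here is a small Python sketch. It is emphatically not the service's real algorithm: plain case-insensitive substring matching is our assumption, the `mask` function and its `bad_words` parameter are hypothetical, and the example word is deliberately harmless. The only detail taken from the list is the 512 character limit.

```python
import re

MASK_LIMIT = 512  # only the first 512 characters are processed


def mask(text: str, bad_words: set) -> str:
    """Replace flagged words with asterisks inside the first MASK_LIMIT characters."""
    if not bad_words:
        return text

    head, tail = text[:MASK_LIMIT], text[MASK_LIMIT:]

    # Longest words first so a longer term wins over a shorter one it contains.
    ordered = sorted(bad_words, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in ordered), re.IGNORECASE)

    masked_head = pattern.sub(lambda m: "*" * len(m.group(0)), head)
    return masked_head + tail  # anything past the limit is left untouched
```

Calling `mask("you silly goose", {"silly"})` stars out the middle word, while the same word placed after the 512th character passes through unchanged.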

Two of our already implemented services: lbl which (prior to firmware 10.0.0) controlled the backlight and screen services, alongside wlan which manages the general LAN services, were both moved to our new horizon project. We highlighted this when it was first added, but the core premise is that the way we originally handled a lot of service implementations had a number of key flaws. As there are a lot of services that we’ve implemented over the last 5 years however, this is moving piecemeal with services being migrated over time. More about this specific change was covered in the first progress report of this year.
Alas, we obviously cannot continue without answering the question. Can it run Crysis?!
The answer, prior to September, was a resounding no! Luckily it isn’t some GPU insanity, or a one-use, custom CPU instruction, just some network checks… honestly quite disappointing considering the game’s legacy. By stubbing the remaining unsupported BSD socket options, everyone's cult classic PC killer can actually get back to business.

We haven’t had to touch the audio services in a while but September blessed us with the release of Ys X: Nordics which amplified a few issues with the implementation of the compressor effect in the audio renderer. There is a whole list of small inaccuracies that were cleaned up, and this allowed the title to find its voice.

Super Bomberman R 2 is the final title this month that poked holes in our fake software Switch with its use of brand new services of the friend class. As almost all of these types of services are useless without a connection to Nintendo, they could be easily stubbed and allow the game to be fully playable! Albeit a little blurry.
If any of you are developing a game, please let people disable anti-aliasing filters!

Way back in 2021 when everyone was stuck inside and begging for some multiplayer, we released a preview build of a feature more generally called LDN. In reality that is just the name of the services that handle the Switch’s local wireless functionalities and is something to be reverse engineered and implemented just like any other.
The initial preview got pretty popular and was “good enough” for most people who wanted multiplayer to make use of, but not really in a clean or accurate enough state for us to merge it into the main codebase. We never really intended a couple of years to pass but stuff happens and priorities change.
Now at the end of 2023, there is once again some focus on getting all of this organized. The initial ldn:u, INetworkClient interface and DisabledLdnClient implementations were finalized which is a huge piece of the puzzle, even if they don’t yet provide any of the framework for actually making use of the local wireless features. Stay tuned for the follow-up work which will implement the actual bridging of these “local” connections over a more useful target… the internet for instance.
For our macOS users, specifically those on M1/M2 chipsets, there were a number of titles that could very easily get stuck on boot, on loading screens, or almost anywhere else in gameplay. The issue was isolated to some skipped VCPU interrupts that now have their own dedicated VTimer in order to periodically interrupt execution, if the full call is missed. This allows titles such as Persona 5: Strikers, Bravely Default and Life is Strange True Colors to be playable beyond a brief two minute session.



To round off this month let’s jump through some quick-fire changes to the miscellaneous side of emulation.
Closing Words
As September has already faded, we hope that everyone will have an excellent autumn period. We ourselves are juggling a number of different tasks, some of which are finally catching up with us! Our resident LDN fiend is deep in the weeds of finally getting all our multiplayer work collected, and those with a Steam Deck are pulling their hair out trying to fix whatever quarrel Avalonia and Gamescope are fighting over.
As usual we’d like to thank all our supporters, no matter what form that may take. If you’d like to chip in in any way you can then we always welcome code contributions on GitHub, donations to our Patreon and helping fellow users via our Discord. Until next time friends!
2023-10-09 21:03:14 +0000 UTC
Miss us? Summer only lasts so long and some of us need our holidays.
2023 continued to deliver swathes of new Switch titles with Pikmin 4 and a remast-... what? It’s a port? Well if you say so… Ahem, a re-release of Rockstar’s western classic: Red Dead Redemption. A game which has been killing emulators since before some of our readers could read. Luckily we’re, as the kids say, built different.
We’ve got a good one cooked up so let’s get to it.
We’ll start with some good old OpenGL games: Wreckfest and 20XX; the latter not to be confused with 30XX or a niche Super Smash Bros: Melee meme of the same name. Both of these titles rendered perfectly, but upside down. Requiring users to bring a horizontal mirror with them before sessions seemed a tough ask, so fixing an incorrect fragment origin seemed a better solution.
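If the term is unfamiliar, the fragment origin decides whether a fragment's y coordinate counts from the top or the bottom of the image. A tiny Python sketch of converting between the two conventions follows; `flip_fragment_y` is a hypothetical helper for illustration, not a function from our codebase.

```python
def flip_fragment_y(y: float, height: int) -> float:
    """Convert a fragment y coordinate between upper-left and lower-left origins.

    Fragment coordinates sit at pixel centers (0.5, 1.5, ...), so the mapping is
    simply a mirror around the middle of the image.
    """
    return height - y
```

Pick the wrong convention and every row lands mirrored, which is exactly the upside down picture described above.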
20XX before:

20XX after:

Wreckfest before:

Wreckfest after:

Let’s stick to some guest OpenGL game bugs in the form of Dragon Quest Builders. A game which a Discord user, who will go unnamed, has mentioned in at least 30% of their server messages. The dedication to the cause is truly inspiring.
DQB, and potentially other OpenGL Switch games, had a very strange issue where item icons would simply be muddled up with other item icons. Even with direct comparisons it’s hard to spot what exactly is wrong unless you’ve played before.

That sword does not look like cloth at all.
The fix was to track buffer copies that modify texture memory, which resolves an issue where buffer data was being copied directly into memory. That only works correctly if the texture data does not already exist; if it does… weird stuff.

Alright after this we’ll shut up about OpenGL games, they’re just so annoying?
Some of them were using some interesting texture formats that we’ve never seen before like: Z16RUnormGUintBUintAUint which appears to just be an extremely long alias for Z16Unorm. This is used for some shadowmaps in titles such as Go Rally, Pyramid Quest and Monster Blast.
Go Rally before:

Go Rally after:

Pyramid Quest before:

Pyramid Quest after:

Moving onto Jurassic World Evolution Complete Edition, which is surely in contention for the ‘most-Bethesda game name’ award. It had been in a state of relative limbo since its release: the game booted, but upon entry to a campaign, would deliver nothing but a view of your desktop, the program having crashed. However, by eventually tracking the issue down to a mishandled case in shader instructions, we can finally get a look at some trees.

Red. Dead. Redemption?
For those that didn’t experience this game all the way back in 2010, Rockstar graciously blessed us with a new Switch port of their original PS3/X360 title. As it has never yet seen a PC release, it has stood the test of time as the game to stress PS3 and Xbox 360 emulators such as RPCS3 and Xenia. You can’t find a video on these without stumbling across RDR1 somewhere.
So the question on everyone's mind, admittedly including our own, was if the Switch version was going to be another wasted effort, or a genuine alternative route to getting it onto PC. Thankfully the answer is fairly positive. After fixing a Vulkan-specific bug with masked stencil clears, which resolved a very interesting psychedelic effect where foliage failed to render, the rest of the experience was as close to flawless as we dare call anything.
Vulkan before:

Vulkan after:

Hardware requirements for the native 30FPS are fairly modest and performance can reach 60FPS (and beyond) on the top-end CPUs. Switch emulation being relatively GPU-light means that resolution scaling to 4K or higher is effectively free on any competent GPU. We dislike outwardly making comparisons to other emulators, especially when they’re completely different consoles, but we’re confident that we’d place favorably in an RDR1 emulation tier list!
Moving away from video game westerns, how about we talk about AMD? It's been a while since we’ve had a good therapeutic rant. Well maybe this one is a little more justified.
The Switch is powered by an Nvidia-designed Tegra X1 which means that sometimes, where Nvidia and AMD diverge in how they design their hardware, workarounds are going to be required.
One such issue showed itself in a lot of games, especially Unreal Engine titles. GPUs have a property which is usually called `Invocations per subgroup` and crucially AMD and Nvidia diverge here in their GPU designs. Nvidia uses 32 invocations per subgroup, while AMD uses 64. RDNA onwards support a Vulkan extension which allows a GPU to change its subgroup size but only for compute shaders, so while this fixed some games like Shin Megami Tensei V on modern AMD GPUs, if a game used these operations in the rest of the graphics pipeline, no dice.

The solution is to simply sub-divide the 64 into two groups of 32, instead of just ignoring any extra invocations beyond the 32nd, which is what produced the mess seen above. Fixing this fixes a staggering number of AMD-exclusive graphical bugs.
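To sketch what splitting a 64-wide subgroup into two groups of 32 could look like, here is a small Python model. It is an illustration under assumptions, not the shader code we generate: we picked a ballot (one vote bit per invocation) as the example operation, and all of the function names are hypothetical.

```python
GUEST_SUBGROUP_SIZE = 32  # what a game built for Nvidia hardware expects


def guest_subgroup(host_lane: int) -> int:
    """Which 32-wide virtual subgroup a host invocation (0..63) belongs to."""
    return host_lane // GUEST_SUBGROUP_SIZE


def guest_lane(host_lane: int) -> int:
    """The invocation's index inside its 32-wide virtual subgroup."""
    return host_lane % GUEST_SUBGROUP_SIZE


def guest_ballot(host_ballot: int, host_lane: int) -> int:
    """Return the 32 bits of a 64-bit host ballot that belong to this lane's half."""
    shift = guest_subgroup(host_lane) * GUEST_SUBGROUP_SIZE
    return (host_ballot >> shift) & 0xFFFFFFFF
```

Invocations 32 to 63 are no longer ignored in this model: they form a second virtual subgroup with their own lane indices and their own half of the vote mask.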
Marvel's Ultimate Alliance 3 after:

Nier Automata: The End of YoRHa Edition before:

After:

Luigi's Mansion 3 before:

After:

Monster Hunter Rise before:

After:

It was a good couple of months for AMD users in general, as new contributor gleng isolated and fixed a prevalent issue AMD owners were experiencing on macOS devices. Namely that some games like Pokémon Scarlet/Violet only rendered in ¼ of the screen space.

The AMD Metal driver, at least when used through MoltenVK, seems to have serious issues using the `VK_EXT_shader_viewport_index_layer` extension which results in disaster. Luckily there don’t seem to be any negatives to just disabling the use of this specifically for AMD devices when using MoltenVK.

All this macOS talk brings us nicely into our next little section. Because this is gonna be the last one.
These last couple of months have been really huge for our macos1 upstream progress coming off the back of transform feedback emulation in June. The final large refactor of the shader backend was pushed through which simplifies one of the final puzzle pieces of geometry shader emulation.
Before we get to that though, a long awaited change on the performance front was the introduction of Buffer Mirrors. The Switch GPU has certain methods that allow it to load arbitrary data into buffers at any time with no need for additional barriers. Vulkan on the other hand does not have any functionality that allows these arbitrary updates; you must be outside of a render pass and manually perform a buffer copy or compute write to achieve a similar outcome. On desktop GPUs, interrupting a render pass is not particularly expensive and can even be considered free. This is unfortunately not the case on mobile GPUs (such as those found in M1/M2 chips) where ending a render pass is much more of a commitment.
The solution to this intriguing issue is aptly ingenious. Whenever a game requests an inline buffer update, we can take a ‘mirror’ of the current binding, perform the update and then rebind to the new mirrored buffer. This can greatly improve performance on M1/M2 Macs depending on how often a game uses these inline buffer updates. Most games perform them to some extent so improvements are global.
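The mirror idea can be modelled on the CPU with a short Python sketch. This is a toy under stated assumptions rather than our Vulkan backend: `MirroredBinding`, `record_draw` and `inline_update` are hypothetical names, and a real implementation deals with GPU buffers and bindings rather than byte arrays.

```python
class MirroredBinding:
    """Toy model of a buffer binding that is mirrored on inline updates."""

    def __init__(self, data: bytes):
        self.bound = bytearray(data)   # the buffer currently bound for drawing

    def record_draw(self) -> bytearray:
        # A recorded draw keeps referring to whatever buffer was bound at the time.
        return self.bound

    def inline_update(self, offset: int, payload: bytes) -> None:
        if offset < 0 or offset + len(payload) > len(self.bound):
            raise ValueError("update does not fit inside the buffer")
        mirror = bytearray(self.bound)                   # take a mirror of the binding
        mirror[offset:offset + len(payload)] = payload   # apply the update to the mirror
        self.bound = mirror                              # rebind; earlier draws are untouched
```

Because draws recorded before the update still see the old contents, nothing has to be interrupted to write the new data, which is the property that matters on GPUs where ending a render pass is costly.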

And the coup de grâce, geometry shader emulation, was also finalized and merged right at the end of August. This puts a cap on all of the major milestones we wanted to hit and brings full parity with the initial macos1 release, though faster, better and more importantly, with cleaner code that doesn’t impact other OSes or hardware configurations.
The implementation of geometry shader emulation present today is a little different to how it was implemented last year. Instead of additional vertex draws, this is an implementation fully in compute shaders, which were chosen to support subgroup operations, something that was impossible when emulated via vertex shaders.
Splatoon 3 before:

After:

Crash Bandicoot N. Sane Trilogy before:

After:

Marvel's Ultimate Alliance 3 before:

After:

Well, there you have it. While the eagle-eyed among you may have noticed that we’re still missing a single item from our upstream list, we do not consider it to be of great importance in most cases. As such, if you check our website, macos1 has finally been retired and we highly recommend all macOS users to go and grab our latest, fully-featured and auto-updating release!

This does however mean that you’re gonna need to now share this progress report with everyone else. The days of entire sections being devoted to Steve Apple are now behind us; without further delay, let’s shift onward!
To start our descent toward the end of this report let’s blast through a few miscellaneous smaller changes in July and August:

Where is the sea?
Ava UI: Make some settings methods async #5332
[Vulkan / Avalonia] Implement Color Space Passthrough option #5531
The last update we gave on any GUI advancement was that we were patiently waiting on Avalonia, the framework we’ve built our new frontend with, to update to their next major milestone, 11.0. Well, that happened and brings with it a few excellent improvements.
- Performance no longer differs depending on window size. Previously a small window would perform better than a fullscreen window!
- General performance and responsiveness of the framework improved dramatically.
- The title bar will finally match your system color theme on Windows.
- Lots of misaligned elements on macOS were resolved.
- Flatpak compatible! This was the main issue prior to 11.0. If we’d have jumped early, Linux users would have been left behind.
So the question is: if that was merged in July, what's the hold-up now?
Well, there were still bugs in this endless game of whack-a-mole you play with software. macOS and Linux had an issue where if the window was not focused, dialogs like the software keyboard would not spawn, resulting in a highly annoying softlock. Many assumed it was a real in-game softlock and not just a GUI bug.

As expected, a seemingly pointless `isActive` check was being performed on the window before the program would provide any content to populate the dialog. Removing this was all that was needed.
To further improve performance, lots of the settings configuration states, mainly the function that queries the Vulkan device list, were made asynchronous to significantly reduce the time the settings window takes to open. Previously the main thread was spending almost 60% of its time just to populate the GPU device drop-down, a task that does not need to block!
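The general pattern of moving a slow query off the main thread can be sketched in Python with `asyncio`. This is only an analogy for the C#/Avalonia change: `query_vulkan_devices` is a hypothetical stand-in that fakes a slow driver call with a sleep, and the returned device names are made up.

```python
import asyncio
import time


def query_vulkan_devices():
    """Hypothetical slow, blocking driver query (faked with a short sleep)."""
    time.sleep(0.05)
    return ["GPU 0", "GPU 1"]


async def open_settings_window():
    # Start the slow query on a worker thread so the main thread is not blocked.
    devices_task = asyncio.create_task(asyncio.to_thread(query_vulkan_devices))

    # The window can be built and shown right away, with an empty drop-down.
    window = {"title": "Settings", "devices": []}
    shown_before_query_finished = not devices_task.done()

    # Fill the drop-down in once the query completes.
    window["devices"] = await devices_task
    return window, shown_before_query_finished
```

Running `asyncio.run(open_settings_window())` returns the populated window along with a flag confirming that the window existed before the device list arrived.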

There is currently one more improvement to startup times in the pipeline that we want to get in before shipping this as the main frontend, as we really do not want there to be any downsides. If this seems like perfectionism, that’s because it really is. Stay tuned.
And finally, for those who are lucky enough to own a P3 compatible display, there is a new option for Vulkan to pass-through the color space selection to match your display, instead of forcing all content to sRGB. While you will technically be sacrificing color accuracy, if you like a wider gamut and a little more saturation, then give it a whirl. If any of you own an OLED switch, it’s similar in principle to the “Vivid” mode that those models offer.

Closing words
For those of you who do not follow us on Twitter, firstly go and do that, we recently previewed a lot more on the final patreon goal of texture replacement. Check out that tweet here if you haven’t already.
To summarize, it’s available to test and we’d really appreciate it if folks who have experience working in this area, either as an artist, modder or interested party, would give us feedback on the tools you use and if everything works as expected. This is one of those features that ultimately other people will be using, and it’s best to get in early while major changes can still be made!

Simple button icon switch
Once again we’d like to extend our thanks to everyone who continues to support us on Patreon, help and hang with others on our Discord and help code this monster on GitHub!
2023-09-08 21:31:33 +0000 UTC
Wooooooahhh we’re half-way thereeee… 🎶
No matter how the year has gone so far, the midpoint always feels weird. Wasn’t it like January a couple weeks ago? Compared to May, no genre defining blockbusters graced us with their presence, but Pikmin fans had a demo to sink their teeth into, and there was a new entry from the development team of Danganronpa, Master Detective Archives: RAIN CODE, alongside a few other titles that we’re sure folk who are more cultured in video games than us are aware of. Both titles mentioned above also had day 1 compatibility, which bodes nicely for the full release of Pikmin 4 in particular!
Enough chit, let’s chat.
For those who are sick of hearing about Tears of the Kingdom, we apologize. A couple of fixes spilled over into June so we can all talk about it for another couple of paragraphs.
Intel GPU owners were not particularly thrilled that they were actually bottom class citizens behind Mac users when it came to booting this game. The Intel Vulkan driver has a rather hilarious bug in that if a render barrier is placed after a return, it will simply hang. Removing these barriers on Intel drivers if the flow is potentially divergent allows ToTK to finally head in-game.

Another check we now have to make on the long list of vendor-specific bugs…
And lastly for Zelda, many users across the board were noticing a random issue with gloom deposits, where the ground texture would get itself into a completely broken state.

The compressed 3D texture for gloom was undergoing an incorrect layout conversion when decompressed and causing the rather spindly effort above. On Vulkan this issue was seemingly random depending on whether it was able to form copy dependencies or not, while in OpenGL it was consistently broken and hence very simple to reproduce. Fixing this layout conversion when the 3D depth value equals 1 resolves the issues in both backends. OpenGL once again coming in clutch, regardless of the haters.

Interestingly enough this change also fixed a long-standing issue in Spiritfarer where the character and NPC sprites were entirely missing; bucking the usual trend of indie titles fixing AAA issues, not the other way around.
Before:

After:

We mentioned last month that gdkchan had been working on a fairly huge refactor to the GPU emulator in order to reduce as much backend-specific code as possible. The end goal of which is further unification of all the backends we currently target: OpenGL, Vulkan and Metal (via MoltenVK) to reduce our maintenance commitments and technical debt. Two changes to this end in June were the implementation of shader storage buffer, load/store, local/shared and atomic shared operations using the new global load/store methods.
With the rumours, trailer leak and then trailer of a Persona 3 remake, some folk finally noticed that Persona 4 Golden looked a little strange on Ryujinx when the SMAA filter was applied.

Igor seems to have murdered quite a few people to get the velvet quite that shade of magenta. Or maybe a paper-cut, he seems awfully pale.
Morbid metaphors aside, this scene is a byproduct of some BGRA/RGBA swaps occurring when the filtering is applied to the original image. There was an attempt to correct this for non-storage image operations but it was failing in this case. The fix here involves simply binding the original BGRA as the storage image, rather than creating an RGBA copy, which almost all vendors support.
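For anyone wondering why a BGRA/RGBA mix-up changes colors so dramatically, a short Python sketch shows the effect on raw pixel bytes. It only illustrates the channel swap itself, not the filter or the fix: `swap_red_blue` is a hypothetical helper, and 8 bits per channel is assumed.

```python
def swap_red_blue(pixels: bytes) -> bytes:
    """Exchange the first and third channel of every 4-byte pixel.

    This is what effectively happens when BGRA data is read as if it were RGBA
    (or the other way around): green and alpha stay put, red and blue trade places.
    """
    if len(pixels) % 4 != 0:
        raise ValueError("expected whole 4-byte pixels")
    out = bytearray(pixels)
    out[0::4], out[2::4] = pixels[2::4], pixels[0::4]
    return bytes(out)
```

A mostly blue pixel comes out mostly red, which goes some way to explaining the velvet.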

Let’s move onto some cool stuff related to macOS. The refactoring mentioned above isn’t just for codebase aesthetics, we’ve got plenty of that coming later, but also to make annoying stuff… less annoying.
Using all those new shader operations, transform feedback emulation has finally been upstreamed and is used on any devices without native support, not just Apple silicon. This implementation is substantially cleaner than that which was used to ship macOS1 and comes in at just under 400 lines. Some very notable games will now run and render on our master builds for Mac.
Pokémon Scarlet/Violet:

Pokémon Legends Arceus:

Xenoblade Chronicles: Definitive Edition (title screen only for now):

Pokkén Tournament DX:

Metroid Prime Remastered (geometry shaders are needed for some shadows):

Donkey Kong Country: Tropical Freeze:

Some GPU vendors do not support float64 shader operations, including both Apple and Intel. Use of these is relatively limited across the Switch library but there are a few instances where they’re used. For Intel, adding a mechanism to convert from float64 (double) operations to supported operations prevents a device loss in Tears of the Kingdom, while for Mac its most notable fix is to Rune Factory 4.
Before:

After:

A final, much simplified upstreamed component of macOS1 is a SPIRV-Cross (the library used to convert SPIR-V shaders to Metal MSL) workaround to avoid a stack overflow. Due to the very deep nesting and recursion that SPIRV-Cross seems to use, the default stack size is simply not large enough for some games, notably Splatoon 3 and Mortal Kombat 11. In macOS1 this was countered by using a custom thread pool rather than the threading resources that .NET provides. This wasn’t ideal and was up there as one of the messier workarounds we had to settle for to get something out the door.
While gdk initially opened the pull request anyway, a user noted that default stack size can be set as an environment variable within the Application plist. This meant that we didn’t need to tell users to manually increase the stack size, and it meant we could avoid any workarounds! What could have been a 200 line, vendor specific section of code was reduced to 7 lines of environment variable adjustment.
Splatoon 3:

Mortal Kombat 11:

dotnet-format: Apply new naming rule to all projects except Vp9 #5407
Moving onto some quick-fire changes:
While all of that certainly is a lot, the majority of June was monopolized by one thing… DOTNET… FORMAT.
To give a little background, Ryujinx is of course open-source and hence we get external contributions from lots of different people in a few areas of the project. While at surface level this sounds great, it isn’t that simple. Every time someone external offers a contribution our core development staff can either continue working on whatever they’re doing, or spend hours reviewing these external changes to make sure that they do what they say they do, don’t break anything and also conform to the standards we expect of our codebase. No one working on Ryujinx currently does so full-time, and asking them to devote the spare time they do have to what boils down to marking homework isn’t always the most appealing prospect.
This is 2023 though. Surely we can automate some of the more trivial stuff into a bot or something and let the reviewers actually focus on functionality, rather than having to leave hundreds of comments like “remove extra spacing” and “why did you add an extra line here”. The answer is yes, but there were a few things we needed to do first.
.NET has a nifty little built in tool just called Format which, as you would expect, formats code to conform to the standard C# codestyle. The first time this was run it created a rather monstrous difference of over 30,000 lines of code that needed updating, changing so many parts of the emulator that it would basically be impossible to review as a homogenous lump. The decision was made to format the codebase per-project and this 30K monster was split into around 50 different pull requests. Followers of our changelogs may have gotten rather bored of the repetitive line “Code cleanup. No expected changes in games” but this is the explanation.
The final goal of all of this is to add a bot workflow that automatically reviews code-style and ultimately makes the review process easier for everyone involved. While this isn’t particularly flashy for a progress report, we think it’s important to discuss as it’s something every open source project of a certain size must think about: how to make the time balance of external contribution versus core development tip in everyone’s favor.
Closing words
Well after a rather word-heavy end, we won’t continue to prattle for very long.
As is standard we’d like to thank everyone who supports us on Patreon, tests and contributes code via GitHub and even those of you who give up some of your own time troubleshooting with others on our Discord. We hope to have reiterated it above, but time really is our most valuable resource. All of you are giving us more of it in various ways and for that we’re forever grateful!
Until next time. Live on a prayer.
2023-07-11 17:04:38 +0000 UTC
May we offer you a progress report this fine month?
Tears were shed, monarchies were restored, and we’ve seen a fully-functional, multi-stage detachable dropship, complete with cruise missile systems and meat grill in a Zelda game. Truly, nature is healing. If you hadn’t already guessed, this month has been almost entirely monopolized by that pesky blonde princess, but fear not, we did find time between play-sessions to work on an avalanche of changes, fixes and improvements.
We’ll save a little something special for the end, but for now, let’s get to it.
We begin this month, not on a certain AAA release, but on the recently released Demon Slayer - Kimetsu no Yaiba. While the art-style of the show is very faithfully recreated in video-game format, fans quickly noticed that many models and textures appeared fuzzy, with an almost TV static-like effect.

As seen above, this haze is caused by an incredibly small floating-point error on a shader operation. Within the compiler-optimized output, there was a redundant multiplication operation occurring between `gl_FragCoord.w` and its reciprocal (1/gl_FragCoord.w). While on paper, these should cancel, computers are unfortunately not as simple as pure mathematics. With this multiplication step removed, the floating point error is swept away with it.
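The same class of error is easy to reproduce outside of a shader. The sketch below is purely illustrative: it uses Python's 64-bit floats rather than the 32-bit values a GPU shader works with, and the function name is ours, not anything from the Ryujinx shader compiler.

```python
def reciprocal_roundtrip_errors(limit):
    """Return the integers x in [1, limit] where x * (1 / x) is not exactly 1.0."""
    mismatches = []
    for x in range(1, limit + 1):
        value = float(x)
        # On paper this is always 1, but 1 / x is rounded before the multiply,
        # so the product can land one step away from 1.0.
        if value * (1.0 / value) != 1.0:
            mismatches.append(x)
    return mismatches


if __name__ == "__main__":
    print(reciprocal_roundtrip_errors(100))
```

Powers of two survive the round trip because their reciprocals are exactly representable; other values are not always so lucky, which is why removing the redundant multiply is safer than trusting it to cancel.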

Before:

After:

How ‘bout we talk about Zelda now? But for the moment, it’s gonna have to be Breath of the Wild. As mentioned last month, the bulk of the performance optimizations for this title came in May with a whole slew of changes that actually did hit the ground running with a couple of other games as well.
- Rendered textures without any pool reference are now kept alive to avoid recreation. Version 1.6.0 of BoTW began to clear/write textures while never actually sampling them, causing a lot of headache for Ryujinx’s texture caching mechanisms.
- Vertex buffer updates are now batched in Vulkan. Avoids individual vertex buffer updates to reduce draw calls. Minor improvement and will vary heavily on the game and GPU driver.
- Granular buffer updates from constant buffer updates are now allowed. Constant buffer updates used to be uploaded as a full 4096-byte chunk; by tracking the offset since the last update, we can upload only the range that represents the newest buffer update. This reduces the uploaded bytes per frame by almost 50% in Korok Forest.
- Vulkan fence manager and MultiFenceHolder were simplified. There were a number of bottlenecks and slow data structures in use within the MultiFenceHolder that were dramatically cut down. Improved performance in any backend bottlenecked titles (this is very system and game specific).
- CPU region handle containers were removed. A rather stupid speed-up across the board here as we were accessing CpuRegionHandle objects via an intermediate instead of simply referencing them directly. A very small and simple change yet due to its huge usage per frame, we saw a non-trivial 5% performance increase in Super Mario Odyssey (a good benchmark for raw drawcount performance).
- Textures that are flushed often are now preemptively flushed to host-imported memory (when available). Breath of the Wild is a huge offender in wastefully reading back data from linear textures; foot placements on terrain, water level at an object's/Link's position, even some information used to color underwater terrain and to populate grass. The game basically breaks if you don't do it properly. Performing a preemptive and direct flush of such textures to CPU accessible memory skips a number of waits that the GPU would otherwise perform while the data was being copied around. In an ideal scenario, when the texture layout is linear, it can even skip another step and copy directly to the GPU by importing memory directly. The Legend of Zelda: Skyward Sword HD is also a fiend for reading back texture data to check if the sun is obscured and also sees significant gains here.
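To make the granular constant buffer update from the list above a little more concrete, here is a minimal sketch of dirty-range tracking. The class and method names are hypothetical and the real implementation lives in Ryujinx's GPU code; the only detail taken from the text is the 4096-byte chunk size.

```python
CONSTANT_BUFFER_SIZE = 4096  # full chunk size mentioned in the list above


class TrackedConstantBuffer:
    """Hypothetical constant buffer that remembers which byte range changed."""

    def __init__(self):
        self.data = bytearray(CONSTANT_BUFFER_SIZE)
        self.dirty_start = None
        self.dirty_end = None

    def write(self, offset, payload):
        end = offset + len(payload)
        if offset < 0 or end > CONSTANT_BUFFER_SIZE:
            raise ValueError("write outside of the constant buffer")
        self.data[offset:end] = payload
        # Grow the dirty range so it covers every write since the last flush.
        if self.dirty_start is None:
            self.dirty_start, self.dirty_end = offset, end
        else:
            self.dirty_start = min(self.dirty_start, offset)
            self.dirty_end = max(self.dirty_end, end)

    def flush(self):
        """Return (offset, bytes) for the modified range only, then reset tracking."""
        if self.dirty_start is None:
            return None
        offset = self.dirty_start
        chunk = bytes(self.data[offset:self.dirty_end])
        self.dirty_start = self.dirty_end = None
        return offset, chunk
```

Two small writes at offsets 16 and 32 would result in a single 20-byte upload here, rather than the full 4096 bytes.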
After all that, let’s look at the outcome for a couple of titles:

As the test title for most of the performance related changes of the last 2 months, it isn’t surprising to see a very healthy 30% uplift on our test systems in Breath of the Wild. As mentioned prior, Skyward Sword HD also sees a very disproportionate uplift of 25% from the preemptive flush change alone. Remember that texture it uses to read back data on sun occlusion? That thing is a full fat 1920x1080 RGBA8! Xenoblade Chronicles DE, as usual, somehow sneaks a small improvement from just about anything we do, but XC2 also manages a 28% uplift in the main town hub.
On the topic of Xenoblade, DE and 2 managed to become the catalyst to fixing some annoying, yet simple graphical bugs. On Nvidia drivers starting from 522.XX and pretty much all AMD drivers, XC:DE, XC2 and Bayonetta 3 exhibited major graphical artifacting. This was usually limited to UI elements in the Xenoblade titles but was expressed as full blown god-rays in Bayonetta which obscured most of the screen space.

Due to a very specific render scenario, there were some cases where the required barriers were not being set because of the order in which the checks take place. By adjusting when the barrier check happens, the barrier can be inserted correctly.

XC2 before:

XC2 after:

XC:DE before:

XC:DE after:

Alright enough with the formalities. Tears of the Kingdom… *drum roll*... Ladies and Gentlemen, we did it again. Our trophy cabinet of Day 1 playable titles receives possibly its largest accolade to date. The team is extremely proud of this one as it continues to re-assure us that we’re doing this whole emulation thing properly! That’s quite enough of the self-flattery though, the experience certainly was not perfect so let’s discuss what went wrong, and how we fixed it. We will try to keep this section as spoiler-free as possible, but as comes with the territory of using screenshots, proceed at your own risk.
The most immediate graphical issue could be spotted almost instantly. Rock and wall textures were littered with white square-shaped artifacts as shown below:

These splotches were being caused by a bug in the ASTC decoder which was setting an incorrect endpoint in LuminanceDelta mode. This affected all users who did not own a GPU with native ASTC support; basically everyone except for, ironically enough, Mac users.
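For reference, this is roughly what decoding a luminance base+offset ("LuminanceDelta") endpoint pair looks like according to our reading of the ASTC specification. It is a sketch of the spec's formula with a function name of our choosing, not a copy of the Ryujinx decoder, and it does not show where exactly the bug was.

```python
def decode_luminance_delta(v0, v1):
    """Decode an ASTC LDR luminance base+offset endpoint pair.

    v0 and v1 are the two unquantized 8-bit endpoint values.
    Returns two RGBA tuples (endpoint 0, endpoint 1).
    """
    if not (0 <= v0 <= 0xFF and 0 <= v1 <= 0xFF):
        raise ValueError("endpoint values must be 8-bit")
    # The base luminance takes its low six bits from v0 and its top two from v1.
    l0 = (v0 >> 2) | (v1 & 0xC0)
    # The second endpoint is the base plus a 6-bit offset, clamped to 8 bits.
    l1 = min(l0 + (v1 & 0x3F), 0xFF)
    return (l0, l0, l0, 0xFF), (l1, l1, l1, 0xFF)
```

Getting either endpoint wrong produces blocks that are far too bright or too dark, which fits the square-shaped splotches seen above.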

Eagle-eyed readers may have already spotted the next issue in the two screenshots above. Those textboxes sure didn't look like that in Breath of the Wild and it appeared that in Tears of the Kingdom, they were setting their swizzle texture incorrectly. By explicitly failing an exact match condition, we can force the creation of a correctly swizzled D32 texture. This change also resolves a very old bug in Mario Kart 8 Deluxe where returning to the character select screen could sometimes break the character model cubemaps, causing them to appear a solid silvery color.

Moving down the list of most obvious bugs, the game at launch would experience seemingly random crashes citing an invalid memory region error after an hour or so of play-time. This one was actually caused by the shader cache matching a current shader use with the address of a different shader. The issue arose when that different shader had been partially unmapped, causing a crash. By only reading the mapped portion of the shader, this will instantly fail the compare condition and compile/lookup the correct shader instead.
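A heavily simplified sketch of that idea follows. It assumes guest memory is a single byte array that is mapped from address 0 up to `mapped_size`; both helper names are hypothetical and the real shader cache is considerably more involved.

```python
def read_mapped_prefix(memory, mapped_size, address, length):
    """Read up to `length` bytes at `address`, stopping where the mapping ends."""
    end = min(address + length, mapped_size)
    if address >= end:
        return b""
    return bytes(memory[address:end])


def cache_entry_matches(cached_code, memory, mapped_size, address):
    """Compare a cached shader against guest memory without reading unmapped bytes."""
    guest_code = read_mapped_prefix(memory, mapped_size, address, len(cached_code))
    # A partially unmapped shader yields fewer bytes than the cached copy,
    # so the comparison simply fails instead of faulting.
    return guest_code == cached_code
```

A failed comparison then falls through to the normal compile or lookup path.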
Back to stuff you can actually see, the Vulkan backend was disabling explicit LOD when using depth-comparison with array textures. Interiors and a lot of shadow-based lighting were heavily affected. This seems to be a bit of legacy code left over from when Vulkan used to cross-compile from GLSL as the extension is unsupported there. By removing this redundant blockage, the problems vanish!
Before:

After:

As far as vendor specific bugs go, Tears of the Kingdom came with plenty of baggage. For Nvidia users, the Vulkan backend was performing much worse than it should have done and in some scenarios up to 57% slower than OpenGL. The gap only widened when scaling to higher resolutions. The problem here was maddeningly ironic; a couple of months ago we implemented a system to migrate data around between device and host memory which helped almost every game across the board. Unfortunately, Breath of the Wild benefits from the exact opposite buffer locations as Tears of the Kingdom and the implementation was obviously tuned for the former, not the latter. By device mapping any buffers that are written more than they’re flushed, Nvidia GPUs no longer get kneecapped here.
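The rule described above can be sketched in a few lines. The counters, names and the two location labels are assumptions made for illustration; the actual heuristics in the Vulkan backend are likely more nuanced.

```python
from dataclasses import dataclass


@dataclass
class BufferStats:
    writes: int = 0   # how often the buffer was written
    flushes: int = 0  # how often its data was read back by the CPU


def choose_memory_location(stats):
    """Pick device memory for buffers that are written more than they are flushed."""
    return "device" if stats.writes > stats.flushes else "host"
```

A buffer that is read back as often as it is written stays in host memory in this sketch, since the readback cost is what host placement is meant to avoid.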
AMD are not spared the spotlight either. The gloom in the depths seems to function by emitting LOD texture sampling instructions via compute. While this is somewhat valid behavior with Nvidia-specific Vulkan extensions, on AMD Windows it caused texture sampling from compute to fail, resulting in “gloom damage” even when, visibly, Link was not standing anywhere near any.

It seems that other drivers were simply ignoring this invalid behavior and sampling LOD as 0. By checking the instruction is being used exclusively on fragment, the phantom gloom is a thing of the past.
The depths causing problems seems to be a theme because many users reported significant performance fluctuations after random amounts of time exploring. When down there, Tears of the Kingdom uses a global memory access with an address on constant buffer slot 6. This isn't standard and thus isn’t the size we expect, which caused us to read back a garbage size that ended up very large and would synchronize a large amount of data per frame. Adjusting how we calculate the buffer size should bind it to a reasonable size and stop it crossing into other memory.
Onto something everyone could enjoy, Z-fighting. Z-fighting is a phenomenon in 3D scenes where two ‘objects’ that are very close together can appear to have an almost identical depth value in the z-axis. When this happens, the camera can effectively see a random assortment of geometry from both as a flickering effect as both “fight” to be “on top”.

Static representation.
In ToTK, this effect could be seen on distant geometry which has less precise depth values than objects close to the camera.
https://user-images.githubusercontent.com/6191957/239662266-fefdcd38-8fb6-49b4-b5b9-54f44c55ae2b.mp4
Adding support for `VK_EXT_depth_clip_control` can significantly reduce the bulk of the larger geometry fighting. There is still work to look into the remaining fights, but it should be isolated to zoom-in shots now.
https://user-images.githubusercontent.com/6191957/239662272-59c1a897-2428-47e7-b0e6-d45623623286.mp4
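Why distant geometry is so prone to fighting can be shown with a little arithmetic. The sketch below uses a standard perspective depth mapping with made-up near and far planes (0.1 and 10000) and a 24-bit depth buffer; none of these values come from the game, and it does not model what `VK_EXT_depth_clip_control` changes, only the precision falloff itself.

```python
def window_depth(z, near, far):
    """Map a view-space distance to a [0, 1] depth value (perspective mapping)."""
    return (far / (far - near)) * (1.0 - near / z)


def quantize(depth, bits=24):
    """Round a [0, 1] depth value to the nearest step of an integer depth buffer."""
    return round(depth * ((1 << bits) - 1))


def depth_steps_between(z_a, z_b, near=0.1, far=10000.0, bits=24):
    """Number of representable depth steps separating two distances."""
    return abs(
        quantize(window_depth(z_a, near, far), bits)
        - quantize(window_depth(z_b, near, far), bits)
    )


if __name__ == "__main__":
    # Two surfaces 0.01 units apart, close to the camera and far away.
    print(depth_steps_between(1.0, 1.01))
    print(depth_steps_between(5000.0, 5000.01))
```

Near the camera the two surfaces are separated by thousands of depth steps, while at a distance they collapse onto practically the same value, leaving the depth test nothing to distinguish them with.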
Alas, all of the improvements to rendering and performance above mean nothing if the game refuses to run any faster than 20FPS though. Both Breath of the Wild and Tears of the Kingdom make use of a double-buffered VSync implementation that can dynamically switch between a 30FPS and 20FPS target depending on the performance of the Switch. If it starts to thermal throttle or drop frames, the game can simply swap to its 20FPS mode and maintain its speed. How does it know if it’s performing badly? By using the timestamp of the GPU at the current frame. It seems that Ryujinx was incorrectly reporting its timestamp because by simply forcing the first timestamp on game boot to be 0, thus making all future timestamps an offset from 0, ToTK finally seems to realize it isn’t running on a potato!
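The timestamp fix boils down to reporting every value relative to the first one. A minimal sketch, with a hypothetical class name and an injected raw clock so it can be run anywhere:

```python
class GpuTimestampSource:
    """Reports timestamps as offsets from the first value ever read."""

    def __init__(self, raw_clock):
        self._raw_clock = raw_clock  # callable returning raw host ticks
        self._base = None

    def read(self):
        raw = self._raw_clock()
        if self._base is None:
            # The first timestamp the game sees is forced to be 0.
            self._base = raw
        return raw - self._base
```

With raw ticks of 1000, 1500 and 4000, the game would see 0, 500 and 3000.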
To finish off this extensive Zelda segment, some new shader formats were implemented in the form of p2rc, p2ri, p2rr and r2p.cc. We aren’t actually sure what they’re used for, but the logging console seemed to spit out an “unsupported format” warning occasionally so it does use them somewhere! Find out where and we’ll give you a prize*.
*we allow you to be smug for no longer than a period of 3 seconds.
MacOS development:
The upstreaming work for macOS continues at a rapid pace this month including some pretty massive and vital changes being merged.
As we mentioned last month, universal macOS packages are now part of our master build pipeline. This allows users on macOS to download a fully up to date, bleeding edge version of Ryujinx from our release pages. Be aware that not every Apple Silicon specific optimization we worked on for the `macos1` release has been merged yet, so many games may perform worse/render differently and this is the reason we are still linking to the original build on our website as it has a larger compatibility profile at the moment. This should change soon.
A few different, varied and fun MoltenVK bugs were given workarounds in May, with the end result being that titles like Xenoblade Chronicles 3 are very close to rendering extremely respectably on macOS, with only a single change pending to make this happen.


And we hope you aren’t bored of Tears of the Kingdom but all roads do seem to lead there. For some unknown and honestly miraculous reason, the game not only ran on day 1, but more interestingly unlike its predecessor, didn’t instantly kill the hypervisor. This is good news for everyone because to this day, Breath of the Wild still needs to use the much slower JIT, whereas Tears of the Kingdom can natively execute all its code. We give our thanks to whichever game developer changed that!
This is not to say that Apple users escaped the bug blast. Huge vertex explosions plagued the game when Link wore specific clothing or none at all. By truncating any vertex attribute format that exceeded the stride, we stop MoltenVK from providing Metal with incorrect vertex values.
Before:

After:

The second issue was once again related to clothing. Sensing some prejudice here… Shining bright white spots would appear on certain outfits and world geometry which, while looking actually rather cool, was clearly a garbage value in a shader somewhere.

Sometimes games may add a very small offset to a value in order to make completely sure that it will never be used in an operation that could result in a division by zero. This makes sense, division by zero is certainly quite bad for computers to deal with. Computers are also very smart these days and compilers for shaders will usually try to optimize away anything they deem incorrect, wasteful or inefficient. Randomly adding tiny values to stuff is prime territory for the compiler to ruin your day, as has happened here. Luckily values can be qualified as “precise” in SPIR-V and this allows them to be left well alone in the optimization stage.

While rendering is fairly sorted on the latest builds, plenty of more intensive games (ToTK among them) still need buffer mirrors implemented in order to perform well. We covered this in our initial blog post but as a TL;DR, they attempt to bridge the gap between how a desktop type GPU, such as that found in the Switch, and a mobile type GPU, such as that found in M1/M2 chips, render graphics. Work is currently progressing on getting this merged without negatively impacting Windows and Linux users, something we didn’t really need to worry about for macos1.
While these changes are certainly flashy, gdkchan has been busy effectively reworking a massive portion of the shader backend across multiple pull requests. The end goal of this work is to implement emulation for transform feedback and geometry shaders in a much cleaner and more maintainable way than that used in macos1.
May brought the final groundwork for this undertaking in three parts:
- Replacement of constant buffer access on shader with new `Load` instruction. Condenses the `ConstantBuffer` operand and `LoadConstant` instruction into a single `Load` instruction, reducing backend complexity and improving flexibility. This change also fixes the vertex explosions faced by AMD GPU users on Windows in Super Mario Galaxy (3D All-Stars).
- Generate scaling helper functions on IR. This change moves all of the resolution scaling code out of the SPIR-V and GLSL backends and into a single homogenous helper function. This ultimately means less code, less work to implement more backends in future, and reduces the likelihood for differences to occur between any current or future backends.
- Replacement of ShaderBindings with new ResourceLayout structure for Vulkan. The ResourceLayout is used to create the PipelineLayout on Vulkan, rather than it having 2 hard coded layouts (one that was used for game shaders, and another that was used for helper shaders from the backend called "minimal layout"). Since we need to reserve additional storage buffers for transform feedback and geometry shader emulation, the PipelineLayout also needs to be different. This change allows this to be done in a simple way.
With those now in place, the first pull request for transform feedback emulation is open, with geo shaders to follow. We’re aware this has taken a fair while, but the team are far happier with the implementations that the reshape allows, compared to the rather complex solutions employed for our first macOS release.
SERVICES:
When emulating an operating system that is still in active development, it’s easy to get swept away and forget that maybe stuff has changed since it was originally reverse engineered. As such it was time to return an eye to the… time services. With a full RE of these from firmware 15.0.0 we were not only able to ensure our accuracy for at least another few versions, but also stumble upon some old mistakes. This change finally fixes the completely static timed PokeJobs in Sword/Shield and likely other games that make use of timed events. It was about time if you ask us.
Alongside that RE work, our resident audio-maestro marysaka went back over to the audio renderer, fixing some audio bugs in Tears of the Kingdom and implementing support for full 5.1 surround sound when using the SDL2 backend with a compatible game and speaker system.

5.1 Surround vs Stereo arrangement
Most first-party Switch games actually do support surround sound, so this change is a welcome one for those with the space!
MISC:
To finish us up let’s rattle through a quick-fire round for some of those more niche, yet oddly helpful improvements:
Last but not least, ‘Stop emulation’, a button which has felt somewhere between useless and a Russian roulette for a fair old while now should finally work… in most cases. Four more causes of deadlock were isolated and resolved this month from GUI, ServerBase, CPU and mostly GPU code. The nightmare should, mostly, be over.
Closing words
Did we say we had something cool? Have a gander at this.
https://github.com/Ryujinx/Ryujinx/assets/44103205/0d2565be-c892-4f1d-a060-9b78c7e042e3
That’s all we’ve got for you this month! If you like what we do and want to give us a helping hand, you can check out our Patreon, have a wander through our GitHub, or join us on Discord. As per our usual pitch, open-source software is driven by folk around the world who find something annoying, and fix it. Are we annoying? Well, you know what to do.
Thank you all for reading and we’ll be back in a month! Au revoir.
2023-06-09 22:44:23 +0000 UTC
Hello folks, it’s your favorite time of the month once again. No need to say anything, we know it’s true!
Plenty happened in April which you’ll soon discover below, but before that let’s take a look through the current Patreon goals.
We’d like to reiterate once again that any features listed below will eventually be worked on, regardless of the goal being met. It would simply become a priority as soon as the incentive amount was sustained. This, of course, isn’t true for the full-time development goals which, by nature, are dependent on consistent backing.
Patreon Goals:
$2000/month - Texture Packs / Replacement Capabilities - dipped below this amount during March but extremely close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
All aboard…
GPU:
April's journey begins with The Legend of Zelda: Breath of the Wild. It really couldn’t be anything else could it? Throughout April, people were suddenly very interested in the stability, fidelity and performance of the game and not without cause.
For Nvidia GPU owners, fidelity and graphical accuracy have not seriously been an issue for years now, but this was not the case for AMD and (a growing number of) Intel users. Grass would have major artifacting around its shadows, and this effect could even be seen on character models and other objects, given they were in shade.

The issue here is how different drivers are tie-breaking when selecting texels when exactly half-way between two options. Nvidia, Apple and Mesa will break the tie correctly while AMD/Intel go the opposite direction. By applying the smallest positive bias possible, we can force these drivers to choose correctly.
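As a toy model of the tie-break problem, imagine two drivers that agree everywhere except when a coordinate lands exactly on the boundary between two texels. The two selection functions below are stand-ins we made up for illustration, as is the assumption that the upper texel is the "correct" one; they are not taken from any real driver. `math.nextafter` requires Python 3.9 or newer.

```python
import math


def pick_upper_on_tie(coord):
    """A boundary coordinate selects the texel above the boundary."""
    return math.floor(coord)


def pick_lower_on_tie(coord):
    """The opposite tie-break: a boundary coordinate selects the texel below."""
    return math.ceil(coord) - 1


def with_smallest_positive_bias(coord):
    """Nudge the coordinate up by the smallest representable amount."""
    return math.nextafter(coord, math.inf)
```

At a coordinate of exactly 2.0 the two functions disagree, but once the bias is applied the coordinate is no longer a tie and both select the same texel. Coordinates that were not on a boundary are unaffected in practice.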

Performance-wise the quirks here are numerous and will span into May. One of the main performance bottlenecks for Breath of the Wild was its incredibly long render passes with large numbers of draws in each. This meant that the backend could potentially end up building a single command over 4-5ms (a very large number when a frame can be as short as 16ms). This is worsened by BoTW’s extremely aggressive GPU synchronization requirements, meaning that the game is forced to wait for the completion of each large command. Reducing the size of these command buffers would therefore reduce the impact of two net debits to performance.
By implementing a so-called “fast-flush” mode to the Vulkan backend, we now force command submission periodically if the game is syncing aggressively enough. We saw improvements of up to 11% in BoTW and some other GPU-limited situations such as when resolution scaling is used in Pokémon Scarlet/Violet.
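In spirit, fast-flush looks something like the sketch below. The draw-count threshold, the way "syncing aggressively" is detected and all of the names are assumptions made for illustration only.

```python
class CommandRecorder:
    """Hypothetical recorder that submits early while fast-flush is active."""

    def __init__(self, flush_threshold=100):
        self.flush_threshold = flush_threshold
        self.pending_draws = 0
        self.submissions = []  # number of draws in each submitted command buffer
        self.fast_flush = False

    def set_sync_aggressive(self, aggressive):
        # In a real backend this would be driven by how often the game syncs.
        self.fast_flush = aggressive

    def draw(self):
        self.pending_draws += 1
        # Submit periodically so the game never waits on one enormous command buffer.
        if self.fast_flush and self.pending_draws >= self.flush_threshold:
            self.submit()

    def submit(self):
        if self.pending_draws:
            self.submissions.append(self.pending_draws)
            self.pending_draws = 0
```

With fast-flush active, 250 draws become three submissions of 100, 100 and 50 draws instead of one submission of 250, so each wait covers a much smaller chunk of work.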
Leaving Zelda alone for now, we’ve mentioned Fate/EXTELLA a fair amount in the past and this month is again no exception. It seems to have an odd knack for highlighting some rather niche gaps in the GPU emulator so we hope you aren’t bored with its continued cameos! It turns out we’ve been missing a single case of multisample <-> non-multisample depth conversion to complete the set, ultimately causing certain textures to simply not render. By resolving this final conversion case we hope (!) to finally put this game to bed.
Before:

After:

Now comes a new recurring segment of these blog posts: our coveted ‘GPU-vendor-specific bug of the month’ award. Snatching the prize out of last month's winner Nvidia it’s…….. AMD! Now the keen readers out there may be asking “why isn’t it a tie with Intel over that whole grass thing?”. It was tough let us tell you, the panel debated long and hard on this verdict but it was ultimately decided by a complete and catastrophic breakdown of Pokémon Legends Arceus that just edged AMD into the lead.

Don’t drink and write GPU drivers.
The breakage began with drivers 23.x.x, and we had hoped that it would be quickly resolved in a couple of driver patches. Word on the grapevine told us other programs were exhibiting driver bugs with these versions and thus we waited. One, two versions passed us by and still no change. Fine, we’ll do it ourselves.
They broke transform feedback… AGAIN! We’ve already had to change the implementation twice but three times is, hopefully, the charm.

Red vs Blue, a tale as old as time. Some guest OpenGL games on Switch make use of a particular functionality in the GPU DMA engine that was causing some interesting color swaps. The function itself is more or less a simple shuffle, which is used to re-order things like pixel components in a texture. The Switch OpenGL driver uses this to perform BGRA (Blue, Green, Red, Alpha) to RGBA (Red, Green, Blue, Alpha) data conversions. As expected, not implementing this results in this swap never occurring. In some cases it can seem like nothing is wrong, but if you’re familiar with how a game is meant to look it becomes more obvious.
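The shuffle itself is simple enough to sketch in a few lines. This is an illustration of a per-pixel component remap in general, with names of our choosing, and not the DMA engine's actual interface.

```python
BGRA_TO_RGBA = (2, 1, 0, 3)  # output component i is read from input component order[i]


def remap_components(pixels, order, components=4):
    """Re-order the components of every pixel in a tightly packed byte buffer."""
    if len(pixels) % components:
        raise ValueError("buffer length is not a whole number of pixels")
    out = bytearray(len(pixels))
    for base in range(0, len(pixels), components):
        for dst, src in enumerate(order):
            out[base + dst] = pixels[base + src]
    return bytes(out)
```

Skipping this step leaves red and blue swapped, which is why the before shots can pass for a different time of day rather than an obvious bug.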
Dragon Quest Builders before:

Dragon Quest Builders after:

You would be forgiven at first glance for thinking this is simply a time of day difference. It isn’t.
20XX before:

20XX after:

To put a bow on the GPU section, let’s first talk about mistakes and how they happen. Everyone is human and everyone is prone to making small mistakes with fairly enormous consequences. With that said, how about we discuss frame-pacing in Ryujinx.
Frames are meant to be rendered and then passed to the backend queue as ‘ready to go’. From here any number of presentation methods can be used to display them in motion and a lot of the details can be handled by your GPU driver and the backend itself. Ideally at any given framerate, all of the frames would be ready to present at an equidistant time interval to produce a smooth experience. We’ve known that this hasn’t been the case for a few years now and have been bombarded with spiky graphs throughout that period.

Users of VRR capable displays making use of G-SYNC/FreeSync were obviously less affected by this and we always assumed it must just be a limitation on the backend. Vulkan, for all its strengths, does not have any universally adopted way to query the display timing from your monitor without platform specific workarounds like a DirectX interop layer on Windows, which wouldn’t help us much on Linux/macOS.
While all of the above is true, it didn’t account for a single missing line in the GPU engine code. We originally designed the system to wait for up to 8ms on commands, as a failsafe, but with a separate interrupt event that would cause the frame to be released as soon as it was ready. Someone, who for their dignity shall not be named, forgot to signal this interrupt event and as such was effectively adding up to 8ms of error in every single wait event. This is very easy to see in the above graph as the frame-time deviation was never more than +/- 8ms but the crippling point was its fluctuating nature. What happens if the code written actually works how it was designed to work…

There are still a few moments where host:guest vsync deviates slightly but these are much rarer. Whenever Vulkan standardizes a way to query display timing, as mentioned above, this should improve even further.
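The wait-with-failsafe pattern behind the frame-pacing fix can be sketched with a plain event object. The 8ms figure comes from the description above; the function name and the use of Python's `threading.Event` are ours.

```python
import threading

FAILSAFE_TIMEOUT_SECONDS = 0.008  # the 8ms failsafe described above


def wait_for_frame(frame_ready, timeout=FAILSAFE_TIMEOUT_SECONDS):
    """Block until the event is signalled or the failsafe timeout expires.

    Returns True if woken by the signal, False if the timeout was hit.
    """
    return frame_ready.wait(timeout)
```

If nothing ever calls `frame_ready.set()`, every wait runs out the timeout and the frame is released late by a varying amount, which is the fluctuation seen in the first graph. With the signal in place the wait returns as soon as the frame is ready.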
MacOS upstreaming:
A few people asked us where this section was last month, and it ultimately comes down to whether anything was actually finished in a given month. Everything we detail in these progress reports is available right now, so if a larger change takes, say, two months, it creates a gap. That’s exactly what happened in March!
In April on the other hand, gdkchan finished a complete refactor of attribute handling on the shader generator which came in at just under 2000 lines and should resolve a significant amount of shader compilation failures under MoltenVK. Tessellation is almost non-functional in the macos1 build and in addition to simple upstream work, we’re also trying to clean up a lot of the more raw implementations of certain processes before they’re made available.
As a result of this work, tessellation is working correctly in games that make use of it such as The Legend of Heroes: Trails from Zero which uses tessellation shaders to render entirely.
Before:

After:

Other affected titles include The Witcher 3: Wild Hunt and Luigi’s Mansion 3 (specifically the sand textures) in later levels.
A smaller fix to dual source blending was also made which should resolve a crash in certain games such as Metroid Prime Remastered under MoltenVK.
As a result of both changes, lots more games should end up being playable at the next release! Unfortunately we don’t have any timeline on when that will be possible due to a number of changes made since November breaking a lot of the macOS specific workarounds like mirrors and geometry shader emulation. Given the time of year, the upcoming release schedule and a priority list as long as all our arms combined, it’s impossible to give an ETA. We can only apologize on this front and hope that when the inevitable `macos2` releases it will be a sizable upgrade.
CPU:
Staying on the topic of crashes in Breath of the Wild, the final “random” crash cause was resolved in April which was a great milestone for us on the stability front. The only prior information we had on this specific crash was that it happened sometimes near Lynels, maybe in the rain, or maybe on hills, or something. Not a great start on debugging.
Thankfully a Discord user discovered that there was a specific shrine puzzle that always crashed on certain physics interactions.
https://user-images.githubusercontent.com/44103205/236526133-a2b35b3c-e17a-40ec-8244-e2816cefedd9.mp4
With this information it didn’t take long to track the bug down to the CPU recompiler and how it was handling the FZ/RM flags for floating point operations. While looking for this bug, an extra small optimization to TPIDR_EL0 and TPIDRRO_EL0 registers was made as games like BoTW and Scarlet/Violet access them thousands of times per second. This did appear in the CPU profile but is unlikely to show any significant performance improvement.
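For readers unfamiliar with these flags: on ARM64, flush-to-zero (FZ) and the rounding mode live in the FPCR register. The sketch below only decodes those two fields. The bit positions follow the Arm architecture, while the function and table names are made up for illustration and are not Ryujinx code.

```python
# Illustrative only: decode the FZ and RMode fields of an ARM64 FPCR value.
# Field positions are architectural; every name below is hypothetical.

ROUNDING_MODES = {
    0b00: "nearest (ties to even)",
    0b01: "towards +infinity",
    0b10: "towards -infinity",
    0b11: "towards zero",
}

def decode_fpcr(fpcr: int) -> dict:
    flush_to_zero = bool((fpcr >> 24) & 1)  # FZ: bit 24
    rmode = (fpcr >> 22) & 0b11             # RMode: bits 23:22
    return {"fz": flush_to_zero, "rounding": ROUNDING_MODES[rmode]}
```

A recompiler that mishandles either field can produce slightly different floating point results from real hardware, which is the kind of drift a physics engine tends to punish.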
Some homebrew applications such as Borealis also required us to implement the remaining ARM64 HINT instructions. These are reserved instructions used on future CPUs and simply execute as nothing on older ARM processors like those found in the Tegra X1. These are usually used for fairly mundane tasks like pointer authentication and as such aren’t useful outside of homebrew.
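As a rough illustration of why these are cheap to support, the sketch below recognises the A64 HINT encoding space and treats every hint as a no-op. The encoding constants come from the Arm architecture; the function names are hypothetical and this is not how Ryujinx's recompiler is structured.

```python
# Illustrative only: recognise A64 HINT-space instructions and run them as NOPs.
# Names are hypothetical; only the encoding constants are architectural.

HINT_MASK = 0xFFFFF01F
HINT_BASE = 0xD503201F  # also the encoding of NOP (HINT #0)

def decode_hint(insn: int):
    """Return the 7-bit hint number, or None if insn is not a HINT."""
    if (insn & HINT_MASK) != HINT_BASE:
        return None
    return (insn >> 5) & 0x7F  # CRm:op2

def execute(insn: int) -> str:
    hint = decode_hint(insn)
    if hint is None:
        return "not a hint"
    # The emulated CPU has no pointer authentication, so a hint such as
    # PACIASP (HINT #25) can simply do nothing.
    return f"hint #{hint}: executed as NOP"
```

For example, `decode_hint(0xD503233F)` yields 25, the hint number used by `PACIASP`.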
HLE/MISC:
In the first part of this section we’d like to give a huge shout-out to contributor jhorv who is currently on a war-path of memory-usage reduction across Ryujinx. In April alone there were not one but two different changes made that together can reduce the size of the small/large object heaps by up to 20%, reducing total garbage collection time by nearly 10%. Check the handy table below for anyone who wants to see some large numbers.

You should see more of this work in the coming months and while it isn’t as flashy as a game fix or a huge performance boost, it’s appreciated all the same.
For those who make use of gyro motion controls on Sony or third-party Nintendo controllers, you may have noticed that when held stationary for a period, Ryujinx used to forcibly re-center the axes constantly. This was causing lots of problems in games like Splatoon where accurate aim is vital for success.
https://user-images.githubusercontent.com/124469126/216792608-921c6088-3238-411e-be63-721c3ebab857.mp4
Removing this reset functionality entirely seemed like the best solution here as on closer inspection, it was simply setting the motion filter to 1 periodically. The filter would then return to exactly where it was before this reset, and then simply reset again.
To finish out this report we’ll do a quick-fire round of the smaller quality of life changes made in April:
Closing words:
April? Completed it mate.
One third of 2023 is already over and Ryujinx has never been in a better state if you ask for our totally unbiased opinion. It’s all made possible by the incredible support that our community shows through donations on our Patreon, contributing code to our GitHub repo, or simply helping other users out on our Discord. All of it means that our development team can spend more time fixing games and making Ryujinx a better and more versatile program.
As always, if you’re proficient in C# (or really any C-based language), interested in emulation/modern 3D-graphics, want to improve any aspect of the program down to fixing typos, or simply need a large project to stat-pad your GitHub page for that upcoming job interview, we’re always on the lookout for folk who can bring something new to the table. While our core team can work some miracles, the lifeblood of open-source software has always been people finding something annoying, and fixing it.
We look forward to May, and whatever it may bring.
2023-05-07 16:53:06 +0000 UTC
We’re cruising our way into the second quarter of 2023 and finally got news on that new Zelda game everyone is so angsty about; did the presentation meet everyone’s expectations? All of us are incredibly excited to see what Nintendo has in store for this oddly familiar-looking adventure, and of course, to see what challenges it’ll bring Ryujinx. We hope you all had a great month, and that you appreciate this slightly shorter progress report than usual! But first, let’s take a look at our remaining Patreon goals.
We’d like to reiterate once again that any features listed below will eventually be worked on, regardless of the goal being met. It would simply become a priority as soon as the incentive amount was sustained. This, of course, isn’t true for the full-time development goals which, by nature, are dependent on consistent backing.
Patreon Goals:
$2000/month - Texture Packs / Replacement Capabilities - dipped below this amount during March but extremely close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Moving on….
GPU:
What better way to start a GPU section than to present our coveted ‘GPU-vendor specific bug of the month’ award. Taking first prize this month (for only the second time ever!), itttt’ssssss….. *drum roll*.... NVIDIA! Storming to a clean victory with a Ryujinx bug that was specific to RTX 3000 and 4000 series GPUs.

Paris isn’t usually known for its ominous, floating 2D-shapes and Mario Kart 8 Deluxe wasn’t the only title affected. Since driver 522.25, games like Xenoblade Chronicles 2/3 and Hyper Light Drifter had been exhibiting random artifacting that took a very long time to track down. We’ve mentioned in the past that it isn’t so much a problem if a driver bug is consistent, but when it’s restricted to certain hardware the complexity to solve increases tenfold.
The cause was eventually narrowed down to newer Nvidia GPUs being able to start clearing render targets before the final image rasterization task has been completed. This can allow a texture to clear while it’s being sampled, producing the artifacts witnessed above. The solution is luckily extremely simple, inserting a barrier before the clear event, thus aligning RTX 3000 and onward cards with their older siblings.

XC2 Before:

XC2 After:

XC3 Before:

XC3 After:

The resolution scaler giveth, the resolution scaler taketh-away. If you’ve been with us for a few months, you may remember a fix aimed at the Splatoon games which stopped the scaler from multiplying point totals and causing players to be unable to ink-swim at low enough resolutions (check out the September report for more info). Unfortunately, in fixing these rather game-breaking bugs, a few titles such as WarioWare: Get It Together and Wreckfest began to exhibit graphical bugs when rendering beyond native, usually in the form of heavy flickering on character models or in overworlds.
! Flicker Warning !
Wreckfest
https://user-images.githubusercontent.com/2002038/220464805-2e98af8f-eac2-488c-97d0-aa07abc0be92.mp4
WarioWare: Get It Together
https://user-images.githubusercontent.com/2002038/220464851-a6ed0c9b-1582-4889-a4ff-532821941976.mp4
By scaling values when they’re being added to the ReportCounter, instead of scaling the total count after the fact, weird overflows and large counter values are avoided. This eliminates the seemingly random flickering some games may have been exhibiting when scaled since September and of course in the two mentioned above.
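To make the difference concrete, here is a minimal sketch. It assumes that a sample count grows with the square of the resolution scale and that the scale can differ between draws feeding the same counter. Both are assumptions for illustration, and none of the names are taken from Ryujinx.

```python
# Illustrative only: a sample counter that accumulates across draws rendered
# at different resolution scales. All names are hypothetical.

def scale_total_afterwards(draws, current_scale):
    """Old idea: add raw results, then divide the running total once."""
    total = sum(samples for samples, _ in draws)
    return total / (current_scale ** 2)

def scale_before_adding(draws):
    """New idea: bring each result back to native before accumulating."""
    return sum(samples / (scale ** 2) for samples, scale in draws)

# Two draws that would each report 100 samples at native resolution:
# one rendered at 2x (so 400 raw samples) and one at 1x.
draws = [(400, 2), (100, 1)]
```

With these inputs, scaling afterwards reports 125 or 500 depending on which scale happens to be current, while scaling before adding always reports the native total of 200.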


Sonic Frontiers was a bit of a problem-child at launch as while it did boot and technically worked, the experience was a little like tying your shoelaces via chopsticks. Or in one word, painful. The main cause of the extraordinarily long loading screens and the mediocre performance was the game’s tendency to create enormous cubemap arrays with over 7000 faces (175 cubemaps * 6 faces * 7 levels for those interested).
Iterating over both the handles and existing views when adding a new one added up very fast to potentially 50 million iterations to add the final views. Since we only needed to add individual views at a time, we can instead add that view to the existing overlaps, rather than recalculate them all. This becomes a new generic “fast path” for adding a single texture view to a group and could improve other titles that exhibit this behavior.
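The arithmetic behind that figure can be sketched as follows. It assumes one handle per view, which is a simplification for illustration, and the function names are hypothetical.

```python
# Illustrative only: iteration counts for the two strategies.
# Assumes one handle per view; names are hypothetical.

CUBEMAPS, FACES, LEVELS = 175, 6, 7
views = CUBEMAPS * FACES * LEVELS  # 7350 views in the array

def recalculate_all(handle_count: int, view_count: int) -> int:
    """Every handle is checked against every existing view."""
    return handle_count * view_count

def add_single_view(handle_count: int) -> int:
    """Only the newly added view is checked against the handles."""
    return handle_count
```

Under that assumption, `recalculate_all(views, views)` comes to 54,022,500 iterations for the last view alone, against 7,350 for `add_single_view(views)`.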
https://user-images.githubusercontent.com/44103205/231261772-3f429a99-9764-4d35-ab54-a0c34527124f.mp4
That used to take up to a minute!
Let’s talk performance, everyone loves a bit of that.
A focus in March was isolating cases where OpenGL was still vastly outperforming Vulkan. This usually indicates code-paths that the OpenGL driver of your GPU is optimizing automatically, whereas in Vulkan we’d need to do those manually in Ryujinx itself. We’ll start with some titles that really don’t look like they should be struggling, alas, they did.
Some games like LA-MULANA, a visually simple 2D-platformer, were sweating under Vulkan but running like a breeze under OpenGL. Previously, index inline buffer updates were being performed one index at a time, step by step, meaning that, for example, in the event of two 16-bit indices being uploaded, the actual work would be performed in multiple 8-byte chunks. An extremely inefficient process that the Nvidia OpenGL driver was working some magic around. Vulkan on the other hand, no such luck!
https://user-images.githubusercontent.com/5624669/227070451-5d085082-4bc4-4278-bb56-2c0520b86d36.mp4
Changing this upload mechanism to allow batched uploads (up to 256 indices at a time), titles such as this no longer struggle. While this mainly helps Vulkan rendering performance, other vendors whose OpenGL drivers may not be as competent could see improvement when using OpenGL also.
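The general idea of batching can be sketched like this. The 256 limit is the one quoted above; the class, its methods and the `upload` callback are hypothetical stand-ins rather than Ryujinx's actual types.

```python
# Illustrative only: buffer inline index writes and flush them in batches
# instead of uploading one index at a time. Names are hypothetical.

MAX_BATCH = 256  # batch limit quoted in the report

class InlineIndexBuffer:
    def __init__(self, upload):
        self._upload = upload  # callback that performs one real upload
        self._pending = []

    def push(self, index: int) -> None:
        self._pending.append(index)
        if len(self._pending) == MAX_BATCH:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered as a single upload.
        if self._pending:
            self._upload(list(self._pending))
            self._pending.clear()
```

Pushing 600 indices through this buffer and flushing once at the end results in three uploads rather than 600, which is where the saving comes from on drivers that do not hide the cost for you.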
https://user-images.githubusercontent.com/5624669/227070461-31408ef5-ea88-4e84-9697-f300e2f670e9.mp4
There was a final elephant in Vulkan’s room which has taken a very long time to resolve. It was noted early into the public testing of the new backend that some games performed much worse and used a lot more GPU resources when compared to OpenGL. This wasn’t helped by the fact it seemed weirdly hardware specific. An Nvidia GPU paired with an Intel CPU wouldn’t exhibit these symptoms, but when you simply swap in an AMD CPU you did. Is the issue with AMD CPUs then? Well no, because if you swap the Nvidia GPU for an AMD GPU then the problem goes away again! The result being that there was clearly a weird situation with Nvidia GPUs being paired with AMD CPUs… Do they know? Do they repel like magnets?! No avenue was left unexplored.

Age of Calamity ran almost 70% worse with the AMD/Nvidia Combo
The problem in this situation stemmed from how Ryujinx handled GPU buffer data. All data was owned by “Host Mapped” memory which belongs to your system RAM, not your graphics card's VRAM. This allows us to quickly access, upload and pull data to and from this memory without needing to go through the GPU. Unfortunately, we learned that this is very dependent on a number of factors like CPU, GPU, GPU driver and even down to PCI-E bandwidth. As such, this method of storing all buffer data in shared memory is inconsistent to say the least.
Certain games bind very large ranges as storage buffers which, depending on the factors above, could cause huge bandwidth constraints, skyrocket your GPU usage, and subsequently cause your desktop manager to become laggy and unstable.

93% usage of an RTX 3070 on a static title screen!
The solution proposed is very much a balancing act. We can’t just store everything in VRAM as we’d lose all the aforementioned advantages such as quick access, but we clearly can’t store everything in shared memory either. A set of rules were therefore established to migrate buffers between different memory types in order to improve GPU performance and eliminate a bulk of cases where OpenGL still performed slightly better.
While the majority of these cases affect the fabled AMD CPU/Nvidia GPU combo mentioned, all vendors experienced the issue to some degree and all should see improvement. The numbers below were taken with a variety of AMD/Nvidia hardware combos so your numbers may not match exactly; the percentage improvement is the star of the show here.

This is not an extensive list and many more titles saw major to moderate gains across hardware lineups. Subnautica is one that’s omitted here but its title screen saw a minor 2000% performance increase. With these changes in place though, we aren’t expecting many more titles to perform wildly better/different in OpenGL, so if you previously tried a very slow game in Vulkan and had to switch backend, give it another go!
Last month there was mention of Metroid Prime Remastered and its notoriously stuttery doors. The largest cause of these frametime spikes is a very large (40MB) texture being created when each new zone loads. So surely the solution would be to try and update the current texture rather than recreate it? Spot on. If you guessed that at home then you too could one day be an emulator developer, or someone who writes about them…

To close out the GPU section, we fixed a small omission that was causing a device query to break in Vulkan, resulting in Ryujinx not knowing it could force some AMD GPUs (RDNA and later) to use a subgroup size of 32 rather than the default 64. This could be seen in some flickering corruption in titles such as Shin Megami Tensei V and Crisis Core.
https://user-images.githubusercontent.com/8129300/141647865-940988a0-fd77-4b9d-a8bc-664243c52155.mp4
Our bad! The issue above is still present on Radeon GPUs older than RDNA1 (RX 5000), but it should be resolved on anything beyond that supports variable subgroup sizes.
CPU/MISC
Shortly after Intel quietly dropped support for AVX-512, a new instruction set that operates on 512-bits rather than the 256-bits of AVX2, AMD announced that its newest Zen4 CPUs would offer support instead. As such there has been a fair bit of buzz around offering optimizations of the 512-bit variety to CPUs that support it currently and in the future. While we aren’t confident that Switch emulation will ever be able to take as much advantage of these instructions as something like RPCS3, some preliminary work was carried out by external contributor Wunkolo to accelerate the `mvn`, `orn` and `not` opcodes. While there are currently no tangible performance gains we can show for this, the implementations of the opcodes are technically faster on AVX-512 compatible chips and as further instructions are used these small optimizations may add up.
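The report does not say which AVX-512 instructions were used. One instruction that makes bitwise operations like these cheap is `VPTERNLOGD`, which evaluates any three-input boolean function selected by an 8-bit truth table. The sketch below emulates it on plain integers to show how NOT and OR-NOT each collapse into a single operation. Treating this as the approach taken in Ryujinx is an assumption, and the function name and 32-bit width are chosen purely for illustration.

```python
# Illustrative only: emulate the per-bit behaviour of AVX-512 VPTERNLOG on
# Python integers. The function name and the 32-bit width are assumptions.

def ternlog(a: int, b: int, c: int, imm8: int, bits: int = 32) -> int:
    result = 0
    for i in range(bits):
        # The three input bits form an index into the 8-entry truth table.
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result

MASK = 0xFFFFFFFF
NOT_A = 0x0F    # truth table for ~a
A_ORN_B = 0xF3  # truth table for a | ~b
```

Without such an instruction, a vector NOT typically needs a second register of all ones to XOR against, so folding it into one operation saves a little work each time.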
A funny side-effect of merging these changes was that we discovered that some CPUs being used with Ryujinx are so old that they didn’t support the hardware flag being used to check if certain instructions were supported or not! While we really do not recommend running a Switch emulator on 2008 server CPUs, this issue is now also fixed.
Onto some service shenanigans, `CreateServerInterface` in the Shop access services would pass some transfer memory and then never close its handle. If the service was called a second time, it would fail and cause a crash in any title that did this. Preventing this fixes crashes in SD Shin Kamen Rider Ranbu and, as usual, any other titles that could have exhibited this behavior.


As for this month’s miscellaneous round-up of changes:

News on our shift to Avalonia for the frontend has been slower recently as we are waiting on their team to finalize the 11.0 release. This is required for us to continue to package Ryujinx on FlatHub for our Linux and Steam Deck users so it is somewhat of a high priority!
Closing words
That’s all from us this month folks. We’re fast approaching the business end of the gaming year and we’re doing all we can think of to make this boat water-tight before the storm. As usual we’d like to give a huge thanks to everyone who supports the work we do financially on Patreon, with technical expertise on GitHub and just by helping fellow users or just being active in our Discord. You keep this ship on-course!
See you all next month ;)
2023-04-12 20:54:07 +0000 UTC
Bye-bye February, you won’t be missed. Does anyone actually like it? Short, cold and dark. Maybe if you live in an upside down part of the world you disagree, but you’d still be wrong!
February marked a couple of exciting events in the lives of Nintendo fans: a Direct, a Pokémon Presents, and a stealth drop of a certified classic; thus proving that if you need to delay a game, simply release a remaster of the old one to plug the gap. Luckily Metroid didn’t offer too much resistance to emulation but we’ll talk about that later. First on the agenda is glancing through our Patreon goals, one of which we’re so close to!
Patreon Goals:
$2000/month - Texture Packs / Replacement Capabilities - Reached, work will begin on this feature if this amount is maintained for one more month.
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Sound good? Moving on…
GPU:
We’re kickstarting this section with some certified cursed™ gaming. For anyone who’s tried to play the Mario+Rabbids games on Ryujinx, you’ll know that until recently they were both… questionable experiences. The first title had significant performance issues and the newer game, Sparks of Hope, didn’t render much of anything; not unless you spoke gentle words and sacrificed a few goats the week prior. While the performance issues of the first entry got some love, the graphical issues of the second were more challenging to solve.
The first obstacle was to determine why the game sometimes rendered and sometimes didn’t. Sparks of Hope uses a buffer clear to remove texture data from the GPU. By also clearing the CPU-side data whenever the GPU buffer is cleared, the game no longer reads leftover garbage memory when rendering, which removes the random nature of this particular quirk.
Before:

After:

Still not quite right though. The game attempts to alias an R8Unorm texture as RGBA8Unorm, which was incompatible. Shifting some of our copy dependency rules around is luckily enough to resolve this.
After 2:

Another title off the ‘cursed’ list! How about another?
The Legend of Zelda: Breath of the Wild is all hot at the moment, primarily of course due to its successor being oh so very close to release. This has triggered a lot of folks asking almost daily “Will it run fine at release?” and to this our answer is always one of two things. Either a crystal ball emoji (a personal favorite), or a lethargic ‘We’ll have to wait and see.’. But we’re trying our hardest to make the chances as high as possible.
On the more recent BoTW updates, many users have reported the mysterious case of the legendary ‘Bike Nuke’. This phenomenon was triggered fairly consistently by activating the Master Cycle Zero rune, but was also sighted at random moments in gameplay.

The problem was looked at a couple of times over the years with the most constructive session leading to the following discord message: “yeah, this is toast”.
However, all it took was a couple of indie titles to come along and break in the exact same way, but this time with a twist. Consistently!
The hardest bugs to fix are the random ones. They aren’t deterministic and even if you do get one to happen, isolating the cause through the hundreds or thousands of actions you performed prior is nigh impossible. This all changes if you can be sure something will break at a specific point and this is exactly what was happening in both “void tRrLM();” and “The Longest Five Minutes”. These games highlighted the need for us to handle cases where texture size between the cache and pool were mismatched as the older method was clearly insufficient. The solution? We sure hope you aren’t bored of copy dependencies!
void tRrLM(); before:

void tRrLM(); after:

The Longest Five Minutes before:

The Longest Five Minutes after:

And finally, Breath of the Wild. At last, able to re-enact the dream of being in an environmentally-friendly biker gang!

Players of Pokémon Scarlet and Violet with AMD graphics cards will be pleased to hear that two of their issues have been killed with a single change. Starting S/V with any resolution scale other than native would crash on these cards due to an unsupported blit operation but prior to this game’s release we weren’t quite as aware of the scale of games this impacted. It turns out that a number of games including Fire Emblem: Engage and TLoZ: Link’s Awakening have also been broken in the exact same way for seemingly a very long time.
A separate path for AMD cards has therefore been created which makes use of the `VK_EXT_shader_stencil_export` Vulkan extension, which notably Nvidia does not support at the time of writing. Luckily Nvidia and Intel GPUs don’t need to use this safe path so it’s a change that in practice should only affect AMD.

As a bonus this change seems to have also resolved the anomalously low performance AMD GPUs were seeing in Scarlet/Violet. One user even commented on “upgrading” their experience by moving from an RX 6800XT to a GTX 1660, something that, on paper, probably shouldn’t be happening. Tested with a Ryzen 5 5600X and an RX 580, performance jumped from 33 to 45 FPS with these changes. Right in-line with the Nvidia equivalent.
Unfortunately we have to move away from areas that AMD can currently follow. Some titles, but most prominently Mario Party Superstars (MPS), make use of some operations and extensions that only Nvidia have support for. The first of these fixes was mentioned way back on MPS’s release date, with AMD lacking `VK_EXT_fragment_shader_interlock` on Vulkan and also `ARB_fragment_shader_interlock` on OpenGL. To this day it means AMD GPUs cannot render certain effects in mini-games such as Spotlight Search, whereas their Nvidia counterparts can.
AMD:

NVIDIA:

Further MPS mini-games and certain screens of Luigi’s Mansion 3 make use of so-called ‘Programmable Blending’ which is implemented via microcode on the Switch’s Tegra X1. Graphics APIs on desktops however, such as OpenGL and Vulkan, do not expose such direct functionality, instead opting to provide extensions such as ‘VK_EXT_blend_operation_advanced’. You can check at home how many GPU vendors support this extension and if your GPU is listed among them. For Ryujinx it means that for now only Nvidia has the pleasure, but for future uses, and any Switch emulators on Android devices, Snapdragon Adreno GPU drivers also provide options.
Bowser's Big Blast before:

Bowser’s Big Blast after:

Puddle Paddle before:

Puddle Paddle after:

Pushy Penguins before:

Pushy Penguins after:

Luigi’s Mansion 3 before:

Luigi’s Mansion 3 after:

The kicker is that these advanced blend modes could be emulated with the use of fragment shader interlock… if AMD supported that. Users of these cards will be pleased to know that all hope isn’t completely lost though. AMD could implement support for one or both extensions sometime in the future, or an LLE approach to advanced blending can be implemented. If you’re on Linux, then go and pester those smart folks who develop the RADV driver!
The current implementation uses the Vulkan and OpenGL extensions to match whatever the Switch is doing with microcode operations but this could be done manually. We’ve avoided doing this for the time-being as the complexity and time cost compared to using API extensions is immense. It is, however, on the cards going forward for vendors such as AMD and Intel who may not support, or only support a small subset of blending extensions.
Onto the new release this month of Metroid Prime Remastered; a truly shocking reveal that had everyone born in the 90s produce a simultaneous scream. If we ignore the fact this release likely means Prime 4 is being delayed even further, it was cool to see such a high effort remaster in this day and age. As far as emulating it went, the game booted and was technically playable from start to finish on day 1 but with a few caveats which we’ll get into now.
First up to fix was a nasty crash when using Vulkan that was isolated to the SPIR-V shader generator (as OpenGL was unaffected) when the shader was using input or output indexing.
The next problem was graphical and affected both backends.

While the inclusion of Dark Samus would have been a nice touch, this was being caused by a limitation with how we handled partially mapped textures. Prior to this, partial mapping was supported but not when the start of the texture was unmapped. Any punts as to what Prime Remastered was doing? We’ll give you five guesses.

Unfortunately the fix here created another issue which we’re still in the process of solving. To deal with the unmapped start of textures, a ‘mega-texture’ of sorts needs to be created at certain moments. For anyone who’s played the game on Ryujinx recently, you may be able to infer when these happen due to the large hitch that can occur when going through doors and loading new areas. Rest assured that we’re aware of this and solutions are currently in the pipeline so stay tuned!
Some smaller changes to our GPU emulation this month included:
We end the GPU section on another remaster and another Zelda game. Skyward Sword HD took a little love this month with the resolution of a Vulkan-specific bug in one of the later-game dungeons. OpenGL allows primitive restart on all topology types while by default Vulkan would prefer it only be used on strip and fan. Luckily, by utilizing `VK_EXT_primitive_topology_list_restart` we can expand this supported topology list and match the OpenGL behavior.
Skyward Sword HD - Vulkan before:

Skyward Sword HD - Vulkan after:

Post-Processing support:
Implementing support for things like filters and anti-aliasing techniques is a little like asking a child to improve the Mona Lisa. Whatever tools you provide them, they’re going to take the biggest brush and create a mess. They’ll think it looks amazing, everyone else will shake their heads and cry. Unfortunately for us, it’s been a popular request for many years and it does have some genuine use cases. Below we’ll outline what’s currently available, what it does, and when each option is best used or left alone. This can be some quite pixel-peepy stuff so we recommend you try everything out and see what you like and don’t like!
Anti-aliasing
Aliasing is caused by everything on your screen being broken down into square pixels. Eventually even curved or circular edges need to be squares somewhere. At high enough resolutions you can barely see this; at lower resolutions the so-called “stair-casing” is very obvious. Anti-aliasing (AA) attempts to smooth these edges through a variety of techniques such as blurring and edge detection to make jagged surfaces appear smooth.

Ryujinx now offers two anti-aliasing techniques:
- Fast Approximate Anti-Aliasing (FXAA)
- Subpixel Morphological Anti-Aliasing (SMAA)
FXAA was one of the earliest forms of AA developed by engineers at Nvidia. Its goal was to be fast and functional, but not particularly amazing at edge detection. As such it tends to over-blur even non-edges and has really only been provided due to its simplicity.
SMAA is similar in concept to FXAA but uses much better edge detection in its shader. This means that it can more clearly define where in the image to apply the blur and ideally leave more of the screen in sharp focus. SMAA itself breaks down into 4 subsections: Low, Medium, High and Ultra. These sub-options define certain parameters in the SMAA shader such as edge thresholds and how many AA passes it makes.
No AA:

FXAA:

SMAA (Ultra):

On a modern GPU the cost of FXAA and SMAA, even at Ultra, is negligible. So realistically we do recommend SMAA Ultra if you’d like some level of AA on those particularly jagged games. On the flip side, we don’t recommend enabling this for pixel-art titles, or games whose art-style is designed around sharp edges. We will judge you for it!
Scaling Filters
Whenever a piece of content doesn’t exactly match the resolution of your monitor or TV, there needs to be some form of scaling in order to make that piece of content fill your screen. If no scaling was applied then you’d simply get black bars around the edges of the screen where no data was present. Your GPU or monitor usually does this automatically as is shown with a handy infographic in the Nvidia control panel if you were to display a 1080p image on a 4K screen.

As most Switch games aren’t even 1080p, let alone 4K, we need to scale them to the size of the program window, or even to the size of your monitor when playing in fullscreen.
There are many ways to scale an image. Some common ones you may have heard of include: Bilinear, Bicubic, Nearest Neighbour and Lanczos. All have their strengths and weaknesses which is why a lot of this comes down to personal preference. Ryujinx currently supports the following three scaling filters:
- Bilinear filtering (current default) is usually what the Switch itself will use to scale images to the output monitor and hence should be most accurate to real hardware output. However some view it as a little blurry, especially at lower resolutions.
- Nearest Neighbour is a very basic technique that simply replaces every output pixel with the “nearest” real input pixel. As such it creates a very blocky and aliased final image. This can however be a bonus when scaling pixel-art or retro titles such as Celeste and the GameBoy NSO emulator as no attempt will be made to smooth any edges.
- AMD FidelityFX™ Super Resolution 1.0 is a filter designed by AMD to take a lower resolution image and upscale it to a higher resolution. Note that only FSR 1.x is usable here as FSR 2.x makes use of temporal data such as motion vectors in a similar fashion to DLSS. When FSR is selected, a slider will also appear in settings which controls how much sharpening is applied. 100 = maximum, 0 = minimal sharpen.
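To show why the first two filters in the list above look so different, here is a deliberately simplified sketch that upscales a single row of grayscale pixels. Real scaling filters run in 2D on the GPU (bilinear is the same blend applied along both axes), and the function names here are hypothetical.

```python
# Illustrative only: upscale one row of grayscale pixels two ways.
# Real filters work in 2D on the GPU; function names are hypothetical.

def nearest(row, out_len):
    """Each output pixel copies the closest input pixel: hard, blocky edges."""
    n = len(row)
    return [row[min(n - 1, int((x + 0.5) * n / out_len))] for x in range(out_len)]

def linear(row, out_len):
    """Each output pixel blends its two neighbours: smooth, slightly blurry."""
    n = len(row)
    out = []
    for x in range(out_len):
        pos = (x + 0.5) * n / out_len - 0.5  # sample position in input space
        pos = min(max(pos, 0.0), n - 1.0)    # clamp to the edge pixels
        left = int(pos)
        right = min(left + 1, n - 1)
        t = pos - left
        out.append(row[left] * (1 - t) + row[right] * t)
    return out
```

Stretching the two-pixel row `[0, 100]` to four pixels gives `[0, 0, 100, 100]` with `nearest` and `[0.0, 25.0, 75.0, 100.0]` with `linear`: the first keeps the hard edge that pixel art wants, the second softens it.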
To run through the same comparison as with AA the same scene will be used with no AA filters applied.
Bilinear:

Nearest:

FSR:

Scaling filters can be used to produce an image that better suits your personal preference or style of game you’re playing. However we’ve made a little table to highlight the intended use-case of each and what to try and avoid.

macOS upstreaming
Fairly short one this month. The first stages of the more complex graphical fixes were upstreamed this month by moving gl_Layer to the vertex shader if geometry shaders are unsupported. This allows some UE4 games to begin rendering on self-compiled macOS builds.
Before:

After:

The method used is a little different to that in `macos1` as there we have geometry shader emulation to worry about too. As previously mentioned though, ideally MoltenVK will natively support those before everything is ready, in which case no issues should arise.
An updater script is also now included in macOS releases, as attempting to replace your own program while running, as is currently done on Windows and Linux, can invalidate the code signing on Apple systems. This is mainly a stopgap until a better solution can be found and isn’t functional at the moment anyway as the macOS releases are not a part of our main build pipelines.
MISC/Services:
Tying into the last section, many Mac users have reported some fairly nasty screen tearing which hasn’t been reported on Windows or Linux. When searching for why this would only affect seemingly Apple devices, we discovered that in Avalonia windows we’d forgotten to call the device VSync at render time, resulting in tearing on systems where the driver couldn’t save you. Nvidia, AMD and Intel all force VSync by default if the program doesn't enforce it and so this went unnoticed for a fair while! This also slightly improves the micro-stutters on macOS as there is now some semblance of refresh rate sync.
Before:
https://user-images.githubusercontent.com/5624669/221981769-6472de9c-8fde-4587-9e7c-51f493751af3.mp4
After:
https://user-images.githubusercontent.com/5624669/221981814-6509e7f0-d5f4-4a91-a983-a5abb4ce5805.mp4
As far as service changes go, `LoadOpenContext` in the account services was implemented in February and allows some multi-game collections such as Prinny Presents NIS Classics Volume 1: Phantom Brave: The Hermuda Triangle Remastered / Soul Nomad & the World Eaters (so very long…) to head in-game. It can also resolve crashes in multi-game collections when attempting to head back to the game selection pages.


Some games, such as Kingdom Rush, were crashing when requesting an unknown NPadID type. By adding additional checks to determine whether received IDs are even valid, this resolves those particular crashes and allows Kingdom Rush to be played.
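The idea of "check the ID before acting on it" can be sketched in a few lines of Python. This is only an illustration of the validation pattern, not the actual Ryujinx change: the set of accepted values and the function name are assumptions made for this example.

```python
# Illustrative values only: assumed for this sketch, not copied from Ryujinx.
# Players 1-8, plus two special IDs ("other" and handheld).
VALID_NPAD_IDS = frozenset(range(8)) | {0x10, 0x20}


def filter_valid_npad_ids(requested_ids):
    """Split requested IDs into (valid, rejected) instead of crashing on junk."""
    valid, rejected = [], []
    for npad_id in requested_ids:
        if npad_id in VALID_NPAD_IDS:
            valid.append(npad_id)
        else:
            rejected.append(npad_id)  # ignore (and perhaps log) rather than raise
    return valid, rejected


if __name__ == "__main__":
    print(filter_valid_npad_ids([0, 7, 0x20, 0x99, 8]))
```

The point is simply that unknown values are filtered out and the request carries on with whatever was valid, rather than the whole application falling over.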

And last, but very much not least, here is an image from code contributor Lostromb after they gave our audio upsampler a SIMD optimization kick.

Below is a table which shows the method used vs the time spent on audio upsampling, comparing the old (baseline) method with the new (SIMD) methods.

Closing words
Well that was a long one. For a shorter month. How does that even work?
We’d like to once again thank everyone who supports us every month on Patreon, contributes code to us on GitHub and those who help other users out with troubleshooting and bug reporting in our Discord! We couldn’t do it without you.
As mentioned right at the start, we’re sitting barely above the texture replacement Patreon incentive and if that figure is maintained until the next progress report is released, work will begin!
SWKB?
Until next we meet…
2023-03-09 16:57:39 +0000 UTC
Gather back round the fire for another month. This time you can tell us all about your resolutions, both ongoing and already failed.
Compared to 2022 with Legends Arceus, January was far less hectic. Fire Emblem: Engage being playable on day 1 meant that there wasn’t a need for developers to leave whatever they were currently working on to handle damage control. We’re trying to build our day 1 streak back up and Engage will be a fine addition to that collection!
Before we wander further, take a look at our patreon goals below. We’d like to reiterate once again that any features listed below will eventually be worked on, regardless of the goal being met. It would simply become a priority as soon as the incentive amount was sustained. This, of course, isn’t true for the full-time development goals which, by nature, are dependent on financial backing.
Patreon Goals:
$2000/month - Texture Packs / Replacement Capabilities - getting close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Moving swiftly onwards…
GPU:
Resolution scaling is a core draw of emulators when playing both retro and modern games. Escaping the clutches of 480p GameCube games and… 480p Switch games (depression) is just as tantalizing. While our resolution scaler, added all the way back in 2020, is able to scale a significant chunk of the Switch’s library, there were a few games over the years that gave it trouble.
The release of Fire Emblem: Engage highlighted an old issue with the scaler which caused games to reset back to native resolution on certain actions, such as opening menus or scene transitions. The fix actually owes itself to Pokémon Brilliant Diamond and Shining Pearl, released over a year prior, which had the same problem; the pull request is potentially a candidate for the longest authoring-to-merge time on the repo.
Fire Emblem: Engage Before:

Fire Emblem: Engage After:

As this change was originally intended for BDSP, Deltarune and Crash Team Racing, MSAA textures were also un-blacklisted from the scaling algorithm. This allows any titles making use of these such as: Pokémon Mystery Dungeon: Rescue Team DX, Rune Factory 5 and Cruis'n Blast to finally scale correctly.
Pokémon Mystery Dungeon Before:

Pokémon Mystery Dungeon After:

Pokémon Mystery Dungeon: Rescue Team DX normally uses a very aggressive blur filter so the above screenshots were taken using a mod to remove this and allow the changes to be much more noticeable.
Rune Factory 5 Before:

Rune Factory 5 After:

Cruis'n Blast Before:

Cruis'n Blast After:

Koei Tecmo are a studio that simply love to develop some of the most jank and finicky games on the market, especially when it comes to emulating them. Both Hyrule Warriors: Age of Calamity and Fire Emblem: Three Houses would regularly suffer from large slowdowns when looking in certain directions or seemingly at random. Age of Calamity in particular would sometimes slow to a complete crawl when too much was happening at once.
Texture caches are extremely important for emulators, as texture creation costs are extraordinarily high. With the addition of a second, more niche, short-duration texture cache, the troublesome case of a texture’s reference being wiped from the main texture pool while still in use is heavily mitigated. This was ultimately the cause of many of the major slowdowns in the two games mentioned prior, which should both see healthy performance boosts under scenarios where they used to struggle.
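As a rough mental model, a short-duration cache can be thought of as a small side table that keeps recently dropped textures alive for a handful of frames so they can be found again cheaply. The Python sketch below shows that idea only; the class name, the frame-based lifetime and its default value are assumptions for illustration and do not describe the real implementation.

```python
class ShortLivedTextureCache:
    """Keeps textures around for a few frames after the main pool lets go of them."""

    def __init__(self, lifetime_frames=2):
        self.lifetime_frames = lifetime_frames
        self._entries = {}  # key -> [texture, frames_left]

    def add(self, key, texture):
        self._entries[key] = [texture, self.lifetime_frames]

    def find(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None  # caller has to pay the full texture creation cost
        entry[1] = self.lifetime_frames  # still in use, so refresh its lifetime
        return entry[0]

    def end_frame(self):
        # Age every entry once per frame and drop the ones that have expired.
        for key in list(self._entries):
            self._entries[key][1] -= 1
            if self._entries[key][1] <= 0:
                del self._entries[key]
```

A texture that keeps being looked up never expires, while one that really has fallen out of use disappears after a couple of frames, so the cache stays small.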
Hyrule Warriors: Age of Calamity Before:
https://streamable.com/mhvzbt
Hyrule Warriors: Age of Calamity After:
https://streamable.com/070zwq
Fire Emblem: Three Houses Before:

Fire Emblem: Three Houses After:

A small memory leak on AMD, Intel and Apple (still weird typing that) GPUs was resolved this month, which was caused by old Vulkan swapchains not being destroyed when a new one was created. This was experienced whenever the window was resized and could add up to a large chunk of VRAM over multiple changes. Nvidia was unaffected as it seems their Vulkan driver was doing some wizardry behind the scenes to recognise a redundant swapchain and destroy it automatically. This was our fault though, and handling it isn’t really within the scope of the driver. As such, old swapchains are now manually destroyed when a new one is created.
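The shape of the fix is the classic "destroy the old resource when you replace it" pattern. The Python sketch below uses a stand-in class rather than real Vulkan calls, so every name in it is hypothetical; it only demonstrates why resizing without an explicit destroy leaks one object per resize.

```python
class FakeSwapchain:
    """Stand-in for a GPU swapchain; counts how many are currently alive."""

    live_count = 0

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.destroyed = False
        FakeSwapchain.live_count += 1

    def destroy(self):
        if not self.destroyed:
            self.destroyed = True
            FakeSwapchain.live_count -= 1


class Window:
    def __init__(self, width, height):
        self.swapchain = FakeSwapchain(width, height)

    def resize(self, width, height):
        old = self.swapchain
        self.swapchain = FakeSwapchain(width, height)
        # Explicitly release the old swapchain instead of hoping the driver will.
        old.destroy()
```

Remove the `old.destroy()` line and `FakeSwapchain.live_count` grows by one on every resize, which is the leak described above in miniature.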
On the topic of VRAM usage and texture caches, our current `AutoDeleteCache` has a hard limit of 2048 entries which will try to keep the most active textures cached while removing older entries. Unfortunately this doesn’t work too well when we have a small number of very large textures which don’t trigger any of the cache limit safeguards, but do take up a large amount of memory. For example take the following scene:

Not particularly complex, low number of textures, surely this can’t be an issue? Well the textures used here are few in number but huge in size. The exact scenario we don’t want. To resolve the large memory use here, we can force a deletion whenever a texture is unmapped and not in a GPU-modified sub-range region. This dramatically lowers the VRAM usage in high-stress scenes such as the above from Witch’s Garden.
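To make the problem concrete, here is a small Python sketch of a cache that only limits the number of entries, with a forced deletion hook added for unmapped textures. The 2048 figure comes from the text above; everything else (class and method names, the `gpu_modified` flag) is a simplification assumed for this example, not the real `AutoDeleteCache`.

```python
from collections import OrderedDict


class AutoDeleteCacheSketch:
    def __init__(self, max_entries=2048):
        self.max_entries = max_entries
        self._textures = OrderedDict()  # key -> size in bytes, oldest first

    def add(self, key, size_bytes):
        self._textures[key] = size_bytes
        self._textures.move_to_end(key)
        # The only safeguard is the entry count, so a handful of huge
        # textures never trips it no matter how much memory they use.
        while len(self._textures) > self.max_entries:
            self._textures.popitem(last=False)

    def on_unmapped(self, key, gpu_modified):
        """Force a deletion on unmap, unless the GPU has modified the texture."""
        if key in self._textures and not gpu_modified:
            del self._textures[key]

    def used_bytes(self):
        return sum(self._textures.values())
```

With three 256 MiB textures in the cache, the entry limit is nowhere near reached, yet `used_bytes()` reports 768 MiB. Calling `on_unmapped` for the ones the game has already unmapped is what brings that figure back down.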
VRAM usage graph Before:

VRAM usage graph After:

Visual novels are the unit tests of emulators and EVE ghost enemies is another example of this. Character portraits were failing to render due to the ‘Modified’ texture flag not being cleared.

Simply clearing this flag once the texture is modified from the CPU on the GPU thread resolves this bug.

Persona 4 Golden is a true masterpiece in the gaming world and a personal favorite in the series. However, while it sure is an old game at this point, it isn’t THIS old:

While the aesthetic here is somewhat interesting, it certainly wasn’t correct. The cause was isolated to the CSET and CSETP shader instructions which either had no or only partial implementations due to the rarity of their use. Fully implementing CSET and fixing the partial implementation of CSETP completely fixes Persona 4 Golden. Go make history.

As we mentioned a couple of months ago, with our transition to .NET 7 the path toward NativeAOT has been opened to us. For any unaware, NativeAOT allows your C# code to be compiled directly to self-contained native binaries, meaning you can run .NET applications on platforms without JIT permissions. Normally when you run any .NET program, what you are actually running is an “intermediate language” (IL) which is then compiled to native machine code for your system by the .NET runtime’s JIT compiler as the program runs. This can introduce additional latency and usually shows itself in slower startup times.
There are, however, limitations to NativeAOT. You cannot make use of the feature when some .NET features are utilized within your codebase; the main offender for Ryujinx being ‘Reflection’. Reflection is complicated, but in the simplest terms possible it allows your program to know stuff about its own code dynamically. Unfortunately this is a no-no for ahead-of-time compiling and there has been a significant push to remove uses of reflection from Ryujinx for a fair while. This month reflection was removed from the multithreaded GPU abstraction layer (GAL) which is another step toward the goal. Expect to see more changes referencing the removal of reflection and we hope you now know why it’s important!
Last month we highlighted some changes in the OpenGL backend to allow Pinball FX3 and Sphinx and the Cursed Mummy to render correctly. It took until January for those changes to cascade their way into a Vulkan equivalent. Both of these titles now render correctly on both backends.
Finally for our GPU section, last month we showed Ryujinx running on a Raspberry Pi and this month you too can try it! We’ve relaxed our Vulkan requirements which should allow a larger variety of devices and drivers to run Ryujinx if they don’t fully conform to the modern spec.

Don’t expect this change to allow your old Celeron laptop to boot Switch games though. It’s mainly for open-source Vulkan drivers that may lack a feature or two. As stated last month, performance on the Pi is dreadful but we’re hopeful that newer Qualcomm devices might be interesting in the future.

Of course, these devices are ARM64 based. And Ryujinx is for x86 systems right?
CPU/Kernel:
Wrong! Make way for an ARM64 backend for our CPU JIT, ARMeilleure. While this doesn’t allow every ARM device a full hypervisor-style native execution of code, similar to that seen on our macos1 build, lots of instructions are mapped almost 1:1 and should have minimal overhead compared to a similar x86 CPU executing them. Some additional optimizations to ARM bit manipulation and feature detection were also added to slightly improve performance for ARM64 processors.
This contributes to the upstream of the macOS changes as it is required for ARM64 processors to run 32-bit titles such as Mario Kart 8 Deluxe.

This change necessitated a couple of other changes to our JIT architecture. The PPTC, which caches CPU instructions so that they don’t need to be translated multiple times, now needs to check your CPU architecture as an x86 cache is obviously not compatible on ARM systems and vice versa.
Any global state was also removed from the PPTC, which allows applications that make use of the Nintendo Switch’s JIT service, like the NSO N64 emulator, to attach a cache instance to each guest process. This allows sub-processes like the N64 games in the NSO emulator to gain the individual benefits of faster launches.
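Both PPTC changes boil down to two simple ideas: tag the cache with the architecture it was built for, and make the cache an ordinary object so each guest process can own one. The Python sketch below shows those ideas under heavy simplification; the class, its dictionary-based "file format" and the process names are all assumptions for illustration and not the real PPTC layout.

```python
import platform


class TranslationCacheSketch:
    """One instance per guest process; no module-level (global) state."""

    def __init__(self, arch=None):
        self.arch = arch or platform.machine()
        self.entries = {}  # guest address -> translated host code (bytes)

    def save(self):
        # Record the host architecture alongside the translated code.
        return {"arch": self.arch, "entries": dict(self.entries)}

    @classmethod
    def load(cls, blob, host_arch=None):
        host_arch = host_arch or platform.machine()
        cache = cls(host_arch)
        if blob.get("arch") == host_arch:
            cache.entries = dict(blob["entries"])
        # Otherwise the blob was built for another CPU: discard it and start empty.
        return cache


if __name__ == "__main__":
    # Hypothetical guest processes, each with its own independent cache.
    caches = {name: TranslationCacheSketch() for name in ("emulator", "n64_game")}
    caches["n64_game"].entries[0x1000] = b"\x90"
    print(len(caches["emulator"].entries), len(caches["n64_game"].entries))
```

Because nothing is shared between instances, a cache built up by one sub-process cannot be clobbered by another, and a cache written on an x86 machine is simply ignored when loaded on ARM.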

GoldenEye anyone?
macOS upstreaming:
We’ve already covered some of the changes that originated from the macOS upstream roadmap, such as the ARM64 JIT and the relaxing of Vulkan requirements, but January still had plenty more to dedicate to its own section.
We’d like to preface this section by explaining that 99% of what is mentioned here is already included in the macos1 build. While we’re working on upstreaming the basic changes, our focus is not on fixing the known issues with the first mac release. This is why there’s been a bit of a delay for any further mac releases; realistically, nothing beyond minor fixes and adjustments has advanced what is currently available.
A variety of the MoltenVK workarounds were implemented which cover a lot of ground. MVK portability subsets, shader specialization and ASTC format checks are a few of the highlights but the pull request linked above lists the rest. Notably this does not include our geometry shader emulation or transform feedback emulation. We’re hopeful that the MoltenVK team will be figuring out their own geometry shader implementation sometime in the first quarter which would be preferable to using our own. These changes allow Mario Kart 8 Deluxe to render correctly on self-built releases.

Some smaller changes related to Vulkan/MoltenVK were also merged in January:
These final two major changes shift back toward CPU and Memory emulation.
The Switch uses page sizes of 4KB, while macOS and some other platforms use 16KB or higher. It was therefore required to implement support for platforms without support for 4KB granularity. With this change in place, it is no longer required to boot self-compiled builds with Rosetta.
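The core awkwardness is plain arithmetic: a single 4KB guest page never lines up with exactly one 16KB host page, so any host-side operation has to be widened to host page boundaries. The Python sketch below shows only that rounding step; the helper names are made up, and the real change involves far more than this (tracking, protection and so on).

```python
GUEST_PAGE = 4 * 1024   # Switch page size
HOST_PAGE = 16 * 1024   # e.g. Apple Silicon macOS


def align_down(value, alignment):
    """Round down to a multiple of alignment (alignment must be a power of two)."""
    return value & ~(alignment - 1)


def align_up(value, alignment):
    """Round up to a multiple of alignment (alignment must be a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def host_span(guest_addr, size, host_page=HOST_PAGE):
    """Smallest host-page-aligned [start, end) range covering the guest range."""
    return align_down(guest_addr, host_page), align_up(guest_addr + size, host_page)


if __name__ == "__main__":
    start, end = host_span(0x5000, GUEST_PAGE)
    print(hex(start), hex(end))
```

A request touching the single guest page at 0x5000 ends up covering the host range 0x4000 to 0x8000, which also contains three neighbouring guest pages. That overlap is why 4KB granularity cannot simply be assumed on these hosts.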
While the ARM JIT is cool and all, it wasn’t the centerpiece of why Apple Silicon support was so interesting in the first place. Our Apple Hypervisor is now also fully implemented, and with the above page size changes, is now fully on-par with macos1 as far as the CPU and Kernel side go. There is still a fair bit of work to bring the MoltenVK and GPU side up to snuff, but it’s getting there.
With a lot of these changes in place, we’re hoping that anyone who was interested in macOS contributions should now have a solid base to begin, with most games running at varying levels of performance and graphical fidelity on self-compiled builds.
GUI:
While we’re waiting for Avalonia 11.0 (required to distribute on FlatHub), January did not slow down as far as changes to our WIP GUI are concerned.
We saw a huge number of codebase refactors and improvements spanning from:



Alongside visual refactors and changes, there were a fair lot of different refactors to improve readability and modularity of the frontend.
A long-standing Windows bug where the “hide cursor on idle” setting didn’t actually work was fixed this month, alongside a couple of tweaks to the settings window: the resolution scaling drop-down menu was re-ordered to match GTK and the side nav-bar has been given a little bit of padding so that item selections don’t overlap.

Notifications have also been implemented which should allow some interesting use-cases in the future. For those unaware a notification looks like this:

There are still some questions to be answered on where they’d be best utilized to walk the line of informative, but not annoying. Shader compilation warnings are a potential future option here.
Avalonia will now also swallow keyboard TextInput events to avoid the annoying “bell” sound on macOS when using a keyboard as a controller. This change will take the place of a workaround in macos1 that currently breaks text input in a few places.
And finally for GUI, we have a CrowdIn page for translation help! If you’d like to help translate new strings or add your own language to Avalonia then we’d appreciate it. Lots of new translations were already added this month but there’s always someone out there who can expand that.
Services/Misc:
As mentioned earlier, sometimes changes take… a long time to reach end-users. Someone may have a great idea, write it all up and then get bogged down in other things, have their attention pulled away due to the release of a new Pokémon game, or drop off the face of the planet for a few years. Regardless, way back in late 2020, gdkchan, the project founder, wanted to redesign how Ryujinx handles service implementations and start to decouple the OS HLE project from the kernel. Two years later, and with a multitude of changes in between, these goals finally materialized.
The advantages this redesign brings are numerous but simply boil down to being simpler (no more manual read/writes to message buffers), lower latency (less allocation and copying) and allowing multiple threads to process IPC which should eventually fix some bugs. Services need to be manually ported to this new system, so there is a lot of transition work to be done in the interim; most services at the moment are not using the new implementation so don’t see any of these benefits yet. We’ll keep you posted on what’s been migrated, and what that fixes as they happen.
More general changes this month included:
For anyone who uses the command line interface or our headless builds, lots of missing arguments were added in January such as the macroHLE setting, cursor hiding on idle and even which user profile’s information and saves you’d like to load. This is mainly useful for frontend users so if a terminal window scares you, none of this is important!
Closing Words:
What to say that we haven’t already. We’re already bounding into the new year with a day 1 playable exclusive and we’re hoping that this continues. There are some killer releases this year after all! As per usual, if you’d like to support our efforts in making this a reality, you can find us on Patreon for financial support and GitHub if you’re familiar with 3D-graphics, low-level software engineering and C#/.NET! Outside of direct contribution, testing games and reporting bugs goes a long way in letting the team know what works, what doesn’t, and what should be on the priority list.
Bye bye! For now…
2023-02-07 22:41:10 +0000 UTC
Happy New Year!
We hope you all had a festive end to a truly incredible year. So much ground has been covered each and every month of 2022, and December wasn’t about to break that trend. Performance improvements, fewer black-screening games and some upstreaming work for macOS littered the thirty-one day span, with more than just rampant consumerism for all to enjoy.
So strap yourselves in for the finishing touches of 2022, laid out by a witty, engaging and, if another word were to be used, handsome writer report. Before all that though: let’s take a gander down to our remaining patreon incentive goals.
Patreon Goals:
$2000/month - Texture Packs / Replacement Capabilities - getting close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
3….2….1…. GO!
GPU:
Pokémon once again has to steal the entrance to the GPU section this month with what we hope was the final issue with the game. Anyone who played at launch, or even late into November will probably have experienced completely random vertex explosions: moments where your character, an NPC, other Pokémon, or literally any other environmental asset could lose its grasp on reality and become either a minor glitch or a giant wall of beige-tinted spikes.
Minor issue, probably fine to just see a dentist:

Ground in the distance erupting? Seek specialist help:

As with most vertex explosions, this was being caused by simple data loss. Flushes on certain buffers would occasionally flush uninitialized data which could cascade into the results seen above. By tracking these troublesome buffers and flushing their source, instead of their possibly incomplete copy, the vertex explosions in Scarlet & Violet are massively reduced. There is still a very slim chance that NPCs may exhibit minor explosions, but these took hours to appear in testing and were resolved by reloading an area.
Some smaller changes this month included: a reduction to texture operations in shaders which may give a slight boost to lower end iGPUs, SNorm buffer textures are now cleared for use with Vulkan (we previously enforced an OpenGL workaround even in Vulkan), and a couple more improvements to shader specialization and shader bindings array gave our usual benchmark game of Super Mario Odyssey a further 3-4% performance bump!
With a new Zelda game drawing near, it’s only natural that some of the longest-standing bugs in its predecessor are now in the spotlight for fixing. Considering that The Legend of Zelda: Breath of the Wild has never had any sustained focus from our development team, it was already in relatively good shape with only a few notable issues remaining. One of these was the lack of any motion on wind and grass particle effects.
See video of what this effect should look like: https://user-images.githubusercontent.com/44103205/159062082-52ab920b-694f-4d7b-b2b8-06c9b94c65f6.mp4
And how Ryujinx used to render: https://user-images.githubusercontent.com/44103205/159062185-4f16ce5d-96a6-4c30-b646-6ec628cc1ecd.mp4
As witnessed, the grass seems to simply fade in and out of existence on Ryujinx while the effect on Switch is more of a gentle breeze, blowing the grass out of focus. By adding further fallback parameters for the LDG to constant buffer binding, another item gets crossed off the list.
Ryujinx after:
https://user-images.githubusercontent.com/6294155/205732462-cce718f3-18b6-44a4-85a0-85dbfb6fd30d.mp4
Game of the Year for 2022 was the latest FromSoft offering: Elden Ring, which promised to be larger, more diverse and far more stuttery than any piece of software that came before. But take a moment to cast your mind back just over eleven years ago… Skyrim is on the horizon, your phone still has a 3.5mm headphone jack and Game of Thrones is premiering on TV. More important than all of that was the release of FromSoft’s mainstream hit, Dark Souls.
Now fast forward eleven years again. There you are. Wanting to play this certified classic on your personal computer. Ryujinx is immediately booted up (of course it is) and Dark Souls is double-clicked… “Was it always this dark?” a million voices cry out in despair.

Egregiously-long and questionably humorous gags aside, no, vision-impairment hasn’t yet been added as a skill-check to any Soulsborne titles. However, if it was, fans would claim it was revolutionizing difficulty and rejecting the “hand-holdy” nature of properly lighting a 3D world.
On our side though, the simple implementation of PrimitiveID output on the geometry shader was enough to completely resolve this issue:

The Stanley Parable: Ultra Deluxe highlighted a slight bug in the FSWZADD shader helper function that was causing a LUT index to be out of bounds. Fixing this instruction allows the text to render correctly.
Before:

After:

Textures in games make use of a variety of compression formats in order to reduce their file size and impact on video memory. Due to the many different environments games are developed in, some of these formats aren’t supported on desktop GPUs and others aren’t supported on mobile GPUs. As the Switch is effectively a mobile device, it in turn can make use of some texture compression formats that your Nvidia or AMD desktop card simply doesn’t know what to do with. One type you may have already heard discussed in relation to both Astral Chain and Bayonetta 3 is the ASTC compression format, where we need to manually decompress the texture in software before it can be used. If you’re wondering, this is why those two games use a lot of VRAM compared to other titles, as the fully uncompressed texture is being loaded into VRAM before use!
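To put a number on that VRAM cost, here is a quick back-of-the-envelope calculation in Python. Every ASTC block occupies 128 bits regardless of its footprint; the assumption made here is that the decompressed texture is stored as plain 8-bit RGBA (4 bytes per pixel), and the 2048x2048 texture size is just an example.

```python
def astc_bytes(width, height, block_w, block_h):
    """Compressed size: each ASTC block is 16 bytes whatever its pixel footprint."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * 16


def rgba8_bytes(width, height):
    """Uncompressed size, assuming 4 bytes per pixel."""
    return width * height * 4


if __name__ == "__main__":
    w = h = 2048  # example texture size
    for block in ((4, 4), (8, 8)):
        packed = astc_bytes(w, h, *block)
        unpacked = rgba8_bytes(w, h)
        print(block, packed, unpacked, unpacked // packed)
```

Under those assumptions a single 2048x2048 texture grows from 4 MiB to 16 MiB with 4x4 blocks, and from 1 MiB to 16 MiB with 8x8 blocks, so a game built around ASTC can easily need several times the memory it was designed for.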

ASTC is only one type of unsupported compression though; we have some other decoders to convert any unsupported texture into a format more easily readable by your GPU. Except two. ETC2 and EAC compression formats were still lacking a software decoder which meant any games using these formats would simply crash with an unsupported format error. Luckily for us, EAC doesn’t appear to be used by any title we’re aware of, so only an ETC2 software decoder was implemented.
Games such as Vegas Party and Paradigm Paradox are now playable on Nvidia and AMD GPUs (Intel ironically already natively supports the format!).
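The decision being made for each texture is roughly "use it as-is if the GPU understands the format, otherwise run a software decoder, otherwise give up". A hedged Python sketch of that decision follows; the function, the format strings and the fake decoder are all placeholders invented for this example rather than anything from the Ryujinx codebase.

```python
def prepare_texture(fmt, data, host_supported, decoders):
    """Return (format, data) in a form the host GPU can consume."""
    if fmt in host_supported:
        return fmt, data  # native support: upload the compressed data untouched
    decoder = decoders.get(fmt)
    if decoder is None:
        # The situation ETC2 titles were in before a decoder existed.
        raise NotImplementedError(f"no software decoder for {fmt}")
    return "RGBA8", decoder(data)  # decoded on the CPU before upload


if __name__ == "__main__":
    def fake_decoder(data):
        return data * 4  # placeholder standing in for a real decoder

    decoders = {"ETC2": fake_decoder}
    print(prepare_texture("ETC2", b"ab", {"BC1"}, decoders))
    print(prepare_texture("ETC2", b"ab", {"BC1", "ETC2"}, decoders))
```

The second call mirrors the Intel case mentioned above: when the format is natively supported, the software path is never touched.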
Vegas Party:

Paradigm Paradox:

Implementing missing GPU quirks continues with both implementing another non-indexed draw method which solves some rendering in Ikaruga, and fixing the vertex buffer size of DrawArrays, fixing Sphinx and the Cursed Mummy’s rendering on OpenGL.
Ikaruga before:

Ikaruga after:

Sphinx and the Cursed Mummy before:

Sphinx and the Cursed Mummy after:

For any OpenGL enjoyers out there, the Xenoblade titles have been one of the few remaining holdouts for that agenda. Unfortunately, all good things must come to an end and in December Vulkan finally took the performance crown. With a 30% performance uplift in the Colony 4 hotspot for Xenoblade Chronicles 3, Vulkan now outperforms OpenGL by 5-8% on average while of course keeping its other advantages such as shader compile speed.

The new Crysis: Colony 4.
On the topic of Xenoblade, a recent Vulkan fix for XC2’s cutscene shadows by using a custom border color was causing a lot of headaches for Linux users. Shortly after merge, it was discovered that the open-source RADV drivers didn’t like this at all and were crashing in a wide variety of games including: Super Smash Bros: Ultimate, The Legend of Zelda: Breath of the Wild and Metroid Dread. Not small titles to say the least. External contributor DadSchoorse discovered that RADV requires the custom border features to be explicitly enabled before use which finally ended that miserable saga, especially for the influx of Steam Deck users!
Finally, the texture and sampler pools will now be forced to rebind when the pool itself changes, to mitigate a regression caused by the shader specialization optimization mentioned above. This change was causing minor issues in “The New Prince of Tennis: LET'S GO!! ~Daily Life~ from RisingBeat” (incredible name) which a pool rebind fixes.
Before:

After:

Pay attention to the rightmost character’s arms behind the text box.
macOS upstream:
A new section this month. As is laid out in this issue, there were a considerable number of changes made to Ryujinx in a private branch to provide support for macOS on both Intel and Apple Silicon Macs. Merging these all at once would be a nightmare for code reviews and also impossible to regression track, therefore the decision has been made to methodically upstream every change in piecemeal format. This allows proper code review and regression tracking, but also allows time for some implementations to be cleaned up. The journey has already begun!
A couple of mundane changes were initially needed for anything to be done:
The above changes allowed the program to both build and to boot, but only into the GUI. Launching any game would still be impossible as no valid Vulkan devices exist on macOS. It is therefore also necessary to create a Metal surface render window to which MoltenVK can eventually draw. This actually allows extremely basic games like Undertale to run on a self-compiled mac build via Rosetta:

From this point some other GPU and MVK workarounds were merged:
HLE:
Services. Don’t we all just love ‘em?
‘IsIlluminanceAvailable’, and ‘GetCurrentIlluminanceEx’ were both stubbed in December which allows the Labo VR Kit application to get to the title screen.

Nothing can really be played yet due to the absolute mess of niche services and HID implementations this game requires, but notable nonetheless!
‘Select’ was fully implemented in the BSD sockets which finally allows Saints Row: The Third and Saints Row IV to access their LAN play modes for campaign co-op.
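For anyone who hasn't met it before, `select` is the classic BSD sockets call that asks "which of these sockets are ready right now?". The snippet below uses Python's standard `select` module on the host OS purely to show what the call does; it is not the emulator's implementation of the guest service, and the helper name is made up for this example.

```python
import select
import socket


def readable_sockets(sockets, timeout=0.0):
    """Return the subset of sockets that have data waiting to be read."""
    readable, _, _ = select.select(sockets, [], [], timeout)
    return readable


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(readable_sockets([a, b]))        # nothing has been sent yet
        a.sendall(b"ping")
        print(readable_sockets([a, b], 1.0))   # b now has data waiting
    finally:
        a.close()
        b.close()
```

A LAN game typically polls several connections this way every tick, so without a working `select` it has no means of noticing that another player has sent anything.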

Some other more minor fixes, stubs and implementations in December include:
A fun final one is the stub of ‘CheckNetworkServiceAvailabilityAsync’ which allows everyone's most requested application to boot: Hulu! Unfortunately, some other issues remain for this title, so don’t try to use Ryujinx to tune into Forged in Fire: Beat the Judges just yet.

GUI:
Let’s talk GUI, let’s talk Avalonia. With the macOS release exclusively using the new framework we’re now fully in the endgame. The endgame consists primarily of “fixing the jank” as new contributor IsaacMarovitz so aptly put it. All of the basics of a GUI now work and are fully in place, but random stuff just sometimes looks a little odd.
Dialogs were made a little more intuitive across the board based on OS-specific design stylization. Our recent influx of macOS users perhaps necessitated platform-based UX language.
If you’re on Windows the style remains vastly the same, but with an accent over the confirmation option:

If you’re on macOS, you’ll instead see (when a new macOS update is pushed) a more natural inverted layout:

A few different cleanups to input classing, the status bar, and the project file structure itself took place, and a bug where the software keyboard was invisible on Linux was finally stamped out. Closer alignment with the principles of fluentUI and WinUI3 was achieved, which improved readability and reduced instances of the ‘selection bean’ and the wide box-type selector clipping each other.
Before:

After:

Feature-wise: a Save Manager was added to make it simple to find user-profile specific saves without the need to switch profiles. It also has the power to recover lost accounts that still have saves tied to them! Very nifty feature all-in-all!

Oh, and you can’t update Ryujinx while a game is being played anymore. Far too many of you were somehow running a game and then decided right now was the best time to let the emulator download and replace files. So yeah. Stop that.
End of Year recap: 2022
Many people are saying that 2022 was, in fact, a year. One of the years of all time even.
Who wants some stats? Everyone loves some stats, they seem to be all the rage.

To get some terminology out of the way early, our definitions are very strict and on the harsher end of most emulators. To be given “playable” status, a game must have zero bugs. It must be identical to a hardware playthrough with no workarounds and be full-speed on ‘reasonable’ hardware. In-game means that the title enters the main gameplay loop but could have graphical, audio, input, service or performance limitations. The others are fairly self-explanatory from there.
We begin 2023 with a staggering 84.2% of the currently tested Switch library having no recorded bugs. A further 11.6% of titles are in-game but with at least a single issue; this can range from major graphical errors all the way down to a single texture corruption in the credits. As such we wager that most of these “in-game” titles are probably what other sources would consider ‘playable’ already! Only 3.5% of titles booted but, for one reason or another, couldn’t reach gameplay, and an absolutely minuscule 0.8% of titles (30 to be exact) are not responsive at all. The fact that in less than 5 years of development time we’re already down to just 30 titles that can’t boot is a monumental achievement. We have no doubt that more games will be released and continue to test us, but right here, right now, it’s looking mighty good. In the meantime, there are still thousands of untested Switch games (mostly multi-platform shovelware, but the point remains) that we need your help testing & cataloging.
2022 saw major feature releases up the eyeballs:
It’s been a wild ride and we’d like to thank you all for supporting our endeavors over this period. A question a lot of people are asking us is ‘What’s next?’. What does 2023 hold for Ryujinx? Well, truthfully, even we don’t know! But, we’ve got cool things to show…

Ryujinx on a Raspberry Pi, anyone? Admittedly the performance is fairly abysmal because… it’s a Raspberry Pi! This is simply a proof of concept that the work put into porting to macOS is transferable to other platforms and should, when suitable Windows/Linux ARM systems arrive, be an excellent platform to run your Switch games on.
That’s all from us in 2022 (says a report published in 2023).
As always, if you know some C#, .NET, 3D-graphics or low-level engineering, and need a good New Year’s resolution, we’re always on the hunt for developers who can continue to bring those compatibility stats up. If that's all alchemy & wizardry to you then donating to our Patreon, or being active in testing and bug-reporting really does help out a bunch.
Happy new year!
2023-01-10 17:21:55 +0000 UTC
Wow. What a month. November is not typically this eventful, but sometimes a little spice is exactly what the Doctor ordered.
LDN being introduced to the number 3, macOS meeting a graphically demanding program, and Ryujinx facing its final adversary… the cardinal direction, North!
All-in-all we’ve had a busy month, with two major feature releases and that pesky company GameFreak deciding to push out those new indie games. What were they called? Red and Blue? Oh sorry, Scarlet & Violet. It’s getting hard to tell the difference between new and old games when they look like this, huh?!
Before that rant goes any further, please check out our remaining and final set of Patreon goals. Once more, all the goals are planned to be actioned eventually, although if a sufficient monetary amount is sustained then focus would be shifted to deliver the respective feature in a reasonable timeframe.
Patreon Goals:
$2000/month - Texture Packs / Replacement Capabilities - getting closer!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Done reading? Alright, let’s whirl.
GPU:
How could this section start with anything other than Pokémon? Our streak of playable day 1 titles continues although not without hitches. Some issues were immediately apparent, both in the graphical and performance department. Firstly, resolution scaling on Vulkan was a tad broken as showcased below:

This was isolated to Vulkan and was caused by the SPIR-V scale helpers being unable to find Array textures. With this issue addressed, beaches and lake beds were no longer covered in perpetual grass.

The Mesa Radeon Vulkan driver, RADV, was also displaying some rather interesting, albeit random, behavior where occasionally, on boot, the whole screen would be covered in a bright white filter. The Switch will always sample the initial dummy texture as 0 but Ryujinx was not forcing this. By clearing the texture to (0,0,0,0) on creation, the variability is removed and so is the holy light of purgatory.
Before (sometimes):

After (every time):

While users with newer graphics cards and Linux users were having all of the fun, owners of Nvidia’s GTX 700 (and even some mobile 900 series) cards were getting no graphical output at all. As with most bugs of this type, we can usually trace them back to an unsupported feature or extension. This time it was the VK_EXT_shader_viewport_index_layer, and its OpenGL equivalent, that needed workarounds. By moving the gl_Layer from vertex to geometry if the GPU doesn’t support the extension, these older cards no longer have any issues rendering as nicely as their younger siblings.
The final Scarlet/Violet graphical bug fix that was squeezed into November resolved an issue with the Pokémon stats graph, where parts would be completely cut off and fail to render. It turns out that GameFreak, to no one's surprise, is using extremely dated rendering methods. Polygon topology is used to draw the hexagon but most Vulkan drivers, and a fair few OpenGL drivers, do not support the extensions required anymore. Nvidia does still support GL_POLYGON in compatibility mode but no such equivalent exists in Vulkan. Luckily for us, convex polygons can be identically rendered via a triangle fan (imagine a fan of triangles around a single point) which, as has been stated in previous reports, is a breeze for modern computer graphics renderers.
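For anyone who would rather see the fan than imagine it, here is a minimal sketch of how a convex polygon's vertices can be regrouped into a triangle fan. The function name is made up for illustration and this is not Ryujinx's actual conversion code.

```python
def polygon_to_triangle_fan(vertex_count):
    """Return index triples that cover a convex polygon with a triangle fan.

    Every triangle shares vertex 0 and uses two neighbouring vertices
    along the polygon's edge.
    """
    if vertex_count < 3:
        return []  # not enough vertices to form a single triangle
    return [(0, i, i + 1) for i in range(1, vertex_count - 1)]


# A hexagon (like the stats graph) needs four triangles.
print(polygon_to_triangle_fan(6))
# → [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]
```

An N-sided convex polygon always ends up as N - 2 triangles, which is why the result is identical to the original polygon.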
Before:

After:

To close out our Scarlet & Violet monologue, a random crash was resolved, but we still have those pesky performance issues mentioned earlier to talk about. Thankfully, such issues affect more than just these games so we can take a breather from all this Pokémon talk.
First stop: Pokémon Sword & Shield. What? We managed a quick breath there. Didn’t you?
Jokes aside, they’re but a couple of games that have seen some rather startling performance improvements this month due to an avalanche of activity in the GPU emulation department. Given how varied and numerous the changes here are, it would be ill-advised to give them individual spotlight for the sake of all our time, but here’s a quick rundown. See the graph a bit further down for a TL;DR.
That’s a lot of bullet points containing Super Mario Odyssey and Pokémon. Don’t be disappointed though, these are just our developers’ go-to titles to test any GPU thread improvements at the moment due to their relative ease on the CPU-side. Tying all of those together; here is a quick graph of some popular titles’ performance at the start of November vs the start of December:

The big winners here are the two Pokémon entries with Scarlet/Violet receiving a staggering 81% performance uplift in under two weeks from its release. Even titles with hideous bottlenecks elsewhere, such as The Legend of Zelda: Breath of the Wild, still saw a healthy improvement of around 10%. It’s worth noting that both Xenoblade Chronicles and Sword/Shield were already hitting similar framerates on OpenGL. November has eliminated that gap and allowed both backends to reach new performance heights for any users who may still need to switch between OpenGL and Vulkan.
Moving away from all those figures and stats onto something a little more visual again, a change was mentioned a couple of months ago that allowed Fate Extella: The Umbral Star to render on OpenGL. We claimed a similar fix for Vulkan was on the way and November indeed provided.
Before:

After:

Due to Vulkan needing a little extra work in the form of implementing support for depth-stencil resolve and some changes to texture compatibility rules, Sonic Colours Ultimate also now renders on Intel GPUs using Vulkan. This game already worked for NVIDIA but it seems Intel’s driver is a little more picky.

Nights of Azure 2: Bride of the New Moon highlighted a flaw in the way Ryujinx handles instanced draws this month. Originally, we attempted to defer the draw until we could confirm the total instance count which seemed like the safest way to ensure an accurate render. Unfortunately there are rare circumstances, such as in Nights of Azure 2, where a compute dispatch is performed while the draw is still ‘pending’. Thus by the time the draw finally triggered, the state was completely wrong and was causing hard crashes for all GPU vendors. By ensuring all pending draws are forced to complete before any compute dispatches occur, our process holds true even in these niche scenarios.
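The ordering rule described above can be modelled in a few lines. This is only a toy sketch: the class and method names are hypothetical and the "log" stands in for commands actually reaching the GPU.

```python
class CommandProcessor:
    """Toy model of deferred instanced draws (all names are hypothetical)."""

    def __init__(self):
        self.pending_instances = 0  # instances accumulated for the deferred draw
        self.log = []               # commands in the order they are really issued

    def draw_instance(self):
        # Defer the draw: only count the instance for now.
        self.pending_instances += 1

    def flush_pending_draw(self):
        # Issue the deferred draw, if there is one, with the final instance count.
        if self.pending_instances:
            self.log.append(("draw", self.pending_instances))
            self.pending_instances = 0

    def dispatch_compute(self):
        # The fix: any pending draw must complete before compute work changes state.
        self.flush_pending_draw()
        self.log.append(("compute",))


processor = CommandProcessor()
for _ in range(3):
    processor.draw_instance()
processor.dispatch_compute()
print(processor.log)
```

Without the flush inside `dispatch_compute`, the compute entry would be logged before the draw it was supposed to follow.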

Alongside some minor improvements to the Vulkan pipeline management systems, a long-standing transform feedback bug that was affecting AMD and Intel was finally stamped out. If any users with those GPUs tried to play Xenoblade Chronicles Definitive Edition, the issue would have been immediately apparent:

Maybe the grass in the field was a metaphor about GPU drivers all along… Either way it shouldn’t look like that. Making use of the vector outputs in cases like Xenoblade gets around the root of the problem: data going missing or, in Intel’s case, being written to the wrong offset. Anyway, grass!

Do I see some Mystery Dungeon fans out there? Just me…? Well if you’re a Mystery Dungeon fan with an Intel GPU then this month you’re eating well. To render Pokémon Mystery Dungeon: Rescue Team DX in Vulkan, we were making use of OpenGL conventions such as gl_VertexID and gl_InstanceID. While Nvidia and even AMD have sufficient conformance for this to not be an issue, Intel wasn’t so lucky. Vulkan does have equivalents of sorts but they aren’t a direct 1:1 mapping; they add a little more information. By correcting for this divergence and literally subtracting some values, the visual gore is finally a relic of the past.
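To illustrate the kind of correction involved: Vulkan's InstanceIndex built-in counts from the draw's firstInstance, whereas OpenGL's gl_InstanceID always starts at zero. The sketch below models only that instance case. The function names are hypothetical, and exactly which values Ryujinx subtracts is an assumption here.

```python
def vulkan_instance_indices(first_instance, instance_count):
    """Values a Vulkan shader would see in InstanceIndex for one draw."""
    return [first_instance + i for i in range(instance_count)]


def to_gl_instance_id(instance_index, first_instance):
    """Recover the OpenGL-style, zero-based instance ID by subtracting the base."""
    return instance_index - first_instance


seen_by_vulkan = vulkan_instance_indices(first_instance=5, instance_count=3)
print(seen_by_vulkan)                                      # → [5, 6, 7]
print([to_gl_instance_id(v, 5) for v in seen_by_vulkan])   # → [0, 1, 2]
```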
Before:

After:

Finishing up the improvements to our GPU emulation this month: not one but two crashes were resolved with the second deserving some extra air time, as it removes some extension requirements for MoltenVK and macOS. Regressions in A Hat in Time, Xenoblade Chronicles 3 and Super Smash Bros: Ultimate were quickly cleaned up and an interesting little bug in which Super Mario Odyssey could collect 10s of thousands of entries in the buffer cache was resolved!
Kernel/Services:
Onwards, toward the black hole that is the Switch kernel and Horizon OS.
Service HLE maestros of various usernames graced us with a couple of service implementations and cleanups in November, including a rather curious bug in the deserialization process of a sfdnsres (catchy!) service. The old implementation was unable to deserialize AddrInfoSerialized when the addresses were empty, causing a crash. With that scenario covered, a very cool bit of homebrew finally boots!

More services of equally memorable names like ‘IFriendService: 1 (Cancel)’ and ‘GetSaveDataSizeMax’ were both stubbed, the first of which allows SnowRunner to advance a little further into gameplay before hitting another friend service crash. One step at a time.
Two of the larger service implementations of audio and filesystem were updated to their firmware 15.0.0 variants with a variety of bug fixes attached.
On the audio side, this resolves an audio renderer crash in Paper Mario: Origami King and implements the new audio renderer features such as voice parameter support which was added in 15.0.0. Older effects like Delay also had miscellaneous bugs resolved with their initial implementations in 14.0.0.
LibHac, the library used for our Switch file system emulation, was also bumped to its newest release which added support for firmware 15.0.0 decryption keys and implemented some new save data services that were introduced in firmware 14.0.0. ‘General system stability improvements to enhance the user's experience’ are also listed if that floats your boat.
The autumn cleaning continued with fixes to IPsmSession, eventfd logic in the bsd services and even reaching as far as the software keyboard. Seemingly no one had ever noticed that when presented with exotic characters, it simply didn’t know what to do.

Hmmmmmmm, yes. This character is made out of… LEGO? Fear not, this turned out to be a very silly bug where the text was not being filtered for Unicode control characters before being displayed. A Unicode control character is basically a character that instructs your computer to do something that, as a user, you don’t need to see, such as ending a word or starting a new line. By adding a method to strip the raw Unicode output into something we’d expect to see, text like this becomes much more legible.
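A minimal sketch of that kind of filter, using Python's standard unicodedata module rather than anything from Ryujinx itself. It assumes that "control character" means the Unicode general category Cc; the real filter may well keep or drop a different set.

```python
import unicodedata


def strip_control_characters(text):
    """Drop every character in the Unicode 'Cc' (control) category.

    Note that Cc also covers tab and newline, so a multi-line text box
    would probably want to whitelist those.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


raw = "Ryu\x00jinx\x1f"
print(strip_control_characters(raw))  # → Ryujinx
```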

UI/UX:
Reword the description of the 6GB expand DRAM hack to be less tantalizing #3870
November has seen Avalonia edge closer to the spotlight with a lot of clean-up work and Linux bug fixes. We invested so much faith in it that we exclusively shipped Avalonia for the macOS releases, and other operating systems are mere inches away from following suit.
The main roadblock thus far has been a couple of Linux-exclusive rendering bugs on the GUI such as dialog boxes and pop-up windows failing to draw. Even worse was that the render window was solid black when using Vulkan, not something we could realistically push out the door when we’re sure most users, even on Linux, would prefer to use Vulkan for that butter-smooth shader compilation.

Thankfully, the issue was tracked down to the way we initialized the X11 window and, with some back and forth and breaking OpenGL for a bit along the way, everything now works correctly. Although the dialogs were still broken, clearly this wasn’t the same issue.
The real cause was tracked down to the Avalonia style we’re using called “FluentAvalonia” which is, effectively, Fluent WinUI design and controls ported into Avalonia. We had to open a pull request to their repo directly to get this issue resolved, but the maintainers over there were gracious enough to get the change merged extremely promptly. All we then had to do was update to the new FluentAvalonia version.
A final boon to any Linux users: Ryujinx can also now boot in Wayland directly, if setting two-hundred environment variables wasn’t to your liking!
On to more general changes not targeted toward a specific OS: historically we needed a RenderTimer to keep the GUI and game in-sync before the switch back to an embedded window (covered a couple of progress reports ago!). That was still kicking around and was still forcing the GUI to refresh at 60Hz no matter your monitor refresh rate. Removing the timer and some other legacy bits-and-bobs now allows the GUI to animate at full refresh rate.
We took this month to address some quirks with the general user-experience and streamline a few basics of organizing your libraries. The DLC manager has been completely reworked for Avalonia and now features the ability to enable or disable all files that have been added. As anyone who has all of the DLC for Super Smash Bros: Ultimate can tell you, this is a godsend.

Just the DLC window? Of course not, it would be fairly stupid to go through all that effort and not give the same treatment to the game update window...although, we can’t let you enable all the update versions at once; we’re told that it would cause some kind of tear in the fabric of the space-time continuum…

For those that didn’t know, Ryujinx can also be launched completely GUI-lessly from a terminal or command window. This is very useful for frontend launchers or those people with a very particular workflow! Until this month it was impossible to set the preferred graphics backend, OpenGL or Vulkan, when launching a game from the terminal, which was a major annoyance to many who wanted game-specific setups. This was the main focus of a bit of refactoring of the command-line launch process which also reduced some duplicated code to boot. There’s never been a better time to be a keyboard warrior.
Finally, like exasperated parents, we’ve had to rename the “Expand DRAM size to 6GB” option to not include the words “Expand”, “DRAM” or “6GB”, as faaaaaaarrrr too many of you were enabling it. We place it under ‘HACKS (may cause instability)’, and what do you think 90% of the support requests we got when Scarlet & Violet launched were? Right-o.
The entire team now has “disable the DRAM expansion” tattooed into the backs of their eyeballs.
MISC:
We’re looking at a fairly trivial miscellaneous section this time but with a single, very large, elephant.

.NET 7, the latest release of the .NET runtime, was let loose upon the world on November 8th and we, being the cutting edge software project that we are, jumped on it almost instantly. One of the many advantages to developing software in languages that are in active development is that we regularly see new features and performance gains that other people have written for us! In the case of major runtime updates this can be fairly significant. Just the update to the runtime gave us another 6% performance jump in Super Mario Odyssey (part of the jump in the graph above!) and when enabling a new .NET feature called Tiered PGO (TPGO) we saw a very healthy 13% gain over .NET 6.
TPGO, Tiered Profile Guided Optimization, was the feature that was most interesting to the development team going into .NET 7, as it effectively allows the runtime to optimize common code paths in real-time. This is a bit of an open goal as it required zero changes on Ryujinx’s end and simply provided more performance. What’s not to like?
Loads of other work in this section revolves around the new tricks we can take advantage of, or old bits we can remove, thanks to both .NET 7 and C# 11. New LINQ methods, Random.Shared, ReadOnlySpan and string literals are the main quick additions.
A significant portion of this report has involved Linux, and we’re finishing the exact same way. Some fresh Fedora installs weren’t symlinking required dependencies so we now attempt to import those libraries as a fallback. Not content with stopping there, FFmpeg 5.1.x decided to break our video decoding and instead present a green screen.

Turns out the FFmpeg maintainers reverted some AVCodec changes they made in 5.0.x releases and we didn’t get the memo. Adjusting our decoder back to the older format fixes these bugs.

Closing words:
If that’s what she wrote, then that is all! This report alone really cannot do the month of November justice. Check out our separate blog posts for both the release of LDN3 and our world-first macOS port! It’s a phrase that’s been said a lot recently, but there truly never has been a better time to emulate the Switch.
Once again, it’s the recruitment section of the report! If you know some C#, .NET, 3D-graphics or low-level engineering, you too can help the remainder of 2022 be as smooth and bug-free as possible. If that's all alchemy & wizardry to you then donating to our Patreon, or being active in testing and bug-reporting really does help out a bunch.
Until December, and the New Year!
2022-12-12 15:07:54 +0000 UTC
LDN 3.1.3 is out!
This brings the multiplayer build up to date with 1.1.651. In addition to that, it fixes some LDN related bugs present on the previous version.
Key changes:
- Fixed an issue where P2P connections would be rejected.
- Fixed a bug that caused ldn_mitm to crash.
- Fixed a crash when running ARCropolis mods on Super Smash Bros. Ultimate.
- Added post-processing (antialiasing, scaling filters) feature.
LDN 3.1.0 now available!
This brings the multiplayer build up to date with 1.1.616. With this new build, you should be able to play with CFW Switch consoles over the internet and other Ryujinx instances using ldn_mitm and XLink Kai: https://www.teamxlink.co.uk.
Other than this, the most notable changes for games with LDN are:
- Pokémon Scarlet and Violet now perform better, have fewer visual bugs (fewer vertex explosions, correct stats chart) and won't crash anymore on AMD GPUs when booting with non-native resolutions.
- Pokémon Sword and Shield now perform better and won't crash anymore on AMD GPUs when booting with non-native resolutions.
- Splatoon 3 won't crash anymore on AMD GPUs when running at non-native resolutions.
- Several fixes and improvements to the new Avalonia GUI.
- Other global stability and performance improvements.
- A communication issue on RyuLDN was fixed in LDN 3.0.3.
- A Super Mario Party crash was fixed in LDN 3.1.0.
Original post:
The third iteration of our LDN functionality is finally here! We've prepared a blog post to run you all through the key changes and improvements. Make sure to check it out before trying it out!
Key changes:
- ldn_mitm support for connectivity with CFW Nintendo Switch consoles.
- Pokémon Scarlet & Violet no longer have a white/yellow filter in gameplay.
- Pokémon Scarlet & Violet will have greatly improved performance.
- Splatoon 3 is now playable on AMD GPUs on Windows and Linux.
- Splatoon 3 scoring and other gameplay elements are no longer affected by resolution scaling.
- Animal Crossing: New Horizons no longer requires a save file to avoid crashing on the intro.
- Pokémon Sword & Shield have improved performance when using Vulkan.
- General performance and stability improvements for most other titles.
You can find the LDN builds in the attachments below. Download the correct version for your operating system: Windows or Linux. You'll also have the choice of our classic GTK UI, or the up-and-coming Avalonia UI. Avalonia builds are denoted by their 'ava-ryujinx' prefix. Please note that our Avalonia builds are experimental and we advise using the GTK ones if you seek troubleshooting support.
Note: LDN 3.0 is not compatible with Monster Hunter Rise Sunbreak. If you wish to play this game using LDN, please use the older LDN 2.5 build.
All our builds are completely free but if you'd like to support us, and you're here anyway, then please consider donating to us on Patreon!
2022-11-21 01:20:18 +0000 UTC
Just like the nine months before it, October has slid gently into the rearview mirror and, by our estimates, shouldn’t be closer than it appears for at least another year.
NieR, Persona, Bayonetta, and a new contender for the coveted “why did this get a sequel?” award: Mario + Rabbids! While that last one is a bit of a horror story as far as emulating the damn thing goes, we’re thrilled with just how great most new titles ran this month with almost no or minimal fixes from the development team. Less time fixing jank means more time in the lab cooking up new features and improvements for your emulating pleasure.
Let’s take a pit stop for a moment and review our Patreon goals and incentives. As your monthly reminder, these features are not locked behind a paywall; all features mentioned below will be implemented eventually regardless. However, if a goal is reached, then priority is shifted to focus on implementing that feature straightaway.
Patreon Goals:
ARB Shaders - Goal reached in April 2021.
We’d like to provide an update on this goal before anything else this month. It’s closing in on a year and a half since this goal was reached and we believe that some transparency is deserved for those who donate to us.
To cut to the chase, work on ARB (Assembly) shaders has been put on hold indefinitely, until the core development team reassesses its value to the project. Back when these goals were being dreamt up in 2020, the landscape for Ryujinx was completely different; OpenGL was the sole graphical backend and shader stutter was the single largest issue that most users were facing. Over time, this changed. Shader caching, multithreaded shader compile and finally a full Vulkan backend have all but eliminated the need for assembly shaders as the path forward. Due to their being NVIDIA-specific and fundamentally very limited in their capabilities, the decision was therefore made to shift attention onto areas, like Vulkan, that would have a larger impact on everyone.
We would like to reiterate that this does not mean the implementation is being killed or otherwise removed from our longer-term roadmap, but the limited development resources we have are likely better spent elsewhere at the current moment. All we can do for now is apologize to our Patreon backers and promise that we have some awesome stuff to show you all before the year is out; stay tuned!
$2000/month - Texture Packs / Replacement Capabilities - getting close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
3….2….1…. GO!
GPU:
To ease us into the GPU section let’s start with a classic from most comedy shows: a one-liner. The avid gaze of Xenoblade fans highlighted a discrepancy in Xenoblade Chronicles 2 between OpenGL and Vulkan in cutscene playback. We’re told that Rex doesn’t usually ask the barber for a back-and-sides fade but it rather suits him, does it not?

Either way, this was an inaccuracy and it turned out that, while we were creating a custom border color, there was no code line to actually pass it to the driver. As such all we can say is: our bad.

With the (then) upcoming release of a new Mario + Rabbids game there was a push to get the first Mario + Rabbids game actually booting in Vulkan on Nvidia. In a rare moment, AMD was not affected by this particular bug due to already being forced to use a different code path for blits. It just goes to show that in some cases, it helps to be so consistently buggy that developers write you a special path! For Nvidia, there was a DeviceLoss error being caused by some driver arguments being inverted and thus attempting to access data that was out of bounds. Flipping these parameters allows the game to finally boot on Nvidia Vulkan.

A recent regression from a refactor of the shader decoder had caused large graphical bugs to begin presenting themselves in both Sea of Solitude and Shadowrun Returns. Bindless elimination was failing to trigger due to the wrong instruction register being consistently used as the check flag. Fixing which register was being read resolves the regressions in both titles.
Sea of Solitude before:

After:

Shadowrun Returns before:

After:

After some changes covered last month concerning the conversion of quads (which Vulkan does not support natively) to triangles, Luigi’s Mansion 3 was having trouble keeping its minimap in one piece.

The map is meant to have six vertices with the final two being ignored, thus forming a quad. Unfortunately the quad count was rounded up instead of the leftover vertices being ignored, which ultimately formed two quads instead of one. Fixing this calculation quickly resolved the minimap, along with some other miscellaneous issues found along the way.
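The rounding issue is easy to show with a small sketch of a quad-to-triangle index conversion. The function below is a hypothetical stand-in rather than the emulator's real converter: it floors the quad count so that leftover vertices are dropped instead of being promoted to an extra quad.

```python
def quads_to_triangle_indices(vertex_count):
    """Convert a quad list into triangle-list indices.

    Each complete group of four vertices becomes two triangles.
    Leftover vertices (fewer than four) are ignored, not rounded up.
    """
    quad_count = vertex_count // 4  # floor division: 6 vertices -> 1 quad
    indices = []
    for quad in range(quad_count):
        base = quad * 4
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return indices


# Six vertices: one quad (two triangles); the last two vertices are ignored.
print(quads_to_triangle_indices(6))  # → [0, 1, 2, 0, 2, 3]
```

Rounding up instead (two quads from six vertices) would read vertices 6 and 7, which do not exist, and that is the sort of thing that tears a minimap apart.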
Mario Kart 8 Deluxe was behaving naughtily when using Vulkan and creating an unbounded number of graphics pipelines when blend constants were being used. The game seems to blend between colors at various stages of each track, and this was resulting in an inflated number of pipelines being generated. Limiting this behavior by using a reduced number of dynamic states reduces the cost of pipeline creation and can also reduce RAM usage by a small but noticeable amount.
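To see why blend constants can inflate the pipeline count, consider a toy pipeline cache. One common way Vulkan renderers avoid the blow-up is to set blend constants as dynamic state (VK_DYNAMIC_STATE_BLEND_CONSTANTS) so that they are not part of the pipeline at all; whether Ryujinx's fix takes exactly this shape is an assumption, and every name in the sketch is hypothetical.

```python
class PipelineCache:
    """Toy pipeline cache keyed on the state baked into each pipeline."""

    def __init__(self, blend_constants_dynamic):
        self.blend_constants_dynamic = blend_constants_dynamic
        self.pipelines = {}

    def get_pipeline(self, shader, blend_constants):
        if self.blend_constants_dynamic:
            key = (shader,)                  # constants are set at draw time
        else:
            key = (shader, blend_constants)  # constants are baked into the pipeline
        if key not in self.pipelines:
            self.pipelines[key] = object()   # stand-in for an expensive compile
        return self.pipelines[key]


baked = PipelineCache(blend_constants_dynamic=False)
dynamic = PipelineCache(blend_constants_dynamic=True)
for step in range(100):  # a colour fade across 100 frames
    constants = (step / 100, 0.0, 0.0, 1.0)
    baked.get_pipeline("track_shader", constants)
    dynamic.get_pipeline("track_shader", constants)

print(len(baked.pipelines), len(dynamic.pipelines))  # → 100 1
```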
Before:

After:

Let’s talk tessellation once again. If you’re getting déjà vu then don’t worry; we also covered it last month! Shortly after the release of Bayonetta 3 it was almost immediately noticed that, shock-horror, AMD GPUs were crashing the title. The fix involved taking a simple look at how OpenGL and its shader language GLSL were handling things. Tessellation control shader outputs are always indexed in GLSL the same way, using ‘gl_InvocationID’, which was not being done in the Vulkan SPIR-V control shader. What was more irritating is that this oversight wasn’t causing validation errors, and only AMD Windows drivers seemed to care. Not Nvidia, not Intel, not even AMD on Linux using either their proprietary or open-source RADV driver. Regardless, the outputs are now indexed with the same ‘gl_InvocationID’ method and we haven’t seen any more complaints from the driver.

Have a break, here’s a quick-fire round.
Rounding out the GPU section let’s speak in a language everyone understands. Performance!
NieR: Automata got a small improvement at release by passing SpanOrArray for Texture SetData to avoid a mass of copies that would bog frame rates, but otherwise still has some fairly large performance deficits to overcome in other areas.
The elephant in the room was actually presented by Mario + Rabbids Kingdom Battle; the first of the series, for those who don’t follow the franchise. By using a bitmap to track buffer modified flags instead of a MultiRegionHandle, games that bind HUGE buffers, in the region of 10MB and beyond, see enormous improvement as on paper the lookup becomes 64x faster. More games than expected actually exhibited this behavior so it ended up impacting a whole host of old and newer titles, including both Mario + Rabbids: Sparks of Hope and Bayonetta 3!
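One plausible reading of that "64x" figure is that a single 64-bit word of the bitmap covers 64 tracked regions, so a clean stretch of buffer can be skipped with one comparison. The sketch below shows the idea under that assumption; the page size, class name and method names are illustrative only and are not taken from the emulator.

```python
PAGE_SIZE = 4096  # assumed tracking granularity


class ModifiedBitmap:
    """Tracks which pages of a buffer were modified, one bit per page."""

    def __init__(self, size_bytes):
        page_count = (size_bytes + PAGE_SIZE - 1) // PAGE_SIZE
        self.words = [0] * ((page_count + 63) // 64)  # 64 pages per word

    def mark_modified(self, offset, size):
        first = offset // PAGE_SIZE
        last = (offset + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.words[page // 64] |= 1 << (page % 64)

    def consume_modified(self):
        """Return the modified page numbers and clear their flags."""
        pages = []
        for word_index, word in enumerate(self.words):
            if word == 0:
                continue  # 64 clean pages skipped with a single check
            while word:
                lowest = word & -word  # isolate the lowest set bit
                pages.append(word_index * 64 + lowest.bit_length() - 1)
                word ^= lowest
            self.words[word_index] = 0
        return pages


bitmap = ModifiedBitmap(10 * 1024 * 1024)        # a "HUGE" 10MB buffer
bitmap.mark_modified(70 * PAGE_SIZE, 2 * PAGE_SIZE)
print(bitmap.consume_modified())                 # → [70, 71]
```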

This table quickly shortlists a few titles that we instantly saw large gains in. However, as showcased by Zombie Army 4, there are probably a swathe of niche titles that are impacted that just haven’t been found yet.
CPU:
October’s CPU section is certainly more concise than last month's blistering streak, but some of the changes here are just as interesting.
Owners of either Intel’s Ice Lake (or newer) or AMD’s new Zen 4 CPUs will be interested in how the newer instruction sets these architectures support fare if and when Ryujinx can take advantage of them. The largest and most well-known of these is of course AVX-512, but there are a few other interesting instructions that bleeding edge architectures can exploit, including ‘Galois Field New Instructions’; GFNI for short. While originally intended for cryptography, they can heavily accelerate general-purpose bit-shuffling operations, which are of great use in emulation. Initial support for these instructions has thus been implemented in the recompiler and on paper is generating much improved assembly.
We only have a couple of 32-bit implementations this month in the form of VCVTT and VCVTB. With these in place, Radiant Silvergun can finally head in-game and to no-one's surprise, it renders and plays great! Great, if you love retro titles that is.

Moving onto something a little more modern both in the instruction set and the games it affects, fast paths were added for… deep breath… A32: Vcvta_RM, Vrinta_RM and Vrinta_V and for A64: Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V. Jargon aside, Super Smash Bros. Ultimate and Mario Strikers: Battle League both make extensive use of the 64-bit instructions included in these optimizations, with Mario Party Superstars possibly being impacted too. While there weren’t any obvious changes in our testing, the new fast paths could remove a bottleneck for lower-end CPUs or particularly tough emulation spots.
Mopping up some smaller changes before moving on; the rejit queue will no longer clear under certain edge conditions and IDisposable (the interface used to tell .NET that something can be disposed) was added to the Unicorn CPU test module.
Kernel/Services:
The battle against accurately emulating Horizon OS and its seemingly endless services and oddities continues this month, with a few notable additions.
The aptly-named ‘fatal’ service finally saw the light of day after nearly five years; a staggering amount of time considering that, internally, it’s service number one. While not as important for an emulator (we can already glean all the crash information needed via loggers/debuggers), it’s essential for future implementations, such as full error applets and guest error handling.

A memory corruption in the BCAT and FS read methods that was causing a crash in SWORD ART ONLINE: Alicization Lycoris was fixed, and the game now progresses beyond the title screen and into gameplay. No other bugs with this title were immediately obvious (other than being a disgustingly low resolution), but as usual we’re sure you’ll all let us know!

Some other more general changes include:
Filesystem services are the central pillar which allow titles to boot, save their data and gracefully interact with the Switch in general. ‘OpenDataStorageWithProgramIndex’ was a service we had up to this point been missing, and its partial implementation allows both Rollercoaster Tycoon 3 and MLB The Show 22 to boot and get in-game. The change doesn't currently support accessing any data outside the current program index, but there is no infrastructure there regardless; when we eventually find a game or homebrew that makes use of that functionality then the service can be fully explored.
Rollercoaster Tycoon 3:

We do not condone playing RCT3, let alone the Switch port. Go out, buy Rollercoaster Tycoon 2 on PC and then get the OpenRCT2 patch. Thank me later.
MLB The Show 22:

This title still has issues. It runs extremely slowly on OpenGL and will crash on any GPUs with a local size of 1024 or less (any Nvidia GPU beyond 1000 series). If you have a 1000 or 900 series GPU then knock yourself out of the park with Vulkan.
Moving onto random bugs, a favorite of any software developer, we’ll start small and then work our way big. The old kernel memory allocation method would first try to find an empty region at a random address; if that failed, it fell back to a linear search. The issue was that the variable holding the random address was also being used as temporary storage within the allocation loop, and as such wasn’t zero when the random allocation failed. This meant the loop could return, as if it were valid, an address that was already in active use, causing a crash. In practice, the random allocation fails so infrequently that this wasn’t a huge concern. Still, nipping this one in the bud reduces crashes, most often in 32-bit titles such as ‘DoDonPachi Resurrection’ due to their smaller address space.
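To make the shape of that bug a little more concrete, here’s a minimal Python sketch of a "random first, linear fallback" allocator. Everything in it (the function name, the is_free callback, the attempt count) is our own invention for illustration and not Ryujinx’s actual code. The point is that the random candidate lives in its own variable, so the "nothing found" marker stays at zero when every random attempt fails.

```python
import random


def allocate(is_free, start, end, size, attempts=8, rng=None):
    """Return a free address in [start, end), or None if there is none.

    is_free(address, size) is a hypothetical callback reporting whether the
    range is unused. `start` must be above 0 because 0 means "not found".
    """
    if rng is None:
        rng = random.Random()
    found = 0  # stays 0 unless a candidate is confirmed free
    for _ in range(attempts):
        # The candidate is kept separate from `found`; reusing one variable
        # for both is the kind of mix-up described above.
        candidate = rng.randrange(start, end - size + 1)
        if is_free(candidate, size):
            found = candidate
            break
    if found == 0:
        # Linear fallback: walk the region from the bottom up.
        for candidate in range(start, end - size + 1):
            if is_free(candidate, size):
                found = candidate
                break
    return found or None


used = set(range(16, 60))  # a nearly full toy address space


def toy_is_free(address, size):
    return all(a not in used for a in range(address, address + size))


print(allocate(toy_is_free, 16, 64, 4))
```

With only one free slot left, the random attempts will usually miss and the linear pass picks up the slack, which is exactly the rarely exercised path where the original mistake lived.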

Let’s move away from the small-fry and onto a fully cooked tuna with side salad and complimentary open-bar, shall we? This month finally saw the death of some of the longest standing random graphics bugs, boot crashes and gameplay crashes possibly on record for Ryujinx.
It turns out the NvMap ID allocation service isn’t written with any level of normality: it uses an ever-incrementing counter for the ID. If you’re wondering “isn’t that super dumb because it could eventually overflow?”, you're not alone; what possessed Nintendo to intentionally create this potential point of failure is anyone's guess. It actually gets even worse because this allocation service increments by a value of 4 each time, so the counter runs out of valid IDs in a quarter of the time. However, in practice, this would require someone to leave their game running for months/years to constantly increment this ID to an overflow point. If anyone has a couple of years to kill and a spare Switch then may we suggest an experiment?
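As a rough illustration of an ever-incrementing allocator, here’s a small Python sketch. The class name, the 32-bit width, the starting value and the allocation rate used in the arithmetic are all assumptions on our part; only the step of 4 comes from the behavior described above.

```python
class IncrementingIdAllocator:
    """Hands out IDs from a counter that only ever moves forward."""

    STEP = 4            # from the description above
    MASK = 0xFFFFFFFF   # assumption: the counter is 32 bits wide

    def __init__(self):
        self._next_id = self.STEP  # assumption: the first ID equals the step
        self._objects = {}

    def allocate(self, obj):
        handle = self._next_id
        self._next_id = (self._next_id + self.STEP) & self.MASK
        self._objects[handle] = obj
        return handle

    def free(self, handle):
        # Freed IDs are never handed out again until the counter wraps.
        return self._objects.pop(handle, None)

    def get(self, handle):
        return self._objects.get(handle)


# Back-of-the-envelope overflow estimate under the assumptions above.
ids_before_wrap = 2**32 // IncrementingIdAllocator.STEP
allocations_per_second = 100  # purely hypothetical rate
days_until_wrap = ids_before_wrap / allocations_per_second / 86_400
print(ids_before_wrap, round(days_until_wrap))
```

At that made-up rate the counter would take roughly four months to wrap. One side effect of this design is that a stale handle can never quietly point at a newer object; the report doesn’t spell out the exact mechanism behind the fixes, so treat that as one plausible reading rather than the confirmed cause.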
Either way, as an emulator we have to match hardware behavior even if we think it’s stupid… Luckily there are benefits! Let’s talk about some of those bugs I mentioned earlier:
- Animal Crossing: New Horizons no longer crashes randomly on boot without a save file!
- Various random graphical glitches in Animal Crossing: New Horizons were resolved. There truly has never been a better time to start a new island.

- The Legend of Zelda: Breath of the Wild will no longer randomly crash. Not much more to say about this one except for how annoying it was. Yours truly rode around on a horse for over an hour to test this one was fully gone!

- Random crashes when entering/exiting Pokémon Centers in Pokémon Sword/Shield are also tentatively fixed. This one is still very hard to test but we couldn’t replicate it after around 15 minutes of walking back and forth through the door, and we’ve received no further reports. Sanity only goes so far.

MISC/GUI:
As seems customary, we’ll finish off some of the changes happening in the outer orbitals of Ryujinx. Not everything is about that try-hard low-level nerd stuff like GPUs and CPUs!
We communicated last month that the Avalonia test builds finally had their auto-updaters fixed, but this was part of some more widespread efforts to turn a few of the pop-ups and windows into overlay dialogs instead of dedicated windows.

The same treatment was given to the controller applet dialog to reduce issues on Windows and various Linux distros when displaying transparency.

On the topic of the controller dialog: for many years, users have reported that it was appearing even when they had a seemingly valid control configuration. This was caused when the GUI signals to tell Horizon that a user had ‘disconnected’ controllers were passing incorrect data about the input state and, as such, the emulated Switch still believed there were other players connected. Passing the correct filtered data through our HLE input system should prevent this from happening and save us all a lot of stress finding phantom controllers.
Some smaller blitz changes:
One of Ryujinx’s earliest contributors, mageven, helped solve one of the more annoying issues our cheat system had this month, in the form of conditional inputs. Translated into English, that means cheats that require you to press a button combination. By correcting a simple logic error these types of cheats should work now!
Finishing with a quality-of-life change, support for volume hotkeys has also been added. Like the resolution scaling hotkeys before them they are, by default, not bound to any key press. ‘How do I use them then?’ you may be wondering. Our Avalonia UI has configurable hotkeys via a menu, but did you know all of our hotkeys have always been configurable anyway? Minus the GUI part, of course.
If you don’t mind getting your hands dirty and wish to map any of these “unbound” hotkeys without going through the hassle of downloading other builds, then you can simply:
- File -> Open Ryujinx Folder.
- Open the Config.json file in a text editor of your choice.
- Find the “hotkeys” section and add/edit to your heart's content!
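If you’d rather script the steps above than hand-edit the file, here’s a minimal Python sketch. The key names ("volume_up", "volume_down") and the value strings ("Unbound", "F9", "F10") are assumptions for illustration; check your own Config.json for the exact spelling, and keep a backup before changing anything.

```python
import json


def bind_hotkeys(config_text: str, bindings: dict) -> str:
    """Return the config JSON with the given hotkey bindings applied."""
    config = json.loads(config_text)
    # Create the section if it is missing, then overwrite only the listed keys.
    hotkeys = config.setdefault("hotkeys", {})
    hotkeys.update(bindings)
    return json.dumps(config, indent=2)


# A cut-down stand-in for the real file; the key names are assumed.
example = '{"hotkeys": {"volume_up": "Unbound", "volume_down": "Unbound"}}'
print(bind_hotkeys(example, {"volume_up": "F9", "volume_down": "F10"}))
```

Only the keys you pass in are touched, so any other settings in the file come through unchanged.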

Closing Words:
That’s all from us for October, but we have a sneaking suspicion that November is going to be one you should keep your eye on…
If any of you wonderful people reading this have an interest in helping develop on the cutting-edge of Switch emulation, then we’re always open for new contributors in our Discord or on our GitHub page! C# is the language and we’re told it’s somewhat like if C, Java and Microsoft all had premarital relations. If any of this sounds familiar, fun or something that could look cool on your GitHub page then we’d love to have you!
Until next time!
2022-11-12 01:18:50 +0000 UTC
September's report? You’re trying to tell me we just hit the 75% mark on the year? Madness.
This month marked not only the turn from summer to autumn and some major world events but, most importantly of course, the launch of Splatoon 3. With the holiday season fast approaching, that means game releases, game releases and…you guessed it, game releases. The characteristic eye-twitches that upcoming Pokémon games always bring to our development team are just taking root and Bayonetta hopefully won’t step on us at launch!
Before we dive into Ryujinx’s journey through September 2022, let’s take a moment to review our Patreon goals and incentives. As a reminder, these features are not locked behind a paywall; all features mentioned below will be implemented eventually regardless. However, if a goal is reached, then priority is shifted to focus on implementing that feature straightaway.
Patreon Goals:
ARB Shaders - Goal reached in April 2021.
Work is ongoing, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - getting close!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
3….2….1…. GO!
GPU:
It will come as no surprise that the largest GPU change this month concerns three items that, combined together, are relatively new to the Ryujinx spotlight: Vulkan, AMD GPUs and Splatoon 3. On release, many were pleased with how well the game rendered and played, but this was a glory that only NVIDIA GPUs could attain. Vulkan expects certain vertex attributes to be ordered in a specific way, and if games pass misaligned elements then this can cause a bit of a chaotic chain reaction to future vertices. By adding a method to change the stride of vertex buffers before they are bound we can avoid this issue and keep the Vulkan spec happy. Good news for the Team Red users.
Before:

After:

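To give a feel for what "changing the stride" can look like, here’s a minimal Python sketch that copies tightly packed vertices into a new buffer whose stride is a multiple of some alignment. The function name, the zero padding and the alignment rule are our assumptions for illustration; the report only says that the stride is changed before the buffer is bound.

```python
def restride(data: bytes, old_stride: int, alignment: int):
    """Copy vertices into a buffer whose stride is a multiple of `alignment`.

    Returns (new_data, new_stride). Trailing bytes that do not form a
    complete vertex are dropped.
    """
    if old_stride <= 0 or alignment <= 0:
        raise ValueError("stride and alignment must be positive")
    # Round the stride up to the next multiple of the alignment.
    new_stride = -(-old_stride // alignment) * alignment
    padding = bytes(new_stride - old_stride)
    out = bytearray()
    for offset in range(0, len(data) - old_stride + 1, old_stride):
        out += data[offset:offset + old_stride] + padding
    return bytes(out), new_stride


# Three 6-byte vertices, re-laid-out so every vertex starts on a 4-byte boundary.
packed = bytes(range(18))
aligned, stride = restride(packed, old_stride=6, alignment=4)
print(stride, len(aligned))
```

The cost is an extra copy and a slightly larger buffer, which is why this kind of fix-up is something you only want to apply when the original stride is actually a problem.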
Not all issues need to be game-breaking in order to be notable, though, and Splatoon 3 brought us a rather interesting if not amusing bug this month. Some LDN users quickly noticed how the scoring system seemed to be awarding certain teams with some frankly outrageous scores.

It turned out that due to the unique way points are scored in Splatoon (how much map coverage you have), resolution scaling values above native were actually causing the game to believe there were more pixels covered in paint than there were. This quickly turned into a GPU arms-race on the LDN server to see who could get the all-time world-record before a “fix” dropped. Funnier still was scaling in the opposite direction, to resolutions lower than native, which made it impossible to actually swim in any ink, thus completely breaking a large portion of gameplay for many users. By scaling the SamplesPassed counter in accordance with the resolution, both of these quirks were resolved!
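A quick sketch of the idea in Python: if the render target is scaled by some factor in each dimension, the number of samples that pass grows with the area, so the raw counter can be brought back to native terms by dividing by the square of the scale. The function name and the exact rounding are assumptions on our part, not the merged implementation.

```python
def scale_samples_passed(raw_samples: int, resolution_scale: float) -> int:
    """Convert a host SamplesPassed count back to native-resolution terms."""
    if resolution_scale <= 0:
        raise ValueError("resolution scale must be positive")
    # Area grows with the square of the per-axis scale factor.
    return round(raw_samples / (resolution_scale * resolution_scale))


# The same patch of ink, counted at 2x and at 0.5x resolution.
print(scale_samples_passed(4000, 2.0))
print(scale_samples_passed(250, 0.5))
```

Both calls land on the same native-resolution figure, which is what the game expects to see regardless of what the host is rendering at.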
A couple of regressions were isolated in September, the first of which was a bug in Fate/EXTELLA where all backgrounds had stopped rendering and simply presented a black void. Once the problem was narrowed down to a broken blit between texture types, it was resolved in OpenGL, with a separate fix for Vulkan to follow somewhat soon.
Before:

After:

Super Mario Party also suffered a blow recently, with rendering of the Spotlight Swim mini-game taking a hit. The red spotlight specifically seemed to be getting its surface illumination cut in half. This was a real head-scratcher as it didn’t affect the other two spotlights in the game at all. After some digging it was traced back to a previous optimization to shader specialization, and a quick fix to rebind textures if their format changed was added.
Before:

After:

Some titles seem to ‘prefer’ one host graphics API or another for rendering accuracy, performance or just legacy hardware compatibility. On the performance front there were a few games that ran a fair bit worse on Vulkan than on OpenGL. While we will continue to reiterate that Vulkan is not a silver bullet that fixes every problem under the sun and isn’t necessarily intrinsically faster than OpenGL, the dip in performance for these games was large enough to be considered abnormal. As with a lot of the most frequently reported issues, it all starts with Pokémon: specifically Sword and Shield which incurred a 20% performance hit by using Vulkan prior to recent changes. This was heavy for AMD and Intel users who may have been hovering 20% away from full speed, but without the luxury of a performant OpenGL driver.
So what’s causing this huge delta in performance and how do we fix it? It actually all starts at the Nvidia OpenGL driver because, as has been proved many times over emulation history, it’s pretty smart. The Nvidia OpenGL driver has a built-in mechanism to flush commands directly to the GPU when the queue becomes large which, while inconsistent, works really well in games where these flushes happen often. Vulkan, as usual, gets no such special treatment from the driver so the solution chosen here was to periodically flush the commands manually to reduce GPU<->CPU latency and make the time we spend waiting on the flush smaller and more consistent.

Above is a quick table of a couple of the games that benefit from the flush changes. Sword/Shield benefit the most but even Breath of the Wild is happy to breathe a little easier. The bottleneck in Sword/Shield was the time spent waiting on the GPU from the main GPU thread; Breath of the Wild, meanwhile, sees a reduction in time spent waiting on the GPU from other guest threads.
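For readers who like to see the idea in code, here’s a minimal Python sketch of a command queue that submits its work whenever enough commands have piled up, rather than waiting for one big flush at the end. The class, the threshold of 64 and the "count commands" trigger are all assumptions for illustration; the real heuristics in the Vulkan backend may well differ.

```python
class CommandQueue:
    """Records commands and submits them in batches once a threshold is hit."""

    def __init__(self, submit, flush_threshold=64):
        self._submit = submit              # callback that sends a batch to the GPU
        self._threshold = flush_threshold  # hypothetical tuning value
        self._pending = []

    def record(self, command):
        self._pending.append(command)
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self):
        # Skip empty submissions so an explicit flush is always cheap.
        if self._pending:
            self._submit(self._pending)
            self._pending = []


batches = []
queue = CommandQueue(batches.append, flush_threshold=3)
for draw in range(7):
    queue.record(draw)
queue.flush()  # whatever is left goes out at the end of the frame
print(batches)
```

Smaller, more regular batches mean the GPU gets started sooner, so anything waiting on its results spends less time idle.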
Remaining on the topic of Vulkan improvements: the R4G4B4A4 format had some components out of order and was causing all sorts of mischief with backgrounds and text boxes. Correcting this ordering manages the mischief in titles such as Ys VIII: Lacrimosa of Dana and Vroom in the night sky.
Before:

After:

Before:

After:

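To show why component order matters so much for a format like this, here’s a small Python sketch that splits a 16-bit texel into four 4-bit components under two different orderings. The function and the example orderings are ours; which ordering is the correct one for the guest or for Vulkan is not something this sketch claims to know.

```python
def unpack_4444(texel: int, order: str) -> dict:
    """Split a 16-bit texel into four 4-bit components.

    `order` names the components from the least to the most significant nibble.
    """
    return {name: (texel >> (4 * index)) & 0xF for index, name in enumerate(order)}


texel = 0xF00A
print(unpack_4444(texel, "rgba"))  # one possible interpretation
print(unpack_4444(texel, "abgr"))  # the same bits, read the other way round
```

The bits are identical in both cases, yet red and alpha trade places, which is the sort of thing that turns a text box background into something far less readable.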
Let’s keep the Vulkan train going with some quickfire changes:
- The blend state is now zeroed if blend is disabled. This reduces pipeline recreation stuttering on AMD and Intel GPUs. The Nvidia driver was already very forgiving on pipeline misses in this scenario.
- Quads are now converted to triangles on Vulkan. As Vulkan has no native host quad support, our previous method of queuing one draw per quad was much less efficient than allowing Vulkan to render what it’s good at. Triangles!
- ViewportIndex is no longer output on SPIR-V if the host GPU does not support it. This allows older GPUs that may not conform to the latest Vulkan specification to play some titles that would previously crash on boot including Super Smash Bros. Ultimate.
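The quad conversion in the list above is easy to picture with a short Python sketch: every group of four vertices becomes two triangles via an index list, so the whole batch can go out in a single indexed draw. The function name, the choice of diagonal and the winding are assumptions for illustration only.

```python
def quads_to_triangle_indices(vertex_count: int) -> list:
    """Build triangle-list indices for vertices laid out as consecutive quads."""
    indices = []
    # Step through complete quads only; leftover vertices are ignored.
    for first in range(0, vertex_count - 3, 4):
        a, b, c, d = first, first + 1, first + 2, first + 3
        indices += [a, b, c, a, c, d]  # split along the a-c diagonal
    return indices


print(quads_to_triangle_indices(8))
```

Two quads turn into twelve indices describing four triangles, and the host only sees one draw instead of one per quad.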
Onto a more visible change, tessellation had a few notable problems even after the year-long testing phase of Vulkan. However, due to the recent release of The Legend of Heroes: Trails from Zero, the topic was brought back fresh into everyone’s minds and more specifically back into our Discord channels. As you can see, this probably wasn’t the immersive gameplay experience developer Nihon Falcom intended.

However, it was indeed exactly what we suspected: tessellation struck again. Fortunately, by fixing a whole bunch of wrong assumptions and other SPIR-V related missteps, tessellation issues in a few games have been ironed out and they should be rendering accurately now.

Before:

After:

The Witcher 3: Wild Hunt before:

After:

As the Ys games seem to be popular in this report, why don’t we throw in another one? Ys VIII: Lacrimosa of DANA was a bit of a disappointment as users were presented with a wide assortment of rendering quirks. Sometimes it worked, sometimes it didn’t and sometimes it just rendered textboxes. Very annoying. Thankfully the game was so consistently broken, even in small ways, that reproducing the bugs and thereby finding the cause wasn’t as painful as other ‘random’ problems. By transforming shader LDC into constant buffer access in certain scenarios we can allow bindless elimination to activate in this case.
Before:

After:

September also brought a fix for a crash in the early intro cutscenes of Sniper Elite 3 by allowing the use of bindless textures with handles from unbound buffers. If that's a lot of words then allow me to simplify: game does weird thing = game crash, game still does weird thing = now game no crash. Somewhere in between those extremes we’re confident everyone is covered. The internal vsync signal (no, not the screen tearing one you’re thinking of) was also changed in September to signal at precisely 16.667ms instead of just using Ryujinx’s swap interval. This fixes an issue in Tokyo Mirage Sessions #FE Encore where audio would slowly desync in cutscenes as the vsync timing drifted away from the audio channel.
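A bit of arithmetic shows why small timing errors in a 60Hz signal matter. The Python sketch below measures how far a fixed tick length wanders from the ideal 1000/60 ms schedule after a number of signals. The 17ms tick is a purely hypothetical example of a mistimed signal; the report doesn’t say how far off the old timing actually was.

```python
from fractions import Fraction

# The ideal period of a 60Hz signal, kept exact to avoid rounding noise.
FRAME_PERIOD_MS = Fraction(1000, 60)


def drift_ms(tick_ms, signals: int) -> float:
    """Accumulated offset from the ideal 60Hz schedule after `signals` ticks."""
    return float(signals * (Fraction(tick_ms) - FRAME_PERIOD_MS))


one_minute = 60 * 60  # number of signals in one minute at 60Hz
print(drift_ms(17, one_minute))                     # hypothetical sloppy tick
print(drift_ms(Fraction(16667, 1000), one_minute))  # the 16.667ms tick
```

A tick that is only a third of a millisecond too long ends up more than a second adrift after a minute of cutscene, while the 16.667ms tick stays within a couple of milliseconds over the same span.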
Capping out September's GPU section will lead us neatly into the CPU section as this final change works in tandem, with other things we’ll discuss later, to fix rendering in a couple of 32-bit titles. 1D and buffer textures use the exact same texture instructions on the shader so we need to get the actual texture directly from the GPU state and this was getting messy for Prinny: Can I Really Be the Hero and Prinny 2: Dawn of Operation Panties Dood (please never make me type this again). By resolving the scenario where 1D textures were assumed to be buffers, these games can start to render correctly.
Before:

After:

Not quite right still though is it? Let’s solve that.
CPU:
Not all graphical bugs are related to the GPU emulation, and this month saw huge progress for Ryujinx’s CPU emulation. As mentioned above, we’re going to start with a change that, in combination with the final GPU fix above, resolved many rendering issues in 32-bit titles.
Due to an oversight in the original CPU tests for VLDn and VSTn, these instructions were not actually being accurately tested in all their modes. Fixing this revealed several failure points caused by an incorrect register value, in turn causing other values to be pulled from or sent to incorrect register locations. Addressing this incorrect register increment value fixes such a variety of 32-bit bugs it would require a whole list unto itself.
The two Prinny games, being the anchor here, were of course fixed by this change:
Before:

After:

But as with all of the best changes it affects a whole lot more. The following titles now render or have some major graphical bugs resolved:
No More Heroes:

No More Heroes 2 Desperate Struggle:

Olimar’s antenna and a range of other graphical effects now render correctly in Pikmin 3: Deluxe.
Before:

After:

This change also resolves abysmally poor audio quality in: Ni no Kuni, Double Dragon Neon and Sky Gamblers: Storm Raiders.
Would it be a progress report CPU section where we don’t list every new instruction ARMeilleure can now process? Executive decision: no, it wouldn’t.
If you haven’t guessed yet, the past few months have seen a 32-bit focus as it was by far the weakest area of our recompiler, due to the majority of Switch titles being natively 64-bit. However, as with all things Nintendo Switch, if you give developers the option to do weird stuff, they will do weird stuff. Quite a few Switch titles (usually ports of some kind) therefore opt for the 32-bit option, and can cause us headaches if the instructions they need are not accommodated in the recompiler.
Alright, so what’s new and what does it do?
- VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON were implemented and allow Dies irae -Amantes amentes- for Nintendo Switch, Baldur’s Gate, Icewind Dale and Star Wars: Republic Commando to all head in-game.
- ADD (zx imm12), NOP, MOV (rs), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions were implemented and allow the Vita2HOS homebrew to function again on its newest versions.
- Thumb (32-bit) memory (ordered), multiply, extension and bitfield instructions were implemented and allowed a few Vita applications to progress a bit further under Vita2HOS.
- T32 Vfp instructions were implemented and allowed some Vita homebrew to begin rendering under Vita2HOS.
- RINT (vector) Arm32 NEON instructions were implemented which allow Ni No Kuni Wrath of the White Witch to head in-game if you provide a save file (Web applet is required otherwise).
- T32 Asimd instructions were implemented which allow Vita homebrew such as a CHIP-8 emulator to boot and render. This one is actually insanely cool as the resulting scenario is essentially a PC emulating a Nintendo Switch, which is running a homebrew translation layer, which is running a PS Vita CHIP-8 emulator, emulating Breakout!

- PLD and SUB (imm16) on T32, plus UADD8, SADD8, USUB8 and SSUB8 on both A32 and T32 instructions were implemented and once more allow for more general functions of Vita2HOS to function, although there is just a chance some games may use them.
- A32/T32/A64 Hint instructions (CSDB, SEV, SEVL, WFE, WFI, YIELD) were implemented as Nops (do nothing’s) to avoid unintended behavior and crashes in games such as Meiji Katsugeki Haikara Ryuuseigumi - Seibai Shimaseu, Yonaoshi Kagyou (bit of a mouthful).
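Several of the NEON instructions in the list above (VQADD, VQSUB and friends) are "saturating": instead of wrapping around on overflow, the result is clamped to the largest or smallest value the lane can hold. Here’s a minimal Python sketch of a lane-wise saturating add. The function names are ours, and the sketch leaves out the sticky saturation flag that the real instructions also set.

```python
def saturating_add(a: int, b: int, bits: int, signed: bool = True) -> int:
    """Add two lane values and clamp the result to the lane's range."""
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    return max(low, min(high, a + b))


def vqadd(lanes_a, lanes_b, bits: int, signed: bool = True) -> list:
    """Apply the saturating add to each pair of lanes."""
    return [saturating_add(a, b, bits, signed) for a, b in zip(lanes_a, lanes_b)]


print(vqadd([100, 1, -120], [100, 1, -10], bits=8))
```

A plain wrapping add would turn 100 + 100 into a negative number in a signed 8-bit lane, which is the kind of result that makes audio crackle and colors flip.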
Wow. Lots of work being put in by a few different people to knock out so many new instructions in such a short period of time, not even taking away from progress in other areas as we’ve already covered the extensive changes GPU emulation received. We’re now in a much better spot in regard to 32-bit titles, homebrew and Switch -> PS Vita translation layers! That final one may seem niche, but projects like Vita2HOS really do capture the imagination.
After nearly a year in purgatory, a cleanup of the rejit queue was merged. It makes that section of the codebase easier to maintain and heralded the prodigal return of one of Ryujinx’s original CPU developers, who brought some excellent progress if you like your in-game videos to play at full speed. LDj3SNuD’s first port of call in September was implementing some managed methods of both the Saturating and ShlReg regions of the SoftFallback class. You don’t need to know what any of that means, but the effect on video playback is quite transformative.
Before:

After:

Astral Chain collaterally has its performance massively improved in certain areas of the game as pre-recorded videos coexist with normal gameplay rendering in places like the HQ lobby. We saw the AC intro improve from 23FPS -> 100FPS on an i7-10700K and the lobby take a jump from 30FPS -> 43FPS.

Not content there, some additional changes to the handling and isolation of Fpsr/Fpcr instructions further improved playback of full-motion video. The improvements are most apparent in titles like TONY HAWK'S™ PRO SKATER™ 1 + 2 whose intro is an extremely demanding FMV on lower end hardware.
GUI:
There are relatively few things to say about GUI development this month but one of them is major enough to deserve this section. If you’ve followed these progress reports for a while then you’ll be aware that we are trying to switch from GTK3 (via C# bindings) to a native C# UI framework by the name of Avalonia. It’s been a while since this journey began and at times it has felt like two steps forward, one step in a random direction. One of these was in the so-called ‘render window’; this is the section of the GUI that contains the OpenGL or Vulkan renderer and actually presents to you the game, app or homebrew being run.

The area highlighted in red has caused more than one headache and has already seen several revisions over this year. GTK currently handles this area as an embedded window which means that there is actually a second ‘child’ window simply being embedded directly into another separate window which houses the rest of the GUI. This allows full granular control over the rendering and means that the render window isn’t directly tied to the same update cycle as your GUI, a good thing for tasks like resizing and dragging the window around.
The first iteration of our Avalonia implementation did this also. But we soon noticed some strange oddities on Windows, including the ‘child’ render window having a separate focus to the GUI ‘parent’ window. For example, if you clicked on the game it would deselect the main window and break stuff like hotkeys and focus-specific actions. Not ideal. So, other options were explored over the course of 2021 and 2022, concluding with an implementation where the render window was instead a render layer being displayed as part of the main window. This seemed like the solution, as it resolved the focus issues and allowed the GUI and game to have full sync with each other in key areas like hotkeys and keyboard navigation. This, however, came at some significant costs.
- Because the rendering was now part of the parent window, that meant the entire GUI had to be hardware rendered. This made it impossible to switch GPU or graphics backend without restarting the entire application.
- Due to the above limitation, there was now no distinction between game and GUI. This wreaked havoc for overlays, recording software and other benchmarking tools that hook into graphics APIs. Most would provide a recording or performance statistics of the GUI itself rather than the game as there was no way to tell the difference.

- As the UI was now being rendered by the Avalonia layer we had effectively lost some of the control over the core rendering and presentation process. Frame pacing took a significant hit in a lot of titles and many users have been understandably concerned about the new Avalonia UI before this was resolved.

There has been much back & forth on how to best tackle this. Suggested solutions ranged from simply using a pop-out window (similar to Dolphin) all the way to potentially implementing something into the Avalonia project framework itself for our specific use case. None of these seemed practical, nor were they solutions that our users would feel comfortable with for a supposed “upgrade”.
So. Are we back to square one? Well, yes! There was a second crack this month at returning to our roots with an embedded window and we’re pleased to say it’s been a resounding success. The interactions between the parent and child windows are not causing focus issues this time around and with the return of full render control comes better overlay support and a presentation experience on-par with GTK. Did I mention that it also removed over 3500 lines of code while only needing to add 800? Simplified and better.
General tidying of some bugs and UX improvements including some font changes, border additions and alignment fixes were merged this month, which should hopefully make things less off-centre or floaty.
Before:

After:

While technically an October change, we’re happy to note that the updater now functions correctly on the Avalonia builds, meaning that anyone who wishes to ‘beta test’ the new UI can do so with a fully self-updating build. Head over to our GitHub releases page and select the “test-ava” build for your operating system if you wish to give it a whirl.
KERNEL/SERVICES:
To wrap up a most productive September, let’s take a walk down the road that our kernel and services emulation took to reach us here.
A primary goal has been continued work on our network and BSD services as they affect a great number of games, even if we don’t directly connect to any real Nintendo servers. A null reference exception when launching Victor Vran Overkill Edition with guest internet enabled was fixed and a small oversight causing sockets to return incorrect result codes was resolved. Following on from this, the methods the sockets use to poll were improved and SendMMsg/RecvMMsg were both implemented in the bsd service for completeness.
Games that pack multiple titles into a single executable (think Super Mario 3D All-Stars or some other game collections) do some rather strange things when moving between their bundled applications. One such title is Prinny Presents NIS Classics Volume 3: La Pucelle: Ragnarok / Rhapsody: A Musical Adventure which requires the list of current users when it transitions into one of the actual games. Previously, the services required to do this were stubbed and so returned empty lists, causing a crash. By properly implementing the ListOpenContextStoredUsers service and stubbing LoadOpenContext, this title, and potentially others with a similar issue, head in-game.


An optimization to the placeholder manager tree lookup arrived this month with the primary aim of allowing games that perform a large number of memory mappings to shut down a lot faster. Previously games like Shin Megami Tensei V would take a considerable amount of time to close:

This is a 1 minute affair if you don’t want to sit through it!
After:

One more quick fire round until we finish:
Closing words:
And that’s all she wrote! We’re barrelling toward the end of 2022 and that means we draw ever closer to another Pokémon launch; nightmare fuel for us all. However, on a related topic, we are planning to release a new LDN build sooner than expected thanks to a number of the Splatoon 3 bugs listed in this very report being rather crucial for good LDN play; especially for those on AMD GPUs. We can’t provide an ETA yet but rest assured it will not be many months like the delay seen in the last release.
Finally, here is the usual sales pitch to anyone with a software development background, an interest in emulation/3D graphics or literally anyone who thinks they could contribute anything at all to the software package we currently provide. Core emulation, web development, GUI & UX improvements all the way down to simple code cleanups are all areas that make an emulator tick. We wouldn’t be where we are without generous people taking an interest in this field and dedicating some of their time to our funny corner of the internet. We’re always available on our GitHub and Discord!
Until next month!
2022-10-11 00:21:49 +0000 UTC
Another month rolls by the wayside and that means another progress report from your favorite Nintendo Switch emulator, nay, favorite piece of software ever developed. It’s hard work being this humble let us tell you.
To dispense with the pleasantries, what can you expect to read below? We’ve got the usual rolling improvements to our GPU, CPU, Kernel and Services emulation, a heck tonne of code-cleanup and finally a new LDN release.
Before all that let us talk to you about our sponsor. You guys!
Patreon Goals:
ARB Shaders - Goal reached in April 2021.
Work is ongoing, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
3….2….1…. GO!
GPU:
Switch games using the guest OpenGL driver are no strangers to appearing in progress reports, and this month’s is no exception. Titles such as Digimon Cyber Sleuth, River City Girls Zero and Layton’s Mystery Journey would exhibit small texture corruptions usually in the form of lines or colored vertices where they shouldn’t be. This was ultimately caused by the method Ryujinx used to flush texture data to the CPU being fairly inefficient and prone to bugs of its own. While fixing the underlying issue in the flush mechanism is important in its own right, the solution proposed to fix these OpenGL titles was to implement a new fast path for these texture data transfers which bypasses the flush altogether.
Before:

After:

Before:

After:

After the merge of Vulkan, owners of AMD and Intel GPUs soon realized that the Mii Editor applet was somewhat busted when the API was selected. While it worked fine in OpenGL, attempting to open the Mii Editor in Vulkan would either cause an outright crash or exhibit graphical bugs such as some visual elements not being shown at all:

This particular problem was tracked down to the assignment of an invalid bool constant (1 byte) which was then assigned directly to an integer (4 bytes). This was causing a SPIR-V parse failure on certain GPU vendors, so the fix here was to remove these invalid assignments.

On the topic of Vulkan: the delivery didn’t come without teething pain and a couple of minor regressions. One of these regressions manifested as flickering in Animal Crossing: New Horizons and Atelier Ryza on both Vulkan and OpenGL graphics APIs.


Both issues were traced down to an oversight during the final Vulkan rebase where, after the newly added sampler pool cache, texture bindings would use the wrong sampler pool. A single line fix was all that was needed here to pass the correct sampler pool as an argument.
Segueing off the word ‘textures’, there were a couple of more general housekeeping changes made this month with regard to defined formats and their associated format tables. ETC2 texture formats were added to Vulkan although, in an odd flip, it’s actually NVIDIA this time that doesn’t support their use. AMD and Intel owners should see Vegas Party, Radiation Island and any others that used said formats become playable.
The entire format table was also refactored to give meaningful names to the formats instead of using so-called ‘magic numbers’. A magic number in programming is a value that is used raw and has no meaningful context as to its use. Best practice would be to assign a constant to these magic numbers so that, in the future, contributors could understand what was meant. This isn’t always easy, especially in reverse engineering and emulation where you may not know exactly what something is until much later. Thankfully for us, our knowledge of the Switch has had years to mature; NVIDIA recently released a load of documentation too! Even for the many non-coders who read these reports, the difference in readability is night and day:
Before:

After:

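For readers who would rather see the idea in code than in screenshots, here is a minimal sketch of the same refactor. It is written in Python rather than Ryujinx's C#, and every number and name in it is made up for illustration; none of them are the real Switch GPU format encodings or entries from the actual format table.

```python
from enum import IntEnum

# Hypothetical values: these numbers are invented for the example and are
# NOT real Switch GPU format encodings.

# Before: raw "magic numbers" as keys, with no hint of what they mean.
BYTES_PER_PIXEL_RAW = {0x01: 4, 0x02: 8, 0x03: 2}


class Format(IntEnum):
    """After: the same values, each with a descriptive name attached."""
    R8G8B8A8_UNORM = 0x01
    R16G16B16A16_FLOAT = 0x02
    R5G6B5_UNORM = 0x03


BYTES_PER_PIXEL = {
    Format.R8G8B8A8_UNORM: 4,
    Format.R16G16B16A16_FLOAT: 8,
    Format.R5G6B5_UNORM: 2,
}


def bytes_per_pixel(fmt: Format) -> int:
    """Look up the pixel size using a named format instead of a bare number."""
    return BYTES_PER_PIXEL[fmt]
```

Both tables hold identical data; the only difference is that a future contributor reading `Format.R5G6B5_UNORM` no longer has to guess what `0x03` was supposed to be.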
Some color formats such as RGBX have no alpha (transparency) and, as such, when certain operations that expect alpha are applied, these should ideally not touch color formats that do not have the required component. This unfortunately is not possible to do easily but we can instead make them behave as if the alpha was always a static 1 (fully opaque). This resolves an issue in LA MULANA where the ground is rendered as a black rectangle instead of… well not a black rectangle.
Before:

After:

Xenoblade Chronicles 3 was an interesting release, to put it mildly. On the one hand, the game graphically seems to be holding up really well against hardware; on the other hand, stability has been hit or miss with not one but three distinct crashes, all centered around the menus.
The first of these to be investigated and resolved was unique to OpenGL and expressed itself in the form of a TDR (Timeout Detection and Recovery).
OpenGL does not allow a vertex buffer’s size to be specified, which means a draw can occasionally read from an address outside the bounds of the buffer. To fix this, a second temporary buffer is created to accommodate these out-of-bounds access scenarios. Ironically enough this issue is well known; the most obvious secondary fix is actually the waterfall in Super Mario Odyssey.

Users have noted for going on 4 years now that triangular artifacts would be visible in the water smoke effects, also visible in Captain Toad: Treasure Tracker. This is caused by the exact same issue that results in the TDR in XC3, and showcases why you should always put off one thing today that may fix two things tomorrow!

Vulkan can still exhibit a DeviceLoss crash on the menus of XC3 (although we aren’t yet sure why, as the issue fixed here for OpenGL doesn’t impact NVIDIA Vulkan); there is also a final memory related crash independent of selected graphics API. Why they all chose to happen on the exact same menu, doing the exact same thing, and seemingly for completely different reasons, is beyond our understanding for the time being; we hope to have some more progress for you in the near future.
We aren’t done with XC3 yet though as a second fix was pushed through this month, this time specifically for AMD users. Not content to let NVIDIA be the sole offender for format support shortcomings this month, AMD does not support the RGB16 vertex format on Vulkan and, as such, a fallback format was needed. Using the RGBA16 vertex format if RGB16 is not supported resolved the instant crashes these GPUs faced upon trying to boot the game. An HLE macro for render target clears was also added, as the AMD driver protests vehemently against clearing individual slices.
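The fallback logic can be pictured roughly like this. This is only a sketch in Python, not the emulator's C# code: the function name, the string format names and the fallback table are all assumptions made for the example, and a real backend would query the driver for format support rather than receive a ready-made set.

```python
# Hypothetical fallback table: maps a format the driver may reject to a
# wider format that can stand in for it.
VERTEX_FORMAT_FALLBACKS = {"RGB16": "RGBA16"}


def pick_vertex_format(requested: str, supported: set) -> str:
    """Return the requested format if supported, else its fallback."""
    if requested in supported:
        return requested
    fallback = VERTEX_FORMAT_FALLBACKS.get(requested)
    if fallback is not None and fallback in supported:
        return fallback
    raise ValueError(f"no usable vertex format for {requested}")
```

With a driver that only reports `{"RGBA16"}`, asking for `"RGB16"` returns `"RGBA16"` instead of failing outright, which is the behaviour the fix describes.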
If you hoped we were escaping Xenoblade territory, think again. A recent change brought about a regression with resolution scaling in XC: Definitive Edition, where things looked a bit wrong:

Some scale values were not being correctly updated across textures and images resulting in the above issue. The game now scales properly again.

A couple of more minor changes to close out this month’s GPU section include:
CPU:
The CPU recompiler continues to improve, and this month witnessed the addition of a bunch of new instructions with convoluted-sounding names that are becoming a bit of a blur.
More 32-bit Thumb instructions:
- LDM/STM
- LDAEX/STLEX
- LDR/STR
- LDRD/STRD
Even more 32-bit instructions:
The second batch were most interesting as they were now, seemingly out of nowhere, required by the latest Mario Kart 8 Deluxe update: 2.1.0. As a direct Wii U port Mario Kart 8 Deluxe is one of the few notable Switch games to run on a 32-bit instruction set, and so has tested us from the very start. With those new instructions in place the game returns to its usual splendor.


The SHA256 instructions specifically received hardware-acceleration treatment and a check was added to ensure an instruction supports VEX encoding; without this there was a possibility of an invalid allocation.
KERNEL/SERVICES:
While Ryujinx blocks connections to Nintendo online services, that doesn’t mean we don’t have to battle with a slew of network oriented issues. Some applications react differently on boot if they “think” they’re connected to the internet: attempting to connect to servers, calling different services or just generally being annoying.
With this in mind, let’s glance at a couple of service implementations that were finalized this month:
Not one but two oversights in the network sockets implementation were resolved, with the end result of Minecraft being bootable while guest internet is disabled. If this setting is enabled the game will still crash; if Minecraft is a title that for some reason interests you on Switch, keep this in mind.

gdkchan also leveraged August to extensively optimize how the kernel looks up blocks of memory, migrating from a linked list to a red-black tree. If you haven’t taken a computer science course then these won’t mean much to you, but rest assured that the new method is faster by a fair margin. For those of you that have taken a CS course: looking up a block in a linked list is O(n) whereas in a red-black tree it is O(log n), meaning that in the worst-case scenario the new method is dramatically more efficient at finding the desired memory block.
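To put some numbers on that difference, here is a small Python sketch. It does not implement a red-black tree; it swaps in a binary search over a sorted list, which has the same O(log n) lookup cost and is enough to show the gap. The `Block` type and helper names are hypothetical and are not taken from the Ryujinx kernel.

```python
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    start: int  # first address covered by the block
    size: int   # number of bytes covered


def make_blocks(count: int, size: int = 0x1000) -> List[Block]:
    """Build `count` back-to-back blocks, sorted by start address."""
    return [Block(i * size, size) for i in range(count)]


def find_linear(blocks: List[Block], address: int) -> Tuple[Optional[Block], int]:
    """Visit blocks one at a time, as a linked list walk must: O(n)."""
    steps = 0
    for block in blocks:
        steps += 1
        if block.start <= address < block.start + block.size:
            return block, steps
    return None, steps


def find_binary(blocks: List[Block], address: int) -> Tuple[Optional[Block], int]:
    """Halve the search range on every step: O(log n)."""
    lo, hi, steps = 0, len(blocks) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        block = blocks[mid]
        if address < block.start:
            hi = mid - 1
        elif address >= block.start + block.size:
            lo = mid + 1
        else:
            return block, steps
    return None, steps
```

Searching 100,000 blocks for an address in the very last one takes 100,000 steps with the linear walk and no more than 17 with the binary search. A real red-black tree additionally keeps insertions and removals at O(log n), which a plain sorted list does not.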
GUI/MISC:
Contributor CloneDeath this month took a mop to our codebase and decided it needed a spring cleaning in quite…a few…. different… areas, with other contributors joining the mix removing all sorts of unused strings, renaming functions and subtracting redundant code. Not the most glamorous of jobs, but with a project this big the basics are just as important.

We’re getting more translations and locale updates for the WIP Avalonia GUI than we can shake a stick at but we’re grateful for such wide community support in this regard! Polish and Japanese locales have been added with Japanese (yep again), Chinese, German and Turkish all receiving updates to include the latest strings. The list of translations itself is now sorted more intuitively, and each appears in its own native language.
Before:

After:

We have a longstanding falsehood that it’s time to finally own up to. It will come as an enormous shock to most people that the old “purge PPTC cache” button… did not actually purge the PPTC cache. We know, unacceptable. Only on the 18th of August 2022 (a date which will live in infamy) was someone finally brave enough to tackle such a flagrant abuse of trust. Instead of purging PPTC, the function actually queued a rebuild on the next run. This grave misprint has since been updated in the UI to reflect the real behavior. An apology video will follow shortly which will include tears and a commitment to grow from this experience.
Onto some quality-of-life changes: your game directory list will now only refresh if settings relating to the list are altered. Think adding or deleting a directory, or updating your game to the latest version. Previously, even changing the graphics backend or increasing the resolution scale would cause a full reload: a death sentence for those with large libraries or making use of network storage.

So you have chosen… Death.
Cheating is a fundamental part of video gaming, so when some random cheats don’t work it can feel like the universe itself is trying to set you on the right path. Luckily we can just ignore the universe and resolve a small bug that was causing a few cheats not to work. Never forget your null terminators, kids.
Closing us out we have a couple of cleanups from the July Vulkan merge. The voices of millions of Flatpak users all cried out at once and were swiftly silenced with a fix making sure some required packages were available on this build. Without it, even selecting Vulkan could cause a rather nasty crash.
For those unaware: Avalonia is actually a fully rendered framework and as such does make use of OpenGL or Vulkan to draw even itself. With the merge of Vulkan came the daunting task of not only making sure games rendered well, but also making sure it didn’t destroy all the work put into our GUI transition so far. Luckily the fixes needed were minor and sorted out in a flash.

Closing words:
If you’re one of those people who frantically Ctrl-Fs every progress report for the words: LDN, Mac or Vulkan 2: Electric Boogaloo (just me?) then we have good news on the LDN front and just a canned laugh track on the other ones. If you haven’t already seen, we released LDN version 2.5 in August which brings the LDN build up to date with master version 1.1.224. This means a lot more games are playable and more importantly every game is more playable for those poor souls stuck on AMD and Intel GPUs. As a result we’ve seen many more folks enjoying Splatoon 2 and Mario Kart 8 Deluxe on LDN (both shader-heavy games). For the download and a more comprehensive list of what’s changed since 2.4 check out the blog post here: https://www.patreon.com/posts/ldn-2-5-vulkan-70757628
As per, well, always, we’d like to extend a thanks to everyone who supports the project with their time, knowledge, money or enthusiasm. We really wouldn’t still be here without them. As an open-source project we thrive off of community involvement and external contributions; if you’ve got some knowledge in computer graphics, low-level systems or anything up to UI and web design, there is a place for you. Emulators can seem unapproachable beasts but there are truly endless ways to dive in.
Hope to see you all next month!
2022-09-08 22:55:45 +0000 UTC

LDN 2.5 has arrived!
Includes a Vulkan graphics backend which improves performance and compatibility drastically on AMD and Intel graphics cards running on Windows, plus all GPU vendors will enjoy drastically reduced shader compilation stuttering!
This version also includes all the enhancements and changes to the main Ryujinx build since November 2021, when LDN was last updated. There are too many to list here, but these are the most relevant fixes:
- Mario Kart 8 Deluxe v2.1.0 is now compatible with local wireless. Keep in mind this game requires a built shader cache to avoid disconnects.
- Monster Hunter Rise: Sunbreak is now compatible with local wireless. You'll be able to go on hunts online with other Ryujinx LDN users.
- Pokémon Legends: Arceus is now compatible with local wireless. Enjoy trading with other Ryujinx LDN users.
- Pokémon Brilliant Diamond/Shining Pearl now boot up a lot faster.
- Anisotropic filtering set to custom values will no longer cause graphical bugs.
- Animal Crossing: New Horizons no longer requires "ignore missing services" on newer versions.
- Scanning an Amiibo no longer slows some games down.
- Super Smash Bros Ultimate no longer stutters on menus.
- Pokémon saves will no longer corrupt on Linux.
- Splatoon 2 v5.5.0 now runs on Linux.
- Firmware v14.0.0+ will no longer cause issues.
- Motion controls are now more reliable.
- Various performance and stability improvements.
Unfortunately, support has been dropped for Windows versions older than 2018. If, for whatever reason, you feel you must stay on an unsupported Operating System, then you will have to stay on LDN 2.4.
2022-08-19 20:44:16 +0000 UTC
What a year, huh? Captain, it’s only July.
So much happened in July that it felt infinitely longer than a measly 31 days. A new Xenoblade, Digimon (yep all 5 of you), Patreon goals finally coming to fruition; we truly had a month to remember. So what do we have in store for this progress report? We’ve got a regular rundown of all the changes, a bit of a discussion on AMD (seems like a segment lately) and also, not to ignore the elephant, a big sign asking you to read the dedicated Vulkan blog post!
No point delaying further so before we start please take a look at our Patreon goals.
Patreon Goals:
Vulkan GPU Backend - MERGED!
We made a dedicated blog post about this so check it out here!
ARB Shaders - Goal reached in April 2021.
Work is ongoing, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Cool? Let’s dive in.
GPU:
Resolution scaling fixes get us out the gate this month with riperiperi resolving a particularly annoying bug when scaling certain Unity games such as A Hat in Time and Cruis’n Blast. These titles would smear the screen with red when scaled beyond native resolution, making scaling these games a bit of a double edged sword. Luckily the solution here is to simply avoid scaling these rebel textures while continuing to scale everything else.
Before:

After:

Before:

After:

The SD Gundam Battle Alliance Demo booted at release but unfortunately crashed when in-game. The cause was tracked down to a bug in the fast path for DMA linear texture copies; by forcing the FindTexture method to only match linear textures with an exact copy height, the game now boots and looks to function great.

An issue in how we manage sampler pools finally emerged this month with some poor performance in Super Zangyura. Considering how visually basic the game is, it was a surprise that it struggled to reach over 30FPS on modern hardware. The texture pool cache had a hard limit of 4, which means that if a game needs more than this, one of the 4 is deleted and must be recreated the next time it is needed, causing major performance dips on creation. By creating a sampler pool cache and better evaluating when pools can be safely disposed of, the performance in games that exceed the limit of 4 is greatly increased. Most games only use a single texture pool, so don’t expect this to affect many other titles.
Before:

After:

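Why a hard limit of 4 hurts so much is easier to see with a toy model. The Python sketch below is an illustration under stated assumptions, not Ryujinx's cache: it assumes least-recently-used eviction (the report only says that one of the 4 gets deleted), and the class and function names are invented for the example.

```python
from collections import OrderedDict


class PoolCache:
    """Toy pool cache with a fixed capacity and LRU eviction (assumed)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pools = OrderedDict()
        self.creations = 0  # counts the expensive "create a pool" events

    def get(self, address: int) -> object:
        pool = self.pools.get(address)
        if pool is not None:
            self.pools.move_to_end(address)  # mark as most recently used
            return pool
        self.creations += 1                  # cache miss: build a new pool
        pool = object()
        self.pools[address] = pool
        if len(self.pools) > self.capacity:
            self.pools.popitem(last=False)   # evict the least recently used
        return pool


def count_creations(capacity: int, pool_count: int, frames: int) -> int:
    """Touch `pool_count` pools in order every frame; return total creations."""
    cache = PoolCache(capacity)
    for _ in range(frames):
        for address in range(pool_count):
            cache.get(address)
    return cache.creations
```

A game cycling through 5 pools per frame against a capacity of 4 misses on every single access: `count_creations(4, 5, 100)` gives 500 creations, while any capacity of 5 or more gives just 5. The actual change goes further than raising a limit, since it also decides when a pool is safe to dispose of, but the thrashing pattern is the same.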
JUMP FORCE Deluxe is a game that has been so close to being graphically accurate for a fair while now with the only remaining major visual bug being large green lights appearing in the scene.

The issue here is a simple data loss error on the shader. As the decoder did not consider branch instructions in some scenarios, the generated shader code would skip some vital information. By fixing this omission, gdkchan finally put a nail in this particular coffin.

On the topic of large light sources where they probably shouldn’t be: Tokyo Mirage Sessions had rather significant, shall we say, solar activity resulting in the game being virtually unplayable. This was an issue with the bloom the game uses and once again was an issue with a generated shader. By supporting conditional exits our retinas were saved!
Before:

After:

The new Monster Hunter Rise DLC required some difficult changes in order to render, which are gradually being rolled out. The first of these allows the game to reach its splash screen by extending bindless elimination to shaders that are combined with a constant handle.

The second of these fixes the flicker in the new Sunbreak DLC by extending the shader optimizer to propagate phi nodes. Unfortunately we don’t have an image comparison here, as to even render the game a few more WIP changes are needed. Hopefully in a future report there will be a full set of before and after comparisons for the entire DLC package!
Since shader cache 2.0, some old changes in the backlog were finally able to be completed, including the addition of alpha to coverage dithering. For anyone unaware: ‘dithering’ refers to an effect in computer graphics where objects or textures get this spotty look to them:

In retro video games this was used to add depth to otherwise muddy textures; it eventually became quite standardized as a way to make solid objects appear translucent when a camera shifts behind so as to keep the player in view.
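For the curious, the "spotty" translucency trick can be sketched in a few lines of Python. This is ordered dithering with a standard 4x4 Bayer threshold matrix, chosen here purely as a common textbook example; the report does not say which pattern the games or the emulator use, and hardware alpha to coverage operates on multisample masks rather than whole pixels. The function name is hypothetical.

```python
# Standard 4x4 Bayer matrix: each value 0..15 appears exactly once.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def coverage_mask(alpha: float, width: int = 4, height: int = 4) -> list:
    """Return rows of '#' (pixel drawn) and '.' (pixel skipped).

    A pixel is drawn only when alpha exceeds that pixel's threshold, so the
    fraction of drawn pixels approximates alpha with no real blending.
    """
    rows = []
    for y in range(height):
        row = ""
        for x in range(width):
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0
            row += "#" if alpha > threshold else "."
        rows.append(row)
    return rows
```

At an alpha of 0.5 exactly half of the pixels in each 4x4 tile are drawn, in a checkerboard pattern; seen from a distance, that reads as a half-transparent object.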
Pokémon Legends Arceus makes use of dithering on environment models such as trees and rocks, but previously this effect was not emulated as it requires the new shader specialization introduced in the new shader format. gdkchan attempted a fix for this before the cache rewrite but without shader specialization the dithering emulation would need to be present on every single fragment shader, even if the shader would never use it. Those issues are all resolved in this new implementation.
Before:

After:

Some redundant allocations in the DMA handler were also removed this month which may result in some small optimization in FMV (Full-Motion Video) playback due to lower GC stutter during playback. Results here are hard to measure, but this hopefully removes one of the simpler bottlenecks in video playback.
To close out the GPU section we have the piddling addition of a Vulkan backend. Not too big, no serious changes here and only 40,000 lines of code. Rookie numbers, if you ask us. Anyway if you want to hear more of this topic check out the full blog post!
The flatpak needed a quick update to make sure it didn’t crash upon selecting Vulkan and we now avoid adding shader buffer descriptors for unused constant buffers; a change that affects OpenGL too but may give a minor performance bump for Vulkan.
CPU/SERVICES:
July also brought some new CPU and service fixes, implementations and stubs; the first of which are a couple of BSD (network sockets) fixes. The DontWait flag can now be used in the ‘Receive’ methods and the case where the byte size options are exchanged over a network is now handled. The second of these changes is partially required by the newly released Super Mario Odyssey Online mod, and fixes a crash regression in Divinity Original Sin 2.
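If you have never met a "don't wait" receive before, the host-side equivalent looks something like the following. This is a sketch using Python's standard `socket` module and the POSIX `MSG_DONTWAIT` flag, not the emulator's BSD service code; the helper name `recv_dont_wait` is made up, and the fallback branch exists only because that flag is not exposed on every platform.

```python
import socket


def recv_dont_wait(sock: socket.socket, size: int):
    """Return queued bytes, or None if nothing is waiting. Never blocks."""
    flag = getattr(socket, "MSG_DONTWAIT", None)
    try:
        if flag is not None:
            # Per-call non-blocking receive; the socket itself stays blocking.
            return sock.recv(size, flag)
        # Fallback where MSG_DONTWAIT is unavailable: briefly switch the
        # socket to non-blocking mode, then restore the previous timeout.
        previous = sock.gettimeout()
        sock.setblocking(False)
        try:
            return sock.recv(size)
        finally:
            sock.settimeout(previous)
    except BlockingIOError:
        return None
```

A game polling for data this way gets an immediate "nothing yet" answer instead of stalling its network thread, which is why the flag needs to be honoured by the emulated receive calls.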

As we attempt to emulate the entire Switch, there are some services that most users may not even think of for a PC emulator. One of these is the aptly named ‘GetTemperature’ service which, you likely guessed it, reports back the Switch’s internal temperature in degrees Celsius. This service was causing the homebrew launcher to crash during boot as it attempted to probe for this data. The stub sets a constant temperature of 42 degrees Celsius (69 was taken apparently…) and this seems to satisfy homebrew’s environmental needs.

With the release of the Portal collection on Switch, two issues were found in both our service and CPU emulation. The first of which was an inaccuracy in our Vi services which was quickly resolved with RE from gdkchan and the second was the implementation of FCVT Half to Double conversion instructions. With these changes both Portal 1 and 2 boot, with Portal 1 being largely bug-free. Portal 2 still has some graphical issues with objects culling when they shouldn’t.
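A half to double conversion sounds exotic but is simple at heart: unpack the 16 bits of an IEEE 754 half-precision value and rebuild the same number as a 64-bit float. The Python sketch below shows the arithmetic only. It is not the recompiler's implementation, the function name is hypothetical, and it ignores details a real FCVT must honour, such as NaN payloads and floating-point status flags.

```python
import math


def half_bits_to_double(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision pattern into a Python float.

    Python floats are 64-bit doubles, and every half value fits in a double
    exactly, so the conversion never needs to round.
    """
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F   # 5 exponent bits, bias 15
    fraction = bits & 0x3FF          # 10 fraction bits
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2^-24.
        return sign * fraction * 2.0 ** -24
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (fraction 0) or NaN.
        return sign * math.inf if fraction == 0 else math.nan
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 15)
```

For example, the bit pattern `0x3E00` decodes to 1.5 and `0xC000` to -2.0.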


UI:
User interface development was hot in the street in July although not many regular users will have seen much going on. We’ve mentioned our shift to Avalonia multiple times by now and just recently we pointed out part 1 of the project being merged.
Well, part 1 can take its leave because July merged both part 2 and 3 thus bringing the new UI on-par with the current GTK implementation.
Part 2 implemented the complete settings windows:


Part 3 added the remaining context windows:

But while these were core changes, they very much opened the floodgates for fixes, additions and tweaks that any contributor can now bring. The project files were cleaned up once, twice and then some quality of life adjustments were made.
This included enabling ‘tiered-compilation’ for all projects. Avalonia is a fully ‘JITed’ framework, compared to GTK which is considered ‘Native’, and this means that the .NET runtime usually has to do a lot of work on the first run even after the program is compiled; tiered-compilation allows .NET to boot applications without applying all of its code optimizations, thus speeding up GUI boot-times and reducing app latency in general. As the program is booted more and more often, .NET can gradually start to perform these optimizations over time instead of dumping them all at once. This has been enabled for GTK too, but as a ‘native’ framework GTK doesn’t need to do nearly as much work on GUI launch as Avalonia does. We saw a reduction from 14 seconds down to 3 seconds the first time we ran the Avalonia application with this enabled.
‘Ahead-of-Time’ compilation also accomplishes a similar task as tiered-compilation, providing latency reductions by shifting some of the first-run costs into the compilation process. The downside to this approach is file size increases of up to 3x, a cost the development team felt was too much for what turned out to be a minor improvement over tiered-compilation.
Emmaus (lead UI developer) has also started to branch away from GTK now that feature parity has been achieved. The user profile editor now uses a single content dialog box instead of new individual windows per option. This means a much more fluid experience with less clutter and latency between opening and closing windows and also lays the groundwork for further such content dialogs for upcoming data managers (a save manager is already in the works).

A lot of other miscellaneous changes were made in addition to these including:
As you can likely see, it’s all kicking off and we already have an even longer list of bugs and problems to be resolved before Avalonia can become the default UI that everyone is greeted to when they boot. Once those are addressed, the process of changing the updater to deliver you all the new UI project instead of the old one will commence. For now you’ll have to remain patient while the work to make it a seamless experience continues in the background!
MISC/INFRA:
To close us out of the changes this month, our project’s infrastructure and QoL had a few major revisions including:
A particular change that a lot more of you will be interested in is the unofficially dubbed “Windows 11 Fix™”. Users noticed that upon updating to Windows 11 Ryujinx became nigh unplayable in some games due to large and constant stutters that dropped FPS for apparently no reason.

Due to the way Horizon OS operates there are a number of quirks that operating systems like Windows need to step around in order to be accurate; one of these is that we need to map memory in 4KB sizes. Windows 11 appears to have a unique bug where this process can sometimes take hundreds of times longer than it did on Windows 10 and we still have no idea why this is.
By moving the memory unmapping handler to a native handler we can massively reduce the number of these problematic 4KB mappings and thus ease Windows 11’s woes.
“Hey Ryujinx Team, we heard that AMD finally fixed their OpenGL drivers?”
Well it finally happened. AMD has acknowledged the existence of OpenGL. As you can imagine it was a day of celebration. We ordered pizza, popped open the champagne, shared blissful memories of darker days when AMD GPUs only managed 13FPS in Super Mario Odyssey…

Wait, what? Not only is it still bad but AMD somehow managed to break rendering on the Odyssey too? Siggghhhhhhhhhhhh
To return to more serious discussion: it isn’t all as bad as what’s seen on SMO here, and in some games we do have to give AMD credit where it’s due for the dramatic OpenGL improvements in some cases.

Those of you who have read our Vulkan blog should be familiar with this graph but added here is the “NGL” dataset which you could read as “New GL” for AMD’s new 22.7.1 driver. Performance is bumped across the board everywhere we tested other than, weirdly enough, SMO, where both performance and graphics rendering regress. Mario Kart 8 Deluxe has a large performance bump but likewise suffers from brand new visual bugs.

Vulkan remains the best choice for AMD GPUs, but at least OpenGL is a viable option in some titles. Metroid Dread is actually the outlier here with a staggering 269% increase over the old driver and even improving on Vulkan by 143% (although we’re fairly certain this is just due to an API bottleneck as even Nvidia sees much greater OpenGL performance in this title).
Either way we hope this clears up that question. We’re happy that AMD has finally done something to address their performance roadblocks but this should be regarded as just the first of many improvements. Claiming OpenGL is ‘fixed’ would be quite premature, and we’d all like to see AMD continue to focus on the remaining performance issues, rendering bugs and lack of extensions (this one applies to Vulkan too!).
Closing Words
That's about it! For those of you looking for Xenoblade Chronicles 3 news you’ll have to wait for next month, but the gist of it is that there are currently random crashes on opening the menu using both graphics backends. We have a fix in the works for OpenGL that may be merged sometime shortly, but Vulkan still needs to be investigated. Other than that it appears to run pretty well; you can check out the compatibility report here.
Emulators are built and maintained on thousands of hours of work encompassing everything from reverse-engineering, CPU and GPU emulation, service HLE and all the way up to UI and UX. Our door is always open to anyone who has an interest in applying themselves to a truly unique project at the cutting edge of both Switch emulation and C# in general. If this sounds even the slightest bit like your cup of tea, then check out our GitHub or join our Discord.
See you all next month!
2022-08-09 04:12:47 +0000 UTC
Half-way through 2022 already and time sure flies when we have some good games to play! Speaking of games, some real bangers were released this month and we’re happy to say that most of them work, either out of the box or with some small workarounds. Despite coming from Koei Tecmo (a name all emulator developers fear), Fire Emblem Warriors: Three Hopes, both the demo and full-game, ran flawlessly on Day 1 and only Mario Strikers stole our thunder!
We try not to advertise games being playable if a mod is required to bypass the intro, but the option is there for anyone interested, and the rest of the game is running great.
Here’s to more awesome releases in the second half, and without further ado, let’s jump into our Patreon goals!
Patreon Goals:
Vulkan GPU Backend - still in progress.
A public test build is delivered and is available here!
ARB Shaders - Goal reached in April 2021.
Work is ongoing alongside Vulkan, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Let’s get started.
GPU:
Switch emulation is not something any group nor project has a monopoly over, and the Skyline team are certainly putting in their share of work to prove it! Co-lead developer Bylaws this month pushed a couple of fixes to Ryujinx; the first of which resolved a possible race-condition, and the second fixed a long-standing bug in none other than The Elder Scrolls V: Skyrim. While possibly the most resilient game of all time to platform ports, it had never once got past the menus since it booted in 2020, and it turned out that a counter type used by Skyrim actually expects a semaphore (an alternative data structure used to help multi-threaded tasks), not just a constant zero to be released.


While still not in a perfect state, it’s always cool to see some of the more visually complex Switch games rendering and actually performing remarkably well!
Not content with just fixing one of the best selling games of all time, this change actually also completely resolves the abysmally slow speeds in Ys IX: Monstrum Nox and resolves a black screen graphical issue in Giana Sisters: Twisted Dreams. One very patient user noted of Ys IX ‘It took me 5 hours to get ingame’, if anyone wanted a more concrete idea of just how slow we’re talking.
Giana Sisters: Twisted Dreams:


Continuing this month on his war against shovelware titles using endless GPU black magic, gdkchan put the finishing touches in place to fully fix both Perky Little Things and Genkai Tokki Moero Crystal H. Both games would simply present black screens due to a missed case in how multisampled and non-multisampled textures were handled. By allowing non-multisampled textures to inherit the same data as multisampled textures, Ryujinx will no longer read garbage data if this condition occurs.
Perky Little Things:

Genkai Tokki Moero Crystal H (before):

After:

Of course GTMCH (I am not typing the full name again) isn’t looking much better. Unless you’re a huge fan of the color… Sepia Skin? Luckily a small fix to *deep breath* instanced indexed inline draws, try saying that one quickly, actually allows the game to render something other than a single color.

A follow-up change related to indexed draws fixes the lackluster performance some games using them could have. Sometimes the emulator would draw multiple times for the same result, which added a huge amount of render-time before a frame was ready. By passing the index count to only a single instance, the excess draws were eliminated as shown below (far right column is the draw count).
Before:

After:

A video crash in the newly released LOOPERS was resolved by restricting the output rectangle to sit within a defined surface. Previously, if there was a mismatch between the input and output surfaces, any data outside of the input range was technically ‘undefined’ and would randomly crash with an access violation.
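Restricting a rectangle to a surface boils down to a handful of `min` and `max` calls. Here is a minimal Python sketch of the idea; the `Rect` type and `clamp_to_surface` name are hypothetical and the emulator's own code will differ in its details.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def clamp_to_surface(rect: Rect, surface_width: int, surface_height: int) -> Rect:
    """Shrink `rect` so that it lies entirely inside the surface.

    A rectangle that misses the surface completely collapses to zero width
    and/or height, so nothing outside the surface is ever touched.
    """
    left = max(rect.x, 0)
    top = max(rect.y, 0)
    right = min(rect.x + rect.width, surface_width)
    bottom = min(rect.y + rect.height, surface_height)
    return Rect(left, top, max(right - left, 0), max(bottom - top, 0))
```

For a 1280x720 surface, an oversized request such as `Rect(-10, 20, 2000, 500)` is trimmed to `Rect(0, 20, 1280, 500)`, so every read and write stays inside memory that is actually defined.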
Continuing on our theme of fixing visual novels, an adjustment was made to the draw texture fallback used on AMD and Intel GPUs so that certain games would not render their viewport upside down. As only NVIDIA supports the NV_draw_texture extension, Ryujinx needs to ignore the current ClipControl settings as they aren’t valid on non-NVIDIA GPUs.
Before:

After:

Alright for everyone who doesn’t care about visual novels, this month also had some changes and fixes for you too! The first of which affected one of the most hotly anticipated (and honestly disappointing) games of this year: the new Mario Strikers. Glossing over the as-yet unfixed crash due to the intro cinematic, the game mostly ran and rendered pretty well at launch outside of the animated 3D crowd. gdkchan jumped to the rescue and added support for some new forms of depth-stencil render targets (array and 3D texture), alongside fixing a bug that caused Ryujinx to ignore render target clears. With both changes in place the crowds now actually render and gameplay isn’t so lonely!
Before:

After:

A Hat in Time was another game that used to crash before the title screen, but weirdly enough only once the player had progressed a certain way through the story. A texture ID may not be valid when a shader compile occurs for a number of reasons, and so by checking this case before accessing the descriptor, we can avoid any unmapped memory crashes related to this.
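The ‘check before you access’ idea can be sketched in a few lines. This assumes the texture pool is just a Python list and that an out-of-range ID counts as invalid; both the function name and the pool layout are illustrative, not Ryujinx’s real data structures.

```python
def try_get_descriptor(pool, texture_id):
    """Return the descriptor for texture_id, or None if the ID is not valid."""
    # Validate first: an out-of-range ID would otherwise read unmapped memory
    # in a real emulator (here it would raise IndexError or read the wrong entry).
    if texture_id < 0 or texture_id >= len(pool):
        return None
    return pool[texture_id]
```

The caller can then treat None as ‘no texture information available yet’ rather than crashing.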

Shader Cache 2.0 has largely been a net positive for the playability of a lot of notorious games, but that doesn’t mean it didn’t come with some drawbacks. Due to the new shader specialization support, the specialization state needed to be checked on every draw; this sounds costly and while overall it isn’t as bad as it first appears, there was a performance hit associated with it in multiple games, including Super Mario Odyssey and Xenoblade Chronicles: Definitive Edition. riperiperi took it upon himself to have a crack at optimizing texture binding and shader specialization checks.

SMO and XCDE saw their performance return to pre-new cache levels, and while BoTW is performance limited by other factors, and hence was almost identical, there was a fairly large drop in FIFO, which is indicative of the emulated GPU being less loaded. Once the other bottlenecks the game experiences are shifted, this should see a nice payoff in the future. Feel free to check any games that felt slower after the new cache, hopefully they’re back to normal or at least close!
Nothing this good comes for free and it came at the cost of breaking resolution scaling for a couple of hours, before the new texture binding method was updated to take scaling into account, and certain titles like Super Zangyura required accounting for a complete pool change in the cache.
Before:

After:

CPU/KERNEL:
Our CPU section this month starts with some CS 101. For anyone not familiar with data types and more specifically how numbers are stored, there are a lot of ways to do it: integer, short, long, float etc. Previously Ryujinx used an unsigned (cannot be negative) short to store the operand uses count, which tops out at 65535. If you try to store a value higher than this, you get what’s called an “integer overflow”, where the value wraps back around to 0! Limiting yourself like this is mainly just best practice, as data types that store higher values usually cost more in terms of memory. Unfortunately, some games actually do require the extra range, and so the type was switched to an unsigned integer, which caps out at a fairly ridiculous 4294967295, so there is little chance of ever needing higher!
Taiko Risshiden V DX now heads in-game and potentially others too (the Switch has so many gaammmmesss!).
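For the curious, the wraparound is easy to reproduce. Python integers never overflow on their own, so this sketch fakes fixed-width unsigned counters with a bit mask; the helper name is made up for illustration.

```python
U16_MAX = 0xFFFF      # largest value a 16-bit unsigned short can hold (65535)
U32_MAX = 0xFFFFFFFF  # largest value a 32-bit unsigned integer can hold

def increment(value, max_value):
    """Add one with wraparound, like an unchecked fixed-width unsigned add."""
    return (value + 1) & max_value

print(increment(65535, U16_MAX))  # the 16-bit counter wraps to 0
print(increment(65535, U32_MAX))  # the 32-bit counter carries on to 65536
```

The 32-bit counter wraps in exactly the same way; it just takes far longer to get there.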


Stopping emulation is currently a bit like playing Russian Roulette but with your task manager. The problem is that there isn’t a single cause of the issue, and as games have got more complicated and are doing different things, a lot of recent releases will deadlock on close. Two such problems were isolated and resolved in the CPU/Kernel space this month, one which was caused by an invalid access event while a memory mapping was taking place, and the second caused by a bit of a paradox! When the ‘TerminateProcess’ function is called it will try and kill all running threads. The issue here is that TerminateProcess itself is being triggered on a thread of its own. Has anyone spotted the issue yet? This bug prevented the thread that called TerminateProcess from being unscheduled, and deadlocked itself in an infinite cycle.
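The paradox is easier to see in miniature. The sketch below models it with ordinary Python threads, which is an assumption made purely for illustration (the real kernel schedules guest threads very differently): a terminate routine running on one of the threads must not wait on itself.

```python
import threading

def terminate_process(threads, stop_event):
    """Signal every thread to stop, then wait for all of them except the caller."""
    stop_event.set()
    me = threading.current_thread()
    joined = []
    for thread in threads:
        if thread is me:
            # Waiting on ourselves could never complete, so skip the caller.
            continue
        thread.join()
        joined.append(thread.name)
    return joined

def run_demo():
    stop_event = threading.Event()
    threads = []
    result = {}

    def worker():
        stop_event.wait()  # idle until told to stop

    def terminator():
        # The termination request itself runs on one of the process's threads.
        result["joined"] = terminate_process(threads, stop_event)

    threads.append(threading.Thread(target=terminator, name="terminator"))
    threads.append(threading.Thread(target=worker, name="worker-1"))
    threads.append(threading.Thread(target=worker, name="worker-2"))
    for thread in threads[1:]:
        thread.start()
    threads[0].start()
    threads[0].join()
    return result["joined"]
```

Python refuses to join the current thread and raises a RuntimeError, so the self-wait is caught loudly here; a scheduler without that guard simply waits forever.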
gdkchan closes us out of this section with a regression fix from the large memory aliasing change a couple of months ago that was causing memory crashes on Windows. These were triggered most often when attempting to launch or switch to another game after one had already been run. Finally, the entire kernel memory allocator was rewritten to be a bit cleaner and more readable for our contributors and maintainers. There are no expected bug fixes or performance improvements here, but as always there may be some $5 JRPG that now boots. Remember kids, write clean code!
SERVICES:
Diablo II: Resurrected is a weirdly popular title that has been in limbo since the recent networking overhaul. After those changes, the game would crash on boot as the newer methods handled all read and write calls on the same thread, causing a deadlock if these were needed at the same time. By allowing the service to increase its thread count to 2, the game once again will consistently boot.

However, this fix wasn’t all that was needed to prevent other problems. By nature of allowing a process to be multi-threaded, you then need to handle the cases where one thread is processing while the other is trying to respond. This exact issue was causing other games that made use of the socket services, like Pokémon Sword/Shield, to crash on boot. The solution here was to return to a single-threaded approach for these requests, but to add a flag to prevent the blocking issue that caused Diablo to deadlock. In the future, returning to a multi-threaded approach will be the more accurate way to handle this, but the changes required to make everything play nicely would be large and time-consuming. For the time being this solution meets every game’s needs!
‘TimeZoneRule’ in the system time services got some love this month, as its use around the codebase was highly suboptimal and required a copy everywhere it was used. Making this ‘blittable’ (giving it a common representation that requires no special handling between managed and unmanaged code) can reduce JIT overhead in a few cases and give a potential boost to any areas where this may have been a performance bottleneck. This was followed up with a minor bug fix and a fix for how time zones were displayed on the UI.
MISC:
Everyone’s favorite section of quickfire changes!
VULKAN PROGRESS:
As stated in previous reports, the work on Vulkan itself is mostly complete, and if you are a proud owner of an NVIDIA GPU, then it’s a damn fine experience! However, as outlined from the outset, one of the major goals of implementing a Vulkan backend is to make sure it plays (somewhat) nicely with both AMD and Intel’s graphics cards and drivers. This is not a small task. Certain bugs are limited right down to a single generation of graphics card, and when all the developers can do is guess because they don’t own the cards themselves, progress on this front has been difficult to say the least. Ironically enough, it’s actually Intel here who should take a bow, because while there have been some bugs they tend to be consistent across architectures, a far cry from the frequency with which AMD seem to be able to conjure them.
But…. before I spin myself into another AMD hate rage, let’s look at some stuff that has been tracked down and fixed for you AMD people!
Pokémon Sword/Shield lighting and shadow bugs.
Before:

After:

Mario Kart 8 Deluxe character shadows:
Before:

After:

The Legend of Zelda: Breath of the Wild just… in general.
Before:

After:

Fire Emblem Warriors: Three Hopes:
Before:

After:

The final three issues were all Polaris-exclusive (anything RX 400/RX 500 and below), which was a real pain in the ass to track down. It turns out these cards just completely break 2D array textures with mipmaps when using the ‘ImageCreateCubeCompatibleBit’ flag. By forcing a copy and not using this feature, the issue can be resolved at the cost of slightly slower cubemap creation. The performance hit is not significant and should be relatively unnoticeable in most games tested. Thanks, AMD!
There are of course about twenty-thousand other bugs that Team Red™ have an exclusive monopoly over but we are, hopefully, approaching the endgame.
CLOSING WORDS:
The first half of 2022 has gone and left us all too soon, but there sure are some killer games releasing in the second half! A new Xenoblade, Splatoon 3, Nier (somehow), PERSONA (!!!), a new Sonic game and yet another Pokémon. Should be an action packed few months, eh? Once again, thank you to all our contributors for keeping us going over the years! It’s thanks to you guys that we can hopefully see all of the above working on Day 1.
As always, it’s the HR recruitment time of the report! If you know some C#, .NET, 3D-graphics or low-level engineering, you too can help this year be as smooth and bug-free as possible. If that's all wizardry to you, then donating to our patreon or being active in testing and bug-reporting really does help out a bunch.
See you all next time!
2022-07-08 13:47:43 +0000 UTC
What brings together retro gaming, modding tools and .NET UI creation frameworks? If you answered “the Ryujinx May Progress Report” then give yourself a medal; if not then you owe us a box of cookies. Some nice ones!
Before we get down to business and into the meat of the changes that were implemented in May, give our patreon goals a look:
Patreon Goals:
Vulkan GPU Backend - still in progress.
A public test build is delivered and is available here!
ARB Shaders - Goal reached in April 2021.
Work is ongoing alongside Vulkan, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Everyone ready? Those cookies in the mail yet?
GPU:
The GPU section starts this month, alas, with regression fixes. Luckily, they were spotted in the rather popular titles of Pokémon Legends Arceus and Xenoblade Chronicles 2, and as we should all know by now, those people are vocal. The first regression was causing some considerable vertex explosions and other lighting bugs in Legends Arceus, courtesy of an oversight in texture sync where two or more threads encountered and tried to read the same memory region. Forcing both threads to wait on the data present resolves this, and with that we hope ‘texture sync’ is not mentioned in one of these reports for at least a few months!
Before

After

The second issue originated in the vertex buffer calculation fix mentioned last month, which was crucial to stop certain titles using massive amounts of memory on boot and just crashing. Unfortunately, while the solution worked great for games such as Super Mario 64 and Perky Little Things, some other games also make use of these buffers, and calculating them in the same way can and will cause graphical issues.
Before

If the draw is not indexed, then we cannot calculate the vertex buffer size in the same way, as the result is ultimately meaningless. By returning to the previous calculation for these specific unindexed draws, cutscenes in Xenoblade Chronicles 2 return to their prior form.
After

A bug in the shader recompilation stages (triggered with a driver change or cache version bump) was also squashed this month by forcing Ryujinx to prefetch the GPU capabilities before it started to translate your shader cache. The backend threader only expects one thread to actually submit commands at a single time, and when lots of them asked for the capabilities at once, it really was luck of the draw whether a crash would occur.
Alright let’s move away from stuff breaking and onto our tame GPU developer riperiperi. Some say that only snaking in Mario Kart DS can make him smile… Or that every Tuesday, at exactly 12pm, he sees visions of a blue shy-guy performing a satanic ritual. All we know is that he managed to get two new Mario Kart titles working this month and it isn’t a coincidence.
Let’s start at the beginning with Mario Kart 64. A game that, no offense to any boomers, genuinely sucks. It had to be said. Either way, the recent Nintendo 64 NSO service includes this game as part of its collection, and it didn’t take long to spot some problems with rendering in certain areas, like the restore point previews, the blur effect on the game-select screen and also the Mario Kart 64 jumbotron!

The official Nintendo Switch Vulkan drivers used in the N64 NSO rendering, and the Nouveau OpenGL drivers used in plenty of homebrew applications, make use of disabling viewport transformations which your system APIs do not support. As such, this needs to be implemented manually by transforming the vertex shader to match a provided viewport. If all of that was complete jargon then just know that this fixes a lot of games that used to have missing HUD or UI elements.
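For those who want the jargon unpacked a little: the sketch below shows the kind of arithmetic such a shader patch needs. It assumes that with the viewport transform disabled the guest hands over positions already in pixel units, so the emulator has to map them back into the [-1, 1] range the host API expects. The function name and the exact formula are illustrative assumptions, not the code Ryujinx emits.

```python
def window_to_clip(x, y, viewport_width, viewport_height):
    """Map window-space pixel coordinates to the [-1, 1] clip-space range."""
    clip_x = (x / viewport_width) * 2.0 - 1.0
    clip_y = (y / viewport_height) * 2.0 - 1.0
    return (clip_x, clip_y)
```

With a 1280x720 viewport, the pixel at (640, 360) lands on the clip-space origin and the two opposite corners land on (-1, -1) and (1, 1).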

Before

After

Before

After

Moving forward in time we eventually reach Mario Kart DS, which can be emulated on the Switch using a homebrew emulator like MelonDS. This will crop up later, but for now we’ll have to ignore it and move straight to Mario Kart 7, for which the second of riperiperi’s fixes comes into play.
Citra does indeed run on the Switch via homebrew, but there were some serious graphical issues when it was rendered in Ryujinx. The first of which was seen above with the viewport transforms, but the second provides alternative StencilOP enum values that the Nouveau OpenGL driver can make use of. It’s hard to spot but this does fix missing shadows in some 3DS games and potentially other homebrew emulators.
Before

After

CPU/KERNEL:
If you’ve followed along so far, you may have got the impression that there has been a fair bit of focus on homebrew and some rather unconventional things this month, and you’d be absolutely correct. The graphical stuff is just one side of a massive coin involving everything from official and unofficial emulation, retro games galore and… Super Smash Bros. Ultimate modding?
The road to these changes has been months in the making and as such we have one hell of a tale to tell.
The story actually starts in late 2020 with the release of Super Mario 3D All-Stars; an interesting release to say the least, as it contained 3 games that were all, at least partially, emulated! While both Galaxy and Sunshine were quickly bootable and in-game, Super Mario 64 remained a problem child for over a year due to its dependence on Firmware 10.0.0 and, specifically, the JIT (Just-In-Time) services it brought.

At the same time as Nintendo’s experiments with emulation there were already plenty of fan-favorite emulators being ported to the system, such as MelonDS, PPSSPP, mGBA and projects like Skyline and ARCropolis gaining steam in the modding community. What ties all of these together is actually how they utilize code memory syscalls and can actively generate or self-modify code on the fly.
So let’s work backwards from the game that started all of this. Super Mario 3D All-Stars. Specifically Super Mario 64.
- This game needs the JIT services implemented.
- For these to function they depend on the code memory syscalls to also be implemented and these services will also get a bunch of other stuff working.
Sounds great right? One issue.
- Without getting extremely technical, to accurately implement these calls Ryujinx needs to support what’s called ‘memory aliasing’, and it needs to support it on the fast memory mapping modes (previously only the slowest Software mode was functional).
gdkchan started the journey as such by rewriting a large section of the memory management system to support memory aliasing on the fast memory manager modes (3). Unfortunately, or fortunately, the new host APIs required for these changes were only implemented on Windows 10 and beyond. This marked the first death knell for Windows 7/8.
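If ‘memory aliasing’ sounds abstract, it simply means the same backing memory being visible at more than one address. The sketch below demonstrates the concept with Python’s mmap module by mapping one temporary file twice; this is only an analogy for the idea, and has nothing to do with the host APIs Ryujinx actually uses.

```python
import mmap
import tempfile

def alias_demo():
    """Map the same backing storage twice and show a write appearing in both views."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(mmap.PAGESIZE)  # give the file one page of storage
        view_a = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        view_b = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        try:
            view_a[0:4] = b"JIT!"     # write through the first mapping...
            return bytes(view_b[0:4])  # ...and read it back through the second
        finally:
            view_a.close()
            view_b.close()
```

Self-modifying code relies on exactly this property: one view of the memory can be written to while another view of the same bytes is the one that gets executed.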
With this implemented, the code syscalls themselves could finally be implemented (2) and some real results were starting to take shape.
PPSSPP

Citra

MelonDS

Only the JIT services remained and here gdkchan chose a unique route. Normally we use a ‘HLE approach’ (High Level Emulation) for system services, where the service is reverse engineered and re-implemented directly in software. However, this particular service is quite different from anything else and is only initialized when a dedicated ‘PrepareForJIT’ function is called. In a really cool milestone, and to minimize the impact of running such a service when it isn’t actually needed, Ryujinx is capable of running the service directly off the firmware files in an ‘LLE’ (Low Level Emulation) fashion. Ignoring how awesome it is that our kernel and filesystem accuracy is capable of running real system services directly at playable speeds, this was the final puzzle piece in the 3D All-Stars issue. The N64 NSO emulator had also been released at this point so two birds with one stone and all that (1).



The modding plugin system Skyline and its most popular plugin ARCropolis also heavily abuse these services to elevate modding beyond simple replacements and into some truly wild creations. ARCropolis needed one further change to boot, which was the partial implementation of the GetProcessInfo atmosphere extension, but now functions at a core level.

List of mods used with links to their pages: https://pastebin.com/Rkj2eNE3
As is the nature of homebrew, we cannot guarantee that future versions of the programs or plugins listed above will keep working, and as such we recommend that, at least for the time being, all crashes or unexpected behavior be cross-referenced with hardware before raising issues on either our own issue tracker or the tool’s. We’d also like to extend thanks to the teams behind the modding plugins and to any mod creators who have remained patient with our lack of Web Applet functionality, and hope that in the future such specific support won’t be required. So thank you!
Hope you aren’t burnt out because gdkchan isn’t finished with the CPU changes just yet! Following the implementation of memory aliasing he quickly resolved a Windows-exclusive memory leak, and then further refactored (basically re-organised) the CPU interface to completely decouple it from the core emulator.
Emulators are made of multiple parts, and ideally each of these parts should be completely removed from each other and accessed through ‘interfaces’ to allow a very modular design. Most of Ryujinx is designed like this as anyone who has looked at our source code repository would know:

The benefits here are obvious in that this allows people to take Ryujinx and slot in their own ARMv8 recompilers if they wanted, such as Unicorn or Dynarmic for debug or specialized purposes.
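As a toy illustration of that modularity, here is a sketch in which the emulator core only knows about an interface and any backend that satisfies it can be slotted in. Every class and method name here is hypothetical; Ryujinx’s real interfaces are written in C# and look nothing like this.

```python
from typing import Protocol

class CpuBackend(Protocol):
    """The only thing the emulator core is allowed to know about a CPU backend."""
    def execute(self, address: int) -> str: ...

class JitBackend:
    def execute(self, address: int) -> str:
        return f"jit: ran block at {address:#x}"

class InterpreterBackend:
    def execute(self, address: int) -> str:
        return f"interpreter: stepped from {address:#x}"

class Emulator:
    def __init__(self, cpu: CpuBackend) -> None:
        self._cpu = cpu  # the core never cares which backend it was given

    def run(self, entry_point: int) -> str:
        return self._cpu.execute(entry_point)

print(Emulator(JitBackend()).run(0x1000))
print(Emulator(InterpreterBackend()).run(0x1000))
```

Swapping the backend is a one-word change at construction time, which is the whole point of the decoupling.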
To conclude the month, gdk went on to rewrite the SVC handlers to use the new source generators that .NET 6 provides, instead of Reflection.Emit. This shifts a compute cost that used to be paid at run-time to compile time instead. It also has the further benefit of removing the last hurdle to make Ryujinx ready for .NET 7’s native Ahead-Of-Time compilation; a feature that should reduce startup times and improve program responsiveness/latency all-round.
UI:
It’s been a while since there’s been enough UI work to make a whole section for it, but these are the times we live in! May certainly has enough to warrant a dedicated section however with the merge of the first part of the hotly anticipated move to Avalonia.


As mentioned above this is just part 1, and further work needs to be undertaken before the UI becomes the default and GTK is banished forever. The current roadmap for Avalonia:
Part 1: UI GPU backend, Main window and App host (merged 15/05/22).
Part 2: Settings window and all its child windows and controls (currently open).
Part 3: Every other non-settings related window and controls.
Part 4: (if required) General cleanup and fixes. GTK begone.
Nevertheless, the Avalonia project was added to the build scripts for anyone who tests our pull requests, the build project itself was cleaned up and standardized with the rest of the program and a GTK specific DPI-aware workaround was removed to provide a slightly sharper image on systems with two monitors, as Avalonia handles this natively.
Shortly after, a system scaling bug was noticed where, if the user had set a scale factor above 150% in their OS, the framebuffer would only present 1/4 of the final image. By scaling this end framebuffer to match the OS scale factor too this issue was resolved.


INFRA/MISC:
As always, the backend infrastructure that Ryujinx makes use of for everything from input to rendering and the filesystem is constantly on a development path of its own. This month saw the following infrastructural changes:
Some HID services were also cleaned up and improved this month which resulted in RetroArch and potentially other homebrew being bootable. Ties in nicely doesn’t it!

Trendsetters as ever, this month also seems to have been the rallying cry to the death of Windows 7/8 in the emulation scene, and with good reason. Lack of driver updates, modern memory mapping APIs and lack of .NET 6/7 support are all problems that cannot be ignored forever, and so, on April 24th we made the announcement that support for those older systems running Windows 7, 8 and Windows 10 versions prior to 1803 would be dropped from June 1st 2022.
This started a bit of a debate online, particularly on Reddit, but a warning was added to Ryujinx to alert those legacy OS users to the upcoming changes, which were implemented on June 1st at 00:00 UTC. All Ryujinx builds now directly target Windows 10 and above in their build scripts, and we are no longer accepting issues from or offering support to users who are on those older OSes even if by some miracle the software still boots.
CLOSING WORDS:
If February was the month of the CPU, April the month of the GPU then May surely is the month of ‘Cool-Stuff’. Emulators within emulators, homebrew titles running left, right and center and modding support being improved made it a wild ride. We always say this but anyone who’s joined us on this journey via patreon donations, code contribution or being active around the community truly cannot be thanked enough!
If reading this you’re an emulation fanatic like us and know some C# and .NET, our door is always open to see new code contributors who can take on anything from single-line typos to whole service implementations. If that sounds above your paygrade then simply giving us feedback, opening issues on GitHub or just reporting compatibility helps us out enormously too!
2022-06-09 21:19:53 +0000 UTC
April… a month deriving from the Latin word ‘aperire’ or ‘to open’. How does this relate to Switch emulation? You can probably think of a few metaphors, but honestly I just thought it sounded kinda interesting. This month we’re covering some major changes and also a pretty meaty section on the recent progress to the Vulkan backend, which was absent last month.
Before all that though, check out our patreon goals and progress toward them:
Patreon Goals:
Amiibo Emulation
Merged into the main build in March 2021.
While compatibility is close to being perfect, there are still some improvements to come for Amiibo which can be tracked on the associated GitHub issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles
Merged into the main build in April 2021.
Vulkan GPU Backend - still in progress.
A public test build is delivered and is available here! See the end of this month's report for some more details.
ARB Shaders - Goal reached in April 2021.
Work is ongoing alongside Vulkan, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Not yet met.
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met.
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Alright. Let’s go!
GPU:
We begin this month’s report on a sad, sorry tale. Rune Factory 5 received its western launch in April and from everyone at Ryujinx we’d like to extend our thoughts and prayers to anyone who is yet to recover from playing… A moment of silence.

Become one with the void.
Awful art direction and thinly-veiled satire aside, the game did have a few graphical bugs at launch which were sourced to an error in the GLSL shader generation stages. Multisample and buffer textures do not take an LOD parameter, and so removing this from the GLSL generator for these particular stages resolves some of the missing effects that users reported.

Pinball FX3 was a title that worked flawlessly if you chose to play on the base version of 1.0.0, but suddenly just presented a black screen if any sort of update was applied. The cause of this problem was narrowed down to some troublesome multisampled 2D textures. If the texture to be copied did not exactly match the texture in the cache (for instance if the format or size changed), then this match would simply fail and cause a beautiful void of darkness. Allowing the copy texture view to not exactly match the cache here fixed the bug, and all four of the Pinball FX3 fans out there can enjoy the updated version!

Primitive restart, a vital feature in vertex rendering (lines and stuff), was somewhat broken in Project Diva Mega Mix but rather strangely was completely fine when tested against Vulkan. It turns out that in OpenGL the feature was being applied to both indexed and non-indexed draws, whereas in Vulkan the specification explicitly ensures only indexed draws make use of this. Luckily the GPU itself keeps a register that controls this behavior, and by reading this register as false for non-indexed draws, the issues can be resolved in a similar fashion to Vulkan.
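To show what primitive restart actually does, here is a small sketch that splits an index stream into separate strips wherever the restart marker appears, and leaves the stream alone for non-indexed draws. The function and its parameters are invented for illustration, and the 0xFFFF marker is just an example value.

```python
def split_strips(indices, restart_index, restart_enabled, indexed):
    """Split an index stream into strips at every restart marker."""
    # Restart only makes sense for indexed draws, so ignore it otherwise.
    if not (restart_enabled and indexed):
        return [list(indices)]
    strips, current = [], []
    for index in indices:
        if index == restart_index:
            if current:
                strips.append(current)
            current = []  # the marker ends one strip and begins the next
        else:
            current.append(index)
    if current:
        strips.append(current)
    return strips
```

With restart honored on an indexed draw, [0, 1, 2, 0xFFFF, 3, 4, 5] becomes two separate strips; apply the same rule where it does not belong and geometry gets cut in places the game never asked for.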

Before

After
A less visual improvement was made this month with gdkchan laying some groundwork for some interesting titles to boot in the future. Both Super Mario 64, in the 3D All-Stars collection, and Perky Little Things would make use of enormous vertex buffers when booting and subsequently crash with memory-exhaustion errors. This change went through a few different approaches to the problem as far back as December, but now the vertex buffer sizes are reduced by calculating them from the buffer type. This brings them down to a range comfortable enough that Ryujinx no longer complains. Further changes are required for these titles to boot but they aren’t far off now!
From the future: Between writing that sentence and when you (yes you!) are reading this SM64 does now boot! The changes will be covered next month but for now here’s a candid shot!

‘The cake was a lie’ - Princess Toadstool
Kirby and the Forgotten Land was fully playable on Day…. -200 and something (we tested on a few old LDN builds!) but this was not without a general spattering of minor graphical issues. Text and some other textures were noticeably wrong. Some were overlapping incorrectly, some were being cut off, menus could flicker etc. And for all that, the fix in total was just a single line of code. Checking if the graphical scissor was enabled before clearing was the key here, as before, a clear would take place regardless of if the scissor was enabled or disabled, thus causing some undue side-effects. These namely manifested themselves into all kinds of text and texture clipping, incorrect transparency and general wrongness!

Before

After
If you have a younger family relative, are a massive fan of a mega-franchise from George Lucas, or are head-over-heels in love with small pieces of colorful plastic, then this month also had a game for you! Lego Star Wars: The Skywalker Saga brought another OpenGL exclusive bug into the spotlight in the form of fragment clamping. Usually all fragment values are clamped within the range of [0,1] and normally this is completely fine but as we all know it’s never that simple, is it? LSW:TSS makes use of a specific data type called an ‘SNorm’ which actually has a full range of [-1,1] and so when these were clamped to [0,1] problems would arise.

Luckily, having isolated such a fundamental problem, the fix was also written and approved on Day 1 with these ‘SNorm’ fragments being exempt from the normal clamping restrictions. Character and environmental reflections are now rendered correctly and no longer make everyone look like Force ghosts.
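The difference between the two clamping rules fits in a few lines. This is a minimal sketch with a made-up function name, assuming a single float stands in for one fragment component.

```python
def clamp_fragment(value, is_snorm):
    """Clamp a fragment component to the range its format can represent."""
    # SNorm formats are signed, so their lower bound is -1 rather than 0.
    low = -1.0 if is_snorm else 0.0
    return max(low, min(1.0, value))

print(clamp_fragment(-0.5, is_snorm=False))  # 0.0, the negative half is lost
print(clamp_fragment(-0.5, is_snorm=True))   # -0.5, preserved
```

Under the old behavior every negative SNorm value was flattened to zero, which is exactly the kind of thing that wrecks a reflection pass.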

Since riperiperi’s implementation of ‘Texture Sync’ a couple of months ago, Xenoblade fans have finally had to touch grass and submit some meaningful bug reports for the first time. Some nasty and, more annoyingly, random flickering and lighting bugs could be traced back to this change and resulted in the largest black market trading of old Ryujinx builds I personally have ever witnessed. While this chapter of the game’s lifecycle was rather amusing, riperiperi finally descended into the depths of dice-roll hell to try and crack down on whatever the new problem was. It turns out there were a couple of flaws in the original implementation, mainly revolving around some sync methods exiting early without resetting the action flags. This could cause the action to never be registered, breaking the affected texture for as long as it remained on-screen, or even until the game was reset.

MY EYES!

Forced to redact the party in Paint on spoiler grounds…
By resolving these edge cases and giving some areas a general re-order the main regressions caused by the original change should overall be a thing of the past once again!

It’s fixed now. Source: trust me bro.
AFTER
Some smaller GPU changes also took place this month: gdkchan implemented the VMAD shader instruction, which is mainly used in homebrew applications that utilize Nouveau OpenGL as the guest API; merryhime took a crack at optimizing the Lop3Expressions; and a general de-cluttering of the graphics abstraction layer (GAL) was undertaken.
Alright with that out of the way let’s talk about some game changers.
Shader Cache 2.0:
Shaders are an ever-present and immutable fact of modern system emulation. They exist, they make things look pretty, but as all of you are likely aware, they don't play nicely on a system they weren't designed for. While Ryujinx can translate these shaders into something your PC can understand, this process takes time, usually longer than the render-time of the frame, which causes stuttering. Project Salieri, our first implementation of a ‘Shader Cache’, managed to mitigate most of the problems a user would face when encountering the same shaders for a second time. That was in 2020 though. Games in 2022 are more commonly using shader types and specialization that the original cache just wasn’t designed to deal with and in these scenarios it is left solely up to your GPU driver to remember what it's already seen. This isn't ideal as driver caches are prone to invalidation and effectively start from scratch at every driver update. Shader cache 2.0.... take the stage!
This update, delivered by Ryujinx's creator gdkchan, aims to solve multiple issues with the original implementation, including but not limited to: smaller cache sizes, shaders using bindless textures now being cacheable, shader specialization and some other quality-of-life changes like being able to close the program while shaders are being loaded and faster shutdown times if shaders have been cached in a session.
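To make the idea of a specialization-aware cache a little more concrete, here is a minimal sketch. It is not Ryujinx's implementation: the `ShaderCache` class, the `translate` callback and the shape of the specialization state are all hypothetical, and the real cache is persisted to disk rather than kept in a dictionary.

```python
import hashlib


class ShaderCache:
    """Toy in-memory cache keyed by guest shader code plus specialization state."""

    def __init__(self, translate):
        # `translate` stands in for the slow guest-to-host translation step
        # (hypothetical callback, not a real Ryujinx API).
        self._translate = translate
        self._entries = {}
        self.misses = 0

    @staticmethod
    def _key(guest_code, spec_state):
        # Hash both the code and the state it was specialized for, so the same
        # guest shader used with different state gets separate entries.
        digest = hashlib.sha256()
        digest.update(guest_code)
        digest.update(repr(spec_state).encode())
        return digest.hexdigest()

    def get(self, guest_code, spec_state=()):
        key = self._key(guest_code, spec_state)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._translate(guest_code, spec_state)
        return self._entries[key]
```

The point of the sketch is only that the translation cost is paid once per unique combination of code and state; any later request for the same combination is a dictionary lookup.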
Shader specialization fixing Yokai Watch 1 flickers:

Before

After
The following (non-comprehensive) list of notable games can now cache a bulk of their shaders:
- Shin Megami Tensei V
- Mario Party Superstars
- The Witcher 3: Wild Hunt
- Pokémon Brilliant Diamond and Shining Pearl
- Lego Star Wars: The Skywalker Saga
- Many more both past and future…
BDSP specifically is a unique case where the game seems to use over 2000 shaders just to boot. These could previously not be cached, and due to the rather sluggish nature of the OpenGL shader compiler, users have noticed since release that these games took an indecent amount of time to boot. With the new cache, from the 2nd boot onwards this pain will be heavily reduced; we noted boot times reducing from over 80 seconds to under 15 with a full shader and PPTC cache!

Wait for it… Winner: New Cache
There are other advantages to the new cache format too. One of the biggest is that it lends itself well to flexibility between graphics APIs. This means that when Vulkan is integrated the cache formats can be almost identical in nature; this will be touched on more at the end of this report!
CPU:
While April was undoubtedly the month of the GPU, our CPU backend still received a couple of important improvements with a major fix in regard to Amiibo usage.
Merryhime implemented the T32 load/store single instruction set which is another step on the road to ‘Ni no Kuni: Wrath of the White Witch’, and perhaps some unknown others, booting. More changes are needed to get in-game but with each new error we’re ironically one step closer!
Amiibo emulation is in a reasonably solid state, but in a few titles users were experiencing massive slowdowns when the menu was opened, and these would persist until the game was hard reset.
This was a regression from a prior PR that optimized the tail merge passes and went under the radar for nearly a year. gdkchan stepped up to add additional checks in the tail merge methods, which resolves these slowdowns. Some of the impacted titles include Animal Crossing: New Horizons, Kirby and the Forgotten Land and potentially other titles when used with Amiibo.

Pictured: Kirby demanding new plastic toys!
AUDIO:
Surprisingly popular sports title: ‘MLB The Show 22’ launched in April and unfortunately it isn’t currently in-game. However, similar to Ni no Kuni above, the steps toward this have already begun with the first being the implementation of multistream related Opus decoding functions in the audio service. Opus, according to their own webpage, is a “royalty-free, highly versatile audio codec… unmatched for interactive speech and music transmission over the internet” so there may be a considerable number of past and future titles that can make use of this addition.
Our audio renderer, Amadeus, also received two updates this month. The first was the laying of the boilerplate for the newest revision of the renderer, REV11, by Thog. This revision came with firmware 14.0.0 and changed the channel disposition for legacy audio effects such as: Delay, Reverb and Reverb 3D. Followup work is planned to fully re-implement these effects, but for now this change redirects them to the legacy system as a temporary solution.
The second change is directly linked to what was just mentioned with Thog starting the work to re-implement these functions, beginning with improvements and fixes to delay effect processing. No changes are expected in games with these audio adjustments, but the hope is audio-related problems in future titles can be avoided should they use modern revisions of the renderer.
SERVICES/MISC:
First-time contributor german77 resolved an issue in ‘Flip Wars’ where the game would insist that the controller was constantly disconnecting, while users were left baffled as other games didn’t exhibit this behavior, and as far as they could tell their controller was set up completely normally.

It turned out that certain games expect an AcquireNpadStyleSetUpdateEventHandle event to be signaled during gameplay regardless of the controller’s connection status. By providing this event, the random controller disconnects in this title were resolved and the game is now fully playable.

Input improvements continued with a second first-time contributor (but full-time complainer) Haronee fixing a long-standing bug in the native motion controls implementation. Users had noted that the axis that they held was oftentimes inverted, and if they held left they’d actually receive right or vice versa.

It turns out the problem really was as simple as it sounds. The Z axis was incorrectly attributed as positive when it should have been negative. This sign flip fixes issues in games where motion controls using the native setting (not cemuhook) would be backwards.

LibHac, the FileSystem service we utilize, also received a version bump this month to 0.16.1. This update fixes a regression where NSO titles such as NES and SNES online would crash on startup, adds support for reading XCI files that contain an initial data/key area and finally adds key sources for firmware 14.0.0 onwards. It is worth noting that this is required for FW 14.0.0 upwards to function, and be warned that if you share firmware between master and the older LDN build, LDN will start to get very angry and write lots of red text in the console!
To wrap up this main section we’ll enter the quickfire changes round:
VULKAN PROGRESS:
Ah, the section everyone seems so very interested in. This is my version of “watch to the end of the video!” in text format. Luckily this isn’t clickbait and we have a lot of progress to share with you on what is quickly turning out to be an AMD ‘Whack-a-mole’ experience. One bug gets squashed and three more games start to flicker! Before that though let’s talk about what’s going well.
NVIDIA Supremacy
A significant milestone was reached this month, with almost every title that the testing team possessed being functionally equivalent on Vulkan using SPIR-V and OpenGL using GLSL. So let’s dive into some titles that were once borked and are now not-borked.
Skyward Sword HD:


Ugly brown duckling no more!
Pokémon Brilliant Diamond and Shining Pearl:

These cash-grabs also finally boot! Rejoice AMD and Intel stans.
Luigi’s Mansion 3:
Virtually every major title we threw at it is now on equal footing to the OpenGL backend, but this comes with the caveat of saying ‘on NVIDIA GPUs’. More work is needed to fix some rather ridiculous bugs team red seem to be conjuring from the depths of hell each passing day.

The heat has made Mario rather prickly lately.

What does AMD have against Italian plumbers?
As usual, if anyone works for/knows someone who works for AMD: can you take every opportunity to remind them that an Intel iGPU from 2017 is somehow causing fewer headaches than some flagship Radeon dGPUs? Looking at you Polaris!
Shader Cache 2.0 continued:
Back to some good news, earlier in this report the rewrite of the shader caching system was discussed at great length, but details on how this would impact Vulkan were purposefully omitted as the changes here are extensive and part of the reason many more games are functional.
Shortly after the merge of the cache rewrite, gdkchan added the new system to Vulkan, which means that both multithreaded SPIR-V compilation and shader caching are fully functional across all vendors! Exciting stuff even for our testers.
Indirectly, the new cache also fixed some Vulkan-exclusive graphical issues, particularly when multisampling was used to render the image. The two most common issues were in Super Smash Bros. Ultimate’s title menu and Pokémon Legends: Arceus’ town and battle style cards.

Before

After

Before

After

Before

After
That's a wrap! Some of the eagle eyed amongst you may have noticed how seamlessly we skirted around the mention of the newly released Nintendo Switch Sports, and to that I raise a glass. Unfortunately, it did break our rather considerable streak of day 1 playable first-party titles that dates back to sometime in 2020, but progress is being made! The game boots on our master builds but will promptly crash shortly after reaching the menus. gdkchan, juggling around five other projects, found the time at launch to get some fixes in place, which we hope can be finalized soon! Check out some horribly compressed footage below (spot the JoyCon drift challenge):

As usual, if you’re interested in emulation and know some C#, we’re always delighted to see new code contributors who can tackle anything from typos to rewriting entire systems! If that sounds above your paygrade then simply giving us feedback, opening issues on GitHub or just reporting compatibility helps us out enormously too!
See you soon!
2022-05-11 16:11:09 +0000 UTC
Here we are once more at the end of Q1 and progressing steadily toward the halfway mark of 2022! This month saw improvements to almost every aspect of Ryujinx, a new console is loose which got the emulation community all of a flutter and some killer new releases that, guess what, ran on day one!
Before all that though check out our patreon goals and progress toward them:
Patreon Goals:
Amiibo Emulation
Merged into the main build in March 2021.
While compatibility is close to being perfect, there are still some improvements to come for Amiibo which can be tracked on the associated Github issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles
Merged into the main build in April 2021.
Vulkan GPU Backend - still in progress
A public test build is delivered and is available here!
ARB Shaders - Goal reached in April 2021.
Work is ongoing alongside Vulkan, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Almost there!
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx. All our contributors currently only work on the project in their spare time!
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Alright. Let’s go!
GPU:
Since the initial release of Pokémon Legends Arceus there was always a thorn in our side when it came to AMD and Intel GPUs. While on the latest drivers the game ran great for them using our experimental Vulkan build, the same could not be said if any user was unfortunate enough to use OpenGL instead. NPCs and Pokémon alike would be invisible or have various graphical glitches which we’ve come to expect of those vendors. Rendering an image involves not one but multiple shader stages, which sequentially form a pipeline in order to show you your elemental animals. If a shader stage needs a vector element that the previous stage did not provide, then we start to run into problems as the contents are technically ‘undefined’. On Nvidia this wasn’t a problem as those inputs are always consistent across stages, but for other vendors it broke things… a lot of things.

The fix from gdkchan makes changes to the shader translator to explicitly initialize vector elements that are used by the next stage, but might not be written on the current one. The effect is that while performance may still be poor for AMD, the graphical output is correct.
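As a rough illustration of the idea, the sketch below works out which output components need an explicit initializer. The function names, the `(attribute, component)` representation and the default value of `0.0` are all assumptions made for the example; the report does not say how the translator represents these or which value it writes.

```python
def components_to_initialize(written_by_current, read_by_next):
    """Return the (attribute, component) pairs that the next stage reads but
    the current stage never writes, in a stable order."""
    return sorted(set(read_by_next) - set(written_by_current))


def emit_initializers(missing, default=0.0):
    """Produce pseudo-assignments to place at the top of the current stage.

    The default value is an assumption for illustration only.
    """
    return [f"out_attr{attr}.{comp} = {default};" for attr, comp in missing]
```

With every component the next stage reads now having a defined value, the result no longer depends on how a particular driver treats unwritten outputs.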

There were a couple of edge cases to clean up here: while this was a fairly small change, the shader translator is used in every single game, every single time an image is drawn. Two follow-up fixes, initializing indexed inputs used on the next shader stage and not initializing geometry shader passthrough attributes, were therefore made in order to prevent regressions in a number of titles including Game Builder Garage (again…) and WarioWare: Get it Together!


When Brilliant Diamond and Shining Pearl were turned loose upon the world in all their gen 4 glory, users couldn’t help but notice the game took… a while… to… boot………………….
This problem is actually two-fold (and if all goes to plan a future progress report will explain the second half of the issue) but this report will focus on the first of these fixes.
Some games have a buffer usage pattern where they use small adjacent buffers with increasing addresses. Previously this caused a new buffer to be created each time, merging the existing overlapping buffers. That is effectively resizing the same buffer over and over, page by page, which is very inefficient. The fix therefore simply allows the buffers to expand at a rate of 1.5x which, while possibly using a bit more memory, massively cuts down on the buffer creation bottleneck.
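To get a feel for why the growth factor matters, here is a small simulation. It is a simplified model under stated assumptions, not the emulator's buffer cache: `count_buffer_creations` is a hypothetical helper, the 4 KiB page size is assumed, and real buffers are merged by address range rather than tracked as a single capacity.

```python
PAGE_SIZE = 0x1000  # assumed page granularity for this sketch


def count_buffer_creations(required_ends, growth=1.0):
    """Count how often a backing buffer has to be recreated.

    required_ends: increasing end offsets (in bytes) that the game touches.
    growth: factor applied to the old capacity when a new buffer is created;
            1.0 models the old "grow exactly as far as needed" behaviour.
    """
    capacity = 0
    creations = 0
    for end in required_ends:
        if end > capacity:
            # Grow to whichever is larger: what is needed right now, or the
            # old capacity scaled by the growth factor.
            capacity = max(end, int(capacity * growth))
            # Round up to a whole number of pages.
            capacity = -(-capacity // PAGE_SIZE) * PAGE_SIZE
            creations += 1
    return creations


# A game touching 256 adjacent pages, one page further each time.
requests = [PAGE_SIZE * i for i in range(1, 257)]
page_by_page = count_buffer_creations(requests, growth=1.0)
with_growth = count_buffer_creations(requests, growth=1.5)
```

In this model the page-by-page strategy recreates the buffer on every request, while the 1.5x strategy only does so a handful of times, at the cost of some capacity that may never be used.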
Other games benefit from this including Super Mario Galaxy which notoriously had an extremely laggy title screen for the first minute or so before apparently randomly jumping up to 60fps.
Before:

After:

Galaxy also got some more love in the form of the S8D24 texture format being implemented. This is used for starbit interactions throughout the game and so was a brick-wall against actually being playable from start to finish. Galaxy isn’t out of the woods yet but this was a major step in the right direction.
Before:

After:

The modern internet age is very firmly upon us and our users wish to record, stream and otherwise broadcast their gameplay around the world. There was a small snag however; most capture programs such as OBS or even simple overlay programs like RTSS make a lot of assumptions on how a program will present frames which do not always turn out to be correct, especially when the presentation method isn’t what most native games would consider ‘normal’.


By explicitly providing capture software and overlays with information about the framebuffer and viewports, riperiperi managed to mitigate most of these issues!


Ryujinx was the first Switch emulator to implement NVDEC video decoding back in 2018 but the implementation still isn’t perfect and the root of one of these limitations is actually in extremely old technology. If you were born after the new millennium then there is a chance you never had the delightful experience of dealing with interlaced video! Video resolutions are usually stated in such formats as ‘720p’, ‘1080p’ etc. and this ‘p’ stands for progressive scan. What some of the younger audience may not know or remember is that formats like ‘480i’ and ‘1080i’ also used to exist and the ‘i’ here stands for interlaced. Progressive scan draws each line of the video one after another to construct a whole frame while interlaced video only constructs every other line of the image in a single frame; this was a useful compromise to send less data over the air before digital signals but can result in visual artifacts depending on your deinterlacing method.

While I’m sure this is very interesting, a lot of you will be wondering what this has to do with the Switch, a console made in 2017, well after progressive scan took the spotlight. As it turns out, some games like to go for that authentic feel with retro-styled videos that actually do make use of interlaced video! NVDEC does support interlaced footage but our implementation, built on top of FFmpeg, did not; this was corrected this month. FFmpeg now provides us with the full progressive scan frame, from which we can extract the even and odd fields to reconstruct the original interlaced image. From here there are a couple of options to deinterlace it back down to progressive scan so that it displays correctly on modern displays. BOB and Weave produce artifacting in low and high motion scenes respectively (see the examples below):
Weave:

BOB:

Therefore ‘Motion adaptive’ deinterlacing is used to try and mitigate the flaws of both methods and provide a more stable and artifact free final image. This allows games such as Layton's Mystery Journey to show its cutscenes in their full form!
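For readers who want to see the three approaches side by side, here is a toy sketch operating on frames stored as lists of pixel rows. It is a generic textbook-style illustration, not the code Ryujinx uses: the function names, the per-pixel motion test and the `threshold` value are all assumptions.

```python
def split_fields(frame):
    """Return (top, bottom) fields: the even-numbered and odd-numbered rows."""
    return frame[0::2], frame[1::2]


def weave(top, bottom):
    """Interleave two fields into one full-height frame (combs on motion)."""
    out = []
    for top_row, bottom_row in zip(top, bottom):
        out.append(list(top_row))
        out.append(list(bottom_row))
    return out


def bob(field):
    """Line-double a single field (halves vertical detail in still scenes)."""
    out = []
    for row in field:
        out.append(list(row))
        out.append(list(row))
    return out


def motion_adaptive(top, bottom, prev_bottom, threshold=8):
    """Keep the top field's rows; fill each in-between row per pixel.

    If the bottom-field pixel barely changed since the previous bottom field,
    treat it as static and weave it in. Otherwise treat it as moving and
    interpolate from the top-field rows above and below instead.
    """
    height = len(top)
    out = []
    for i in range(height):
        out.append(list(top[i]))
        below = top[i + 1] if i + 1 < height else top[i]
        row = []
        for x, pixel in enumerate(bottom[i]):
            moving = abs(pixel - prev_bottom[i][x]) > threshold
            row.append((top[i][x] + below[x]) // 2 if moving else pixel)
        out.append(row)
    return out
```

The design trade-off is that the adaptive version needs one extra field of history to decide, pixel by pixel, which of the two simpler methods is the safer choice.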

To wrap up March’s GPU section, gdkchan once more struck a home run by fixing an issue that plagued almost every guest OpenGL game on the Switch. By ‘guest OpenGL’ here we refer to the rendering API the Switch itself uses to display the game, not the API that Ryujinx is using to emulate this. The Switch is capable of using its own proprietary NVN API (for most first party titles), Vulkan (notably used in 3D All-Stars and games like Hades) but also OpenGL. However, a lot of OpenGL games had glaring, but very similar, visual issues:
Zombies Ate My Neighbors and Ghoul Patrol

Putt-Putt Travels Through Time

Cartoon Network: Battle Crashers

Digimon Story Cyber Sleuth: Complete Edition

Snack World

As is apparent, the problem in all of these titles looks to be the same, and that holds true for the fix! It turns out the Tegra’s OpenGL driver has a bit of a quirk when copying data from block linear textures. This behavior is now fully emulated in order to fix the texture corruption issues these games exhibited.





CPU:
Improvements to the CPU recompiler continued in full-force this month with merryhime retaining the crown.
Aside from raw instruction implementations and fixes, the recompiler is now forced to exit on trapping instructions to ensure no code after them is executed, and a small fix was applied to some kernel threading functions to more closely match hardware. To finish up, the order in which some memory barriers had been arranged was changed to align with conventional theory. No games should be affected here, but these changes should make Ryujinx more resilient as an emulator and a better resource for many enthusiasts.
Audio, Kernel and Services:
Thog started strong in March, fixing a bug in the audio renderer which was causing a couple of games to crash on boot, such as ‘Mononoke Slashdown’ (which now heads in-game), and which caused Paper Mario: The Origami King to experience random crashes while in-game. Both of these issues should now be a thing of the past!

Animal Crossing: New Horizons continues its warpath to add endless new service dependencies with every update and 2.05 was no different. Ac_K once more descended into the trenches and implemented the ‘OLSC: GetSaveDataBackupSetting’ service which was introduced in firmware version 10.0.0. This allows Animal Crossing to boot if you have a valid save file and user profile attached to it… until the next update!

A second service implementation this month was only made necessary when a lot of Ryujinx’s networking infrastructure was recently rewritten. With the “Guest Internet Access” setting enabled Splatoon 2 would fail to launch, and it turns out that for all this hassle it’s basically just trying to check what time it is! ‘IEnsureNetworkClockAvailabilityService’ was thus partially implemented (we don’t actually contact Nintendo for the time!) and Splatoon 2 is none the wiser! The game can now be booted once more when this setting is enabled.
A small fix to the closing process of the emulator was made to mitigate NullReferenceExceptions when closing Ryujinx or stopping emulation. A limit was also added to the number of events the ‘GetDisplayVSyncEvent’ service could signal; this allows .hack//G.U. Last Recode to achieve playable framerates with logging enabled, as previously the VsyncEvent signal would trigger multiple times and spam the logger.

On the topic of massive slowdowns, anyone who’s played Fire Emblem: Three Houses on Ryujinx will know that while most of the game runs extremely well there is a single outlier. And that outlier isn’t crazily animated battle sequences or scenes with 20 NPCs on screen, it’s the calendar…

Yes that’s right, this one section of the game can bring Ryujinx to its figurative knees with a staggeringly swift 15 FPS on our systems. The culprit, you may ask? Once again it’s those pesky network services that seem to be infecting every game on the planet these days. At a simple level the slowdown is caused by Ryujinx constantly asking your PC what the local IP and other network info is: the game asks for this info and Ryujinx simply tries to give it the info it needs… every frame. The solution that first-time contributor JumpmanSr proposed here is to cache local network information provided by the host system and only update this when the network configuration actually changes. This way Ryujinx can give the game all the correct information without having to go through the expensive cycle of asking the host system every single time. The FPS improvement here is dramatic and finally resolves one of the last outstanding issues with this game!
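The caching pattern described above can be sketched in a few lines. Everything here is illustrative: `NetworkInfoCache`, `query_host` and `on_network_changed` are hypothetical names, and how the real change notification is delivered by the host is not covered in this report.

```python
class NetworkInfoCache:
    """Cache the result of an expensive host query until told it is stale."""

    def __init__(self, query_host):
        # `query_host` stands in for the slow call that asks the operating
        # system for the local IP and related details (hypothetical callback).
        self._query_host = query_host
        self._cached = None
        self._valid = False

    def on_network_changed(self):
        """Call this from whatever hook reports a host configuration change."""
        self._valid = False

    def get(self):
        """Return the cached info, querying the host only when it is stale."""
        if not self._valid:
            self._cached = self._query_host()
            self._valid = True
        return self._cached
```

A game calling `get()` every frame then costs one host query per configuration change instead of one per frame.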
MISC:
As one reddit user (who will remain anonymous) so graciously pointed out after last month’s progress report: “ctr+f Steam Deck: 0, lame”, so this month we will give a mention to the Steam Deck (already 2 results). While not the most powerful portable machine you can buy, it has truly captured a lot of people’s hearts and minds, including a couple of the development team who may receive their own sometime in Q2/Q3.
With this in mind, Ryujinx became only the second .NET 6 application in the world to go on a diet and flatpak itself! Thog put in a lot of work behind the scenes to prepare Ryujinx’s infrastructure for this format, which for .NET applications is actually far from trivial. Either way, Linux users can now find us on Flathub and worry not about falling behind as our build-bot will auto-update the Flatpak too!

Onto some more GUI related adjustments, a feature that has been so highly requested that it’s had not one but three separate pull requests filed is the ability to hide the console logging window! Anyone who doesn’t wish to be plagued with the anxiety of seeing all the magic happening behind the curtain can now simply uncheck the “Show Log Console” box in options and never look back! This option is only available on Windows as Linux can already hide the console via command-line arguments.
Another first-time contributor, darko1979, implemented a new option to rotate the emulated analog sticks by 90 degrees, which should make games that utilize sideways Joy-Cons such as Super Mario Party much more accessible. Use this setting in conjunction with axis inversion to achieve the desired stick rotation!
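If it is not obvious why one rotation setting plus axis inversion covers every sideways orientation, the small sketch below may help. It assumes a clockwise rotation and a "Y is up" convention purely for illustration; the report does not state which direction the option rotates, and the function names are hypothetical.

```python
def rotate_cw_90(x, y):
    """Rotate a stick vector 90 degrees clockwise (assuming +Y means up)."""
    return y, -x


def invert(x, y, invert_x=False, invert_y=False):
    """Apply optional per-axis inversion."""
    return (-x if invert_x else x), (-y if invert_y else y)


# Pushing the stick up becomes "right" after a clockwise rotation...
up_rotated = rotate_cw_90(0, 1)
# ...and inverting both axes afterwards turns that into "left", which is the
# same as rotating counter-clockwise instead.
up_rotated_inverted = invert(*rotate_cw_90(0, 1), invert_x=True, invert_y=True)
```

Inverting both axes is a 180 degree turn, so rotation plus inversion reaches either sideways orientation from a single 90 degree option.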
To wrap up we’ll head into a quick-fire round of small additions or fixes:
Closing Words:
That's all we have for you this month so we'd like to thank everyone for their continued support as we pass the Switch's 5th birthday! In just half a decade we’ve already come so far and it’s largely thanks to everyone who has backed this project over that time; this level of growth wouldn’t have been possible without you! As usual, if you’re interested in emulation and know some C# we’re always delighted to see new code contributors, and if not, simply giving us feedback, opening issues on GitHub or just reporting compatibility goes a long way!
Until next month!
2022-04-05 19:18:01 +0000 UTC
February has gone and left us all too soon once again. Who thought that 28 days was enough to make it a worthwhile month?
But rest assured what the month lacked in days our development team more than paid back in the avalanche of improvements, fixes, additions and ongoing project work!
Patreon Goals:
Amiibo Emulation - merged into the main build in March 2021.
While compatibility is close to being perfect, there are still some improvements to come for Amiibo which can be tracked on the associated Github issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress, a public test build is delivered and is available here.
ARB Shaders - Goal reached in April 2021. Work is ongoing alongside Vulkan, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Almost there!
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Without further rambling from me, to quote a clichéd line, let’s just jump into it.
VULKAN PROGRESS:
So we’re back for another round and this month is a good one, if I may say so myself. Last month the SPIR-V backend had a bunch of new shader instructions added which started to get it into great shape for more general testing. However, while the shading language is a superb replacement for the painfully slow GLSL, it can get faster, and one way of doing that is multithreaded (parallel) shader compilation. This is a touch more complicated than on OpenGL, as with all things Vulkan, but riperiperi has taken the challenge upon himself and is already delivering some truly impressive results.
https://www.youtube.com/watch?v=C3R-SlijVSc
For titles that use many shaders simultaneously, parallel compile will have the largest impact especially compared to OpenGL using GLSL. This feature isn’t quite ready to be added directly to the main Vulkan branch due to some other bugs (thanks AMD) and it needing a general code clean-up, but we hope that user testing can start relatively soon!
Alright here’s another one:
https://www.youtube.com/watch?v=XaTAD8zwZVQ
GPU:
Improvements to the emulation of the Switch’s GPU are always the flashiest and most visual of changes and this bout of updates is no different.
Game Builder Garage had a recent regression and was showing all kinds of graphical issues that ranged from minor texture bugs:

To some quite major scene-changing problems:

Luckily our very own gdkchan was on the ball with a mere 2 lines of code to resolve these issues and make this a much more faithful experience to any budding game developer… any budding game developer playing via emulation anyway.


Moving onto some big hitters: Pokémon is a constant thorn in both my wallet and, until recently, my eyes. While the game was fully playable at launch it did not take long for users to start to notice some peculiar rendering in one of the game’s many caves:

This was an interesting problem as it was exclusive to OpenGL and so this eventually led to a whole rabbit hole of fixes that the Vulkan backend once boasted exclusivity over. gdkchan soon came to OpenGL’s rescue and fixed a whole host of issues; some of which have been long-standing indeed!
Fixes cave rendering in Pokémon Legends: Arceus.

Fixes outlines showing through geometry in Pokémon Sword/Shield.
Before:

After:

Fixes black water in Paper Mario: The Origami King.
Before:

After:

Fixes blue emblems on ships in Monster Hunter Rise.
Before:

After:

Fixes over-bright jellyfish in NEO: The World Ends with You.
Before:

After:

Unreal Engine games also got some much-needed love this month with riperiperi implementing a new fast path for 2D engine copies (blit) which drastically reduces stuttering related to texture-streaming in some UE games such as A Hat in Time and Yoshi’s Crafted World, while fixing the water in Fatal Frame: Maiden of Black Water to boot!
Before (especially when entering the telescope):

After (all telescope stutter gone):

Fixes the water in Fatal Frame: Maiden of Black Water.
Before:

After:

Since Ack77 implemented the Mii editor applet all the way back in June, Miitopia has been the game that lots of people wanted to use their own custom Mii in. Unfortunately, while the majority of the game is functionally sound, there was a consistent crash that has been impeding progress since the game launched.

Picture of a black screen to visualize a crash.
The fix by gdkchan has been in review stage purgatory since September but it is finally possible to get through this door! Who knew 3DS ports were so annoying.

Changing gear slightly, as many of you are aware, shaders are small programs that run on the GPU and are used to make ‘effects’ happen on-screen. Puffs of smoke, flashes, you get the picture. There are a couple of cases where Ryujinx can fail to compile a shader and one such case was addressed this month by gdkchan, where either the texture is sampled with a depth compare later in the pipeline or the texture pool type doesn’t match the sampled type. While we aren’t aware of any games that were affected so far by this particular shader mishap, now we have peace of mind that we will never know of any!
Our lead developers aren’t just targeting those overpriced AAA 3D games that bring both your Switch and PC alike to max fan speed; if you were an avid player of ‘River City Girls Zero’ then you were in luck this month as a fix for a crash after a cutscene was opened… and then swiftly closed. Luckily this was due to gdk doing some hardware tests and discovering that, while the fix above worked, it wasn’t how the hardware actually behaved. The cause of the crash was the game not writing to the X/Y region registers, and so the first fix simply zeroed them out. It turned out the Switch doesn’t do this but instead employs the rather disappointing workaround of simply ignoring those region registers! A second, more accurate fix from gdkchan was therefore put forth and accepted.

Enjoy being a River, a city, a girl or a zero.
AUDIO:
Since the merge of “Amadeus”, Thog’s complete audio service rewrite, a very long time ago in the fever dream that was the summer of 2020, Ryujinx has been relatively free of major audio bugs. But with 2022 comes new games, new firmware services and new problems to torment the developers.
The first problem child was with the hotly anticipated Nintendo Switch Sports Online Playtest (what a mouthful). Some of you may be wondering “what’s the point in emulating an ONLINE playtest on an emulator?”, to which we reply… Making sure it works day 1! Can’t lose our record that easily. Thog served an ace by swiftly allowing the game to output audio and then followed it up with a typo fix.
Skyward Sword HD can on occasion sound rather terrible, especially on audio with high frequency effects. riperiperi changed audren’s upsampler from nearest to cubic interpolation, which should improve the situation and act as a temporary stopgap until the sampling algorithm that the Switch actually uses is reverse-engineered.
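To show the difference between the two approaches, here is a minimal sketch of both. The report only says "cubic interpolation", so the choice of a Catmull-Rom spline below is an assumption, as are the function names and the edge clamping; it is not the audren code.

```python
def upsample_nearest(samples, factor):
    """Repeat each input sample `factor` times (a staircase-shaped output)."""
    return [samples[i // factor] for i in range(len(samples) * factor)]


def catmull_rom(p0, p1, p2, p3, t):
    """Cubic interpolation between p1 and p2 for t in [0, 1)."""
    return 0.5 * (
        2 * p1
        + (-p0 + p2) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
        + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t
    )


def upsample_cubic(samples, factor):
    """Upsample by an integer factor using a Catmull-Rom spline."""
    count = len(samples)

    def at(index):
        # Clamp at the edges so the first and last samples have neighbours.
        return samples[min(max(index, 0), count - 1)]

    out = []
    for i in range(count * factor):
        base, frac = divmod(i, factor)
        t = frac / factor
        out.append(
            catmull_rom(at(base - 1), at(base), at(base + 1), at(base + 2), t)
        )
    return out
```

Nearest-neighbour output jumps abruptly between input samples, which adds harsh high-frequency content; the cubic version passes through the same samples but moves smoothly between them.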
CPU:
Get a snack, turn up the mood-lighting and strap in for this because CPU improvements were hot in the street this February. Both gdkchan and this month’s MVP merryhime, the brain behind “dynarmic” (a popular dynamic recompiler for ARM written in C++), graced us with a whole host of additions, fixes and optimizations which I will do my best to simplify.
Let’s first look at some of the new instructions ARMeilleure (Ryujinx’s dynamic recompiler) now supports:
Even more are still in the review stage and, while this probably doesn’t sound too exciting, the more instructions the CPU recompiler can understand the less likely your favorite upcoming game will crash on you when you boot it up day 1!
Something pretty cool to note about the Thumb support specifically is that it is crucial in helping run and develop the new PS Vita -> Switch compatibility layer, aptly named vita2hos, by xerpi (check it out if you haven’t already heard about it!).
As with any sweeping changes, there's always a chance you break something else. Luckily merry squashed a bug that was preventing games booting, while also fixing a potential issue in the Thumb instructions AND implementing single-stepping of instructions, which will help both the core development team and also any Switch homebrew developers enormously.
To seal off her month in style, a final couple of BLX and BXWritePC instructions were fixed, and you can expect to hear more exploits in the next report!
Not to be completely outshone, gdkchan took up the CPU torch and added a limit on the number of uses a constant may have; the lack of one was preventing “Deathsmiles 2” from getting in-game.

The game now boots and seems to work pretty well!
A second interesting addition by gdk was the implementation of CPU JIT invalidation (along with a quick PPTC version bump), which extends the existing region that can be invalidated by the JitCache to actually remove functions that overlap a given range. This change is of particular interest because it begins to lay the groundwork for applications that dynamically load code in NROs, or for code that is dynamically self-modified, to function.
That last part is required for modding frameworks like Skyline and Acropolis as they take advantage of this self-modifying code to hook the game at run-time.
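The idea of removing every translated function that overlaps a given range can be sketched as follows. This is a toy model only: the `JitCache` class here, its methods and the address/size bookkeeping are hypothetical and do not mirror ARMeilleure's actual data structures.

```python
class JitCache:
    """Toy cache mapping guest start address -> size of the translated function."""

    def __init__(self):
        self._functions = {}

    def add(self, address, size):
        self._functions[address] = size

    def invalidate(self, address, size):
        """Remove every function overlapping [address, address + size)."""
        end = address + size
        stale = [
            start
            for start, length in self._functions.items()
            # Two half-open ranges overlap when each starts before the other ends.
            if start < end and address < start + length
        ]
        for start in stale:
            del self._functions[start]
        return sorted(stale)

    def __contains__(self, address):
        return address in self._functions
```

When a game rewrites or reloads code in a region, calling `invalidate` on that region would force anything translated from the old bytes to be recompiled on next use.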
MISC:
While we’re on the topic of Super Smash Bros: Ultimate, many of you will be aware that the game can stutter in a few places. There are three causes for this: the first and most common on first boots is shader compilation stutter, the second is NRO stutter at the beginning of each match and the last is menu and character select screen stutter. This final problem was tackled by riperiperi this month with the addition of a dedicated thread (ServerBase) for FileSystem services.
Before:

After:

The root of the problem was that some file system services were blocking other services that did not have a dedicated thread. Giving FileSystem its own thread prevents it from bottlenecking those services and causing stutters while they wait for the filesystem service to clear. This change also improves other games that suffered from filesystem-related stuttering, such as some Xenoblade: DE cutscenes and Fire Emblem: Three Houses cutscenes, and helps users who store their games on network drives or other external solutions.
Controller drift is a phenomenon that has come into the spotlight in recent years and one of the few ways we have of tackling this globally is via an increase to the emulated deadzone. However some users quickly noticed that the math we used to calculate how to apply the deadzone was quite flawed. See below a visualization of our old implementation:

Credit to Vegita2 for their amazing deadzone visualization tool!
This means that if you set a deadzone of say 30% it would be applied in both X and Y equally meaning that fine motions at the outer edges would also be considered “dead” due to X and Y being treated as separate components of the stick vector. At extreme deadzone values of above 50% this could result in the analog stick feeling almost like an 8-axis d-pad with only cardinal directions functioning properly.
Quick to jump on this problem, skrekhere implemented a new deadzone algorithm which now smooths the deadzone at the outer edges and makes fine control possible again even at massive deadzone values like 80%. Check out the new visualization below:

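The difference between the two approaches can be sketched in a few lines. This assumes the old behaviour zeroed each axis independently (as described above) and uses a common scaled radial deadzone for the new one; the exact curve skrekhere implemented may differ, and the function names are illustrative only. Stick values are in the range -1 to 1 and the deadzone is a fraction below 1.

```python
import math


def axial_deadzone(x, y, deadzone):
    """Old-style behaviour: each axis is checked and zeroed on its own."""
    return (
        0.0 if abs(x) < deadzone else x,
        0.0 if abs(y) < deadzone else y,
    )


def radial_deadzone(x, y, deadzone):
    """Treat the stick as a vector: only its length is compared to the deadzone."""
    magnitude = math.hypot(x, y)
    if magnitude <= deadzone:
        return (0.0, 0.0)
    # Rescale so the output ramps from 0 at the deadzone edge to 1 at full tilt.
    scaled = min((magnitude - deadzone) / (1.0 - deadzone), 1.0)
    return (x / magnitude * scaled, y / magnitude * scaled)
```

With a 30% deadzone, an input of (0.9, 0.2) loses its Y component entirely under the axial version, while the radial version keeps the small vertical motion.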
Backend infrastructure is something most people who have never worked on large projects never even think about, but the work never stops there either. For some smaller tweaks this month turbedi was knocking it out of the park with: unused EnumExtensions being removed, optimizations to static data in C# compilation, swapping BitUtils with .NET BitOperations methods and finally collapsing AsSpan() operations to use a more modern approach with fewer, faster code paths. Small changes add up, and we always encourage C# developers of any level, novice to expert, to look around the codebase because optimization is always possible!
Thog continued her reign of terror over infrastructure changes with a data type adjustment to PID and, thanks to merryhime fixing some GTK bugs in our dependencies, an update to GtkSharp (Ryujinx’s GUI framework) which should speed up Windows build times and fix a bug where the menu icons would flicker when hovered over.

Hmmm… Crunchy…
Game icons in the GUI used to simply use their base icon for the preview but some games actually include an “updated” icon in their update files that would display on the Switch if you were to update a game. Ack_77 decided they couldn’t stand this inconsistency any longer and now Ryujinx will pull the game icon from the update file if one exists.
Before:

After:

Our filesystem wizard Thealexbarney (or as you may know him by his Discord name ‘Moosehunter’) merged three changes this month. The first was a fix for a file system “permission denied” error that was plaguing users seemingly at random. This change runs some extra data fixes on any saves without a valid owner ID and should resolve most of these issues. Next was the removal of a lot of log clutter from the “ServiceNv map” creation spam that would take place in every game. This provides no information to the general user and no additional information when troubleshooting generic problems, so it has now been shifted to the debug logs category where we wish it a very happy retirement.

The numbers Mason…. What do they mean!
Last but not least, LibHac was given a bump to 0.16.0 which added support for reading NCAs with compressed sections, increased resilience against invalid extra save data (may help recover from external “messing” from other programs/OS etc.) and finally fixed a FileSystem access control check. The first of these changes is perhaps the most exciting as it allows both Iridium and Gunvolt Chronicles to get in-game, and it seems both are fully playable!


To no one's surprise our Linux users are also very active in the development process and regularly submit PRs with various fixes for 3-letter acronyms and words with lots of underscores in them. Continuing this trend, edisionnano pushed a fix for Ryujinx’s Backend Multithreading on Mesa drivers, where we were providing a string raw and Mesa was expecting it to be lower-case. This simple problem was wreaking havoc when setting environment variables, so passing a lowercase argument is now enforced.
They also resolved a minor build bug where the platform-specific binaries for the SoundIO audio backend would be shipped wholesale to every OS it was built for. This meant Windows and Linux builds each carried binaries for the other that couldn’t be used, which was ultimately just bloat.
A new contributor to the project, mlgatto, added a new trace-level log which gdkchan immediately made use of by moving all kernel syscall logs into it. Continuing in the vein of simplicity, wss445566 fixed a long-standing typo for us, further proving that just about anyone can help us out even if it seems small!
Ack_77 is going to finish off this report in style by stubbing some new service calls: mnpp:app, which seems to be a telemetry module for Chinese consoles and was causing NES/SNES NSO to crash, and some HID services required for the aforementioned Switch Sports to launch.

To be clear, I still hate the miis.
CLOSING WORDS:
We know it’s tough times for everyone right now so we’d like to express our immense gratitude to everyone who has been contributing to the development of Ryujinx, whether it be through Patreon, testing, or just being part of the community. Never could we have imagined the amount of support we’d get for this project and for that you all have our gratitude. Until next time, stay safe everyone!
2022-03-06 17:48:05 +0000 UTC
New year, new month, and new progress! We hope everyone has been enjoying the new year as much as we have. 2022 has been quite the adventure already with a brand new main series Pokémon title, Pokémon Legends Arceus. Alongside that big release we’ve been working hard, and we’re pleased with the amount of GPU updates and bug fixes we’ve been able to deliver this month.
Patreon Goals:
Amiibo Emulation - merged into the main build in March 2021.
While compatibility is close to being perfect, there are still some improvements to come for Amiibo which can be tracked on the associated Github issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress, a public test build is delivered and is available here.
ARB Shaders - Goal reached in April 2021. Work is ongoing alongside Vulkan, please wait a little while longer until we are able to deliver this update into a state we are happy with.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - flickering around this level!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is sustained: ~3-4 weeks
$2500/month - One full-time developer - Almost there!
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
So without further ado let’s get into this month's progress!
Vulkan Progress:
You people are suckers for this section aren’t you…
Well, let’s get into some of the juicier changes, fixes and additions to the Vulkan backend.
Final Fantasy VII had a rather… unique visual glitch on AMD cards:

If you haven’t played FF7 in a while it isn’t meant to look like that. Luckily this is included in our list of fixes, still a minor issue but much better:

Professor Layton shared a similar issue:

Became:

Direct SPIR-V (Vulkan’s shading language) compilation is disabled by default on the main pull request download to make testing easier but that doesn’t mean it’s been forgotten about in this report! Gdkchan has been hard at work fixing a mountain of bugs with the shader backend and it’s starting to pay some handsome dividends.
SPIR-V shaders are much faster to compile than the GLSL shaders that are currently used on OpenGL. Check out the comparison below (both without any cache):
OpenGL GLSL:
https://cdn.discordapp.com/attachments/694459861190311957/942937091216666744/ogl_Trim.mp4
Vulkan SPIR-V:
https://cdn.discordapp.com/attachments/694459861190311957/942937094886682654/Vulkan_Trim.mp4
*play both of them at the same time if your internet connection allows it!
If you’re looking at this and going “what’s all the fuss about” then it may be worthwhile mentioning that the OpenGL video is using fully multithreaded compile while Vulkan is using single-threaded compile! That’s the power of SPIR-V and we may have more to share about parallel compile in the next report.
Breath of the Wild is our first case study: a game lots of our more adventurous users were using to check whether they had managed to enable SPIR-V, thanks to some, well… easy-to-spot problems!
Before (spooky):

After:

Next let’s check out a few games that straight up didn’t boot using SPIR-V before some recent changes:
Shin Megami Tensei V:

Monster Hunter Rise:

Paper Mario:

Well are you all sufficiently Vulkan’ed out for a bit? We hope you’re not too tired to continue because life outside of Vulkan was also getting some major attention!
GPU
Add support for render scale to vertex stage.
Games can occasionally read textureSize in the vertex stage to tell the fragment shader what size a texture is without querying it there. Before, scales were not present in the vertex shader to correct the sizes, so games were providing the raw upscaled texture size to the fragment shader, which was incorrect behaviour. There are two downsides to note. One is that the fragment and vertex support buffer descriptions must be identical, so the full-size scales array must be defined when used. The other is that the fragment texture count must be updated when vertex shader textures are used.
Fixes render scale causing a weird offset bloom in Super Mario Party and Clubhouse Games (Clubhouse Games still has a pixelated look in a number of its games due to something else it does in the shader). This also fixes a regression where some line artefacts would appear if you upscaled games such as Hyrule Warriors: Age of Calamity.
Super Mario Party
Before:

After:

Hyrule Warriors: Age of Calamity
Before:

After:

Implemented by riperiperi in #2763
Texture Sync, incompatible overlap handling, data flush improvements.
This was quite the big update so grab your popcorn. It aims to solve a bunch of issues caused by texture modification and data flushing, primarily by handling data flush on a per-handle instead of per-view basis and by synchronising flushes with syncpoint increments. It also introduces a new backend method, so it's not currently compatible with Vulkan.
Part 1: Syncing Texture Flush
This change has been in the pipeline (wink wink) for quite some time. When texture flush via memory tracking was first added, our CPU and various other components were considerably slower. The GPU was so tight with the CPU that when the game tried to access the data it was almost certain that the draw would have completed and the data would be there, just by chance. This did not last: over time, people started encountering various issues where textures would flush before their data was fully ready, causing white water in BOTW (milk water):

Rainbow lighting in Splatoon:

This was compounded a lot by the addition of Backend Multithreading and Vulkan, so much so that people thought there was a regression. There wasn't; it’s just that, as that one meme said… “Being faster than light means you can only live in milk and rainbows”.
To fix this we need to look at what is even happening in all this flushing and textures and other words I had to look up before writing this. When a game flushes texture data, there is usually some sort of notification to let the system know whether or not the data it wants is even there. If this system isn’t used then it’s all a roll of the dice and could end up in a race between the CPU and GPU. While this sounds cool on paper it’s a nightmare in practice and ultimately undefined behaviour, so some changes were made to the backend to ensure that, given a race, the GPU always wins.
Part 2: Flush ordering for incompatible data
As of right now, Ryujinx follows a core assumption with the texture cache, where a use of a texture should be valid in its own layout + format (imagine this as shape + size).
This core assumption keeps important texture data alive while saving time by not flushing or loading “garbage” data. Sounds great right? Well here’s the kicker; this rule wasn't fully established before and as always there are exceptions to every rule…
The issue is that incompatible textures only have the potential to be deleted when a new overlap appears, and any checks only happen when the texture is created. If both textures existed in the cache at the same point, they could flush separately... in any order.
This is where the new rules come into play. To put it simply, if a texture is written to and some other textures try to use its memory, their data is considered invalid. This means that only one can live at a time, and therefore data flushes will always use the latest available information. Problem solved! Note that these rules were already in place before, they were just enforced on creation, rather than on each use. This is a large sweeping change that affects every game.
Part 3: Flushing host incompatible formats
Switching gears slightly there is occasionally a texture format or relationship used which isn't fully supported. Two such examples are:
- ASTC compressed textures, which are not supported on desktop GPUs (other than ironically enough Intel iGPUs).
- BCn compressed 3D textures, which are not supported by OpenGL (but can be supported by Vulkan)
Ryujinx supports these formats by converting them to a supported, uncompressed format on the CPU. But, this means that data cannot be accessed directly on the GPU, which is quite important to… you know render stuff. As anyone could guess this isn’t ideal and was causing a whole host (my pun game is on fire!) of issues.
Life is Strange: True Colours, and potentially other UE4 games use ASTC textures for characters and environments:

BOTW draws into a compressed 3D BCn texture to use for the blue dissolve teleportation animation:

Before, Link would just disappear immediately as Ryujinx could not move the data for this texture. The change to fix this is not the fastest, but it is fully compatible, covers cases which were completely broken before and, somewhere in the future, could allow support for platforms that don't support BCn, like mobile hardware.
Life is Strange:

BoTW:

So after that rather lengthy section let’s look at some pretty pictures together.
You’ll need to head to Lon Lon Ranch for your milk now:

Splatoon 2 won’t play itself and splatter the whole map with rainbows (unsure if this is a W or an L):

We all learned a lesson here. I myself now know that if you constantly shout “CEMU MILK WATER GX2DRAW DONE” at riperiperi he would likely solve the climate crisis if it got you to shut up! And in a way, he did. Hyrule’s water runs clean again.
Implemented by riperiperi in #2971
Fix sampled multisample texture size
The width/height of the render target and copy textures is already pre-multiplied by the driver for multisample textures. For shader sampled textures that are on the pool, they are not pre-multiplied. This changes how they multiply their size by the multisample size, in order to allow them to match existing textures on the cache, in addition to allowing the texture to have the correct size (as the TextureCreateInfo that is passed to the backend has the width and height divided by the amount of samples).
Fixes rendering on Okami HD.
Okami HD
Before:

After:

Implemented by gdkchan in #2984
Implement IMUL, PCNT and CONT shader instructions, fix FFMA32I and HFMA32I
Ryujinx is capable of running a variety of homebrew applications, though some may not run as well as others. MelonDS, a Nintendo DS emulator, introduced us to the IMUL, PCNT and CONT shader instructions, which we weren’t aware of before; the last two are similar to the existing PBK/BRK and SSY/SYNC pairs. While implementing these, the FFMA32I implementation was fixed up along the way: the third operand should use the destination register, not "SrcC", as that does not exist for this instruction. HFMA32I had a similar issue, but it was also missing from the instruction table, so this was remedied too.
Implemented by gdkchan in #2972
Fix adjacent 3d texture slices being detected as Incompatible Overlaps
The changes that Texture Sync brought were quite big, and one issue that came up caused the Xenoblade games to have odd colour grading. Essentially, the rendered 3D texture data was being lost for most slices.
Implemented by riperiperi in #2993
Fix render target clear when sizes mismatch
On OpenGL and Vulkan, when the bound render targets have different sizes, rendering only happens on the intersection of all their sizes. On the GPU, this clipping is controlled by the ScreenScissorState pair of registers. These registers were mostly being ignored before, but for clears that may cause issues if render targets of different sizes are bound and the game is trying to clear one of them with a screen scissor size that matches the target being cleared: OpenGL would clip it to the smallest size and not clear the entire region. This was fixed by forcing all other render targets to be unbound, to avoid the host clipping, and then using a custom scissor region calculated from the screen scissor and user scissor (0).
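As a rough illustration of the "custom scissor region" idea, the sketch below intersects two rectangles. It assumes the region is simply the overlap of the screen scissor and user scissor 0, and it represents rectangles as (x, y, width, height) tuples; both the representation and the function name are illustrative rather than taken from the Ryujinx source.

```python
def intersect(a, b):
    """Return the overlap of two (x, y, width, height) rectangles.

    If the rectangles do not overlap, the result has zero width and/or height.
    """
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    return (x1, y1, max(x2 - x1, 0), max(y2 - y1, 0))


# Example: a 1280x720 screen scissor combined with a user scissor that starts
# at (100, 50) and extends past the screen.
clear_region = intersect((0, 0, 1280, 720), (100, 50, 2000, 2000))
```

Because the other render targets are unbound during the clear, the host no longer shrinks this region to the smallest bound target.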
Fixes Pathway not having the screen entirely cleared.
Before:

After:

Implemented by gdkchan in #2994
Add capability for BGRA formats
This adds a new capability called SupportsBgraFormat. On OpenGL, it is always false as the API has no support for BGRA texture formats. However, it will be set to true on Vulkan, which allows us to use those formats there without needing to swap the components ourselves on the fragment shader output. The main goal here is reducing the difference between the Vulkan branch and the current branch, which makes reviewing much easier.
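To show what "swapping the components ourselves" amounts to, here is a small CPU-side sketch on raw 4-bytes-per-pixel data. In Ryujinx the swap happens on the fragment shader output rather than on a byte buffer, so treat this purely as an illustration of the idea; the function names and the `supports_bgra_format` parameter are hypothetical stand-ins for the capability check.

```python
def swap_red_blue(pixels: bytes) -> bytes:
    """Swap the first and third byte of every 4-byte pixel (BGRA <-> RGBA)."""
    if len(pixels) % 4 != 0:
        raise ValueError("expected 4 bytes per pixel")
    out = bytearray(pixels)
    out[0::4] = pixels[2::4]
    out[2::4] = pixels[0::4]
    return bytes(out)


def convert_for_host(pixels: bytes, is_bgra: bool, supports_bgra_format: bool) -> bytes:
    """Only pay for the manual swap when the host API lacks BGRA formats."""
    if is_bgra and not supports_bgra_format:
        return swap_red_blue(pixels)
    return pixels
```

With the capability set to true, the data passes through untouched and the host format does the work.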
Implemented by gdkchan in #3011
Stop using glTransformFeedbackVaryings and use explicit layout on the shader
On OpenGL, there are two ways to specify what should be written to the transform feedback buffers when the feature is enabled. The first, and the one that we use currently, is passing the names of the shader outputs to be written using the glTransformFeedbackVaryings function. The newer method is specifying it directly in the shader using layout qualifiers. This change implements the latter. The reason is that Vulkan only supports the latter; there is no "TransformFeedbackVaryings" function on Vulkan to specify that information outside of the shader. In fact, the code for this change was mostly pulled from the Vulkan branch. So, the main advantage here is reducing differences between Vulkan and master, which will make review easier and will allow us to use the same method on both APIs. One limitation of this new approach is that it's not possible to, for example, write the same output into multiple buffers (although it may be possible to create multiple outputs and copy the value). But since games also have to specify the transform feedback layout, they should also be bound by the same limitations.
Implemented by gdkchan in #3012
Fix deadlock for GPU counter report when 0 draws are done
A few games on Nintendo Switch use what’s called conditional rendering, where the GPU performs or skips draws depending on whether a condition is true or false. A rare bug on Ryujinx meant that reporting a counter for a region containing 0 draws could deadlock the GPU. Before, this case reported immediately and wrote the result to guest memory (which is tracked) from the backend thread. If this write overlapped with a tracking action, the GPU could end up waiting on something that it was meant to do in the future, so it would just get stuck. The backend thread cannot be allowed to trigger read actions that wait on the GPU when backend threading is enabled, as it can end up waiting on itself and never advancing. In the case of backend multithreading's SyncMap, it would try to wait for a backend sync object that did not yet fully exist: the sync object existed according to the GPU and tracking, but had not yet been created by the backend. The fix is to queue the 0 draw event just like any other; its _bufferMap value is simply forced to 0, and it is flushed with the other events on the counter queue. This fixes the issue in games that use conditional rendering, such as Super Mario Odyssey, Mario Kart 8 and Splatoon 2.
Implemented by riperiperi in #3019
Add support for BC1/2/3 decompression (for 3D textures)
The ginormous texture sync update added support for flushing incompatible overlaps that use unsupported compression formats. However, only the BC4 and BC5 compression formats were supported. This extends it to support the BC1, BC2 and BC3 formats, which fixes broken textures in games using those formats with 3D textures on OpenGL. Vulkan does not have the issue as it supports 3D compressed formats. Other changes include:
- Added a new "Supports3DTextureCompression" capability, always false on OpenGL but to be set to true on Vulkan.
- Changed the Capabilities property on GpuContext to return the struct by ref to avoid copies, and changed the Capabilities struct properties to readonly fields, also to avoid copies.
- Removed the Bc1Rgb formats. They were unused, and in fact pretty useless, since there's no difference between the RGB and RGBA variants other than the alpha component being ignored (which can be done by setting alpha on the swizzle to one).
- Optimized the existing BC4 and BC5 decompressors. BC4 is about 2.5x faster here, while BC5 is about 2.1x faster (tested with a randomly generated 256x256x2 3D texture).
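For readers curious what one of these software decompressors has to do, here is a minimal sketch of decoding a single unsigned BC4 block (8 bytes describing a 4x4 tile of one channel). It follows the standard BC4 layout of two endpoint bytes followed by sixteen 3-bit palette indices; integer rounding is simplified with floor division, and this is not Ryujinx's optimized implementation.

```python
def decode_bc4_block(block: bytes) -> list:
    """Decode one 8-byte BC4 (unsigned) block into 4 rows of 4 values (0-255)."""
    if len(block) != 8:
        raise ValueError("a BC4 block is exactly 8 bytes")
    r0, r1 = block[0], block[1]
    palette = [r0, r1]
    if r0 > r1:
        # Six interpolated values between the two endpoints.
        palette += [((7 - i) * r0 + i * r1) // 7 for i in range(1, 7)]
    else:
        # Four interpolated values, plus explicit minimum and maximum.
        palette += [((5 - i) * r0 + i * r1) // 5 for i in range(1, 5)]
        palette += [0, 255]
    # The remaining 6 bytes hold sixteen 3-bit indices, lowest bits first.
    bits = int.from_bytes(block[2:8], "little")
    texels = [palette[(bits >> (3 * i)) & 7] for i in range(16)]
    return [texels[row * 4:(row + 1) * 4] for row in range(4)]
```

BC5 is two such blocks side by side (one per channel), which is why the two formats tend to be implemented and optimized together.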
Fixes text in Tales of Vesperia.
Before:

After:

Fixes explosions in Xenoblade Chronicles 2.
Before:

After:

Implemented by gdkchan in #2987
Fix res scale parameters not being updated in vertex shader
Before on Ryujinx, render scale arrays would not be updated when technically the scales on the flat array were the same, but the start index for the vertex scales was different. This fixes the issue by updating the scales in the support buffer when the vertex stage has bindings and fragment stage binding count has been updated since the last render scale update.
Implemented by riperiperi in #3046
Add timestamp to 16-byte/4-word semaphore releases.
The Legend of Zelda: Breath of the Wild had a long-standing bug where the game would act as if running at 20fps in Ryujinx was full speed, and going above that would push it past its native frame rate. This stumped developers for a long time as the issue was so strange. It turned out that the game was reading a ulong 8 bytes after a semaphore release: this is the timestamp it uses for its performance calculation. The timestamp is now written there, but only when necessary (for 16-byte/4-word releases).
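The memory layout involved can be pictured with a small sketch. It assumes a 16-byte release consists of a 4-byte payload, 4 bytes of padding and an 8-byte timestamp at offset 8 (matching the "ulong 8 bytes after" description), all little-endian; the function name and parameters are hypothetical.

```python
import struct


def release_semaphore(memory: bytearray, offset: int, payload: int,
                      timestamp: int, four_word: bool) -> None:
    """Write a semaphore release into guest memory (toy model)."""
    struct.pack_into("<I", memory, offset, payload)
    if four_word:
        # 16-byte release: padding word, then the 64-bit timestamp at +8.
        struct.pack_into("<I", memory, offset + 4, 0)
        struct.pack_into("<Q", memory, offset + 8, timestamp)
    # A one-word release leaves the bytes after the payload untouched.


memory = bytearray(16)
release_semaphore(memory, 0, payload=7, timestamp=123456789, four_word=True)
read_back = struct.unpack_from("<Q", memory, 8)[0]
```

A game that reads the value at offset 8 and finds nothing written there would compute nonsense frame timings, which fits the behaviour described above.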
This fixes BOTW being capped at 20fps all the time. (now it only does this when the game runs too slowly)
Implemented by riperiperi in #3049
CPU/HLE/Kernel
ffmpeg: Add extra checks and error messages
Some games use H.264-encoded video, which is decoded and displayed to the user via an ffmpeg context. If the system did not have the correct packages installed, Ryujinx would crash with a null value error, which of course wasn’t the best!
This adds some error checks and logging to inform users if they do not have the required packages installed and most importantly prevents a ‘random’ crash.
Implemented by Ac_K in #2951
CPU - Implement FCVTMS (Vector)
We’re still finding games both new and old that put ARMeilleure (Ryujinx’s CPU dynamic recompiler) through its paces.
This change implemented the FCVTMS vector CPU instruction which allows games such as XCOM 2 to now boot. The struggle against endless CPU instructions continues…
Implemented by Saldabain in #2973
Update to LibHac 0.15.0
LibHac updates are usually followed with a large list of bug fixes and new games that will now boot due to the improved accuracy of the filesystem!
However this time the new version was the equivalent of a spring clean with some reorganisation of the code and some minor changes. These changes are aimed at making future updates in the filesystem code much more seamless for our developers and ultimately our users too.
Implemented by Thealexbarney in #2986
sfdnsres: Implement NSD resolution
Fixes a missing implementation of NSD usage when being requested by a couple networking-related services ‘GetAddrInfoRequest’ and ‘GetHostByNameRequest’.
This is but one of many networking fixes in this report!
Implemented by Thog in #2962
Return error on DNS resolution when guest internet access is disabled
When gdkchan implemented a lot of network fixes last month in #2936 (yes the one that lets you all watch YouTube!) this wasn’t without a blood sacrifice. As it turns out some games, most notably Crash Bandicoot 4, try to connect to servers very early in their boot process. Prior to this fix the game would crash immediately if the guest network option was disabled as it would fail to lookup a DNS and error out.
This change returns to the old behaviour if the setting is disabled which allows Crash to boot successfully again.
Implemented by gdkchan in #2983
sfdnsres: Block communication attempt with NPLN servers
It’s been all over the internet lately that Nintendo are replacing their ageing ‘NEX’ server system with new ‘NPLN’ servers! Maybe smash can get rollback next time…
Some games such as Monster Hunter Rise were among the first to make partial use of this new system and more games will of course soon follow. This change simply adds the new servers to the internal DNS blocked list.
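A DNS block list of this kind can be sketched as below, together with the "guest internet access disabled" check from the earlier fix. The host patterns are placeholders rather than the real server names, and the function names, the `lookup` callback and the use of `None` as the error result are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Placeholder patterns; the real block list contains the actual server names.
BLOCKED_PATTERNS = ["*.blocked.example", "telemetry.example"]


def is_blocked(host: str) -> bool:
    """Case-insensitive wildcard match against the block list."""
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in BLOCKED_PATTERNS)


def resolve(host: str, guest_internet_enabled: bool, lookup):
    """Return an address, or None to signal a resolution error to the guest."""
    if not guest_internet_enabled or is_blocked(host):
        return None
    return lookup(host)
```

Returning an error early means the game sees a clean "host not found" result instead of attempting to talk to a server the emulator should never contact.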
Implemented by Thog in #2990
account: Rework LoadIdTokenCache to auto generate a random JWT token
Many Switch servers use JWTs (JSON Web Tokens) for authentication. JWTs are a simple and standardized way to pass information between servers without storing it in a database.
This improves Ryujinx’s accuracy when using this call and brings it closer to the hardware implementation.
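For anyone who hasn't seen one, a JWT is three base64url-encoded parts joined by dots: a header, a payload and a signature. The sketch below builds a token with that shape, using random bytes in place of a real signature. The claim names, the algorithm field and the function names are assumptions for illustration, not what Ryujinx's LoadIdTokenCache generates.

```python
import base64
import json
import os
import time


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_placeholder_jwt(subject: str, now=None) -> str:
    """Build a JWT-shaped token; the signature is random, not verifiable."""
    now = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT"}                      # assumed values
    payload = {"sub": subject, "iat": now, "exp": now + 3600}    # assumed claims
    signature = os.urandom(32)
    return ".".join([
        b64url(json.dumps(header).encode("utf-8")),
        b64url(json.dumps(payload).encode("utf-8")),
        b64url(signature),
    ])
```

A game that only checks the token is well-formed will accept something like this, while a real server would reject it because the signature cannot be verified.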
Implemented by Thog in #2991
bsd: Revamp API and make socket abstract
As some of you know and some of you don’t, Ryujinx is a project that is over 4 years old now and as such some of the codebase hasn’t seen the light of day for quite some time. Think about where you were 4 years ago!
As networking fixes were all the rage in early January it was time to venture once more unto the breach and back into the API and socket functions. The list of changes, updates and modernisations here is quite extensive but some highlights include:
- The socket implementation was separated from the IClient class (allowing for possible native implementation of the sockets in the future if needed)
- The IPC code of IClient was revamped to use more modern memory APIs
And my personal favourite:
- “...Probably more that I missed”
Implemented by Thog in #2960
ssl: Implement SSL connectivity
SSL, or for our readers who don’t have a background in networking, ‘Secure Sockets Layer’, is a protocol for establishing encrypted links between networked computers (this is the same protocol that gives you the little padlock on https sites!).
Some applications require SSL-authenticated connections in order to boot or display things on the Switch and, by extension, Ryujinx. These include some games and, most notably, applications such as Twitch, which can now function correctly.
Implemented by Thog & InvoxiPlayGames in #2961
Fix return type mismatch on 32-bit titles
After a larger addition a few months ago that optimized tail merges in the CPU recompiler a minor issue could occur where the return type may not match the actual return type of the function due to the address being 32-bit, rather than 64. This would then cause an assert on the copy and cause mayhem!
This change resolves the assert and the problems causing it.
Implemented by gdkchan in #3000
kernel: Fix deadlock when pinning in interrupt handler
Even the best of us make small mistakes which start to seem quite major. A simple misplaced critical section leave was causing deadlocks on certain games such as DoDonPachi Resurrection and possibly other games too.
Luckily all that was needed was a basic rejig of only 2 lines of code and this was swiftly corrected!
Implemented by Thog in #2999
GUI/MISC
Add Cheat Manager
Cheating is such an integral part of video games that one of our developers felt that it should be integral to Ryujinx too. Cheats already technically worked fine before this change, but they were always blanket applied and users could not toggle them at runtime or select which cheats they wanted active from a large list, something the Switch can do via cheat managers.
This change implements a cheat manager of our own that will parse your cheat files and allow these to be enabled selectively at runtime. Try it out in-game with Actions -> Manage Cheats (just make sure you have a valid cheat file placed correctly first!).
Implemented by emmaus in #2964
Implement analog stick range modifier
No controller is perfect, regardless of what PlayStation owners will try and convince you, and so over time their analog sticks are subject to wear and tear just like everything else. Deadzone adjustment can help to mitigate drifting of the sticks but just like humans in old age sometimes these old controllers just can't quite reach the same maximums as they used to be able to.
Range modification allows the controller's “maximum” input to be reached earlier in the axis, helping old or strangely designed controllers to input full directions. Games like Super Smash Bros. Ultimate require full input to be reached in order to consistently dash, so this change also helps even brand new controllers perform such techniques more easily.
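As a rough illustration of the idea, the sketch below scales a stick vector by a range multiplier and clamps it back to the unit circle. The function name `apply_range` and the multiplier value are hypothetical and are not taken from the Ryujinx code base.

```python
import math

def apply_range(x: float, y: float, range_modifier: float) -> tuple:
    """Scale a stick vector (components in -1.0..1.0) and clamp its length to 1.0.

    range_modifier is a hypothetical setting: values above 1.0 let a worn
    stick reach full tilt before its physical maximum.
    """
    x *= range_modifier
    y *= range_modifier
    length = math.hypot(x, y)
    if length > 1.0:
        # Normalise so the reported input never exceeds full tilt.
        x /= length
        y /= length
    return x, y

# A worn stick that only reaches 90% of its travel on the X axis.
print(apply_range(0.9, 0.0, 1.0))  # unmodified: still short of full tilt
print(apply_range(0.9, 0.0, 1.2))  # with a 1.2x range: clamped to full tilt
```

With a modifier of 1.0 the input passes through unchanged, so the setting only has an effect when the user raises it.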
Implemented by MutantAura in #2783
Closing words:
So far 2022 has already been quite the eventful year for us! The final section of this report may have bored you silly with network jargon but it did allow a Direct x Ryujinx crossover!

It’s been a while since we talked about our UI rewrite in Avalonia but we’d like to assure everyone progress is still going strong and there’s even been time to make some of it quite fancy!

We’d like to thank everyone for their continuing support and we hope to be able to bring you more (on-time) gossip next month!
2022-02-15 20:47:54 +0000 UTC
This guide assumes you’ve followed our quickstart guide here: https://github.com/Ryujinx/Ryujinx/wiki/Ryujinx-Setup-&-Configuration-Guide
We want to clarify that Ryujinx works best on default settings. We only require users to configure controllers, game directories, and set resolution scale to their preference.
NVIDIA GPU on Windows:
Download the latest version of Ryujinx here: https://ryujinx.org/download/
Make sure your graphics drivers are not older than 472.12. If they are, and you can’t get updates for it, your GPU might be too old, but perhaps you can still run Ryujinx.
In case of black screen:
Make sure no overlays are active (like MSI Afterburner, RTSS overlay or Twitch Studio). Then, right-click on your desktop, select Nvidia Control Panel, click Manage 3D Settings, and in the center-right corner, click "Restore Defaults".
AMD GPU on Windows:
Note that AMD OpenGL with Mesa drivers on Linux will yield better results than our current Vulkan implementation.
First, download the Vulkan build from the bottom link in this comment:
https://github.com/Ryujinx/Ryujinx/pull/2518#issuecomment-890255424
Extract the downloaded file to a folder of your liking and run it.
Remember: this build does not auto-update. When it does get updated, the links in the above comment will get updated. To update manually, simply download the new build and extract it as you did the first time. You will know if it’s updated by checking the comment’s edit history. Notice the build number will differ as well.

Next, search for your GPU drivers here https://www.amd.com/en/support and download the optional 22.1.2 drivers.

Then, go to Radeon control panel > Settings > Graphics > Global Graphics and disable Radeon Image Sharpening.

Intel GPU on Windows:
Note that Intel OpenGL with Mesa drivers on Linux will yield better results than our current Vulkan implementation.
Keep in mind Intel GPUs are not very strong, and performance may not be the best for them.
First, download the Vulkan build from the bottom link in this comment:
https://github.com/Ryujinx/Ryujinx/pull/2518#issuecomment-890255424
Extract the downloaded file to a folder of your liking and run it.
Remember: this build does not auto-update. When it does get updated, the links in the above comment will get updated. To update manually, simply download the updated build and extract it as you did the first time. You will know if it’s updated by checking the comment’s edit history. Notice the build number will differ as well.

Next, make sure your graphics drivers are up to date. Here are the latest: https://www.intel.com/content/www/us/en/download/19344/intel-graphics-windows-dch-drivers.html
On Ryujinx, go to Options > Settings, then under Input settings, disable Docked Mode. This is in order to help with the GPU bottleneck.
Linux:
Download the latest version of Ryujinx here: https://ryujinx.org/download/
Recently, a bug was fixed that was causing saves on Pokémon games created on Linux to be corrupted. Any saves created between 1.0.7000 and 1.1.6 (including LDN 2.4) will likely be corrupted too, and will need to be deleted. If the save wasn’t created on Linux, then it’s fine.
We have a Linux channel over at our Discord in case something goes wrong: https://discord.gg/ryujinx
FAQ:
Why is my performance still bad?
Make sure to enable the following:
Settings > System > Enable VSync, Enable PPTC, set Audio Backend to SDL2, set Memory Manager Mode to "Host unchecked".
Settings > Graphics > Enable Shader Cache, set Graphics Backend Multithreading to Auto.
Why did my game crash?
Come to our Discord’s support channel for assistance: https://discord.gg/ryujinx
Can I play on LDN?
At the time of writing, LDN does not work with Pokémon Legends Arceus. When the next LDN build (2.5) comes out, it will most likely be compatible with the game.
How do I unlock Shaymin and Darkrai?
You need to have played Pokémon Sword or Shield and Pokémon Brilliant Diamond or Shining Pearl at least once in the emulator to unlock Shaymin and Darkrai, respectively. Unfortunately, it’s currently not possible to insert the save files yourself if you don’t own these games.
If you like our work, please consider supporting us on Patreon.
2022-01-27 23:38:07 +0000 UTC
Happy new year everyone! Ryujinx wrapped up the final month of 2021 with a blizzard of bug fixes, GPU improvements, HLE updates, code cleanup, N64 emulation(!) and finally, general system stability improvements to enhance the user's experience.
Patreon Goals:
Amiibo Emulation - merged into the main build in March 2021. While compatibility is now almost perfect, there are still some improvements to come for Amiibo which can be tracked on the associated Github issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress, a public test build is delivered. A lot is being worked on.
ARB Shaders - Goal reached in April 2021. As seen from last month's progress report, work on ARB shaders has been going smoothly.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - Almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is reached: ~3-4 weeks
$2500/month - One full-time developer - Not yet met
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Vulkan Progress:
December has seen a lot of progress on our Vulkan implementation.
First, as you may have noticed, we have been telling our users not to expect immediate improvements on Vulkan compared to OpenGL under NVIDIA graphics cards, with the exception of a few graphical fixes thanks to features that Vulkan supports but OpenGL does not. But thanks to recent performance improvements on the backend, a few titles are starting to outperform Nvidia OpenGL. For example, The Legend of Zelda: Breath of the Wild is up to 25% faster in some areas.
OpenGL:

Vulkan:

There are a few other games that have seen improvements on NVIDIA with Vulkan as well.
NVIDIA was not the only vendor that had improvements; Intel also had great performance uplifts in the past month, thanks to optimization that was made on the backend, and also several improvements to their drivers on Windows, which happened to fix bugs that affected Ryujinx too.
One example of this is Mario Kart 8 Deluxe, which still runs at about 30 FPS on OpenGL (varies a bit depending on the area and the view).
OpenGL:

Vulkan:

Below you can see a graph with tests on a few more games.

The games are Mario Party Superstars, Super Mario Odyssey, Luigi's Mansion 3 and Animal Crossing, respectively. The tests were performed on a laptop with an Intel i5 8300H CPU and integrated Intel UHD Graphics 630 GPU, and 16 GB of DDR4 RAM.

Above we can see a similar graph, this time tested with an AMD GPU. The tests were performed on a PC with an RX 570, Intel i5 7500 and 16 GB of DDR4 RAM.
Please note that the values on the graphs are percentages in relation to the target frame rate, not frame rate values. So 110 (%) there means that it can run at slightly above the intended frame rate. For a game like Animal Crossing that targets 30 fps, that means it can run at 33 fps. Please also note that performance on those games is dependent on several factors, including the area of the game and the current amount of on-screen elements. Animal Crossing performance, for example, depends on the number of houses and other objects on the island. Our main goal here is to show how it compares to the current OpenGL backend on the same hardware and same spot.
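For anyone who wants to convert the graph values back to frame rates, the arithmetic is a single multiplication. The helper name below is made up for illustration.

```python
def effective_fps(percent_of_target: float, target_fps: float) -> float:
    """Convert a 'percent of target frame rate' graph value into frames per second."""
    return target_fps * percent_of_target / 100.0

# The Animal Crossing example from the text: 110% of a 30 fps target.
print(effective_fps(110, 30))  # 33.0
```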
Thanks to Harone for performing these tests.
We also had fixes for graphical glitches on the Vulkan backend in December. First, alpha test was implemented, which fixed the lack of transparency on a few games, including Mega Man 11 and New Pokémon Snap. The issue causing models on New Super Mario Bros U Deluxe to have black borders was fixed. Some texture corruption issues have been fixed, and Splatoon 2 can now load and run consistently on NVIDIA. A bug causing Shin Megami Tensei III Nocturne to render nothing other than the playable character was fixed. We are also working to fix issues that affect both Vulkan and OpenGL (like some texture related issues), but Vulkan is the most affected.
There are still some performance issues, like low frame rates on Super Mario Odyssey on AMD and overall low performance on Linux with NVIDIA and AMD, that we plan to investigate.
Please look forward to more improvements in upcoming progress reports!
ARB FAQ.
Q: What is ARB?
A: ARB is a low-level shading language, created by the OpenGL Architecture Review Board, which can be characterized as an assembly language. ARB shaders are significantly faster on NVIDIA drivers than the OpenGL shader language (GLSL) that we currently use.
Q: What can I expect from ARB?
A: Users should expect a significant reduction in shader stutter and overall loading times. This will improve almost every game in terms of shader stutter.
Q: Can I use ARB?
A: If you have an Nvidia GPU then yes! Sadly, Nvidia is the only GPU vendor to make use of this shader language, as AMD and Intel drivers lack support for most of the required functions to make it work. While this is very unfortunate for our Intel and AMD GPU users, work is ongoing to improve our Vulkan API backend and improve the overall user experience.
Q: Are there any downsides with ARB?
A: It depends on the game. Some games may suffer a heavy performance loss with ARB enabled. Most users shouldn’t notice this in most games, but a fair few do have significant performance issues with ARB, such as Animal Crossing New Horizons. We are working very hard to make the performance hit as small as possible.
Q: When will ARB release?
A: Soon.
Now let's get into this month's progress!
GPU
Fix FLO.SH shader instruction with an input of 0
An oversight in the implementation of this shader instruction made it produce the wrong results for a specific input value (zero). The implementation was corrected to fix this error. This error was found using the shader fuzzer that we mentioned in previous progress reports.
While we haven't found any game with visual improvements caused by this change, it's pretty likely that something was affected by it, since this instruction is often used for shader thread operations.
Implemented by gdkchan in #2876
Implement remaining shader double-precision instructions
Most games use 32-bit single precision floating point instructions for performance reasons, as 64-bit double precision instructions are slower and best avoided. There are, however, a few games that make use of those instructions, and one of them is World War Z.
Thanks to the shader fuzzer, we now have a way to test the shader translator without needing to actually launch a game and wait until the part where it uses a given shader with missing/broken instructions. Now we can just generate a shader with those instructions and use it for testing. This approach allowed us to easily implement all the missing double precision instructions, allowing the game to render correctly.
The implementation includes DMNMX (Double Min/Max), DSET (Double Set), and DSETP (Double Set Predicate), plus the double-precision operations of the MUFU (Multi-function) instruction: RCP64H (Reciprocal 64-bit high half) and RSQ64H (Reciprocal square root 64-bit high half). Finally, this fixes the immediate operands on all double-precision instructions. Before, the immediate was interpreted as the higher 20 bits of a float value converted to double, when it should be the higher 20 bits of a double value.
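To show why the two interpretations of the immediate differ, here is a small sketch that decodes the same 20-bit value both ways. It treats the immediate as a plain 20-bit field holding the top bits of the value, which is a simplification, and the function names are hypothetical.

```python
import struct

def imm20_as_float_high_bits(imm20: int) -> float:
    """Old reading: the immediate is the top 20 bits of a 32-bit float,
    which is then widened to a double."""
    bits = (imm20 & 0xFFFFF) << 12
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def imm20_as_double_high_bits(imm20: int) -> float:
    """Corrected reading: the immediate is the top 20 bits of a 64-bit double."""
    bits = (imm20 & 0xFFFFF) << 44
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# 0x3FF00 is the top 20 bits of the double 1.0 (0x3FF0000000000000).
imm = 0x3FF00
print(imm20_as_double_high_bits(imm))  # 1.0
print(imm20_as_float_high_bits(imm))   # 1.875, a different constant entirely
```

Because the exponent fields of the two formats have different widths, the same bit pattern lands on different values, so any shader using such constants would compute with the wrong numbers.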


This allows World War Z to progress further, but it still can’t progress past the menu due to a few other errors.
Implemented by gdkchan in #2845
Move texture anisotropy check to SetInfo
Some games on Ryujinx have heavy texture/sampler counts (notably Unreal Engine 4 titles), which makes the anisotropy check costly when anisotropic filtering is not set to Auto. Rather than calculating this for every sampler, this change calculates whether a texture can force anisotropy when its info is set, and exposes the value via a public boolean. This should improve performance on games with heavy texture/sampler counts.
Implemented by riperiperi in #2843
Fix SUATOM and other texture shader instructions with RZ dest
This is another shader issue found with our fuzzer. The shader translator would produce invalid code if the shader contained a SUATOM (Surface Atomic) instruction with an RZ destination register. This has been fixed and now a valid shader is produced for this instruction encoding too.
Implemented by gdkchan in #2885
Add support for releasing a semaphore to DmaClass
If you've tried Undertale on this emulator before, you may have noticed that on some specific sections of the game, it would slow down a lot, to the point of being unplayable. This was not the only affected game: several other OpenGL games had a weirdly similar pattern, where they would pause for about 10 seconds and then continue.
The fact that it would pause and then continue after this specific amount of time was an indication that the OpenGL driver was waiting for something, but whatever it was waiting for did not happen, and it would just give up after 10 seconds. It turns out that the driver was waiting for a semaphore release operation that did not happen. With the operation properly implemented, the freezes no longer happen and Undertale runs at the correct speed.

This also fixed some graphical issues, like for example, the thumbnails missing from save games on some visual novels, and a softlock on a specific level of the game Record of Lodoss War: Deedlit in Wonder Labyrinth.
Implemented by riperiperi in #2926
Fix for texture pool not being updated when it should + buffer texture fixes
This one is a batch of fixes for buffer texture related issues, but the most notable one was the black vertex explosions in some Unreal Engine 4 games. The cause was an incorrect buffer texture being bound, due to it missing some changes to the texture pool (the region of memory where GPU texture information is kept).
One of the affected games was Dragon Quest XI S, see the screenshots below for a comparison.
Before:

After:

Also fixes black textures in Balan Wonderworld Demo...

...and flickering black textures in SnowRunner.
Before:

After:

Implemented by gdkchan in #2911
Fix I2M texture copies when line length is not a multiple of 4
The Switch GPU has an engine called Inline-To-Memory (I2M) that is used to push data to GPU memory. The data is submitted on the command buffer, at a granularity of 4 bytes. That's because the command buffer data is divided into 4-byte values.
This actually imposes a limit on the data that is submitted using this method. Since the data is divided into 4-byte values, all the data submitted must be padded to align to 4 bytes. For textures, it means that each line of the texture must have its size in bytes aligned to 4. If we take an RGBA8 texture, which is a very common format, we can see that it is already naturally aligned to 4, since in this format each component takes 8 bits (1 byte), and it has 4 components (red, green, blue and alpha). But if we take a format like R8 (still 1 byte per component, but only one component), then the format no longer naturally aligns. Depending on the width of the texture, we may have a line size that is not a multiple of 4.
The issue here is that the emulator was simply ignoring the padding, and assuming that the data was supposed to go into the next line. This would create some sort of staircase effect where all the lines of the textures were misaligned. The fix was simply taking this padding into account and skipping it.
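The padding handling described above can be sketched as follows. This is a simplified model over a flat byte string, not the emulator's actual code, and the function names are hypothetical.

```python
def unpack_lines_ignoring_padding(data: bytes, line_length: int, line_count: int) -> list:
    """Buggy behaviour: assume each line starts right after the previous one."""
    return [data[row * line_length:(row + 1) * line_length] for row in range(line_count)]

def unpack_lines_skipping_padding(data: bytes, line_length: int, line_count: int) -> list:
    """Fixed behaviour: each submitted line is padded up to a multiple of 4 bytes."""
    padded_length = (line_length + 3) & ~3  # round up to the next multiple of 4
    lines = []
    for row in range(line_count):
        start = row * padded_length
        lines.append(data[start:start + line_length])
    return lines

# An R8 texture 3 pixels wide and 2 lines tall: each line carries 1 padding byte.
submitted = bytes([1, 2, 3, 0, 4, 5, 6, 0])
print(unpack_lines_ignoring_padding(submitted, 3, 2))  # second line starts on the padding byte
print(unpack_lines_skipping_padding(submitted, 3, 2))  # lines come out as intended
```

In the buggy version every line is shifted by the accumulated padding of the lines before it, which is where the staircase look comes from.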
To see what it looks like in practice, we can take a look at Cat Girl Without Salad, one of the affected games.
Before:

After:

Pay attention to the subtitle text.
You might be wondering why only the text was affected on this game. The reason is simple: the font uses a texture with R8 format, since it only requires one color channel (the text only has a single solid color after all). This format, as explained above, does not use a multiple of 4 bytes per pixel, and depending on the texture width, it could trigger the bug.
Implemented by gdkchan in #2938
Fix DMA copy fast path line size when xCount < stride
This fixes an issue related to texture copies. In some specific cases, the copy could be out of bounds, causing a crash. The specific case triggering this was a linear texture, where the copy region width was less than the stride (amount of bytes per line) of the texture.
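One plausible way this kind of out-of-bounds access can arise is sizing the copy as "stride times line count" when the last line only needs the copy width. The sketch below illustrates that reasoning only; it is an assumption about the general class of bug, not a description of the actual fix, and all names are hypothetical.

```python
def region_size(stride: int, x_count: int, y_count: int) -> int:
    """Bytes actually touched by a linear copy of y_count lines, x_count bytes each.

    The last line ends after x_count bytes, not after a full stride.
    """
    if y_count == 0:
        return 0
    return (y_count - 1) * stride + x_count

def copy_linear_region(src: bytes, stride: int, x_count: int, y_count: int) -> bytes:
    """Copy x_count bytes from each of y_count lines spaced stride bytes apart."""
    if region_size(stride, x_count, y_count) > len(src):
        raise ValueError("copy region exceeds the source buffer")
    out = bytearray()
    for row in range(y_count):
        start = row * stride
        out += src[start:start + x_count]
    return bytes(out)

# Stride of 8 bytes, but only 3 bytes per line are copied, over 2 lines.
# The source only needs 11 bytes; assuming 2 * 8 = 16 would read past its end.
source = bytes(range(11))
print(region_size(8, 3, 2))                       # 11
print(list(copy_linear_region(source, 8, 3, 2)))  # [0, 1, 2, 8, 9, 10]
```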
It was causing random crashes on the YouTube app for the Switch, and might also affect a few other OpenGL games.
Implemented by gdkchan in #2942
Flip scissor box when the YNegate bit is set
GPUs support a feature called "scissor" that does what the name would suggest: it cuts one region of the output image, or more precisely, it restricts the rendering to the region specified by the scissor rectangle. Anything outside that region is simply not rendered.
On OpenGL games and apps, it was causing issues because the coordinates of the scissor rectangle were inverted, so the region being cut was completely incorrect. This is because there is a register controlling if the origin point is at the top or the bottom of the image, and since that register was being ignored, it was using the wrong origin in some cases.
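Flipping a rectangle between a top-left and a bottom-left origin is a one-line calculation. The sketch below is illustrative only: the function name, the parameter names, and the way the flag is passed in are assumptions, not the emulator's actual register handling.

```python
def resolve_scissor(x: int, y: int, width: int, height: int,
                    target_height: int, y_negate: bool) -> tuple:
    """Return the scissor rectangle, flipped vertically when y_negate is set.

    Flipping keeps the rectangle's size and mirrors its position around the
    horizontal centre line of a render target that is target_height pixels tall.
    """
    if y_negate:
        y = target_height - (y + height)
    return x, y, width, height

# A 100-pixel-tall strip at y = 0 on a 720-pixel-tall target.
print(resolve_scissor(0, 0, 1280, 100, 720, False))  # (0, 0, 1280, 100)
print(resolve_scissor(0, 0, 1280, 100, 720, True))   # (0, 620, 1280, 100)
```

Ignoring the flag leaves the strip at the opposite edge of the image, which matches the kind of cut-off menus described above.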
This fixes menus being cut off in the YouTube app.
Before:

After:

Also fixes the in-game UI in Bloons TD 5.
Before:

After:

Implemented by gdkchan in #2941
Fix A1B5G5R5 texture format and Add support for the R4G4 texture format
Nintendo released a Nintendo 64 emulator rather recently on the Switch for NSO users. The emulated games were not working as they require the JIT service (which is not implemented), but in December we started working on the changes required to get it up and running, making Ryujinx the first-ever Nintendo Switch emulator to be able to boot and run this official Nintendo 64 emulator. While not complete yet, there is a PR open if you want to give it a try (here). What's more, running it also revealed a few graphical issues. This is one of them.
First, the A1B5G5R5 format was incorrect on the OpenGL backend, which caused the textures to have the wrong colors. A pretty easy fix, we just had to change the OpenGL format and invert the texture swizzle.
Before:

After:

It also had another issue caused by a missing texture format.
Before:

As you can see, the buttons are not being rendered on the HUD. They use the R4G4 texture format, which was not implemented before. It is very similar to the much older L4A4 texture format (4-bits of luminance and 4-bits of alpha). However, while this format was once supported by OpenGL, it has since been deprecated so we can't use it anymore. So instead, we need to do conversion on the CPU to a compatible format. Vulkan does support the format, so no conversion will be required once the change makes its way to the Vulkan branch too.
After:

With those fixes, the game is now rendered properly.
One can see why those textures are using this format. They only have a single color, in addition to transparency, so the format with one color channel and one for transparency is just ideal. And it being 4-bits is probably a choice made due to the memory limitations of the Nintendo 64.
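A CPU-side conversion of this kind could look like the sketch below, which expands each 4-bit channel to 8 bits. Both the target format (two 8-bit channels) and the nibble order (red in the low nibble) are assumptions made for illustration; the text above only says that a compatible format is used.

```python
def r4g4_to_r8g8(data: bytes) -> bytes:
    """Expand packed 4-bit R and G channels into one byte per channel.

    Assumes red is stored in the low nibble and green in the high nibble.
    """
    out = bytearray(len(data) * 2)
    for i, texel in enumerate(data):
        r = texel & 0xF
        g = texel >> 4
        # Multiplying by 17 replicates the nibble (0xA -> 0xAA), so 0xF maps to 0xFF.
        out[i * 2] = r * 17
        out[i * 2 + 1] = g * 17
    return bytes(out)

print(r4g4_to_r8g8(bytes([0xF0, 0x3A])).hex())  # 00ffaa33
```

Nibble replication keeps the extremes exact: 0 stays 0 and 15 becomes 255, so fully transparent and fully opaque texels are preserved.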
Please note that this emulator has its own emulation issues that happen on the Switch as well, and therefore those issues will also happen on Ryujinx. So if it doesn't look like the game running on a real Nintendo 64, it might be a NSO emulator issue rather than Ryujinx.
Implemented by gdkchan in #2955 and #2956
Force crop when presentation cached texture size mismatches
Before, the presentation texture size was used to find a matching texture on the cache, but after that it was not used anymore; the cached texture size was used instead. The problem is that, due to size alignment, the cached texture might actually be larger than the presentation size, leading to gaps when the texture is presented. The fix is relatively simple: we crop the texture based on the presentation size before showing it on the screen.
This solves alignment issues that the Nintendo Switch Online Nintendo 64 emulator, Super Mario Sunshine, Hades and maybe a few other Vulkan games had.
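The cropping step can be pictured with a toy example that uses nested lists as the image. This is only a sketch of the idea with hypothetical names, not how the emulator stores textures.

```python
def crop_to_presentation(pixels: list, present_width: int, present_height: int) -> list:
    """Keep only the top-left present_width x present_height part of the image.

    pixels is a list of rows; the cached texture may be larger than the
    presentation size because its dimensions were aligned upwards.
    """
    return [row[:present_width] for row in pixels[:present_height]]

# A cached 4x4 texture where only the top-left 3x2 region is real image data.
cached = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(crop_to_presentation(cached, 3, 2))  # [[1, 1, 1], [1, 1, 1]]
```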
Before:



After:



Implemented by gdkchan in #2957
HLE/Kernel/CPU
kernel: Improve GetInfo readability and update to 13.0.0
This one is mostly refactoring. It does not have any effect on games, but makes the code easier to read and makes it up to date with the changes on the latest version of the official kernel.
Implemented by Thog in #2900
Implement UHADD8 instruction
This implements a missing 32-bit CPU instruction required by a few games. This specific instruction is used by No More Heroes and No More Heroes 2. According to our testers, the games are not yet playable, but can now boot further with this implementation.
Implemented by piyachetk in #2908
Implement CSDB instruction
This is a 32-bit instruction required by the recently released Monster Rancher games. On the Switch CPU, it does nothing since the CPU is quite old and this instruction was not yet supported there, so the implementation was very simple.
Both games are now playable.

Implemented by gdkchan in #2927
Update to LibHac v0.14.3
LibHac is a .NET library that reimplements some parts of the Nintendo Switch operating system, also known as Horizon OS. Ryujinx uses LibHac for its file system. This updates the LibHac dependency to version 0.14.3, which brings many improvements to Ryujinx’s file system. It makes the emulator all the more accurate while also allowing some games to boot that didn’t before.
Most notably, this update adds support for NCAs with sparse partitions and fixes an issue related to games that do not contain an NCA data partition (this one was actually a Ryujinx issue, not a LibHac issue). Both of those allowed some games to work for the first time.
As an example, we have Ruined King: A League of Legends Story working.

(The red glitch is caused by resolution scaling).
Another game using the sparse storage is Lost in Random, which also now works thanks to this update.

As an example of a game without a data partition, we have Fire Emblem Shadow Dragon and the Blade of Light, which is an emulated NES game with the ROM embedded in the executable, which is why it has no data partition.

It was technically possible to run the title before, by first unpacking it and loading as an unpacked game (which was the standard way of running games in the early days, before Ryujinx had support for XCI and NSP). Now it works properly without unpacking, like the other games.
Implemented by Thealexbarney in #2925
Remove PortRemoteClosed warning
The emulator automatically logs some result codes returned by the kernel as a warning, because some of them might indicate an error. Some of those result codes are returned under normal operation, however, instead of being actual errors. We already filter some of those results so they are not printed as a warning, but PortRemoteClosed was missing from that list. We have now added it too, which removes the warning. We had a few users ask what was wrong because they were seeing the warning multiple times, so removing it also solves this issue.
Implemented by gdkchan in #2928
Fix bug causing an audio buffer to be enqueued more than once
This was a small oversight that would cause an audio buffer to be enqueued more than once. It was caused by a variable not being incremented, which would cause the same buffer to be picked multiple times. In addition to making the same audio data play more than once, it was also messing up buffer release, which would cause the backend to become starved as not enough audio data was coming in, causing terrible audio crackling in some specific cases.
This was affecting the YouTube app, although some games are probably affected as well. With it fixed, the audio now plays perfectly on this app.
Note that this bug affected all audio backends.
Implemented by gdkchan in #2940
Fix GetAddrInfoWithOptions and some sockets issues
This is one of the functions related to DNS resolution. This specific function can be used to get an IP address from a host name, which is later used to connect with the servers.
While the function was already implemented, one of the variants of the function was not correct, as it was writing the result values at the wrong location. The end result was that applications calling the function would return an "empty" result, as if the resolution produced no IP addresses.
The fix was just re-arranging the fields to have them written at the correct location.
This allowed the Switch YouTube app to work for the first time on an emulator. As we have shown before, it had a number of graphical issues, but they all have been fixed and the app works just like it would on a real console.
And just in case you're wondering, 360° videos and live streams are also working.

The YouTube app is one of the easiest among the ones that require Internet access to get working on an emulator, because it does not use the SSL service (instead it uses some SSL library internally), and also because it does not require the Web Applet. Other apps that we have tried make use of both the SSL service and Web Applets (neither of which is currently implemented).
Most games that have some online capability require a Nintendo account to function. Since we don't have one on the emulator, they don't get very far even with those services implemented.
All that said, it's still nice to see this app working, and a great milestone for Ryujinx to have network functionality working in this capacity. We also have an SSL implementation in the works and will share more about it in the next progress report.
Note: A new "Enable guest Internet access" option was added in the settings (system tab). The YouTube app only works with this option enabled. When enabled, it indicates to the game/app that there is a network connection available. Some applications will assume that there is no network connection if it is disabled.
Implemented by gdkchan in #2936
Use minimum stream sample count on SDL2 audio backend
On titles where the buffer size was changed constantly, a lot of audio crackling could be heard when using the SDL2 audio backend. That was caused by the output audio stream being re-created every time the backend received a new buffer size that was not divisible by the old one.
Now, the stream is only re-created if the buffer length is lower than the old one. This avoids the massive slowdown and audio crackling that was caused by the old approach. Affected games include Final Fantasy VII, Animal Crossing, Pokémon Sword/Shield (on videos only), the GTA Trilogy, the YouTube app when watching live streams, and likely more. With the change, the audio now plays perfectly on those titles.
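To get a feel for how much the new rule reduces stream re-creation, the sketch below counts re-creations for a sequence of buffer sizes under both policies. The exact form of the old divisibility test and every name here are assumptions based on the description above, not the backend's real code.

```python
def old_policy(current: int, requested: int) -> bool:
    """Assumed old rule: re-create when the new size is not a multiple of the old one."""
    return requested % current != 0

def new_policy(current: int, requested: int) -> bool:
    """New rule from the text: re-create only when the new size is smaller."""
    return requested < current

def count_recreations(sizes: list, needs_recreate) -> int:
    """Count how often the output stream would be re-created for these buffer sizes."""
    current = sizes[0]  # the stream is first opened with the initial size
    recreations = 0
    for requested in sizes[1:]:
        if needs_recreate(current, requested):
            current = requested
            recreations += 1
    return recreations

# A title that keeps changing its buffer size (sample counts are made up).
sizes = [720, 480, 720, 960, 480]
print(count_recreations(sizes, old_policy))  # 4
print(count_recreations(sizes, new_policy))  # 1
```

Under the new rule the stream settles on the smallest size seen so far, so a title that alternates between a few sizes stops triggering re-creation after the first drop.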
Implemented by gdkchan in #2948
hid: A little cleanup
More refactoring, improves the code, but has no visible effect on games.
Implemented by Ack77 in #2950
kernel: Implement thread pinning support
This adds support for 8.x thread pinning changes and implements the SynchronizePreemptionState syscall.
Based on reverse engineering of the 13.x kernel, this likely fixes a few softlocks games could have.
Implemented by Thog in #2840
am: Stub SetMediaPlaybackStateForApplication
This specific function is used by the YouTube app to make the operating system aware that there is a video playing. We have stubbed it to avoid unimplemented service related crashes with the "Ignore missing services" hack disabled.
Implemented by Ack77 in #2952
friend: Stub IsFriendListCacheAvailable and EnsureFriendListAvailable
More stubs, those functions are used by the Super Bomberman R Online game. The game now no longer crashes with ignore missing services disabled, but is still not playable, most likely due to online functionality related issues.
Implemented by Ack77 in #2949
GUI/MISC
misc: Migrate usage of RuntimeInformation to OperatingSystem
This changes how operating system checks are written in our code base. As an example, a check for Windows was previously written as “if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))” but is now written as "if (OperatingSystem.IsWindows())".
This results in much cleaner and simpler code as there’s much less clutter being added.
Implemented by Thog in #2901
misc: Fix alsoft.ini being present on Linux releases
An oversight made alsoft.ini be present in the Linux releases when it shouldn’t have because it’s not supported on Linux.
Implemented by Thog in #2902
Remove usage of Mono.Posix.NETStandard across all projects, Remove unused empty Ryujinx.Audio.Backends project, Remove debug configuration and schema
Thog was hard at work removing a lot of legacy files from our source code, as they weren’t being updated and were not required anymore. Some cleanup was also done to make the code base much cleaner.
Implemented by Thog in #2920, #2919 and #2906
Using more intense lossless compression
This makes our assets smaller by using more intense lossless compression via the tool Optipng.
Implemented by Mou-ikkai in #2811
UI - Add Volume Controls + Mute Toggle (F2)
A long-time-requested feature was to be able to control the volume of a game through the emulator instead of using the user's OS’s volume settings.
With this update, users can now change the emulator's volume without needing to mess with any OS settings. It should be noted that the default level is always 100% and the volume will always reset back to the default level once you close the emulator.
Implemented by saldabain in #2871
Closing words
We’d like to thank everyone who contributed in any way to this project whether it be through code contributions, testing, or by being a patron. We can’t even begin to show how encouraging it is to see everyone be excited by our hard work and everyone’s dedication to our project. This past year has been a rough ride for everyone all around, and from all of us here at Ryujinx, we once again wish you a Happy New Year and an amazing 2022!
2022-01-10 02:41:20 +0000 UTC
November was a brilliant month for Ryujinx and Nintendo game fans alike: we saw some shining GPU improvements, HLE updates and an update to our .NET version. Nintendo game fans saw the long-awaited release of Shin Megami Tensei V and also remakes of the classic DS-era Pokémon games, re-released as Brilliant Diamond and Shining Pearl.
Patreon Goals
Amiibo Emulation - merged into the main build in March 2021. Some new amiibo were added this month! Check below for more details. While compatibility is now almost perfect, there are still some improvements to come for Amiibo which can be tracked on the associated Github issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress; a public test build has been delivered. A lot is being worked on.
ARB Shaders - Goal reached in April 2021. As seen from August's progress report, preliminary work on ARB shaders has begun.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - Almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is reached: ~3-4 weeks
$2500/month - One full-time developer - Not yet met
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Amiibo updates
Added Animal Crossing Series 5 Amiibo:


- Tom Nook
- Timmy & Tommy
- Isabelle
- Orville
- Wilbur
- Blathers
- Celeste
- Mabel
- Sable
- Label
- K.K. Slider
- C.J.
- Flick
- Daisy Mae
- Kicks
- Saharah
- Harvey
- Gulliver
- Wisp
- Lottie (Island)
- Niko
- Wardell
- Tom Nook (Coat)
- Isabelle (Sweater)
- Sherb
- Megan
- Dom
- Audie
- Cyd
- Judy
- Raymond
- Reneigh
- Sasha
- Ione
- Tiansheng
- Shino
- Marlo
- Petri
- Cephalobot
- Quinn
- Chabwick
- Zoe
- Ace
- Rio
- Frett
- Azalea
- Roswell
- Faith
GPU
Clamp number of mipmap levels to avoid API errors due to invalid textures
Some game mods use an invalid mipmap level count, which caused textures to be solid black because texture initialization would fail. The fix is fairly simple: Ryujinx now clamps the number of levels to avoid OpenGL errors on textures with an invalid number of mipmap levels. While this technically means the mod is incorrect, clamping more closely matches the hardware behaviour, as the GPU does not validate those parameters and would not fail to sample from such a texture.
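As a rough illustration of the clamping idea (not the emulator's actual code), the sketch below assumes a full mip chain has floor(log2(largest dimension)) + 1 levels. The function names are made up for this example.

```python
def max_mip_levels(width: int, height: int, depth: int = 1) -> int:
    """Levels in a full mip chain: floor(log2(largest dimension)) + 1."""
    largest = max(width, height, depth, 1)
    # For a positive integer n, n.bit_length() == floor(log2(n)) + 1.
    return largest.bit_length()


def clamp_mip_levels(requested: int, width: int, height: int, depth: int = 1) -> int:
    """Clamp a guest-provided level count to what the texture size allows.

    Hypothetical helper: a count of 0 or less is raised to 1, and a count
    larger than the full chain is lowered to the full chain length.
    """
    return max(1, min(requested, max_mip_levels(width, height, depth)))
```

For example, a 256x256 texture that claims 12 levels would be created with 9, since the chain 256, 128, ..., 1 has only 9 entries.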
Implemented by gdkchan in #2808
Implement DrawTexture functionality
A few games on the Nintendo Switch that use OpenGL or NVN (the Switch's proprietary graphics API) use the DrawTexture command. It can be used to draw a texture on the current render target without needing shaders or any geometry.
This fixes Steel Assault not rendering:


Also fixes Final Fantasy VII:


Implemented by gdkchan in #2747
Fix InvocationInfo on geometry shader and bindless default integer const
The generated geometry shaders were trying to access the gl_PatchPrimitivesIn built-in variable, which is invalid according to the specification. This happened because the invocation info system register is accessed on those shaders, but this built-in can only be accessed on tessellation evaluation shaders. This change also contains a fix for another issue found while testing: bindless images with an unknown handle source use a 0 constant, but a float constant was being used for images with integer formats, which is incorrect. It now uses the proper integer type, which fixes some shader compilation errors. The first issue apparently doesn't affect NVIDIA, as their compiler doesn't complain on OpenGL and it "just works". On AMD and Intel, this fixes a regression that broke geometry shaders and caused some UE4 games to only show a black screen, because those games write a 3D LUT texture from a draw that uses geometry shaders (to specify the slice).
Implemented by gdkchan in #2822
Fix bindless/global memory elimination with inverted predicates
This fixes bindless texture elimination with inverted predicates (like @!P0). Before, it was working with the assumption that the branch would always enter the predicated block if the condition is true, but this is not exactly the case. What the branch does is skip the predicated instruction if the condition is false. So for non-inverted predicates, it will skip it if the condition is false, and for inverted ones, it will skip it if the condition is true (as if it is false, then the instruction is supposed to execute). In both cases, the branch not taken is where it enters the block, so in both cases, the Next block is the one that should match the Phi incoming block.
This fixes lighting issues on Disaster Report 4, and maybe other games. Should also allow global memory SSBO replacement to work in more cases.
Before:

After:

Implemented by gdkchan in #2826
Support shader gl_Color, gl_SecondaryColor and gl_TexCoord built-ins
A few games on Nintendo Switch that use the OpenGL API use the shader gl_Color, gl_SecondaryColor and gl_TexCoord built-ins. These were mostly used for fixed-function functionality before fragment shaders were a thing: one could set gl_Color and gl_SecondaryColor from a vertex shader to set the parameters used for the fixed-function lighting. Nowadays they are pretty much useless, since you can just write a fragment shader and do whatever you want from there. Those built-ins are now implemented. This is similar to when we added the legacy attributes gl_Color and gl_TexCoord, except that we don’t make use of any compatibility profile. The key difference is that the values are passed on regular user attributes that are otherwise not used on the next stage. That means it should work on OpenGL without needing a compatibility profile, and it should also work on Vulkan.
This fixes black screen on a few OpenGL games, such as:
rRootage Reloaded:

This one only uses gl_Color as the game doesn't use textures at all.
Dragon Quest III (and possibly the older ones too?):



This one uses both gl_Color and gl_TexCoord.
Implemented by gdkchan in #2817
Limit Custom Anisotropic Filtering to mipmapped textures with many levels
Anisotropic filtering improves texture quality at oblique angles. You can see this very clearly in games like Fire Emblem: Three Houses or Pokémon Sword and Shield.
While this worked for many games and greatly improved their low-quality textures, it would sometimes cause severe rendering issues. This was because anisotropic filtering was being enabled on something it shouldn't be, such as a post-process filter or a data texture. Now, Ryujinx maintains two host samplers when custom AF is enabled, and only uses the forced AF one when the texture is 2D and fully mipmapped (goes down to 1x1). This is because game textures are the ideal target for this filtering, and they are typically fully mipmapped, unlike things like screen render targets, which usually have one or just a few levels. AF is also only enabled on mipmapped samplers where the filtering is bilinear or trilinear.
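The selection rule described above could be sketched roughly as follows. This is only an illustration: the TextureInfo and SamplerInfo types, their fields, and the string labels are hypothetical, not Ryujinx's real data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextureInfo:
    target: str   # hypothetical label, e.g. "2D", "3D", "Cube"
    width: int
    height: int
    levels: int   # number of mipmap levels the texture has


@dataclass
class SamplerInfo:
    filtering: str   # hypothetical label: "nearest", "bilinear" or "trilinear"
    mipmapped: bool  # whether the sampler samples from mipmaps at all


def is_fully_mipmapped(tex: TextureInfo) -> bool:
    """True when the mip chain goes all the way down to 1x1."""
    full_chain = max(tex.width, tex.height, 1).bit_length()
    return tex.levels >= full_chain


def pick_host_sampler(tex: TextureInfo, sampler: SamplerInfo,
                      custom_af: Optional[int]) -> str:
    """Return "forced_af" or "default" for a texture/sampler pair."""
    if custom_af is None:
        return "default"  # custom AF disabled
    if tex.target != "2D" or not is_fully_mipmapped(tex):
        return "default"  # likely a render target or data texture
    if not sampler.mipmapped or sampler.filtering not in ("bilinear", "trilinear"):
        return "default"
    return "forced_af"
```

Under these assumptions, a 1024x1024 game texture with 11 levels gets the forced AF sampler, while a 1920x1080 render target with a single level keeps the default one.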
Astral Chain
Before (16x Handheld):

1x Handheld:

16x Handheld:

Along with this great improvement, it’s now possible to change Anisotropic Filtering at runtime, and you can immediately see the changes. All samplers are flushed from the cache if the setting changes, causing them to be recreated with the new custom AF value. This now brings it in line with our resolution scale. Test it to your heart's content!
Implemented by riperiperi in #2832
Fix shader integer from/to double conversion
On the Nintendo Switch, some games use the I2F or F2I shader instructions to convert between double and integer types. If a game tried to do this on Ryujinx, it would report this error: Invalid reinterpret_cast from "F32" to "F64". This happened because those float conversion instructions have a 32-bit float type, while the emulator was trying to store the result as a double, which made it try to reinterpret the value. It was fixed by adding new instructions for explicit conversion from and to double precision.
Implemented by gdkchan in #2831
Better depth range detection
This is a follow-up on a previous update to our depth range detection. It now reads the depth range from the register when it determines that it can't guess it from the depth near/far and translate values. On top of that, it adds another case where the register is used, which is when both the near and far values are 0. In this case, the assumption that TranslateZ = (Near + Far) / 2 for -1 to 1 and TranslateZ = Near for 0 to 1 no longer helps: 0 divided by 2 is still 0, so both formulas give the same result and there's no way to tell the depth range using this method. This improves rendering of Bastion on OpenGL.
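A minimal sketch of this heuristic is shown below. It assumes only the two formulas quoted above; the function name, the string labels and the tolerance are made up for illustration, and the real implementation may decide differently in edge cases.

```python
import math


def detect_depth_mode(near: float, far: float, translate_z: float,
                      register_mode: str) -> str:
    """Guess the depth range, falling back to the value read from the register.

    Returns "minus_one_to_one" or "zero_to_one" (hypothetical labels).
    """
    # Both formulas give 0 when near and far are 0, so guessing is impossible.
    if near == 0.0 and far == 0.0:
        return register_mode

    matches_zero_to_one = math.isclose(translate_z, near, abs_tol=1e-6)
    matches_minus_one = math.isclose(translate_z, (near + far) / 2, abs_tol=1e-6)

    if matches_zero_to_one and not matches_minus_one:
        return "zero_to_one"
    if matches_minus_one and not matches_zero_to_one:
        return "minus_one_to_one"
    # Ambiguous (both match) or unknown (neither matches): trust the register.
    return register_mode
```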

Implemented by gdkchan in #2754
HLE/GUI/Kernel
infra: Migrate to .NET 6
The .NET 6 update was quite a nice one, as it brought a lot of internal improvements, and users should notice a small performance increase.
Implemented by Thog in #2829
kernel: Add support for CFI
Nintendo Switch firmware 11.0.0 introduced basic support for the CFI value being passed in X18. Ryujinx does not implement any random generator in the kernel at the moment as it is unnecessary. As such the KSystemControl.GenerateRandom function is stubbed.
Implemented by Thog in #2839
kernel: Fix sleep timing accuracy
This corrects some mistakes in our previous implementation. The inaccuracies were caught thanks to Thog reverse engineering kernel 13.x and comparing it with Ryujinx’s implementation. WaitAndCheckScheduledObjects timing accuracy was also improved.
- Greatly improves loading speeds in BOTW 1.0.0. Before they were very long and inconsistent, now they are short (3-5x faster) and consistent.
- Hatsune Miku: Project DIVA Megamix seems to no longer crash on first boot.
- Fixes several issues with Hyrule Warriors Definitive Edition where timings would exhibit odd behaviour.
- Users should also note a small performance improvement in quite a few games.
Implemented by Thog in #2828
account/ns: Implement 13.0.0+ service calls
The new Animal Crossing: New Horizons 2.0 update was something fans of the series had been craving. Alongside the enormous amount of content that was added, the update uses some new service calls that were introduced on the Switch in firmware version 13.0.0. acc:u0 InitializeApplicationInfoV2 and aoc:u NotifyMountAddOnContent/NotifyUnmountAddOnContent/CheckAddOnContentMountStatus are needed for the update, so they are fully implemented. While this work was being done, another call was spotted that Ryujinx doesn't need to implement fully: IPurchaseEventManager PopPurchasedProductInfo is stubbed, as it is used for eShop purchases and we don’t support purchases from the Nintendo eShop. It is needed by Dying Light, which can now boot further.


Implemented by Ack77 in #2820
Nickname! - Init Amiibos with Profile's name!
Not too long ago, we hit our Patreon goal of adding amiibo support, and a lot of work went into it. Some things were left unfinished in the implementation, one of which is amiibo nicknames. Before, they were automatically set to no name. With this change, the nickname now defaults to the name of the user's Nintendo Switch profile set within Ryujinx.
Implemented by Mou-Ikkai in #2804
When waiting on CPU, do not return a timeout error from EventWait
When the nvservices EventWait function was updated to not error out when the function returned Success, some new timeouts were introduced where they did not exist before. This likely happened because, before that change, it could return without waiting in some cases. After the update that fixed this inaccurate behaviour, things such as shader compilation stutter or PPTC stutter started causing timeouts, which made some games assert. CPU waits are supposed to prevent this, and it was assumed that they could never time out since an infinite timespan is passed. However, some parts of the code forced the timeout to 1s when it was infinite, so it could still time out. To fix this, the nvservice now returns "false" even when there is a timeout, to avoid guest asserts. This fixes some crashes users noticed in Tokyo Mirage Sessions #FE Encore.
Implemented by gdkchan in #2780
hle: Make Ryujinx.HLE project entirely safe
This follows up on a previous update making the entirety of Ryujinx.HLE project safe.
Implemented by Thog in #2789
Ensure sync points are released and event handles closed on channel close
Some games on the Nintendo Switch create a lot of channels, and when a game closed these channels, the sync point was not being released, resulting in this error (Cannot allocate a new syncpoint!).
Although it did not cause any crashes (at least to our knowledge), it was better to fix the error than to leave it in the emulator and forget about it for a while. Another issue solved by this change was event handles not being closed when returned by QueryEvent. This caused Legends of Mana and other games to crash with the "Out of handles!" exception. This should be fixed as well now, as events are closed when the channel is closed, or when the event is unregistered.
Implemented by gdkchan in #2812
Fix direct mouse access checkbox label
The "help hint" of Settings > Input > Direct mouse access had the wrong text, which was referring to the keyboard instead of the mouse. This corrects the typo.
Implemented by adryzz in #2827
Don't blow up everything if a DLC file is moved or renamed.
When our DLC support was first implemented, it worked very well but had a few big issues. A very common one: if users moved a DLC file away from the path stored in the DLC.json file that Ryujinx creates once you map your DLC, Ryujinx wouldn’t know what to do, as there was no fallback in place for a DLC path that had been modified in any way. With this change, if the DLC path was modified, the file is simply ignored instead of crashing the emulator or bugging out the DLC Manager window.
Implemented by Iostromb in #2867
The ARBitious work continues!









As you can see, progress on ARB is going well! Our project lead gdkchan and developer Thog are both working very hard to make sure it works well for everyone who can use it once it's ready to become public. There have been some massive improvements across the board from where we left off in August. The most notable one is that games are rendering correctly now. Our new shader tester greatly helped with development, as it let us ensure ARB shaders were working correctly. In addition to helping us a great deal with ARB, it even helped us find bugs that affected GLSL shaders. There is still much work left to do, but don’t worry: we will try to update everyone on the progress as much as we can.
ARB FAQ
Q: What is ARB?
A: ARB is a low-level shading language, created by the OpenGL Architecture Review Board, which can be characterized as an assembly language. ARB shaders are significantly faster to compile on NVIDIA drivers than the OpenGL shader language (GLSL) that we currently use.
Q: What can I expect from ARB?
A: Users should expect a significant reduction in shader stutter and overall loading times. This will improve almost every game in terms of shader stutter.
Q: Can I use ARB?
A: If you have an Nvidia GPU then yes! Sadly Nvidia is the only GPU vendor to make use of this shader language as AMD and Intel drivers lack support for most of the required functions to make it work for them. While this is very unfortunate for our Intel and AMD GPU users, work is ongoing to improve our Vulkan API backend and improve overall user experience.
Q: When will ARB release?
A: Soon.
New code contributors November 2021
adryzz
Closing words
We are all incredibly thankful for everyone’s support towards this project so far whether it was through Patreon, reporting bugs, or code contributions. Because of all of you, we’re now able to boot so many games on their release day and have them be playable. We are truly in awe of how far this project has come, so once again thank you!
We have an active Patreon campaign with specific goals and restructured subscriber benefits/tiers, so please consider becoming a patron to help push Ryujinx forward!
2021-12-08 00:01:13 +0000 UTC
The spooky month of October brought some amazing releases like Metroid Dread, Mario Party Superstars, and Fatal Frame: Maiden of Black Water, all of which worked on day one thanks to an absolute avalanche of graphical bug fixes and some incredible kernel improvements across the board!
Patreon Goals
Amiibo Emulation - merged into the main build in March 2021.
Some new amiibo were added this month! Check below for more details. While compatibility is now almost perfect, there are still some improvements to come for Amiibo which can be tracked on the associated Github issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress; a public test build has been delivered. A lot is being worked on.
ARB Shaders - Goal reached in April 2021. As seen from August's progress report, preliminary work on ARB shaders has begun.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - Almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is reached: ~3-4 weeks
$2500/month - One full-time developer - Not yet met
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Now that we’re done with that, let’s get started with this month’s progress.
GPU
Rewrite shader decoding stage
This changes the way shaders are decoded on the emulator. The new method is not only more efficient, but it is also less error-prone. The shader decoding process consists of reading a value from memory (known as an opcode) and then finding which operation must be done from the information encoded on this value. Doing that requires knowing which values correspond to which instructions, and initially, we gathered this information from the Nouveau (open-source Linux driver) disassembler or from NVIDIA's disassembler called nvdisasm, available as part of the CUDA development kit.
The main problem with the NVIDIA disassembler is that it is not open-source, so to view the instruction from its encoded value, we need to have a valid shader and pass it to the tool to view the disassembly output. This is pretty time-consuming and not very efficient to do manually, so we created a script that automatically creates shaders with several different values, passes them to the tool, and sees which instruction comes out on the disassembly. From this information, it can auto-generate tables and structures that can be used to decode shaders on the emulator.
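To give a feel for what decoding from such a table looks like, here is a minimal sketch. The instruction names and bit patterns below are invented 8-bit examples, not real Switch GPU encodings, and the table is written by hand rather than generated by a script.

```python
from typing import NamedTuple, Optional


class Encoding(NamedTuple):
    name: str
    mask: int   # which bits of the opcode identify the instruction
    value: int  # what those bits must equal


# Made-up encodings, for illustration only.
TABLE = [
    Encoding("ADD",   0b1111_0000, 0b0001_0000),
    Encoding("MUL",   0b1111_0000, 0b0010_0000),
    Encoding("SHF_L", 0b1111_1000, 0b0011_0000),
    Encoding("SHF_R", 0b1111_1000, 0b0011_1000),
]


def decode(opcode: int) -> Optional[str]:
    """Return the name of the first table entry matching the opcode."""
    for enc in TABLE:
        if (opcode & enc.mask) == enc.value:
            return enc.name
    return None  # unknown instruction
```

Bits outside the mask are free to carry operands, which is why two different opcode values can decode to the same instruction.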
So you might be wondering which benefits this change brought. First, now we can decode all the shader instructions that the Switch GPU supports. That means when those instructions are implemented on the emulator, we will have less work to do as the decoding part is already done. Second, a few oversights of the old decoder have been corrected. One of them was the wrong bit being read for one of the bindless textures with offset instructions, which was causing issues on some Unreal Engine 4 games like JUMP FORCE Deluxe Edition. See below for a comparison.
Before:


After:


Notice how the character's skin and hair have the correct tone now. It also fixed other issues not visible on the screenshot, like "shaky" pixels.
Implemented by gdkchan in #2698.
Smaller initial size for BufferModifiedRangeList & directly inherit backing array
This fixed a potential regression with the old range list changes, where the cost for creating new ones would be rather large due to creating a 1024 size array. It also reduces the cost for range list inheritance by using the first existing range list as a base, rather than creating a new one then adding both lists to it. The growth size for the RangeList is now identical to its initial size. The Unmapped and SyncMethod methods have also been changed to ensure that they behave properly if the range list is set to null. This improves performance in a few games.
Implemented by riperiperi in #2663.
Relax sampler pool requirement
Before, the emulator printed an error and exited early if an attempt was made to use textures without having a sampler pool that was currently bound. That was because accessing a texture without a sampler is usually not valid. But there is one case where the sampler is not needed, which is when textures are accessed with texel fetch, as those are not filtered in any way. This was usually not a problem, because it's not common for a game to only ever use texel fetch, but it turns out the Cotton/Guardian Saturn tribute games compilation does this. This change allows the texture to be bound without a sampler, which improves rendering on this title.
Before:

After:

Better, but still has issues. The remaining issue is due to a missing shader instruction; we'll talk more about this one later.
Implemented by gdkchan in #2703.
Don't force scaling on 2D copy sources
GameMaker Studio games build texture atlases out of sprites during initialization, using the 2D copy method. These copies are done from textures loaded into memory, not rendered, so they are not scaled to begin with. Previously, the source texture in these copies was set to force scaling, but really it only needs to scale if the texture already exists and was scaled by rendering or something else. This is now set to false, so it doesn't change whether the texture is scaled or not. This also avoids the destination being scaled if the source wasn't. The copy can handle mismatching scales just fine. This prevents scaling artifacts in GameMaker Studio games and likely others.
Before:

After:

Implemented by riperiperi in #2701.
Enqueue frame before signaling the frame is ready.
Link's Awakening and Xenoblade DE had their fences reached already when posting framebuffers, so the signal that a frame was ready would go out before the frame was enqueued, and the render loop would fail to dequeue anything and "skip" a frame. This resulted in their performance lowering dramatically after some loading transitions, as a frame signal would be consumed and presentation would be one frame behind. Xenoblade would seem to cap at 60% FIFO, and Link's Awakening would run at 30fps or worse. Reordering this seems to fix both.
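The effect of the ordering can be shown with a small single-threaded model. This is only a sketch of the idea: the Presenter class and post_frame function are hypothetical, and the real code involves fences and separate threads.

```python
from collections import deque


class Presenter:
    """Toy render loop that reacts to 'frame ready' signals."""

    def __init__(self):
        self.frames = deque()
        self.presented = []
        self.skipped = 0

    def on_frame_ready(self):
        if self.frames:
            self.presented.append(self.frames.popleft())
        else:
            self.skipped += 1  # signal consumed, but nothing to dequeue


def post_frame(presenter: Presenter, frame: int, enqueue_first: bool) -> None:
    """Post a frame whose fence has already been reached."""
    if enqueue_first:
        presenter.frames.append(frame)   # new order: enqueue, then signal
        presenter.on_frame_ready()
    else:
        presenter.on_frame_ready()       # old order: signal fires too early
        presenter.frames.append(frame)


old, new = Presenter(), Presenter()
for frame in range(3):
    post_frame(old, frame, enqueue_first=False)
    post_frame(new, frame, enqueue_first=True)
```

In this model the old order wastes its first signal and then stays one frame behind, while the new order presents every frame as soon as it is posted.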
Implemented by riperiperi in #2722.
Force index buffer update for games using Vulkan
Some games that use the Vulkan API on Nintendo Switch previously had an issue on Ryujinx when the Vulkan draw methods were used. In games that do multiple consecutive draws with different ranges of the index buffer, the emulator was not updating the index buffer range used, which would cause the draw to not draw anything, as it would try to access a range of the index buffer that does not exist.
This was fixed by forcing the index buffer range to update on the draw methods used by the Vulkan API on the Switch. On its own, this change has no known visible effects, but when combined with the change below, it allows the game Hades to render correctly.
Implemented by gdkchan in #2726.
Extend bindless elimination to work with masked and shifted handles
Bindless textures were already discussed quite a bit on previous progress reports, so we won't be going into too much detail about what it is this time, and focus more on what changed.
First, Hades uses shaders that perform a bindless access, with a handle that comes from a constant buffer. Nothing out of the ordinary here, and this case would be handled by the existing bindless elimination. The difference is that this time, the shader has more operations to ensure that the texture handle value is valid and in range. Supporting this case was not difficult; we just had to extend our bindless elimination to also recognize those extra operations.
This allows Hades to render, instead of being just a black screen.

Another game with bindless texture issues was The Witcher 3. While not the same case as Hades, it was also pretty easy to handle the case that this game uses, which combines the texture and sampler handles differently. The change allowed this game to render for the first time; before, it was just a black screen.


Implemented by gdkchan in #2727.
Implement SHF (funnel shift) shader instruction
This implements the SHF (funnel shift) shader instruction, required by the Cotton Saturn Tribute games compilation. This instruction shifts a 64-bit value composed of 2 registers and returns the upper (for the left shift) or the lower (for the right shift) half of the 64-bit result.
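The operation can be sketched in a few lines. This is an illustration of the description above only: it assumes a shift amount between 0 and 32 and does not model any clamping or wrapping modes the real instruction may have. The function names are made up.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def combine(low: int, high: int) -> int:
    """Build the 64-bit value from two 32-bit registers."""
    return ((high & MASK32) << 32) | (low & MASK32)


def funnel_shift_left(low: int, high: int, shift: int) -> int:
    """Shift the 64-bit value left and return the upper 32 bits."""
    return ((combine(low, high) << shift) & MASK64) >> 32


def funnel_shift_right(low: int, high: int, shift: int) -> int:
    """Shift the 64-bit value right and return the lower 32 bits."""
    return (combine(low, high) >> shift) & MASK32
```

With a shift of 0, the left variant simply returns the high register and the right variant returns the low one; bits from the other register "funnel" in as the shift grows.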
As we mentioned earlier, those games were not rendering correctly, even after the sampler pool fix. With this change, they now render as they should.



One interesting note about this game compilation is that it uses a Sega Saturn emulator, so we're effectively doing double emulation here.
Implemented by gdkchan in #2702.
Initial tessellation shader support
Luigi’s Mansion 3 has a sand room that wouldn’t render correctly on Ryujinx due to the emulator missing tessellation shader support. This adds support for tessellation shaders (the control and evaluation stages, also known as hull and domain), which is the only shader type that was not yet supported. Most of the work here was just adding declarations that are specific to those stages, and also improving the implementation of a few other instructions.
Luigi’s Mansion 3’s sand room now renders correctly.
Before:

After:

Implemented by gdkchan in #2534.
Workaround for NVIDIA driver 496.13 shader bug
NVIDIA's recent driver updates caused some major graphical issues in many games. This happened because there's an issue with assigning variables with the "precise" qualifier to negated expressions on the new driver. So, doing -x does not work on the new driver, while 0.0 - x does (both are supposed to be equivalent). This workaround will be removed once the issue is resolved on NVIDIA’s side.
This fixes a variety of issues in several games.
Before:


After:


It is worth noting that those issues only started happening on this driver version, so it is not an emulator issue or regression.
Fixed by riperiperi in #2750.
Fix shader 8-bit and 16-bit STS/STG
The emulator uses an unsigned integer buffer for the global memory that is accessed on the shaders. That means that the buffer can only be accessed 32 bits at a time. This is a problem when we need to access shorter values, like 8-bit or 16-bit values. To perform a 16-bit store, for example, we have to do a partial update of the 32-bit value and change either the lower or the higher 16-bit half. So basically, we do 3 operations: load the 32-bit value, partially modify this value by inserting the new value, and then store the 32-bit value back. However, on the GPU, invocations happen in parallel, so multiple invocations might be trying to modify this value at the same time, and one could overwrite the other's update.
To make this work, the store is performed using an atomic compare and swap operation. Atomic here means "indivisible", which means that it can be considered a single operation that does not have any intermediate result visible by other invocations. First, it loads the current value, inserts the new value into it, and then performs the compare and swap. If the value in memory is equal to the "current value" we loaded earlier, then no modification was made since we loaded the value, and we can safely just store the modified value. Otherwise, we need to start over as the memory has been modified.
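The retry loop described above can be modelled on the CPU as follows. This is a sketch under stated assumptions: the WordMemory class is hypothetical and uses a lock to stand in for the GPU's atomic compare and swap, and the store assumes little-endian layout with 2-byte-aligned offsets.

```python
import threading


class WordMemory:
    """Toy memory that can only be accessed one 32-bit word at a time."""

    def __init__(self, size: int):
        self.words = [0] * size
        self._lock = threading.Lock()

    def load(self, index: int) -> int:
        return self.words[index]

    def compare_and_swap(self, index: int, expected: int, new: int) -> int:
        """Store `new` only if the word equals `expected`; return the value seen."""
        with self._lock:
            old = self.words[index]
            if old == expected:
                self.words[index] = new
            return old


def store16(mem: WordMemory, byte_offset: int, value: int) -> None:
    """16-bit store implemented with a 32-bit compare-and-swap loop."""
    index = byte_offset // 4
    shift = (byte_offset & 2) * 8          # 0 for the low half, 16 for the high half
    mask = 0xFFFF << shift
    while True:
        current = mem.load(index)
        updated = (current & ~mask & 0xFFFFFFFF) | ((value & 0xFFFF) << shift)
        if mem.compare_and_swap(index, current, updated) == current:
            return                          # nobody modified the word in between
        # Otherwise another writer got there first: reload and try again.
```

Two writers updating different halves of the same word can then no longer lose each other's update, because the loser of the race simply retries with the fresh value.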
This fixes the broken interior lighting in The Witcher 3 making it render much better.
Before:

After:

In the "before" screenshot, notice the weird squares on the character's hair, and that the woman on the bottom left is too dark.
Fixed by gdkchan in #2741.
Preserve image types for shader bindless surface instructions (.D variants)
This fixes a small oversight, where shaders could use the wrong format for bindless image accesses. There are 2 types of image access on the Switch GPU, sized or formatted. With the sized access, it simply loads a given amount of data from the image, like 32-bit or 64-bit, without caring about the format. With the formatted access, on the other hand, it loads each component to a separate register, and performs the required conversions depending on the format.
The bug affected the sized access. Since the format shouldn't matter here, the correct thing to do is assign to the image a format matching the access size. For example, for 64-bit access, it would assign a rg32ui (32-bit of red, and 32-bit of green) to the image, which is a total of 64. The oversight was that it was replacing this format with the actual image format during the bindless elimination process, which is incorrect in this case.
This was found while debugging other issues on Clubhouse Games 51. We are not sure how the bug impacted this title, but it is worth fixing nonetheless.
Fixed by gdkchan in #2779.
Add support for fragment shader interlock
As mentioned before, GPUs work with several "invocations". On a fragment shader, for example, each one of those invocations runs in parallel and is responsible for computing the color of each pixel on the output image that is eventually presented on the screen. The high parallelism is very good for performance, as you have several operations happening at once, but it also means that there are no guarantees about the order of operations or when they will be complete.
An easier way to see the problem is with tasks. For example, let's say there is a library with a large pile of books. Those books are sorted in alphabetical order, and a group of people is asked to put them on shelves. Without further instructions, they would just place them at random, not knowing that they should be sorted in a particular way on those shelves. If you repeated the task 10 times, most likely they would be in a completely different order each one of those times. Now, if you instructed those people to place the books on the shelves in alphabetical order, they would do so, and even if the task was repeated 10 times, the result would be the same, as they would now coordinate their efforts to ensure the books are properly sorted.
The same problem can happen on the GPU. The invocations happen in parallel, and there are no guarantees about which one will finish first or the order they will happen in. Usually, this is fine, as the order doesn't matter most of the time. But depending on the operation that is being done on the fragment shader, the order might matter. So how can you ensure that the invocations happen in a correct and consistent order? The answer is fragment shader interlock. This is like telling the GPU that you want the invocations inside a given region to be ordered, much like telling the people that the books should be sorted alphabetically in the example above. It ensures that all invocations for overlapping pixels (at the same screen position) are properly ordered.
The lack of fragment shader interlock usually causes tile flickering. If you recall the previous example, the reason should be clear at this point. No coordination means the order is completely random, and the final results change each time, which causes flickering on the image.
On the NVIDIA shaders, the interlock begin and end operations are implemented using function calls to some NVIDIA-specific functions on the shader. We had to implement pattern recognition to find those functions and replace calls to them with regular calls to the interlock extension begin and end functions, as implementing it otherwise is not possible: those functions use hardware-specific registers that are not exposed by high-level languages such as GLSL (OpenGL Shading Language).
This fixes flickering lights on the "It's the Pits" mini-game on Super Mario Party. Other parts of the game with a similar glitch could also be affected.
Before:

After:

One thing that should be noted is that the vendor support for the fragment shader interlock extension is hit or miss, with AMD being completely absent. On OpenGL, AMD does support the Intel fragment shader ordering, which does the same thing as the interlock extension, so we use it if available. Most cards do not support it though, and on Vulkan, AMD has no support for it at all. We plan to look at different methods to implement this on the drivers that don't support the extension, but doing so in a performant manner without hardware and driver support is very difficult.
Implemented by gdkchan in #2768.
CPU
Add Operand.Label support to Assembler
This improves the JIT-generated code when PPTC is enabled. Before, all jumps used a 32-bit offset when it was enabled, to make getting the relocation offsets easier, as knowing whether the jump offset can be encoded in 8 bits requires generating the code first. The PR changes how this is handled and enables 8-bit jumps with PPTC enabled too (previously they were only used with PPTC disabled). This makes the code a little more compact, which in turn means slightly lower memory usage and smaller PPTC caches on disk.
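As a rough illustration of why the short form saves space, the sketch below encodes an x86 unconditional jump and picks the 2-byte form (opcode EB with an 8-bit offset) whenever the target is close enough, falling back to the 5-byte form (opcode E9 with a 32-bit offset) otherwise. The function name and addresses are made up for the example; this is not the emulator's assembler.

```python
import struct

def encode_jmp(instr_addr, target_addr):
    """Encode an x86 unconditional JMP, preferring the 2-byte short form."""
    # Jump offsets are relative to the address of the *next* instruction,
    # so the offset itself depends on which encoding is chosen.
    rel8 = target_addr - (instr_addr + 2)
    if -128 <= rel8 <= 127:
        return bytes([0xEB]) + struct.pack("<b", rel8)   # JMP rel8
    rel32 = target_addr - (instr_addr + 5)
    return bytes([0xE9]) + struct.pack("<i", rel32)      # JMP rel32

near = encode_jmp(0x1000, 0x1010)   # target is 16 bytes ahead
far = encode_jmp(0x1000, 0x2000)    # target is 4 KiB ahead
```

The dependency noted in the comment is the chicken-and-egg problem described above: the size of the jump is only known once the distance to the target is known, and that distance depends on the size of the code in between.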
Implemented by FICTURE7 in #2680.
Optimize LSRA
This optimizes the register allocator. LSRA stands for "Linear Scan Register Allocator", a type of register allocator commonly used in JITs because it is fast while still producing decent results. Register allocation is the process of assigning an unlimited number of variables to the fixed set of registers on a given CPU architecture. On x86, you have about 16 registers (a bit fewer actually, as some have a fixed purpose and can't be used as general-purpose registers), while Arm64 has about 32 (again, a bit fewer, since that count includes registers like the stack pointer, which can't be used for other purposes). This process is necessary to "map" the 32 Arm registers to the 16 registers on x86.
The change makes the register allocation process faster by optimizing the allocator code. The benefits are faster PPTC rebuilds (as all the functions have to be recompiled) as well as fewer stutters caused by JIT compilation (which would be present if the user has no PPTC cache or PPTC is disabled, and in games that load NRO code dynamically at runtime, such as Super Smash Bros. Ultimate).
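For readers unfamiliar with the technique, below is a minimal sketch of the textbook linear scan idea, assuming each variable is described by a single live interval (start, end) and that spilled variables simply go to memory. It is far simpler than the allocator in the emulator, and all names are illustrative.

```python
def linear_scan(intervals, num_regs):
    """Assign registers to live intervals.

    intervals: dict of name -> (start, end), positions in instruction order.
    Returns a dict of name -> register index, or "spill" if none was left.
    """
    free = list(range(num_regs))
    active = []        # (end, name) for intervals currently holding a register
    assignment = {}
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release registers of intervals that ended before this one starts.
        for old_end, old_name in list(active):
            if old_end < start:
                active.remove((old_end, old_name))
                free.append(assignment[old_name])
        if free:
            assignment[name] = free.pop(0)
            active.append((end, name))
            continue
        # No register left: spill whichever interval ends last.
        last_end, last_name = max(active)
        if last_end > end:
            assignment[name] = assignment[last_name]
            assignment[last_name] = "spill"
            active.remove((last_end, last_name))
            active.append((end, name))
        else:
            assignment[name] = "spill"
    return assignment

# Four variables competing for two registers.
result = linear_scan({"a": (0, 10), "b": (1, 3), "c": (2, 8), "d": (4, 6)}, 2)
```

In this example `a` lives the longest, so it is the one pushed out to memory when `c` needs a register, and `d` later reuses the register freed by `b`.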
Implemented by FICTURE7 in #2563.
Add an early TailMerge pass
This merges the epilogues and returns in the code generated by the CPU JIT. At every point where a function returns, the JIT needs to generate an "epilogue" that restores the CPU registers to the state they were in before the function was called, as mandated by the ABI (Application Binary Interface). This is necessary to meet the expectations of the caller when the code returns.
The change makes the code jump to a single location with the epilogue and return, instead of generating that code on every single return point. The benefit of this is that the JIT-generated code size is smaller, so slightly lower memory usage, and slightly lower disk usage by the PPTC cache.
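The idea can be sketched on a toy instruction list. The example below assumes instructions are plain strings and that the epilogue is a fixed sequence; the real pass works on the JIT's intermediate representation, so treat the names and the `exit` label as hypothetical.

```python
def tail_merge(code, epilogue):
    """Replace every inline epilogue + 'ret' with a jump to one shared copy."""
    merged = []
    n = len(epilogue)
    i = 0
    while i < len(code):
        if code[i:i + n] == epilogue and code[i + n:i + n + 1] == ["ret"]:
            merged.append("jmp exit")
            i += n + 1
        else:
            merged.append(code[i])
            i += 1
    # A jump right before the shared block is redundant: fall through instead.
    if merged and merged[-1] == "jmp exit":
        merged.pop()
    return merged + ["exit:"] + epilogue + ["ret"]

epilogue = ["pop rbx", "pop rbp"]
code = [
    "cmp edi, 0", "jne L1", "mov eax, 0", "pop rbx", "pop rbp", "ret",
    "L1:", "cmp edi, 1", "jne L2", "mov eax, 1", "pop rbx", "pop rbp", "ret",
    "L2:", "mov eax, 2", "pop rbx", "pop rbp", "ret",
]
merged = tail_merge(code, epilogue)
```

Even on this tiny function the listing shrinks from 18 entries to 15, and the saving grows with the size of the epilogue and the number of return points.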
Implemented by FICTURE7 in #2721.
HLE
Amiibo API updates
The new Metroid Dread Amiibo (Samus and E.M.M.I) have been added to the Amiibo API. Use them to your heart's content!

Fix DisplayInfo struct
This fixes a regression that would cause Dragon Ball Xenoverse 2 to no longer boot, as it would pass an invalid size of 0 to surface flinger initialization, which would later cause other failures. The error was caused by the DisplayInfo structure size being incorrect, a regression caused by the recent change to support multiple resolutions on this service, mentioned in the previous progress report.
Fixed by gdkchan in #2708.
Added support for Pixel Format X8B8G8R8
Metroid Dread’s title screen uses a pixel format that Ryujinx did not support, as we don't know of any other game that uses it. With this change the format is supported and the title screen renders correctly.
Before:

After:

Implemented by C1fer in #2716.
Inline software keyboard without input pop up dialog
This adds a new inline software keyboard so that the old pop-up text window is no longer needed. Before, if you were prompted to enter characters through your keyboard a small text window would pop up. This was an annoyance if you played in full-screen mode as the game audio sped up and froze. It confused a lot of users as the pop-up only showed up in windowed mode. This new inline software keyboard makes it so you no longer need to be in windowed mode to see your typed characters.

Note that the new keyboard is only used for games using the "inline" keyboard type. For regular software keyboard launches, it still uses the pop-up window.
Implemented by Caian in #2180.
SPL: Implement IGeneralInterface GetConfig
This implements the GetConfig call of the SPL service. It is currently needed by some homebrew applications, which no longer need the "Ignore missing services" option to boot.
Implemented by AcK77 in #2705.
NVDEC: Adding VP8 codec support
This codec was not implemented before as very few games use it. It is a very old codec, so there is little reason to use it when more modern and efficient codecs are supported, but it turns out there are a few Switch titles out there making use of it. After implementing it, Diablo II’s intro now plays correctly, and the cutscenes on TY The Tasmanian Tiger are now properly rendered too.

Implemented by AcK77 in #2707.
HLE: Improve safety
This reduces the use of "unsafe" code, which makes the code a bit more secure and less prone to errors caused by memory corruption, due to code not doing bounds check properly or not validating input values, etc. It also fixes a bug with the way the code was reading ASCII strings from memory, as it would not stop at the null terminator if the buffer had any non-zero value after the null terminator, causing it to load strings with garbage data after the end.
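The string bug is easy to reproduce in a few lines. The sketch below is an assumption about the general shape of the mistake (trimming only trailing zeros), since the report does not show the original code; the function names are illustrative.

```python
def read_ascii_string(buffer):
    """Decode an ASCII string, stopping at the first null terminator."""
    end = buffer.find(b"\x00")
    if end == -1:
        end = len(buffer)   # no terminator: use the whole buffer
    return buffer[:end].decode("ascii")

def read_ascii_string_buggy(buffer):
    """Only trims trailing zeros, so data after the terminator leaks in."""
    return buffer.rstrip(b"\x00").decode("ascii")

# A buffer that was reused: "abc", its terminator, then stale bytes.
raw = b"abc\x00XY\x00\x00"
good = read_ascii_string(raw)
bad = read_ascii_string_buggy(raw)
```

The first function returns `abc`, while the second returns the string with the terminator and the stale `XY` bytes still attached.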
Fixed by Thog in #2778.
kernel: Fix inverted condition on permission check of SetMemoryPermission syscall, Clear pages allocated with SetHeapSize, Add resource limit related syscalls, Implement SetMemoryPermission syscall, Add missing address space check in SetMemoryAttribute syscall
We saw several improvements to the HLE kernel implementation in Ryujinx. Thog made many changes to bring Ryujinx's kernel implementation further in line with what the original OS does. This fixed some small issues in the kernel that were lurking about but hadn’t been hit by any games that we're aware of. One notable improvement is that SetHeapSize now clears the memory allocated for the heap, to avoid leaking information from other processes. Some syscalls that are used by services but never by games have also been added. They don't have any user-visible impact right now, but they make our kernel implementation more complete and bring the emulator one step closer to being able to run the services from the Switch firmware (as opposed to providing an HLE implementation on the emulator).
Fixed by Thog in #2771, #2772, #2773, #2776, and #2777.
Fixup channel submit IOCTL sync point parameters
Fixes a bug where the emulator was reading the function parameters from the wrong buffer location. The bug only manifests if more than one fence is submitted to this function, which commercial games never do, so in general, it should have no user-visible effect.
Fixed by bylaws in #2774.
Add support for the Brazilian Portuguese language code
Mario Party Superstars is the first Nintendo game to use the new Brazilian Portuguese language option, which was introduced back in firmware 10.1.0. With this implemented, you can now choose Brazilian Portuguese in the system language drop-down menu in the Ryujinx GUI.

Note that if you select the Brazilian Portuguese language and move to an older version of the emulator, the configuration file will reset as the language did not exist on the previous versions and it will fail to load.
Implemented by gdkchan in #2792.
New code contributors October 2021
C1fer
Closing words
We are all incredibly thankful for everyone’s support towards this project so far whether it was through Patreon, reporting bugs, or code contributions. Because of all of you, we’re now able to boot so many games on their release day and have them be playable. We are truly in awe of how far this project has come, so once again thank you!
We have an active Patreon campaign with specific goals and restructured subscriber benefits/tiers, so please consider becoming a patron to help push Ryujinx forward!
2021-11-09 00:19:48 +0000 UTC
The month of September brought dozens of bolstering improvements including significant performance improvements, bug fixes, HLE improvements, and GPU improvements. There have also been significant improvements to something we teased a few months ago!
Patreon Goals
Amiibo Emulation - merged into the main build in March 2021.
While compatibility is now almost perfect, there are still some improvements to come for Amiibo which can be tracked on the associated Github issue here: https://github.com/Ryujinx/Ryujinx/issues/2122
Custom User Profiles - merged into the main build in April 2021.
Vulkan GPU Backend - still in progress, a public test build is delivered. A lot is being worked on.
ARB Shaders - Goal reached in April 2021. As seen from the last progress report, preliminary work on ARB shaders has begun.
ARB shaders will further reduce stuttering on the first run by improving the shader compilation speed on NVIDIA GPUs using the OpenGL API.
$2000/month - Texture Packs / Replacement Capabilities - Almost there!
This will facilitate the replacement of in-game graphics textures which enables custom texture enhancements, alternate controller button graphics, and more.
ETA once the goal is reached: ~3-4 weeks
$2500/month - One full-time developer - Not yet met
This amount of monthly donations will allow the project's founder, gdkchan, to work full-time on developing Ryujinx.
$5000/month - Additional full-time developer - Not yet met
This amount of monthly donations will allow an additional Ryujinx team developer to work full-time on the project.
Now that we’re done with that, let’s get started with this month's progress.
Vulkan progress
First, an update was released by AMD that made games no longer boot on Vulkan, as the device creation would just fail with an out of memory error on Windows. We believe that this is a driver bug, but we have now added a workaround on the emulator to allow it to work again with the newer drivers. The issue was caused by the "index type Uint8" extension. This extension is not supported by AMD hardware, and we do not use or request this extension on Vulkan. However, simply including the struct for this extension when creating the device causes it to fail on the newer driver, even if we do not enable it. As a workaround we have simply removed it on AMD, as the extension is not supported anyway, so it has no use.
The branch has also been rebased, so it is now more up-to-date and contains the latest improvements. We have received reports of a few games that seem to have regressed on AMD since then, and we're looking into it.
Some of the Vulkan changes are now in the main build too, which makes merging it in the future easier, and some also benefit OpenGL (such as the shader subgroup change that will be discussed later).
The plan for October is to work on a shader tester. This will make it easy to catch bugs in the SPIR-V implementation that is required for Vulkan, but that is not the only benefit. It will also allow finding bugs in our shader translator/decompiler and improve the emulation in all backends. It will also help with testing ARB shaders in the future, as that too is a new backend with bugs to iron out.
GPU
Fix TXQ for 3D textures
UE4 games assume the texture is 3D if the component mask contains Z. This fixes a bug in UE4 games where parts of the map had garbage pointers to lighting voxels, as the lookup 3D texture was not being initialized. The texture is supposed to be initialized by a compute shader, and the shader was failing to compile before due to this error. The most notable game to see this fix is Tony Hawk’s Pro Skater 1+2.
Before:

After:

Fixed by riperiperi in #2613.
Lift textures in the AutoDeleteCache for all modifications
Before, this would only apply to render targets and texture blit. Now it applies to image stores, the fast DMA copy path, and any other type of modification. Image store textures always have at least one reference in the texture pool, so the function of the cache keeping textures alive is not useful, but a very important function has been its use to flush textures in order of modification when they are dereferenced so that their data is not lost. This fixes lighting breaking when switching levels in UE4 games and "rainbow" textures in a few games.
Tony Hawk’s Pro Skater 1+2
Before:

After:

Little Nightmares II
Little Nightmares II’s broken "rainbow" textures seemed to have been fixed by this as well.

Fixed by riperiperi in #2615.
Account for negative strides on DMA copy
Some Switch games that use the OpenGL API use negative stride values, which make the copy advance backwards. This is used to flip the image vertically during the copy. With this change, the stride is always made positive, and if it was negative, the base offset is adjusted to the real start offset of the copy. With these changes, Idol Days no longer crashes if the user tries to load/save the game.
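The offset adjustment comes down to a bit of arithmetic. The sketch below assumes a copy described by a base offset, a stride in bytes, and a line count; the function name and the returned `flipped` flag are illustrative, not the emulator's API.

```python
def normalize_copy(base_offset, stride, line_count):
    """Return (start_offset, positive_stride, flipped) for a line-by-line copy."""
    if stride >= 0:
        return base_offset, stride, False
    # With a negative stride, each line sits *below* the previous one in
    # memory, so the lowest address touched belongs to the last line.
    start = base_offset + stride * (line_count - 1)
    return start, -stride, True

# Four lines of 0x100 bytes, walked backwards from 0x1000.
start, stride, flipped = normalize_copy(0x1000, -0x100, 4)
```

Here the copy really covers memory starting at 0xD00, not 0x1000, which is why using the unadjusted base offset touches the wrong range.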
Fixed by gdkchan in #2623.
Set texture/image bindings in place rather than allocating and passing an array
Ryujinx was allocating multiple arrays per draw or compute invocation. The cost of this was small but still significant. The functions used to update the texture and image bindings now rent the bindings array for modification instead, so the data is set directly rather than allocated or copied into the bindings manager. They use arrays that are pre-allocated with a default size but can grow to fit shaders that bind many more textures, such as those using bindless accesses. One notable improvement from this change is in Super Mario Odyssey, where the FIFO% has been brought down, which could also mean improved performance on some systems.
Fixed by riperiperi in #2647.
Implement and use an Interval Tree for the MultiRangeList
This implements an augmented interval tree based on the existing tree dictionary and uses it for the texture lookup on the cache. This greatly speeds up texture overlap checks, as they can't use the non-overlapping fast path that buffers and tracking handles can use. Like the tree dictionary, it is based on a red-black tree and is self-balancing.
One game that was improved by this change was Mario Golf Super Rush. If you have tried to play it before on this emulator, you might have noticed that the game would take a long time to load the courses. With this change, the load times are much lower, thanks to the faster texture lookup that makes creating new textures faster. The games that benefit the most from this are the ones with a large number of textures in the cache, as before creating a new texture, the emulator first needs to check whether it already exists in the cache to avoid creating duplicates.
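To show what the "augmented" part buys, here is a small sketch of an interval tree where every node also remembers the largest end address in its subtree, which lets an overlap query skip whole branches. Unlike the real implementation it is a plain unbalanced binary tree rather than a red-black tree, and the addresses and names are invented.

```python
class Node:
    def __init__(self, start, end, value):
        self.start, self.end, self.value = start, end, value
        self.max_end = end      # largest end in this subtree (the augmentation)
        self.left = self.right = None

def insert(node, start, end, value):
    """Insert the range [start, end), keyed by start; returns the subtree root."""
    if node is None:
        return Node(start, end, value)
    if start < node.start:
        node.left = insert(node.left, start, end, value)
    else:
        node.right = insert(node.right, start, end, value)
    node.max_end = max(node.max_end, end)
    return node

def overlaps(node, start, end, out):
    """Collect the values of all ranges that overlap [start, end)."""
    if node is None or node.max_end <= start:
        return out              # nothing in this subtree reaches past 'start'
    overlaps(node.left, start, end, out)
    if node.start < end and start < node.end:
        out.append(node.value)
    if node.start < end:        # everything on the right starts at >= node.start
        overlaps(node.right, start, end, out)
    return out

root = None
for s, e, name in [(0x1000, 0x2000, "A"), (0x3000, 0x3800, "B"),
                   (0x1800, 0x2800, "C"), (0x5000, 0x6000, "D")]:
    root = insert(root, s, e, name)

hits = overlaps(root, 0x1C00, 0x3100, [])
```

A flat list would have to compare the query against every cached texture; with the tree, branches whose `max_end` is too low or whose start is too high are never visited.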
Implemented by riperiperi in #2641.
Use shader subgroup extensions if shader ballot is not supported
Despite a lot of work put into making Intel GPUs work as well as they can with Ryujinx's OpenGL backend, it’s extremely hard to make them run perfectly, especially since Intel’s proprietary drivers aren’t fun to deal with: they lack support for a lot of things, including many extensions. The ARB_shader_ballot extension is not supported on Intel’s proprietary drivers, but the newer subgroup extensions are. The two are equivalent, so simply replacing the shader ballot calls with equivalent subgroup calls allows more games to render correctly, most notably Astral Chain.
Before:

After:

This also reduces the differences between the master and Vulkan branches, since the new subgroup extensions are used on SPIR-V.
Fixed by gdkchan in #2627.
Share scales array for graphics and compute
Our resolution scaler works incredibly well with many games especially since these past updates but some games still don’t scale correctly or have issues with scaling. Ni no Kuni 2 is one of these games that had graphical issues if you used resolution scaling. The issue happened because the backend is using a single array to store both fragment and compute scales, while the GPU emulation is using 2. The fix was simply sharing the same array for both compute and graphics. This fixes an issue where scales might not be properly updated on games that use compute.
Before:

After:

Fixed by gdkchan in #2653.
Fast path for Inline2Memory buffer write that skips write tracking force copy when auto-deleting a texture with dependencies
Many games write SSBOs from compute, notably the Xenoblade games, which flush buffer data on the GPU thread when trying to write compute data. The old method for this was already pretty fast, but a better way of handling it is to add a method to PhysicalMemory that attempts to write all cached resources directly, so that memory tracking can be avoided. The idea is both to avoid flushing buffer data and to avoid raising the sequence number when data is written, as that causes buffer and texture handles to be re-checked and can make performance worse. Xenoblade Chronicles 2 and Xenoblade Chronicles: Definitive Edition both see a significant performance increase from this.
Before:

After:

Implemented by riperiperi in #2624.
Only make render target 2D textures layered if needed
In some cases, games can have a bogus value written as the render target texture depth. This can cause very large 2D array textures to be created. This is not only bad for performance, as it makes the system use more resources, but it can also cause out-of-memory (OOM) errors and potentially a few other errors. Normally the non-base layers of the texture are not accessed at all, as the game will only render to a single layer. The depth only matters when the shader writes a non-zero value to gl_Layer, which changes the target layer. So to fix this issue, the code has been changed to only ever use 2D arrays when one of the vertex, tessellation, or geometry shaders writes to gl_Layer. This solves an issue where The Legend of Heroes: Zero no Kiseki was crashing on boot due to a 1080p array texture with 257 layers being created, which would take several GBs of memory and cause all sorts of issues.
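A quick back-of-the-envelope calculation shows the scale of the problem. The texture format is not stated above, so the 4 bytes per pixel used here (a typical 8-bit RGBA format) is an assumption, and the helper name is made up.

```python
def texture_bytes(width, height, layers, bytes_per_pixel):
    """Size of an uncompressed texture with no mipmaps."""
    return width * height * layers * bytes_per_pixel

# Assumed format: 4 bytes per pixel.
single_layer = texture_bytes(1920, 1080, 1, 4)
layered = texture_bytes(1920, 1080, 257, 4)
layered_gib = layered / (1024 ** 3)
```

Under that assumption a single 1080p layer is about 8 MB, while the 257-layer version is roughly 2 GiB, and a format with 8 bytes per pixel would double that.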

Fixed by gdkchan in #2646.
Replace CacheResourceWrite with more general "precise" write
The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their way to do this, and it can only signal to resources using the same PhysicalMemory instance. This new method adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead, punch a hole in the modified range list to indicate that the data on GPU has been replaced. This fixes some rendering issues in Mario + Rabbids Kingdom Battle and Rune Factory 4 that were introduced with the aforementioned fast Inline2Memory buffer write change.
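"Punching a hole" in a range list can be pictured as follows. This sketch assumes the modified ranges are kept as a list of half-open (start, end) pairs; the data structure in the emulator is more involved, and the function name is illustrative.

```python
def punch_hole(ranges, hole_start, hole_end):
    """Remove [hole_start, hole_end) from a list of (start, end) ranges."""
    result = []
    for start, end in ranges:
        if end <= hole_start or start >= hole_end:
            result.append((start, end))          # range is untouched
            continue
        if start < hole_start:
            result.append((start, hole_start))   # piece left before the hole
        if end > hole_end:
            result.append((hole_end, end))       # piece left after the hole
    return result

# Two ranges modified on the GPU; a precise write then replaces bytes 50..250.
modified = [(0, 100), (200, 300)]
remaining = punch_hole(modified, 50, 250)
```

Only the parts that the write did not replace stay marked as modified on the GPU, so nothing needs to be flushed for the overwritten bytes.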
Implemented by riperiperi in #2684.
CPU
Use normal memory store path for DC ZVA
This is used as an optimized way to clear the memory in homebrew applications. Changing the method used to zero the memory to the one introduced with the "POWER" update, which allows fast memory accesses, can speed this up significantly as well.
Implemented by riperiperi in #2693.
Optimize fast register allocator
This optimizes the JIT's faster register allocator, used the first time a game is played. This reduces the boot time when the game is launched with PPTC disabled, or on the first run (as there is no PPTC cache built at this point).
Implemented by FICTURE7 in #2637.
HLE
Report 1080p resolution when in docked mode
The GetDefaultDisplayResolution service function was returning a 720p resolution even when docked. While this is technically correct, most of the benefit of enabling docked mode on the emulator is getting a higher resolution, so increasing the resolution, in this case, is more desirable.
Allows Tsukihime -A piece of blue glass moon- to render at a higher resolution when docked.
Before:

After:

You might need to load the images at full screen to see the difference.
Implemented by gdkchan in #2618.
Implement GetVaRegions on nvservices
This implements the GetVaRegions ioctl, which is used to get the ranges of the address space that the application can use. It returns two ranges, one for small pages and one for big pages. The Vulkan driver uses this to calculate the usable address space size. This fixes a crash on Quake due to VK_ERROR_OUT_OF_DEVICE_MEMORY being returned by the guest driver, caused by the fact that it assumed that the usable address space size was 0, which would fail the check for any buffer size that is greater than 0.
The game can progress further now but crashes due to Sockets issues.

Implemented by gdkchan in #2621.
HOS: Cleanup the project
This cleans up the HOS (Horizon OS) project, as it has seen a tremendous amount of change. Leftovers that are no longer needed have been removed from the code, and some things that were in the wrong place have been moved to the correct one.
Fixed by AcK77 in #2634.
Amadeus: Update to REV10
The 13.0.0 update for the Nintendo Switch introduced Bluetooth audio but also introduced a lot of hidden changes within the OS. At the moment no games use this, but eventually, they will.
Implemented by Thog in #2654.
VI: Unify resolutions values and accurate implementation of them
This continues the work started with the change to report a 1080p resolution in docked. It makes the values and checks related to displays closer to the original hardware. Changes include AM's service GetDefaultDisplayResolution/GetDefaultDisplayResolutionChangeEvent functions getting more information on what the services do, VI:U/VI:M/VI:S GetDisplayService are now much more accurate and finally IApplicationDisplay GetRelayService, GetSystemDisplayService, GetManagerDisplayService, GetIndirectDisplayTransactionService, ListDisplays, OpenDisplay, OpenDefaultDisplay, CloseDisplay, GetDisplayResolution are now properly implemented.
Implemented by AcK77 in #2640.
IRS: Stub some service calls
This stubs some IR service calls as at the moment we do not support the IR sensor in the right Joy-con. This allows games such as Night Vision and Spy Alarm to boot and makes Doukoku Soshite playable.



It is worth noting that those games were already playable before by enabling the "Ignore missing services" hack on the settings, but this change makes the hack no longer needed, so now those games can be played out of the box.
Implemented by AcK77 in #2665.
NVDEC (H264): Use separate contexts per channel and decode frames in DTS order
When H264 support was implemented on NVDEC, it was noticed that the frames were not in the correct order. At the time, we tried to fix it but couldn’t find the root cause of the issue, so to avoid further delaying it we used a workaround where it would ignore the VIC input surface address, and instead use the address of the last NVDEC frame decoded. This approach had issues, the most noticeable one being that it can lead to the presentation of duplicate frames because if there is more than one consecutive VIC copy operation, it will copy the same frame more than once.
The result is that the H264 videos are usually not smooth, and the frame pacing is irregular. On top of the existing problems, it also has another issue when multiple videos are decoded at once. There is no guarantee that the NVDEC decode and VIC copy for a given channel will happen in order. These issues only encouraged us to dig into this problem once more. Fortunately, this time the endeavor was a bit more successful.
The problem is that FFMPEG will deliver the frames in Presentation Time Stamp (PTS) order, while NVDEC is supposed to output them in Decoder Time Stamp (DTS) order. That is, not all frames on an H264 video are decoded in the same order they are supposed to be displayed on the screen, but FFMPEG always returns them in display order, which does not match the order that NVDEC is supposed to, or that the game expects. Using a more efficient and non-hacky solution fixes several issues that the original implementation had.
H264 video playback should be smoother now, without duplicate frames. Some minor issues, like a few games flashing a green frame when the video starts, have also been fixed. The missing field_pic_order_in_frame_present_flag has also been added to the stream PPS, which fixes decoding errors on Layton's Mystery Journey, but the video is still not rendered properly due to VIC issues.
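The difference between the two orders is easiest to see on a small group of frames. The sketch below uses made-up timestamps for a typical I/P/B sequence and plain sorting; it does not call FFMPEG and is not the emulator's decoder code.

```python
from collections import namedtuple

Frame = namedtuple("Frame", "kind dts pts")

# Decode order (DTS): the P frame must be decoded before the B frames that
# reference it, even though it is displayed after them.
decode_order = [
    Frame("I", 0, 0),
    Frame("P", 1, 3),
    Frame("B", 2, 1),
    Frame("B", 3, 2),
]

# Presentation order (PTS), the order a video player shows frames in.
presentation = sorted(decode_order, key=lambda f: f.pts)

# Sorting by DTS again restores the order the hardware decoder outputs.
restored = sorted(presentation, key=lambda f: f.dts)
```

In presentation order the frames come out as I, B, B, P, while the decode order is I, P, B, B. A game that expects the second order but receives the first will read the wrong picture for three of these four frames.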
A more notable improvement came from another change to use a separate FFMPEG context per channel. Before, all channels shared the same context. This becomes a problem when there is more than one video being decoded at once, as the context stores previously decoded frames, which are used to predict content on future frames, a technique employed by video codecs to reduce file size by not encoding the same information more than once. One of the issues of sharing the same context for different videos, is that it would cause frame data from the wrong video to be used, among other issues. To sum it up, it causes severe image corruption, which you can see below.

You might be wondering why the Hatsune Miku game is decoding multiple videos in this clip in the first place. What happens here is that most of this scene is not actually rendered by the GPU, and instead uses pre-rendered videos. The only thing there that is actually 3D is the Hatsune Miku model. You can think of it like a sandwich: there is a video for background elements, another for foreground elements (light effects and others), and Miku is right in the middle. So what we have here is 2 videos being decoded at the same time.
With this change that uses separate contexts per channel, the issue is now gone and the clip can finally render properly.

Hatsune Miku: Project DIVA MEGA 39's is not the only game to benefit from this. No More Heroes 3 also had a similar issue, and has also been fixed with this change. It may also have improved other games, such as Just Dance that had similar issues, but we did not test this one.
Fixed by gdkchan in #2671.
CLKRST: Stub/Implement IClkrstManager and IClkrstSession calls
This stubs and implements some clkrst calls. Some are stubbed because they are used to overclock the Switch hardware and it's pointless in our case as we are emulating the system.
Implemented by AcK77 in #2692.
GUI/Misc
GUI: Replace FileChooserDialog with FileChooserNative
The UI framework we currently use (GTK) has its own file chooser dialog, but many have seen that it’s not very fun to work with when it comes to handling multiple files. It also does not match the native OS look that most are used to. This makes it so it uses the OS native file chooser dialog instead of using GTK’s.

Implemented by AcK77 in #2633.
Adjustments to framerate metric and addition of frame time
Our old framerate indicator used a weighted average with a decay rate of 0.5, weighting frame to frame. This caused the perceived FPS and Ryujinx's reported FPS to always feel slightly off from one another, and because of this, the bottom value could feel off. The FPS monitor now shows instantaneous FPS rather than any form of weighted average, and all performance metrics now update every 750ms rather than every 1000ms. Frame time is usually much more valuable in determining how "smoothly" a game is running, so a frame time metric has been added to make results easier to analyze.
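The lag of a weighted average is easy to demonstrate. The exact formula the old indicator used is not given above, so the sketch assumes the common form where each new sample is mixed half and half with the previous average; the function names are illustrative.

```python
def weighted_fps(samples, decay=0.5):
    """Running average: each new sample is mixed with the history by 'decay'."""
    average = samples[0]
    history = [average]
    for sample in samples[1:]:
        average = average * decay + sample * (1.0 - decay)
        history.append(average)
    return history

def frame_time_ms(fps):
    """Time spent on one frame at a given framerate."""
    return 1000.0 / fps

# The game drops from 60 to 30 FPS at the third update.
samples = [60, 60, 30, 30, 30]
smoothed = weighted_fps(samples)
```

After the drop, the averaged counter reports 45, then 37.5, then 33.75, while the game has been running at 30 FPS (about 33.3 ms per frame) the whole time. An instantaneous reading shows the drop immediately.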
Implemented by MutantAura in #2638.
Remove file error popup
If you've ever stored games for Ryujinx on an external hard drive and unplugged it from the PC, an error would pop up every time you opened Ryujinx, as it could not locate the games in the directory. This change removes the pop-up, as it was more redundant than helpful. Note that the error will still be saved in the log.
Implemented by bobhope9848 in #2547.
Update game metadata when stopping emulation
The metadata for games would not be updated if you had stopped emulation. This small change fixes this issue and metadata now properly updates if you use the stop emulation button.
Fixed by Nistenf in #2610.
Fix GTK3 mapping for single quote key
If you had tried to use the quote button as a mapped key for Ryujinx, this wouldn’t work. This was because the single quote key (') was incorrectly mapped to the GTK key quotedbl.
Fixed by Nistenf in #2612.
Implement a "Pause Emulation" option & hotkey
A long-requested feature was the ability to pause emulation. This can be useful if the game doesn’t allow you to pause at a certain moment. It can be toggled by hitting F5.
Implemented by mpnico in #2428.
Quick README update for game compatibility
This updates our README file to show the new total number of playable games, which went from 2100 in May to 2400 in September.
Implemented by Mou-Ikkai in #2694.
Add Linux Unicorn patch + desc.
This adds some info on compiling Unicorn with the necessary patch on Linux. Note that we do not use or ship Unicorn for CPU emulation. It is only used to unit test our own CPU emulator.
Implemented by mgielda in #2609.
But wait there’s something in the distance there!

No, your eyes are not deceiving you. Work on getting more applets to work, such as the player select applet above, is ongoing. This is a very complicated thing to do, as many things need to be implemented for it to be functional. Stay tuned for more news later!
Closing words
As always we would like to thank everyone who has contributed to the emulator so far whether it was through Patreon, reporting bugs, or code contributions. You all have made this project what it is today.
New code contributors September 2021:
bobhope9848
mgielda
Nistenf
MutantAura
We have an active Patreon campaign with specific goals and restructured subscriber benefits/tiers, so please consider becoming a patron to help push Ryujinx forward!
2021-10-10 00:11:30 +0000 UTC