obsproject

patreon


What happened with 27.2: The tale of a legendary hotfix

That was a crazy week. Let’s talk about what happened with 27.2 this past week.

With the Windows version of 27.2, we updated all of our dependencies. A dependency is a library built from external source code. It is not part of OBS, but OBS depends on it for major functionality and features. Sometimes this can be a feature such as software H.264 encoding, which relies on the x264 encoder library. It can also be a feature such as the browser source, which relies on a much bigger dependency: Chromium (the browser engine that powers Google Chrome). More specifically, the browser source uses the Chromium Embedded Framework (CEF) to render a webpage as a source, or as a panel inside of OBS.

On Windows, before OBS 27.2, our browsers were stuck on Chromium version 75. We had to use a complex custom Chromium patch to get reasonable performance, and that patch was incompatible with newer versions. Chromium 75 was almost three years old as of 27.2, and many important features and changes had been added to Chromium since then. Those features and changes are essential to modern production components such as stream overlays and advanced production displays. It was getting very outdated, so needless to say, updating Chromium was a high priority for us. Thanks to the efforts of OBS Project members Pat, Dillon, Matt, and pkv, and of some wonderful people who contribute to the Chromium Embedded Framework, we were finally able to update it to version 95.

However, updating dependencies, especially dependencies that large, can pose some challenges. During the 27.2 release, we started to get sporadic reports of people’s entire computers freezing, requiring a full system reboot. We immediately began investigating. At first I slowed the release of 27.2, then eventually reverted it as more reports came in confirming the issue.

On the first day after 27.2's release, we were frantically trying to find affected users who would have the patience to let us examine and understand what was happening. Fortunately, we found some very kind users who were able to force a bluescreen during the system freeze and generate kernel dump files. This gave us the first hint of what was happening: it was very likely a bug in the graphics driver. The crashes centered on graphics operations, and they only happened with one graphics card manufacturer. Fortunately, we have contacts with all the major graphics card manufacturers, so we immediately got in touch with them to file a driver bug report.

We could not expect a graphics card manufacturer to debug and fix a suspected driver bug in a timely manner. We could expect even less that users would update to those drivers within any reasonable time frame after their release. That left us no choice: either find a workaround soon, or revert Chromium to version 75. So much of the effort in this update had gone into updating Chromium to improve stream production features for users, so reverting to 75 was not the option I wanted to take. I also had to act soon, because Twitch was deprecating their v5 API in two weeks. After that, old versions of OBS would no longer be able to use the “Connect Account” feature with Twitch.

R1CH, a contributor to OBS, figured out a way to reproduce the system freeze: add a bunch of very active browser sources, and run OBS at a 1300/1 fractional framerate (i.e. 1300 frames per second). This did the trick. We could now reproduce the issue ourselves and debug it more easily.

While debugging the bluescreen kernel dumps, I immediately suspected what was likely triggering the problem: the IDXGIKeyedMutex API. This API is used to lock and synchronize shared graphics memory between two different processes or threads on the system. Our latest Chromium update had been modified to use it, a big change from how Chromium 75 functioned.

Being incredibly stubborn, and already hating that API, I decided around two or three days after 27.2 was released that I had to find out whether or not it was the trigger. For the next day or so, I modified and compiled Chromium to remove IDXGIKeyedMutex almost everywhere I could find it. Chromium is a monumentally large project, with tens of millions of lines of code and many layers of abstraction and interprocess communication. I had doubts I would be able to pull it off, and some of our contributors suggested we should just let it go and revert to Chromium 75. But after a few days of learning how Chromium works internally, I managed to remove almost all usage of keyed mutexes.
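For readers unfamiliar with keyed mutexes, here is a toy model of the idea. With the real API, each side calls IDXGIKeyedMutex::AcquireSync(Key, dwMilliseconds) before touching a shared texture and ReleaseSync(Key) afterwards. Releasing with a given key hands the texture to whoever next acquires with that same key, which is how a producer (think Chromium) and a consumer (think OBS) take turns. The sketch below is a simplified, single-process C++ model of those semantics built on std::mutex and std::condition_variable. KeyedMutexModel and demo_handoff are hypothetical names. This is not the DXGI API, not the code used in OBS or Chromium, and it does not reproduce the driver bug.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Toy, single-process model of keyed-mutex semantics.
// NOT the DXGI API: the real calls are IDXGIKeyedMutex::AcquireSync/ReleaseSync.
class KeyedMutexModel {
public:
    // Wait until the resource is unowned AND was last released with `key`.
    // Returns false if that doesn't happen within `timeout`.
    bool acquire(std::uint64_t key, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        bool ok = cv_.wait_for(lock, timeout,
                               [&] { return !owned_ && key_ == key; });
        if (ok) owned_ = true;
        return ok;
    }

    // Give up ownership and hand the resource to the next acquire(key).
    void release(std::uint64_t key) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(owned_ && "release() without a matching acquire()");
            owned_ = false;
            key_ = key;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t key_ = 0;  // assumed starting key, as in the usual 0 -> 1 -> 0 ping-pong
    bool owned_ = false;
};

// Hypothetical handoff: the "producer" writes under key 0 and releases with
// key 1; the "consumer" can only acquire with key 1, so it never reads a
// half-written frame. Returns the value the consumer read.
int demo_handoff() {
    KeyedMutexModel km;
    int shared_frame = 0;  // stands in for the shared texture
    int seen = -1;         // stays -1 if the consumer times out

    std::thread consumer([&] {
        if (km.acquire(1, std::chrono::milliseconds(1000))) {
            seen = shared_frame;
            km.release(0);  // hand it back to the producer
        }
    });
    std::thread producer([&] {
        if (km.acquire(0, std::chrono::milliseconds(1000))) {
            shared_frame = 42;  // "render" a frame
            km.release(1);      // hand it to the consumer
        }
    });

    producer.join();
    consumer.join();
    return seen;
}
```

The consumer thread may start waiting first. It still only proceeds after the producer releases with key 1, so it reads the finished value. Every frame handoff in this pattern is a blocking wait on the other side's release.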

And I couldn’t believe it: it solved the system freeze. It did the trick! I felt like Luke Skywalker in the Death Star’s garbage compactor right after it was deactivated.

However, it introduced a new issue. Removing synchronization solved the system freeze, but it inevitably caused frame stuttering and frame pacing issues when rendering the browser source. It was noticeable enough that I knew my job wasn’t finished. For the next sleepless day or two, I went to work trying to find some other way to synchronize the textures shared between Chromium and OBS. Chromium's code is so incredibly abstract, and relies on so many separate interprocess parts working together, that the task was incredibly difficult. After a day or two of no success, another contributor suggested that I just let it go: it was good enough as-is, and we had the v5 API deadline. I originally conceded and let it go, but that same night, while I was in the shower, I had an idea of how to solve it!

After that shower, I told the other contributors that I was going to try one last thing that night, and that if I couldn’t do it before the night was over, I’d give up. In an all-or-nothing, last-ditch effort, I spent the rest of the night reprogramming a couple of key parts of Chromium to share a single texture with OBS. That texture would be updated automatically by a simple copy operation from the backbuffer texture, exactly the way the patch for version 75 accomplished it.

My effort proved fruitful. Not only did it fix the frame stuttering, it also vastly improved the performance of the Chromium 95 build. We had fixed the system freeze, fixed the frame stuttering, and greatly improved browser source performance on top of it!

Words can’t describe how good it felt. My elation went from simply getting out of the garbage compactor on the Death Star to blowing up the entire Death Star in one fell swoop. We tested it with everyone we could: everyone confirmed that all of their issues were fixed, and that the browser source was performing better than ever. After an entire week of sleepless toil, we’re now here with 27.2.1, a legendary hotfix. It was the worst week of my life that somehow turned into the best week of my life.

I want to give a big shoutout to the very patient users who purposely crashed their PCs to get us kernel dumps. Another big shoutout goes to R1CH for figuring out a way to reproduce the system freeze reliably. And a big shoutout to Matt, pkv, Flaeri, Shaolin, RytoEX, Ace, and everyone else who spent time testing different builds on different systems. Thank you all. Without all of our wonderful contributors working together, none of this would have been possible.

What an incredibly crazy week. I can finally get some sleep again.

Comments

Why do we always get our best ideas in the shower?

Andy Marsh

Epic!!! I loved reading about your process in solving it, and thanks for your perseverance in updating to 95!

DSri Seah

I don’t know what half the stuff you said means, but hot damn am I glad you persisted and rocked it out!

Thanks a lot!

More specifically, it's the Chromium Embedded Framework that we're modifying, which is basically a modification of Chromium itself. Most of the code I modified, though, was in Chromium, via a patch file that the Chromium Embedded Framework applies to Chromium when compiling.

Fundamentally, Chromium itself is probably only interested in one thing: Chrome. Everything else is likely secondary, and they likely care very little if anyone other than Chrome and Google depends on their code. That makes things very difficult to maintain for anyone who depends on Chromium, such as the Chromium Embedded Framework. The texture sharing hardware acceleration patch caused Marshall grief in the older revisions. It was arguably the biggest patch they had to maintain, and it was too much of a pain to deal with, so it fell to the CEF community to make and maintain such a patch for themselves. The patches for 3440 and 3770 were the last ones Marshall was willing to maintain. After Chromium rewrote their graphics subsystem (for the Nth time), he had to drop it, as it was too complicated to continue supporting.

Eventually, years later, a group of people tried submitting a pull request again around the time of 4638. However, it was again a complicated patch, and it depended on the Viz rendering subsystem, which, as expected, was deprecated by Chromium. The patch was basically built for a subsystem that was already obsolete by the time it was finished. I'm pretty sure Marshall couldn't just merge it with the entire subsystem on the verge of being replaced. Again.

So the patch we used was based on a (basically rejected) pull request submitted to CEF by Isaac Richards of NVIDIA. Then we encountered this system freeze bug with it, so I had to rework everything just to work around the system freeze. (Isaac didn't actually do anything wrong; the system freeze issue just complicated everything, forcing a complete rework.) Next time we upgrade, Viz will probably be gone, and I'm probably going to have to suffer through this whole process yet again. I hate Chromium so much.

Lain

Excellent job and excellent story! Maintaining a fork of Chromium sounds like a huge pile of work. Are you planning to try and upstream any of these changes?

A

Great read! It's been years since I excitedly worked through the night to code bitchy logic, but I know that feeling when the machine is finally stable. It is its own reward. Until the next one, and thanks for a great system.

Wow congrats! I understood literally 0.1% of the whole thing but it was very interesting to read and I can tell you are an absolute mad lad for being able to fix it. Shout out to the whole crew who helped, keep up the great work! :-)

Thank you all so much for your hard work on making OBS as awesome as it is.

Brian Rubin (Space Game Junkie)

There is only one way to handle this, Jim... and that is to subscribe as a patron myself right now and help fund the effort... xxdd God Job (no typo)...

Good job. Thank you so much for all your work.

Dalerija

I'm just a stranger but I'm proud of you buddy! Keep being a homie!

flipnCrazy559

Wow. Well done, Jim!!!

Andi Vax

Good job, all of you! Your hard work is very appreciated

Simon Vacker

