Sins of the PS2: Knockout Kings 2001
Added 2020-10-02 07:20:28 +0000 UTC
It's not often you get a game that disables an interrupt and expects it to remain enabled.
Greetings, dear reader. Sins of the PS2 is about all those games with crippling bugs that just happen to work on a real PS2 but fail horribly on emulators. Today's article is a real doozy, involving a rather decent boxing game called Knockout Kings 2001. On an emulator, the game softlocks on the intro screen: music still plays, but it is impossible to progress further. Without a memory card inserted, the game can hang even earlier.

Strangely, this game used to work in Dobie thanks to yet another bug involving timers. When this bug was fixed, the game stopped working just like in PCSX2. When a game doesn't work on any emulator, it's always thanks to something bizarre, and I wanted to figure out what...
Interrupt Failure
When a game hangs, a common symptom is the "BIFCO" loop, an idle thread in the EE kernel located at memory address 0x81FC0. The idle thread is executed when all other threads are asleep, and all it does is infinitely branch to itself. This means that when the idle thread is active, the only way for another thread to wake up is from an interrupt, and when the correct interrupt never arrives, it's a hang.
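The idle-thread behavior described above can be illustrated with a toy scheduler. This is only a sketch: the function `simulate`, the thread name `game`, and the interrupt name `rpc_reply` are invented for illustration and do not correspond to anything in the EE kernel.

```python
def simulate(waiting_on, interrupts, max_ticks=10):
    """Toy scheduler: return the name of the thread that runs on each tick.

    waiting_on: dict mapping a thread name to the interrupt it sleeps on.
    interrupts: list giving the interrupt (or None) that arrives on each tick.
    """
    asleep = dict(waiting_on)
    trace = []
    for tick in range(max_ticks):
        irq = interrupts[tick] if tick < len(interrupts) else None
        if irq is not None:
            # An interrupt wakes every thread that was sleeping on it.
            for name in [n for n, wanted in asleep.items() if wanted == irq]:
                del asleep[name]
        runnable = [n for n in waiting_on if n not in asleep]
        # With nothing runnable, the idle thread just spins in place.
        trace.append(runnable[0] if runnable else "idle")
    return trace


# The expected interrupt arrives on tick 1, so the thread wakes up.
print(simulate({"game": "rpc_reply"}, [None, "rpc_reply"], max_ticks=3))
# The interrupt never arrives: the idle thread runs forever, which is the hang.
print(simulate({"game": "rpc_reply"}, [], max_ticks=3))
```

In the second call nothing ever wakes the sleeping thread, so every tick is spent in the idle loop.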
I immediately suspected KK2001's BIFCO hang had something to do with the Input/Output Processor (IOP), a CPU responsible for handling sound, controller input, CD/DVD drive access, and other peripherals. The EE and IOP use a "remote procedure call" (RPC) interface, where the EE calls a function that does some magic to transfer the parameters to the IOP. The IOP processes this function and returns a result. Here, the EE was trying to send an RPC request to the IOP, but the IOP never processed it because SIF1 interrupts were mysteriously turned off.
The Subsystem Interface (SIF) is how the EE and IOP communicate with each other. On the hardware level, it consists of two DMA channels - SIF0 (IOP->EE) and SIF1 (EE->IOP). The IOP kernel relies on SIF1 interrupts to know when the EE makes a request, so if the interrupt doesn't fire, it's as if the EE never said anything as far as the IOP is concerned. Clearly, SIF interrupts are critical for the system to function, so if they're being disabled, something very wrong is happening...
Wrong Function?
After some investigation, I found that a custom IOP module, EZMIDI, was responsible for disabling SIF interrupts.

The intent of this code is to sleep for a quarter of a frame, set up a critical section, copy data from a DMA buffer into another buffer, and then "tick" a sound engine responsible for playing MIDI notes. This critical section is meant to disable SIF interrupts so that the code can copy the buffer without risking it being modified by an interrupt. However... after disabling the interrupt with DisableIntr, it uses the wrong function to re-enable it. CpuResumeIntr touches a different interrupt register, one that controls whether any interrupt can happen at all (EnableIntr would have been the correct function). This means that, once this code executes, SIF interrupts will never be re-enabled...
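To make the mismatch concrete, here is a toy model of the two separate gates involved. It is a sketch, not EZMIDI's code and not the real IOP kernel API: the class `InterruptGates`, its methods, and the value passed to the resume call are assumptions, named only to mirror the four kernel functions.

```python
class InterruptGates:
    """Toy model: an interrupt fires only if BOTH gates are open."""

    def __init__(self, sources):
        self.global_enabled = True                        # CpuSuspendIntr / CpuResumeIntr
        self.source_enabled = {s: True for s in sources}  # DisableIntr / EnableIntr

    def cpu_suspend_intr(self):
        previous = self.global_enabled
        self.global_enabled = False
        return previous

    def cpu_resume_intr(self, previous):
        self.global_enabled = previous

    def disable_intr(self, source):
        self.source_enabled[source] = False

    def enable_intr(self, source):
        self.source_enabled[source] = True

    def can_fire(self, source):
        return self.global_enabled and self.source_enabled[source]


def buggy_critical_section(gates):
    gates.disable_intr("SIF1")    # closes the per-source gate
    # ... copy the buffer ...
    gates.cpu_resume_intr(True)   # wrong gate: the per-source one stays closed


def fixed_critical_section(gates):
    gates.disable_intr("SIF1")
    # ... copy the buffer ...
    gates.enable_intr("SIF1")     # reopens the gate that was actually closed


buggy, fixed = InterruptGates(["SIF1"]), InterruptGates(["SIF1"])
buggy_critical_section(buggy)
fixed_critical_section(fixed)
print(buggy.can_fire("SIF1"), fixed.can_fire("SIF1"))
```

In this model the resume call in the buggy version has no effect, because the global gate was never closed. The per-source gate stays shut for good.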
I patched the game on PCSX2 to never send data to EZMIDI. This got past the title screen hang, at the cost of certain sounds being missing. I concluded that the above code must execute on a real PS2, but that leaves us in a weird situation. Disabling SIF interrupts would also cause real hardware to hang. Something strange must be happening...
I came up with a crazy idea. What if, despite DisableIntr being called, SIF interrupts never actually got disabled? What if there was some hardware mechanism that always made the interrupt occur when necessary?
Hardware Tests!
I scoured someone else's notes on the IOP DMA controller. They said that certain DMA channels, namely SIF0 and SIF1, had two interrupt sources, not just one! The first, common to all channels, triggered an interrupt request when the DMA transfer completed. The second, unique to SIF0 and SIF1, could also occur if a special bit in the transferred data was set. The notes implied that the second source could fire even when the first was disabled, and that DisableIntr, a kernel function, only disabled the first. Since the SIF drivers always set that "interrupt bit" in the data, this would mean that SIF interrupts could always occur. This lined up nicely with my crazy idea, but I needed to confirm it on real hardware.
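The behavior the notes imply can be written down as a small model. This is a sketch of the hypothesis only: the class `SifDmaChannel`, its field, and the `model_tag_source` flag are hypothetical names and do not map to real registers.

```python
from dataclasses import dataclass


@dataclass
class SifDmaChannel:
    # The "transfer complete" source; this is what DisableIntr clears in the model.
    completion_irq_enabled: bool = True

    def raises_interrupt(self, tag_irq_bit: bool, model_tag_source: bool) -> bool:
        """Return True if finishing a transfer raises an interrupt request.

        tag_irq_bit: the special "interrupt bit" carried in the transferred data.
        model_tag_source: whether the second interrupt source is modeled at all.
        """
        if self.completion_irq_enabled:
            return True
        # The second source ignores the completion mask entirely.
        return model_tag_source and tag_irq_bit


channel = SifDmaChannel(completion_irq_enabled=False)  # after DisableIntr

# Without the second source, the request is silently lost.
print(channel.raises_interrupt(tag_irq_bit=True, model_tag_source=False))
# With it, the interrupt bit in the data still gets the interrupt through.
print(channel.raises_interrupt(tag_irq_bit=True, model_tag_source=True))
```

The two calls differ only in whether the second source exists, which is exactly the difference a hardware test needs to probe.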
tadanokojin wrote a simple program to test this theory. It first loaded a custom IOP module that just called DisableIntr and nothing more, then it loaded a different module, and finally it displayed something on the screen. If I were correct, emulators would hang trying to load the second module (since it required communication with the IOP), but a real PS2 would execute the whole program.
The result? I was right! Emulators hung as predicted, and a real PS2 worked.
While I have yet to implement the proper behavior in Dobie, allowing SIF interrupts to always fire should fix the game on both Dobie and PCSX2. What seemed like a bizarre bug is a simple fix, thankfully.
Closing
This is one of the strangest bugs I've seen in a PS2 game. Not only was an obviously wrong function used, but thanks to a hardware quirk, the intended behavior (temporarily disabling SIF1 interrupts) doesn't even happen. A lesson to all gamedevs: make sure you read your SDK docs thoroughly!
This also speaks to the importance of good hardware tests. While I could have taken the documentation at its word, it was safer to confirm that a real PS2 actually behaves in the way described. An emulator that strives to improve its accuracy and compatibility should do hardware tests whenever strange behavior is afoot, as making assumptions can break more games than it fixes.