JOTEGO

DMA in CPS

Welcome back to another technical article. This time I'm going to talk about DMA (Direct Memory Access) in the CPS system. This aspect is completely ignored in emulators, yet it has a large effect on the effective processor performance because the CPU is stopped whenever the DMA is active. The result is that emulators may be running the CPU of CPS games ~18% faster than the real hardware does. This article is very technical, but I know some of you enjoy these reads, and it also helps me organize my thoughts.
Why DMA? When a large chunk of data has to be moved from memory to a device, there are two ways of doing it: either the CPU reads from memory and writes to the device, or a dedicated piece of hardware performs the transfer faster. Most 8- and 16-bit computers didn't have DMA devices because of their cost, but they are rather common in arcade systems. CPS is no exception.
CPS uses the DMA for (at least) four different purposes:
1. Copy the sprite table data
2. Copy the palette of colours
3. Read the row offset for the SCR2 (scroll 2) layer
4. Cache the tile code data for each scroll layer

Items 1 and 2 are likely to be triggered or enabled by the CPU. Item 3 might be enabled conditionally, and item 4 is permanent, although its length may vary. Note that while the DMA is accessing the memory the CPU is halted (again, emulators ignore this). Halting the CPU is not good: CPU time is precious, so you want to keep these halts as short as possible.
How this works exactly can, at this stage, only be inferred from PCB measurements. There are some efforts going on to decap the CPS-A chip and derive the logic circuit from die shots, but that is not available yet.
So from PCB measurements, this is what can be gathered:
Once per line, there is a short 2us DMA period in which three different pieces of data are read. These seem to be: the SCR2 row offset (1 read), four words from the sprite table (4 reads) and one additional word which I have not identified (it could be a dummy cycle). The whole sprite table takes 1024 reads, so at four reads per line it needs 256 lines to complete.
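To make the per-line numbers concrete, here is a small sketch of the arithmetic. It assumes one read per 4MHz cycle, which is the rate quoted further down for the palette transfer; the function and constant names are made up for illustration.

```python
# Figures quoted in the article.
READS_PER_LINE = 1 + 4 + 1       # SCR2 row offset + 4 sprite words + 1 unidentified word
SPRITE_WORDS_PER_LINE = 4
SPRITE_TABLE_READS = 1024

# Assumption: one DMA read per 4 MHz cycle (the rate quoted for the palette DMA).
DMA_READ_RATE_MHZ = 4.0


def per_line_burst_us(reads=READS_PER_LINE, rate_mhz=DMA_READ_RATE_MHZ):
    """Time spent on the reads themselves in the per-line DMA burst, in microseconds."""
    return reads / rate_mhz


def lines_for_sprite_table(total=SPRITE_TABLE_READS, per_line=SPRITE_WORDS_PER_LINE):
    """Number of lines needed to copy the whole sprite table (ceiling division)."""
    return -(-total // per_line)


print(per_line_burst_us())        # 1.5
print(lines_for_sprite_table())   # 256
```

Under that assumption the six reads take 1.5us, which fits inside the measured 2us window; the remainder could be bus arbitration overhead, but that is a guess.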
Every eight lines there are also 98 reads, which fill an internal cache for SCR1 tile data (code and attribute values, from which the graphics data will be read later on).
Every sixteen lines there are also 96 reads for SCR2 data and 24 reads for SCR3 data. Note that SCR1 uses 8x8 tiles and SCR3 uses 32x32 tiles, so 98 reads for SCR1 are equivalent to 24 reads for SCR3 in terms of how many pixels can be rendered with the information. Following the same logic, SCR2 should only need 48 reads. But there are 96, twice the needed value. The reason is that the SCR2 scroll position may vary from row to row (as in the floor of Street Fighter 2), so the cache needs to cover a wider area. Nonetheless, 96 reads are not enough to cover the whole width of the horizontal tile map, which means that there might be a limit on how much the row scroll values can vary within a 16-line range.
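The equivalence between the three layers is easier to see with numbers. The sketch below assumes each tile takes two reads (one code word and one attribute word) and that SCR2 uses 16x16 tiles, which is what the "48 reads" reasoning implies; both are assumptions, not measured facts.

```python
# Assumption: one code word plus one attribute word per cached tile.
WORDS_PER_TILE = 2

# name: (tile width in pixels, reads per refill, lines between refills)
# The SCR2 tile width of 16 is implied by the text rather than stated.
LAYERS = {
    "SCR1": (8, 98, 8),
    "SCR2": (16, 96, 16),
    "SCR3": (32, 24, 16),
}


def cached_width_px(tile_px, reads, words_per_tile=WORDS_PER_TILE):
    """Horizontal span, in pixels, covered by one cache refill."""
    return (reads // words_per_tile) * tile_px


for name, (tile_px, reads, _lines) in LAYERS.items():
    print(name, cached_width_px(tile_px, reads))
# SCR1 392
# SCR2 768
# SCR3 384
```

With these assumptions SCR1 and SCR3 each cache roughly one screen width, while SCR2 caches about twice that, which would leave room for row scroll offsets of up to roughly one extra screen width inside a 16-line band.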
Finally, the palette is the simplest one. There are 6 blocks of palette information (one for each graphic element: SCR1, SCR2, SCR3, OBJ, STARS1 and STARS2). The DMA job is not split on this occasion, and the CPU is locked for a whopping 782us. This corresponds to 3072+60 reads at 4MHz. The extra 60 might be dummy cycles needed by the DMA logic. During the palette transfer the SCR/OBJ DMA seems to be disabled.
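As a quick sanity check of the palette figures, the following sketch converts the read count into time. It uses only numbers from the paragraph above; the names are illustrative.

```python
PALETTE_BLOCKS = 6        # SCR1, SCR2, SCR3, OBJ, STARS1, STARS2
PALETTE_READS = 3072      # payload reads quoted in the article
EXTRA_READS = 60          # possibly dummy cycles
DMA_READ_RATE_MHZ = 4.0   # one read per 4 MHz cycle


def palette_dma_us(reads=PALETTE_READS, extra=EXTRA_READS, rate_mhz=DMA_READ_RATE_MHZ):
    """Length of the uninterrupted palette transfer, in microseconds."""
    return (reads + extra) / rate_mhz


print(PALETTE_READS // PALETTE_BLOCKS)  # 512 words per palette block
print(palette_dma_us())                 # 783.0
```

The computed 783us is within 1us of the measured 782us, which is about as close as a scope reading will get.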
If you count everything, this amounts to 2,959us per frame, i.e. 17.6% of the frame. During that time the CPU is halted and cannot operate. Emulators do not take this into account and will effectively run the CPU that much faster.
The core betas initially had no DMA, and at some point I introduced DMA for handling OBJ (sprite) and palette data, but at a faster pace than the original. I am going to modify the core to introduce the correct DMA intervals. This may result in slowdowns which, as far as I can tell from all the data I have, will match those of the original hardware.
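The two totals also tell us what frame period they imply and how much CPU time is left. This is only arithmetic on the figures above; the last value is my own reading of what "faster" means for an emulator that never halts the CPU, so treat it as an estimate.

```python
HALTED_US_PER_FRAME = 2959.0   # total DMA time per frame, from the article
HALTED_FRACTION = 0.176        # share of the frame, from the article

# Frame period implied by the two figures, and the matching refresh rate.
frame_us = HALTED_US_PER_FRAME / HALTED_FRACTION
refresh_hz = 1e6 / frame_us

# Share of the frame in which the CPU can actually run.
cpu_available = 1.0 - HALTED_FRACTION

# An emulator without DMA halts gives the CPU the whole frame, so it executes
# this many times the cycles per frame compared with the measured hardware.
emulator_ratio = 1.0 / cpu_available

print(f"frame period : {frame_us / 1000:.1f} ms")   # about 16.8 ms
print(f"refresh rate : {refresh_hz:.1f} Hz")         # about 59.5 Hz
print(f"CPU available: {cpu_available:.1%}")
print(f"emulator/real: {emulator_ratio:.2f}x")
```

Note that 17.6% is the share of the frame that is lost. Measured against the cycles the real CPU actually gets, the emulated CPU comes out a little more than 20% ahead.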
Stay tuned...

Comments

This is especially apparent in sf2 hyper fighting, i have the original cps1 board of it and when playing on mame that runs insanely fast compared to it so I’ve always avoided the emulated version....after reading your post it’s nice to know it’s def a real thing....

John Yates

It was actually a present from a patron. It is serving me a lot!

JOTEGO

aaahh, A Rigol DS1054Z scope. I just sold mine and bought a Keysight DSOX2024. Just one example here where an affordable scope is used.

blacklistedcard
