pavelsevecek

Fluid simulation on the GPU

Added 2025-04-12 16:00:50 +0000 UTC

One of the most requested features is an option to run the simulation on the GPU. It's certainly a good idea. While there are a lot of obstacles to overcome and a number of issues that come with GPU programming, the benefits can be clearly seen in the video above.

I want to add a basic GPU solver in version 0.8. The initial implementation should be able to do the following:

Basic IISPH solver (=collisions between particles)
Self-gravitation, optimized using the Barnes-Hut tree
Collisional heating
Gravitational interaction and collisions with rigid (non-deformable) objects
Massless particles (rings, jet particles, etc.) affected by gravity (but likely without collisions)
Option to select and track objects
Saving simulation data to history and the option to replay it and render a video
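
For readers unfamiliar with the Barnes-Hut tree mentioned in the list, here is a minimal CPU-side sketch in Python. It is not the SpaceSim implementation (which would run in compute shaders); the names Node, build_tree, tree_accel and direct_accel, the opening angle theta, and the G and SOFTENING values are all assumptions made for illustration. The idea is that a distant group of particles is replaced by a single point mass at its centre of mass whenever the cell size divided by the distance is below theta. The sketch also assumes no two particles share exactly the same position.

```python
import math

G = 1.0           # gravitational constant in simulation units (assumed)
SOFTENING = 1e-3  # keeps 1/r^3 finite at tiny separations (assumed value)


class Node:
    """Cubic octree cell storing the total mass and mass-weighted position sum."""

    def __init__(self, center, half):
        self.center, self.half = center, half  # cube centre and half-width
        self.mass = 0.0
        self.wsum = [0.0, 0.0, 0.0]  # sum of mass * position over the cell
        self.body = None             # (pos, mass) while the cell holds one particle
        self.children = None         # list of 8 child cells once subdivided

    def insert(self, pos, mass):
        if self.children is None and self.body is None:
            self.body = (pos, mass)            # empty cell: store the particle
        else:
            if self.children is None:          # occupied leaf: subdivide first
                self.children = [None] * 8
                old, self.body = self.body, None
                self._push(*old)
            self._push(pos, mass)
        self.mass += mass
        for k in range(3):
            self.wsum[k] += mass * pos[k]

    def _push(self, pos, mass):
        # Octant index: bit k is set when the particle is on the upper side of axis k.
        upper = [pos[k] >= self.center[k] for k in range(3)]
        idx = sum(int(upper[k]) << k for k in range(3))
        if self.children[idx] is None:
            h = self.half / 2
            c = [self.center[k] + (h if upper[k] else -h) for k in range(3)]
            self.children[idx] = Node(c, h)
        self.children[idx].insert(pos, mass)


def build_tree(bodies, center, half):
    """bodies is a list of (position, mass); the root cube must contain them all."""
    root = Node(list(center), half)
    for pos, mass in bodies:
        root.insert(pos, mass)
    return root


def tree_accel(node, pos, theta=0.5):
    """Acceleration at pos; cells with size/distance < theta act as one point mass."""
    if node is None or node.mass == 0.0:
        return [0.0, 0.0, 0.0]
    d = [node.wsum[k] / node.mass - pos[k] for k in range(3)]
    r = math.sqrt(sum(x * x for x in d) + SOFTENING ** 2)
    if node.children is None or 2 * node.half / r < theta:
        f = G * node.mass / r ** 3
        return [f * x for x in d]
    total = [0.0, 0.0, 0.0]
    for child in node.children:
        a = tree_accel(child, pos, theta)
        total = [t + x for t, x in zip(total, a)]
    return total


def direct_accel(bodies, pos):
    """Reference O(N) sum over all bodies, used only to check the tree result."""
    total = [0.0, 0.0, 0.0]
    for p, m in bodies:
        d = [p[k] - pos[k] for k in range(3)]
        r = math.sqrt(sum(x * x for x in d) + SOFTENING ** 2)
        total = [t + G * m * x / r ** 3 for t, x in zip(total, d)]
    return total
```

Because tree_accel can be evaluated at any point, the same tree built from the massive particles could in principle also serve massless particles such as rings, which feel gravity but do not contribute to it. Setting theta to 0 never accepts a cell as a point mass, so the result falls back to the direct sum.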

Pretty much every other simulation system is in a "maybe" pile at the moment. Some systems can be easily added in the future (e.g. phase transitions, heat conduction), but some systems will be difficult to add due to the limitations of GPU programming (e.g. cohesion).

Object trajectories might need a bit of work.

Here are some of the pros and cons of using the GPU instead of the CPU.

The good

The main advantage of the GPU is the speed. If done right, moving a task to a GPU can significantly improve the performance of a program. It's difficult to say what the performance gain will be in general, as it depends on your hardware (since the simulations run on a different computing device, we are comparing apples and oranges), the specifics of the simulation, the number of particles, etc. So far, the reported speed-ups range from 5x to 10x, but this can still change. There are a number of potential optimizations I could do to speed it up even further.

GPU simulation actually makes some things simpler compared to its CPU counterpart. For instance, there is no need to send the particle data to the GPU every time step to update the rendered simulation state - the data is already there. This makes the synchronization between the simulation and the renderer much easier and faster.

The bad

GPU programming is generally more difficult than writing C++ code, mainly due to the lack of tools for development, profiling and debugging. I chose to use OpenGL for the simulation because it's the graphics API I already use for rendering and it is supported by all graphics cards, even the integrated ones (unlike CUDA, which is only supported by NVIDIA GPUs). Although OpenGL makes it possible to run arbitrary tasks on the GPU using compute shaders, moving the simulation to the GPU is challenging, especially when there are complex dependencies between the computing tasks, or when the number of performed tasks or the required memory is not known in advance. For this reason, a lot of algorithms used by the CPU simulation simply cannot be implemented on the GPU and it's necessary to use a different approach entirely.
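
To illustrate what "a different approach" can look like when the output size is not known in advance, here is a sketch of one common GPU pattern: count, prefix sum, fill. It is written in plain Python rather than GLSL, and it is an assumption that anything like this is used in SpaceSim; the function name neighbor_lists and the brute-force neighbor test are made up for the example. On the CPU you would simply append to a growable list per particle. On the GPU, each thread first only counts its results, a prefix sum turns the counts into write offsets, a buffer of the exact total size is allocated, and a second pass writes the results.

```python
from itertools import accumulate


def neighbor_lists(positions, radius):
    """Return (offsets, flat): neighbors of i are flat[offsets[i]:offsets[i + 1]]."""
    n = len(positions)
    r2 = radius * radius

    def close(i, j):
        return sum((a - b) ** 2 for a, b in zip(positions[i], positions[j])) <= r2

    # Pass 1: each "thread" i only counts its neighbors, writing one integer.
    counts = [sum(1 for j in range(n) if j != i and close(i, j)) for i in range(n)]

    # Exclusive prefix sum: offsets[i] is where particle i starts writing,
    # offsets[n] is the total number of entries to allocate.
    offsets = [0] + list(accumulate(counts))
    flat = [0] * offsets[-1]

    # Pass 2: each "thread" repeats the search and writes into its own slice,
    # so no two threads ever write to the same location.
    for i in range(n):
        k = offsets[i]
        for j in range(n):
            if j != i and close(i, j):
                flat[k] = j
                k += 1
    return offsets, flat
```

The price is that the search runs twice, and the total size still has to be known before the second pass can be launched, which is part of why such algorithms are more awkward on the GPU than on the CPU.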

A big obstacle is that GPU simulation is an all-or-nothing kind of deal. It's not possible to run only a part of the simulation on the GPU and the rest on the CPU. Doing so would require constantly moving the data between the CPU and the GPU, which would negate any performance benefits the GPU simulation would otherwise provide.

Another inconvenience is that SpaceSim already makes heavy use of the GPU to render particles and perform a number of post-processing effects - surface smoothing, lighting and shadows, bloom, etc. This is especially true for the 'raymarching' renderer, which is far from being real-time and can often be the bottleneck, making the simulation wait for the render to complete instead of the other way around. Once the simulation moves to the GPU, its resources have to be shared between the simulation and the renderer, meaning rendering slows down the simulation. When using the CPU for simulation, we can use as much of the GPU as needed without negatively affecting the simulation speed. That's not the case when the simulation uses the GPU too, and it's something we have to take into consideration.

Because the GPU resources are shared by both the renderer and the simulation, running a high-resolution simulation will lower the application refresh rate. This is generally not the case for the CPU simulation; even though a single time step can take a few hundred milliseconds to compute, the application FPS can still be high and you can freely view the simulation, use UI controls, etc.

The ugly

The differences between GPUs and even between drivers on the same GPU are generally much bigger than the differences between CPUs. Testing on a single GPU is not nearly enough - the shaders may fail to compile, work incorrectly, or have poor performance when tested on another device. Even worse, things can suddenly change (or even break completely) when the drivers get updated.

OpenGL has certain limitations which are difficult to get around. One of the GPU "features" is Timeout Detection and Recovery, or TDR. When a compute task takes too much time (by default 2 seconds or longer), the operating system crashes the application to avoid locking the system completely. To my knowledge, nothing should ever take that long, even when using a large number of particles, but it cannot be guaranteed. There is no way the application can work around this. It can only be avoided if the user manually changes the TDR timeout by editing a registry value, which is not exactly a viable solution.

Despite the negatives, I'm optimistic the GPU simulation will make the program better. You can look forward to the next version.