Yosh

Some statistical principles behind the segmented run (Bonus 1/5)

Added 2025-09-05 14:29:37 +0000 UTC

This is the first of five bonus posts related to the A01 video! This clip is an extra section I originally cut from the main video to keep it shorter. It's a bit out of context, so I'll give some extra explanations below.

In the A01 video, I made the AI drive a "segmented run". The principle is simple, as shown in the video: the AI tries the first segment of the map many times, I take the best attempt, the end point of that attempt becomes the starting point for the next segment, and I repeat this process until the finish line. However, for this to work well, there is one key thing to decide: how do we select the "best attempt" for a given segment?
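For readers who prefer code, the loop described above can be sketched roughly like this in Python. This is only an illustration, not the actual code behind the video: `simulate_attempt` is a made-up stand-in for the AI driving one segment (it just draws random numbers), and the `score` function is a placeholder for whatever "best attempt" rule we end up choosing.

```python
import random

def simulate_attempt(start_state, rng):
    """Hypothetical stand-in for the AI driving one segment from start_state.

    A real version would run the game; this one just returns random values.
    """
    return {
        "distance": start_state["distance"] + rng.uniform(90.0, 110.0),
        "speed": rng.uniform(200.0, 240.0),
    }

def segmented_run(n_segments, attempts_per_segment, score, seed=0):
    """Chain segments: the best attempt of one segment starts the next."""
    rng = random.Random(seed)
    state = {"distance": 0.0, "speed": 0.0}
    history = []
    for _ in range(n_segments):
        attempts = [simulate_attempt(state, rng)
                    for _ in range(attempts_per_segment)]
        # The "best attempt" rule is pluggable: higher score wins.
        state = max(attempts, key=score)
        history.append(state)
    return history

# Placeholder rule: pick the attempt with the largest distance.
history = segmented_run(n_segments=3, attempts_per_segment=50,
                        score=lambda a: a["distance"])
for i, state in enumerate(history, start=1):
    print(f"segment {i}: distance={state['distance']:.1f}, "
          f"speed={state['speed']:.1f}")
```

Everything interesting hides in `score`: changing that one function changes which attempts get chained together.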

The obvious choice would be to simply take the car that’s furthest ahead at the end of the segment (i.e. the car with the maximum distance). However, being in front is not the only thing that matters on a track like A01: for example, it's sometimes more important to have more speed. So instead of looking at just one variable, we need to consider at least two: distance and speed. In the clip, I show a graph plotting the final distance and speed for many attempts. Here we can see the problem: there isn’t a single attempt that maximizes both. So the challenge is to guess which distance/speed combination is most likely to result in the fastest overall finish time.
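To illustrate the "no single attempt maximizes both" problem, here is a small Python sketch that keeps only the attempts that are not beaten on both distance and speed by some other attempt (a Pareto front, a name the post itself does not use). The attempt values are invented for the example and are not taken from the graph in the clip.

```python
def pareto_front(attempts):
    """Return attempts that no other attempt matches or beats on both
    distance and speed (with at least one strict improvement)."""
    front = []
    for a in attempts:
        dominated = any(
            b["distance"] >= a["distance"] and b["speed"] >= a["speed"]
            and (b["distance"] > a["distance"] or b["speed"] > a["speed"])
            for b in attempts
        )
        if not dominated:
            front.append(a)
    return front

# Made-up end-of-segment values (arbitrary units).
attempts = [
    {"id": 0, "distance": 412.0, "speed": 231.0},
    {"id": 1, "distance": 418.5, "speed": 224.0},  # furthest ahead
    {"id": 2, "distance": 409.0, "speed": 236.5},  # fastest
    {"id": 3, "distance": 410.0, "speed": 228.0},  # worse than id 0 on both
]

front_ids = [a["id"] for a in pareto_front(attempts)]
print(front_ids)  # [0, 1, 2]
```

Attempt 3 can be discarded safely, but attempts 0, 1 and 2 are all "best" in some sense, so we still need a rule to choose between them.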

For that, my idea was to use statistics. For example, let's say I want to judge how promising an attempt is at t = 10s in the run, based on its speed and distance. To do that, I first made the AI drive many full runs from start to finish. Then, for each of these runs, I recorded the speed and distance at t = 10s, as well as the final finish time. After that, I used this data to build the following linear regression model, where x0, x1 and x2 are the coefficients fitted by the regression:
FINISH_TIME ~ x0 + x1*SPEED(t=10) + x2*DISTANCE(t=10)

If you’re not familiar with regression: it’s a statistical tool that builds a mathematical relationship between variables we know (here: speed & distance) and the outcome we want to predict (here: finish time). Once I have that relation, I can use it to predict the finish time of any new attempt, based only on its speed and distance at t = 10s. That way, I can assign a “predicted finish time” to every attempt on my graph, and simply pick the attempt with the lowest prediction. Of course, this could be improved further by including more variables, such as car angle, lateral position, and so on.
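As a rough sketch of that idea, here is a dependency-free Python version that fits the model by ordinary least squares and then ranks a few candidate attempts. It is not the code used for the video: all numbers are invented, the helper names (`solve_linear_system`, `fit_finish_time_model`, `predict`) are hypothetical, and in practice a library such as NumPy or scikit-learn would do the fitting.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_finish_time_model(speeds, distances, finish_times):
    """Least-squares fit of finish_time ~ x0 + x1*speed + x2*distance."""
    rows = [[1.0, s, d] for s, d in zip(speeds, distances)]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, finish_times))
           for i in range(3)]
    return solve_linear_system(xtx, xty)

def predict(coef, speed, distance):
    """Predicted finish time for one attempt."""
    return coef[0] + coef[1] * speed + coef[2] * distance

# Made-up training data: speed and distance at t = 10s, plus finish time,
# for five full runs (arbitrary units, not real measurements).
speeds = [220.0, 235.0, 228.0, 240.0, 225.0]
distances = [415.0, 405.0, 412.0, 400.0, 418.0]
finish_times = [23.15, 23.15, 23.08, 23.20, 22.96]

coef = fit_finish_time_model(speeds, distances, finish_times)

# Made-up candidate attempts at t = 10s: (speed, distance).
candidates = [(231.0, 412.0), (224.0, 418.5), (236.5, 409.0)]
predictions = [predict(coef, s, d) for s, d in candidates]
best = min(range(len(candidates)), key=lambda i: predictions[i])
print("coefficients:", coef)
print("best candidate:", candidates[best])
```

With these invented numbers, the model happens to favor the attempt that is furthest ahead; with real data, the fitted coefficients decide how much speed is worth compared to distance.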

I hope this explanation wasn’t too confusing! Let me know if you have questions, and see you in the next bonus post :)