Totally new to SIMD, can I use it to accelerate my collision detection?

NickDriscoll · 12 June 2025 16:37

Hey everyone,

I have a simple collision system in my game that uses the following proc in a loop to determine the closest point on a collision mesh to an arbitrary test point:

// Implementation adapted from section 5.1.5 of Real-Time Collision Detection
closest_pt_triangle :: proc(point: hlsl.float3, using triangle: ^Triangle) -> hlsl.float3 {
    dot :: hlsl.dot

    // Check if point is in vertex region outside A
    ab := b - a
    ac := c - a

    ap := point - a
    d1 := dot(ab, ap)
    d2 := dot(ac, ap)
    if d1 <= 0 && d2 <= 0 {
        return a
    }

    // Check if the point is in vertex region outside B
    bp := point - b
    d3 := dot(ab, bp)
    d4 := dot(ac, bp)
    if d3 >= 0 && d4 <= d3 {
        return b
    }

    // Check if P in edge region of AB
    vc := d1*d4 - d3*d2
    if vc <= 0 && d1 >= 0 && d3 <= 0 {
        w := d1 / (d1 - d3)
        candidate_point := a + w * ab
        return candidate_point
    }

    // Check if P in vertex region outside C
    cp := point - c
    d5 := dot(ab, cp)
    d6 := dot(ac, cp)
    if d6 >= 0 && d5 <= d6 {
        return c
    }

    // Check if P in edge region AC
    vb := d5*d2 - d1*d6
    if vb <= 0 && d2 >= 0 && d6 <= 0 {
        w := d2 / (d2 - d6)
        candidate_point := a + w * ac
        return candidate_point
    }

    // Check if P is in edge region of BC
    va := d3*d6 - d5*d4
    if va <= 0 && (d4 - d3) >= 0 && (d5 - d6) >= 0 {
        w := (d4 - d3) / ((d4 - d3) + (d5 - d6))
        candidate_point := b + w * (c - b)
        return candidate_point
    }

    // P inside face region. Compute Q through its barycentric coordinates (u,v,w)
    denom := 1.0 / (va + vb + vc)
    v := vb * denom
    w := vc * denom
    candidate := a + ab * v + ac * w
    return candidate
}

I have a scene with ~10,000 collision tris and a dozen enemies, and testing each enemy against the collision is taking ~12ms which is way too slow, so I’m looking to optimize. I was going to try implementing a simple broad phase using a uniform grid for binning the triangles, but then I remembered that SIMD is a thing that is famously underutilized. I thought I could use f32x8 registers to somehow process eight triangles at a time.

I vaguely understand that SIMD allows one to do simple math operations on multiple pairs of elements simultaneously, but I’m struggling to even get started re-implementing this algorithm with SIMD, so I have two main questions.

Is this even an appropriate use of SIMD? Would I be better off just trying to implement a broad phase?
If the answer to 1 is “yes”, can you give me any insight or resources or ways of thinking about SIMD that are helpful?

Thank you.

duchainer · 17 June 2025 00:01

I don’t have much experience with SIMD, but even with or without, if that can help, you might want to take a look at those 2 blog posts about sweep-and-prune algorithms for collision (works for 2d and 3d) :

And just in case you missed it, most simd procedures seem to be in package simd - pkg.odin-lang.org and to be using the base:intrinsics package from simd_add to simd_lanes_rotate_right .

Best of luck and sorry if none of that is useful to you personally after all.

NickDriscoll · 18 June 2025 13:07

It definitely seems like it will be useful. I’ve been using Christer Ericcson’s Realtime Collision Detection to learn everything, but I don’t think sweeap-and-prune is mentioned, and these blog posts seem pretty comprehensive, so thank you!

NickDriscoll · 19 June 2025 03:40

I’m wrong. The book discusses it in section 7.5. It’s really such an amazing book.