Hey everyone,
I have a simple collision system in my game that uses the following proc
in a loop to determine the closest point on a collision mesh to an arbitrary test point:
// Implementation adapted from section 5.1.5 of Real-Time Collision Detection
closest_pt_triangle :: proc(point: hlsl.float3, using triangle: ^Triangle) -> hlsl.float3 {
dot :: hlsl.dot
// Check if point is in vertex region outside A
ab := b - a
ac := c - a
ap := point - a
d1 := dot(ab, ap)
d2 := dot(ac, ap)
if d1 <= 0 && d2 <= 0 {
return a
}
// Check if the point is in vertex region outside B
bp := point - b
d3 := dot(ab, bp)
d4 := dot(ac, bp)
if d3 >= 0 && d4 <= d3 {
return b
}
// Check if P in edge region of AB
vc := d1*d4 - d3*d2
if vc <= 0 && d1 >= 0 && d3 <= 0 {
w := d1 / (d1 - d3)
candidate_point := a + w * ab
return candidate_point
}
// Check if P in vertex region outside C
cp := point - c
d5 := dot(ab, cp)
d6 := dot(ac, cp)
if d6 >= 0 && d5 <= d6 {
return c
}
// Check if P in edge region AC
vb := d5*d2 - d1*d6
if vb <= 0 && d2 >= 0 && d6 <= 0 {
w := d2 / (d2 - d6)
candidate_point := a + w * ac
return candidate_point
}
// Check if P is in edge region of BC
va := d3*d6 - d5*d4
if va <= 0 && (d4 - d3) >= 0 && (d5 - d6) >= 0 {
w := (d4 - d3) / ((d4 - d3) + (d5 - d6))
candidate_point := b + w * (c - b)
return candidate_point
}
// P inside face region. Compute Q through its barycentric coordinates (u,v,w)
denom := 1.0 / (va + vb + vc)
v := vb * denom
w := vc * denom
candidate := a + ab * v + ac * w
return candidate
}
I have a scene with ~10,000 collision tris and a dozen enemies, and testing each enemy against the collision is taking ~12ms which is way too slow, so I’m looking to optimize. I was going to try implementing a simple broad phase using a uniform grid for binning the triangles, but then I remembered that SIMD is a thing that is famously underutilized. I thought I could use f32x8
registers to somehow process eight triangles at a time.
I vaguely understand that SIMD allows one to do simple math operations on multiple pairs of elements simultaneously, but I’m struggling to even get started re-implementing this algorithm with SIMD, so I have two main questions.
- Is this even an appropriate use of SIMD? Would I be better off just trying to implement a broad phase?
- If the answer to 1 is “yes”, can you give me any insight or resources or ways of thinking about SIMD that are helpful?
Thank you.