Hello all
I would like to take, as fast as possible, the highest 128 bits of one 256-bit register and the lowest 128 bits of a second 256-bit register, and combine them into a single 256-bit result, e.g.:
a := [2]u128{1111, 2222}
b := [2]u128{3333, 5555}
result := … // expected: [2222, 3333]
Context: I'm porting simdjson to Odin (see the call to ‘_mm256_permute2x128_si256’ in simdjson/include/simdjson/haswell/simd.h at master · simdjson/simdjson · GitHub).
A)
There’s the ‘_mm256_permute2x128_si256’/‘vperm2i128’ intrinsic, but I can’t seem to call it from Odin (and maybe not from LLVM at all?):
@(require_results, enable_target_feature="avx2")
_mm256_permute2x128_si256_test :: #force_inline proc "c" (a, b: simd.u8x32, idx: u8) -> simd.u8x32 {
    return vperm2i128(a, b, idx)
}

@(private, default_calling_convention="none")
foreign _ {
    @(link_name = "llvm.x86.avx2.vperm2i128")
    vperm2i128 :: proc(a, b: simd.u8x32, idx: u8) -> simd.u8x32 ---
}

@(enable_target_feature="avx2")
main :: proc() {
    a := [2]u128{1111, 2222}
    b := [2]u128{3333, 5555}

    // miserable failure: undefined reference to `llvm.x86.avx2.vperm2i128'
    result := transmute([2]u128)_mm256_permute2x128_si256_test(transmute(simd.u8x32)a, transmute(simd.u8x32)b, 0x21)
    fmt.printfln("expect [2222, 3333]: %v", result)
}
It fails: the name ‘llvm.x86.avx2.vperm2i128’ was apparently removed from LLVM a long time ago (⚙ D37892 [X86] Use native shuffle vector for the perm2f128 intrinsics, line 910) and replaced by the ‘shufflevector’ instruction.
‘shufflevector’ itself may be lowered to ‘vperm2i128’ under the right conditions, I guess? (llvm-project/llvm/test/CodeGen/X86/avx-vperm2x128.ll at b003face11fadc526a6f816243441f486ffc958d · llvm/llvm-project · GitHub)
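To spell out what I expect from the 0x21 control byte used above, here is a plain scalar sketch in Odin. It only models the selection rule as I understand it from Intel's description of the instruction. It is not a real Odin or LLVM API, and the names permute2x128_model and pick_half are made up for illustration.

```odin
package main

import "core:fmt"

// Hypothetical helper (not part of Odin or LLVM): picks one 128-bit half
// according to a 4-bit control nibble.
//   bits 0-1: 0 = a low, 1 = a high, 2 = b low, 3 = b high
//   bit 3:    if set, the half is zeroed instead
pick_half :: proc(a, b: [2]u128, ctrl: u8) -> u128 {
    if (ctrl & 0x8) != 0 {
        return 0
    }
    switch ctrl & 0x3 {
    case 0:
        return a[0]
    case 1:
        return a[1]
    case 2:
        return b[0]
    }
    return b[1]
}

// Hypothetical scalar model of the vperm2i128 control byte:
// the low nibble of imm builds the low half of the result,
// the high nibble builds the high half.
permute2x128_model :: proc(a, b: [2]u128, imm: u8) -> [2]u128 {
    return [2]u128{pick_half(a, b, imm & 0xF), pick_half(a, b, imm >> 4)}
}

main :: proc() {
    a := [2]u128{1111, 2222}
    b := [2]u128{3333, 5555}

    // 0x21: low nibble 1 selects a's high half, high nibble 2 selects b's low half.
    fmt.printfln("model: %v", permute2x128_model(a, b, 0x21))
}
```

Under this model, 0x21 puts the high half of a in the low half of the result and the low half of b in the high half, which is the [2222, 3333] I am after.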
B)
There’s also ‘simd.shuffle’:
main :: proc() {
    a := transmute(simd.u64x4)[2]u128{1111, 2222}
    b := transmute(simd.u64x4)[2]u128{3333, 5555}

    result1 := simd.shuffle(a, b, 2, 3, 4, 5)
    fmt.printfln("simd.shuffle: %v", transmute([2]u128)result1)
}
It does the job, but judging by the output on ‘godbolt.org’, it apparently generates a bunch of ‘movaps’ instructions.
Note: I’m out of my depth here, mistakes are likely.
Thanks in advance for the help!