Odd Behavior with Fixed Arrays

This is a weird one. I have to apologize for not being able to provide an example that does not include code from a foreign C library; I was unable to create a more generic example that reproduces this issue.

Quick Info:

  • OpenSSL client/server application
  • Odin version dev-2025-08 for both client and server
  • built using odin run . --debug for both client and server, so nothing fancy
  • client: Windows 10 amd64
  • server: Debian 12 (bookworm) arm64 (Pi5)

The Issue

Whenever the server receives a message from the client, the first four bytes of the server buffer are always zero, with the rest of the buffer matching expectations. If the client sends the message “Hello from the otherside.” as the byte stream [72, 101, 108, 108, 111, 32, 102, 114, 111, 109, 32, 116, 104, 101, 32, 111, 116, 104, 101, 114, 115, 105, 100, 101, 46], the server will receive the message “^@^@^@^@o from the otherside.” as the byte stream [0, 0, 0, 0, 111, 32, 102, 114, 111, 109, 32, 116, 104, 101, 32, 111, 116, 104, 101, 114, 115, 105, 100, 101, 46]. The first four bytes are always zero regardless of the number of calls to the receive procedure and the time delta between calls.
It is important to note that this is only an issue on the server. Clients receiving messages from the server using the method in example_01 do not see any missing/corrupted data.

OpenSSL Details:

The client and server both use the function SSL_read_ex to receive buffers. Here is some additional context for the binding used in the examples below:

// c declaration
__owur int SSL_read_ex(SSL *ssl, void *buf, size_t num, size_t *readbytes);
// odin proc binding
ssl_read_ex :: proc(ssl: SSL, buf: [^]byte, num: c.size_t, read: ^c.size_t) -> c.int ---

Ruling out OpenSSL:

I validated the client buffer is staged correctly using the RAD Debugger:

  • the buffer address is correctly staged for the proc
  • the internal memory for the OpenSSL structures contain the valid data prior to transmission from the client
  • no issue with the client using technique from example_01 to receive buffers from the server

Validating the server memory was a little trickier for me:

  • sshing into the Pi and navigating the Linux CLI debugging tools in that environment is tricky for me (if anyone has any remote debug workflow suggestions, please share; lldb and gdb via the CLI are not especially great for me…)
  • things appear to be in good standing
  • 95% sure it's not OpenSSL on the “server” side

Attempts Using Fixed Arrays as backing buffer:

First I tried taking a slice of the fixed array and getting the multi-pointer. This works fine on the client machine, but not on the server.

// example_01
buffer: [1024]byte
read: c.size_t

ossl.ssl_read_ex(ssl_handle, raw_data(buffer[:]), len(buffer), &read)
fmt.printf("client_message: %v\n", buffer[:read])
// client_message: [0, 0, 0, 0, 111, 32, 102, 114, 111, 109, 32, 116, 104, 101, 32, 111, 116, 104, 101, 114, 115, 105, 100, 101, 46]

Next I tried to be a bit more explicit, in case my understanding of raw_data() was inaccurate. This still produced the same results, which was not entirely surprising to me.

// example_02
buffer: [1024]byte
read: c.size_t

ossl.ssl_read_ex(ssl_handle, transmute([^]byte) &buffer[0], len(buffer), &read)
fmt.printf("client_message: %v\n", buffer[:read])
// client_message: [0, 0, 0, 0, 111, 32, 102, 114, 111, 109, 32, 116, 104, 101, 32, 111, 116, 104, 101, 114, 115, 105, 100, 101, 46]

In a brute-force attempt, I offset the buffer by four bytes, and everything then works as expected. I tried this thinking that maybe on some platforms fixed arrays have some protected/padded memory at the front of the buffer. Looking through the documentation and implementation, I could not find anything to support this idea, but the expected behavior was achieved.

// example_03
buffer: [1024]byte
buffer_ptr := raw_data(buffer[4:])
buffer_len := c.size_t(len(buffer) - 4)
read: c.size_t

ossl.ssl_read_ex(ssl_handle, buffer_ptr, buffer_len, &read)
fmt.printf("client_message: %v\n", buffer_ptr[:read])
// client_message: [72, 101, 108, 108, 111, 32, 102, 114, 111, 109, 32, 116, 104, 101, 32, 111, 116, 104, 101, 114, 115, 105, 100, 101, 46]

The Curious Case:

// example_04
buffer: [1024]byte
buffer_ptr := raw_data(buffer[4:])
read: c.size_t

ossl.ssl_read_ex(ssl_handle, raw_data(buffer[:]), len(buffer), &read)
fmt.printf("client_message: %v\n", buffer[:read])
// client_message: [72, 101, 108, 108, 111, 32, 102, 114, 111, 109, 32, 116, 104, 101, 32, 111, 116, 104, 101, 114, 115, 105, 100, 101, 46]

Something strange then occurred while commenting/uncommenting various versions of the examples. After a seemingly arbitrary slicing of the initial buffer, example_01 now behaves as expected. The multi-pointer buffer_ptr is never used/referenced aside from its initialization, so I'm not sure why this would change the behavior of buffer; I would have thought the compiler might even optimize the line away. Removing the line initializing buffer_ptr returns the first four bytes to always being zero.

Closing Thoughts:

I am left with the following possibilities for why this behavior occurs:

  • I have a fundamental misunderstanding on how raw_data() and multi-pointers interact with fixed arrays
  • This is a misuse of fixed arrays and another technique should be used
  • This is some combination of issues with Debian, OpenSSL, and/or arm64
  • This is a weird bug/edge case for Odin
  • Something else entirely different is occurring

I tried playing around a little with this and cannot see anything concrete. All I can think of is that maybe the code could be simplified a bit. The following variations all seem to give a [^]u8 just fine:

buffer: [1024]byte
rawdawg := raw_data(&buffer)
rawdawg2 := transmute([^]byte) &buffer
fmt.println(typeid_of(type_of(transmute([^]byte) &buffer)))
fmt.println(typeid_of(type_of(rawdawg)))
fmt.println(typeid_of(type_of(rawdawg2)))

buffer2: [^]u8
fmt.println(typeid_of(type_of(buffer2)))

Thank you for helping me look into this issue.

I tried all of the recommended variations and the initial four bytes of the buffer still remain zero. Only after the inclusion of line 3 of example_04 is the buffer written to fully/correctly. This is true for all of the provided variations.

What confuses me most is that the address of the multi-pointer correctly points to the expected chunk of memory in each variation, yet the first four bytes do not seem to be written without the inclusion of that unrelated slicing operation beforehand.

I thought maybe I could stand to learn something from this, so I did some more poking around. I did not find anything substantial, but here's where I'm at.

It seems that the first 4 bytes of the raw data are not protected, at least on the Odin side of things, since I can modify them and see the results in the original buffer. The only thing I could find in the documentation that might have some bearing is that C's default for variable declaration is to not initialize the memory, while Odin's default is the opposite: initialize to the zero value. Doubtful this will have an impact, since not initializing the buffer allows residual data to remain in it, but keeping apples-to-apples might make a difference since you are working with a foreign C library.

buffer: [1024]byte = ---
rawdawg: [^]byte = ---
rawdawg = raw_data(buffer[:])

dawg := "Hellope from Xuul the terror dawg!"
for r, i in dawg { rawdawg[i] = byte(r) }

fmt.println(buffer[0:len(dawg)])
for c in buffer[0:len(dawg)] { fmt.printf("%c", c) }

I tried using uninitialized memory via buffer: [1024]byte = --- with the full list of variations, and the same behaviors persist.

The plot also thickens. While cleaning up the code to provide a fully fleshed-out example, more odd behavior emerged after simply moving things around. Scenarios 1 and 2 are effectively the same as the initial post, with some additions.

1. First four bytes of buffer are incorrect (base case)

closing_message := "goodbye for now..."
buffer: [1024]byte
rcount, wcount: size_t

if ossl.ssl_read_ex(handle, raw_data(buffer[:]), len(buffer), &rcount) > 0 {
	fmt.printf("client_message(%i): \"%s\"\n", rcount, buffer[:rcount])
	// echo message back to client
	ossl.ssl_write_ex(handle, raw_data(buffer[:]), rcount, &wcount) 
	// send something original
	ossl.ssl_write_ex(handle, raw_data(closing_message), len(closing_message), &wcount)
}

client_message(25): “^@^@^@^@o from the otherside.”

2. Buffer is as expected (base case with additional line)

closing_message := "goodbye for now..."
buffer: [1024]byte
buffer_ptr := raw_data(buffer[:])  // <-- mysterious fix
rcount, wcount: size_t

if ossl.ssl_read_ex(handle, raw_data(buffer[:]), len(buffer), &rcount) > 0 {
	fmt.printf("client_message(%i): \"%s\"\n", rcount, buffer[:rcount])
	// echo message back to client
	ossl.ssl_write_ex(handle, raw_data(buffer[:]), rcount, &wcount) 
	// send something original
	ossl.ssl_write_ex(handle, raw_data(closing_message), len(closing_message), &wcount)
}

client_message(25): “Hello from the otherside.”

3. SEGFAULT (base case with first two lines swapped in position):

buffer: [1024]byte
closing_message := "goodbye for now..."
rcount, wcount: size_t

if ossl.ssl_read_ex(handle, raw_data(buffer[:]), len(buffer), &rcount) > 0 {
	fmt.printf("client_message(%i): \"%s\"\n", rcount, buffer[:rcount])
	// echo message back to client
	ossl.ssl_write_ex(handle, raw_data(buffer[:]), rcount, &wcount) 
	// send something original
	ossl.ssl_write_ex(handle, raw_data(closing_message), len(closing_message), &wcount) // <-- segfault
}

(gdb) run
Starting program: /home/cpi/odin_workspace/openssl_odin/openssl_odin
[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/aarch64-linux-gnu/libthread_db.so.1”.
starting listener server…
new client accepted.
client_message(25): “Hello from the otherside.”

Program received signal SIGSEGV, Segmentation fault.
memcpy_generic () at …/sysdeps/aarch64/multiarch/…/memcpy.S:87
87 …/sysdeps/aarch64/multiarch/…/memcpy.S: No such file or directory.
(gdb) bt 10
#0 memcpy_generic () at …/sysdeps/aarch64/multiarch/…/memcpy.S:87
#1 0x00007ffff7a82a44 in ?? () from /lib/aarch64-linux-gnu/libssl.so.3
#2 0x00007ffff7ab6120 in ?? () from /lib/aarch64-linux-gnu/libssl.so.3
#3 0x00007ffff7ab6950 in ?? () from /lib/aarch64-linux-gnu/libssl.so.3
#4 0x00007ffff7a96f64 in SSL_write_ex () from /lib/aarch64-linux-gnu/libssl.so.3
#5 0x000000000040628c in main::run_example_server () at example_blocking_server.odin:101
#6 0x000000000040599c in main::main () at main.odin:6
#7 0x0000000000407c7c in main (argc=1, argv=0x7ffffffffa58) at /home/cpi/Odin/base/runtime/entry_unix.odin:57

4. Buffer is as expected and No SEGFAULT (first two lines swapped + mysterious fix)

buffer: [1024]byte
closing_message := "goodbye for now..."
buffer_ptr := raw_data(buffer[:])  // <-- now no segfault?
rcount, wcount: size_t

if ossl.ssl_read_ex(handle, raw_data(buffer[:]), len(buffer), &rcount) > 0 {
	fmt.printf("client_message(%i): \"%s\"\n", rcount, buffer[:rcount])
	// echo message back to client
	ossl.ssl_write_ex(handle, raw_data(buffer[:]), rcount, &wcount) 
	// send something original
	ossl.ssl_write_ex(handle, raw_data(closing_message), len(closing_message), &wcount)
}

client_message(25): “Hello from the otherside.”

In the instances where there was no segmentation fault, the client receives both the message echo and the closing message with the same level of correctness as the server.

Since the SEGFAULT in scenario 3 occurs on the second call of the write function but not the first, and does not occur after the inclusion of buffer_ptr := raw_data(buffer[:]), I am starting to suspect something strange is occurring with the linker on this particular platform.

When I see “^@^@^@^@”, it makes me think an address/pointer value was accessed instead of the data. I seem to recall seeing this when working in C; it usually meant something was accessing the pointer itself and not the data it points to. Maybe try commenting out everything that accesses the data (the fmt.prints, etc.) and instead take a look at the memory addresses. Do they look strange in any way? Are they contiguous?

for &a in buffer[0:rcount] {fmt.printfln("%v - %b - size_of: %v - align_of: %v", &a, &a, size_of(&a), align_of(&a))}

Just realized that size_of and align_of only return information about the type, not the physical memory, but looking at the address values may still be useful.

for &a in buffer[0:rcount] {fmt.printfln("%v - %b", &a, &a)}

So, thinking more about memory, I looked deeper into core:mem. Maybe playing with some of the tools there could help. I'm not entirely sure I'm using the below correctly, but it could give some insight. I chose 16 in the mem procedures, thinking that maybe there could be a mismatch in length, so just in case…

import "core:mem"

buffer: [1024]byte
rawdawg := raw_data(buffer[:])

fmt.printfln("%v", mem.compare_ptrs(&buffer, rawdawg, 16) == 0 ? "memory matches" : "memory does not match")

for i := 0; i < 8; i +=1 {fmt.printfln("%v", mem.compare_ptrs(&buffer[i], &rawdawg[i], 16) == 0 ? "memory matches" : "memory does not match")}
for i := 0; i < 8; i +=1 {fmt.printfln("%v", mem.compare_byte_ptrs(&buffer[i], &rawdawg[i], 16) == 0 ? "memory matches" : "memory does not match")}

fmt.printfln("%v", mem.check_zero_ptr(rawdawg, 8) ? "memory is zeroed" : "memory is not zeroed")
for i := 0; i < 8; i += 1 {fmt.printfln("%v", mem.check_zero_ptr(&rawdawg[i], 8) ? "memory is zeroed" : "memory is not zeroed")}
fmt.printfln("%v", mem.check_zero_ptr(&buffer, 8) ? "memory is zeroed" : "memory is not zeroed")
for i := 0; i < 8; i += 1 {fmt.printfln("%v", mem.check_zero_ptr(&buffer[i], 8) ? "memory is zeroed" : "memory is not zeroed")}

The addresses were monitored extensively and matched the expected addresses through all of the examples and variations documented above in the initial post and replies. I thought this information had been included; I apologize for the omission.

Vaguely similar to Schrödinger's cat, the behavior changes only once the buffer is accessed in particular ways prior to calling the foreign procedure (e.g. printing address info to a terminal/log, storing a slice of the buffer, etc.). Once one of these seemingly nullipotent operations is performed, the buffer is written to properly.

The foreign procedure copies the data from its internal buffer, via the platform-supplied memcpy, contiguously starting at the specified address. If the supplied buffer pointer were incorrect, I would expect the data to be shifted by the offset between the supplied and expected addresses, or written to another address space altogether. That is not what is occurring: the pointer passed to the foreign procedure has remained the expected address of the buffer's first element through all variations, yet the data is still not copied as expected.

Is your client/server small enough to copy-paste? I'd be interested to look at the whole thing, since I've been planning to set up a Pi system to monitor and water some plants. It's a ways in the future, but something I'm thinking of doing. Currently I have 3 Linux instances running (Mint, Ubuntu, Kubuntu), though none are ARM and all are virtualized on x86-64 under Windows 11 Hyper-V, so not exactly apples-to-apples there. If not, it's cool. If I can think of anything else specific, I'll mention it here; at the moment I'm at a loss.

Resolved

The behavior was caused by an error I made in the OpenSSL binding. The error was promptly fixed, but the broken version persisted on the server because of an altered VCS setting.

The Core Issue: VCS misconfiguration

Once a fresh bare-bones case for reproducing the error was created, I quickly discovered that I could no longer reproduce the bug in the provided example cases. A quick diff of the binding files across the original and bare-bones repositories revealed that one of the files associated with the binding was not being synchronized properly. This binding file was correct on both the Windows machine and in the remote VCS instance, but not on the local Pi. The Pi's local VCS config had somehow been changed to ignore changes to certain files, without any warning/error when pulling from a remote source. There has been a high occurrence of power outages in my area this summer and my UPS has proven to be interruptible, so I would not be surprised if some tomfoolery occurred as a result of an outage.

How did the type information get mangled in the first place?

This was the first binding I had attempted from scratch, and I made the poor choice of initially using data types hardcoded to the Windows spec (i.e. i64 instead of the more portable types in the core:c package such as c.long). After looking through the structure of the official vendor libraries and realizing this project would actually run on more platforms, I updated the binding to use data types from core:c. The type info evolved from i64 to int and then, erroneously, to c.int in some cases. This caused a number of instances where one or more procedure parameters used the wrong type, and more importantly types of the wrong size (i.e. int is 8 bytes while c.int is 4 bytes on arm64 and other platforms). The main culprit was a BIO_XXX procedure parameter that should have been of type c.size_t but was of c.uint (the VCS change log showed a progression of u64 → uint → c.uint). This was fixed and pushed to the VCS from the Windows machine, but the file changes were ignored even after several subsequent pulls on the arm64 machine.

The byte-size mismatch in the procedure parameters caused both the unexpected behavior of the missing four bytes and the segfault in scenario 3, where the stack had been corrupted in such a way that the buffer was being overwritten and function pointers were invalidated.
