Fully dynamic procedure call

mrzo · 29 November 2024 09:02

Sorry for the title, I don’t know what to call this.
My goal is to write a binding interface to embed a scripting language in Odin (like pybind11 for C++).

I want to expose Odin procedures to that scripting language. This is how I’d design it to avoid boilerplate code for the user:

// example function
my_func :: proc(a: int, b: string) -> i32{…}

vm_register_proc("my_func", my_func)

vm_interpret("r = my_func(5, 'hola')")

// somewhere in vm_interpret I’d need to call my_func but how?
// how could I pass a list of arguments (each might have a different type) dynamically to a procedure?
// in pseudo code it might be like this:
args := []any{cast(int)incoming_args[0], cast(string)incoming_args[1]}
return_values = procedures[proc_name](..args)

I think in pybind11 they use some template and macro magic here. How would one approach such a problem in Odin? Or are there better/other designs to solve this?

Barinzaya · 29 November 2024 11:05

The approach that comes to mind for me requires two things:

1. You’ll need some single type that can hold any value that the VM may want to pass. Your VM may already have something like this internally. The simplest implementation would probably be a union, but you can do fancier stuff here for a more compact representation (e.g. NaN boxing, pointer tagging, etc.).
2. You’ll need a single proc type that you can wrap any proc into. This might be something like proc ([]Value) -> Value (if you’re fine with requiring 0/1 return values, or if Value can represent multiple values) or proc ([]Value) -> []Value (to handle any number of return values, but this will require an allocation for every call).

The second part is the tricky part. You could just write all of the procs you plan to bind that way, or bind them manually, but I think there is a way to do it automatically–though a bit ugly. Something like this:

// imports needed by the snippets in this post
import "base:intrinsics"
import "core:fmt"

Script_Proc :: #type proc ([]Value) -> Value

// could also be NaN-boxed, etc.
Value :: union {
	f64,
	i64,
	string,
}

from_value :: proc ($T: typeid, value: Value) -> T {
	// used for bound proc arguments
	// maybe do more stuff; numeric type conversion, etc.
	return value.(T)
}

to_value :: proc (x: $T) -> Value {
	// used for bound proc results
	// maybe do more stuff; numeric type conversion, etc.
	return x
}

wrap :: proc ($f: $F) -> Script_Proc where intrinsics.type_is_proc(F) && intrinsics.type_proc_return_count(F) == 1 {
	// wraps a normal function f to a normalized Script_Proc
	return proc (x: []Value) -> Value {
		N :: intrinsics.type_proc_parameter_count(F)
		assert(len(x) == N)

		when N >= 1 do a := from_value(intrinsics.type_proc_parameter_type(F, 0), x[0])
		when N >= 2 do b := from_value(intrinsics.type_proc_parameter_type(F, 1), x[1])
		when N >= 3 do c := from_value(intrinsics.type_proc_parameter_type(F, 2), x[2])

		when N == 0 do return to_value(f())
		when N == 1 do return to_value(f(a))
		when N == 2 do return to_value(f(a, b))
		when N == 3 do return to_value(f(a, b, c))
		when N >  3 do #panic ("Too many arguments!")

		unreachable()
	}
}

Then your register_proc proc can take Script_Procs as its argument (or take any proc and internally call wrap), and call them (and receive results) in a uniform way.

foo :: proc (a: i64, b: string) -> i64 {
	return a + 10
}

bar :: proc (a: i64) -> string {
	assert(a == 123)
	return "bar"
}

main :: proc () {
	procs := map[string]Script_Proc {
		"foo" = wrap(foo),
		"bar" = wrap(bar),
	}

	fmt.println(procs["foo"]([]Value { i64(113), "hi" })) // 123
	fmt.println(procs["bar"]([]Value { i64(123) })) // bar
}

There are a few potential improvements here, depending on what your needs are:

- wrap requires its argument to be a compile-time value. This is so it can return a single function pointer that knows how to reach the original function. An alternative would be to pass the original function to the wrapped call (probably as a rawptr which you cast back to F) along with the actual call arguments. In that case, you can store the original proc pointer (cast to rawptr) alongside the Script_Proc in your map, then just pass it in when calling it. You might even make wrap return a struct that contains other meta information along with the pointers.
- This implementation of the wrap function requires exactly one return value. You can make it handle functions with other numbers of return values by making to_value a proc group, with procs handling different numbers of Value arguments. You can also expand it to support more than one return value this way. Don’t forget to adjust the where condition on wrap for this!
- This implementation asserts on errors (e.g. parameter count mismatch, parameter type mismatch). You can improve the error handling by making from_value and Script_Proc, and possibly to_value, return error results as well.
- You may need to expand the number of arguments it supports. Unfortunately, I haven’t been able to come up with a way to handle arbitrary numbers of arguments, but it should be relatively clear how to expand it as-is.
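As a rough illustration of the error-handling point, here is a minimal sketch of a checked variant that returns an error instead of asserting. The Call_Error enum and its members are made-up names for this example, the sketch only handles up to two parameters to stay short, and it has not been tested against any particular compiler version.

```odin
package main

import "base:intrinsics"
import "core:fmt"

Value :: union {
	f64,
	i64,
	string,
}

// Hypothetical error type; the names are illustrative only.
Call_Error :: enum {
	None,
	Wrong_Arg_Count,
	Wrong_Arg_Type,
}

// Script_Proc now reports failures to the caller instead of asserting.
Script_Proc :: #type proc(args: []Value) -> (Value, Call_Error)

// Checked variant of from_value: a variant mismatch becomes an error value.
from_value :: proc($T: typeid, value: Value) -> (T, Call_Error) {
	v, ok := value.(T)
	if !ok {
		return v, .Wrong_Arg_Type
	}
	return v, .None
}

wrap :: proc($f: $F) -> Script_Proc where intrinsics.type_is_proc(F) && intrinsics.type_proc_return_count(F) == 1 {
	return proc(args: []Value) -> (Value, Call_Error) {
		N :: intrinsics.type_proc_parameter_count(F)
		when N > 2 do #panic("Too many arguments!")

		if len(args) != N {
			return nil, .Wrong_Arg_Count
		}

		// `when` branches do not open a new scope, so a and b stay visible below.
		when N >= 1 {
			a, a_err := from_value(intrinsics.type_proc_parameter_type(F, 0), args[0])
			if a_err != .None do return nil, a_err
		}
		when N >= 2 {
			b, b_err := from_value(intrinsics.type_proc_parameter_type(F, 1), args[1])
			if b_err != .None do return nil, b_err
		}

		when N == 0 do return Value(f()), .None
		when N == 1 do return Value(f(a)), .None
		when N == 2 do return Value(f(a, b)), .None

		unreachable()
	}
}

// Example procedure to bind (hypothetical).
add_ten :: proc(a: i64, label: string) -> i64 {
	return a + 10
}

main :: proc() {
	checked := wrap(add_ten)

	result, err := checked([]Value{i64(113), "hi"})
	fmt.println(result, err)

	// First argument has the wrong variant, so the call is rejected.
	_, type_err := checked([]Value{"oops", "hi"})
	fmt.println(type_err)
}
```

The VM can then turn a Call_Error into a script-level error rather than crashing the host program.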

gingerBill · 29 November 2024 11:12

So this is hypothetically possible in Odin, but it would be an absolute horror show of reflection use.

So I won’t support it.

The simpler option is to require all of your procedures to have the exact same signature and use reflection.

If you do not know how to use any, nor know how it precisely works, please do not use it since this is an advanced feature.

So it could be something like this:

my_func :: proc(args: ..any) -> i32 { // same status code for everything
    for arg, i in args {
        ...
    }
    return ...
} 
incoming_args := []any{i32(123), string("Hellope")}
status_code := procedures[proc_name](..incoming_args)

However, Odin isn’t a dynamic language, nor was it ever designed to do this.
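To make the uniform-signature idea a bit more concrete, here is a small sketch of what the body of such a procedure might look like, using a type switch on each any argument. The names Bound_Proc and describe_args and the choice of status codes are assumptions made for this example only.

```odin
package main

import "core:fmt"

// Hypothetical uniform signature: every bound procedure takes ..any
// and returns a status code (0 = ok, 1 = unsupported argument type).
Bound_Proc :: #type proc(args: ..any) -> i32

describe_args :: proc(args: ..any) -> i32 {
	for arg, i in args {
		// A type switch recovers the concrete type stored in the any.
		switch v in arg {
		case i32:
			fmt.printf("arg %d is an i32: %v\n", i, v)
		case string:
			fmt.printf("arg %d is a string: %v\n", i, v)
		case:
			return 1
		}
	}
	return 0
}

main :: proc() {
	procedures := make(map[string]Bound_Proc)
	defer delete(procedures)
	procedures["describe_args"] = describe_args

	incoming_args := []any{i32(123), string("Hellope")}
	status_code := procedures["describe_args"](..incoming_args)
	fmt.println("status:", status_code)
}
```

Note that an any only refers to the value it was created from, so the VM has to keep the argument storage alive for the duration of the call.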

mrzo · 29 November 2024 17:45

Awesome! Thank you very much for your response! I didn’t think about wrapping the function and using “when” to do everything at compile time. I love your approach! It is a little bit hacky, but it hides a lot of boilerplate from the user.

Thank you very much!

resu · 29 November 2024 22:38

Bill, any idea how the pybind11 folks achieved this feat?
C++ isn’t a dynamic language either.

ratchetfreak · 30 November 2024 17:08

C++ has Turing-complete templates, so it’s possible for them to iterate over the parameters of the passed callable, generate validation + extraction + conversion code for each parameter, and then call the function directly.

If you look at the vanilla Lua API, you’ll find a much simpler approach, where the passed function gets a pointer to the Lua VM and must do the parameter extraction manually.
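For comparison, a rough Odin sketch of that Lua-style design. Everything here (VM, Native_Proc, push, arg_i64, arg_string) is a hypothetical stand-in invented for the example, not the Lua API or an existing Odin package; the only thing borrowed from Lua is the shape of the callback, which receives the VM state and returns the number of results it pushed.

```odin
package main

import "core:fmt"

Value :: union {
	f64,
	i64,
	string,
}

// Hypothetical VM state: just one stack used for arguments and results.
VM :: struct {
	stack: [dynamic]Value,
}

// Every native procedure has this single signature and returns
// the number of results it pushed.
Native_Proc :: #type proc(vm: ^VM) -> int

push :: proc(vm: ^VM, v: Value) {
	append(&vm.stack, v)
}

// Manual extraction helpers; ok is false on a bad index or wrong type.
arg_i64 :: proc(vm: ^VM, index: int) -> (i64, bool) {
	if index < 0 || index >= len(vm.stack) {
		return 0, false
	}
	v, ok := vm.stack[index].(i64)
	return v, ok
}

arg_string :: proc(vm: ^VM, index: int) -> (string, bool) {
	if index < 0 || index >= len(vm.stack) {
		return "", false
	}
	v, ok := vm.stack[index].(string)
	return v, ok
}

// The bound procedure pulls its own arguments out of the VM.
my_func :: proc(vm: ^VM) -> int {
	a, a_ok := arg_i64(vm, 0)
	b, b_ok := arg_string(vm, 1)
	if !a_ok || !b_ok {
		return 0 // no results; a real VM would raise a script error here
	}
	clear(&vm.stack)
	push(vm, a + i64(len(b)))
	return 1
}

main :: proc() {
	vm: VM
	defer delete(vm.stack)

	push(&vm, i64(5))
	push(&vm, "hola")

	native: Native_Proc = my_func
	count := native(&vm)
	fmt.println(count, vm.stack[:])
}
```

The trade-off is that each bound procedure carries its own extraction boilerplate, but no compile-time machinery is needed and any number of arguments works.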

mrzo · 1 December 2024 07:54

I completely understand your hesitation to put something like this into the language.

I will use Barinzaya’s idea to implement it. Some manual effort is fine. I was just worried that it might not be possible at all.

flysand7 · 1 December 2024 09:29

It is possible to do this in Odin; however, it is by no means simple or type-safe. The only thing required is knowing the number of arguments and, depending on how far you want to go, you might need to restrict the possible argument types to integers and pointers.

The idea is to ask the OS for a memory page with read-write permissions, write the machine code that calls your function with the needed parameters, then change the page permissions to read-execute. You can then take the base address of that page, cast it to a type like proc(p1, p2, p3: int) -> int, and call it. The easiest way I can think of doing it is having a few fixed machine-code templates for calling a procedure with one, two, three, four, five and six pointer-sized arguments, and then “patching in” (in linker terms, this is called relocation) the values and addresses, including the address of the Odin procedure that is being called.

Odin’s calling convention is close to SysV, with a slight difference. In any case, you should be able to get the necessary information about the calling convention from its documentation. It’s not a simple read, but I’d focus on the simplest cases first.

After you do that, well… congratulations, you have built a simple JIT compiler. You can of course then surround all of that with an LRU cache, use direct CALL instructions, and maybe even attempt to support different types of arguments (other than pointer-sized), or use the reflection API to check the types of the arguments.

flysand7 · 1 December 2024 09:31

If you can get the address of the Odin procedure by name, then I guess you can just cast its pointer to one of the 6 types depending on the number of arguments. You don’t really need all that complexity if you limit what you can do.
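A minimal sketch of that simpler variant, assuming every registered procedure takes only int parameters and returns an int. The names Proc0..Proc3 and call_by_arity are made up for the example, and it stops at three arguments to keep it short. This is deliberately type-unsafe: calling a procedure with the wrong number of arguments, or one that does not match the int-only signature, is undefined behaviour.

```odin
package main

import "core:fmt"

// One proc type per supported arity; all parameters are pointer-sized ints.
Proc0 :: #type proc() -> int
Proc1 :: #type proc(a: int) -> int
Proc2 :: #type proc(a, b: int) -> int
Proc3 :: #type proc(a, b, c: int) -> int

// Casts the raw pointer to the proc type matching len(args) and calls it.
// Nothing checks that target really has that signature.
call_by_arity :: proc(target: rawptr, args: []int) -> (result: int, ok: bool) {
	switch len(args) {
	case 0:
		f := cast(Proc0)target
		return f(), true
	case 1:
		f := cast(Proc1)target
		return f(args[0]), true
	case 2:
		f := cast(Proc2)target
		return f(args[0], args[1]), true
	case 3:
		f := cast(Proc3)target
		return f(args[0], args[1], args[2]), true
	}
	return 0, false // unsupported number of arguments
}

// Example procedure (hypothetical).
add :: proc(a, b: int) -> int {
	return a + b
}

main :: proc() {
	// Name -> address table, filled in by hand here.
	procedures := make(map[string]rawptr)
	defer delete(procedures)
	procedures["add"] = rawptr(add)

	result, ok := call_by_arity(procedures["add"], []int{2, 3})
	fmt.println(result, ok)
}
```

Storing the expected argument count next to each pointer would at least let the VM reject arity mismatches before the call.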