Video: Intro to Odin data types

Several months ago, I posted on here a draft of a video intro to Odin. Several people gave me helpful feedback, but I got distracted from making a final version. Coming back to this months later, I decided to retry from a different angle, this time focusing on Odin’s data types and related features: video 1: Odin Data Types (55 min)

I also have a follow-up that focuses on just polymorphism: video 2: Odin polymorphism (20 min)

Anyone who can spare the time, please critique any mistakes or other issues. There are definitely a few areas where the info might not be fully correct or complete, but rather than bias the feedback, I’ll let others identify them. (There are also some sound issues which I’ll try to address for the final version.)

Belated thanks to everyone who commented on the earlier video. I promise this time that I’ll actually publish a final draft!


Good video. Good flow. I actually picked up something new: the shorthand on slices, “s = arr[offset:][:length]” instead of “s = arr[offset:offset + length]”. I’ll be looking for an opportunity to use that.
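For anyone who wants to try the shorthand, here is a minimal sketch. The array contents and the names arr, offset, and length are only illustrative:

```odin
package main

import "core:fmt"

main :: proc() {
	arr := [8]int{0, 1, 2, 3, 4, 5, 6, 7}
	offset, length := 2, 3

	a := arr[offset:offset + length] // start and end given explicitly
	b := arr[offset:][:length]       // drop `offset` elements, then keep `length`

	fmt.println(a)
	fmt.println(b)

	// Both slices view the same three elements of arr.
	assert(len(a) == len(b) && &a[0] == &b[0])
}
```

The second form avoids repeating offset, which helps when the offset is a longer expression.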

May want to update the bit about enum arrays “can only be contiguous”. They can be non-contiguous if the #sparse directive is used:

	Sparse_Enum :: enum {
		ONE   = 1,
		THREE = 3,
		FIVE  = 5,
	}

	sparse_array := #sparse [Sparse_Enum]string {
		.ONE   = "one",
		.THREE = "three",
		.FIVE  = "five",
	}

Last thing, and I apologize in advance. I’m not dogmatic about this, but since this is meant to be a learning video, I thought it was worth considering. I heard the word “function” used a lot. I did not go back to scrutinize whether it was accurate in each individual case. The Odin FAQ states the following:

https://odin-lang.org/docs/faq/#why-is-it-named-proc

I think the distinction is important to understand. I’ve gotten the impression from reading the forums and git that Odin is intentionally designed as a procedural language. Since “procedure is a superset of functions and subroutines”, you would always be correct when using the word “procedure”, since it includes the other possibilities.

I’m a geek…

Procedure :: union {
	Function,
	Subroutine,
}

Pretty cool video! Some notes and mistakes

  • 11:30 and 12:26 typo: casting to a pointer type requires parens, so ip = (^int)(r) or ip = cast(^int)r
  • slices: you can omit both start and end, not just one or the other. s := arr[:] is how you convert a fixed-size buffer to a slice
  • anonymous structs can be cast to a named struct if the fields have the same names and types and are declared in the same order (the video missed the ordering requirement)
  • enumerated arrays: #sparse can be used to have a non-contiguous enum range as the index.
  • unions, AFAIK, do not store a typeid. Imagine that you have #no_nil: the zero value of that union should then be the first variant (not nil). That wouldn’t work with a typeid tag, because every variable is basically just memzeroed when declared, so it would not be initialised to the proper typeid. The hint is in the reflect.union_variant_typeid implementation: the tag is not used directly as a typeid but as an index into the variant table.
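A small sketch of two of the notes above, the pointer cast syntax and the zero value of a union with and without #no_nil. The type names Value and Strict are made up for illustration:

```odin
package main

import "core:fmt"

Value  :: union { int, string }         // default union: has a nil state
Strict :: union #no_nil { int, string } // no nil state

main :: proc() {
	i := 123
	r: rawptr = &i
	ip := (^int)(r)   // parentheses around the pointer type
	jp := cast(^int)r // equivalent cast-operator form
	fmt.println(ip^, jp^)

	v: Value
	fmt.println(v == nil) // the zeroed union is in its nil state

	s: Strict
	_, is_int := s.(int)
	fmt.println(is_int) // the zeroed #no_nil union holds its first variant
}
```

Both zero values are plain zeroed memory, which is consistent with the tag being a small integer rather than a typeid.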

In many cases when you say what’s inside a type it might be good to just show what it desugars to (e.g. any -> runtime.Raw_Any).

Thanks, @xuul!

I agree that “procedure” and “function” should never have been conflated, but sadly that ship sailed at some point in the 70’s and 80’s (blame C?). “Function” has long been established as the catch-all term to encompass procedure, routine, subroutine, method, and actual mathematical function.

More recent corruptions are still worth fighting, though, e.g. the way “functional” has become abused in the last decade.

Sorry for the late reply to the videos.

My general complaint (not talking about the minor mistakes which have already been mentioned) is how much time is being focused on any.

I would highly recommend either NOT talking about any at all or mentioning it only in passing.

We highly recommend virtually no one using any unless they know EXACTLY how it works. And the problem is, most people don’t actually bother to learn that and wonder why they have problems with any.

Most people should not be using any whatsoever. At most, just state that it is used for things like fmt.println, which needs it for runtime type-safe formatted printing.

I know you were showing how it was used with polymorphism in the second video, but I think it was confusing because it was treating the open-set approach with a switch statement still when in reality, open-set approaches use a vtable so that you don’t need to know the type.


Video 1 Mistakes/Comments:

  • 0:22
    • mentioning of complex numbers, but no mention of quaternions
    • misses bit_set, which is much more common than bit_field
  • 3:27
    • The zero value for a string/string16 is "" not nil
    • The zero value for cstring/cstring16 is nil
    • Zero value for enums is also nil
  • 5:20
    • I don’t recommend distinct types for things like units most of the time, so I am not sure if this would be more confusing than not, as people would get the wrong idea.
  • 10:20
    • rawptr isn’t “untyped” in the way untyped constants are, but I understand what you mean by this.
  • 11:03
    • Casting syntax for pointers needs parentheses to prevent ambiguity when parsing
    • (^int)(r) and (^string)(r) is the correct syntax
  • 12:13
    • Same problem with syntax. Use (^int)(p)
    • I’d also mention that this isn’t the common way to do “pointer arithmetic” since slices or multi-pointers exist.
  • 13:51
    • As I said in a previous comment, I’d recommend just briefly talking about any and not really referring to it as a pointer either, unless you say “fat pointer” or something to that effect.
    • Most people do not need any, nor do they understand it, so I’d recommend just pretending it doesn’t exist when teaching.
    • But if you really want to explain how it works, don’t just explain it in words, show it with code too.
i: int = 123
a: any = i

// equivalent to
a: any
a.data = &i
a.id = typeid_of(type_of(i))

// or in the case of a non-addressable value
a: any = 123

// equivalent to

a: any
tmp: int = 123
a.data = &tmp
a.id = typeid_of(type_of(tmp))
  • 23:13
    • You do this throughout, but I think it’s probably going to confuse more people than not when you use the underlying procedures rather than the overloaded/grouped names. e.g. please prefer make([]int, 10) and delete(s) over make_slice([]int, 10) and delete_slice(s)
  • 24:47
    • delete_slice(s, context.temp_allocator) will not necessarily result in a segmentation fault and if it is the default temp_allocator, it’ll just be a no-op. Make it clear what the default behaviour is and that any custom allocators are allocator-specific behaviour
  • 27:07
    • Same as previously said, please prefer make([dynamic]int, 4, 7) and append(&elems, 100, 101, 102) over the overloaded/grouped names
  • 36:10
    • Mentioning reflection at all at this stage seems a little weird to me since it is a bit of a high-level construct which not everyone will need directly nor understand.
  • 38:51
    • Unions do not store a typeid, they just store an integer to represent the tag of the variant. By default 0 represents the nil state and then, starting from 1, the number represents the variant. The size of the integer chosen is the smallest needed to represent the variants (e.g. u8 when the variant count < 255, u16 when larger, etc).
      • typeid isn’t used for loads of reasons, but mainly because it’s not needed, and typeid is meant purely for reflection needs.
      • typeid is also a u64 sized (not pointer sized) hash of the type from its canonical textual form.
  • 40:48
    • I’d also make it clear WHY all unions by default have a nil state, as it makes sense with the rest of Odin’s semantics.
      • I do get people who do not use Odin saying they don’t understand why it needs a nil state or say it isn’t a “proper” union type. It’s mainly not thinking through the logic of default implicit initialization of variables.
  • 41:24
    • You forgot to mention the most common use of a parametric polymorphic union type: Maybe
    • Maybe is a user-level @builtin type which has the following definition: Maybe :: union($T: typeid) { T }
    • Because of the layout of a union, when there is only one variant and it is pointer-like, no tag is stored and the pointer’s nil value is shared as the nil value of the union, allowing for things like Maybe(^T) to be pointer-sized.
    • I’d mention that this is more useful for interfacing with foreign code or optional parameters than as a general construct in Odin, where multiple return values are preferred.
  • 43:20
    • intrinsics.overflow_add doesn’t necessarily return an error but rather just returns a boolean which indicates if an overflow happened, which may or may not be an error
  • 45:46
    • This is the OPPOSITE of the or_return behaviour. or_return will return when the last value of the multiple return value list is either false (if it is a boolean) or NOT nil (for other types which support nil).
    • In the example you gave, you are saying it will return when things DO NOT overflow.
    • intrinsics.overflow_add is not a “safe add” operation, which is what you seem to think it is.
  • 50:33
    • clamp is already a “builtin” procedure so this might be a little confusing to people assuming it doesn’t already exist.
      • It’s built-in to allow for compile-time evaluation and better error checking
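To make the 43:20 and 45:46 points concrete, here is a hedged sketch. checked_add and add3 are hypothetical helpers, not core procedures, and the import path base:intrinsics assumes a recent compiler (older releases used core:intrinsics). The boolean from intrinsics.overflow_add is true when an overflow happened, so it has to be inverted before it can serve as an ok value for or_return:

```odin
package main

import "base:intrinsics"
import "core:fmt"

// Hypothetical wrapper: turns the "did it overflow?" flag into an ok value.
checked_add :: proc(a, b: u8) -> (sum: u8, ok: bool) {
	res, overflowed := intrinsics.overflow_add(a, b)
	return res, !overflowed
}

// or_return leaves the procedure early when the trailing bool is false.
add3 :: proc(a, b, c: u8) -> (sum: u8, ok: bool) {
	ab := checked_add(a, b) or_return
	return checked_add(ab, c)
}

main :: proc() {
	sum, ok := add3(1, 2, 3)
	fmt.println(sum, ok) // no overflow: ok is true

	sum, ok = add3(200, 100, 1)
	fmt.println(sum, ok) // 200 + 100 does not fit in a u8: ok is false
}
```

Using or_return directly on intrinsics.overflow_add would return early exactly when the addition did NOT overflow, which is the inversion described above.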

Video 2 Mistakes/Comments:

  • 5:15
    • I’d recommend doing ^$T/Stack($E) instead because it is much more likely that people are going to pass a possibly distinct Stack than a distinct pointer to a Stack. That’s what the / effectively does: strip the distinctness from a type.
    • Stack is probably not the best example since append and pop exist for dynamic arrays already, but I understand the example.
  • 8:43
    • This is less a mistake and more one of those gotchas: Odin doesn’t need a Result type because of multiple return values, and the idiom of multiple return values fits much better with the rest of the language. Result is, in a way, an anti-pattern for Odin.
    • I’d also mention Maybe instead.
    • I do agree that parapoly unions are rare, and the reason why they might be needed is correct in your case.
  • 11:24
    • I would not use data: any here, like I was stating before.
    • when it comes to this kind of typing, I’d recommend either basic subtyping with using, vtables, a union, or an enum.
      • The latter two are for closed sets rather than open sets, but if you really want a properly open set, then use a vtable or something similar.
      • Open-sets only make sense as pure interfaces to me, and in practice when you have inheritance-like features, you pretty much only ever want a closed-set of variants.
  • 14:25
    • I’d also show a way to do subtyping with using and union, but you missed the variant field in the Pet
Pet :: struct {
    name:    string,
    age:     f32,
    weight:  f32,
    variant: union { ^Cat, ^Dog }, // trailing comma is required before the closing brace on the next line
}

Cat :: struct {
    using pet: Pet,
    foo: int,
}

Dog :: struct {
    using pet: Pet,
    bar: f32,
}

d: Dog
d.variant = &d
// or
d := new(Dog)
d.variant = d
  • 14:47
    • Again, please don’t mention any. People just misuse it all the time when they don’t understand it. It’s effectively never a good idea unless you know EXACTLY how it works AND what you are doing. Most people don’t know.

Thanks, @gingerBill!

I’ve done some heavy reworking of the slides and script, though I haven’t yet re-recorded. Here’s a summary of changes:

  • many small clarifications and double checked that example code compiles
  • corrected pointer cast syntax
  • replaced all mentions of “function” with “procedure”
  • corrected coverage of union tags and zero values
  • type assertions and type switches are introduced with unions instead of any
  • moved all discussion of parapoly structs/unions into the polymorphism videos
  • use Maybe as the example of a parapoly union
  • simplified the discussion about “interfaces”

I’m sticking with the explicit make/delete procs because I don’t introduce proc groups until the later polymorphism videos. I do however more prominently mention immediately after that there is a preferred alternative.
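For reference, the preferred alternative looks roughly like this, using the grouped names from the earlier feedback (a minimal sketch, with arbitrary lengths and values):

```odin
package main

import "core:fmt"

main :: proc() {
	// make and delete are procedure groups; the compiler selects the
	// underlying procedure (make_slice, delete_slice, ...) from the argument types.
	s := make([]int, 10)
	defer delete(s)

	elems := make([dynamic]int, 4, 7) // length 4, capacity 7
	defer delete(elems)
	append(&elems, 100, 101, 102)

	fmt.println(len(s), len(elems), cap(elems))
}
```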

Covering any still seems necessary to cover “extensible interfaces” in the style of Allocator, but I’ve moved the first mention of any to right before that final section. I suppose I could instead just split the data: any field into separate rawptr and typeid fields, but Allocator itself uses any.

What is your objection about distinct for separate units and unit conversion? Do you have a better example for distinct in mind?

I got it in my head that boolean errors use false as success for the sake of consistency (because then all zero values would indicate success). Because this is backwards, I see then that overflow_add’s boolean isn’t meant to be an error, per se. What’s a better example proc that returns a bool as error?

Aside from basic using usage, I’ve stripped out any discussion of inheritance, though I find your example interesting. I suppose the idea is to express the relationship both ways: Dogs and Cats concretely contain their parent Pet, which references back to the child. For multiple levels of inheritance, does each ancestor reference back to the concrete instance or to the immediate parent? E.g. would Pet have a using field for Mammal which then has a pointer back to the Pet inside a Cat…or does Mammal have a pointer back to the Cat itself? This topic is probably too much to include in the video, but I might add it into the text supplement.

I originally planned to cover the more exotic math types + bit_sets + SOA + swizzling etc, but I’ve cut them for time. Hopefully I’ll get back to these for a followup video later.

I also hope to later do a followup walking through some small useful examples, such as maybe some exercises from Exercism.

The below seems to be a common pattern in many core procedures…

num, ok := strconv.parse_f64("not a f64")

I am a big fan of your work. Your linux API series is still the best thing on the subject on youtube. Very cool that you’re doing Odin.


Covering any still seems necessary to cover “extensible interfaces” in the style of Allocator.

Allocator doesn’t use any either:

Allocator :: struct {
	procedure: Allocator_Proc,
	data:      rawptr,
}

It’s a basic flat syscall-like interface which takes a procedure value plus a rawptr. runtime.Allocator works even without RTTI.

The main thing that uses any is pretty much fmt.print*, and that’s because any was created effectively for things like this: serialization.


So the objection about distinct for separate units is a bit complicated, but I will try to keep it short. We do use it for things like time.Duration, but the problem is that, because of how Odin’s type system works, it does not really express units correctly and people will get confused. Units “change” with the operator, e.g. distance / time = speed, whilst in Odin the two operands of a binary expression must coerce to each other and thus have the same type.

It’s more of an edge case that it works for time.Duration than the norm e.g. 3 * time.Minute + 2 * time.Second.

The more common use case for distinct is probably “I want a distinct type with the same memory layout but I don’t want the things to convert implicitly”, and not necessarily “units”.
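A short sketch of both sides of this point. Meters and Seconds are hypothetical types made up for the example; time.Minute and time.Second are the real core:time constants:

```odin
package main

import "core:fmt"
import "core:time"

// Hypothetical unit-like types: same memory layout as f64, no implicit conversion.
Meters  :: distinct f64
Seconds :: distinct f64

main :: proc() {
	d: Meters  = 100
	t: Seconds = 8

	// speed := d / t // does not compile: the operands have different types
	speed := f64(d) / f64(t) // explicit conversions are needed to mix them
	fmt.println(speed)

	// Works for Duration because both operands end up with the same type.
	total := 3 * time.Minute + 2 * time.Second
	fmt.println(total)
}
```

The distinct types stop accidental mixing, but nothing in the type system knows that Meters divided by Seconds should be a speed.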


Common procedures that return a bool to indicate a failure are things like:

  • Parsing/Conversion like things
  • Cases which can only “succeed” or “fail”
  • iterators (which is not a beginner topic)

So something like the strconv.parse_* calls are a good example or any of the “iterator” examples.
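As a hedged sketch of that pattern, here is strconv.parse_f64 combined with or_return. parse_ratio is a hypothetical helper written only for this example:

```odin
package main

import "core:fmt"
import "core:strconv"

// Returns early with ok = false as soon as either parse fails.
parse_ratio :: proc(a, b: string) -> (ratio: f64, ok: bool) {
	x := strconv.parse_f64(a) or_return
	y := strconv.parse_f64(b) or_return
	return x / y, true
}

main :: proc() {
	r, ok := parse_ratio("3", "4")
	fmt.println(r, ok)

	r, ok = parse_ratio("3", "not a f64")
	fmt.println(r, ok) // ok is false here
}
```

Here false means failure, which is the direction or_return expects for a trailing bool.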


The closed-set subtyping that I showed is mainly used as an alternative to a plain union when you want to minimize memory usage or have a specific memory layout. It’s less about “inheritance” per se, and more about that structure. I guess that’s just me thinking more about memory than trying to keep to any OOP-like approach.

Traditional inheritance is obviously for when you have an open set of variants with a closed set of functionality.

I swear I saw Allocator use any, but I was probably imagining things.

Still, I worry about not having any type check. In my example, an accident like this could happen:

// accidental mixing of wrong proc with wrong Pet type:
// when called, sleep_dog will end up casting the ^Cat into a ^Dog
pet := Pet{ sleep = sleep_dog, data = &cat }

When such accidents occur, the fix is obvious if you get a type assert panic, but maybe not so obvious with a bad rawptr cast.

I suppose though such accidents could mostly be avoided by providing make procs, e.g.:

pet_from_cat :: proc(c: ^Cat) -> Pet {
    return Pet{ sleep = sleep_cat, data = c }
}
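A fuller, self-contained sketch of that idea, with a procedure value plus a rawptr in the style of Allocator. The Pet layout and all names here are my own assumptions for illustration, not code from the video:

```odin
package main

import "core:fmt"

Cat :: struct { name: string }

// Flat interface: a procedure value plus an untyped data pointer.
Pet :: struct {
	sleep: proc(data: rawptr),
	data:  rawptr,
}

sleep_cat :: proc(data: rawptr) {
	c := (^Cat)(data) // unchecked cast: only correct if data really points to a Cat
	fmt.println(c.name, "curls up")
}

// The constructor is the single place that pairs sleep_cat with a ^Cat.
pet_from_cat :: proc(c: ^Cat) -> Pet {
	return Pet{sleep = sleep_cat, data = c}
}

main :: proc() {
	cat := Cat{name = "Tom"}
	pet := pet_from_cat(&cat)
	pet.sleep(pet.data)
}
```

The cast inside sleep_cat is still unchecked, so this narrows where a mismatch can be introduced rather than detecting one.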

Anyway, removing coverage of any will help keep it shorter and simpler.


This is what I’m used to doing in Go:

d: Distance
t: Duration
s: Speed = Speed(f64(d) / f64(t))

I guess it requires some care when doing the operations, but then you just create a helper function for anything you do frequently.

Anyway, I can just cut the topic of type aliases and distinct types.

Final revisions are published:

part 1: Data types
part 2: Polymorphism


Some feedback, but not necessarily a call to make changes.

At the end of the part 2 video on Polymorphism, there is a mention that unions are “not extensible” and that a rawptr solves this. I don’t disagree with those statements exactly, and I’m certain I’m not seasoned enough an Odin user to have an opinion on this, but my reaction to that part is as follows:

I tend to think of the extensibility of code, and the method chosen, as being based on use-case. Take for example a library intended for 3rd party use. I would not provide rawptr as the main entry point into the library and its procedures. There’s no way the library writer can be certain that whatever is assigned to the rawptr contains the expected struct definitions. So, personally, I would reserve this approach to creating interfaces for code that is either for personal use and not shared as a library, or for code projects where code reviews are performed to ensure any new struct entities adhere to the expected structure in the rawptr.

For 3rd party “extensibility” I might suggest unions instead. There’s a stronger level of control and certainty. Combining different Odin approaches can yield some reliability while still maintaining a supportable 3rd party library with flexibility within boundaries.

So for example, I might combine unions with procedure overloading and intrinsics specialization to do the following, with an overly simplified example:

package main

import "base:intrinsics" // "core:intrinsics" on older compilers
import "core:fmt"

Cat :: struct {
	name: string,
	age: int,
}

Dog :: struct {
	name: string,
	age: int,
}

Pet :: union {
	Cat,
	Dog,
}


print_pet :: proc {print_pet_struct, print_pet_union}

print_pet_struct :: proc(p: $P) where intrinsics.type_is_variant_of(Pet, P) {
	fmt.printfln("%v %v", p.name, p.age)
}

print_pet_union :: proc(p: Pet) {
	switch v in p {
	case Cat: print_pet_struct(v) // or could do print_pet(v) and the correct procedure will be selected
	case Dog: print_pet_struct(v) // or could do print_pet(v) and the correct procedure will be selected
	}
}

main :: proc() {

	pet: Pet
	cat := Cat{"Meow Face", 6}
	dog := Dog{"Woof Woof", 3}

	// print using explicit struct definitions
	print_pet(cat)
	print_pet(dog)

	// print using dynamic struct definitions
	pet = cat
	print_pet(pet)
	pet = dog
	print_pet(pet)

}

The above feels as though I’ve provided options on how to print the pet information without requiring the use of the union type Pet, but yet still have boundaries on what’s possible. It does require a little more work on the back end, but front end 3rd party users can still have reasonable flexibility.

Anyone more seasoned feel different?

Upon further thought, I may be splitting hairs on the closed-set of variants with an open-set of functionality vs. open-set of variants with a closed-set of functionality idea.
