Need some help understanding how to utilize core:encoding/xml

I’d like to create an EPG (episode guide) reader/writer to use with my legal OTA channels. So far I’ve enjoyed using Odin for a few other small projects, so I figured that since Odin is a data-oriented language, XML should be easy enough. That said, I do not have much experience processing XML files programmatically, so I’m admittedly getting lost in the weeds a bit. Please forgive my ignorance.

So far I’m struggling to understand:

  • The example xml “utf8.xml” contains no English names for its elements, making it difficult for me to reverse engineer any test output I generate.
  • The example odin program does not appear to do anything useful other than generate metrics, which is neat, but makes it difficult to pinpoint where anything practical is done with (or to) the example xml.
  • Every parse function I try to use on the example xml file “utf8.xml” generates an EOF error. Parsing body says it expects Ident, but gets EOF. Parsing prologue says it expects “Question”. All of these elements exist in the file. Not sure what I’m doing wrong here.

Could anyone point me to an example that opens an xml file and does some basic things? Maybe even an example of creating a new xml tree and saving to file?

Here is a snippet parsing an XML document, checking that the root node is a “map” and then looping through its children:

doc, err := xml.parse(data, xml.Options{
	flags={.Ignore_Unsupported},
	expected_doctype="",
})

if err != nil {
	panic("Failed to parse XML")
}

root := &doc.elements[0]
// Check that root element is a "map" element
assert(root.ident == "map")

// Loop through children
for child_value in root.value {
	switch child in child_value {
	case string:
		// Not expecting text inside root node
		panic(fmt.tprintf("Unexpected string: \"%s\"", child))
	case xml.Element_ID:
		child_node := &doc.elements[child]
		if child_node.ident == "tileset" {
			// TODO: handle "tileset" node
		} else if child_node.ident == "layer" {
			// TODO: handle "layer" node
		} else {
			// Something we don't care about
			fmt.printfln("Skipping top level node: %s", child_node.ident)
		}
	}
}

You can search through attributes of a node like this:

get_attrib :: proc(element: ^xml.Element, key: string) -> (string, bool) {
	for attrib in element.attribs {
		if attrib.key == key {
			return attrib.val, true
		}
	}

	return "", false
}

Hope that gets you started.

Thank you very much. This is an excellent start. I’m already reading and parsing a file. I see where I went wrong at the beginning: load_from_file was already parsing the file, so attempting to parse further was redundant. I noticed this when I saw that you return into a doc by parsing a data variable. I was already returning into a doc using load_from_file, so that was my data. Silly me.
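For anyone who hits the same EOF errors, here is a minimal sketch of the one-step flow described above. It assumes that xml.load_from_file accepts just a path and falls back to default options, and it reuses the `err != nil` check and the `doc.elements[0]` root access from the earlier snippet. The file name epg.xml and the helper print_root are hypothetical.

```odin
package main

import "core:fmt"
import "core:encoding/xml"

// Hypothetical helper: loads a file and prints the name of its root element.
// load_from_file reads and parses in one step, so no second parse call follows.
print_root :: proc(path: string) -> bool {
	doc, err := xml.load_from_file(path)
	if err != nil {
		fmt.eprintln("Failed to load", path, "error:", err)
		return false
	}
	defer xml.destroy(doc)

	// As in the snippet above, element 0 is treated as the root.
	root := &doc.elements[0]
	fmt.println("Root element:", root.ident)
	return true
}

main :: proc() {
	// "epg.xml" is a placeholder file name.
	print_root("epg.xml")
}
```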

Now to create the filtered data and write it to a new file. I will also need to read xml or gz data from a URL instead of loading from a file. I’ll see if I can figure this out myself before asking ;).

Try working with a small, custom XML file using clear English tags first—complex examples like utf8.xml can obscure learning. Ensure the XML is well-formed and encoded in UTF-8 without BOM. Start by calling xml.load_file(path) to get a document, then use xml.get_children() or xml.get_property() to access elements. For writing, use xml.make_document() to build a tree and xml.save_file(doc, path) to write it. A minimal example that reads a simple file and extracts the id would help clarify the flow—let me know if you want one!

xml.get_children(), xml.get_property(), and xml.make_document() do not seem to be available as procedures in core:encoding/xml. I’m getting the nightly odin package using scoop.

I’ve tried to plow through anyway on my own and have figured some things out, but am starting to feel like I’m missing something. Not sure if the resistance I continue to encounter is expected or if I’m just doing things wrong.

At the moment I’m running into something that must either be a bug or an indication that I’ve lost the plot somehow. When I create a new xml.Document and then call xml.new_element, the element_count is increased, but the [dynamic]elements array does not get an initial index. This means that if I try to assign an element into that array, I get the error “Index 0 is out of range 0..<0”. If I first call xml.new_element to increase the element_count and then manually append_elem the first element (i.e. the root), later calls to xml.new_element do grow the [dynamic]elements array, so I no longer need to append_elem manually and can just reference the returned index.

As a side issue: after successfully creating an xml.Document, I am unable to add elements properly without xml.destroy(document) exiting with an error afterwards. I’m still working through this, so I’m not ready to cry wolf yet. I suspect it’s related to how I’m creating the document.

Not working, because epgout.elements has no index 0:

epgout := new(xml.Document)
newroot := xml.new_element(epgout)
fmt.println(epgout)
epgout.elements[newroot] = xml.Element{ident = "tv", kind = .Element, parent = 0}
fmt.println(epgout)
xml.destroy(epgout)

This seems to work, but I end up having issues later with updating the new element:

epgout := new(xml.Document)
xml.new_element(epgout)
append_elem(&epgout.elements, xml.Element{ident = "tv", kind = .Element, parent = 0})
xml.new_element(epgout)
fmt.println(epgout)
xml.destroy(epgout)

Notice how the second xml.new_element adds a new empty elements index, but in my first example, the first use of xml.new_element leaves elements non-indexable.

The chattiness of the working code makes me think that either I’ve not identified the proper way to do this, or there is a bug in the resize function used by xml.new_element. Someone please slap me with a trout if needed.
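One possible explanation, offered as a guess rather than a confirmed diagnosis: if xml.new_element grows the elements array based on its current length, a document created with new(xml.Document) starts at length zero and never gains a usable slot, while a parsed document presumably arrives with the array already sized. Under that assumption, a minimal sketch is to size the array once before the first xml.new_element call. The initial length of 16, the element names, and the helper build_doc are arbitrary choices for illustration. Cleanup is done by hand here because it is unclear whether xml.destroy expects fields that only the parser sets.

```odin
package main

import "core:fmt"
import "core:encoding/xml"

// Hypothetical helper: builds <tv><channel/></tv> in a hand-made document.
build_doc :: proc() -> ^xml.Document {
	doc := new(xml.Document)

	// Assumption: new_element hands out indices into existing storage,
	// so give the array a non-zero length before the first call.
	doc.elements = make([dynamic]xml.Element, 16)

	root := xml.new_element(doc)
	doc.elements[root] = xml.Element{ident = "tv", kind = .Element}

	child := xml.new_element(doc)
	doc.elements[child] = xml.Element{ident = "channel", kind = .Element, parent = root}

	// Link the child into the root's value list.
	link: xml.Value = child
	append(&doc.elements[root].value, link)

	return doc
}

main :: proc() {
	doc := build_doc()
	defer {
		// Free only what this sketch allocated.
		for el in doc.elements {
			delete(el.value)
		}
		delete(doc.elements)
		free(doc)
	}

	fmt.println(doc.element_count, doc.elements[0].ident, doc.elements[1].ident)
}
```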

Welp, I’ve made some major progress. Aside from the above issue, which is easily dealt with by creating the first elements index myself, I am now able to do everything needed except write formatted xml syntax to a file.

Achieved the following so far:

  • Download an epg.xml or epg.gz file from URL
  • Decode gz as needed
  • Parse the xml data into a xml.Document to read from
  • Create new xml.Document to store delta data
  • Extract what I want and add delta data to the new xml.Document which so far matches exactly the input from a test file.

Now I just need to write the new xml.Document to a file with proper xml syntax. The example seems to only print the element names and the strings, without xml formatting. Also, I see a comment in encoding/xml/doc.odin about “Maybe: XML writer?” Is there an already-known procedure for doing this, or do I need to build this from scratch?
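In case no writer turns up in the package, here is a from-scratch sketch of what one could look like: a recursive walk over xml.Document that emits tags, attributes, and escaped text into a strings.Builder. The procedures write_escaped, write_indent, write_element, and demo_document are all hypothetical names, the two-space indentation is an arbitrary choice, and the demo document is built by hand only to keep the block self-contained.

```odin
package main

import "core:fmt"
import "core:strings"
import "core:encoding/xml"

// Appends s with the five predefined XML entities escaped.
write_escaped :: proc(b: ^strings.Builder, s: string) {
	for i in 0..<len(s) {
		switch c := s[i]; c {
		case '&':  strings.write_string(b, "&amp;")
		case '<':  strings.write_string(b, "&lt;")
		case '>':  strings.write_string(b, "&gt;")
		case '"':  strings.write_string(b, "&quot;")
		case '\'': strings.write_string(b, "&apos;")
		case:      strings.write_byte(b, c)
		}
	}
}

write_indent :: proc(b: ^strings.Builder, depth: int) {
	for _ in 0..<depth {
		strings.write_string(b, "  ")
	}
}

// Recursively writes one element and everything below it.
write_element :: proc(b: ^strings.Builder, doc: ^xml.Document, id: xml.Element_ID, depth := 0) {
	el := doc.elements[id]
	if el.kind != .Element {
		return // this sketch skips anything that is not a regular element
	}

	write_indent(b, depth)
	strings.write_string(b, "<")
	strings.write_string(b, el.ident)
	for attr in el.attribs {
		strings.write_string(b, " ")
		strings.write_string(b, attr.key)
		strings.write_string(b, "=\"")
		write_escaped(b, attr.val)
		strings.write_string(b, "\"")
	}
	if len(el.value) == 0 {
		strings.write_string(b, "/>\n")
		return
	}
	strings.write_string(b, ">")

	has_children := false
	for value in el.value {
		switch v in value {
		case string:
			write_escaped(b, v)
		case xml.Element_ID:
			if !has_children {
				strings.write_string(b, "\n")
				has_children = true
			}
			write_element(b, doc, v, depth + 1)
		}
	}
	if has_children {
		write_indent(b, depth)
	}
	strings.write_string(b, "</")
	strings.write_string(b, el.ident)
	strings.write_string(b, ">\n")
}

// Hypothetical demo data: <tv><channel>Local & OTA</channel></tv>
demo_document :: proc() -> ^xml.Document {
	doc := new(xml.Document)
	doc.elements = make([dynamic]xml.Element, 2)
	doc.element_count = 2
	doc.elements[0] = xml.Element{ident = "tv", kind = .Element}
	doc.elements[1] = xml.Element{ident = "channel", kind = .Element, parent = 0}
	child: xml.Value = xml.Element_ID(1)
	text:  xml.Value = "Local & OTA"
	append(&doc.elements[0].value, child)
	append(&doc.elements[1].value, text)
	return doc
}

main :: proc() {
	doc := demo_document() // left to the OS to reclaim in this short demo
	b := strings.builder_make()
	defer strings.builder_destroy(&b)

	write_element(&b, doc, 0)
	fmt.print(strings.to_string(b))
}
```

The resulting string can then be written out with whatever file API you already use. One caveat: if the input was parsed without decoding entities, the text may already contain sequences like `&amp;`, and escaping it again would double them, so the escaping step may need to be skipped or adjusted in that case.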

Got all the steps working for this now. I’m focused on optimizing the code and reducing unneeded complexity. Can anyone suggest a better, less syntactically complex way to get the type of a value stored in an unnamed union with 2 different types, held in a dynamic array on the struct? So far I’m using the code below to generate the bool that I need, but it feels too wordy just to set a bool. I’ve tried variations of type_info_of, typeid_of, and type_of, and only get back [dynamic]Value.

// Discover whether the first value in the union array is a string
  is_string: bool
  for value in root.value {
    #partial switch v in value {
      case string: is_string=true; break
      case: break
    }
    break
  }
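One shorter option, sketched here rather than presented as the canonical way, is Odin's type assertion on a union: `value.(string)` returns the string plus an ok flag, and that flag is the bool being computed above. The helper first_value_is_string is a hypothetical name, and the sample values are made up.

```odin
package main

import "core:fmt"
import "core:encoding/xml"

// Hypothetical helper: reports whether the first entry in a value list is text.
first_value_is_string :: proc(values: []xml.Value) -> bool {
	if len(values) == 0 {
		return false
	}
	// Type assertion on the union; ok is true only when the variant is a string.
	_, ok := values[0].(string)
	return ok
}

main :: proc() {
	values := []xml.Value{"hello", xml.Element_ID(1)}
	fmt.println(first_value_is_string(values))     // true
	fmt.println(first_value_is_string(values[1:])) // false
}
```

With a real element this would be called as `first_value_is_string(root.value[:])`, or the assertion can be written inline as `_, is_string := root.value[0].(string)` once the array is known to be non-empty.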