Skip to main content

Command Palette

Search for a command to run...

The Janus Array: When Your Data Structure Has Two Faces

Sometimes the most elegant solution is the one that embraces duality instead of fighting it.

Updated
3 min read
The Janus Array: When Your Data Structure Has Two Faces

The Monster in the File

Picture this: you're reverse-engineering some binary format, and you discover a nightmare hierarchy that makes onions jealous. Files split into slices. Slices containing commands. Commands holding elements. And oh, did I mention? It's recursive - those elements can contain more elements, going arbitrarily deep.

But here's the kicker: everything lives in dual address spaces. You have absolute offsets from byte zero of the file, AND relative offsets from the start of each container. It's like having a building where every room has both a street address AND a "third door on the left from the lobby" description.

Welcome to the world of load commands in executable files. Where sanity goes to die.

The Obvious (Wrong) Solutions

Your first instinct? "Throw it in a HashMap!" Map every absolute offset to its structure path. Simple, right?

Wrong. You'd need to map every single byte. For a 100MB file, that's 100 million entries. Your "optimization" just consumed more RAM than Chrome on a Tuesday.

Next attempt: "Fine, I'll use ranges!" Build a lookup table with (start, end, path) tuples. Better, but now you need interval trees, overlapping ranges, complex maintenance...

Still wrong. You're solving a problem that doesn't exist. These aren't arbitrary intervals - they're a clean hierarchical partition of the file.

The Janus Insight

Then it hit me: Why fight the duality? Embrace it.

Enter the Janus Array - named after the Roman god with two faces. One face looks forward through the hierarchy (give me slice[i].commands[j].elements[k]). The other face looks backward from absolute addresses (where does offset 0x47382 live?).

Two access patterns. One unified structure.

rust

// The "forward" face - direct hierarchical access
file.slices[i].commands[j].elements[k]  // O(1)

// The "backward" face - recursive binary search  
file.find_address(0x47382)  // O(log n)

The Implementation Beauty

Here's where Rust shines. One trait, multiple implementations, recursive elegance:

rust

trait DiskOffsets {
    fn find_address(&mut self, addr: u64) -> Result<Coordinates, Error>;
    fn get_absolute_range(&self) -> Range<u64>;
    // ... other methods
}

Every level (File, Slice, Command, Element) implements this trait. The search algorithm? Recursive binary search that delegates down the hierarchy:

rust

// In each level's find_address():
while start <= end {
    let mid = (start + end) / 2;
    let range = children[mid].get_absolute_range();

    if range.contains(&addr) {
        // Found it! Delegate to the child
        return children[mid].find_address(addr);
    }
    // ... binary search logic
}

Beautiful. No separate index structure. No memory duplication. The hierarchy IS the index.

When Theory Meets Reality

The textbook says "O(log n) per level, so O(log n × levels)."

Textbooks lie.

In practice, with load commands:

  • Files rarely have more than 3 slices

  • Many commands are leaf nodes (no elements)

  • Elements are uniformly distributed when present

So the real complexity? T(S,C,E) = ln(S) + ln(C) + ln(E), where S+C+E=n.

But S ≤ 3 in practice, so ln(S) ≈ 1.1 = constant.

Real complexity: O(ln(C) + ln(E))

Most of the time you don't even reach the element level, so it's just O(ln(C)).

This is what happens when you analyze algorithms in context, not in isolation.

The Payoff

The Janus Array gives you:

  • O(1) hierarchical navigation for the common case

  • O(log n) reverse lookup when you need it

  • Zero memory overhead (no auxiliary structures)

  • Type-safe error handling throughout

  • Recursive elegance that scales naturally

All wrapped in Rust's zero-cost abstractions and memory safety guarantees.

Conclusion: Architecture Over Algorithms

This isn't about being clever with data structures. It's about understanding your problem domain well enough to design solutions that feel inevitable.

The binary format had duality baked in. Instead of fighting it with complex indexing schemes, I embraced it with a structure that naturally supports both access patterns.

Sometimes the best optimization is realizing you don't need to optimize at all. You need to design better.

And yes, all the Python developers can kiss my gluteus maximus. 😉


GitHub: https://github.com/gb-at-r3/janus-array