_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   Enum of Arrays
       
       
        tugu77 wrote 3 hours 25 min ago:
        This thing should be a poster example of premature optimization. Sure
        you can squeeze a few milliseconds out in a performance critical task.
        Most things won't measurably benefit though, while making all handling
        super awkward.
        
        If your abstract domain description is fundamentally a collection of
        things that have a few parts each, then have your data type represent
        that, instead of turning it inside out for cache effects. If those
        become relevant at some point, try to abstract that away and do the
        optimized internal representation under the hood. But don't
        preemptively design your data structures in a cumbersome way just in
        case. That's bad advice.
       
        hashmush wrote 7 hours 29 min ago:
        I don't get it, why wouldn't you just store tag + count instead? Am I
        missing something?
       
        moth-fuzz wrote 9 hours 52 min ago:
        The idea that arrays of structs are inherently more cache friendly and
        thus data-oriented-er is a bit reductive of the whole practice of
        data-oriented code. The point is to optimize data layout for access
        patterns. Putting fields of a struct into their own arrays is only
        actually an optimization if you're only accessing that field in-bulk.
        And if so, why is it even in a struct in the first place? If you use
        all fields of a struct in your algorithm, then an array of structs is
        the optimal way.
        
        All the same is true for enums.
       
          10000truths wrote 3 hours 53 min ago:
          Access patterns matter, but just as important is to have less stuff
          to access. That's why arrays-of-structs are considered cache friendly
          - columnar data layouts open the door to optimizations that
          significantly reduce memory footprint. You no longer waste memory
          with struct padding. Boolean fields can become bitsets. Enums can be
          bit-packed. Often-null optional fields can become sparse maps. 8-byte
          pointers can become narrower-sized indices into object pools.
       
            pavlov wrote 2 hours 3 min ago:
            > “That's why arrays-of-structs are considered cache friendly”
            
            Sounds like you mean structs-of-arrays?
       
              10000truths wrote 1 hour 30 min ago:
              Oops, brainfart on my part. Unfortunately, the edit window has
              passed.
       
          aragilar wrote 8 hours 33 min ago:
          Same with row-major vs. column major, accessing contiguous data is
          faster than non-contiguous data, so you should align your algorithms
          and data structures.
       
        fargle wrote 10 hours 2 min ago:
        an Enum of Arrays would be an enum where each enumerator was a product
        of each possible enumerator. there would be N^M enumerators where N is
        the length of the array and M is the number of enumerators. for
        example, if the original type was enum { red, green } then the enum of
        array[3] would have to be an enum containing the 8 enumerators:
        
            { red-red-red, red-red-green, red-green-red, red-green-green ...
        green-green-green }
        
        so that's essentially completely useless. i think the exact same
        problem would occur with array-of-tagged-union to tagged-union-to-array
        "transformation".
        
        you can't just say "hey: arrays and structs and unions are words and if
        you can do array of struct and struct of array and enum is also a
        similar word, then why not enum-of-array?".
        
        while tfa talks about "batches" of items with the same tag, and the
        advantages therein, that isn't something captured by the example given,
        at least without extending the EoA to a variable sized array of EoA and
        something else to track the number of items in a "run" (as in RLE).
        
        this is better thought of as a data-structure problem than a type
        theory.
       
        shwestrick wrote 10 hours 28 min ago:
        Worth mentioning that you can always safely switch between AoS and SoA.
        Either can represent the other; all you've done is transpose the data.
        The same is not true of AoE/EoA. The AoE [Spam1, Egg1, Spam2, Spam3,
        Egg2] has no corresponding EoA that can represent it.
        
        What they're actually doing is an AoE => AoEoA transformation: find
        batches elements with the same tag and reorder the elements so that
        redundant tags can be eliminated. Essentially, a kind of run-length
        encoding. It's a nice idea.
       
          rlupi wrote 3 hours 13 min ago:
          Good insight.
          
          Ah... category theory :-)
          
          Array-of-Stuct (AoS) treats order in arrays as meaningful, arrays as
          lists, so AoS => Struct-of-Array (SoA) doesn't loose information. It
          is a sound transformation because it is a homomorphism.
          
          Some languages (homoiconic, or with macros or template support) can
          express this code transformation: e.g. Julia, [1] , or Rust, [2] In a
          sense, you can see this transformation through the concept of monads
          (although Haskell monads or F# computational expressions cannot
          directly express it, as far as I know). Then the corresponding
          category diagrams leads to sets or multi-sets (run-length encoding
          requires or implies some concept of identity, so unordered lists with
          repetitions = bags and multi-sets are equivalent in this specific
          context), as the right concept for Enums of Arrays.
          
   URI    [1]: https://github.com/JuliaArrays/StructArrays.jl
   URI    [2]: https://www.abubalay.com/blog/2019/02/16/struct-of-arrays
       
            mbrock wrote 3 hours 0 min ago:
            Zig can represent AoS to SoA very nicely, it's a favored technique
            for the Zig compiler itself and well supported by the standard
            library where it's known as a MultiArrayList.
       
        shoo wrote 12 hours 8 min ago:
        see also: Richard Fabian's data-oriented design book -- the chapter on
        existential processing discusses enums [1] Rough idea: model everything
        as relational data - define 1 table for each state. membership of a
        record in the table corresponding to state X implies that record is in
        the given state X.
        
        > the reason why you would put an enum in table form, is to reduce
        control flow impact. Given this, it's when we aren't using the
        enumerations to control instruction flow that it's fine to leave them
        alone
        
        An example of the latter might be some kind of state machine, where you
        can write branch-free code to determine the successor state from
        current state, and no other processing needs to branch on the state
        tag.
        
   URI  [1]: https://www.dataorienteddesign.com/dodbook/node4.html
       
        mpweiher wrote 12 hours 48 min ago:
        Now do a single class pointer for an array of values...
       
        samatman wrote 13 hours 49 min ago:
        This is a somewhat, hmm, bilingual post.  The enum in question here is
        what Zig calls a tagged union, while Rust calls it an enum, with what
        Zig calls an enum being the degenerate case where the tag is the only
        payload.
        
        I thought this would be about std.enum.EnumArray[0], an array of some T
        which is indexed by an enum.  I've gotten a lot of mileage out of those
        as well.  But it's about std.MultiArrayList[1], as used with a tagged
        union.    I've had occasion to use that with structs, but not with
        unions, and didn't actually know that you could, although finding out
        it makes sense.
        
        Actually a variation on MultiArrayList which is optimized for
        homogenous collections of one specific union variant, since if that's
        the useful way to structure the data then the tag would be redundant to
        store one of per element.
        
        Good read, mostly wanted to add a few links for those who want to know
        more.  The comptime metaprogramming used in MultiArrayList is a great
        illustration of what Zig is capable of IMHO.
        
        [0]: [1]:
        
   URI  [1]: https://ziglang.org/documentation/master/std/#std.enums.EnumAr...
   URI  [2]: https://ziglang.org/documentation/master/std/#std.multi_array_...
       
          benatkin wrote 4 hours 10 min ago:
          In wit for wasm they call them variants, which makes more sense to
          me. Enum is kind of an odd name for them.
          
   URI    [1]: https://component-model.bytecodealliance.org/design/wit.html...
       
          saghm wrote 13 hours 23 min ago:
          > This is a somewhat, hmm, bilingual post. The enum in question here
          is what Zig calls a tagged union, while Rust calls it an enum, with
          what Zig calls an enum being the degenerate case where the tag is the
          only payload.
          
          To be fair, I think that most languages typically use enum to refer
          to the same thing as Zig; if anything, Rust (and Swift, iirc) are
          somewhat outliers for using that term for tagged unions.
       
            dist1ll wrote 10 hours 30 min ago:
            Scala also calls them enums fyi. Personally, I wish everyone would
            call them variant types.
       
              saghm wrote 9 hours 10 min ago:
              I often use the term "sum types" for them, since I think it helps
              explain why they're useful compared to "product" types like
              structs or objects or tuples. I've heard people refer to them as
              "algebraic" types, but I don't really like that as a term for
              them because that feels like it should refer to sum and product
              types as a categorization rather than one of the categories
              specifically. Unfortunately, "sum type" doesn't really work super
              clearly in verbal conversations that often; people often tend to
              hear it as "some types".
       
          skissane wrote 13 hours 36 min ago:
          > This is a somewhat, hmm, bilingual post.
          
          Yeah, I wish the author had just mentioned what language they were
          using in the blog post text. I was looking at it and I couldn't
          identify it. Now I know it is Zig, but I'm not familiar with Zig so I
          can't identify it by sight. I was looking at it and thinking "this
          looks a bit like Rust but isn't Rust".
       
        thechao wrote 14 hours 36 min ago:
        I don't think I've had the need for a uniformly tagged array of enums.
        Generally, when I do an AoS to SoA transform that includes tagged data,
        I just factor out the tag into its own array. In fact, if the tag is
        2-valued, I just build a bitmap, rather than burning a whole byte. If
        the tag is a resource indicator, then I have a group of 1-hot bitmaps.
       
          norir wrote 13 hours 6 min ago:
          The SoA transformation makes sense to me and is quite general. The
          EoA transformation on the other hand feels like a rare special case
          though it seems perhaps less rare for the OP.
          
          Either way, these types of optimizations are typically marginal in
          the context of end to end performance of most programs. It's good to
          have some knowledge of these kinds of techniques, but most of the
          time it makes sense to do the thing that is most straightforward to
          implement and optimize later once the program is already working. Of
          course if the problem maps neatly onto EoA then that should be
          preferred in the initial implementation. I though in my 30+ years of
          programming cannot think of a particular problem that I have solved
          that would have been enhanced by this.
       
            dist1ll wrote 10 hours 31 min ago:
            > I though in my 30+ years of programming cannot think of a
            particular problem that I have solved that would have been enhanced
            by this.
            
            One example that I frequently deal with that can benefit from this
            is compiler data structures.
       
            tcfhgj wrote 11 hours 13 min ago:
            Isn't performance and memory usage generally enhanced by this?
            
            So why not simply default to this instead of defaulting to
            Interfaces/traits doing dynamic polymorphism?
            
            Makes everyone a bit more happy.
       
            AndyKelley wrote 12 hours 29 min ago:
            It's an alternative to OOP. You can get there via a series of
            transformations:
            
            1. Start with OOP (heap-allocated objects with shared base structs)
            
            2. Transform to using tagged unions instead
            
            3. Transform to the approach outlined in the OP (I call it the
            "encoding" approach in this talk: [1] )
            
            It's handy because you get to use an index to refer to an object,
            and you get serialization benefits. The zig compiler uses this
            pattern in quite a few places:
            
            * [2] * [2] * [2] *
            
   URI      [1]: https://vimeo.com/649009599
   URI      [2]: https://github.com/ziglang/zig/blob/77c63ac36034db577a9287...
   URI      [3]: https://github.com/ziglang/zig/blob/77c63ac36034db577a9287...
   URI      [4]: https://github.com/ziglang/zig/blob/77c63ac36034db577a9287...
   URI      [5]: https://github.com/ziglang/zig/blob/77c63ac36034db577a9287...
       
              midnightchair wrote 2 hours 57 min ago:
              I'll tell you my experience with Zig. I don't have any. I saw
              maybe Primagen talking about it and I see your post here. I
              watched 10 minutes of your vimeo video. I see it has 30k+ stars
              on github. So now I have to try to understand it in a nutshell.
              
              First like any language, I go to indeed.com and put in "Zig" to
              see if there are any jobs listed which use it. I don't see any.
              
              Then I click to [1] and it describes Zig as "robust, optimal and
              reusable". Well that doesn't really say much of anything.
              
              I read the example listed, which appears to be a test case, and I
              wonder how the 'try' mechanism works without a 'catch'
              
              Then I go to [1] documentation/master/ and see that it says: 
              Robust
              Behavior is correct even for edge cases such as out of memory.
              
              I wonder how that works, but there are no links to support this
              claim.
              
              I read a little more then move on.
              
              This isn't to say anything one way or another about Zig, its just
              my 30 minutes of reading about Zig.
              
   URI        [1]: https://ziglang.org/
   URI        [2]: https://ziglang.org/documentation/master/
       
                brabel wrote 1 hour 58 min ago:
                > First like any language, I go to indeed.com and put in "Zig"
                to see if there are any jobs listed which use it. I don't see
                any.
                
                What does that have to do with anything? Zig is still in beta
                and they explicitly do not recommend that you use it in
                production yet unless you're ok with frequent breaking changes.
                Of course there will be very few jobs (though it's being used
                by a few notable projects already, including Tigerbeetle -
                authors of the post we're discussing - and Bun, the JS
                runtime).
       
       
   DIR <- back to front page