_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   Push Ifs Up and Fors Down
       
       
        Aeyxen wrote 11 hours 13 min ago:
        Many variants of this debate play out in real-world systems: data
        pipelines, game engines, and large-scale web infra. The only universal
         law is that local code clarity should never be bought at the expense
         of global throughput or maintainability. Pushing ifs up absolutely
        unlocks performance when you're dealing with a hot loop—early
        bailouts mean less work per iteration, and in my experience, that's
        often the difference between a scalable system and a bottleneck. But
        the real win is batch processing (pushing fors down): it's the only way
        you get cache locality, vectorization, and real-world performance on
        modern hardware. No amount of OOP purity or DRY dogma can change the
        physics of memory bandwidth or the nature of branch misprediction.
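
         As a minimal Rust sketch of both rules together (Walrus,
         frobnicate_batch and the weight field are invented here, loosely
         echoing the article's examples):

             struct Walrus { weight: i64 }

             // "Fors down": the per-item loop lives in one place, over a
             // contiguous batch the compiler can vectorize.
             fn frobnicate_batch(walruses: &mut [Walrus]) {
                 for w in walruses.iter_mut() {
                     w.weight += 1;
                 }
             }

             // "Ifs up": the caller decides once, instead of every callee
             // re-checking per item.
             fn process(walruses: &mut [Walrus], enabled: bool) {
                 if enabled {
                     frobnicate_batch(walruses);
                 }
             }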
       
        stuaxo wrote 12 hours 28 min ago:
        Related:
        
        Within a function, I'm a fan of early bail out.
        
        While this goes against the usual advice of having the positive branch
        first, if the positive branch is sufficiently large you avoid having
        most of the function indented.
       
          satyanash wrote 11 hours 58 min ago:
          > having positive branching first
          
          This is advice I've never seen or received. It's always been the
          latter, exit early, etc. Languages like Swift even encode this into a
          feature, a la if guards.
       
            Zanfa wrote 11 hours 15 min ago:
            Positive branch first is good advice when both branches are roughly
            even in terms of complexity. If the negative branch is just a
            return, I’d bail early instead.
            
             Negative-first makes else-branches read as double negatives,
             which is weird, e.g. if !userExists {…} else {…}
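
             For illustration, a hedged Rust sketch of the two styles (User,
             greet, label and user_exists are invented names):

                 struct User { name: String }

                 // Early return: the negative branch is just a bail-out, so
                 // it goes first and the happy path stays unindented.
                 fn greet(user: Option<&User>) -> String {
                     let Some(user) = user else {
                         return String::from("who are you?");
                     };
                     format!("hello, {}", user.name)
                 }

                 // Positive branch first: both branches do real work, so
                 // neither reads as a double negative.
                 fn label(user_exists: bool) -> &'static str {
                     if user_exists { "known user" } else { "guest" }
                 }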
       
        billmcneale wrote 17 hours 43 min ago:
        These are some pretty arbitrary rules without much justification, quite
        reminiscent of the Clean Code fad.
       
        deepsun wrote 18 hours 25 min ago:
        In my field (server programming) readability trumps them all.
        
        Nested g() and h() can be much better if they are even just 1% easier
        to understand. No one cares about a few extra CPU cycles, because we
        don't write system or database code.
       
          throwaway17_17 wrote 16 hours 37 min ago:
          I will admit it is probably just some internal biasing from some
          unknown origin, but I always tend to think of server programming as a
          part of systems programming. To your mind, what keeps it out of the
          systems programming umbrella? I am not super certain about how
          programmers in your area break up the constituent parts of a
          server’s code, so maybe I’m thinking more along the lines of the
          ‘framework’/libraries your code is executing within.
          
          Is this more of a ‘server programming is not systems programming
          because we are just implementing the logic of what gets served’ vs.
          my assumption that server programming includes the how does the
          server connect to the world, cache things, handle resource
          allocation, and distribute/parallelize the computations required to
          serve data back to the user?
       
            deepsun wrote 12 hours 37 min ago:
             Good question. For me it was always about the focus: systems
             programming is infrastructure to build applications on, in
             contrast to the applications themselves. In server applications we
             really don't
            care if something takes 100 bytes of RAM when it could take 1 byte.
            RAM is cheap, engineers are expensive. So something like Rust
            doesn't make sense to use.
            
            Maybe I'm using the terminology wrong, and it's actually
            Applications Programming, but it's easy to confuse with
            mobile/desktop applications, where RAM does matter. In servers we
            pay for RAM/CPUs ourselves.
       
        ramesh31 wrote 18 hours 38 min ago:
         Becoming disciplined about early returns was life-changing for me. It
        will blow your mind how much pointless code you were writing before.
       
        hk1337 wrote 19 hours 34 min ago:
        Stop using Else
        
        99% of the time you can write better code without it.
       
        krick wrote 19 hours 37 min ago:
        These are extremely opinionated, and shouldn't be treated as a rule of
        thumb. As somebody else said, there isn't a rule of thumb here at all,
         but if I were to make up one, I would probably tell you the opposite:
        
        - You have to push ifs down, because of DRY.
        
        - If performance allows, you should consider pushing fors up, because
        then you have the power of using filter/map/reduce and function
        compositions to choose what actions you want to apply to which objects,
        essentially vectorizing the code.
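
         For instance, a rough Rust sketch of that composition (Order and its
         fields are invented): the caller chooses the filter, the action, and
         the reduction, rather than the callee hiding the loop and the branch.

             struct Order { paid: bool, amount_cents: i64 }

             fn totals(orders: &[Order]) -> i64 {
                 orders
                     .iter()
                     .filter(|o| o.paid)       // the "if", as a filter
                     .map(|o| o.amount_cents)  // the action, as a map
                     .sum()                    // the reduction
             }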
       
          panstromek wrote 13 hours 4 min ago:
          I feel like you either flipped the naming or the reasons you cite
          don't support the conclusion.
          
           Pushing ifs down usually prevents vectorization, and the cases the
           article mentions are the non-DRY ones, where a similar branch has to
           be duplicated across a ton of functions down the stack, often
           because the type is internally tagged.
       
            coolcase wrote 11 hours 37 min ago:
            3rd opinion: don't care until you have a performance issue to
            profile. Or you are building a high frequency trading system.
       
        Terr_ wrote 19 hours 47 min ago:
        For optimization, sure, but there are also cases where you care more
        about a maintainable expression of business rules, or the mental model
        used by subject experts.
       
        lblume wrote 1 day ago:
         In some cases the difference between if and for is not as clear-cut.
         A for loop over an option is probably better thought of as an if. And
         what about length-limited arrays, where iteration mainly serves to
         control whether execution occurs at all?
       
        kazinator wrote 1 day ago:
        > If there’s an if condition inside a function, consider if it could
        be moved to the caller instead
        
        This idle conjecture is too rife with counterexamples.
        
        - If the function is called from 37 places, should they all repeat the
        if statement?
        
        - What if the function is getaddrinfo, or EnterCriticalSection; do we
        push an if out to the users of the API?
        
        I think that we can only think about this transformation for internal
        functions which are called from at most two places, and only if the
        decision is out of their scope of concern.
        
        Another idea is to make the function perform only the if statement,
        which calls two other helper functions.
        
        If the caller needs to write a loop where the decision is to be hoisted
        out of the loop, the caller can use the lower-level "decoded-condition
        helpers". Callers which would only have a single if, not in or around a
        loop, can use the convenience function which hides the if. But we have
        to keep in mind that we are doing this for optimization. Optimization
        often conflicts with good program organization! Maybe it is not good
        design for the caller to know about the condition; we only opened it up
        so that we could hoist the condition outside of the caller's loop.
        
        These dilemmas show up in OOP, where the "if" decision that is in the
        callee is the method dispatch: selecting which method is called.
        
         Techniques to get method dispatch out of loops can also go against
         the grain of the design. There are some patterns for it.
        
         E.g. we wouldn't want to fill a canvas object with a raster image by
        looping over the image and calling canvas.putpixel(x, y, color). We'd
        have some method for blitting an image into a canvas (or a rectangular
        region thereof).
       
          panstromek wrote 13 hours 14 min ago:
          The keyword here is `consider`. The article targets a somewhat
          specific design problem where this comes up, especially when you use
          tagged unions or something similar.
       
          neoden wrote 13 hours 31 min ago:
          > If the function is called from 37 places, should they all repeat
          the if statement?
          
          the idea here is probably that in this case we might be able to split
          our function into two implementing true and false branches and then
          call them from 21 and 16 places respectively
       
            kazinator wrote 4 hours 27 min ago:
            That's possible only if the condition is constant-foldable.
            
            You can achieve it by turning the if part into an inline function.
            
            Before:
            
              function(cond, arg)
              {
                if (cond) { true logic } else { false logic } 
              }
            
            after:
            
               inline function(cond, arg)
               {
                 cond ? function_true(arg) : function_false(arg);
               }
            
            Now you don't do anything to those 37 places. The function is
            inlined, and the conditional disappears due to cond being constant.
       
          PaulRobinson wrote 1 day ago:
          If the function is called from 37 places, you need to refactor your
          code, but to answer your question on that point: it depends. DRY
          feels like the right answer, but I think we'd have to review an
          actual code example to decide.
          
          On examples where you're talking about a library function, I think
          you have to accept that as a library you're in a special place:
          you're on an ownership boundary. Data is moving across domains.
          You're moving across bounded contexts, in DDD-speak. So, no, you look
          after your own stuff.
          
          EnterCriticalSection suggests a code path where strong validation on
          entry - including if conditions - makes sense, and it should be
          thought of as a domain boundary.
          
          But when you're writing an application and your regular application
          functions have if statements, you can safely push them out. And
          within a library or a critical code section you can move the `if` up
          into the edges of it safely, and not down in the dregs. Manage your
          domain, don't make demands of other people's and within that domain
          move your control flow to the edge. Seems a reasonable piece of
          advice.
          
          However, as ever, idioms are only that, and need to be evaluated in
          the real world by people who know what they're doing and who can make
          sensible decisions about that context.
       
            CJefferson wrote 11 hours 26 min ago:
            I can't imagine a large program where no function is useful enough
            to be called more than 37 times. Memory allocation? Printing?
            Adding a member to a list? Writing to a file?
            
            I'm guessing you mean something else, or do you feel useful
            functions can't be called many times in the same program?
       
            kenjackson wrote 19 hours 38 min ago:
            Refactoring due to being called more than N times seems very
            function dependent. As the prior author noted, I’d expect to call
            a lock function in some programs a lot.  Likewise, memcpy.  In fact
            I’d argue that well factored functionality is often called at
            many different call sites.
       
            jovial_cavalier wrote 19 hours 40 min ago:
            Pray tell, how many places is appropriate to call the same
            function? Is 5 too many? How about 6? When I hit 7, I have to
            refactor everything, right?
       
              tylersmith wrote 14 hours 55 min ago:
               You don't need an explicit rule, you just need to be smarter
               than the average mid-curve tries-too-hard-to-feel-right hn
               poster and realize when you're repeating a calling convention
               too much.
       
              cakealert wrote 15 hours 0 min ago:
              This only applies to a situation where you have a function that
              requires dynamic checks for preconditions. I would suggest that
              such a function (or how it's being used) is likely a blight
              already, but tolerable with very few call sites. In which case
              checking at the call site is the right move. And as you continue
              to abuse the function perhaps the code duplication will prompt
              you to reconsider what you are doing.
       
                jovial_cavalier wrote 5 hours 18 min ago:
                So if a function dereferences a pointer, it doesn't make sense
                to check that it's not null inside the function?
                
                Unless there's an actual performance implication, this is all
                purely a matter of taste. This is the kind of broadly true,
                narrowly false stuff that causes people to destroy codebases.
                "I can't write it this way, because I have to push ifs up and
                fors down!!!" It's a totally fake requirement, and it imposes a
                fake constraint on the design.
       
                  kazinator wrote 2 hours 12 min ago:
                  If there is a performance implication of moving the if into
                  the callers or not, you can do it with an inline function.
                  
                    static inline int function(blob *ptr, int arg)
                    {
                       if (ptr == NULL)
                         return ERR_NULL;
                       return real_function(ptr, arg);
                    }
                  
                  Just like that, we effectively moved the if statement into 37
                  callers, where the compiler may be smart enough to hoist it
                  out of a for loop when it sees that the pointer is never
                  changed in the loop body, or to eliminate it entirely when it
                  sees that the pointer cannot be null.
       
                  cakealert wrote 4 hours 22 min ago:
                  IMO you should assert it's not null. There should never be a
                  circumstance where you pass a null pointer to a function.
       
                    kazinator wrote 1 hour 21 min ago:
                     ISO C allows:
                     
                       free(NULL);  // convenient no-op, does nothing
                       fflush(NULL);  // flush all streams; done implicitly
                                      // on normal exit
                       time(NULL);  // don't store time_t into a location,
                                    // just return it
                       strtol(text, NULL, 10);  // not interested in pointer
                                                // to first garbage char
                       setbuf(stream, NULL);  // allocate a buffer for stream
                       realloc(NULL, size);  // behave like malloc(size)
                     
                     and others. More examples in POSIX and other APIs:
                     
                       sigprocmask(SIG_UNBLOCK, these_sigs, NULL);
                         // not interested in previous signal mask
                       CreateEventA(NULL, FALSE, FALSE, NULL);
                         // no security attributes, no name
                     
                     Reality just ran over your opinion, oops!
       
            worik wrote 1 day ago:
            > If the function is called from 37 places, you need to refactor
            your code,
            
            Really?
            
             I do not have to think hard before I have a counter example:
             authentication.
             
             I call authenticate() in some form from every API.
             
             All 37 of them.
       
              kazinator wrote 22 hours 29 min ago:
              The strongest interpretation of the remark is not that you need
              to refactor because you have a function called 37 times (which is
              likely a good thing) but rather that if you think you need to
              move an if statement into or out of it, you face refactoring.
       
              bognition wrote 23 hours 54 min ago:
              If you are explicitly calling authenticate() for each api,
              you’re doing it “wrong”.    At that point you want implied
              authentication not explicit authentication. Why not move it to
              some middleware that gets called in every api call?
       
                kazinator wrote 19 hours 26 min ago:
                Because then you are calling middleware_caching_auth_broker()
                from 37 places instead of authenticate(). Just the name has
                changed, not the 37.
       
                  KPGv2 wrote 16 hours 55 min ago:
                  > Because then you are calling
                  middleware_caching_auth_broker() from 37 places
                  
                  No you aren't. You aren't really calling it from anywhere.
                  The framework you're using, which you aren't writing, is
                  calling the registered middleware.
                  
                  The topic here is complexity for the code structure because
                  it's called from 37 different places. A registered middleware
                  doesn't run into that issue because it doesn't get called
                  anywhere that "code structure complexity" matters.
                  
                  Your reasoning is isomorphic to "I'm calling a bit shift
                  millions of times because I have written some code in a
                  programming language." Technically true but not what we're
                  talking about here.
       
                    philwelch wrote 15 hours 44 min ago:
                    That sounds like the programming equivalent to thinking
                    that food just comes from the grocery store.
       
                      tsurba wrote 12 hours 19 min ago:
                      And going to a grocery store instead of 37 individual
                      farmers…?
       
                  all2 wrote 18 hours 25 min ago:
                  But that's ok because the calls are hidden from the
                  programmer.
                  
                  I'm not sure if my response is serious or tongue-in-cheek.
                  Maybe a bit of both.
       
        wiradikusuma wrote 1 day ago:
        I just wrote some code with this "dilemma" a minute ago. But I was
        worried the callers forget to include the "if" so I put it inside the
        method. Instead, I renamed the method from "doSomething" to
        "maybeDoSomething".
       
        sparkie wrote 1 day ago:
        In some cases you want to do the opposite - to utilize SIMD.
        
        With AVX-512 for example, trivial branching can be replaced with
        branchless code using the vector mask registers k0-k7, so an if inside
        a for is better than the for inside the if, which may have to iterate
        over a sequence of values twice.
        
        To give a basic example, consider a loop like:
        
             for (int i = 0; i < length; i++) {
                 if (values[i] % 2 == 1)
                     values[i] += 1;
                 else
                     values[i] -= 2;
             }
        
        We can convert this to one which operates on 16 ints per loop
        iteration, with the loop body containing no branches, where each int is
        only read and written to memory once (assuming length % 16 == 0).
        
             __mmask16 consequents;
             __mmask16 alternatives;
             __m512i results;
             __m512i ones = _mm512_set1_epi32(1);
             __m512i twos = _mm512_set1_epi32(2);
             for (int i = 0; i < length; i += 16) {
                 results = _mm512_load_epi32(&values[i]);
                 consequents = _mm512_cmpeq_epi32_mask(
                     _mm512_mod_epi32(results, twos), ones);
                 results = _mm512_mask_add_epi32(results, consequents,
                                                 results, ones);
                 alternatives = _knot_mask16(consequents);
                 results = _mm512_mask_sub_epi32(results, alternatives,
                                                 results, twos);
                 _mm512_store_epi32(&values[i], results);
             }
        
        Ideally, the compiler will auto-vectorize the first example and produce
        something equivalent to the second in the compiled object.
       
          gameman144 wrote 1 day ago:
           I am not sure that the before case maps to the article's premise,
           and I think your optimized SIMD version does line up with the
           recommendations of the article.
          
          For your example loop, the `if` statements are contingent on the
          data; they can't be pushed up as-is. If your algorithm were something
          like:
          
               for (int i = 0; i < length; i++) {
                   if (length % 2 == 1) {
                       values[i] += 1;
                   } else {
                       values[i] += 2;
                   }
               }
          
          then I think you'd agree that we should hoist that check out above
          the `for` statement.
          
          In your optimized SIMD version, you've removed the `if` altogether
          and are doing branchless computations. This seems very much like the
          platonic ideal of the article, and I'd expect they'd be a big fan!
       
            sparkie wrote 1 day ago:
             The point was more that you shouldn't always try to remove the
             branch from a loop yourself, because often the compiler will do a
             better job.
            
            For a contrived example, we could attempt to be clever and remove
            the branching from the loop in the first example by subtracting two
            from every value, then add three only for the odds.
            
                 for (int i = 0; i < length; i++) {
                     values[i] -= 2;
                     values[i] += (values[i] % 2) * 3;
                 }
            
             It achieves the same result (because subtracting two preserves
             odd/evenness, and nothing gets added for evens) and requires no
             in-loop branching, but it's likely to perform no better than, and
             possibly worse than, what the compiler could've generated from the
             first example, and it may be more difficult to auto-vectorize
             because the logic has changed. It may still perform better than an
             unoptimized branch-in-loop version (depending on the cost of
             branching on the target).
            
             As for moving branches out of the loop that don't need to be
             there (like your check on the length): the compiler will be able
             to do this for you almost all of the time; it's a standard
             optimization that most compilers implement. If your code is being
             interpreted, following the OP's advice is certainly worth doing,
             but with a mature compiler you should probably not worry, and
             instead aim to maximize clarity for people reading the code,
             rather than trying to be clever like this.
       
          William_BB wrote 1 day ago:
          My first thought was conditional branches inside the for loop based
          on the element as well. By any chance, do you know how hard it is for
          compilers to auto-vectorize something like this? I am generally not
          sure where the boundary is.
       
            sparkie wrote 1 day ago:
            GCC can do better than the example I gave.
            
   URI      [1]: https://godbolt.org/z/fo74G7d3W
       
        neilv wrote 1 day ago:
        A compiler that can prove that the condition-within-loop is constant
        for the duration of the looping, can lift up that condition branching,
        and emit two loops.
        
        But I like to help the compiler with this kind of optimization, by just
        doing it in the code.  Let the compiler focus on optimizations that I
        can't.
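
         That transformation is usually called loop unswitching; a
         hand-rolled sketch in Rust (scale and the flag are invented names):

             // Before: the flag is re-tested every iteration even though it
             // never changes inside the loop.
             fn scale(values: &mut [f64], double: bool) {
                 for v in values.iter_mut() {
                     if double { *v *= 2.0 } else { *v *= 0.5 }
                 }
             }

             // After: tested once up front, leaving two branch-free loops.
             fn scale_unswitched(values: &mut [f64], double: bool) {
                 if double {
                     for v in values.iter_mut() { *v *= 2.0 }
                 } else {
                     for v in values.iter_mut() { *v *= 0.5 }
                 }
             }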
       
        uptownfunk wrote 1 day ago:
         Wow, this is great. Where can I find more of this type of advice on
         how to structure your code?
       
        slt2021 wrote 1 day ago:
        Ifs = control flow
        
        Fors = data flow / compute kernel
        
         it makes sense to keep control flow and data flow separated for
         greater efficiency, so that you can independently evolve either flow
         while still maintaining consistent logic
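
         A minimal Rust sketch of that separation (Item and its fields are
         invented):

             struct Item { active: bool, weight: u64 }

             // Control flow: decide *what* to process.
             fn select_active(items: &[Item]) -> Vec<&Item> {
                 items.iter().filter(|i| i.active).collect()
             }

             // Data flow / compute kernel: a branch-free loop over the
             // selection.
             fn total_weight(selected: &[&Item]) -> u64 {
                 selected.iter().map(|i| i.weight).sum()
             }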
       
        salamanderman wrote 1 day ago:
         Moving preconditions up depends on what the definition of
         precondition is.
        There's some open source code I've done a deep dive in (Open cascade)
        and at some point they had an algorithm that assumed the precondition
        that the input was sorted, and that precondition was pushed up. Later
        they swapped out the algorithm for one that performs significantly
        better on randomized input and can perform very poorly on certain
        sorted input. Since the precondition was pushed up, though, it seems
        they didn't know how the input was transformed between the initial
        entrance function and the final inner function.
         Edit - if the precondition is something that can be translated into a
         type, then absolutely move the precondition up and let the compiler
         enforce it.
       
          deredede wrote 1 day ago:
          "Moving preconditions up" means moving the code that checks the
          precondition up. The precondition still needs to be documented (in
          the type system is ideal, with an assertion otherwise, in a comment
          if necessary) close to where it's assumed.
       
        bluejekyll wrote 1 day ago:
         I really like this advice, but aren’t these two examples essentially
         the same, yet given different advice?
        
        // Good?
        for walrus in walruses {
            walrus.frobnicate()
        }
        
        Is essentially equivalent to
        
        // BAD
        for walrus in walruses {
          frobnicate(walrus)
        }
        
        And this is good,
        
        // GOOD
        frobnicate_batch(walruses)
        
        So should the first one really be something more like
        
         // impl FrobnicateAll for &[Walrus]
         walruses.frobnicate_all()
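
         A hedged sketch of that last form as a trait on slices (the trait
         and method are the hypotheticals named above):

             struct Walrus;

             impl Walrus {
                 fn frobnicate(&mut self) { /* per-item work */ }
             }

             trait FrobnicateAll {
                 fn frobnicate_all(&mut self);
             }

             impl FrobnicateAll for [Walrus] {
                 fn frobnicate_all(&mut self) {
                     // Batch entry point: setup, SIMD, or I/O amortization
                     // can live here instead of at every call site.
                     for w in self.iter_mut() {
                         w.frobnicate();
                     }
                 }
             }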
       
        rco8786 wrote 1 day ago:
        I'm not sure I buy the idea that this is a "good" rule to follow.
        Sometimes maybe? But it's so contextually dependent that I have a hard
        time drawing any conclusions about it.
        
        Feels a lot like "i before e except after c" where there's so many
        exceptions to the rule that it may as well not exist.
       
        boltzmann_ wrote 1 day ago:
         You notice this quickly after working on codebases where efficiency
         is important. Filter pushdown is one of the first database
         optimizations.
       
        carom wrote 1 day ago:
        I strongly disagree with this ifs take. I want to validate data where
        it is used. I do not trust the caller (myself) to go read some comment
        about the assumptions on input data a function expects. I also don't
        want to duplicate that check in every caller.
       
          Zambyte wrote 1 day ago:
          One option is to use asserts that are only included in debug builds.
          That way any incorrect call of the function will crash the program in
          debug builds, but will have the performance benefits of the lifted
          conditional checks in release builds.
          
          You'll end up duplicating the condition, but that seems like a
          reasonable price to pay for correct and performant software.
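
           For instance, a minimal Rust sketch (the Walrus type and the
           non-empty precondition are invented):

               struct Walrus;
               impl Walrus { fn frobnicate(&mut self) {} }

               // Checked in debug builds, compiled out in release builds.
               fn frobnicate_batch(walruses: &mut [Walrus]) {
                   debug_assert!(!walruses.is_empty(),
                       "caller must filter out empty batches");
                   for w in walruses.iter_mut() {
                       w.frobnicate();
                   }
               }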
       
          azaslavsky wrote 1 day ago:
          Couldn't you just take the advice in [0] of parsing into types rather
          than validating? Then you get the best of both worlds: your inputs
          are necessarily checked every time the function is called (they would
          have to be to create the type in the first place), but you don't need
          to validate them at every nested layer. You also get the benefit of
          more descriptive function signatures to describe your interfaces.
          
          [0]
          
   URI    [1]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
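
           A rough Rust sketch of that idea (NonEmptyBatch and Walrus are
           invented names): the check runs once at construction, and the type
           then carries the proof through every nested layer.

               struct Walrus;

               struct NonEmptyBatch(Vec<Walrus>);

               impl NonEmptyBatch {
                   fn new(walruses: Vec<Walrus>) -> Option<Self> {
                       if walruses.is_empty() {
                           None
                       } else {
                           Some(NonEmptyBatch(walruses))
                       }
                   }
               }

               // No re-validation needed: the type guarantees non-emptiness.
               fn frobnicate_batch(batch: &mut NonEmptyBatch) {
                   for _w in batch.0.iter_mut() { /* ... */ }
               }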
       
          zellyn wrote 1 day ago:
          At least in the first example, the optionality is directly encoded in
          the types, so no assumptions have been lost.
       
        jasonjmcghee wrote 1 day ago:
        My take on the if-statements example wasn't actually so much about if
        statements.
        
        And this was obfuscated by author's use of global variables everywhere.
        
        The key change was reducing functions' dependencies on outer
        parameters. Which is great.
       
        zahlman wrote 1 day ago:
        > If you have complex control flow, better to fit it on a screen in a
        single function, rather than spread throughout the file.
        
        This part in particular seems like an aesthetic judgment, and I
        disagree. I find it more natural to follow a flowchart than to stare at
        one.
        
        > A related pattern here is what I call “dissolving enum”
        refactor.... There are two branching instructions here and, by pulling
        them up, it becomes apparent that it is the exact same condition,
        triplicated (the third time reified as a data structure):
        
        The problem here isn't the code organization, but the premature
        abstraction. When you write the enum it should be because "reifying the
        condition as a data structure" is an intentional, purposeful act.
        Something that empowers you to, for example, evaluate the condition now
        and defer the response to the next event tick in a GUI.
        
        > The primary benefit here is performance. Plenty of performance, in
        extreme cases.
        
        Only if so many other things go right. Last I checked, simply wanting
        walruses to behave polymorphically already ruins your day, even if
        you've chosen a sufficiently low-level programming language.
        
        A lot of the time, the "bad" code is the implementation of the function
        called in the "good" code. That makes said function easier to
        understand, by properly separating responsibilities (defining
        frobnication and iterating over walruses). Abstracting the inner loop
        to a function also makes it sane to express the iteration as a list
        comprehension without people complaining about how you have these
        nested list comprehensions spread over multiple lines, and why can't
        you just code imperatively like the normal programmers, etc.
        
        > The two pieces of advice about fors and ifs even compose!
        
        1. The abstraction needed to make the example comprehensible already
        ruins the illusion of `frobnicate_batch`.
        
        2. If you're working in an environment where this can get you a
        meaningful performance benefit and `condition` is indeed a loop
        invariant (such that the transformation is correct), you are surely
        working in an environment where the compiler can just hoist that loop
        invariant.
        
        3. The "good" version becomes longer and noisier because we must repeat
        the loop syntax.
        
        > jQuery was quite successful back in the day, and it operates on
        collections of elements.
        
        That's because of how it allowed you to create those collections (and
        provided iterators for them). It abstracted away the complex logic of
        iterating over the entire DOM tree to select nodes, so that you could
        focus on iterating linearly over the selected nodes. And that design
        implicitly, conceptually separated those steps. Even if it didn't
        actually build a separate container of the selected nodes, you could
        reason about what you were doing as if it did.
       
        jonstewart wrote 1 day ago:
        One reason to move conditionals out of loops is that it makes it easier
        for the compiler to vectorize and otherwise optimize the loop.
        
         With conditionals, it's also useful to express them as ternary
         assignments when possible. This makes it more likely the optimizer
         will generate a conditional move instead of a branch. When the
         condition is not sufficiently predictable, a conditional move is far
         faster because it avoids branch misprediction. Even when it's not
         faster in the moment, it can still alleviate pressure on the branch
         prediction cache.
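
         For example, a small Rust sketch (whether the optimizer actually
         emits a conditional move or a masked vector op depends on the target
         and isn't guaranteed):

             // Branchy: each iteration takes a data-dependent branch.
             fn adjust_branchy(values: &mut [i32]) {
                 for v in values.iter_mut() {
                     if *v % 2 == 1 { *v += 1 } else { *v -= 2 }
                 }
             }

             // Ternary-style assignment: a select with no branch to
             // mispredict.
             fn adjust_select(values: &mut [i32]) {
                 for v in values.iter_mut() {
                     *v += if *v % 2 == 1 { 1 } else { -2 };
                 }
             }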
       
        janosch_123 wrote 1 day ago:
         Ifs go to the top as guard statements.
         
         Add asserts to the end of the function too.
         
         Loops can live in the middle; take as much I/O and compute out of the
         loop as you can :)
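
         That shape, as a hedged Rust sketch (all names invented):

             fn process(batch: &[i64], limit: i64) -> Vec<i64> {
                 // Guards up top: reject bad input before any work happens.
                 if batch.is_empty() {
                     return Vec::new();
                 }

                 // Hoisted out of the loop: computed once, not per item.
                 let cap = limit.max(0);

                 // The loop in the middle, kept tight.
                 let mut out = Vec::with_capacity(batch.len());
                 for &x in batch {
                     out.push(x.min(cap));
                 }

                 // Asserts at the end: a postcondition check.
                 debug_assert_eq!(out.len(), batch.len());
                 out
             }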
       
        gnabgib wrote 1 day ago:
        (2023) Discussion at the time (662 points, 295 comments)
        
   URI  [1]: https://news.ycombinator.com/item?id=38282950
       
        xg15 wrote 1 day ago:
        Doesn't the second rule already imply some counterexamples for the
        first?
        
        When I work with batches of data, I often end up with functions like
        this:
        
          function process_batch(batch) {
            stuff = setUpNeededHelpers(batch);
            results = [];
            for (item in batch) {
              result = process_item(item, stuff);
              results.add(result);
            }
            return results;
          }
        
        Where "stuff" might be various objects, such as counters, lists or
        dictionaries to track aggregated state, opened IO connections, etc etc.
        
        So the setUpNeededHelpers() section, while not extremely expensive, can
        have nontrivial cost.
        
        I usually add a clause like
        
          if (batch.length == 0) {
            return [];
          }
        
        at the beginning of the function to avoid this initialization cost if
        the batch is empty anyway.
        
        Also, sometimes the initialization requires to access one element from
        the batch, e.g. to read metadata. Therefore the check also ensures
        there is at least one element available.
        
        Wouldn't this violate the rule?
       
          Jtsummers wrote 1 day ago:
          > Wouldn't this violate the rule?
          
          The article is offering a heuristic, not a hard rule (rule of thumb =
          heuristic, not dogma). It can't be applied universally without
          considering your circumstances.
          
          Following his advice to the letter (and ignoring his hedging where he
          says "consider if"), you'd move the `if (batch.length == 0)` into the
          callers of `setUpNeededHelpers`. But now you have to make every
          caller aware that calling the function could be expensive even if
          there are no contents in `batch` so they have to include the guard,
          which means you have this scattered throughout your code:
          
            if (batch.length == 0) { return default }
            setup(batch)
          
          Now it's a pair of things that always go together, which makes more
          sense to put into one function so you'd probably push it back down.
          
          The advice really is contingent on the surrounding context
          (non-exhaustive):
          
          1. Is the function with the condition called in only one place?
          Consider moving it up.
          
          2. Is the function with the condition called in many places and the
          condition can't be removed (it's not known to be called safely)?
          Leave the condition in the function.
          
          3. Is the function with the condition called only in places where the
          guard is redundant? In your example, `batch.length == 0` can be
          checked in `process_batch`. If all calls to `setup` are in similar
          functions, you can remove the condition from `setup` and move it up.
          
          4. If it's causing performance concerns (measured), and in many but
          not all cases the check is unneeded then remove the guard from
          `setup` and add it back to only those call-sites where it cannot be
          safely removed. If this doesn't get you any performance improvements,
          you probably want to move it back down for legibility.
          
          Basically, apply your judgment. But if you can, it's probably (but
          not always) a good idea to move the ifs up.
       
        renewiltord wrote 1 day ago:
        Yep, as a general heuristic pretty good. It avoids problems like n+1
        queries and not using SIMD. And the if thing often makes it easier to
        reason about code. There are exceptions but I have had this same rule
        and it’s served me well.
       
        nfw2 wrote 1 day ago:
        The performance gap of running a for loop inside or outside a function
        call is negligible in most real usage.
        
        The premise that you can define best patterns like this, removed from
        context with toy words like frobnicate, is flawed. You should abstract
        your code in such a way that the operations contained are clearly
        intuited by the names and parameters of the abstraction boundaries.
         Managing cognitive load >>> nickel-and-diming performance in most
         cases.
       
        Waterluvian wrote 1 day ago:
        My weird mental model: You have a tree of possible states/program flow.
        Conditions prune the tree. Prune the tree as early as possible so that
        you have to do work on fewer branches.
        
        Don’t meticulously evaluate and potentially prune every single
        branch, only to find you have to prune the whole limb anyways.
        
        Or even weirder: conditionals are about figuring out what work
        doesn’t need to be done. Loops are the “work.”
        
        Ultimately I want my functions to be about one thing: walking the
        program tree or doing work.
       
          nagaiaida wrote 15 hours 39 min ago:
          it's not that weird, this taken to its logical conclusion is
          effectively prolog's execution model
       
          igregoryca wrote 17 hours 4 min ago:
          This aligns nicely with how things look in the "small-step" flavour
          of PL theory / lambda calculus.
          
          In the lingo, expressions are evaluated by repeatedly getting
          "rewritten", according to rules called reduction rules. e.g., (1 + 2)
          + 4 might get rewritten to 3 + 4, which would then get rewritten to
          7.
          
          There are two sorts of these rules. There are "congruence" rules,
          which direct where work is to be done ("which subexpression to
          evaluate next?"); and then there are "computation" rules (as Pierce
          [1] calls them), which actually rewrite the expression, and thus
          change the program state.
          
          "Strict"/"non-lazy" languages (virtually every popular
          general-purpose language? except Haskell) are full of congruence
          rules – all subexpressions must be fully evaluated before a parent
          expression can be evaluated. The important exceptions are special
          constructs like conditionals and indefinite loops.
          
          For conditionals in particular, a computation rule will kick in
          before congruence rules direct all subexpressions to be evaluated.
          This prunes the expression tree, now in a very literal sense.
          
          [1]: Benjamin C. Pierce, Types and Programming Languages
          (recommended!)
       
          BoorishBears wrote 1 day ago:
          My mental model: align with the world the very specific code I'm
          writing lives in. From domain specifics, to existing patterns in the
          codebase, to the stage in the data pipeline I'm at, performance
          profile, etc.
          
          I used to try and form these kinds of rules and heuristics for code
          constructs, but eventually accepted they're at the wrong level of
          abstraction to be worth keeping around once you write enough code.
          
          It's telling they tend to resort to made up function names or single
          letters because at that point you're setting up a bit of a punching
          bag with an "island of code" where nothing exists outside of it, and
          almost any rule can make sense.
          
          -
          
          Perfect example is the "redundancies and dead conditions" mentioned:
          we're making the really convenient assumption that `g` is the only
          caller of `h` and will forever be the only caller of `h` in order to
          claim we exposed a dead branch using this rule...
          
          That works on the island, but in an actual codebase there's typically
          a reason why `g` and `h` weren't collapsed into each other to start.
       
            jonahx wrote 17 hours 3 min ago:
            I feel this kind of critique, which I see often as a response to
            articles like this, is so easy as to be meaningless.  How is one
            supposed to ever talk about general principles without using
            simplified examples?
            
            Aren't you just saying "Real code is more complicated than your toy
            example"?
            
            Well sure, trivially so.  But that's by design.
            
            > Perfect example is the "redundancies and dead conditions"
            mentioned: we're making the really convenient assumption that `g`
            is the only caller of `h` and will forever be the only caller of
            `h` in order to claim we exposed a dead branch using this rule...
            
            Not really.  He's just saying that when you push conditional logic
            "up" into one place, it's often more readable and sometimes you
            might notice things you otherwise wouldn't.  And then he created
            the simplest possible example (but that's a good thing!) to
            demonstrate how that might work.  It's not a claim that it always
            will work that way or that real code won't be more complicated.
       
          0xWTF wrote 1 day ago:
          Can I float an adjacent model? Classes are nouns, functions are
          verbs.
       
            kjkjadksj wrote 17 hours 44 min ago:
            Working with python for a while and I still don’t bother with
            classes. Only when I “borrow” other code do I mess with them.
            It just seems like a flowery way to organize functions. I prefer to
            just write the functions. Maybe its because my first languages
            lacked classes that I don’t much like to reach for them.
            
            I don’t even like loops and prefer to functionalize them and run
            in parallel if sensible.
            
            I know this makes me a bit of a python heathen but my code runs
            fast as a result.
       
            acbart wrote 23 hours 32 min ago:
            And then at some point someone shows you how Classes can be verbs,
            and functions can be nouns, and your brain hurts for a while. You
            overuse that paradigm for a while, and eventually learn to find the
            appropriate balance of ideas.
       
              2muchcoffeeman wrote 11 hours 22 min ago:
              Writing code is like writing though. None of these ideas for
              structuring code are the be all and end all of coding. Things
              evolve, sometimes old idea are good, sometimes new.
              
              Like how the phrase “to boldly go where no man has gone
              before” will bring out pendants.
       
                AStonesThrow wrote 11 hours 13 min ago:
                I don't believe that anyone wears pendants much on that show,
                unless you mean the communicators people wear in TNG. I did
                have a Romulan keychain once, though.
       
              kiviuq wrote 16 hours 45 min ago:
              Example: Object Algebra pattern represents data types ("nouns")
              as functions.
       
              nailer wrote 22 hours 29 min ago:
              Haven’t seen that yet after 25 years. It just always seems like
              lazy naming when this isn’t followed. Maybe I missed something.
       
                cdaringe wrote 5 hours 5 min ago:
                Agreed. Compelling receipts required.
       
                angra_mainyu wrote 22 hours 6 min ago:
                I have to agree, particularly if you look at functions as
                pipelines: data/events go in, other data/events go out.
                
                If I had to hazard some kind of heuristic with 99%
                applicability, it'd be to always strive to have code with as
                few indentations (branches) as possible. If your code is
                getting too indented, those deep Vs are either a sign that your
                implementation has a strong mismatch with the underlying
                problem or you need to break things up into smaller functions.
       
            BobbyJo wrote 1 day ago:
            I like to think of it completely differently: Functions are where
            you hide things, Classes are where you expose things.
            
            Functions to me are more about scoping things down than about
            performing logic. The whole program is about performing logic.
       
            Waterluvian wrote 1 day ago:
            Didn’t the Apollo guidance computers work with VERB and NOUN?
       
            pshc wrote 1 day ago:
            
            
   URI      [1]: http://steve-yegge.blogspot.com/2006/03/execution-in-kingd...
       
            slipnslider wrote 1 day ago:
            I remember being taught that in CS101 and still use it today 15
            years later. It's a good and simple and easy to follow pattern
       
          Brian_K_White wrote 1 day ago:
          perfectly good models
       
        hello_computer wrote 1 day ago:
        This thread is a microcosm of Babel.
       
        throwawaymaths wrote 1 day ago:
         i would agree with the push ifs up except if you're doing options
         parsing.  having a clean line of flow that effectively updates a struct
         with a bunch of "maybe" functions is much better if you're consistent
        with it.
        
        anywhere else, push ifs up.
       
        shawnz wrote 1 day ago:
        Sometimes I like to put the conditional logic in the callee because it
        prevents the caller from doing things in the wrong order by accident.
        
        Like for example, if you want to make an idempotent operation, you
        might first check if the thing has been done already and if not, then
        do it.
        
        If you push that conditional out to the caller, now every caller of
        your function has to individually make sure they call it in the right
        way to get a guarantee of idempotency and you can't abstract that
        guarantee for them. How do you deal with that kind of thing when
        applying this philosophy?
        
        Another example might be if you want to execute a sequence of checks
        before doing an operation within a database transaction. How do you
        apply this philosophy while keeping the checks within the transaction
        boundary?
       
          bee_rider wrote 1 day ago:
          Maybe write the functions without the checks, then have wrapper
          functions that just do the checks and then call the internal
          function?
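
           For example, a rough Rust sketch of that split (names invented; in
           C the inner function would typically be static):

               // Inner function assumes the precondition already holds.
               fn do_thing_unchecked(x: u32) -> u32 {
                   x / 2 // assumes x is even
               }

               // Public wrapper: performs the checks, then delegates.
               pub fn do_thing(x: u32) -> Result<u32, &'static str> {
                   if x % 2 != 0 {
                       return Err("x must be even");
                   }
                   Ok(do_thing_unchecked(x))
               }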
       
            astrobe_ wrote 1 day ago:
            It sounds like self-inflicted boilerplate to me.
       
              bee_rider wrote 22 hours 21 min ago:
              If you were going to write the tests anyway, the additional
              boilerplate for splitting it up and doing a wrapper isn’t so
              bad (in C at least, maybe it is worse for some language).
       
                astrobe_ wrote 11 hours 0 min ago:
                 When you say "isn't so bad", is that just a manner of
                 speaking, or is it actually a little bad (but a worthwhile
                 compromise)?
       
                  bee_rider wrote 5 hours 58 min ago:
                  Well, I was working on a sort of green-field project that did
                  this, and I liked it. It neatly solved the problem of needing
                  the tests, but only wanting to call them on user-provided
                  inputs. However, some caveats:
                  
                  * I wasn’t around long enough to see if there was a hidden
                  maintenance cost
                  
                  * It was a very thoughtfully designed library in an
                  already-well-understood domain so it wasn’t like we were
                  going to need to change the arguments a ton
                  
                  * It was explicitly a library designed to be used as a
                  library from the get-go, so there was a clear distinction of
                  which functions should be user-visible.
                  
                  I think I would find it annoying if I was doing exploratory
                  programming and expected to change the arguments often. But,
                  in that case, maybe it is too early to start checking user
                  inputs anyway.
       
            shawnz wrote 1 day ago:
            Is that really achieving OP's goal though, if you're only raising
            it by creating a new intermediary level to contain the conditional?
            The conditional is still the same distance from the root of the
            code, so that seems like it's not in the spirit of what they are
            saying. Plus you're just introducing the possibility for confusion
            if people call the unwrapped function when they intended to call
            the wrapped function
       
              Brian_K_White wrote 1 day ago:
               But the checking and the writing really are 2 different things.
               The "rule" that you always want to do this check before a write
               is really never absolute. A wrapper is exactly correct. You
               could have the single function and add a param that says skip
               the check this time, but that is messier and even more dangerous
               than the separate wrapper.
              
              Depends just how many things are checked by the check I guess. A
              single aspect, checking whether the resource is already claimed
              or is available, could be combined since it could be part of the
              very access mechanism itself where anything else is a race
              condition.
       
          avianlyric wrote 1 day ago:
          You’ve kind of answered your own question here.
          
          > If you push that conditional out to the caller, now every caller of
          your function has to individually make sure they call it in the right
          way to get a guarantee of idempotency
          
          In this situation your function is no longer idempotent, so you
          obviously can’t provide the guarantee. But quite frankly, if
          you’re having to resort to making individual functions implement
          state management to provide idempotency, then I suspect you’re
          doing something very odd, and have way too much logic happening
          inside a single function.
          
          Idempotent code tends to fall into two distinct camps:
          
           1. Code that’s inherently idempotent because the data model and
           operations being performed are inherently idempotent. I.e. you’re
           either performing stateless operations, or you’re performing
           “PUT”-style operations wherein the input data contains all the
           state that needs to be written.
          
          2. Code that’s performing more complex business operations where
          you’re creating an idempotent abstraction by either performing
          rollbacks, or providing some kind of atomic apply abstraction that
          ensures partial failures don’t result in corrupted state.
          
          For point 1, you shouldn’t be checking for order of operations,
          because it doesn’t matter. Everything is inherently idempotent,
          just perform the operations again.
          
          For point 2, there is no simple abstraction you can apply. You need
          to have something record the desired operation, then ensure it either
          completes or fails. And once that happens, ensures that completion or
          failure is persistent permanently. But that kind of logic is not the
          kind of thing you put into a function and compose with other
          operations.
       
            jknoepfler wrote 1 day ago:
            Probably implicit in your #2, but there are two types of people in
            the world: people who know why you shouldn't try to write a
            production-grade database from scratch, and people who don't know
            why you shouldn't try to write a production-grade database from
            scratch. Neither group should try to write a production-grade
            database from scratch.
       
            shawnz wrote 1 day ago:
            Consider a simple example where you're checking if a file exists,
            or a database object exists, and creating it if not. Imagine your
            filesystem or database library either doesn't have an upsert
            function to do this for you, or else you can't use it because you
            want some special behaviour for new records (like writing the
            current timestamp or a running total, or adding an entry to a log
            file, or something). I think this is a simple, common example where
            you would want to combine a conditional with an action. I don't
            think it's very "odd" or indicative of "way too much logic".
       
              avianlyric wrote 1 day ago:
              > a database object exists, and creating it if not. Imagine your
              filesystem or database library either doesn't have an upsert
              function to do this for you, or else you can't use it because you
              want some special behaviour for new records (like writing the
              current timestamp or a running total, or adding an entry to a log
              file, or something).
              
              This is why databases have transactions.
              
              > simple example where you're checking if a file exists
              
              Personally I avoid interacting directly with the filesystem like
              the plague due to issues exactly like this. Working with a
              filesystem correctly is way harder than people think it is, and
              handling all the edge-cases is unbelievably difficult. If I'm
              building a production system where correctness is important, then
              I use abstractions like databases to make sure I don't have to
              deal with filesystem nuances myself.
       
                shawnz wrote 22 hours 44 min ago:
                Sure, I agree that a transaction should be used here (in the
                database example at least). But that's orthogonal to my point,
                or maybe even in favour of it: doesn't a transaction
                necessitate keeping the conditional close to the effect? It's a
                perfect example of what I'm trying to say, how do you make sure
                the conditional happens in the same transaction as the effect,
                while simultaneously trying to push the conditional towards the
                root of the code and away from the effect? Transaction
                boundaries are exactly the kind of thing that makes pushing up
                the conditionals difficult.
       
                  avianlyric wrote 8 hours 55 min ago:
                   By pushing up the transaction boundary. The only reason why
                   the conditional is important is because it is part of a
                   larger sequence of operations that you want to complete in
                   an atomic fashion.
                  
                  Your transaction needs to encompass all of those operations,
                  not just parts of it.
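                   
                   A minimal sketch of that shape (my example, assuming the
                   rusqlite crate and a hypothetical "things" table; the
                   point is only that the caller owns the boundary):
                   
                       use rusqlite::{Result, Transaction};
                       
                       // The conditional create stays next to its effect,
                       // but it runs against a transaction handle that the
                       // caller opened, so the caller decides how wide the
                       // atomic boundary is.
                       fn create_if_missing(tx: &Transaction, id: i64) -> Result<()> {
                           let exists: bool = tx.query_row(
                               "SELECT EXISTS(SELECT 1 FROM things WHERE id = ?1)",
                               [id],
                               |row| row.get(0),
                           )?;
                           if !exists {
                               // special behaviour for new records goes here
                               tx.execute("INSERT INTO things (id) VALUES (?1)", [id])?;
                           }
                           Ok(())
                       }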
       
        quantadev wrote 1 day ago:
         There's always a trade-off between performance and clarity in the
         code.
         
         If a certain function has many preconditions it needs to check
         before running, but potentially needs to run from various places in
         the code, then moving the precondition checks outside the method
         results in faster code but destroys readability and breaks the DRY
         principle.
        
         In cases where this kind of tension (DRY vs. non-DRY) exists, I've
         sometimes named methods like 'maybeDoThing' (emphasis on the
         'maybe' prefix), indicating that I'm calling the method but that
         all the precondition checks live inside the function itself, rather
         than duplicating that logic everywhere the method 'maybe' needs to
         run.
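         
         A minimal sketch of that naming pattern (hypothetical names, in
         Rust):
         
             struct Thing { ready: bool, done: bool }
             
             // All precondition checks live inside; the 'maybe' prefix
             // tells callers that the call may legitimately be a no-op.
             fn maybe_do_thing(thing: &mut Thing) {
                 if !thing.ready || thing.done {
                     return;
                 }
                 thing.done = true; // stand-in for the actual work
             }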
       
        layer8 wrote 1 day ago:
        The example listed as “dissolving enum refactor” is essentially
        polymorphism, i.e. you could replace the match by a polymorphic method
        invocation on the enum. Its purpose is to decouple the point where a
        case distinction is established (the initial if) from the point where
        it is acted upon (the invocation of foo/bar). The case distinction is
        carried by the object (enum value in this case) or closure and need not
        to be reiterated at the point of invocation (if the match were replaced
        by polymorphic dispatch). That means that if the case distinction
        changes, only the point where it is established needs to be changed,
        not the points where the distinct actions based on it are triggered.
        
        This is a trade-off: It can be beneficial to see the individual cases
        to be considered at the points where the actions are triggered, at the
        cost of having an additional code-level dependency on the list of
        individual cases.
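         
         As a sketch, the polymorphic form of the article's enum might look
         like this (my code, not the article's):
         
             enum E { Foo(u32), Bar(u32) }
             
             impl E {
                 // The case distinction travels with the value; call
                 // sites just invoke run() and never repeat the match.
                 fn run(self) {
                     match self {
                         E::Foo(x) => println!("foo({x})"),
                         E::Bar(y) => println!("bar({y})"),
                     }
                 }
             }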
       
        manmal wrote 1 day ago:
        It’s a bit niche for HN, but SwiftUI rendering works way better when
        following this. In a ForEach, you really shouldn’t have any
           branching, or you’ll pay quite catastrophic performance penalties. I
        found out the hard way when rendering a massive chart with Swift
        Charts. All branching must be pushed upwards.
       
          ComputerGuru wrote 1 day ago:
          Why? Does it interpret the code?
       
            manmal wrote 1 day ago:
            Kind of, it’s a declarative framework like React & co. Under the
            hood it maps to either UIKit components or GPU (Metal) rendering.
            And view identity is very important for change detection. AFAICT,
            putting a branch in a ForEach invalidates all elements in that
            ForEach whenever one branch changes, because its whole identity
            changes.
       
              jollygoodshow wrote 11 hours 35 min ago:
               So say I have a set of elements to render (A,B,C), but they
               can come in any order or number (C,B,A,B). If I want to render
               in the given order, how would I approach this for the
               best-performing implementation?
       
        neRok wrote 1 day ago:
        I agree that the first example in the article is "bad"...
        
           fn frobnicate(walrus: Option<Walrus>)
        but the rest makes no sense to me!
        
          // GOOD
          frobnicate_batch(walruses)
          // BAD
          for walrus in walruses {
            frobnicate(walrus)
          }
        
        It doesn't follow through with the "GOOD" example though...
        
           fn frobnicate_batch(walruses) {
             for walrus in walruses { frobnicate(walrus) }
           }
        
        What did that achieve?
        
        And the next example...
        
          // GOOD
          if condition {
            for walrus in walruses { walrus.frobnicate() }
          } else {
            for walrus in walruses { walrus.transmogrify() }
          }
          // BAD
          for walrus in walruses {
            if condition { walrus.frobnicate() }
            else { walrus.transmogrify() }
          }
        
        What good is that when...
        
          walruses = get_5_closest_walruses()
          // "GOOD"
            if walruses.has_hungry() { feed_them_all() }
            else { dont_feed_any() }
          // "BAD"
             for walrus in walruses {
                if walrus.is_hungry() { feed() }
                else { dont_feed() }
             }
       
          magicalhippo wrote 1 day ago:
          > What did that achieve?
          
          An interface where the implementation can later be changed to do
          something more clever.
          
           At work we have a lot of legacy code written the BAD way, i.e. the
          caller loops, which means we have to change dozens of call sites if
          we want to improve performance, rather than just one implementation.
          
          This makes it significantly more difficult than it could have been.
       
            lblume wrote 1 day ago:
            Two counterpoints.
            
            Firstly, in many cases the function needs to serve both purposes
            — called on a single item or called on a sequence of such. A
            function that always loops would have to be called on some unitary
            sequence or iterator which is both unergonomic and might have
            performance implications.
            
            Second, the caller might have more information than the callee on
            how to optimize the loop. Consider a function that might be
            computationally expensive for some inputs while negligible for
             others — the caller, knowing this information, could choose to
             parallelize the former inputs while vectorizing etc. the latter
             (via use of inlining, etc.). This would be very hard, or at
             least more complicated, if it were the callee's responsibility.
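             
             A small sketch of the second point (my example): the caller
             splits the batch using knowledge the callee doesn't have.
             
                 fn is_expensive(n: &u64) -> bool { *n > 1_000 }
                 
                 fn main() {
                     let inputs: Vec<u64> = vec![5, 2_000, 7, 9_000];
                     // heavy could go to a thread pool; light stays inline
                     let (heavy, light): (Vec<u64>, Vec<u64>) =
                         inputs.into_iter().partition(is_expensive);
                     assert_eq!((heavy.len(), light.len()), (2, 2));
                 }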
       
          jerf wrote 1 day ago:
          I think the "push for loops down" is missing a bit of detail about
          the why. The author alludes to "superior performance" but I don't
          think makes it clear how that can happen.
          
          Vectorization is a bit obscure and a lot of coders aren't worried
          about whether their code vectorizes, but there's a much more common
          example that I have seen shred the performance of a lot of real-world
          code bases and HTTP APIs, which is functions (including APIs) that
          take only a single thing when they should take the full list.
          
          Suppose we have posts in a database, like for a forum or something.
          Consider the difference between:
          
               posts = {}
               for id in postIDs:
                   posts[id] = fetchPost(id)
          
          versus
          
              posts = fetchPosts(postIDs)
          
          fetchPost and fetchPosts both involve hitting the database. The
          singular version means that the resulting SQL will, by necessity,
          only have the one ID in it, and as a result, a full query will be
          made per post. This is a problem because it's pretty likely here that
          fetching a post is a very fast (indexed) operation, so the per-query
          overhead is going to hit you hard.
          
          The plural "fetchPosts", on the other hand, has all the information
          necessary to query the DB in one shot for all the posts, which is
          going to be much faster. An architecture based on fetching one post
          at a time is intrinsically less performant in this case.
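           
           As a sketch (hypothetical schema), the thing the plural version
           can do that the singular one can't is emit a single parameterized
           query for the whole batch:
           
               // One round-trip for all ids, instead of one per id.
               fn build_fetch_posts_sql(n_ids: usize) -> String {
                   let placeholders = vec!["?"; n_ids].join(", ");
                   format!("SELECT id, body FROM posts WHERE id IN ({placeholders})")
               }
               
               // build_fetch_posts_sql(3) ==
               //   "SELECT id, body FROM posts WHERE id IN (?, ?, ?)"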
          
          This opens up even more in the HTTP API world, where a single query
          is generally of even higher overhead than a DB query. I think the
          most frequent mistake I see in HTTP API design (at least, ignoring
          quibbling about which method and error code scheme to use) is
          providing APIs that operate on one thing at a time when the problem
          domain naturally lends itself to operating on arrays (or
          map/objects/dicts) at a time. It's probably a non-trivial part of the
          reason why so many web sites and apps are so much slower than they
          need to be.
          
          I find it is often easy to surprise other devs with how fast your
          system works. This is one of my "secrets" (please steal it!); you
          make sure you avoid as many "per-thing" penalties as possible by
          keeping sets of things together as long as possible. The "per-thing"
           penalties can really sneak up on you. Like nested for loops, they
           can easily start stacking up if you're not careful, as the
           inability to fetch all the posts at once further cascades into,
           say, fetching user avatars one-by-one in some other loop, and
           then a series of other individual queries. Best part is,
           profiling may make it look like the problem is the DB ("the DB is
           taking a long time to serve this"), because profiles are not
           always that good at turning up that your problem is per-item
           overhead rather than the amount of real work being done.
       
            mnahkies wrote 1 day ago:
            The worst / most amusing example of this I've seen in the wild was
            a third party line of business application that was sequentially
            triaging "pending tasks" to assign priority/to workers.
            
            Our cloud provider had an aircon/overheating incident in the region
            we were using, and after it was resolved network latency between
             the database and application increased by a few milliseconds.
             Turns out that if you multiply that by a few million tasks and
             a fast arrival rate, you get a significant amount of time, and
             the pending tasks queue backs up, delaying the high priority
             tasks.
            
            Based on the traces we had it looked like a classic case of "ORM
            made it easy to do it this way, and it works fine until it doesn't"
            but was unfortunately out of our control being a third party
            product.
            
            If they'd fetched/processed batches of tasks from the database
            instead I'm confident it wouldn't have been an issue.
       
        hamandcheese wrote 1 day ago:
         Isn't this just inversion of control? AKA the D (dependency
         inversion) in SOLID?
       
        andyg_blog wrote 1 day ago:
        A more general rule is to push ifs close to the source of input: [1]
        It's really about finding the entry points into your program from the
        outside (including data you fetch from another service), and then
        massaging in such a way that you make as many guarantees as possible
        (preferably encoded into your types) by the time it reaches any core
        logic, especially the resource heavy parts.
        
   URI  [1]: https://gieseanw.wordpress.com/2024/06/24/dont-push-ifs-up-put...
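         
         A tiny sketch of that shape, reusing the article's walrus (my
         code): handle the Option once at the entry point, so the core only
         ever sees a real value.
         
             struct Walrus;
             
             fn entry_point(input: Option<Walrus>) {
                 // the None case is decided once, at the boundary
                 let Some(walrus) = input else { return };
                 frobnicate(walrus);
             }
             
             fn frobnicate(_walrus: Walrus) {
                 // core logic: no re-checking for None down here
             }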
       
          dataflow wrote 1 day ago:
          Doesn't this obfuscate what assumptions you can make when trying to
          understand the core logic? You prefer to examine all the call chains
          everywhere?
       
            geysersam wrote 21 hours 4 min ago:
            No I don't think so because if you make your assumptions early then
            the same assumptions exist in the entire program and that makes
            them easy to reason about
       
            furyofantares wrote 1 day ago:
            The idea and examples are that the type system takes care of it.
            The rule of thumb is worded overly generally, it's more just about
            stuff like null checks if you have non-nullable types available.
       
            fmbb wrote 1 day ago:
             The “core logic” of a program is what output it yields for a
             given input.
            
            If you find a bug, you find it because you discover that a given
            input does not lead to the expected output.
            
            You have to find all those ifs in your code because one of them is
            wrong (probably in combination with a couple of others).
            
            If you push all your conditionals up as close to the input as
            possible, your hunt will be shorter, and fixing will be easier.
       
            setr wrote 1 day ago:
            If you’ve massaged and normalized the data at entry, then the
            assumptions at core logic should be well defined — it’s
            whatever the rules of the normalized output are.
            
            You don’t need to know all of the call chains because you’ve
            established a “narrow waist” where ideally all things have been
            made clear, and errors have been handled or scoped. So you only
            need to know the call chain from entry point to narrow waist, and
            separately narrow waist till end.
       
            avianlyric wrote 1 day ago:
            This is why we invented type systems. No need to examine call
            chains, just examine input types. The types will not only tell you
            what assumptions you can make, but the compiler will even tell you
            if you make an invalid assumption!
       
              dataflow wrote 1 day ago:
              You can't shove every single assumption into the type system...
       
                sn9 wrote 1 day ago:
                You can at least shove them into the constructors.
       
                knome wrote 1 day ago:
                 You can and should put as many as you can there. [1] If instead
                of validating that someone has sent you a phone number in one
                spot and then passing along a string, you can as easily have a
                function construct an UncheckedPhoneNumber. You can choose to
                only construct VerifiedPhoneNumbers if the user has gone
                through a code check. Both would allow you to pluck a
                PhoneNumber out of them for where you need to have generic
                calling code.
                
                You can use this sort of pattern to encode anything into the
                type system. Takes a little more upfront typing than all of
                those being strings, but your program will be sure of what it
                actually has at every point. It's pretty nice.
                
   URI          [1]: https://lexi-lambda.github.io/blog/2019/11/05/parse-do...
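                 
                 A rough sketch of that pattern (the 10-digit rule is an
                 arbitrary stand-in):
                 
                     struct UncheckedPhoneNumber(String);
                     struct VerifiedPhoneNumber(String);
                     
                     impl UncheckedPhoneNumber {
                         // constructing the type is the validation
                         fn parse(raw: &str) -> Option<Self> {
                             let digits: String = raw
                                 .chars()
                                 .filter(|c| c.is_ascii_digit())
                                 .collect();
                             (digits.len() == 10).then(|| Self(digits))
                         }
                         
                         // only a completed code check yields the
                         // stronger type
                         fn verify(self, code_ok: bool) -> Option<VerifiedPhoneNumber> {
                             code_ok.then(|| VerifiedPhoneNumber(self.0))
                         }
                     }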
       
                  alfiedotwtf wrote 13 hours 34 min ago:
                  Yep! I have seen so much pushed into a type system that in
                  the end there was hardly any code needed to do validation or
                  scaffolding… to the point where it felt magical
       
                treyd wrote 1 day ago:
                You can express a lot of concepts just through types in
                languages with richer type systems.
       
                  shiandow wrote 11 hours 42 min ago:
                  Even without a rich type system you can express a lot of
                  things just through naming.
                  
                  You just can't enforce those assumptions.
       
                    eyelidlessness wrote 10 min ago:
                    You can enforce them (statically) by other means if
                    you’re determined enough, eg by using lint rules which
                    enforce type-like semantics which the type system itself
                    doesn’t express.
       
                layer8 wrote 1 day ago:
                True, but there are still documented interface contracts you
                can program against. The compiler won’t catch violations of
                the non-type parts, but the requirements are still well-defined
                with a proper interface contract. It is a trade-off, but so is
                repeating the same case distinction in multiple parts of the
                program, or having to pass around the context needed to make
                the case distinction.
       
        Kuyawa wrote 1 day ago:
        Push everything down for better code readability
        
          printInvoice(invoice, options) // is much better than
        
          if(printerReady){
            if(printerHasInk){
              if(printerHasPaper){
                 if(invoiceFormatIsPortrait){
          :
        
        The same can be said of loops
        
          printInvoices(invoices) // much better than
        
          for(invoice of invoices){
            printInvoice(invoice)
          }
        
         In the end, while code readability is extremely important,
         encapsulation is much more important, so mix both accordingly.
       
          coolcase wrote 11 hours 34 min ago:
           Everyone has different opinions, as this might be the printer
           driver on the PC or the printer's internal circuitry. The printer
           itself absolutely shouldn't try to spin the wheels when there is
           no paper. I'd stick that check in the function!
       
          theonething wrote 1 day ago:
          > printInvoice(invoice, options) // is much better than
          > ...
          
          in Elixirland, we'd name that function maybe_print_invoice which I
          like much better.
       
          inetknght wrote 1 day ago:
          > Push everything down for better code readability
          
          > demonstrates arrow anti-pattern
          
          Ewwww gross. No. Do this instead:
          
          if(!printerReady){
            return;
          }
          if(!printerHasInk){
            return;
          }
          if(!printerHasPaper){
            return;
          }
          if(!invoiceFormatIsPortrait){
            return;
          }
          
          Way more readable than an expanding arrow.
          
          > printInvoices(invoices) // much better than
          
          But yes, put the loop into its own function with all of the other
          assumptions already taken care of? This is good.
       
          lblume wrote 1 day ago:
          > printInvoice(invoice, options)
          
          The function printInvoice should print an invoice. What happens if an
          invoice cannot be printed due to one of the named conditionals being
          false? You might throw an exception, or return a sentinel or error
           type. What to do in that case is not immediately clear.
          
          Especially in languages where exceptions are somewhat frowned upon
          for general purpose code flow, and monadic errors are not common (say
          Java or C++), it might be a better option to structure the code
          similar to the second style. (Except for the portrait format of
          course, which should be handled by the invoice printer unless it
          represents some error.)
          
          > while code readability is extremely important, encapsulation is
          much more important
          
          Encapsulation seems to primarily be a tool for long-term code
          readability, the ability to refactor and change code locally, and to
          reason about global behavior by only concerning oneself with local
          objects. To compare the two metrics and consider one more important
          appears to me as a form of category error.
       
        imcritic wrote 1 day ago:
        This article doesn't explain the benefits of the suggested approach
        well enough.
        
         And the last example looks like poor advice and contradicts the
         previous advice: there's rarely a global condition that is enough
         to check once at the top; the condition usually lives inside the
         walrus. And why do for walrus in pack { walrus.throbnicate() }
         instead of making throbnicate a function accepting the whole pack?
       
        Mystery-Machine wrote 1 day ago:
        Terrible advice. It's the exact opposite of "Tell, don't ask".
        
         The performance cost of an if-statement or a for-loop is
         negligible. That's not the bottleneck of your app. If you're
         building something that needs to be highly performant, sure. But
         that's not the majority.
        
   URI  [1]: https://martinfowler.com/bliki/TellDontAsk.html
       
          hello_computer wrote 1 day ago:
          He’s giving object oriented design the medal for cache locality’s
          victory.
       
          eapriv wrote 1 day ago:
          Performance of any given CPU instruction is negligible, yet somehow
          they accumulate to noticeable values.
       
            drob518 wrote 1 day ago:
            Amen.
       
          achernik wrote 1 day ago:
          consider his another post in somewhat similar spirit: [1] the author
          is indeed working in a performance-oriented niche
          
   URI    [1]: https://tigerbeetle.com/blog/2024-12-19-enum-of-arrays/
       
        dcre wrote 1 day ago:
        I took a version of this away from Sandi Metz’s 99 Bottles of OOP.
        It’s not really my style overall, but the point about moving logic
        forks up the call stack was very well taken when I was working on a
        codebase where we had added a ton of flags that got passed down through
        many layers.
        
   URI  [1]: https://sandimetz.com/99bottles
       
          daxfohl wrote 1 day ago:
          Yeah, I immediately thought of "The Wrong Abstraction" by the same
          author. Putting the branch inside the for loop is an abstraction,
          saying "the for loop is the rule, and the branch is the behavior".
          But very often, some new requirement will break that abstraction, so
          you have to work around it, and the resulting code has an abstraction
          that only applies in some cases and doesn't in others, or you force a
          bunch of extra parameters into the abstraction so that it applies
          everywhere but is hard to follow. Whereas if you hadn't made the
          abstraction in the first place, the resulting code can be easier to
          modify and understand.
          
   URI    [1]: https://sandimetz.com/blog/2016/1/20/the-wrong-abstraction
       
            CodesInChaos wrote 1 day ago:
            I can recommend this article about the "midlayer mistake"
            
   URI      [1]: https://lwn.net/Articles/336262/
       
              drob518 wrote 1 day ago:
              Nice reference.
       
        jmull wrote 1 day ago:
        I really don't think there is any general rule of thumb here.
        
        You've really got to have certain contexts before thinking you ought to
        be pushing ifs up.
        
        I mean generally, you should consider pushing an if up. But you should
        also consider pushing it down, and leaving it where it is. That is,
        you're thinking about whether you have a good structure for your code
        as you write it... aka programming.
        
        I suppose you might say, push common/general/high-level things up, and
        push implementation details and low-level details down. It seems almost
        too obvious to say, but I guess it doesn't hurt to back up a little
        once in a while and think more broadly about your general approach. I
        guess the author is feeling that ifs are usually about a higher-level
        concern and loops about a lower-level concern? Maybe that's true? I
        just don't think it matters, though, because why wouldn't you think
        about any given if in terms of whether it specifically ought to move up
        or down?
       
          hetman wrote 1 day ago:
          I agree with this sentiment. I find attempts to create these kinds of
          universal rules are often a result of the programmer doing a specific
          and consistently repeating type of data transformation/processing. In
          their context it often makes a lot of sense... but try and apply the
          rules to a different context and you might end up with a mess. It can
          also often result in a reactionary type of coding where we eliminate
          a bad coding pattern by taking such an extremely opposite position
          that the code becomes just as unreadable for totally different
          reasons.
          
          This is not to say we shouldn't be having conversations about good
          practices, but it's really important to also understand and talk
          about the context that makes them good. Those who have read The
          Innovator's Solution would be familiar with a parallel concept. The
          author introduces the topic by suggesting that humanity achieved
          powered flight not by blindly replicating the wing of the bird (and
          we know how many such attempts failed because it tried to apply a
          good idea to the wrong context) but by understanding the underlying
          principle and how it manifests within a given context.
          
          The recommendations in the article smell a bit of premature
          optimisation if applied universally, though I can think of context in
          which they can be excellent advice. In other contexts it can add a
          lot of redundancy and be error prone when refactoring, all for little
          gain.
          
          Fundamentally, clear programming is often about abstracting code into
          "human brain sized" pieces. What I mean by that is that it's worth
          understanding how the brain is optimised, how it sees the world. For
          example, human short term memory can hold about 7±2 objects at once
          so write code that takes advantage of that, maintaining a balance
           without going to extremes. Holy wars, for example, about whether
           OO or functional style is always better often miss the point that
           everything can have its place depending on the constraints.
       
          Tade0 wrote 1 day ago:
          I also don't think there's a general rule.
          
           I use `if`s as markers for special/edge cases and typically
           return as the last statement in the `if` block.
          
          If I have an `else` block and it's large, then it's a clear indicator
          that it's actually two methods dressed as one.
       
        daxfohl wrote 1 day ago:
        I like this a lot. At first, putting ifs inside the fors makes things
        more concise. But it seems like there's always an edge case or
        requirement change that eventually requires an if outside the for too.
        Now you've got ifs on both sides of the for, and you've got to look in
        multiple places to see what's happening. Or worse, subsequent changes
        will require updating both places.
        
        So yeah, I agree, pulling conditions up can often be better for
        long-term maintenance, even if initially it seems like it creates
        redundancy.
       
          hinkley wrote 1 day ago:
          There’s a lot of variable hoisting involved in loving conditional
          logic out of for loops and it generally tends to improve legibility.
          If a variable is loop invariant it makes debugging easier if you can
          prove it is and hoist it.
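           
           A trivial sketch of the hoist (my example):
           
               // tax_rate is provably loop invariant once hoisted; inside
               // the closure it would have to be re-derived (and
               // re-checked during debugging) on every iteration.
               fn total(prices: &[f64], eu: bool) -> f64 {
                   let tax_rate = if eu { 1.2 } else { 1.0 }; // hoisted
                   prices.iter().map(|p| p * tax_rate).sum()
               }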
       
            variadix wrote 1 day ago:
            This is a great way of putting it.
            
            The more things you can prove are invariant, the easier it is to
            reason about a piece of code, and doing the hoisting in the code
            itself rather than expecting the compiler to do it will make future
            human analysis easier when it needs to be updated or debugged.
       
        stevage wrote 1 day ago:
        The author's main concern seems to be optimising performance critical
        code.
       
          drob518 wrote 1 day ago:
          Hmmm. Seems like he’s optimizing clarity of thought, first. The
          performance gain just comes along for the ride. If I were to
          summarize the article, I’d say that it’s advocating a pattern
          where you write code where higher layers decide what needs to be done
          and then lower layers do it, using a combination of straight line
          code and simple loops, with little to no further conditionality.
          Obviously, that represents an ideal.
       
          greesil wrote 1 day ago:
          That's not clear to me. It first reads like "don't branch in a for
          loop (because parallelization?)" but I think it's more for keeping
          the code from becoming a mess over time with multiple developers.
       
          hinkley wrote 1 day ago:
          If your compiler is having trouble juggling a number of variables
          just think how much difficulty the human brain will have on the same
          code.
       
        password4321 wrote 1 day ago:
        Code complexity scanners⁰ eventually force pushing ifs down. The
        article recommends the opposite:
        
         > By pushing ifs up, you often end up centralizing control flow in
         a single function, which has a complex branching logic, but all the
         actual work is delegated to straight line subroutines.
        
        ⁰
        
   URI  [1]: https://docs.sonarsource.com/sonarqube-server/latest/user-guid...
       
          chrisweekly wrote 19 hours 2 min ago:
          tangent: how did you get that superscript 0 to render in an HN
          comment?
       
            password4321 wrote 18 hours 22 min ago:
            HN allows many Unicode characters, including U+2070 superscript
            zero which I copy+pasted after a web search. I'd actually be
            interested to learn the full allowlist.
            
   URI      [1]: https://en.wikipedia.org/wiki/Unicode_subscripts_and_super...
       
          ummonk wrote 1 day ago:
          I've always hated code complexity scanners ever since I noticed them
          complaining about perfectly readable large functions. It's a lot more
          readable when you have the logic in one place, and you should only be
          trying to break it up when the details cause you to lose track of the
          big picture.
       
          marcosdumay wrote 1 day ago:
          There was a thread yesterday about LLMs where somebody asked "what
          other unreliable tool people accept for coding?"
          
          Well, now I have an answer...
       
          jt2190 wrote 1 day ago:
          Code scanners reports should be treated with suspicion, not accepted
          as gospel. Sonar in particular will report “code smells” which
           aren’t actually bugs. Addressing these “not a bug” issues
           increases the risk of introducing a new error from zero to
           greater than zero, and can waste developer time that would be
           better spent on actual production issues.
       
            password4321 wrote 1 day ago:
            The tools are usually required for compliance of some sort.
            
            Fiddling with the default rules is a baby & bathwater opportunity
            similar to code formatters, best to advocate for a change to the
            shipping defaults but "ain't nobody got time for that"™.
       
            SoftTalker wrote 1 day ago:
            a/k/a if it works don't fuck with it.
       
            xp84 wrote 1 day ago:
            I agree with you. Cyclomatic complexity check may be my least
            favorite of these rules. I think any senior developer almost always
            “knows better” than the tool does what is a function of
            perfectly fine complexity vs too much. But I have to grudgingly
            grant that they have some use since if the devs in question
            routinely churn out 100-line functions that do 1,000 things, the
            CCC will basically coincidentally trigger and force a refactor
            which may help to fix that problem.
       
              mnahkies wrote 1 day ago:
              I wonder if there's any value in these kind of rules for
              detecting AI slop / "vibe coding" and preempting the need for
              reviewers to call it out.
       
              jerf wrote 1 day ago:
              Cyclomatic complexity may be a helpful warning to detect really
              big functions, but the people who worry about cyclomatic
                 complexity also seem to be the sort of people who want to
                 set the limit really low and get feisty if a function has
                 much more than a for loop with a single if clause in it.
                 These settings produce those code bases where no function
                 anywhere actually does anything; each one just dispatches to
                 three other functions that hardly do anything either, making
                 it very hard to figure out what is going on, and that is not
                 a good design.
       
                BurningFrog wrote 1 day ago:
                   I call this "poltergeist code". Dozens of tiny functions
                   that together clearly do something complex correctly, but
                   it's very hard to find where and how it's actually done.
       
                  zellyn wrote 1 day ago:
                  I love that name, and will definitely steal it!
       
                  Brian_K_White wrote 1 day ago:
                  One incomplete but easy to state counter "rule" to fling back
                  in such cases is just: If the function isn't generic and
                  re-used for other unrelated things, then it probably
                   shouldn't be a separate function.
                  
                   Yeah, only probably: there can certainly be large,
                   distinct sub-tasks that aren't used by any other function
                   yet, where it would still improve understanding to
                   encapsulate them and replace them with a single function
                   call. You decide which by asking which way makes the
                   overall ultimate intent clearer.
                  
                  Which way is a closer match to the email where the boss or
                  customer described the business logic they wanted? Did they
                  say "make it look at the 3rd word to see if it has trailing
                  spaces..."?
                  
                   Or, to find out which side of the fuzzy line a given
                   situation falls on, just make them answer a question
                   like: what is the purpose of this function? Is it to do
                   the thing it's
                  named after? Or is it to do some meaningless microscopic
                  string manipulation or single math op? Why in the world do
                  you want to give a name and identity to a single if() or
                  memcpy() etc?
       
          hinkley wrote 1 day ago:
          The way to solve this is to split decisions from execution and
          that’s a notion I got from our old pal Bertrand Meyer.
          
               if (weShouldDoThis()) {
                   doThis();
               }
          
          It complements or is part of functional core imperative shell. All
          those checks being separate makes them easy to test, and if you care
          about complexity you can break out a function per clause in the
          check.
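           
           A sketch with one function per clause (hypothetical predicates):
           
               fn printer_ready() -> bool { true }  // each clause is pure
               fn has_ink() -> bool { true }        // and unit-testable
               
               fn we_should_do_this() -> bool {
                   printer_ready() && has_ink()
               }
               
               fn do_this() { /* side effects live here, and only here */ }
               
               fn main() {
                   if we_should_do_this() {
                       do_this();
                   }
               }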
       
            btown wrote 1 day ago:
            To add to this, a pattern that's really helpful here is:
            findThingWeShouldDoThisTo can both satisfy a condition and greatly
            simplify doThis if you can pass it the thing in question. It's
            read-only, testable, and readable. Highly recommend.
       
              efitz wrote 23 hours 4 min ago:
              This is not obvious to me.  The whole point was to separate the
              conditionals from the actions.
              
              In your example, it’s not clear if/how much “should we do
              this” logic is in your function.  If none, then great; you’ve
              implemented a find or lookup function and I agree those can be
              helpful.
              
              If there’s some logic, eg you have to iterate through a set or
              query a database to find all the things that meet the criteria
              for “should do this”, then that’s different than what the
              original commenter was saying.
              
              maybe:
               doThis(findAllMatchingThings(
                   determineCriteriaForThingsToDoThisTo()))
              
              would be a good separation of concerns
       
                hinkley wrote 20 hours 25 min ago:
                let list = findHumansToKill();
                
                //list = [];
                
                killHumans(list);
       
            0cf8612b2e1e wrote 1 day ago:
            Functions should decide or act, not both.
       
              const_cast wrote 14 hours 54 min ago:
              This just moves decisions from inside of functions to the call
              site. At which point, there's more that can go wrong, since
               you're calling the function from many more places than its
               single definition.
       
              swat535 wrote 1 day ago:
              But if that’s all you have, then how does your system do
                 anything? You ultimately need to be able to decide and then
                 act based on that decision somewhere...
       
                turtleyacht wrote 1 day ago:
                One possibility is a file.py that is called by your framework.
                The interface could be something like
                
                  def doth_match(*args):
                    return True  # the predicate
                
                  def doeth_thou(*args):
                    # processing things
                    return {}  # status object for example
                
                 The framework loops and checks the first function; if it
                 returns true, it executes the second function, and then
                 breaks or continues on to the other rule files (or objects).
                
                There could be multiple files rule1.py, rule2.py, etc that
                check and do different things.
       
                  gameman144 wrote 1 day ago:
                  I think the parent's argument is that wherever in your
                  framework you're calling `doth_match` and then `doeth_thou`,
                  you have a single function that's both deciding and acting.
                  There has to be a function in your program that's responsible
                  for doing both.
       
                    hinkley wrote 12 hours 23 min ago:
                    A function that asks a question with one function call and
                    calls another based on the answer isn’t doing any work or
                    asking any questions. It’s just glue code. And as long as
                    it stays just glue code, that’s barely any of your code.
                    
                    Asking for absolutes is something journeymen developers
                    need to grow out of.
                    
                    The principle of the excluded middle applies to Boolean
                    logic and bits of set theory and belongs basically nowhere
                    else in software development. But it’s a one trick pony
                    that many like to ride into the ground.
       
              d_burfoot wrote 1 day ago:
              This is an excellent rule.
       
          daxfohl wrote 1 day ago:
          IME this is frequently a local optimum though. "Local" meaning, until
          some requirement changes or edge case is discovered, where some
          branching needs to happen outside of the loop. Then, if you've got
          branching both inside and outside the loop, it gets harder to reason
          about.
          
          It can be case-dependent. Are you reasonably sure that the condition
           will only ever affect stuff inside the loop? Then sure, go ahead and
          put it there. If it's not hard to imagine requirements that would
          also branch outside of the loop, then it may be better to
          preemptively design it that way. The code may be more verbose, but
          frequently easier to follow, and hopefully less likely to turn into
          spaghetti later on.
          
          (This is why I quit writing Haskell; it tends to make you feel like
          you want to write the most concise, "locally optimum" logic. But that
          doesn't express the intent behind the logic so much as the logic
          itself, and can lead to horrible unrolling of stuff when minor
          requirements change. At least, that was my experience.)
       
        esafak wrote 1 day ago:
        I agree, except for this example, where the author effectively (after a
        substitution) prefers the former:
        
             fn f() -> E {
                 if condition {
                     E::Foo(x)
                 } else {
                     E::Bar(y)
                 }
             }
             
             fn g(e: E) {
                 match e {
                     E::Foo(x) => foo(x),
                     E::Bar(y) => bar(y),
                 }
             }
        
        The latter is not only more readable, but it is safer, because a match
        statement can ensure all possibilities are covered.
       
          josephg wrote 1 day ago:
          > The latter is not only more readable, but it is safer, because a
          match statement can ensure all possibilities are covered.
          
          Whether or not this matters depends on what, exactly, is in those
          match arms. Sometimes there's some symmetry to the arms of an if
          statement. And in that case, being exhaustive is important. But
          there's plenty of times where I really just have a bit of bookkeeping
          to do, or an early return or something. And I only want to do it in
          certain cases. Eg if condition { break; } else { stuff(); }
          
           Also, if-else is exhaustive already. It's still exhaustive even
           if you add more "else if" clauses, like if {} else if {} else {}.
          
          Match makes sense when the arms of the conditional are more
          symmetrical. Or when you're dealing with an enum. Or when you want to
          avoid repeating conditions. (Eg match a.cmp(b) { Greater / Equal /
          Less } ).
          
          The best way to structure your code in general really comes down to
          what you're trying to do. Sometimes if statements are cleaner.
          Sometimes match expressions. It just depends on the situation.
       
            esafak wrote 1 day ago:
            It's only exhaustive in this toy case. Add another one and the
            burden of checking for exhaustiveness with ifs falls on your
            shoulders.
       
              josephg wrote 23 hours 33 min ago:
              So long as you have an else block on your if statement, it’s
              exhaustive. I think I can keep track of that.
       
                esafak wrote 21 hours 52 min ago:
                Just because your code flowed into the else block, it does not
                mean the condition got handled properly. If different switching
                values don't need special treatment, why have an if statement
                at all? Consider serving an ML model, and switching on the
                provider. Let's say you initially support OpenAI, and
                self-hosting, as the if and else cases, respectively. If you
                then add support for Anthropic, it will incorrectly follow the
                else path and be treated as self-hosted. Or you make else the
                error path, and fail when you should not have.
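                 
                 In Rust terms, a sketch of why the match is safer
                 (hypothetical providers and URLs):
                 
                     enum Provider { OpenAi, SelfHosted, Anthropic }
                     
                     // Adding Anthropic makes every match refuse to
                     // compile until the new arm is written; an if/else
                     // would have silently sent it down the else path.
                     fn base_url(p: Provider) -> &'static str {
                         match p {
                             Provider::OpenAi => "https://api.openai.com",
                             Provider::SelfHosted => "http://localhost:8080",
                             Provider::Anthropic => "https://api.anthropic.com",
                         }
                     }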
       
          CraigJPerry wrote 1 day ago:
          > Prefers the former after a substitution...
          
          That's not quite right, it's a substitution AND ablation of 2
          functions and an enum from the code base.
          
          There's quite a reduction in complexity he's advocating for.
          
           Further, the enum and the additional boilerplate are not adding type
          safety in this example. Presumably the parameters to foo and bar are
          enforced in all cases so the only difference between the two examples
          is the additional boilerplate of a 2-armed enum.
          
           I strongly suspect in this case (though I haven't godbolted it to
           be sure) that both examples compile to the same machine code. If
           my hunch is correct, then the remaining question is: does
           introducing double-entry bookkeeping on the if condition add
           safety for future changes?
          
          Maybe. But at what cost? This is one of those scenarios where you
          bank the easy win of reduced complexity.
       
       
   DIR <- back to front page