gopher://codevoid.de/1/hn/comments

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   We found a bug in Go's ARM64 compiler
       
       
        wy1981 wrote 2 hours 52 min ago:
        Great find and writeup.
        
        As an aside, this is the type of problem that I think model checkers
        can't help with. You can write perfect and complicated
        TLA+/Lean/FizzBee models, and even if these models can somehow
        generate code for you, you can still run into bugs like these due to
        platform/compiler/language issues. But, thankfully, such bugs are
        rare.
       
          jraph wrote 2 hours 45 min ago:
          Yep. Model checking is for checking that your design is sound,
          basically, not at all the implementation.
          
          For the implementation, you can use certified compilers like CompCert
          [1], but:
          
          - you still have to show your code is correct
          
          - there are still parts of CompCert that are not certified
          
   URI    [1]: https://compcert.org/
       
        bradley13 wrote 4 hours 22 min ago:
        I find it interesting how rare it has become to find a compiler bug.
        For me, at least, it used to be a regular event.
        
        Even for Java, as widespread as it is, I have made half a dozen
        reports. None in the last several years, though.
        
        Better testing? The sheer scale of software being produced?
       
          lou1306 wrote 4 hours 18 min ago:
          Linus's law [1]? When it comes to compilers for mainstream languages,
          the userbases are so large that they will explore a surprisingly
          large portion of the compiler's state space.
          
          But definitely, better engineering and QA practices must also help
          here.
          
   URI    [1]: https://en.wikipedia.org/wiki/Linus%27s_law
       
        anthk wrote 4 hours 42 min ago:
        I miss the Delve debugger for OpenBSD 386 BTW.
       
        me2too wrote 6 hours 59 min ago:
        Great write-up
       
        Bengalilol wrote 8 hours 22 min ago:
        I always appreciate articles like this, where you can clearly see the
        engineer's way of thinking.
        
        I was just puzzled by the middle part of the article, where they start
        investigating their code but seem to overlook the fact that it only
        happens on ARM64.
        
        Still, I understand that it's professional to proceed step by step
        logically.
        
        Great article, it was a pleasure reading it!
       
          mixedbit wrote 8 hours 12 min ago:
          Hard-to-reproduce bugs often depend on an order of events or timing.
          A different architecture can trigger a different order of execution,
          but this doesn't mean the bug is not in the application.
       
        Vipsy wrote 9 hours 42 min ago:
        One thing that often gets missed is how hard it is to even suspect the
        compiler as the root cause. Most engineers waste hours chasing bugs in
        their own code because we're trained to trust our tools. This mindset
        alone can make these rare compiler bugs much trickier to find.
       
          Tor3 wrote 8 hours 19 min ago:
          In the past it was more common to suspect the compiler, as others
          mention here.
          On a minicomputer I worked with in the late eighties, early nineties,
          I occasionally found errors in the compiler output. This was a Pascal
          compiler and because of that it didn't take too long to figure out
          that the code was actually correct and something else must be going
          on. Then firing up the debugger/tracer and scrutinizing and analyzing
          what happens in the disassembly.. when the problem was found, send a
          fax (yes!) to the head designer of the compiler, get a fixed test
          compiler back on a set of floppies.. went through this several times.
          I still have a printout somewhere with my pen marks pointing out a
          bug in the generated code.
       
          pjmlp wrote 9 hours 0 min ago:
          In the early PC days we suspected them a lot, given that manually
          writing assembly was still much better in many cases.
          
          I found a bug in Turbo Pascal 6, where if you declare a variable
          with the same name as the function name, then the result was random
          garbage.
          
          For those that don't know Pascal, the function name has to be
          assigned for the result value, so if a local variable with the same
          name is possible, then you cannot set the return value.
          
          Something like this [1]:
          
              (* In Turbo Pascal 6 this would compile *)
              function Square(num: Integer): Integer;
              var
                Square: Integer;
              begin
                (* Here the local variable gets used instead *)
                Square := num * num;
              end;
          
   URI    [1]: https://godbolt.org/z/s6srhTW66
       
          SuperQue wrote 9 hours 9 min ago:
          Yup, I had an issue filed against an open source project I work on.
          Was a crazy weird crash.
          
          The reporter actually spent the effort to track it down, turns out it
          _was_ a Go compiler bug. ( [1] )
          
   URI    [1]: https://github.com/golang/go/issues/20427
       
          kmarc wrote 9 hours 22 min ago:
          There are certain professions where the compilation process is
          (ab)used to optimize to a point where these bugs seemingly surface
          more often.
          
          In the HFT sphere I haven't talked to a company that hasn't reported
          (bragged about finding) a super weird gcc/clang bug.
          
          Well, also, at my last job we used a snapshot version of the
          compiler, because... every nanosecond matters.
       
            hshdhdhehd wrote 7 hours 45 min ago:
            In HFT, might you keep the bug fix secret so other HFTs can't
            benefit from it?
       
              kmarc wrote 7 hours 17 min ago:
              I saw both. One of the top firms wanted that; at another I
              worked at, we did report (of course with a scratched minimal
              reproducible example).
              
              The thing is, it's quite unlikely that your competitor hits the
              exact same bug. The cost of us having to keep upstream patched
              and tested isn't justified.
              
              Also in HFT world there are some very similar patterns across
              competing companies, yet, we just saw TernFS coming out from XTX,
              with not much fear of competitors benefiting from it more than
              they do.
       
        neuroelectron wrote 15 hours 26 min ago:
        I've seen only one race condition in my career and it always surprises
        me how it is even found.
       
        alberth wrote 16 hours 19 min ago:
        I thought Cloudflare was 100% Rust, and x86 (EPYC) these days.
        
        Interesting to hear Go & ARM in use.
       
          surajrmal wrote 12 hours 54 min ago:
          I doubt any company is mono-language at that scale. Using ARM usually
          makes sense for a lot of horizontal scaling workloads so it's also
          not that surprising.
       
          steveklabnik wrote 14 hours 32 min ago:
          Cloudflare has long kept Arm builds of everything even when they
          deployed to x86 only, to make it easy to switch when it made sense.
          
          And yeah, a lot of Rust but also a lot of Go.
       
        MarkSweep wrote 16 hours 33 min ago:
        I wonder if Go had a mode where you make it single step every
        instruction and trigger a GC interrupt on every opcode. That would make
        it easier to find these kinds of bugs.
       
        pfdietz wrote 17 hours 14 min ago:
        I see something like this and I wonder "what testing methodology would
        have found this?"  It has to be general, not something that would
        involve knowing what the bug was ahead of time.
       
          syncsynchalt wrote 13 hours 12 min ago:
          When your scale is large enough, you move to "what monitoring
          methodology will find this?"
          
          When you're doing enough transactions you start to see a noise floor
          of e.g. bit flips from cosmic rays, and looking for issues involves
          correlating/categorizing possible software failures and
          distinguishing them from the misbehavior of hardware.
       
        quotemstr wrote 17 hours 56 min ago:
        This problem strikes me more as a debuginfo generation bug than a
        "compiler" bug.
        
        > After this change, stacks larger than 1<<12 will build the offset in
        a temporary register and then add that to rsp in a single, indivisible
        opcode. A goroutine can be preempted before or after the stack pointer
        modification, but never during. This means that the stack pointer is
        always valid and there is no race condition.
        
        Seems silly to pessimize the runtime, even slightly, to account for the
        partial register construction. DWARF bytecode ought to be powerful
        enough to express the calculations needed for restoring the true stack
        pointer if we're between immediate adjustments.
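        
        A minimal Go sketch of why the adjustment gets split in the first
        place, assuming the AArch64 ADD (immediate) encoding: a 12-bit
        immediate, optionally shifted left by 12. The helper name splitImm,
        the frame size, and the starting SP are made up for illustration;
        this is not the Go assembler's code.

```go
package main

import "fmt"

// splitImm is a hypothetical helper: it breaks a stack adjustment into the
// two pieces that a pair of AArch64 ADD (immediate) instructions can carry.
// lo fits in 12 bits; hi is a 12-bit value shifted left by 12.
func splitImm(off uint32) (lo, hi uint32, ok bool) {
	if off >= 1<<24 {
		return 0, 0, false // would need more than two 12-bit chunks
	}
	return off & 0xfff, off &^ 0xfff, true
}

func main() {
	const frame = 0x1010 // assumed frame size, just over 1<<12
	sp := uint32(0x40000000)

	lo, hi, _ := splitImm(frame)
	afterFirst := sp + lo          // what an interrupt between the ADDs sees
	afterSecond := afterFirst + hi // state after both ADDs

	fmt.Printf("lo=%#x hi=%#x\n", lo, hi)
	fmt.Printf("mid-update sp=%#x (neither old nor new)\n", afterFirst)
	fmt.Printf("final sp=%#x, expected %#x\n", afterSecond, sp+frame)
}
```

        The mid-update value is the one a preempted goroutine would expose
        to anything that walks its stack.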
       
          sauercrowd wrote 17 hours 50 min ago:
          > This problem strikes me more as a debuginfo generation bug than a
          "compiler" bug.
          
          But isn't that the same thing here? The bug occurred in their
          production workflows, not in some specific debug builds, so it
          seems pretty reasonable to call it a compiler bug?
       
            quotemstr wrote 17 hours 37 min ago:
            Thanks. I think of unwinder information as debuginfo even though,
            as you point out, it's used outside of debugging contexts all the
            time. :-)
            
            As for the actual bug:
            
            Unless you're unwinding the stack by walking the linked list of
            frames threaded through the frame pointer, then each time you
            unwind a level of the stack, you need to consult a table keyed on
            instruction pointer to look up how to compute the register contents
            of the previous frame based on register content of the current
            frame. One of the registers you can compute this way is the
            previous frame's stack pointer.
            
            I haven't looked in depth at what the Go runtime is doing exactly,
            but at a glance, I don't see mention of frame pointers in the
            linked article, so I'm guessing Go uses the SP-and-unwind-table
            approach? If so, the real bug here is that the table didn't have
            separate entries for the two ADDs and so gave incorrect
            reconstruction instructions for one of them.
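            
            A rough Go sketch of the table-driven idea described above, with
            separate entries for the two ADDs. The unwindEntry type, the
            lookup function, and every address and offset are invented for
            illustration; this does not reflect the Go runtime's actual
            tables.

```go
package main

import (
	"fmt"
	"sort"
)

// unwindEntry says: from startPC onward (until the next entry), the caller's
// stack pointer is the current SP plus spDelta. All values are invented.
type unwindEntry struct {
	startPC uint64
	spDelta uint64
}

// lookup returns the entry covering pc. The table must be sorted by startPC.
func lookup(table []unwindEntry, pc uint64) (unwindEntry, bool) {
	// Find the first entry that starts after pc, then step back one.
	i := sort.Search(len(table), func(i int) bool { return table[i].startPC > pc })
	if i == 0 {
		return unwindEntry{}, false
	}
	return table[i-1], true
}

func main() {
	// Hypothetical epilogue of a function with a 0x1010-byte frame:
	//   0x100: ADD $0x010,  SP, SP
	//   0x104: ADD $0x1000, SP, SP
	//   0x108: RET
	table := []unwindEntry{
		{startPC: 0x0f0, spDelta: 0x1010}, // body: full frame still allocated
		{startPC: 0x104, spDelta: 0x1000}, // after the first ADD
		{startPC: 0x108, spDelta: 0},      // after the second ADD
	}
	for _, pc := range []uint64{0x100, 0x104, 0x108} {
		e, _ := lookup(table, pc)
		fmt.Printf("pc=%#x -> caller SP = SP + %#x\n", pc, e.spDelta)
	}
}
```

            With only one entry for the whole epilogue, the lookup at 0x104
            would return the wrong delta, which is the failure mode guessed
            at above.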
            
            If, however, frame pointers are a load-bearing part of the Go
            runtime, and that runtime failed to update the frame pointer (not
            just the stack pointer) in the contractually mandatory manner,
            well, that's a codegen bug and needs a codegen fix.
            
            I guess I just don't like, as a matter of philosophy if not
            practical engineering, having frame pointers at all. Without the
            frame pointer, the program already contains all the information you
            need to unwind, at no runtime cost --- you pay for table lookups
            only when you unwind, not all the time, on straight-line code.
            
            The purist in me doesn't like burning a register for debugging, but
            you have to use the right tool for the job I guess.
       
        yalok wrote 19 hours 57 min ago:
        Classic problem of non-atomic stack pointer modification.
        
        Used to have a lot of fun with those 3 decades ago.
       
        lordnacho wrote 20 hours 31 min ago:
        > This was a very fun problem to debug.
        
        I'm sure it was a relief to find a thorough solution that addressed the
        root cause. But it doesn't seem plausible that it was fun while it was
        unexplained. When I have this kind of bug it eats my whole attention.
        
        Something this deep is especially frustrating. Nobody suspects the
        standard library or the compiler. Devs have been taught from a young
        age that it's always you, not the tools you were given, and that's
        generally true.
        
        One time, I actually did find a standard library bug. I ended up taking
        apart absolutely everything on my side, because of course the last
        hypothesis you test is that the pieces you have from the SDK are
        broken. So a huge amount of time is spent chasing the wrong lead when
        it actually is a fundamental problem.
        
        On top of this, the thing is a race condition, so you can't even
        reliably reproduce it. You think it's gone like they did initially, and
        then it's back. Like cancer.
       
          saagarjha wrote 13 hours 14 min ago:
          The people who find the fun are often good at identifying when it is
          the standard library or the compiler.
       
          anyfoo wrote 17 hours 57 min ago:
          > I'm sure it was a relief to find a thorough solution that addressed
          the root cause. But it doesn't seem plausible that it was fun while
          it was unexplained. When I have this kind of bug it eats my whole
          attention.
          
          Yeah, and that's fun for me. Some of my most fun bugs to debug have
          been compiler, or even CPU issues.
       
          wat10000 wrote 18 hours 16 min ago:
          I find this sort of thing to be tremendously fun. It can be
          frustrating as well, but overall it's my favorite part of my job. I
          don't see why this would be implausible. Different people enjoy
          different things.
       
          btbuilder wrote 18 hours 33 min ago:
          Segfaults with no use of "Unsafe" equivalents in managed
          languages can give immediate indication it's not a code problem.
       
            afdbcreid wrote 12 hours 41 min ago:
            They explicitly mention there was usage of unsafe, and they weren't
            sure that's not the cause.
       
          commandersaki wrote 18 hours 41 min ago:
          Probably just meant satisfying instead of fun. I found a bug in
          sscanf for the gcc arm toolchain that ships with Ubuntu (and Debian),
          and it wasn't fun since I had deadlines to deal with. Workaround was
          to use the official ARM one. But after 2 days, it was satisfying to
          nail the exact problem and write a regression test.
       
          rectang wrote 18 hours 56 min ago:
          Although I'm good enough at it, like you I hate this kind of
          debugging experience, and try hard to avoid putting myself in a
          position where I have to do it.  It's not fun for me at all.
          
          I also don't like many puzzle games, like Sudoku, because to me
          they feel like this kind of work.  Many colleagues of mine have
          expressed bafflement that I don't find such puzzles fun and give me
          all kinds of grief about how I ought to enjoy them, since they do.
          
          It's the same thing here, just flipped around: this person seems to
          enjoy the debugging experience; just let them be.  Or recruit them,
          because that temperament is valuable.
       
          alfalfasprout wrote 18 hours 57 min ago:
          > Devs have been taught from a young age that it's always you, not
          the tools you were given, and that's generally true.
          
          That's not been my experience at all FWIW. Tools get things wrong all
          the time.
          
          It's simply that more mature projects with heavy use, e.g. gcc or
          clang/llvm, generally tend to have had major bugs stamped out by
          this point. They do still happen though.
          
          More nascent language and compiler ecosystems are more likely to run
          into issues. Especially languages with runtimes.
       
          secondcoming wrote 19 hours 26 min ago:
          It becomes fun when you narrow down to the solution. Before that it's
          hell.
          
          I don't think I'd be allowed to spend weeks debugging something like
          this. Credit to Cloudflare's PMs.
       
            maples37 wrote 17 hours 50 min ago:
            Apparently they have an "unexplained crashes must have an
            explanation determined" policy ever since there was a trend of
            uninvestigated unexplained crashes that were canaries in the mine
            for a security issue. [1]
            
            > But [the Cloudbleed sensitive information disclosure security
            incident] wasn't the only consequence of the bug. Sometimes it
            could lead to an invalid memory read, causing the NGINX process to
            crash, and we had metrics showing these crashes in the weeks
            leading up to the discovery of Cloudbleed. So one of the measures
            we took to prevent such a problem happening again was to require
            that every crash be investigated in detail.
            
            Since then, they have a "no crashes go uninvestigated" policy,
            which for the scale Cloudflare operates at, seems pretty
            impressive.
            
   URI      [1]: https://blog.cloudflare.com/however-improbable-the-story-o...
       
              jgrahamc wrote 3 hours 47 min ago:
              Yes, and we set up all the tooling for that and I would look at
              the output every single day and keep an eye on what was
              happening. Any team that didn't fix a crash quickly got a
              personal message from me. That responsibility has been taken over
              by others now.
       
          LoganDark wrote 19 hours 33 min ago:
          > I'm sure it was a relief to find a thorough solution that addressed
          the root cause. But it doesn't seem plausible that it was fun while
          it was unexplained. When I have this kind of bug it eats my whole
          attention.
          
          Hey; it could've been type-3 fun.
       
          dylan604 wrote 20 hours 4 min ago:
          Some people are perverse individuals and actually enjoy debugging
          very esoteric things. What might be frustrating to you might be the
          very thing that gets someone else very excited.
       
          akerl_ wrote 20 hours 12 min ago:
          It feels like this comment was almost a purely additive anecdote of
          your own experience with a similar kind of issue, but you've spoiled
          it by deciding to tell the author that they're incorrect about how
          they felt during the process?
          
          Maybe different people find different things fun.
       
            lordnacho wrote 19 hours 39 min ago:
            Not saying he's wrong, sometimes the word "fun" connotes something
            slightly different from what it literally means. "Satisfying" is
            something I'd use for the end state. Maybe "challenging" for the
            intermediate state. But while you're in a high-pressure situation
            that you don't understand, that is rarely "fun" in the literal
            sense.
            
            You wouldn't pay to be given compiler race condition bugs, right?
       
              klausa wrote 13 hours 53 min ago:
              I wouldn't pay to be given any kind of work, but there are some
              aspects of my job that I find more or less 'fun'.
              
              Hunting bugs that people have given up on or have no ideas on how
              to tackle is near the top of that list.
       
              Agingcoder wrote 17 hours 40 min ago:
              I like these bugs. They're intricate, technical puzzles, that
              can take weeks to figure out. You need a proper strategy to
              figure them out, cannot rely on simple tactics, and when you
              finally understand what's going on, it's immensely
              satisfying.
              
              This, and now there's Pernosco which makes everything much
              easier.
              
              Now, under pressure, this is going to be a nightmare unless you
              have a high tolerance to stress.
       
              a10c wrote 19 hours 1 min ago:
              > Not saying he's wrong
              
              [1] - I'm Thea "Teddy" Heinen (she/her or they/them)!
              
   URI        [1]: https://heinen.dev/
       
              akerl_ wrote 19 hours 36 min ago:
              Maybe stop digging here and just let it be fun for the author?
       
        brcmthrowaway wrote 21 hours 18 min ago:
        I don't get it, how were the machine threads being stopped in the
        middle of two instructions? This is bare metal, right?
       
          purplesyringa wrote 19 hours 24 min ago:
          Signals.
       
            ahoka wrote 8 hours 12 min ago:
            That's why the old advice was not to use signals and threads
            together, if you can avoid it.
       
          adgjlsfhk1 wrote 21 hours 14 min ago:
          Go uses interrupts for GC notifications.
       
        mperham wrote 21 hours 29 min ago:
        Did they ever explain why netlink was involved? Or was that a red
        herring?
       
          syncsynchalt wrote 13 hours 16 min ago:
          The netlink function uses a larger stack than most.
          
          Their repro case required a stack adjustment larger than 1<<12
          (4kiB).
       
          drob518 wrote 21 hours 3 min ago:
          Seemed like a red herring. They were able to reproduce it without any
          libraries. Might have just been netlink forcing the stacks to a
          certain size and that made the bug visible.
       
          Sesse__ wrote 21 hours 5 min ago:
          The stack in that specific function was big enough to trigger the
          bug.
       
        wat10000 wrote 22 hours 15 min ago:
        I would have thought that unwinding would use the frame pointer and
        this wouldn't be a problem.
       
          mperham wrote 21 hours 30 min ago:
          The frame pointer was updated non-atomically in two asm ops. An async
          interruption between the two ops would lead to a corrupt frame
          pointer.
       
            wat10000 wrote 21 hours 9 min ago:
            So it was. The article never mentions the frame pointer and I'm
            familiar with compilers that load the saved value from the stack in
            the epilog, rather than adjusting it arithmetically. But they do
            have an assembly listing showing the two-step arithmetic adjustment
            for both the stack pointer and frame pointer.
            
            But I'm not sure that matters, because the unwind code they show
            uses the stack pointer rather than the frame pointer anyway.
       
        renewiltord wrote 23 hours 6 min ago:
        Great technical blog. Good narrative pathway, tight examples, and a
        description so clear it makes me feel smarter than I am: it was easy
        to follow even though the last time I read assembly seriously was
        x86, years ago.
        
        Also, fulfills the marketing objective because I cannot help but think
        that this team is a bunch of hotshots who have the skill to do this on
        demand and the quality discipline to chase down rare issues.
        
        I assume these are Ampere Altra? I was considering some of those for
        web servers to fill out my rack (more space than power) but ended up
        just going higher on power and using Epyc.
       
        gok wrote 23 hours 13 min ago:
        The real lesson here should be that doing crazy shit like swizzling the
        program counter in a signal handler and writing your own assembler is
        not a good idea.
       
          platinumrad wrote 15 hours 33 min ago:
          Those are both completely normal things to do when you're
          implementing a programming language. For example, the Hotspot JVM
          uses SIGSEGV to stop the world for garbage collection.
       
          achierius wrote 20 hours 43 min ago:
          Sorry, how exactly do you think compilers are supposed to work if not
          by 'writing [their] own assembler'? Someone has to write the
          assembler, and different compilers have different needs.
       
          themafia wrote 21 hours 26 min ago:
          Neither of those are "crazy shit."  It's just complex because the
          environment offers specific features like automatic GC with async
          preemption in a compiled language which pretty much requires it.
          
          Complex engineering isn't something to be avoided by default.
       
            Diggsey wrote 1 hour 51 min ago:
            Agree, but I think there is a point to be made here: Go as a
            language has more subtle runtime invariants that must be upheld
            compared to other languages, and this has led to a relatively large
            number of really nasty bugs (eg. there have also been several bugs
            relating to native function calling due to stack space issues and
            calling convention differences). By "nasty" I mean ones that are
            really hard to track down if you don't have the resources that a
            company like CF does.
            
            To me this points to a lack of verification, testing, and most
            importantly awareness of the invariants that are relied on. If the
            GC relies on the stack pointer being valid at all times, then the
            IR needs a way to guarantee that modifications to it are not split
            into multiple instructions during lowering. It means that there
            should be explicit testing of each kind of stack layout, and tests
            that look at the real generated code and step through it
            instruction by instruction to verify that these invariants are
            never broken...
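            
            A toy Go sketch of that last idea: step through an epilogue one
            instruction at a time and check the stack pointer at every point
            where a preemption could land. The addSP type, checkEpilogue, and
            all constants are hypothetical; a real test would have to inspect
            actual generated code rather than this model.

```go
package main

import "fmt"

// addSP models "ADD $imm, SP, SP" in a toy machine; nothing here is real
// compiler output.
type addSP struct{ imm uint64 }

// checkEpilogue executes the instructions one at a time and, at every
// boundary where an asynchronous preemption could land, checks that SP is
// either the callee's SP (frame intact) or the caller's SP (frame gone).
// It returns the index of the first instruction after which SP is invalid,
// or -1 if the invariant holds throughout.
func checkEpilogue(code []addSP, sp, frame uint64) int {
	calleeSP, callerSP := sp, sp+frame
	for i, ins := range code {
		sp += ins.imm
		if sp != calleeSP && sp != callerSP {
			return i
		}
	}
	return -1
}

func main() {
	const frame = 0x1010
	split := []addSP{{0x010}, {0x1000}} // two-step adjustment
	single := []addSP{{frame}}          // one indivisible adjustment

	fmt.Println("split:", checkEpilogue(split, 0x40000000, frame))   // split: 0
	fmt.Println("single:", checkEpilogue(single, 0x40000000, frame)) // single: -1
}
```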
       
          wat10000 wrote 22 hours 16 min ago:
          The general wisdom is that you shouldn't do this stuff yourself, and
          you should instead rely on tried and tested implementations. But
          sometimes you're the one who provides the tried and tested
          implementations. Implementing a compiled language is often one of
          those times.
       
          blinkingled wrote 22 hours 45 min ago:
          This^. Keith W on the DTrace blog said it a decade ago [1]. I like
          Go, but I don't really like their NIH / replace everything with our
          stuff stance, esp. on system tools like assemblers and linkers.
          
   URI    [1]: https://wesolows.dtrace.org/2014/12/29/golang-is-trash/
       
        riobard wrote 23 hours 20 min ago:
        What ARM64 machines are you using and what are they used for? Last year
        you were announcing Gen 12 servers on AMD EPYC ( [1] ), but IIRC there
        weren't any mentions of ARM64. But now it seems you're running
        ARM64 in full production.
        
   URI  [1]: https://blog.cloudflare.com/gen-12-servers/
       
          EE84M3i wrote 16 hours 32 min ago:
          I seem to recall Cloudflare hosts some of their non-edge
          compute on public clouds? Like control plane stuff. Could be that.
       
          zamadatix wrote 16 hours 40 min ago:
          I'm not Cloudflare, I just read their blog too much. As they hint in
          the article when mentioning secure boot, they've been deploying
          Ampere in parallel to AMD for several years now. Purpose wise it
          seems to be Edge related for efficiency reasons, but maybe they use
          them for other things too. You can read some more here [1] and here
          [2] along with the original evaluation of Qualcomm here [3].
          
   URI    [1]: https://blog.cloudflare.com/designing-edge-servers-with-arm-...
   URI    [2]: https://blog.cloudflare.com/arms-race-ampere-altra-takes-on-...
   URI    [3]: https://blog.cloudflare.com/arm-takes-wing/
       
            riobard wrote 13 hours 27 min ago:
            Yeah but those are pretty dated. I was under the impression those
            old Ampere servers are not efficient compared to modern EPYC
            anymore. So I'm wondering what their current generation of arm64
            servers look like :p
       
        pengaru wrote 23 hours 29 min ago:
        For the impatient, here's the fix:
        
   URI  [1]: https://github.com/golang/go/commit/f7cc61e7d7f77521e073137c60...
       
          chavi2 wrote 13 hours 55 min ago:
          One thing I worry about, probably unnecessarily, is anything with a
          sense of urgency.
          
          HEY GUYS WE JUST FOUND A GOLANG COMPILER BUG AND FATAL PANICS!
          
          Everyone is like "Hmm. I need to fix this now."
          
          So, 99% probability it's what it is. 1% it's some secret
          defensive thing because there was a bad stupid zero day someone would
          get fired over or that could leave the world in shambles if
          uncovered, or maybe something else needed to be swept under the rug,
          or maybe someone wants to distract while they introduce a new
          vulnerability.
          
          I don't think this with CVEs, but when someone's like "install
          this patch everybody!" the dim red light flickers on.
       
          cmckn wrote 22 hours 49 min ago:
          I noticed this when reviewing the linked issue [1]. Does the Go team
          have a natural language bot or is this just
          comment.contains("backport") type stuff?
          
   URI    [1]: https://github.com/golang/go/issues/73259#issuecomment-31004...
       
            kbolino wrote 22 hours 44 min ago:
            The latter: [1] (found via [2] )
            
   URI      [1]: https://github.com/golang/build/blob/master/cmd/gopherbot/...
   URI      [2]: https://go.dev/wiki/gopherbot
       
              etra0 wrote 19 hours 33 min ago:
              Kinda funny that it requires both "please" and "backport" for it
              to be considered haha.
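              
              For illustration only, keyword matching of that kind could look
              like the Go sketch below. The function name wantsBackport and
              the sample comments are hypothetical; this is not gopherbot's
              actual code, which may apply further conditions.

```go
package main

import (
	"fmt"
	"strings"
)

// wantsBackport is a hypothetical stand-in for the keyword matching described
// above: it requires both "please" and "backport", case-insensitively.
func wantsBackport(comment string) bool {
	c := strings.ToLower(comment)
	return strings.Contains(c, "please") && strings.Contains(c, "backport")
}

func main() {
	fmt.Println(wantsBackport("@gopherbot please backport to the release branches")) // true
	fmt.Println(wantsBackport("should we backport this?"))                           // false
}
```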
       
              9rx wrote 22 hours 4 min ago:
              Although also the former (gabyhelp):
              
   URI        [1]: https://github.com/golang/oscar/tree/master/internal/gab...
       
        dreamcompiler wrote 23 hours 47 min ago:
        Always adjust your stack pointer atomically, kids.
       
          drob518 wrote 21 hours 6 min ago:
          Exactly what ran through my mind.
       
          whizzter wrote 23 hours 17 min ago:
          I guess those that wrote the preemption were on X86 where this
          doesn't happen thanks to variable length instructions being able to
          hold the constant and thus relied on the code-gen to do it
          atomically, then the ARM port had an automatic "split" from a higher
          level to make things "easy" thus giving us this bug.
          
          Nobody's fault really, but bad results ensued.
       
            yvdriess wrote 4 hours 4 min ago:
            Hands up, the dozens of us pedants that have used a relaxed atomic
            add in situations like these.  Updating the SP in the most paranoid
            way possible is the reason that sort of thing exists.
            
            (You cannot express relaxed atomics in golang, but you could
            technically add support in the compiler for use in the runtime
            code)
       
            Sesse__ wrote 21 hours 3 min ago:
            > Nobody's fault really, but bad results ensued.
            
            Uh, the fault is entirely in writing an assembler _that is not an
            assembler_, but rather something that is _almost_ like one but then
            1% like an IR instead. It's an unforced error.
       
              whizzter wrote 7 hours 46 min ago:
              It doesn't even need to be an error in the "assembler"; it could
              be another part that converts from some internal high-level IR.
              Also, in most cases split ops don't matter for instructions that
              only manipulate registers (which you might want generated as
              compactly as possible), since regular atomics apply to memory
              addresses, not registers.
              
              Even then, if the code-gen was written BEFORE the preemption, it
              was fairly sloppy of those implementing the preemption not to
              consider the function epilogue. Granted, statically adjusting the
              stack/frame pointer by more than 4 KB is probably a bit of an
              edge case.
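
              To make the 4 KB threshold concrete, here is a rough Go sketch.
              It assumes the A64 ADD (immediate) form, which takes a 12-bit
              value optionally shifted left by 12; addInstrs is a made-up
              helper, not anything from the Go toolchain:

```go
package main

import "fmt"

// addInstrs is a hypothetical helper: it returns how many A64
// "ADD (immediate)" instructions are needed to add imm to a register,
// assuming only the 12-bit immediate form with an optional LSL #12.
// It returns 0 if imm does not fit in 24 bits, in which case the value
// would have to come from a register instead.
func addInstrs(imm uint64) int {
	const mask12 = 0xfff
	switch {
	case imm>>24 != 0:
		return 0
	case imm&^mask12 == 0, imm&mask12 == 0:
		return 1 // fits in the low 12 bits, or in the shifted 12 bits alone
	default:
		return 2 // needs both halves: this is the split case
	}
}

func main() {
	for _, imm := range []uint64{16, 4095, 4096, 4097, 0x10010} {
		fmt.Printf("imm=%#x needs %d ADD(s)\n", imm, addInstrs(imm))
	}
}
```

              Any frame size for which this returns 2 is a candidate for the
              two-step adjustment discussed in this thread.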
       
              wbl wrote 19 hours 52 min ago:
              Assemblers used to do a ton of stuff back in the day
       
                anyfoo wrote 15 hours 35 min ago:
                Oh yeah. S/360 assembly almost looks like a high level language
                sometimes. In MVS, functions of the OS and standard libraries
                (or its equivalent) were implemented as elaborate macros, with
                their own invocation syntax, whereas nowadays you'd expect a
                function that you'd call (dynamically linked or not), with
                parameters passed in registers.
                
                At least in the 90s, there were actually macro assemblers that
                supported OOP programming in assembly. Borland Turbo Assembler
                5.0 comes to mind; it was kind of fun.
       
                  pjmlp wrote 8 hours 52 min ago:
                  Those are still around if you go for Assemblers with
                  background in PC culture like NASM, YASM, MASM (still part of
                  MSVC).
                  
                  By the way, Embarcadero still has Turbo Assembler. [1] Now a
                  thing of the past, but Assemblers for game consoles were also
                  quite powerful in their macro capabilities.
                  
                  I never liked the UNIX Assembly culture, because naturally as
                  soon as C became a thing, they became the bare minimum
                  required to assemble the generated Assembly out of the C
                  compiler, as another step into the compilation pipeline.
                  
                  All the niceties of macro assemblers came through the other
                  platforms, like being able to use NASM instead of the
                  platform assembler, not even GNU AS nor clang are that great
                  in their abilities as Assemblers beyond the basic stuff.
                  
   URI            [1]: https://docwiki.embarcadero.com/RADStudio/Athens/en/...
       
        Agingcoder wrote 1 day ago:
        Excellent article as always from the cloudflare blog - engineering
        without magic infrastructure and ml. One day I will apply !
        
        Compiler bugs are actually quite common ( I used to find several a year
        in gcc ), but as the author says, some of them only appear when you
        work at a very large scale, and most people never dive that far.
       
          jgrahamc wrote 23 hours 44 min ago:
          What's stopping you applying today?
       
            Agingcoder wrote 19 hours 39 min ago:
            Fair question. Location primarily ( nothing in France ), and I'm
            not sure how "we're looking for people who enjoy doing that
            kind of thing" ( I very much do ) relates to the actual job
            offers, ie what job offer should I actually apply to.
            
            My background is not networking ( it's math then hpc then broader
            stuff ) but I keep stumbling on similar problems ( including a
            beautiful one related to intel NICs a few years ago which led me
            into a rabbit hole of ebpf and kernel network layer and which
            surfaced later on the cloudflare blog), and the only tech company
            with which this seems to be a regular occurrence is cloudflare.
            Their space is a bit unknown to me so I guess I'm having a hard
            time projecting something onto the job offers.
            
            I'd happily chat to someone working for cloudflare though - I
            guess this would help me understand what it is that actually
            happens over there. I guess I'm a bit intimidated by this unknown
            yet really good looking world :-)
       
              jgrahamc wrote 2 hours 39 min ago:
              You can email me jgc@ Cloudflare and I'll forward your details to
              the right people.
       
              sauercrowd wrote 17 hours 44 min ago:
              I've interned at Cloudflare back in 2020 and had a great time-
              would highly recommend!
              
              Can't speak to the locations but the stuff you're
              interested/experienced in seems extremely likely to overlap with
              what they do. They do a lot of very deep technical things in all
              kinds of areas.
              
              my recommendation if you want to talk to someone about it: search
              github/twitter/linkedin for ppl who work there on stuff you like,
              and just send them a message and ask for a 20 minute call!
              
              have done it plenty of times, has always been extremely positive
       
            kccqzy wrote 20 hours 10 min ago:
            Low compensation relative to many other companies. (It didn't stop
            me from applying, but it stopped me from accepting.)
       
            nevon wrote 23 hours 35 min ago:
            Similar to the previous commenter, every time I read a blog post
            from Cloudflare I end up checking the careers page thinking "this
            is exactly the kind of work I'd like to be doing". Sadly no
            openings in my country. I'll keep checking!
       
              moomoo11 wrote 23 hours 12 min ago:
              Pretty sure location is not a factor for these companies. You
              should apply anyway. I've worked with people living in active
              war zones.
              
              If you have the skills, they have the coin.
              
              They won't hire some react guy in X country but someone who can
              find compiler bugs and save them XX+ million dollars a year? Heck
              yeah.
       
                stronglikedan wrote 21 hours 31 min ago:
                With seemingly the whole world rolling out new RTO mandates,
                location may not have been a factor until recently, but it may
                be one now.
       
                Degorath wrote 21 hours 50 min ago:
                Unfortunately, in 95% of cases location IS a factor with bigger
                companies.
                
                I'm in a similar position where I'd like to do something a lot
                more interesting, but the places where the interesting
                companies have offices and the places where I'd be willing to
                live do not overlap enough to justify uprooting my life.
                
                (Unless we're talking about "too good to ignore", that's a
                different story.)
       
                  moomoo11 wrote 19 hours 40 min ago:
                  I was explicitly talking about too good to ignore.
                  
                  Anyone who can optimize a company's bottom line will be
                  hired.
                  
                  Like I said, no random average mid react guy or dime a dozen
                  Java developer is getting hired as a remote employee in some
                  flyover country.
                  
                  But if someone can provide like 50x value then hell yeah..
                  
                  I thought that was obvious in my message considering we are
                  discussing compiler optimization
       
                    ptsneves wrote 19 hours 7 min ago:
                    How do you rate yourself as higher than dime a dozen? I
                    work as a full remote dev but I am not sure I am anything
                    special, I mean how do you know that you are objectively
                    good.
       
                      moomoo11 wrote 18 hours 15 min ago:
                      Where did I say anything about myself? Sounds like
                      projection or some deep insecurities if you meant it
                      _that_ way.
                      
                      If you're asking what would constitute someone being
                      special, it would depend on the role and skillset. As I
                      said in my earlier comment, someone who is a beast and
                      can find and fix bugs in compilers is a rare person.
                      Especially if that skillset can help the company save
                      boatloads of money that can be deployed elsewhere.
                      
                      There are probably only a handful of people in the world
                      who understand and can push the AI landscape forward. A
                      lot of them are Chinese immigrants, and yet
                      OpenAI/Meta/etc are paying them boatloads of money.
                      
                      As for remote roles, I once worked on a project where we
                      hired some dude for like $500/hr as a contractor because
                      he was one of the few people who knew the inside/out of
                      postgres and oracle rdbms because we were doing some very
                      important migration.
       
                    Degorath wrote 19 hours 11 min ago:
                    (Yeah, I'd say your messaging was reasonably clear, but in
                    the context of the whole thread it wasn't obvious whether
                    the poster was putting themselves in that skill bucket.)
                    
                    I think there's also quite a big spectrum of skill, even
                    when we're talking about compiler optimization and highly
                    skilled software developers. I'd put myself up there, but
                    still I'm no Lars Bak (for whom Google allegedly created an
                    office in Denmark).
       
        Neywiny wrote 1 day ago:
        That's an incredible find and once I saw the assembly I was right along
        with them on the debug path. Interestingly it doesn't need to be
        assembly for this to work, it's just that that's where the split was.
        The IR could've done it, it just doesn't for very good reasons. So
        another win for being able to read arm assembly.
        
        Unsure if this would be another way to do it but to save an instruction
        at the cost of a memory access you could push then pop the stack size
        maybe? Since presumably you're doing that pair of moves on function
        entry and exit. I'm not really sure what the garbage collector is
        looking for so maybe that doesn't work, but I'd be interested to hear
        some takes on it
       
          pklausler wrote 22 hours 19 min ago:
          I'm a little surprised that this bug wasn't fixed in the assembler as
          a special case for immediate adds to RSP.  If the patch was to the
          compiler only, other instances of the bug could be lurking out there
          in aarch64 assembly code.
       
            Someone wrote 7 hours 48 min ago:
            Is that possible? I think you would have [1] to use a register to
            build up the immediate value. The assembler cannot/should not
            default to one, so I think the best one could do is having another
            macro for ADD that takes that helper register as an argument. That
            wouldn't fix other instances in the AArch64 assembly code.
            
            [1] I'm not familiar with AMD64, but maybe, you could use a
            thread local (edit: wouldn't work with M:N threads. You'd need
            a coroutine-local. That would tie the assembler to golang, and thus
            would, even on that alone, be a very bad idea) or reserve space in
            the stack frame for it, too, but I don't see those as realistic
            options
       
            moefh wrote 17 hours 47 min ago:
            Would that be wise? The implemented solution uses a temporary
            register to hold the full value being added to rsp.
            
            I don't know enough about how people use the go assembler, but I
            imagine it would be very surprising if `add $imm, rsp, rsp`
            clobbered an unrelated register when `$imm` is large enough.
            Especially since what's clobbered is the designated "temporary
            register", which I imagine is used all the time in handwritten go
            assembly.
       
              pklausler wrote 16 hours 25 min ago:
              Some architectures, and I believe aarch64 is one, have scratch
              registers reserved for being clobbered in special situations
              required by the assembler.
       
                saagarjha wrote 13 hours 21 min ago:
                No, I think that's just a MIPS thing.
       
                anyfoo wrote 15 hours 40 min ago:
                Not really, or at least not that I know of in the case of
                arm64. What you have is calling conventions that specify what
                one function/procedure/whatever can expect both from the caller
                and the callee's side.
                I.e. some registers are callee-saved and some are caller-saved;
                the called function can treat the caller-saved ones as
                "scratch".
                
                Additionally, they call out interactions with the OS/execution
                environment. For example, x18 is the "platform register", and
                it's unspecified what the OS does with it. It's entirely
                possible that it clobbers it on context switch or during an
                interrupt or whatever. So don't use that one unless you have a
                contract with the OS itself.
                
                But locally, i.e. "from instruction to instruction", no such
                convention exists to my knowledge, and you probably don't want
                to have registers that pseudo-instructions might trash
                inadvertently in general, because it means you can't optimally
                use these registers.
                
                It's possible for pseudo-instructions or generally macros to be
                documented as, e.g., "this macro uses x3 as a temporary
                register and trashes it", but in my experience most macros that
                need additional temporary registers actually ask you to specify
                them as part of the macro invocation.
                
                E.g. suppose you have a macro "weirdhash" that takes two
                registers and saves some kind of hash of them in a third
                register, but that also needs an extra register to perform its
                work. You would call it with:
                
                    weirdhash x9, x10, x11, x0
                
                Where x0 would be the scratch register you don't care about.
       
                  adastra22 wrote 7 hours 48 min ago:
                  There are some architectures that do, but they're all old
                  RISC chips.
       
          Veserv wrote 22 hours 30 min ago:
          You would normally use the "LDR Rd, =expr" pseudo-instruction
          form [1]. For immediates not directly constructible, it puts a copy
          of the immediate value in a PC-relative memory location, then does a
          PC-relative load into register.
          
          So that would turn the whole sequence of "add constant to SP"
          into 2 executable instructions, 1 for constructing immediate and 1
          for adding for a total of 8 bytes, and a 4 byte data area for the
          17-bit immediate for a total of 12 bytes of binary which is 3
          executable instructions worth.
          
   URI    [1]: https://developer.arm.com/documentation/dui0801/l/A64-Data-T...
       
            comex wrote 16 hours 25 min ago:
            I've usually seen compilers handle large constants with MOV/MOVK
            sequences (encoding 16 bits of data per 32-bit instruction) instead
            of loading them from memory.  Loading from memory was more common
            on 32-bit ARM.
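
            A rough Go sketch of how such a MOVZ/MOVK sequence could be
            derived, 16 bits per instruction. movSequence is a hypothetical
            helper; real assemblers also consider MOVN and bitmask
            immediates, which this deliberately ignores:

```go
package main

import "fmt"

// movSequence is a hypothetical helper that lists a MOVZ/MOVK sequence
// materializing v in register reg, one 16-bit chunk per instruction.
// It ignores MOVN and bitmask-immediate encodings.
func movSequence(reg string, v uint64) []string {
	var out []string
	for shift := uint(0); shift < 64; shift += 16 {
		chunk := (v >> shift) & 0xffff
		if chunk == 0 {
			continue // MOVZ already zeroes every other chunk
		}
		op := "movk" // MOVK keeps the other bits of the register
		if len(out) == 0 {
			op = "movz" // first instruction zeroes the rest of the register
		}
		out = append(out, fmt.Sprintf("%s %s, #0x%x, lsl #%d", op, reg, chunk, shift))
	}
	if len(out) == 0 {
		// v == 0 still needs one instruction to clear the register.
		out = append(out, fmt.Sprintf("movz %s, #0x0, lsl #0", reg))
	}
	return out
}

func main() {
	for _, ins := range movSequence("x9", 0x12345) {
		fmt.Println(ins)
	}
}
```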
       
          pjmlp wrote 23 hours 45 min ago:
          Usually in runtimes like Java and .NET there are safepoints exactly
          to avoid changing context in the middle of a set of instructions.
       
            andygocke wrote 23 hours 15 min ago:
            Yeah but we have codegen bugs in .NET as well. The biggest
            difference that stood out to me in this write up, is we would have
            gone straight for "coredump" instead of other investigation
            tools. Our default mode of investigating memory corruption issues
            is dumps.
       
              pjmlp wrote 22 hours 3 min ago:
              Sure, I have experienced them, e.g. once in 2006 using IBM's JVM
              implementation with Websphere.
              
              However it is probably not as problematic due to the way Go
              allows for Assembly being used directly.
              
              While the JVM and CLR don't allow for direct access to Assembly
              code, Go does, thus I assume expecting safepoints everywhere is
              not an option, as any subroutine call can land on code that was
              manually written.
       
                yvdriess wrote 4 hours 25 min ago:
                Go users can only insert assembly wrapped in a function call.
                That might be safety related, I am not entirely sure.
                
                (Well technically there is a way to inject assembly without the
                function call overhead. That's what [1] is doing. But you will
                need to modify the runtime and compiler toolchain for it.)
                
   URI          [1]: https://pkg.go.dev/runtime/internal/atomic
       
          bloak wrote 1 day ago:
          > So another win for being able to read arm assembly.
          
          Yes, though that weird stuff with dollars in it is not normal AArch64
          assembly!
          
          The article could have mentioned the "stack moves once" rule.
       
            freep1zza wrote 6 hours 58 min ago:
            > Yes, though that weird stuff with dollars in it is not normal
            AArch64 assembly!
            
            See the AT&T vs Intel syntax since you aren't familiar with
            assembly:
            
   URI      [1]: https://en.wikipedia.org/wiki/X86_assembly_language#Syntax
       
              dpassens wrote 5 hours 50 min ago:
              That's an x86 thing, though.
       
            Neywiny wrote 23 hours 16 min ago:
            I've never heard of that rule (though tbh I'm not allocating > 64KB
            of stack when I'm in assembly) and it seems Google hasn't either.
            While I'm sure it makes sense, I don't think I've ever seen that be
            enforced. At least in C/C++. Maybe it makes more sense for these
            stack inspecting garbage collectors but I've also heard of ones
            that just scan the stack without unwinding anything. I did a test
            asking Google's AI to generate a complicated C function, put it in
            godbolt, and there's plenty of push push push push ..... Pop Pop
            Pop Pop going on
       
              mananaysiempre wrote 11 hours 20 min ago:
              > While I'm sure [bumping the stack pointer atomically] makes
              sense, I don't think I've ever seen that be enforced. At least in
              C/C++.
              
              That's because the C ABI supports unwinding with a fairly
              expressive set of tools for describing stack-pointer state on a
              per-instruction level. Even the simpler Microsoft ABI essentially
              uses bytecode for that[1]; and on the more complicated Itanium
              ABI, you get DWARF CFI instructions, which make the correct way
              to preserve a(n x86) register in the function prologue look like
              
                push rbx
                .cfi_adjust_cfa_offset 8
                .cfi_rel_offset rbx, 8
              
              which are impossible to miss when reading compiler-generated
              assembly because of the sheer amount of annoying noise they
              create.
              
              The Go authors decided to sidestep all of this complexity, which
              is understandable to a degree, but apparently they did not think
              through all the ramifications of doing so.
              
   URI        [1]: https://learn.microsoft.com/en-us/cpp/build/exception-ha...
       
                dwattttt wrote 8 hours 22 min ago:
                MS's ARM64 unwinding ABI looks even more complicated:
                
   URI          [1]: https://learn.microsoft.com/en-us/cpp/build/arm64-exce...
       
                  mananaysiempre wrote 51 min ago:
                  Ehh I wouldn't say so (thanks for the correct link for
                  ARM64 though in any case). What you need to be comparing to
                  here is DWARF[1,2] section 6.4, and while it's not as bad
                  as other parts of DWARF, I still think it's plenty
                  complicated. [1] [2] Slightly modified by psABI[3] section
                  3.7 for x86-64 or the LSB[4] section 11.6 for ARM64, but at
                  this point that's a drop in the bucket as far as overall
                  complexity is concerned. [3] [2]
                  
   URI            [1]: https://dwarfstd.org/doc/DWARF5.pdf#page=171
   URI            [2]: https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/arti...
   URI            [3]: https://refspecs.linuxfoundation.org/LSB_4.0.0/LSB-C...
       
                    dwattttt wrote 3 min ago:
                    I was actually looking to point out MS's x64 ABI requires a
                    standardised function epilog since this bug occurred during
                    an epilog, only to find ARM64's epilogues are also
                    described by bytecode (at least at a cursory glance).
       
              JdeBP wrote 21 hours 38 min ago:
              You need to look at non-x86 architectures.  It was common years
              ago on MIPS.
              
              * [1] I wrote up the x86 equivalent of doing just two
              read-modify-write operations on the stack pointer over 16 years
              ago.
              
              *
              
   URI        [1]: https://jdebp.uk/FGA/function-perilogues.html#StandardMI...
   URI        [2]: https://jdebp.uk/FGA/function-perilogues.html#Standardx8...
       
              rcxdude wrote 22 hours 39 min ago:
              Did you compile with optimisations? I think GCC will do a bunch
              of activity on the stack with -O0, but it'll generally coalesce
              everything into one push/pop per function with optimisations (not
              because of any rule, but just because it's faster). alloca and
              other dynamic stack allocation may break this, but normal
              variables should in pretty much all cases just get turned into one
              block on the stack (with appropriate re-use of space if variable
              lifetimes don't overlap)
       
                ori_b wrote 16 hours 17 min ago:
                It will generate code to touch each page of the stack, because
                otherwise a very large stack allocation controlled by users
                (eg, in the case of a variable sized array) can be turned into
                a pointer to any location in memory by an attacker. Faulting in
                each page of the stack turns that into a crash.
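
                As a simplified model of that probing (not GCC's actual
                algorithm), the Go sketch below lists which offsets below
                the old stack pointer would be touched for one allocation.
                probeOffsets and its parameters are made up for illustration,
                and pageSize is assumed to be non-zero:

```go
package main

import "fmt"

// probeOffsets is a hypothetical helper: for a stack allocation of size
// bytes it returns the offsets below the old stack pointer that a
// probing prologue would touch, at most one page apart, so that a guard
// page cannot be jumped over. pageSize must be greater than zero.
func probeOffsets(size, pageSize uint64) []uint64 {
	var offs []uint64
	for off := pageSize; off < size; off += pageSize {
		offs = append(offs, off)
	}
	if size > 0 {
		offs = append(offs, size) // touch the final, possibly partial, page
	}
	return offs
}

func main() {
	fmt.Println(probeOffsets(10000, 4096))
}
```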
                
                There was a userspace thread library I came across a long time
                ago that used variable length arrays to switch between thread
                stacks; the scheduler would allocate an array of the right size
                to bump the stack pointer to the different thread's stack.
       
                  saagarjha wrote 13 hours 19 min ago:
                  Wow, that's horrible.
       
                Neywiny wrote 20 hours 23 min ago:
                Yes
       
            pjmlp wrote 23 hours 46 min ago:
            It is due to the Plan 9 Assembly dialect most likely, because it
            wasn't enough that we already have differences between AT&T and
            Intel. [1] Still, I find it great that Go got back the 1990's
            tradition that compiled languages have an assembler as part of
            their tooling, regardless of the syntax.
            
   URI      [1]: https://go.dev/doc/asm
       
          titzer wrote 1 day ago:
          I think the right fix is that the compiler should, e.g. load the
          constant into a register using two moves and then emit a single add.
          It's one more instruction, but then the adjustment is atomic (i.e. a
          single instruction). Another option is to do the arithmetic in a temp
          register and then move it back.
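
          A toy Go model of why the single add matters. It lists the stack
          pointer values an asynchronous preemption could observe at
          instruction boundaries. spStates and hasTornState are hypothetical
          names, and the low-bits-first order of the split is an assumption:

```go
package main

import "fmt"

// spStates is a hypothetical model of the stack pointer values visible
// at instruction boundaries while adding imm to sp.
// split=true models two ADD immediates (low 12 bits, then the rest);
// split=false models building imm in a temp register and one ADD.
func spStates(sp, imm uint64, split bool) []uint64 {
	if !split {
		return []uint64{sp, sp + imm}
	}
	lo := imm & 0xfff
	return []uint64{sp, sp + lo, sp + imm}
}

// hasTornState reports whether any visible value is neither the old
// nor the final stack pointer.
func hasTornState(states []uint64) bool {
	old, final := states[0], states[len(states)-1]
	for _, s := range states {
		if s != old && s != final {
			return true
		}
	}
	return false
}

func main() {
	const sp, imm = 0x40000000, 0x10010
	fmt.Println("split:", hasTornState(spStates(sp, imm, true)))
	fmt.Println("single:", hasTornState(spStates(sp, imm, false)))
}
```

          In this model only the split form exposes an intermediate value,
          which is the window the preemption would have to hit.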
       
        javierhonduco wrote 1 day ago:
        Really enjoyed reading this. Thanks for writing it!
       
       
   DIR <- back to front page