gopher://codevoid.de/1/hn/comments

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   GCC 15.1
       
       
        pjmlp wrote 1 day ago:
        Interesting to see some improvements being done to Modula-2 frontend as
        well.
       
        fithisux wrote 1 day ago:
         Any hope for HaikuOS + WinLibs? GDC would be greatly appreciated.
       
        codr7 wrote 1 day ago:
        Finally, musttail, can't wait to try that out.
       
        omoikane wrote 1 day ago:
        Really excited about #embed support:
        
        > C: #embed preprocessing directive support.
        
        > C++: P1967R14, #embed (PR119065)
        
        See also: [1] - Embed is in C23 (2022-07-23)
        
   URI  [1]: https://news.ycombinator.com/item?id=32201951
       
          NekkoDroid wrote 1 day ago:
          I'd really wish for an `std::embed<...>` that would be a consteval
          function (IIRC there is a proposal for this, but I don't know its
          status). The less pre-processor stuff going on the less there is to
          worry about, the syntax would end up much cleaner and you can create
          your own wrapper functions.
       
        elvircrn wrote 1 day ago:
        "C++ Modules have been greatly improved."
        
        It would be nice to know what these great improvements actually are.
       
          boris wrote 1 day ago:
          In GCC 14, C++ modules were unusable (incomplete, full of bugs, no
          std modules, etc). I haven't tried 15 yet but if that changed, then
          it definitely qualifies for a "great improvement".
       
            bluGill wrote 1 day ago:
             Still no std modules, but otherwise likely usable. Modules are
             ready for early adopters to use and to start writing the books on
             what you should do. (Not how to do it; those books are mostly
             written, though not in print. Rather what you should do, as in:
             was import std a good idea, or should containers and algorithms
             have been split, or maybe something I haven't thought of.)
       
          canucker2016 wrote 1 day ago:
          Later in the article, it mentions:
          
              Improved experimental support for C++23, including:
          
              std and std.compat modules (also supported for C++20).
          
          From [1] :
          
              The next major version of the GNU Compiler Collection (GCC),
          15.1, is expected to be released in April or May 2025.
          
              GCC 15 greatly improved the modules code. For instance, module
          std is now supported (even in C++20 mode).
          
   URI    [1]: https://developers.redhat.com/articles/2025/04/24/new-c-feat...
       
          artemonster wrote 1 day ago:
          those were the greatest improvements of all time. all of them. :D
       
        Calavar wrote 1 day ago:
        > {0} initializer in C or C++ for unions no longer guarantees clearing
        of the whole union (except for static storage duration initialization),
        it just initializes the first union member to zero. If initialization
        of the whole union including padding bits is desirable, use {} (valid
        in C23 or C++) or use -fzero-init-padding-bits=unions option to restore
        old GCC behavior.
        
        This is going to silently break so much existing code, especially union
        based type punning in C code. {0} used to guarantee full zeroing and {}
        did not, and step by step we've flipped the situation to the reverse.
        The only sensible thing, in terms of not breaking old code, would be to
        have both {0} and {} zero initialize the whole union.
        
        I'm sure this change was discussed in depth on the mailing list, but
        it's absolutely mind boggling to me
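         
         For code that cannot depend on either reading of {0}, one option is to
         clear the object representation explicitly. A minimal sketch, assuming
         a union shaped like the ones under discussion; the names value,
         value_clear and value_is_all_zero are made up for illustration:
         
         ```c
         #include <assert.h>
         #include <stddef.h>
         #include <string.h>

         /* Hypothetical union whose first member is smaller than the largest. */
         union value {
             float  f;
             double d;
         };

         /* Zero every byte of the union, independent of how {0} behaves. */
         static void value_clear(union value *v)
         {
             assert(v != NULL);
             memset(v, 0, sizeof *v);
         }

         /* Return 1 if every byte of the object representation is zero. */
         static int value_is_all_zero(const union value *v)
         {
             const unsigned char *p = (const unsigned char *)v;
             for (size_t i = 0; i < sizeof *v; i++) {
                 if (p[i] != 0)
                     return 0;
             }
             return 1;
         }
         ```
         
         Written this way, the result should not depend on which compiler
         version built the code. The release notes quoted above also list {}
         and -fzero-init-padding-bits=unions as alternatives.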
       
          not2b wrote 1 day ago:
          I'm skeptical of the claim that this change will "silently break so
          much existing code". For it to change the behavior of code, the first
          member would have to be smaller than other members, someone would
          have to use this construct to initialize union objects, and it would
          have to affect the behavior. In any case, it's standard for the
          Fedora, Ubuntu, and Debian developers to go through all the packages
          and test with new GCC versions before they come out, so that issues
          are fixed before the new compiler is released.
       
          zzo38computer wrote 1 day ago:
          I thought that {} should always initialize everything regardless of
          whether or not there is anything in between the braces, and that {0}
          should only be valid if the first member is a numeric or pointer type
          (but otherwise has the same effect as {} with nothing in between). I
           thought that would make more sense, wouldn't it?
          
          (If you write {} with multiple values when initializing a union, then
          it should be an error unless all of the values are the same and all
          of the corresponding members (the first few if you do not explicitly
          specify which ones) are of the same type as each other.)
       
            wahern wrote 1 day ago:
            C never had {} until C23. In C {0} was the only way to explicitly
            zero-initialize a structure in a generic manner. It works because
            in C initializer lists are applied to members as-if nested
            structures are flattened out lexically.
            
            However, a long time ago C++ went in a completely different
            direction with initializer lists, and gcc and clang started
            emitting warnings (in C mode) about otherwise perfectly valid C
            code, thus the adoption of C++'s {} for C23. {0} is still
            technically valid C23, though, as well as valid C89, C90, C99, and
            C11. In fact, reading both C23 and C89 I'm struck by how little the
            language has changed:
            
            C89 3.5.7p16:
            
            > If the aggregate contains members that are aggregates or unions,
            or if the first member of a union is an aggregate or union, the
            rules apply recursively to the subaggregates or contained unions.
            If the initializer of a subaggregate or contained union begins with
            a left brace, the initializers enclosed by that brace and its
            matching right brace initialize the members of the subaggregate or
            the first member of the contained union. Otherwise, only enough
            initializers from the list are taken to account for the members of
            the first subaggregate or the first member of the contained union;
            any remaining initializers are left to initialize the next member
            of the aggregate of which the current subaggregate or contained
            union is a part.
            
            C23 6.7.10p21:
            
            > If the aggregate or union contains elements or members that are
            aggregates or unions, these rules apply recursively to the
            subaggregates or contained unions. If the initializer of a
            subaggregate or contained union begins with a left brace, the
            initializers enclosed by that brace and its matching right brace
            initialize the elements or members of the subaggregate or the
            contained union. Otherwise, only enough initializers from the list
            are taken to account for the elements or members of the
            subaggregate or the first member of the contained union; any
            remaining initializers are left to initialize the next element or
            member of the aggregate of which the current subaggregate or
            contained union is a part.
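             
             A small sketch of that flattening rule, with made-up struct names.
             The single 0 is consumed by the first scalar reached by descending
             through the first members, and every member without an explicit
             initializer is initialized the same way as an object with static
             storage duration:
             
             ```c
             #include <assert.h>
             #include <stddef.h>

             struct inner {
                 int  a[2];
                 char tag;
             };

             struct outer {
                 struct inner in;
                 double       scale;
                 const char  *name;
             };

             /* The lone 0 initializes o.in.a[0]; the remaining members are
              * implicitly initialized to 0, 0.0 or a null pointer. */
             static struct outer outer_zeroed(void)
             {
                 struct outer o = {0};
                 return o;
             }

             /* Return 1 if every member compares equal to zero. */
             static int outer_is_zero(const struct outer *o)
             {
                 assert(o != NULL);
                 return o->in.a[0] == 0 && o->in.a[1] == 0 &&
                        o->in.tag == 0 && o->scale == 0.0 &&
                        o->name == NULL;
             }
             ```
             
             Note that this is about member values only; it says nothing about
             padding bytes.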
       
          akoboldfrying wrote 1 day ago:
          Initialisation in C++ is just footguns all the way down.
       
          psyclobe wrote 1 day ago:
          There is no reason to use a union unless you're doing some C stuff;
          in which case just use C.
       
          Blikkentrekker wrote 1 day ago:
          I have to say, I've read the discussion this generated and it's a bit
          scary how no one seems to know whether type punning through unions is
           undefined or not in C, or rather, my conclusion reading it all is
           more that many people are wrong and that it is defined behavior, but
          some of the people who are wrong about it are actual GCC compiler
          developers so it can't be too easy to be right.
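           
           For reference, the two idioms being argued about look like this. As
           far as I can tell, C11 6.5.2.3 carries a footnote saying that
           reading a union member other than the one last stored reinterprets
           the bytes (C++ differs here), while the memcpy form is
           uncontroversial in both languages. A sketch, assuming a 32-bit
           IEEE 754 float; the function names are invented for illustration:
           
           ```c
           #include <assert.h>
           #include <stdint.h>
           #include <string.h>

           _Static_assert(sizeof(float) == sizeof(uint32_t),
                          "sketch assumes a 32-bit float");

           /* Pun through a union: read a member other than the one written. */
           static uint32_t float_bits_union(float x)
           {
               union { float f; uint32_t u; } pun;
               pun.f = x;
               return pun.u;
           }

           /* Pun through memcpy: copy the object representation. */
           static uint32_t float_bits_memcpy(float x)
           {
               uint32_t u;
               memcpy(&u, &x, sizeof u);
               return u;
           }
           ```
           
           Optimizing compilers typically turn the memcpy into a plain
           register move, so the safer spelling rarely costs anything.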
       
            krackers wrote 11 hours 56 min ago:
            I don't understand why newer revisions of C don't work on fixing
            these small issues. Things that were previously
            "undefined/implementation-defined behavior" can easily be made to
            behave sensibly without breaking anything. Type punning, 2s
             complement overflow, 0-initialization of unions, all of those should
            "just behave" sensibly how the programmer expects. And you can
            already get there with the right compiler flags, so why not just
            codify it. It's also not going to break anything since it was
            undefined behavior in the first place.
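             
             One such flag for signed overflow is -fwrapv in GCC and Clang.
             Where changing flags is not an option, overflow can be detected
             without ever executing the undefined operation. A sketch using
             __builtin_add_overflow, which is a GCC/Clang extension rather than
             ISO C; add_overflows is a made-up wrapper name:
             
             ```c
             #include <assert.h>
             #include <limits.h>
             #include <stdbool.h>
             #include <stddef.h>

             /* Returns true if a + b does not fit in an int. The builtin
              * stores the (possibly wrapped) result in *out either way. */
             static bool add_overflows(int a, int b, int *out)
             {
                 assert(out != NULL);
                 return __builtin_add_overflow(a, b, out);
             }
             ```
             
             A caller would check the return value before trusting *out, for
             example add_overflows(INT_MAX, 1, &sum) reports an overflow.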
       
              darthwalsh wrote 6 hours 50 min ago:
              C still supports a huge variety of embedded processors, which I
              imagine influences the overflow UB. But clearing up the type
              semantics would be nice.
       
                krackers wrote 6 hours 35 min ago:
                Are there any processors today which _don't_ use 2s complement?
       
                  Gibbon1 wrote 3 hours 24 min ago:
                  I use embedded processors. I don't know of any that don't use
                  2s complement. There are only a handful of increasingly
                   irrelevant processors that are big endian. And x86 real mode
                  processors are long in the tooth.
                  
                   The other thing is the ratio of processing power vs memory
                  size is very high for embedded machines. You have processors
                  that can hold their own against a 486 but only have 16k of
                  RAM. And the marginal cost of performance is low. A lot of
                  devices spend most of their time doing utterly nothing.
       
          nikic wrote 1 day ago:
          Fun fact: GCC decided to adopt Clang's (old) behavior at the same
          time Clang decided to adopt GCC's (old) behavior.
          
          So now you have this matrix of behaviors:
          * Old GCC: Initializes whole union.
          * New GCC: Initializes first member only.
          * Old Clang: Initializes first member only.
          * New Clang: Initializes whole union.
       
            iamthejuan wrote 1 day ago:
            It is like an era of average.
       
            zeroq wrote 1 day ago:
            i will call it "webification" of C!
       
            homebrewer wrote 1 day ago:
            Since having multiple compilers is often touted as an advantage,
            how often do situations like what you're describing happen compared
             to the opposite, when a second compiler surfaces bugs in one's
            application or the other compiler?
       
            augusto-moura wrote 1 day ago:
            That's funny and sad at the same time.
            
            And it shows a deeper problem, even though they are willing to
            align behavior between each other, they failed to communicate and
            discuss what would be the best approach. That's a bit tragic, IMO
       
              Neywiny wrote 1 day ago:
              I would argue the even deeper problem is that it's implementation
              defined. Should be in the spec and they should conform to the
              spec. That's why I'm so paranoid and zeroize things myself. Too
              much hassle to remember what is or isn't zero.
       
                flohofwoe wrote 1 day ago:
                I wouldn't depend on that too much either though, or at least
                not depend on padding bytes being zeroed. The compiler is free
                to replace the memset call with code that only zeroes the
                struct members, but leaves junk in the padding bytes (and the
                same is true when copying/assigning a struct).
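                 
                 One practical consequence: comparing two structs with memcmp
                 can give false mismatches, so compare members instead. A
                 sketch with a made-up struct; record and record_equal are
                 illustrative names only:
                 
                 ```c
                 #include <assert.h>
                 #include <stdbool.h>
                 #include <stddef.h>

                 /* Most ABIs insert padding after `kind`. */
                 struct record {
                     char kind;
                     int  count;
                 };

                 /* Compare members, never raw bytes: the contents of
                  * padding are unspecified. */
                 static bool record_equal(const struct record *a,
                                          const struct record *b)
                 {
                     assert(a != NULL && b != NULL);
                     return a->kind == b->kind &&
                            a->count == b->count;
                 }
                 ```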
       
                  Gibbon1 wrote 9 hours 31 min ago:
                  Standard should be changed to require all uninitialized
                  memory be set to zero.
                  
                  Which includes padding bytes.
       
          anon-3988 wrote 1 day ago:
           lol this is exactly the kind of stuff I expect from C or C++ haha
           it's kinda insane people just decide to do this amidst all the talk
          about correctness/safety.
       
          mastax wrote 1 day ago:
          Do distros have tooling to deal with this type of change?
          
          I imagine it would be very useful to be able to search through all
          the C/C++ source files for all the packages in the distro in a
          semantic manner, so that it understands typedefs and preprocessor
          macros etc. The search query for this change would be something like
          "find all union types whose first member is not its largest member,
          then find all lines of code where that type is initialized with
          `{0}`".
       
            ris wrote 20 hours 38 min ago:
            Distributions tend to use shell-script-wrapped compilers that can
            inject additional flags desired by the distribution, and in all
            likelihood distributions will just add flags that force the old
            behaviour if there are problems.
       
            ryao wrote 1 day ago:
            As a retired Gentoo developer, I can say not really as far as I
            know. There could be static analysis tools that can find this, but
            I am not aware of anyone who runs them on the entire distribution.
       
              mastax wrote 1 day ago:
              In theory it's just an extension of IDE tooling. A CLI with a
              little query language wrapping libclang. In practice I'm sure
              it's a nightmare just to get 20,000 packages' build systems
              wrangled such that the right source files get indexed by
              libclang, and all the endless plumbing for downloading packages
              and reporting results, and on and on.
       
                ryao wrote 1 day ago:
                Distribution build systems typically operate outside of an IDE.
                I suspect that it would be a nightmare to get 20,000 packages
                to compile in an IDE.
                
                It is possible in theory to write a compiler plugin to generate
                an error when code that does this is found and it would make it
                easy to find all of the instances in all packages by building
                with `make -k`, provided that the code is not hidden behind an
                unused package flag.
       
          myrmidon wrote 1 day ago:
          I honestly feel that "uninitialized by default" is strictly a
          mistake, a relic from the days when C was basically cross-platform
          assembly language.
          
          Zero-initialized-by-default for everything would be an extremely
          beneficial tradeoff IMO.
          
          Maybe with a __noinit attribute or somesuch for the few cases where
          you don't need a variable to be initialized AND the compiler is too
          stupid to optimize the zero-initialization away on its own.
          
          This would not even break existing code, just lead to a few easily
          fixed performance regressions, but it would make it significantly
          harder to introduce undefined and difficult to spot behavior by
          accident (because very often code assumes zero-initialization and
          gets it purely by chance, and this is also most likely to happen in
          the edge cases that might not be covered by tests under memory
          sanitizer if you even have those).
       
            nullc wrote 1 day ago:
            Zero initializing often hides real and serious bugs, however.  Say
            you have a function with an internal variable LEN that ought to get
            set to some dynamic length that internal operations will run over. 
            Changes to the code introduce a path which skips the setting of
            LEN.  Current compilers will (very likely) warn you about the
            potentially uninitialized use, valgrind will warn you (assuming the
            case gets triggered), and failing all that the program will
            potentially crash when some large value ends up in LEN-- alerting
            you to the issue.
            
            Compare with default zero init:  The compiler won't warn you,
            valgrind won't warn you, and the program won't crash.  It will just
            be silently wrong in many cases (particularly for length/count
            variables).
            
            Generally the attention to exploit safety can sometimes push us in
            directions that are bad for program correctness.  There are many
            places where exploit safety is important, but also many cases where
             it's irrelevant.  For security it's generally 'safe' if a program
             erroneously shuts down or does less than it should, but that is far
             from true for software generally.
            
            I prefer this behavior:  Use of an uninitialized variable is an
            error which the compiler will warn about, however, in code where
            the compiler cannot prove that it is not used the compiler's
            behavior is implementation defined and can include trapping on use,
            initializing to zero, or initializing to ~0 (the complement of
            zero) or other likely to crash pattern.  The developer may annotate
            with _noinit which makes any use UB and avoids the cost of
            inserting a trap or ~0 initialization.    ~0 init will usually fail
            but seldom in a silent way, so hopefully at least any user reports
            will be reproducible.
            
            Similar to RESTRICT _noinit is a potential footgun, but its usage
            would presumably be quite rare and only in carefully maintained
            performance critical code.  Code using _noinit like RESTRICT is at
            least still more maintainable than assembly.
            
            This approach preserves the compiler's ability to detect programmer
            error, and lets the implementation pick the preferred way to handle
            the remaining error. In some contexts it's preferable to trap
            cleanly or crash reliably (init to ~0 or explicit trap), in others
             it's better to be silently wrong (init 0).
             
             Since C99 lets you declare variables wherever, it is often easy
             to just declare a variable where it is first set and that's
            probably best, of course.  .. when you can.
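             
             The LEN scenario can be approximated by hand today. This is only
             a sketch of the idea: the poison value is written explicitly
             rather than inserted by a compiler, and pick_len, require_len and
             LEN_POISON are hypothetical names:
             
             ```c
             #include <assert.h>
             #include <stdbool.h>
             #include <stddef.h>

             /* Stand-in for the "~0" pattern; chosen by hand here. */
             #define LEN_POISON ((size_t)-1)

             /* Mode 2 is the code path that forgets to set len. */
             static size_t pick_len(int mode)
             {
                 size_t len = LEN_POISON;
                 if (mode == 0)
                     len = 4;
                 else if (mode == 1)
                     len = 16;
                 return len;
             }

             static bool len_is_set(size_t len)
             {
                 return len != LEN_POISON;
             }

             /* Fail loudly instead of silently looping zero times. */
             static size_t require_len(size_t len)
             {
                 assert(len_is_set(len));
                 return len;
             }
             ```
             
             Had len started at 0, the forgotten path would have produced a
             loop that quietly does nothing; with the poison value it is at
             least detectable.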
       
            bluGill wrote 1 day ago:
             C++26 has everything initialized by default. The value is not
             specified, though. Implementations are encouraged to use something
             weird to detect use before explicit initialization.
       
            rwmj wrote 1 day ago:
            GCC now supports
            -ftrivial-auto-var-init=[zero|uninitialized|pattern] for stack
            variables [1] For malloc, you could use a custom allocator, or
            replace all the calls with calloc.
            
   URI      [1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#i...
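             
             A sketch of the calloc half of that suggestion; new_counters and
             counters_all_zero are made-up names, and the stack half needs
             nothing in the source beyond building with
             -ftrivial-auto-var-init=zero:
             
             ```c
             #include <assert.h>
             #include <stdlib.h>

             /* Like malloc(n * sizeof(int)), but the memory is zeroed.
              * Returns NULL on failure; the caller frees the result. */
             static int *new_counters(size_t n)
             {
                 return calloc(n, sizeof(int));
             }

             /* Return 1 if all n counters are zero. */
             static int counters_all_zero(const int *v, size_t n)
             {
                 assert(v != NULL);
                 for (size_t i = 0; i < n; i++) {
                     if (v[i] != 0)
                         return 0;
                 }
                 return 1;
             }
             ```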
       
              myrmidon wrote 1 day ago:
              Very nice, did not know about this!
              
              The only problem with vendor extensions like this is that you
              can't really rely on it, so you're still kinda forced to keep all
               the (redundant) zero initialization; solving it at the language
              level is much nicer. Maybe with C2030...
       
            bjourne wrote 1 day ago:
            There are many low-level devices where initialization is very
            expensive. It may mean that you need two passes through memory
            instead of one, making whatever code you are running twice as slow.
       
              nullc wrote 1 day ago:
              meh, the compiler can almost always eliminate the spurious
              default initialization because it can prove that first use is the
              variable being set by the real initialization.    The only time the
              redundant initialization will be emitted by an optimizing
               compiler is when it can't prove it's redundant.
              
              I think the better reason to not default initialize as a part of
              the language syntax is that it hides bugs.
              
               If the developer's intent is that the correct initial state is 0
              they should just explicitly initialize to zero.  If they haven't,
              then they must intend that the correct initial state is the
              dynamic one in their code and the compiler silently slipping in a
              0 in cases the programmer overlooked is a missed opportunity to
              detect a bug due to the programmer under-specifying the program.
       
                bluecalm wrote 1 day ago:
                It only works for simple variables where initialisation to 0 is
                counter productive because you lose a useful compiler warning
                 (about using an uninitialised variable).
                
                The main case is about arrays. Here it's often impossible to
                prove some part of it is used before initialisation. There is
                no warning. It becomes a tradeoff: potentially costly
                initialisation (arrays can be very big) or potentially using
                random values other than 0.
       
                  nullc wrote 17 hours 37 min ago:
                  Fair point though compilers could presumably do much better
                  warning there on arrays-- at least treating the whole array
                  like a single variable and warning when it knows you've read
                   it without ever writing to it.
       
                RustyRussell wrote 1 day ago:
                In recent years I've come to rely on this non-initialization
                idiom. Both because as code paths change the compiler can warn
                for simple cases, and because running tests under Valgrind
                catches it.
       
              modeless wrote 1 day ago:
              Ok, those developers can use a compiler flag. We need defaults
              that work better for the vast majority.
       
                bjourne wrote 1 day ago:
                Then why are you using C? :P
       
                  01HNNWZ0MV43FF wrote 1 day ago:
                  I'm not, looks like a bad language with worse implementations
       
                    nullc wrote 1 day ago:
                    C is a bad language, too bad all the others are even worse.
                    :P
       
              myrmidon wrote 1 day ago:
              I would argue that these cases are pretty rare, and you could
              always get nominal performance with the __noinit hint, but I
               think this would seldom even be needed.
              
              If you have instances of zero-initialized structs where you set
               individual fields after the initialization, all modern compilers
               will elide the dead stores in the typical cases already
              anyway, and data of relevant size that is supposed to stay
              uninitialized for long is rare and a bit of an anti-pattern in my
              opinion anyway.
       
            elromulous wrote 1 day ago:
            Devil's advocate: this would be unacceptable for os kernels and
            super performance critical code (e.g. hft).
       
              saagarjha wrote 1 day ago:
              The same OS kernel that zeros out pages before handing them back
              to me?
       
                frontfor wrote 1 day ago:
                This is arguing in bad faith. Just because the kernel does that
                 doesn't mean it does that everywhere else.
       
                  saagarjha wrote 9 hours 41 min ago:
                  The point is that there are security implications to not
                  zeroing out memory, even if it costs performance. Making an
                   argument that it's too performance sensitive to do anything
                   doesn't actually hold water.
       
              TuxSH wrote 1 day ago:
              > this would be unacceptable for os kernels
              
              Depends on the boundary. I can give a non-Linux, microkernel
              example (but that was/is shipped on dozens of millions of
              devices):
              
               - prior to 11.0, Nintendo 3DS kernel SVC (syscall)
               implementations did not clear output parameters, leading to
               extremely trivial leaks. Unprivileged processes could retrieve
               kernel-mode stack addresses easily, making exploit code much
               easier to write, example here: [1]
               
               - Nintendo started clearing all temporary registers on the
               Switch kernel at some point (iirc x0-x7 and some more); on the
               3DS they never did that, and you can leak kernel object
               addresses quite easily (iirc by reading r2). This made an entire
               class of use-after-free and arbwrite bugs easier to exploit
               (call SvcCreateSemaphore 3 times, get sema kernel object
               address, use one of the now-patched exploits that can cause a
               double-decref on the KSemaphore, call SvcWaitSynchronization,
               profit)
               
               more generally:
               
               - uncleared padding in structures + copy to user = infoleak
               
               so one at least ought to be careful when crossing privilege
              boundaries
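               
               One defensive pattern for the padding case is to build the
               outgoing bytes in a zeroed buffer and copy members one at a
               time, so the struct's own padding is never copied. A sketch
               only: struct reply and reply_serialize are hypothetical, and a
               plain byte buffer stands in for the copy to user space:
               
               ```c
               #include <assert.h>
               #include <stddef.h>
               #include <string.h>

               /* Most ABIs insert padding after `status`. */
               struct reply {
                   char     status;
                   unsigned value;
               };

               /* `out` must hold sizeof(struct reply) bytes. It is
                * zeroed first, then each member is copied to its own
                * offset; padding bytes of *r are never read. */
               static void reply_serialize(const struct reply *r,
                                           unsigned char *out)
               {
                   assert(r != NULL && out != NULL);
                   memset(out, 0, sizeof *r);
                   memcpy(out + offsetof(struct reply, status),
                          &r->status, sizeof r->status);
                   memcpy(out + offsetof(struct reply, value),
                          &r->value, sizeof r->value);
               }
               ```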
              
   URI        [1]: https://github.com/TuxSH/universal-otherapp/blob/master/...
       
              pjmlp wrote 1 day ago:
              It is acceptable enough for Windows, Android and macOS, that have
               been doing it for at least the last five years.
              
              That is the usual fearmongering when security improvements are
              done to C and C++.
       
              myrmidon wrote 1 day ago:
               No, just throw the __noinit attribute at every place where it's
              needed.
              
              You probably would not even need it in a lot of instances because
              the compiler would elide lots of dead stores (zeroing) even
              without hinting.
       
              sidkshatriya wrote 1 day ago:
              Would you rather have a HFT trade go correctly and a few
              nanoseconds slower or a few nanoseconds faster but with some edge
              case bugs related to variable initialisation ?
              
               You might claim that you can have both but bugs are more
              inevitable in the uninitialised by default scenario. I doubt that
              variable initialisation is the thing that would slow down HFT. I
               would posit it is things like network latency that would
              dominate.
       
                hermitdev wrote 1 day ago:
                > Would you rather have a HFT trade go correctly and a few
                nanoseconds slower or a few nanoseconds faster but with some
                edge case bugs related to variable initialisation ?
                
                As someone who works in the HFT space: it depends. How
                frequently and how bad are the bad-trade cases?  Some slop
                happens. We make trade decisions with hardware _without even
                seeing an entire packet coming in on the network_. Mistakes/bad
                trades happen. Sometimes it results in trades that don't go our
                way or missed opportunities.
                
                Just as important as "can we do better?" is "should we do
                better?". Queue priority at the exchange matters. Shaving
                nanoseconds is how you get a competitive edge.
                
                 > I would posit it is things like network latency that would
                dominate.
                
                Everything matters. Everything is measured.
                
                edit to add:  I'm not saying we write software that either has
                 or relies upon uninitialized values. I'm just saying in such a
                hypothetical, it's not a cut and dry "do the right thing
                (correct according to the language spec)" decision.
       
                  Imustaskforhelp wrote 1 day ago:
                   > We make trade decisions with hardware _without even seeing
                   an entire packet coming in on the network_
                   
                   Wait what????
                   
                   Can you please educate me on high frequency trading? I
                   don't understand what the point of it is. And let's say one
                   person has created an HFT bot: why the need for another bot,
                   other than different trading strategies? I don't think these
                   are profitable, and how do they compare in the long run with
                   the boglehead strategy??
       
                    hermitdev wrote 1 day ago:
                    This is a vast, _vast_ over-simplification: The primary
                    "feature" of HFT is providing liquidity to market.
                    
                    HFT firms are (almost) always willing to buy or sell at or
                    near the current market price. HFT firms basically race
                    each other for trade volume from "retail" traders (and
                    sometimes each other). HFTs make money off the spread - the
                    difference between the bid & offer - typically only a cent.
                    You don't make a lot of money on any individual trade (and
                    some trades are losers), but you make money on doing a lot
                    of volume.  If done properly, it doesn't matter which
                    direction the market moves for an HFT, they'll make money
                    either way as long as there's sufficient trading volume to
                    be had.
                    
                    But honestly, if you want to learn about HFT, best do some
                    actual research on it - I'm not a great source as I'm just
                    the guy that keeps the stuff up and running; I'm not too
                    involved in the business side of things.  There's a lot of
                    negative press about HFTs, some positive.
       
          mtklein wrote 1 day ago:
          This was my instinct too, until I got this little tickle in the back
          of my head that maybe I remembered that Clang was already acting like
          this, so maybe it won't be so bad.  Notice 32-bit wzr vs 64-bit xzr:
          
              $ cat union.c && clang -O1 -c union.c -o union.o && objdump -d union.o
              union foo {
                  float  f;
                  double d;
              };
          
              void create_f(union foo *u) {
                  *u = (union foo){0};
              }
          
              void create_d(union foo *u) {
                  *u = (union foo){.d=0};
              }
          
              union.o: file format mach-o arm64
          
              Disassembly of section __TEXT,__text:
          
              0000000000000000 :
                 0: b900001f        str wzr, [x0]
                 4: d65f03c0        ret
          
              0000000000000008 :
                 8: f900001f        str xzr, [x0]
                 c: d65f03c0        ret
       
            mtklein wrote 1 day ago:
            Ah, I can confirm what I see elsewhere in the thread, this is no
            longer true in Clang.  That first clang was Apple Clang 17---who
            knows what version that actually is---and here is Clang 20:
            
                $ /opt/homebrew/opt/llvm/bin/clang-20 -O1 -c union.c -o union.o && objdump -d union.o
            
                union.o: file format mach-o arm64
            
                Disassembly of section __TEXT,__text:
            
                0000000000000000 :
                   0: f900001f        str xzr, [x0]
                   4: d65f03c0        ret
            
                0000000000000008 :
                   8: f900001f        str xzr, [x0]
                   c: d65f03c0        ret
       
              dzaima wrote 1 day ago:
              Looks like that change is clang ≤19 to clang 20:
              
   URI        [1]: https://godbolt.org/z/7zrocxGaq
       
          mistrial9 wrote 1 day ago:
          using UNION was always considered sketchy IMHO. This is trivia for
          security exploiters?
       
            grandempire wrote 1 day ago:
            No. This is how sum types are implemented.
            
            And from a runtime perspective it's going to be a struct with
            perhaps more padding. You'll need more details about your
            specific threat model to explain why that's bad.
       
              mistrial9 wrote 1 day ago:
              a quick search says that std::variant is the modern replacement
              to implement your niche feature "sum types"
       
                soraminazuki wrote 1 day ago:
                Whoa, that's a core building block of programming and computer
                science that you're dismissing as "niche" without explanation.
       
                  mistrial9 wrote 1 day ago:
                  yes, types are a core building block of programming and
                  computer science, but not using UNION? this casual
                  dismissal of "criticisms of UNION" here seems superficial and
                  unwise to me.
       
                    soraminazuki wrote 1 day ago:
                    Sum types, not C unions. Different concepts.
                    
                    A sum type is a concept from type theory. Like unions, it
                    expresses a type that can be either one of multiple types.
                    But unlike unions, it retains information about which type
                    it is.
                    
                    Properly implemented sum types are completely type safe. I
                    can't be 100% sure what your particular "criticisms" of C
                    unions precisely are, but assuming they all relate to type
                    safety, they don't apply to sum types.
                    
                    Sum types are important because any real world project has
                    to deal with data that's either A or B. There's nothing
                    controversial here.
                    
                    In C, a union is a way to implement that. Yes, it's unsafe.
                    But can you eliminate the use of unsafe features from C
                    projects? No, if they deal with memory.
                    
                    Also, it's rich and quite frankly rude to brush off my
                    comment as "casual dismissals," "superficial," and "unwise"
                    when it's a direct response to this.
                    
                    > your niche feature "sum types"
                    
                    That's pure unprovoked smugness right there that contains
                    no substance of what your criticisms actually are, let
                    alone the reason.
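
                    A minimal C sketch of the distinction drawn above, assuming
                    a value that is either an integer or a real number: the
                    union alone does not record which member is valid, while
                    pairing it with a tag does. All names here (num, num_tag,
                    num_to_double) are made up for illustration and come from
                    no particular library.

```c
#include <assert.h>

/* Hypothetical example type: a number that is either a long or a double. */
enum num_tag { NUM_INT, NUM_REAL };

struct num {
    enum num_tag tag;   /* records which union member is currently valid */
    union {
        long   i;
        double r;
    } as;
};

static struct num num_from_int(long v) {
    struct num n = { .tag = NUM_INT };
    n.as.i = v;
    return n;
}

static struct num num_from_real(double v) {
    struct num n = { .tag = NUM_REAL };
    n.as.r = v;
    return n;
}

/* Every read checks the tag first, so the inactive member is never read. */
static double num_to_double(const struct num *n) {
    switch (n->tag) {
    case NUM_INT:  return (double)n->as.i;
    case NUM_REAL: return n->as.r;
    }
    assert(0 && "corrupt tag");
    return 0.0;
}
```

                    C does not enforce the tag check, which is the "unsafe"
                    part; the discipline lives in the accessor functions.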
       
                jlouis wrote 1 day ago:
                Not a niche feature. Fundamental for any decent language with a
                type system.
       
                  mistrial9 wrote 1 day ago:
                  ok, but C99 and C++11 and others, all have ways to implement
                  types. "Fundamental" as you say.. using UNION in C++ is not a
                  good choice to implement types.. in old C99, you can use
                  UNION that way but why? footguns all around.
       
                grandempire wrote 1 day ago:
                That's for C++. And how is std::variant implemented?
       
                  LowLevelMahn wrote 1 day ago:
                  not using a union: [1] because the union can't be extended
                  with variadic template types
                  
   URI            [1]: https://ojdip.net/2013/10/implementing-a-variant-typ...
       
                    LegionMammal978 wrote 1 day ago:
                    Actually, it does use a union, in both libstdc++ [1] and
                    libc++ [2]. (Underneath a lengthy stack of base classes,
                    since it wouldn't be C++ if it weren't painful to match the
                    specified semantics.)
                    
   URI              [1]: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstd...
   URI              [2]: https://github.com/llvm/llvm-project/blob/llvmorg-...
       
                    grandempire wrote 1 day ago:
                    So instead it has a buffer large enough to hold all the
                    types? That's what union does.
                    
                    Still waiting to hear the security concerns.
       
          ogoffart wrote 1 day ago:
          > This is going to silently break so much existing code
          
          The code was already broken. It was an undefined behavior.
          
          That's a problem with C and its undefined behavior minefields.
       
            mwkaufma wrote 1 day ago:
            Undefined in the standard doesn't mean undefined in GCC.
            Type-punning through unions has always been a special case that GCC
            has taken care with beyond the standard.
       
            grandempire wrote 1 day ago:
            When you have a big system many people rely on you generally try to
            look for ways to keep their code working - not look for the changes
            you're contractually allowed to make.
            
            GCC probably has a better justification than "we are allowed
            to".
       
              arp242 wrote 1 day ago:
              > GCC probably has a better justification than "we are allowed
              to".
              
              Maybe, but I've seen GCC people justify such changes with little
              more than "it's UB, we can change it, end of story", so I
              wouldn't assume it.
       
            ryao wrote 1 day ago:
            GCC has long been known to define undefined behavior in C unions.
            In particular, type punning in unions is undefined behavior under
            the C and C++ standards, but GCC (and Clang) define it.
       
              flohofwoe wrote 1 day ago:
              > type punning in unions is undefined behavior under the C and
              C++ standards
              
              Union type punning is entirely valid in C, but UB in C++ (one of
              the surprisingly many subtle but still fundamental differences
              between C and C++). There's specifically a (somewhat obscure)
              footnote about this in the C standard, which also has been more
              clarified in one of the recent C standards.
       
                ryao wrote 1 day ago:
                There is no footnote about it in the C standard. Someone
                proposed adding one to standardize the behavior, but it was
                never accepted. Ever since then, people keep quoting it even
                though it is a rejected amendment.
       
                  jcranmer wrote 1 day ago:
                  Footnote 107 in C23, on page 75 in §6.5.2.3:
                  
                  > If the member used to read the contents of a union object
                  is not the same as the member last used to store a value in
                  the object the appropriate part of the object representation
                  of the value is reinterpreted as an object representation in
                  the new type as described in 6.2.6 (a process sometimes
                  called type punning). This might be a non-value
                  representation.
                  
                  (though this footnote has been present as far back as C99,
                  albeit with different numbers as the standard has added more
                  text in the intervening 24 years).
       
                    ryao wrote 1 day ago:
                    The GCC developers disagree with your interpretation:
                    
                    > Type punning via unions is undefined behavior in both c
                    and c++.
                    
   URI              [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11814...
       
                      nialv7 wrote 17 hours 47 min ago:
                      I wouldn't be surprised if Andrew Pinski was just wrong.
                      It's anecdotal but my impression of him isn't very good.
       
                      flohofwoe wrote 1 day ago:
                      I'm not sure tbh what's there to 'interpret' or how a
                      compiler developer could misread that, the wording is
                      quite clear.
       
                        ryao wrote 1 day ago:
                        It is an excerpt being taken out of context. Of course
                        it is quite clear. Taking it out of context ignores
                        everything else that the standard says. That
                        interpretation is wrong as far as compiler authors are
                        concerned.
       
                          trealira wrote 1 day ago:
                          The context is that it's a footnote. The footnote is
                          referenced in this paragraph:
                          
                          A postfix expression followed by the . operator and
                          an identifier designates a member of a structure or
                          union object. The value is that of the named member
                          (106), and is an lvalue if the first expression is an
                          lvalue. If the first expression has qualified type,
                          the result has the so-qualified version of the type
                          of the designated member.
                          
                          106) If the member used to read the contents of a
                          union object is not the same as the member last used
                          to store a value in the object the appropriate part
                          of the object representation of the value is
                          reinterpreted as an object representation in the new
                          type as described in 6.2.6 (a process sometimes
                          called type punning). This might be a non-value
                          representation.
                          
                          In that same document, union type punning is
                          explicitly listed under Annex J.1, Unspecified
                          Behavior:
                          
                          (11) The values of bytes that correspond to union
                          members other than the one last stored into
                          (6.2.6.1).
                          
                          The standard is extremely clear and explicit that
                          it's not undefined behavior.
       
                            ryao wrote 1 day ago:
                            This is not considering the document as a whole. I
                            will defer to the GCC developers on what the
                            document means on this.
       
                              jcranmer wrote 1 day ago:
                              I am a member of the C standards committee, and
                              I'm telling you you're wrong here. Martin Uecker
                              is also a member of the C standards committee, and
                              has just responded to that bug saying that the
                              comment you linked is wrong. I, and others here,
                              have quoted literal standards text to you
                              explaining why type punning through unions is
                              well-defined behavior in C.
                              
                              I don't know who Andrew Pinski is, but they're
                              factually incorrect regarding the legality of
                              type punning via unions in C.
       
                                uecker wrote 1 day ago:
                                Andrew is a GCC developer who is very competent
                                (much more than myself regarding GCC), but I
                                think he was mistakenly assuming the C++ rules
                                apply to C here as well.
       
                              trealira wrote 1 day ago:
                              I'm interested in hearing how considering the
                              document as a whole leads to a different
                              conclusion.
       
              mat_epice wrote 1 day ago:
              EDIT: This comment is wrong, see fsmv's comment below. Leaving
              for posterity because I'm no coward!
              
              - - -
              
              Undefined behavior only means that the spec leaves a particular
              situation undefined and that the compiler implementor can do
              whatever they want. Every compiler defines undefined behavior,
              whether it's documented (or easy to qualify, or deterministic)
              or not.
              
              It is in poor taste that gcc has had widely used, documented
              behaviors that are changing, especially in a point release.
       
                fsmv wrote 1 day ago:
                I think you're confusing unspecified and undefined behavior. UB
                could do something randomly different every time, while
                unspecified behavior must choose an option.
                
                In a lot of cases, optimizing compilers just assume UB doesn't
                exist. Yes, technically the compiler does do something, but
                there's still a big difference between the two.
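
                To illustrate the difference described above, a small C sketch:
                the order in which function arguments are evaluated is
                unspecified (the compiler picks one of the valid orders), while
                signed integer overflow is undefined and has to be avoided
                outright. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

static int calls[2];   /* log of which helper ran first and second */
static int ncalls;

static int first(void)  { calls[ncalls++] = 1; return 10; }
static int second(void) { calls[ncalls++] = 2; return 20; }
static int add(int a, int b) { return a + b; }

/* Unspecified behavior: first() and second() may run in either order,
 * but both run exactly once and the sum is the same either way. */
static int unspecified_order(void) {
    ncalls = 0;
    return add(first(), second());
}

/* Undefined behavior must be avoided, not merely tolerated: x + 1
 * overflows when x == INT_MAX, so that case is rejected up front.
 * Returns 1 and stores the result on success, 0 otherwise. */
static int checked_increment(int x, int *out) {
    assert(out != 0);
    if (x == INT_MAX) {
        return 0;
    }
    *out = x + 1;
    return 1;
}
```

                A correct program may call unspecified_order() and rely on the
                sum, just not on the contents of calls[]; no such guarantee
                exists once an overflow has actually happened.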
       
                  mat_epice wrote 1 day ago:
                  Thanks, you're right, I was mistaken.
       
              mtklein wrote 1 day ago:
              I have always thought that punning through a union was legal in C
              but UB in C++, and that punning through incompatible pointer
              casting was UB in both.
              
              I am basing this entirely on memory and the wikipedia article on
              type punning.  I welcome extremely pedantic feedback.
       
                jcranmer wrote 1 day ago:
                > punning through a union was legal in C
                
                In C89, it was implementation-defined. In C99, it was made
                expressly legal, but it was erroneously included in the list of
                undefined behavior annex. From C11 on, the annex was fixed.
                
                > but UB in C++
                
                C++11 adopted "unrestricted unions", which added a concept of
                active members: it is UB to access any other member unless you
                first make it active. Except active members rely on constructors
                and destructors, which primitive types don't have, so the
                standard isn't particularly clear on what happens here. The
                current consensus is that it's UB.
                
                C++20 added std::bit_cast which is a much safer interface to
                type punning than unions.
                
                > punning through incompatible pointer casting was UB in both
                
                There is a general rule that accessing an object through an
                'incompatible' lvalue is illegal in both languages. In general,
                changing the const or volatile qualifier on the object is
                legal, as is reading via a different signed or unsigned
                variant, and char pointers can read anything.
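
                A short sketch of two ways to inspect a float's bits in C that
                the rules above leave open: reading a different union member
                (allowed in C per the footnote quoted elsewhere in this
                thread, not in C++), and copying the object representation
                with memcpy. The memcpy route is not mentioned in the comment;
                it is the usual C counterpart to std::bit_cast. The sketch
                assumes float is 32 bits wide, and the helper names are
                invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (true for IEEE 754 binary32 targets). */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Union punning: store one member, then read another (C only). */
static uint32_t bits_via_union(float f) {
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}

/* memcpy punning: copies the bytes of the object representation,
 * which is valid in both C and C++. */
static uint32_t bits_via_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

                Casting &f to uint32_t * and dereferencing it would be the
                incompatible-lvalue access that the general rule forbids.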
       
                  ryao wrote 1 day ago:
                  The GCC developers disagree as of last December:
                  
                  > Type punning via unions is undefined behavior in both c and
                  c++.
                  
   URI            [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141#...
       
                    saagarjha wrote 1 day ago:
                    I think they're wrong about C.
       
                  trealira wrote 1 day ago:
                  > In C99, it was made expressly legal, but it was erroneously
                  included in the list of undefined behavior annex.
                  
                  In C99, union type punning was put under Annex J.1, which is
                  unspecified behavior, not undefined behavior. Unspecified
                  behavior is basically implementation-defined behavior, except
                  that the implementor is not required to document the
                  behavior.
       
                    ryao wrote 1 day ago:
                    We can use UB to refer to both. :)
       
                      hermitdev wrote 1 day ago:
                      > We can use UB to refer to both. :)
                      
                      You can, but in the context of the standard, you'd be
                      wrong to do so. Undefined behavior and unspecified
                      behavior have specific, different, meanings in context of
                      the C and C++ standards.
                      
                      Conflate them at your own peril.
       
                      trealira wrote 1 day ago:
                      Maybe, but we were talking about "undefined behavior,"
                      not "UB," so the point is moot.
       
                jotux wrote 1 day ago:
                Saw this recently and thought it was good:
                
   URI          [1]: https://www.youtube.com/watch?v=NRV_bgN92DI
       
                ryao wrote 1 day ago:
                There has been plenty of misinformation spread on that. One of
                the GCC developers told me explicitly that type punning through
                a union was UB in C, but defined by GCC when I asked (after I
                had a bug report closed due to UB). I could find the bug report
                if I look for it, but I would rather not do the search.
       
                  uecker wrote 1 day ago:
                  Union type punning is allowed and supported by GCC:
                  
   URI            [1]: https://godbolt.org/z/vd7h6vf5q
       
                    ryao wrote 1 day ago:
                    I said that GCC defines type punning via unions. It is an
                    extension to the C standard that GCC did.
                    
                    That said, using "the code compiles in godbolt" as
                    proof that it is not relying on what the standard specifies
                    to be UB is fallacious.
       
                      uecker wrote 1 day ago:
                      I am a member of the standards committee and a GCC
                      maintainer. The C standard supports union punning. (You
                      are right though that relying on godbolt examples can be
                      misleading.)
       
                  jotux wrote 1 day ago:
                  
                  
   URI            [1]: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options....
       
                    ryao wrote 1 day ago:
                    What is your point? I already said that GCC defines it even
                    though the C standard does not. As per the GCC developers:
                    
                    > Type punning via unions is undefined behavior in both c
                    and c++.
                    
   URI              [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11814...
       
                      jotux wrote 1 day ago:
                      > One of the GCC developers told me explicitly that type
                      punning through a union was UB in C, but defined by GCC
                      when I asked
                      
                      I just was citing the source of this for reference.
       
                        ryao wrote 1 day ago:
                        I see. Carry on then. :)
       
                  trealira wrote 1 day ago:
                  From a draft of the C23 standard, this is what it has to say
                  about union type punning:
                  
                  > If the member used to read the contents of a union object
                  is not the same as the member last used to store a value in
                  the object the appropriate part of the object representation
                  of the value is reinterpreted as an object representation in
                  the new type as described in 6.2.6 (a process sometimes
                  called type punning). This might be a non-value
                  representation.
                  
                  In past standards, it said "trap representation" rather than
                  "non-value representation," but in none of them did it say
                  that union type punning was undefined behavior. If you have a
                  PDF of any standard or draft standard, just doing a search
                  for "type punning" should direct you to this footnote
                  quickly.
                  
                  So I'm going to say that if the GCC developer explicitly said
                  that union type punning was undefined behavior in C, then
                  they were wrong, because that's not what the C standard says.
       
                    ryao wrote 1 day ago:
                    Here is what was said:
                    
                    > Type punning via unions is undefined behavior in both c
                    and c++. [1] Feel free to start a discussion on the GCC
                    mailing list.
                    
   URI              [1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11814...
       
                      trealira wrote 1 day ago:
                      I actually might, although not now. Thanks for the link.
                      I'm surprised he directly contradicted the C standard,
                      rather than it just being a misunderstanding.
       
                        ryao wrote 1 day ago:
                        According to another comment, the C standard
                        contradicts the C standard on this: [1] Taking snippets
                        of the C standard out of context of the whole seems to
                        result in misunderstandings on this.
                        
   URI                  [1]: https://news.ycombinator.com/item?id=43794268
       
                          trealira wrote 1 day ago:
                          It doesn't. That commenter is saying that in C99, it
                          was unspecified behavior. Since C11 onward, it's been
                          removed from the unspecified behavior annex and type
                          punning is allowed, though it may generate a
                          trap/non-value representation. It was never undefined
                          behavior, which is different.
                          
                          Edit: no, it's still in the unspecified behavior
                          annex, that's my mistake. It's still not undefined,
                          though.
       
                            ryao wrote 1 day ago:
                            Most of the C code I write is C99 code, so it is
                            undefined behavior either way for me (if I care
                            about compilers other than GCC and Clang).
                            
                            That said, I am going to defer to the GCC
                            developers on this since I do not have time to make
                            sense of all versions of the C standard.
       
                              trealira wrote 1 day ago:
                              That's fair. In the end, what matters is how C is
                              implemented in practice on the platforms your
                              code targets, not  what the C standard says.
       
                    amboar wrote 1 day ago:
                    Section J.1 _Unspecified_ behavior says
                    
                    > (11) The values of bytes that correspond to union members
                    other than the one last stored into (6.2.6.1).
                    
                    So it's a little more constrained in the ramifications, but
                    the outcomes may still be surprising. It's a bit
                    unfortunate that "UB" aliases to both "Undefined behavior"
                    and "Unspecified behavior" given they have subtly different
                    definitions.
                    
                    From section 4 we have:
                    
                    > A program that is correct in all other aspects, operating
                    on correct data, containing unspecified behavior shall be a
                    correct program and act in accordance with 5.1.2.4.
       
          ryao wrote 1 day ago:
          > This is going to silently break so much existing code
          
          How much code actually uses unions this way?
          
          >  especially union based type punning in C code
          
          I have never done type punning via the GNU C compiler extension in a
          way that would break because of this. I always assign a value to it
          and then get out the value from a new type. Do you know of any code
          that does things differently to be affected by this?
       
            ndiddy wrote 1 day ago:
            > How much code actually uses unions this way?
            
            I see this change caused Mbed-TLS to start failing its test suite
            when compiled with GCC 15: [1] (kinda scary since it's a security
            library). Hopefully other projects with less rigorous test suites
            aren't using {0} in that way. The Github issue mentions that Clang
            tried a similar optimization a while ago and backed it out after
            user complaints, so maybe the same thing will happen with GCC.
            
   URI      [1]: https://github.com/Mbed-TLS/mbedtls/issues/9814
       
              ryao wrote 1 day ago:
              GCC's developers have a strong insistence on standards
              conformance (minus situations where they explicitly choose to
              deviate, like type punning in unions) over the status quo. We
              already went through a much more severe shift with strict
              aliasing enforcement by GCC and they never changed course. I do
              not expect this to be any different.
       
            Calavar wrote 1 day ago:
            I would guess a lot. People aren't intimately familiar with the
            standard, and people are lazy when it comes to writing boilerplate
            like initialization code. And up until now, it just worked, so even
            a good test suite wouldn't catch it.
            
            EDIT: I initially mentioned type punning for arithmetic, but this
            compiler change wouldn't affect that
       
              ryao wrote 1 day ago:
              How would that be broken by this? The union will be zero
              initialized regardless because this change only affects
              situations where the union members are of different lengths, but
              for integer to float, the union members should always be the same
              length or bad things will happen.
       
                Calavar wrote 1 day ago:
                I realized my mistake and I think I edited my comment a split
                second before you replied, but you're right. That particular
                type punning scenario wouldn't be affected by this change
                because 1) the members are the same size, so there are no padding
                bits, and 2) the specific union member is going to be initialized to
                the input parameter, not with the syntax sugar for aggregate
                zero initialization.
       
                  ryao wrote 1 day ago:
                  Well, under your original version, I could see someone
                  filling in bit fields in the float like the exponent and sign
                  while leaving the mantissa zeroed, but given that the integer
                  and float would be the same length, there is no section that
                  would be left uninitialized by this change.
                  
                  In order for this change to leave something uninitialized,
                  you would need to have a member of the union after the first
                  member that is longer than the first member. Code that does
                  that and relies on {0} to zero the union seems incredibly
                  rare to me.
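
                  For code that must not depend on how {0} treats a union, one
                  option is to clear the whole object explicitly. A sketch,
                  assuming a hypothetical union whose first member is shorter
                  than a later one, which is the case described above:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical union: the first member is smaller than a later member. */
union packet {
    unsigned char kind;      /* first member: 1 byte */
    unsigned char raw[16];   /* later member: 16 bytes */
};

/* Zero every byte of the union, independent of how {0} is compiled. */
static void packet_clear(union packet *p) {
    assert(p != NULL);
    memset(p, 0, sizeof *p);
}

/* Returns 1 if every byte seen through the larger member is zero. */
static int packet_is_clear(const union packet *p) {
    for (size_t i = 0; i < sizeof p->raw; i++) {
        if (p->raw[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

                  The GCC 15 release notes also point to the empty initializer
                  {} (C23 and C++) and the -fzero-init-padding-bits=unions
                  option as ways to get the whole union zeroed.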
       
          VyseofArcadia wrote 1 day ago:
          I feel like once a language is standardized (or reaches 1.0), that's
          it. You're done. No more changes. You wanna make improvements? Try
          out some new ideas? Fine, do that in a new language.
          
          I can deal with the footguns if they aren't cheekily mutating over
          the years. I feel like in C++ especially we barely have the time to
          come to terms with the unintended consequences of the previous
          language revision before the next one drops a whole new load of them
          on us.
       
            Ragnarork wrote 1 day ago:
            > I feel like once a language is standardized (or reaches 1.0),
            that's it. You're done. No more changes. You wanna make
            improvements? Try out some new ideas? Fine, do that in a new
            language.
            
            Thank goodness this is not how the software world works overall.
            I'm not sure you understand the implications of what you ask for.
            
            > if they aren't cheekily mutating over the years
            
            You're complaining about languages mutating, then mention C++ which
            has added stuff but maintained backwards compatibility over the
            course of many standards (aside from a few hiccups like auto_ptr,
            which was also short lived), with a high aversion to modifying
            existing stuff.
       
            _joel wrote 1 day ago:
            Perl 6 and Python 3 joined the chat
       
            pjmlp wrote 1 day ago:
            Programming languages are products, that is like saying you want to
            keep using vi 1.0.
            
            Maybe C should have stopped at K&R C from UNIX V6; at least that
            would have spared the world from having it adopted outside UNIX.
       
              ryao wrote 1 day ago:
              If C++ had never been invented, that might have been the case.
       
                pjmlp wrote 1 day ago:
                C++ was invented exactly because Bjarne Stroustrup vowed
                never again to repeat the downgrade of his development
                experience from Simula to BCPL.
                
                When faced with writing a distributed systems application at
                Bell Labs, and having to deal with C, the very first step was
                to create C with Classes.
                
                Also, had C++ not been invented, or had C become a historical
                footnote, so what? There would be other programming languages
                to choose from.
                
                Let's not put programming languages into some kind of
                sanctuary for worship.
       
                  uecker wrote 1 day ago:
                  I don't think C would have become a footnote if not for C++
                  given UNIX.
       
                    pjmlp wrote 1 day ago:
                    Most likely C++ would not have happened, while at the same
                    time C and UNIX adoption would never have gotten big enough
                    to be relevant outside Bell Labs.
                    
                    Which then again, isn't that much of a deal, industry would
                    have steered into other programming languages and operating
                    systems.
                    
                    Overall that would be a much preferable alternative
                    timeline, assuming security would have been taken more
                    seriously. As it is, it has taken 45 years since C.A.R.
                    Hoare's Turing Award speech and the Morris worm, and it
                    happened only after companies and government started to
                    feel the monetary pain of their decisions.
       
                      uecker wrote 18 hours 57 min ago:
                      I think there are very good reasons why C and UNIX were
                      successful and are still around as foundational
                      technologies. Nor do I think C or UNIX legacy are the
                      real problem we have with security. Instead, complexity
                      is the problem.
       
                        pjmlp wrote 17 hours 5 min ago:
                        Starting with being available for free on source code
                        tapes, along with a commented source code book.
                        
                        History would certainly have taken a different path had
                        AT&T been allowed to profit from Bell Labs' work, as
                        its later attempts to regain control of UNIX prove.
                        
                        Unfortunately that seems the majority opinion on WG14,
                        only changed thanks to government and industry
                        pressure.
       
                          uecker wrote 15 hours 30 min ago:
                          Being free was important and history could have taken
                          many paths, but this does not explain why it is still
                          important today and has not been replaced despite
                          many alternatives.  WG14 consists mostly of industry
                          representatives.
       
                            pjmlp wrote 54 min ago:
                            It is important today just like COBOL and Fortran
                            are, with ongoing ISO updates: sunk cost. No one is
                            getting more money out of rewriting their systems
                            just because, unless there are external factors
                            like government regulations.
                            
                            Then we have the free beer UNIX clones as well.
                            
                            Those industry members of WG14 don't seem to have
                            done much to improve the language's security during
                            the last 50 years.
       
              rgoulter wrote 1 day ago:
              I liked the idea I heard: internet audiences demand progress, but
              internet audiences hate change.
       
            hulitu wrote 1 day ago:
            It's careless development. Why think something through in advance
            when you can fix it later? It works so well for Microsoft, Google
            and lately Apple. /s
            
            The release cycle of a piece of software says a lot about its
            quality. "Move fast, break things" has become the new development
            process.
       
              pasc1878 wrote 1 day ago:
              That does not make sense for anything that exists over decades.
              
              Do you want to still be using Windows NT, or pre-2004 standard
              C++, or Python 2.0?
              
              We learn more and need to add to things. Some things we designed
              30 years ago were a mistake; should we stick with them?
              
              For much software, you can't design everything before release.
              You can for games, or for bespoke business software where you
              can define what it does, but then the business changes.
       
            seritools wrote 1 day ago:
            > If the size of the new type is larger than the size of the
            last-written type, the contents of the excess bytes are unspecified
            (and may be a trap representation). Before C99 TC3 (DR 283) this
            behavior was undefined, but commonly implemented this way. [1]
            
            > When initializing a union, the initializer list must have only
            one member, which initializes the first member of the union unless
            a designated initializer is used (since C99). [2]
            
            → = {0} initializes the first union variant, and bytes outside of
            that first variant are unspecified. Seems like GCC 15.1 follows the
            26-year-old standard correctly. (Not sure how much has changed from
            C89 here.)
            
   URI      [1]: https://en.cppreference.com/w/c/language/union
   URI      [2]: https://en.cppreference.com/w/c/language/struct_initializa...
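            
            To illustrate the designated-initializer part of the second quote,
            here is a small sketch. The union packet and both helper functions
            are hypothetical names, not anything from GCC or the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical union: `kind` is the first member, the one a plain {0}
   initializes; `raw` is the largest member and covers the whole union. */
union packet {
    unsigned char kind;
    unsigned char raw[16];
};

static_assert(sizeof(union packet) == 16,
              "raw spans the entire union in this sketch");

/* Initialize through the largest member with a designated initializer.
   Array elements without an explicit initializer are set to zero, so
   every byte of `raw` ends up zero. */
static union packet packet_zeroed(void)
{
    union packet p = { .raw = {0} };
    return p;
}

/* Count the bytes of the object that are not zero. */
static size_t packet_nonzero_bytes(const union packet *p)
{
    const unsigned char *bytes = (const unsigned char *)p;
    size_t count = 0;
    for (size_t i = 0; i < sizeof *p; i++) {
        if (bytes[i] != 0)
            count++;
    }
    return count;
}
```

            Designating the largest member only helps when that member spans
            the whole union, which is what the static_assert checks here.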
       
            ryao wrote 1 day ago:
            I suspect this change was motivated by standards conformance.
       
              fuhsnn wrote 1 day ago:
              The GCC maintainer's wording was "the standard doesn't require
              it" when they informed the Linux kernel mailing list.
              
   URI        [1]: https://lore.kernel.org/linux-toolchains/Z0hRrrNU3Q+ro2T...
       
                matheusmoreira wrote 1 day ago:
                Reminds me of strict aliasing. Same attitude...
                
   URI          [1]: https://www.yodaiken.com/2018/06/07/torvalds-on-aliasi...
       
       