_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
       
       
       COMMENT PAGE FOR:
   URI   GPT-5: Key characteristics, pricing and system card
       
       
        ozgung wrote 2 hours 22 min ago:
         One key element missing from all these model cards is the model
         size/number of parameters. Without that info we are in the dark. We
         can't predict the future of AI. How does intelligence scale with the
         number of parameters? Is there a limit? Should we attribute
         incrementally better metrics to larger model size or to other
         techniques? Do they announce the full model they trained or a smaller
         version that is economically viable for current market conditions?
         If they double the model size, will it be professor-level
         intelligence, super-human intelligence or a couple-of-PhDs-level
         intelligence?
       
        coldtea wrote 3 hours 45 min ago:
        The improved bicycling pelican of course could be overfitting /
        benchmark-cheating...
       
        globular-toast wrote 6 hours 0 min ago:
        This "system card" thing seems to have suddenly come out of nowhere.
        Signs of a cult forming. Is it just what we'd normally call a technical
         write-up?
       
          dragonwriter wrote 5 hours 55 min ago:
           It’s a variation on “model card”, which has become a standard
           thing with AI models, but with the name changed because the
           write-up covers the toolchain as well as model information. But a
           PDF of the size of the document at issue is very much not the kind
           of concise document model cards are; it's more the kind of
           technical report that a much more concise card would reference.
       
        kevink23 wrote 11 hours 32 min ago:
        I was excited for GPT-5, but honestly, it feels worse than GPT-4 for
        coding.
       
          simonw wrote 8 hours 38 min ago:
          GPT-4 or GPT-4o?
       
        moralestapia wrote 13 hours 53 min ago:
         Basically repeats what's been put out through the usual PR channels,
         just paraphrased.
         
         No mention of the (missing) elephant in the room: where are the
         benchmarks?
        
        @simonw has been compromised. Sad.
       
          simonw wrote 13 hours 45 min ago:
           I'm sorry I didn't say "independent benchmarks are not yet
           available" in my post; I say that so often on model launches that I
           guess I took it as read this time.
       
        tomrod wrote 14 hours 26 min ago:
        Simon, as always, I appreciate your succinct and dedicated writeup.
        This really helps to land the results.
       
        joshmlewis wrote 15 hours 6 min ago:
         It seems to be trained to use tools effectively to gather context. In
         this example against 4.1 and o3 it used six tools in the first turn
         in a pretty cool way (fetching different categories that could be
         relevant). Token use increases with that kind of tool calling, but
         the aggressive pricing should make that moot. You could probably get
         it to not be so tool-happy with prompting as well.
        
   URI  [1]: https://promptslice.com/share/b-2ap_rfjeJgIQsG
       
        cainxinth wrote 15 hours 35 min ago:
         It’s fascinating and hilarious that a pelican on a bicycle in SVG is
         still such a challenge.
       
          throwaway422432 wrote 11 hours 51 min ago:
          I'm surprised they haven't all tried to game this test by now, or at
          least added it to their internal testing knowing they will be judged
          by it.
       
          muglug wrote 14 hours 32 min ago:
          How easy is it for you to create an SVG of a pelican riding a bicycle
          in a text editor by hand?
       
            SomewhatLikely wrote 6 hours 42 min ago:
            Nobody's preventing them from rendering it and refining. That's
            certainly what we'd expect an AGI to do.
       
            cainxinth wrote 12 hours 30 min ago:
            I didn't mean to imply it was simple, just that it's funny because
            I can't really evaluate evals like Humanity's Last Exam, but I can
            see the progress of these models in a pelican.
       
            jopsen wrote 14 hours 3 min ago:
            Without looking at the rendered output :)
       
              freediver wrote 12 hours 29 min ago:
              And without ever seeing a pelican on a bicycle :)
       
        aliljet wrote 15 hours 53 min ago:
        I'm curious what platform people are using to test GPT-5? I'm so deep
        into the claude code world that I'm actually unsure what the best
        option is outside of claude code...
       
          simonw wrote 15 hours 21 min ago:
          I've been using codex CLI, OpenAI's Claude Code equivalent. You can
          run it like this:
          
            OPENAI_DEFAULT_MODEL=gpt-5 codex
       
          te_chris wrote 15 hours 38 min ago:
          Cursor
       
        cchance wrote 16 hours 9 min ago:
         It's basically Opus 4.1... but cheaper?
       
          gwd wrote 15 hours 45 min ago:
           Cheaper is an understatement... it's less than 1/10 the price for
           input and nearly 1/8 for output. Part of me wonders if they're
           using their massive new investment to sell the API below cost and
           drive out the competition. If they're really getting Opus 4.1
           performance for half of Sonnet's compute cost, they've done really
           well.
       
            bravesoul2 wrote 9 hours 48 min ago:
             With the unlimited demand I can't see that strategy working. It's
             not like taxis, where you might do a trip or two a day but
             wouldn't do 100 a day even if it were cheap enough. With AI you
             really would 100x your usage.
       
            diggan wrote 15 hours 24 min ago:
             I'm not sure I'd be surprised. I've been playing around with
             GPT-OSS the last few days, and the architecture seems really fast
             for the accuracy/quality of responses, way better than most local
             weights I've tried over the last two years or so. And since they
             released that architecture publicly, I'd imagine they're sitting
             on something even better privately.
       
        drumhead wrote 16 hours 28 min ago:
        "Are you GPT5" - No I'm 4o, 5 hasnt been released yet. "It was released
        today". Oh you're right, Im GPT5. You have reached the limit of the
        free usage of 4o
       
          nonhaver wrote 8 hours 4 min ago:
          haha brutal. maybe tomorrow
       
        techpression wrote 16 hours 42 min ago:
        "They claim impressive reductions in hallucinations. In my own usage
        I’ve not spotted a single hallucination yet, but that’s been true
        for me for Claude 4 and o3 recently as well—hallucination is so much
        less of a problem with this year’s models."
        
        This has me so confused, Claude 4 (Sonnet and Opus) hallucinates daily
        for me, on both simple and hard things. And this is for small isolated
        questions at that.
       
          simonw wrote 14 hours 33 min ago:
          I updated that section of my post with a clarification about what I
          meant. Thanks for calling this out, it definitely needed extra
          context from me.
       
          godelski wrote 14 hours 43 min ago:
           There were also several hallucinations during the announcement. (I
           also see hallucinations every time I use Claude and GPT, which is
           several times a week. Paid and free tiers.)
           
           So not seeing them means either lying or incompetence. I always try
           to attribute to stupidity rather than malice (Hanlon's razor).
           
           The big problem with LLMs is that they optimize for human
           preference. This means they optimize for hidden errors.
           
           Personally I'm really cautious about using tools that have stealthy
           failure modes. They just lead to many problems and lots of wasted
           hours debugging, even when failure rates are low. It causes
           everything to slow down for me, as I'm double-checking everything
           and need to be much more meticulous when I know errors are hard to
           see. It's like having a line of Python indented with an
           inconsistent whitespace character. Impossible to see. But what if
           you didn't have the interpreter telling you which line failed, or
           the ability to search for and highlight those characters? At least
           in this case you'd know there's an error. It's hard enough dealing
           with human-generated invisible errors, but this just seems to
           perpetuate the LGTM crowd.
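           
           A minimal sketch of that invisible-whitespace analogy (hypothetical
           snippet): the indentation below uses a no-break space (U+00A0),
           which looks identical to a normal space in most editors, and only
           the interpreter flags it.
           
             # The print line is indented with U+00A0 instead of a regular
             # space; visually the two are indistinguishable.
             source = "if True:\n\u00a0   print('hello')\n"
             
             try:
                 compile(source, "<example>", "exec")
             except SyntaxError as err:
                 # Python at least reports the offending line; without that
                 # hint the character would be nearly impossible to spot.
                 print(err.lineno, err.msg)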
       
            simonw wrote 2 hours 42 min ago:
            What were the hallucinations during the announcement?
            
             My incompetence here was that I was careless with my use of the
             term "hallucination". I assumed everyone else shared my exact
             definition - that a hallucination is when a model confidently
             states a fact that is entirely unconnected from reality, which is
             a different issue from a mistake ("how many Bs in blueberry"
             etc).
            
            It's clear that MANY people do not share my definition! I deeply
            regret including that note in my post.
       
            hhh wrote 6 hours 6 min ago:
             You can just have a different use case that surfaces more
             hallucinations than someone else's; they don't have to be evil.
       
          madduci wrote 15 hours 45 min ago:
           I believe it depends on the inputs. For me, Claude 4 has
           consistently generated hallucinations; it was especially confident
           in generating invalid JSON, for instance Grafana dashboards full of
           syntactic errors.
       
          Oras wrote 16 hours 8 min ago:
          Here you go
          
   URI    [1]: https://pbs.twimg.com/media/Gxxtiz7WEAAGCQ1?format=jpg&name=...
       
            simonw wrote 15 hours 58 min ago:
            How is that a hallucination?
       
          squeegmeister wrote 16 hours 25 min ago:
           Yeah, hallucinations are very context-dependent. I’m guessing OP
           is working in very well-documented domains.
       
          bluetidepro wrote 16 hours 28 min ago:
          Agreed. All it takes is a simple reply of “you’re wrong.” to
          Claude/ChatGPT/etc. and it will start to crumble on itself and get
          into a loop that hallucinates over and over. It won’t fight back,
          even if it happened to be right to begin with. It has no backbone to
          be confident it is right.
       
            petesergeant wrote 7 hours 37 min ago:
            > All it takes is a simple reply of “you’re wrong.” to
            Claude/ChatGPT/etc. and it will start to crumble on itself
            
             Fucking Gemini Pro, on the other hand, digs in, starts deciding
             it's in a testing scenario and gets adversarial, starts claiming
             it's using tools the user doesn't know about, etc. etc.
       
            diggan wrote 15 hours 21 min ago:
            > All it takes is a simple reply of “you’re wrong.” to
            Claude/ChatGPT/etc. and it will start to crumble on itself and get
            into a loop that hallucinates over and over.
            
             Yeah, it seems to be a terrible approach to try to "correct" the
             context by adding clarifications or telling it what's wrong.
            
            Instead, start from 0 with the same initial prompt you used, but
            improve it so the LLM gets it right in the first response. If it
            still gets it wrong, begin from 0 again. The context seems to be
            "poisoned" really quickly, if you're looking for accuracy in the
            responses. So better to begin from the beginning as soon as it
            veers off course.
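             
             As a rough sketch of that workflow (hypothetical prompts, using
             the openai Python SDK): each retry is a brand-new conversation
             with a revised prompt, rather than a correction appended to an
             already-poisoned context.
             
               from openai import OpenAI
               
               client = OpenAI()  # reads OPENAI_API_KEY from the environment
               
               def fresh_attempt(prompt: str) -> str:
                   # Start from zero every time: a single-message conversation.
                   response = client.chat.completions.create(
                       model="gpt-5",
                       messages=[{"role": "user", "content": prompt}],
                   )
                   return response.choices[0].message.content
               
               # Revise the prompt between attempts instead of replying
               # "you're wrong" in the same thread.
               draft = fresh_attempt("Summarize RFC 9110 briefly.")
               revised = fresh_attempt(
                   "Summarize RFC 9110 briefly. Cite only the RFC itself."
               )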
       
              eru wrote 11 hours 17 min ago:
              You are suggesting a decent way to work around the limitations of
              the current iteration of this technology.
              
              The grand-parent comment was pointing out that this limitation
              exists; not that it can't be worked around.
       
                diggan wrote 2 hours 48 min ago:
                > The grand-parent comment was pointing out that this
                limitation exists
                
                 Sure, I agree with that, but I was replying to a commenter
                 who doesn't seem to use this workflow yet, which is why
                 they're seeing "a loop that hallucinates over and over".
       
            cameldrv wrote 15 hours 48 min ago:
             Yeah, it may be that in previous training data, the model was
             given a strong negative signal when the human trainer told it it
             was wrong. In more subjective domains this might lead to
             sycophancy. If the human is always right and the data is always
             right, but the data can be interpreted multiple ways, like say
             human psychology, the model just adjusts to the opinion of the
             human.
             
             If the question is about harder facts which the human disagrees
             with, this may put it into an essentially self-contradictory
             state, where the locus of possibilities gets squished from each
             direction, and so the model is forced to respond with crazy
             outliers which agree with both the human and the data. The
             probability of an invented reference being true may be very low,
             but from the model's perspective, it may still be one of the
             highest-probability outputs among a set of bad choices.
            
            What it sounds like they may have done is just have the humans tell
            it it's wrong when it isn't, and then award it credit for sticking
            to its guns.
       
              ashdksnndck wrote 15 hours 29 min ago:
              I put in the ChatGPT system prompt to be not sycophantic, be
              honest, and tell me if I am wrong. When I try to correct it, it
              hallucinates more complicated epicycles to explain how it was
              right the first time.
       
          laacz wrote 16 hours 33 min ago:
          I suppose that Simon, being all in with LLMs for quite a while now,
          has developed a good intuition/feeling for framing questions so that
           they produce fewer hallucinations.
       
            simonw wrote 15 hours 59 min ago:
             Yeah I think that's exactly right. I don't ask questions that are
             likely to produce hallucinations (like asking an LLM without
             search access for citations from papers about a topic), so I
             rarely see them.
       
              Davidzheng wrote 10 hours 12 min ago:
               I think if you ask o3 any math question which is beyond its
               ability, it will say something incorrect with almost 100%
               probability somewhere in the output. Similarly, if you ask it
               to use the literature to resolve some question which is not
               obvious, it often hallucinates results that are not in the
               paper.
       
              godelski wrote 14 hours 31 min ago:
               But how would you verify? Are you constantly asking questions
               you already know the answers to? In-depth answers?
               
               Often the hallucinations I see are subtle, though usually
               critical. I see them when generating code, doing my testing, or
               even just writing. There are hallucinations in today's
               announcements, such as the airfoil example [0]. A more obvious
               example: I was asking for help improving the writing of an
               abstract for a paper. I gave it my draft and it inserted new
               numbers and metrics that weren't there. I tried again,
               providing my whole paper. I tried again, making it explicit not
               to add new numbers. I tried the whole process again in new
               sessions and in private sessions. Claude did better than GPT-4
               and o3, but none would do it without follow-ups and a few
               iterations.
               
               Honestly I'm curious what you use them for where you don't see
               hallucinations.
               
               [0] which is a subtle but famous misconception, one that you'll
               even see in textbooks. Hallucination probably caused by
               Bernoulli being in the prompt.
       
                wat10000 wrote 10 hours 30 min ago:
                Is it really a hallucination if it got it from numerous
                examples in the training data?
       
                  godelski wrote 7 hours 27 min ago:
                  Yes. Though an easier to solve hallucination. That is, if you
                  know what to look for, but that's kinda the problem. Truth is
                  complex, lies are simple. More accurately, truth has infinite
                  complexity and the big question is what's "good enough". The
                  answer is a moving target.
       
                simonw wrote 14 hours 8 min ago:
                 When I'm using them for code these days it is usually in a
                 tool that can execute code in a loop - so I don't tend to
                 even spot the hallucinations because the model corrects
                 itself.
                 
                 For factual information I only ever use search-enabled models
                 like o3 or GPT-4.
                 
                 Most of my other use cases involve pasting large volumes of
                 text into the model and having it extract information or
                 manipulate that text in some way.
       
                  rohansood15 wrote 10 hours 1 min ago:
                   On multiple occasions, Claude Code has claimed it completed
                   a task when it actually just wrote mock code. It will also
                   answer questions with certainty (e.g. where is this value
                   being passed?), when in reality it is making it up. So if
                   you haven't been seeing hallucinations on Opus/Sonnet, you
                   probably aren't looking deep enough.
       
                    theshrike79 wrote 7 hours 10 min ago:
                     This is because you haven't given it a tool to verify
                     that the task is done.
                     
                     TDD works pretty well: have it write even the most basic
                     test (or go full artisanal and write it yourself) first,
                     and then ask it to implement the code.
                    
                    I have a standing order in my main CLAUDE.md to "always run
                    `task build` before claiming a task is done". All my
                    projects use Task[0] with pretty standard structure where
                    build always runs lint + test before building the project.
                    
                    With a semi-robust test suite I can be pretty sure nothing
                    major broke if `task build` completes without errors.
                    
                    [0]
                    
   URI              [1]: https://taskfile.dev
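        
                     As a minimal illustration of the test-first step
                     (hypothetical project, module and function names), the
                     "most basic test" can be as small as:
                     
                       # tests/test_slugify.py - written before the
                       # implementation exists; the agent's job is then to
                       # make `task build` (lint + test) pass.
                       from myproject.text import slugify
                       
                       def test_slugify_basic():
                           assert slugify("Hello, World!") == "hello-world"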
       
                  godelski wrote 13 hours 36 min ago:
                  > using them for code
                  
                   I don't think this means no hallucinations (in output). I
                   think it'd be naive to assume that compiling and passing
                   tests means the output is hallucination-free.
                  
                    > For factual information
                  
                  I've used both quite a bit too. While o3 tends to be better,
                  I see hallucinations frequently with both.
                  
                    > Most of my other use cases
                  
                   I guess my question is how you validate the
                   hallucination-free claim.
                   
                   Maybe I'm misinterpreting you? You said "I rarely see
                   them", but I'm assuming you mean more than that, and I
                   think it would be reasonable for anyone to interpret it
                   that way. Are you just claiming that you don't see them, or
                   claiming that they are uncommon? The latter is how I
                   interpreted it.
       
                    simonw wrote 13 hours 33 min ago:
                    I don't understand why code passing tests wouldn't be
                    protection against most forms of hallucinations. In code, a
                    hallucination means an invented function or method that
                    doesn't exist. A test that uses that function or method
                    genuinely does prove that it exists.
                    
                    It might be using it wrong but I'd qualify that as a bug or
                    mistake, not a hallucination.
                    
                    Is it likely we have different ideas of what
                    "hallucination" means?
       
                      ZeroGravitas wrote 4 hours 9 min ago:
                       Haven't you effectively built a system to detect and
                       remove those specific kinds of hallucinations, and to
                       repeat the process once they're detected, before the
                       result is presented to you?
                       
                       So you're not seeing hallucinations in the same way
                       that Van Halen isn't seeing the brown M&Ms: because
                       they've been removed, not because they never existed.
       
                        simonw wrote 2 hours 15 min ago:
                        I think systems integrated with LLMs that help spot and
                        eliminate hallucinations - like code execution loops
                        and search tools - are effective tools for reducing the
                        impact of hallucinations in how I use models.
                        
                        That's part of what I was getting at when I very
                        clumsily said that I rarely experience hallucinations
                        from modern models.
       
                      godelski wrote 11 hours 54 min ago:
                      > tests wouldn't be protection against most forms of
                      hallucinations.
                      
                       Sorry, that's a stronger condition than I intended to
                       communicate. I agree, tests are a good mitigation
                       strategy. We use them for similar reasons. But I'm
                       saying that passing tests is insufficient to conclude
                       the output is hallucination-free.
                       
                       My claim is more along the lines of "passing tests
                       doesn't mean your code is bug-free", which I think we
                       can all agree is a pretty mundane claim?
                      
                        > Is it likely we have different ideas of what
                      "hallucination" means?
                      
                       I agree, I think that's where our divergence is. In
                       that case let's continue over here [0] (linking if
                       others are following). I'll add that I think we're
                       going to run into the problem of what we consider to be
                       in distribution, and I'll state that I think coding is
                       in distribution.
                      
                      [0]
                      
   URI                [1]: https://news.ycombinator.com/item?id=44829891
       
          simonw wrote 16 hours 33 min ago:
          What kind of hallucinations are you seeing?
       
            techpression wrote 15 hours 26 min ago:
             Since I mostly use it for code, made-up function names are the
             most common. And of course just broken code altogether, which
             might not count as a hallucination.
       
              ewoodrich wrote 13 hours 26 min ago:
              I think the type of AI coding being used also has an effect on a
              person's perception of the prevalence of "hallucinations" vs
              other errors.
              
              I usually use an agentic workflow and "hallucination" isn't the
              first word that comes to my mind when a model unloads a pile of
              error-ridden code slop for me to review. Despite it being
              entirely possible that hallucinating a non-existent parameter was
              what originally made it go off the rails and begin the classic
              loop of breaking things more with each attempt to fix it.
              
              Whereas for AI autocomplete/suggestions, an invented method name
              or argument or whatever else clearly jumps out as a
              "hallucination" if you are familiar with what you're working on.
       
            OtherShrezzing wrote 16 hours 21 min ago:
            I rewrote a 4 page document from first to third person a couple of
            weeks back. I gave Claude Sonnet 4 the document after editing, so
            it was entirely written in the third person. I asked it to review &
            highlight places where it was still in the first person.
            
            >Looking through the document, I can identify several instances
            where it's written in the first person:
            
            And it went on to show a series of "they/them" statements. I asked
            it to clarify if "they" is "first person" and it responded
            
            >No, "they" is not first person - it's third person. I made an
            error in my analysis. First person would be: I, we, me, us, our,
            my. Second person would be: you, your. Third person would be: he,
            she, it, they, them, their. Looking back at the document more
            carefully, it appears to be written entirely in third person.
            
            Even the good models are still failing at real-world use cases
            which should be right in their wheelhouse.
       
              simonw wrote 15 hours 48 min ago:
              That doesn't quite fit the definition I use for "hallucination" -
              it's clearly a dumb error, but the model didn't confidently state
              something that's not true (like naming the wrong team who won the
              Super Bowl).
       
                vrighter wrote 2 hours 18 min ago:
                How is "this sentence is in first person" when the sentence is
                actually in third person not a hallucination? In a question
                with a binary answer, this is literally as wrong as it could
                possibly get. You must be doing a lot of mental gymnastics.
       
                  simonw wrote 1 hour 46 min ago:
                  I qualify that as a mistake, not a hallucination - same as I
                  wouldn't call "blueberry has three Bs" a hallucination.
                  
                  My definition of "hallucination" is evidently not nearly as
                  widespread as I had assumed.
                  
                   I ran a Twitter poll about this earlier: [1]
                   
                   All mistakes by models — ~145 votes
                   
                   Fabricated facts — ~1,650 votes
                  
                  Nonsensical output — ~145 votes
                  
                  So 85% of people agreed with my preferred "fabricated facts"
                  one (that's the best I could fit into the Twitter poll option
                  character limit) but that means 15% had another definition in
                  mind.
                  
                  And sure, you could argue that "this sentence is in first
                  person" also qualifies as a "fabricated fact" here.
                  
   URI            [1]: https://twitter.com/simonw/status/195356557193482678...
       
                    simonw wrote 1 hour 19 min ago:
                    I'm now running a follow-up poll on whether or not "there
                    are 3 Bs in blueberry" should count as a hallucination and
                    the early numbers are much closer - currently 41% say it
                    is, 59% say it isn't.
                    
   URI              [1]: https://twitter.com/simonw/status/1953777495309746...
       
                godelski wrote 14 hours 10 min ago:
                I think it qualifies as a hallucination. What's your
                definition? I'm a researcher too and as far as I'm aware the
                definition has always been pretty broad and applied to many
                forms of mistakes. (It was always muddy but definitely got more
                muddy when adopted by NLP)
                
                It's hard to know why it made the error but isn't it caused by
                inaccurate "world" modeling? ("World" being English language)
                Is it not making some hallucination about the English language
                while interpreting the prompt or document?
                
                I'm having a hard time trying to think of a context where
                "they" would even be first person. I can't find any search
                results though Google's AI says it can. It provided two links,
                the first being a Quora result saying people don't do this but
                framed it as it's not impossible, just unheard of. Second
                result just talks about singular you. Both of these I'd
                consider hallucinations too as the answer isn't supported by
                the links.
       
                  simonw wrote 13 hours 37 min ago:
                  My personal definition of hallucination (which I thought was
                  widespread) is when a model states a fact about the world
                  that is entirely made up - "the James Webb telescope took the
                  first photograph of an exoplanet" for example.
                  
                  I just got pointed to this new paper: [1] - "A comprehensive
                  taxonomy of hallucinations in Large Language Models" - which
                  has a definition in the introduction which matches my mental
                  model:
                  
                  "This phenomenon describes the generation of content that,
                  while often plausible and coherent, is factually incorrect,
                  inconsistent, or entirely fabricated."
                  
                   The paper then follows up with a formal definition:
                  
                  "inconsistency between a computable LLM, denoted as h, and a
                  computable ground truth function, f"
                  
   URI            [1]: https://arxiv.org/abs/2508.01781
       
                    godelski wrote 12 hours 2 min ago:
                    Google (the company, not the search engine) says[0]
                    
                      | AI hallucinations are incorrect or misleading results
                    that AI models generate.
                    
                    It goes on further to give examples and I think this is
                    clearly a false positive result.
                    
                      > this new paper
                    
                    I think the error would have no problem fitting under
                    "Contextual inconsistencies" (4.2), "Instruction
                    inconsistencies/deviation" (4.3), or "Logical
                    inconsistencies" (4.4). I think it supports a pretty broad
                    definition. I think it also fits under other categories
                    defined in section 4.
                    
                      > then follows up with a formal definition
                    
                    Is this not a computable ground truth?
                    
                       | an LLM h is considered to be ”hallucinating” with
                     respect to a ground truth function f if, across all
                     training stages i (meaning, after being trained on any
                     finite number of samples), there exists at least one
                     input string s for which the LLM’s output h[i](s) does
                     not match the correct output f(s) [100]. This condition
                     is formally expressed as ∀i ∈ N, ∃s ∈ S such that
                     h[i](s) ≠ f(s).
                    
                     I think yes, this is an example of such an "i", and I
                     would go so far as to say that this is a pretty broad
                     definition. It considers the model to be hallucinating if
                     it makes something up that it was trained on (as opposed
                     to something it wasn't trained on). I'm pretty confident
                     the LLMs ingested a lot of English grammar books, so I
                     think it is fair to say that this was in the training.
                    
                    [0]
                    
   URI              [1]: https://cloud.google.com/discover/what-are-ai-hall...
       
                OtherShrezzing wrote 15 hours 19 min ago:
                >"They claim impressive reductions in hallucinations. In my own
                usage I’ve not spotted a single hallucination yet, but
                that’s been true for me for Claude 4 and o3 recently as
                well—hallucination is so much less of a problem with this
                year’s models."
                
                Could you give an estimate of how many "dumb errors" you've
                encountered, as opposed to hallucinations? I think many of your
                readers might read "hallucination" and assume you mean
                "hallucinations and dumb errors".
       
                  simonw wrote 13 hours 36 min ago:
                  I mention one dumb error in my post itself - the table
                  sorting mistake.
                  
                  I haven't been keeping a formal count of them, but dumb
                  errors from LLMs remain pretty common. I spot them and either
                  correct them myself or nudge the LLM to do it, if that's
                  feasible. I see that as a regular part of working with these
                  systems.
       
                    OtherShrezzing wrote 6 hours 25 min ago:
                    That makes sense, and I think your definition on
                    hallucinations is a technically correct one. Going forward,
                    I think your readers might appreciate you tracking "dumb
                    errors" alongside (but separate from) hallucinations.
                    They're a regular part of working with these systems, but
                    they take up some cognitive load on the part of the user,
                    so it's useful to know if that load will rise, fall, or
                    stay consistent with a new model release.
       
                  jmull wrote 15 hours 10 min ago:
                  That's a good way to put it.
                  
                  As a user, when the model tells me things that are flat out
                  wrong, it doesn't really matter whether it would be
                  categorized as a hallucination or a dumb error. From my
                  perspective, those mean the same thing.
       
        justusthane wrote 17 hours 11 min ago:
        > a real-time router that quickly decides which model to use based on
        conversation type, complexity, tool needs, and explicit intent
        
        This is sort of interesting to me. It strikes me that so far we've had
        more or less direct access to the underlying model (apart from the
        system prompt and guardrails), but I wonder if going forward there's
        going to be more and more infrastructure between us and the model.
       
          ItsHarper wrote 10 hours 6 min ago:
          That only applies to ChatGPT. The API has direct access to specific
          models.
       
          hirako2000 wrote 16 hours 48 min ago:
          Consider it a low level routing. Keeping in mind it allows the other
          non active parts to not be in memory. Mistral afaik came up with this
          concept, quite a while back.
       
            ItsHarper wrote 10 hours 5 min ago:
            It's actually just a high-level routing between the reasoning and
            non-reasoning models that only applies to ChatGPT.
       
        ilaksh wrote 17 hours 25 min ago:
        This is key info from the article for me:
        
        > -------------------------------
        
        "reasoning": {"summary": "auto"}
          }'
        
        Here’s the response from that API 
        call. [1] Without that option the API will often provide a lengthy
        delay while the model burns through thinking tokens until you start
        getting back visible tokens for the final response.
        
   URI  [1]: https://gist.github.com/simonw/1d1013ba059af76461153722005a039...
       
        morleytj wrote 17 hours 31 min ago:
         It's cool and I'm glad it sounds like it's getting more reliable, but
         given the types of things people have been saying GPT-5 would be for
         the last two years, you'd expect GPT-5 to be a world-shattering
         release rather than an incremental, stable improvement.
         
         It does sort of give me the vibe that pure scaling maximalism really
         is dying off, though. If the approach is now writing better routers,
         tooling, and combining specialized submodels on tasks, then it feels
         like there's a search for new ways to improve performance (and lower
         cost), suggesting the other established approaches weren't working. I
         could totally be wrong, but I feel like if just throwing more compute
         at the problem were working, OpenAI probably wouldn't be spending
         much time optimizing the user routing across currently existing
         strategies to get marginal improvements on average user interactions.
        
        I've been pretty negative on the thesis of only needing more
        data/compute to achieve AGI with current techniques though, so perhaps
        I'm overly biased against it. If there's one thing that bothers me in
        general about the situation though, it's that it feels like we really
        have no clue what the actual status of these models is because of how
        closed off all the industry labs have become + the feeling of not being
        able to expect anything other than marketing language from the
        presentations. I suppose that's inevitable with the massive investments
        though. Maybe they've got some massive earthshattering model release
        coming out next, who knows.
       
          jillesvangurp wrote 11 min ago:
          Year on year, progress is indeed a bit incremental. But seen over a
          five year period the progress is actually stupidly amazing.
          
          In practical terms, Gpt 5 is a nice upgrade over most other models.
           We'll no doubt get lots of subjective reports of how it was wrong or
          right or worse than some other model for some chats. But my personal
          (subjective) experience so far is that it just made it possible for
          me to use codex on more serious projects. It still gets plenty of
          things wrong. But that's more because of a lack of context than
          hallucination issues. Context fixes are a lot easier than model
          improvements. But last week I didn't bother and now I'm getting
          decent results.
          
          I don't really care what version number they slap on things. That is
          indeed just marketing. And competition is quite fierce so I can
          understand why they are overselling what could have been just chat
          gpt 4.2 or whatever.
          
           Also, discussions about AGI tend to bore me as they seem to
           escalate into low-quality philosophical debates, with lots of
           amateurs rehashing ancient arguments poorly. There aren't a hell of
           a lot of new arguments that people come up with at this point.
          
           IMHO we don't actually need an AGI to bootstrap the singularity. We
           just need AIs good enough to come up with algorithmic
           optimizations, breakthroughs and improvements at a steady pace.
           We're getting quite
          close to that and I wouldn't be surprised to learn that OpenAI's
          people are already eating their own dogfood in liberal quantities.
          It's not necessary for AIs to be conscious in order to come up with
          the improvements that might eventually enable such a thing. I expect
          the singularity might be more of a phase than a moment. And if you
          know your boiling frog analogy, we might be smack down in the middle
          of that already and just not realize it.
          
          Five years ago, it was all very theoretical. And now I'm waiting for
          codex to wrap up a few pull requests that would have distracted me
          for a week each five years ago. It's taking too long and I'm
          procrastinating my gained productivity away on HN. But what else is
          new ;-).
       
          eloisant wrote 2 hours 25 min ago:
          It reminds me of the latest, most advanced steam locomotives from the
          beginning of the 20th century.
          
           They became extremely complex and sophisticated machines to squeeze
           a few more percent of efficiency compared to earlier models. Then
           diesel, and eventually electric, locomotives arrived: much better
           and also much simpler than those late steam monsters.
          
          I feel like that's where we are with LLM: extremely smart engineering
          to marginally improve quality, while increasing cost and complexity
          greatly. At some point we'll need a different approach if we want a
          world-shattering release.
       
            anshumankmr wrote 2 hours 12 min ago:
             Just to add to this: if we take F1 cars as a way to measure the
             cutting edge of car development, we can see cars haven't become
             insanely faster than they were 10-20 years ago, just more
             "efficient", reliable and definitely safer, along with quirks
             like DRS. Of course shaving a second or two off lap times is
             notable, but it's not an insane delta like, say, comparing a car
             from a post-2000s GP to one from a 1950s GP.
             
             I feel that after a while we will have specialized LLMs great for
             one particular task, with cut-off updates, 0.something better
             than the SOTA on some benchmark, and, as compute gets better,
             cheaper to run at scale.
       
              Anonyneko wrote 1 hour 21 min ago:
              To be fair, the speed of F1 cars is mostly limited by regulations
              that are meant to make the sport more competitive and
              entertaining. With fewer restrictions on engines and aerodynamics
              we could have had much faster cars within a year.
              
              But even setting safety issues aside, the insane aero wash would
              make it nearly impossible to follow another car, let alone
              overtake it, hence the restrictions and the big "rule resets"
              every few years that slow down the cars, compensating for all of
              the tricks the teams have found over that time.
              
              (I agree with the general thoughts on the state of LLMs though,
              just a bit too much into open-wheel cars going vroom vroom in
              circles for two hours at a time)
       
          dotancohen wrote 6 hours 2 min ago:
          > but given the types of things people have been saying GPT-5 would
          be for the last two years
          
          This is why you listen to official announcements, not "people".
       
          maoberlehner wrote 8 hours 43 min ago:
          I mostly use Gemini 2.5 Pro. I have a “you are my editor” prompt
          asking it to proofread my texts. Recently it pointed out two typos in
          two different words that just weren’t there. Indeed, the two words
          each had a typo but not the one pointed out by Gemini.
          
          The real typos were random missing letters. But the typos Gemini
          hallucinated were ones that are very common typos made in those
          words.
          
           The only thing transformer-based LLMs can ever do is _fake_
           intelligence.
          
          Which for many tasks is good enough. Even in my example above, the
          corrected text was flawless.
          
          But for a whole category of tasks, LLMs without oversight will never
          be good enough because there simply is no real intelligence in them.
       
            awestroke wrote 49 min ago:
            I'll show you a few misspelled words and you tell me (without using
             any tools or thinking it through) which bits in the UTF-8-encoded
            bytes are incorrect. If you're wrong, I'll conclude you are not
            intelligent.
            
            LLMs don't see letters, they see tokens. This is a foundational
            attribute of LLMs. When you point out that the LLM does not know
            the number of R's in the word "Strawberry", you are not exposing
            the LLM as some kind of sham, you're just admitting to being a
            fool.
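
             For anyone curious what the model actually "sees", here's a rough
             sketch using the tiktoken library (assuming the o200k_base
             encoding; the exact splits depend on the tokenizer):
             
               import tiktoken
               
               enc = tiktoken.get_encoding("o200k_base")
               
               for word in ["strawberry", " strawberry", "Strawberry"]:
                   token_ids = enc.encode(word)
                   pieces = [enc.decode([t]) for t in token_ids]
                   # The model is fed the token IDs, not individual letters.
                   print(f"{word!r} -> {token_ids} -> {pieces}")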
       
              hnfong wrote 35 min ago:
              Being confused as to how LLMs see tokens is just a factual error.
              
               I think the more concerning error GP makes is drawing
               deductions about the fundamental nature of the intelligence
               of LLMs by looking at "bugs" in current iterations of LLMs.
               It's like looking at a child struggling to learn how to
               spell, and making
              broad claims like "look at the mistakes this child made, humans
              will never attain any __real__ intelligence!"
              
              So yeah at this point I'm often pessimistic whether humans have
              "real" intelligence or not. Pretty sure LLMs can spot the logical
              mistakes in his claims easily.
       
            morganf wrote 52 min ago:
            One of my main uses for LLMs is copy editing and it is incredible
            to me how terrible all of them are at that.
       
            shmel wrote 2 hours 21 min ago:
             Do you really think that an architecture that struggles to count
             the r's in "strawberry" is a good choice for proofreading? It
             perceives words very differently from us.
       
              simonw wrote 54 min ago:
              Counting letters in words and identifying when words are
              misspelled are two different tasks - it can be good at one and
              bad at the other.
              
              Interestingly, spell checking is something models have been
              surprisingly bad at in the past - I remember being shocked at how
              bad Claude 3 was at spotting typos.
              
              This has changed with Claude 4 and o3 from what I've seen -
              another example of incremental model improvements swinging over a
              line in terms of things they can now be useful for.
       
              maoberlehner wrote 1 hour 13 min ago:
               Yes, actually I think it works really well for me, considering
               that I’m not a native speaker and one thing I’m after is
               correcting technically correct but non-idiomatic wording.
       
            butler14 wrote 6 hours 9 min ago:
            I had this too last week. It pointed out two errors that simply
            weren’t there. Then completely refused to back down and doubled
            down on its own certainty, until I sent it a screenshot of the
            original prompt. Kind of funny.
       
              BartjeD wrote 3 hours 22 min ago:
               It's hilarious to me that you paid for that hahaha haha
       
          mwigdahl wrote 8 hours 59 min ago:
           It seems like few people are referencing the improvements in
           reliability and the reductions in deception. If the benchmarks
           given generalize, what
          OpenAI has in GPT-5 is a cheap, powerful, _reliable_ model -- the
          perfect engine to generate high quality synthetic data to punch
          through the training data bottleneck.
          
          I'd expect that at some level of reliability this could lead to a
          self-improvement cycle, similar to how a powerful enough model (the
          Claude 4 models in Claude Code) enables iteratively converging on a
          solution to a problem even if it can't one-shot it.
          
          No idea if we're at that point yet, but it seems a natural use for a
          model with these characteristics.
       
          danenania wrote 9 hours 55 min ago:
          Isn’t reasoning, aka test-time compute, ultimately just another
          form of scaling? Yes it happens at a different stage, but the
          equation is still 'scale total compute > more intelligence'. In that
          sense, combining their biggest pre-trained models with their best
          reasoning strategies from RL could be the most impactful scaling
          lever available to them at the moment.
       
          fastball wrote 10 hours 47 min ago:
          My reading is more that unit economics are starting to catch up with
          the frontier labs, rather than "scaling maximalism is dying". Maybe
          that is the same thing.
       
            ch4s3 wrote 10 hours 25 min ago:
            My loosely held belief is that it is the same thing, but I’m open
            to being proven wrong.
       
          outside1234 wrote 13 hours 9 min ago:
          The next step will be for OpenAI to number their releases based on
          year (ala what Windows did once innovation ran out)
       
            eru wrote 11 hours 19 min ago:
            Windows 95 was a big step from the previous release, wasn't it?
            
            And later, Windows reverted to version numbers; but I'm not sure
            they regained lots of innovation?
       
          og_kalu wrote 14 hours 1 min ago:
          >you'd expect GPT-5 to be a world-shattering release rather than
          incremental and stable improvement.
          
          Compared to the GPT-4 release which was a little over 2 years ago
          (less than the gap between 3 and 4), it is. The only difference is we
          now have multiple organizations releasing state of the art models
           every few months. Even if models are improving at the same rate,
           those same big jumps every handful of months were never
           realistic.
          
          It's an incremental stable improvement over o3, which was released
          what? 4 months ago.
       
            morleytj wrote 13 hours 41 min ago:
            The benchmarks certainly seem to be improving from the
            presentation. I don't think they started training this 4 months ago
            though.
            
            There's gains, but the question is, how much investment for that
            gain? How sustainable is that investment to gain ratio? The things
            I'm curious about here are more about the amount of effort being
            put into this level of improvement, rather than the time.
       
          godelski wrote 15 hours 12 min ago:
          > It does sort of give me the vibe that the pure scaling maximalism
          really is dying off though
          
          I think the big question is if/when investors will start giving money
          to those who have been predicting this (with evidence) and trying
          other avenues.
          
           Really though, why put all your eggs in one basket? That's what
           I've been confused about for a while. Why fund yet another
           LLMs-to-AGI startup? The space is saturated with big players and
           has been for years. Even if LLMs could get there, that doesn't mean
           something else won't get there faster and for less. It also seems
           you'd want a backup in order to avoid popping the bubble.
           Technology S-curves and all that still apply to AI.
           
           Though I'm similarly biased, and so is everyone I know with a
           strong math and/or science background (I even mentioned it in my
           thesis more than a few times lol). "Scaling is all you need" just
           doesn't check out.
       
            csomar wrote 10 hours 25 min ago:
            The current money made its money following the market. They do not
            have the capacity for innovation or risk taking.
       
            l33tman wrote 12 hours 18 min ago:
             I started such an alternative project just before GPT-3 was
             released. It was really promising (lots of neuroscience-inspired
             solutions, pretty different from Transformers), but I had to put
             it on
            hold because the investors I approached seemed like they would only
            invest in LLM-stuff. Now a few years later I'm trying to approach
            investors again, only to find now they want to invest in companies
            USING LLMs to create value and still don't seem interested in new
            foundational types of models... :/
            
            I guess it makes sense, there is still tons of value to be created
             just by using the current LLMs for stuff, though maybe the
             low-hanging fruit is already picked, who knows.
            
            I heard John Carmack talk a lot about his alternative (also
            neuroscience-inspired) ideas and it sounded just like my project,
            the main difference being that he's able to self-fund :) I guess
            funding an "outsider" non-LLM AI project now requires finding
            someone like Carmack to get on board - I still don't think
            traditional investors are that disappointed yet that they want to
            risk money on other types of projects..
       
              godelski wrote 11 hours 41 min ago:
              > I guess funding an "outsider" non-LLM AI project now requires
              finding someone like Carmack to get on board
              
               And I think this is a big problem, especially since these
               investments tend to be a lot cheaper than the existing ones.
               Hell, there's stuff in my PhD I tabled, and several models I
               made whose performance I'm confident I could have doubled with
               less than a million dollars' worth of compute. My methods could
               already compete while requiring less compute, so why not give
               them a chance to scale? I've seen this happen to hundreds of
               methods. If "scale is all you need", then shouldn't the belief
               be that any of those methods would also scale?
       
            eru wrote 13 hours 7 min ago:
            > Really though, why put all your eggs in one basket? That's what
            I've been confused about for awhile. Why fund yet another LLMs to
            AGI startup.
            
            Funding multiple startups means _not_ putting your eggs in one
            basket, doesn't it?
            
            Btw, do we have any indication that eg OpenAI is restricting
            themselves to LLMs?
       
              godelski wrote 11 hours 38 min ago:
              > Funding multiple startups means _not_ putting your eggs in one
              basket, doesn't it?
              
              Different basket hierarchy.
              
               Also, yes. They state this, and given that there are plenty of
               open-source models that are LLMs and get competitive
               performance, it at least indicates that anyone not doing LLMs
               is doing so in secret.

               If OpenAI isn't using LLMs, then doesn't that support my argument?
       
            og_kalu wrote 13 hours 17 min ago:
             >Really though, why put all your eggs in one basket? That's what
             I've been confused about for a while.
            
            I mean that's easy lol. People don't like to invest in thin air,
            which is what you get when you look at non-LLM alternatives to
            General Intelligence.
            
             This isn't meant as a jab or snide remark or anything like that.
             There's literally nothing else that will get you GPT-2-level
             performance, never mind an IMO gold medalist. Invest in what else
             exactly? People are putting their eggs in one basket because it's
             the only basket that exists.
            
            >I think the big question is if/when investors will start giving
            money to those who have been predicting this (with evidence) and
            trying other avenues.
            
             Because those people have still not been proven right. Does "It's
             an incremental improvement over the model we released a few months
             ago, and it blows away the model we released 2 years ago" really
             scream "See, those people were wrong all along!" to you?
       
              godelski wrote 12 hours 21 min ago:
              > which is what you get when you look at non-LLM alternatives to
              General Intelligence.
              
               I disagree with this. There are good ideas that are worth
               pursuing. I'll give you that few, if any, have been shown to
               work at scale, but I'd say that's a self-fulfilling prophecy. If
               your bar is that they have to be proven at scale, then your bar
               is that to get investment you'd have to have enough money to not
               need investment. How do you compete if you're never given the
               opportunity to compete? You could be the greatest quarterback in
               the world, but if no one will let you play in the NFL, how can
               you prove that?
              
              On the other hand, investing in these alternatives is a lot
              cheaper, since you can work your way to scale and see what fails
              along the way. This is more like letting people try their stuff
              out in lower leagues. The problem is there's no ladder to climb
              after a certain point. If you can't fly then how do you get
              higher?
              
                > Invest in what else exactly? ... it's the only basket that
              exists.
              
              I assume you don't work in ML research? I mean that's okay but
              I'd suspect that this claim would come from someone not on the
              inside. Though tbf, there's a lot of ML research that is higher
              level and not working on alternative architectures. I guess the
              two most well known are Mamba and Flows. I think those would be
              known by the general HN crowd. While I think neither will get us
              to AGI I think both have advantages that shouldn't be ignored.
               Hell, even scaling a very naive Normalizing Flow (related to Flow
               Matching) has been shown to compete with and beat top diffusion
               models[0,1]. The architectures aren't super novel here, but they
               do represent the first time an NF was trained above 200M params.
               That's a laughable number by today's standards. I can even tell
               you from experience that there's a self-fulfilling filtering for
               this kind of stuff: having submitted work in this domain, I'm
               always asked to compare with models >10x my size. Even if I beat
               them on some datasets, people will still point to the larger
               model as if that's a fair comparison (as if a benchmark is all
               that matters and doesn't need to be contextualized).
              
                > Because those people have still not been proven right.
              
              You're right. But here's the thing. *NO ONE HAS BEEN PROVEN
              RIGHT*. That condition will not exist until we get AGI.
              
                > scream, "See!, those people were wrong all along!" to you ?
              
              Let me ask you this. Suppose people are saying "x is wrong, I
              think we should do y instead" but you don't get funding because x
              is currently leading. Then a few years later y is proven to be
              the better way of doing things, everything shifts that way. Do
              you think the people who said y was right get funding or do you
              think people who were doing x but then just switched to y after
              the fact get funding? We have a lot of history to tell us the
              most common answer...
              
               [0]: [1]   [1]: [2]
              
   URI        [1]: https://arxiv.org/abs/2412.06329
   URI        [2]: https://arxiv.org/abs/2506.06276
       
                og_kalu wrote 11 hours 26 min ago:
                 >I disagree with this. There are good ideas that are worth
                 pursuing. I'll give you that few, if any, have been shown to
                work at scale but I'd say that's a self-fulfilling prophecy. If
                your bar is that they have to be proven at scale then your bar
                is that to get investment you'd have to have enough money to
                not need investment. How do you compete if you're never given
                the opportunity to compete? You could be the greatest
                quarterback in the world but if no one will let you play in the
                NFL then how can you prove that?
                On the other hand, investing in these alternatives is a lot
                cheaper, since you can work your way to scale and see what
                fails along the way. This is more like letting people try their
                stuff out in lower leagues. The problem is there's no ladder to
                climb after a certain point. If you can't fly then how do you
                get higher?
                
                I mean this is why I moved the bar down from state of the art.
                
                 I'm not saying there are no good ideas. I'm saying none of them
                 have yet shown enough promise to be called another basket in
                 its own right. OpenAI did it first because they really
                 believed in scaling, but anyone (well, not literally, but you
                 get what I mean) could have trained GPT-2. You didn't need some
                 great investment, even then. It's that level of promise I'm
                 saying doesn't even exist yet.
                
                >I guess the two most well known are Mamba and Flows.
                
                 I mean, Mamba is an LLM? In my opinion, it's the same basket.
                 I'm not saying it has to be a transformer or that you can't
                 look for ways to improve the architecture. It's not like
                 OpenAI or DeepMind aren't pursuing such things. Some of the
                 most promising tweaks/improvements - Byte Latent Transformer,
                 Titans, etc. - are from those top labs.
                
                Flows research is intriguing but it's not another basket in the
                sense that it's not an alternative to the 'AGI' these people
                are trying to build.
                
                > Let me ask you this. Suppose people are saying "x is wrong, I
                think we should do y instead" but you don't get funding because
                x is currently leading. Then a few years later y is proven to
                be the better way of doing things, everything shifts that way.
                Do you think the people who said y was right get funding or do
                you think people who were doing x but then just switched to y
                after the fact get funding? We have a lot of history to tell us
                the most common answer...
                
                 The funding will go to players positioned to take advantage. If
                 x was leading for years, then there was merit in doing it, even
                 if a better approach came along. Think about it this way:
                 OpenAI now has 700M weekly active users for ChatGPT and
                 millions of API devs. If this superior y suddenly came along
                 and materialized and they assured you they were pivoting, why
                 wouldn't you invest in them over players starting from 0, even
                 if those players championed y in the first place? They're
                 better positioned to give you a better return on your money.
                 Of course, you can just invest in both.
                
                 OpenAI didn't get nearly a billion weekly active users off the
                 promise of future technology. They got it with products that
                exist here and now. Even if there's some wall, this is clearly
                a road with a lot of merit. The value they've already generated
                (a whole lot) won't disappear if LLMs don't reach the heights
                some people are hoping they will.
                
                If you want people to invest in y instead then x has to stall
                or y has to show enough promise. It didn't take transformers
                many years to embed themselves everywhere because they showed a
                great deal of promise right from the beginning.
                It shouldn't be surprising if people aren't rushing to put
                money in y when neither has happened yet.
       
                  godelski wrote 10 hours 45 min ago:
                  > I'm saying none of them have yet shown enough promise to be
                  called another basket in it's own right.
                  
                  Can you clarify what this threshold is?
                  
                  I know that's one sentence, but I think it is the most
                  important one in my reply. It is really what everything else
                  comes down to. There's a lot of room between even academic
                  scale and industry scale. There's very few things with papers
                  in the middle.
                  
                    > I mean, Mamba is a LLM
                  
                   Sure, I'll buy that. LLM doesn't mean transformer. I could
                   have been clearer, but I think it follows from context that
                   literally any architecture counts as an LLM if it is
                   large and models language. Which I'm fine to work with.
                  
                  Though with that, I'd still disagree that LLMs will get us to
                  AGI. I think the whole world is agreeing too as we're moving
                  into multimodal models (sometimes called MMLMs) and so I
                  guess let's use that terminology.
                  
                  To be more precise, let's say "I think there are better
                  architectures out there than ones dominated by Transformer
                  Encoders". It's a lot more cumbersome but I don't want to say
                  transformers or attention can't be used anywhere in the model
                  or we'll end up having to play this same game. Let's just
                  work with "an architecture that is different than what we
                  usually see in existing LLMs". That work?
                  
                    > The funding will go to players positioned to take
                  advantage.
                  
                  I wouldn't put your argument this way. As I understand it,
                  your argument is about timing. I agree with most of what you
                  said tbh.
                  
                   To be clear, my argument isn't "don't put all your money in
                   the 'LLM' basket, put it in this other basket"; my argument
                   is "diversify" and "diversification means investing at many
                   levels of research." To clarify that latter part, I really
                   like the NASA TRL scale[0]. It's wrong to make a distinction
                  between "engineering vs research" and better to see it as a
                  continuum. I agree, most money should be put into higher
                   levels, but I'd be remiss if I didn't point out that we're
                   living in a time where a large number of people (including
                   these companies) are arguing that we should not be funding
                   TRL 1-3, and if we're being honest, I'm talking about stuff
                   currently in TRL 3-5. I mean, it is a good argument to make if
                  you want to maintain dominance, but it is not a good argument
                  if you want to continue progress (which I think is what leads
                  to maintaining dominance as long as that dominance isn't
                  through monopoly or over centralization). Yes, most of the
                  lower level stuff fails. But luckily the lower level stuff is
                   much cheaper to fund. A mathematician's salary and a
                   chalkboard cost at most half as much as a software dev (and
                   the gap is probably closer to an order of magnitude if we're
                   considering the full cost of hiring either of them).
                  
                  But I think that returns us to the main point: what is that
                  threshold?
                  
                  My argument is simply "there should be no threshold, it
                  should be continuous". I'm not arguing for a uniform
                  distribution either, I explicitly said more to higher TRLs.
                   I'm arguing that if you want to build a house, you shouldn't
                   ignore the foundation. And the fancier the house, the more
                   you should care about the foundation, lest you risk it all
                   falling down.
                  
                  [0]
                  
   URI            [1]: https://www.nasa.gov/directorates/somd/space-communi...
       
                    og_kalu wrote 9 hours 40 min ago:
                    >Can you clarify what this threshold is?
                    I know that's one sentence, but I think it is the most
                    important one in my reply. It is really what everything
                    else comes down to. There's a lot of room between even
                    academic scale and industry scale. There's very few things
                    with papers in the middle.
                    
                     Something like GPT-2. Something that, even before being
                     actually useful or particularly coherent, was interesting
                     enough to spark articles like this one [1]. So far, only
                     LLM/LLM-adjacent stuff fulfils this criterion.
                    
                    To be clear, I'm not saying general R&D must meet this
                    requirement. Not at all. But if you're arguing about
                    diverting millions/billions in funds from x that is working
                    to y then it has to at least clear that bar.
                    
                    > My argument is simply "there should be no threshold, it
                    should be continuous".
                    
                     I don't think this is feasible for large investments. I may
                     be wrong, but I also don't think other avenues aren't being
                     funded. They just don't compare in scale because... well,
                     they haven't really done anything to justify such scale
                     yet.
                    
   URI              [1]: https://slatestarcodex.com/2019/02/19/gpt-2-as-ste...
       
                      godelski wrote 6 hours 39 min ago:
                      > Something like GPT-2
                      
                      I got 2 things to say here
                      
                       1) There are plenty of things that can achieve similar
                       performance to GPT-2 these days. We mentioned Mamba; they
                       compared to GPT-3 in their first paper[0]. They compare
                       with the open-sourced reproductions (the GPT-Neo and
                       GPT-J models), and you'll also see some other
                       architectures referenced there, like Hyena and H3.
                       Remember, GPT-3 is pretty much just a scaled-up GPT-2.
                      
                      2) I think you are underestimating the costs to train
                      some of these things. I know Karpathy said you can now
                      train GPT-2 for like $1k[1] but a single training run is
                      a small portion of the total costs. I'll reference
                      StyleGAN3 here just because the paper has good
                      documentation on the very last page[2]. Check out the
                      breakdown but there's a few things I want to specifically
                      point out. The whole project cost 92 V100 years but the
                      results of the paper only accounted for 5 of those.
                      That's 53 of the 1876 training runs. Your $1k doesn't get
                      you nearly as far as you'd think. If we simplify things
                      and say everything in that 5 V100 years cost $1k then
                      that means they spent $85k before that. They spent $18k
                      before they even went ahead with that project. If you
                      want realistic numbers, multiply that by 5 because that's
                      roughly what a V100 will run you (discounted for scale).
                      ~$110k ain't too bad, but that is outside the budget of
                      most small labs (including most of academia). And
                      remember, that's just the cost of the GPUs, that doesn't
                      pay for any of the people running that stuff.
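
                       To make that arithmetic concrete, here's a small sketch
                       (the per-V100-year rates are the rough assumptions
                       above: the optimistic $1k-for-5-V100-years baseline, and
                       that same rate times the x5 market factor; neither is
                       official pricing):

# Convert the V100-year figures above into rough dollar costs.
# $200/V100-year matches the optimistic "$1k for the 5 V100-years of
# published results"; $1,000/V100-year is that rate times the x5
# factor suggested above. Both are rough assumptions, not quotes.
TOTAL_V100_YEARS = 92          # whole StyleGAN3 project
PAPER_RESULTS_V100_YEARS = 5   # compute behind the published results

for cost_per_v100_year in (200, 1_000):
    paper_cost = PAPER_RESULTS_V100_YEARS * cost_per_v100_year
    project_cost = TOTAL_V100_YEARS * cost_per_v100_year
    print(f"at ${cost_per_v100_year}/V100-year: results ~${paper_cost:,}, "
          f"whole project ~${project_cost:,}")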
                      
                      I don't expect you to know any of this stuff if you're
                      not a researcher. Why would you? It's hard enough to keep
                      up with the general AI trends, let alone niche topics
                      lol. It's not an intelligence problem, it's a logistics
                      problem, right? A researcher's day job is being in those
                       weeds. You just get a lot more hours in the space. I mean,
                       I'm pretty out of touch with plenty of domains just
                       because of time constraints.
                      
                        > I don't think this is feasible for large investments.
                       I may be wrong, but I also don't think other avenues
                      aren't being funded.
                      
                      So I'm trying to say, I think your bar has been met.
                      
                      And I think if we are actually looking at the numbers,
                      yeah, I do not think these avenues are being funded. But
                       don't take it from me, take it from Fei-Fei Li[3]:
                      
                        | Not a single university today can train a ChatGPT
                      model
                      
                      I'm not sure if you're a researcher or not, you haven't
                      answered that question. But I think if you were you'd be
                      aware of this issue because you'd be living with it. If
                      you were a PhD student you would see the massive
                      imbalance of GPU resources given to those working closely
                      with big tech vs those trying to do things on their own.
                       If you were a researcher you'd also know that even inside
                       those companies there aren't many resources given to
                       people to do these things. You get them on occasion, like
                      the StarFlow and TarFlow I pointed out before, but these
                      tend to be pretty sporadic. Even a big reason we talk
                      about Mamba is because of how much they spent on it.
                      
                      But if you aren't a researcher I'd ask why you have such
                      confidence that these things are being funded and that
                      these things cannot be scaled or improved[4]. History is
                      riddled with examples of inferior tech winning mostly due
                      to marketing. I know we get hyped around new tech, hell,
                      that's why I'm a researcher. But isn't that hype a reason
                      we should try to address this fundamental problem?
                      Because the hype is about the advance of technology,
                      right? I really don't think it is about the advancement
                      of a specific team, so if we have the opportunity for
                      greater and faster advancement, isn't that something we
                      should encourage? Because I don't understand why you're
                      arguing against that. An exciting thing of working at the
                      bleeding edge is seeing all the possibilities. But a
                      disheartening thing about working at the bleeding edge is
                      seeing many promising avenues be passed by for things
                      like funding and publicity. Do we want meritocracy to win
                      out or the dollar?
                      
                      I guess you'll have to ask yourself: what's driving your
                      excitement?
                      
                       [0] I mean the first Mamba paper, not the first SSM paper
                       btw: [1]
                       [1] Karpathy's GPT-2 training cost estimate: [2]
                       [2] The StyleGAN3 paper (compute breakdown on the last
                       page): [3]
                       [3] Fei-Fei Li on university compute: [4]
                       [4] I'm not saying any of this stuff is straight de facto
                       better. But there definitely is an attention imbalance
                       and you have to compare like to like. If you get to x in
                       1000 man-hours and someone else gets there in 100, it may
                       be worth taking a deeper look. That's all.
                      
   URI                [1]: https://arxiv.org/abs/2312.00752
   URI                [2]: https://github.com/karpathy/llm.c/discussions/67...
   URI                [3]: https://arxiv.org/abs/2106.12423
   URI                [4]: https://www.ft.com/content/d5f91c27-3be8-454a-be...
       
            morleytj wrote 14 hours 14 min ago:
            I'm pretty curious about the same thing.
            
            I think a somewhat comparable situation is in various online game
            platforms now that I think about it. Investors would love to make a
            game like Fortnite, and get the profits that Fortnite makes. So a
            ton of companies try to make Fortnite. Almost all fail, and make no
            return whatsoever, just lose a ton of money and toss the game in
            the bin, shut down the servers.
            
            On the other hand, it may have been more logical for many of them
            to go for a less ambitious (not always online, not a game that
            requires a high player count and social buy-in to stay relevant)
            but still profitable investment (Maybe a smaller scale single
            player game that doesn't offer recurring revenue), yet we still see
            a very crowded space for trying to emulate the same business model
            as something like Fortnite. Another more historical example was the
            constant question of whether a given MMO would be the next
            "WoW-killer" all through the 2000's/2010's.
            
            I think part of why this arises is that there's definitely a bit of
            a psychological hack for humans in particular where if there's a
            low-probability but extremely high reward outcome, we're deeply
            entranced by it, and investors are the same. Even if the chances
            are smaller in their minds than they were before, if they can just
            follow the same path that seems to be working to some extent and
            then get lucky, they're completely set. They're not really thinking
            about any broader bubble that could exist, that's on the level of
            the society, they're thinking about the individual, who could be
            very very rich, famous, and powerful if their investment works. And
            in the mind of someone debating what path to go down, I imagine a
            more nebulous answer of "we probably need to come up with some
            fundamentally different tools for learning and research a lot of
            different approaches to do so" is a bit less satisfying and
            exciting than a pitch that says "If you just give me enough money,
            the curve will eventually hit the point where you get to be king of
            the universe and we go colonize the solar system and carve your
            face into the moon."
            
            I also have to acknowledge the possibility that they just have
            access to different information than I do! They might be getting
            shown much better demos than I do, I suppose.
       
              jjmarr wrote 12 hours 3 min ago:
              >I think part of why this arises is that there's definitely a bit
              of a psychological hack for humans in particular where if there's
              a low-probability but extremely high reward outcome, we're deeply
              entranced by it, and investors are the same.
              
              Venture capital is all about low-probability high-reward events.
              
              Get a normal small business loan if you don't want to go big or
              go home.
       
                godelski wrote 11 hours 34 min ago:
                So you agree with us? Should we instead be making the argument
                that this is an illogical move? Because IME the issue has been
                that it appears as too risky. I'd like to know if I should just
                lean into that rather than try to argue it is not as risky as
                it appears (yet still has high reward, albeit still risky).
       
              godelski wrote 12 hours 40 min ago:
               I'm pretty sure the answer is people buying into the "scaling is
               all you need" argument. Because if you have that framing, then it
               can be solved through engineering, right? I mean, there's still
               engineering research, and it doesn't mean there's no reason to do
               research, but everyone loves the simple and straightforward path,
               right?
              
                > I think a somewhat comparable situation is in various online
              game platforms
              
               I think it is common in many industries. The weird thing is that
               being too risk-averse creates more risk. There's a balance that
               needs to be struck. Maybe another famous one is movies. They go
               on about pirating and how Netflix is winning, but most of the new
               movies are rehashes or sequels. Sure, there are a lot of new
               movies, but few get nearly the same advertising budgets, so
               people don't even hear about them (and sequels need less
               advertising since there's a lot of free advertising). You'd think
               there'd be more pressure to find the next hit that can lead to a
               few sequels, but instead they tend to be too risk-averse. That's
               the issue with monopolies though... or any industry where the
               barrier to entry is high...
              
                > psychological hack
              
              While I'm pretty sure this plays a role (along with other things
              like blind hope) I think the bigger contributor is risk aversion
               and observation bias. Like you say, it's always easier to argue
               "look, it worked for them" than "this hasn't been done before,
               but could be huge." A big part of the bias is that you get to
               oversimplify the reasoning for the former argument compared to
               the latter. The latter gets highly scrutinized, while the
               former overlooks many of the conditions that led to success.
              You're right that the big picture is missing. Especially that a
              big part of the success was through the novelty (not exactly
              saying Fortnite is novel via gameplay...). For some reason the
              success of novelty is almost never seen as motivation to try new
              things.
              
              I think that's the part that I find most interesting and
              confusing. It's like an aversion of wanting to look just one
              layer deeper. We'll put in far more physical and mental energy to
              justify a shallow thought than what would be required to think
              deeper. I get we're biased towards being lazy, so I think this is
              kinda related to us just being bad at foresight and feeling like
              being wrong is a bad thing (well it isn't good, but I'm pretty
              sure being wrong and not correcting is worse than just being
              wrong).
       
              eru wrote 13 hours 5 min ago:
               We see both things: almost all games are 'not Fortnite'. But
               that doesn't (commercially) invalidate some companies' quest for
               building the next Fortnite.
               
               Of course, if you limit your attention to these 'wannabe
               Fortnites', then you only see these 'wannabe Fortnites'.
       
          AbstractH24 wrote 15 hours 24 min ago:
          > It's cool and I'm glad it sounds like it's getting more reliable,
          but given the types of things people have been saying GPT-5 would be
          for the last two years you'd expect GPT-5 to be a world-shattering
          release rather than incremental and stable improvement.
          
          Are you trying to say the curve is flattening? That advances are
          coming slower and slower?
          
           As long as it doesn't suggest a dot-com-level recession, I'm good.
       
            morleytj wrote 14 hours 33 min ago:
            I suppose what I'm getting at is that if there are performance
            increases on a steady pace, but the investment needed to get those
            performance increases is on a much faster growth rate, it's not
            really a fair comparison in terms of a rate of progress, and could
            suggest diminishing returns from a particular approach. I don't
             really have the actual data to make a claim either way though; I
             think anyone would need more data to do so than is publicly
             accessible.
            
            But I do think the fact that we can publicly observe this
            reallocation of resources and emphasized aspects of the models
            gives us a bit of insight into what could be happening behind the
            scenes if we think about the reasons why those shifts could have
            happened, I guess.
       
              Karrot_Kream wrote 13 hours 53 min ago:
              How are you measuring investment? If we're looking at aggregate
              AI investment, I would guess that a lot of it is going into
              applications built atop AI rather than on the LLMs themselves.
              That's going to be tools, MCPs, workflow builders, etc
       
          brandall10 wrote 16 hours 6 min ago:
           To be fair, this is one of the pathways GPT-5 was speculated to take
           as far back as 6 or so months ago - simply being an incremental
           upgrade from a performance perspective, but a leap from a product
           simplification approach.
           
           At this point it's pretty much a given that it's a game of inches
           moving forward.
       
            ac29 wrote 14 hours 34 min ago:
            > a leap from a product simplification approach.
            
             According to the article, GPT-5 is actually three models, and they
             can be run at 4 levels of thinking. That's a dozen ways you can run
             any given input on "GPT-5", so it's hardly a simple product line-up
             (but maybe better than before).
       
              eru wrote 13 hours 4 min ago:
              A bit like Google Search uses a lot of different components under
              the hood?
       
              brandall10 wrote 13 hours 5 min ago:
               It's a big improvement from an API consumer standpoint -
               everything is now under a single product family that is logically
               stratified... up until yesterday people were using o3, o4-mini,
               4o, 4.1, and all their variants as valid choices for new
               products; now those are moved off the main page as legacy or
               specialized options for the few things GPT-5 doesn't do.
               
               It's even more simplified for the ChatGPT plans: it's just GPT-5
               thinking/non-thinking for most accounts, and then the option of
               Pro for the higher-end accounts.
       
          cchance wrote 16 hours 7 min ago:
           Sam is a HYPE CEO; he literally hypes his company nonstop, then the
           announcements come and... they're... ok, so people aren't really
           upset, but they end up feeling lackluster next to the hype... until
           the next cycle comes around...
           
           If you want actual big moves, watch Google, Anthropic, Qwen,
           DeepSeek.
           
           The Qwen and DeepSeek teams honestly seem so much better at
           under-promising and over-delivering.
           
           Can't wait to see what Gemini 3 looks like too.
       
          belter wrote 16 hours 26 min ago:
          > Maybe they've got some massive earthshattering model release coming
          out next, who knows.
          
          Nothing in the current technology offers a path to AGI. These models
          are fixed after training completes.
       
            echoangle wrote 15 hours 26 min ago:
            Why do you think that AGI necessitates modification of the model
            during use? Couldn’t all the insights the model gains be
            contained in the context given to it?
       
              vrighter wrote 2 hours 27 min ago:
              For starters, if it were superintelligent it would eventually
              make discoveries. New discoveries were not in the training set
              originally. The model needs to be trained to use the new
              discovery to aid it in the future.
              
              As it is, it has to keep "rediscovering" the same thing each and
              every time, no matter how many inferences you run.
       
              godelski wrote 15 hours 1 min ago:
              Because time marches on and with it things change.
              
              You could maybe accomplish this if you could fit all new
               information into context or with cycles of compression, but that
               is kinda a crazy ask. There's too much new information, even
               considering compression. It certainly wouldn't allow for
               exponential growth (I'd expect sub-linear).
              
              I think a lot of people greatly underestimate how much new
               information is created every day. It's hard to see if you're not
               working in research and watching how incremental but constant
               improvement compounds. But try just looking at whatever company
               you work for. Do you know everything that people did that day? It
               takes more time to generate information than to process it, so
               that's on your side, but do you really think you could keep up?
               Maybe at a very high level, but in that case you're missing a lot
               of information.
              
               Think about it this way: if that could be done, then LLMs
               wouldn't need training or tuning, because you could do everything
               through prompting.
       
                echoangle wrote 14 hours 57 min ago:
                The specific instance doesn’t need to know everything
                happening in the world at once to be AGI though. You could feed
                the trained model different contexts based on the task (and
                even let the model tell you what kind of raw data it wants) and
                it could still hypothetically be smarter than a human.
                
                I’m not saying this is a realistic or efficient method to
                create AGI, but I think the argument „Model is static once
                trained -> model can’t be AGI“ is fallacious.
       
                  godelski wrote 13 hours 22 min ago:
                  I think that makes a lot of assumptions about the size of
                  data and what can be efficiently packed into prompts. Even if
                  we're assuming all info in a prompt is equal while in context
                  and that it compresses information into the prompts before it
                  falls out of context, then you're going to run into the
                  compounding effects pretty quickly.
                  
                  You're right, you don't technically need infinite, but we are
                  still talking about exponential growth and I don't think that
                  effectively changes anything.
       
              belter wrote 15 hours 20 min ago:
              Because:
              
   URI        [1]: https://en.wikipedia.org/wiki/Anterograde_amnesia
       
                echoangle wrote 15 hours 13 min ago:
                Like I already said, the model can remember stuff as long as
                it’s in the context. LLMs can obviously remember stuff they
                were told or output themselves, even a few messages later.
       
                  belter wrote 14 hours 42 min ago:
                  AGI needs to genuinely learn and build new knowledge from
                  experience, not just generate creative outputs based on what
                  it has already seen.
                  
                  LLMs might look “creative” but they are just remixing
                  patterns from their training data and what is in the prompt.
                  They cant actually update themselves or remember new things
                  after training as there is no ongoing feedback loop.
                  
                  This is why you can’t send an LLM to medical school and
                  expect it to truly “graduate”. It cannot acquire or
                  integrate new knowledge from real-world experience the way a
                  human can.
                  
                  Without a learning feedback loop, these models are unable to
                  interact meaningfully with a changing reality or fulfill the
                  expectation from an AGI: Contribute to new science and
                  technology.
       
                    echoangle wrote 14 hours 36 min ago:
                    I agree that this is kind of true with a plain chat
                    interface, but I don’t think that’s an inherent limit
                    of an LLM. I think OpenAI actually has a memory feature
                    where the LLM can specify data it wants to save and can
                    then access later. I don’t see why this in principle
                    wouldn’t be enough for the LLM to learn new data as time
                    goes on. All possible counter arguments seem related to
                    scale (of memory and context size), not the principle
                    itself.
                    
                    Basically, I wouldn’t say that an LLM can never become
                    AGI due to its architecture. I also am not saying that LLM
                    will become AGI (I have no clue), but I don’t think the
                    architecture itself makes it impossible.
       
                      belter wrote 14 hours 19 min ago:
                      LLMs lack mechanisms for persistent memory, causal world
                      modeling, and self-referential planning. Their
                      transformer architecture is static and fundamentally
                      constrains dynamic reasoning and adaptive learning. All
                      core requirements for AGI.
                      
                      So yeah, AGI is impossible with today LLMs. But at least
                      we got to watch Sam Altman and Mira Murati drop their
                      voices an octave onstage and announce “a new dawn of
                       intelligence” every quarter. Remember Sam Altman's $7
                       trillion?
                      
                       Now that the AGI party is over, it's time to sell those
                      NVDA shares and prepare for the crash. What a ride it
                      was. I am grabbing the popcorn.
       
                  godelski wrote 14 hours 55 min ago:
                  > the model can remember stuff as long as it’s in the
                  context.
                  
                  You would need an infinite context or compression
                  
                  Also you might be interested in this theorem
                  
   URI            [1]: https://en.wikipedia.org/wiki/Data_processing_inequa...
       
                    echoangle wrote 14 hours 51 min ago:
                    > You would need an infinite context or compression
                    
                    Only if AGI would require infinite knowledge, which it
                    doesn’t.
       
                      godelski wrote 13 hours 25 min ago:
                       You're right, but compounding effects get out of hand
                       pretty quickly. There's a certain point where finite is
                       not meaningfully different from infinite, and that
                       threshold is a lot lower than you're accounting for.
                       There's only so much compression you can do, so even if
                       that new information is not that large, it'll be huge in
                       no time. Compounding functions are a whole lot of fun...
                       try running something super small, like only 10GB of new
                       information a day, and see how quickly that grows. You're
                       in the TB range before you're halfway into the year...
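
                       A quick sanity check on that 10GB/day figure (a sketch
                       only: plain accumulation, ignoring compression):

# How fast does 10 GB/day of new information pile up?
GB_PER_DAY = 10

days_to_1_tb = 1_000 / GB_PER_DAY          # 100 days to reach 1 TB
half_year_tb = 182 * GB_PER_DAY / 1_000    # ~1.8 TB by mid-year
full_year_tb = 365 * GB_PER_DAY / 1_000    # ~3.6 TB after a year

print(f"1 TB after {days_to_1_tb:.0f} days")
print(f"~{half_year_tb:.1f} TB by mid-year, ~{full_year_tb:.1f} TB after a year")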
       
                        kalb_almas wrote 12 hours 50 min ago:
                        This seems kind of irrelevant? Humans have General
                        Intelligence while having a context window of, what,
                        5MB, to be generous. Model weights only need to contain
                        the capacity for abstract reasoning and querying
                        relevant information. That they currently hold
                        real-world information at all is kind of an artifact of
                        how models are trained.
       
                          godelski wrote 11 hours 46 min ago:
                          > Humans have General Intelligence while having a
                          context window
                          
                          Yes, but humans also have more than a context window.
                          They also have more than memory (weights). There's a
                          lot of things humans have besides memory. For
                          example, human brains are not a static architecture.
                          New neurons as well as pathways (including between
                          existing neurons) are formed and destroyed all the
                          time. This doesn't stop either, it continues
                          happening throughout life.
                          
                           I think your argument makes sense, but it is
                           oversimplifying the human brain. I think once we start
                          considering the complexity then this no longer makes
                          sense. It is also why a lot of AGI research is
                          focused on things like "test time learning" or
                          "active learning", not to mention many other areas
                          including dynamic architectures.
       
          thorum wrote 17 hours 0 min ago:
          The quiet revolution is happening in tool use and multimodal
          capabilities. Moderate incremental improvements on general
          intelligence, but dramatic improvements on multi-step tool use and
          ability to interact with the world (vs 1 year ago), will eventually
          feed back into general intelligence.
       
            copularent wrote 1 hour 26 min ago:
            I think we have reached a user schism in terms of benefits going
            forward.
            
            I am completely floored by GPT-5. I only tried it a half hour ago
            and have a whole new data analysis pipeline. I thought it must be
            hallucinating badly at first but all the papers it referenced are
            real and I had just never heard of these concepts.
            
             This is for an area that has 200 papers on arXiv, and I have read
             all of them, so I thought I knew this area well.
            
            I don't see how the average person benefits much going forward
            though. They simply don't have questions to ask in order to have
            the model display its intelligence.
       
            thomasfromcdnjs wrote 9 hours 5 min ago:
            100%
            
             1) Build a directory of X (a gazillion) tools (just functions)
             that models can invoke with standard pipeline behavior
             (parallel, recursion, conditions etc.)
             
             2) Solve the "too many tools to select from" problem (a search
             problem), and adjacently really understand the intent
             (linguistics/ToM) of the user's or agent's request (a rough
             sketch below)
            
            3) Someone to pay for everything
            
            4) ???
            
             The future is already here, in my opinion; the LLMs are
             good enough™, it's just that the ecosystem needs to catch up.
             Companies like Zapier or whatever, taken to their logical extreme,
             connecting any software to anything (not just SaaS products),
             combined with an LLM, will be able to do almost anything.
             
             Even better basic tool composition around language will make its
             simple replies better too.
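
             For point 2, a toy sketch of tool selection as a search problem
             (all names here are hypothetical; a real system would rank with
             embeddings or an LLM router rather than keyword overlap):

# Toy illustration of "tool selection as search": a registry of tool
# functions plus a naive relevance ranking over their descriptions.
# All names are hypothetical; a real system would use embeddings or an
# LLM router instead of keyword overlap.

def get_weather(city: str) -> str:
    """Stub tool: look up current weather for a city."""
    return f"Sunny in {city}"

def create_invoice(customer: str, amount: float) -> str:
    """Stub tool: create an invoice for a customer."""
    return f"Invoice for {customer}: ${amount:.2f}"

TOOLS = {
    "get_weather": (get_weather, "look up current weather conditions for a city"),
    "create_invoice": (create_invoice, "create and send an invoice to a customer"),
}

def select_tools(request: str, k: int = 1) -> list[str]:
    """Rank registered tools by keyword overlap with the request."""
    words = set(request.lower().split())
    scored = sorted(
        ((len(words & set(desc.lower().split())), name)
         for name, (_fn, desc) in TOOLS.items()),
        reverse=True,
    )
    return [name for score, name in scored[:k] if score > 0]

print(select_tools("please send an invoice to ACME for the widgets"))
# -> ['create_invoice']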
       
            coolKid721 wrote 16 hours 12 min ago:
            [flagged]
       
              dang wrote 16 hours 7 min ago:
              Can you please make your substantive points thoughtfully?
               Thoughtful criticism is welcome, but snarky putdowns and one-liners,
              etc., degrade the discussion for everyone.
              
              You've posted substantive comments in other threads, so this
              should be easy to fix.
              
              If you wouldn't mind reviewing [1] and taking the intended spirit
              of the site more to heart, we'd be grateful.
              
   URI        [1]: https://news.ycombinator.com/newsguidelines.html
       
            darkhorse222 wrote 16 hours 28 min ago:
            Completely agree. General intelligence is a building block. By
            chaining things together you can achieve meta programming. The
            trick isn't to create one perfect block but to build a variety of
            blocks and make one of those blocks a block-builder.
       
              SecretDreams wrote 10 hours 12 min ago:
              > The trick isn't to create one perfect block but to build a
              variety of blocks and make one of those blocks a block-builder.
              
              This has some Egyptian pyramids building vibes. I hope we treat
              these AGIs better than the deal the pyramid slaves got.
       
                z0r wrote 9 hours 27 min ago:
                We don't have AGI and the pyramids weren't built by slaves.
       
          simonw wrote 17 hours 7 min ago:
          I for one am pretty glad about this. I like LLMs that augment human
          abilities - tools that help people get more done and be more
          ambitious.
          
          The common concept for AGI seems to be much more about human
          replacement - the ability to complete "economically valuable tasks"
          better than humans can. I still don't understand what our human lives
          or economies would look like there.
          
          What I personally wanted from GPT-5 is exactly what I got: models
          that do the same stuff that existing models do, but more reliably and
          "better".
       
            morleytj wrote 16 hours 41 min ago:
            I'd agree on that.
            
            That's pretty much the key component these approaches have been
            lacking on, the reliability and consistency on the tasks they
            already work well on to some extent.
            
            I think there's a lot of visions of what our human lives would look
            like in that world that I can imagine, but your comment did make me
            think of one particularly interesting tautological scenario in that
            commonly defined version of AGI.
            
             If artificial general intelligence is defined as completing
             "economically valuable tasks" better than humans can, it requires
             one to define "economically valuable." As it currently stands,
             something holds value in an economy relative to human beings
             wanting it. Houses get expensive because many people, each of whom
             has economic utility which they use to purchase things, want to
             have houses, of which there is a limited supply for a variety of
             reasons. If human beings are not the most effective producers of
             value in the system, they lose the capability to trade for things,
             which negates that existing definition of economic value. It
             doesn't matter how many people would pay $5 for your widget if
             people have no economic utility relative to AGI, meaning they
             cannot trade that utility for goods.
            
            In general that sort of definition of AGI being held reveals a bit
            of a deeper belief, which is that there is some version of economic
            value detached from the humans consuming it. Some sort of nebulous
            concept of progress, rather than the acknowledgement that for all
            of human history, progress and value have both been relative to the
            people themselves getting some form of value or progress. I suppose
            it generally points to the idea of an economy without consumers,
            which is always a pretty bizarre thing to consider, but in that
            case, wouldn't it just be a definition saying that "AGI is achieved
            when it can do things that the people who control the AI system
            think are useful." Since in that case, the economy would eventually
            largely consist of the people controlling the most economically
            valuable agents.
            
            I suppose that's the whole point of the various alignment studies,
            but I do find it kind of interesting to think about the fact that
            even the concept of something being "economically valuable", which
            sounds very rigorous and measurable to many people, is so nebulous
            as to be dependent on our preferences and wants as a society.
       
          BoiledCabbage wrote 17 hours 21 min ago:
           Performance (as measured by METR's task-length benchmark, linked
           below) is doubling roughly every 4-7 months. That trend is
           continuing. That's insane.
           
           If your expectations were any higher than that, then it seems
           like you were caught up in hype. Doubling 2-3 times per year isn't
           leveling off by any means.
          
   URI    [1]: https://metr.github.io/autonomy-evals-guide/gpt-5-report/
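
           For a sense of what that cadence implies, here's the simple
           arithmetic (the metric METR tracks in the linked report is the
           length of task a model can complete):

# What a 4-7 month doubling period implies over a year.
for months_per_doubling in (4, 7):
    doublings_per_year = 12 / months_per_doubling
    growth = 2 ** doublings_per_year
    print(f"{months_per_doubling}-month doubling: "
          f"{doublings_per_year:.1f} doublings/year = {growth:.1f}x per year")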
       
            andrepd wrote 14 hours 11 min ago:
            We can barely measure "performance" in any objective sense, let
            alone claim that it's doubling every 4 months.....
       
            morleytj wrote 16 hours 59 min ago:
            I wouldn't say model development and performance is "leveling off",
            and in fact didn't write that. I'd say that tons more funding is
            going into the development of many models, so one would expect
            performance increases unless the paradigm was completely flawed at
             its core, a belief I wouldn't personally profess to. My point was
            moreso the following: A couple years ago it was easy to find people
            saying that all we needed was to add in video data, or genetic
            data, or some other data modality, in the exact same format that
            the models trained on existing language data were, and we'd see a
            fast takeoff scenario with no other algorithmic changes. Given that
            the top labs seem to be increasingly investigating alternate
            approaches to setting up the models beyond just adding more data
             sources, and have been for the last couple years (which, I should
            clarify, is a good idea in my opinion), then the probability of
            those statements of just adding more data or more compute taking us
            straight to AGI being correct seems at the very least slightly
            lower, right?
            
            Rather than my personal opinion, I was commenting on commonly
            viewed opinions of people I would believe to have been caught up in
            hype in the past. But I do feel that although that's a benchmark,
            it's not necessarily the end-all of benchmarks. I'll reserve my
            final opinions until I test personally, of course. I will say that
            increasing the context window probably translates pretty well to
            longer context task performance, but I'm not entirely convinced it
            directly translates to individual end-step improvement on every
            class of task.
       
            oblio wrote 17 hours 6 min ago:
            By "performance" I guess you mean "the length of task that can be
            done adequately"?
            
            It is a benchmark but I'm not very convinced it's the be-all,
            end-all.
       
              nomel wrote 14 hours 42 min ago:
              > It is a benchmark but I'm not very convinced it's the be-all,
              end-all.
              
              Who's suggesting it is?
       
          hnuser123456 wrote 17 hours 21 min ago:
          I agree, we have now proven that GPUs can ingest information and be
           trained to generate content for various tasks. But putting it to
           work and making it useful requires far more thought about a specific
           problem and how to apply the tech. If you could just ask GPT to
           create a
          startup that'll be guaranteed to be worth $1B on a $1k investment
          within one year, someone else would've already done it. Elbow grease
          still required for the foreseeable future.
          
          In the meantime, figuring out how to train them to make less of their
          most common mistakes is a worthwhile effort.
       
            morleytj wrote 16 hours 7 min ago:
            Certainly, yes, plenty of elbow grease required in all things that
            matter.
            
            The interesting point as well to me though, is that if it could
            create a startup that was worth $1B, that startup wouldn't be worth
            $1B.
            
            Why would anyone pay that much to invest in the startup if they
            could recreate the entire thing with the same tool that everyone
            would have access to?
       
              selcuka wrote 13 hours 0 min ago:
              > if they could recreate the entire thing with the same tool
              
              "Within one year" is the key part. The product is only part of
              the equation.
              
              If a startup was launched one year ago and is worth $1B today,
              there is no way you can launch the same startup today and achieve
              the same market cap in 1 day. You still need customers, which
              takes time. There are also IP related issues.
              
              Facebook had the resources to create an exact copy of Instagram,
              or WhatsApp, but they didn't. Instead, they paid billions of
              dollars to acquire those companies.
       
                disgruntledphd2 wrote 2 hours 24 min ago:
                > Facebook had the resources to create an exact copy of
                Instagram
                
                They tried this first (Camera I believe it was called) and
                failed.
       
              RossBencina wrote 13 hours 57 min ago:
               If you created a $1B startup using LLMs, would you be
               advertising it? Or would you be creating more $1B startups?
       
                morleytj wrote 13 hours 23 min ago:
                 The comment I'm replying to poses the following scenario:
                 
                 "If you could just ask GPT to create a startup that'll be
                 guaranteed to be worth $1B on a $1k investment within one year"
                 
                 I think that if the situation is that I do this by just asking
                 it to make a startup, it seems unlikely that no one else would
                 be aware that they could just ask it to make a startup.
       
          GaggiX wrote 17 hours 23 min ago:
           Compared to GPT-4, it is on a completely different level given that
           it is a reasoning model, so in that regard it does deliver and it's
           not just scaling. But for this I guess the revolution was o1, and
           GPT-5 is just a much more mature version of the technology.
       
          jstummbillig wrote 17 hours 25 min ago:
           Things have moved differently from what we thought would happen 2
           years ago, but let's not forget what has happened in the meantime
           (4o, o1 + the thinking paradigm, o3).
          
          So yeah, maybe we are getting more incremental improvements. But that
          to me seems like a good thing, because more good things earlier. I
          will take that over world-shattering any day – but if we were to
          consider everything that has happened since the first release of
          gpt-4, I would argue the total amount is actually very much
          world-shattering.
       
        isoprophlex wrote 17 hours 39 min ago:
        Whoa this looks good. And cheap! How do you hack a proxy together so
        you can run Claude Code on gpt-5?!
       
          dalberto wrote 17 hours 34 min ago:
          Consider: [1] or even: [2] Not affiliated with either one of these,
          but they look promising.
          
   URI    [1]: https://github.com/musistudio/claude-code-router
   URI    [2]: https://github.com/sst/opencode
       
        pancakemouse wrote 17 hours 51 min ago:
        Practically the first thing I do after a new model release is try to
        upgrade `llm`. Thank you, @simonw !
       
          simonw wrote 17 hours 40 min ago:
          Working on that now!
          
   URI    [1]: https://github.com/simonw/llm/issues/1229
       
          efavdb wrote 17 hours 40 min ago:
          same, looks like he hasn't added 5.0 to the package yet but assume
          imminent.
          
   URI    [1]: https://llm.datasette.io/en/stable/openai-models.html
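           
           Once it lands, usage should be the usual llm pattern. A minimal
           sketch via the Python API (the "gpt-5" model alias is an assumption
           until the new release of the package actually ships):
           
             import llm
             
             # Assumes llm has been upgraded to a version that registers the
             # GPT-5 family and that an OpenAI key is configured, e.g. via
             # `llm keys set openai` or the OPENAI_API_KEY env var.
             model = llm.get_model("gpt-5")  # alias assumed
             response = model.prompt(
                 "Generate an SVG of a pelican riding a bicycle"
             )
             print(response.text())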
       
        onehair wrote 17 hours 55 min ago:
        > Definitely recognizable as a pelican
        
        right :-D
       
        hodgehog11 wrote 17 hours 56 min ago:
        The aggressive pricing here seems unusual for OpenAI. If they had a
        large moat, they wouldn't need to do this. Competition is fierce
        indeed.
       
          canada_dry wrote 15 hours 5 min ago:
          Perhaps they're feeling the effect of losing PRO clients (like me)
          lately.
          
          Their PRO models were not (IMHO) worth 10X that of PLUS!
          
          Not even close.
          
          Especially when new competitors (eg. z.ai) are offering very
          compelling competition.
       
          FergusArgyll wrote 16 hours 59 min ago:
           They are winning by massive margins in the app, but losing (!) in the
           API to Anthropic.
          
   URI    [1]: https://finance.yahoo.com/news/enterprise-llm-spend-reaches-...
       
          ilaksh wrote 17 hours 21 min ago:
          It's like 5% better. I think they obviously had no choice but to be
          price competitive with Gemini 2.5 Pro. Especially for Cursor to
          change their default.
       
          impure wrote 17 hours 21 min ago:
          The 5 cents for Nano is interesting. Maybe it will force Google to
          start dropping their prices again which have been slowly creeping up
          recently.
       
          0x00cl wrote 17 hours 53 min ago:
           Maybe they need/want data.
       
            impure wrote 17 hours 23 min ago:
            OpenAI and most AI companies do not train on data submitted to a
            paid API.
       
              anhner wrote 15 hours 17 min ago:
              If you believe that, I have a bridge I can sell you...
       
                Uehreka wrote 14 hours 23 min ago:
                If it ever leaked that OpenAI was training on the vast amounts
                of confidential data being sent to them, they’d be
                immediately crushed under a mountain of litigation and probably
                have to shut down. Lots of people at big companies have
                accounts, and the bigcos are only letting them use them because
                of that “Don’t train on my data” checkbox. Not all of
                those accounts are necessarily tied to company emails either,
                so it’s not like OpenAI can discriminate.
       
              dortlick wrote 15 hours 54 min ago:
              Why don't they?
       
                echoangle wrote 15 hours 20 min ago:
                They probably fear that people wouldn’t use the API
                otherwise, I guess. They could have different tiers though
                where you pay extra so your data isn’t used for training.
       
              WhereIsTheTruth wrote 17 hours 1 min ago:
              They also do not train using copyrighted material /s
       
                simonw wrote 16 hours 33 min ago:
                That's different. They train on scrapes of the web. They don't
                train on data submitted to their API by their paying customers.
       
                  johnnyanmac wrote 16 hours 12 min ago:
                  If they're bold enough to say they train on data they do not
                  own, I am not optimistic when they say they don't train on
                  data people willingly submit to them.
       
                    simonw wrote 16 hours 0 min ago:
                    I don't understand your logic there.
                    
                    They have confessed to doing a bad thing - training on
                    copyrighted data without permission. Why does that indicate
                    they would lie about a worse thing?
       
                      johnnyanmac wrote 15 hours 55 min ago:
                      >Why does that indicate they would lie about a worse
                      thing?
                      
                       Because they know their audience. It's an audience that
                       also doesn't care for copyright and would love for them
                       to win their court cases. They are fine making such an
                       argument to those kinds of people.
                       
                       Meanwhile, when legal ran a very typical subpoena
                       process on said data, data they chose to submit to an
                       online server of their own volition, the same audience
                       completely freaked out. Suddenly, they felt like their
                       privacy was invaded.
                       
                       It doesn't make any logical sense in my mind, but a lot
                       of the discourse over this topic isn't based on logic.
       
                daveguy wrote 16 hours 52 min ago:
                 Oh, they never even made that promise. They're trying to say
                 it's fine to launder copyrighted material through a model.
       
            dr_dshiv wrote 17 hours 35 min ago:
            And it’s a massive distillation of the mother model, so the costs
            of inference are likely low.
       
        zaronymous1 wrote 17 hours 58 min ago:
        Can anyone explain to me why they've removed parameter controls for
        temperature and top-p in reasoning models, including gpt-5? It strikes
         me that it makes it harder to build with these for small tasks
         requiring high levels of consistency, and in the API I really value
         the ability to set certain tasks to a low temperature.
       
          Der_Einzige wrote 17 hours 33 min ago:
          It's because all forms of sampler settings destroy safety/alignment.
           That's why top_p/top_k are still used and not tfs, min_p, top_n
           sigma, etc., and why temperature is locked to an arbitrary 0-2
           range, etc.
          
          Open source is years ahead of these guys on samplers. It's why their
          models being so good is that much more impressive.
       
            oblio wrote 17 hours 3 min ago:
            Temperature is the response variation control?
       
              AH4oFVbPT4f8 wrote 14 hours 13 min ago:
               Yes, it controls how much variability there is in which token
               gets selected next, by rescaling the probability distribution
               the model samples from.
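               
               Concretely, temperature divides the logits before the softmax:
               T < 1 sharpens the distribution, T > 1 flattens it. A toy
               sketch of the idea (not any particular provider's
               implementation):
               
                 import numpy as np
                 
                 def sample_next_token(logits, temperature=1.0):
                     # Divide the raw logits by the temperature, then softmax.
                     # T < 1 -> sharper distribution (more deterministic),
                     # T > 1 -> flatter distribution (more varied output).
                     scaled = np.asarray(logits, dtype=float)
                     scaled = scaled / max(temperature, 1e-6)
                     probs = np.exp(scaled - scaled.max())
                     probs /= probs.sum()
                     return np.random.default_rng().choice(len(probs), p=probs)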
       
        diggan wrote 18 hours 15 min ago:
        > but for the moment here’s the pelican I got from GPT-5 running at
        its default “medium” reasoning effort:
        
         Would have been interesting to see a comparison between low, medium
         and high reasoning_effort pelicans :)
         
         When I've played around with GPT-OSS-120b recently, it seems the
         difference in the final answer is huge, where "low" is essentially "no
         reasoning" and with "high" it can spend a seemingly endless amount of
         tokens. I'm guessing the difference with GPT-5 will be similar?
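         
         Something like the following should make that comparison easy to
         script (a rough sketch, assuming the Responses API shape and the
         published model name; swap in the mini/nano models as needed):
         
           from openai import OpenAI
           
           client = OpenAI()  # reads OPENAI_API_KEY from the environment
           PROMPT = "Generate an SVG of a pelican riding a bicycle"
           
           for effort in ("minimal", "low", "medium", "high"):
               resp = client.responses.create(
                   model="gpt-5",
                   reasoning={"effort": effort},
                   input=PROMPT,
               )
               # One SVG per effort level for a side-by-side comparison
               # (the model may wrap the SVG in prose; trimming not shown).
               with open(f"pelican-{effort}.svg", "w") as f:
                   f.write(resp.output_text)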
       
          simonw wrote 18 hours 13 min ago:
           > Would have been interesting to see a comparison between low,
           medium and high reasoning_effort pelicans
           
           Yeah, I'm working on that - expect dozens more pelicans in a later
           post.
       
            meatmanek wrote 13 hours 16 min ago:
            Would also be interesting to see how well they can do with a loop
            of: write SVG, render SVG, feed SVG back to LLM for review,
            iterate. Sorta like how a human would actually compose an SVG of a
            pelican.
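             
             A rough sketch of such a loop (model name, Responses API shape
             and the cairosvg dependency are all assumptions here):
             
               import base64
               
               import cairosvg
               from openai import OpenAI
               
               client = OpenAI()
               svg = client.responses.create(
                   model="gpt-5",
                   input="Return only an SVG of a pelican riding a bicycle.",
               ).output_text
               
               for _ in range(3):  # a few review/revise rounds
                   # Render the current SVG, then hand the image and source
                   # back to the model for critique and revision.
                   png = cairosvg.svg2png(bytestring=svg.encode("utf-8"))
                   image_url = ("data:image/png;base64,"
                                + base64.b64encode(png).decode())
                   content = [
                       {"type": "input_text",
                        "text": "Here is your SVG rendered as an image, "
                                "plus its source. Critique it and return "
                                "only an improved SVG.\n\n" + svg},
                       {"type": "input_image", "image_url": image_url},
                   ]
                   svg = client.responses.create(
                       model="gpt-5",
                       input=[{"role": "user", "content": content}],
                   ).output_text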
       
        bdcdo wrote 18 hours 20 min ago:
        "GPT-5 in the API is simpler: it’s available as three
        models—regular, mini and nano—which can each be run at one of four
        reasoning levels: minimal (a new level not previously available for
        other OpenAI reasoning models), low, medium or high."
        
        Is it actually simpler? For those who are currently using GPT 4.1,
        we're going from 3 options (4.1, 4.1 mini and 4.1 nano) to at least 8,
        if we don't consider gpt 5 regular - we now will have to choose between
         gpt 5 mini minimal, gpt 5 mini low, gpt 5 mini medium, gpt 5 mini
         high, gpt 5 nano minimal, gpt 5 nano low, gpt 5 nano medium and gpt 5
         nano high.
        
        And, while choosing between all these options, we'll always have to
        wonder: should I try adjusting the prompt that I'm using, or simply
        change the gpt 5 version or its reasoning level?
       
          vineyardmike wrote 16 hours 42 min ago:
          When I read “simpler” I interpreted that to mean they don’t use
          their Chat-optimized harness to guess which reasoning level and model
          to use. The subscription chat service (ChatGPT) and the
          chat-optimized model on their API seem to have a special harness that
          changes reasoning based on some heuristics, and will switch between
          the model sizes without user input.
          
           With the API, you pick a model size and a reasoning effort. Yes,
           more choices, but also a clear mental model and a simple choice
           that you control.
       
          hirako2000 wrote 16 hours 51 min ago:
          Ultimately they are selling tokens, so try many times.
       
          mwigdahl wrote 18 hours 4 min ago:
          If reasoning is on the table, then you already had to add
          o3-mini-high, o3-mini-medium, o3-mini-low, o4-mini-high,
          o4-mini-medium, and o4-mini-low to the 4.1 variants.  The GPT-5 way
          seems simpler to me.
       
          impossiblefork wrote 18 hours 16 min ago:
           Yes, I think so. It's n = 1, 2, 3 (model size) by m = 0, 1, 2, 3
           (reasoning effort). There's structure, and you know what each
           parameter controls and which direction it moves things in.
       
            makeramen wrote 18 hours 0 min ago:
            But given the option, do you choose bigger models or more
            reasoning? Or medium of both?
       
              paladin314159 wrote 17 hours 40 min ago:
              If you need world knowledge, then bigger models. If you need
              problem-solving, then more reasoning.
              
              But the specific nuance of picking nano/mini/main and
              minimal/low/medium/high comes down to experimentation and what
              your cost/latency constraints are.
       
              impossiblefork wrote 17 hours 45 min ago:
              I would have to get experience with them. I mostly use Mistral,
              so I have only the choice of thinking or not thinking.
       
                gunalx wrote 16 hours 13 min ago:
                 Mistral also has small, medium and large, with both small and
                 medium having a thinking variant, plus Devstral, Codestral,
                 etc.
                 
                 Not really that much simpler.
       
                  impossiblefork wrote 16 hours 10 min ago:
                  Ah, but I never route to these manually. I only use LLMs a
                  little bit, mostly to try to see what they can't do.
       
              namibj wrote 17 hours 54 min ago:
              Depends on what you're doing.
       
                addaon wrote 17 hours 51 min ago:
                > Depends on what you're doing.
                
                Trying to get an accurate answer (best correlated with
                objective truth) on a topic I don't already know the answer to
                (or why would I ask?). This is, to me, the challenge with the
                "it depends, tune it" answers that always come up in how to use
                these tools -- it requires the tools to not be useful for you
                (because there's already a solution) to be able to do the
                tuning.
       
                  wongarsu wrote 16 hours 53 min ago:
                  If cost is no concern (as in infrequent one-off tasks) then
                  you can always go with the biggest model with the most
                  reasoning. Maybe compare it with the biggest model with
                  no/less reasoning, since sometimes reasoning can hurt (just
                  as with humans overthinking something).
                  
                   If you have a task you do frequently you need some kind of
                   benchmark, which might just be comparing how well the output
                   of the smaller models holds up against the output of the
                   bigger model, if you don't know the ground truth.
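                   
                   For the no-ground-truth case, one common trick is to let
                   the biggest model grade the smaller models' answers against
                   its own. A rough sketch (model names, the Responses API
                   shape and the grading prompt are all assumptions):
                   
                     from openai import OpenAI
                     
                     client = OpenAI()
                     
                     def ask(model, task):
                         return client.responses.create(
                             model=model, input=task
                         ).output_text
                     
                     def grade(task, reference, candidate):
                         # Ask the big model to score a candidate answer
                         # against its own answer to the same task.
                         prompt = (
                             f"Task: {task}\n\n"
                             f"Reference answer:\n{reference}\n\n"
                             f"Candidate answer:\n{candidate}\n\n"
                             "Score the candidate from 1 to 10 for how well "
                             "it matches the reference. Reply with just the "
                             "number."
                         )
                         return int(ask("gpt-5", prompt).strip())
                     
                     # Replace with the tasks you actually run frequently.
                     tasks = ["Summarize this ticket: ..."]
                     for task in tasks:
                         ref = ask("gpt-5", task)
                         for small in ("gpt-5-mini", "gpt-5-nano"):
                             print(small, grade(task, ref, ask(small, task)))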
       
        cco wrote 18 hours 21 min ago:
        Only a third cheaper than Sonnet 4? Incrementally better I suppose.
        
        > and minimizing sycophancy
        
        Now we're talking about a good feature! Actually one of my biggest
        annoyances with Cursor (that mostly uses Sonnet).
        
        "You're absolutely right!"
        
        I mean not really Cursor, but ok. I'll be super excited if we can get
        rid of these sycophancy tokens.
       
          nosefurhairdo wrote 17 hours 7 min ago:
          In my early testing gpt5 is significantly less annoying in this
          regard. Gives a strong vibe of just doing what it's told without any
          fluff.
       
          logicchains wrote 18 hours 7 min ago:
          >Only a third cheaper than Sonnet 4?
          
          The price should be compared to Opus, not Sonnet.
       
            cco wrote 16 hours 55 min ago:
            Wow, if so, 7x cheaper. Crazy if true.
       
        anyg wrote 18 hours 32 min ago:
        Good to know - 
        > Knowledge cut-off is September 30th 2024 for GPT-5 and May 30th 2024
        for GPT-5 mini and nano
       
          dortlick wrote 15 hours 51 min ago:
          Yeah I thought that was strange. Wouldn't it be important to have
          more recent data?
       
          bn-l wrote 16 hours 30 min ago:
          Is that late enough for it to have heard of svelte 5?
       
          bhouston wrote 17 hours 37 min ago:
           Weird to have such an early knowledge cutoff. Claude 4.1 has March
           2025 - 6 months more recent, with comparable results.
       
            freediver wrote 12 hours 28 min ago:
             Unless so much of the content on the web in the last 12 months was
             AI generated that it reduced the quality of the model.
       
          falcor84 wrote 18 hours 10 min ago:
          Oh wow, so essentially a full year of post-training and testing. Or
          was it ready and there was a sufficiently good business strategy
          decision to postpone the release?
       
            NullCascade wrote 14 hours 20 min ago:
            OpenAI is much more aggressively targeted by NYTimes and similar
            organizations for "copyright violations".
       
            thorum wrote 16 hours 46 min ago:
            The Information’s report from earlier this month claimed that
            GPT-5 was only developed in the last 1-2 months, after some sort of
            breakthrough in training methodology.
            
            > As recently as June, the technical problems meant none of
            OpenAI’s models under development seemed good enough to be
            labeled GPT-5, according to a person who has worked on it.
            
            But it could be that this refers to post-training and the base
            model was developed earlier. [1]
            
   URI      [1]: https://www.theinformation.com/articles/inside-openais-roc...
   URI      [2]: https://archive.ph/d72B4
       
              simonw wrote 16 hours 31 min ago:
               My understanding is that training data cut-offs and the dates at
               which the models were trained are independent things.
              
              AI labs gather training data and then do a ton of work to process
              it, filter it etc.
              
              Model training teams run different parameters and techniques
              against that processed training data.
              
              It wouldn't surprise me to hear that OpenAI had collected data up
              to September 2024, dumped that data in a data warehouse of some
              sort, then spent months experimenting with ways to filter and
              process it and different training parameters to run against it.
       
        nickthegreek wrote 19 hours 12 min ago:
         These new naming conventions, while not perfect, are a lot clearer
         and I am sure will help my coworkers.
       
        Leary wrote 19 hours 33 min ago:
        METR of only 2 hours and 15 minutes. Fast takeoff less likely.
       
          Davidzheng wrote 8 hours 14 min ago:
           I actually think there's a high chance that this curve becomes
           almost vertical at some point around a few hours. In the
           less-than-one-hour regime, scaling the time scales the complexity
           the agent must internalize, while after a few hours the limitations
           of humans mean we have to divide work into subtasks/abstractions,
           each of which is bounded in the complexity that must be
           internalized. And there's a separate category of skills needed,
           like abstraction, subgoal creation and error correction. It's a
           flimsy argument, but I don't see scaling the length of tasks for
           humans as a very reliable metric at all.
       
          FergusArgyll wrote 16 hours 55 min ago:
           It's above the exponential line and right around the
           super-exponential line.
       
          kqr wrote 18 hours 28 min ago:
          Seems like it's on the line that's scaring people like AI 2027, isn't
          it?
          
   URI    [1]: https://aisafety.no/img/articles/length-of-tasks-log.png
       
          umanwizard wrote 19 hours 14 min ago:
          What is METR?
       
            wisemang wrote 12 hours 49 min ago:
            To maybe save others some time METR is a group called Model
            Evaluation and Threat Research who
            
            > propose measuring AI performance in terms of the length of tasks
            AI agents can complete.
            
             Not that hard to figure out, but the way people were referring to
             them made me think it stood for an actual metric.
       
            ravendug wrote 18 hours 33 min ago:
            
            
   URI      [1]: https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-mea...
       
            tunesmith wrote 18 hours 38 min ago:
            The 2h 15m is the length of tasks the model can complete with 50%
            probability. So longer is better in that sense. Or at least, "more
            advanced" and potentially "more dangerous".
       
            Leary wrote 18 hours 52 min ago:
            
            
   URI      [1]: https://metr.github.io/autonomy-evals-guide/gpt-5-report/
       
          qsort wrote 19 hours 26 min ago:
          Isn't that pretty much in line with what people were expecting? Is it
          surprising?
       
            usaar333 wrote 18 hours 26 min ago:
            No, this is below expectations on both Manifold and lesswrong ( [1]
            ).  Median was ~2.75 hours on both (which already represented a
            bearish slowdown).
            
            Not massively off -- manifold yesterday implied odds this low were
            ~35%.  30% before Claude Opus 4.1 came out which updated expected
            agentic coding abilities downward.
            
   URI      [1]: https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/ryan_gre...
       
              qsort wrote 18 hours 18 min ago:
              Thanks for sharing, that was a good thread!
       
            dingnuts wrote 18 hours 52 min ago:
            It's not surprising to AI critics but go back to 2022 and open
            r/singularity and then answer: what "people" were expecting? Which
            people?
            
            SamA has been promising AGI next year for three years like Musk has
            been promising FSD next year for the last ten years.
            
            IDK what "people" are expecting but with the amount of hype I'd
            have to guess they were expecting more than we've gotten so far.
            
            The fact that "fast takeoff" is a term I recognize indicates that
            some people believed OpenAI when they said this technology
            (transformers) would lead to sci fi style AI and that is most
            certainly not happening
       
              ToValueFunfetti wrote 17 hours 51 min ago:
              >SamA has been promising AGI next year for three years like Musk
              has been promising FSD next year for the last ten years.
              
              Has he said anything about it since last September:
              
              >It is possible that we will have superintelligence in a few
              thousand days (!); it may take longer, but I’m confident
              we’ll get there.
              
              This is, at an absolute minimum, 2000 days = 5 years. And he says
              it may take longer.
              
              Did he even say AGI next year any time before this? It looks like
              his predictions were all pointing at the late 2020s, and now he's
              thinking early 2030s. Which you could still make fun of, but it
              just doesn't match up with your characterization at all.
       
              falcor84 wrote 18 hours 3 min ago:
              I would say that there are quite a lot of roles where you need to
              do a lot of planning to effectively manage an ~8 hour shift, but
              then there are good protocols for handing over to the next
              person. So once AIs get to that level (in 2027?), we'll be much
              closer to AIs taking on "economically valuable work".
       
        empiko wrote 19 hours 33 min ago:
         Despite the fact that their models are used in hiring, business,
         education, etc., this multibillion-dollar company uses a single
         benchmark with very artificial questions (BBQ) to evaluate how fair
         their model is. I am a little bit disappointed.
       
          xmorse wrote 3 hours 13 min ago:
           It's because these industries don't create their own benchmarks. The
           only ones creating evals are the AI companies themselves or open
           source software engineers.
       
        ks2048 wrote 19 hours 40 min ago:
        So, "system card" now means what used to be a "paper", but without lots
        of the details?
       
          simonw wrote 18 hours 43 min ago:
          AI labs tend to use "system cards" to describe their evaluation and
          safety research processes.
          
          They used to be more about the training process itself, but that's
          increasingly secretive these days.
       
          kaoD wrote 19 hours 28 min ago:
          Nope. System card is a sales thing. I think we generally call that
          "product sheet" in other markets.
       
       
   DIR <- back to front page