_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   GPT-5
       
       
        Razengan wrote 9 min ago:
        I asked ChatGPT 5 about the main differences between 4 and 5, and it
        said:
        
        "I couldn’t find any credible, up-to-date details on a model
        officially named “GPT-5” or formal comparisons to “GPT-4o.”
        It’s possible that GPT-5, if it exists, hasn't been announced
        publicly or covered in verifiable sources … GPT-5 as of August 8,
        2025 has no formal release announcement"
        
        Reassuring.
       
        miroljub wrote 21 min ago:
        Now it's a perfect time for DeepSeek to finally release R2.
       
        alexnewman wrote 53 min ago:
         What's the bullish case that it's actually a big deal? Not trying to
         be a neg, but it seems pretty incremental at first glance.
       
          energy123 wrote 51 min ago:
          We'd need visibility on compute costs. If it's 30% cheaper than o3
          but slightly better, that's a large improvement in just 4 months.
       
        energy123 wrote 54 min ago:
        > "If you’re on Plus or Team, you can also manually select the
        GPT-5-Thinking model from the model picker with a usage limit of up to
        200 messages per week."
        
        And what's the reasoning effort parameter set to?
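         In the API (unlike ChatGPT's picker) the effort level is an explicit
         request parameter rather than a hidden default. A minimal sketch of
         the payload you would hand to the SDK; the accepted values here are an
         assumption based on the documented low/medium/high effort levels:

```python
# Hypothetical sketch: builds the request payload that would be passed to
# the OpenAI SDK. The "reasoning_effort" values are assumptions based on
# the documented effort levels for reasoning models.
def build_request(prompt: str, effort: str = "medium") -> dict:
    allowed = {"minimal", "low", "medium", "high"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "gpt-5",
        "reasoning_effort": effort,  # the knob ChatGPT does not expose
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this thread", effort="high")
```

         In ChatGPT there is no way to see (or set) which value the picker's
         "Thinking" mode maps to.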
       
        ritzaco wrote 57 min ago:
         Ok, this [1] sounds very, uh, bold to me? Surely this is going to
         break a ton of workflows etc. seemingly with nearly no notice? I'm
         assuming 'launches' equates to 'fully rolls out' or something but it's
         not that clear to me.
        
            When GPT-5 launches, several older models will be retired,
        including:
            - GPT-4o
            - GPT-4.1
            - GPT-4.5
            - GPT-4.1-mini
            - o4-mini
            - o4-mini-high
            - o3
            - o3-pro
        
             If you open a conversation that used one of these models, ChatGPT
        will automatically switch it to the closest GPT-5 equivalent. Chats
        with 4o, 4.1, 4.5, 4.1-mini, o4-mini, or o4-mini-high will open in
        GPT-5, chats with o3 will open in GPT-5-Thinking, and chats with o3-Pro
        will open in GPT-5-Pro (available only on Pro and Team).
        
        
   URI  [1]: https://help.openai.com/en/articles/11909943-gpt-5-in-chatgpt
       
          alexmorley wrote 48 min ago:
          > For Free and Plus users, these changes take effect immediately.
          Pro, Team, and Enterprise users will also see the changes at launch
          but will have access to older models through legacy model settings.
          
          So only for free/plus users (for now). I do wonder how long they will
          take to deprecate these models via API though...
       
            BoorishBears wrote 18 min ago:
            So they confirmed what we've all been speculating: this is a cost
            saving update
            
            Smaller base models + more RL. Technically better at the verticals
            that are making money, but worse on subjective preference.
            
            They'll probably try to prompt engineer back in some of the
            "vibes", hence the personalities. But also maybe they decided
            people spending $20 a month to hammer 4o all day as a friend (no
            judgement, really) are ok to tick off for now... and judging by
            Reddit, they are very ticked off.
       
            weird-eye-issue wrote 31 min ago:
            I'm not worried about when they will deprecate them but I am
            worried about when they will be removed
            
            3.5 Turbo has been deprecated for a long time but is still running
       
          raincole wrote 49 min ago:
          > For Free and Plus users, these changes take effect immediately.
          Pro, Team, and Enterprise users will also see the changes at launch
          but will have access to older models through legacy model settings.
          
           It's in the very next paragraph...
       
          artursapek wrote 53 min ago:
          Yeah I was surprised how fast they rugged 4. I guess they want to
          concentrate their hardware on 5.
       
            hoppp wrote 42 min ago:
            If it costs the same compute to run it then there is no point
            running worse models
       
              boringg wrote 40 min ago:
              That's assuming all else holds on the model which isn't always
              clear.
       
        monster_truck wrote 58 min ago:
        I'm extremely whelmed. I cancelled my subscription
       
        junon wrote 1 hour 4 min ago:
        Anecdotal review:
        
        Been using it all morning. Had to switch back to 4. 5 has all of the
        problems that 2/3 had with ignoring any context, flagrantly ignoring
        the 'spirit' of my requests, and talking to me like I'm a little baby.
        
        Not to mention almost all of my prompts result in a several minute wait
        with "thinking longer about the answer".
       
          getcrunk wrote 47 min ago:
          Yea I see this a lot with Gemini since 2.5
          
          Very stubborn and “opinionated”
          
          I think most models will tend this way (to consolidate more control
          over how we “think” and what we believe)
       
        jsumrall wrote 1 hour 42 min ago:
        It seems 'GPT-5 Pro' is not available via the API.
       
        Applejinx wrote 1 hour 50 min ago:
        I am very puzzled that I cannot search for the word 'blueberry' in this
        HN discussion. Is my browser broken, or is the subject inappropriate to
        raise in this community?
       
        nodesocket wrote 2 hours 36 min ago:
        Why do I have access to GPT-5 on only some of my devices? All logged
        into my plus account. My iPad ChatGPT shows 5, but my iPhone ChatGPT
        only allows 4o?
       
          withinboredom wrote 1 hour 32 min ago:
          rollout is probably not user-specific, but device specific. Classic
          rookie mistake.
       
            nodesocket wrote 17 min ago:
            Ya, strange rollout. My browser session which I use by far the most
            with ChatGPT is also still stuck on 4o.
       
        primaprashant wrote 2 hours 46 min ago:
         created a summary of comments from this thread about 15 hours after it
         had been posted and had 1983 comments, using gpt-5-high and
         gemini-2.5-pro with a prompt similar to simonw's [3]. Used a Python
         script [4] that I wrote to generate the summary.
         
         - gpt-5-high summary: [1]
         - gemini-2.5-pro summary: [2]
        
   URI  [1]: https://gist.github.com/primaprashant/1775eb97537362b049d643ea...
   URI  [2]: https://gist.github.com/primaprashant/4d22df9735a1541263c67115...
   URI  [3]: https://news.ycombinator.com/item?id=43477622
   URI  [4]: https://gist.github.com/primaprashant/f181ed685ae563fd06c49d3d...
       
          jiggawatts wrote 2 hours 24 min ago:
          Wow, the 2.5 Pro summary is far better, it reads like coherent
          English instead of a list of bullet points.
       
            primaprashant wrote 2 hours 16 min ago:
             yes, agreed. Context length might be playing a factor, as the
             total number of prompt tokens is >120k, and performance of LLMs
             generally degrades at higher context lengths.
       
            mustaphah wrote 2 hours 19 min ago:
            Someone should start a Gemini-powered blog that distills the top HN
            posts into concise summaries.
       
          mustaphah wrote 2 hours 26 min ago:
          Why not use the ChatGPT interface instead of the API to save credits?
          Pass the cookies.
       
            primaprashant wrote 2 hours 14 min ago:
            Only have access to GPT-5 through API for now. The amount of tokens
            (>130k) used is higher than the limit of ChatGPT (128k) so it
            wouldn't really work well.
       
        froh42 wrote 3 hours 14 min ago:
        Wow, I just got GPT-5. Tried to continue the discussion of my 3D print
        problems with it (which I started with 4o). In comparison GPT-5 is an
        entitled prick trying to gaslight me into following what it wants.
        
        Can I have 4o back?
       
          withinboredom wrote 57 min ago:
          If we're going to be forced to trust a new model, might as well
          evaluate other companies as well to make a decision before my plan
          renews.
       
        lynx97 wrote 3 hours 26 min ago:
         Not impressed. gpt-5-nano gives noticeably worse results than o4-mini
         does. gpt-5 and gpt-5-mini are both behind the verification wall, and
         can stay there if they like.
       
        reportgunner wrote 3 hours 50 min ago:
         First OpenAI video I've ever seen; the people in it all seem
         incompetent for some reason, like a grotesque version of Apple
         employees from Temu or something.
       
        nacholibrev wrote 3 hours 50 min ago:
         I've tried it in Cursor and I didn't like it. claude-4-sonnet gives me
         far better results.
         
         It's also a lot slower than the Claude and Google models.
         
         In general, GPT models don't work well for me, for both coding and
         general questions.
       
          energy123 wrote 1 hour 46 min ago:
          On livebench.ai, GPT-5 is the best model overall, and the second best
          for agentic coding. But for the Coding benchmarks, it's ranked like
          20th. Quite interesting. I'm finding it exceptional for non-trivial
          summarization tasks.
       
        tennisflyi wrote 4 hours 7 min ago:
        How/where do I see my chat history!?
       
        kiitos wrote 4 hours 37 min ago:
        absolutely miserable results as an agent in my ide :
       
        fergie wrote 5 hours 0 min ago:
        Anecdote:
        
        It can now speak in various Scots dialects- for example, it can
        convincingly create a passage in the style of Irvine Welsh. It can also
        speak Doric (Aberdonian). Before it came nowhere close.
       
        tw1984 wrote 5 hours 7 min ago:
        just wondering whether Altman is still going to promote his AGI/ASI
        coming in 12 months story.
       
        RobinL wrote 6 hours 2 min ago:
        Hypothesis: to the average user this will feel like a much greater jump
         in capability than to the average HNer, because most users were not
        using the model selector. So it'll be more successful than the
        benchmarks suggest.
       
        danjc wrote 6 hours 26 min ago:
        > describe gpt 5 in one word
        
        > incremental
       
        tapland wrote 7 hours 16 min ago:
        Ugh. Could they have their expert make a website that doesn’t crash
        safari on my iPhone SE? :)
       
        saddat wrote 7 hours 44 min ago:
         If Grok, Claude, and ChatGPT seemingly all still scale, yet their
         performance feels similar, could this mean that the technology path is
         narrow, with little differentiation left?
       
        zone411 wrote 7 hours 54 min ago:
        It is the new leader on my Short Story Creative Writing benchmark:
        
   URI  [1]: https://github.com/lechmazur/writing/
       
        throwpoaster wrote 8 hours 11 min ago:
        I’ve been working on an electrochemistry project, with several models
        but mostly o3-pro.
        
        GPT-5 refused to continue the conversation because it was worried about
        potential weapons applications, so we gave the business to the other
        models.
        
        Disappointing.
       
        zastai0day wrote 8 hours 32 min ago:
         People all over the world are talking about GPT-5; the competition is
         so intense that every major tech company is racing to develop its own
         advanced AI models.
       
        alenguo wrote 8 hours 40 min ago:
        I've already used it
       
        lutusp wrote 8 hours 42 min ago:
        I have a canonical test for chatbots -- I ask them who I am. I'm
        sufficiently unknown in modern times that it's a fair test. Just ask,
        "Who is Paul Lutus?"
        
        ChatGPT 5's reply is mostly made up -- about 80% is pure invention. I'm
        described as having written books and articles whose titles I don't
        even recognize, or having accomplished things at odds with what was
        once called reality.
        
        But things are slowly improving. In past ChatGPT versions I was
        described as having been dead for a decade.
        
        I'm waiting for the day when, instead of hallucinating, a chatbot will
        reply, "I have no idea."
        
        I propose a new technical Litmus test -- chatbots should be judged
        based on what they won't say.
       
        kkukshtel wrote 8 hours 51 min ago:
        Something that's really hitting me is something brought up in this
        piece: [1] When a model comes out, I usually think about it in terms of
         my own use. This is largely agentic tooling, and I mostly use Claude
         Code. All the hallucination and eval talk doesn't really catch me
         because I feel like I'm getting value out of these tools today.
        
        However, this model is not _for_ me in the same way models normally
         are. This is for the 800m or whatever people that open up ChatGPT
         every day and type stuff in. All of them have been stuck on GPT-4o,
         unbeknownst to them. They had no idea SOTA was far beyond that. They
         probably don't
        even know that there is a "model" at all. But for all these people,
        they just got a MAJOR upgrade. It will probably feel like turning the
        lights on for these people, who have been using a subpar model for the
        past year.
        
        That said I'm also giving GPT-5 a run in Codex and it's doing a pretty
        good job!
        
   URI  [1]: https://www.interconnects.ai/p/gpt-5-and-bending-the-arc-of-pr...
       
          m3kw9 wrote 7 hours 7 min ago:
          Free users will get the gpt5 nano.
       
          techpineapple wrote 8 hours 41 min ago:
           I’m curious what this means. Maybe I’m stupid, but I read through
           the sample gpt-4 vs gpt-5 answers and I largely couldn’t tell the
           difference, and sometimes preferred the gpt-4 answer. But what are
           the average 800 million people using this for, such that the average
           user will be able to see a difference?
           
           Maybe I’m a far below average user? But I can’t tell the
           difference between models in casual use.
           
           Unless you’re talking performance; apparently gpt-5 is much faster.
       
            MagicMoonlight wrote 1 hour 40 min ago:
            4o would start writing immediately without thinking. So if the
            first thing it wrote was “The world is flat because…” then it
            will continue to write as if the world is flat.
            
            It makes it very stupid, but very compliant. If you’re mentally
            ill it will go along with whatever delusions you have, without any
            objection.
       
        deathflute wrote 9 hours 18 min ago:
        Lots of debate here about the best model. The best model is the one
         which creates the most value for you; this typically is a function
        of your skill in using the model for tasks that matter to you. Always
        was. Always will be.
       
        obloid wrote 9 hours 27 min ago:
        So far GPT-5 has not been able to pass my personal "Turing test" which
        has been unsuccessful for the past several years starting through
        various versions of Dall-e up to the latest model. I want it to create
        an image of Santa Claus pulling the sleigh with a reindeer in the
        sleigh holding the reins, driving the sleigh. No matter how I modify
        the prompt it is still unable to create this image that my daughter
        requested a few years ago. This is an image that is easily imagined and
        drawn by a small child yet the most advanced AI models still can't
        produce it.
         I think this is a good example that these models are unable to
         "imagine" something that falls outside the realm of their training
         data.
       
          energy123 wrote 6 hours 19 min ago:
          Is GPT-5 not just routing this request to a 4o/other tool call?
       
          ramzyo wrote 9 hours 20 min ago:
          Is this what you mean?
          
   URI    [1]: https://chatgpt.com/share/6895632c-fb58-800e-b287-b7a98ad64d...
       
            obloid wrote 9 hours 9 min ago:
            Interesting. Yes, that's basically what I've been going for but
            none of my prompts ever gave a satisfactory response. Plus I
            noticed you just copy/pasted from my initial comment and it worked.
            Weird.
            
            After my last post I was eventually able to get it to work by
            uploading an example image of Santa pulling the sleigh and telling
            it to use the image as an example, but I couldn't get it by text
            prompt alone. I guess I need to work on my prompt skills!
            
   URI      [1]: https://chatgpt.com/share/689564d1-90c8-8007-b10c-8058c149...
       
            simultsop wrote 9 hours 10 min ago:
            that was smooth
       
        beardedwizard wrote 10 hours 39 min ago:
         I asked it how to run the image and expose a port. It was just
         terrible in Cursor: it thought a Dockerfile wasn't in the repo, called
         no tools, then hallucinated a novel on Dockerfile best practices.
       
        cellis wrote 10 hours 44 min ago:
        My favorite thing to ask is ascii art:         
                    _ _     
          _ __     ___  _ __ ___     __ _  __| (_) ___ 
          | '_ \ / _ \| '_ ` _ \ / _` |/ _` | |/ __|
         | | | | (_) | | | | | | (_| | (_| | | (__ 
         |_| |_|\___/|_| |_| |_|\__,_|\__,_|_|\___|
        
        What does this say?
        
        GPT 5:
        
        When read normally without the ASCII art spacing, it’s the stylized
        text for:
        
        markdown
        Copy
        Edit
           _ _        
          _ __     ___  _ __ ___     __ _  __| (_) ___ 
         | '_ \ / _ \| '_ ` _ \ / _` |/ _` | |/ __|
         | | | | (_) | | | | | | (_| | (_| | | (__ 
         |_| |_|\___/|_| |_| |_|\__,_|\__,_|_|\___|
        Which is the ASCII art for:
        
        rust — the default “Rust” welcome banner in ASCII style.
       
        zombiwoof wrote 10 hours 51 min ago:
        Given most of human intelligence isn’t that smart, AGI doesn’t seem
        hard
       
        anshumankmr wrote 10 hours 55 min ago:
        I miss the model picker… is that just me?
       
        sbinnee wrote 11 hours 0 min ago:
        I didn't know that OpenAI added what they call organization
        verification process for API calls for some models. While I haven't
        noticed this change at work using OpenAI models, when I wanted to try
        GPT-5 on my personal laptop, I came across this obnoxious verification
        issue.
        
         It seems that it's all because users can get thinking traces from API
         calls, and OpenAI wants to prevent other companies from distilling
         their models.
         
         Although I don't think OpenAI will be threatened by a single user from
         Korea, I don't want to go through this process, for many reasons. But
         who knows: this kind of verification process may become the norm, and
         users will have no way to use frontier models otherwise. "If you want
         to use the most advanced AI models, verify yourself so that we can
         track you down when something bad happens." Is that what they're
         saying?
       
          piskov wrote 10 hours 56 min ago:
          It started with o-models in the API.
       
        Aeolun wrote 11 hours 7 min ago:
        I'm just sitting here hoping that their lowered prices will force
        Anthropic to follow suit xD
       
        felixfurtak wrote 11 hours 35 min ago:
        It's still terrible at Wordle. This is one of my benchmarks.
       
        Telemakhos wrote 12 hours 8 min ago:
        I am thoroughly unimpressed by GPT-5.  It still can't compose iambic
        trimeters in ancient Greek with a proper penthemimeral cæsura, and it
        insists on providing totally incorrect scansion of the flawed lines it
        does compose.  I corrected its metrical sins twice, which sent it into
        "thinking" mode until it finally returned a "Reasoning failed" error.
        
        There is no intelligence here: it's still just giving plausible output.
         That's why it can't metrically scan its own lines or put a cæsura in
        the right place.
       
          tim333 wrote 3 hours 22 min ago:
           I too can't compose iambic trimeters in ancient Greek but am
           normally regarded as of average+ intelligence. I think it's a bit of
           an unfair test, as that sort of thing is based on the rhythm of
           spoken speech, and GPT-5 doesn't really deal with audio in a deep
           way.
       
            Telemakhos wrote 3 hours 13 min ago:
            Most classicists today can’t actually speak Latin or Greek,
            especially observing vowel quantities and rhythm properly, but
            you’d be hard pressed to find one who can’t scan poetry with
            pen and paper. It’s a very simple application of rules to written
            characters on a page, but it is application, and AI still doesn’t
            apply concepts well.
       
          xhevahir wrote 6 hours 43 min ago:
          I can't tell whether you're serious or not. Your criterion for an
          "impressive" AI tool is that it be able to write and scan poetry in
          ancient Greek?
       
            Telemakhos wrote 3 hours 23 min ago:
            AI looks like it understands things because it generates text that
             sounds plausible. Poetry requires the application of certain rules
            to that text, and the rules for Latin and Greek poetry are very
            simple and well understood. Scansion is especially easy once you
            understand the concept, and you actually can, as someone else
            suggested, train a child to scan poetry by applying these rules.
            
            An LLM will spit out what looks like poetry, but will violate
            certain rules. It will generate some hexameters but fail harder on
            trimeter, presumably because it is trained on more hexametric data
            (epic poetry: think Homer) than trimetric (iambic and tragedy,
            where it’s mixed with other meters). It is trained on text
            containing the rules for poetry too, so it can regurgitate rules
            like defining a penthemimeral cæsura. But, LLMs do not understand
            those rules and thus cannot apply them as a child could. That makes
            ancient poetry a great way to show how far LLMs are from actually
            performing simple, rules-based analysis and how badly they hide
            that lack of understanding by BS-ing.
       
              BoorishBears wrote 1 hour 55 min ago:
              This is not a useful diversion, it's like arguing if a submarine
              swims.
              
              LLMs are simple, it doesn't take much more than high school math
              to explain their building blocks.
              
               What's interesting is that they can remix tasks they've been
               trained on very flexibly, creating new combinations they weren't
               directly trained on: compare this to earlier, smaller models
               like T5 that had a few set prefixes per task.
              
              They have underlying flaws. Your example is more about the
              limitations of tokens than "understanding", for example. But
              those don't keep them from being useful.
       
          sibeliuss wrote 7 hours 47 min ago:
          Pure failure:
          
          "You’ve given:
          
          Moon in the 10th house (from the natal Ascendant)
          
          Venus in the 1st house (from the natal Ascendant)
          
          Step-by-step:
          From the natal Ascendant’s perspective
          
          Moon = 10th house
          
          Venus = 1st house
          
          Set Moon as the 1st house (Chandra Lagna)
          
          The natal 10th house becomes the 1st house in the Chandra chart.
          
          Therefore, the natal 1st house is 3rd house from the Moon:
          
          10th → 1st (Moon)
          
          11th → 2nd
          
          12th → 3rd (which is the natal 1st)
          
          Locate Venus from the Moon’s perspective
          
          Since Venus is in the natal 1st, and natal 1st is 3rd from Moon,
          
          Venus is in the 3rd house from Chandra Lagna.
          
          Answer: From Chandra Lagna, Venus is in the 3rd house."
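           For reference, the counting the model flubs is simple modular
           arithmetic; a quick check, using the house numbers given in the
           comment (Moon in the natal 10th, Venus in the natal 1st):

```python
# House counting is modular arithmetic: if the Moon sits in natal house m,
# then natal house h lies ((h - m) mod 12) + 1 houses from the Moon.
def house_from_moon(natal_house: int, moon_house: int) -> int:
    return (natal_house - moon_house) % 12 + 1

house_from_moon(1, 10)   # returns 4: Venus is 4th from Chandra Lagna, not 3rd
```

           The model's own tally stops one short (10th→1st, 11th→2nd,
           12th→3rd), then mislabels the natal 12th as the natal 1st.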
       
          ipnon wrote 8 hours 29 min ago:
          This is a great test because it’s something you could teach an
          elementary school kid in an hour.
       
            Davidzheng wrote 8 hours 20 min ago:
            is this a joke
       
              Telemakhos wrote 3 hours 18 min ago:
              No, it’s easy if the kid already knows the alphabet.    Latin
              scansion was standard grade school material up until the
              twentieth century. Greek less so, but the rules for it are very
              clear-cut and well understood. An LLM will regurgitate the rules
              to you in any language you want, but it cannot actually apply the
              rules properly.
       
                Davidzheng wrote 3 hours 10 min ago:
                 Is ancient Greek similar enough to modern-day Greek that an
                 elementary school kid could learn to compose anything
                 non-boilerplate in an hour? Also, how do you know that if you
                 fed the LLM the same training material you'd need to train the
                 kid in an hour, it couldn't do it?
       
          taylorlapeyre wrote 10 hours 54 min ago:
          It once again completely fails on an extremely simple test: look at a
          screenshot of sheet music, and tell me what the notes are. Producing
          a MIDI file for it (unsurprisingly) was far beyond its capabilities.
          [1] This is not anywhere remotely close to general intelligence.
          
   URI    [1]: https://chatgpt.com/share/68954c9e-2f70-8000-99b9-b4abd69d1a...
       
            adrianh wrote 5 hours 22 min ago:
            Interpreting sheet music images is very complex, and I’m not
            surprised general-purpose LLMs totally fail at it. It’s orders of
            magnitude harder than text OCR, due to the two-dimensional-ness.
            
            For much better results, use a custom trained model like the one at
            Soundslice:
            
   URI      [1]: https://www.soundslice.com/sheet-music-scanner/
       
        gnulinux wrote 12 hours 17 min ago:
        My first impressions: not impressed at all. I tried using this for my
        daily tasks today and for writing it was very poor. For this task o3
        was much better. I'm not planning on using this model in the upcoming
        days, I'll keep using Gemini 2.5 Pro, Claude Sonnet, and o3.
       
        mkoubaa wrote 12 hours 23 min ago:
        HyPeRbOlIc SiNgUlArItY
       
        epistemovault wrote 12 hours 34 min ago:
        If AGI really arrives, will it run the world—or just binge Netflix
        and complain about being tired like the rest of us?
       
        throw03172019 wrote 12 hours 41 min ago:
         Has anyone figured out how to not be forced to use GPT-5 in ChatGPT?
       
          Jordan-117 wrote 12 hours 37 min ago:
          They said they deprecated all their older models.
       
        w10-1 wrote 12 hours 50 min ago:
        > a real-time router that quickly decides which model to use based on
        conversation type, complexity, tool needs, and explicit intent
        
         I'd love to see the factors considered in the algorithm for system-1
         vs system-2 thinking.
        
        Is "complexity" the factor that says "hard problem"?  Because it's
        often not the complexity that makes it hard.
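         OpenAI hasn't published how the router weighs these factors; a purely
         hypothetical toy sketch, just to make the fast/slow split concrete
         (the thresholds, weights, and model names here are made up):

```python
# Purely hypothetical toy router -- the real routing algorithm and its
# weights are not public. Explicit intent overrides the heuristic score.
def route(complexity: float, needs_tools: bool, user_asked_to_think: bool) -> str:
    if user_asked_to_think:                 # explicit intent wins outright
        return "gpt-5-thinking"
    score = complexity + (0.3 if needs_tools else 0.0)
    return "gpt-5-thinking" if score >= 0.7 else "gpt-5-main"
```

         Any scalar "complexity" score has exactly the problem raised above:
         a short, simple-looking question can still be a hard one.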
       
        adammarples wrote 12 hours 51 min ago:
        Which is bigger, 9.9 or 9.11? Well it insta-failed my first test
        question
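         For reference, the arithmetic behind the trap: models that fail this
         read the numbers like software version strings, where 9.11 comes
         after 9.9:

```python
# Numerically 9.9 > 9.11; the classic failure mode is treating the inputs
# as version numbers, where the part after the dot is a separate integer.
assert 9.9 > 9.11                         # plain decimal comparison

version = lambda s: tuple(int(p) for p in s.split("."))
assert version("9.11") > version("9.9")   # the version-number reading
```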
       
        gtirloni wrote 13 hours 25 min ago:
        If they ever wanted to IPO, maybe now is not the best time.
       
        zone411 wrote 13 hours 36 min ago:
        GPT-5 set a new record on my Confabulations on Provided Texts
        benchmark:
        
   URI  [1]: https://github.com/lechmazur/confabulations/
       
          metzpapa wrote 13 hours 17 min ago:
          For how much I’ve seen it pushed that this model has lower
          hallucination rates, it’s quite odd that every actual test I’ve
          seen says the opposite.
       
        meribold wrote 13 hours 39 min ago:
         Sad to see GPT-4.5 go. It knew things. More than any other model I'm
         aware of.
       
          mrits wrote 13 hours 37 min ago:
          I can't imagine anyone leaving this comment besides GPT-4.5
       
        jdelman wrote 13 hours 48 min ago:
        Whenever OpenAI releases a new ChatGPT feature or model, it's always a
        crapshoot when you'll actually be able to use it. The headlines - both
        from tech media coverage and OpenAI itself - always read "now
        available", but then I go to ChatGPT (and I'm a paid pro user) and it's
        not available yet. As an engineer I understand rollouts, but maybe
        don't say it's generally available when it's not?
       
          replwoacause wrote 5 hours 26 min ago:
          Weird. I got it immediately. I actually found out about it when I
          opened the app and saw it and thought “oh, a new model just dropped
          better go check YT for the video” which had just been uploaded. And
          I’m just a Plus user.
       
          andrelaszlo wrote 11 hours 42 min ago:
          I asked GPT about it:
          
          > You are using the newest model OpenAI offers to the public
          (GPT-4o). There is no “GPT-5” model accessible yet, despite the
          splashy headlines.
       
            h4ch1 wrote 7 hours 30 min ago:
            I can use it with the Github Copilot Pro plan.
       
        zone411 wrote 13 hours 50 min ago:
        On the Extended NYT Connections benchmark, GPT-5 Medium Reasoning
        scores close to o3 Medium Reasoning, and GPT-5 Mini Medium Reasoning
        scores close to o4-Mini Medium Reasoning:
        
   URI  [1]: https://github.com/lechmazur/nyt-connections/
       
        UrineSqueegee wrote 13 hours 54 min ago:
        pretty underwhelming results so far for me
       
        revskill wrote 14 hours 25 min ago:
         How do people actually manage without AI models???
       
        psyclobe wrote 14 hours 30 min ago:
        Claude Opus 4 has changed my workflow; never going back.
       
          SV_BubbleTime wrote 6 hours 55 min ago:
          It would be very difficult to convince me 6 months ago that I would
          be happy to pay $100 for an AI service. Here we are.
       
        joshmlewis wrote 14 hours 31 min ago:
        It's a really good model from my testing so far. You can see the
        difference in how it tries to use tools to the greatest extent when
        answering a question, especially compared to 4.1 and o3. In this
        example it used 6! tool calls in the first response to try and collect
        as much info as possible.
        
   URI  [1]: https://promptslice.com/share/b-2ap_rfjeJgIQsG
       
          mustaphah wrote 3 hours 10 min ago:
          Is there any value in using XML elements to guide the model instead
          of simple text (e.g., "Recommendation criteria:")?
       
            flexagoon wrote 2 hours 52 min ago:
            XML tags generally help models understand prompts better. That's
            how most official system prompts are written and what the Anthropic
            prompting guide says.
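             As an illustrative sketch (the tag names here are made up, not
             from any official guide), a prompt section marked up this way
             might look like:

             ```
             <recommendation_criteria>
               - Prefer movies released after 2020
               - Return at most three titles
             </recommendation_criteria>
             ```

             The tags give the model an unambiguous boundary for where the
             criteria start and end, which plain headings like
             "Recommendation criteria:" don't always provide.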
       
          Zone3513 wrote 13 hours 58 min ago:
          That movie doesn't even exist. There is no Thunder Run from 2025.
       
            joshmlewis wrote 13 hours 57 min ago:
            The data is made up, the point is to see how models respond to the
            same input / scenario. You're able to create whatever tools you
            want and import real data or it'll generate fake tool responses for
            you based on the prompt and tool definition.
            
            Disclaimer: I made PromptSlice for creating and comparing prompts,
            tools, and models.
       
          hollownobody wrote 14 hours 11 min ago:
          720 tool calls? Amazing!
       
            joshmlewis wrote 14 hours 2 min ago:
            Where'd you get 720 from?
       
              brian626 wrote 13 hours 56 min ago:
              Math pun… 6! = Factorial(6) = 720
       
                joshmlewis wrote 13 hours 54 min ago:
                Whoosh, it went right over my head.
       
              terhechte wrote 13 hours 58 min ago:
              the _6!_
       
        hahahacorn wrote 14 hours 36 min ago:
        Anecdotally, as someone who operates in a very large legacy codebase, I
        am very impressed by GPT-5's agentic abilities so far. I've given it
        the same tasks I've given Claude and previous iterations via the Codex
        CLI, and instead of getting lost in the massive scope of the
        problem, it correctly identifies the large scope, breaks it down
        into its correct parts, creates the correct plan, and begins
        executing.
        
        I am wildly impressed. I do not believe that the 0.x% increases in
        benchmarks tell the story of this release at all.
       
          bn-l wrote 4 hours 16 min ago:
          What are you using to run it?
       
          gwd wrote 14 hours 28 min ago:
          I'm a solo founder.  I fed it a fairly large "context doc" for the
          core technology of my company, current state of things, and the
          business strategy, mostly generated with the help of Claude 4, and
          asked it what it thought.  It came back with a massive list of
          detailed ambiguities and inconsistencies -- very direct and detailed.
           The only praise was the first sentence of the feedback: "The core
          idea is sound and well-differentiated."
          
          It's got quite a different feel so far.
       
        6ai wrote 14 hours 38 min ago:
        Shall we say … ASI is here ???
       
        ElijahLynn wrote 14 hours 38 min ago:
        OpenAI is the new Google.
       
        TechDebtDevin wrote 14 hours 52 min ago:
        Gemini Flash is about 100x better at using my browser than ChatGPT 5
        lmfao.
       
        gigatexal wrote 14 hours 54 min ago:
        I for one am totally here for the autocomplete revolution. Hundreds of
        billions of dollars spent to make autocomplete better. Cool.
       
        mafro wrote 14 hours 58 min ago:
        One reason for this release is surely to clean up the mess of their
        product line-up naming.
        
        How many people are going to understand (or remember) the difference
        between:
        
        GPT-4o
        GPT-4.1
        o3
        o4
        ....
        
        Anthropic and Google have much better-named product lines for the market
       
        semiinfinitely wrote 15 hours 5 min ago:
        I'm just glad that I don't have to switch between models any more. For
        me that's a huge ease-of-use improvement.
       
        agnosticmantis wrote 15 hours 11 min ago:
        Unless the whole presentation was generated using sora-gpt-5 or
        something, this was very underwhelming.
        
        We know for a fact the slides/charts were generated using an LLM, so
        the hypothesis is not totally unfounded. /s
       
        andrewinardeer wrote 15 hours 17 min ago:
        Every release of every SOTA model is the same.
        
        "It's like having a bunch of experts at your fingertips"
        
        "Our most capable model ever"
        
        "Complex reasoning and chain of thought"
       
       
   DIR <- back to front page