_______ __ _______ | | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----. | || _ || __|| < | -__|| _| | || -__|| | | ||__ --| |___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____| on Gopher (unofficial) URI Visit Hacker News on the Web COMMENT PAGE FOR: URI Are OpenAI and Anthropic losing money on inference? chillee wrote 1 hour 14 min ago: This article's math is wrong on many fundamental levels. One of the most obvious ones is that prefill is nowhere near bandwidth bound. If you compute out the MFU the author gets, it's 1.44 million input tokens per second * 37 billion active params * 2 (FMA) / 8 [GPUs per instance] = 13 Petaflops per second. That's approximately 7x the absolute peak FLOPS on the hardware. Obviously, that's impossible. There are many other issues with this article, such as assuming only 32 concurrent requests(?), only 8 GPUs per instance as opposed to the more efficient/standard prefill-decode disagg setups, assuming that attention computation is the main thing that makes models compute-bound, etc. It's a bit of an indictment of HN's understanding of LLMs that most people are bringing up issues with the article that aren't any of the fundamental misunderstandings here. pama wrote 29 min ago: Agree that the writeup is very wrong, especially for the output tokens. Here is how anyone with enough money to allocate a small cluster of powerful GPUs can decode huge models at scale, since nearly 4 months ago, with costs of 0.2 USD/million output tokens. [1] This has gotten significantly cheaper yet with additional code hacks since then, and with using the B200s. URI [1]: https://lmsys.org/blog/2025-05-05-large-scale-ep/ Den_VR wrote 53 min ago: So, bottom line, do you think it's probable that either OpenAI or Anthropic are "losing money on inference?" chillee wrote 43 min ago: No. In some sense, the article comes to the right conclusion haha. But it's probably >100x off on its central premise about output tokens costing more than input. martinald wrote 27 min ago: Thanks for the correction (author here). I'll update the article - very fair point on compute on input tokens which I messed up. Tbh I'm pleased my napkin math was only 7x off the laws of physics :). Even rerunning the math on my use cases with way higher input token cost doesn't change much though. chillee wrote 8 min ago: The 32 parallel sequences is also arbitrary and significantly changes your conclusions. For example, if they run with 256 parallel sequences then that would result in an 8x cheaper factor in your calculations for both prefill and decode. The component about requiring long context lengths to be compute-bound for attention is also quite misleading. doctorpangloss wrote 36 min ago: I'm pretty sure input tokens are cheap because they want to ingest the data for training later, no? They want huge contexts to slice up. yalogin wrote 2 hours 34 min ago: Will these companies ever stop training new models? What does it mean if we get there? Feels like they will have to constantly train and improve the models, not sure what that means either. What incremental improvements can these models show? Another question is - will it ever become less costly to train? Love to see opinions from someone in the know.
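A minimal sketch of chillee's prefill sanity check above, for anyone who wants to rerun it: the token rate, active-parameter count, and instance size are the thread's napkin numbers, while the ~2 PFLOPS peak per H100 is an assumed round figure (roughly dense FP8 peak), not something stated in the comments.

  # Rough check of the prefill FLOPs arithmetic discussed by chillee above.
  tokens_per_second = 1.44e6     # input tokens/s claimed for one 8-GPU instance
  active_params = 37e9           # assumed active parameters per token (MoE)
  flops_per_param = 2            # one multiply + one add (FMA) per parameter
  gpus_per_instance = 8

  required_per_gpu = tokens_per_second * active_params * flops_per_param / gpus_per_instance
  h100_peak = 2e15               # assumed peak throughput per H100, in FLOPS

  print(f"required: {required_per_gpu / 1e15:.1f} PFLOPS per GPU")                # ~13.3
  print(f"vs. assumed peak: {required_per_gpu / h100_peak:.1f}x over the limit")  # ~6.7x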
Romario77 wrote 2 hours 25 min ago: The current way the models work is that they don't have memory; it's included in training (or has to be provided as context). So to keep up with the times, the models have to be constantly trained. One thing though is that right now it's not just incremental training, the whole thing gets updated - multiple parameters and how the model is trained is different. This might not be the case in the future, where the training could become more efficient and switch to incremental updates where you don't have to re-feed all the training data but only the new things. I am simplifying here for brevity, but I think the gist is still there. yalogin wrote 1 hour 53 min ago: Sure the training can be made efficient, but how much better can these LLMs get in functionality? senko wrote 2 hours 18 min ago: Updating the internal knowledge is not the primary motivator here, as you can easily, and more reliably (less hallucination), get that information at inference stage (through a web search tool). They're training new models because the (software) technology keeps improving, (proprietary) data sets keep improving (through a lot of manual labelling but also synthetic data generation), and in general researchers have a better understanding of what's important when it comes to LLMs. atleastoptimal wrote 3 hours 11 min ago: Everyone claiming AI companies are a financial ticking time bomb is using the same logic people used back in the 2000s when they claimed Amazon "never made a profit" and thus was a bad investment. 9cb14c1ec0 wrote 1 hour 22 min ago: Wrong. The depreciation cost on the hundreds of billions of dollars spent on the AI build-out is almost certainly larger than the AI industry's gross income. This is a vastly different depreciation cost schedule than AWS. ProofHouse wrote 3 hours 44 min ago: Only introducing this - "NVIDIA AI Released Jet-Nemotron: 53x Faster Hybrid-Architecture Language Model Series that Translates to a 98% Cost Reduction for Inference at Scale" - into the conversation as it just dropped (so it is timely), and while it seems unlikely either OpenAI or Anthropic use this or a technique like it (yet, or if they even can), these types of breakthroughs may introduce dramatic savings for both closed and open source inference at scale moving forward URI [1]: https://www.marktechpost.com/2025/08/26/nvidia-ai-released-jet... reenorap wrote 4 hours 18 min ago: An interesting exercise would be what prompts would create the most cost for LLMs but little to no cost for the issuer. Are all prompts equal, with the only factor being the lengths of the input and output prompts? Or is there processing of the prompts that could be exceedingly expensive for the LLM? fallmonkey wrote 4 hours 32 min ago: The estimation for output tokens is too low, since one reasoning-enabled response can burn through thousands of output tokens. Also low for input tokens, since in actual use there's a lot of context (memory, agents.md, rules, etc.) included nowadays. atq2119 wrote 29 min ago: When using APIs, you pay for reasoning tokens like you do for actual outputs. So, the estimation on a per-token basis is not affected by reasoning. What reasoning affects is the ratio of input to output tokens, and since input tokens are cheaper, that may well affect the economics in the end. resters wrote 4 hours 46 min ago: Consider some of the scaling properties of frontier cloud LLMs: 1) routing: traffic can be routed to smaller, specialized, or quantized models 2) GPU throughput vs latency: both parameters can be tuned and adjusted based on demand. What seems like lots of deep "thinking" might just be trickling the inference over fewer GPU resources for longer.
3) caching WhitneyLand wrote 4 hours 54 min ago: Model context limits are not "artificial" as claimed. The largest context window a model can offer at a given quality level depends on the context size the model was pretrained with as well as specific fine tuning techniques. It's not simply a matter of considering increased costs. Der_Einzige wrote 4 hours 44 min ago: Context extension methods exist and work. Please educate yourself about these rather than confidently saying wrong things. _sword wrote 5 hours 6 min ago: I've done the modeling on this a few times and I always get to a place where inference can run at 50%+ gross margins, depending mostly on GPU depreciation and how good the host is at optimizing utilization. The challenge for the margins is whether or not you consider model training costs as part of the calculation. If model training isn't capitalized + amortized, margins are great. If they are amortized and need to be considered... yikes lumost wrote 2 hours 47 min ago: I wonder how much capex risk there is in this model; depreciating the GPUs over 5 years is fine if you can guarantee utilization. Losing market share might be a death sentence for some of these firms as utilization falls. next_xibalba wrote 3 hours 15 min ago: > whether or not you consider model training costs as part of the calculation Whether they flow through COGS/COR or elsewhere on the income statement, they've gotta be recognized. In which case, either you have low gross margins or low operating profit (low net income??). Right? That said, I just can't conceive of a way that training costs are not hitting gross margins. Be it IFRS/GAAP etc., training is 1) directly attributable to the production of the service sold, 2) is not SG&A, financing, or abnormal cost, and thus 3) only makes sense to match to revenue. ozgune wrote 3 hours 50 min ago: I agree that you could get to high margins, but I think the modeling holds only if you're an AI lab operating at scale with a setup tuned for your model(s). I think the most open study on this one is from the DeepSeek team: [1] For others, I think the picture is different. When we ran benchmarks on DeepSeek-R1 on 8x H200 SXM using vLLM, we got up to 12K total tok/s (concurrency 200, input:output ratio of 6:1). If you're spiking up to 100-200K tok/s, you need a lot of GPUs for that. Then, the GPUs sit idle most of the time. I'll read the blog post in more detail, but I don't think the following assumptions hold outside of AI labs. * 100% utilization (no spikes, balanced usage between day/night or weekdays) * Input processing is free (~$0.001 per million tokens) * DeepSeek fits into H100 cards in a way that network isn't the bottleneck URI [1]: https://github.com/deepseek-ai/open-infra-index/blob/main/20... _sword wrote 1 hour 57 min ago: I was modeling configurations purpose-built for running specific models in specific workloads. I was trying to figure out how much of a gross margin drag some software companies could have if they hosted their own models and served them up as APIs or as integrated copilots with their other offerings ProofHouse wrote 3 hours 52 min ago: can you share the model? trilogic wrote 4 hours 16 min ago: I have to disagree. The biggest cost is still energy consumption, water and maintenance. Not to mention having to keep up with rivals at an incredibly high tempo (hence offering billions like Meta recently). Then there is the cost of hardware, which matches Nvidia's skyrocketing shares :) No one should dare to talk about profit yet.
Now is the time to grab the market, invest a lot and work hard, hoping for a future profit. The equation is still a work in progress. DoesntMatter22 wrote 3 hours 37 min ago: Is that not baked into the H100 rental costs? tptacek wrote 3 hours 31 min ago: It is. wtallis wrote 4 hours 13 min ago: > The biggest cost is still energy consumption, water and maintenance. Are you saying that the operating costs for inference exceed the costs of training? trilogic wrote 3 hours 22 min ago: The global cost of inference at both OpenAI and Anthropic exceeds the training cost for sure. The reason is simple: inference cost grows with requests, not with datasets. My math, simplified by AI, says: suppose training a GPT-like model costs C_T = $10,000,000 and each query costs C_I = $0.002. Break-even: N > C_T / C_I = $10,000,000 / $0.002 = 5,000,000,000 inferences. So after 5 billion queries, inference costs surpass the training cost. OpenAI claims it has 100 million users x queries = I let you judge. umpalumpaaa wrote 4 hours 1 min ago: No. But training an LLM is certainly very very expensive and a gamble every time you do it. I think of it a bit like a pharmaceutical company doing vaccine research... BlindEyeHalo wrote 4 hours 40 min ago: Why wouldn't you factor in training? It is not like you can train once and then have the model run for years. You need to constantly improve to keep up with the competition. The lifespan of a model is just a few months at this point. jacurtis wrote 1 hour 2 min ago: In a recent episode of the Hard Fork podcast, the hosts discussed an on-the-record conversation they had with Sam Altman from OpenAI. They asked him about profitability and he claimed that they are losing money mostly because of the cost of training. But as the model advances, they will train less and less. Once you take training out of the equation, he claimed they were profitable based on the cost of serving the trained foundation models to users at current prices. Now, when he said that, his CFO corrected him and said they aren't profitable, but said "it's close". Take that with a grain of salt, but that's a conversation from one of the big AI companies that is only a few weeks old. I suspect that it is pretty accurate that pricing is currently reasonable if you ignore training. But training is very expensive and the reason most AI companies are losing money right now. dgfitz wrote 4 min ago: > But as the model advances, they will train less and less. They sure have a lot of training to do between now and whenever that happens. Rolling back from 5 to whatever was before it is their own admission of this fact. pas wrote 46 min ago: > most AI companies are losing money right now which is completely "normal" at this point, """right"""? if you have billions of VC money chasing returns there's no time to sit around, it's all in, the hype train doesn't wait for bootstrapping profitability. and of course with these gargantuan valuations and mandatory YoY growth numbers, there is no way they are not fucking with the unit economy numbers too. (biases are hard to beat, especially if there's not much conscious effort to do so.) _sword wrote 2 hours 0 min ago: I spoke with management at a couple companies that were training models, and some of them expensed the model training in-period as R&D. That's why ugh123 wrote 3 hours 27 min ago: It's possible they factor in training purely as an "R&D" cost and then can tax that development at a lower rate.
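trilogic's break-even arithmetic above reduces to a one-liner; here is a minimal sketch using the commenter's illustrative numbers ($10M to train, $0.002 per query), which are not real OpenAI or Anthropic figures.

  # Break-even point where cumulative inference spend overtakes a one-time training cost.
  training_cost = 10_000_000        # C_T, illustrative
  cost_per_query = 0.002            # C_I, illustrative

  break_even_queries = training_cost / cost_per_query
  print(f"inference overtakes training after {break_even_queries:,.0f} queries")  # 5,000,000,000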
christina97 wrote 4 hours 8 min ago: In the same way that every other startup tries to sweep R&D costs under the rug and say "yeah but the marginal unit economics have 50% gross margins, we'll be a great business soon". vonneumannstan wrote 4 hours 9 min ago: I suspect we've already reached the point with models at the GPT5 tier where the average person will no longer recognize improvements and this model can be slightly improved at slow intervals and indeed run for years. Meanwhile research grade models will still need to be trained at massive cost to improve performance on relatively short time scales. black_knight wrote 2 hours 4 min ago: Strangely, I feel GPT-5 as the opposite of an improvement over the previous models, and consider just using Claude for actual work. Also the voice mode went from really useful to useless: "Absolutely, I will keep it brief and give it to you directly. ...some wrong answer... And there you have it! As simple as that!" vonneumannstan wrote 1 hour 21 min ago: >Strangely, I feel GPT-5 as the opposite of an improvement over the previous models This is almost surely wrong, but my point was about GPT5 level models in general, not GPT5 specifically... AJ007 wrote 3 hours 52 min ago: Whenever someone has complained to me about issues they are having with ChatGPT on a particular question or type of question, the first thing I do is ask them what model they are using. So far, no one has ever known offhand what model they were using, nor were they aware there are more models! If you understand there are multiple models from multiple providers, some of those models are better at certain things than others, and how you can get those models to complete your tasks, you are in the top 1% (probably less) of LLM users. th0ma5 wrote 1 hour 17 min ago: This would be helpful if there was some kind of first principle at which to gauge that better or worse comparison, but there isn't outside of people's value judgements like what you're offering. MontyCarloHall wrote 4 hours 19 min ago: As long as models continue on their current rapid improvement trajectory, retraining from scratch will be necessary to keep up with the competition. As you said, that's such a huge amount of continual CapEx that it's somewhat meaningless to consider AI companies' financial viability strictly in terms of inference costs, especially because more capable models will likely be much more expensive to train. But at some point, model improvement will saturate (perhaps it already has). At that point, model architecture could be frozen, and the only purpose of additional training would be to bake new knowledge into existing models. It's unclear if this would require retraining the model from scratch, or simply fine-tuning existing pre-trained weights on a new training corpus. If the former, AI companies are dead in the water, barring a breakthrough in dramatically reducing training costs. If the latter, assuming the cost of fine-tuning is a fraction of the cost of training from scratch, the low cost of inference does indeed make a bullish case for these companies. mgh95 wrote 1 hour 54 min ago: > If the latter, assuming the cost of fine-tuning is a fraction of the cost of training from scratch, the low cost of inference does indeed make a bullish case for these companies. On the other hand, this may also turn into cost-effective methods such as model distillation and spot training of large companies (similarly to Deepseek).
This would erode the comparative advantage of Anthropic and OpenAI, and result in a pure value-add play for integration with data sources and features such as SSO. It isn't clear to me that a slowing of retraining will result in advantages to incumbents if model quality cannot be readily distinguished by end-users. echelon wrote 3 min ago: > model distillation I like to think this is the end of software moats. You can simply call a foundation model company's API enough times and distill their model. It's like downloading a car. Distribution still matters, of course. lawlessone wrote 5 hours 3 min ago: Does that include legal fights and potential payouts to artists and writers whose work was used without permission? Can anyone explain why it's not allowed to compensate the creators of the data? Night_Thastus wrote 5 hours 1 min ago: It's already questionable if anyone can make it profitable once you account for all the costs. Why do you think they try to squash the legal concerns so hard? If they move fast and stick their fingers in their ears, they can just steal whatever the want. ergocoder wrote 5 hours 2 min ago: Of course not. Those usually wouldn't be considered "margin". Another similar example is R&D and development by engineers aren't considered in margin either. snowwrestler wrote 5 hours 8 min ago: Not wishing to do a shallow dismissal here, but I always assumed AI must be profitable on inference otherwise no one would pursue it as a business given how expensive the training is. It seems sort of like wondering if a fiber ISP is profitable per GB bandwidth. Of course it is; the expensive part is getting the fiber to all the homes. So the operations must be profitable or there is simply no business model possible. overgard wrote 4 hours 24 min ago: AI right now seems more like a religious movement than a business one. It doesn't matter how much it costs (to the true believers), its about getting to AGI first. stephenatgmi wrote 5 hours 24 min ago: Some agent startups are already feeling the squeeze â The Information reported Cursorâs gross margins hit â16% due to token costs. So even if inference is profitable for OAI/Anthropic, downstream token-hungry apps may not see the same unit economics, and that is why token-intensive agent startups like Cursor and Perplexity are taking open-source models like Qwen or other OSS-120B and post-training them to bring down inference costs. KallDrexx wrote 5 hours 45 min ago: Since DeepSeek R1 is open weight, wouldn't it be better to validate the napkin math to validate how many realistic LLM full inferences can be done on a single H100 in a time period, and calculate the token cost of that? Without having in depth knowledge of the industry, the margin difference between input and output tokens is very odd to me between your napkin math and the R1 prices. That's very important as any reasoning model explodes reasoning tokens, which means you'll encounter a lot more output tokens for fewer input tokens, and that's going to heavily cut into the high margin ("essentially free") input token cost profit. Unless I'm reading the article wrong. RhythmFox wrote 4 hours 45 min ago: I am so glad someone else called this out, I was reading the napkin math portions and struggling to see how the numbers really worked out and I think you hit the nail on the head. The author is assuming 'essentially free' input token cost and extrapolating in a business model that doesn't seem to connect directly to any claimed 'usefulness'. 
I think the bias on this is stated in the beginning of the article clearly as the author assumes 'given how useful the current models are...'. That is not a very scientific starting point and I think it leads to reasoning errors within the business model he posits here. There were some oddities with the numbers themselves as well but I think it was all within rounding, though it would have been nice for the author to spell it out when he rounded some important numbers (~s don't tell me a whole lot). TL;DR I totally agree, there are some napkin math issues going on here that make this pretty hard to see as a very useful stress test of cost. overgard wrote 5 hours 52 min ago: I thought the thing that made DeepSeek interesting (besides competition from China) was that its inference costs were something like 1/10th. So unless that gap has been bridged (has it?) I don't think a calculation based on DeepSeek can apply to OpenAI or Anthropic. fancyfredbot wrote 6 hours 9 min ago: When you are operating at scale you are likely to use a small model during the auto regressive phase to generate sequential tokens and only involve the large model once you've generated several tokens. Whenever the two predict the same output you effectively generate more than one token at a time. The idea is the models will agree often enough to significantly reduce output token costs. Does anyone know how effective that is in practice? simonw wrote 6 hours 39 min ago: [1] quotes Sam Altman saying: > Most of what we're building out at this point is the inference [...] We're profitable on inference. If we didn't pay for training, we'd be a very profitable company. URI [1]: https://www.axios.com/2025/08/15/sam-altman-gpt5-launch-chatgp... noodlescb wrote 5 hours 21 min ago: Except these tech billionaires lie the most of the time. This is still the "grow at any cost" phase, so I don't even genuinely believe he has a confident understanding of how or at what point anything will be profitable. This just strikes me as the best answer he has at the moment. 827a wrote 5 hours 46 min ago: From the latest NYT Hard Fork podcast [1]. The hosts were invited to a dinner hosted by Sam, where Sam said "we're profitable if we remove training from the equation", they report he turned to Lightcap (COO) and asked "right?" and Lightcap gave an "eeekk we're close". They aren't yet profitable even just on inference, and its possible Sam didn't know that until very recently. URI [1]: https://www.nytimes.com/2025/08/22/podcasts/is-this-an-ai-bu... twoodfin wrote 4 hours 46 min ago: âWeâre not profitable even if we discount training costs.â and âInference revenue significantly exceeds inference costs.â are not incompatible statements. So maybe only the first part of Samâs comment was correct. NoahZuniga wrote 1 hour 6 min ago: I imagine that one of the largest costs for openai is the wages they pay. aeternum wrote 6 hours 31 min ago: This can be technically true without being actually true. IE OpenAI invests in Cursor/Windsurf/Startups that give away credits to users and make heavy use of inference API. Money flows back to OpenAI then OpenAI sends it back to those companies via credits/investment $. It's even more circular in this case because nvidia is also funding companies that generate significant inference. It'll be quite difficult to figure out whether it's actually profitable until the new investment dollars start to dry up. citizenpaul wrote 4 hours 37 min ago: There a journalist ed zittron [1] That is an openai skeptic. 
His research if correct says not only is openai unprofitable but it likely never will be. Can't be ,its various finance ratios make early uber, amazon ect look downright fiscally frugal. He is not a tech person for what that means to you. URI [1]: https://www.wheresyoured.at/ dcre wrote 3 hours 10 min ago: Zitron is not a serious analyst. [1] [2] [3] URI [1]: https://bsky.app/profile/davidcrespo.bsky.social/post/3l... URI [2]: https://bsky.app/profile/davidcrespo.bsky.social/post/3l... URI [3]: https://bsky.app/profile/davidcrespo.bsky.social/post/3l... URI [4]: https://bsky.app/profile/davidcrespo.bsky.social/post/3l... jrflowers wrote 1 hour 16 min ago: Ed Zitron: I donât think OpenAI will become profitable The link you posted: I think it is very plausible that it will be hard for OpenAI to become profitable dcre wrote 52 min ago: Are you referring to the post where I listed 4 claims and marked one ridiculous, one wrong, one unlikely, and one plausible? He is not wrong about everything. For example, after Sam Altman said in January that OpenAI would introduce a model picker, Zitron was able to predict in March that OpenAI would introduce a model picker. And he was right about that. jrflowers wrote 17 min ago: Yes. In this thread about the profitability (or lack thereof) of OpenAIâs business model, I pointed out the part where you appeared to agree with Ed Zitron about the profitability (or lack thereof) of OpenAIâs business model. Like it seems like all of those posts were pretty clearly motivated by wanting to poke holes in Zitronâs criticism, their lack of profitability (which is central to his criticism) is where you declined to push back with any real argument. oblio wrote 3 hours 30 min ago: Amazon was very frugal. If you look at Amazon losses for the first 10 years, they were all basically under 5% of revenue and many years were break even or slightly net positive. Uber burnt through a lot of money and even now I'm not sure their lifetime revenue is positive (it's possible that since their foundation they've lost more money than they've made). citizenpaul wrote 3 hours 27 min ago: Exactly Zittrons point. onlyrealcuzzo wrote 5 hours 39 min ago: It's even more circular, because Microsoft and Amazon also fund ChatGPT and Anthropic with Azure and AWS credits. milesskorpen wrote 5 hours 40 min ago: While this could be true, I don't think OpenAI is investing the $hundreds of millions-to-billions that would be required otherwise make it actually true. OpenAI's fund is ~$250-300mm Nvidia reportedly invested $1b last year - still way less than Open AI revenue drob518 wrote 6 hours 32 min ago: Which is like saying, âIf all we did is charge people money and didnât have any COGS, weâd be a very profitable company.â Thatâs a truism of every business and therefore basically meaningless. gomox wrote 6 hours 4 min ago: I can't imagine the hoops an accountant would have to go through to argue training cost is COGS. In the most obvious stick-figures-for-beginners interpretation, as in, "If I had to explain how a P&L statement works to an AI engineer", training is R&D cost and inference cost is COGS. drob518 wrote 4 hours 58 min ago: I wasnât using COGS in a GAAP sense, but rather as a synonym for unspecified âcosts.â My bad. I suppose you would classify training as development and ongoing datacenter and GPU costs as actual GAAP COGS. 
My point was, if all you focus on is revenue and ignore the costs of creating your business and keeping it running, itâs pretty easy for any business to be âprofitable.â DenisM wrote 4 hours 22 min ago: Itâs generally useful to consider unit economy separate from whole company. If your unit economy is negative thing are very bleak. If itâs positive, your chance are going up by a lot - scaling the business amortizes fixed (non-unit) costs, such as admin and R&D, and slightly improves unit margins as well. However this does not work as well if your fixed (non-unit) cost is growing exponentially. You canât get out of this unless your user base grows exponentially or the customer value (and price) per user grows exponentially. I think this is what Altman is saying - this is an unusual situation: unit economy is positive but fixed costs are exploding faster than economy if scale can absorb it. You can say itâs splitting hair, but insightful perspective often requires teasing things apart. drob518 wrote 3 hours 36 min ago: Itâs splitting a hair, but a pretty important hair. Does anyone think that models wonât need continuous retraining? Does anyone think models wonât continue to try to scale? Personally, I think weâre reaching diminishing returns with scaling, which is probably good because weâve basically run out of content to train on, and so perhaps that does stop or at least slow down drastically. But I donât see a scenario where constant retraining isnât the norm, even if the rough amount of content weâre using for it grows only slightly. jgalt212 wrote 6 hours 0 min ago: there's not a bright line there, though. dcre wrote 6 hours 20 min ago: The Amodei quote in my other reply explains why this is wrong. The point is not to compare the training of the current model to inference on the current model. The thing that makes them lose so much money is that they are training the next model while making back their training cost on the current model. So it's not COGS at all. drob518 wrote 5 hours 3 min ago: So,if they stopped training theyâd be profitable? Only in some incremental sense, ignoring all sunk costs. ToucanLoucan wrote 6 hours 6 min ago: So is OpenAI capable of not making a new model at some point? They've been training the next model continuously as long as they've existed AFAIK. Our software house spends a lot on R&D sure, but we're still incredibly profitable all the same. If OpenAI is in a position where they effectively have to stop iterating the product to be profitable, I wouldn't call that a very good place to be when you're on the verge of having several hundred billion in debt. DenisM wrote 4 hours 17 min ago: Thereâs still untapped value in deeper integrations. They might hit a jackpot of exponentially increasing value from network effects caused by tight integration with e.g. disjoint business processes. We know that businesses with tight network effects can grow to about 2 trillion in valuation. oblio wrote 3 hours 33 min ago: How would that look with at least 3 US companies, probably 2 Chinese ones and at least 1 European company developing state of the art LLMs? drob518 wrote 3 hours 23 min ago: Like a very over-served market, I think. I see perhaps three survivors long term, or at most one gorilla, two chimps, and perhaps a few very small niche-focused monkeys. 
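A toy illustration of the unit-economics framing DenisM describes above (positive per-user margin, but fixed training/R&D costs growing faster than the user base): all numbers here are invented for the sketch.

  # Positive unit economics can coexist with widening total losses if fixed costs
  # (training, R&D) grow faster than the paying user base.
  def yearly_profit(users, revenue_per_user, inference_cost_per_user, fixed_cost):
      unit_margin = revenue_per_user - inference_cost_per_user   # stays positive
      return users * unit_margin - fixed_cost

  users, fixed_cost = 10_000_000, 1_000_000_000                  # invented starting point
  for year in range(1, 5):
      print(year, yearly_profit(users, 240, 100, fixed_cost))
      users *= 2          # user base doubles each year
      fixed_cost *= 3     # next-generation training spend triples each year
  # Profit flips from positive to sharply negative even though every user is profitable.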
dcre wrote 5 hours 4 min ago: I think at that point there is strong financial pressure to figure out how to continuously evolve models instead of changing new ones, for example by building models out of smaller modules that can be trained individually and swapped out. Jeff Dean and Noam Shazeer talked about that a bit in their interview with Dwarkesh: URI [1]: https://www.dwarkesh.com/p/jeff-dean-and-noam-shazeer prasadjoglekar wrote 6 hours 8 min ago: Well, only if the one training model continued to function as a going business. Their amortization window for the training cost is 2 months or so. They can't just keep that up and collect $. They have to build the next model, or else people will go to someone else. dcre wrote 5 hours 6 min ago: Why two months? It was almost a year between Claude 3.5 and 4. (Not sure how much it costs to go from 3.5 to 3.7.) oblio wrote 3 hours 40 min ago: Don't they need to accelerate that, though? Having a 1 year old model isn't really great, it's just tolerable. dcre wrote 3 hours 34 min ago: I think this is debatable as more models become good enough for more tasks. Maybe a smaller proportion of tasks will require SOTA models. On the other hand, the set of tasks people want to use LLMs for will expand along with the capabilities of SOTA models. Jalad wrote 3 hours 47 min ago: Even being generous, and saying it's a year, most capital expenditures depreciate over a period of 5-7 years. To state the obvious, training one model a year is not a saving grace dcre wrote 3 hours 37 min ago: I don't understand why the absolute time period matters â all that matters is that you get enough time making money on inference to make up for the cost of training. ugh123 wrote 6 hours 35 min ago: That might be the case, but inference times have only gone up since GPT-3 (GPT-5 is regularly 20+ seconds for me). asabla wrote 6 hours 28 min ago: And by GPT-5 you mean through their API? Directly through Azure OpenAI services? or are you talking about ChatGPT set to using GPT-5. All of these alternatives means different things when you say it takes +20 seconds for a full response. ugh123 wrote 6 hours 12 min ago: Sure, apologies. I mean ChatGPT UI dcre wrote 6 hours 36 min ago: ICYMI, Amodei said the same in much greater detail: "If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable. What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at." URI [1]: https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop... 
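A small sketch of the "each model is a company" accounting in the Amodei quote above: the $100 million training cost and the 2x revenue multiple are his cartoon numbers, while the 10x growth in training cost per generation is an assumption added here for illustration.

  # Each model earns back ~2x its training cost, but the next model is assumed to
  # cost ~10x more and is paid for during the current model's earning window.
  model_costs = [100e6 * 10**i for i in range(4)]   # $100M, $1B, $10B, $100B (assumed)

  for i, cost in enumerate(model_costs):
      revenue = 2 * cost                            # each model individually profitable
      next_training = model_costs[i + 1] if i + 1 < len(model_costs) else 0
      cash_flow = revenue - cost - next_training
      print(f"model {i}: per-model profit {revenue - cost:+.1e}, company cash flow {cash_flow:+.1e}")
  # Every model is profitable on its own, yet the company burns cash every period
  # until the models stop getting (and costing) 10x bigger.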
kgwgk wrote 3 hours 48 min ago: >> If we didn't pay for training, we'd be a very profitable company. > ICYMI, Amodei said the same No. He says that even paying for training a model is profitable. It makes more revenue that it costs - all things considered. A much stronger claim. dcre wrote 3 hours 30 min ago: I take them to be saying the same thing â the difference is that Altman is referring to the training of the next model happening now, while Amodei is referring to the training months ago of the model you're currently earning money back on through inference. kgwgk wrote 1 hour 36 min ago: Maybe he means that but the quote says âWe're profitable on inference.â - not âWe're profitable on inference including training of that model.â DenisM wrote 4 hours 31 min ago: Fantastic perspective. Basically each new company puts competitive pressure on the previous company, and together they compress margins. They are racing themselves to the bottom. I imagine they know this and bet on AGI primacy. oblio wrote 3 hours 58 min ago: > I imagine they know this and bet on AGI primacy. Just like Uber and Tesla are betting on self driving cars. I think it's been 10 years now ("any minute now"). runako wrote 2 hours 32 min ago: Notably, Uber switched horses and now runs Waymos with no human drivers. Avshalom wrote 4 hours 33 min ago: Okay but noticeably he invents two numbers then pretends that a third number is irrelevant in order to claim that each model (which is not a company) is a profitable company. You'd think maybe the CEO might be able to give a ball park on the profit made off that 2023 model. ETA: "You paid $100 million... There's some cost to inference with the model, but let's just assume ... that even if you add those two up, you're kind of in a good state." You see this right? He literally says that if you assume revenue exceeds costs then it's profitable. He doesn't actually say that it does though. LZ_Khan wrote 5 hours 25 min ago: Also Amodei has an assumption that a 100m model will make 200m of revenue but a 1B model will make 2B of revenue. Does that really hold up? There's no phenomenon that prevents them from only making 200m of revenue off a $1B model. colinsane wrote 4 hours 45 min ago: > So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' jumploops wrote 5 hours 12 min ago: GPT-4.5 has entered the chat.. meshugaas wrote 5 hours 32 min ago: The "model as company" metaphor makes no sense. It should actually be models are products, like a shoe. Nike spends money developing a shoe, then building it, then they sell it, and ideally those R&D costs are made up in shoe sales. But you still have to run the whole company outside of that. Also, in Nike's case, as they grow they get better at making more shoes for cheaper. LLM model providers tell us that every new model (shoe) costs multiples more than the last one to develop. If they make 2x revenue on training, like he's said, to be profitable they have to either double prices or double users every year, or stop making new models. Szpadel wrote 3 hours 53 min ago: I believe better analogy is CPU development on next process node. each node is much more expensive to design for, but when you finally have it you basically print money. and of course you always have to develop next more powerful and power efficient CPU to keep competitive skybrian wrote 4 hours 4 min ago: Analogies don't prove anything, but they're still useful for suggesting possibilities for thinking about a problem. 
If you don't like "model as company," how about "model as making a movie?" Any given movie could be profitable or not. It's not necessarily the case that movie budgets always get bigger or that an increased budget is what you need to attract an audience. vonneumannstan wrote 4 hours 5 min ago: >Also, in Nike's case, as they grow they get better at making more shoes for cheaper. This is clearly the case for models as well. Training and serving inference for GPT4 level models is probably > 100x cheaper than they used to be. Nike has been making Jordan 1's for 40+ years! OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model! Avshalom wrote 3 hours 54 min ago: >>This is clearly the case ... probably >>OpenAI would be incredibly profitable if they could live off the profit from improved inference efficiency on a GPT4 level model! If gpt4 was basically free money at this point it's real weird that their first instinct was to cut it off after gpt5 steveklabnik wrote 58 min ago: > If gpt4 was basically free money at this point it's real weird that their first instinct was to cut it off after gpt5 People find the UX of choosing a model very confusing, the idea with 5 is that it would route things appropriately and so eliminate this confusion. That was the motivation for removing 4. But people were upset enough that they decided to bring it back for a while, at least. dcre wrote 3 hours 33 min ago: I think the idea here is that gpt-5-mini is the cheap gpt-4 quality model they want to serve and make money on. true_religion wrote 4 hours 28 min ago: It's model as a company because people are using the VC mentality, and also explaining competition. Model as a product is the reality, but each model competes with previous models and is only successful if it's both more cost effective, and also more effective in general at its tasks. By the time you get to model Z, you'll never use model A for any task as the model lineage cannibalizes sales of itself. pegasus wrote 4 hours 55 min ago: If you're going to use shoes as the metaphor, a model would be more like a shoe factory. A shoe would be a LLM answer, i.e. inference. In which case it totally makes sense to consider each factory as an autonomous economic unit, like a company. renjimen wrote 4 hours 57 min ago: But new models to date have cost more than the previous ones to create, often by an order of magnitude, so the shoe metaphor falls apart. A better metaphor would be oil and gas production, where existing oil and gas fields are either already finished (i.e. model is no longer SOTA -- no longer making a return on investment) or currently producing (SOTA inference -- making a return on investment). The key similarity with AI is new oil and gas fields are increasingly expensive to bring online because they are harder to make economical than the first ones we stumbled across bubbling up in the desert, and that's even with technological innovation. That is to say, the low hanging fruit is long gone. runako wrote 2 hours 34 min ago: > new models to date have cost more than the previous ones to create This largely was the case in software in the '80s-'10s (when versions largely disappeared) and still is the case in hardware. iPhone 17 will certainly cost far more to develop than did iPhone 10 or 5. iPhone 5 cost far more than 3G, etc. Romario77 wrote 1 hour 53 min ago: I don't think it's the case if you take inflation into account. 
You could see here: [1] new ones are generally cheaper if adjusted for inflation. This is a sale price, but assuming that margins stay the same it should reflect the manufacturing price. And from what I remember about apple earnings their margins increased over time, so it means the new phones are even cheaper. Which kind of makes sense. URI [1]: https://www.reddit.com/r/dataisbeautiful/comments/16... runako wrote 1 hour 21 min ago: I should have addressed this. This thread is about the capital costs of getting to the first sale, so that's model training for an LLM vs all the R&D in an iPhone. Recent iPhones use Apple's own custom silicon for a number of components, and are generally vastly more complex. The estimates I have seen for iPhone 1 development range from $150 million to $2.5 billion. Even adjusting for inflation, a current iPhone generation costs more than the older versions. And it absolutely makes sense for Apple to spend more in total to develop successive generations, because they have less overall product risk and larger scale to recoup. meshugaas wrote 3 hours 53 min ago: exactly: itâs like making shoes if youâre really bad at making shoes :) pera wrote 5 hours 34 min ago: Copy laundering as a service is only profitable when you discount future settlements: URI [1]: https://www.reuters.com/legal/government/anthropics-surpri... 827a wrote 5 hours 39 min ago: OpenAI and Anthropic have very different customer bases and usage profiles. I'd estimate a significantly higher percentage of Anthropic's tokens are paid by the customer than OpenAI's. The ChatGPT free tier is magnitudes more popular than Claude's free tier, and Anthropic in all likelihood does a higher percentage of API business versus consumer business than OpenAI does. In other words, its possible this story is correct and true for Anthropic, but not true for OpenAI. dcre wrote 4 hours 55 min ago: Good point, very possible that Altman is excluding free tier as a marketing cost even if it loses more than they make on paid customers. On the other hand they may be able to cut free tier costs a lot by having the model router send queries to gpt-5-mini where before they were going to 4o. jacurtis wrote 54 min ago: This is very true. ChatGPT has a very generous free tier. I used to pay for it, but realized I was never really hitting the limits of what is needed to pay for it. However, at the same time, I was using Claude much less, really preferring the answers from it most of the time, and constantly being hit with limits. So guess what I did. I cancelled my OpenAI subscription and moved to Anthropic. Not only do i get Claude Code, which OpenAI really has no serious competitor for. I still use both models but never run into problems with OpenAI, so i see no reason to pay for it. DenisM wrote 4 hours 37 min ago: Free tier provides a lot of training material. Every time you correct ChatGPT on its mistakes youâre giving them knowledge thatâs not in any book or website. Thats a moat, albeit one that is slow to build. dcre wrote 3 hours 32 min ago: That's interesting, though you have to imagine the data set is very low quality on average and distilling high quality training pairs out of it is very costly. whatshisface wrote 5 hours 42 min ago: I don't see why the declining marginal returns can't be continuous. selimthegrim wrote 5 hours 45 min ago: This sounds like fabs. alienbaby wrote 6 hours 42 min ago: Hasn't Sam Altman already said they are profitable on inference, minus training costs? 
mrcwinn wrote 6 hours 46 min ago: As the author seems to admit, an outsider is going to lack so much information (costs, loss leaders, etc.) that one has to assume any modeling is so inaccurate that it's not worth anything. So the question remains unanswered, at least for us. For those putting money in, you can be absolutely certain they have a model with sufficient data to answer the question. Since money did go in, even if it's venture, the answer is probably "yes in the immediate, but no over time." qrios wrote 6 hours 46 min ago: For sure an interesting calculation. Only one remark from someone with GPU metal experience: > But compute becomes the bottleneck in certain scenarios. With long context sequences, attention computation scales quadratically with sequence length. Even if the statement about quadratic scaling is right, the bottleneck we are talking about is somewhere north of that by a factor of 1000. If 10k cores do only simple matrix operations, each needs to have new data (up to 64k) available every 500 cycles (let's say). Getting this amount of data (without _any_ collision) means something like 100+ GByte/s per core. Even 2+ TByte/s on HBM means the bottleneck is the memory transfer rate, by something like 500 times. With collisions, we are talking about an additional factor like 5000 (last time I've done some tests with a 4090). Onavo wrote 6 hours 22 min ago: What do you mean by collision? qrios wrote 5 hours 47 min ago: If multiple cores try to get the same memory addresses, the MMU feeds only one core; the second one has to wait. Depending on the type of RAM, this will cost a lot of cycles. GPU MMUs can handle multiple lines in parallel, but not 10k cores at the same time. The HBM is not able to transfer 3.5 TByte sequentially. whatshisface wrote 5 hours 43 min ago: Why is that? It seems like multiple cores requesting the same address would be easier for the MMU to fetch for, not harder. recursivecaveat wrote 20 min ago: Not necessarily the exact same address (you can fix that in a program anyways with a broadcast tree), but the same memory bank. Imagine 1000 trains leaving a small town at the same time, instead of 1000 trains leaving 1000 different towns simultaneously. At some point there are not enough transportation resources to get stuff out of a particular area at the parallelism desired. reliabilityguy wrote 5 hours 19 min ago: It's not that the fetching is the problem, but serving the data to many cores at the same time from a single source. supersour wrote 2 hours 17 min ago: I'm not familiar with GPU architecture; is there not a shared L2/L3 data cache from which this data would be shared? qrios wrote 5 hours 37 min ago: This is not my domain, but I assume the MMUs act like a switch and something like multicast is not available here. I've tried to implement such a thing on an FPGA and it was extremely cost intensive. agf wrote 6 hours 1 min ago: I believe it's that the bus can only serve one chip at a time, so it has to actually be faster, since sometimes one chip's data will have to wait for the data of another chip to finish first. freediver wrote 6 hours 57 min ago: Message to Martin if you are reading this - a blog without an RSS feed is not a blog. Please add one :) bjornsing wrote 7 hours 7 min ago: Can you really rent a cluster of 75 H100s for 75*2 USD per hour? Individual H100s, yes. But with sufficient interconnect to run these huge models?
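To put qrios's memory-bandwidth point in napkin form, here is a rough decode-side bound, assuming the thread's 37B active parameters stored as 8-bit weights and ~3.35 TB/s of HBM bandwidth per H100; both figures are assumptions, not numbers from the comment.

  # Upper bound on single-sequence decode speed when HBM bandwidth is the limit:
  # every generated token has to stream the active weights from memory at least once.
  active_params = 37e9            # assumed active parameters per token (MoE)
  bytes_per_param = 1             # assumed 8-bit weights
  hbm_bandwidth = 3.35e12         # assumed HBM bandwidth per H100, bytes/s

  bytes_per_token = active_params * bytes_per_param
  tokens_per_second = hbm_bandwidth / bytes_per_token
  print(f"~{tokens_per_second:.0f} tokens/s per GPU for one sequence")   # ~90
  # Batching many sequences reuses the same weight traffic, which is why providers
  # push concurrency so hard and why per-token costs fall with scale.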
leecmjohnny wrote 7 hours 15 min ago: Yes, the API business is cross-subsidizing the consumer business. Back in March, I did the same analysis with greater sensitivities, and arrived at similar gross margins: >70%. URI [1]: https://johnnyclee.com/i/are-frontier-labs-making-80percent-gr... ankit219 wrote 7 hours 23 min ago: This seems very very far off. From the latest reports, Anthropic has a gross margin of 60%. It came out in their latest fundraising story. From that one The Information report, it estimated OpenAI's GM to be 50% including free users. These are gross margins, so any amortization or model training cost would likely come after this. Then, today almost every lab uses methods like speculative decoding and caching which reduce the cost and speed up things significantly. The input numbers are far off. The assumption is 37B of active parameters. Sonnet 4 is supposedly a 100B-200B param model. Opus is about 2T params. Both of them (even if we assume MoE) won't have exactly that number of active params. Then there is a cost to hosting and activating params at inference time. (The article kind of assumes it would be the same constant 37B params.) thegeomaster wrote 6 hours 38 min ago: Are you saying that you think Sonnet 4 has 100B-200B _active_ params? And that Opus has 2T active? What data are you basing these outlandish assumptions on? Der_Einzige wrote 4 hours 46 min ago: Not everyone uses MoE architectures. It's not outlandish at all... ankit219 wrote 6 hours 6 min ago: Oh, nothing official. There are people who estimate the sizes based on tok/s, cost, benchmarks etc. The one that most go on is [1]. This guy estimated Claude 3 Opus to be a 2T param model (given the pricing + speed). Opus 4 is 1.2T params according to him (but then I don't understand why the price remained the same). Sonnet is estimated by various people to be around 100B-200B params. [1]: URI [1]: https://lifearchitect.substack.com/p/the-memo-special-edit... URI [2]: https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0... NoahZuniga wrote 5 hours 54 min ago: If you're using the API cost of the model to estimate its size, then you can't use this size estimate to estimate the inference cost. mutkach wrote 7 hours 4 min ago: Gross margins also don't tell the whole story; we don't know how much Azure and Amazon charge for the infrastructure and we have reasons to believe they are selling it at a massive discount (Microsoft definitely does that, as follows from their agreement with OpenAI). They get the model, OpenAI gets discounted infra. ankit219 wrote 6 hours 53 min ago: A discounted Azure H100 will still be more than $2 per hour. Same goes for AWS. Trainium chips are new and not as effective (not saying they are bad) but still cost in the same range. For inference, gross margins are exactly: (what companies charge per 1M tokens to the user) - (direct cost to produce that 1M tokens, which is GPU costs). mutkach wrote 6 hours 41 min ago: I am implying that what OpenAI pays for GPU/hour is much less than $2, because of the discount. That's an assumption. It could be $1, $0.5, no? It could still be burning money for Microsoft/Amazon SamInTheShell wrote 7 hours 24 min ago: Those same H100s are probably also going to be responsible for the R&D of new model versions. Running a model is definitely cheaper than training one. mutkach wrote 7 hours 27 min ago: A full KV-cache is quite big compared to the weights of the model (depending on the context size), that should be a factor too (and basically you need to maintain a separate KV cache for each request, I think...).
Also the token/s is not uniform across the request and it's getting slower with each subsequent generated token. On the other side, there's an insane boost from speculative decoding, which would give a semi-prefill rate for decoding, but the memory pressure is still a factor. I would be happy to be corrected regarding both factors. ath3nd wrote 7 hours 29 min ago: Yes they are, they are deeply deeply unprofitable and that's why they need endless investments to prop them up. That's why Microsoft is not doing the deal with OpenAI, that's why Claude was fiddling with token limits just a couple of weeks ago. It's a huge bubble, and the only winner at this moment is Nvidia. wahnfrieden wrote 4 hours 30 min ago: Citation? Investments aren't evidence of unprofitability in inference noodletheworld wrote 7 hours 34 min ago: Huh. I feel oddly skeptical about this article; I can't specifically argue the numbers, since I have no idea, but... there are some decent open source models; they're not state of the art, but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices? The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1? Surely if it's this cheap, and we're talking massive margins according to this, I should be able to get a cheap / run my own 600B param model. Am I missing something? It seems that reality (i.e. the absence of people actually doing things this cheap) is the biggest critic of this set of calculations. colinsane wrote 4 hours 10 min ago: > I should be able to get a cheap / run my own 600B param model. if the margins on hosted inference are 80%, then you need > 20% utilization of whatever you build for yourself for this to be less costly to you (on margin). i self-host open weight models (please: deepseek et al aren't open _source_) on whatever $300 GPU i bought a few years ago, but if it outputs 2 tokens/sec then i'm waiting 10 minutes for most results. if i want results in 10s instead of 10m, i'll be paying $30000 instead. if i'm prompting it 100 times during the day, then it's idle 99% of the time. coordinating a group buy for that $30000 GPU and sharing that across 100 people probably makes more sense than either arrangement in the previous paragraph. for now, that's a big component of what model providers, uh, provide. johnsmith1840 wrote 4 hours 51 min ago: Another giant problem with this article is that we have no idea what optimizations are used on their end. There are some wildly complex optimizations these large AI companies use. What I'm trying to say is that hosting your own model is in an entirely different league than the pros. Even if accounting for the error in the article implies higher costs, I would argue it would return to profit simply because of how advanced inference optimization has become. If actual model intelligence is not a moat (looking likely this is true), the real sauce of profitable AI companies is advanced optimizations across the entire stack. OpenAI is NEVER going to release their specialized kernels, routing algos, quantizations, or model compilation methods. These are all really hard and really specific. jedberg wrote 6 hours 46 min ago: [1] Deepseek R1 for free. URI [1]: https://lambda.chat hdgvhicv wrote 1 hour 28 min ago: > I'm here to provide helpful, respectful, and appropriate content for all users. If you have any other requests or need assistance with a different type of story or topic, feel free to ask!
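colinsane's self-hosting arithmetic a few comments up can be sketched as a utilization problem; the GPU prices come from that comment, while the 3-year depreciation window and the usage pattern are assumptions added for the sketch.

  # Cost per busy hour for a GPU that mostly sits idle, versus one shared by a group.
  def cost_per_busy_hour(gpu_price, lifetime_hours, busy_hours_per_day, users=1):
      hourly = gpu_price / lifetime_hours              # straight-line depreciation
      utilization = busy_hours_per_day / 24
      return hourly / utilization / users              # each user's share

  LIFETIME = 3 * 365 * 24                              # assumed 3-year depreciation
  print(round(cost_per_busy_hour(300, LIFETIME, 2), 3))                # slow $300 GPU: ~$0.14/busy-hour
  print(round(cost_per_busy_hour(30000, LIFETIME, 0.25), 2))           # fast $30k GPU, mostly idle: ~$110
  print(round(cost_per_busy_hour(30000, LIFETIME, 12, users=100), 4))  # shared by 100 people: ~$0.02
  # The fast GPU is brutally expensive while idle 99% of the time; pooling demand is
  # essentially the service the inference providers sell.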
mcpeepants wrote 5 hours 28 min ago: * distilled R1 for free paulddraper wrote 7 hours 3 min ago: I would not be surprised if the operating costs are modest. But these companies also have very expensive R&D and large upfront costs. dragonwriter wrote 7 hours 7 min ago: > but if inference is this cheap then why aren't there multiple API providers offering models at dirt cheap prices There are multiple API providers offering models at dirt cheap prices, enough so that there is at least one well-known API provider that is an aggregator of other API providers that offers lots of models at $0. > The only cheap-ass providers I've seen only run tiny models. Where's my cheap deepseek-R1? URI [1]: https://openrouter.ai/deepseek/deepseek-r1-0528:free idiotsecant wrote 5 hours 11 min ago: How is this possible? I imagine someone is finding some value in the prompts themselves, but this can't possibly be paying for itself. tick_tock_tick wrote 3 hours 43 min ago: Inference is just that cheap, plus they hope that you'll start using the ones they charge for as you become more used to using AI in your workflow. booi wrote 6 hours 51 min ago: you can also run deepseek for free on a modestly sized laptop svachalek wrote 6 hours 19 min ago: You're probably thinking of what ollama labels "deepseek", which is not in fact deepseek, but other models with some deepseek distilled into them. dragonwriter wrote 6 hours 43 min ago: At 4-bit quant, R1 takes 300+ gigs just for weights. You can certainly run smaller models into which R1 has been distilled on a modest laptop, but I don't see how you can run R1 itself on anything that wouldn't be considered extreme for a laptop in at least one dimension. martinald wrote 7 hours 18 min ago: There are, I screenshotted DeepInfra in the article, but there are a lot more URI [1]: https://openrouter.ai/deepseek/deepseek-r1-0528 unknown2374 wrote 3 hours 2 min ago: is that a quantized model or the full r1? jsnell wrote 7 hours 21 min ago: > why aren't there multiple API providers offering models at dirt cheap prices? There are. Basically every provider's R1 prices are cheaper than estimated by this article. URI [1]: https://artificialanalysis.ai/models/deepseek-r1/providers ac29 wrote 6 hours 59 min ago: The cheapest provider in your link charges 460x more for input tokens than the article estimates. dragonwriter wrote 6 hours 33 min ago: > The cheapest provider in your link charges 460x more for input tokens than the article estimates. The article estimates $0.003 per million input tokens; the cheapest on the list is $0.46 per million. The ratio is 120x, not 460x. OTOH, all of the providers are far below the estimated $3.08 cost per million output tokens. ruszki wrote 5 hours 42 min ago: There are 7 providers on that page which have a higher output token price than $3.08. There is even 1 which has a higher input token price than that. So that "all" is not true either. hirako2000 wrote 7 hours 22 min ago: Imo the article is totally off the mark since it assumes users on average do not go over the 1M tokens per day. Afaik OpenAI doesn't enforce a daily quota even on the $20 plans unless the platform is under pressure. Since I often consume 20M tokens per day, one can assume many would use far more than the 1M tokens assumed in the article's calculations. skybrian wrote 5 hours 14 min ago: Meanwhile, I don't use ChatGPT at all on a median day. I use it in occasional bursts when researching something. empath75 wrote 7 hours 6 min ago: There's zero basis for assuming any of that.
The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage. It is very likely that you are in the top 10% of users. hirako2000 wrote 6 hours 42 min ago: True. The article also has zero basis for its estimate of the average usage of each tier's user base. I somewhat doubt my usage is so close to the edge of the curve since I don't even pay for any plan. It could be that I'm very frugal with money and fat on consumption while most are more balanced, but 1M tokens per day in any case sounds slim for any user who pays for the service. GaggiX wrote 7 hours 23 min ago: [1] They are dirt cheap. Same model architecture for the comparison: $0.30/M $1.00/M. Or even $0.20-$0.80 from another provider. URI [1]: https://openrouter.ai/deepseek/deepseek-chat-v3.1 brokencode wrote 7 hours 30 min ago: I also have no idea on the numbers. But I do know that these same companies are pouring many billions of dollars into training models, paying very expensive staff, and building out infrastructure. These costs would need to be factored in to come up with the actual profit margins. layer8 wrote 7 hours 34 min ago: The only reason they wouldn't be losing money on inference is if more costly (more computationally intensive) inference wouldn't be able to give them an extra edge, which seems unlikely to me. kaelandt wrote 7 hours 42 min ago: The API prices of $3/$15 are not right for a lot of models. See on OpenRouter the gpt-oss-120b ones [1]; it's more like $0.01/$0.3 (and that model actually needs H200/B200 to have good throughput). URI [1]: https://openrouter.ai/openai/gpt-oss-120b teekert wrote 7 hours 45 min ago: Idk what is going on but I'm using it all day for free, no limits in sight yet... It's just for small things, but for sure I would have had to pay 6 months ago. I actually would if they prompted me, tbh. Although I still find that whole "You can't use the webUI with your API credits" thing annoying. Why not? Why make me run OpenWebUI or LibreChat? I guess my use is absolutely nothing compared to someone with a couple of agents running continuously. sc68cal wrote 7 hours 47 min ago: This whole article is built off using DeepSeek R1, which is a huge premise that I don't think is correct. DeepSeek is much more efficient and I don't think it's a valid way to estimate what OpenAI and Anthropic's costs are. [1] Basically, DeepSeek is _very_ efficient at inference, and that was the whole reason why it shook the industry when it was released. URI [1]: https://www.wheresyoured.at/deep-impact/ dcre wrote 6 hours 38 min ago: What are we meant to take away from the 8000 word Zitron post? In any case, here is what Anthropic CEO Dario Amodei said about DeepSeek: "DeepSeek produced a model close to the performance of US models 7-10 months older, for a good deal less cost (but not anywhere near the ratios people have suggested)" "DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLM's; it's an expected point on an ongoing cost reduction curve. What's different this time is that the company that was first to demonstrate the expected cost reductions was Chinese." [1] We certainly don't have to take his word for it, but the claim is that DeepSeek's models are not much more efficient to train or inference than closed models of comparable quality.
Furthermore, both Amodei and Sam Altman have recently claimed that inference is profitable: Amodei: "If you consider each model to be a company, the model that was trained in 2023 was profitable. You paid $100 million, and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume, in this cartoonish cartoon example, that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model, in this example, is actually profitable. What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's much more expensive and requires much more upfront R&D investment. And so the way that it's going to shake out is this will keep going up until the numbers go very large and the models can't get larger, and then it'll be a large, very profitable business, or, at some point, the models will stop getting better, right? The march to AGI will be halted for some reason, and then perhaps it'll be some overhang. So, there'll be a one-time, 'Oh man, we spent a lot of money and we didn't get anything for it.' And then the business returns to whatever scale it was at." [2] Altman: "If we didn't pay for training, we'd be a very profitable company." [3] URI [1]: https://www.darioamodei.com/post/on-deepseek-and-export-cont... URI [2]: https://cheekypint.substack.com/p/a-cheeky-pint-with-anthrop... URI [3]: https://www.theverge.com/command-line-newsletter/759897/sam-... overgard wrote 5 hours 45 min ago: In terms of sources, I would trust Zitron a lot more than Altman or Amodei. To be charitable, those CEOs are known for their hyperbole and for saying whatever is convenient in the moment, but they certainly aren't that careful about being precise or leaving out inconvenient details. Which is what a CEO should do, more or less, but I wouldn't trust their word on most things. dcre wrote 5 hours 2 min ago: I agree we should not take CEOs at their word, we have to think about whether what they're saying is more likely to be true than false given other things we know. But to trust Zitron on anything is ridiculous. He is not a source at all: he knows very little, does zero new reporting, and frequently contradicts himself in his frenzy to believe the bubble is about to pop any time now. A simple example: claiming both that "AI is very little of big tech revenue" and "Big tech has no other way to show growth other than AI hype". Both are very nearly direct quotes. gmerc wrote 6 hours 16 min ago: Grok 3.5: 400M training run. DeepSeek R1: 5M training run. Released around the same time, marginal performance difference. dcre wrote 5 hours 0 min ago: I suspect that says more about Grok than anything else. thatguysaguy wrote 6 hours 49 min ago: Why would you think that deepseek is more efficient than gpt-5/Claude 4 though? There's been enough time to integrate the lessons from deepseek. overgard wrote 5 hours 48 min ago: Because to make GPT-5 or Claude better than previous models, you need to do more reasoning, which burns a lot more tokens. So, your per-token costs may drop, but you may also need a lot more tokens. jstummbillig wrote 5 hours 4 min ago: GPT-5 can be configured extensively. Is there any point at which any configuration of GPT-5 that offers ~DeepSeek level performance is more expensive than DeepSeek per token? davidguetta wrote 7 hours 26 min ago: What a wrong take.
It's not even MoE that was great in DeepSeek, it's shared experts + GRPO phillipcarter wrote 7 hours 29 min ago: Uhhh, I'm pretty sure DeepSeek shook the industry because of a 14x reduction in training cost, not inference cost. We also don't know the per-token cost for OpenAI and Anthropic models, but I would be highly surprised if it was significantly more expensive than open models anyone can use and run themselves. It's not like they're not also investing in inference research. gmd63 wrote 6 hours 44 min ago: DeepSeek was trained with distillation. Any accurate estimate of training costs should include the training costs of the model that it was distilling. ffsm8 wrote 6 hours 33 min ago: That makes the calculation nonsensical, because if you go there... you'd also have to include all energy used in producing the content the other model providers used. So now suddenly everyone's devices on which they wrote comments on social media, pretty much all servers to have ever served a request to OpenAI/Google/Anthropic's bots, etc. Seriously, that claim was always completely disingenuous. gmd63 wrote 4 hours 6 min ago: I don't think it's that nonsensical to realize that in order to have AI, you need generations of artists, journalists, scientists, and librarians to produce materials to learn from. And when you're using an actual AI model to "train" (copy), it's not even a shred of nonsense to realize the prior model is a core component of the training. baxtr wrote 6 hours 54 min ago: Because of the alleged reduction in training costs. basilgohar wrote 6 hours 45 min ago: All reports by companies are alleged until verified by other, more trustworthy sources. I don't think it's especially notable that it's alleged because it's DeepSeek vs. the alleged numbers from other companies. andai wrote 6 hours 56 min ago: Isn't training cost a function of inference cost? From what I gathered, they reduced both. I remember seeing lots of videos at the time explaining the details, but basically it came down to the kind of hardware-aware programming that used to be very common. (Although they took it to the next level by using undocumented behavior to their advantage.) booi wrote 6 hours 52 min ago: They're typically somewhat related, but the difference between training and inference can vary greatly, so I guess the answer is no. They did reduce both though, and mostly due to reduced precision. CjHuber wrote 7 hours 33 min ago: The reason it shook the market at least was because of the claim that its training cost was 5 million. vitaflo wrote 4 hours 0 min ago: Also the fact that it cost 10% of what other models cost. Pretty much still does. hirako2000 wrote 7 hours 26 min ago: That's what the buzz focused on, which is strange as we don't actually know what it cost them. Whereas the inference optimization is a fact, and is even more impactful since training costs benefit from economies of scale. CjHuber wrote 5 hours 10 min ago: I don't think that's strange at all, it's a much more palatable narrative for the masses who don't know what inference and training are and who think having conversations = training. GaggiX wrote 7 hours 41 min ago: The "efficiency" mentioned in the blog post you have linked is the price difference between Deepseek and o1; it doesn't mean that GPT-5 or other SOTA models are less efficient. boroboro4 wrote 7 hours 42 min ago: DeepSeek inference efficiency comes from two things: MoE and MLA attention. OpenAI was rumored to use MoE around the GPT-4 moment, i.e. a loooong time ago.
Given Gemini's efficiency with long context I would bet their attention is very efficient too. GPT OSS uses fp4, which DeepSeek doesn't use yet btw. So no, big labs aren't behind DeepSeek in efficiency. Not by much at least. softwaredoug wrote 7 hours 49 min ago: I wonder if there needs to be two different business models: 1. Companies that train models and license them 2. Companies that do inference on models caminanteblanco wrote 8 hours 1 min ago: Ok, one issue I have with this analysis is the breakdown between input and output tokens. I'm the kind of person who spends most of my chats asking questions, so I might only use 20ish input tokens per prompt, where Gemini is having to put out several hundred, which would seem to affect the economics quite a bit. red2awn wrote 5 hours 1 min ago: It also didn't take into account that a lot of the new models are reasoning models, which spit out a lot of output tokens. pakitan wrote 6 hours 51 min ago: It may hurt them financially but they are fighting for market share and I'd argue short answers will drive users away. I prefer the long ones much more as they often include things I haven't directly asked about but are still helpful. bcrosby95 wrote 7 hours 44 min ago: Yeah, I've noticed ChatGPT 5 is very chatty. I can ask a 1 sentence question and get back 3-4 paragraphs, most of which I ignore, depending upon the task. ozgung wrote 6 hours 52 min ago: Same. It acts like its output tokens are free. My input output ratio is like 1 to 10 at least. Not counting "Thought" and its internal generation for agentic tasks. smjburton wrote 8 hours 12 min ago: Good breakdown of the costs involved. Even if they're running at a loss, OpenAI and Anthropic receive considerable value from the free training data users are providing through their conversations. Looking at it another way, these companies are paying for the training data to make their models better for future profitability. ekelsen wrote 8 hours 13 min ago: The math on the input tokens is definitely wrong. It claims each instance (8 GPUs) can handle 1.44 million tokens/sec of input. Let's check that out. 1.44e6 tokens/sec * 37e9 bytes/token / 3.3e12 bytes/sec/GPU = ~16,000 GPUs. And that's assuming a more likely 1 byte per parameter. So the article is only off by a factor of at least 1,000. I didn't check any of the rest of the math, but that probably has some impact on their conclusions... thatguysaguy wrote 8 hours 0 min ago: 37 billion bytes per token? Edit: Oh, assuming this is an estimate based on the model weights moving from HBM to SRAM, that's not how transformers are applied to input tokens. You only have to move the weights for every token during generation, not during "prefill". (And actually during generation you can use speculative decoding to do better than this roofline anyways). mutkach wrote 7 hours 35 min ago: There's also an estimation of how much a KV cache grows with each subsequent token. That would be roughly ~MBs/token. I think that would be the bottleneck. GaggiX wrote 7 hours 45 min ago: > (And actually during generation you can use speculative decoding to do better than this roofline anyways). And more importantly batches, so taking the example from the blog post, it would be 32 tokens per each forward pass in the decoding phase. endtime wrote 8 hours 5 min ago: > 37e9 bytes/token This doesn't quite sound right... isn't a token just a few characters? GaggiX wrote 8 hours 6 min ago: Your calculations make no sense. Why are you loading the model for each token independently?
You can process all the input tokens at the same time as long as they can fit in memory. You are doing the calculation as if they were output tokens in a single batch; it would not make sense even in the decode phase. ekelsen wrote 5 hours 2 min ago: Then the right calculation is to use FLOPs, not bandwidth like they did. ozgung wrote 7 hours 12 min ago: This. ChatGPT also agrees with you: "74 GB weight read is per pass, not per token." I was checking the math in this blog post with GPT to understand it better and it seems legit for the given assumptions. Lionga wrote 8 hours 6 min ago: Well, he asked some AI to do the math for him, probably. moduspol wrote 8 hours 14 min ago: This kind of presumes you're just cranking out inference non-stop 24/7 to get the estimated price, right? Or am I misreading this? In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider. I'm also not sure we should expect their costs to just be "in-line with, or cheaper than" what various hourly H100 providers charge. Those providers presumably don't have to run entire datacenters filled to the gills with these specialized GPUs. It may be a lot more expensive to do that than to run a handful of them spread among the same datacenter with your other workloads. empath75 wrote 7 hours 3 min ago: > In reality, presumably they have to support fast inference even during peak usage times, but then the hardware is still sitting around off of peak times. I guess they can power them off, but that's a significant difference from paying $2/hr for an all-in IaaS provider. They can repurpose those nodes for training when they aren't being used for inference. Or if they're using public cloud nodes, just turn them off. martinald wrote 7 hours 35 min ago: Yes. But these are on-demand prices, so you could just turn them off when load is lower. But there is no way that OpenAI should be more expensive than this. The main cost is the capex of the H100s, and if you are buying 100k at a time you should be getting a significant discount off list price. lolc wrote 7 hours 46 min ago: Of course it is impossible for us to know the true cost, but idle instances should not be accounted for at full price: 1. Idle instances don't turn electricity to heat so that reduces their operating cost. 2. Idle instances can be borrowed for training which means flexible training amortizes peak inference capacity. GaggiX wrote 7 hours 58 min ago: That's why they have the batch tier: URI [1]: https://platform.openai.com/docs/guides/batch cjbarber wrote 8 hours 20 min ago: See also: [1] OpenAI projects 50% gross margins for 2025. The other companies don't include free users in their GM calculations, which makes it hard to compare. URI [1]: https://x.com/tanayj/status/1960116730786918616 OtherShrezzing wrote 8 hours 26 min ago: This is a great article, but it doesn't appear to model H100 downtime in the $2/hr costs. It assumes that OpenAI and Anthropic can match demand for inference to their supply of H100s perfectly, 24/7, in all regions. Maybe you could argue that the idle H100s are being used for model training - but that's different to the article's argument that inference is economically sustainable in isolation. manquer wrote 8 hours 2 min ago: Not really, that is why they sell the Batch API at considerably lower costs than the normal API.
There are also probably all kinds of enterprise deals, beyond the PAYG batch APIs, where they are okay with high latency (> hours). EcommerceFlow wrote 8 hours 46 min ago: I wouldn't be surprised if their profit/query is negative for all major AI companies, but guess what? They have a service which understands a user's question/needs 100x better than a traditional Google search does. Once they tap into that for PPC/paid ads, their profit/query should jump into the green. In fact, there's a decent chance a lot of these models will go 100% free once that PPC pipeline is implemented and shown to be profitable. efficax wrote 8 hours 17 min ago: > Once they tap into that for PPC/paid ads, If they start showing ads based on your prompts, and your history of "chats", it will erode the already shaky trust that users have in the bots. "Hallucinations" are one thing, but now you'll be asking yourself all the time: is that the best answer the LLM can give me, or has it been trained to respond in ways favourable to its advertisers? xdennis wrote 2 hours 32 min ago: This is the exact same issue Facebook/YouTube/etc had with ads. In the end, ads won. Google used to segregate ads very clearly in the beginning. Now they look almost the same as results. I've switched to DDG since then, but have the majority of users? Nope. Even if they're not using ad blockers, most people seem to not mind the ads. With LLMs, the ads will be even harder to tell apart from non-ads. techpineapple wrote 8 hours 25 min ago: > They have a service which understands a user's question/needs 100x better than a traditional Google search does. Source? EcommerceFlow wrote 8 hours 18 min ago: A lifetime of using Google and 4 years of using LLMs. nativeit wrote 7 hours 32 min ago: ...is a great counter-example of a "source". It's not like the product at hand is relevant to data analysis or anything, amirite? Filligree wrote 7 hours 22 min ago: Sometimes a statement is just too obvious to need extensive sourcing, and this is one of those times. Gemini doesn't always find very much better results, but it usually does. It beggars belief to claim that it doesn't also understand the query much better than RankBrain et al. rossdavidh wrote 8 hours 57 min ago: "Heavy readers - applications that consume massive amounts of context but generate minimal output - operate in an almost free tier for compute costs." Not saying there's not interesting analysis here, but this is assuming that they don't have to pay for access to the massive amounts of context. Sources like Stack Overflow and Reddit that used to be free are not going to be available to keep the model up to date. If this analysis is meant to say "they're not going to turn the lights out because of the costs of running", that may be so, but if they cannot afford to keep training new models every so often they will become less relevant over time, and I don't know if they will get an ocean of VC money to do it all again (at higher cost than last time, because the sources want their cut now). gitremote wrote 9 hours 15 min ago: These numbers are off. > $20/month ChatGPT Pro user: Heavy daily usage but token-limited ChatGPT Pro is $200/month and Sam Altman already admitted that OpenAI is losing money from Pro subscriptions in January 2025: "insane thing: we are currently losing money on openai pro subscriptions! people use it much more than we expected."
- Sam Altman, January 6, 2025 URI [1]: https://xcancel.com/sama/status/1876104315296968813 skybrian wrote 3 hours 59 min ago: That's interesting but it doesn't mean they're losing money on the $20/month users. The Pro plan selects for heavy-usage enthusiasts. johnsmith1840 wrote 4 hours 49 min ago: Losing money on o1-pro. That makes sense and is also why they axed that entire class of models. Every o1-pro and o1-preview inference was a normal inference times however many replica paths they made. davedx wrote 6 hours 56 min ago: [1] > The most likely situation is a power law curve where the vast majority of users don't use it much at all and the top 10% of users account for 90% of the usage. That'll be the Pro users. My wife uses her regular sub very lightly, most people will be like her... URI [1]: https://news.ycombinator.com/item?id=45053741 hirako2000 wrote 7 hours 17 min ago: Trusting the man about costs would be even more misplaced than trusting an oil company's CEO about the environment. chairhairair wrote 7 hours 40 min ago: Anyone paying attention should have zero trust in what Sam Altman says. simianwords wrote 6 hours 50 min ago: What do you think his strategy is? He has to make money at some point. I don't buy the logic that he will "scam" his investors and run away at some point. achenet wrote 4 hours 1 min ago: He makes money by convincing people to buy OpenAI stock. If OpenAI goes down tomorrow, he will be just fine. His incentive is to sell the stock, not actually build and run a profitable business. Look at Adam Neumann as an example of how to lose billions of investor dollars and still walk out of the ensuing crash with over a billion. [1] His strategy is to sell OpenAI stock like it was Bitcoin in 2020, and if for some reason the market decides that maybe a company that loses large amounts of cash isn't actually a good investment... he'll be fine, he's had plenty of time to turn some of his stock into money :) URI [1]: https://en.wikipedia.org/wiki/Adam_Neumann Tossrock wrote 2 hours 40 min ago: Altman doesn't have any stock. He's playing a game at a level people caught up on "capitalism bad" can't even conceptualize. simianwords wrote 2 hours 53 min ago: Why not build a profitable business like Zucc, Bill Gates, Jensen, Sergey etc? These people are way richer and much more powerful. martinald wrote 7 hours 43 min ago: Apologies, should be Plus. I'll update the article later. Topfi wrote 7 hours 57 min ago: That doesn't seem compatible with what he stated more recently: > We're profitable on inference. If we didn't pay for training, we'd be a very profitable company. Source: [1] His possible incentives and the fact OpenAI isn't a public company simply make it hard for us to gauge which of these statements is closer to the truth. URI [1]: https://www.axios.com/2025/08/15/sam-altman-gpt5-launch-chat... metalliqaz wrote 7 hours 31 min ago: > If we didn't pay for training It is comical that something like this was even uttered in the conversation. It really shows how disconnected the tech sector is from the real world. Imagine Intel's CEO saying "If we didn't have to pay for fabs, we'd be a very profitable company." Even in passing. He'd be ridiculed. Closi wrote 6 hours 38 min ago: I'm not entirely sure the analogy is fair - Amazon for example was 'ridiculed' for being hugely unprofitable for the first decade, but had underlying profitability if you removed capex.
As a counterpoint, if OpenAI were actually profitable at this early stage that could be a bad financial decision - it might mean that they aren't investing enough in what is an incredibly fierce and capital-intensive market. hirako2000 wrote 7 hours 12 min ago: It's also admitting that this business would be impossible if they had to respect copyright law, so the laws shall be adjusted so that it can be a business. CjHuber wrote 7 hours 31 min ago: Does anybody really think in this current time that what a CEO says has anything to do with reality and not just with hyping things up, a la the Elon recipe? vkou wrote 7 hours 17 min ago: Specifically, a connected CEO in post-law America. This sort of thing used to be called fraud, but there's zero chance of criminal prosecution. CjHuber wrote 5 hours 18 min ago: Criminal prosecution? This scheme has been perfected; like, what do you want to prosecute? Can you say with certainty that he means it's profitable overall? What if he means it's profitable right now, today, but not yesterday or in the last week? Or what if he meant that if you take the mean user it's profitable? So much room for interpretation; that's why there is no risk for them. WesolyKubeczek wrote 7 hours 36 min ago: This can be true if you assume that there exists a high number of $20 subscribers who don't use the product that much, but $200 subscribers squeeze every last bit and then some more. The balance could be still positive, but if you look at the power users alone, they might cost more than they pay. bee_rider wrote 7 hours 28 min ago: They might even have decided "hey, these power users are willing to try and tell us what LLMs are useful for, and are even willing to pay us for the opportunity!" re-thc wrote 7 hours 53 min ago: > That doesn't seem compatible with what he stated more recently: Profitable on inference doesn't mean they aren't losing money on pro plans. What's not compatible? The API requests are likely making more money. gitremote wrote 7 hours 21 min ago: Yes, API pricing is usage based, but ChatGPT Pro pricing is a flat rate for a time period. The question is then whether SaaS companies paying for GPT API pricing are profitable if they charge their users a flat rate for a time period. If their users trigger inference too much, they would also lose money. AstroBen wrote 8 hours 29 min ago: I just straight up don't trust him. Saying that is the equivalent of him saying "our product is really valuable! use it!" GoatInGrey wrote 3 hours 43 min ago: That is my interpretation, that it's a marketing attempt. A form of "The value of our product is so good that it's losing us money. It's practically the Costco hotdog combo!". benzible wrote 7 hours 1 min ago: There's the usual issue of a CEO "talking their book", but there's also the fact that Sam has a rich, documented history of lying. That was the central issue of his firing. "Empire of AI" has a detailed account of this. He would outright tell board member A that "board member B said X"; based on his knowledge of the social dynamics of the board, he assumed that A and B would never talk. But they eventually figured it out, it unraveled, and they confronted him in a group. Specifically, when they confronted him about telling Ilya Sutskever that Tasha McCauley said Helen Toner should step off the board, McCauley said "I never said that" and Altman was at a loss for words for a minute before finally mumbling "Well, I thought you could have said that. I don't know."
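On the flat-rate vs. usage-based point a few comments up, here is a toy sketch of the cross-subsidy question. Every number is hypothetical: the $20 price matches the Plus tier discussed in the thread, but the blended token cost and the usage buckets are made up purely for illustration.

    # Toy check: does a flat monthly fee cover usage-priced inference?
    # All numbers are hypothetical.

    subscription_price = 20.0        # $/month flat rate
    blended_cost_per_million = 0.30  # $ per 1M tokens, blending cheap input
                                     # with pricier output (made-up figure)

    def monthly_margin(tokens_per_day: float) -> float:
        monthly_tokens = tokens_per_day * 30
        inference_cost = monthly_tokens / 1_000_000 * blended_cost_per_million
        return subscription_price - inference_cost

    for usage in (100_000, 1_000_000, 20_000_000):  # light, typical, power user
        print(f"{usage:>12,} tokens/day -> margin ${monthly_margin(usage):8.2f}/month")
    # At these made-up rates, light and typical users are comfortably
    # profitable, while the 20M-token/day power user (a figure one commenter
    # reports upthread) costs far more than the flat fee brings in. That gap
    # is the cross-subsidy being debated.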
mvieira38 wrote 8 hours 31 min ago: Doesn't he have an incentive to make it look like that, though? The way he phrased it, that they are losing money because people use it so much, makes it seem like Pro subscribers are some super power-users. As long as inference has a nonnegative, nonzero cost, then this case will lose money, so Sam isn't admitting that the business model is flawed or anything. uludag wrote 9 hours 15 min ago: Another comment mentioned the cost associated with the model. Setting that aside, wouldn't we also need to include all of the systems around the inference? I can imagine significant infrastructure and engineering needs around all of these various services, along with the work needed to keep these systems up and running. Or are these costs just insignificant compared to inference? Aurornis wrote 9 hours 13 min ago: All incremental costs should be included. If each additional 100,000 customers requires 1 extra engineer, you would include that. We don't know those exact numbers though, and the ratio is probably much higher than my example numbers. Inference costs likely dominate. gpjanik wrote 9 hours 17 min ago: "Here's the key insight: each forward pass processes ALL tokens in ALL sequences simultaneously." This sounds incorrect, you only process all tokens once, and later incrementally. It's an auto-regressive model after all. Voloskaya wrote 8 hours 31 min ago: Not during prefill, i.e. the very first token generated in a new conversation. During this forward pass, all tokens in the context are processed at the same time, and then attention's KV are cached. You still generate a single token, but you need to compute attention from all tokens to all tokens. From that point on, every subsequent token is processed sequentially in an autoregressive way, but because we have the KV cache, this becomes O(N) (1 token query to all tokens) and not O(N^2). gpjanik wrote 7 hours 37 min ago: I somehow missed the "decode phase" paragraph and hence was confused - it's essentially that separation I meant, you're obviously correct. JCM9 wrote 9 hours 19 min ago: With the heat turning up on AI companies to explain how they will land on a viable business model, some of this is starting to look like WeWork's "Community Adjusted EBITDA" arguments of "hey, if you ignore where we're losing money, we're not losing money!" that they made right before imploding. I think most folks understand that pure inference in a vacuum is likely cash flow positive, but that's not why folks are asking increasingly tough questions on the financial health of these enterprises. Aurornis wrote 9 hours 14 min ago: A fast growing venture backed startup doing frontier R&D should be losing money overall. If they weren't losing money, they wouldn't be spending enough on R&D. This isn't some gotcha. It's what the investors want right now. JCM9 wrote 8 hours 31 min ago: Don't disagree it's what investors want. Point is just that we're approaching a point from an economics standpoint where the credibility of the "it's ok because we're investing in R&D" argument is rapidly wearing thin. WeWork's investors didn't want them to focus on business fundamentals either and kept pumping money elsewhere. That didn't turn out so well. pityJuke wrote 9 hours 33 min ago: From [1], Sam Altman said: > "If we didn't pay for training, we'd be a very profitable company." URI [1]: https://www.theverge.com/command-line-newsletter/759897/sam-al...
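To make the prefill vs. decode distinction in the exchange above concrete, here is a dependency-free toy sketch of the shape of the computation. The dimensions and values are made up, and this is not how any real inference engine is implemented - it only illustrates why cached decode is O(N) per token while prefill attends all-to-all once.

    import math, random

    D = 4                     # tiny made-up head dimension
    random.seed(0)

    def vec():
        return [random.random() for _ in range(D)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def attend(query, keys, values):
        # Single-query softmax attention against cached keys/values.
        scores = [dot(query, k) / math.sqrt(D) for k in keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        return [sum(w * v[i] for w, v in zip(weights, values)) / z for i in range(D)]

    # Prefill: all N prompt tokens go through one pass and attend to each other
    # (O(N^2) attention work, but one big parallel, compute-heavy pass), and
    # their K/V vectors are cached.
    prompt = [vec() for _ in range(8)]
    kv_cache = [(t, t) for t in prompt]   # stand-in for per-token cached K and V

    # Decode: each new token does one query against the whole cache - O(N) per
    # token instead of recomputing all-to-all attention - then appends its K/V.
    for _ in range(3):
        q = vec()
        _out = attend(q, [k for k, _ in kv_cache], [v for _, v in kv_cache])
        kv_cache.append((q, q))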
kif wrote 3 hours 12 min ago: He also said he got scared when trying out GPT-5, thinking "What have we done?". He's in the habit of lying, so it would be remiss to take his word for it. cowl wrote 7 hours 9 min ago: That's true of any company. If they didn't pay for building the product, they would be very profitable. paulhodge wrote 7 hours 44 min ago: Yeah, Dario has said similar things in interviews. The way he explained it, if you look at each specific model (such as Sonnet 3.5) as its own separate company, then each one of them is profitable in the end. They all eventually recoup the expense of training, thanks to good profit margins on usage once they are deployed. miltonlost wrote 8 hours 10 min ago: Or if they had to pay copyright costs. So much pirated data being repackaged and sold. ethagnawl wrote 7 hours 29 min ago: It's wild and, while they're all guilty, Gemini is a particularly egregious offender. What really surprises me is that they don't even consider it a bug if you can predictably get it to generate copyrighted content. These types of exploits are out of scope of their bug bounty program and they suggest the end user file a ticket whenever they encounter such issues (i.e. they're just saying YOLO until there's case law). otterley wrote 7 hours 41 min ago: It's not being repackaged. That question has already been settled by at least two courts. Aurornis wrote 8 hours 42 min ago: Exactly. All of the claims that OpenAI is losing money on every request are wrong. OpenAI hasn't even unlocked all of their possible revenue opportunities from the free tier such as ads (like Google search), affiliate links, and other services. There are also a lot of comments in this thread from people who want LLM companies to fail for different reasons, so they're projecting that wish onto imagined unit economics. I'm having flashbacks to all of the conversations about Uber and claims that it was going to collapse as soon as the investment money ran out. Then Uber gradually transitioned to profitability and the critics moved to using the same shtick on AI companies. overgard wrote 5 hours 39 min ago: If they're profitable, why on earth are they seeking crazy amounts of investment month after month? It seems like they'll raise 10 billion one month, and then immediately turn around and raise another 10 billion a month or two after that. If it's for training, it seems like a waste of money since GPT-5 doesn't seem like it's that much of an improvement. eikenberry wrote 6 hours 46 min ago: So inference is cheap but training is expensive and getting more expensive. It seems like if they can't get training expenses down, cheap inference won't matter. NoahZuniga wrote 5 hours 53 min ago: No. Training itself isn't that expensive compared to inference. The real expense is salary for talent. tsunamifury wrote 7 hours 53 min ago: As someone who has been taking the largest part of Google and Facebook's ad wallet share away, let me tell you something. Advertising is now a very, very locked-in market and it will take over a decade to shift even a significant minority of it into OpenAI's hands. This is not likely the first or even second monetization strategy imo. But I'm happy to be wrong. otterley wrote 7 hours 48 min ago: > As someone who has been taking the largest part of Google and Facebook's ad wallet share away Can you elaborate? You've sparked my curiosity. tsunamifury wrote 6 hours 28 min ago: There are two companies gaining significant wallet share: Amazon and TikTok.
Of those, only one is taking a significant early share from both Google and Facebook. otterley wrote 6 hours 3 min ago: OK, but you are a person, not a company. "You" are not taking the share away. tsunamifury wrote 4 hours 10 min ago: "I'm digging a trench" "No you're not, WE are digging a trench!" Yes fine, but "I am as well". Sheesh. Also I, personally, do and lead the work of taking the wallet share. So I will stick with "I" and would accept any of my team saying the same. otterley wrote 3 hours 48 min ago: Well, at least your attitude has made it obvious who you work for now. ;) pessimizer wrote 7 hours 54 min ago: No, the argument is that Uber was going to lose money hand over fist until all of the alternatives were starved to death, then raise prices infinitely. StableAlkyne wrote 6 hours 56 min ago: Taxis sucked. Any disruptor who was willing to just... tell people what the cost would be ahead of time without scamming them, and show up when they said they would, was going to win. Uber (and Lyft) didn't starve the alternatives: they were already severely malnourished. Also, they found a loophole to get around the medallion system in several cities, which taxi owners used in an incredibly anticompetitive fashion to prevent new competition. Just because Uber used a shitty business practice to deliver the killing blow doesn't mean their competition were undeserving of the loss, or that the traditional taxis weren't engaged in a lot of shady practices themselves. oblio wrote 3 hours 19 min ago: Spoiler alert: in most of the world taxis are still there and at best Uber is just another app you can use to call them. And lifetime profits for Uber are still at best break even, which means that unless you timed the market perfectly, Uber probably lost you money as a shareholder. Uber is just distorted in valuation by its presence in big US metro areas (which basically have no realistic transportation alternative). techpineapple wrote 8 hours 27 min ago: Because Sam Altman said so? Sam Altman also said this: URI [1]: https://xcancel.com/sama/status/1876104315296968813 JimDabell wrote 8 hours 21 min ago: There is a six month gap between those statements. Inference costs have been plummeting, plans have had tweaked quotas, and usage patterns can change. simlevesque wrote 8 hours 48 min ago: If we ignore the fact that if training was free, everyone would do it and OpenAI wouldn't be profitable. tootie wrote 8 hours 49 min ago: Yeah, I've seen the same sentiment from a few others as well. Inference likely is profitable. Training is incredibly expensive and will sometimes not yield positive results. jonathan-adly wrote 9 hours 35 min ago: Basically, the same math as modern automated manufacturing: a super expensive and complex build-out, then a money printer once running and optimized. I know there is lots of bearish sentiment here. Lots of people correctly point out that this is not the same math as FAANG products - then they make the jump that it must be bad. But my guess is these companies end up with margins better than Tesla (a modern manufacturer), but less than the 80%-90% of "pure" software. Somewhere in the middle, which is still pretty good. Also - once the Nvidia monopoly gets broken, the initial build-out becomes a lot cheaper as well. Workaccount2 wrote 9 hours 3 min ago: The difference is the money printer right now only prints for ~6 months before it needs to be replaced with an even more expensive printer.
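A toy version of the per-model payback question behind the "money printer" comment above. Every figure is made up purely for illustration; none of these numbers comes from OpenAI, Anthropic, or the article:

    # Toy per-model payback calculation (all figures hypothetical).

    training_cost = 1_000_000_000       # one-time training spend, $
    monthly_revenue = 400_000_000       # revenue while the model is current, $
    gross_margin_on_inference = 0.5     # share of revenue left after serving costs
    lifetime_months = 6                 # time before the model is superseded

    gross_profit = monthly_revenue * gross_margin_on_inference * lifetime_months
    print(f"gross profit over lifetime: ${gross_profit:,.0f}")
    print(f"covers training? {gross_profit >= training_cost}")
    # With these made-up numbers, 400M * 0.5 * 6 = 1.2B > 1.0B, so the
    # "printer" pays for itself before it is replaced; shrink the lifetime or
    # the margin and it no longer does, which is the treadmill the thread is
    # describing.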
churchill wrote 8 hours 24 min ago: And if you ever stop/step off the treadmill and jack up prices to reach profitability, a new upstart without your sunk costs will immediately create a 99% solution and start competing with you. Or more like hundreds of competitors. Like we've seen with Karpathy & Murati, any engineer with pedigree working on the frontline models can easily raise billions to compete with them. Expect the trend to pick up as the pool of engineers who can create usable LLMs from scratch increases through knowledge/talent diffusion. Workaccount2 wrote 7 hours 46 min ago: The LLM scene is an insane economic bloodbath right now. The tech aside, the financial moves here are historical. It's the ultimate wet dream for consumers - many competitors, face-ripping cap-ex, any missteps being quickly punished, and a total inability to hold back anything from the market. Companies are spending hundreds of billions to put the best tech in your hands as fast and as cheaply as possible. If OpenAI didn't come along with ChatGPT, we would probably just now be getting Google Bard 1.0 with an ability level of GPT-3.5 and censorship so heavy it would make it useless for anything beyond "Tell me who the first president was". hugedickfounder wrote 9 hours 24 min ago: The difference is you can train on outputs, DeepSeek style; there are no gates in this field, so profit margins will go to 0. osti wrote 9 hours 39 min ago: This kinda tracks with the latest estimate of the power usage of LLM inference published by Google [1]. If inference isn't that power hungry like people thought, they must be able to make good money from those subscriptions. URI [1]: https://news.ycombinator.com/item?id=44972808 jeffbee wrote 7 hours 50 min ago: > power hungry like people thought The only people who thought this were non-practitioners. jsnell wrote 9 hours 42 min ago: I don't believe the asymmetry between prefill and decode is that large. If it were, it would make no sense for most of the providers to have separate pricing for prefill with cache hits vs. without. Given the analysis is based on R1, Deepseek's actual in-production numbers seem highly relevant: [1] (But yes, they claim 80% margins on the compute in that article.) > When established players emphasize massive costs and technical complexity, it discourages competition and investment in alternatives But it's not the established players emphasizing the costs! They're typically saying that inference is profitable. Instead the false claims about high costs and unprofitability are part of the anti-AI crowd's standard talking points. URI [1]: https://github.com/deepseek-ai/open-infra-index/blob/main/2025... martinald wrote 9 hours 30 min ago: Yes. I was really surprised at this myself (author here). If you have some better numbers I'm all ears. Even on my lowly 9070XT I get 20x the tok/s input vs output, and I'm not doing batching or anything locally. I think the cache hit vs miss stuff makes sense at >100k tokens where you start getting compute bound. Filligree wrote 7 hours 29 min ago: Maybe because you aren't doing batching? It sounds like you're assuming that would benefit prefill more than decode, but I believe it's the other way around. jsnell wrote 8 hours 50 min ago: I linked to the writeup by Deepseek with their actual numbers from production, and you want "better numbers" than that?! > Each H800 node delivers an average throughput of ~73.7k tokens/s input (including cache hits) during prefilling or ~14.8k tokens/s output during decoding.
That's a 5x difference, not 1000x. It also lines up with their pricing, as one would expect. (The decode throughputs they give are roughly equal to yours, but you're claiming a prefill performance 200x higher than they can achieve.) smarterclayton wrote 7 hours 40 min ago: A good rule of thumb is that a prefill token is about 1/6th the compute cost of a decode token, and that you can get about 15k prefill tokens a second on Llama3 8B on a single H100. Bigger models will require more compute per token, and quantization like FP8 or FP4 will require less. player1234 wrote 9 hours 42 min ago: Input inference, i.e. reading, is cheaper; output, i.e. doing the generating, is not. For something called generative AI, that sounds pretty fucking not profitable. The cheap use case from this article is not a trillion dollar industry, and absolutely not the use case hyped as the future by AI companies, the one that is coming for your job. gjsman-1000 wrote 9 hours 50 min ago: If inference is that cheap, why is not even one company profitable yet? baobabKoodaa wrote 8 hours 24 min ago: "Why would you reinvest profits back into a business that is extremely profitable, when you have the chance of pulling your money out?" jppope wrote 8 hours 9 min ago: You are making a joke, but reasonably speaking there are a ton of software companies that kept reinvesting when they should have taken out profit, especially when they were peaking. emilecantin wrote 9 hours 15 min ago: Training. hiatus wrote 9 hours 19 min ago: A factory can make cheap goods and not reach profitability for some time due to the large capital outlay in spinning up a factory and tooling. It is likely there are large capital costs associated with model training that are recouped over the lifetime of the model. ascorbic wrote 9 hours 29 min ago: Because they're spending it all on training the next model. topaz0 wrote 8 hours 8 min ago: That's an argument for why OpenAI and Anthropic shouldn't be profitable, but the point is also that they don't have customers using the models to generate a profit either. Things like Cursor, for example. ETA: also note the recent MIT study that found that 95% of LLM pilots at for-profit companies were not producing returns. ascorbic wrote 5 hours 54 min ago: This article is about the model providers' costs, not API users'. Cursor etc have to pay the marked-up inference costs, so it's not surprising they can't make a profit. topaz0 wrote 1 hour 18 min ago: Yes, and the comment you first replied to was about the state/viability of the industry as a whole. If users can't make money from this "transformative technology", even when the provider is in the stage of burning money for the sake of growth, that sort of tells against it turning into a trillion dollar industry or whatever the hype claims. ascorbic wrote 44 min ago: The point is that the providers aren't burning money by subsidising inference costs. On the contrary, if this article is to be believed, they're charging healthy margins on it. So there are two answers: for the model providers, it's because they're spending it all on training the next model. For the API users, it's because they're spending it all on expensive API usage. JCM9 wrote 9 hours 50 min ago: These articles (of which there are many) all make the same basic accounting mistakes. You have to include all the costs associated with the model, not just inference compute.
This article is like saying an apartment complex isn't "losing money" because the monthly rents cover operating costs but ignoring the cost of the building. Most real estate developments go bust because the developers can't pay the mortgage payment, not because they're negative on operating costs. If the cash flow was truly healthy these companies wouldn't need to raise money. If you have healthy positive cash flow you have much better mechanisms available to fund capital investment other than selling shares at increasingly inflated valuations. E.g. issue a bond against that healthy cash flow. Fact remains when all costs are considered these companies are losing money, and so long as the lifespan of a model is limited it's going to stay ugly. Using that apartment building analogy, it's like having to knock down and rebuild the building every 6 months to stay relevant, but saying all is well because the rents cover the cost of garbage collection and the water bill. That's simply not a viable business model. Update Edit: A lot of commentary below re the R&D and training costs and whether it's fair to exclude those from inference costs or "unit economics." I'd simply say inference is just selling compute and that should be high margin, which the article concludes it is. The issue behind the growing concerns about a giant AI bubble is whether that margin is sufficient to cover the costs of everything else. I'd also say that excluding the cost of the model from "unit economics" calculations doesn't make business/math/economics sense since it's literally the thing being sold. It's not some bit of fungible equipment or long term capital expense when models become obsolete after a few months. Take away the model and you're just selling compute, so it's really not a great metric to use to say these companies are OK. rprend wrote 6 hours 38 min ago: It's funny you mention apartments, because that is exactly the comparison I thought of, but with the opposite conclusion. If you buy an apartment with debt, but get positive cash flow from rent, you wouldn't call that unprofitable or a bad investment. It takes X years to recoup the initial debt, and as long as X is achievable that's a good deal. Hoping for something net profitable including fixed costs from day 1 is a nice fantasy, but that's not how any business works or even how consumers think about debt. Restaurants get SBA financing. Homeowners are "net losing money" for 30 years if you include their debt, but they rightly understand that you need to pay a large fixed cost to get positive cash flow. R&D is conceptually very similar. Customer acquisition also behaves that way. JCM9 wrote 5 hours 34 min ago: Running with your analogy, having positive cash flow and buying a property to hold for the long term makes sense. That's the classic mortgage scenario. But it takes time for that math to work out. Buying a new property every 6 months breaks that model. That's like folks that keep buying a new car and rolling "negative equity" into a new deal. It's insanity financially but folks still do it. empath75 wrote 7 hours 0 min ago: > If the cash flow was truly healthy these companies wouldn't need to raise money. If this were true, the stock market would have no reason to exist. FarMcKon wrote 8 hours 24 min ago: "This article is like saying an apartment complex isn't "losing money" because the monthly rents cover operating costs but ignoring the cost of the building.
Most real estate developments go bust because the developers can't pay the mortgage payment, not because they're negative on operating costs." Exactly the analogy I was going to make. :) benreesman wrote 8 hours 27 min ago: My observation is that Opus is chronically capacity constrained while being dramatically more expensive than any of the others. To me that more or less settles both "which one is best" and "is it subsidized". Can't be sure, but anything else defies economic gravity. hirako2000 wrote 6 hours 50 min ago: Or Opus is a great model, so demand is high and the provider isn't scaling the platform. I agree something defies gravity. Also, that's not accounting for free riders. I have probably consumed trillions of free tokens from OpenAI infra since GPT-3 and never spent a penny. And now I'm doing the equivalent on Gemini, since Flash is free of charge and a better model than most free-of-charge models. furyofantares wrote 8 hours 29 min ago: I don't think it's an accounting error when the article title says "Are OpenAI and Anthropic Really Losing Money on Inference?" And it's a relevant question because people constantly say these companies are losing money on inference. JCM9 wrote 8 hours 25 min ago: I think the nuance here is what people consider the "cost" of "inference." Purely on compute costs and not accounting for the cost of the model (which is where the article focuses), it's not bad. ninetyninenine wrote 8 hours 32 min ago: The model is like a house. It can be upgraded. And it can be sold. Think of the model as an investment. matwood wrote 8 hours 0 min ago: > Think of the model as an investment. Exactly, or a factory. rich_sasha wrote 8 hours 50 min ago: > Fact remains when all costs are considered these companies are losing money You would need to figure out what exactly they are losing money on. Making money on inference is like operating profit - revenue less marginal costs. So the article is trying to answer whether this operating profit is positive or negative. Not whether they are profitable as a whole. If things like the cost of maintaining data centres or electricity or bandwidth push them into the red, then yes, they are losing money on inference. If the thing that makes them lose money is new R&D, then that's different. You could split them up into a profitable inference company and a loss-making startup. Except the startup isn't purely financed by VC etc, but also by a profitable inference company. toddmorey wrote 8 hours 26 min ago: Yes, that's right. The inference costs in isolation are interesting because that speaks to the unit economics of this business: R&D / model training aside, can the service itself be scaled to operate at a profit? Because that's the only hope of all the R&D eventually paying dividends. One thing that makes me suspect inference costs are coming down is how chatty the models have become lately, often appending encouragement to a checklist like "You can check off each item as you complete them!" Maybe I'm wrong, but I feel that if inference was killing them, the responses would become more terse rather than more verbose. jsnell wrote 8 hours 55 min ago: For the top few providers, the training is getting amortized over an absurd amount of inference. E.g. Google recently mentioned that they processed 980T tokens over all surfaces in June 2025. The leaked OpenAI financial projections for 2024 showed about equal amounts of money spent on training and inference. Amortizing the training per-query really doesn't meaningfully change the unit economics.
> Fact remains when all costs are considered these companies are losing money and so long as the lifespan of a model is limited it's going to stay ugly. Using that apartment building analogy it's like having to knock down and rebuild the building every 6 months to stay relevant. That's simply not a viable business model. To the extent they're losing money, it's because they're giving free service with no monetization to a billion users. But since the unit costs are so low, monetizing those free users with ads will be very lucrative the moment they decide to do so. overgard wrote 5 hours 32 min ago: Assuming users accept those ads. Like, would they make it clear with a "sponsored section", or would they just try to worm it into the output? I could see a lot of potential ways that users reject the ad service, especially if it's seen to compromise the utility or correctness of the output. conradev wrote 9 hours 12 min ago: I found Dario's explanation pretty compelling: [1] The short of it: if you do the accounting on a per-model basis, it looks much better. URI [1]: https://x.com/FinHubIQ/status/1960540489876410404 CharlesW wrote 8 hours 15 min ago: That was worth a watch, thank you! lkjdsklf wrote 9 hours 13 min ago: It's fun to work backwards, but I was listening to a podcast where the journalists were talking about a dinner that Sam Altman had. This question came up and Sam said they were profitable if you exclude training, and the COO corrected him. So at least for OpenAI, the answer is "no". They did say it was close. And that's if you exclude training costs, which is kind of absurd because it's not like you can stop training. topaz0 wrote 8 hours 21 min ago: Worth noting that the post only claims they should be profitable for the inference of their paying customers on a guesstimated typical workload. Free users and users with atypical usage patterns will obviously skew the whole picture. So the argument in the post is at least compatible with them still losing money on inference overall. JimDabell wrote 8 hours 25 min ago: There's no mention of that in this article about it: [1] They quote him as saying inference is profitable and end it at that. Are you saying that the COO corrected him at the dinner, or on the podcast? Which podcast was it? URI [1]: https://archive.is/wZslL Barbing wrote 7 hours 52 min ago: From a journalist at the dinner: "I think that tends to end poorly because as demand for your service grows, you lose more and more money. Sam Altman actually addressed this at dinner. He was asked basically, are you guys losing money every time someone uses ChatGPT? And it was funny. At first, he answered, no, we would be profitable if not for training new models. Essentially, if you take away all the stuff, all the money we're spending on building new models and just look at the cost of serving the existing models, we are sort of profitable on that basis. And then he looked at Brad Lightcap, who is the COO, and he sort of said, right? And Brad kind of like squirmed in his seat a little bit and was like, well, we're pretty close. We're pretty close. We're pretty close. So to me, that suggests that there is still some, maybe small negative unit economics on the usage of ChatGPT.
Now, I don't know whether that's true for other AI companies, but I think at some point, you do have to fix that because as we've seen for companies like Uber, like MoviePass, like all these other sort of classic examples of companies that were artificially subsidizing the cost of the thing that they were providing to consumers, that is not a recipe for long-term success." From Hard Fork: Is This an A.I. Bubble? + Meta's Missing Morals + TikTok Shock Slop, Aug 22, 2025 est31 wrote 7 hours 34 min ago: GPT-5 was, I suppose, their attempt to make a product that provides as good metrics as their earlier products. Uber doesn't really compare, as they had existing competition from taxi companies that they first had to/have to destroy. And cars or fuel didn't get 10x cheaper over the time of Uber's existence, but I'm sure that they still can optimize a lot for efficiency. I'm more worried about OpenAI's capability to build a good moat. Right now it seems that each success is replicated by the competing companies quickly. Each month there is a new leader in the benchmarks. Maybe the moat will be the data in the end, i.e. there are barriers nowadays to crawling many websites that have lots of text. Meanwhile they might make agreements with the established AI players, and maybe some of those agreements will be exclusive. Not just for training but also for updating wrt world news. JimDabell wrote 7 hours 48 min ago: Thanks! nixgeek wrote 8 hours 30 min ago: Excluding training, two of their biggest costs will be payroll and inferencing for all the free users. It's therefore interesting that they claimed it was close: this supports the theory that inferencing from paid users is a (big) money maker if it's close to covering all the free usage and their payroll costs? losvedir wrote 9 hours 14 min ago: I think this is missing the point that the very interesting article makes. You're arguing that maybe the big companies won't recoup their investment in the models, or profitably train new ones. But that's a separate question. Whether a model - which now exists! - can profitably be run is very good to know. The fact that people happily pay more than the inference costs means what we have now is sustainable. Maybe Anthropic or OpenAI will go out of business or something, but the weights have been calculated already, so someone will be able to offer that service going forward. hirako2000 wrote 6 hours 48 min ago: It hasn't even proven that; it's assuming a ridiculous daily usage, and also ignoring free riders. Running a model is likely not profitable for any provider right now. Even a public company (e.g. Alphabet) isn't obliged to give honest figures, since numbers on the sheets can be moved left and right. We won't know for another year or two, when the companies we have today start falling and their founders start talking. ForHackernews wrote 9 hours 23 min ago: > if you have healthy positive cash flow you have much better mechanisms available to fund capital investment other than selling shares. E.g. issue a bond against that healthy cash flow. Is that actually true in 2025? Presumably you have to make coupon payments on a bond(?), but shares are free. Companies like Meta have shown you can issue shares that don't come with voting rights and people will buy them, and meme stocks like GME have demonstrated the effectiveness of churning out as many shares as the market will bear. JCM9 wrote 8 hours 42 min ago: Agree it's not the fashionable thing.
There's a line from The Big Short: "This is Wall Street, Dr. Burry. If you offer us free money, we're going to take it." These companies are behaving the same way. Folks are willing to throw endless money into the pit at present, so on the one hand I can't blame them for taking it. The reality, though, is that when the hype wears off this will only have thrown more gasoline on the fire and built a bigger pool of investors that will become increasingly desperate to salvage returns. History says time and time again that story doesn't end well, and that's why the voices mumbling "bubble" under their breath are getting louder every day.

crote wrote 9 hours 31 min ago: Their assumption is that training is a fixed cost: you'll spend the same amount on training for 5 users as you will with 500 million users. Spending hundreds of millions of dollars on training when you are two guys in a garage is quite significant, but the same amount is absolutely trivial if you are planet-scale. The big question is: how will training cost develop? The best-case scenario is a one-and-done run. But we're now seeing an arms race between the various AI providers: worst-case scenario, can the market survive an exponential increase in training costs for sublinear improvements?

simianwords wrote 7 hours 1 min ago: They just won't train it. They have the choice. Why do you think they will mindlessly train extremely complicated models if the numbers don't make sense?

crote wrote 1 hour 14 min ago: Because they are trying to capture the market, obviously. Nobody is going to pay the same price for a significantly worse model. If your competitor brings out a better model at the same price point, you either a) drop your price to attract a new low-budget market, b) train a better model to retain the same high-budget market, or c) lose all your customers. You have taken on a huge amount of VC money, and those investors aren't going to accept options A or C. What is left is option B: burn more money, build an even better model, and hope your finances last longer than the competition's. It's the classic VC-backed startup model: operate at a loss until you have killed the competition, then slowly increase prices as your customers are unable to switch to an alternative. It worked great for Uber & friends.

Aurornis wrote 9 hours 37 min ago: > You have to include all the costs associated with the model, not just inference.

The title of the article directly says "on inference". It's not a mistake to exclude training costs. This is about incremental costs of inference.

artursapek wrote 9 hours 31 min ago: Hacker News commenters just can't help but critique things even when they're missing the point.

kgwgk wrote 9 hours 23 min ago: Your comment may apply to the original commenter "missing" the point of TFA and to the person replying "missing" the point of that comment. And to my comment "missing" the point of yours - which may have also "missed" the point.

Aurornis wrote 9 hours 20 min ago: I've clearly "missed" the point you were trying to make, because there's nothing complicated: the article is about unit economics and marginal costs of inference, and this comment thread is trying to criticize the article based on a misunderstanding of what unit economics means.

kgwgk wrote 8 hours 56 min ago: I was not trying to make any point. I'm not even sure if the comment I replied to was suggesting that it was you or the other commenter who was missing some point or another.
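To make the distinction being argued in this sub-thread concrete, here is a minimal sketch of the two ways of counting cost per token: the marginal, inference-only figure the article estimates, and a fully loaded figure that amortizes a fixed training bill over a model's serving lifetime. Every number below is a hypothetical placeholder, not anything disclosed by OpenAI or Anthropic.

# All figures are made-up placeholders for illustration only.
GPU_HOUR_COST = 2.00               # assumed all-in $ per GPU-hour
TOKENS_PER_GPU_HOUR = 10_000_000   # assumed output tokens served per GPU-hour
TRAINING_COST = 100_000_000        # assumed one-off training bill, in $
LIFETIME_TOKENS = 1e15             # assumed tokens served over the model's lifetime

# Marginal (unit-economics) view: only the incremental cost of serving more tokens.
marginal_per_m = GPU_HOUR_COST / TOKENS_PER_GPU_HOUR * 1_000_000

# Fully loaded view: spread the fixed training cost over every token ever served.
training_per_m = TRAINING_COST / LIFETIME_TOKENS * 1_000_000
fully_loaded_per_m = marginal_per_m + training_per_m

print(f"marginal:     ${marginal_per_m:.2f} per million tokens")      # $0.20 with these inputs
print(f"fully loaded: ${fully_loaded_per_m:.2f} per million tokens")  # $0.30 with these inputs

The disagreement above is only about which of these two numbers deserves the name "unit economics": the training term vanishes or dominates depending entirely on the assumed lifetime volume, while the marginal term is what the article is actually estimating.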
Aurornis wrote 9 hours 27 min ago: The parent commenter's responses are all based on a wrong understanding of what unit economics means. You don't include fixed costs in the unit economics. Unit economics is about incremental costs.

artursapek wrote 8 hours 58 min ago: I know I'm agreeing with you. I'm saying, don't bother with him lol

martinald wrote 9 hours 38 min ago: (Author here.) Yes, I am aware of that and did mention it. However, what I wanted to push back on in this article was the claim that claude code was completely unsustainable and therefore a flash in the pan and devs aren't at risk (I know you are not saying this). The models as is are still hugely useful, even if no further training was done.

scrollaway wrote 8 hours 28 min ago: > claude code was completely unsustainable and therefore a flash in the pan and devs aren't at risk

How can you possibly say this if you know anything about the evolution of costs in the past year? Inference costs are going down constantly, and as models get better they make fewer mistakes, which means fewer cycles = less inference to actually subsidize. This is without even looking at potential fundamental improvements in LLMs and AI in general. And with all the trillions in funding going into this sector, you can't possibly think we're anywhere near the technological peak. Speaking as a founder managing multiple companies: Claude Code's value is in the thousands per month /per person/ (with the proper training). This isn't a flash in the pan, this isn't even a "prediction" - the game HAS changed and anyone telling you it hasn't is trying to cover their head with highly volatile sand.

martinald wrote 7 hours 22 min ago: I totally agree with you! I have heard others saying this, though. But I don't think it's true.

scrollaway wrote 7 hours 11 min ago: Got it - I got confused by your wording in the post, but it's clear now.

Aurornis wrote 9 hours 21 min ago: > The models as is are still hugely useful, even if no further training was done.

Exactly. The parent comment has an incorrect understanding of what unit economics means. The cost of training is not a factor in the marginal cost of each inference or each new customer. It's unfortunate this comment thread is the highest upvoted right now when it's based on a basic misunderstanding of unit economics.

ninetyninenine wrote 8 hours 26 min ago: I upvoted it because it aligns most closely with my own perspective. I have a strong dislike for AI and everything associated with it, so my judgment is shaped by that bias. If a post sounds realistic or complex, I have no interest in examining its nuance. I am not concerned with practical reality and prefer to accept it without thinking, so I support ideas that match my personal viewpoint. I don't understand why people like you have to call this stuff out? Like most of HN thinks the way I do and that's why the post was upvoted. Why be a contrarian? There's really no point.

SubiculumCode wrote 8 hours 21 min ago: Is this written by a sarcastic AI?

esafak wrote 9 hours 17 min ago: The marginal cost is not the salient factor when the model has to be frequently retrained at great cost. Even if the marginal cost were driven to zero, would they profit?

wongarsu wrote 8 hours 30 min ago: But they don't have to be retrained frequently at great cost. Right now they are retrained frequently because everyone keeps coming out with new models and nobody wants to fall behind.
But if investment for AI were to dry up everyone would stop throwing so much money at R&D, and if everyone else isn't investing in new models you don't have to either. The models are powerful as they are, most of the knowledge in them isn't going to become obsolete rapidly, and where that is a concern you can paper over it with RAG or MCP servers. If everyone runs out of money for R&D at the same time, we could easily fall back to a situation where we get an updated version of the same model every 3 years instead of a bigger/better model twice a year. And whether companies can survive in that scenario depends almost entirely on their unit economics of inference, ignoring current R&D costs.

re-thc wrote 8 hours 4 min ago: > But if investment for AI were to dry up everyone would stop throwing so much money at R&D, and if everyone else isn't investing in new models you don't have to either

IF. If you do stagnate for years, someone will eventually decide to invest and beat you. Intel has proven so.

simianwords wrote 7 hours 5 min ago: Yeah, so? How does that change anything?

churchill wrote 8 hours 7 min ago: Like we've seen with Karpathy & Murati starting their own labs, it's to be expected that over the next 5 years, hundreds of engineers & researchers at the bleeding edge will quit and start competing products. They'll reliably raise $1b to $5b in weeks, too. And it's logical: for an investor, a startup founded by a Tier 1 researcher will more reliably 10-100x your capital, vs. Anthropic & OpenAI, which are already at >$250b. This talent diffusion guarantees that OpenAI and Anthropic will have to keep sinking in ever more money to stay at the bleeding edge, or upstarts like DeepSeek and incumbents like Meta will simply outspend them and hire away all the Tier 1 talent to upstage them. The only companies that'll reliably print money off AI are TSMC and NVIDIA, because they'll get paid either way. They're selling shovels, and even if the gold rush ends up being a bust, they'll still do very well.

JSR_FDED wrote 7 hours 9 min ago: True. But at some point the fact that there are many, many players in the market will start to diminish the valuation of each of those players, don't you think? I wonder what that point would be.

Aurornis wrote 9 hours 11 min ago: Unit economics are the salient factor of inference costs, which this article is about.

gruez wrote 9 hours 43 min ago: I think the point isn't to argue AI companies are money printers or even that they're fairly valued, it's that at least the unit economics work out. Contrast this to something like MoviePass, where they were actually losing money on each subscriber. Sure, a company that requires huge capital investments that might never be paid back isn't great either, but at least it's better than MoviePass.

JCM9 wrote 9 hours 39 min ago: Unit economics needs to include the cost of the thing being sold, not just the direct cost of selling it. Unit economics is mostly a manufacturing concept, and the only reason it looks OK here is that the cost of building the thing isn't really factored into the cost of the thing. Someone might say I don't understand "unit economics", but I'd simply argue that applying a unit economics argument to say it's good, without including the cost of model training, is abusing the concept of unit economics in a way that's not realistic from a business/economics sense. The model is what's being sold. You can't just sell "inference" as a thing with no model. That's just selling compute, which should be high margin.
The article is simply affirming that: yes, when you're just selling compute in micro-chunks, that's a decent-margin business. Which is a nice analysis, but not surprising.

cwyers wrote 5 hours 38 min ago: The thing about large fixed costs is that you can just solve them with growth (a rough breakeven sketch follows at the end of this thread). If they were losing money on inference alone, no amount of growth would help. It's not clear to me there's enough growth that everybody makes it out of this AI boom alive, but at least some companies are going to be able to grow their way to profitability at some point, presumably.

barrkel wrote 9 hours 25 min ago: There is no marginal cost for training, just like there's no marginal cost for software. This is why you don't generally use unit economics for analyzing software company breakeven.

cj wrote 8 hours 39 min ago: The only reason unit economics aren't generally used for software companies is that the profit margin is typically 80%+. The cost of posting a tweet on Twitter/X is close to $0. Compare the cost of tweeting to the cost of submitting a question to ChatGPT. The fact that ChatGPT rate limits (and now sells additional credits to keep using it after you hit the limit) indicates there are serious unit economic considerations. We can't think of OpenAI/Anthropic as software businesses. At least from a financial perspective, they're more similar to a company selling compute (e.g. AWS) than a company selling software (e.g. Twitter/X).

voxic11 wrote 9 hours 26 min ago: That isn't what unit economics is. The purpose of unit economics is to answer: "How much money do I make (or lose) if I add one more customer or transaction?" Since adding an additional user/transaction doesn't increase the cost of training the models, you would not include the cost of training the models in a unit economics analysis. The entire point of unit economics is that it excludes such "fixed costs".

Aurornis wrote 9 hours 28 min ago: The cost of "manufacturing" an AI response is the inference cost, which this article covers.

> That would be like saying the unit economics of selling software is good because the only cost is some bandwidth and credit card processing fees. You need to include the cost of making the software

Unit economics is about the incremental value and costs of each additional customer. You do not amortize the cost of software into the unit economics calculations. You only include the incremental costs of additional customers.

> just like you need to include the cost of making the models.

The cost of making the models is important overall, but it's not included in the unit economics or when calculating the cost of inference.

ascorbic wrote 9 hours 31 min ago: You can amortise the training cost across billions of inference requests though. It's the marginal cost for inference that's most interesting here.

martinald wrote 9 hours 35 min ago: But what about running DeepSeek R1 or (insert other open weights model here)? There is no training cost for that.

JCM9 wrote 9 hours 30 min ago: 1. Someone is still paying for that cost. 2. "Open source" is great, but then it's just a commodity. It would be very hard to build a sustainable business purely on the back of commoditized models. Adding a feature to an actual product that does something else, though? Sure.

scarface_74 wrote 8 hours 35 min ago: There is plenty of money to be made from hosting open source software. AWS, for instance, makes tons of money from Linux, MySQL, Postgres, Redis, hosting AI models like DeepSeek (Bedrock), etc.
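Following on from the point above that large fixed costs can be covered by growth, here is a rough breakeven sketch: how many tokens a provider would need to sell before gross margin on inference pays back a fixed training bill. Again, every figure is a hypothetical placeholder rather than a real OpenAI or Anthropic number.

# Hypothetical breakeven sketch; none of these figures are real.
TRAINING_COST = 100_000_000       # assumed fixed training bill, in $
PRICE_PER_M_TOKENS = 10.00        # assumed blended revenue per million tokens sold
SERVING_COST_PER_M_TOKENS = 2.00  # assumed marginal inference cost per million tokens

gross_margin_per_m = PRICE_PER_M_TOKENS - SERVING_COST_PER_M_TOKENS
breakeven_m_tokens = TRAINING_COST / gross_margin_per_m

print(f"gross margin:     ${gross_margin_per_m:.2f} per million tokens")
print(f"breakeven volume: {breakeven_m_tokens / 1e6:.1f} trillion tokens")  # 12.5 with these inputs

Whether a volume like 12.5 trillion paid tokens is trivial or unreachable before the next model makes this one obsolete is exactly where the two camps in this thread part ways.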
politelemon wrote 9 hours 46 min ago: What will be the knock-on effect on us consumers?

chasd00 wrote 9 hours 27 min ago: Self-hosting LLMs isn't completely out of the realm of feasibility. Hardware cost may be 2-3x a hardcore gaming rig, but it would be neat to see open source, self-hosted coding helpers. When Linux hit the scene it put UNIX(ish) power in the hands of anyone, with no license fee required. Surely somewhere someone is doing the same with LLM-assisted coding.

Workaccount2 wrote 9 hours 5 min ago: The only reason to have a local model right now is for privacy and hobby. The economics are awful and local model performance is pretty lackluster by comparison. Never mind the much slower speeds and narrower context length. $6,000 is 2.5 years of a $200/mo subscription. And in 2.5 years that $6k setup will likely be equivalent to a $1k setup of the time.

grim_io wrote 8 hours 27 min ago: We don't even need to compare it to the most expensive subscriptions. The $20 subscription is far more capable than anything I could build locally for under $10k.

JCM9 wrote 9 hours 37 min ago: Costs will go up to levels where people will no longer find this stuff as useful/interesting. It's all fun and games until the subsidies end. See the recent reactions to AWS pricing on Kiro, where folks had a big WTF reaction after, it appears, AWS tried to charge realistic pricing based on what this stuff actually costs.

nixgeek wrote 8 hours 28 min ago: Isn't AWS always quite expensive? Look at their margins and the amount of cash it throws off, versus the consumer/retail business, which runs a ton more revenue but no profit. If you're applying the same pricing structure to Kiro as to all AWS products then, yeah, it's not particularly hobbyist-accessible?

philipallstar wrote 9 hours 34 min ago: The article is answering a specific question, and has excluded this on purpose. If you have a sunk training cost, you still want to know if you can at least operate profitably.

kelp6063 wrote 9 hours 42 min ago: API prices are going up and rate limits are getting more aggressive (see what's going on with Cursor and Claude Code).

techpineapple wrote 11 hours 45 min ago: So, if this is true, OpenAI needs much better conversion rates, because they have ~15 million paying users compared to 800 million weekly active users: [1]

URI [1]: https://nerdynav.com/chatgpt-statistics/

martinald wrote 9 hours 56 min ago: Yeah, but they can probably monetize them with ads.

UltraSane wrote 9 hours 34 min ago: LLM-generated ads.

bgwalter wrote 9 hours 34 min ago: I'm not so sure. Inserting ads into chatbot output is like inserting ads into email. People are more reluctant to tolerate that than web or YouTube ads (which are hated already). If they insert stealth ads, then after the third sponsored bad restaurant suggestion people will stop using that feature, too.

martinald wrote 9 hours 24 min ago: Mmm, let's see. I think in-LLM ads probably have the most intent (and therefore the most value) of any ads. They are like search PPC ads on steroids, as you have even more context on what the user is actually looking for. Hell, they could even just add affiliate tracking to links (and not change any of the ranking based on it) and probably make enough money to cover a lot of the inference for free users.

DIR <- back to front page