gopher://codevoid.de/1/hn/comments

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   Kimi K2.5 Technical Report [pdf]
       
       
        eager_learner wrote 37 min ago:
        I tried Kimi 2.5 Swarm Agent version and it was way better than any AI
        model I've tried so far.
       
        tonychang430 wrote 4 hours 24 min ago:
        Love to see Open source models doing better than SOTA
       
        extr wrote 5 hours 2 min ago:
        I tried this today. It's good - but it was significantly less focused
        and reliable than Opus 4.5 at implementing some mostly-fleshed-out
        specs I had lying around for some needed modifications to an enterprise
        TS node/express service. I was a bit disappointed tbh, the speed via
        fireworks.ai is great, they're doing great work on the hosting side.
        But I found the model had to double-back to fix type issues, broken
        tests, etc, far more than Opus 4.5 which churned through the tasks with
        almost zero errors. In fact, I gave the resulting code to Opus, simply
        said it looked "sloppy" and Opus cleaned it up very quickly.
       
        threethirtytwo wrote 5 hours 17 min ago:
        When will hardware get cheap enough so people can run this locally?
        That's the world I'm waiting for.
       
          vanviegen wrote 1 hour 1 min ago:
          2042. But by then you won't want to run this model anymore.
       
        tallesborges92 wrote 6 hours 49 min ago:
        I've added API key support for Kimi to my agentic coding tool:
        
   URI  [1]: https://github.com/tallesborges/zdx
       
        sreekanth850 wrote 7 hours 43 min ago:
        Claude gives a 100% pass mark to code generated by Kimi, and sometimes
        it says it's better than what Claude proposed. Absolutely the best
        open-source model.
       
        logicprog wrote 10 hours 45 min ago:
        Kimi K2T was good. This model is outstanding, based on the time I've
        had to test it (basically since it came out). It's so good at following
        my instructions, staying on task, and not getting context poisoned. I
        don't use Claude or GPT, so I can't say how good it is compared to
        them, but it's definitely head and shoulders above the open weight
        competitors
       
        unleaded wrote 11 hours 47 min ago:
        Seems that K2.5 has lost a lot of the personality from K2
        unfortunately, talks in more ChatGPT/Gemini/C-3PO style now. It's not
        explicitly bad, I'm sure most people won't care but it was something
        that made it unique so it's a shame to see it go.
        
        examples to illustrate [1] (K2.5) [2] (K2)
        
   URI  [1]: https://www.kimi.com/share/19c115d6-6402-87d5-8000-000062fecca...
   URI  [2]: https://www.kimi.com/share/19c11615-8a92-89cb-8000-000063ee671...
       
          orbital-decay wrote 5 hours 55 min ago:
          K2 in your example is using the GPT reply template (tl;dr - terse
          details - conclusion, with contradictory tendencies), there's nothing
          unique about it. That's exactly how GPT-5.0 talked.
          The only model with a strong "personality" vibe was Claude 3 Opus.
       
          Grosvenor wrote 6 hours 35 min ago:
          Both models of Kimi are shit. A NeXT cube is a perfectly cromulent
          computing device. Where else can you run Lotus Improv, Framemaker,
          and Mathematica at once?
          
          Plus it looks boss - The ladies will be moist.
       
            Grimblewald wrote 4 hours 48 min ago:
            Disagree, i've found kimi useful in solving creative coding
            problems gemini, claude, chatgpt etc failed at. Or, it is far
            better at verifying, augmenting and adding to human reviews of
            resumes for positions. It catches missed details humans and other
            LLMs routinely miss. There is something special to K2.
       
          zozbot234 wrote 10 hours 45 min ago:
          It's hard to judge from this particular question, but the K2.5 output
          looks at least marginally better AIUI, the only real problem with it
          is the snarky initial "That's very interesting" quip.  Even then a
          British user would probably be fine with it.
       
          logicprog wrote 11 hours 2 min ago:
          I agree. K2 was blunt, straightforward, pretty... rational? K2.5 has
          a much stronger slop vibe.
       
        oxqbldpxo wrote 11 hours 51 min ago:
        This Kimi K2 is so far the best. Gemini is also great, but google is
        stock in the academic bias of Stanford and  MIT and can't think outside
        the box. China definitely ahead on Ai. Wish somehow someone here in the
        US, would think different.
       
          dfsegoat wrote 11 hours 37 min ago:
          >  but google is stock in the academic bias of Stanford and MIT and
          can't think outside the box
          
          Can you clarify what you mean? I am not sure I follow.
       
            JSR_FDED wrote 11 hours 17 min ago:
            s/stock/stuck/
       
        storus wrote 12 hours 25 min ago:
        Do I need to have two M3U 512GB MacStudios to run this?
       
        Imanari wrote 12 hours 33 min ago:
        I have been very impressed with this model and also with the Kimi CLI.
        I have been using it with the 'Moderato' plan (7 days free, then 19$).
        A true competitor to Claude Code with Opus.
       
        syndacks wrote 12 hours 37 min ago:
        How do people evaluate creative writing and emotional intelligence in
        LLMs? Most benchmarks seem to focus on reasoning or correctness, which
        feels orthogonal. I've been playing with Kimi K2.5 and it feels
        much stronger on voice and emotional grounding, but I don't know how
        to measure that beyond human judgment.
       
          nolist_policy wrote 43 min ago:
          
          
   URI    [1]: https://eqbench.com/index.html
       
          mohsen1 wrote 9 hours 13 min ago:
          I am trying! [1] I just don't have enough funding to do a ton of
          tests
          
   URI    [1]: https://mafia-arena.com
       
        cmrdporcupine wrote 13 hours 3 min ago:
        DeepSeek is likely to release a new model soon, and judging from the
        past it's likely to be more cost effective and just as or more powerful
        than Kimi 2.5.
        
        DeepSeek 3.2 was already quite compelling. I expect its successor will
        be competitive.
       
        zzleeper wrote 13 hours 13 min ago:
        Do any of these models do well with information retrieval and reasoning
        from text?
        
        I'm reading newspaper articles through a MoE of gemini3flash and
        gpt5mini, and what made it hard to use open models (at the time) was a
        lack of support for pydantic.
       
          jychang wrote 13 hours 10 min ago:
          That roughly correlates with tool calling capabilities. Kimi K2.5 is
          a lot better than previous open source models in that regard.
          
          You should try out K2.5 for your use case, it might actually succeed
          where previous generation open source models failed.
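
A minimal sketch of what "support for pydantic" amounts to in practice: the model has to reply with JSON that parses cleanly into a typed schema. This uses only the standard library as a stand-in for pydantic, and the `ArticleFacts` schema, its field names, and the sample replies are hypothetical, made up for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class ArticleFacts:  # hypothetical schema, for illustration only
    headline: str
    year: int
    people: list


def parse_reply(raw):
    """Parse a model reply into ArticleFacts, or return None if it does not fit."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # reply was not valid JSON at all
    if not isinstance(data, dict):
        return None
    expected = {"headline": str, "year": int, "people": list}
    if set(data) != set(expected):
        return None  # missing or unexpected keys
    for key, typ in expected.items():
        if not isinstance(data[key], typ):
            return None  # wrong type for this field
    return ArticleFacts(**data)


good = '{"headline": "Flood hits town", "year": 1927, "people": ["J. Smith"]}'
bad = 'Sure! Here is the JSON you asked for: {"headline": ...'
print(parse_reply(good))
print(parse_reply(bad))
```

A model that is weak at tool calling tends to produce replies like `bad`, which is why the two capabilities usually go together.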
       
        gedy wrote 14 hours 20 min ago:
        Sorry if this is an easily answered question, but does "open" mean we
        can download this and use it totally offline, now or in the future, if
        we have capable hardware? Seems like a great thing to archive if the
        world falls apart (said half-jokingly)
       
          fancy_pantser wrote 6 hours 6 min ago:
          Sure. Someone on /r/LocalLLaMA was seeing 12.5 tokens/s on dual Strix
          Halo 128GB machines (run you $6-8K total?) with 1.8bits per
          parameter. It performs far below the unquantized model, so it would
          not be my personal pick for a one-local-LLM-forever, but it is
          compelling because it has image and video understanding. You lose
          those features if you choose, say, gpt-oss-120B.
          
          Also, that's with no context, so it would be slower as it filled (I
          don't think K2.5 uses the Kimi-Linear KDA attention mechanism, so
          it's sub-quadratic but not their lowest).
       
          fragmede wrote 12 hours 30 min ago:
          Yes, but the hardware to run it decently is gonna cost you north of
          $100k, so hopefully you and your bunkermates allocated the right
          amount to this instead of guns or ammo.
       
          Tepix wrote 12 hours 43 min ago:
          You could buy five Strix Halo systems at $2000 each, network them and
          run it.
          
          Rough estimate: 12.5:2.2 so
          you should get around 5.5 tokens/s.
       
            j-bos wrote 12 hours 35 min ago:
            Is the software/drivers for networking LLMs on Strix Halo there
            yet? I was under the impression a few weeks ago that it's veeeery
            early stages and terribly slow.
       
              Tepix wrote 6 hours 12 min ago:
              Llama.cpp with its rpc-server
       
          cmrdporcupine wrote 13 hours 30 min ago:
          Yes, but you'll need some pretty massive hardware.
       
          Carrok wrote 14 hours 12 min ago:
          Yes.
       
        firasd wrote 14 hours 47 min ago:
        I'm not very familiar with this model series, but I tried it when
        prototyping a conversation for a potential 'Vibesbench' (my eval).
        Here's Kimi on a Robocop scene
        
        kimi k2 thinking turbo
        
        "He's not being poetic; he's weaponizing nostalgia. Quoting
        Casablanca -- the ultimate 'beautiful sacrifice' romance -- to taunt
        Morton about corporate backstabbing is pure executive sleaze. It's
        the same move as his mahogany desk and flag pin: draping sentimental
        Americana over naked power. He's essentially saying, 'Isn't it
        romantic, Bob? You fought the good fight, now you die.' "
        
        grok 4.1
        
        "So when Dick Jones smirks and says, 'Time goes by, it's the same
        old story... the fight for love and glory, huh, Bob?' that's pure
        Ed Neumeier flexing his love of old Hollywood while simultaneously
        mocking how these psychopathic '80s execs think they're in some
        grand cinematic drama instead of just murdering a guy for market
        share."
        
        chatgpt-4o
        
        "The reason the Casablanca reference matters is RoboCop itself is
        partially a noir -- and also a tragedy built inside a genre engine.
        [...] So yeah, calling back to Casablanca is not just some
        meta-joke -- it's a knowing wink that RoboCop is a noir trapped in
        cybernetic skin. // Even Clarence Boddicker, for all his cartoonish
        sadism, plays like a twisted stand-in for a noir heavy -- talking
        smooth, dropping nicknames, throwing out casual violence like it's a
        punchline."
        
        opus 4.5
        
        "Jones is performing avuncular menace. He's the kind of executive who'd
        quote old movies while having you murdered, because it's all
        performance to him, all style. The substance is just power."
       
        miroljub wrote 15 hours 38 min ago:
        I've been quite satisfied lately with MiniMax M-2.1 in opencode.
        
        How does Kimi 2.5 compare to it in real world scenarios?
       
          viraptor wrote 15 hours 35 min ago:
          A lot better in my experience. M2.1 to me feels between haiku and
          sonnet. K2.5 feels close to opus. That's based on my testing of
          removing some code and getting it to reimplement based on tests. Also
          the design/spec writing feels great. You can still test k2.5 for free
          in OpenCode today.
       
            miroljub wrote 15 hours 21 min ago:
            Well, Minimax was the equivalent of Sonnet in my testing. If Kimi
            approaches Opus, that would be great.
       
              samtheprogram wrote 14 hours 51 min ago:
              Kimi K2.5 approaches Sonnet as well from what I can tell, it's
              just slower to get to the result.
       
        llmslave wrote 15 hours 45 min ago:
        The benchmarks on all these models are meaningless
       
          alchemist1e9 wrote 15 hours 23 min ago:
          Why and what would a good benchmark look like?
       
            moffkalast wrote 15 hours 5 min ago:
            30 people trying out all models on the list for their use case for
            a week and then checking what they're still using a month after.
       
        epolanski wrote 15 hours 53 min ago:
        It's interesting to note that OpenAI is valued almost 400 times more
        than moonshotai, despite their models being surprisingly close.
       
          m3kw9 wrote 12 hours 46 min ago:
          Unless they can beat their capabilities by a clear magical step up
          and have the infrastructure to capture the users
       
          famouswaffles wrote 14 hours 1 min ago:
          OpenAI is a household name with nearly a billion weekly active users.
          Not sure there's any reality where they wouldn't be valued much more
          than Kimi regardless of how close the models may be.
       
          moffkalast wrote 15 hours 24 min ago:
          Well to be the devil's advocate: One is a household name that holds
          most of the world's silicon wafers for ransom, and the other sounds
          like a crypto scam. Also estimating valuation of Chinese companies is
          sort of nonsense when they're all effectively state owned.
       
            epolanski wrote 14 hours 12 min ago:
            There isn't a single % that is state owned in Moonshot AI.
            
            And don't start me with the "yeah but if the PRC" because it's
            gross when US can de facto ban and impose conditions even on
            European companies, let alone the control it has on US ones.
       
              moffkalast wrote 2 hours 20 min ago:
              I'm not sure if that is accurate, most of the funding they've got
              is from Tencent and Alibaba, and we know what happened to Jack Ma
              the second he went against the party line. These two are de facto
              state owned enterprises. Moonshot is unlikely to be for sale in
              any meaningful way so its valuation is moot.
              
              [0]
              
   URI        [1]: https://en.wikipedia.org/wiki/Moonshot_AI#Funding_and_in...
       
              swyx wrote 8 hours 43 min ago:
              Funny because that's how us Americans feel about your European
              cookie banner litter and unilateral demands on privacy
       
        behnamoh wrote 16 hours 3 min ago:
        It's a decent model but works best with kimi CLI, not CC or others.
       
          rubslopes wrote 10 hours 13 min ago:
          I haven't used Kimi CLI, but it works very well with OpenCode.
       
          alansaber wrote 15 hours 52 min ago:
          Why do you think that is?
       
            segmondy wrote 14 hours 24 min ago:
            read the tech report
       
            chillacy wrote 15 hours 44 min ago:
            I heard it's because the labs fine tune their models for their own
            harness. Same reason why claude does better in claude code than
            cursor.
       
        derac wrote 16 hours 19 min ago:
        I really like the agent swarm thing, is it possible to use that
        functionality with OpenCode or is that a Kimi CLI specific thing? Does
        the agent need to be aware of the capability?
       
          esafak wrote 14 hours 55 min ago:
          Has anyone tried it and decided it's worth the cost? I've heard it's
          even more profligate with tokens.
       
            swyx wrote 8 hours 46 min ago:
            Yes. [1] it's not crazy, they cap it to 3 credits, and also YSK
            agent swarm is a closed source product
            
            Would I use it again compared to Deep Research products elsewhere?
            Maybe, probably not, but only because it's hard to switch apps
            
   URI      [1]: https://x.com/swyx/status/2016381014483075561?s=20
       
          zeroxfe wrote 15 hours 47 min ago:
          It seems to work with OpenCode, but I can't tell exactly what's going
          on -- I was super impressed when OpenCode presented me with a UI to
          switch the view between different sub-agents. I don't know if
          OpenCode is aware of the capability, or the model is really good at
          telling the harness how to spawn sub-agents or execute parallel tool
          calls.
       
        margorczynski wrote 16 hours 31 min ago:
        I wonder how K2.5 + OpenCode compares to Opus with CC. If it is close I
        would let go of my subscription, as probably a lot of people.
       
          jauntywundrkind wrote 9 hours 58 min ago:
          I've been drafting plans/specs in parallel with Opus and Kimi. Then
          asking each to review the other's plan.
          
          I still find Opus is "sharper" technically, tackles problems more
          completely & gets the nuance.
          
          But man Kimi k2.5 can write. Even if I don't have a big problem
          description, just a bunch of specs, Kimi is there, writing good intro
          material, having good text that more than elaborates, that actually
          explains. Opus and GLM-4.7 have both complimented Kimi on its writing.
          
          Still mainly using my z.ai glm-4.7 subscription for the work, so I
          don't know how capable it really is. But I do tend to go for some
          Opus in sticky spots, and especially given the 9x price difference, I
          should try some Kimi. I wish I was set up for better parallel
          evaluation; feels like such a pain to get started.
       
          eknkc wrote 15 hours 42 min ago:
          It is not opus. It is good, works really fast and is surprisingly
          thorough about its decisions. However I've seen it hallucinate things.
          
          Just today I asked for a code review and it flagged a method that can
          be `static`. The problem is it was already static. That kind of stuff
          never happens with Opus 4.5 as far as I can tell.
          
          Also, in opencode Plan mode (read only), it generated a plan and,
          instead of presenting it and stopping, decided to implement it. Could
          not use the edit and write tools because the harness was in read only
          mode. But it had bash and started using bash to edit stuff. Wouldn't
          just fucking stop even though the error messages it received from
          opencode stated why. Its plan and the resulting code was ok so I let
          it go crazy though...
       
            esafak wrote 14 hours 53 min ago:
            Some models have a mind of their own. I keep them on a leash with
            `permission` blocks in OC -- especially for rm/mv/git.
       
          naragon wrote 15 hours 48 min ago:
          I've been using K2.5 with OpenCode to do code assessments/fixes and
          Opus 4.5 with CC to check the work, and so far so good. Very
          impressed with it so far, but I don't feel comfortable canceling my
          Claude subscription just yet. Haven't tried it on large feature
          implementations.
       
          ithkuil wrote 15 hours 50 min ago:
          I also wonder if CC can be used with k2.5 with the appropriate API
          adapter
       
            tjuene wrote 14 hours 32 min ago:
            yes, just use the base url [1] ( [2] )
            
   URI      [1]: https://api.moonshot.ai/anthropic
   URI      [2]: https://platform.moonshot.ai/docs/guide/agent-support#conf...
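
A rough sketch of how that base URL could be wired up when launching Claude Code from a script. Only the URL comes from the comment above. The two environment variable names are an assumption about what Claude Code reads and should be checked against the linked agent-support guide, and `claude_code_env` and the placeholder key are hypothetical.

```python
import os

# Base URL quoted in the comment above.
MOONSHOT_ANTHROPIC_BASE_URL = "https://api.moonshot.ai/anthropic"


def claude_code_env(api_key, base_env=None):
    """Return a copy of the environment with the API endpoint overridden."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: Claude Code picks up these two variables at startup.
    env["ANTHROPIC_BASE_URL"] = MOONSHOT_ANTHROPIC_BASE_URL
    env["ANTHROPIC_AUTH_TOKEN"] = api_key
    return env


env = claude_code_env("sk-placeholder", base_env={})
print(env["ANTHROPIC_BASE_URL"])
```

The returned mapping could then be passed as `env=` to `subprocess.run` when starting the CLI, so the override applies to that one process rather than the whole shell.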
       
        zeroxfe wrote 16 hours 41 min ago:
        I've been using this model (as a coding agent) for the past few days,
        and it's the first time I've felt that an open source model really
        competes with the big labs. So far it's been able to handle most things
        I've thrown at it. I'm almost hesitant to say that this is as good as
        Opus.
       
          timwheeler wrote 6 hours 20 min ago:
          Did you use Kimi Code or some other harness? I used it with OpenCode
          and it was bumbling around through some tasks that Claude handles
          with ease.
       
            zedutchgandalf wrote 5 hours 15 min ago:
            Are you on the latest version? They pushed an update yesterday that
            greatly improved Kimi K2.5's performance. It's also free for a
            week in OpenCode, sponsored by their inference provider
       
              ekabod wrote 2 hours 11 min ago:
              But it may be a quantized model for the free version.
       
          rubslopes wrote 11 hours 8 min ago:
          Also my experience. I've been going back and forth between Opus and
          Kimi for the last few days, and, at least for my CRUD webapps, I
          would say they are both on the same level.
       
          armcat wrote 16 hours 32 min ago:
          Out of curiosity, what kind of specs do you have (GPU / RAM)? I saw
          the requirements and it's beyond my budget so I am "stuck" with
          smaller Qwen coders.
       
            observationist wrote 12 hours 13 min ago:
            API costs on these big models over private hosts tend to be a lot
            less than API calls to the big 4 American platforms. You definitely
            get more bang for your buck.
       
            tgrowazay wrote 15 hours 51 min ago:
            Just pick up any >240GB VRAM GPU off your local BestBuy to run a
            quantized version.
            
            > The full Kimi K2.5 model is 630GB and typically requires at least
            4× H200 GPUs.
       
              CamperBob2 wrote 13 hours 59 min ago:
              You could run the full, unquantized model at high speed with 8
              RTX 6000 Blackwell boards.
              
              I don't see a way to put together a decent system of that scale
              for less than $100K, given RAM and SSD prices.    A system with 4x
              H200s would cost more like $200K.
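
A back-of-the-envelope check of the eight-board suggestion, as a sketch only. The 630GB figure is the one quoted upthread. The 96GB per RTX 6000 Blackwell board is an assumption not stated in the thread, and the check ignores KV cache, activations, and any other runtime overhead.

```python
def total_memory_gb(n_devices, gb_per_device):
    """Combined memory across identical devices, in GB."""
    return n_devices * gb_per_device


def fits(model_gb, n_devices, gb_per_device):
    """True if the weights alone fit in the combined device memory."""
    return total_memory_gb(n_devices, gb_per_device) >= model_gb


MODEL_GB = 630       # full model size quoted upthread
BOARD_GB = 96        # assumed memory per board, not from the thread

print(total_memory_gb(8, BOARD_GB), "GB available for", MODEL_GB, "GB of weights")
print("fits:", fits(MODEL_GB, 8, BOARD_GB))
```

Under those assumptions the weights fit with some headroom left for context, which is what makes the unquantized setup plausible on that hardware.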
       
                ttul wrote 6 hours 28 min ago:
                That would be quite the space heater, too!
       
            zeroxfe wrote 15 hours 53 min ago:
            I'm not running it locally (it's gigantic!) I'm using the API at
            
   URI      [1]: https://platform.moonshot.ai
       
              rc1 wrote 14 hours 53 min ago:
              How long until this can be run on consumer grade hardware or a
              domestic electricity supply I wonder.
              
              Anyone have a projection?
       
                segmondy wrote 14 hours 21 min ago:
                You can run it on a mac studio with 512gb ram, that's the
                easiest way.  I run it at home on a multi rig GPU with partial
                offload to ram.
       
                  johndough wrote 14 hours 8 min ago:
                  I was wondering whether multiple GPUs make it go appreciably
                  faster when limited by VRAM. Do you have some tokens/sec
                  numbers for text generation?
       
                johndough wrote 14 hours 32 min ago:
                You can run it on consumer grade hardware right now, but it
                will be rather slow. NVMe SSDs these days have a read speed of
                7 GB/s (EDIT: or even faster than that! Thank you @hedgehog for
                the update), so it will give you one token roughly every three
                seconds while crunching through the 32 billion active
                parameters, which are natively quantized to 4 bit each. If you
                want to run it faster, you have to spend more money.
                
                Some people in the localllama subreddit have built systems
                which run large models at more decent speeds:
                
   URI          [1]: https://www.reddit.com/r/LocalLLaMA/
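
The arithmetic behind that estimate can be sketched as follows. It assumes the worst case where every active weight has to be read from disk for each token, with no caching in RAM. The 32 billion active parameters, 4 bits per parameter, and 7 GB/s come from the comment above, and the 15 and 30 GB/s values are the faster figures mentioned in the reply below. The function name is made up for illustration.

```python
def seconds_per_token(active_params, bits_per_param, read_bytes_per_s):
    """Time to stream the active weights once at a given read speed."""
    bytes_per_token = active_params * bits_per_param / 8
    return bytes_per_token / read_bytes_per_s


ACTIVE_PARAMS = 32e9   # active parameters per token, from the comment
BITS_PER_PARAM = 4     # native 4-bit quantization, from the comment

for gb_per_s in (7, 15, 30):
    t = seconds_per_token(ACTIVE_PARAMS, BITS_PER_PARAM, gb_per_s * 1e9)
    print(f"{gb_per_s} GB/s -> {t:.2f} s/token")
```

That is 16 GB of weights per token, so a bit over two seconds at 7 GB/s, in line with the "one token roughly every three seconds" ballpark once overhead is added.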
       
                  hedgehog wrote 13 hours 56 min ago:
                  High end consumer SSDs can do closer to 15 GB/s, though only
                  with PCI-e gen 5. On a motherboard with two m.2 slots that's
                  potentially around 30GB/s from disk. 
                  Edit: How fast everything is depends on how much data needs
                  to get loaded from disk which is not always everything on MoE
                  models.
       
                    greenavocado wrote 11 hours 9 min ago:
                    Would RAID zero help here?
       
                      hedgehog wrote 10 hours 16 min ago:
                      Yes, RAID 0 or 1 could both work in this case to combine
                      the disks. You would want to check the bus topology for
                      the specific motherboard to make sure the slots aren't on
                      the other side of a hub or something like that.
       
                heliumtera wrote 14 hours 37 min ago:
                You need 600gb of VRAM + MEMORY (+ DISK) to fit the model
                (full), or 240gb for the 1-bit quantized model. Of course this
                will be slow.
                
                Through moonshot api it is pretty fast (much much much faster
                than Gemini 3 pro and Claude sonnet, probably faster than
                Gemini flash), though. To get similar experience they say at
                least 4xH200.
                
                If you don't mind running it super slow, you still need around
                600gb of VRAM + fast RAM.
                
                It's already possible to run 4xH200 in a domestic environment
                (it would be instantaneous for most tasks, unbelievable speed).
                It's just very very expensive and probably challenging for most
                users, manageable/easy for the average hacker news crowd.
                
                High end GPUs are expensive AND hard to source. If you manage
                to source them at the old prices, it's around 200 thousand
                dollars to get maximum speed, I guess. You could probably run
                it decently on a bunch of high end machines for, let's say,
                40k (slow).
       
              BeetleB wrote 15 hours 51 min ago:
              Just curious - how does it compare to GLM 4.7? Ever since they
              gave the $28/year deal, I've been using it for personal projects
              and am very happy with it (via opencode).
              
   URI        [1]: https://z.ai/subscribe
       
                Alifatisk wrote 1 hour 16 min ago:
                Kimi k2.5 is a beast, speaks very human like (k2 was also good
                at this) and completes whatever I throw at it. However, the glm
                quarterly coding plan is too good of a deal. The Christmas deal
                ends today, so I'd still suggest to stick to it. There will
                always come a better model.
       
                segmondy wrote 14 hours 22 min ago:
                The old Kimi K2 is better than GLM4.7
       
                akudha wrote 15 hours 16 min ago:
                Is the Lite plan enough for your projects?
       
                  BeetleB wrote 14 hours 36 min ago:
                  Very much so. I'm using it for small personal stuff on my
                  home PC. Nothing grand. Not having to worry about token usage
                  has been great (previously was paying per API use).
                  
                  I haven't stress tested it with anything large. Both at work
                  and home, I don't give much free rein to the AI (e.g. I
                  examine and approve all code changes).
                  
                  Lite plan doesn't have vision, so you cannot copy/paste an
                  image there. But I can always switch models when I need to.
       
                cmrdporcupine wrote 15 hours 17 min ago:
                From what people say, it's better than GLM 4.7 (and I guess
                DeepSeek 3.2)
                
                But it's also like... 10x the price per output token on any of
                the providers I've looked at.
                
                I don't feel it's 10x the value. It's still much cheaper than
                paying by the token for Sonnet or Opus, but if you have a
                subscribed plan from the Big 3 (OpenAI, Anthropic, Google) it's
                much better value for $$.
                
                Comes down to ethical or openness reasons to use it I guess.
       
                  esafak wrote 15 hours 4 min ago:
                  Exactly. For the price it has to beat Claude and GPT, unless
                  you have budget for both. I just let GLM solve whatever it
                  can and reserve my Claude budget for the rest.
       
                InsideOutSanta wrote 15 hours 32 min ago:
                There's no comparison. GLM 4.7 is fine and reasonably competent
                at writing code, but K2.5 is right up there with something like
                Sonnet 4.5. It's the first time I can use an open-source model
                and not immediately tell the difference between it and top-end
                models from Anthropic and OpenAI.
       
                zeroxfe wrote 15 hours 44 min ago:
                It's waaay better than GLM 4.7 (which was the open model I was
                using earlier)! Kimi was able to quickly and smoothly finish
                some very complex tasks that GLM completely choked at.
       
            Carrok wrote 16 hours 28 min ago:
            Not OP but OpenCode and DeepInfra seem like an easy way.
       
          thesurlydev wrote 16 hours 36 min ago:
          Can you share how you're running it?
       
            indigodaddy wrote 10 hours 31 min ago:
            Been using K2.5 Thinking via Nano-GPT subscription and `nanocode
            run` and it's working quite nicely.  No issues with Tool Calling so
            far.
       
            JumpCrisscross wrote 11 hours 26 min ago:
            > Can you share how you're running it?
            
            Not OP, but I've been running it through Kagi [1]. Their AI
            offering is probably the best-kept secret in the market.
            
   URI      [1]: https://help.kagi.com/kagi/ai/assistant.html
       
              deaux wrote 7 hours 33 min ago:
              Doesn't list Kimi 2.5 and seems to be chat-only, not API,
              correct?
       
            zeroxfe wrote 15 hours 54 min ago:
            Running it via [1] -- using OpenCode. They have super cheap monthly
            plans at kimi.com too, but I'm not using it because I already have
            codex and claude monthly plans.
            
   URI      [1]: https://platform.moonshot.ai
       
              esafak wrote 15 hours 1 min ago:
              Where? [1] starts at $19/month, which is the same as the big boys.
              
   URI        [1]: https://www.kimi.com/code
       
              UncleOxidant wrote 15 hours 33 min ago:
              so there's a free plan at moonshot.ai that gives you some number
              of tokens without paying?
       
            eknkc wrote 15 hours 54 min ago:
            I've been using it with opencode. You can either use your kimi code
            subscription (flat fee), moonshot.ai api key (per token) or
            openrouter to access it. OpenCode works beautifully with the model.
            
            Edit: as a side note, I only installed opencode to try this model
            and I gotta say it is pretty good. Did not think it'd be as good as
            claude code but it's just fine. Been using it with codex too.
       
              Imustaskforhelp wrote 15 hours 39 min ago:
              I tried to use opencode for kimi k2.5 too but recently they
              changed their pricing from 200 tool requests/5 hour to token
              based pricing.
              
              I can only speak from the tool-request-based pricing, but
              anecdotally opencode took like 10 requests in 3-4 minutes
              where Kimi CLI took 2-3.
              
              So I personally like/stick with the kimi cli for kimi coding. I
              haven't tested it out again with OpenAI with the new token based
              pricing but I do think that opencode might add more token issue.
              
              Kimi Cli's pretty good too imo. You should check it out!
              
   URI        [1]: https://github.com/MoonshotAI/kimi-cli
       
                nl wrote 12 hours 42 min ago:
                I like Kimi-cli but it does leak memory.
                
                I was using it for multi-hour tasks scripted via a
                self-written orchestrator on a small VM and ended up switching
                away from it because it would run slower and slower over time.
       
            explorigin wrote 16 hours 25 min ago:
             [1] Requirements are listed.
            
   URI      [1]: https://unsloth.ai/docs/models/kimi-k2.5
       
              KolmogorovComp wrote 15 hours 49 min ago:
              To save everyone a click
              
              > The 1.8-bit (UD-TQ1_0) quant will run on a single 24GB GPU if
              you offload all MoE layers to system RAM (or a fast SSD). With
              ~256GB RAM, expect ~10 tokens/s. The full Kimi K2.5 model is
              630GB and typically requires at least 4× H200 GPUs.
              If the model fits, you will get >40 tokens/s when using a B200.
              To run the model in near full precision, you can use the 4-bit or
              5-bit quants. You can use any higher just to be safe.
              For strong performance, aim for >240GB of unified memory (or
              combined RAM+VRAM) to reach 10+ tokens/s. If you're below that,
              it'll work but speed will drop (llama.cpp can still run via
              mmap/disk offload) and may fall from ~10 tokens/s to <2 token/s.
              We recommend UD-Q2_K_XL (375GB) as a good size/quality balance.
              Best rule of thumb: RAM+VRAM ≈ the quant size; otherwise
              it'll still work, just slower due to offloading.
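
              The quoted rule of thumb (RAM+VRAM ≈ the quant size) can be
              sketched as a quick check. This is only an illustration: the
              function name is made up, the 375GB figure is the UD-Q2_K_XL
              size quoted above, and the second machine is hypothetical. It
              ignores KV cache and OS overhead, so treat the result as
              optimistic.

```python
def estimate_fit(quant_gb: float, ram_gb: float, vram_gb: float) -> str:
    """Apply the quoted rule of thumb: RAM + VRAM should roughly cover the quant size."""
    total_gb = ram_gb + vram_gb
    if total_gb >= quant_gb:
        return "fits in memory"
    # Below the quant size the model can still run via mmap/disk offload, just slower.
    return "runs with disk offload (slower)"


UD_Q2_K_XL_GB = 375  # size quoted in the comment above

# 256GB RAM + one 24GB GPU: 280GB total, below 375GB.
print(estimate_fit(UD_Q2_K_XL_GB, ram_gb=256, vram_gb=24))
# Hypothetical larger box: 384GB RAM + 48GB VRAM = 432GB total.
print(estimate_fit(UD_Q2_K_XL_GB, ram_gb=384, vram_gb=48))
```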
       
                Gracana wrote 15 hours 38 min ago:
                I'm running the Q4_K_M quant on a xeon with 7x A4000s and I'm
                getting about 8 tok/s with small context (16k). I need to do
                more tuning, I think I can get more out of it, but it's never
                gonna be fast on this suboptimal machine.
       
                  segmondy wrote 14 hours 18 min ago:
                  you can add 1 more GPU so you can take advantage of tensor
                  parallel.   I get the same speed with 5 3090's with most of
                  the model on 2400mhz ddr4 ram, 8.5tk almost constant.    I
                  don't really do agents but chat, and it holds up to 64k.
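
                  A rough sketch of why going from 7 to 8 GPUs can matter for
                  tensor parallelism: some inference engines (vLLM, for one)
                  require the attention head count to be divisible by the
                  tensor-parallel degree. The head count below is an
                  illustrative assumption, not a figure from this thread, and
                  the helper name is made up.

```python
def valid_tp_sizes(num_heads: int, max_gpus: int) -> list[int]:
    """Return GPU counts up to max_gpus that evenly divide the attention head count."""
    return [n for n in range(1, max_gpus + 1) if num_heads % n == 0]


NUM_HEADS = 64  # assumed for illustration only; check the model config for the real value

print(valid_tp_sizes(NUM_HEADS, 7))  # with 7 cards, 7 itself is not usable
print(valid_tp_sizes(NUM_HEADS, 8))  # with 8 cards, all 8 can be used
```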
       
                    Gracana wrote 14 hours 1 min ago:
                    That is a very good point and I would love to do it, but I
                    built this machine in a desktop case and the motherboard
                    has seven slots. I did a custom water cooling manifold just
                    to make it work with all the cards.
                    
                    I'm trying to figure out how to add another card on a riser
                    hanging off a slimsas port, or maybe I could turn the
                    bottom slot into two vertical slots.. the case (fractal
                    meshify 2 xl) has room for a vertical mounted card that
                    wouldn't interfere with the others, but I'd need to make a
                    custom riser with two slots on it to make it work. I dunno,
                    it's possible!
                    
                    I also have an RTX Pro 6000 Blackwell and an RTX 5000 Ada..
                    I'd be better off pulling all the A4000s and throwing both
                    of those cards in this machine, but then I wouldn't have
                    anything for my desktop. Decisions, decisions!
       
                  esafak wrote 14 hours 58 min ago:
                  The pitiful state of GPUs. $10K for a sloth with no memory.
       
            gigatexal wrote 16 hours 34 min ago:
            Yeah, I too am curious, because Claude Code is so good and the
            ecosystem so much "it just works" that I'm willing to pay them.
       
              Imustaskforhelp wrote 15 hours 37 min ago:
              I tried kimi k2.5 and first I didn't really like it. I was
              critical of it but then I started liking it. Also, the model has
              kind of replaced how I use chatgpt too & I really love kimi 2.5
              the most right now (although gemini models come close too)
              
              To be honest, I do feel like kimi k2.5 is the best open source
              model. It's not the best model itself right now tho but it's
              really price performant and for many use cases might be nice
              depending on it.
              
              It might not be the completely SOTA that people say but it comes
              pretty close and it's open source and I trust the open source part
              because I feel like other providers can also run it and just
              about a lot of other things too (also considering that iirc
              chatgpt recently slashed some old models)
              
              I really appreciate kimi for still open sourcing their complete
              SOTA and then releasing some research papers on top of them
              unlike Qwen, which has closed-sourced its complete SOTA.
              
              Thank you Kimi!
       
              epolanski wrote 15 hours 54 min ago:
              You can plug another model in place of Anthropic ones in Claude
              Code.
       
                miroljub wrote 13 hours 44 min ago:
                If you don't use Anthropic models there's no reason to use
                Claude Code at all. Opencode gives so much more choice.
       
                zeroxfe wrote 15 hours 50 min ago:
                That tends to work quite poorly because Claude Code does not
                use standard completions APIs. I tried it with Kimi, using
                litellm[proxy], and it failed in too many places.
       
                  samtheprogram wrote 14 hours 52 min ago:
                  opencode is a good alternative that doesn't flake out in this
                  way.
       
                  AnonymousPlanet wrote 15 hours 0 min ago:
                  It worked very well for me using qwen3 coder behind a
                  litellm. Most other models just fail in weird ways though.
       
       