_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
   URI   Reverse Engineering Cursor's LLM Client
       
       
        serf wrote 7 min ago:
        Cursor is the only product that I have cancelled in 20+ years due to a
        lack of customer service response.
        
        Emailed them multiple times over weeks about billing questions -- not a
         single response. These weren't VS Code questions, either -- they
        needed Cursor staff intervention.
        
        No problem getting promo emails though!
        
        The quicker their 'value' can be spread to other services the better,
        imo. Maybe the next group will answer emails.
       
        lyjackal wrote 5 hours 58 min ago:
        I've been curious to see the process for selecting relevant context
         from a long conversation. Has anyone reverse engineered what that
         looks like? How is the conversation history pruned, and how is the
         latest state of a file represented?
       
          GabrielBianconi wrote 5 hours 49 min ago:
          We didn't look into that workflow closely, but you can reproduce our
           work (code on GitHub) and potentially find some insights!
          
          We plan to continue investigating how it works (+ optimize the models
          and prompts using TensorZero).
       
        bredren wrote 6 hours 24 min ago:
        Cursor and other IDE modality solutions are interesting but train
        sloppy use of context.
        
         From the extracted prompt Cursor is using:
        
        > Each time the USER sends a message, we may automatically attach some
        information about their current state…edit history in their session
        so far, linter errors, and more. This information may or may not be
        relevant to the coding task, it is up for you to decide.
        
        This is the context bloat that limits effectiveness of LLMs in solving
        very hard problems.
        
         This particular .env example illustrates the low-stakes type of
         problem Cursor is great at solving, but it also lacks the complexity
         that will keep SWEs employed.
        
         Instead, I suggest folks working with AI start at the chat interface
         and work on editing conversations to keep contexts clean as they explore a
        truly challenging problem.
        
         This often includes meeting and Slack transcripts, internal docs,
        external content and code.
        
         I've built a tool for surgical use of code called FileKitty: [1] and,
         more recently, slackprep: [2]. These let a person be more intentional
         about the problem they are trying to solve by including only
         information relevant to it.
        
   URI  [1]: https://github.com/banagale/FileKitty
   URI  [2]: https://github.com/banagale/slackprep
       
          jacob019 wrote 5 hours 15 min ago:
          I had this thought as well and find it a bit surprising.  For my own
          agentic applications, I have found it necessary to carefully curate
          the context.  Instead of including an instruction that we "may
          automatically attach", only include an instruction WHEN something is
          attached.  Instead of "may or may not be relevant to the coding task,
           it is up for you to decide", provide explicit instructions on how
           to assess relevance and what to do when the attachment is relevant
           and when it is not.  When the context is short, it doesn't matter
           as much, but when there is a difficult problem with long context
           length, fine-tuned instructions make all the difference.  Cursor may be keeping
          instructions more generic to take advantage of cached token pricing,
          but the phrasing does seem rather sloppy.  This is all still
          relatively new, I'm sure both the models and the prompts will see a
          lot more change before things settle down.
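           
           A minimal sketch of that idea (illustrative Python, not anything
           extracted from Cursor; the section wording below is an
           assumption): only emit an instruction about attached context
           when that context actually exists.
           
           # Illustrative: attach context-specific instructions only when
           # the corresponding context is actually present.
           def build_system_prompt(base_prompt, linter_errors=None,
                                   edit_history=None):
               sections = [base_prompt]
               if linter_errors:
                   sections.append(
                       "Attached linter errors (address them only if they "
                       "relate to the user's request):\n"
                       + "\n".join(linter_errors)
                   )
               if edit_history:
                   sections.append(
                       "Recent edit history for this session (reference "
                       "only):\n" + edit_history
                   )
               return "\n\n".join(sections)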
       
        notpushkin wrote 7 hours 8 min ago:
         Hmm, now that we have the prompts, would it be possible to reimplement
         Cursor's servers and have a fully local (ahem, pirated) version?
       
          tomr75 wrote 3 hours 7 min ago:
           Presumably their apply model is run on their servers.
           
           I wonder how hard it would be to build a local apply model; surely
           that would be faster on a MacBook.
       
          handfuloflight wrote 4 hours 31 min ago:
           Were you really waiting for the prompts before embarking on this
          adventure?
       
          deadbabe wrote 5 hours 44 min ago:
          Absolutely
       
        robkop wrote 8 hours 26 min ago:
         There is a lot missing from this prompt; tool call descriptors are
         the most obvious. See for yourself using even a year-old jailbreak
         [1]. There are some great ideas in how they've set up other pieces,
         such as Cursor rules.
        
        [1] 
        
   URI  [1]: https://gist.github.com/lucasmrdt/4215e483257e1d81e44842eddb8c...
       
          cloudking wrote 6 hours 26 min ago:
          
          
   URI    [1]: https://github.com/elder-plinius/CL4R1T4S/blob/main/CURSOR/C...
       
          GabrielBianconi wrote 7 hours 11 min ago:
          They use different prompts depending on the action you're taking. We
          provided just a sample because our ultimate goal here is to start A/B
          testing models, optimizing prompts + models, etc. We provide the code
          to reproduce our work so you can see other prompts!
          
          The Gist you shared is a good resource too though!
       
          ericrallen wrote 8 hours 7 min ago:
          Maybe there is some optimization logic that only appends tool details
          that are required for the user’s query?
          
          I’m sure they are trying to slash tokens where they can, and
          removing potentially irrelevant tool descriptors seems like
          low-hanging fruit to reduce token consumption.
       
            joshmlewis wrote 6 hours 28 min ago:
             Yes, this is one of the techniques apps can use. You vectorize
             the tool descriptions and then do a lookup based on the user's
             query to select the most relevant tools; this is called
             pre-computed semantic profiles. You can even hash the queries
             themselves, cache the tools that were used, and then do
             similarity lookups by query.
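             
             A rough sketch of that lookup (illustrative only; the tool
             list, the vector size, and the stand-in embed() below are
             assumptions, not anything extracted from Cursor):
             
             # Pre-compute a "semantic profile" (embedding) per tool, then
             # keep only the tools most similar to the user's query.
             import numpy as np
             
             def embed(text):
                 # Stand-in so the sketch runs on its own; swap in a real
                 # embeddings API or model here.
                 rng = np.random.default_rng(abs(hash(text)) % (2**32))
                 return rng.standard_normal(64)
             
             TOOLS = {
                 "read_file": "Read the contents of a file.",
                 "edit_file": "Apply an edit to a file.",
                 "run_terminal_cmd": "Run a shell command.",
             }
             
             # Computed once, offline.
             TOOL_VECTORS = {name: embed(d) for name, d in TOOLS.items()}
             
             def select_tools(query, top_k=2):
                 q = embed(query)
                 def cos(v):
                     num = float(np.dot(q, v))
                     den = float(np.linalg.norm(q) * np.linalg.norm(v))
                     return num / den
                 ranked = sorted(TOOLS,
                                 key=lambda n: cos(TOOL_VECTORS[n]),
                                 reverse=True)
                 return ranked[:top_k]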
       
            vrm wrote 7 hours 30 min ago:
            I definitely see different prompts based on what I'm doing in the
             app. As we mentioned, there are different prompts depending on
             whether you're asking questions, doing Cmd-K edits, working in
             the shell, etc. I'd
            also imagine that they customize the prompt by model (unobserved
            here, but we can also customize per-model using TensorZero and A/B
            test).
       
        CafeRacer wrote 13 hours 0 min ago:
         Soooo.... Wireshark is no longer available or something?
       
          vrm wrote 9 hours 54 min ago:
           Wireshark would work for seeing the requests from the desktop app to
          Cursor’s servers (which make the actual LLM requests). But if
          you’re interested in what the actual requests to LLMs look like
          from Cursor’s servers you have to set something like this up. Plus,
          this lets us modify the request and A/B test variations!
       
            stavros wrote 7 hours 56 min ago:
            Sorry, can you explain this a bit more? Either you're putting
             something between your desktop and the server (in which case
            Wireshark would work) or you're putting something between Cursor's
            infrastructure and their LLM provider, in which case, how?
       
              vrm wrote 7 hours 32 min ago:
              we're doing the latter! Cursor lets you configure the OpenAI base
              URL so we were able to have Cursor call Ngrok -> Nginx (for auth)
              -> TensorZero -> LLMs. We explain in detail in the blog post.
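               
               For anyone who wants to replicate the general idea: any
               OpenAI-compatible client will send its traffic to whatever
               base URL it is given. A toy illustration with the OpenAI
               Python SDK (the gateway URL, key, and model name below are
               placeholders, not the actual setup from the post):
               
               # Point an OpenAI-compatible client at your own gateway
               # (e.g. an ngrok tunnel in front of a proxy), which then
               # forwards to the real provider and logs everything.
               from openai import OpenAI
               
               client = OpenAI(
                   base_url="https://example.ngrok.app/v1",  # placeholder
                   api_key="gateway-key",  # checked by the proxy, not OpenAI
               )
               
               resp = client.chat.completions.create(
                   model="gpt-4o-mini",  # whatever the gateway routes to
                   messages=[{"role": "user", "content": "Hello, proxy"}],
               )
               print(resp.choices[0].message.content)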
       
                stavros wrote 7 hours 21 min ago:
                Ah OK, I saw that, but I thought that was the desktop client
                hitting the endpoint, not the server. Thanks!
       
          Maxious wrote 9 hours 58 min ago:
          The article literally says at the end this was just the first post
          about looking before getting into actually changing the responses.
          
          (that being said, mitmproxy has gotten pretty good for just looking
           lately [1])
          
   URI    [1]: https://docs.mitmproxy.org/stable/concepts/modes/#local-capt...
       
            spmurrayzzz wrote 2 hours 28 min ago:
            Yea the proxying/observability is without question the simplest
            part of this whole problem space. Once you get into the weeds of
            automating all the eval and prompt optimizing, you realize how
            irrelevant wireshark actually is in the feedback loop.
            
             But, like you, I also landed on mitmproxy after starting with
             tcpdump/Wireshark. I recently started building a tiny streaming
             textual-gradient-based optimizer (similar to what AdalFlow is
             doing) by parsing the mitmproxy outputs in real time.
            Having a turnkey solution for this sort of thing will definitely be
            valuable at least in the near to mid term.
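             
             If it's useful to anyone, a bare-bones mitmproxy addon along
             these lines (a sketch; the path filter and output file are
             assumptions) can stream request/response bodies to disk as
             they pass through:
             
             # Minimal mitmproxy addon: dump LLM traffic as JSON lines.
             # Run with: mitmdump -s llm_tap.py
             import json
             from mitmproxy import http
             
             class LLMTap:
                 def response(self, flow: http.HTTPFlow) -> None:
                     # Only OpenAI-style chat completions (assumed path).
                     if "/chat/completions" not in flow.request.path:
                         return
                     record = {
                         "url": flow.request.pretty_url,
                         "request": flow.request.get_text(),
                         "response": flow.response.get_text(),
                     }
                     with open("llm_traffic.jsonl", "a") as f:
                         f.write(json.dumps(record) + "\n")
             
             addons = [LLMTap()]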
       
              vrm wrote 1 hour 31 min ago:
               If you haven't, check out our repo -- it's free, fully
              self-hosted, production-grade, and designed for precisely this
              application :)
              
   URI        [1]: https://github.com/TensorZero/tensorzero
       
                spmurrayzzz wrote 43 min ago:
                 Looks very buttoned up. My local project, however, has some
                 features tuned for my explicit agent flows (built directly
                 into my inference engine), so I can't really jump ship just
                 yet.
                
                Looking great so far though!
       
       