gopher://codevoid.de/1/hn/comments

        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (inofficial)
       
       
       COMMENT PAGE FOR:
   URI   Search-R1: Training LLMs to Reason and Leverage Search Engines with RL
       
       
        abidhusain wrote 4 hours 14 min ago:
        Leveraging reinforcement learning (RL) for LLMs is a fascinating
        evolution in search technology. The potential for improving search
        engines to reason intelligently and process data in real-time could
        revolutionize the entire industry.
       
        vessenes wrote 5 hours 3 min ago:
        A couple of comments. What's not that interesting here is that adding
        search to an LLM increases accuracy: this is known, and largely
        implemented via RAG or other search pipelines which then stuff
        information into the context.
        
        What might be interesting here is that they are thinking about
        taxonomic tool use-cases, and exploring training and therefore
        optimizing the utilization of them.
        
        This to me is a proof of concept, an interesting one, but just a
        proof of concept. You can see from their example search that the model
        over-relied on search; it didn't need to re-search three times to get
        the answer.
        
        A next step that I think would be useful would be updating the reward
        function to penalize search; pressing the model to use search when it
        needs to and not before. This to me is a likely framework going forward
        where MCP tool costing matters, and would be really useful to have in
        the next gen of tool calling LLMs.
        
        In the case of search we'd hopefully get a really useful signal and
        outcome for times the model is unsure: it would call a friend, and
        get good info! And for times it's sure, we'd have taught it not to
        waste reward on that.
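        
        A minimal sketch of the search-penalized reward suggested above,
        assuming a binary correct/incorrect outcome reward. It is not taken
        from the paper; the function name, the per-search cost, and the
        free-search allowance are all hypothetical.

```python
def shaped_reward(answer_correct: bool,
                  num_searches: int,
                  search_cost: float = 0.1,
                  free_searches: int = 0) -> float:
    """Outcome reward minus a linear penalty per search call.

    Hypothetical shaping: search_cost and free_searches are illustrative
    knobs, not values from the paper.
    """
    # Binary outcome: 1.0 for a correct final answer, 0.0 otherwise.
    outcome = 1.0 if answer_correct else 0.0
    # Only searches beyond the free allowance are charged.
    billable = max(0, num_searches - free_searches)
    return outcome - search_cost * billable


if __name__ == "__main__":
    # Same correct answer, different number of search calls.
    for n in (0, 1, 3):
        print(n, shaped_reward(True, n))
```

        With a cost well below the outcome reward, a correct answer that
        needed one search still scores above a wrong answer with none, so
        the penalty discourages redundant searches without discouraging
        necessary ones.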
       
        DeathArrow wrote 7 hours 50 min ago:
        Can someone ELI5 how reinforcement learning works with a
        transformer-based architecture?
       
        deepsquirrelnet wrote 8 hours 9 min ago:
        This is pretty cool. I have a similar model that's 8 days into
        training on msmarco.
        
        So far I only have the "cold start" data posted, but I'm planning
        on posting a full distillation dataset.
        
   URI  [1]: https://huggingface.co/datasets/dleemiller/lm25
       
          jacobgorm wrote 2 hours 37 min ago:
          What kind of hardware setup would be needed to replicate the
          paper's results?
       
        0xlogk wrote 8 hours 42 min ago:
        The paper mentions they used Wikipedia as the search corpus. The repo
        states they plan to expand to the Google and Bing APIs. I wonder how
        they will handle evolving search corpora, i.e. whether continual RL
        updates will be needed.
       
        sachinaag wrote 12 hours 2 min ago:
        I wonder if Perplexity uses similar methods under the hood or if it is
        a completely different approach.
       
          mrklol wrote 7 hours 42 min ago:
          I feel like most of these services simply take your prompt and ask a
          model for search queries regarding that prompt. Then add the
          resulting pages into the context.
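          
          A rough sketch of the flow described above, not any particular
          service's implementation. The two callables (propose_queries,
          search) are hypothetical stand-ins for a model call and a search
          backend.

```python
from typing import Callable, List


def build_context(prompt: str,
                  propose_queries: Callable[[str], List[str]],
                  search: Callable[[str], List[str]],
                  max_pages: int = 3) -> str:
    """Ask for search queries, fetch pages, and stuff them into the context.

    propose_queries and search are hypothetical hooks supplied by the caller.
    """
    pages: List[str] = []
    for query in propose_queries(prompt):
        for page in search(query):
            # Skip pages already returned by an earlier query.
            if page not in pages:
                pages.append(page)
    # Keep only the first max_pages results to bound context size.
    pages = pages[:max_pages]
    sources = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(pages))
    return f"Sources:\n{sources}\n\nQuestion: {prompt}"


if __name__ == "__main__":
    # Stub hooks so the sketch runs without a model or a search API.
    fake_queries = lambda p: [p, p + " wiki"]
    fake_search = lambda q: ["page A", "page for " + q]
    print(build_context("x", fake_queries, fake_search))
```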
       
        perbu wrote 12 hours 38 min ago:
        This is the magical thing that happens when AI research happens in the
        open. Deepseek published their model and their methodology and then the
        nice people at the University of Illinois are able to build on it.
        
        When OpenAI was launched this is what I thought it was going to be
        like. Something, something for the betterment of mankind.
       
          c16 wrote 11 hours 45 min ago:
          I'm always surprised at how many LLM research papers are published on
          here, so despite OpenAI, I think it's absolutely happening.
       
            NitpickLawyer wrote 5 hours 41 min ago:
            Unfortunately the "open"AI effect is starting to show in other labs
            as well. DeepMind recently announced a minimum 6-month delay in
            publishing their SotA research, to give them a market advantage. I
            get it, but it's sad that it's happening.
            
            The good thing is that there are a lot of companies out there that
            want to make a name for themselves. Mistral started like that with
            Apache 2.0 models, now DeepSeek with MIT models, and so on. And if the past
            year is a good indicator, it seems that closed SotA to open
            close-to-SotA is 6-3 months. So that's good.
            
            I also find interesting LeCun's take that "there is no closed
            source moat, or not for long". In a podcast he went into detail on
            this, saying that "people move companies, and people talk". If
            someone finds some secret sauce, the ideas will move around and
            other labs will catch up quickly. So there's some hope.
       
       