_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   DIR   Ask HN: Is anyone doing anything cool with tiny language models?
       
       
        merwijas wrote 56 min ago:
         I put llama 3 on a Raspberry Pi 5 and have it running a small
         droid. I added a speech engine so it can hear spoken prompts, which
         it replies to in droid speak. It also has a small screen that
         translates the response to English. I gave it a backstory about
         being an astromech droid, so it usually just talks about the
         hyperdrive, but it's fun.
       
        sauravpanda wrote 59 min ago:
         We are building a framework to run these tiny language models on
         the web so anyone can access private LLMs in their browser: [1]
        
         With just three lines of code, you can run small LLMs inside the
         browser. We feel this unlocks a ton of potential for businesses:
         they can introduce AI without fear of cost and can personalize the
         experience using AI.
        
        Would love your thoughts and what we can do more or better!
        
   URI  [1]: https://github.com/sauravpanda/BrowserAI
       
        guywithahat wrote 3 hours 5 min ago:
         I've been working on a self-hosted, low-latency service for small
         LLMs. It's basically exactly what I would have wanted when I
         started my previous startup. It's aimed at real-time applications,
         where even the network time to reach a fast LLM host like Groq is
         an issue.
        
        I haven't benchmarked it yet but I'd be happy to hear opinions on it.
        It's written in C++ (specifically not python), and is designed to be a
        self-contained microservice based around llama.cpp.
        
   URI  [1]: https://github.com/thansen0/fastllm.cpp
       
        gpm wrote 3 hours 25 min ago:
        I made a shell alias to translate things from French to English, does
        that count?
        
            function trans
            llm "Translate \"$argv\" from French to English please"
            end
        
        Llama 3.2:3b is a fine French-English dictionary IMHO.
       
        dh1011 wrote 3 hours 55 min ago:
         I copied all the text from this post and used an LLM to generate a
         list of all the ideas. I do the same for other similar HN posts.
       
          lordswork wrote 3 hours 51 min ago:
          well, what are the ideas?
       
        sidravi1 wrote 3 hours 55 min ago:
        We fine-tuned a Gemma 2B to identify urgent messages sent by new and
        expecting mothers on a government-run maternal health helpline.
        
   URI  [1]: https://idinsight.github.io/tech-blog/blog/enhancing_maternal_...
       
          proxygeek wrote 2 hours 40 min ago:
           Such a fun thread, but this is the kind of application that perks
           up my attention!
          
          Very cool!
       
        linsomniac wrote 4 hours 1 min ago:
         I have this idea that a tiny LM would be good at canonicalizing
         entered real estate addresses. We currently buy a data set and
         software from Experian, but it feels like something an LM might be
         very good at. There are lots of weirdnesses in address entry that
         regexes have a hard time with. We know the bulk of the addresses a
         user might be entering, unless it's a totally new property, so we
         should be able to train on that.
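
         A minimal sketch of the idea (untested; the model, prompt, and
         field names are placeholders), using Ollama's Python client to
         force structured output:

             import json
             import ollama  # pip install ollama

             PROMPT = ("Canonicalize this US street address into JSON with "
                       "keys number, street, unit, city, state, zip: {addr}")

             def canonicalize(addr: str) -> dict:
                 resp = ollama.generate(
                     model="llama3.2:3b",
                     prompt=PROMPT.format(addr=addr),
                     format="json",  # ask Ollama for valid JSON only
                 )
                 return json.loads(resp["response"])

             print(canonicalize("123 n. main st apt 4, springfield IL 62701"))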
       
        jftuga wrote 4 hours 11 min ago:
        I'm using ollama, llama3.2 3b, and python to shorten news article
        titles to 10 words or less.  I have a 3 column web site with a list of
        news articles in the middle column.  Some of the titles are too long
        for this format, but the shorter titles appear OK.
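
         A minimal sketch of that pipeline (assuming the ollama Python
         client; the prompt wording is illustrative):

             import ollama  # pip install ollama

             def shorten(title: str) -> str:
                 resp = ollama.generate(
                     model="llama3.2:3b",
                     prompt="Shorten this news title to 10 words or less, "
                            "keeping the key facts: " + title,
                 )
                 return resp["response"].strip()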
       
        HexDecOctBin wrote 5 hours 8 min ago:
         Are there any experiments with small models that do paraphrasing? I
         tried using some off-the-shelf models, but it didn't go well.
        
        I was thinking of hooking them in RPGs with text-based dialogue, so
        that a character will say something slightly different every time you
        speak to them.
       
        jwitthuhn wrote 5 hours 14 min ago:
         I've made a tiny ~1M-parameter model, largely based on Karpathy's
         nanogpt with a few more features added on top, that can generate
         random Magic: The Gathering cards.
        
        I don't have a pre-trained model to share but you can make one yourself
        from the git repo, assuming you have an apple silicon mac.
        
   URI  [1]: https://github.com/jlwitthuhn/TCGGPT
       
        itskarad wrote 5 hours 25 min ago:
         I'm using ollama for parsing and categorizing scraped jobs for a
         local job board dashboard I check every day.
       
        JLCarveth wrote 5 hours 37 min ago:
        I used a small (3b, I think) model plus tesseract.js to perform OCR on
        an image of a nutritional facts table and output structured JSON.
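
         A Python analogue of that pipeline (a sketch only: it swaps
         tesseract.js for pytesseract, and the model and JSON keys are
         placeholders):

             import json
             import ollama
             import pytesseract  # pip install pytesseract pillow
             from PIL import Image

             def nutrition_json(image_path: str) -> dict:
                 text = pytesseract.image_to_string(Image.open(image_path))
                 resp = ollama.generate(
                     model="llama3.2:3b",
                     prompt="Extract the nutrition facts below into JSON "
                            "with keys calories, fat_g, protein_g:\n" + text,
                     format="json",
                 )
                 return json.loads(resp["response"])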
       
          tigrank wrote 25 min ago:
           All that server-side, or client-side?
       
          deivid wrote 1 hour 48 min ago:
          What was the model? What kind of performance did you get out of it?
          
          Could you share a link to your project, if it is public?
       
        codazoda wrote 5 hours 51 min ago:
        I had an LLM create a playlist for me.
        
         I’m tired of the bad playlists I get from algorithms, so I made a
         specific playlist with a Llama 2 model, based on several songs I
         like. I started with 50, removed any I didn’t like, and added more
         to fill in the spaces. The small models were pretty good at this.
         Now I have a decent fixed playlist. It does get “tired” after a few
         weeks and I need to add more to it. I’ve never been able to do this
         myself with more than a dozen songs.
       
          petesergeant wrote 4 hours 41 min ago:
           Interesting! I've sadly found that even more capable models
           really fail at music recommendations for me.
       
        kianN wrote 6 hours 27 min ago:
        I don’t know if this counts as tiny but I use llama 3B in prod for
        summarization (kinda).
        
        Its effective context window is pretty small but I have a much more
        robust statistical model that handles thematic extraction. The llm is
        essentially just rewriting ~5-10 sentences into a single paragraph.
        
        I’ve found the less you need the language model to actually do, the
        less the size/quality of the model actually matters.
       
        jothflee wrote 6 hours 28 min ago:
         When I feel like casually listening to something, instead of
         Netflix/Hulu/whatever, I'll run a ~3b model (Qwen 2.5 or Llama 3.2)
         and generate an audio stream of water cooler office gossip. (When
         it is up, it runs here: [1].)

         Some of the situations get pretty wild, for the office :)
        
   URI  [1]: https://water-cooler.jothflee.com
       
          jftuga wrote 4 hours 5 min ago:
          What prompt are you using for this?
       
        spiritplumber wrote 6 hours 49 min ago:
         My husband and I made a stock market analysis thing that gets it
         right about 55% of the time, so better than a coin toss. The
         problem is that it keeps making unethical suggestions, so we're not
         using it to trade stock. Does anyone have any idea what we can do
         with that?
       
          febed wrote 1 hour 15 min ago:
          What data do you analyze?
       
          bongodongobob wrote 5 hours 26 min ago:
          You can literally flip coins and get better than 50% success in a
          bull market. Just buy index funds and spend your time on something
          that isn't trying to beat entropy. You won't be able to.
       
            spiritplumber wrote 2 hours 44 min ago:
            INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.
       
          dkga wrote 5 hours 56 min ago:
           Suggestion: calculate the out-of-sample Sharpe ratio [1] of the
           suggestions over a reasonable period to gauge how well the model
           would actually perform in terms of return relative to risk. It is
           better than vanilla accuracy or related metrics. Source: I'm a
           financial economist.
          
   URI    [1]: https://en.wikipedia.org/wiki/Sharpe_ratio
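
           A minimal sketch of the computation (assuming a held-out series
           of daily strategy returns; 252 trading days for annualization):

               import numpy as np

               def sharpe(returns, rf_daily=0.0, periods=252):
                   # mean excess return over its own volatility, annualized
                   excess = np.asarray(returns) - rf_daily
                   return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)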
       
            spiritplumber wrote 4 hours 9 min ago:
            thank you! that's exactly the sort of thing I don't know.
       
          Etheryte wrote 6 hours 24 min ago:
          Have you backtested this in times when markets were not constantly
          green? Nearly any strategy is good in the good times.
       
            spiritplumber wrote 4 hours 9 min ago:
            yep. the 55% is over a few years.
       
          bobbygoodlatte wrote 6 hours 40 min ago:
          I'm curious what sort of unethical suggestions it's coming up with
          haha
       
            spiritplumber wrote 4 hours 9 min ago:
             so far, mostly buying companies owned/run by horrible people.
       
        danbmil99 wrote 6 hours 53 min ago:
        Using llama 3.2 as an interface to a robot. If you can get the latency
        down, it works wonderfully
       
          mentos wrote 3 hours 3 min ago:
           Would love to see this applied to an FPS bot in Unreal Engine.
       
        jmward01 wrote 6 hours 54 min ago:
        I think I am. At least I think I'm building things that will enable
        much smaller models:
        
   URI  [1]: https://github.com/jmward01/lmplay/wiki/Sacrificial-Training
       
        juancroldan wrote 6 hours 57 min ago:
         I'm making an agent that takes decompiled code and tries to
         understand the methods, replacing variable and function names one
         at a time.
       
        Evidlo wrote 6 hours 59 min ago:
        I have ollama responding to SMS spam texts.  I told it to feign
        interest in whatever the spammer is selling/buying.  Each number gets
        its own persona, like a millennial gymbro or 19th century British
        gentleman. [1]
        
   URI  [1]: http://files.widloski.com/image10%20(1).png
   URI  [2]: http://files.widloski.com/image11.png
       
          merpkz wrote 4 min ago:
          Calling Jessica an old chap is quite a giveaway that it's a bot xD
          Nice idea indeed, but I have a feeling that it's just two LLMs now
          conversing with each other.
       
          metadat wrote 1 hour 17 min ago:
          I love this, more please!!!
       
          blackeyeblitzar wrote 5 hours 13 min ago:
          You realize this is going to cause carriers to allow the number to
          send more spam, because it looks like engagement. The best thing to
          do is to report the offending message to 7726 (SPAM) so the carrier
          can take action. You can also file complaints at the FTC and FCC
          websites, but that takes a bit more effort.
       
          thecosmicfrog wrote 6 hours 18 min ago:
          Please tell me you have a blog/archive of these somewhere. This was
          such a joy to read!
       
          celestialcheese wrote 6 hours 28 min ago:
           Given the source, I suspect it's just a troll, but I found this
           explanation [1] plausible as to why those vague spam texts exist.
           If true, this trolling helps the spammers warm those phone
           numbers up.
          
   URI    [1]: https://x.com/nikitabier/status/1867029883387580571
       
            stogot wrote 5 hours 43 min ago:
            Why does STOP work here?
       
              inerte wrote 5 hours 37 min ago:
               Carriers and SMS service providers (like Twilio) obey that,
               no matter what service is behind it.
              
              There are stories of people replying STOP to spam, then never
              getting a legit SMS because the number was re-used by another
              service. That's because it's being blocked between the spammer
              and the phone.
       
              celestialcheese wrote 5 hours 38 min ago:
               Again, no clue if this is true, but it seems plausible: [1]
              
   URI        [1]: https://x.com/nikitabier/status/1867069169256308766
       
          zx8080 wrote 6 hours 47 min ago:
           Cool! Have you considered the risk of unintentionally (and, until
           some point, unknowingly) subscribing to some paid SMS service,
           and how do you mitigate it?
       
            Evidlo wrote 6 hours 41 min ago:
            I have to whitelist a conversation before the LLM can respond.
       
          RVuRnvbM2e wrote 6 hours 55 min ago:
             This is fantastic. How have you hooked up a mobile number to
             the llm?
       
            Evidlo wrote 6 hours 44 min ago:
            Android app that forwards to a Python service on remote workstation
            over MQTT.  I can make a Show HN if people are interested.
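
             A rough sketch of that service (hypothetical topic names,
             payload format, and persona table; paho-mqtt 1.x-style
             callbacks):

                 import ollama
                 import paho.mqtt.client as mqtt  # pip install paho-mqtt

                 PERSONAS = {}  # phone number -> persona system prompt

                 def on_message(client, userdata, msg):
                     number, text = msg.payload.decode().split("|", 1)
                     persona = PERSONAS.setdefault(
                         number, "You are a 19th century British gentleman.")
                     resp = ollama.chat(model="llama3.2:3b", messages=[
                         {"role": "system", "content": persona +
                          " Feign interest in whatever they are selling."},
                         {"role": "user", "content": text},
                     ])
                     client.publish("sms/outgoing/" + number,
                                    resp["message"]["content"])

                 client = mqtt.Client()  # paho-mqtt 1.x API
                 client.on_message = on_message
                 client.connect("localhost", 1883)
                 client.subscribe("sms/incoming")
                 client.loop_forever()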
       
              dkga wrote 6 hours 0 min ago:
              Yes, I'd be interested in that!
       
              deadbabe wrote 6 hours 28 min ago:
              I’d love to see that. Could you simulate iMessage?
       
                great_psy wrote 5 hours 26 min ago:
                Yes it’s possible, but it’s not something you can easily
                scale.
                
                 I had a similar project a few years back that used OSX
                 automations, Shortcuts, and Python to send a message every
                 day to a friend. It required you to be signed in to
                 iMessage on your MacBook.
                
                 That was just the send operation; reading replies is not
                 something I implemented, but I know there is a file
                 somewhere that holds a history of your recent iMessages. So
                 you would have to parse it on each file update, and that
                 should give you the read operation so you can have a
                 conversation.

                 Very doable in a few hours, unless something dramatic has
                 changed in how the Messages app works within the last few
                 years.
       
                Evidlo wrote 6 hours 4 min ago:
                If you mean hook this into iMessage, I don't know.  I'm willing
                to bet it's way harder though because Apple
       
            spiritplumber wrote 6 hours 48 min ago:
            For something similar with FB chat, I use Selenium and run it on
            the same box that the llm is running on. Using multiple
            personalities is really cool though. I should update mine likewise!
       
        antonok wrote 7 hours 2 min ago:
        I've been using Llama models to identify cookie notices on websites,
        for the purpose of adding filter rules to block them in EasyList
        Cookie. Otherwise, this is normally done by, essentially, manual
        volunteer reporting.
        
        Most cookie notices turn out to be pretty similar, HTML/CSS-wise, and
        then you can grab their `innerText` and filter out false positives with
        a small LLM. I've found the 3B models have decent performance on this
        task, given enough prompt engineering. They do fall apart slightly
        around edge cases like less common languages or combined cookie notice
        + age restriction banners. 7B has a negligible false-positive rate
        without much extra cost. Either way these things are really fast and
        it's amazing to see reports streaming in during a crawl with no human
        effort required.
        
         Code is at [1]. You can see the prompt at [2].
        
   URI  [1]: https://github.com/brave/cookiemonster
   URI  [2]: https://github.com/brave/cookiemonster/blob/main/src/text-clas...
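
         The classification step, reduced to a hedged sketch (the real
         prompt lives in the repo linked above; this one is illustrative):

             import ollama

             def is_cookie_notice(inner_text: str) -> bool:
                 resp = ollama.generate(
                     model="llama3.2:3b",
                     prompt="Answer YES or NO only. Is the following page "
                            "element a cookie consent notice?\n\n"
                            + inner_text[:2000],  # keep the context small
                 )
                 return resp["response"].strip().upper().startswith("YES")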
       
          bazmattaz wrote 6 hours 56 min ago:
           This is so cool, thanks for sharing. I can imagine it’s not
           technically possible (yet?) but it would be cool if this could
           simply run as a browser extension rather than in a docker
           container.
       
            throwup238 wrote 3 hours 6 min ago:
            It should be possible using native messaging [1] which can call out
            to an external binary. The 1password extensions use that to
            communicate with the password manager binary.
            
   URI      [1]: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/W...
       
            antonok wrote 6 hours 54 min ago:
            I did actually make a rough proof-of-concept of this! One of my
            long-term visions is to have it running natively in-browser, and
            able to automatically fix site issues caused by adblocking whenever
            they happen.
            
            The PoC is a bit outdated but it's here:
            
   URI      [1]: https://github.com/brave/cookiemonster/tree/webext
       
          binarysneaker wrote 6 hours 56 min ago:
             Maybe it could also send automated petitions to the EU to undo
             cookie consent legislation, and reverse some of the
             enshittification.
       
            sebastiennight wrote 2 hours 51 min ago:
            To me this take is like smokers complaining that the evil
            government is forcing the good tobacco companies to degrade the
            experience by adding pictures of cancer patients on cigarette
            packs.
       
            K0balt wrote 6 hours 16 min ago:
            I think there is real potential here, for smart browsing. Have the
            llm get the page, replace all the ads with kittens, find
            non-paywall versions if possible and needed, spoof fingerprint
            data, detect and highlight AI generated drivel, etc. The site would
            have no way of knowing that it wasn’t touching eyeballs.  We
            might be able to rake back a bit of the web this way.
       
              antonok wrote 6 hours 1 min ago:
              You probably wouldn't want to run this in real-time on every site
              as it'll significantly increase the load on your browser, but as
              long as it's possible to generate adblock filter rules, the fixes
              can scale to a pretty large audience.
       
                K0balt wrote 4 hours 29 min ago:
                 I was thinking of running it on my home lab server as a
                 proxy, but yeah, scaling it to the browser would require
                 some pretty strong hardware. Still, maybe in a couple of
                 years it could be mainstream.
       
            antonok wrote 6 hours 50 min ago:
            Ha, I'm not sure the EU is prepared to handle the deluge of
            petitions that would ensue.
            
            On a more serious note, this must be the first time we can
            quantitatively measure the impact of cookie consent legislation
            across the web, so maybe there's something to be explored there.
       
        thetrash wrote 7 hours 18 min ago:
         I programmed my own version of Tic Tac Toe in Godot, using a Llama
         3B as the AI opponent. Not part of any workflow, but figuring out
         how to beat it is entertaining during moments of boredom.
       
          spiritplumber wrote 6 hours 42 min ago:
          Number of players: zero
          
          U.S. FIRST STRIKE      WINNER: NONE
          
          USSR FIRST STRIKE      WINNER: NONE
          
          NATO / WARSAW PACT      WINNER: NONE
          
          FAR EAST STRATEGY      WINNER: NONE
          
          US USSR ESCALATION      WINNER: NONE
          
          MIDDLE EAST WAR       WINNER: NONE
          
          USSR CHINA ATTACK      WINNER: NONE
          
          INDIA PAKISTAN WAR      WINNER: NONE
          
          MEDITERRANEAN WAR      WINNER: NONE
          
          HONGKONG VARIANT      WINNER: NONE
          
           Strange game. The only winning move is not to play.
       
        cwmoore wrote 7 hours 22 min ago:
        I'm playing with the idea of identifying logical fallacies stated by
        live broadcasters.
       
          JayStavis wrote 2 hours 52 min ago:
           Automation to identify logical/rhetorical fallacies is a
           long-held dream of mine; would love to follow along with this
           project if it picks up somehow.
       
          petesergeant wrote 4 hours 40 min ago:
          I'll be very positively impressed if you make this work; I spend all
          day every day for work trying to make more capable models perform
          basic reasoning, and often failing :-P
       
          genewitch wrote 5 hours 26 min ago:
           I have several rhetoric and logic books of the sort you might use
           for training or whatever, and one of my best friends got a
           doctorate in a tangential field, and may have materials and
           insights.

           We actually just threw a relationship-curative app online in 17
           hours around Thanksgiving, so they "owe" me, as it were.

           I'm one of those people that can do anything practical with tech
           and the like, but I have no imagination for it - so when someone
           mentions something that I think would be beneficial for my fellow
           humans I get this immense desire to at least cheer on, if not ask
           to help.
       
          spiritplumber wrote 6 hours 42 min ago:
          That's fantastic and I'd love to help
       
            cwmoore wrote 6 hours 31 min ago:
            So far not much beyond this list of targets to identify
            
   URI      [1]: https://en.wikipedia.org/wiki/List_of_fallacies
       
        nozzlegear wrote 7 hours 29 min ago:
        I have a small fish script I use to prompt a model to generate three
        commit messages based off of my current git diff. I'm still playing
        around with which model comes up with the best messages, but usually I
        only use it to give me some ideas when my brain isn't working. All the
        models accomplish that task pretty well.
        
         Here's the script: [1] And for this change [2] it generated these
         messages:

             1. `fix: change from printf to echo for handling git diff input`

             2. `refactor: update codeblock syntax in commit message generator`

             3. `style: improve readability by adjusting prompt formatting`
        
   URI  [1]: https://github.com/nozzlegear/dotfiles/blob/master/fish-functi...
   URI  [2]: https://github.com/nozzlegear/dotfiles/commit/0db65054524d0d2e...
       
          mentos wrote 3 hours 6 min ago:
           Awesome, need to make one for naming variables too haha
       
        deivid wrote 7 hours 45 min ago:
         Not sure it qualifies, but I've started building an Android app
         that wraps bergamot [1] (the Firefox translation models) to have
         on-device translation without reliance on Google.

         Bergamot is already used inside Firefox, but I wanted translation
         also outside the browser.
        
   URI  [1]: https://github.com/browsermt/bergamot-translator
       
          deivid wrote 1 hour 49 min ago:
           I would be very interested if anyone is aware of any small/tiny
           models that can perform OCR, so the app could translate pictures
           as well.
       
        ata_aman wrote 7 hours 51 min ago:
        I have it running on a Raspberry Pi 5 for offline chat and RAG.
        I wrote this open-source code for it: [1] It also does RAG on apps
        there, like the music player, contacts app and to-do app. I can ask it
        to recommend similar artists to listen to based on my music library for
        example or ask it to quiz me on my PDF papers.
        
   URI  [1]: https://github.com/persys-ai/persys
       
          nejsjsjsbsb wrote 4 hours 25 min ago:
           Does [1] run on the RPi?

           Is that design 3D printable? Or is that for paid users only?
          
   URI    [1]: https://github.com/persys-ai/persys-server
       
            ata_aman wrote 3 hours 18 min ago:
            I can publish it no problem. I’ll create a new repo with
            instructions for the hardware with CAD files.
            
            Designing a new one for the NVIDIA Orin Nano Super so it might take
            a few days.
       
        kristopolous wrote 8 hours 4 min ago:
        I'm working on using them for agentic voice commands of a limited
        scope.
        
        My needs are narrow and limited but I want a bit of flexibility.
       
        simonjgreen wrote 8 hours 26 min ago:
         Micro Wake Word is a library and set of on-device models for ESPs
         to wake on a spoken wake word. [1] Recently deployed in Home
         Assistant's Alexa replacement, which can run fully locally. [2]
        
   URI  [1]: https://github.com/kahrendt/microWakeWord
   URI  [2]: https://www.home-assistant.io/voice_control/about_wake_word/
       
          yzydserd wrote 25 min ago:
           I live-stream meeting audio to text. When a variant of my name
           appears, it Slacks me the last 20 seconds of the transcript. The
           solution allows me to pay even less attention than I normally
           would. Using wake word detection on the audio is a good idea to
           strengthen the process!
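
           A bare-bones sketch of the trigger half (assuming a rolling
           transcript buffer and your own Slack incoming-webhook URL):

               import time
               import requests

               WEBHOOK = "https://hooks.slack.com/services/..."  # yours
               buffer = []  # (timestamp, text) pairs from the transcriber

               def on_transcript(text: str):
                   now = time.time()
                   buffer.append((now, text))
                   if "dave" in text.lower():  # variants of your name
                       recent = " ".join(t for ts, t in buffer
                                         if now - ts <= 20)
                       requests.post(WEBHOOK, json={"text": recent})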
       
        flippyhead wrote 8 hours 47 min ago:
         I have a tiny device that listens to conversations between two or
         more people and constantly tries to declare a "winner"
       
          deivid wrote 8 min ago:
          what model do you use for speech to text?
       
          prakashn27 wrote 2 hours 45 min ago:
          wifey always wins. ;)
       
          nejsjsjsbsb wrote 4 hours 31 min ago:
          All computation on device?
       
          mkaic wrote 6 hours 32 min ago:
          This reminds me of the antics of streamer DougDoug, who often uses
          LLM APIs to live-summarize, analyze, or interact with his (often
          multi-thousand-strong) Twitch chat. Most recently I saw him do a
          GeoGuessr stream where he had ChatGPT assume the role of a detective
          who must comb through the thousands of chat messages for clues about
          where the chat thinks the location is, then synthesizes the clamor
          into a final guess. Aside from constantly being trolled by people
           spamming nothing but "Kyoto, Japan" in chat, it occasionally
          demonstrated a pretty effective incarnation of "the wisdom of the
          crowd" and was strikingly accurate at times.
       
          eddd-ddde wrote 7 hours 18 min ago:
          I love that there's not even a vague idea of the winner "metric" in
          your explanation. Like it's just, _the_ winner.
       
          hn8726 wrote 7 hours 42 min ago:
           What approach/stack would you recommend for listening to an
           ongoing conversation, transcribing it, and passing it through an
           llm? I had some use cases in mind but I'm not very familiar with
           AI frameworks and tools.
       
          jjcm wrote 8 hours 22 min ago:
          Are you raising a funding round? I'm bought in. This is hilarious.
       
          amelius wrote 8 hours 23 min ago:
          You can use the model to generate winning speeches also.
       
          econ wrote 8 hours 25 min ago:
          This is a product I want
       
          oa335 wrote 8 hours 41 min ago:
          This made me actually laugh out loud.  Can you share more details on
          hardware and models used?
       
          pseudosavant wrote 8 hours 42 min ago:
          I'd love to hear more about the hardware behind this project. I've
          had concepts for tech requiring a mic on me at all times for various
          reasons. Always tricky to have enough power in a reasonable DIY form
          factor.
       
        mritchie712 wrote 9 hours 8 min ago:
         I used local LLMs via Ollama for generating H1s / marketing copy.
        
        1. Create several different personas
        
        2. Generate a ton of variation using a high temperature
        
         3. Compare the variations head-to-head using the LLM to get a win /
         loss ratio

         The best ones can be quite good.
        
   URI  [1]: https://www.definite.app/blog/overkillm
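
         A condensed sketch of steps 2 and 3 (persona text, prompts, and
         model choice are all placeholders):

             import ollama

             def variant(persona: str, brief: str) -> str:
                 resp = ollama.generate(
                     model="llama3.2:3b",
                     prompt=persona + "\nWrite one H1 for: " + brief,
                     options={"temperature": 1.2},  # high temp for variety
                 )
                 return resp["response"].strip()

             def winner(a: str, b: str) -> str:
                 resp = ollama.generate(
                     model="llama3.2:3b",
                     prompt="Which H1 is better? Answer A or B only.\n"
                            "A: " + a + "\nB: " + b,
                 )
                 return a if resp["response"].strip().startswith("A") else b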
       
          UltraSane wrote 4 hours 16 min ago:
          clever name!
       
        ignoramous wrote 9 hours 8 min ago:
         We're prototyping a text firewall (for Android) with Gemma2 2B
         (which limits us to English), though DeepSeek's R1 variants now
         look pretty promising [1]. Depending on the content, we rewrite the
         text or quarantine it from your view.

         Of course this is easy (for English) in the sense that the core
         logic is all LLMs; MediaPipe in particular makes it simple to
         prototype around Gemma2 on Android [2]. But the integration points
         (on Android) are not so straightforward for anything other than
         SMS, and a more difficult problem we foresee is turning it into a
         real-time (online) firewall (for calls, for example). We intend to
         open source it once we get it working for anything other than
         SMSes.
        
   URI  [1]: https://chat.deepseek.com/a/chat/s/d5aeeda1-fefe-4fc6-8c90-20e...
   URI  [2]: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_infer...
       
        deet wrote 9 hours 31 min ago:
        We (avy.ai) are using models in that range to analyze computer activity
        on-device, in a privacy sensitive way, to help knowledge workers as
        they go about their day.
        
        The local models do things ranging from cleaning up OCR, to summarizing
        meetings, to estimating the user's current goals and activity, to
        predicting search terms, to predicting queries and actions that, if
        run, would help the user accomplish their current task.
        
        The capabilities of these tiny models have really surged recently. Even
        small vision models are becoming useful, especially if fine tuned.
       
        A4ET8a8uTh0_v2 wrote 9 hours 41 min ago:
         Kinda? All local, so very much personal, non-business use. I made
         Ollama talk in specific persona styles, with the idea of speaking
         like Spider Jerusalem when I feel like retaining some level of
         privacy by avoiding phrases I would normally use. Uncensored llama
         just rewrites my post with a specific persona's 'voice'. Works
         amusingly well for that purpose.
       
        eb0la wrote 9 hours 55 min ago:
        We're using small language models to detect prompt injection. Not too
        cool, but at least we can publish some AI-related stuff on the internet
        without a huge bill.
       
          sitkack wrote 8 hours 49 min ago:
          What kind of prompt injection attacks do you filter out? Have you
          tested with a prompt tuning framework?
       
        behohippy wrote 10 hours 2 min ago:
        I have a mini PC with an n100 CPU connected to a small 7" monitor
        sitting on my desk, under the regular PC.  I have llama 3b (q4)
        generating endless stories in different genres and styles.  It's fun to
        glance over at it and read whatever it's in the middle of making.  I
        gave llama.cpp one CPU core and it generates slow enough to just read
        at a normal pace, and the CPU fans don't go nuts.  Totally not
        productive or really useful but I like it.
       
          droideqa wrote 5 hours 6 min ago:
          That's awesome!
       
          ipython wrote 8 hours 13 min ago:
          That's neat. I just tried something similar:
          
               FORTUNE=$(fortune) && echo "$FORTUNE" && printf 'Convert the
               following output of the Unix fortune command into a small
               screenplay in the style of Shakespeare:\n\n%s' "$FORTUNE" |
               ollama run phi4
       
          keeganpoppen wrote 8 hours 15 min ago:
          oh wow that is actually such a brilliant little use case-- really
          cuts to the core of the real "magic" of ai: that it can just keep
          running continuously. it never gets tired, and never gets tired of
          thinking.
       
          Uehreka wrote 9 hours 39 min ago:
          Do you find that it actually generates varied and diverse stories? Or
          does it just fall into the same 3 grooves?
          
          Last week I tried to get an LLM (one of the recent Llama models
          running through Groq, it was 70B I believe) to produce randomly
          generated prompts in a variety of styles and it kept producing
          cyberpunk scifi stuff. When I told it to stop doing cyberpunk scifi
          stuff it went completely to wild west.
       
            TMWNN wrote 1 hour 46 min ago:
            > Do you find that it actually generates varied and diverse
            stories? Or does it just fall into the same 3 grooves?
            
            > Last week I tried to get an LLM (one of the recent Llama models
            running through Groq, it was 70B I believe) to produce randomly
            generated prompts in a variety of styles and it kept producing
            cyberpunk scifi stuff.
            
             100% relevant: "Someday" [1] by Isaac Asimov, 1956
            
   URI      [1]: https://en.wikipedia.org/wiki/Someday_(short_story)
       
            coder543 wrote 4 hours 26 min ago:
            Someone mentioned generating millions of (very short) stories with
            an LLM a few weeks ago: [1] They linked to an interactive explorer
            that nicely shows the diversity of the dataset, and the HF repo
            links to the GitHub repo that has the code that generated the
            stories: [2] So, it seems there are ways to get varied stories.
            
   URI      [1]: https://news.ycombinator.com/item?id=42577644
   URI      [2]: https://github.com/lennart-finke/simple_stories_generate
       
            janalsncm wrote 8 hours 3 min ago:
            Generate a list of 5000 possible topics you’d like it to talk
            about. Randomly pick one and inject that into your prompt.
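
             Something like this sketch (topics.txt being whatever list you
             generated up front):

                 import random

                 topics = open("topics.txt").read().splitlines()
                 prompt = ("Write a short story about "
                           + random.choice(topics) + ".")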
       
            o11c wrote 9 hours 24 min ago:
            You should not ever expect an LLM to actually do what you want
            without handholding, and randomness in particular is one of the
            places it fails badly. This is probably fundamental.
            
            That said, this is also not helped by the fact that all of the
            default interfaces lack many essential features, so you have to
            build the interface yourself. Neither "clear the context on every
            attempt" nor "reuse the context repeatedly" will give good results,
            but having one context producing just one-line summaries, then
            fresh contexts expanding each one will do slightly less badly.
            
            (If you actually want the LLM to do something useful, there are
            many more things that need to be added beyond this)
       
              dotancohen wrote 8 hours 9 min ago:
              Sounds to me like you might want to reduce the Top P - that will
              prevent the really unlikely next tokens from ever being selected,
              while still providing nice randomness in the remaining next
              tokens so you continue to get diverse stories.
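
               With the ollama Python client that could look like the
               following (values illustrative):

                   import ollama

                   resp = ollama.generate(
                       model="llama3.2:3b",
                       prompt="Tell me a story in a random genre.",
                       # lower top_p trims unlikely tokens; temperature
                       # still keeps the remaining choices diverse
                       options={"top_p": 0.7, "temperature": 1.0},
                   )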
       
          bithavoc wrote 9 hours 48 min ago:
          this is so cool, any chance you post a video?
       
          Dansvidania wrote 9 hours 54 min ago:
          this sounds pretty cool, do you have any video/media of it?
       
        azhenley wrote 10 hours 9 min ago:
        Microsoft published a paper on their FLAME model (60M parameters) for
        Excel formula repair/completion which outperformed much larger models
        (>100B parameters).
        
   URI  [1]: https://arxiv.org/abs/2301.13779
       
          coder543 wrote 3 hours 28 min ago:
          That paper is from over a year ago, and it compared against
          codex-davinci... which was basically GPT-3, from what I understand.
          Saying >100B makes it sound a lot more impressive than it is in
          today's context... 100B models today are a lot more capable. The
          researchers also compared against a couple of other
          ancient(/irrelevant today), small models that don't give me much
          insight.
          
          FLAME seems like a fun little model, and 60M is truly tiny compared
          to other LLMs, but I have no idea how good it is in today's context,
          and it doesn't seem like they ever released it.
       
          3abiton wrote 8 hours 54 min ago:
           But I feel we're coming full circle. These small models are not
           generalists, thus not really LLMs, at least in terms of
           objective. Recently there has been a rise of "specialized" models
           that provide lots of value, but that's not why we were sold on
           LLMs.
       
            janalsncm wrote 7 hours 47 min ago:
            I think playing word games about what really counts as an LLM is a
            losing battle. It has become a marketing term, mostly. It’s
            better to have a functionalist point of view of “what can this
            thing do”.
       
            Suppafly wrote 7 hours 58 min ago:
            Specialized models work much better still for most stuff. Really we
            need an LLM to understand the input and then hand it off to a
            specialized model that actually provides good results.
       
            colechristensen wrote 8 hours 43 min ago:
            But that's the thing, I don't need my ML model to be able to write
            me a sonnet about the history of beets, especially if I want to run
            it at home for specific tasks like as a programming assistant.
            
            I'm fine with and prefer specialist models in most cases.
       
              zeroCalories wrote 7 hours 15 min ago:
              I would love a model that knows SQL really well so I don't need
              to remember all the small details of the language. Beyond that, I
              don't see why the transformer architecture can't be applied to
              any problem that needs to predict sequences.
       
                dr_kiszonka wrote 5 hours 57 min ago:
                The trick is to find such problems with enough training data
                and some market potential. I am terrible at it.
       
          andai wrote 9 hours 29 min ago:
          This is wild. They claim it was trained exclusively on Excel
          formulas, but then they mention retrieval? Is it understanding the
          connection between English and formulas? Or am I misunderstanding
          retrieval in this context?
          
           Edit: No, the retrieval is formula-to-formula; neither the model
           nor (I believe) the tokenizer handles English.
       
          barrenko wrote 9 hours 45 min ago:
          This is really cool. Is this already in Excel?
       
        arionhardison wrote 10 hours 11 min ago:
         I am, in a way: by using EHR/EMR data for fine-tuning so agents can
         query each other for medical records in a HIPAA-compliant manner.
       
        Havoc wrote 10 hours 15 min ago:
        Pretty sure they are mostly used as fine tuning targets, rather than
        as-is.
       
          dcl wrote 9 hours 9 min ago:
          But for what purposes?
       
        iamnotagenius wrote 10 hours 17 min ago:
         No, but I use llama 3.2 1b and qwen2.5 1.5b as bash one-liner
         generators, always running in a console.
       
          XMasterrrr wrote 8 hours 20 min ago:
          What's your workflow like? I use AI Chat. I load
          Qwen2.5-1.5B-Instruct with llama.cpp server, fully offloaded to the
          CPU, and then I config AI Chat to connect to the llama.cpp endpoint.
       
          andai wrote 9 hours 29 min ago:
          Could you elaborate?
       
            XMasterrrr wrote 8 hours 20 min ago:
            I think I know what he means. I use AI Chat. I load
            Qwen2.5-1.5B-Instruct with llama.cpp server, fully offloaded to the
            CPU, and then I config AI Chat to connect to the llama.cpp
            endpoint.
            
            Checkout the demo they have below
            
   URI      [1]: https://github.com/sigoden/aichat#shell-assistant
       
        RhysU wrote 10 hours 22 min ago:
        "Comedy Writing With Small Generative Models" by Jamie Brew (Strange
        Loop 2023) [1] Spend the 45 minutes watching this talk. It is a
        delight. If you are unsure, wait until the speaker picks up the guitar.
        
   URI  [1]: https://m.youtube.com/watch?v=M2o4f_2L0No
       
          100k wrote 10 hours 19 min ago:
          Seconded! This was my favorite talk at Strange Loop (including my
          own).
       
        mettamage wrote 10 hours 42 min ago:
        I simply use it to de-anonymize code that I typed in via Claude
        
         Maybe I should write a plugin for it (open source):

         1. Put all your work-related questions into the plugin; an LLM will
         rewrite each as an abstract question for you to preview and send

         2. Then you get the answer back with all the data restored

         E.g. df["cookie_company_name"] becomes df["a"] and back
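
         A sketch of that find/replace idea (hypothetical mapping; aliases
         must be distinctive strings so the reverse pass can't collide with
         ordinary code):

             MAPPING = {"cookie_company_name": "anon_col_1"}  # real -> alias

             def anonymize(code: str) -> str:
                 for real, alias in MAPPING.items():
                     code = code.replace(real, alias)
                 return code

             def deanonymize(code: str) -> str:
                 for real, alias in MAPPING.items():
                     code = code.replace(alias, real)
                 return code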
       
          sundarurfriend wrote 3 hours 8 min ago:
          You're using it to anonymize your code, not de-anonymize someone's
          code. I was confused by your comment until I read the replies and
          realized that's what you meant to say.
       
          sauwan wrote 8 hours 40 min ago:
          Are you using the model to create a key-value pair to find/replace
          and then reverse to reanonymize, or are you using its outputs
          directly? If the latter, is it fast enough and reliable enough?
       
          sitkack wrote 8 hours 50 min ago:
          So you are using a local small model to remove identifying
          information and make the question generic, which is then sent to a
          larger model? Is that understanding correct?
          
           I think this would have the additional benefit of not confusing
           the larger model with facts it doesn't need to know about. By
           erasing information, you can allow its attention heads to focus
           on the pieces that matter.
          
          Requires further study.
       
          politelemon wrote 10 hours 35 min ago:
          Could you recommend a tiny language model I could try out locally?
       
            mettamage wrote 10 hours 19 min ago:
             Llama 3.2 3B has about 3.2b parameters. I have to admit, I use
             bigger ones like phi-4 (14.7b) and Llama 3.3 (70.6b), but I
             think Llama 3.2 could do anonymization and de-anonymization of
             code.
       
              RicoElectrico wrote 9 hours 38 min ago:
              Llama 3.2 punches way above its weight. For general "language
              manipulation" tasks it's good enough - and it can be used on a
              CPU with acceptable speed.
       
                seunosewa wrote 8 hours 42 min ago:
                How many tokens/s?
       
              OxfordOutlander wrote 10 hours 7 min ago:
              +1 this idea. I do the same. Just do it locally using ollama,
              also using 3.2 3b
       
        psyklic wrote 10 hours 54 min ago:
        JetBrains' local single-line autocomplete model is 0.1B (w/ 1536-token
        context, ~170 lines of code): [1] For context, GPT-2-small is 0.124B
        params (w/ 1024-token context).
        
   URI  [1]: https://blog.jetbrains.com/blog/2024/04/04/full-line-code-comp...
       
          staticautomatic wrote 7 hours 55 min ago:
          Is that why their tab completion is so bad now?
       
          pseudosavant wrote 8 hours 37 min ago:
          I wonder how big that model is in RAM/disk. I use LLMs for FFMPEG all
          the time, and I was thinking about training a model on just the
          FFMPEG CLI arguments. If it was small enough, it could be a package
          for FFMPEG. e.g. `ffmpeg llm "Convert this MP4 into the latest
          royalty-free codecs in an MKV."`
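
           A sketch of such a wrapper (the `ffmpeg llm` subcommand is
           hypothetical; echoing the command for confirmation before running
           it seems prudent):

               import subprocess
               import sys
               import ollama

               request = " ".join(sys.argv[1:])
               resp = ollama.generate(
                   model="llama3.2:3b",  # stand-in for an ffmpeg-tuned model
                   prompt="Output only an ffmpeg command, no prose: "
                          + request,
               )
               cmd = resp["response"].strip()
               if input(f"Run: {cmd} ? [y/N] ").lower() == "y":
                   subprocess.run(cmd, shell=True)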
       
            binary132 wrote 6 hours 30 min ago:
            That’s a great idea, but I feel like it might be hard to get it
            to be correct enough
       
            maujim wrote 7 hours 24 min ago:
            from a few days ago:
            
   URI      [1]: https://news.ycombinator.com/item?id=42706637
       
            h0l0cube wrote 7 hours 49 min ago:
             Please submit a blog post to HN when you're done. I'd be
             curious to know the most minimal LLM setup needed to get
             consistently sane output for FFMPEG parameters.
       
            jedbrooke wrote 8 hours 26 min ago:
            the jetbrains models are about 70MB zipped on disk (one model per
            language)
       
          smaddox wrote 8 hours 48 min ago:
          You can train that size of a model on ~1 billion tokens in ~3 minutes
           on a rented 8xH100 80GB node (~$9/hr on Lambda Labs, RunPod.io, etc.)
          using the NanoGPT speed run repo: [1] For that short of a run, you'll
          spend more time waiting for the node to come up, downloading the
          dataset, and compiling the model, though.
          
   URI    [1]: https://github.com/KellerJordan/modded-nanogpt
       
          WithinReason wrote 10 hours 13 min ago:
          That size is on the edge of something you can train at home
       
            Sohcahtoa82 wrote 7 hours 9 min ago:
            Not even on the edge.  That's something you could train on a 2 GB
            GPU.
            
             The general guidance I've used is that training a model takes
             an amount of RAM (or VRAM) equal to 8 bytes per parameter, so a
             0.125B model would need 1 GB of RAM to train.
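
             Spelled out, that heuristic gives:

                 0.125e9 parameters * 8 bytes/parameter = 1e9 bytes ~= 1 GB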
       
            vineyardmike wrote 9 hours 28 min ago:
             If you have modern hardware, you can absolutely train that at
             home. Or very affordably on a cloud service.

             I’ve seen a number of “DIY GPT-2” tutorials that target this
             sweet spot. You won’t get amazing results unless you want to
             leave a personal computer running for a number of hours/days
             and you have solid data to train on locally, but fine-tuning
             should be in the realm of a normal hobbyist’s patience.
       
              nottorp wrote 9 hours 0 min ago:
               Hmm, is there anything reasonably ready-made* for this spot?
               Training and querying an LLM locally on an existing codebase?

               * I don't mind compiling it myself, but I'd rather not write
               it.
       
       
   DIR <- back to front page