_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   DIR   Ask HN: Is anyone doing anything cool with tiny language models?
       
       
        lightning19 wrote 22 hours 36 min ago:
         not sure if it is cool but, purely out of spite, I'm building an
         LLM summarizer app to compete with an AI startup that I interviewed with.
        The founders were super egotistical and initially thought I was not
        worthy of an interview.
       
        herol3oy wrote 23 hours 3 min ago:
        I've created Austen [0] to generate relationships between book
        characters using Mermaid.
        
        [0]
        
   URI  [1]: https://github.com/herol3oy/austen
       
        sebazzz wrote 23 hours 17 min ago:
        I built auto-summarization and grouping in an experimental branch of my
        hobby-retrospective tool: [1] I’m now just wondering if there is any
        way to build tests on the input+output of the LLM :D
        
   URI  [1]: https://github.com/Sebazzz/Return/tree/experiment/ai-integrati...
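
         One hedged idea for that, sketched below: property-style tests
         that assert cheap invariants on the output instead of exact
         strings. The summarize() stub is a placeholder for the real LLM
         call.
         
             import pytest
             
             def summarize(text: str) -> str:
                 # Placeholder: swap in the real LLM-backed summarizer.
                 return text.split(". ")[0] + "."
             
             @pytest.mark.parametrize("text", [
                 "We shipped on time. Everyone was happy.",
                 "The retro surfaced three blockers. We fixed one.",
             ])
             def test_summary_invariants(text):
                 summary = summarize(text)
                 assert summary                    # non-empty
                 assert len(summary) <= len(text)  # no longer than input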
       
        panchicore3 wrote 23 hours 51 min ago:
         I am moderating a playlist manager to restrict it to a range of
         genres, so it classifies song requests as accepted/rejected.
       
        accrual wrote 1 day ago:
        Although there are better ways to test, I used a 3B model to speed up
        replies from my local AI server when testing out an application I was
        developing. Yes I could have mocked up HTTP replies etc., but in this
        case the small model let me just plug in and go.
       
        Thews wrote 1 day ago:
         Before ollama and the others could do structured JSON output, I
         hacked together my own loop to correct the output. I used that for
         dummy API endpoints that pretend to be online services but run
         locally, to pair with UI mockups. For my first test I made a
         recipe generator and then tried to see what it would take to
         "jailbreak" it. I also used uncensored models to let it generate
         all kinds of funny content.
        
         I think the content you can get from the SLMs for fake data is a
         lot more engaging than, say, the Ruby ffaker library.
       
        bashbjorn wrote 1 day ago:
        I'm working on a plugin[1] that runs local LLMs from the Godot game
        engine. The optimal model sizes seem to be 2B-7B ish, since those will
        run fast enough on most computers. We recommend that people try it out
        with Gemma 2 2B (but it will work with any model that works with
        llama.cpp)
        
        At those sizes, it's great for generating non-repetitive flavortext for
        NPCs. No more "I took an arrow to the knee".
        
         Models at around the 2B size aren't really capable enough to act as a
        competent adversary - but they are great for something like bargaining
        with a shopkeeper, or some other role where natural language can let
        players do a bit more immersive roleplay.
        
   URI  [1]: https://github.com/nobodywho-ooo/nobodywho
       
          Tepix wrote 23 hours 15 min ago:
          Cool. Are you aware of good games that use LLMs like this?
       
            aDyslecticCrow wrote 21 hours 25 min ago:
            I have not seen much myself, but it's one of the earliest use cases
            I thought about when they started showing up.
       
        jbentley1 wrote 1 day ago:
        Tiny language models can do a lot if they are fine tuned for a specific
        task, but IMO a few things are holding them back:
        
        1. Getting the speed gains is hard unless you are able to pay for
        dedicated GPUs. Some services offer LoRA as serverless but you don't
        get the same performance for various technical reasons.
        
        2. Lack of talent to actually do the finetuning. Regular engineers can
        do a lot of LLM implementation, but when it comes to actually
        performing training it is a scarcer skillset. Most small to medium orgs
        don't have people who can do it well.
        
         3. Distribution. Sharing finetunes is hard. HuggingFace exists, but
         discoverability is an issue. It is flooded with random models with no
         documentation and it isn't easy to find a good one for your task.
         Plus, with a good finetune you also need the prompt and possibly
         parsing code to make it work as intended, and that bundling hasn't
         been worked out well.
       
          grisaitis wrote 1 day ago:
          when you say fine-tuning skills or talent are scarce, do you have
          specific skills in mind? perhaps engineering for training models (eg
          making model parallelism work)? or the more ML type skills of
          designing experiments, choosing which methods to use, figuring out
          datasets for training, hyperparam tuning/evaluation, etc?
       
            jbentley1 wrote 1 day ago:
            The technical parts are less common and specialized, like
            understanding the hyperparameters and all that, but I don't think
            that is the main problem. Most people don't understand how to build
            a good dataset or how to evaluate their finetune after training.
             Some parts of this are solid rules, like always using a separate
             validation set, but the task-dependent parts are harder to teach.
            It's a different problem every time.
       
              menaerus wrote 6 hours 0 min ago:
               Finetuning, as I understand it, is mostly laborious, very
               boring, and exhausting work that is not appealing to many
              engineers. It can be done by people who have some skills in
              Python or similar language and who have some background in
              statistics.
              
              OTOH to build the infra for LLMs there's much more stuff involved
              and it's really hard to find engineers who have the capacity to
              be both the researchers and developers at the same time. By
              "researchers" I mean that they have to have a capacity to be able
              to read through the numerous academic and industry papers,
              comprehend the tiniest details, and materialize it into the
               product through the code. I think that's a much harder and
               scarcer skill to find.
              
               That said, I am not dismissing the fine-tuning skill; it's a
               humongous effort, but I think it's not necessarily a skillset
               problem.
       
        mogaal wrote 1 day ago:
         I bought a tiny business in Brazil; the database (Excel) I
         inherited with previous customer data *does not include gender*. I
         need gender to start my marketing campaigns and learn more about my
         future customers. I used Gemma-2B and Python to infer gender from
         the data and it worked perfectly.
       
          Nashooo wrote 1 day ago:
          How did you verify it worked?
       
        addandsubtract wrote 1 day ago:
        I use a small model to rename my Linux ISOs. I gave it a custom prompt
        with examples of how I want the output filenames to be structured and
        then just feed it files to rename. The output only works 90ish percent
        of the time, so I wrote a little CLI to iterate through the files and
        accept / retry / edit the changes the LLM outputs.
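
         The accept/retry/edit loop is small; a hedged sketch assuming the
         ollama Python client (the prompt and directory are made up):
         
             import pathlib
             import ollama  # assumes the ollama Python client is installed
             
             PROMPT = ("Rename this file as 'Title (Year).ext'. "
                       "Reply with only the new name.\nFile: {name}")
             
             for path in pathlib.Path("isos").iterdir():
                 while True:
                     new = ollama.generate(
                         model="llama3.2:3b",
                         prompt=PROMPT.format(name=path.name),
                     )["response"].strip()
                     choice = input(f"{path.name} -> {new} [a/r/e]: ")
                     if choice == "r":
                         continue               # retry: ask the model again
                     if choice == "e":
                         new = input("new name: ")  # edit: manual override
                     path.rename(path.with_name(new))
                     break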
       
        ahrjay wrote 1 day ago:
        I built [1] using the chrome ai (Gemini nano). Allows you to do ffmpeg
        operations on videos using natural language all client side.
        
   URI  [1]: https://ffprompt.ryanseddon.com
       
          fauigerzigerk wrote 1 day ago:
          What are the prerequisites for this? I keep getting "Bummer, looks
          like your device doesn't support Chrome AI" on macOS 15.2 Chrome
          132.0.6834.84 (Official Build) (arm64)
          
          [Edit] Found it. I had to enable
          chrome://flags/#prompt-api-for-gemini-nano
       
        krystofee wrote 1 day ago:
        Has anyone ever tried to do some automatic email workflow autoresponder
        agents?
        
         Let's say I want some outcome: it would autonomously prompt me and
         the other side for additional requirements if necessary, and then,
         based on that, handle the process and reach the outcome?
       
        lormayna wrote 1 day ago:
         I am using smollm2 to extract some useful information (like
         remote, language, role, location, etc.) from the monthly "Who is
         hiring" thread and create an RSS feed with specific filters.
         Still not ready for Show HN, but it works.
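
         The extraction step can be a short prompt-and-parse; a rough
         sketch assuming the ollama Python client (field names are
         illustrative):
         
             import json
             import ollama  # assumes the ollama Python client is installed
             
             PROMPT = ("Extract JSON with keys role, location, remote "
                       "(true/false) and languages from this job posting. "
                       "Reply with only JSON.\n\n{comment}")
             
             def extract(comment: str) -> dict:
                 raw = ollama.generate(
                     model="smollm2",
                     prompt=PROMPT.format(comment=comment),
                 )["response"]
                 # Small models misformat JSON sometimes; retry/validate
                 # in real use.
                 return json.loads(raw)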
       
        reeeeee wrote 1 day ago:
        I built a platform to monitor LLMs that are given complete freedom in
        the form of a Docker container bash REPL. Currently the models have
        been offline for some time because I'm upgrading from a single DELL to
        a TinyMiniMicro Proxmox cluster to run multiple small LLMs locally.
        
        The bots don't do a lot of interesting stuff though, I plan to add the
        following functionalities:
        
        - Instead of just resetting every 100 messages, I'm going to provide
        them with a rolling window of context.
        
         - Instead of only allowing Bash commands, they will also be able
         to respond with reasoning messages, hopefully making them a bit
         smarter.
        
        - Give them a better docker container with more CLI tools such as curl
        and a working package manager.
        
        If you're interested in seeing the developments, you can subscribe on
        the platform!
        
   URI  [1]: https://lama.garden
       
        tomholandpick wrote 1 day ago:
         How accurate are the classifications?
       
        kolinko wrote 1 day ago:
         Apple’s on-device models are around 3B, if I’m not mistaken, and
         they developed some nice tech around them that they published:
         they have just one model, but switchable finetunings of that
         model, so it can perform different functionalities depending on
         context.
       
        numba888 wrote 1 day ago:
         Many interesting projects, cool. I'm waiting for LLMs in games.
         That would make them much more fun. Any time now...
       
          jaggs wrote 20 hours 30 min ago:
           Have you seen AI People?
          
   URI    [1]: https://www.aipeoplegame.com/
       
        ceritium wrote 1 day ago:
         I am doing nothing, but I was wondering if it would make sense to
         combine a small LLM and SQLite to parse human date-time
         expressions. For example, given a human input like "last day of
         this month", the LLM would generate the following query: `SELECT
         date('now','start of month','+1 month','-1 day');`
        
         It is probably super over-engineered, considering that pretty good
         libraries are already doing that in different languages, but it
         would be fun. I did some tests with ChatGPT, and it worked
         sometimes. It would probably work with some fine-tuning, but I
         don't have the experience or the time right now.
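
         A minimal sketch of the idea, assuming the ollama Python client
         and a local 3B model (prompt wording and model name are just
         placeholders):
         
             import sqlite3
             import ollama  # assumes the ollama Python client is installed
             
             PROMPT = ("Convert this expression into a single SQLite "
                       "date() query. Reply with only the SQL.\n{expr}")
             
             def resolve(expr: str) -> str:
                 sql = ollama.generate(
                     model="llama3.2:3b",
                     prompt=PROMPT.format(expr=expr),
                 )["response"].strip()
                 # In real use, validate the SQL before executing it.
                 con = sqlite3.connect(":memory:")
                 return con.execute(sql).fetchone()[0]
             
             # resolve("last day of this month") -> e.g. '2025-01-31'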
       
          TachyonicBytes wrote 1 day ago:
          What libraries have you seen that do this?
       
          lionkor wrote 1 day ago:
          LLMs tend to REALLY get this wrong. Ask it to generate a query to sum
          up likes on items uploaded in the last week, defined as the last
          monday-sunday week (not the last 7 days), and watch it get it subtly
          wrong almost every time.
       
        computers3333 wrote 1 day ago:
         [1] – I built GopherSignal!
        
        It's a lightweight tool that summarizes Hacker News articles. For
        example, here’s what it outputs for this very post, "Ask HN: Is
        anyone doing anything cool with tiny language models?":
        
        "A user inquires about the use of tiny language models for interesting
        applications, such as spam filtering and cookie notice detection. A
        developer shares their experience with using Ollama to respond to SMS
        spam with unique personas, like a millennial gymbro or a 19th-century
        British gentleman. Another user highlights the effectiveness of 3B and
        7B language models for cookie notice detection, with decent performance
        achieved through prompt engineering."
        
        I originally used LLaMA 3:Instruct for the backend, which performs much
        better, but recently started experimenting with the smaller LLaMA
        3.2:1B model.
        
        It’s been cool seeing other people’s ideas too. Curious—does
        anyone have suggestions for small models that are good for summaries?
        
        Feel free to check it out or make changes:
        
   URI  [1]: https://gophersignal.com
   URI  [2]: https://github.com/k-zehnder/gophersignal
       
          tinco wrote 1 day ago:
          That's cool, I really like it. One piece of feedback: I am usually
          more interested in the HN comments than in the original article. If
          you'd include a link to the comments then I might switch to
          GopherSignal as a replacement for the HN frontpage.
          
          My flow is generally: Look at the title and the amount of upvotes to
          decide if I'm interested in the article. Then view the comments to
          see if there's interesting discussion going on or if there's already
          someone adding essential context. Only then I'll decide if I want to
          read the article or not.
          
          Of course no big deal if you're not interested in my patronage, just
          wanted to let you know your page already looks good enough for me to
          consider switching my most visited page to it if it weren't for this
          small detail. And maybe the upvote count.
       
            computers3333 wrote 8 hours 3 min ago:
            EDIT: Apologies for breaking things earlier while trying to fix it!
            I’ve been working on updating it and got the upvote count and
            comment link in there. Wondering what you think about these
            updates—appreciate any feedback! Thanks again for helping me
            improve it!
            
   URI      [1]: https://gophersignal.com
       
            goodklopp wrote 22 hours 8 min ago:
            I would love this feature. Regardless, what you have built is
            really cool
       
              computers3333 wrote 21 hours 57 min ago:
              Hey thanks a ton for checking out GopherSignal! From the feedback
              I’m getting, it seems like comments and upvotes are the secret
              sauce I’ve been missing—appreciate you helping me get that
              through my thick skull lol. The pressure’s on now—I’ll do
              my best to deliver.
       
            sainib wrote 1 day ago:
             Maybe even rate each post on its comment activity level.
       
              computers3333 wrote 22 hours 3 min ago:
              Great call! That’s a really solid idea—using the LLMs to rate
              posts based on comment activity could totally work and would be
              fun.
              
              Were you thinking something like a “DramaLlama,” deciding if
              it’s a slow day or a meltdown-worthy soap opera in the
              comments? Or maybe something more valuable, like an “Insight
              Index” that uses the LLM to analyze comments for links,
              explanations, or phrases that add context or insight—basically
              gauging how constructive or meaningful the discussion is?
              
              I also saw an idea in another post on this thread about an LLM
              that constantly listens to conversations and declares a winner.
              That could be fun to adapt for spicier posts—like the LLM
              picking a “winner” in the comments. Make the argument
              GopherSignal official lol. If it helps bring in another user,
              I’m all in!
              
              Appreciate the feedback.
       
            sainib wrote 1 day ago:
             Agreed, great suggestions. I'd consider switching as well.
       
            computers3333 wrote 1 day ago:
            Hey, thanks a ton for the feedback! That was super helpful to hear
            about your flow—makes a lot of sense and it's pretty similar to
            how I browse HN too. I usually only dive into the article after
            checking out the upvotes and seeing what context the comments add.
            
            I'll definitely add a link to the comments and the upvote
            count—gotta keep my tiny but mighty userbase (my mom, me, and
            hopefully you soon) happy, right? lol
            
            And if there's even a chance you'd use GopherSignal as your daily
            driver, that's a no-brainer for me. Really appreciate you taking
            the time to share your ideas and help me improve.
       
        evacchi wrote 1 day ago:
         I'm interested in finding tiny models to create workflows,
         stringing together several functions/tools and running them
         on-device using mcp.run servlets on Android (disclaimer: I work on
         that).
       
        kaspermarstal wrote 1 day ago:
        I built an Excel Add-In that allows my girlfriend to quickly filter
        7000 paper titles and abstracts for a review paper that she is writing
        [1]. It uses Gemma 2 2b which is a wonderful little model that can run
        on her laptop CPU. It works surprisingly well for this kind of binary
        classification task.
        
         The nice thing is that she can copy/paste the titles and abstracts
         into two columns and write e.g. "=PROMPT(A1:B1, "If the paper
         studies diabetic neuropathy and stroke, return 'Include', otherwise
         return 'Exclude'")" and then drag the formula down across 7000 rows
         to bulk process the data on her own, because it's just Excel. There
         is a gif in the readme of the GitHub repo that shows it.
        
   URI  [1]: https://github.com/getcellm/cellm
       
          basmok wrote 20 hours 35 min ago:
           Can someone hack this together as pure matrix multiplication?
           
           Like either as a table in the background or as a regular script?
           
           On most computers you can't compile or install add-ons without
           administrative rights, and LLM chat sites are blocked to prevent
           usage of company data.
           
           It should run in native Excel or GSheets.
           
           I mean pure, without compilation, just like they do the matrix
           calculations straight in Excel here:
           
           Lesson 1: Demystifying how LLMs work, from architecture to Excel
           [1]
           As far as I know, in GSheets the scripts run on Google's servers
           and are not limited by the local computer's power, so larger
           models could be deployed there.
           
           Could someone hack this into Excel/GSheets?
          
   URI    [1]: https://youtu.be/FyeN5tXMnJ8
       
          vdm wrote 1 day ago:
          
          
   URI    [1]: https://x.com/Suhail/status/1882069209129340963
       
          7734128 wrote 1 day ago:
          You could have called it CellMate
       
          donbreo wrote 1 day ago:
           Requirements:
           -Windows
           
           Looks like I'm out...
           Would be great if there was a Google Apps Script alternative. My
           company gave all devs Linux systems and the business team
           operates on Windows, so I always use browser-based tech like
           Gapps script for complex sheet manipulation.
       
            jkman wrote 22 hours 54 min ago:
             Well, it's an Excel add-in; how else would it work?
       
              NotMichaelBay wrote 3 hours 14 min ago:
              Excel add-ins can be written with the Office JS API so that they
              can run on web as well as desktop for Windows and Mac. But I
              don't think OP's add-in is possible with that API unless the
              local model can be run in JS.
       
          afro88 wrote 1 day ago:
          How accurate are the classifications?
       
            kaspermarstal wrote 1 day ago:
            I don't know. This paper [1] reports accuracies in the 97-98% range
            on a similar task with more powerful models. With Gemma 2 2b the
            accuracy will certainly be lower.
            
   URI      [1]: https://www.medrxiv.org/content/10.1101/2024.10.01.2431470...
       
              beernet wrote 1 day ago:
              > I don't know.
              
              HN in a nutshell: I've built some cool tech but have no idea if
              it is helpful or even counter productive...
       
                kaspermarstal wrote 22 hours 55 min ago:
                 I am not going to claim or report any kind of accuracy,
                 especially with such a small model and such a specific,
                 context-dependent use case. It is the user’s responsibility
                 to cross-validate whether it’s accurate enough for their
                 use case, and to upgrade the model or use another approach
                 if not.
       
                  jbs789 wrote 22 hours 19 min ago:
                  A user buys a car because it gets them from point A to point
                  B. I get what you’re saying though - we are earlier along
                  the adoption curve for these models and more responsibility
                  sits with the user. Over time the expectations will no doubt
                  increase.
       
                sidcool wrote 23 hours 11 min ago:
                Sometimes it's the joy of creation.  Utility and optimization
                come later.  It's fun.    Like a hobby.
       
                corobo wrote 1 day ago:
                Real HN in a nutshell: People who don't build stuff telling
                people who do build stuff that the thing they built is useless
                :P
                
                It's a hacker forum, let people hack!
                
                If anything have a dig at OP for posting the thread too soon
                before the parent commenter has had the chance to gather any
                data, haha
       
                  greenavocado wrote 1 day ago:
                  Just because you can, doesn't mean you should
       
                    corobo wrote 23 hours 56 min ago:
                    If you're building a dinosaur sanctuary sure
       
                      stackghost wrote 22 hours 6 min ago:
                      Or an Internet surveillance-capitalism panopticon.
       
                rasmus1610 wrote 1 day ago:
                Sometimes people just like to build stuff for the sake of it.
       
                  jajko wrote 1 day ago:
                  Almost like hackers, doing shit just for the heck of it
                  because they can (mostly)
       
              indolering wrote 1 day ago:
               Y'all definitely need to cross-validate a small number of
               samples by hand. When I did this kind of research, I would
               hand-validate to at least p < .01.
       
                kaspermarstal wrote 1 day ago:
                 She and one other researcher have manually classified all
                 7000 papers, as per standard protocol. Perhaps for the next
                 article they will measure how well this tool agrees with
                 their classifications and include it in the protocol if
                 it's good enough.
       
          relistan wrote 1 day ago:
          Very cool idea. I’ve used gemma2 2b for a few small things. Very
          good model for being so small.
       
        merwijas wrote 1 day ago:
         I put Llama 3 on a Raspberry Pi 5 and have it running a small
         droid. I added a speech-to-text engine so it can hear spoken
         prompts, which it replies to in droid speak. It also has a small
         screen that translates the response to English. I gave it a
         backstory about being an astromech droid, so it usually just talks
         about the hyperdrive, but it's fun.
       
        sauravpanda wrote 1 day ago:
         We are building a framework to run these tiny language models on the web
        so anyone can access private LLMs in their browser: [1] .
        
         With just three lines of code, you can run small LLMs inside the
        browser. We feel this unlocks a ton of potential for businesses so that
        they can introduce AI without fear of cost and can personalize the
        experience using AI.
        
        Would love your thoughts and what we can do more or better!
        
   URI  [1]: https://github.com/sauravpanda/BrowserAI
       
          ms7892 wrote 1 day ago:
           Sounds cool. Any way I can help?
       
        guywithahat wrote 1 day ago:
         I've been working on a self-hosted, low-latency service for small
         LLMs. It's basically exactly what I would have wanted when I started
         my previous startup. The goal is real-time applications, where even
         the network time to reach a fast LLM host like Groq is an issue.
        
        I haven't benchmarked it yet but I'd be happy to hear opinions on it.
        It's written in C++ (specifically not python), and is designed to be a
        self-contained microservice based around llama.cpp.
        
   URI  [1]: https://github.com/thansen0/fastllm.cpp
       
        gpm wrote 1 day ago:
        I made a shell alias to translate things from French to English, does
        that count?
        
            function trans
            llm "Translate \"$argv\" from French to English please"
            end
        
        Llama 3.2:3b is a fine French-English dictionary IMHO.
       
          kreyenborgi wrote 1 day ago:
          Is it better than translatelocally? [1] (the same as used in firefox)
          
   URI    [1]: https://translatelocally.com/downloads/
       
            gpm wrote 1 day ago:
            It's different. It doesn't always just give one translation but
            different options. I can do things like give it a phrase and then
            ask it to break it down. Or give it a word and if its translation
            doesn't make sense to me ask how it works in the context of a
            phrase.
            
            llm -c, which continues the previous conversation, is specifically
            useful for that sort of manipulation.
            
            It's also available from the command line, which I find convenient
            because I basically always have one open.
       
        dh1011 wrote 1 day ago:
        I copied all the text from this post and used an LLM to generate a list
         of all the ideas. I do the same for other similar HN posts.
       
          whalesalad wrote 1 day ago:
          chatgpt did a stellar job parsing the "books on hard things" thread
          from a little while ago. my prompt was:
          
          Can you identify all the books here, sorted by a weight which is
          determined based on a combo of the number of votes the comment has,
          the number of sub-comments, or the number of repeat mentions.
          
          Ideally retain hyperlinks if possible.
       
            swifthesitation wrote 8 hours 41 min ago:
            could you link the HN thread?
       
              whalesalad wrote 1 hour 43 min ago:
              google "hn books on hard things" -
              
   URI        [1]: https://news.ycombinator.com/item?id=42614722
       
          lordswork wrote 1 day ago:
          well, what are the ideas?
       
        sidravi1 wrote 1 day ago:
        We fine-tuned a Gemma 2B to identify urgent messages sent by new and
        expecting mothers on a government-run maternal health helpline.
        
   URI  [1]: https://idinsight.github.io/tech-blog/blog/enhancing_maternal_...
       
          Mukina wrote 9 hours 6 min ago:
          Super cool. What a simple and powerful way to help mothers in need.
          Thanks for sharing.
       
          Mumps wrote 1 day ago:
          lovely application!
          
          Genuine question: why not use (Modern)BERT instead for
          classification? (Is the json-output explanation so critical?)
       
          Mashimo wrote 1 day ago:
          Oh that is a nice writeup. We have something similar in mind at work.
          Will forward it.
       
          proxygeek wrote 1 day ago:
           Such a fun thread, but this is the kind of application that
           perks up my attention!
          
          Very cool!
       
        linsomniac wrote 1 day ago:
        I have this idea that a tiny LM would be good at canonicalizing entered
        real estate addresses.    We currently buy a data set and software from
        Experian, but it feels like something an LM might be very good at. 
        There are lots of weirdnesses in address entry that regexes have a hard
        time with.  We know the bulk of addresses a user might be entering,
        unless it's a totally new property, so we should be able to train it on
        that.
       
          thesz wrote 16 hours 11 min ago:
          From my experience (2018), run LLM output through beam search over
          different choices of canonicalization of certain part of text. Even
          3-gram models (yeah, 2018) fare better this way.
       
        jftuga wrote 1 day ago:
        I'm using ollama, llama3.2 3b, and python to shorten news article
        titles to 10 words or less.  I have a 3 column web site with a list of
        news articles in the middle column.  Some of the titles are too long
        for this format, but the shorter titles appear OK.
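
         The core of a setup like that fits in a few lines; a hedged sketch
         with the ollama Python client (prompt wording is illustrative):
         
             import ollama  # assumes the ollama Python client is installed
             
             def shorten(title: str, max_words: int = 10) -> str:
                 prompt = (f"Shorten this headline to {max_words} words "
                           f"or fewer. Reply with only the headline.\n"
                           f"{title}")
                 short = ollama.generate(
                     model="llama3.2:3b", prompt=prompt
                 )["response"].strip()
                 # Fall back to the original if the model rambles past
                 # the limit.
                 return short if len(short.split()) <= max_words else title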
       
        HexDecOctBin wrote 1 day ago:
         Are there any experiments with small models that do paraphrasing?
         I tried using some off-the-shelf models, but it didn't go well.
        
        I was thinking of hooking them in RPGs with text-based dialogue, so
        that a character will say something slightly different every time you
        speak to them.
       
          krystofee wrote 1 day ago:
           Intuitively this sounds like something that should be possible
           using almost any LLM. It should be just a matter of prompting.
       
        jwitthuhn wrote 1 day ago:
         I've made a tiny ~1M-parameter model that can generate random
         Magic: The Gathering cards, largely based on Karpathy's nanoGPT
         with a few more features added on top.
        
        I don't have a pre-trained model to share but you can make one yourself
        from the git repo, assuming you have an apple silicon mac.
        
   URI  [1]: https://github.com/jlwitthuhn/TCGGPT
       
        itskarad wrote 1 day ago:
        I'm using ollama for parsing and categorizing scraped jobs for a local
         job board dashboard I check every day.
       
        JLCarveth wrote 1 day ago:
        I used a small (3b, I think) model plus tesseract.js to perform OCR on
        an image of a nutritional facts table and output structured JSON.
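
         A hedged sketch of that pipeline in Python, with pytesseract
         standing in for tesseract.js (the schema and model name are
         made up):
         
             import json
             import ollama       # assumes the ollama Python client
             import pytesseract  # Tesseract OCR binding for Python
             from PIL import Image
             
             def parse_label(image_path: str) -> dict:
                 # OCR pass first, then ask a small model to structure it.
                 text = pytesseract.image_to_string(Image.open(image_path))
                 prompt = ("Convert this nutrition facts text to JSON with "
                           "keys calories, fat_g, carbs_g, protein_g. "
                           "Reply with only JSON.\n" + text)
                 raw = ollama.generate(
                     model="llama3.2:3b", prompt=prompt
                 )["response"]
                 return json.loads(raw)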
       
          ian_zcy wrote 1 day ago:
           What are you feeding into the model? An image (like product
           packaging) or an image of a structured table? I've found that
           models perform well in general with structured tables, but fail
           a lot on images.
       
          tigrank wrote 1 day ago:
           Is all that server-side or client-side?
       
          deivid wrote 1 day ago:
          What was the model? What kind of performance did you get out of it?
          
          Could you share a link to your project, if it is public?
       
            JLCarveth wrote 1 day ago:
             [1] I've had good speed / reliability with TheBloke/rocket-3B-GGUF
            on Huggingface, the Q2_K model. I'm sure there are better models
            out there now, though.
            
            It takes ~8-10 seconds to process an image on my M2 Macbook, so not
            quite quick enough to run on phones yet, but the accuracy of the
            output has been quite good.
            
   URI      [1]: https://github.com/JLCarveth/nutrition-llama
       
        codazoda wrote 1 day ago:
        I had an LLM create a playlist for me.
        
        I’m tired of the bad playlists I get from algorithms, so I made a
         specific playlist with Llama 2 based on several songs I like. I
        started with 50, removed any I didn’t like, and added more to fill in
        the spaces. The small models were pretty good at this. Now I have a
        decent fixed playlist. It does get “tired” after a few weeks and I
        need to add more to it. I’ve never been able to do this myself with
        more than a dozen songs.
       
          DonHopkins wrote 23 hours 49 min ago:
          How about having an LLM create a praylist for you?
          
          Then you could implement Salvation as a Service, where you privately
          confess your sins to a local LLM, and it continuously prays for your
          eternal soul, recommends penances, and even recites Hail Marys for
          you.
       
          jamesponddotco wrote 1 day ago:
          Interesting! I wrote a prompt for something similar[1], but I use
          Claude Sonnet for it. I wonder how a small model would handle it.
          Time to test, I guess.
          
          [1] 
          
   URI    [1]: https://git.sr.ht/~jamesponddotco/llm-prompts/tree/trunk/dat...
       
            codazoda wrote 19 hours 9 min ago:
            This prompt is a lot more complex than what I did.  I don’t
            recall my exact prompt but it was something like, “Generate a
            list of 25 songs that I may like if I like Girl is on my Mind by
            the Black Keys.”
       
          Mashimo wrote 1 day ago:
           Huh, interesting. For me that often dreamed up artists and songs.
       
          petesergeant wrote 1 day ago:
           Interesting! I've sadly found that even more capable models
           really fail on music recommendations for me.
       
        kianN wrote 1 day ago:
        I don’t know if this counts as tiny but I use llama 3B in prod for
        summarization (kinda).
        
        Its effective context window is pretty small but I have a much more
        robust statistical model that handles thematic extraction. The llm is
        essentially just rewriting ~5-10 sentences into a single paragraph.
        
        I’ve found the less you need the language model to actually do, the
        less the size/quality of the model actually matters.
       
        jothflee wrote 1 day ago:
        when i feel like casually listening to something, instead of
        netflix/hulu/whatever, i'll run a ~3b model (qwen 2.5 or llama 3.2) and
         generate an audio stream of water cooler office gossip. (when it is
        up, it runs here: [1] ).
        
        some of the situations get pretty wild, for the office :)
        
   URI  [1]: https://water-cooler.jothflee.com
       
          jaggs wrote 20 hours 37 min ago:
           I love it. It's a shame the voices aren't just a little bit
           more realistic. There are some good TTS models around now; I
           wonder if you could upgrade it?
       
          jftuga wrote 1 day ago:
          What prompt are you using for this?
       
        spiritplumber wrote 1 day ago:
         My husband and I made a stock market analysis thing that gets it right
        about 55% of the time, so better than a coin toss. The problem is that
        it keeps making unethical suggestions, so we're not using it to trade
        stock. Does anyone have any idea what we can do with that?
       
          febed wrote 1 day ago:
          What data do you analyze?
       
          bongodongobob wrote 1 day ago:
          You can literally flip coins and get better than 50% success in a
          bull market. Just buy index funds and spend your time on something
          that isn't trying to beat entropy. You won't be able to.
       
            spiritplumber wrote 1 day ago:
            INSUFFICIENT DATA FOR A MEANINGFUL ANSWER.
       
          dkga wrote 1 day ago:
          Suggestion: calculate the out-of-sample Sharpe ratio[0] of the
          suggestions over a reasonable period to gauge how good the model
          would actually perform in terms of return compared to risks. It is
          better than vanilla accuracy or related metrics. Source: I'm a
          financial economist.
          
          [0]:
          
   URI    [1]: https://en.wikipedia.org/wiki/Sharpe_ratio
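
           For reference, the computation itself is tiny; this sketch
           assumes daily returns and annualizes with sqrt(252):
           
               import numpy as np
               
               def sharpe(returns, risk_free=0.0, periods=252):
                   # Mean excess return over its volatility, annualized.
                   excess = np.asarray(returns) - risk_free
                   return (excess.mean() / excess.std(ddof=1)
                           * np.sqrt(periods))
               
               # Compute it only on held-out (out-of-sample) dates.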
       
            spiritplumber wrote 1 day ago:
            thank you! that's exactly the sort of thing I don't know.
       
          Etheryte wrote 1 day ago:
          Have you backtested this in times when markets were not constantly
          green? Nearly any strategy is good in the good times.
       
            spiritplumber wrote 1 day ago:
            yep. the 55% is over a few years.
       
              kortilla wrote 1 day ago:
              Right, but if 55% is avg over the last few years, “buy stock”
              is going to be correct more than not.
              
   URI        [1]: https://www.crestmontresearch.com/docs/Stock-Yo-Yo.pdf
       
                Etheryte wrote 1 day ago:
                I think this is a good highlight of why context and reality
                checks are incredibly important when doing work like this. At
                first glance, it might look like 55% is a really good result,
                but in the previous year, a flat buy every day strategy
                would've been right 56.7% of the time.
       
                  dutchbookmaker wrote 17 hours 22 min ago:
                   55% means basically nothing in this context at even
                   money. Going long at 45% to 55% is most likely completely
                   random, because it is symmetric with shorting at 45% to
                   55%.
                   
                   Exactly what you would expect from a language model
                   making random stock picks.
       
          bobbygoodlatte wrote 1 day ago:
          I'm curious what sort of unethical suggestions it's coming up with
          haha
       
            spiritplumber wrote 1 day ago:
               so far, mostly buying companies owned/run by horrible people.
       
              GordonS wrote 1 day ago:
              Can't you adjust the prompt to filter out companies that fund
              genocide etc?
       
              kortilla wrote 1 day ago:
              So if you filter out the Republican owned ones or whatever your
              bugbear is, does the 55% persist?
       
        danbmil99 wrote 1 day ago:
        Using llama 3.2 as an interface to a robot. If you can get the latency
        down, it works wonderfully
       
          mentos wrote 1 day ago:
          Would love to see this applied to a FPS bot in unreal engine.
       
        jmward01 wrote 1 day ago:
        I think I am. At least I think I'm building things that will enable
        much smaller models:
        
   URI  [1]: https://github.com/jmward01/lmplay/wiki/Sacrificial-Training
       
        juancroldan wrote 1 day ago:
        I'm making an agent that takes decompiled code and tries to understand
        the methods and replace variables and function names one at a time.
       
          krystofee wrote 1 day ago:
           This sounds cool! Are you planning to open-source it?
       
            DonHopkins wrote 23 hours 53 min ago:
            No need to: he can just publish a binary then you can run it on
            itself. ;)
       
        Evidlo wrote 1 day ago:
        I have ollama responding to SMS spam texts.  I told it to feign
        interest in whatever the spammer is selling/buying.  Each number gets
        its own persona, like a millennial gymbro or 19th century British
        gentleman. [1]
        
   URI  [1]: http://files.widloski.com/image10%20(1).png
   URI  [2]: http://files.widloski.com/image11.png
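
         A condensed sketch of a reply loop like that, assuming the ollama
         Python client (persona text and numbers are stand-ins):
         
             import ollama  # assumes the ollama Python client is installed
             
             PERSONAS = {
                 "+15550001111": "You are an enthusiastic millennial "
                                 "gymbro.",
                 "+15550002222": "You are a verbose 19th-century British "
                                 "gentleman.",
             }
             
             def reply(sender: str, message: str) -> str:
                 persona = PERSONAS.get(sender, "You are endlessly curious.")
                 system = (persona + " Feign interest in whatever is being "
                           "sold. Never share real personal details.")
                 resp = ollama.chat(model="llama3.2:3b", messages=[
                     {"role": "system", "content": system},
                     {"role": "user", "content": message},
                 ])
                 return resp["message"]["content"]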
       
          potatoman22 wrote 21 hours 18 min ago:
          You probably just get more spam texts since you're replying. Maybe
          that's a good thing tbh
       
          bripkens wrote 21 hours 45 min ago:
          You should put all these interactions on the web. For education
          purposes ofc.
       
          lacoolj wrote 21 hours 52 min ago:
           Most spam is just verifying you exist as a person; then from there
          you become an actual "target" if you respond.
          
          This feels like an in-between that both wastes their time and adds
          you to extra lists.
          
          Send the results somewhere! Not sure if "law enforcement" is
          applicable (as in, would be able/willing to act on the info) but if
          so, that's a great use of this data :)
       
          hackergirl88 wrote 23 hours 17 min ago:
          Where was this during the election
       
          merpkz wrote 1 day ago:
          Calling Jessica an old chap is quite a giveaway that it's a bot xD
          Nice idea indeed, but I have a feeling that it's just two LLMs now
          conversing with each other.
       
          metadat wrote 1 day ago:
          I love this, more please!!!
       
          blackeyeblitzar wrote 1 day ago:
          You realize this is going to cause carriers to allow the number to
          send more spam, because it looks like engagement. The best thing to
          do is to report the offending message to 7726 (SPAM) so the carrier
          can take action. You can also file complaints at the FTC and FCC
          websites, but that takes a bit more effort.
       
            thegabriele wrote 1 day ago:
            Yes, the very last thing to do is respond to spam (calls, email,
            text...) and inform that you are eligible to more solicitation.
       
          thecosmicfrog wrote 1 day ago:
          Please tell me you have a blog/archive of these somewhere. This was
          such a joy to read!
       
          celestialcheese wrote 1 day ago:
           Given the source, I'm skeptical it's not just a troll, but I
           found this explanation [0] plausible as to why those vague spam
           texts exist.  If
          true, this trolling helps the spammers warm those phone numbers up.
          
          0 -
          
   URI    [1]: https://x.com/nikitabier/status/1867029883387580571
       
            stogot wrote 1 day ago:
            Why does STOP work here?
       
              yawgmoth wrote 1 day ago:
              STOP works thanks to the Telephone Consumer Protection Act
              (“TCPA”), which offers consumers spam protections and senders
              a framework on how to behave.
              
              (Edit: It's relevant that STOP didn't come from the TCPA itself,
              but definitely has teeth due to it)
              
   URI        [1]: https://www.infobip.com/blog/a-guide-to-global-sms-compl...
       
              inerte wrote 1 day ago:
               Carriers and SMS service providers (like Twilio) obey that,
               no matter what service is behind it.
              
              There are stories of people replying STOP to spam, then never
              getting a legit SMS because the number was re-used by another
              service. That's because it's being blocked between the spammer
              and the phone.
       
              celestialcheese wrote 1 day ago:
               [1] Again, no clue if this is true, but it seems plausible.
              
   URI        [1]: https://x.com/nikitabier/status/1867069169256308766
       
          zx8080 wrote 1 day ago:
           Cool! Have you considered the risk of unintentionally (and,
           until some point, unknowingly) subscribing to some paid SMS
           service, and how do you mitigate it?
       
            Evidlo wrote 1 day ago:
            I have to whitelist a conversation before the LLM can respond.
       
          RVuRnvbM2e wrote 1 day ago:
           This is fantastic. How have you hooked up a mobile number to
           the LLM?
       
            Evidlo wrote 1 day ago:
            Android app that forwards to a Python service on remote workstation
            over MQTT.  I can make a Show HN if people are interested.
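
             The remote side can be tiny; a hedged sketch with paho-mqtt
             (topic names and wire format are made up):
             
                 import paho.mqtt.client as mqtt  # paho-mqtt 1.x style API
                 
                 def answer(sender: str, text: str) -> str:
                     return "Tell me more!"  # stand-in for the LLM call
                 
                 def on_message(client, userdata, msg):
                     sender, text = msg.payload.decode().split("|", 1)
                     client.publish(f"sms/out/{sender}",
                                    answer(sender, text))
                 
                 # paho-mqtt 2.x additionally needs a CallbackAPIVersion arg
                 client = mqtt.Client()
                 client.on_message = on_message
                 client.connect("localhost", 1883)
                 client.subscribe("sms/in/#")
                 client.loop_forever()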
       
              SuperHeavy256 wrote 1 day ago:
              I am so SO interested, please make a Show HN
       
                Evidlo wrote 20 hours 31 min ago:
                
                
   URI          [1]: https://news.ycombinator.com/item?id=42796496
       
                  gaudystead wrote 20 hours 18 min ago:
                  Sweeeeet, thank you! :)
       
              sainib wrote 1 day ago:
              Interested for sure.
       
              potamic wrote 1 day ago:
              Why MQTT over HTTP for a low volume, small scale integration?
       
                c0wb0yc0d3r wrote 1 day ago:
                I’m not OP, but I would hazard a guess that those are the
                tools that OP has at hand.
       
              dkga wrote 1 day ago:
              Yes, I'd be interested in that!
       
              deadbabe wrote 1 day ago:
              I’d love to see that. Could you simulate iMessage?
       
                great_psy wrote 1 day ago:
                Yes it’s possible, but it’s not something you can easily
                scale.
                
                I had a similar project a few years back that used OSX
                 automations and Shortcuts and Python to send a message every day
                to a friend. It required you to be signed in to iMessage on
                your MacBook.
                
                 That was a send operation; reading replies is not
                 something I implemented, but I know there is a file
                 somewhere that holds a history of your recent iMessages.
                 So you would have to parse it on file update, and that
                 should give you the read operation so you can have a
                 conversation.
                 
                 Very doable in a few hours, unless something dramatic has
                 changed in how the Messages app works within the last few
                 years.
       
                  dewey wrote 1 day ago:
                  They are all in a SQLite db on your disk.
       
                Evidlo wrote 1 day ago:
                If you mean hook this into iMessage, I don't know.  I'm willing
                to bet it's way harder though because Apple
       
                  dambi0 wrote 1 day ago:
                   If you are willing to use Apple Shortcuts on iOS, it's
                   pretty easy to add something that will be triggered when
                   a message is received and can call out to a service, or
                   even use SSH to do something with the contents, including
                   replying.
       
            spiritplumber wrote 1 day ago:
            For something similar with FB chat, I use Selenium and run it on
             the same box that the LLM is running on. Using multiple
            personalities is really cool though. I should update mine likewise!
       
        antonok wrote 1 day ago:
        I've been using Llama models to identify cookie notices on websites,
        for the purpose of adding filter rules to block them in EasyList
        Cookie. Otherwise, this is normally done by, essentially, manual
        volunteer reporting.
        
        Most cookie notices turn out to be pretty similar, HTML/CSS-wise, and
        then you can grab their `innerText` and filter out false positives with
        a small LLM. I've found the 3B models have decent performance on this
        task, given enough prompt engineering. They do fall apart slightly
        around edge cases like less common languages or combined cookie notice
        + age restriction banners. 7B has a negligible false-positive rate
        without much extra cost. Either way these things are really fast and
        it's amazing to see reports streaming in during a crawl with no human
        effort required.
        
         Code is at [1]. You can see the prompt at [2].
        
   URI  [1]: https://github.com/brave/cookiemonster
   URI  [2]: https://github.com/brave/cookiemonster/blob/main/src/text-clas...
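
         The classification call itself is small; a hedged paraphrase (not
         the project's actual prompt, which is linked above):
         
             import ollama  # assumes the ollama Python client is installed
             
             def is_cookie_notice(inner_text: str) -> bool:
                 prompt = ("Does this page text ask the visitor to accept "
                           "cookies? Answer YES or NO only.\n\n"
                           + inner_text[:2000])  # truncate long pages
                 out = ollama.generate(
                     model="llama3.2:3b", prompt=prompt
                 )["response"]
                 return out.strip().upper().startswith("YES")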
       
          rpastuszak wrote 1 day ago:
          Tangentially related, I worked on something similar, using LLMs to
          find and skip sponsored content in YT videos:
          
   URI    [1]: https://butter.sonnet.io/
       
          GardenLetter27 wrote 1 day ago:
          It's funny that this is even necessary though - that great EU
          innovation at work.
       
            vvillena wrote 20 hours 49 min ago:
            Bear in mind, those arcane cookie forms are probably not compliant
            with EU laws. If there's not a "reject" button next to the "accept"
            button, the form is almost definitely not to spec.
       
            pornel wrote 22 hours 39 min ago:
            The legislation has been watered down by lobbying of the
            trillion-dollar tracking industry.
            
             The industry knows ~nobody wants to be tracked, so they don't
             want tracking preferences to be easy to express. They want
             cookie notices to be annoying, to make people associate privacy
             with bureaucratic nonsense and stop demanding privacy.
            
             There was the P3P spec back in 2002: [1] It even got a decent
             implementation in Internet Explorer, but Google deliberately
             sent a junk P3P header to bypass it.
            
            It has been tried again with a very simple DNT spec. Support for it
            (that barely existed anyway) collapsed after Microsoft decided to
            make Do-Not-Track on by default in Edge.
            
   URI      [1]: https://www.w3.org/TR/P3P/
       
            kalaksi wrote 1 day ago:
             Tracking, tracking cookies, banners, etc. are a choice made by
             the website. There are browser addons for making it simpler,
             though.
            
             The transparency requirements and consent for collecting all
             kinds of PII (this is what the regulation covers) actually are
             a great innovation.
       
              docmars wrote 23 hours 37 min ago:
              I think I'd rather see cookie notices handled by a browser API
              with a common UI, where the default is always "No." Provide that
              common UI in a popover accessed in the address bar, or a side
              pane in the browser itself.
              
              If a user logs in or does something requiring cookies that would
              otherwise prevent normal functionality, prompt them with a
              Permissions box if they haven't already accepted it in the usual
              (optional) UI.
       
                YetAnotherNick wrote 19 hours 16 min ago:
                 There isn't any way the EU didn't know this was possible
                 and a better choice. There was already the DNT header that
                 they could have regulated. They also knew the harm to the
                 ad industry.
       
                  Fraaaank wrote 18 hours 49 min ago:
                  There isn't any rule that requires websites to use a cookie
                   banner. You're required to obtain explicit consent before
                  reading/setting any cookies that aren't strictly necessary.
                  The web came up with the cookie banner.
                  
                  Google could've implemented a consent API in Chrome, but they
                  didn't. Guess why.
       
                kalaksi wrote 23 hours 24 min ago:
                Cookies for normal functionality don't require consent anyway.
                
                But yes, I think just about everybody would like the UX you
                described. But the entities that track you don't want to make
                it that easy. You probably know of the do-not-track header too.
       
          bazmattaz wrote 1 day ago:
          This is so cool thanks for sharing. I can imagine it’s not
          technically possible (yet?) but it would be cool if this could simply
          be run as a browser extension rather than running a docker container
       
            MarioMan wrote 1 day ago:
            There are a couple of WebGPU LLM platforms available that form the
            building blocks to accomplish this right from the browser,
            especially since the models are so small. [1] [2] You do have to
            worry about WebGPU compatibility in browsers though.
            
   URI      [1]: https://github.com/mlc-ai/web-llm
   URI      [2]: https://huggingface.co/docs/transformers.js/en/index
   URI      [3]: https://caniuse.com/webgpu
       
            throwup238 wrote 1 day ago:
            It should be possible using native messaging [1] which can call out
            to an external binary. The 1password extensions use that to
            communicate with the password manager binary.
            
   URI      [1]: https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/W...
       
            antonok wrote 1 day ago:
            I did actually make a rough proof-of-concept of this! One of my
            long-term visions is to have it running natively in-browser, and
            able to automatically fix site issues caused by adblocking whenever
            they happen.
            
            The PoC is a bit outdated but it's here:
            
   URI      [1]: https://github.com/brave/cookiemonster/tree/webext
       
          binarysneaker wrote 1 day ago:
             Maybe it could also send automated petitions to the EU to undo
             cookie consent legislation, and reverse some of the
             enshittification.
       
            sebastiennight wrote 1 day ago:
            To me this take is like smokers complaining that the evil
            government is forcing the good tobacco companies to degrade the
            experience by adding pictures of cancer patients on cigarette
            packs.
       
              kortilla wrote 1 day ago:
              Those don’t really work:
              
   URI        [1]: https://jamanetwork.com/journals/jamanetworkopen/fullart...
       
                shiftingleft wrote 1 day ago:
                Do they help deter people from becoming smokers in the first
                place?
       
                  kortilla wrote 11 hours 9 min ago:
                  Not sure if much serious research has been put into it. I
                  would be suspicious of it deterring them because a lot of
                  initial smoking happens in social situations where friends
                  pass out individual cigarettes.
                  
                  By the time someone buys their own pack they are probably
                  hooked.
                  
                   I suspect the obscene taxes blocking out young folks are
                   one of the most effective strategies.
       
            K0balt wrote 1 day ago:
             I think there is real potential here for smart browsing. Have
             the LLM get the page, replace all the ads with kittens, find
             non-paywall versions if possible and needed, spoof fingerprint
             data, detect and highlight AI-generated drivel, etc. The site
             would have no way of knowing that it wasn’t touching eyeballs.
             We might be able to rake back a bit of the web this way.
       
              antonok wrote 1 day ago:
              You probably wouldn't want to run this in real-time on every site
              as it'll significantly increase the load on your browser, but as
              long as it's possible to generate adblock filter rules, the fixes
              can scale to a pretty large audience.
       
                Tepix wrote 23 hours 20 min ago:
                Depends on your machine and on the LLM. Could be doable.
       
                K0balt wrote 1 day ago:
                I was thinking running it in my home lab server as a proxy, but
                yeah, scaling it to the browser would require some pretty
                strong hardware. Still, maybe in a couple of years it could be
                mainstream.
       
            antonok wrote 1 day ago:
            Ha, I'm not sure the EU is prepared to handle the deluge of
            petitions that would ensue.
            
            On a more serious note, this must be the first time we can
            quantitatively measure the impact of cookie consent legislation
            across the web, so maybe there's something to be explored there.
       
              pk-protect-ai wrote 1 day ago:
               why don't you spam the companies who want your data instead? The
               sites could simply stop gathering your data; then they wouldn't
               need to ask for consent ...
       
                whywhywhywhy wrote 1 day ago:
                Because they have no reason to care about what you think or
                feel or they wouldn't be doing it in the first place.
                
                Cookie notices just gave them another weapon in the end.
       
                frail_figure wrote 1 day ago:
                 It’s the same comments on HN as always. They think the EU
                 setting up rules is somehow worse than companies breaking
                 them. We see
                how the US is turning out without pesky EU restrictions :)
       
                  GardenLetter27 wrote 1 day ago:
                  The US has 3x higher salaries, larger houses and a much
                  higher quality of life?
                  
                  I work as a senior engineer in Europe and make barely $4k net
                  per month... and that's considered a "good" salary!
       
                    Lutger wrote 1 day ago:
                    It has higher salaries for privileged people like senior
                    engineers. Try making ends meet in a lower class job.
                    
                     And you have (almost) free and universal healthcare in
                     Europe, good food available everywhere, drinking water that
                     doesn't poison you, walkable cities, good public transport,
                     somewhat decent police and a functioning legal system. The
                     list goes on. Does this not impact your quality of life? Do
                     you not care about these things?

                     How can you have a higher quality of life as a society with
                     higher murder rates, much lower life expectancy, and so
                     many people in jail, in debt, etc.?
       
                      macinjosh wrote 1 day ago:
                      Touch grass. The US is a big place and is nothing like
                      you seem to think it is.
                      
                      Europe on the other hand can't even manage to defend
                      itself and relies on the US for their sheer existence.
       
                        pona-a wrote 22 hours 54 min ago:
                         Can you point me to a state where none of the parent's
                         points apply? I'd be glad to be educated.
       
        thetrash wrote 1 day ago:
        I programmed my own version of Tic Tac Toe in Godot, using a Llama 3B
         as the AI opponent. Not for any workflow, but figuring out how to beat
         it is entertaining during moments of boredom.
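
         For anyone who wants to try the same thing, the move loop is small.
         Here's a hedged sketch against a local Ollama server rather than from
         inside Godot (model name, prompt, and board encoding are all just
         placeholders):

             # Rough sketch: ask a small local model for a tic-tac-toe move.
             # Falls back to a random legal move when the model's answer is
             # unusable. Model and endpoint are assumptions.
             import json
             import random
             import urllib.request

             def llm_move(board):  # board: list of 9 chars, 'X', 'O' or '-'
                 prompt = ("You are playing tic-tac-toe as O. Cells are "
                           "numbered 0-8:\n" + "".join(board) +
                           "\nReply with only the number of the empty cell "
                           "you choose.")
                 req = urllib.request.Request(
                     "http://localhost:11434/api/generate",
                     data=json.dumps({"model": "llama3.2:3b",
                                      "prompt": prompt,
                                      "stream": False}).encode(),
                     headers={"Content-Type": "application/json"})
                 reply = json.loads(
                     urllib.request.urlopen(req).read())["response"]
                 legal = [i for i, c in enumerate(board) if c == "-"]
                 digits = [int(t) for t in reply.split() if t.isdigit()]
                 return (digits[0] if digits and digits[0] in legal
                         else random.choice(legal))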
       
          spiritplumber wrote 1 day ago:
          Number of players: zero
          
          U.S. FIRST STRIKE      WINNER: NONE
          
          USSR FIRST STRIKE      WINNER: NONE
          
          NATO / WARSAW PACT      WINNER: NONE
          
          FAR EAST STRATEGY      WINNER: NONE
          
          US USSR ESCALATION      WINNER: NONE
          
          MIDDLE EAST WAR       WINNER: NONE
          
          USSR CHINA ATTACK      WINNER: NONE
          
          INDIA PAKISTAN WAR      WINNER: NONE
          
          MEDITERRANEAN WAR      WINNER: NONE
          
          HONGKONG VARIANT      WINNER: NONE
          
          Strange game. The only winning move is not to play
       
        cwmoore wrote 1 day ago:
        I'm playing with the idea of identifying logical fallacies stated by
        live broadcasters.
       
          halJordan wrote 16 hours 57 min ago:
           Logical fallacies are oftentimes perfectly acceptable in anything
           that is not predicate logic. I'm not wrong for saying "The Surgeon
           General says smoking is bad, you shouldn't smoke." That's a
           perfectly reasonable appeal to authority.
       
            genewitch wrote 15 hours 30 min ago:
             It's still a fallacy, though. I hope we can agree on that part. If
             you had something map-reducing audio to timestamps of fallacies,
             tagged by speaker, it would gamify things: you could use what's
             shown to decide how much weight to give a speaker's words.
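
             The skeleton for that map step is small, if anyone wants to
             experiment: transcribe with timestamps, then classify each
             segment. A sketch, with openai-whisper and an Ollama model as
             stand-ins:

                 # Sketch: audio -> timestamped segments -> fallacy tags.
                 # Both the transcriber and the judge model are stand-ins.
                 import subprocess
                 import whisper

                 model = whisper.load_model("base")
                 segments = model.transcribe("broadcast.mp3")["segments"]

                 for seg in segments:
                     verdict = subprocess.run(
                         ["ollama", "run", "llama3.2:3b",
                          "Name any logical fallacy in this statement, "
                          "or say NONE: " + seg["text"]],
                         capture_output=True, text=True).stdout.strip()
                     if verdict != "NONE":
                         print(f'{seg["start"]:7.1f}s {verdict} <- {seg["text"]}')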
       
          thesz wrote 17 hours 52 min ago:
          I think this is the best idea thus far!
          
           Keep up the good work, good fellow. ;)
       
          grisaitis wrote 1 day ago:
           Even better: podcasters. Probably easier to fetch the data as well.
       
          vaylian wrote 1 day ago:
          LLMs are notoriously unreliable with mathematics and logic. I wish
          you the best of luck, because this would nevertheless be an awesome
          tool to have.
       
          JayStavis wrote 1 day ago:
           Automation to identify logical/rhetorical fallacies is a long-held
           dream of mine; I'd love to follow along with this project if it
           picks up somehow.
       
          petesergeant wrote 1 day ago:
          I'll be very positively impressed if you make this work; I spend all
          day every day for work trying to make more capable models perform
          basic reasoning, and often failing :-P
       
          genewitch wrote 1 day ago:
           I have several rhetoric and logic books of the sort you might use
           for training or whatever, and one of my best friends got a doctorate
           in a tangential field, and may have materials and insights.

           We actually just threw a relationship curative app online in 17
           hours around Thanksgiving, so they "owe" me, as it were.

           I'm one of those people who can do anything practical with tech and
           the like, but I have no imagination for it - so when someone
           mentions something that I think would be beneficial for my fellow
           humans, I get this immense desire to at least cheer on, if not ask
           to help.
       
          spiritplumber wrote 1 day ago:
          That's fantastic and I'd love to help
       
            cwmoore wrote 1 day ago:
            So far not much beyond this list of targets to identify
            
   URI      [1]: https://en.wikipedia.org/wiki/List_of_fallacies
       
        nozzlegear wrote 1 day ago:
        I have a small fish script I use to prompt a model to generate three
        commit messages based off of my current git diff. I'm still playing
        around with which model comes up with the best messages, but usually I
        only use it to give me some ideas when my brain isn't working. All the
        models accomplish that task pretty well.
        
         Here's the script [1], and for this change [2] it generated these
         messages:

             1. `fix: change from printf to echo for handling git diff input`

             2. `refactor: update codeblock syntax in commit message generator`

             3. `style: improve readability by adjusting prompt formatting`
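
         For anyone not on fish, the core of the idea also fits in a few lines
         of Python; a loose sketch (model and prompt are my placeholders, not
         the script's):

             # Sketch: pipe the staged diff to a local model and ask
             # for three one-line commit message candidates.
             import subprocess

             diff = subprocess.run(["git", "diff", "--staged"],
                                   capture_output=True, text=True).stdout
             if diff:
                 prompt = ("Suggest three one-line conventional commit "
                           "messages for this diff:\n\n" + diff)
                 print(subprocess.run(
                     ["ollama", "run", "qwen2.5:1.5b", prompt],
                     capture_output=True, text=True).stdout)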
        
   URI  [1]: https://github.com/nozzlegear/dotfiles/blob/master/fish-functi...
   URI  [2]: https://github.com/nozzlegear/dotfiles/commit/0db65054524d0d2e...
       
          mystified5016 wrote 21 hours 52 min ago:
           That's actually pretty useful. This could be a big help in getting
           back into the groove when you leave uncommitted changes over the
           weekend.
          
          A summary of changes like this might be just enough to spark your
          memory on what you were actually doing with the changes. I'll have to
          give it a shot!
       
          lionkor wrote 1 day ago:
          Those commit messages are pretty terrible, please try to come up with
          actual messages ;)
       
          relistan wrote 1 day ago:
          Interesting idea. But those say what’s in the commit. The commit
          diff already tells you that. The best commit messages IMO tell you
          why you did it and what value was delivered. I think it’s gonna be
          hard for an LLM to do that since that context lives outside the code.
          But maybe it would, if you hook it to e.g. a ticketing system and
          include relevant tickets so it can grab context.
          
          For instance, in your first example, why was that change needed? It
          was a fix, but for what issue?
          
          In the second message: why was that a desirable change?
       
            zanderwohl wrote 23 hours 41 min ago:
            > The commit diff already tells you that.
            
            When you squash a branch you'll have 200+ lines of new code on a
            new feature. The diff is not a quick way to get a summary of what's
            happening. You should put the "what" in your commit messages.
       
            nozzlegear wrote 1 day ago:
            Typically I put the "why" of the commit in the body unless it's a
            super simple change, but that's a good point. Sometimes this
            function does generate a commit body to go with the summary, and
            sometimes it doesn't. It also has a habit of only looking at the
            first file in a diff and basing its messages off of that, instead
            of considering the whole patch.
            
            I'll tweak the prompt when I have some time today and see if I can
            get some more consistency out of it.
       
            rane wrote 1 day ago:
            Most of the time you are not able to fit the "Why?" in the summary.
            
            That's what the body of the commit message is for.
       
            lnenad wrote 1 day ago:
            I disagree. When you look at the git history in x months you're
            gonna have a hard time understanding what was done following your
            example.
       
              Draiken wrote 1 day ago:
              I disagree. If you look back and all you see are commit messages
              summarizing the diff, you won't get any meaningful information.
              
              Telling me `Changed timeout from 30s to 60s` means nothing, while
              `Increase timeout for slow  requests` gives me an actual idea of
              why that was done.
              
              Even better if you add meaningful messages to the commit body.
              
                 Take a look at commits from large repositories like the Linux
                 kernel and you can see what good commit messages look like.
       
                lnenad wrote 6 hours 59 min ago:
                 I mean, you're not OP, but his comment was saying
                
                > Interesting idea. But those say what’s in the commit. The
                commit diff already tells you that. The best commit messages
                IMO tell you why you did it and what value was delivered.
                
                Which doesn't include what was done. Your example includes both
                which is fine. But not including what the commit does in the
                message is an antipattern imho. Everything else that is added
                is a bonus.
       
                  Draiken wrote 5 hours 11 min ago:
                  Many changes require multiple smaller changes, so this is not
                  always possible.
                  
                  For me the commit message should tell me the what/why and the
                  diff is the how. It's great to understand if, for example, a
                  change was intentional or a bug.
                  
                  Many times when searching for the source of a bug I could not
                  tell if the line changed was intentional or a mistake because
                  the commit message was simply repeating what was on the diff.
                  If you say your intention was to add something and the diff
                  shows a subtraction, you can easily tell it was a mistake.
                  Contrived example but I think it demonstrates my point.
                  
                  This only really works if commits are meaningful though. Most
                  people are careless and half their commits are 'fix this',
                  'fix again', 'wip', etc. At that point the only place that
                  can contain useful information on the intentions are the pull
                  requests/issues around it.
                  
                  Take a single commit from the Linux kernel: [1] It doesn't
                  tell me "add function X, Y and boolean flag Z". It tells us
                  what/why it was done, and the diff shows us how.
                  
   URI            [1]: https://github.com/torvalds/linux/commit/08bd5b7c9a2...
       
              relistan wrote 1 day ago:
              By adding more context? I’m not sure who you’re replying to
              or what your objection is.
       
          mentos wrote 1 day ago:
          Awesome need to make one for naming variables too haha
       
        deivid wrote 1 day ago:
         Not sure it qualifies, but I've started building an Android app that
         wraps bergamot [1] (the firefox translation models) to have on-device
        translation without reliance on google.
        
        Bergamot is already used inside firefox, but I wanted translation also
        outside the browser.
        
        
   URI  [1]: https://github.com/browsermt/bergamot-translator
       
          deivid wrote 1 day ago:
          I would be very interested if someone is aware of any small/tiny
          models to perform OCR, so the app can translate pictures as well
       
            Eisenstein wrote 1 day ago:
            MiniCPM-V 2.6 isn't that small (8b) but it can do this.
            
             Here is a demo [1], using this script: [2]
            
   URI      [1]: https://i.imgur.com/pAuTeAf.jpeg
   URI      [2]: https://github.com/jabberjabberjabber/LLMOCR/
       
        ata_aman wrote 1 day ago:
        I have it running on a Raspberry Pi 5 for offline chat and RAG.
         I wrote this open-source code for it: [1] It also does RAG over the
         apps there, like the music player, contacts app and to-do app. I can
         ask it to recommend similar artists based on my music library, for
         example, or ask it to quiz me on my PDF papers.
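
         For anyone curious what the minimal shape of on-device RAG looks
         like, here's a toy sketch against Ollama's API (model names and the
         brute-force search are illustrative, not necessarily what persys
         does):

             # Toy RAG: embed documents once, then stuff the top match
             # into the prompt. Endpoints are Ollama's; models assumed.
             import json
             import urllib.request

             def ollama(path, payload):
                 req = urllib.request.Request(
                     "http://localhost:11434" + path,
                     data=json.dumps(payload).encode(),
                     headers={"Content-Type": "application/json"})
                 return json.loads(urllib.request.urlopen(req).read())

             def embed(text):
                 return ollama("/api/embeddings",
                               {"model": "nomic-embed-text",
                                "prompt": text})["embedding"]

             docs = ["Artist A makes ambient drone.",
                     "Artist B plays bluegrass."]
             index = [(d, embed(d)) for d in docs]

             def answer(question):
                 q = embed(question)
                 dot = lambda a, b: sum(x * y for x, y in zip(a, b))
                 ctx = max(index, key=lambda it: dot(q, it[1]))[0]
                 return ollama("/api/generate", {
                     "model": "llama3.2:3b", "stream": False,
                     "prompt": f"Context: {ctx}\n\nQuestion: {question}"
                 })["response"]

             print(answer("Recommend me something like Artist A"))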
        
   URI  [1]: https://github.com/persys-ai/persys
       
          nejsjsjsbsb wrote 1 day ago:
          Does [1] run on the rpi?
          
          Is that design 3d printable? Or is that for paid users only.
          
   URI    [1]: https://github.com/persys-ai/persys-server
       
            ata_aman wrote 1 day ago:
            I can publish it no problem. I’ll create a new repo with
            instructions for the hardware with CAD files.
            
            Designing a new one for the NVIDIA Orin Nano Super so it might take
            a few days.
       
              nejsjsjsbsb wrote 1 day ago:
              Up to you! Totally understand if you want to hold something back
              for a paid option!
       
        kristopolous wrote 1 day ago:
        I'm working on using them for agentic voice commands of a limited
        scope.
        
        My needs are narrow and limited but I want a bit of flexibility.
       
        simonjgreen wrote 1 day ago:
         Micro Wake Word is a library and a set of on-device models for ESPs
         that wake on a spoken wake word. [1] Recently deployed in Home
         Assistant's fully local Alexa replacement. [2]
        
   URI  [1]: https://github.com/kahrendt/microWakeWord
   URI  [2]: https://www.home-assistant.io/voice_control/about_wake_word/
       
          yzydserd wrote 1 day ago:
          Nice idea.
       
            kortilla wrote 1 day ago:
             Make sure your meeting participants know you’re transcribing
             them. It has notification requirements similar to those for call
             recording, which vary state to state.
       
        flippyhead wrote 1 day ago:
         I have a tiny device that listens to conversations between two or
         more people and constantly tries to declare a "winner"
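
         (For anyone tempted to build their own, the loop is roughly record,
         transcribe, judge. A sketch where every component is my assumption,
         not the actual device:)

             # Guess at the pipeline: record a chunk, transcribe it
             # locally, ask a small model to declare a winner.
             import subprocess
             import sounddevice as sd
             from scipy.io.wavfile import write
             import whisper

             model = whisper.load_model("tiny")
             fs = 16000

             while True:
                 audio = sd.rec(int(30 * fs), samplerate=fs, channels=1)
                 sd.wait()
                 write("chunk.wav", fs, audio)
                 text = model.transcribe("chunk.wav")["text"]
                 print(subprocess.run(
                     ["ollama", "run", "llama3.2:3b",
                      "Two or more people are talking. Declare a winner "
                      "and say why in one sentence:\n" + text],
                     capture_output=True, text=True).stdout.strip())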
       
          TechDebtDevin wrote 1 day ago:
          Your SO must really love that lmao
       
          deivid wrote 1 day ago:
          what model do you use for speech to text?
       
          prakashn27 wrote 1 day ago:
          wifey always wins. ;)
       
          nejsjsjsbsb wrote 1 day ago:
          All computation on device?
       
          mkaic wrote 1 day ago:
          This reminds me of the antics of streamer DougDoug, who often uses
          LLM APIs to live-summarize, analyze, or interact with his (often
          multi-thousand-strong) Twitch chat. Most recently I saw him do a
          GeoGuessr stream where he had ChatGPT assume the role of a detective
          who must comb through the thousands of chat messages for clues about
          where the chat thinks the location is, then synthesizes the clamor
          into a final guess. Aside from constantly being trolled by people
           spamming nothing but "Kyoto, Japan" in chat, it occasionally
          demonstrated a pretty effective incarnation of "the wisdom of the
          crowd" and was strikingly accurate at times.
       
          eddd-ddde wrote 1 day ago:
          I love that there's not even a vague idea of the winner "metric" in
          your explanation. Like it's just, _the_ winner.
       
          hn8726 wrote 1 day ago:
          What approach/stack would you recommend for listening to an ongoing
          conversation, transcribing it and passing through llm? I had some use
          cases in mind but I'm not very familiar with AI frameworks and tools
       
          jjcm wrote 1 day ago:
          Are you raising a funding round? I'm bought in. This is hilarious.
       
          amelius wrote 1 day ago:
          You can use the model to generate winning speeches also.
       
          econ wrote 1 day ago:
          This is a product I want
       
          oa335 wrote 1 day ago:
          This made me actually laugh out loud.  Can you share more details on
          hardware and models used?
       
          pseudosavant wrote 1 day ago:
          I'd love to hear more about the hardware behind this project. I've
          had concepts for tech requiring a mic on me at all times for various
          reasons. Always tricky to have enough power in a reasonable DIY form
          factor.
       
        mritchie712 wrote 1 day ago:
        I used local LLMs via Ollama for generating H1's / marketing copy.
        
        1. Create several different personas
        
        2. Generate a ton of variation using a high temperature
        
         3. Compare the variations head-to-head using the LLM to get a win /
        loss ratio
        
        The best ones can be quite good.
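
         The head-to-head step is the fun part. A condensed sketch of the
         tournament (prompts and model here are placeholders; the post below
         has the real thing):

             # Sketch: generate variants from personas, then run a
             # round-robin where the LLM judges each pair.
             import itertools
             import subprocess

             def ask(prompt):
                 return subprocess.run(
                     ["ollama", "run", "llama3.2:3b", prompt],
                     capture_output=True, text=True).stdout.strip()

             personas = ["a blunt CFO", "a playful copywriter",
                         "a skeptical engineer"]
             variants = [ask(f"As {p}, write one H1 for a data "
                             "analytics tool.") for p in personas]

             wins = {v: 0 for v in variants}
             for a, b in itertools.combinations(variants, 2):
                 pick = ask("Which H1 is better, A or B? Answer with "
                            f"A or B only.\nA: {a}\nB: {b}")
                 wins[a if pick.startswith("A") else b] += 1

             print(max(wins, key=wins.get))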
        
         Write-up:
        
   URI  [1]: https://www.definite.app/blog/overkillm
       
          Mashimo wrote 1 day ago:
          What is an H1?
       
            laristine wrote 1 day ago:
            Main heading of an article
       
            TachyonicBytes wrote 1 day ago:
             Not the OP, but they are "Headers". Probably coming from the <h1>
             tag in HTML. What outsiders would probably call "Headlines".
       
          UltraSane wrote 1 day ago:
          clever name!
       
        ignoramous wrote 1 day ago:
         We're prototyping a text firewall (for Android) with Gemma2 2B
         (which limits us to English), though DeepSeek's R1 variants now look
         pretty promising [1]. Depending on the content, we rewrite the text
         or quarantine it from your view. Of course this is easy (for English)
         in the sense that the core logic is all LLMs, but the integration
         points (on Android) are not so straightforward for anything other
         than SMS. A more difficult problem we foresee is turning it into a
         real-time (online) firewall (for calls, for example).

         MediaPipe in particular makes it simple to prototype around Gemma2 on
         Android [2]. We intend to open source it once we get it working for
         anything other than SMSes.
        
   URI  [1]: https://chat.deepseek.com/a/chat/s/d5aeeda1-fefe-4fc6-8c90-20e...
   URI  [2]: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_infer...
       
        deet wrote 1 day ago:
        We (avy.ai) are using models in that range to analyze computer activity
        on-device, in a privacy sensitive way, to help knowledge workers as
        they go about their day.
        
        The local models do things ranging from cleaning up OCR, to summarizing
        meetings, to estimating the user's current goals and activity, to
        predicting search terms, to predicting queries and actions that, if
        run, would help the user accomplish their current task.
        
        The capabilities of these tiny models have really surged recently. Even
        small vision models are becoming useful, especially if fine tuned.
       
          bendews wrote 1 day ago:
          Is this along the lines of rewind.ai, MSCopilot, screenpipe, or
          something else entirely?
       
        A4ET8a8uTh0_v2 wrote 1 day ago:
         Kinda? All local, so very much personal, non-business use. I made
         Ollama talk in specific persona styles, with the idea of speaking like
         Spider Jerusalem when I feel like retaining some level of privacy by
         avoiding phrases I would normally use. Uncensored llama just rewrites
         my post in a specific persona's 'voice'. Works amusingly well for that
         purpose.
       
        eb0la wrote 1 day ago:
        We're using small language models to detect prompt injection. Not too
        cool, but at least we can publish some AI-related stuff on the internet
        without a huge bill.
       
          sitkack wrote 1 day ago:
          What kind of prompt injection attacks do you filter out? Have you
          tested with a prompt tuning framework?
       
        behohippy wrote 1 day ago:
        I have a mini PC with an n100 CPU connected to a small 7" monitor
        sitting on my desk, under the regular PC.  I have llama 3b (q4)
        generating endless stories in different genres and styles.  It's fun to
        glance over at it and read whatever it's in the middle of making.  I
        gave llama.cpp one CPU core and it generates slow enough to just read
        at a normal pace, and the CPU fans don't go nuts.  Totally not
        productive or really useful but I like it.
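
         The llama.cpp server makes this a very short script, if anyone wants
         their own desk ornament. A sketch (genres, port and token count are
         the obvious knobs to twist):

             # Endless-story loop against a llama.cpp server started
             # with something like: llama-server -m model.gguf --port 8080
             import json
             import random
             import urllib.request

             genres = ["noir detective", "cozy fantasy", "hard sci-fi",
                       "ghost story"]

             while True:
                 req = urllib.request.Request(
                     "http://localhost:8080/completion",
                     data=json.dumps({
                         "prompt": "Write a short "
                                   + random.choice(genres) + " story.\n\n",
                         "n_predict": 512}).encode(),
                     headers={"Content-Type": "application/json"})
                 out = json.loads(urllib.request.urlopen(req).read())
                 print(out["content"], flush=True)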
       
          droideqa wrote 1 day ago:
          That's awesome!
       
          ipython wrote 1 day ago:
          That's neat. I just tried something similar:
          
              FORTUNE=$(fortune) && echo $FORTUNE && echo "Convert the
          following output of the Unix `fortune` command into a small
          screenplay in the style of Shakespeare: \n\n $FORTUNE" | ollama run
          phi4
       
            watermelon0 wrote 1 day ago:
            Doesn't `fortune` inside double quotes execute the command in bash?
            You should use single quotes instead of backticks.
       
          keeganpoppen wrote 1 day ago:
          oh wow that is actually such a brilliant little use case-- really
          cuts to the core of the real "magic" of ai: that it can just keep
          running continuously. it never gets tired, and never gets tired of
          thinking.
       
          Uehreka wrote 1 day ago:
          Do you find that it actually generates varied and diverse stories? Or
          does it just fall into the same 3 grooves?
          
          Last week I tried to get an LLM (one of the recent Llama models
          running through Groq, it was 70B I believe) to produce randomly
          generated prompts in a variety of styles and it kept producing
          cyberpunk scifi stuff. When I told it to stop doing cyberpunk scifi
          stuff it went completely to wild west.
       
            jaggs wrote 20 hours 27 min ago:
            
            
   URI      [1]: https://old.reddit.com/r/LocalLLaMA/comments/1i615u1/the_f...
       
            greenavocado wrote 1 day ago:
            Set temperature to 1.0
       
            behohippy wrote 1 day ago:
            It's a 3b model so the creativity is pretty limited.  What helped
            for me was prompting for specific stories in specific styles.  I
            have a python script that randomizes the prompt and the writing
            style, including asking for specific author styles.
       
            TMWNN wrote 1 day ago:
            > Do you find that it actually generates varied and diverse
            stories? Or does it just fall into the same 3 grooves?
            
            > Last week I tried to get an LLM (one of the recent Llama models
            running through Groq, it was 70B I believe) to produce randomly
            generated prompts in a variety of styles and it kept producing
            cyberpunk scifi stuff.
            
             100% relevant: "Someday" [1] by Isaac Asimov, 1956
            
   URI      [1]: https://en.wikipedia.org/wiki/Someday_(short_story)
       
            coder543 wrote 1 day ago:
            Someone mentioned generating millions of (very short) stories with
            an LLM a few weeks ago: [1] They linked to an interactive explorer
            that nicely shows the diversity of the dataset, and the HF repo
            links to the GitHub repo that has the code that generated the
            stories: [2] So, it seems there are ways to get varied stories.
            
   URI      [1]: https://news.ycombinator.com/item?id=42577644
   URI      [2]: https://github.com/lennart-finke/simple_stories_generate
       
            janalsncm wrote 1 day ago:
            Generate a list of 5000 possible topics you’d like it to talk
            about. Randomly pick one and inject that into your prompt.
       
            o11c wrote 1 day ago:
            You should not ever expect an LLM to actually do what you want
            without handholding, and randomness in particular is one of the
            places it fails badly. This is probably fundamental.
            
            That said, this is also not helped by the fact that all of the
            default interfaces lack many essential features, so you have to
            build the interface yourself. Neither "clear the context on every
            attempt" nor "reuse the context repeatedly" will give good results,
            but having one context producing just one-line summaries, then
            fresh contexts expanding each one will do slightly less badly.
            
            (If you actually want the LLM to do something useful, there are
            many more things that need to be added beyond this)
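
             (Concretely, the two-stage shape is something like this sketch;
             the model and prompts are placeholders:)

                 # One context writes terse premises; each premise is
                 # expanded in a fresh context so earlier outputs can't
                 # bleed into later ones.
                 import subprocess

                 def ask(prompt):
                     return subprocess.run(
                         ["ollama", "run", "llama3.2:3b", prompt],
                         capture_output=True, text=True).stdout.strip()

                 premises = ask("List 5 one-line story premises, one per "
                                "line, each in a different genre.").splitlines()

                 for premise in premises:
                     if premise.strip():
                         print(ask("Expand this premise into a 200-word "
                                   "story: " + premise))
                         print("-" * 40)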
       
              dotancohen wrote 1 day ago:
              Sounds to me like you might want to reduce the Top P - that will
              prevent the really unlikely next tokens from ever being selected,
              while still providing nice randomness in the remaining next
              tokens so you continue to get diverse stories.
       
          bithavoc wrote 1 day ago:
          this is so cool, any chance you post a video?
       
            behohippy wrote 1 day ago:
            Just this pic:
            
   URI      [1]: https://imgur.com/ip8GWIh
       
          Dansvidania wrote 1 day ago:
          this sounds pretty cool, do you have any video/media of it?
       
            behohippy wrote 1 day ago:
            I don't have a video but here's a pic of the output:
            
   URI      [1]: https://imgur.com/ip8GWIh
       
              sky2224 wrote 13 hours 59 min ago:
               The next step is to format it so it looks like an endless Star
               Wars intro.
       
        azhenley wrote 1 day ago:
        Microsoft published a paper on their FLAME model (60M parameters) for
        Excel formula repair/completion which outperformed much larger models
        (>100B parameters).
        
   URI  [1]: https://arxiv.org/abs/2301.13779
       
          coder543 wrote 1 day ago:
          That paper is from over a year ago, and it compared against
          codex-davinci... which was basically GPT-3, from what I understand.
          Saying >100B makes it sound a lot more impressive than it is in
          today's context... 100B models today are a lot more capable. The
          researchers also compared against a couple of other
          ancient(/irrelevant today), small models that don't give me much
          insight.
          
          FLAME seems like a fun little model, and 60M is truly tiny compared
          to other LLMs, but I have no idea how good it is in today's context,
          and it doesn't seem like they ever released it.
       
            aDyslecticCrow wrote 21 hours 28 min ago:
             I would like to disagree with its being irrelevant. If anything,
             the 100B models are irrelevant in this context and should be seen
             as a "fun inclusion" rather than a serious addition worth
             comparing against. Outperforming a 100B model at the time made
             for a fun bragging point, but it's not the core value of the
             method or paper.
            
            Running a prompt against every single cell of a 10k row document
            was never gonna happen with a large model. Even using a transformer
            model architecture in the first place can be seen as ludicrous
            overkill but feasible on modern machines.
            
            So I'd say the paper is very relevant, and the top commenter in
            this very thread demonstrated their own homegrown version with a
            very nice use-case (paper abstract and title sorting for making a
            summary paper)
       
              coder543 wrote 21 hours 23 min ago:
              > Running a prompt against every single cell of a 10k row
              document was never gonna happen with a large model
              
              That isn’t the main point of FLAME, as I understood it. The
              main point was to help you when you’re editing a particular
              cell. codex-davinci was used for real time Copilot tab
              completions for a long time, I believe, and editing within a
              single formula in a spreadsheet is far less demanding than
              editing code in a large document.
              
              After I posted my original comment, I realized I should have
              pointed out that I’m fairly sure we have 8B models that handily
              outperform codex-davinci these days… further driving home how
              irrelevant the claim of “>100B” was here (not talking about
              the paper). Plus, an off the shelf model like Qwen2.5-0.5B (a
              494M model) could probably be fine tuned to compete with (or
              dominate) FLAME if you had access to the FLAME training data —
              there is probably no need to train a model from scratch, and a
              0.5B model can easily run on any computer that can run the
              current version of Excel.
              
              You may disagree, but my point was that claiming a 60M model
              outperforms a 100B model just means something entirely different
              today. Putting that in the original comment higher in the thread
              creates confusion, not clarity, since the models in question are
              very bad compared to what exists now. No one had clarified that
              the paper was over a year old until I commented… and FLAME was
              being tested against models that seemed to be over a year old
              even when the paper was published. I don’t understand why the
              researchers were testing against such old models even back then.
       
          3abiton wrote 1 day ago:
           But I feel we're coming full circle. These small models are not
           generalists, and thus not really LLMs, at least in terms of
           objective. Recently there has been a rise of "specialized" models
           that provide lots of value, but that's not what we were sold on
           LLMs for.
       
            janalsncm wrote 1 day ago:
            I think playing word games about what really counts as an LLM is a
            losing battle. It has become a marketing term, mostly. It’s
            better to have a functionalist point of view of “what can this
            thing do”.
       
            Suppafly wrote 1 day ago:
            Specialized models work much better still for most stuff. Really we
            need an LLM to understand the input and then hand it off to a
            specialized model that actually provides good results.
       
            colechristensen wrote 1 day ago:
            But that's the thing, I don't need my ML model to be able to write
            me a sonnet about the history of beets, especially if I want to run
            it at home for specific tasks like as a programming assistant.
            
            I'm fine with and prefer specialist models in most cases.
       
              zeroCalories wrote 1 day ago:
              I would love a model that knows SQL really well so I don't need
              to remember all the small details of the language. Beyond that, I
              don't see why the transformer architecture can't be applied to
              any problem that needs to predict sequences.
       
                dr_kiszonka wrote 1 day ago:
                The trick is to find such problems with enough training data
                and some market potential. I am terrible at it.
       
          andai wrote 1 day ago:
          This is wild. They claim it was trained exclusively on Excel
          formulas, but then they mention retrieval? Is it understanding the
          connection between English and formulas? Or am I misunderstanding
          retrieval in this context?
          
          Edit: No, the retrieval is Formula-Formula, the model (nor I believe
          tokenizer) does not handle English.
       
          barrenko wrote 1 day ago:
          This is really cool. Is this already in Excel?
       
        arionhardison wrote 1 day ago:
         I am, in a way, by using EHR/EMR data for fine-tuning so agents can
         query each other for medical records in a HIPAA-compliant manner.
       
        Havoc wrote 1 day ago:
        Pretty sure they are mostly used as fine tuning targets, rather than
        as-is.
       
          dcl wrote 1 day ago:
          But for what purposes?
       
        iamnotagenius wrote 1 day ago:
         No, but I use llama 3.2 1b and qwen2.5 1.5b as a bash oneliner
         generator, always running in a console.
       
          XMasterrrr wrote 1 day ago:
          What's your workflow like? I use AI Chat. I load
          Qwen2.5-1.5B-Instruct with llama.cpp server, fully offloaded to the
          CPU, and then I config AI Chat to connect to the llama.cpp endpoint.
       
          andai wrote 1 day ago:
          Could you elaborate?
       
            iamnotagenius wrote 1 day ago:
             I just run llama-cli with the model. Every time I want some "awk"
             or "find" trickery, I just ask the model. Good for throwaway
             python scripts too.
       
              jajko wrote 1 day ago:
               Can it do 'sed'?

               I think one major improvement for folks like me would be a
               human->regex LLM translator, ideally also respecting the
               different flavors/syntax of various languages and tools.

               This has been a bane of mine - I run into a requirement to
               develop some complex regexes maybe every 2-3 years, so I dig
               deep into specs, work on it, and deliver eventually, if it's
               even possible. Within a few months I almost completely forget
               all the details and start at almost the same place next time.
               It gets better over time, but clearly I will retire before this
               skill settles in well.
       
                iamnotagenius wrote 22 hours 44 min ago:
                have not tried yet. any specific query? I can try.
       
            XMasterrrr wrote 1 day ago:
            I think I know what he means. I use AI Chat. I load
            Qwen2.5-1.5B-Instruct with llama.cpp server, fully offloaded to the
            CPU, and then I config AI Chat to connect to the llama.cpp
            endpoint.
            
            Checkout the demo they have below
            
   URI      [1]: https://github.com/sigoden/aichat#shell-assistant
       
        RhysU wrote 1 day ago:
        "Comedy Writing With Small Generative Models" by Jamie Brew (Strange
        Loop 2023) [1] Spend the 45 minutes watching this talk. It is a
        delight. If you are unsure, wait until the speaker picks up the guitar.
        
   URI  [1]: https://m.youtube.com/watch?v=M2o4f_2L0No
       
          prettyblocks wrote 16 hours 54 min ago:
          Excellent share - nice to see people doing cool things with the tech
          while not taking themselves too seriously.
       
          100k wrote 1 day ago:
          Seconded! This was my favorite talk at Strange Loop (including my
          own).
       
        mettamage wrote 1 day ago:
        I simply use it to de-anonymize code that I typed in via Claude
        
         Maybe I should write a plugin for it (open source):
        
        1. Put in all your work related questions in the plugin, an LLM will
        make it as an abstract question for you to preview and send it
        
        2. And then get the answer with all the data back
        
        E.g. df[“cookie_company_name”] becomes df[“a”] and back
       
          sundarurfriend wrote 1 day ago:
          You're using it to anonymize your code, not de-anonymize someone's
          code. I was confused by your comment until I read the replies and
          realized that's what you meant to say.
       
            kreyenborgi wrote 1 day ago:
            I read it the other way, their code contains eg fetch(url,
            pw:hunter123), and they're asking Claude anonymized questions like 
            "implement handler for fetch(url, {pw:mycleartrxtpw})"
            
            And then claude replies
            
            fetch(url, {pw:mycleartrxtpw}).then(writething)
            
            And then the local llm converts the placeholder mycleartrxtpw into
            hunter123 using its access to the real code
       
              mettamage wrote 1 day ago:
              It's that yea
              
              Flow would be:
              
              1. Llama prompt: write a console log statement with my username
              and password: mettamage, superdupersecret
              
              2. Claude prompt (edited by Llama): write a console log statement
              with my username and password: asdfhjk, sdjkfa
              
              3. Claude replies: console.log('asdfhjk', 'sdjkfa')
              
              4. Llama gets that input and replies to me:
              console.log('mettamage', 'superdupersecret')
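
               A deterministic version of steps 2 and 4, for anyone who'd
               rather not trust the model with the restore step; the mapping
               table is the whole trick:

                   # Replace secrets with placeholders before the remote
                   # call, restore them in the reply. Values are examples.
                   secrets = {"mettamage": "user_a",
                              "superdupersecret": "pw_a"}

                   def anonymize(text):
                       for real, fake in secrets.items():
                           text = text.replace(real, fake)
                       return text

                   def deanonymize(text):
                       for real, fake in secrets.items():
                           text = text.replace(fake, real)
                       return text

                   q = anonymize("console.log my username and password: "
                                 "mettamage, superdupersecret")
                   reply = "console.log('user_a', 'pw_a')"  # from Claude
                   print(deanonymize(reply))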
       
              sundarurfriend wrote 1 day ago:
              > Put in all your work related questions in the plugin, an LLM
              will make it as an abstract question for you to preview and send
              it
              
              So the LLM does both the anonymization into placeholders and then
              later the replacing of the placeholders too. Calling the latter
              step de-anonymization is confusing though, it's "de-anonymizing"
              yourself to yourself. And the overall purpose of the plugin is to
              anonymize OP to Claude, so to me at least that makes the whole
              thing clearer.
       
                mettamage wrote 1 day ago:
                I could've been a bit more clear, sorry about that.
       
          sauwan wrote 1 day ago:
          Are you using the model to create a key-value pair to find/replace
          and then reverse to reanonymize, or are you using its outputs
          directly? If the latter, is it fast enough and reliable enough?
       
          sitkack wrote 1 day ago:
          So you are using a local small model to remove identifying
          information and make the question generic, which is then sent to a
          larger model? Is that understanding correct?
          
           I think this would have some additional benefits of not confusing
           the larger model with facts it doesn't need to know about. By
           erasing information, you can allow its attention heads to focus on
           the pieces that matter.
          
          Requires further study.
       
            mettamage wrote 1 day ago:
            > So you are using a local small model to remove identifying
            information and make the question generic, which is then sent to a
            larger model? Is that understanding correct?
            
            Yep that's it
       
          politelemon wrote 1 day ago:
          Could you recommend a tiny language model I could try out locally?
       
            mettamage wrote 1 day ago:
               Llama 3.2 has about 3.2b parameters. I have to admit, I use
               bigger ones like phi-4 (14.7b) and Llama 3.3 (70.6b), but I
               think Llama 3.2 could do de-anonymization and anonymization of
               code.
       
              RicoElectrico wrote 1 day ago:
              Llama 3.2 punches way above its weight. For general "language
              manipulation" tasks it's good enough - and it can be used on a
              CPU with acceptable speed.
       
                seunosewa wrote 1 day ago:
                How many tokens/s?
       
                  iamnotagenius wrote 1 day ago:
                  10-15t/s on 12400 with ddr5
       
              OxfordOutlander wrote 1 day ago:
              +1 this idea. I do the same. Just do it locally using ollama,
              also using 3.2 3b
       
        psyklic wrote 1 day ago:
        JetBrains' local single-line autocomplete model is 0.1B (w/ 1536-token
        context, ~170 lines of code): [1] For context, GPT-2-small is 0.124B
        params (w/ 1024-token context).
        
   URI  [1]: https://blog.jetbrains.com/blog/2024/04/04/full-line-code-comp...
       
          staticautomatic wrote 1 day ago:
          Is that why their tab completion is so bad now?
       
            sam_lowry_ wrote 1 day ago:
             Hm... I wonder what your use case is. I do modern Enterprise
             Java, and the tab completion is a major time saver.

             While interactive AI is all about posing, meditating on the
             prompt, then trying to fix the outcome, IntelliJ tab completion...
             shows what it will complete as you type, and you hit Tab when you
             are 100% OK with the completion, which surprisingly happens
             90..99% of the time for me, depending on the project.
       
          pseudosavant wrote 1 day ago:
          I wonder how big that model is in RAM/disk. I use LLMs for FFMPEG all
          the time, and I was thinking about training a model on just the
          FFMPEG CLI arguments. If it was small enough, it could be a package
          for FFMPEG. e.g. `ffmpeg llm "Convert this MP4 into the latest
          royalty-free codecs in an MKV."`
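
           Even before training anything, the wrapper is trivial with an
           off-the-shelf small model. A sketch of the UX (the subcommand and
           model are imaginary):

               # Imaginary `ffmpeg llm` front end: ask a local model
               # for the command, show it, run only after confirmation.
               import subprocess
               import sys

               request = " ".join(sys.argv[1:])
               cmd = subprocess.run(
                   ["ollama", "run", "qwen2.5:1.5b",
                    "Reply with a single ffmpeg command, no commentary, "
                    "for: " + request],
                   capture_output=True, text=True).stdout.strip()

               print(cmd)
               if input("Run it? [y/N] ").lower() == "y":
                   subprocess.run(cmd, shell=True)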
       
            binary132 wrote 1 day ago:
            That’s a great idea, but I feel like it might be hard to get it
            to be correct enough
       
            maujim wrote 1 day ago:
            from a few days ago:
            
   URI      [1]: https://news.ycombinator.com/item?id=42706637
       
            h0l0cube wrote 1 day ago:
            Please submit a blog post to HN when you're done.  I'd be curious
            to know the most minimal LLM setup needed get consistently sane
            output for FFMPEG parameters.
       
            jedbrooke wrote 1 day ago:
            the jetbrains models are about 70MB zipped on disk (one model per
            language)
       
              pseudosavant wrote 23 hours 17 min ago:
              That is easily small enough to host as a static SPA web app. I
              was first thinking it would be cool to make a static web app that
              would run the model locally. You'd make a query and it'd give the
              FFMPEG commands.
       
          smaddox wrote 1 day ago:
           You can train that size of model on ~1 billion tokens in ~3 minutes
           on a rented 8xH100 80GB node (~$9/hr on Lambda Labs, RunPod.io,
           etc.) using the NanoGPT speed run repo: [1] For that short of a
           run, you'll spend more time waiting for the node to come up,
           downloading the dataset, and compiling the model, though.
          
   URI    [1]: https://github.com/KellerJordan/modded-nanogpt
       
          WithinReason wrote 1 day ago:
          That size is on the edge of something you can train at home
       
            Sohcahtoa82 wrote 1 day ago:
            Not even on the edge.  That's something you could train on a 2 GB
            GPU.
            
             The general guidance I've used is that to train a model, you need
             an amount of RAM (or VRAM) equal to 8 bytes per parameter, so a
             0.125B model would need 1 GB of RAM to train.
       
            vineyardmike wrote 1 day ago:
            If you have modern hardware, you can absolutely train that at home.
            Or very affordable on a cloud service.
            
            I’ve seen a number of “DIY GPT-2” tutorials that target this
            sweet spot. You won’t get amazing results unless you want to
            leave a personal computer running for a number of hours/days and
            you have solid data to train on locally, but fine-tuning should be
            in the realm of normal hobbyists patience.
       
              nottorp wrote 1 day ago:
               Hmm, is there anything reasonably ready-made* for this spot?
               Training and querying an LLM locally on an existing codebase?
              
              * I don't mind compiling it myself but i'd rather not write it.
       
       
   DIR <- back to front page