_______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (unofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   Go hard on agents, not on your filesystem
       
       
        volume_tech wrote 3 hours 52 min ago:
        The filesystem sandboxing problem is real but the browser version of
        this is arguably worse. A coding agent that escapes its sandbox can
        delete files — bad, but recoverable from git. A browser agent with
        access to your real authenticated sessions can click "transfer" on your
        bank, accept terms on a contract, or send emails as you. And unlike
        filesystem paths, you can't easily whitelist which URLs or actions are
        safe — the agent needs broad access to be useful.
        
        The capabilities-based approach mentioned downthread is probably the
        right direction for both. Instead of trying to blacklist dangerous
        operations, give the agent narrow capabilities: "you can read this page
        but not click submit buttons" or "you can navigate these 5 domains."
        The hard part is that useful browser automation almost always requires
        the dangerous capabilities (filling forms, clicking buttons,
        authenticated sessions).
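        
        A minimal sketch of the "you can navigate these 5 domains" capability,
        in Python. The domain names and the is_navigation_allowed helper are
        hypothetical, and a real browser agent would need this enforced in the
        browser or a proxy rather than in the agent's own process.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the domains are placeholders for illustration.
ALLOWED_DOMAINS = frozenset({"example.com", "docs.example.org"})


def is_navigation_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True only for https URLs whose host is an allowed domain
    or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    # Match the exact domain or a dot-separated subdomain, so that
    # "evil-example.com" does not pass as "example.com".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

        This only covers navigation. It says nothing about which buttons the
        agent may click once it is on an allowed page, which is the harder
        part described above.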
       
        georaa wrote 5 hours 52 min ago:
        Everyone talks about sandboxing the filesystem but nobody talks about
        what happens when the agent's work outlives the container. Reset
        happens, state is gone, you start over. I've lost more agent work to
        session timeouts than to any security issue. Isolation without
        persistence just means you lose progress safely.
       
        game_the0ry wrote 6 hours 20 min ago:
        I may be paranoid, but I only run my AI CLI tools in a VPS. I have
        them installed locally but never use them. In a VPS I go full yolo mode
        bc I do not care about it. It is a slightly more cumbersome workflow,
        but if you have dev + staging envs, then you never have to develop
        and run stuff locally, which brings the local hardware requirements and
        costs down too (bc you can develop with a base macbook neo).
       
        otterley wrote 7 hours 2 min ago:
        "jai is free software, brought to you by the Stanford Secure Computer
        Systems research group and the Future of Digital Currency Initiative"
        
        I guess the "Future of Digital Currency Initiative" had to pivot to a
        more useful purpose than studying how Bitcoin is going to change the
        world.
       
        mehdibl wrote 7 hours 5 min ago:
        Docker is hard to set up.
        The author made a nice solution, but I'm not sure he knows about
        devcontainers and what they can do.
        You do the setup once and you roll in most dev tools.
        I'm still surprised that the effort people put into such solutions
        ignores the dev's core requirements, like sharing the env they use in a
        simple way.
        You use it to have a custom env and to isolate the agent.
        You want to persist your credentials? Mount the target folder from home
        or sl into a sub folder.
        Might be a knowledge gap.
        It works on Linux, or even Windows/Mac, as long as you don't need a
        full desktop.
        Devcontainer is simple. A standard that works. And it's very mature.
       
          sleepytree wrote 6 hours 44 min ago:
          I'm surprised from reading these comments that more people aren't
          chiming in to ask why this solution is better than a dev container.
          That seems like the obviously best way to set up security boundaries
          that don't require you to still trust that AI will do what you ask
          it. You can run it remotely and it's portable etc.
       
        ma2kx wrote 7 hours 18 min ago:
        It's a bit annoying that there are so many solutions to run agents and
        sandbox them but no established best practice. It would be nice to have
        some high level orchestration tools like docker / podman where you can
        configure how e.g. claude code, opencode, codex, openclaw run in open
        Shell, OCI container, jai etc.
        
        Especially because everybody can ask chatgpt/claude how to run some
        agents without any further knowledge I feel we should handle it more
        like we are handling encryption where the advice is to use established
        libraries and don't implement those algorithms by yourself.
       
        jimmar wrote 7 hours 22 min ago:
        From the home page:
        
        > Stop trusting blindly
        
        > One-line installer scripts,
        
        Here are the manual install instructions from the "Install / Build"
        page:
        
        > curl -L [1] | tar xzf -
        
        > cd jai
        
        > makepkg -i
        
        So, trust their jai tool, but not _other_ installer scripts?
        
   URI  [1]: https://aur.archlinux.org/cgit/aur.git/snapshot/jai.tar.gz
       
          mazieres wrote 4 hours 1 min ago:
          Yes, unpacking a tar file is much safer than piping arbitrary code to
          bash!  You can look at the PKGBUILD in the directory--it is only 30
          lines long and mostly variable assignments.  The build/check/package
          functions are 7 lines of code total.  Compare that to something like
          rustup (910 lines of code), claude (158 lines), or opencode (460
          lines).
       
          da_chicken wrote 7 hours 19 min ago:
          No, no, see this is untrustworthy:
          
            curl -L https://aur.archlinux.org/cgit/aur.git/snapshot/jai.tar.gz \
              | tar xzf - && cd jai && makepkg -i
       
        maxbeech wrote 8 hours 8 min ago:
        the safety concerns compound significantly when you move from
        interactive to unattended execution. in interactive mode you can catch
        a bad command before it completes. run the same agent on a schedule at
        3am with no one watching and there's no fallback.
        
        i built something that schedules claude code jobs to run in the
        background (openhelm.ai). the layered approach we use: separate OS user
        account with only project directory write access, claude's native
        seatbelt/bubblewrap sandboxing, and a mandatory plan review step before
        any job's first run. you can't approve every individual action at
        runtime, but you can approve the shape of the plan upfront - which
        catches most of the scary stuff.
        
        the paper's point about clean agent-specific filesystem abstractions
        resonates. the scope definition problem (what exactly should this agent
        be able to touch?) is actually the hard part - enforcement is
        relatively mechanical once you've answered that. and for scheduled
        workloads, answering that question explicitly at job creation time
        forces the kind of thinking that prevents the 3am disasters.
       
        pkulak wrote 8 hours 43 min ago:
        Installation is a bit... unsupported unless you're on Arch. Here's a
        Nix setup I (and Claude!) came up with: [1] Arg, annoying that it puts
        its config right in my home folder...
        
        EDIT: Actually, I'm having a heck of a time packaging this properly.
        Disregard for now!
        
        EDIT2: It was a bit more complicated than a single derivation. Had to
        wrap it in a security wrapper, and patch out some stuff that doesn't
        work on the 25.11 kernel.
        
   URI  [1]: https://github.com/pkulak/nix/tree/main/common/jai
       
        Myzel394 wrote 9 hours 4 min ago:
        What's the difference between this and agent-safehouse?
       
        youknownothing wrote 9 hours 18 min ago:
        This is a great time for Apple to relaunch their Time Machine devices,
        have a history of everything in your file system because sooner or
        later some AI is going to delete it...
       
        micimize wrote 9 hours 51 min ago:
        This is very cool - I try to have a container-centric setup but
        sometimes YOLOcal clauding is too tempting.
        
        My biggest question skimming over the docs is what a workflow for
        reviewing and applying overlay changes to the out-of-cwd dirs would be.
        
        Also, a bit tangential, but if anyone has slightly more in-depth
        resources for grasping the security trade-offs between these kinds of
        Linux-leveraging sandboxes, containers, and remote VMs, I'd appreciate
        it. The author here implies containers are still more secure in
        principle, and my intuition is that there are simply fewer unknowns
        from my perspective, but I don't have a firm understanding.
        
        Anyhow, kudos to the author again, looks useful.
       
        RodMiller wrote 10 hours 8 min ago:
        Sandboxing and verification are two different things. Sandboxing
        answers what can this agent touch. Verification answers what does it
        actually do with what it touches. Even inside a perfect jail, the agent
        can still hallucinate, exfiltrate data over the network, or fold the
        second you push back on its answer.
        
        I've been building an independent benchmarking platform for AI agents.
        The two approaches are complementary. Sandbox the environment, verify
        the agent.
       
        mark_l_watson wrote 10 hours 32 min ago:
        Looks good, but only Linux is supported. I like spinning up VPSes and
        then discarding them when I am done. On macOS, something I haven't
        tried yet but plan to: create a separate user account.
       
        hoppp wrote 10 hours 49 min ago:
        Something like FreeBSD jails would be perfect for agents.
       
        vijucat wrote 11 hours 5 min ago:
        Well, I'm on Windows (+ Cygwin) and wrote a Dockerfile. It wasn't that
        hard. git branch + worktree + a docker container per project and I can
        work with copilot in --yolo mode (or claude
        --dangerously-skip-permissions, whichever). vscode is pretty smooth at
        installing the VS Code Server on first connection to a docker
        container, too, and I just open up the workspace in a minute.
       
        hiq wrote 11 hours 34 min ago:
        Is there already some more established setup to do "secure" development
        with agents, as in, realistically no chance it would compromise the
        host machine?
        
        E.g. if I have a VM to which I grant only access to a folder with some
        code (let's say open-source, and I don't care if it leaks) and to the
        Internet, if I do my agent-assistant coding within it, it will only
        have my agent credentials it can leak. Then I can do git operations
        with my credentials outside of the VM.
        
        Is there a more convenient setup than this, which gives me similar
        security guarantees? Does it come with the paid offerings of the top
        providers? Or is this still something I'd have to set up separately?
       
        Bender wrote 11 hours 42 min ago:
        I would have to be very inebriated to give a bot/agent access to my
        files, and all my security clearance should be revoked if I did. But
        should I do that, it would have to be under mandatory access controls
        that my unprivileged user has no influence over, not even with sudo or
        doas. The LSM-enforced rules (SELinux, AppArmor, TOMOYO, or other newer
        or simpler LSMs) would restrict everything by default and give explicit
        read, write, and execute permissions to specific files or directories.
        
        The bot should also be instructed that it gets 3 strikes before being
        removed, meaning it should generate a report of what it believes it
        wants access to and get verbal approval or denial. That should not
        be so difficult with today's bots. If it wants to act like a human,
        then it gets simple rules like a human: ask the human operator for
        permission. If the bot starts "doing its own thing, aka going rogue",
        then it gets punished. Perhaps another bot needs to act as a
        dominatrix to be a watcher over the assistant bot.
       
        driverdan wrote 11 hours 52 min ago:
        Are there any similar ways of isolating environment variables, secrets,
        and credentials? Everyone is thinking about the file system but I
        haven't seen as much discussion about exposing secrets and account
        access.
       
        docmars wrote 12 hours 12 min ago:
        Jai is the name of a programming language, no?
       
        imranstrive7 wrote 12 hours 26 min ago:
        I tried something similar while building my tool site — biggest issue
        was SEO indexing. Fixed it by improving internal linking instead of
        relying on sitemap.
       
        MagicMoonlight wrote 12 hours 28 min ago:
        This site was definitely slopcoded with Claude. They have a real
        distinctive look.
       
        holtwick wrote 12 hours 32 min ago:
        Inspired by this tool I wrote something that fits macOS better. It uses
        the native sandbox-exec from Apple and can wrap other apps as well,
        like VSCode in which you usually run AI stuff.
        
   URI  [1]: https://github.com/holtwick/bx-mac
       
        love2read wrote 12 hours 49 min ago:
        Is there an equivalent for macOS?
       
        Game_Ender wrote 13 hours 23 min ago:
        Where is the network isolation? I want to be able to limit
        what external resources the agent can access and also inject secrets at
        request time so the agent doesn't have access to them.
        
        File system isolation is easy now; it’s not worth HN front page space
        for the nth version. It’s a solved problem (and now included in
        Claude Code).
       
        thedelanyo wrote 13 hours 40 min ago:
        Most of what we're doing with AI today, we've been doing just fine
        without any confusion.
        
        I've been struggling to find what AI has intrinsically solved that is
        new and gives us the chance to completely change workflows, other than
        these weird things occurring.
       
        boutell wrote 14 hours 0 min ago:
        Plain old Unix permissions can get it done. One account for you, one
        account for AI. A shared folder belonging to a group that both are in.
        umask and setgid to get the story right for new files.
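        
        A rough sketch of the shared-folder part, in Python. The
        make_shared_dir helper and its group argument are hypothetical; it
        assumes the shared group already exists and that both accounts are
        members of it.

```python
import os
import shutil
import stat
from typing import Optional


def make_shared_dir(path: str, group: Optional[str] = None) -> None:
    """Create a group-writable directory with the setgid bit set, so new
    files and subdirectories inherit the directory's group."""
    os.makedirs(path, exist_ok=True)
    if group is not None:
        # Hand the directory to the shared group (caller must be allowed to).
        shutil.chown(path, group=group)
    # rwx for owner and group, nothing for others, plus setgid (mode 2770).
    os.chmod(path, stat.S_ISGID | 0o770)
```

        Each account would also run with a umask such as 007 (for example
        os.umask(0o007) in a launcher script), so that files either side
        creates stay group-writable and closed to everyone else.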
        
   URI  [1]: https://apostrophecms.com/blog/how-to-be-more-productive-with-...
       
        mbravorus wrote 14 hours 27 min ago:
        or you can just run nanoclaw for isolation by default?
        
   URI  [1]: https://nanoclaw.dev
       
        ontouchstart wrote 14 hours 28 min ago:
        AI safety is just like any technology safety: you can’t bubble wrap
        everything. Think about the early days of electricity: it was deadly
        (and still is), but we have proper insulation, industry standards,
        and regulations, plus common sense and human learning. We are safe
        (most of the time).
        
        This also applies to the first technology human beings developed:
        fire.
       
        Aldipower wrote 14 hours 32 min ago:
        $ lxc exec claude bash
        
        Easy :-)
        lxd/lxc containers are much much underrated. Works only with Linux
        though.
       
        jqbd wrote 14 hours 40 min ago:
        Would like to see something more comprehensive built on ZFS and FreeBSD
        jails. Namely snapshot/checkpoint before each prompt, quick undo for
        changes made by the agent, auto-delete of old snapshots, etc.
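        
        The "auto delete old snapshots" part could look something like this
        sketch in Python. It only decides which snapshots to prune; the
        (name, creation time) tuples and the snapshots_to_prune helper are
        hypothetical, and creating or destroying the snapshots themselves is
        left to whatever snapshot tooling is in use.

```python
from datetime import datetime
from typing import List, Tuple

Snapshot = Tuple[str, datetime]  # (snapshot name, creation time)


def snapshots_to_prune(snapshots: List[Snapshot], keep: int) -> List[str]:
    """Return the names of all but the `keep` newest snapshots, oldest first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(snapshots, key=lambda s: s[1])
    cutoff = max(len(ordered) - keep, 0)
    return [name for name, _ in ordered[:cutoff]]
```

        Keeping the newest few per-prompt snapshots preserves the quick undo
        while bounding disk use.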
       
        te_chris wrote 14 hours 50 min ago:
        This looks nice, but on mac you can virtualise really easily into
        microvms now with [1] .
        
        I've built my own cli that runs the agent + docker compose (for the app
        stack) inside container for dev and it's working great. I love
        --dangerously-skip-permissions. There's 0 benefit to us whitelisting
        the agent while it's in flight.
        
        Anthropic's new auto mode looks like an untrustworthy solution in
        search of a problem - as an aside. Not sure who thought security == ml
        classification layer but such is 2026.
        
        If you're on linux and have kvm, there's Lima and Colima too.
        
   URI  [1]: https://github.com/apple/container
       
        GistNoesis wrote 15 hours 14 min ago:
        TLDR: It's easy : LLM outputs are untrusted. Agents by virtue of
        running untrusted inputs are malware. Handle them like the malware they
        are.
        
        >>> "While this web site was obviously made by an LLM"
        So I am expected to trust the LLM-written security model [1]. These
        guys are experts from a prestigious academic institution, leading
        "Secure Computer Systems", whose logo is a 7-branch red star, which
        looks like a devil head, with white palm trees in the background. They
        are also chilling for some Blockchain research, and future digital
        currency initiative, taking funding from DARPA.
        
        The website also points towards external social networks for reference
        to freely spread Fear Uncertainty Doubt.
        
        So these guys are saying, go on run malware on your computer but do so
        with our casual sandbox at your own risk.
        
        Remember until yesterday Anthropic aka Claude was officially a supply
        chain risk.
        
        If you want to experiment with agents safely (you probably can't), I
        recommend building them from the ground up (to be clear I recommend you
        don't but if you must) by writing the tools the LLM is allowed to use,
        yourself, and by determining at each step whether or not you broke the
        security model.
        
        Remember that everything which comes from an LLM is untrusted. You'll be
        tempted to vibe-code your tools. The LLMs will try to make you install
        some external dependencies, which you must review before deciding
        whether to trust them.
        
        Because everything produced by the LLM is untrusted, sharing the
        results is risky. A good starting point is to have the LLM produce a
        single-page HTML file. Serve this static page from a webserver on an
        external server (like GitHub Pages under a new handle, if you can't
        afford a VPS), relying on the Same Origin Policy to prevent the page
        from accessing your files and network. This way you rely on your
        browser sandbox to keep you safe, and you are as safe as when visiting
        a malware-infested page on the internet.
        
        If you are afraid of writing tools you can start by copy-pasting, and
        reading everything produced.
        
        Once you write tools, you'll want to have them run autonomously in a
        runaway loop taking user feedback or agent feedback as input. But even
        if everything is contained, these runaway loops can and will produce
        harmful content in your name.
        
        Here is one such vibe-coded experiment I did a few days ago [2]: a
        simple 2D physics water-molecule simulation for educational purposes.
        It is not physically accurate, and still has some bugs and regressions
        between versions. Good enough to be harmful.
        
   URI  [1]: https://jai.scs.stanford.edu/security.html
   URI  [2]: https://news.ycombinator.com/item?id=47510746
       
        wafflemaker wrote 15 hours 34 min ago:
        Sorry if this question is stupid, (I'm not even using Claude*), but why
        can't people run Claude/other coding agent in a container and only
        mount the project directory to the container?
        
        *I played with codex a few months ago, but I don't even work in IT.
       
        torarnv wrote 15 hours 41 min ago:
        I’m using [1] for this, which runs Claude’s Bash tool on a remote
        machine but leaves Claude running locally otherwise.
        
        I’ve found it to be a good balance for letting Claude loose in a VM
        running the commands it wants while having all my local MCPs and tools
        still available.
        
   URI  [1]: https://github.com/torarnv/claude-remote-shell
       
        mixedbit wrote 15 hours 47 min ago:
        I work on a sandboxing tool similarly based on the idea of pointing the
        user's home dir to a separate location ( [1] ). I experimented with
        using overlayfs to isolate changes to the filesystem, and it worked
        well as a proof-of-concept, but the overlayfs specification is quite
        restrictive about how it can be mounted, to prevent undefined behavior.
        
        I wonder if and how jai managed to address these limitations of
        overlayfs. Basically, the same dir should not be mounted as an
        overlayfs upper layer by different overlayfs mounts. If you run 'jai
        bash' twice in different terminals, do the two instances get two
        different writable home dir overlays, or the same one? In the second
        case, does the second 'jai bash' command join the mount namespace of
        the first one, or create a new one with the same shared upper dir?
        
        This limitation of overlays is described here: [2] :
        
        'Using an upper layer path and/or a workdir path that are already used
        by another overlay mount is not allowed and may fail with EBUSY. Using
        partially overlapping paths is not allowed and may fail with EBUSY. If
        files are accessed from two overlayfs mounts which share or overlap the
        upper layer and/or workdir path, the behavior of the overlay is
        undefined, though it will not result in a crash or deadlock.'
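        
        One way to stay within that rule is to give every sandbox instance its
        own upper and work directories. The Python sketch below illustrates
        that idea only; it is not how jai (or the tool in [1]) is known to
        work, and the allocate_overlay_dirs and overlay_mount_options helpers
        are hypothetical. The lowerdir, upperdir, and workdir option names are
        the ones from the kernel documentation in [2].

```python
import os
import tempfile
from typing import Tuple


def allocate_overlay_dirs(base: str) -> Tuple[str, str]:
    """Create a fresh upperdir/workdir pair under `base` for one mount."""
    os.makedirs(base, exist_ok=True)
    # mkdtemp creates a uniquely named directory, so two concurrent
    # callers never receive the same upper or work path.
    instance = tempfile.mkdtemp(prefix="instance-", dir=base)
    upper = os.path.join(instance, "upper")
    work = os.path.join(instance, "work")
    os.mkdir(upper)
    os.mkdir(work)
    return upper, work


def overlay_mount_options(lower: str, upper: str, work: str) -> str:
    """Format the option string for `mount -t overlay -o ...`."""
    return f"lowerdir={lower},upperdir={upper},workdir={work}"
```

        The trade-off is that two shells started this way do not see each
        other's writes, which is exactly the "two different overlays" case
        asked about above.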
        
   URI  [1]: https://github.com/wrr/drop
   URI  [2]: https://docs.kernel.org/filesystems/overlayfs.html
       
        Ciantic wrote 16 hours 26 min ago:
        I've been using podman, and for me it is good enough. The way I use it,
        I mount the current working directory, /usr/bin, /bin, /usr/lib,
        /usr/lib64, /usr/share, then a few specific directories such as
        ~/.aspnet, ~/.dotnet, ~/.npm-global etc. I use the same image as my
        operating system (Fedora 43).
        
        It works pretty well: whichever agent I choose to run can only see and
        write to the current working directory (and subdirectories) as well as
        those pnpm/npm etc. software development files. It cannot access
        anything in my home directory other than the mounted directories.
        
        Now some evil command could in theory write some commands to those
        shared ~/.npm-global directories, which I then inadvertently run
        outside the container, but that is pretty unlikely.
       
        r0l1 wrote 16 hours 30 min ago:
        Just use DevContainers. Can't understand people letting AI go wild on
        their systems...
       
        georaa wrote 16 hours 32 min ago:
        Filesystem containment solves one half of the blast radius problem. The
        other half is external state - agent hits a payment API, writes to a
        database, sends an email. Copy-on-write overlays can't roll that back.
        I've seen agents make 40 duplicate API calls because they crashed
        mid-task and retried from scratch with no deduplication. The filesystem
        was fine. The downstream systems were not. The hard version of this
        problem is making agent operations idempotent across external calls,
        not just safe locally.
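        
        A minimal sketch of that deduplication, in Python. The
        IdempotentCaller class and the send callback are hypothetical; the
        plain dict stands in for a store that would have to survive agent
        restarts (a file or database) to be of any use.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentCaller:
    """Run each distinct (operation, payload) at most once per store."""

    def __init__(self, store: Dict[str, Any]):
        # In real use `store` must be durable; a dict only shows the lookup.
        self._store = store

    @staticmethod
    def key_for(operation: str, payload: Dict[str, Any]) -> str:
        # Canonical JSON so equal payloads always hash to the same key.
        body = json.dumps({"op": operation, "payload": payload}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def call(self, operation: str, payload: Dict[str, Any],
             send: Callable[[Dict[str, Any]], Any]) -> Any:
        key = self.key_for(operation, payload)
        if key in self._store:
            return self._store[key]  # replay the recorded result
        result = send(payload)
        self._store[key] = result
        return result
```

        A crash between send and the store write can still produce a
        duplicate, so this is not a complete answer. Where the downstream API
        accepts a client-supplied idempotency key, passing the same key along
        lets the server do the deduplication instead.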
       
        bob1029 wrote 16 hours 38 min ago:
        I've been running GPT5.x fully unconstrained with effective local admin
        shell for over $500 worth of API tokens. Not once has it done something
        I'd consider "naughty".
        
        It has left my project in a complete mess, but never my entire
        computer.
        
          git reset --hard && git clean -fd 
        
        That's all it takes.
        
        I think this is turning into a good example of security theatrics. If
        the agent was actually as nefarious as the marketing here suggests, the
        solution proposed is not adequate. No solution is. Not even a separate
        physical computer. We need to be honest about the size of this problem.
        
        Alternatively, maybe Claude is unusually violent to the local file
        system? I've not used it at all, so perhaps I am missing something
        here.
       
        ozim wrote 16 hours 44 min ago:
        I saw it just 5 minutes ago: Claude misspelled a directory path. For
        me it was just creating a new folder, but I can imagine that if I
        hadn't stopped it, it could have started removing stuff just because
        it thought it needed to start from scratch or something.
       
        lemontheme wrote 16 hours 58 min ago:
        And for the macOS users, I can’t recommend nono enough. (Paying it
        forward, since it was here on HN that I learned about it.)
        
        Good DX, straightforward permissions system, starts up instantly. Just
        remember to disable CC’s auto-updater if that’s what you’re
        using. My sandbox ranking: nono > lima > containers.
       
          vorticalbox wrote 16 hours 3 min ago:
          I’m using safe house [1]; it's a bash wrapper around sandbox-exec.
          
   URI    [1]: https://agent-safehouse.dev/
       
          faeyanpiraat wrote 16 hours 33 min ago:
          I've just switched to lima, and can't find anything about "nono".
          Can you post a link?
       
            lemontheme wrote 11 hours 27 min ago:
            I really like lima too. It's my go-to recommendation for light VMs.
            But I do consider it slightly less convenient.
            
            A good example of why is project-local .venv/ directories, which
            are the default with uv. With Lima, what happens is that macOS
            package builds get mounted into a Linux system, with potential
            incompatibility issues. Run uv sync inside the VM and now things
            are invalid on the macOS side. I wasn't able to find a way to mount
            the CWD except for certain subdirectories.
            
            Another example is network filtering. Lima (understandably) doesn't
            offer anything here. You can set up a firewall inside the VM, but
            there's no guarantee your agent won't find a way to touch those
            rules. You can set it up outside the VM, but then you're also
            proxying through a MITM.
            
            So, for the use case of running Claude Code in
            --dangerously-skip-permissions mode, Lima is more hassle than Nono
       
          pbowyer wrote 16 hours 34 min ago:
          This nono? [1]
          
          > Just remember to disable CC’s auto-updater if that’s what you’re
          > using.
          
          Why?
          
   URI    [1]: https://github.com/always-further/nono
       
            lemontheme wrote 11 hours 43 min ago:
            Might be something specific to my and my colleagues' systems, but
            it breaks the TUI. It needs git authentication, which fails, and
            the TUI stops accepting input reliably
       
        0xbadcafebee wrote 17 hours 42 min ago:
        If it has a big splash page with no technical information, it's trying
        to trick you into using it. That doesn't mean it isn't useful, but it
        does mean it's disingenuous.
        
        This particular solution is very bad. To start off with, it's basically
        offering you security, right? Look, bars in front of an evil AI! An AI
        jail! That's secure, right? Yet the very first mode it offers you is
        insecure. The "casual" mode allows read access to your whole home
        directory. That is enough to grant most attackers access to your entire
        digital life.
        
        Most people today use webmail. And most people today allow things like
        cookies to be stored unencrypted on disk. This means an attacker can
        read a cookie off your disk, and get into your mail. Once you have
        mail, you have everything, because virtually every account's password
        reset works through mail.
        
        And this solution doesn't stop AI exfiltration of sensitive data, like
        those cookies, out to the internet. Or malware being downloaded into
        copy-on-write storage space, to open a reverse shell and manipulate
        your existing browser sessions. But they don't mention that on the
        fancy splash page of the security tool.
        
        The truth is that you actually need a sophisticated, complex-as-hell
        system to protect from AI attacks. There is no casual way to AI
        security. People need to know that, and splashy pages like this that
        give the appearance of security don't help the situation. Sure, it has
        disclaimers occasionally about it not being perfect security, read the
        security model here, etc. But the only people reading that are security
        experts, and they don't need a splash page!
        
        Stanford: please change this page to be less misleading. If you must
        continue this project with its obviously insecure modes, you need to
        clearly emphasize how insecure it is by default. (I don't think it even
        qualifies as security software)
       
          yobert wrote 15 hours 56 min ago:
          It is a bit better than you're saying. When you fire it up, you can
          see that it does have a list of common credential areas that it hides
          from the jail. It seems to hide:
          
              .aws  .azure  .bash_history  .config  .docker  .git-credentials
              .gnupg  .jai  .local  .mozilla  .netrc  .password-store  .ssh
              .zsh_history
          
          It's a humorous attempt in a sense, but better than nothing for sure!
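          
          To make the limits of a hide-list concrete, here is a small Python
          sketch. The names are copied from the list above; the
          visible_entries helper is hypothetical and says nothing about how
          jai actually implements the hiding.

```python
from typing import Iterable, List

# Names copied from the list in the comment above.
HIDDEN_NAMES = frozenset({
    ".aws", ".azure", ".bash_history", ".config", ".docker",
    ".git-credentials", ".gnupg", ".jai", ".local", ".mozilla",
    ".netrc", ".password-store", ".ssh", ".zsh_history",
})


def visible_entries(entries: Iterable[str], hidden=HIDDEN_NAMES) -> List[str]:
    """Return the home-directory entries a denylist like this leaves visible."""
    return sorted(e for e in entries if e not in hidden)
```

          Anything that keeps credentials under a name not on the list stays
          readable, which is the usual weakness of a denylist compared to an
          allowlist.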
       
        ta-run wrote 17 hours 51 min ago:
        Idk, just feels so counter sometimes to build and refine these
        (seemingly non-deterministic) tools to build deterministic workflows &
        get the most productivity out of them.
       
        neilwilson wrote 18 hours 30 min ago:
        It's always struck me that agents should be operated via `systemd-run`
        as a transient scope unit with the necessary security properties set.
        
        So couldn't this be done with an appropriate shell alias, at least
        under Linux?
       
          _shadi wrote 17 hours 49 min ago:
          I had the same idea and created this quickly in an evening:
          
   URI    [1]: https://github.com/Shadi/isolate
       
        sanskritical wrote 18 hours 47 min ago:
        How long until agents begin routinely abusing local privilege
        escalation bugs to break out of containers? I bet if you tell them
        explicitly not to do so it increases the likelihood that they do.
       
        gpm wrote 19 hours 16 min ago:
        This is a cool solution... I have a simpler one, though likely inferior
        for many purposes.
        
        Run  under its own user account via ssh. Bind mount project directories
        into its home directory when you want it to be able to read them. Mount
        command looks like
        
            sudo mkdir /home//
            sudo mount --bind  --map-groups $(id -g ):$(id -g ):1 --map-users
        $(id -u ):$(id -u ):1 /home//
        
        I particularly use this with vscode's ssh remotes.
       
          athrowaway3z wrote 17 hours 7 min ago:
          I've been using a dedicated user account for 6 months now, and it
          does everything. What makes it great is the only axis of
          configuration is managing "what's hoisted into its accessible
          directories".
          
          It's awe-inspiring the levels of complexity people will
          re-invent/bolt-on to achieve comparable (if not worse) results.
       
        gck1 wrote 19 hours 18 min ago:
        It's full VM or nothing.
        
        I want AI to have full and unrestricted access to the OS. I don't want
        to babysit it and approve every command. Everything that is on that VM
        is fair game and the VM image is backed up regularly from outside.
        
        This is the only way.
       
          adi_kurian wrote 11 hours 37 min ago:
          I have a pretty insane thing where I patched the screen sharing
          binary and hand rolled a dummy MDN so I can have multiple profiles
          logged in at once on my Mac Studio. Then have screen share of diff
          profiles in diff "windows". Was for some ML data gathering / CV
          training.
          
          It's pretty neat, screen sharing app is extremely high quality these
          days, I can barely notice a diff unless watching video. Almost feels
          like Firefox containers at OS level.
          
          Have thought that could be a pretty efficient way to have restricted
          unrestricted convenient AI access. Maybe I'll get around to that one
          day.
       
          griffindor wrote 16 hours 39 min ago:
          I use Nix shells to give it the tools it wants.
          
          If it wants to do system-level tests, then I make sure my project has
          Qemu-based tests.
       
        samlinnfer wrote 19 hours 42 min ago:
        Now we just need one for every python package.
       
        albert_e wrote 19 hours 52 min ago:
        Can we have a hardware level implementation of git (the idea of
        files/data having history preserved. Not necessarily all bells and
        whistles.) ...in a future where storage is cheap.
       
        schaefer wrote 20 hours 4 min ago:
        Ugh.
        
        The name jai is very taken[1]...
        names matter.
        
        [1] 
        
   URI  [1]: https://en.wikipedia.org/wiki/Jai_(programming_language)
       
          qq66 wrote 19 hours 4 min ago:
          That's an unreleased product in closed beta. Might not any name
          conflict with some unreleased product in closed beta?
       
          john_strinlai wrote 19 hours 10 min ago:
          a closed beta of an obscure programming language where the wikipedia
          page is nominated for deletion because it is a "Non-notable
          programming language that is not publicly available." is considered
          "very taken"?
       
          diego_sandoval wrote 19 hours 15 min ago:
          Jonathan Blow has said that "Jai" is just a placeholder name or
          something.
       
            schaefer wrote 12 hours 25 min ago:
            I hadn’t heard that. Thanks.
       
          vscode-rest wrote 19 hours 54 min ago:
          Slightly taken, at best.
       
        yalogin wrote 20 hours 11 min ago:
        What if Claude needs me to install some software and it hoses my
        distro? Jai cannot protect me there, as I am running the script myself.
       
        KennyBlanken wrote 20 hours 15 min ago:
        This is not some magical new problem. Back your shit up.
        
        You have no excuse for "it deleted 15 years of photos, gone, forever."
       
          sersi wrote 20 hours 9 min ago:
          And what about, it exfiltrated my AWS keys (or insert random valuable
          thing that sits in .config of your home directory)? Backing up is not
          going to help you in that case.
       
        samchon wrote 20 hours 18 min ago:
        I just allow YOLO mode and sometimes roll back.
       
        andai wrote 20 hours 24 min ago:
        This looks great and seems very well thought out.
        
        It looks both more convenient and slightly more secure than my
        solution, which is that I just give them a separate user.
        
        Agents can nuke the "agent" homedir but cannot read or write mine.
        
        I did put my own user in the agent group, so that I can read and write
        the agent homedir.
        
        It's a little fiddly though (sometimes the wrong permissions get set,
        so I have a script that fixes it), and keeping track of which user a
        terminal is running as is a bit annoying and error prone.
        
        ---
        
        But the best solution I found is "just give it a laptop." Completely
        forget OS and software solutions, and just get a separate machine!
        
        That's more convenient than switching users, and also "physically on
        another machine" is hard to beat in terms of security :)
        
        It's analogous to the mac mini thing, except that old ThinkPads are
        pretty cheap. (I got this one for $50!)
       
          sanitycheck wrote 18 hours 20 min ago:
          The user thing is what I currently do too. I've thought about
          containers but then it's confusing for everyone when I ask it to
          create and use containers itself.
       
          lll-o-lll wrote 19 hours 19 min ago:
          Where this falls down is that for the agents to interact with
          anything external, you have to give them keys. Without a proxy
          handling real keys between your agent and external services, those
          keys are at risk of compromise.
          
          Also. Agents are very good at hacking “security penetration
          testing”, so “separate user” would not give me enough
          confidence against malicious context.
       
            sanitycheck wrote 18 hours 24 min ago:
            So don't let them interact with anything external. You can push and
            pull to their git project folders over the local filesystem or
            network, they don't even need access to a remote.
       
              lll-o-lll wrote 17 hours 11 min ago:
              Unless you are talking about running a local model, that’s not
              possible.
       
                sanitycheck wrote 13 hours 49 min ago:
                Obviously if you're running Claude Code you need a token for
                that and an internet connection, that's kind of a given. What
                I'm talking about is permission (OS level, not a leaky sandbox)
                to access the user's files, environment variables, project
                credentials for git remotes, signing keys, etc etc.
       
        puttycat wrote 20 hours 29 min ago:
        I am still amazed that people so easily accepted installing these
        agents on private machines.
        
        We've been securing our systems in all ways possible for decades and
        then one day just said: oh hello unpredictable, unreliable,
        Turing-complete software that can exfiltrate and corrupt data in
        infinite unknown ways -- here's the keys, go wild.
       
          closeparen wrote 5 hours 9 min ago:
          Seems most relevant in a hobbyist context where you have personal
          stuff on your machine unrelated to your projects. Employee endpoints
          in a corporate environment should already be limited to what’s
          necessary for job duties. There’s nothing on my remote development
          VMs that I wouldn’t want to share with Claude.
       
          monster_truck wrote 7 hours 36 min ago:
          I got bad news about all of the other software you're running
       
          deadbabe wrote 10 hours 30 min ago:
          Trusting AI agents with your whole private machine is the 2020s
          equivalent of people pouring all their information about themselves
          into social networks in 2010s.
          
          Only a matter of time before this type of access becomes productized.
       
          xpe wrote 10 hours 43 min ago:
          CONVENIENCE > SECURITY : until no convenience b/c no system to run on
       
          michaelcampbell wrote 11 hours 17 min ago:
          > We've been securing our systems in all ways possible for decades
          and then one day just said: oh hello unpredictable, unreliable,
          Turing-complete software that can exfiltrate and corrupt data in
          infinite unknown ways -- here's the keys, go wild.
          
          These are generally (but not always) 2 different sets of people.
       
          tempaccount5050 wrote 11 hours 22 min ago:
          I don't understand why file and folder permissions are such a
          mystery. Just... don't let it clobber things it shouldn't.
       
          puttycat wrote 13 hours 7 min ago:
          Forgot to mention the craziness of trusting an AI software company
          with your private AI codebase (think Uber's abuse of ride data).
       
          lxgr wrote 14 hours 1 min ago:
          Not in unknown ways, but as part of its regular operation (with cloud
          inference)!
          
          I think the actual data flow here is really hard to grasp for many
          users: Sandboxing helps with limiting the blast radius of the agent
          itself, but the agent itself is, from a data privacy perspective,
          best visualized as living inside the cloud and remote-operating your
          computer/sandbox, not as an entity that can be "jailed" and as such
          "prevented from running off with your data".
          
          The inference provider gets the data the instant the agent looks at
          it to consider its next steps, even if the next step is to do nothing
          with it because it contains highly sensitive information.
       
          mjmas wrote 16 hours 30 min ago:
          My testing/working with agents has been limited to a semi-isolated VM
          with no permissions apart from internet access. I have a git remote
          with it as the remote (ssh://machine/home/me/repo) so that I don't
          have to allow it to have any keys either.
       
          globular-toast wrote 17 hours 18 min ago:
          Not all of us. Figuring out bwrap was the first thing I did before
          running an agent. I posted on HN but not a single taker [1]. I have
          noticed it's become one of my most searched posts on Google though.
          Something like ten clicks a month! So at least some people aren't
          stupid.
          
   URI    [1]: https://news.ycombinator.com/item?id=45087165
       
            fHr wrote 11 hours 13 min ago:
            Nice, sad how such stuff goes under in the sea of contentslop,
            thanks for posting!
       
            tofflos wrote 14 hours 18 min ago:
            I installed codex yesterday and the first thing I'm doing today is
            figuring out how bubblewrap works and maybe evaluating jai as an
            alternative.
            
            Nice article.
       
          eximius wrote 17 hours 33 min ago:
          Eh, depending on how you're running agents, I'd be more worried about
          installing packages from AUR or other package ecosystems.
          
          We've seen an increase in hijacked packages installing malware. Folks
          generally expect well known software to be safe to install. I trust
          that the claude code harness is safe and I'm reviewing all of the
          non-trivial commands it's running. So I think my claude usage is
          actually safer than my AUR installs.
          
          Granted, if you're bypassing permissions and running dangerously,
          then... yea, you are basically just giving a keyboard to an idiot
          savant with the tendency to hallucinate.
       
          nunez wrote 18 hours 34 min ago:
          Tbf, Docker had a similar start. “Just download this image from
          Docker Hub! What can go wrong?!”
          
          Industry caught on quick though.
       
            puttycat wrote 13 hours 2 min ago:
            True, but the Docker attack surface is limited to a malicious actor
            distributing malicious images. (Bad enough in itself, I agree.)
            
            Unreliable, unpredictable AI agents (and their parent companies)
            with system-wide permissions are a new kind of threat IMO.
       
            sersi wrote 14 hours 1 min ago:
            And still a lot of people will give broad permissions to docker
            container, use network host, not use rootless containers etc... The
            principle of least privilege is very very rarely applied in my
            experience.
       
          raincole wrote 18 hours 47 min ago:
          It's never about security. It's security vs convenience. Security
          features often end up reducing security if they're inconvenient. If
          you ask users to have obscure passwords, they'll reuse the same one
          everywhere. If your agent prompts users every time it's changing
          files, they'll find a way to disable the guardrail altogether.
       
          bigstrat2003 wrote 20 hours 8 min ago:
          I am too. It is genuinely really stupid to run these things with
          access to your system, sandbox or no sandbox. But the glaring
          security and reliability issues get ignored because people can't help
          but chase the short term gains.
       
            globular-toast wrote 17 hours 16 min ago:
            FOMO is a hell of a thing. Sad though given it would have taken
            maybe a couple of hours to figure out how to use a sandbox. People
            can't even wait that long.
       
              user34283 wrote 16 hours 54 min ago:
              Coding agents work just fine without a sandbox.
              
              If you do use a sandbox, be prepared to endlessly click "Approve"
              as the tool struggles to install python packages to the right
              location.
       
                imtringued wrote 14 hours 25 min ago:
                I've never been annoyed by the tool asking for approval. I'm
                more annoyed by the fact that there is an option that gives
                permanent approval right next to the button I need to click
                over and over again. This landmine means I constantly have to
                be vigilant to not press the wrong button.
       
                  user34283 wrote 12 hours 35 min ago:
                  When I was using Codex with the PDF skill it prompted to
                  install python PDF tools like 3-5 times.
                  
                  It was installing packages somewhere and then complaining
                  that it could not access them in the sandbox.
                  
                  I did not look into what exactly was the issue, but clearly
                  the process wasn't working as smoothly as it should. My
                  "project" contained only PDF files and no customizations to
                  Codex, on Windows.
       
                  greenchair wrote 13 hours 30 min ago:
                  maybe this could be a config setting.
       
                globular-toast wrote 16 hours 3 min ago:
                Erm, no, that's not a sandbox, it's an annoyance that just
                makes you click "yes" before you thoughtlessly extend the
                boundaries.
                
                A real sandbox doesn't even give the software inside an option
                to extend it. You build the sandbox knowing exactly what you
                need because you understand what you're doing, being a software
                developer and all.
       
                  user34283 wrote 15 hours 16 min ago:
                  I know 'exactly' that I will need internet for research as
                  well as installing dependencies.
                  
                  And I imagine it's going to be the same for most developers
                  out there, thus the "ask for permission" model.
                  
                  That model seems to work quite well for millions of
                  developers.
       
                    globular-toast wrote 13 hours 37 min ago:
                    If you know then why do you need to be asked? A sandbox
                    includes what you know you need in it, no more, no less.
       
                      user34283 wrote 12 hours 39 min ago:
                      With Codex it runs in a sandbox by default.
                      
                      As we just discussed, obviously you are likely to need
                      internet access at some point.
                      
                      The agent can decide whether it believes it needs to go
                      outside of the sandbox and trigger a prompt.
                      
                      This way you could have it sandboxed most of the time,
                      but still allow access outside of the sandbox when you
                      know the operation requires it.
       
                mjmas wrote 16 hours 15 min ago:
                This also works fine without a sandbox:
                
                  echo -e '#!/bin/sh\nsudo rm -rf/\nexec sudo "$@"'
                >~/.local/bin/sudo
                  chmod +x ~/.local/bin/sudo
                
                Especially since $PATH often includes user-writeable
                directories.
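                
                To check whether that applies on a given machine, here is a
                small sketch that lists the user-writable directories in a
                PATH-style string. The helper name `writable_path_dirs` is
                made up for illustration.

```shell
#!/bin/sh
# writable_path_dirs: hypothetical helper name.
# Print each directory in a PATH-style string that the current user can write to.
writable_path_dirs() {
    old_ifs=$IFS
    IFS=:
    set -f                      # no glob expansion while splitting on ':'
    for dir in $1; do
        [ -n "$dir" ] || dir=.  # an empty PATH entry means the current directory
        if [ -d "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
    set +f
    IFS=$old_ifs
}
```

                Running `writable_path_dirs "$PATH"` as the account the agent
                uses shows where a fake `sudo` could be dropped. As root every
                existing directory will be reported, so the check is only
                meaningful for an unprivileged user.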
       
          nazgul17 wrote 20 hours 22 min ago:
          Agree with the sentiment! But "securing ... in all ways possible"? I
          know many people who would choose "password" as their password in
          2026. The better of the bunch will use their date of birth, and maybe
          add their name for a flourish.
          
          /rant
       
          theendisney wrote 20 hours 22 min ago:
          Some day soon they will build a cage that will hold the monster.
          Provided they don't get eaten in the meantime. Or a larger monster
          eats theirs. :)
       
          fc417fc802 wrote 20 hours 26 min ago:
          People were also dismissing concerns about build tooling
          automatically pulling in an entire swarm of dependencies and now here
          we are in the middle of a repetitive string of high profile developer
          supply chain compromises. Short term thinking seems to dominate even
          groups of people that are objectively smarter and better educated
          than average.
       
            matheusmoreira wrote 3 hours 3 min ago:
            It's hard to think long term when your salary depends on short term
            thinking. I keep seeing horrifying comments from all sorts of
            people saying they'd be fired if they stopped using AI to bang out
            ridiculous amounts of code at lightning speed.
       
            vkou wrote 17 hours 56 min ago:
            Objectively smart people wouldn't be working so hard at making
            themselves obsolete.
       
            totallymike wrote 18 hours 52 min ago:
            “Objectively smarter” is the last descriptor I’d apply to
            software developers
       
              fc417fc802 wrote 17 hours 51 min ago:
              My intent was to cast a very wide net there that covers more or
              less all expert knowledge workers. Zingers aside software
              developers as a group are well above the societal mean in many
              respects.
       
            tokioyoyo wrote 19 hours 30 min ago:
            > “high profile developer supply chain compromises”
            
            And nothing big has happened despite all the risks and problems
            that came up with it. People keep chasing speed and convenience,
            because most things don’t even last long enough to ever see a
            problem.
       
              fc417fc802 wrote 16 hours 10 min ago:
              I've yet to be saved by an airbag or seatbelt. Is that
              justification to stop using them? How near a miss must we have
              (and how many) before you would feel that certain practices
              surrounding dependencies are inadvisable?
              
              A number of these supply chain compromises had incredibly high
              stakes and were seemingly only noticed before paying off by lucky
              coincidence.
       
                hiq wrote 11 hours 17 min ago:
                > I've yet to be saved by an airbag or seatbelt. Is that
                justification to stop using them?
                
                By now, getting a car without airbags would probably be more
                costly if possible, and the seatbelt takes 2s every time you're
                in a car, which is not nothing but is still very little. In
                comparison, analyzing all the dependencies of a software
                project, vetting them individually or having fewer of them can
                require days of effort with a huge cost.
                
                We all want as much security as possible until there's an
                actual cost to be paid, it's a tradeoff like everything else.
       
                  franktankbank wrote 10 hours 44 min ago:
                  The funniest part is that it always gets traded off, every
                  time. Talking about tradeoffs, you'd think sometimes you'd
                  keep it and sometimes you'd let it go, but no, every goddamn
                  time it's cut.
       
                tokioyoyo wrote 15 hours 44 min ago:
                > How near a miss must we have (and how many)
                
                The fun part is, there have been a lot of non-misses! Like a
                lot! A ton of data has been exfiltrated, there have been a lot
                of attacks, and so on. In the end... it just didn't matter.
                
                Your analogy isn't really apt either. My argument is closer to
                "given in the past decade+, nothing of worth has been harmed,
                should we require airbags and seatbelts for everything?".
                Obviously in some extreme mission critical systems you should
                be much smarter. But in 99% cases it doesn't matter.
       
            culopatin wrote 20 hours 1 min ago:
            If anything I feel more in control of these agents than the
            millions of LOC npm or pip pull in to just show me a hello world
       
              Sindisil wrote 11 hours 9 min ago:
              The load bearing word being "feel".
       
        Waterluvian wrote 21 hours 16 min ago:
        Are mass file deletions the result of some plausible “I see why it
        would have done that” reasoning, or will it just completely randomly
        execute commands that really have nothing to do with the immediate
        goal?
       
        rdevsrex wrote 21 hours 31 min ago:
        This won't cause any confusion with the jai language :)
       
        waterfisher wrote 21 hours 32 min ago:
        There's nothing wrong with an AI-designed website, but I wish when
        describing their own projects that HN contributors wrote their own
        copy. As HN posters are wont to say, writing is thinking...
       
        jbverschoor wrote 21 hours 39 min ago:
        Interesting take on the same problem
        
        I created [1] which basically launches a persistent interactive shell
        using docker, chrooted to the CWD
        
        CWD is bind mounted so the rest is simply not visible and you can still
        install anything you want.
        
   URI  [1]: https://github.com/jrz/container-shell
       
        stavros wrote 21 hours 48 min ago:
        I'd really like to try this, but building it is impossible. C++ is such
        a pain to build with the "`make`; hunt for the dependency that failed;
        `apt-get install whatever-dev`; goto make" loop...
        
        Please release binaries if you're making a utility :(
       
          mazieres wrote 20 hours 27 min ago:
          What distro are you using?  The only two dependencies are libacl and
          libmount.  I'm trying to figure out which distros don't include these
          by default, and if the libraries are really missing, or if it's just
          the pkgconf ".pc" files.  In the former case I should document the
          dependencies.  In the latter case I should maybe switch from
          PKG_CHECK_MODULES to old-fashioned autoconf.
       
            stavros wrote 14 hours 46 min ago:
            I'm using Ubuntu, I gave up when it failed on something about
            "print".
       
          jbverschoor wrote 21 hours 36 min ago:
           [1] It does something very simple, and it’s a POSIX shell script.
          Works on Linux and macOS. Uses docker to sandbox using bind mount
          
   URI    [1]: https://github.com/jrz/container-shell
       
            stavros wrote 21 hours 30 min ago:
            Yeah but it doesn't COW anything else, and Docker is a bit heavy
            for this.
       
        faangguyindia wrote 21 hours 49 min ago:
        I just use seatbelt (mac native) in my custom coding agent: supercode
       
        avazhi wrote 21 hours 52 min ago:
        The irony is they used an LLM to write the entire (horribly written)
        text of that webpage.
        
        When is HN gonna get a rule against AI/generated slop? Can’t come
        soon enough.
       
        rsyring wrote 21 hours 58 min ago:
        I've been reviewing Agent sandboxing solutions recently and it occurred
        to me there is a gaping vector for persistent exploits for tools that
        let the agent write to the project directory.  Like this one does.
        
        I had originally thought this would be OK as we could review everything
        in the git diff. But, it later occurred to me that there are all kinds
        of files that the agent could write to that I'd end up executing, as
        the developer, outside the sandbox. Every .pyc file for instance, files
        in .venv, .git hook files.
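        
        One rough way to see those files after an agent session is sketched
        here. It assumes a standard layout with .git inside the repository
        root, and `audit_agent_writes` is a hypothetical helper name. It only
        lists candidates for review; it does not decide what is safe.

```shell
#!/bin/sh
# audit_agent_writes: hypothetical helper; assumes .git lives inside the
# repository root (no separate git dir, no worktrees).
audit_agent_writes() {
    repo=${1:-.}
    # Untracked files, including ignored ones such as .venv contents or *.pyc,
    # because no exclude options are passed.
    git -C "$repo" ls-files --others
    # Hooks live inside .git, so ls-files never lists them; show non-sample hooks.
    find "$repo/.git/hooks" -type f ! -name '*.sample' 2>/dev/null
}
```

        Anything this prints never shows up in `git diff`, so it would have to
        be inspected or discarded before running code outside the sandbox.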
        
        ChatGPT[1] confirms the underlying exploit vectors and also that there
        isn't much discussion of them in the context of agent sandboxing tools.
        
        My conclusion from that is the only truly safe sandboxing technique
        would be one that transfers files from the sandbox to the dev's machine
        through some kind of git patch or similar. I.e. the file can only
        transfer if it's in version control and, therefore presumably, has been
        reviewed by the dev before transfer outside the sandbox.
        
        I'd really like to see people talking more about this. The solution
        isn't that hard: keep CWD as an overlay and transfer in-container
        modified files through a proxy of some kind that filters out any file
        not in git and maybe some that are but are known to be potentially
        dangerous (bin files). Obviously, there would need to be some kind of
        configuration option here.
        
        1:
        
   URI  [1]: https://chatgpt.com/share/69c3ec10-0e40-832a-b905-31736d8a3438
       
          hiq wrote 11 hours 22 min ago:
          I don't follow why you'd run uncommitted non-reviewed code outside of
          the sandbox (by sandbox I mean something as secure as a VM) you
          use. My mental model is more that you no longer compile / run code
          outside of the sandbox, it contains everything, then when a change is
          ready you ship it after a proper review.
          
          The way I'd do it right now:
          
          * git worktree to have a specific folder with a specific branch to
          which the agent has access (with the .git in another folder)
          
          * have some proper review before moving the commits there into
          another branch, committing from outside the sandbox
          
          * run code from this review-protected branch if needed
          
          Ideally, within the sandbox, the agent can go nuts to run tests, do
          visual inspections e.g. with web dev, maybe run a demo for me to see.
       
          jbverschoor wrote 21 hours 37 min ago:
          Yeah, never allow githooks ;)
       
          mazieres wrote 21 hours 39 min ago:
          It's a good point.  Maybe I should add an option to make certain
          directories read-only even under the current working directory, so
          that you can make .git/ read-only without moving it out of the
          project directory.
          
          You can already make CWD an overlay with "jai -D".  The tricky part
          is how to merge the changes back into your main working directory.
       
            kstenerud wrote 15 hours 58 min ago:
            This is the problem yoloAI (see below comment) is built around. The
            merge step is `yoloai diff` / `yoloai apply`: the agent works
            against a copy of your project inside the container, you review the
            diff, you decide what lands.
            
            jai's -D flag captures the right data; the missing piece is
            surfacing it ergonomically. yoloAI uses git for the diff/apply so
            it already feels natural to a dev.
            
            One thing that's not fully solved yet: your point about .git/hooks
            and .venv being write vectors even within the project dir. They're
            filtered from the diff surface but the agent can still write them
            during the session. A read-only flag for those paths (what you're
            considering adding to jai) would be a cleaner fix.
       
            rsyring wrote 21 hours 4 min ago:
            It's great that you have -D built into the tool already.  That's a
            step in the right direction.
            
            I don't think the file sync is actually that hard.  Famous last
            words though.  :)
       
              kstenerud wrote 16 hours 2 min ago:
              Not famous last words ;-)
              
              I've already shipped this and use it myself every day. I'm the
              author of yoloAI ( [1] ), which is built around exactly this
              model.
              
              The agent runs inside a Docker container or containerd vm (or
              seatbelt container or Tart vm on mac), against a full copy of
              your project directory. When it's done, `yoloai diff` gives you a
              unified diff of everything it changed. `yoloai apply` lands it.
              `yoloai reset` throws it away so you can make the agent try
              again. The copy lives in the sandbox, so your working tree is
              untouched until you explicitly say so.
              
              The merge step turned out to be straightforward: just use git
              under the hood. The harder parts were: (a) making it fast enough
              that the copy doesn't add annoying startup overhead, (b) handling
              the .pyc/.venv/.git/hooks concern you raised (they're excluded
              from the diff surface by default), and (c) credential injection
              so the agent can actually reach its API without you mounting your
              whole home dir.
              
              Leveraging existing tech is where it's at. Each does one thing
              and does it well. Network isolation is done via iptables in
              Docker, for example.
              
              Still early/beta but it's working. Happy to compare notes if
              you're building something similar.
              
   URI        [1]: https://github.com/kstenerud/yoloai
       
        ray_v wrote 22 hours 0 min ago:
        I'm wondering if the obvious (and stated) fact that the site was
        vibe-coded - detracts from the fact that this tool was hand written.
        
        > jai itself was hand implemented by a Stanford computer science
        professor with decades of C++ and Unix/linux experience. ( [1] )
        
   URI  [1]: https://jai.scs.stanford.edu/faq.html#was-jai-written-by-an-ai...
       
          zadikian wrote 7 hours 9 min ago:
          Doesn't detract from it. The jai tool is high-stakes, the static
          website isn't. The tool is designed to be used with LLM coding
          agents, so if anything it makes sense to vibecode the website, even
          better if the author used jai in that.
       
          barishnamazov wrote 21 hours 0 min ago:
          Sigh, I'd still have preferred a basic HTML page with hand-written
          succinct information instead of this crap verbosity.
       
            xbar wrote 19 hours 54 min ago:
            There is a man page.
       
          mazieres wrote 21 hours 1 min ago:
          Human author here.  The fact that I don't know web design shouldn't
          detract from my expertise in operating systems.  I wrote the software
          and the man page, and those are what really matter for security.
          
          The web site is... let's say not in a million years what I would have
          imagined for a little CLI sandboxing tool.  I literally laughed out
          loud when claude pooped it out, but decided to keep it, in part
          ironically but also since I don't know how to design a landing page
          myself.  I should say that I edited content on the docs part of the
          web site to remove any inaccuracies, so the content should be valid.
       
            timeinput wrote 7 hours 12 min ago:
            I've been building my own tooling doing similar sorts of things --
            poorly with scripts and podman / buildkit as well as LD_PRELOAD
            related tools, and definitely clicked over to HN comments without
            reading much of the content because I thought "AI slop tool", and
            the site raised all my hackles as I thought I'll never touch this
            thing. It'll be easier to write my own than review yet another AI
            slop tool written by someone who loves AI.
            
            I'm glad I read the HN comments, now I'm excited to review the
            source.
            
            Thanks for your hard work.
            
            ETA: I like your option parser
       
            adi_kurian wrote 13 hours 41 min ago:
            I think it will, in the modern AI slop era, look more legitimate
            when the web UI looks a) hand rolled and b) like not much time was
            spent on it at all. Which makes me a tad embarrassed as someone who
            used to sell fancy websites for a living.
       
            srcoder wrote 17 hours 41 min ago:
            Nice tool, def gonna try it. I was looking for the source and it
            took a while before I found the github(0) link. As with a lot of
            software, I like to take a look at the source. Maybe you can make it
            more prominent on the website
            
            0:
            
   URI      [1]: https://github.com/stanford-scs/jai
       
            lifis wrote 20 hours 8 min ago:
            It seems that the LLM has not only designed the site, but also
            written the text on at least the frontpage, which is a pretty bad
            signal.
            
              You need to rewrite all the text and replace it with text YOU would
            actually write, since I doubt you would write in that style.
       
              raincole wrote 18 hours 36 min ago:
              > You need to rewrite
              
              No they don't. The text is very clearly conveying what this
              project is about. Not everyone needs to cater to weirdos who are
              obsessed with policing how other people use LLM.
       
              willy_k wrote 19 hours 8 min ago:
              Needs to? Is there some new law mandating all landing pages must
              contain exclusively handwritten text that people haven’t heard
              of?
              
              To your actual point, the people that would take the landing page
              being written by an LLM negatively tend to be able to evaluate
              the project on its true merits, while another substantial portion
              of the demographic for this tool would actually take that
              (unfortunately, imo) as a positive signal.
              
              Lastly, given the care taken for the docs, it’s pretty likely
              that any real issues with the language have been caught and
              changed.
       
              john_strinlai wrote 19 hours 18 min ago:
              any negative signal you get from the front page should probably
              end up cancelled out by the whole decades of experience +
              stanford professor thing.
       
                rmunn wrote 19 hours 5 min ago:
                Except that the "this was generated by an LLM" feeling you get
                from the front page would then make you automatically question
                whether the "decades of experience + stanford professor thing",
                as you put it, was true or just an LLM hallucination.
                
                Author would, indeed, be wise to rewrite all the text appearing
                on the front page with text that he wrote himself.
       
                  john_strinlai wrote 19 hours 2 min ago:
                  >question whether the "decades of experience + stanford
                  professor thing", as you put it, was true or just an LLM
                  hallucination.
                  
                  the scs.stanford.edu domain and stanford-scs github should
                  help with that.
       
                    rmunn wrote 17 hours 15 min ago:
                    Excellent point, though not everyone pays close enough
                    attention to the domain shown in the browser (if they did,
                    some of the more amateurish phishing attempts would fool a
                    lot fewer people). But yes, anyone who notices the domain
                    will have a clue to the truth.
       
            Nifty3929 wrote 20 hours 48 min ago:
            Indeed!
            
            Kinda reminds me of this: [1] I'm not a web UI guy either, and I am
            so, so happy to let an AI create a nice looking one for me. I did
            so just today, and man it was fast and good. I'll check it for
            accuracy someday...
            
   URI      [1]: https://m.xkcd.com/932/
       
          Quarrel wrote 21 hours 45 min ago:
          To be less abstract, it was written by David Mazieres, who has been
          writing software and papers about user level filesystems since at
          least 2000. He now runs the Stanford Secure Computer Systems group.
          
          David has done some great work and some funny work. Sometimes both.
       
        e1g wrote 22 hours 2 min ago:
        For jailing local agents on a Mac, I made Agent Safehouse - it works
        for any agent and has many sane defaults for developers
        
   URI  [1]: https://agent-safehouse.dev
       
        Jach wrote 22 hours 8 min ago:
        I've done some experimenting with running a local model with ollama and
        claude code connecting to it and having both in a firejail: [1] What
        they get access to is very limited, and mostly whitelisted.
        
   URI  [1]: https://firejail.wordpress.com/
       
        kristofferR wrote 22 hours 26 min ago:
        Also recommended:
        
   URI  [1]: https://github.com/kenryu42/claude-code-safety-net
       
        gonzalohm wrote 22 hours 36 min ago:
        Not sure I understand the problem. Are people just letting AI do
        anything? I use Claude Code and it asks for permission to run commands,
        edit files, etc. No need for sandbox
       
          mazieres wrote 19 hours 12 min ago:
          Yes, people very much are, and that's exactly the problem!  People
          run `claude --dangerously-skip-permissions` and `codex --yolo` all
          the time.  And I think one of the appeals of opencode (besides
          cross-model, which is huge) is that the permissions are looser by
          default.  These options are presumably intended for VM or container
          environments, but people are running them outside.  And of course it
          works fine the first 100 times people do it, which drives them to
          take bigger and bigger risks.
       
        charcircuit wrote 22 hours 36 min ago:
        I want agents to modify the file system. I want them to be able to
        manage my computer if it thinks it's a good idea. If a build fails due
        to running out of disk space I want it to be able to find appropriate
        stuff to delete to free up space.
       
        justinde wrote 22 hours 45 min ago:
        .claude/settings.json:
        {
          "sandbox": {
            "enabled": true,
            "filesystem": {
              "allowRead": ["."],
              "denyRead": ["~/"],
              "allowWrite": ["."]
            }
          }
        }
        
        Use it! :)
        
   URI  [1]: https://code.claude.com/docs/en/sandboxing
       
        cozzyd wrote 22 hours 56 min ago:
        Should definitely block .ssh reading too...
       
        gurachek wrote 22 hours 56 min ago:
        The examples in the article are all big scary wipes, but I think the
        more common damage is way smaller and harder to notice.
        
        I've been using claude code daily for months and the worst thing that
        happened wasn't a wipe (yet). It needed to save an svg file so it created
        a /public/blog/ folder. Which meant Apache started serving that real
        directory instead of routing /blog. My blog just 404'd and I spent like
        an hour debugging before I figured it out. Nothing got deleted and it's
        not a permission problem, the agent just put a file in a place that
        made sense to it.
        
        jai would help with the rm -rf cases for sure but this kind of thing is
        harder to catch because it's not a permissions problem, the agent just
        doesn't know what a web server is.
       
        mbreese wrote 22 hours 57 min ago:
        This still is running in an isolated container, right?
        
        Ignoring the confidentiality arguments posed here, I can’t help but
        think about snapshotting filesystems in this context. Wouldn’t
        something like ZFS be an obvious solution to an agent deleting or
        wildly changing files? That wouldn’t protect against all issues the
        authors are trying to address, but it seems like an easy safeguard
        against some of the problems people face with agents.
       
        cozzyd wrote 23 hours 2 min ago:
        Should be named Jia
        
        More seriously, I'm not a heavy agent user, but I just create a user
        account for the agent with none of my own files or ssh keys or anything
        like that.  Hopefully that's safe enough? I guess the risk is that it
        figures out a local privilege escalation exploit...
       
          timcobb wrote 22 hours 56 min ago:
          Dunno... with this setup it seems certain that the agent will
          discover a zero-day to escalate privileges and send your SSH keys to
          its handlers in N. Korea.
          
          P.S. Everything old is new again <3
       
            cozzyd wrote 22 hours 54 min ago:
            Yeah definitely a concern. Probably need a sandbox and separate
            user for defense in depth.
       
        simonw wrote 23 hours 15 min ago:
        Suggestion for the FAQ page: does this work on a Mac?
       
        AnotherGoodName wrote 23 hours 15 min ago:
        Add this to .claude/settings.json:
        
          {
            "sandbox": {
              "enabled": true,
              "filesystem": {
                "allowRead": ["."],
                "denyRead": ["~/"],
                "allowWrite": ["."],
                "denyWrite": ["/"]
              }
            }
          }
        
        You can change the read part if you're ok with it reading outside. This
        feature was only added 10 days ago fwiw but it's great and pretty much
        this.
       
          edem wrote 6 hours 16 min ago:
          what does this do?
       
          EasyMark wrote 7 hours 22 min ago:
          Any way to have it use /Users/claude/*? or something like that
       
          Murfalo wrote 9 hours 4 min ago:
          Alternatively, the "feel free to leak all my data but please use my
          GPUs and don't rm -rf /" config:
          
            {
              "sandbox": {
                "enabled": true,
                "filesystem": {
                  "allowRead": ["/"],
                  "allowWrite": [
                    ".",
                    "/tmp",
                    "/dev/nvidia0",
                    "/dev/nvidia1",
                    "/dev/nvidia2",
                    "/dev/nvidia3",
                    "/dev/nvidia4",
                    "/dev/nvidia5",
                    "/dev/nvidia6",
                    "/dev/nvidia7",
                    "/dev/nvidia8",
                    "/dev/nvidiactl",
                    "/dev/nvidia-uvm"
                  ]
                }
              }
            }
       
          __MatrixMan__ wrote 9 hours 51 min ago:
          Battle hardened tools for this have existed for decades, we don't
          need new ones. Just run claude as a user without access to those
          directories, that way the containment is inherited by subprocesses.
       
            mazieres wrote 4 hours 10 min ago:
            You can do that, but you need root to set it up each time, and it's
            not super convenient--you need to decide in advance which user
            account you are going to work under, and you may end up with files
            you can read from your regular account.  Think of jai strict mode
            as a slightly easier to use and more secure version of what you
            described.  Using id-mapped mounts enables you and the unprivileged
            user account both to access the same directory with the same
            credentials, but you didn't need to decide in advance which
            directories you wanted to expose.  Also, things like disabling
            setuid and using pid namespaces provide an additional measure of
            isolation beyond what you get from another account.
       
            freedomben wrote 9 hours 43 min ago:
            You're not wrong, but this will require file perms (like managing
            groups) and things, and new files created will by default be owned
            by the claude user instead of your regular user.  I tried this
            early on and quickly decided it wasn't worth it (to me).  Other
            mileage may vary of course.
       
              __MatrixMan__ wrote 7 hours 48 min ago:
              True. I just maintain separate /home/claude/src/proj and
              /home/me/src/proj dirs so the human workspace and the robot
              workspaces stay separate. We then use git to collaborate.
       
          Aegis_Labs wrote 10 hours 13 min ago:
          Interesting point. I've been running an autonomous multitalented AI
          agent (Aegis) on a $100 Samsung A04e. It manages 859 referring sites
          without touching the local filesystem much. Efficiency over hardware
          works.
       
          rpastuszak wrote 10 hours 38 min ago:
          Did you get this to work with docker where the agent/dev env would
          work on the host machine but the stack itself via docker compose?
          
          Many of the projects I work on follow this pattern (and I’m not
          able to make bigger changes in them) and sandboxing breaks immediately
          when I need to docker compose run sometask.sh
       
          tasuki wrote 10 hours 57 min ago:
          So what does this do exactly? If it used "default deny" or "default
          allow" you wouldn't have both allow and deny rules...
       
          RALaBarge wrote 11 hours 19 min ago:
          You do also have to worry about exec and other neat ways to probably
          get around stuff.  You could also spin up YAD (yet another docker)
          and run Claude in there with your git cloned into it and beyond some
          state-level-actor escapes it should cover 99% of your most basic
          failures.
       
          Tepix wrote 11 hours 45 min ago:
          Cool. Does opencode.ai have such a feature also (sandboxing with
          bubblewrap)?
       
          reader_1000 wrote 13 hours 50 min ago:
          For some reason, this made everything worse for me. Now claude
          constantly tries to access my home folder instead of current
          directory. Obviously this is still not good enough. Also Claude keeps
          dismissing my instructions on not to read my home directory and use
          current directory. Weird.
       
            cyanydeez wrote 13 hours 4 min ago:
            The problem with all these LLM instructed security features is the
            `codeword` poison probability.
            
            The way LLMs process instructions isn't intelligence as we humans
            know it, but as the probability that an instruction  will lead to
            an output.
            
            When you don't mention $HOME in the context, the probability that
            it will do anything with $HOME remains low. However, if you mention
            it in the context, the probability suddenly increases.
            
            No amount of additional context will have the same probability of
            never having poisoned the context by mentioning it. Mentioning
            $HOME brings in a complete change in probabilities.
            
            These coding harnesses aren't enough to secure a safe operating
            environment because they inject poison context that _NO_ amount of
            textual context can rewire.
            
            You just lost the game.
       
          orf wrote 14 hours 17 min ago:
          FYI, this doesn’t always work as expected. Try asking Claude to
          read “~/.ssh/config” with these settings and it will happily do
          it.
          
          Specifically, it only works for spawned processes and not builtin
          tools.
       
          mentalgear wrote 15 hours 29 min ago:
          I'm now considering installing QubesOS for all dev work to absolutely
          ensure all coding agents run in secure separate sandboxes together
          without any OS level exposure.
       
            9wzYQbTYsAIc wrote 11 hours 42 min ago:
            Phew, just get the Qubes to spin up on demand with each agent and
            that could be pretty neat.
       
          bit_logic wrote 15 hours 38 min ago:
          The default: [1] already restricts writes to only the current folder.
           I can understand adding the "denyRead" for the home folder for
          additional security, but the other three seem redundant considering
          the default behavior.
          
   URI    [1]: https://code.claude.com/docs/en/sandboxing#filesystem-isolat...
       
          varl wrote 16 hours 17 min ago:
          I've had issues with the sandbox feature, both on linux (archlinux)
          and two macos machines (tahoe). There is an open issue[1] on the
          claude-code issue tracker for it.
          
          I'm not saying it is broken for everyone, but please do verify it
          does work before trusting it, by instructing Claude to attempt to
          read from somewhere it shouldn't be allowed to.
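
          A minimal sketch of such a check: a small Python probe that could be
          run from inside the sandboxed session (for example by asking the
          agent to execute it). The function name probe_read and the example
          paths are assumptions; substitute files that exist on your machine.
          Because it runs as a spawned process, it only exercises that path,
          not any built-in file tools of the agent.

```python
import os
from pathlib import Path


def probe_read(path: str) -> str:
    """Try to read one byte (or list a directory) and classify the outcome."""
    target = Path(os.path.expanduser(path))
    try:
        if target.is_dir():
            os.listdir(target)
        else:
            with open(target, "rb") as handle:
                handle.read(1)
    except FileNotFoundError:
        # Either nothing exists here, or the sandbox hides the path entirely.
        return "missing"
    except OSError:
        # Includes PermissionError: the operating system refused the access.
        return "blocked"
    return "READABLE"


if __name__ == "__main__":
    # Example paths only (assumptions for illustration).
    for candidate in ["~/.ssh/config", "~/.bashrc", "/etc/passwd", "."]:
        print(f"{probe_read(candidate):9} {candidate}")
```

          If a path that should be denied comes back as READABLE, the sandbox
          is not doing what the settings suggest.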
          
          From my side, I confirmed both bubblewrap and seatbelt to work
          independently, but through claude-code they don't even though
          claude-code reports them to be active when debugging.
          
   URI    [1]: https://github.com/anthropics/claude-code/issues/32226
       
            OJFord wrote 15 hours 34 min ago:
            Its seccomp filter also doesn't work, at all:
            
   URI      [1]: https://github.com/anthropics/claude-code/issues/24238
       
          Abishek_Muthian wrote 17 hours 17 min ago:
          It's common practice to ask the agent to refer to another project, in
          that case I guess the read should point to the root folder of the
          projects.
          
          Also, any details on how this is enforced? Because I notice that
          claude in Windows doesn't always respect plan mode; it has edited files
          in plan mode; I never faced that issue in Linux though.
       
          globular-toast wrote 17 hours 23 min ago:
          And you'd trust that given CC is a vibe-coded mess?
          
          Editing to go even further because, I gotta say, this is a low point
          for HN. Here's a post with a real security tool and the top comment
          is basically "nah, just trust the software to sandbox itself". I feel
          like IQ has taken a complete nosedive in the past year or so. I guess
          people are already forgetting how to think?  Really sad to see.
       
            greenchair wrote 13 hours 33 min ago:
            IQ also going down due to bot spam.
       
          weinzierl wrote 18 hours 7 min ago:
          Is this a hard sandbox (enforced outside the LLM)?
       
          croes wrote 18 hours 10 min ago:
          Is that a hard setting or does it depend on claude’s interpretation?
          
          The latter could end like this
          
   URI    [1]: https://news.ycombinator.com/item?id=47357042
       
          carderne wrote 18 hours 26 min ago:
          I’m surprised it works for you with such a simple config? I’m the
          one that added the allowRead option to Claude’s underlying sandbox
          [0] and had quite a job getting my toolchains and skills to work with
          it [1].
          
          [0] Fun to see the confusing docs I wrote show up more or less
          verbatim on Claude’s docs. [1] My config is here, may be useful to
          someone:
          
   URI    [1]: https://github.com/carderne/pi-sandbox/blob/main/sandbox.jso...
       
          gmerc wrote 18 hours 32 min ago:
          It’s cute because Claude has discretion to disable its own sandbox
          and does it
       
            js2 wrote 18 hours 26 min ago:
            > You can disable this escape hatch by setting
            "allowUnsandboxedCommands": false in your sandbox settings. When
            disabled, the dangerouslyDisableSandbox parameter is completely
            ignored and all commands must run sandboxed or be explicitly listed
            in excludedCommands. [1] (I have no idea why that isn't the default
            because otherwise the sandbox is nearly pointless and gives a false
            sense of security. In any case, I prefer to start Claude in a
            sandbox already than trust its implementation.)
            
   URI      [1]: https://code.claude.com/docs/en/sandboxing
       
          yu3zhou4 wrote 18 hours 43 min ago:
          So in some sense we start recreating an operating system, or at least
          the userspace, within the Claude code. There was some name for this
          pattern but I can’t recall
       
            xo5vik wrote 16 hours 54 min ago:
            Inner platform effect
            
   URI      [1]: https://en.wikipedia.org/wiki/Inner-platform_effect
       
            virgoerns wrote 17 hours 25 min ago:
            Emacs?
       
            catlifeonmars wrote 18 hours 8 min ago:
            It’s some sort of machine inside of a machine I think. Wait, I
            got it: a simulated machine!
       
          mazieres wrote 19 hours 49 min ago:
          Also, a lot of people use multiple harnesses.  I'm often switching
          between claude, codex, and opencode.  It's kind of nice to have the
          sandbox policy independent of the actual AI assistant you are
          running.
       
          andai wrote 20 hours 28 min ago:
          Does this also apply to the commands or programs that it runs?
          
          e.g. if it writes a script or program with a bug which affects other
          files, will this prevent it from deleting or overwriting them?
          
          What about if the user runs a program the agent wrote?
       
          what wrote 20 hours 47 min ago:
          lol if you think Claude is smart enough to block sneaky path strings
          based on your config.
       
          tasn wrote 20 hours 50 min ago:
          I use bwrap to sandbox Claude. Works very well and gives me a lot of
          control and certainty around the sandbox.
       
          mazieres wrote 21 hours 19 min ago:
          I've seen claude get confused about what directory it's in.  And of
          course I've seen claude run rm -rf *.  Fortunately not both at the
          same time for me, but not hard to imagine.  The claude sandbox is a
          good idea, but to be effective it would need to be implemented at a
          very low level and enforced on all programs that claude launches. 
          Also, claude itself is an enormous program that is mostly developed
          by AI.  So to have a small <3000-line human-implemented program as
          another layer of defense offers meaningful additional protection.
       
            calvinmorrison wrote 6 hours 51 min ago:
            Pledge might be useful here
       
            digikata wrote 14 hours 37 min ago:
            One could run a docker container with claude code, with a bind to
            the project directory. I do that but also run my docker
            daemon/container in a Linux VM.
       
            mroche wrote 15 hours 33 min ago:
            > The claude sandbox is a good idea, but to be effective it would
            need to be implemented at a very low level and enforced on all
            programs that claude launches.
            
            I feel like an integration with bubblewrap, the sandboxing tech
            behind Flatpak, could be useful here. Have all executed commands
            wrapped with a BW context to prevent and constrain access.
            
   URI      [1]: https://github.com/containers/bubblewrap
       
              r4indeer wrote 15 hours 19 min ago:
              Bubblewrap is exactly what the Claude sandbox uses.
              
              > These restrictions are enforced at the OS level (Seatbelt on
              macOS, bubblewrap on Linux), so they apply to all subprocess
              commands, including tools like kubectl, terraform, and npm, not
              just Claude’s file tools.
              
   URI        [1]: https://code.claude.com/docs/en/sandboxing
       
                Melonai wrote 11 hours 4 min ago:
                Oh wow I'd have expected them to vibe-code it themselves. Props
                to them, bubblewrap is really solid, despite all my issues with
                the things built on top of it, what, Flatpak with its infinite
                xdg portals, all for some reason built on D-Bus, which
                extremely unluckily became the primary (and only really viable)
                IPC protocol on Linux, bwrap still makes a great foundation,
                never had a problem with it in particular. I tend to use it a
                bunch with NixOS and I often see Steam invoking it to support
                all of its runtimes. It's containers but actually good.
       
                mroche wrote 15 hours 4 min ago:
                The more you know, thanks for the information!
       
            thehours wrote 17 hours 22 min ago:
            I added this to `~/.claude/settings.json`:
            
            "env": {
              "CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR": "1"
            },
            
            > Working directory persists across commands. Set
            CLAUDE_BASH_MAINTAIN_PROJECT_WORKING_DIR=1 to reset to the project
            directory after each command.
            
            It reduces one problem - getting lost - but it trades it off for
            more complex commands on average since it has to specify the full
            path and/or `cd &&` most of the time.
            
            [0]
            
   URI      [1]: https://code.claude.com/docs/en/tools-reference#bash-tool-...
       
            martenlienen wrote 17 hours 50 min ago:
            That is exactly what it is. In the docs, it says that they use
            bubblewrap to run commands in a container that enforces file and
            network access at the system level.
       
            giancarlostoro wrote 19 hours 59 min ago:
            In my opinion Claude should be shipped with a custom implementation
            of "rm" that Anthropic can add guardrails to. Same with "find"; I'm
            surprised they don't just embed ripgrep (what VS Code does). It's
            really surprising they don't just tweak what Claude uses and lock
            it down to where it cannot be harmful. Ensure it only ever calls
            tooling Claude Code provides.
       
              torginus wrote 12 hours 30 min ago:
              Why can't you ship with OverlayFS which actually enforces these
              restrictions?
              
              I have seen the AI break out of (my admittedly flimsy) guards,
              like doing simply
              
              safepath/../../stuff or something even more convoluted like
              symlinks.
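
              A minimal sketch of why string-based guards fail on inputs like
              the one above. Both function names are hypothetical. The second
              check resolves ".." and symlinks before testing containment; it
              is still a user-space check that can race with changes on disk,
              so it illustrates the bug rather than replacing OS-level
              enforcement.

```python
import os
from pathlib import Path


def naive_is_allowed(root: str, requested: str) -> bool:
    """Flawed guard: compares strings, so '..' and symlinks slip through."""
    return os.path.join(root, requested).startswith(root)


def resolved_is_allowed(root: str, requested: str) -> bool:
    """Resolve '..' and symlinks first, then test whether we are still inside."""
    real_root = Path(root).resolve()
    real_target = (real_root / requested).resolve()
    return real_target == real_root or real_root in real_target.parents
```

              With a root of "safepath", the request "../../stuff" passes the
              naive guard because the joined string still begins with the
              root, while the resolved check rejects it.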
       
              nananana9 wrote 13 hours 32 min ago:
              Oh, rm failed, since we're running in a weird environment! Let me
              retry with `bash -c "/usr/bin/rm -rf *"`!
       
                giancarlostoro wrote 44 min ago:
                Ideally they control the harness and should be able to stop
                Claude from running any shell willy nilly.
       
              lxgr wrote 13 hours 58 min ago:
              > a custom implementation of "rm" that Anthropic can add
              guardrails to
              
              Wrong layer. You want the deletion to actually be impossible from
              a privilege perspective, not be made practically harder to the
              entity that shouldn't delete something.
              
              Claude definitely knows how to reimplement `rm`.
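
              A minimal sketch of the "wrong layer" point, assuming a POSIX
              system with /bin/sh. A fake "rm" is placed first on PATH (a
              hypothetical guardrail, not anything Claude Code actually
              ships); the shimmed command is refused, yet a one-line Python
              script deletes the same file. All function names are
              illustrative.

```python
import os
import shlex
import subprocess
import sys
import tempfile
from pathlib import Path


def make_rm_shim() -> Path:
    """Create a directory holding a fake 'rm' that refuses to delete anything."""
    shim_dir = Path(tempfile.mkdtemp(prefix="shim-"))
    shim = shim_dir / "rm"
    shim.write_text("#!/bin/sh\necho 'rm is disabled here' >&2\nexit 1\n")
    shim.chmod(0o755)
    return shim_dir


def run_with_shim(shim_dir: Path, argv: list) -> int:
    """Run a command through /bin/sh with the shim directory first on PATH."""
    path = f"{shim_dir}{os.pathsep}{os.environ.get('PATH', '')}"
    env = dict(os.environ, PATH=path)
    result = subprocess.run(
        ["/bin/sh", "-c", shlex.join(argv)], env=env, stderr=subprocess.DEVNULL
    )
    return result.returncode


def demo() -> tuple:
    """Return (file survives shimmed rm, file survives a Python one-liner)."""
    shim_dir = make_rm_shim()
    victim = Path(tempfile.mkdtemp(prefix="victim-")) / "data.txt"
    victim.write_text("important\n")

    run_with_shim(shim_dir, ["rm", str(victim)])
    survived_rm = victim.exists()

    delete_script = "import os, sys; os.remove(sys.argv[1])"
    run_with_shim(shim_dir, [sys.executable, "-c", delete_script, str(victim)])
    survived_python = victim.exists()
    return survived_rm, survived_python
```

              The shim only changes which program the name "rm" resolves to;
              the process still holds the permission to unlink the file.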
       
              throwaway2027 wrote 16 hours 10 min ago:
              All of which is useless when it just starts using big blocks of
              python instead. You need filesystem sandboxing for the python
              interpreter too.
       
                giancarlostoro wrote 5 hours 26 min ago:
                If you disallow it from just writing Python scripts to bypass
                its defined environment at its core system training, why would
                this matter? I would lock down its path; anything that tries to
                call Python should require the end-user to approve and see the
                raw script before they do.
       
                  tintor wrote 5 hours 4 min ago:
                  It will then write script in some other language, as a
                  workaround.
       
                ethanwillis wrote 16 hours 1 min ago:
                What we need is a capabilities based security system. It could
                write all the python, asm, whatever it wants and it wouldn't
                matter at all if it was never given a reference to use
                something it shouldn't.
       
                  ma2kx wrote 6 hours 56 min ago:
                  There exist restricted Shells. But honestly, I don't feel
                  capable of assessing all attack vectors and security measures
                  in sufficient detail. For example, do the rbash restrictions
                  also apply when Python is called with it? Or can the agent
                  somehow bypass rbash to call Python?
                  
   URI            [1]: https://en.wikipedia.org/wiki/Restricted_shell
       
                  rienbdj wrote 11 hours 40 min ago:
                  Docker is enough in practice no?
       
                  mcv wrote 15 hours 39 min ago:
                  Isn't this already possible? Give it its own user account
                  with write access to the project directory and either read
                  access or no access outside it.
       
                    VorpalWay wrote 10 hours 19 min ago:
                    Unix permissions are not a capability system though.
                    Capabilities are more like "here is a file descriptor
                    pointing to a directory, you are not capable of referring
                    to anything outside it". So closer to chroot, except you
                    can have several such directory references at the same
                    time.
                    
                    You can always narrow down a capability (get a new
                    capability pointing to a subdirectory or file, or remove
                    the writing capability so it is read only) but never make
                    it more broad.
                    
                    In a system designed for this it will be used for
                    everything, not just file system. You might have
                    capabilities related to network connections, or IPC to
                    other processes, etc. The latter is especially attractive
                    in microkernel based OSes. (Speaking of which, Redox OS
                    seems to be experimenting with this, just saw an article
                    today about that.)
                    
                    See also
                    
   URI              [1]: https://en.wikipedia.org/wiki/Capability-based_sec...
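
                    A toy, in-process model of the narrowing rule described
                    above, assuming a hypothetical DirCap class: a handle can
                    be restricted to a subdirectory or made read-only, and no
                    method widens it. Python cannot enforce this (any code
                    could construct DirCap("/") itself); in a real capability
                    system the kernel hands out the references, so this only
                    illustrates the shape of the API.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class DirCap:
    """Hypothetical handle granting access to one directory tree."""
    root: str
    writable: bool = True

    def _resolve(self, rel: str) -> str:
        root = os.path.realpath(self.root)
        target = os.path.realpath(os.path.join(root, rel))
        # Reject anything that resolves outside the tree ('..', symlinks, absolute paths).
        if os.path.commonpath([root, target]) != root:
            raise PermissionError(f"{rel!r} is outside this capability")
        return target

    def subdir(self, rel: str) -> "DirCap":
        """Narrow to a subdirectory, keeping the current write permission."""
        return DirCap(self._resolve(rel), self.writable)

    def read_only(self) -> "DirCap":
        """Narrow by dropping the write permission."""
        return DirCap(self.root, False)

    def open(self, rel: str, mode: str = "r"):
        if not self.writable and any(c in mode for c in "wax+"):
            raise PermissionError("capability is read-only")
        return open(self._resolve(rel), mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        os.mkdir(os.path.join(project, "src"))
        src = DirCap(project).subdir("src")   # narrowed to a subdirectory
        with src.open("a.txt", "w") as f:
            f.write("hi")
        frozen = src.read_only()              # narrowed again: no writes
        with frozen.open("a.txt") as f:
            print(f.read())
```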
       
                    100721 wrote 13 hours 1 min ago:
                    I have been putting my agents on their own, restricted
                    OS-level user accounts for a while. It works really well
                    for everything I do.
                    
                    Admittedly, there’s a little more friction and agent
                    confusion sometimes with this setup, but it’s worth the
                    benefit of having zero worries about permissions and
                    security.
       
                      jmogly wrote 12 hours 13 min ago:
                      Haha, you can already see wheel reinventors in this
                      thread starting to spin their reinvention wheels. Nice
                      stuff, I run my agents in containers.
       
              troupo wrote 16 hours 28 min ago:
              > Claude should be shipped by a custom implementation of
              
              And when that fails for some reason it will happily write and
              execute a Python script bypassing all those custom tools
       
              walthamstow wrote 16 hours 40 min ago:
              Claude has told me that its Grep tool does use rg under the hood,
              but I constantly find it using the Bash tool with grep
       
                giancarlostoro wrote 5 hours 23 min ago:
                When I tell it to use rg it goes much faster than it using
                grep. I really don't understand why it's slower with grep.
       
              eru wrote 17 hours 40 min ago:
              > It's really surprising they don't just tweak what Claude uses
              and lock it down to where it cannot be harmful. Ensure it only
              ever calls tooling Claude Code provides.
              
              That would make it far less useful in general.
       
                KronisLV wrote 16 hours 36 min ago:
                Maybe Anthropic (or some collection of the large AI orgs, like
                OpenAI and Anthropic and Google coming together) should apply
                patches on top of (or fork altogether) the coreutils and
                whatever you normally get in a userland - a bit like what you
                get in Git Bash on Windows, just with:
                
                1) more guardrails in place
                
                2) maybe more useful error messages that would help LLMs
                
                3) no friction with needing to get any patches upstreamed
                
                External tool calling should still be an option ofc, but having
                utilities that are usable just like what's in the training
                data, but with more security guarantees and more useful output
                that makes what's going on immediately obvious would be great.
       
                  eru wrote 16 hours 17 min ago:
                  So for me, it's really, really useful for Claude to be able
                  to send Slack messages and emails or make pull requests.
                  
                  But that's also the most damaging actions it could take. 
                  Everything on my computer is backed up, but if Claude insults
                  my boss, that would be worse.
       
                    KronisLV wrote 12 hours 44 min ago:
                    > So for me, it's really, really useful for Claude to be
                    able to send Slack messages and emails or make pull
                    requests.
                    
                    Oh, I'm totally not arguing for cutting off other
                    capabilities, I like tool use and find it to be as useful
                    as the next person!
                    
                    Just that the shell tools that will see A LOT of usage have
                    additional guardrails added on top of them, because it's
                    inevitable that sooner or later any given LLM will screw up
                    and pipe the wrong thing in the wrong command - since you
                    already hear horror stories about devs whose entire
                    machines get wiped. Not everyone has proper backups (even
                    though they totally should)!
       
              oefrha wrote 18 hours 48 min ago:
              You can define your own rm shell alias/function and it will use
              that. I also have cp/mv aliases that force -i to avoid
              accidental clobbering and it confuses Claude to no end (it uses
              cp/mv rare enough—rarer than it should, really—that I don’t
              bother wasting memory tokens on it).
       
                d1sxeyes wrote 18 hours 33 min ago:
                I did this, Claude detected it and decided to run /bin/rm
                directly.
       
                  cogogo wrote 13 hours 58 min ago:
                  This is terrifying. I have not used agents because I do not
                  have a sandbox machine I do not care about. Am I crazy to
                  worry about a sandboxed agent running on my home network?
                  Anyone experienced anything weird by doing that?
       
                    oefrha wrote 13 hours 49 min ago:
                    Don’t dangerously skip permissions and actually read
                    commands when you get prompted and you’re fine.
       
                      d1sxeyes wrote 13 hours 44 min ago:
                      Yeah, I actually have both an alias for `rm` and a custom
                      seatbelt sandbox which means the agent can only delete
                      stuff within the directory it’s working in, so wasn’t
                      an issue, was just fun to watch it say “hm, that
                      doesn’t seem to work. Looks like the user has aliased
                      rm. I’ll just go ahead and work around it”
       
            esperent wrote 20 hours 42 min ago:
            I added a hook to disable rm, find -delete, and a few of the other
            more obvious destructive ops. It sends Claude a strongly worded
            message: "STOP IMMEDIATELY. DO NOT TRY TO FIND WORKAROUNDS...".
            
            It works well. Git rm is still allowed.
       
              lxgr wrote 13 hours 56 min ago:
              It works well so far, for you.
              
              Are you confident it would still work against sophisticated
              prompt injection attacks that override your "strongly worded
              message"?
              
              Strongly worded signs can be great for safety (actual mechanisms
              preventing undesirable actions from being taken are still much
              better), but are essentially meaningless for security.
       
                unshavedyak wrote 9 hours 13 min ago:
                Not sure about OP’s impl, but the wording doesn’t matter. The
                hook prevents the use of whatever action you want. Eg it’s
                impossible for Claude to use Emojis for me. My hook doesn’t
                allow it.
                
                So it’s deterministic based upon however the script is
                written.
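
                A sketch of what such a deterministic check might look like,
                assuming a hypothetical is_blocked function and an
                illustrative deny list. How the check gets wired into an
                agent's hook mechanism is not shown here. The result depends
                only on the command string, which also means it only catches
                the shapes it was written for.

```python
import os
import shlex

# Illustrative deny list; a real one would be chosen per project.
DENIED_PROGRAMS = {"rm", "rmdir", "shred"}


def is_blocked(command: str) -> bool:
    """Decide from the command text alone whether to refuse it."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input: fail closed
    if not tokens:
        return False
    # Compare the basename so /bin/rm is treated like rm.
    program = os.path.basename(tokens[0])
    if program in DENIED_PROGRAMS:
        return True
    if program == "find" and "-delete" in tokens:
        return True
    return False


if __name__ == "__main__":
    for cmd in ("rm -rf *", "/bin/rm -rf build", "find . -name '*.o' -delete",
                "git rm old.txt", 'bash -c "/usr/bin/rm -rf *"'):
        print(is_blocked(cmd), cmd)
```

                The last command in the demo is not blocked, because the
                program being run is bash. The check is deterministic, but a
                wrapper shell or a script in another language goes around it,
                which is the case for enforcing deletion limits at the
                filesystem or privilege level instead.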
       
                esperent wrote 12 hours 52 min ago:
                I mean, that's like saying are you sure that your antivirus
                would prevent every possible virus? Are you sure that you
                haven't made some mistake in your dev box setup that would
                allow a hacker to compromise it? What if a thief broke into
                your house and stole your laptop? That's happened to me before,
                much more annoying to recover from than an accidental rm -rf.
                
                I do my best to keep off site back ups and don't worry about
                what I can't control.
       
                  lxgr wrote 12 hours 31 min ago:
                  > I mean, that's like saying are you sure that your antivirus
                  would prevent every possible virus?
                  
                  Yes, I'm saying it's pretty much as bad as antivirus
                  software.
                  
                  > Are you sure that you haven't made some mistake in your dev
                  box setup that would allow a hacker to compromise it?
                  
                  Different category of error: Heuristically derived
                  deterministic protection vs. protection based on a stochastic
                  process.
                  
                  > much more annoying to recover from than an accidental
                  rm -rf.
                  
                  My point is that it's a different category, not that one is
                  on average worse than the other. You don't want your security
                  to just stand against the median attacker.
       
              Diti wrote 18 hours 47 min ago:
              I added something similar. Claude eventually ran a `rm -rf *` on
              my own project. When I asked why it did that, it recognized it
              messed up and offered a very bad “apology”: “the irony of
              not following your safety instructions isn’t lost on me”.
              
              Nowadays I only run Claude in Plan mode, so it doesn’t ask me
              for permissions any more.
       
            PaulDavisThe1st wrote 21 hours 15 min ago:
            On Linux, chroot(2) is hard to escape and would apply to all child
            processes without modification.
       
              wasted_intel wrote 10 hours 27 min ago:
              That comparison is made on the project homepage:
              
              "Not a security mechanism. No mount isolation, no PID namespace,
              no credential separation. Linux documents it as not intended for
              sandboxing."
       
              safety1st wrote 17 hours 36 min ago:
              We anthropomorphize these agents in every other way. Why aren't
              we using plain ol' unix user accounts to sandbox them?
              
              They look a lot like daemons to me, they're a program that you
              want hanging around ready to respond, and maybe act autonomously
              through cron jobs or similar. You want to assign any number of
              permissions to them, you don't want them to have access to root
              or necessarily any of your personal files.
              
              It seems like the permissions model broadly aligns with how we
              already handle a lot of server software (and potentially
              malicious people) on unix-based OSes. It is a battle-tested
              approach that the agent is unlikely to be able to "hack" its way
              out of. I mean we're not really seeing them go out onto the
              Internet and research new Linux CVEs.
              
              Have them clone their own repos in their own home directory too,
              and let them party.
              
              Openclaw almost gets there! It exposes a "gateway" which sure
              looks like a daemon to me. But then for some reason they want it
              to live under your user account with all your privileges and in a
              subfolder of your $HOME.
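
              A small sketch of the launch side of that idea, assuming a
              dedicated account named "agent" already exists and that sudo
              is configured to allow switching to it. Both the account name
              and the helper function are hypothetical.

```python
import shlex
from typing import List


def as_agent_user(command: List[str], user: str = "agent") -> List[str]:
    """Build an argv that runs `command` as a separate Unix account via sudo."""
    # -u selects the target user; -H sets HOME to that user's home directory;
    # "--" ends sudo's own options so the command's flags are left alone.
    return ["sudo", "-u", user, "-H", "--", *command]


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(shlex.join(as_agent_user(["git", "status"])))
```

              The isolation itself comes from ordinary file ownership and
              modes on everything outside that account's home directory.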
       
                gwking wrote 10 hours 18 min ago:
                I tried this with Claude code on macOS. I created a new agent
                user and a wrapper to run Claude as that user, along with some
                scripts to set permissions and ownership so that I could run
                simple allow/deny commands. The only problem was that the fancy
                oauth flow broke. I filed an issue with Anthropic and their
                ticket bot auto closed it “for lack of interest” or
                whatever.
                
                I fiddled with transferring the saved token from my keychain to
                the agent user keychain but it was not straightforward.
                
                If someone knows how to get a subscription to Claude to work on
                another user via command line I’d love to know about it.
       
                lxgr wrote 13 hours 53 min ago:
                > for some reason they want it to live under your user account
                
                The entire idea of Openclaw (i.e., the core point of what
                distinguishes it from agents like Claude Code) is to give it
                access to your personal data, so it can act as your assistant.
                
                If you only need a coding agent, Openclaw is the completely
                wrong tool. (As a side note, after using it for a few weeks,
                I'm not convinced it's the right tool for anything, but that's
                a different story.)
       
                jon-wood wrote 16 hours 13 min ago:
                Oh that’s an idea. I was going to argue that it’s a problem
                that you might want multiple instances in different contexts
                but sandboxing processes (possibly instanced) is exactly what
                systemd units are designed to deal with.
       
                search_facility wrote 16 hours 58 min ago:
                Exactly!
       
              shakna wrote 21 hours 9 min ago:
              chroot is not a security sandbox. It is not a jail.
              
              Escaping it is something that does not take too much effort. If
              you have ptrace, you can escape without privileges.
       
                brianush1 wrote 20 hours 53 min ago:
                claude is stupid but not malicious; chroot is sufficient
       
                  fl7305 wrote 6 hours 0 min ago:
                  Sure, it's not malicious. But it is very eager to get things
                  done, and surprisingly inventive and knowledgeable in all
                  kinds of workarounds.
       
                  lxgr wrote 13 hours 51 min ago:
                  Until it gets prompt injected. Are you reading every single
                  file your agent reads as part of the tasks you give it,
                  including content fetched from the web or third-party
                  packages?
       
                  furyofantares wrote 20 hours 16 min ago:
                  I've many times seen Claude try to execute a command that
                  it's not supposed to, the harness prevents it, and then it
                  writes and executes a python script to do it.
       
                    j16sdiz wrote 19 hours 1 min ago:
                    breaking a chroot takes more than that..
       
                      furyofantares wrote 7 hours 50 min ago:
                      How much more? Depends on the system doesn't it? I don't
                      know how many systems have proc mounted but don't you get
                      it from /proc/self/root?
                      
                      Anyway that's beside the point, which is that it doesn't
                      have to "be malicious" to try to overcome what look like
                      errors on its way to accomplishing the task you asked it
                      to do.
       
                      hoppp wrote 10 hours 48 min ago:
                      That doesn't mean claude can't do it, chroot is better
                      than nothing but not a real solution
       
                  karhagba wrote 20 hours 39 min ago:
                  Claude is far from stupid from my experience.
                  I've used so many models and Claude is king.
       
                  nofriend wrote 20 hours 43 min ago:
                  Malice is not required. If it thinks it is in the right, then
                  it will do whatever it takes to get around limitations.
       
          nurettin wrote 21 hours 26 min ago:
          It will just do
          
              ssh you@localhost "rm -rf ~"
       
            PaulDavisThe1st wrote 21 hours 14 min ago:
            Well, now it will ....
       
              xdavidliu wrote 14 hours 46 min ago:
              kinda reminds me of the plot of Sphere, where Samuel L Jackson is
              reading 20,000 leagues under the sea and is thinking of giant
              squids.
       
          8cvor6j844qw_d6 wrote 22 hours 48 min ago:
          Interesting, thanks. I use remote ephemeral dev containers with
          isolated envs, so filesystem damage isn't really a concern as long as
          the PR looks good in review. Nice extra guardrail though, will add it
          to the project-level settings.
       
            overfeed wrote 20 hours 28 min ago:
            i use local dev containers: the worst an agent can do is delete its
            working copy; no access to my home directory, access tokens or
            sudo.
       
          cozzyd wrote 22 hours 53 min ago:
          Is this a real sandbox or just a pretty please?
       
            enduser wrote 21 hours 48 min ago:
            By default it will automatically retry many tool calls that fail
            due to the sandbox with the sandbox disabled. In other words it can
            and will leave the sandbox.
            
            For example:
            
            Bash(swift build 2>&1 | tail -20)
              ⎿  warning: /Users/enduser/Library/org.swift.swiftpm/configuration
                 is not accessible or not writable, disabling user-level cache
                 features.
                 warning: /Users/enduser/Library/org.swift.swiftpm/security is
                 not accessible or not writable, disabling user-level cache feat
                 … +26 lines (ctrl+o to expand)
            
            Build hit sandbox restriction. Retrying outside sandbox.
            
            Bash(swift build 2>&1 | tail -20)
              ⎿  [35/52] Compiling MCP Resources.swift
                 [36/52] Emitting module MCP
                 [37/52] Compiling MCP Client.swift
                 … +17 lines (ctrl+o to expand)
              ⎿  (timeout 3m)
       
              fc417fc802 wrote 20 hours 24 min ago:
              What is even the point in that case? The behavior you describe is
              no better than if SELinux were to automatically re-execute a
              process with containment disabled.
       
                js2 wrote 7 hours 7 min ago:
                Disable sandbox escape:
                
   URI          [1]: https://news.ycombinator.com/item?id=47552165
       
                erinnh wrote 18 hours 54 min ago:
                Looking at the settings, it's an option:
                
                  Configure Overrides:
                
                    1. Allow unsandboxed fallback
                    2. Strict sandbox mode (current)
                
                  Allow unsandboxed fallback: When a command fails due to
                  sandbox restrictions, Claude can retry with
                  dangerouslyDisableSandbox to run outside the sandbox
                  (falling back to default permissions).
                
                  Strict sandbox mode: All bash commands invoked by the model
                  must run in the sandbox unless they are explicitly listed in
                  excludedCommands.
       
                ihattendorf wrote 19 hours 47 min ago:
                The purpose of the sandbox is to reduce permission fatigue. If
                it fails to run a command in the sandbox and retries it outside
                the sandbox, the regular permission rules apply. You'll still
                be prompted for any non-sandboxed tool calls that you haven't
                allowed or denied via permission rules.
       
            ray_v wrote 22 hours 12 min ago:
            It seems like it's controlled by the Bash tool ( [1] ) and then
            bubblewrap ( [2] ) on linux and Seatbelt on mac at the system level
            
   URI      [1]: https://code.claude.com/docs/en/sandboxing
   URI      [2]: https://github.com/containers/bubblewrap
       
            AnotherGoodName wrote 22 hours 48 min ago:
             [1] says they integrated bubblewrap (linux/windows), seatbelt
            (macos) and give an error if sandbox can't be supported so appears
            to be real.
            
   URI      [1]: https://code.claude.com/docs/en/sandboxing
       
              throwaway6734 wrote 22 hours 46 min ago:
               [1] Any idea on how that compares to this docker feature in
              development?
              
   URI        [1]: https://docs.docker.com/ai/sandboxes/
       
                figmert wrote 21 hours 16 min ago:
                Docker containers use cgroups and namespaces etc (the usual
                kernel level isolation)
                
                Docker sandboxes use microvms (i.e. hardware level isolation)
                
                Bubblewrap uses the same technology as containers
                
                I am unsure about seatbelt.
       
          harikb wrote 22 hours 55 min ago:
          I think the point would be that - some random upcoming revision of
          claude-code could remove or simply change the config name just as
          silently as it was introduced.
          
          People might genuinely want some other software to do the sandboxing.
          Something other than the fox.
       
          mycall wrote 22 hours 56 min ago:
          I noticed codex has a sandbox, wondering if it has a comparable
          config section.
       
            tofflos wrote 14 hours 26 min ago:
            Codex uses and ships with bubblewrap on Linux and will attempt to
            use the version installed on the path before falling back to the
            shipped version with a warning message.
            
            You should be able to configure the sandbox using [1] if you are a
            person who prefers the convenience of codex being able to open the
            sandbox over an externally enforced sandbox like jai.
            
   URI      [1]: https://developers.openai.com/codex/agent-approvals-securi...
       
        adi_kurian wrote 23 hours 15 min ago:
        Claude's stock unprompted / uninspired UI code creates carbon clone
        components. That "jai is not a promise of perfect safety" callout box
        is like the em dash of FE code. The contrast, or lack thereof, makes
        some of the text particularly invisible.
        
        I wonder if shitty looking websites and unambitious grammar will become
        how we prove we are human soon.
       
          NetOpWibby wrote 23 hours 5 min ago:
          Everything old is new again
       
        messh wrote 23 hours 21 min ago:
        How is this different than say bubblewrap and others?
       
          girvo wrote 23 hours 19 min ago:
           [1] > bubblewrap is more flexible and works without root. jai is
          more opinionated and requires far less ceremony for the common case.
          The 15-flag bwrap invocation that turns into a wrapper script is
          exactly the friction jai is designed to remove.
          
          Plus some other comparisons, check the page
          
   URI    [1]: https://jai.scs.stanford.edu/comparison.html#jai-vs-bubblewr...
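
          For a sense of the ceremony being compared, here is a sketch of
          how a wrapper might assemble a bwrap command line: read-only
          view of the filesystem, a writable project directory, fresh
          /dev, /proc and /tmp. The function name and the flag selection
          are assumptions for illustration. This is not what jai does
          internally, and it does not reproduce a copy-on-write home
          directory; files under $HOME stay readable here.

```python
import os
from typing import List


def bwrap_argv(workdir: str, command: List[str]) -> List[str]:
    """Assemble a bubblewrap invocation that leaves only workdir writable."""
    workdir = os.path.abspath(workdir)
    return [
        "bwrap",
        "--ro-bind", "/", "/",        # whole filesystem visible, read-only
        "--dev", "/dev",              # fresh minimal /dev
        "--proc", "/proc",            # fresh /proc for the new PID namespace
        "--tmpfs", "/tmp",            # private scratch /tmp
        "--bind", workdir, workdir,   # the project directory stays writable
        "--unshare-all",              # new namespaces, including network
        "--share-net",                # keep network access anyway
        "--die-with-parent",          # kill the sandbox if the wrapper dies
        "--chdir", workdir,
        *command,
    ]


if __name__ == "__main__":
    print(" ".join(bwrap_argv(".", ["make", "test"])))
```

          The writable bind comes after the tmpfs mount so that a project
          directory living under /tmp is not hidden by it.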
       
            attentive wrote 19 hours 7 min ago:
            bubblewrap is in many modern distros standard packages.
            
            With all the supply chain issues these days onboarding new tools
            carries extra risks. So, question is if it's worth it.
       
        triilman wrote 23 hours 22 min ago:
        What would Jonathan Blow think about this.
       
          ghighi7878 wrote 23 hours 16 min ago:
          My name is also jai
       
        BoppreH wrote 23 hours 33 min ago:
        Excellent project, unfortunate title. I almost didn't click on it.
        
        I like the tradeoff offered: full access to the current directory,
        read-only access to the rest, copy-on-write for the home directory.
        With stricter modes to (presumably) protect against data exfiltration
        too. It really feels like it should be the default for agent systems.
       
          fouc wrote 23 hours 20 min ago:
          Since the site itself doesn't really have a title, I probably
          would've gone with something like "jai - filesystem containment for
          AI agents"
       
        mazieres wrote 1 day ago:
        What would it take for people to stop recklessly running unconstrained
        AI agents on machines they actually care about? A Stanford researcher
        thinks the answer is a new lightweight Linux container system that you
        don't have to configure or think about.
       
          jillesvangurp wrote 16 hours 25 min ago:
          There always has been this tension between protecting resources and
          allowing users to access those resources in security. With many
          systems you have admin/root users and regular users. Some things
          require root access. Most interesting things (from a security point
          of view) live in the user directory. Because that's where users spend
          all their time. It's where you'll find credentials, files with
          interesting stuff inside, etc. All the stuff that needs protecting.
          
          The whole point of using a computer is being able to use it. For
          programmers, that means building software. Which until recently meant
          having a lot of user land tools available ready to be used by the
          programmer. Now with agents programming on their behalf, they need
          full access to all that too in order to do the very valuable and
          useful things they do. Because they end up needing to do the exact
          same things you'd do manually.
          
          The current security modes in agents are binary. Super anal about
          absolutely everything; or off. It's a false choice. It's technically
          your choice to make and waive their liability (which is why they need
          you to opt in); but the software is frustrating to use unless you
          make that choice. So, lots of people make that choice. I'm guilty as
          well. I could approve every ansible and ssh command manually (yes
          really). But a typical session where codex follows my guardrails to
          manage one of my environments using ansible scripts it maintains just
          involves a whole lot of such commands. I feel dirty doing it. But it
          works so well that doing all that stuff manually is not something I
          want to go back to.
          
          It's of course insecure as hell and I urgently need something better
          than yolo mode for this. One of the reasons I like codex is that (so
          far) it's pretty diligent about instruction following and guard
          rails. It's what makes me feel slightly more relaxed than I perhaps
          should be. It could be doing a lot of damage. It just doesn't seem to
          do that.
       
          vardalab wrote 22 hours 34 min ago:
          unconstrained AI agents are what makes it so useful though.
          I have been using claude for almost a year now and the biggest unlock
          was to stop being a worrywart early on and just literally giving it
          ssh keys and telling it to fix something.  ofc I have backups and do
          run it in VM but in that VM it helps me manage my infra and i have a
          decent size homelab that would be no fun but a chore without this
          assistant.
       
            hrmtst93837 wrote 18 hours 42 min ago:
            Letting an agent loose with SSH keys is fine when the blast radius
            is one disposable VM, but scale that habit to prod or the wrong
            subnet and you get a fast refresher on why RBAC exists, why scoped
            creds exist, and why people who clean up after outages get very
            annoyed by this whole genre of demo. Feels great, until it doesn't.
       
            bigstrat2003 wrote 20 hours 7 min ago:
            > unconstrained AI agents are what makes it so useful though
            
            Not remotely worth it.
       
            sersi wrote 20 hours 10 min ago:
            I run my AI agent unconstrained in a VM without access to my local
            network so it can futz with the system however it wants (so far,
            I've had to rebuild the VM twice from Claude borking it). That
            works great for software development.
            
            For devops work, etc (like your use case), I much prefer talking to
            it and letting it guide me into fixing the issue. Mostly because
            after that I really understand what the issue was and can fix it
            myself in the future.
       
            kristofferR wrote 21 hours 14 min ago:
            Agree, but SSH agents like 1Password's are nice for that.
            
            You simply tell it to install that Docker image on your NAS like
            normal, but when it needs to log in over SSH it prompts for your
            fingerprint. The agent never gets access to your SSH key.
       
          fouc wrote 23 hours 18 min ago:
          except the big AI companies are pushing stuff designed for people to
          run on their personal computers, like Claude Cowork.
       
          mememememememo wrote 23 hours 24 min ago:
          Yes. It is like walking around your house with a flamethrower, but
          you added fire retardant. Just take the flamethrower to a shed you
          don't mind losing. Which is some kind of cloud workspace most likely.
          Maybe an old laptop.
          
          Still, if you yolo online access and give it creds or access to
          authenticated tools, there can still be dragons.
       
            mazieres wrote 21 hours 29 min ago:
            The problem is that in practice, many people don't take the
            flamethrower to the shed.  I recently had a conversation with
            someone who was arguing that you don't really need jai because
            docker works so well.  But then it turned out this person regularly
            runs claude code in yolo mode without a container!
            
            It's like people think that because containers and VMs exist, they
            are probably going to be using them when a problem happens.  But
            then you are working in your own home directory, you get some
            compiler error or something that looks like a pain to decipher, and
            the urge just to fire up claude or codex right then and there to
            get a quick answer is overwhelming.  Empirically, very few people
            fire up the container at that point, whereas "jai claude" or "jai
            -D claude" is simple enough to type, and basically works as well as
            plain claude so you don't have to think about it.
       
       