        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                              on Gopher (unofficial)
   URI Visit Hacker News on the Web
       
       
       COMMENT PAGE FOR:
   URI   World Emulation via Neural Network
       
       
        alekseiprokopev wrote 1 day ago:
         It would be quite interesting to try to mess with the neural
         representations to add or remove images of some objects there. I'm
         also curious whether the topology of the actual place is similar to
         the topology of the embedding space.
       
        stormfather wrote 1 day ago:
         It's a time capsule, among other things. I want to take many, many
        videos of my grandpa's farm, and be able to walk around in it in VR
        using something like this in the future.
       
          foxglacier wrote 17 hours 42 min ago:
          You can do it using the more classic technique of photogrammetry.
          There are commercial products used by real estate salesmen to produce
          high quality "games" where you walk around inside a house, but
           they're more like Google Street View, where you swoosh between points
          where a 360 degree photo was taken. All those things will be more
          faithful than neurally generating next frames based on previous
          frames and control input.
       
        montebicyclelo wrote 1 day ago:
        Awesome work / demo / blog
        
         Link to the demo in case people miss it [1]
         
         > using a customized camera app which also recorded my phone’s motion
         
         Using the phone's gyro as a proxy for "controls" is very clever
        
   URI  [1]: https://madebyoll.in/posts/world_emulation_via_dnn/demo/
       
        Imanari wrote 1 day ago:
         Amazing work. Could you elaborate on the model architecture and the
         process that led you to this architecture?
       
          Macuyiko wrote 1 day ago:
          The model seems to be viewable here:
          
   URI    [1]: https://netron.app/?url=https://madebyoll.in/posts/world_emu...
       
        das_keyboard wrote 1 day ago:
        >  So, if traditional game worlds are paintings, neural worlds are
        photographs. Information flows from sensor to screen without passing
        through human hands.
        
         I don't get this analogy at all. Instead of through a human, the
         information flows through a neural network, which alters it.
        
        > Every lifelike detail in the final world is only there because my
        phone recorded it.
        
        I might be wrong here but I don't think this is true. It might also be
        there because the network inferred that it is there based on previous
        data.
        
         Imo this just takes the human out of an artistic process - creating
         video game worlds - and I'm not sure that's worth achieving.
       
          Legend2440 wrote 22 hours 51 min ago:
           > It might also be there because the network inferred that it is
           there based on previous data.
          
          There is no previous data. This network is exclusively trained on the
          data he collected from the scene.
       
          ajb wrote 1 day ago:
           > I don't get this analogy at all. Instead of through a human, the
           information flows through a neural network, which alters it.
          
          These days most photos are also stored using lossy compression which
          alters the information.
          
          You can think of this as a form of highly lossy compression of an
          image of this forest in time and space.
          
          Most lossy compression is 'subtractive' in that detail is subtracted
          from the image in order to compress it, so the kind of alterations
           are limited. However, there have been non-subtractive forms of
           compression (e.g., fractal compression) that have been criticised
           for making up details, which is certainly something that a
           neural network will do. However, if the network is only trained on
           this forest data, rather than also being trained on other data and
          then fine tuned, then in some sense it does only represent this
          forest rather than giving an 'informed impression' like a human
          artist would.
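           
           A back-of-envelope comparison makes the "highly lossy" point
           concrete (every number below is an assumption for illustration,
           not the actual dataset or model size):
           
               # Illustrative compression arithmetic; all numbers are assumed.
               frames = 30 * 60 * 30                   # 30 min of 30 fps video
               raw_bytes = frames * 256 * 256 * 3      # uncompressed 256x256 RGB
               weight_bytes = 5_000_000 * 4            # 5M params in float32
               print(f"raw ~{raw_bytes / 1e9:.1f} GB, "
                     f"weights ~{weight_bytes / 1e6:.0f} MB, "
                     f"ratio ~{raw_bytes / weight_bytes:.0f}x")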
       
            andai wrote 23 hours 17 min ago:
             > These days most photos are also stored using lossy compression
             which alters the information.
            
             I noticed this in some photos I saw online starting maybe 5-10
             years ago.
            
            I'd click through to a high res version of the photo, and instead
            of sensor noise or jpeg artefacts, I'd see these bizarre snakelike
            formations, as though the thing had been put through style
            transfer.
       
        titouanch wrote 1 day ago:
        This is very impressive for a hobby project. I was wondering if you
        were planning to release the source code. Being able to create
        client-hosted, low-requirement neural networks for world generation
        could be really useful for game dev or artistic projects.
       
          thenthenthen wrote 1 day ago:
           Yes please! I would love to try using this on disappearing
           neighbourhoods; the results are so dreamlike, or like memories!
       
        nopakos wrote 1 day ago:
        Next we should try "Excel emulation via Neural Network". We get rid of
        a lot of intermediate steps, calculations, user interface etc!
        
        What could go wrong?
        
        Jokes aside, this is insanely cool!
       
          downboots wrote 1 day ago:
           Or train it on a large dataset of math identities and have the user
           draw one side.
       
        Valk3_ wrote 1 day ago:
        This might be a vague question, but what kind of intuition or knowledge
         do you need to work with these kinds of things, say if you want to make
        your own model? Is it just having experience with image generation and
        trying to incorporate relevant inputs that you would expect in a 3D
        world, like the control information you added for instance?
       
          ollin wrote 20 hours 19 min ago:
          I think [1] is a reasonable place to start (they have public
          world-model training code, and people have successfully adapted their
          codebase to other games e.g. [2] ). Most modern world models are
          essentially image generators with additional inputs (past-frames +
          controls) added on, so understanding how Diffusion/IADB/Flow Matching
          work would definitely help.
          
   URI    [1]: https://diamond-wm.github.io
   URI    [2]: https://derewah.dev/projects/ai-mariokart
       
            Valk3_ wrote 18 hours 31 min ago:
            Thanks!
       
        bjornsing wrote 1 day ago:
        What used to be cutting edge research not so long ago is now a fun
        hobby project. I love it.
       
        gitroom wrote 1 day ago:
         Gotta say, I've always wanted to try building something like this
         myself. That kind of grind pays off way more than shiny
         announcements imo.
       
        ilaksh wrote 1 day ago:
        This seems incredibly powerful.
        
        Imagine a similar technique but with productivity software.
        
        And a pre-trained network that adapts quickly.
       
        tehsauce wrote 1 day ago:
         I love this! Your results seem comparable to the Counter-Strike and
         Minecraft models from a bit ago, with massively less compute and data.
        It's particularly cool that it uses real world data. I've been wanting
        to do something like this for a while, like capturing a large dataset
        while backpacking in the cascades :)
        
         I didn't see it in an obvious place on your GitHub - do you have any
         plans to open source the training code?
       
        AndrewKemendo wrote 1 day ago:
        I think this is very interesting because you seem to have reinvented
        NeRF, if I’m understanding it correctly. I only did one pass through
        but it looks at first glance like a different approach entirely.
        
         More interesting is that you made an easy-to-use environment authoring
         tool that (I haven’t tried it yet) seems really slick.
        
        Both of those are impressive alone but together that’s very exciting.
       
          bjornsing wrote 1 day ago:
          NeRF is a more complex and constrained approach, based on a kind of
          ray tracing. But results are obviously similar.
       
            AndrewKemendo wrote 1 day ago:
             Right, which is why I said it’s an entirely different approach
             but results in almost the same kind of output.
       
        udia wrote 1 day ago:
        Very nice work. Seems very similar to the Oasis Minecraft simulator.
        
   URI  [1]: https://oasis.decart.ai/
       
          ollin wrote 1 day ago:
          Yup, definitely similar! There are a lot of video-game-emulation
          World Models floating around now, [1] had a list. In the self-driving
          & robotics literature there have also been many WMs created for
          policy training and evaluation. I don't remember a prior WM built on
          first-person cell-phone video, but it's a simple enough concept that
          someone has probably done it for a student project or something :)
          
   URI    [1]: https://worldarcade.gg
       
        bitwize wrote 1 day ago:
        I want to see a spiritual successor to LSD: Dream Emulator based on
        this.
        
   URI  [1]: https://en.m.wikipedia.org/wiki/LSD:_Dream_Emulator
       
        throwaway314155 wrote 1 day ago:
        Really cool. How much compute did you require to successfully train
        these models? Is it in the ballpark of something you could do with a
        single gaming GPU? Or did you spin up something fancier?
        
         edit: I see now that you mention a price point of roughly $100 (about
         100 GPU-hours). My mistake.
       
        puchatek wrote 1 day ago:
        This is great but I think I'll stick to mushrooms.
       
          ulrikrasmussen wrote 1 day ago:
           I also thought those wooden guard rails looked pretty much spot on
           for how they would look on 2C-B. The only thing missing is the
           overlay of geometric patterns on even surfaces.
       
          LoganDark wrote 1 day ago:
          For some reason, psilocybin causes me to randomly just lose
          consciousness, and LSD doesn't. Weird stuff.
       
          bongodongobob wrote 1 day ago:
             Yeah, the similarity of some of this stuff to psychedelics is
             remarkable.
       
            ilaksh wrote 1 day ago:
            It makes me think that maybe our visual perception is similar to
            what this program is doing in some ways.
            
            I wonder if there are any computer vision projects that take a
            similar world emulation approach?
            
            Imagine you collected the depth data also.
       
              voidspark wrote 1 day ago:
               Yes, the model is a U-Net, which is a type of Convolutional Neural
              Network (CNN), which is inspired by the structure of the visual
              cortex.
              
   URI        [1]: https://en.wikipedia.org/wiki/Convolutional_neural_netwo...
       
        alain94040 wrote 1 day ago:
         Appreciate this article showing some failures on the way to a great
         result. Too many times, people only show the polished end result:
         look, I trained this AI and it produces these great results. The world
         dissolving was very interesting to see, even if I'm not sure I
         understand how it got fixed.
       
          ollin wrote 1 day ago:
          Thanks! My favorite failure mode (not mentioned in the post - I think
          it was during the first round of upgrades?) was a "dry" form of
          soupification where the texture detail didn't fully disappear
          
   URI    [1]: https://imgur.com/c7gVRG0
       
        quantumHazer wrote 1 day ago:
         Is this a solo/personal project? If it is, it's indeed very cool.
         
         Is OP the blog’s author? In the post the author said that the purpose
         of the project is to show why NNs are truly special, and I wanted a
         more articulate view of why they think that.
         Good work anyway!
       
          ollin wrote 1 day ago:
          Yes! This was a solo project done in my free time :) to learn about
          WMs and get more practice training GANs.
          
          The special aspect of NNs (in the context of simulating worlds) is
          that NNs can mimic entire worlds from videos alone, without access to
           the source code (in the case of Pokemon) or even without the source
          code having existed (as is the case for the real-world forest trail
          mimicked in this post). They mimic the entire interactive behavior of
          the world, not just the geometry (note e.g. the not-programmed-in
          autoexposure that appears when you look at the sky).
          
          Although the neural world in the post is a toy project, and quite far
          from generating photorealistic frames with "trees that bend in the
          wind, lilypads that bob in the rain, birds that sing to each other",
          I think getting better results is mostly a matter of scale. See e.g.
          the GAIA-2 results ( [1] , [2] ) for an example of what WMs can do
          without the realtime-rendering-in-a-browser constraints :)
          
   URI    [1]: https://wayve.ai/wp-content/uploads/2025/03/generalisation_0...
   URI    [2]: https://wayve.ai/wp-content/uploads/2025/03/unsafe_ego_01_le...
       
            attilakun wrote 1 day ago:
            Amazing project. This has the same feel as Karpathy’s classic
            “The Unreasonable Effectiveness of Recurrent Neural Networks”
            blog post. I think in 10 years’ time we will look back and say
            “wow, this is how it started.”
       
            janalsncm wrote 1 day ago:
             You mentioned it took 100 GPU-hours - what GPU did you train on?
       
              ollin wrote 1 day ago:
               Mostly 1xA10 (though I switched to 1xGH200 briefly at the end;
               Lambda has a sale going). The network used in the post is very
              tiny, but I had to train a really long time w/ large batch to get
              somewhat-stable results.
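               
               For reference, one generic way to get a large effective batch
               on a single GPU is gradient accumulation (a sketch of that
               common trick, not the author's actual training loop; the model
               and numbers are stand-ins):
               
                   # Gradient accumulation: sum gradients over micro-batches.
                   # Stand-in model and numbers; not the author's code.
                   import torch
                   
                   model = torch.nn.Linear(16, 16)  # stand-in for the model
                   opt = torch.optim.Adam(model.parameters(), lr=1e-4)
                   accum = 8          # effective batch = 8 micro-batches
                   
                   opt.zero_grad()
                   for step in range(80):
                       x = torch.randn(32, 16)      # stand-in micro-batch
                       loss = (model(x) - x).pow(2).mean() / accum
                       loss.backward()              # grads add up in .grad
                       if (step + 1) % accum == 0:
                           opt.step()
                           opt.zero_grad()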
       
          treesciencebot wrote 1 day ago:
          author is:
          
   URI    [1]: https://x.com/madebyollin
       
       
   DIR <- back to front page