        _______               __                   _______
       |   |   |.---.-..----.|  |--..-----..----. |    |  |.-----..--.--.--..-----.
       |       ||  _  ||  __||    < |  -__||   _| |       ||  -__||  |  |  ||__ --|
       |___|___||___._||____||__|__||_____||__|   |__|____||_____||________||_____|
                                                             on Gopher (unofficial)
       
       
       COMMENT PAGE FOR:
   URI   Paper Tape Is All You Need – Training a Transformer on a 1976 Minicomputer
       
       
        tcdent wrote 3 hours 39 min ago:
        5.5 min to train on a PDP-11? You mean to tell me we could have been
        doing this all along???
       
          rahen wrote 3 hours 24 min ago:
          Yes. The Cray supercomputers from the 80s were crazy good matmul
          machines in particular. The quad-CPU Cray X-MP (1984) could sustain
          800 MFLOPS to 1 GFLOPS and, with a 1 GB SSD, had enough compute
          power and bandwidth to train a 7-10M-parameter language model in
          about six months, and infer at 18-25 tok/sec.
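          
          A rough back-of-envelope check of those figures, as a sketch only.
          It assumes the common rules of thumb of about 6 FLOPs per parameter
          per training token and about 2 FLOPs per parameter per generated
          token, a 30-day month, and the sustained 800 MFLOPS quoted above.
          The function names are hypothetical.

```python
# Back-of-envelope estimate. The MFLOPS rate, duration, and parameter counts
# come from the comment; the 6*N*D and 2*N rules of thumb and the 30-day
# month are assumptions made for illustration.

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumed 30-day month


def training_tokens(flops_per_sec, months, params):
    """Tokens trainable in the given time, using FLOPs ~= 6 * params * tokens."""
    total_flops = flops_per_sec * months * SECONDS_PER_MONTH
    return total_flops / (6 * params)


def inference_tok_per_sec(flops_per_sec, params):
    """Ceiling on tokens/sec, using ~2 FLOPs per parameter per token."""
    return flops_per_sec / (2 * params)


for params in (7e6, 10e6):
    tokens = training_tokens(800e6, 6, params)
    print(f"{params / 1e6:.0f}M params: {tokens / 1e6:.0f}M tokens, "
          f"{tokens / params:.0f} tokens/param, "
          f"<= {inference_tok_per_sec(800e6, params):.0f} tok/s")
```

          Under these assumptions a 10M-parameter model would see roughly 200
          million tokens in six months, about 20 tokens per parameter. The
          inference figure is a peak-rate ceiling, so it sits above the 18-25
          tok/sec quoted.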
          
          A mid-90s Cray T3E could have handled GPT-2 124M, 24 years before
          OpenAI.
          
          I also had a punch-card computer from 1965 learn XOR with
          backpropagation.
          
          The hardware was never the bottleneck; the ideas were.
       
        mkagenius wrote 4 hours 13 min ago:
        I had the exact same idea but for AI agent harnesses.
        
        I even created an app to explain it - [1] (deleted the app as it got no
        traction whatsoever)
        
        The idea was that AI models like Opus 4.6 and Codex 5.4 have become so
        good at trying new ways to attack a problem that even just a Bash() tool
        is enough.
        
        Continuing the idea, in fact even File() operations are enough.
        
        Again continuing the same line of thought, even just a Tape is enough.
        Given enough time, Codex and Opus will achieve your target.
        
   URI  [1]: https://news.ycombinator.com/item?id=47381803
       
        ashwinnair99 wrote 6 hours 52 min ago:
        The fact that it is possible at all says more about how simple
        transformers actually are underneath than it does about the hardware.
       
        kmoser wrote 10 hours 7 min ago:
        > I don't have an actual paper tape reader, so the object code is
        directly deposited in memory through the console.
        
        So, really, a Turing Machine is all you need?
       
          thyrsus wrote 8 hours 9 min ago:
          I dealt with physical paper tape on only three or four occasions in
          the early 1980s, each time terrified of a jam or tear.  It seems in
          this case it's a read-once operation, which is plausible.  Read-many,
          not so much.  Punch cards are orders of magnitude more reliable.
       
        kristopolous wrote 10 hours 14 min ago:
        I like how the author's "modern" machine to connect to it is still 20
        years old.
        
        With a concave trackpoint, respect.
        
        BTW, I nag Framework at every conference I go to that people want this
        shell and keyboard. It's been years. I think it's time to go through
        the effort to figure out how to do the production run of the case
        myself. Framework actually wants people to do things like this but you
        know, manufacturing is hard. Anyone wanna help?
       
        rahen wrote 11 hours 10 min ago:
        Thanks for reposting! I'm the author of ATTN-11. Happy to answer any
        questions about the fixed-point arithmetic, the PDP-11 hardware, or the
        training process.
       
          McGlockenshire wrote 7 hours 22 min ago:
          Thank you for the inspiration, I now have a practical-impractical
          assembly project for my TI TMS99105A homebrew! The 64k barrier is a
          real pain.
       
            rahen wrote 6 hours 33 min ago:
            I also have a working design for a small Transformer on the
            original Game Boy. It has around 4000 parameters fitting in the 8
            KB cartridge SRAM, where the "saved game" is the trained model. A
            TI-82 with its 32 KB of RAM would be even more comfortable.
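            
            A quick sketch of the memory arithmetic behind that claim. The
            8 KB and 32 KB sizes and the ~4000 parameter count are from the
            comment; the 1-byte and 2-byte weight widths are assumptions, as
            the comment does not say how each parameter is stored.

```python
# Parameter budget for a small model stored in battery-backed SRAM.
# Weight widths below are assumed, not stated in the comment.

SRAM_BYTES = 8 * 1024       # 8 KB cartridge SRAM
TI82_RAM_BYTES = 32 * 1024  # 32 KB, as quoted in the comment


def max_params(budget_bytes, bytes_per_param):
    """Largest parameter count that fits in the budget."""
    return budget_bytes // bytes_per_param


def footprint(n_params, bytes_per_param):
    """Bytes needed to store n_params weights."""
    return n_params * bytes_per_param


for width in (1, 2):  # assumed 8-bit or 16-bit fixed-point weights
    used = footprint(4000, width)
    print(f"{width}-byte weights: 4000 params use {used} bytes, "
          f"{SRAM_BYTES - used} bytes spare; "
          f"max {max_params(SRAM_BYTES, width)} params in 8 KB, "
          f"{max_params(TI82_RAM_BYTES, width)} in 32 KB")
```

            With 16-bit weights, 4000 parameters nearly fill the 8 KB, which
            leaves little room for anything else in the save area.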
       
          dare944 wrote 9 hours 35 min ago:
          Fun stuff! At one point I wondered about building something similar.
          But I lack the AI chops, and have too many other projects going on
          anyway.
          
          I'm curious as to the type of memory in the 11/34. I also have a
          working PDP-11, an 11/05 with 32KW of actual core. I wonder what
          performance would be like with EIS emulation grafted in.  Stunningly
          slow, I imagine.
          
          Thanks for publishing this.
       
          functional_dev wrote 10 hours 25 min ago:
          Incredible work! Fitting a transformer into 32 KB of RAM is crazy.
          
          For those who read about this project and do not know the PDP-11, it
          may be hard to appreciate how difficult it is to work within these
          memory limits.
          Here is a visual guide to the PDP-11 architecture: [1] Thanks for this
          amazing project!
          
   URI    [1]: https://vectree.io/c/pdp-11-hardware-architecture
       
            PaulHoule wrote 9 hours 46 min ago:
            The PDP-11 was the most fun minicomputer of the late 1970s in my
            opinion.  Growing up in NH about an hour north of Digital's HQ all
            sorts of schools from primary to secondary as well as museums had
            PDP-8, PDP-10, PDP-11 and later VAX machines.
            
            The PDP-11 had a timesharing OS called RSTS/E which could give
            maybe 10 people a BASIC programming experience a little bit better
            than an Apple ][.  If you were messing with 8-bit microcomputers in
            1981 you might think a 16-bit future would look like the PDP-11 but
            the 1970 design was long in the tooth by 1980 -- like 8-bit micros
            it was limited to a 64 KB logical address space.  Virtual memory let
            it offer 64 KB environments to more users, but did not let a single
            user have a bigger environment.
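            
            The 64 KB limit follows directly from 16-bit addresses. A small
            sketch of the arithmetic, assuming an 18-bit physical address bus
            for illustration (the physical width varies by PDP-11 model) and
            ignoring the I/O page and the operating system's own memory:

```python
# Logical vs. physical address space on a memory-managed 16-bit machine.
# PHYSICAL_ADDRESS_BITS is an assumed value for illustration only.

LOGICAL_ADDRESS_BITS = 16   # width of a program-visible address
PHYSICAL_ADDRESS_BITS = 18  # assumption; differs between models


def address_space_bytes(bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << bits


logical = address_space_bytes(LOGICAL_ADDRESS_BITS)
physical = address_space_bytes(PHYSICAL_ADDRESS_BITS)

print(f"per-user logical space: {logical // 1024} KB")
print(f"physical space: {physical // 1024} KB")
print(f"full-size user environments resident at once: {physical // logical}")
```

            More physical memory raises the number of environments that can
            be resident, but each one is still capped at the logical size.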
       
        AnimalMuppet wrote 11 hours 13 min ago:
        Woah.  Dude has a running PDP-11/34 in 2026?  Personally, I find that
        more impressive than the program.
       
          adrian_b wrote 8 hours 33 min ago:
          Not only that, but the author has also written a cycle-accurate
          PDP-11/34 simulator for the benefit of those who do not have such
          hardware.
          
   URI    [1]: https://github.com/dbrll/ll-34
       
            rahen wrote 6 hours 43 min ago:
            The WASM GUI is probably the easiest way to see the Transformer in
            action on this machine: [1] There's also the original Tetris from
            1984 to play.
            
   URI      [1]: https://dbrll.github.io/ll-34/
       
          rahen wrote 11 hours 2 min ago:
          That thing is a Tamagotchi though; it constantly needs attention,
          pardon the pun. I did most of the development and tuning on ll-34 for
          that reason.
       
            budman1 wrote 10 hours 7 min ago:
            I am a bit surprised, but I guess everything eventually wears out.
            
            In the 1980s I worked as a field engineer who supported a lot of
            PDP-11s.  They were very reliable for the time; tape drives and
            disks were the #1 maintenance items.  To actually have to open up
            the processor and change a board was not a regular activity.
            
            Other machines of that era, like those from Gould or Perkin-Elmer
            or DG, gave regular practice in the art of repairing processors.
            
            Guess I expect them to work forever.  Like a Toyota.
       
              rahen wrote 6 hours 37 min ago:
              I encounter two main failure modes. First, the bipolar PROMs
              degrade at the atomic level: the metal ions in the fuses tend to
              migrate or 'regrow' over decades, causing bit rot.
              Second, the backplanes suffer from mechanical fatigue. After
              forty years of thermal expansion and structural flexing,
              especially when inserting boards, the traces and solder joints
              develop stress cracks. Both are a pain to repair.
              
   URI        [1]: https://retrocmp.com/articles/trying-to-fix-a-dec-pdp-11...
       
                budman1 wrote 4 hours 45 min ago:
                Excellent work.
                
                The feeling of accomplishment when the machine boots after a
                major repair (almost) makes it all worthwhile.
                
                (I think I would have found a used backplane...fixing it was
                crazy clever)
       
        arglebarnacle wrote 11 hours 29 min ago:
        Fascinating. We hear that the leaps in AI have been made possible by
        orders of magnitude increases in compute and data availability, and of
        course that's substantially true, but exactly how true? It's a
        nice exercise in perspective to see how much or how little modern
        machine learning methods would have been capable of if you brought them
        by time machine to the '70s and optimized them for that environment.
       
       