_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (unofficial)
URI Visit Hacker News on the Web
COMMENT PAGE FOR:
URI Paper Tape Is All You Need – Training a Transformer on a 1976 Minicomputer
tcdent wrote 3 hours 39 min ago:
5.5 min to train on a PDP-11? You mean to tell me we could have been
doing this all along???
rahen wrote 3 hours 24 min ago:
Yes. The Cray supercomputers from the 80s were crazy good matmul
machines in particular. The quad-CPU Cray X-MP (1984) could sustain
800 MFLOPS to 1 GFLOPS, and with a 1 GB SSD, had enough compute
power and bandwidth to train a 7-10M-parameter language model in
about six months, and infer at 18-25 tok/sec.
A mid-90s Cray T3E could have handled GPT-2 124M, 24 years before
OpenAI.
I also had a punch-card computer from 1965 learn XOR with
backpropagation.
The hardware was never the bottleneck, the ideas were.
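Learning XOR with backpropagation really does need nothing beyond a few multiply-adds per weight. A minimal pure-Python sketch of the idea (a hypothetical 2-4-1 sigmoid network trained on squared error; this is not the 1965 machine's actual program):

```python
import math, random

random.seed(0)

# XOR truth table
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 2 inputs -> 4 hidden -> 1 output; last entry of each row is a bias
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_o = [random.uniform(-1, 1) for _ in range(5)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(sum(w_o[i] * h[i] for i in range(4)) + w_o[4])
    return h, o

def train(epochs=5000, lr=1.0):
    losses = []
    for _ in range(epochs):
        loss = 0.0
        for x, y in DATA:
            h, o = forward(x)
            loss += (o - y) ** 2
            # delta at the output: dLoss/dPreactivation for the sigmoid
            d_o = (o - y) * o * (1 - o)
            for i in range(4):
                # chain rule back through the output weight to each hidden unit
                d_h = d_o * w_o[i] * h[i] * (1 - h[i])
                w_h[i][0] -= lr * d_h * x[0]
                w_h[i][1] -= lr * d_h * x[1]
                w_h[i][2] -= lr * d_h
                w_o[i] -= lr * d_o * h[i]
            w_o[4] -= lr * d_o
        losses.append(loss)
    return losses

losses = train()
print(losses[0], losses[-1])  # the total loss should drop substantially
```

The whole state is 17 weights and the update rule is a handful of multiplications, which is why it fits on almost anything with a multiply loop.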
mkagenius wrote 4 hours 13 min ago:
I had the exact same idea but for AI agent harnesses.
I even created an app to explain it - [1] (I deleted the app as it got
no traction whatsoever)
The idea was that AI models like Opus 4.6 and Codex 5.4 have become so
good at trying new ways to attack a problem that even just a Bash() tool
is enough.
Continuing the idea, in fact even File() operations are enough.
Again continuing the same line of thought, even just a Tape is enough.
Given enough time, Codex and Opus will achieve your target.
URI [1]: https://news.ycombinator.com/item?id=47381803
ashwinnair99 wrote 6 hours 52 min ago:
The fact that it is possible at all says more about how simple
transformers actually are underneath than it does about the hardware.
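Underneath, an attention layer really is just a couple of matrix multiplies and a softmax. A toy single-head forward pass in plain Python (illustrative shapes only, not the project's code):

```python
import math

def softmax(xs):
    # subtract the max for numerical stability before exponentiating
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for one head, list-of-lists matrices."""
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]          # transpose K
    scores = matmul(q, kt)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return weights, matmul(weights, v)

# 3 tokens, head dimension d = 2; self-attention uses K = V = Q
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, out = attention(q, q, q)
```

Everything else in a transformer block (layer norm, the feed-forward layer, residual adds) is similarly small, which is what makes ports to tiny machines plausible at all.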
kmoser wrote 10 hours 7 min ago:
> I don't have an actual paper tape reader, so the object code is
directly deposited in memory through the console.
So, really, a Turing Machine is all you need?
thyrsus wrote 8 hours 9 min ago:
I dealt with physical paper tape on only three or four occasions in
the early 1980s, each time terrified of a jam or tear. It seems in
this case it's a read-once operation, which is plausible. Read-many,
not so much. Punch cards are orders of magnitude more reliable.
kristopolous wrote 10 hours 14 min ago:
I like how the author's "modern" machine to connect to it is still 20
years old.
With a concave trackpoint, respect.
BTW, I nag Framework at every conference I go to that people want this
shell and keyboard. It's been years. I think it's time to go through
the effort to figure out how to do the production run of the case
myself. Framework actually wants people to do things like this but you
know, manufacturing is hard. Anyone wanna help?
rahen wrote 11 hours 10 min ago:
Thanks for reposting! I'm the author of ATTN-11. Happy to answer any
questions about the fixed-point arithmetic, the PDP-11 hardware, or the
training process.
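For readers wondering what "fixed-point arithmetic" means in practice: the usual trick is to store scaled integers and shift after multiplying. A Q8.8 sketch in Python (the format, rounding, and saturation choices here are illustrative, not necessarily the ones ATTN-11 uses):

```python
FRAC = 8                  # Q8.8: 8 integer bits, 8 fractional bits
SCALE = 1 << FRAC
MAX16 = (1 << 15) - 1     # 16-bit signed range, matching PDP-11 words
MIN16 = -(1 << 15)

def to_fx(x):
    """Float -> 16-bit fixed point, saturating."""
    return max(MIN16, min(MAX16, int(round(x * SCALE))))

def to_fl(x):
    return x / SCALE

def fx_mul(a, b):
    """Multiply two Q8.8 values: wide product, shift back, saturate."""
    p = (a * b) >> FRAC
    return max(MIN16, min(MAX16, p))

def fx_add(a, b):
    return max(MIN16, min(MAX16, a + b))

# 1.5 * 2.0 == 3.0, exactly representable in Q8.8
print(to_fl(fx_mul(to_fx(1.5), to_fx(2.0))))  # -> 3.0
```

The trade-off is the usual one: the shift amount fixes where precision lives, so activations and weights have to be scaled to stay inside the 16-bit range.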
McGlockenshire wrote 7 hours 22 min ago:
Thank you for the inspiration, I now have a practical-impractical
assembly project for my TI TMS99105A homebrew! The 64k barrier is a
real pain.
rahen wrote 6 hours 33 min ago:
I also have a working design for a small Transformer on the
original Game Boy. It has around 4000 parameters fitting in the 8
KB cartridge SRAM, where the "saved game" is the trained model. A
TI-82 with its 32 KB of RAM would be even more comfortable.
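A quick budget check on how roughly 4000 weights can fit in 8 KB at 16 bits each (the vocabulary and layer sizes below are my own guesses for illustration, not the actual Game Boy design):

```python
def tiny_transformer_params(vocab=64, d=16, d_ff=32, layers=1):
    emb = vocab * d          # token embedding, reused as the output head
    attn = 4 * d * d         # Wq, Wk, Wv, Wo projection matrices
    ffn = 2 * d * d_ff       # two feed-forward matrices, biases omitted
    return emb + layers * (attn + ffn)

n = tiny_transformer_params()
print(n, n * 2)              # prints 3072 6144
assert n * 2 <= 8 * 1024     # fits in an 8 KB cartridge SRAM at int16
```

With 16-bit weights the budget is tight but workable; dropping to 8-bit weights would double the headroom at the cost of precision.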
dare944 wrote 9 hours 35 min ago:
Fun stuff! At one point I wondered about building something similar.
But I lack the AI chops, and have too many other projects going on
anyway.
I'm curious as to the type of memory in the 11/34. I also have a
working PDP-11, an 11/05 with 32KW of actual core. I wonder what
performance would be like with EIS emulation grafted in. Stunningly
slow, I imagine.
Thanks for publishing this.
functional_dev wrote 10 hours 25 min ago:
Incredible work! Fitting a transformer into 32 KB of RAM is crazy.
For those who don't know the PDP-11, it can be hard to appreciate
just how tight these memory limits are.
Here is a visual guide to the PDP-11 architecture - [1] Thanks for this
amazing project!
URI [1]: https://vectree.io/c/pdp-11-hardware-architecture
PaulHoule wrote 9 hours 46 min ago:
That PDP-11 was the most fun minicomputer of the late 1970s in my
opinion. Growing up in NH about an hour north of Digital's HQ all
sorts of schools from primary to secondary as well as museums had
PDP-8, PDP-10, PDP-11 and later VAX machines.
The PDP-11 had a timesharing OS called RSTS/E which could give
maybe 10 people a BASIC programming experience a little bit better
than an Apple ][. If you were messing with 8-bit microcomputers in
1981 you might think a 16-bit future would look like the PDP-11 but
the 1970 design was long in the tooth by 1980 -- like 8-bit micros
it was limited to a 64 KB logical address space. Virtual memory let
it offer separate 64 KB environments to more users, but did not let a
user have a bigger environment.
AnimalMuppet wrote 11 hours 13 min ago:
Woah. Dude has a running PDP-11/34 in 2026? Personally, I find that
more impressive than the program.
adrian_b wrote 8 hours 33 min ago:
Not only that, but the author has also written a cycle-accurate
PDP-11/34 simulator for the benefit of those who do not have such
hardware.
URI [1]: https://github.com/dbrll/ll-34
rahen wrote 6 hours 43 min ago:
The WASM GUI is probably the easiest way to see the Transformer in
action on this machine: [1] There's also the original Tetris from
1984 to play.
URI [1]: https://dbrll.github.io/ll-34/
rahen wrote 11 hours 2 min ago:
That thing is a Tamagotchi though, it constantly needs attention,
pardon the pun. I did most of the development and tuning on ll-34 for
that reason.
budman1 wrote 10 hours 7 min ago:
I am a bit surprised, but I guess everything eventually wears out.
In the 1980s I worked as a field engineer who supported a lot of
PDP-11s. They were very reliable for the time; tape drives and
disks were the #1 maintenance items. Actually having to open up
the processor and change a board was not a regular activity.
Other machines of that era, like those from Gould, Perkin-Elmer,
or DG, gave regular practice in the art of repairing processors.
Guess I expect them to work forever. Like a Toyota.
rahen wrote 6 hours 37 min ago:
I encounter two main failure modes. First, the bipolar PROMs
degrade at the atomic level: the metal ions in the fuses tend to
migrate or 'regrow' over decades, causing bit rot.
Second, the backplanes suffer from mechanical fatigue. After
forty years of thermal expansion and structural flexing,
especially when inserting boards, the traces and solder joints
develop stress cracks. Both are a pain to repair.
URI [1]: https://retrocmp.com/articles/trying-to-fix-a-dec-pdp-11...
budman1 wrote 4 hours 45 min ago:
Excellent work.
The feeling of accomplishment when the machine boots after a
major repair (almost) makes it all worth while.
(I think I would have found a used backplane... fixing it was
crazy clever)
arglebarnacle wrote 11 hours 29 min ago:
Fascinating. We hear that the leaps in AI have been made possible by
orders of magnitude increases in compute and data availability, and of
course that's substantially true, but exactly how true? It's a
nice exercise in perspective to see how much or how little modern
machine learning methods would have been capable of if you brought them
by time machine to the '70s and optimized them for that environment.
DIR <- back to front page