Hacker News on Gopher (unofficial)

COMMENT PAGE FOR: LLaMA 3 70B Llamafiles

aappleby wrote 11 hours 28 min ago:
  What's the cheapest hardware setup that can run a 70B model at
  tolerably interactive rates? (say 10 characters a second)

  jart wrote 10 hours 12 min ago:
    Any MacBook with 32GB should be able to run
    Meta-Llama-3-70B-Instruct.Q2_K.llamafile, which I uploaded a few
    minutes ago. It's smart enough to solve math riddles, but at this
    level of quantization you should expect hallucinations. If you want
    to run Q4_0, you'll probably be able to squeeze it onto a $3,999.00
    MacBook Pro M3 Max with 48GB of RAM. If you want to run Q5_K_M or
    Q8_0, the best choice is probably a Mac Studio. I have an Apple M2
    Ultra with a 24-core CPU, 60-core GPU, and 128GB of RAM; it cost me
    $8000 with the monitor. If I run
    Meta-Llama-3-70B-Instruct.Q4_0.llamafile, I get 14 tok/sec (prompt
    eval is 82 tok/sec) thanks to the Metal GPU.

    You could alternatively go on vast.ai and rent a system with 4x RTX
    4090s for a few bucks an hour. That'll run 70B. Or you could build
    your own, but the graphics cards alone will cost $10k+. An AMD
    Threadripper Pro 7995WX ($10k) does a good job too: I get 5.9
    tok/sec eval with Q4_0 and 49 tok/sec prompt eval. If I use F16
    weights, prompt eval goes to 65 tok/sec.

  frozenport wrote 10 hours 23 min ago:
    M2 Macs can do it: [1]. In practice 10 tokens per second is kind of
    annoyingly slow; most local people would opt for a smaller 7B model.

    URI [1]: https://twitter.com/junrushao/status/1681828325923389440

zarzavat wrote 7 hours 53 min ago:
  Have been playing around with Llama3 7b today; it's not very good. I'm
  sure that Facebook put everything they could into making it good, but
  7B is apparently just not enough parameters.
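The RAM numbers jart quotes can be sanity-checked with rough arithmetic: weight memory is roughly parameter count times bits per weight. The bits-per-weight figures below are approximate averages for llama.cpp quant formats (an assumption for illustration, not exact file sizes, and KV-cache/runtime overhead is ignored):

```python
# Rough weight-memory footprint of a 70B-parameter model at different
# llama.cpp quantization levels. Bits-per-weight values are approximate
# averages (assumption), so treat results as ballpark figures.
PARAMS = 70e9

bits_per_weight = {
    "Q2_K":   2.6,
    "Q4_0":   4.5,
    "Q5_K_M": 5.7,
    "Q8_0":   8.5,
    "F16":   16.0,
}

# Convert bits -> bytes -> GiB for each quant level.
footprint_gib = {
    quant: PARAMS * bits / 8 / 2**30
    for quant, bits in bits_per_weight.items()
}

for quant, gib in footprint_gib.items():
    print(f"{quant:7s} ~{gib:5.1f} GiB")
```

Under these assumptions Q2_K lands around 21 GiB (fits a 32GB MacBook), Q4_0 around 37 GiB (squeezes into 48GB), and F16 around 130 GiB, which lines up with the hardware anecdotes in the thread.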
  mistrial9 wrote 6 min ago:
    llava-v1.5-7b-q4.llamafile: yes, agree that the impression is poor
    overall.

  pennomi wrote 2 hours 37 min ago:
    I assume you mean 8B? There is no Llama 3 7B.

  sieszpak wrote 1 hour 12 min ago:
    Llama 3 8B seems sad to answer... this is the first model in a long
    time that has had trouble telling me how much 3! (factorial) is.
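For reference, the factorial question the model reportedly stumbled on has a one-line ground truth via Python's standard library (a quick sketch, nothing model-specific):

```python
import math

# 3! = 3 * 2 * 1 = 6, the value the commenter says Llama 3 8B got wrong.
answer = math.factorial(3)
print(answer)  # prints 6
```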