_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (inofficial)
URI Visit Hacker News on the Web
COMMENT PAGE FOR:
URI AMD's Ryzen 9 9950X3D2 Dual Edition crams 208MB of cache into a single chip
abcde666777 wrote 2 hours 14 min ago:
A year ago I swapped out a 5800x for a 5800x3d to get more stable frame
rates in Counterstrike 2. Made a sizable difference, especially to 1%
lows, so these large caches can clearly be a big boon. Granted it's
also obvious the game is poorly optimized, the gains look less
significant for most other titles.
kristianp wrote 3 hours 23 min ago:
Nobody adds L1+L2+L3 like that, because L1 stores a subset of L2 and L2
stores a subset of L3. Just say 192MB of L3.
Retr0id wrote 2 hours 44 min ago:
It depends on the implementation, it is possible for a cache line to
be in L1 but not L2, etc.
MaximilianEmel wrote 5 hours 24 min ago:
They should allow it to function without any external RAM.
SubiculumCode wrote 6 hours 28 min ago:
Oh man. I am running computations on my server that involve computing
geodesic distances with the heat method. The job turns out to be a L3
cache thrasher, leaving my cpus underutilized for multi worker jobs
.... 208mb instead of my 25 per socket sounds amazing
AnthonyMouse wrote 5 hours 51 min ago:
They sell essentially the same chips with more CCDs as Epyc instead
of Ryzen. 9684X has more than 1GB of L3 per socket (but it's not
cheap).
electronsoup wrote 7 hours 15 min ago:
Whenever I see a chip like this, I think "why won't my company let me
use a decent computer"
varispeed wrote 7 hours 21 min ago:
I know the prices of RAM are high, but the 256GB RAM limit seems like an
omission. If they supported at least 512GB in quad or eight channel,
that would be something worth looking at for me. I know there is
Threadripper, but ECC memory is out of reach.
Jotalea wrote 7 hours 49 min ago:
so you're telling me I can (theoretically) have a full Alpine Linux
installation in just the CPU? I'm impressed
rietta wrote 8 hours 12 min ago:
I am so grateful that I bought my 128 GB ram kit in January of last
year for my own 9950 upgrade. We just built my dad a 7000 series to
replace his old AM4 (2017 build), and 32 gigs of DDR5 was nearly the
same price at Micro Center as what I paid last year. I was able to gift
him an Nvidia 1060 discrete graphics card so that he could continue to
run his two monitors. The newer motherboards have much less on board
capability for that.
hu3 wrote 8 hours 9 min ago:
1060 is a sweet card for multi monitor. good on you for gifting him.
rietta wrote 8 hours 6 min ago:
I upgraded to a 4070 super last year. I ran both cards at the same
time for a little bit, but it got really frustrating to keep the
wrong card from being assigned to a particular task with llama. I
really should've taken an R&D tax credit on my AI research but
I'm still able to expense it for the business.
senfiaj wrote 11 hours 57 min ago:
Back in 2004 my PC had 256 MB of RAM. My relative's laptop had 128 MB.
That's crazy when a modern CPU cache can theoretically host an OS (or
even multiple OSes) from the early 2000s.
addaon wrote 6 hours 42 min ago:
The Power4 MCM had 128 MB cache in 2001. The G4 TiBook sold the same
year came with 128 MB of system RAM base, and OS X supported 64 MB
configurations for a few years after this.
egeozcan wrote 10 hours 30 min ago:
The RAM prices are so high and the storage is also getting more
expensive every day, so we're forced to fit everything inside the CPU
cache as a solution! /s
sqquima wrote 9 hours 1 min ago:
It would be interesting if it allowed to use the cache as ram and
could boot without any sticks on the motherboard.
addaon wrote 6 hours 40 min ago:
Several processors support this by effectively locking cache
lines. At the low end, it allows a handful of fast interrupt
routines without dedicated TCM. At the high end, it allows boot
ROMs to negotiate DRAM links in software, avoiding both the
catch-22 and complex hardware negotiation.
0-_-0 wrote 7 hours 19 min ago:
Instead of a cache you could put down an SRAM buffer; it would be
more efficient than a cache, just as fast, and addressable.
Interesting idea.
swarnie wrote 12 hours 41 min ago:
Factorio mega basing just found a new ceiling.
Lightkey wrote 7 hours 34 min ago:
I'm curious to see if that is true. The maximum amount of cache
addressable per core didn't increase after all.
pwr22 wrote 13 hours 55 min ago:
I'm interested to know if the L3 cache all behaves as a single pool for
any core on either CCD, whether there's a penalty in access time
depending on locality or whether they are just entirely localised.
trynumber9 wrote 3 hours 17 min ago:
It does not. For any of the dual CCD parts AMD has ever released for
consumers. Even Strix Halo, which has a higher bandwidth, lower
latency interconnect, doesn't make a single L3 across CCDs.
It'll probably only happen when they have a singular, large die
filled with cache upon which both CCDs are stacked.
Run this test if you're curious:
URI [1]: https://github.com/ChipsandCheese/MemoryLatencyTest
phire wrote 12 hours 17 min ago:
The short answer is that L3 is local to each CCD.
And that answer is good enough for most workloads. You should stop
reading now.
_______________________
The complex answer is that there is some ability for one CCD to pull
cachelines from the other CCD. But I've never been able to find a
solid answer for the limitations on this. I know it can pull a dirty
cache line from the L1/L2 of another CCD (this is the core-to-core
latency test you often see in benchmarks, and there is an obvious
cross-die latency hit).
But I'm not sure it can pull a clean cacheline from another CCD at
all, or if those just get redirected to main memory (as the latency
to main memory isn't that much higher than between CCDs). And even if
it can pull a clean cacheline, I'm not sure it can pull them from
another CCD's L3 (which is an eviction cache, so only holds clean
cachelines).
The only way for a cacheline to get into a CCD's L3 is to be evicted
from an L2 on that CCD, so if a dataset is active across both CCDs,
it will end up duplicated across both L3s. Cachelines evicted from
one L3 do NOT end up in another L3, so an idle CCD can't act as a
pseudo L4.
I haven't seen anyone make a benchmark which would show the effect,
if it exists.
undersuit wrote 12 hours 40 min ago:
AMD didn't have to introduce a special driver for the Ryzen 9 5950X
to keep threads resident on the "gaming" CCD. There was only a small
difference between the 5950X and the non-X3D Ryzen 7 5800X in
workloads that didn't use more than 8 cores, unlike the slowdowns
observed in the Ryzen 9 7950X3D and 7900X3D, when they were released,
compared to the Ryzen 7 7800X3D.
When the L3 sizes are different across CCDs the special AMD driver is
needed to keep threads pinned to the larger L3 CCD and prevent them
from being placed on the small L3 CCD where their memory requests can
exploit the other CCD's L3 as an L4. The AMD driver reduces CCD to
CCD data requests by keeping programs contained in one CCD.
With equal L3 caches when a process spills onto the second CCD it
will still use the first's L3 cache as "L4" but it no longer has to
evict that data at the same rate as the lopsided models. Additionally,
the first CCD can use the second CCD's L3 in kind, reducing the number
of requests that need to go to main memory.
The same sized L3s reduce contention to the IO die and the larger
sized L3s reduce memory contention, it's a win-win.
URI [1]: https://www.phoronix.com/review/amd-3d-vcache-optimizer-9950...
sylware wrote 15 hours 13 min ago:
With the best silicon tech, in R&D, what would be the maximum static
RAM (L1 cache) you could really slap onto an 8-core CPU? (Zero DRAM.)
2001zhaozhao wrote 16 hours 39 min ago:
I don't really see a huge reason to buy this other than it being a
top-tier halo product.
For gaming, AMD already pins the game threads to the CCD with the extra
cache pretty well.
For multi-threaded workloads the gain from having cache on both CCDs is
quite small.
pixl97 wrote 8 hours 17 min ago:
It really comes down to how much more this CPU costs over the next one
down if you're building a new rig for a long period of time. I'm
running on a 5950X which is coming up on its 6 years in November. I
could have spent a little less on the next model down, but I expect
this rig will last me for a few more years (especially with how much
memory costs). The per year extra expense for that CPU was almost
nothing over its lifetime.
Now, would I upgrade an existing computer that has a slightly slower
processor with it? Probably not.
adrian_b wrote 15 hours 6 min ago:
The gain is very workload dependent, so there are no
generally-applicable rules.
There are many applications which need synchronization between
threads, so the speed of the slowest thread has a disproportionate
influence on the performance.
In such applications, the slowest thread has a 3 times bigger
cache on an X3D2 vs. an X3D. That can make a lot of difference.
So there will be applications with no difference in performance, but
also applications with a very large difference in performance, equal
to the best performance differences shown by X3D vs. plain 9950X.
DeathArrow wrote 17 hours 12 min ago:
My first computer had 64KB of RAM. My first PC had 8MB of RAM.
jaimex2 wrote 18 hours 13 min ago:
Can someone like... boot Windows 98 on these on a system with no ram?!
brandnewideas wrote 14 hours 1 min ago:
Theoretically anything is possible with enough thought and work.
bell-cot wrote 16 hours 47 min ago:
Conceptually - yes, easily.
But to do it literally - I'm not a low-level motherboard EE, but I'd
bet you're looking at 5 to 7 figures (US $) of engineering work, to
get around all the ways in which that would violate assumptions baked
into the designs of the CPU, support chips, firmwares, etc.
anticensor wrote 5 hours 59 min ago:
The CPU literally initialises itself without DDR then initialises
the DDR PHY, there must be a way of keeping the CPU in that "cache
as RAM" mode.
ggm wrote 16 hours 5 min ago:
Make a fake RAM which offers a write-through guarantee and responds
on the bus no matter what address is referenced. You could possibly
short-circuit any "is RAM there" test if it just says yes for whatever
size and stride got configured.
tw1984 wrote 19 hours 15 min ago:
that is larger than the HDD of my first PC.
throwaway85825 wrote 19 hours 58 min ago:
It's disappointing that they had this for years but didn't release it
until now.
stingraycharles wrote 19 hours 55 min ago:
I think it's mostly that they had leftover cache.
neRok wrote 15 hours 12 min ago:
This video made the argument that AMD released it to not give Intel
a look-in: [AMD KILLED Intel's 290K Dreams w/ R9 9950X3D2]( [1] )
URI [1]: https://www.youtube.com/watch?v=u7SyrDPbKls
stingraycharles wrote 11 hours 46 min ago:
I like this theory more, perhaps it's both.
magicalhippo wrote 18 hours 26 min ago:
Makes sense. RAM pricing surely has led to a fall in AM5 high-end
CPU purchases, so they might as well try to get some extra cash from
those who still buy. Bin the remaining now-non-X3D chips as something
else.
Ekaros wrote 7 hours 18 min ago:
Bad time to move entirely new platform. Perfect time to sell to
upgrade junkies just CPU.
erulabs wrote 20 hours 11 min ago:
9950X3D2? AMD, who is making you name your products like this? At some
point just give up and name the chip a UUID already.
hu3 wrote 18 hours 25 min ago:
can't agree. this name has logical meaning
sidkshatriya wrote 19 hours 30 min ago:
Like your UUID joke but agree with sibling comment that 9950X3D2 is
actually a good name.
jofzar wrote 19 hours 45 min ago:
I actually don't mind this one, 9950 is the actual chip, x3d is the
cache (where it's larger) and the 2 stands for it being on both
chiplets.
monster_truck wrote 20 hours 12 min ago:
The extra cache doesn't do a damn thing (maybe +2%)
The lower leakage currents at lower voltages allowed them to implement
a far more aggressive clock curve from the factory. That's where the
higher all-core clock comes from (+30W TDP)
I'm not complaining at all, I think this is an excellent way to
leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things
in their articles instead of speculate in a way that suspiciously
resembles what an AI would write.
Aurornis wrote 19 hours 40 min ago:
> The extra cache doesn't do a damn thing (maybe +2%)
It depends on the task. For some memory-bound tasks the extra cache
is very helpful. For CFD and other simulation workloads the benefits
are huge.
For other tasks it doesn't help at all.
If someone wants a simple gaming CPU or general purpose CPU they
don't need to spend the money for this. They don't need the 16-core
CPU at all. The 9850X3D is a better buy for most users who aren't
frequently doing a lot of highly parallel work.
addaon wrote 6 hours 36 min ago:
CFD benefits from cache, but it benefits even more from sustained
memory bandwidth, no? A small(ish) chunk of L3 + two channels of
DRAM is not going to compete with a quarter as much L3 plus eight
channels of DRAM when typical working set sizes (in my experience)
are in the tens of gigabytes, is it?
zahlman wrote 10 hours 41 min ago:
Sorry, what is "CFD" in this context?
detaro wrote 10 hours 38 min ago:
URI [1]: https://en.wikipedia.org/wiki/Computational_fluid_dynami...
monster_truck wrote 14 hours 41 min ago:
It really doesn't. In virtually every case the work is being
completed faster than the cache can grow to that size. What little
gains are being realized are from not having to wait for cores with
access to the cache to become available.
Aurornis wrote 11 hours 2 min ago:
> It really doesn't. In virtually every case the work is being
completed faster than the cache can grow to that size.
If your tasks don't benefit then don't buy it.
But stop claiming that it doesn't help anywhere, because
that's simply wrong. I do some FEA work occasionally and the
extra cache is a HUGE help.
There are also a lot of non-LLM AI workloads that have models in
the size range that fits into this cache.
Numerlor wrote 12 hours 15 min ago:
There are some very specific workloads (say simple object
detection) that fit into cache and have crazy performance where
the value of the cpu will be unbeatable, as the alternative is
one of the cache epycs, everywhere else it'll only be small
improvement if the software is not purpose made for it
YoumuChan wrote 16 hours 30 min ago:
But the consumer product does not support SDCI (only Epyc Turin
supports it), so it does not benefit much if an accelerator is
involved.
monster_truck wrote 14 hours 38 min ago:
It's also useful to point out that the use cases and workloads
where SDCI is most beneficial are far, far beyond the scope of
what anyone will have installed in a Zen rig. Dual 100G
networking cards? The cost of both of those damn near buys all of
a 9950X3D2 setup.
justincormack wrote 12 hours 41 min ago:
No, dual 100Gb cards are not that expensive any more, e.g. [1],
UK retail for £349.
URI [1]: https://www.scan.co.uk/products/2-port-intel-e810-cqda...
EnPissant wrote 19 hours 40 min ago:
It's very workload dependent. It certainly does more than 2% on many
workloads.
See [1] > Here is the side-by-side of the Ryzen 9 9950X vs. 9950X3D
for showing the areas where 3D V-Cache really is helpful:
Coincidentally, it looks like they filtered to all benchmarks with
differences greater than 2%. The biggest speedup is 58.1%, and that's
with 3D V-Cache on just half the chip.
URI [1]: https://www.phoronix.com/review/amd-ryzen-9-9950x3d-linux/10
spockz wrote 17 hours 34 min ago:
I think GP was saying that the additional 3D cache on this chip
compared to the standard X3D isn't going to do much.
I'm curious to see whether the same benchmarks benefit again so
greatly.
adrian_b wrote 15 hours 16 min ago:
On AMD the L3 cache is partitioned between the 2 chiplets.
So for 9950X3D half of the cores use a small L3 cache.
For applications that use all 16 cores, the cases where X3D2
provides a great benefit will be much more frequent than for a
hypothetical CPU where the same cache increase would have been
applied to a unified L3 cache.
The threads that happen to be scheduled on the 2nd chiplet will
have a 3 times bigger L3 cache, which can enhance their
performance a lot, and many applications have synchronization
points where they wait for the slowest thread to finish a task,
so the speed of the slowest thread may have a lot of influence on
the performance.
bell-cot wrote 16 hours 53 min ago:
> I think GP was saying...
Agree. The article's 2nd para notes "AMD relies on its driver
software to make sure that software that benefits from the extra
cache is run on the V-Cache-enabled CPU cores, which usually
works well but is occasionally error-prone." - in regard to the
older, mixed-cache-size chips.
> I'm curious to see...
Yeah - though I don't expect current-day Ars Technica will bother
digging that deep. It could take some very specialized
benchmarks to show such large gains.
spockz wrote 13 hours 33 min ago:
I'm hoping that Phoronix will be able to redo the benchmarks of
the 9950X3D against this new X3D2 variant.
I might even shell out for an upgrade to AM5 and DDR5. On the
other hand, my 5900X is still blazing fast.
monster_truck wrote 14 hours 35 min ago:
Some of their writers, who are quite excellent, still do.
Others just seem to regurgitate press releases with very little
useful investigation.
My criticism of the lazy writers may seem outsized, but I
grew up reading and learning from the much better version of
Ars - one I used to subscribe to.
fc417fc802 wrote 20 hours 32 min ago:
Given that the dies still have L3 on them does this count as L4 or does
the hardware treat it as a single pool of L3?
Would be neat to have an additional cache layer of ~1 GB of HBM on the
package but I guess there's no way that happens in the consumer space
any time soon.
trynumber9 wrote 20 hours 11 min ago:
Per compute die it functions as one 96M L3 with uniform latency. It
is 4 cycles more latency than the configuration with smaller 32M L3.
But there are two compute dies, each with its own L3. And as on the
9950X, coherency between these two L3s is maintained over the global
memory interconnect to the third (IO) die.
magicalhippo wrote 20 hours 39 min ago:
Probably fun for those who already bought DDR5 memory... still kicking
myself for not just pulling the trigger on that 128GB dual stick kit I
looked at for $600 back in September. Now it's listed at $4k...
Meanwhile I hope my AM4 will chug along a few more years.
tarangsutariya wrote 14 hours 52 min ago:
Wonder how many sales AMD and Intel are losing because of tight DDR5
supply.
aetimmes wrote 12 hours 33 min ago:
None. Every component is seeing huge demand.
magicalhippo wrote 14 hours 44 min ago:
I can't imagine it's looking good in the consumer space, but server
space seems to be lit[1]:
Su said that typically, the first quarter (Q1) is slower due to
seasonal patterns, but AMD has seen its data center business expand
from Q4 into Q1, demonstrating ongoing strength across both CPUs
and GPUs. This growth underscores the company's ability to
capitalize on rising demand for AI compute and enterprise
workloads, even during traditionally quieter periods.
"We are going into a big inflection year here in 2026. The CPU
business is absolutely on fire."
[1]
URI [1]: https://stocktwits.com/news-articles/markets/equity/amd-ce...
throawayonthe wrote 15 hours 51 min ago:
oh wow you weren't joking: [1] (cheapest at $1240 USD)
URI [1]: https://pcpartpicker.com/products/memory/#xcx=0&b=ddr5&Z=131...
MrDOS wrote 6 hours 29 min ago:
PCPartPicker are also publishing charts showing the astronomic rise
in DDR5 prices over time: [1] . Those charts don't cover any kits
with 64 GB sticks, but they're a good demonstration of the general
scale.
URI [1]: https://pcpartpicker.com/trends/price/memory/
tom_alexander wrote 16 hours 2 min ago:
> Probably fun for those who already bought DDR5 memory
Nah, those of us who already bought DDR5 memory also already bought
decent CPUs. Dropping another $1k for these incremental gains would
be silly. It'd make a lot more sense if DDR5 had been around longer
so that people had the option to make generational upgrades to this
CPU but DDR5 on AMD has only been around for Zen4 and Zen5.
DeathArrow wrote 17 hours 16 min ago:
>Meanwhile I hope my AM4 will chug along a few more years.
I am fine with my 2 year old 128GB DDR4 for now. I will just upgrade
the 14700K to 14900KS CPU and wait 2 more years.
Judging by the benchmarks newer CPUs aren't much better for
multithreading workloads than 14900KS anyway, so it doesn't make a
lot of sense to upgrade to newer CPUs, DDR5 and a new mobo.
jmyeet wrote 18 hours 46 min ago:
After randomly breaking the AM4 CPU and motherboard in my 4 year old
PC last year, I saw that getting new parts and rebuilding it would
have cost almost as much as a new PC. Less if I wanted to do a
complete rebuild myself, but I'm over building PCs. I've done that
for years.
It was an expensive mistake as I bought a few options to experiment
including a NUC and an M4 Mac Mini but eventually bought a 9800X3D
5070Ti PC for <$2 and for no reason in particular I bought a 64GB
DDR5-6000 kit for $200 in August or so. I checked recently and that
kit is pushing $1000. I also bought a 4080 laptop and bought a 64GB
kit and an extra SSD for it too last year.
That's pretty lucky given what's happened since. I don't claim any
kind of foresight about what would happen.
I do kind of want to take the parts I have and build another AM4 PC.
The 5900XT is not a bad option with 16 cores for ~$300 but my DDR4
RAM is almost useless because the best deals now are for combos of
CPU + motherboard + RAM at steep discounts.
You can get some good deals on prebuilts still. Not as good as 6+
months ago but still not bad. Costco has a 5080 PC for $2300. There's
no way I'm going overboard and building a 128GB+ PC right now.
I've seen multiple RAM spikes. We had one at the height of the crypto
hysteria IIRC but this is significantly worse and is also impacting
SSDs. I kinda wish I'd bought 1-2 4TB+ SSDs last year but oh well.
We're really waiting for the AI bubble to pop. Part of me thinks
that'll be in the next year, but it could stay irrational
substantially longer than that.
sundvor wrote 15 hours 2 min ago:
The C30 64GB kits are nearly impossible to buy now, so, well done.
Got one in September '23 for ~$380 AUD; on the rare occasions it's
available today, it's been over $1600 AUD.
I upgraded my UPS to a pure-sine line-interactive unit to minimise
the risk of it dying to bad power while the market is so crazy...
snvzz wrote 18 hours 47 min ago:
I am glad I decisively ordered 96GB (2x48) DDR5 ECC back in June,
alongside the 9800x3d.
I hope this is still enough for the planned upgrade to Zen7 in 2028.
mroche wrote 15 hours 51 min ago:
I'm looking at building a new system, and was waiting to see what
happens with this chip and Intel's Arc Pro B70 card. I can't find
ECC UDIMMs of 64GB per-stick to make 128GB, but I can put together
two solo UDIMMs of 32GB or 48GB for $800 and $1000 per stick
respectively.
I really want to see what enabling the L3 cache options in the BIOS
does from a NUMA standpoint. I have some projects I want to work on
where being able to even just simulate NUMA subdivisions would be
highly useful.
snvzz wrote 10 hours 52 min ago:
I was surprised to find that the available ECC modules were 24 or
48 GB, so 128GB with 2 sticks was impossible.
While I was aiming at 128, I settled for 96GB, because any more
than 2 sticks means a sharp drop in RAM clocks this generation.
disillusioned wrote 17 hours 19 min ago:
Same... got 2x48 DDR5 for $304 back in February of 2025. Equivalent
kits are going for $900-$1,100. Madness.
Panzer04 wrote 17 hours 54 min ago:
You're basically me. I was mulling 48 vs 96, decided $200 wasn't
worth quibbling too much over, and bought 96GB in August.
Feeling pretty chuffed now XD (though still sad because building a
new PC is dumb when RAM costs more than a 24 core monster CPU)
snvzz wrote 10 hours 50 min ago:
This is the good side.
The not so good side is that getting an RVA23 development board
this year with a usable amount of RAM (for e.g. compiling and
linking large code bases) is not going to be cheap.
Aurornis wrote 19 hours 27 min ago:
> Now it's listed at $4k...
You can buy 128GB of DDR5-6000 with a 9950X3D (not this newest X3D2
version, but still a $699 CPU) and a motherboard and a case for $2800
right now: [1] If you don't need 128GB, there are quality 64GB kits
for under $700 on Newegg right now, which is cheaper than this CPU.
If someone needs to build something now and can wait to upgrade RAM
in a year or two, 32GB kits are in the $370 range.
I don't like this RAM price spike either, but in the context of
building a high-end system with a 16-core flagship CPU like this and
probably an expensive GPU, it's still reasonable to build a system.
If you must have 128GB of RAM it can be done with bundles like the
one I linked above but I'd recommend waiting at least 6 months if you
can. There are signs that prices are falling now that panic-buying
has started to trail off.
128GB of RAM should not cost $4K even in this market.
URI [1]: https://www.newegg.com/Product/ComboDealDetails?ItemList=Com...
nicman23 wrote 11 hours 9 min ago:
That "you don't need 128" BS is toxic. What if you want to
upgrade from DDR4 and you already have 128?
adrian_b wrote 15 hours 28 min ago:
$2800 is still a huge price compared with last year.
Last summer, a 9950X3D + motherboard + cooler + 128 GB DRAM + VAT
sales taxes was the equivalent of $1400 in Europe, where I live.
That's half of your quoted price. That was without case and PSU,
but adding e.g. $200 for those would not change much.
Aurornis wrote 11 hours 11 min ago:
Yes of course. We all know prices are up.
I commented because someone thought that $4K was the going price
for 128GB of RAM, which is way too much even with the demand
crunch.
adrian_b wrote 8 hours 51 min ago:
Due to the high prices of DRAM and SSDs, they are now the
greatest fraction of the total price of a computer.
In January I was forced to upgrade an ancient Intel NUC, by
replacing it with an Arrow Lake H based ASUS NUC. The complete
system with 32 GB DRAM and 3 TB SSDs has cost EUR 1200,
including VAT sales tax.
The distribution of the price was like this:
Barebone mini-PC: 41%
32 GB DDR5 SODIMMs: 26%
2 TB PCIe 5.0 SSD: 24%
1 TB PCIe 4.0 SSD: 9%
Since then, the prices of DDR5 and SSDs have continued to
increase, so now the fraction spent for memory would be even
higher than 59%.
Before 2026, such small amounts of memory would have cost
much less than the rest of the system.
alias_neo wrote 14 hours 33 min ago:
In January I upgraded my desktop, 9950X3D £600, 64GB DDR5-6000
£600, MSI MAG Tomahawk X870E £300, Samsung 990 Pro 4TB £350,
Asus Prime 9070XT £580. I spent another £250 on a PSU and
cooler and reused my case (Phanteks Evolv Enthoo TG, a beautiful
case but horrible for cooling; I'll cut some holes in it and if that
doesn't work out, look for something with more airflow).
The RAM price was already inflated at that time, and the same kit
is now £800. In October or earlier last year I'd have saved
possibly the cost of the CPU/GPU on the whole thing, but now it'd
be about the cost of a CPU/GPU more expensive.
On a side note for anyone not aware, the 9950X3D isn't the best
choice for pure gaming; the 9850X3D is cheaper and marginally better.
Also, I went with a 2-stick RAM kit; 4 sticks is much harder to
run at the advertised speed (6000), which is actually an
overclock.
I'm a dev and a Linux user/gamer, hence my choice of CPU/GPU.
sqquima wrote 9 hours 7 min ago:
Very similar config, but I bought a second pair of ram. Running
4 sticks at 3600.
Also, the LAN port of the motherboard stopped working after a
week, so I had to buy an Ethernet card
alias_neo wrote 4 hours 18 min ago:
Ouch, were you not willing to RMA for that ethernet port? I
wouldn't be too pleased after only a week if parts of the
board stopped working.
I don't really want to run my RAM that slow which is why I'll
probably stick with two sticks.
sspiff wrote 17 hours 26 min ago:
I bought 192GB (4x 48GB) of DDR5-6400 for 299 euro in September but
returned it because I couldn't get 4 DIMMS to run at decent speeds
in the system.
6 or so weeks after I returned it the kit was listed at 1499.
2001zhaozhao wrote 16 hours 42 min ago:
Yeah, the only way to run 4 sticks of DDR5 decently is with Intel.
It's a bit of a shame that you can't cram in enough RAM to run big
models.
The most I could get running on 10GB VRAM + 96GB RAM was a REAP'd
+ quantized version of MiniMax-M2.5
mort96 wrote 10 hours 37 min ago:
Got it running with 4800MT/s and literally 30 minute boot times
in an AM5 machine. The 30 minute boot time could be worked
around by enabling the (off-by-default) memory context restore
option in BIOS, but it really made me think something was
broken and it wasn't until I found other people talking about
30 minute boot times that I stopped debugging and just let it
sit for an eternity.
It's so bad. I don't get why they sell AM5 motherboards with 4
RAM slots.
At least that system has been running well for like two years.
But had I known that the situation is so much more dire than
with DDR4, I would've just gotten the same amount of RAM in two
sticks rather than four.
noir_lord wrote 6 hours 49 min ago:
You need to enable MCR (which trains the memory once and
caches the result for (iirc) 30 days) otherwise yeah, booting
is horribly slow, even the 64GB I have can take several
minutes but with MCR it boots basically instantly.
Some motherboards have it off by default.
mort96 wrote 3 hours 39 min ago:
From my comment:
> The 30 minute boot time could be worked around by
enabling the (off-by-default) memory context restore option
in BIOS
kenhwang wrote 6 hours 27 min ago:
Memory training seems to be getting faster with each bios
update. In 2024 when I upgraded to AM5, 64GB memory
training took like 15 minutes. Now the same setup takes
about a minute when it needs to retrain, then near instant
with MCR (Windows 11 takes significantly longer to load
than the POST process).
WD-42 wrote 8 hours 33 min ago:
I'm in the same situation! My machine will take 2-5 minutes to
POST every few reboots; it seems random. The messed up
part is the marketing material says these things can handle
256GB of RAM or whatever absurd number, f me for thinking
128GB should then be no problem. Honestly this whole thing
has soured me on AMD. Yeah they have bigger numbers than Intel,
but at what cost, stability?
noir_lord wrote 6 hours 48 min ago:
Check you have MCR (Memory Context Restore) enabled,
otherwise you train the RAM way more often than you need to
(every boot).
secondcoming wrote 8 hours 35 min ago:
Your machine takes 30 minutes to boot because of the RAM? Or
it takes 30 minutes to load a model?
WD-42 wrote 8 hours 24 min ago:
It's the RAM. It needs to "trained" which takes some time
but for for some reason these boards seem to randomly
forget their training, requiring it to happen again.
magicalhippo wrote 4 hours 22 min ago:
I've never had memory training be forgotten with my AM4
nor LPDDR5-based laptops and NUCs. Is this a new thing
with AM5 or something? Or just a certain brand of BIOSes?
jazzyjackson wrote 5 hours 6 min ago:
huh, its been a decade since i built a PC, whats changed?
roboror wrote 3 hours 9 min ago:
It's an AMD thing
mort96 wrote 3 hours 33 min ago:
DDR5 is much, much more fickle than DDR4 and earlier
standards. I think it's primarily due to pushing clock
speeds (6000 MT/s would be insanely fast for DDR4, but
kinda slow for DDR5).
Memory training has always been a thing: during boot,
your PC runs tests to work out what slight changes
between signals and stuff it needs to adapt to the
specific requirements of your particular hardware. With
DDR4 and earlier, that was really fast because the
timings were so relatively loose. With DDR5, it can be
really slow because the timings are so tight.
That's my best understanding of it at least.
WD-42 wrote 3 hours 54 min ago:
My guess is bigger numbers, higher voltages, tighter
timings.
WD-42 wrote 11 hours 12 min ago:
I'm running 128GB on a 9950X now with 4x32GB sticks and
it's terrible. It's unstable, POST time is about 2 minutes
(not exaggerating) and I'm stuck at a lower speed.
I'm considering just taking 2 of the sticks out, working
with 64GB and increasing my swap partition. The NVMe drive is
fast at least.
This is my first time off Intel and I have to say I don't
understand the hype.
hxorr wrote 3 hours 52 min ago:
What DDR5 speed are you running? 6000 is technically an
overclock; AMD only guarantees being able to run at something
like 4800 or 5200.
You may need to bump up voltages slightly for your CPU's IMC
(I needed to on my Ryzen 8700F to run 6000 stable). It's
CPU-sample dependent.
Also as other commenter pointed out, typically 4 sticks will
achieve lower stable clocks
magicalhippo wrote 10 hours 27 min ago:
> It's unstable, POST time is about 2 minutes (not
exaggerating)
The long POST times must mean it's retraining the memory each
time, which is not normal. Just in case you haven't tried it
yet, I'd start by reseating them; I've had weird issues with
marginally seated RAM before.
Also you definitely have to go much slower with 4 sticks
compared to two, so lower speed as much as you can. If that
doesn't help, I'd verify them in pairs.
If they work in pairs but not in quad at the slowest speed,
something is surely wrong.
Once you get them working in quad, you can start bumping up
the speed, might need voltage boost as well.
HauntingPin wrote 12 hours 18 min ago:
I had the same issue with Intel. It's not guaranteed there
either.
jodleif wrote 16 hours 34 min ago:
Threadripper is a good alternative. No point having a lot of
dual channel ram for LLMs, too slow
magicalhippo wrote 19 hours 14 min ago:
No such bundle deals where I am. Absolute cheapest DDR5 128GB kit
around is 2 sticks of 5600 64GB for $2k.
Cheapest 64GB kit is $930.
The kit I was oh-so-close to buying was two 6400 64GB sticks.
Not gonna buy now, not that desperate. I have a spare AM4 board,
DDR4 memory and heck even CPU, I'll ride this one out. Likely skip
AM5 entirely if something doesn't drastically change.
Aurornis wrote 18 hours 57 min ago:
> Absolute cheapest DDR5 128GB kit around is 2 sticks of 5600
64GB for $2k.
That's not far from the bundle deal above, once you subtract the
$700 CPU.
If you really need 128GB the 5600 kit is fine. Having 208MB of
total cache on the CPU means the real world difference between a
5600 kit and a slightly faster kit is negligible in most use
cases.
If you don't need to upgrade then clearly don't force an upgrade
right now. I just wanted to comment that $4K for 128GB of RAM is
a very bad price right now, even with the current situation.
throwup238 wrote 12 hours 33 min ago:
> a slightly faster kit is negligible in most use cases
Does that "most use cases" caveat really apply to someone
buying 128GB of RAM? If I'm buying that much, it means I'm
actually going to put it through its paces, unless it's just
there for huge reserved guest VM overhead.
Aurornis wrote 11 hours 5 min ago:
The 208MB of total cache on the CPU we're discussing does a
good job of reducing sensitivity to RAM speed differences on
this platform.
If you're trying to run LLMs off of the CPU instead of the
GPU then the RAM speed dictates a lot. It's going to be
slow no matter what, though. Dual channel DDR5 just isn't
enough to run large LLMs that start to fill 128GB of RAM, and
the difference between 5600 and 6400 isn't going to make it
usable.
If you're just running a lot of VMs or doing a lot of mixed
tasks that keep a lot of RAM occupied then you'd probably
have a hard time measuring a difference between 5600 and 6400
if you tried with one of these X3D CPUs with a lot of cache.
This is a frequent topic of discussion for gamers because
some people obsess over optimizing their RAM speed and
timings and pay large premiums for RAM with CAS latency of 28
instead of 36. Then they see benchmarks showing 1-2%
differences in games or even most productivity apps and
realize they would have been better spending that extra money
on the next faster GPU or CPU or other part.
magicalhippo wrote 18 hours 39 min ago:
> I just wanted to comment that $4K for 128GB of RAM is a very
bad price right now
Oh absolutely. Just mentioned it since I was very close to
buying it back then, and now it's completely bonkers.
That bundle deal is quite well priced all things considered,
it basically prices the memory where it was. Again, sadly no
great bundle deals here.
jofzar wrote 19 hours 46 min ago:
I really want an X3D because a game I play is heavily single
threaded. I have the income and the financial stability, but I
can't in good conscience upgrade to AM5 with the RAM prices.
It's insane.
tyjen wrote 11 hours 58 min ago:
I was waiting too, but the one game I play often that requires FPS
performance decided to ruin their game with poor development
direction. Now, I'm planning to buy for local llm hosting.
Here's hoping to more developments like TurboQuant to improve LLM
memory efficiency.
Panzer04 wrote 17 hours 56 min ago:
What game, if you don't mind my asking?
jofzar wrote 15 hours 54 min ago:
World of Warcraft
fakwandi_priv wrote 18 hours 46 min ago:
AMD had an upgrade path with the 5700X3D, assuming you're on AM4.
Just reading now that they went out of production half a year ago,
which is a shame. I was very impressed being able to upgrade with
the same motherboard 6 years down the line.
timschmidt wrote 18 hours 19 min ago:
I'm the mythical customer who went from a 1700X in a B350
motherboard near launch day to a 5800X3D in the same board (after
a dozen BIOS updates). Felt amazing. Like the old 486DX2 days.
ManuelKiessling wrote 1 hour 59 min ago:
Nearly same story here. AMD and MSI will forever hold a special
place in my heart.
slightlygrilled wrote 15 hours 44 min ago:
Same! Kept checking back for bios updates and even years later
they kept announcing more support! Truly crazy.
Other than the speed it's a very good reason to go with AMD;
the upgrade scope is massive. On AM5 you can go from a 6 core
and soon all the way to a 24 core with the new Zen 6.
magicalhippo wrote 19 hours 12 min ago:
Yep exactly the same situation.
I would not be surprised if we see casualties in adjacent markets,
such as motherboards, coolers and whatnot.
nexle wrote 20 hours 57 min ago:
Breakdown of the (semi-clickbait) 208MB cache: 16MB L2 (8MB per die?) +
32MB L3 * 2 dies + 64MB L3 Stacked 3D V-cache * 2
For comparison, the 9950X3D has a total cache of 144MB.
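That breakdown can be sanity-checked with quick arithmetic (a sketch; the per-core L2 and per-CCD figures are taken from the comments above and from AMD's published Zen 5 specs):

```python
# Cache totals for the 9950X3D2, sizes in MB.
l2_per_core_mb = 1           # Zen 5: 1MB of L2 per core
cores = 16                   # 8 cores per CCD, two CCDs
base_l3_per_ccd_mb = 32      # on-die L3 per CCD
vcache_per_ccd_mb = 64       # stacked 3D V-Cache per CCD
ccds = 2

total_l2 = l2_per_core_mb * cores                           # 16MB
total_l3 = (base_l3_per_ccd_mb + vcache_per_ccd_mb) * ccds  # 192MB
print(total_l2 + total_l3)  # 208
```

The same math for the plain 9950X3D (V-Cache under only one CCD) gives 16 + 32*2 + 64 = 144MB.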
teaearlgraycold wrote 19 hours 50 min ago:
I wouldn't be caught dead with less than 200MB of cache in my
desktop in 2026.
trynumber9 wrote 20 hours 48 min ago:
> 16MB L2 (8MB per die?)
It is indeed 8MB per compute die but really 1MB per core. Not shared
among the entire CCD.
renewiltord wrote 21 hours 16 min ago:
I have a gigabyte of cache on my 9684x at home!
chao- wrote 21 hours 27 min ago:
Crazy to think that my first personal computer's entire storage (was
160MB IIRC?) could fit into the L3 of a single consumer CPU!
It's probably not possible architecturally, but it would be amusing to
see an entire early 90's OS running entirely in the CPU's cache.
tumdum_ wrote 14 hours 17 min ago:
My first PC had a 40MB HDD and 8MB RAM :D
amelius wrote 15 hours 19 min ago:
640K ought to be enough for anybody.
defrost wrote 16 hours 1 min ago:
Commodore PET for me - 8 KB of RAM and all the data you could store
and read back from a TDK 120 cassette tape . . .
* [1] Same time as the Trash-80 and BBC micro were making inroads.
URI [1]: https://en.wikipedia.org/wiki/Commodore_PET
Zardoz84 wrote 16 hours 9 min ago:
My first computer's whole RAM could fit in the L1 of a single core (128K)
alfiedotwtf wrote 16 hours 48 min ago:
> it would be amusing to see an entire early 90's OS running entirely
in the CPU's cache.
There are actually already two running (MINIX and UEFI), and it's
the opposite of amusing -
URI [1]: https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...
HerbManic wrote 19 hours 36 min ago:
My first PC had a 20MB HDD with 512KB of RAM. So yeah, that could fit
into cache 10 times now.
shric wrote 20 hours 0 min ago:
You had ~160,000 times more storage than I did for my first personal
computer.
compounding_it wrote 20 hours 18 min ago:
Maybe in 50 years the cache of CPUs and GPUs will be 1TB. Enough to
run multiple LLMs (a model entirely run for each task). Having robots
like in the movies would need LLMs much much faster than what we see
today.
nextaccountic wrote 6 hours 52 min ago:
doubtful that we will still have this computer architecture by then
m463 wrote 21 hours 2 min ago:
I wonder how much faster DOS would boot, especially with floppy seek
times...
RulerOf wrote 19 hours 15 min ago:
You can get close with a VM, but there's overhead in device
emulation that slows things down.
Consider a VM where that kind of stuff has been removed, like the
firecracker hypervisor used for AWS Lambda. You're talking
milliseconds.
userbinator wrote 21 hours 0 min ago:
Instantly.
If you run a VM on a CPU like this, using a baremetal hypervisor,
you can get very close to "everything in cache".
basilikum wrote 21 hours 5 min ago:
KolibriOS would fit in there, even with the data in memory. You
cannot load it into the cache directly, but when the cache capacity
is larger than all the data you read there should be no cache
eviction and the OS and all data should end up in the cache more or
less entirely. In other words it should be really, really fast, which
KolibriOS already is to begin with.
RiverCrochet wrote 2 hours 18 min ago:
I thought there was an MSR buried deep somewhere that enables
"Cache as RAM" mode and basically maps the cache into the memory
address space or something like that.
Lol, a quick Google search leads me to a LinkedIn post with all the
gory technical details:
URI [1]: https://www.linkedin.com/pulse/understanding-x86-cpu-cache...
hrmtst93837 wrote 14 hours 40 min ago:
That assumes KolibriOS or any major component is pinned to one core
and one cache slice instead of getting dragged between CCDs or
losing memory affinity. Throw actual users, IO, and interrupts at
it and you get traffic across chiplets, or at least across L3
groups, so the nice 'everything lives in cache' story falls apart
fast.
Nice demo, bad model. The funny part is that an entire OS can fit
in cache now, the hard part is making the rest of the system act
like that matters.
vlovich123 wrote 21 hours 2 min ago:
Unless you lay everything out contiguously in memory, you'll
still get cache eviction due to associativity, depending on the
eviction strategy of the CPU. But certainly DOS or even early
Windows 95 could conceivably run entirely out of the cache.
tadfisher wrote 19 hours 56 min ago:
Windows 95 only needed 4MB RAM and 50 MB disk, so that's
certainly doable. The trick is to have a hypervisor spread that
allocation across cache lines.
chao- wrote 20 hours 56 min ago:
Yeah, cache eviction is the reason I was assuming it is "probably
not possible architecturally", but I also figured there could be
features beyond my knowledge that might make it possible.
Edit: Also this 192MB of L3 is spread across two Zen CCDs, so
it's not as simple as "throw it all in L3" either, because any
given core would only have access to half of that.
basilikum wrote 20 hours 58 min ago:
Well, yeah, reality strikes again. All you need is an exploit in
the microcode to gain access to AMD's equivalent to the ME and
now you can just map the cache as memory directly. Maybe. Can
microcode do this or is there still hardware that cannot be
overcome by the black magic of CPU microcode?
pwg wrote 21 hours 8 min ago:
In my case it began with 16K (yes, 16×1024 bytes) of RAM and 90K
(yes, 90×1024 bytes) 5.25" floppy disks (although the floppies came
a few months after the computer). Eventually upgraded to 48K RAM and
180K double density floppy disks. The computer: Atari 800.
MegaDeKay wrote 20 hours 54 min ago:
I'll see your Atari 800 and raise you my Atari 2600 with its
whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call
it a computer because you could buy a BASIC cartridge for it (I
didn't and stand by that decision - it was pretty bad).
acomjean wrote 13 hours 27 min ago:
I thought the timex Sinclair 1000 win 2 Kbytes of ram was bad.
The membrane keyboard wasnât great (the lack of a space bar was
a wierd choice) but it did work. We had programs on casette and
did get the 16Kbyte memory expansion. [1] I didnât realize the
Atari 2600 had basic, always thought of it as a game console.
URI [1]: https://en.wikipedia.org/wiki/Timex_Sinclair_1000
makapuf wrote 12 hours 16 min ago:
You can buy this bad boy [attiny11] with no ram, only
registers.
URI [1]: https://ww1.microchip.com/downloads/en/DeviceDoc/1006S...
cwzwarich wrote 21 hours 20 min ago:
URI [1]: https://github.com/coreboot/coreboot/blob/main/src/soc/intel...
wmf wrote 20 hours 43 min ago:
Context: Early in the firmware boot process the memory controller
isn't configured yet so the firmware uses the cache as RAM. In this
mode cache lines are never evicted since there's no memory to evict
them to.
coppsilgold wrote 17 hours 52 min ago:
There may be server workloads for which the L3 cache is
sufficient, would be interesting if it made sense to create
boards for just the CPU and no memory at scale.
I imagine for such a workload you can always solder a small
memory chip to avoid having to waste L3 on unused memory and a
non-standard booting process so probably not.
stingraycharles wrote 15 hours 26 min ago:
Most definitely, I work in finance and optimizing workloads to
fit entirely in cache (and not use any memory allocations after
initialization) is the de-facto standard of writing high perf /
low latency code.
Lots of optimizations happening to make a trading model as
small as possible.
lathiat wrote 19 hours 16 min ago:
I remember from the talk about Wii/Wii U hacking that they
intentionally kept the early boot code in cache so that the memory
couldn't be sniffed or modified on the RAM bus, which was external
to the CPU and thus glitchable.
bombcar wrote 21 hours 25 min ago:
IIRC some relatively strange CPUs could run with unbacked cache.
twbarr wrote 21 hours 20 min ago:
Intel's platforms, at the very least, use cache-as-RAM during the
boot phase before the DDR interface can be trained and started up.
URI [1]: https://github.com/coreboot/coreboot/blob/main/src/soc/int...
Readerium wrote 21 hours 42 min ago:
Can someone explain if the 3D V-Cache dies are stacked on top of
each other or side by side.
If they are stacked, then why not a 9800X3D2?
zdw wrote 21 hours 40 min ago:
The 99xx chips have two CPU dies, and one cache die is on each CPU
die.
modeswitch wrote 21 hours 8 min ago:
The 3D V-Cache sits underneath only one of the CCDs. See [1] .
URI [1]: https://en.wikipedia.org/wiki/Ryzen#Ryzen_9000
anonymars wrote 20 hours 27 min ago:
That's what's different about this one. "Enter the Ryzen 9
9950X3D2 Dual Edition, a mouthful of a chip that includes 64MB of
3D V-Cache on both processor dies, without the hybrid arrangement
that has defined the other chips up until now."
Tostino wrote 20 hours 50 min ago:
Did you forget which thread we are on?
DIR <- back to front page