Researchers Invent a Way to Speed Intel's 3D XPoint Computer Memory (ieee.org) 43
Memory modules using Intel's 3D XPoint technology are on their way, and researchers in North Carolina have already figured out how to make them better. New submitter mnemotronic writes: At the 45th ICSA (International Symposium on Computer Architecture), a group of researchers from North Carolina State University led by Prof. Yan Solihin proposed a method called lazy persistency to speed up write operations to 3D XPoint memory. XPoint, developed by Intel and Micron, is non-volatile, and cheaper and denser than DRAM, but it requires more power and writes take longer. The method proposed reduces write overhead times from 9% to 1% by incorporating a checksum into the cache memory system. The researchers were not able to verify their approach on actual XPoint memory, as those products only recently started sampling. They tested using simulations and DRAM, and plan to verify when Intel's modules become more widely available.
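For the curious, here is the gist of lazy persistency as described, rendered as a minimal C sketch. The region layout and the toy XOR checksum are illustrative assumptions, not the authors' implementation: instead of eagerly flushing and fencing after every persistent store, the program keeps a checksum next to each region and lets cache lines reach memory via natural evictions; recovery detects and redoes any region whose checksum doesn't match.

```c
/* Minimal sketch of the lazy-persistency idea (illustrative; not the
 * paper's implementation). Eager persistency flushes/fences after every
 * store; here we instead keep a checksum per region and rely on natural
 * cache evictions, checking the checksum during crash recovery. */
#include <stdint.h>
#include <stddef.h>

#define REGION_WORDS 64

struct region {
    uint64_t data[REGION_WORDS];
    uint64_t checksum;   /* detects a region that never fully persisted */
};

static uint64_t region_checksum(const struct region *r)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < REGION_WORDS; i++)
        sum ^= r->data[i];            /* toy checksum for illustration */
    return sum;
}

/* Writer: update the data, then the checksum. No clflush/sfence on the
 * common path -- that is where the claimed overhead savings come from. */
void region_write(struct region *r, size_t idx, uint64_t value)
{
    r->data[idx] = value;
    r->checksum = region_checksum(r);
}

/* Crash recovery: a mismatched checksum marks a region whose writes did
 * not all reach persistent memory, so its computation must be redone. */
int region_needs_redo(const struct region *r)
{
    return region_checksum(r) != r->checksum;
}
```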
Re: (Score:3)
Probably. IIRC ...
MRAM consumes less power than DRAM (whereas XPoint consumes more). MRAM is _faster_ than DRAM (about as fast as L2 cache).
It also has a very small bit cell size (so very high density).
So, it beats out 3D-XPoint (aka Optane) on almost every point.
Also, MRAM doesn't "wear out". From what I've read, 3D-XPoint is better than flash on this, but eventually has a wear-out point.
MRAM could be better than Xpoint (Score:2)
As stated above, MRAM is _faster_ than DRAM, and therefore much better than XPoint/Optane
what is needed is a foundry that wants to dominate...
Re: (Score:2)
MRAM is at least as dense as DRAM on the same process tech, and potentially denser than DRAM at finer processes. It's just that no one is making MRAM on anything like the latest process; it's a chicken-and-egg scenario. And probably a lot of patents slowing things down, too.
MRAM doesn't inherently use more power. STT-MRAM, more than a decade ago, reduced the write-current requirements a lot.
Re: (Score:3)
Speculation is about resolving predicates. Consistency is about resolving dependencies.
ICSA? (Score:2)
It's ISCA.
No, it's ISAC (Score:2)
Re: (Score:2)
Just a value question (Score:2)
3D XPoint memory is in between NVRAM [anandtech.com] (RAM backed by supercapacitors, or some kind of machine/rack-level UPS to ensure RAM is saved to "regular" flash drives) and just persisting to NVMe drives before declaring the transaction complete. So there are faster and more expensive options and slower and less expensive options, and it also depends on how many components you want involved. But that's always a discussion; if a disgruntled data center worker takes a sledgehammer to your machine
Re: (Score:2)
Intel has numbers. But they're not sharing. They're still promising to ship the NVDIMM Xpoint modules "soon".
Micron has numbers. But they're not sharing. Oh, and they're doubling down on DRAM manufacturing. They're not exactly going full steam ahead with Xpoint. I wonder why?
Won't Work (Score:5, Interesting)
The delay is because XPoint doesn't work. The writes usually take, but sometimes they don't. Intel hasn't figured out why.
The current practice is to verify all the writes and simply redo them if they don't take.
This means you're tying up the bus, and this is why Intel now recommends dedicating entire memory channels to XPoint instead of mixing and matching with DRAM. If you have XPoint in all of your channels, your latencies go through the roof and your performance tanks.
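If the parent's description is accurate, the pattern amounts to something like the following toy C sketch; the flaky medium here is simulated with an array, since the real verify/redo would happen inside the memory controller:

```c
/* Toy illustration of a write-verify-redo loop as described above. The
 * "medium" is simulated with an array that randomly drops writes; the
 * extra verify reads and repeat writes are what tie up the channel. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

#define WORDS       1024
#define MAX_RETRIES 8

static uint64_t medium[WORDS];

/* Simulated flaky write: roughly 1 in 4 writes silently fails to take. */
static void flaky_write(size_t addr, uint64_t value)
{
    if (rand() % 4 != 0)
        medium[addr] = value;
}

/* Write, read back to verify, and redo until the write takes. */
static bool write_verified(size_t addr, uint64_t value)
{
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
        flaky_write(addr, value);
        if (medium[addr] == value)
            return true;
    }
    return false;                     /* give up; report a media error */
}

int main(void)
{
    for (size_t a = 0; a < WORDS; a++)
        if (!write_verified(a, a * 3))
            printf("write to %zu failed permanently\n", a);
    return 0;
}
```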
Wait for generation 3 before considering XPoint NVDIMMs.
Re: (Score:3)
Why would you mix and match DRAM and Xpoint on the same bus anyway when Xpoint is so much slower than DRAM? Even without the extra verification and writes, it's still much slower than DRAM and would seem to clog the channel.
I'll admit that maybe I don't know something about existing DRAM access paths/channels/buses; AFAIK the NUMA node is basically the smallest subdivision. Or is it possible to address individual DRAM modules/pages in parallel with others on the same NUMA node?
I guess I had figured that DRAM i
Re: (Score:3)
Re: (Score:2)
No. Existing NVMe SSDs already use DRAM for the cache, which is faster and more reliable than Xpoint.
And you're bottlenecked by NVMe performance and latency.
Xpoint was supposed to replace both DRAM and storage. Now it's a very expensive replacement for storage that's better in one metric (MAYBE two) and a very poor replacement for DRAM that's worse in every metric except cost. And until Intel actually starts shipping these things for sale to end users, that cost benefit is up in the air.
I for one expect
Re: (Score:2)
You're missing the situations where you need to handle extremely large data sets.
It's not common, but neither is the need for a Xeon 8180, yet they certainly exist (well, for the price of a cheap car).
3d-xpoint RAM isn't going to outright replace DRAM any time soon, but it certainly can supplement it.
Re: (Score:3)
Note that you can make NUMA domains smaller. It is quite plausible that 3D XPoint in 'memory' mode appears as a different NUMA domain, with the SLIT indicating a higher penalty for use. NUMA is relatively expressive and can describe the high-level divisions and implications of this sort of approach.
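As a concrete illustration, those SLIT penalties are visible from user space via libnuma (sketch assumes Linux with libnuma installed; an XPoint-backed domain would simply show up as a node with a larger distance):

```c
/* Dump the SLIT-style NUMA distance matrix with libnuma.
 * Build with: gcc slit.c -o slit -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    int max = numa_max_node();
    for (int from = 0; from <= max; from++) {
        for (int to = 0; to <= max; to++)
            printf("%4d", numa_distance(from, to));  /* 10 = local */
        putchar('\n');
    }
    return 0;
}
```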
A challenge for Intel is justifying sitting in a rather inconvenient DIMM slot. It may be able to deliver better performance than PCIe, but even PCIe-attached NVMe has had a very protracted adoption cycle as the r
Re: (Score:2)
I kind of figured XPoint DIMM would wind up in alternative form factors oriented around hyperconverged architectures, something (much) smaller than blade systems that would allow many (6+) nodes in a 2-4U space without losing storage density in the process. Blown XPoint DIMMs would just be replaced by pulling the whole blade and swapping out DIMMs. By having many nodes you don't worry about losing compute capacity or redundancy.
But I can never make hyperconverged work from a cost basis -- too many nodes r
Re: (Score:2)
The problem is that XPoint DIMMs, even at their advertised capacities, will lose out to U.2 drives for storage per unit volume.
Never mind the probably-not-going-to-succeed-either ruler form factor, which tries to optimize SSD storage per unit volume by elongating the drives without incurring a crazy number of connectors.
Hyperconverged in practice loves 2U servers with lots of drives. A lot of people love trying to make the argument for storage rich blade-dense form factors, but i
Re: (Score:2)
Why would you mix and match DRAM and Xpoint on the same bus anyway when Xpoint is so much slower than DRAM?
Because Intel originally said you should do this, and that it would be awesome, and that the memory controller knew how to make it all work out so you get DRAM speed for DRAM stuff and multi-channel XPoint speed for NV stuff.
Nope.
Re: (Score:2)
Re: (Score:2)
https://semiaccurate.com/2018/... [semiaccurate.com]
Charlie has been following and detailing Xpoint and its failures for a while now. He's got a half dozen articles or so with more specifics, including official marketing BS from Intel and how it's changed over time. I haven't seen Charlie sink his teeth this deep into a story since bumpgate. If you remember that one, he was basically the one guy who bothered to do the legwork to prove a large number of Nvidia's GPUs were defective. This resulted in lawsuits against Nvidi
No cited article (Score:2)
The summary includes NO link to any cited article, just links for defining the conference name, school, the professor, etc.
Re:No cited article (Score:4, Informative)
Re: (Score:2)
Also, to add insult to injury, TFS includes this nugget:
> The method proposed reduces write overhead times from 9% to 1%
That makes 0 Ohms sense.
Clue for submitter & editor: the SI unit for time is the second, last time I looked.
Re: (Score:2)
Also, to add insult to injury, TFS includes this nugget:
> The method proposed reduces write overhead times from 9% to 1%
That makes 0 Ohms sense.
Clue for submitter & editor: the SI unit for time is the second, last time I looked.
This makes perfect sense. Write overhead time as a fraction of the overall write operation is absolutely the correct measurement in this context.
It communicates three critical pieces of information (the previous overhead, the reduction, and the new overhead) simply and directly, without extraneous information like the overall write time, which is tangential to the article. For example, if a complete write took 100 ns, the overhead would drop from 9 ns to 1 ns, roughly an 8% improvement end to end. If I said they reduced write overhead from 9ms to 1ms you'd have no clue what that meant in relation to the overall write operation. I'd
Re: (Score:2)
>But will this technique leak information about the contents of other XPoint memory addresses?
It will open up interesting attack vectors, yes.
1) If the information capacity of the checksum is less than the data being checksummed, you will be able to find collisions and maybe use this to cause targeted data corruption.
2) Any 'lazy/speculative/delayed' execution has turned out to be a side channel vector in recent years.
3) If CRCs are used instead of cryptographic hashes, then targeted data modification ca
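To make point 3 concrete, here's a toy C demo of CRC32's linearity over GF(2): for equal-length buffers, crc(a^b) == crc(a) ^ crc(b) ^ crc(zeros), so an attacker can flip chosen bits and still predict the checksum. This is a generic CRC property, not a claim about the paper's actual (unpublished) checksum design:

```c
/* Demo: CRC32 is linear over GF(2), so chosen bit flips leave the CRC
 * predictable without knowing the rest of the data. A cryptographic
 * hash would not allow this. Generic CRC property; nothing here is
 * specific to XPoint or the paper's checksum. */
#include <stdio.h>
#include <stdint.h>

static uint32_t crc32(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
    }
    return ~crc;
}

int main(void)
{
    uint8_t msg[16]   = "pay alice 100$$";  /* 15 chars + NUL */
    uint8_t delta[16] = {0};                /* chosen bit flips */
    uint8_t zero[16]  = {0};
    delta[10] = '1' ^ '9';                  /* turns "100" into "900" */

    uint8_t forged[16];
    for (size_t i = 0; i < sizeof msg; i++)
        forged[i] = msg[i] ^ delta[i];

    /* Predict the forged CRC from crc(msg) alone, via linearity. */
    uint32_t predicted = crc32(msg, 16) ^ crc32(delta, 16) ^ crc32(zero, 16);
    printf("actual    %08x\npredicted %08x\n", crc32(forged, 16), predicted);
    return 0;
}
```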