TurboQuant Shockwave: Google Algorithm Triggers 3-6% Memory Chip Selloff

Main Takeaway
Memory chip stocks tumbled after Google revealed TurboQuant, a compression technique that slashes AI memory needs by up to 6×. Samsung, Micron, SK Hynix and SanDisk fell between 3% and 6% over two days.
Summary
Why memory stocks cratered
Samsung fell 5%, Micron 3%, SK Hynix 4% and SanDisk 6% in the two days after Google published TurboQuant on Tuesday. Investors dumped the names because the algorithm compresses the key-value cache in transformer models to just 3 bits per entry, cutting DRAM demand by up to 6× for the same context window. The knee-jerk reaction reflected fears that every H100-class GPU would need fewer or smaller memory modules just as hyperscalers are still scrambling for supply.
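To see why the KV cache matters at this scale, here is a back-of-envelope sizing sketch. The model dimensions below are illustrative assumptions for a large decoder-only transformer, not figures from the article; only the 3-bit entry size and the fp16 baseline come from the story.

```python
# Back-of-envelope KV-cache sizing (model dimensions are hypothetical).
layers, heads, head_dim = 80, 64, 128
seq_len = 131_072            # 128k-token context window
bytes_fp16 = 2               # 16-bit baseline
bits_turboquant = 3          # 3-bit entries per the article

# K and V each store one vector per token per layer.
elements = 2 * layers * heads * head_dim * seq_len

fp16_gb = elements * bytes_fp16 / 2**30
tq_gb = elements * bits_turboquant / 8 / 2**30

print(f"fp16 cache:  {fp16_gb:.1f} GiB")
print(f"3-bit cache: {tq_gb:.1f} GiB ({fp16_gb / tq_gb:.1f}x smaller)")
```

At these assumed dimensions the cache alone runs into the hundreds of gibibytes at fp16, which is why a 16-bit-to-3-bit compression (a bit over 5×) reads to investors as a direct cut in per-GPU DRAM requirements.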
What TurboQuant actually does
Google researchers stripped the KV cache—the working memory that stores attention states—to roughly a sixth of its usual footprint without measurable accuracy loss. On an Nvidia H100 the method yields an 8× speed-up on long-context inference. The trick is training-free quantization plus a learned lookup table that rehydrates compressed vectors on the fly. Crucially, it only works on the cache, not the model weights, so the 100 GB-plus parameter blob still needs to live somewhere.
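The lookup-table idea can be sketched in a few lines. This is a minimal illustration of codebook-based 3-bit quantization under my own assumptions, not Google's published method: 3 bits index one of 8 levels in a shared table, and "rehydration" is just a table lookup.

```python
# Sketch of codebook-based 3-bit quantization (illustrative assumption
# about the mechanism, not TurboQuant's actual algorithm).
# A "learned" codebook; here just 8 hand-picked levels in [-1, 1].
CODEBOOK = [-1.0, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 1.0]

def quantize(vec):
    """Map each float to the index (0-7) of its nearest codebook level."""
    return [min(range(8), key=lambda i: abs(CODEBOOK[i] - x)) for x in vec]

def dequantize(indices):
    """Rehydrate compressed values on the fly via table lookup."""
    return [CODEBOOK[i] for i in indices]

kv_slice = [0.12, -0.58, 0.95, -0.05]
idx = quantize(kv_slice)       # 3-bit codes, 8 values per entry
restored = dequantize(idx)     # approximate reconstruction
print(idx, restored)
```

Because only small integer indices are stored per entry and the codebook is shared across the whole cache, storage drops toward 3 bits per value while decompression stays a constant-time lookup.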
Analysts call the panic overdone
Wall Street views the rout as premature. Bloomberg Intelligence notes that TurboQuant is still a lab demo; no cloud provider has announced adoption timelines. Memory demand is driven by training clusters where weights dominate capacity, not by inference caches. Analysts point out that cheaper inference usually expands usage, so the net effect on DRAM could be neutral or even positive. The selloff may simply be an excuse to lock in gains after a 200% memory-chip rally since early 2024.
Ripple effects beyond the headline names
Taiwan’s Nanya and Winbond also slid, while logic-heavy Nvidia and AMD barely moved. The split shows investors are discriminating between what TurboQuant touches (cache DRAM) and what it does not (HBM and high-bandwidth logic). Contract manufacturers TSMC and GlobalFoundries could benefit if lower memory pressure frees up reticle space for more compute tiles.
What happens next
Google says TurboQuant is “ready for integration” into TPU and GPU runtimes. Cloud providers will test it first on cost-sensitive services like long-document summarization. If adoption spreads, expect new tiers of high-memory GPUs to be re-balanced, but not eliminated. Memory makers are already downplaying risk: Micron cites continued AI model growth as the primary driver. Investors will watch Google I/O in May for any TurboQuant rollout and hyperscaler capex plans due in July.
Key Points
Google released TurboQuant, a technique that compresses AI KV caches by up to 6× (16-32 bits → 3 bits) with no measurable accuracy loss.
Memory chip stocks fell 3-6% on fears of reduced DRAM demand, led by Samsung (-5%), Micron (-3%), SK Hynix (-4%), SanDisk (-6%).
The algorithm only trims inference cache, not model weights, so total memory demand may stay flat or rise as usage scales.
TurboQuant is presently a lab demo; no cloud provider has committed to roll it out.
Analysts view the selloff as profit-taking after a 200% memory rally and expect limited long-term impact.
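The point that TurboQuant trims only the cache, not the weights, can be put in numbers. The memory budget below is a hypothetical serving-node example, not data from the article: because the parameter blob is untouched, the total footprint shrinks far less than the headline 6×.

```python
# Hypothetical serving-node memory budget (illustrative numbers).
weights_gb = 140.0        # e.g. a 70B-parameter model in fp16: unchanged
cache_gb = 60.0           # long-context KV cache at fp16

before = weights_gb + cache_gb
after = weights_gb + cache_gb * 3 / 16   # only the cache goes 16-bit -> 3-bit

print(f"before: {before:.0f} GB, after: {after:.2f} GB")
print(f"total reduction: {before / after:.2f}x, not 6x")
```

Under these assumptions the node's total memory falls by only about a third, which is the arithmetic behind analysts' view that DRAM demand may stay flat, especially if cheaper inference grows usage.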
FAQs
What does TurboQuant actually do?
It compresses the key-value (KV) cache—the temporary store of attention keys and values used during inference—down to 3 bits per entry, cutting working memory by up to 6×.
Does it reduce all AI memory demand?
Only the cache; the billions of model parameters (weights) still require the same high-bandwidth memory. Net DRAM demand may even rise if cheaper inference drives greater usage.
When will TurboQuant reach production?
Google says it is ready for integration, but no cloud provider has announced adoption. Expect pilot tests on long-context services first, possibly revealed at Google I/O in May.
Which chipmakers were hit hardest?
Samsung, SK Hynix, Micron and SanDisk—firms that supply DRAM and NAND used as KV-cache storage—fell hardest. High-bandwidth memory suppliers remain largely unaffected.
Is the selloff justified?
Analysts call it an overreaction. The technology is unproven at scale and could expand the AI market rather than shrink it, making the selloff a buying opportunity according to some.
How does TurboQuant differ from earlier compression techniques?
Unlike pruning or distillation, TurboQuant is training-free and shows no measurable accuracy loss. It relies on learned quantization tables that allow instant decompression during inference, avoiding the quality drops seen in earlier techniques.