Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller

Donghyun Gouk, Seungkwan Kang, Hanyeoreum Bae, Eojin Ryu, Sangwon Lee, Dongpyung Kim, Junhyeok Jang, Myoungsoo Jung

The ACM Workshop on Hot Topics in Storage and File Systems (HotStorage)

2024

Research Areas
Architecture
Coherent Interconnect

Abstract

This work introduces a GPU storage expansion solution built on CXL, featuring a novel GPU system design with multiple CXL root ports for integrating diverse storage media (DRAMs and/or SSDs). We developed a custom CXL controller, integrated it at the hardware RTL level, and fabricated it in silicon, achieving two-digit nanosecond round-trip latency, a first in the field. The design also includes speculative read and deterministic store mechanisms, which manage read and write operations so as to hide latency variation in the endpoint's backend media. Performance evaluations show that our approach significantly outperforms existing methods, marking a substantial advance in GPU storage technology.


Related Publications
Featured
CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance, IEEE Micro, 2025
Coherent Interconnect
Machine Learning
Featured
Containerized In-Storage Processing and Computing-Enabled SSD Disaggregation, IEEE Micro, 2025
Operating Systems
Architecture