Featured Publication

Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure

Myoungsoo Jung

arXiv (Technical Report)

2025

Research Areas

Coherent Interconnect

Abstract

Modern AI workloads such as large language models (LLMs) and retrieval-augmented generation (RAG) impose severe demands on memory, communication bandwidth, and resource flexibility. Traditional GPU-centric architectures struggle to scale due to growing inter-GPU communication overheads. This report introduces key AI concepts and explains how Transformers revolutionized data representation in LLMs. We analyze large-scale AI hardware and data center designs, identifying scalability bottlenecks in hierarchical systems. To address these, we propose a modular data center architecture based on Compute Express Link (CXL) that enables disaggregated scaling of memory, compute, and accelerators. We further explore accelerator-optimized interconnects, collectively termed XLink (e.g., UALink, NVLink, NVLink Fusion), and introduce a hybrid CXL-over-XLink design to reduce long-distance data transfers while preserving memory coherence. We also propose a hierarchical memory model that combines local and pooled memory, and evaluate lightweight CXL implementations, HBM, and silicon photonics for efficient scaling. Our evaluations demonstrate improved scalability, throughput, and flexibility in AI infrastructure.

Related Publications

CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance

IEEE Micro • 2025

Coherent Interconnect

Machine Learning

From Block to Byte: Transforming PCIe SSDs with CXL Memory Protocol and Instruction Annotation

IEEE Micro • 2025

Coherent Interconnect

CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies

IEEE Micro • 2025

Coherent Interconnect
