Abstract: Large language models (LLMs) require inference systems that can handle both compute- and memory-intensive workloads. GPUs and NPUs (referred to as xPUs) efficiently process compute-intensive ...