
Journal

Release notes, field reports, and research commentary from the vLLM Semantic Router project.

3 posts tagged with "hardware"

View All Tags

Deploying vLLM Semantic Router on AMD Developer Cloud

· 11 min read
Xunzhuo Liu
Intelligent Routing @vLLM

AMD Developer Cloud and vLLM Semantic Router overview

Running vLLM Semantic Router on AMD Developer Cloud is not just about bringing up one more inference endpoint. It turns a single endpoint into a routed multi-tier system that can classify requests, choose a semantic lane, and make replay and Insights immediately useful.

This post walks through the practical path: start the ROCm backend on an AMD Developer Cloud instance, install vLLM-SR, import the reference profile, and validate the deployment end to end.

AMD × vLLM Semantic Router: Building the System Intelligence Together

· 1 min read
Xunzhuo Liu
Intelligent Routing @vLLM

Over the past several months, AMD and the vLLM SR Team have been collaborating to bring vLLM Semantic Router (VSR) to AMD GPUs—not just as a performance optimization, but as a fundamental shift in how we think about AI system architecture.

AMD has been a long-term technology partner for the vLLM community, from accelerating the vLLM inference engine on AMD GPUs and ROCm™ Software to now co-building the next layer of the AI stack: intelligent routing and governance for Mixture-of-Models (MoM) systems.

Synced from the official vLLM Blog: AMD × vLLM Semantic Router: Building the System Intelligence Together
