Organization

Public GitHub footprint of vLLM

@vllm-project

Public repositories

Total stars: 110,891

Followers: 3,436

The vllm-project organization on GitHub maintains public repositories focused on large language model (LLM) inference and deployment. Notable projects include vllm, a high-throughput and memory-efficient inference and serving engine, and vllm-omni, a framework for efficient inference with omni-modality models. Most repositories are written in Python; C++, Rust, Go, HTML, and TypeScript account for a handful each.

Top languages

Python 21, C++ 3, Rust 3, Go 2, HTML 2, TypeScript 2, JavaScript 1, Shell 1
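The star total and per-language counts on this page are aggregates over the organization's public repositories. As a rough sketch of how such figures could be reproduced (an assumption about method, not how this page is actually generated), the snippet below reads the GitHub REST API endpoint for listing organization repositories and tallies the `stargazers_count` and `language` fields. The helper names `fetch_public_repos` and `summarize` are hypothetical, and unauthenticated requests are subject to GitHub's rate limits.

```python
import json
import urllib.request
from collections import Counter

# GitHub REST API: list an organization's public repositories, 100 per page.
API_URL = "https://api.github.com/orgs/{org}/repos?type=public&per_page=100&page={page}"


def fetch_public_repos(org):
    """Fetch every public repository of an organization, page by page."""
    repos, page = [], 1
    while True:
        req = urllib.request.Request(
            API_URL.format(org=org, page=page),
            headers={"Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means there is nothing left to read
            return repos
        repos.extend(batch)
        page += 1


def summarize(repos):
    """Return (total stars, per-language repository counts)."""
    total_stars = sum(r["stargazers_count"] for r in repos)
    # GitHub reports `language` as null when it cannot detect one.
    languages = Counter(r["language"] or "Unknown Language" for r in repos)
    return total_stars, languages
```

Calling `summarize(fetch_public_repos("vllm-project"))` would return the star total and a `Counter` whose `most_common()` ordering resembles the list above. The exact numbers depend on when the request is made.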

Public repositories

vllm

★82,765

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

Updated Jun 13, 2026

vllm-omni

★5,130

A framework for efficient model inference with omni-modality models

Python

Updated Jun 13, 2026

aibrix

★4,875

Cost-efficient and pluggable Infrastructure components for GenAI inference

Updated Jun 13, 2026

semantic-router

★4,349

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

Updated Jun 13, 2026

llm-compressor

★3,392

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python

Updated Jun 13, 2026

production-stack

★2,401

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python

Updated Jun 13, 2026

vllm-ascend

★2,237

Community maintained hardware plugin for vLLM on Ascend

C++

Updated Jun 13, 2026

vllm-metal

★1,315

Community maintained hardware plugin for vLLM on Apple Silicon

Python

Updated Jun 13, 2026

guidellm

★1,252

Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs

Python

Updated Jun 13, 2026

recipes

★846

Common recipes to run vLLM

JavaScript

Updated Jun 13, 2026

speculators

★515

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python

Updated Jun 13, 2026

tpu-inference

★350

TPU inference for vLLM, with unified JAX and PyTorch support.

Python

Updated Jun 13, 2026

compressed-tensors

★292

A safetensors extension to efficiently store sparse quantized tensors on disk

Python

Updated Jun 13, 2026

router

★267

A high-performance and light-weight router for vLLM large scale deployment

Rust

Updated Jun 11, 2026

vime

★234

An LLM post-training framework with vLLM for RL Scaling

Python

Updated Jun 13, 2026

flash-attention

★125

Fast and memory-efficient exact attention

Python

Updated Jun 13, 2026

vllm-skills

★84

Agent skills for vLLM

Shell

Updated Jun 13, 2026

vllm-openvino

★54

No description provided for this repository.

Python

Updated May 22, 2026

vllm-daily

★51

vLLM Daily Summarization of Merged PRs

Unknown Language

Updated Jun 13, 2026

vllm-xpu-kernels

★47

The vLLM XPU kernels for Intel GPU

C++

Updated Jun 13, 2026

vllm-project.github.io

★45

No description provided for this repository.

HTML

Updated Jun 13, 2026

ci-infra

★43

This repo hosts code for vLLM CI & Performance Benchmark infrastructure.

HCL

Updated Jun 12, 2026

vllm-gaudi

★40

Community maintained hardware plugin for vLLM on Intel Gaudi

Python

Updated Jun 12, 2026

agentic-api

★33

Stateful API logic for agentic applications using vLLM

Rust

Updated Jun 11, 2026

vllm-neuron

★31

Community maintained hardware plugin for vLLM on AWS Neuron

Python

Updated May 29, 2026

dllm-plugin

★21

vLLM plugin for block-based diffusion language model (dLLM) support

Python

Updated Jun 10, 2026

vllm-nccl

★18

Manages vllm-nccl dependency

Python

Updated Apr 14, 2026

FlashMLA

★14

No description provided for this repository.

C++

Updated Jun 1, 2026

bart-plugin

★12

vLLM Model plugin for the encoder-decoder BART model

Python

Updated Jun 3, 2026

vLLM-in-PyTorch-Conference-2025

★11

No description provided for this repository.

Unknown Language

Updated May 26, 2026

media-kit

★9

vLLM Logo Assets

Unknown Language

Updated May 27, 2026

vllm-project.github.io-static

★9

No description provided for this repository.

HTML

Updated Nov 26, 2025

vllm-gguf-plugin

★8

vLLM Quantization plugin for GGUF

Python

Updated Jun 13, 2026

perf-eval

★7

Performance benchmark & accuracy evaluation for vLLM

Python

Updated Jun 12, 2026

vllm-dashboard

★4

No description provided for this repository.

TypeScript

Updated Jun 11, 2026

perf-dashboard

★3

Performance dashboard for vLLM

Python

Updated Jun 11, 2026

vllm-bnb-plugin

★1

vLLM Quantization plugin for bitsandbytes

Python

Updated Jun 9, 2026

rfcs

★1

No description provided for this repository.

Unknown Language

Updated Jun 4, 2025

MSA

★0

No description provided for this repository.

Unknown Language

Updated Jun 11, 2026

DeepGEMM

★0

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda

Updated Jun 5, 2026

vllm-docs

★0

No description provided for this repository.

TypeScript

Updated May 21, 2026

llm-multimodal

★0

Standalone fork of llm-multimodal from SMG

Rust

Updated May 20, 2026

Frequently asked questions

What does vllm-project build on GitHub?

vllm-project develops tools and frameworks for large language model inference and deployment. Key repositories include vllm, a high-throughput serving engine, and aibrix, which provides cost-efficient infrastructure components.

Which programming languages does vllm-project use?

Python is the most common primary language across vllm-project's repositories (21 repositories), followed by C++ and Rust (3 each) and Go, HTML, and TypeScript (2 each). JavaScript and Shell are each the primary language of one repository.

Are vllm-project's repositories public?

Yes, every repository listed on this page is public on GitHub, so users and developers can access the code, contribute to it, and collaborate on projects related to LLM inference and deployment. This page covers only the organization's public footprint.
