
Organization

Public GitHub footprint of kvcache.ai

KVCache.AI is a joint research project between MADSys and top industry collaborators, focusing on efficient LLM serving.

Public repositories

Total stars: 22,965

Followers: 1,146

kvcache-ai maintains several widely used GitHub projects focused on efficient LLM serving. Its repositories are written mainly in Python, CUDA, C++, Go, and JavaScript. The most-starred are ktransformers (17,268 stars) and Mooncake (5,563 stars); the organization also hosts repositories named vllm and sglang, among others.

Top languages

Python 5, Cuda 2, C++ 1, Go 1, JavaScript 1

Public repositories

ktransformers

★17,268

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python

Updated Jun 12, 2026

Mooncake

★5,563

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++

Updated Jun 12, 2026

TrEnv-X

★84

No description provided for this repository.

Updated Jun 1, 2026

vllm

★15

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

Updated May 26, 2026

kvcache-blog

★11

No description provided for this repository.

JavaScript

Updated Jun 12, 2026

sglang

★11

SGLang is a fast serving framework for large language models and vision language models.

Python

Updated Jun 5, 2026

custom_flashinfer

★7

FlashInfer: Kernel Library for LLM Serving

Cuda

Updated Mar 1, 2026

DeepEP_fault_tolerance

★3

DeepEP: an efficient expert-parallel communication library that supports fault tolerance

Cuda

Updated Mar 10, 2026

sglang_awq

★2

SGLang is a fast serving framework for large language models and vision language models.

Python

Updated Mar 2, 2026

accelerate

★1

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Unknown Language

Updated Apr 13, 2026

Model-Optimizer

★0

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

Unknown Language

Updated May 6, 2026

evalscope

★0

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

Python

Updated Apr 10, 2026

transformers

★0

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Unknown Language

Updated Apr 7, 2026

gpustack

★0

GPU cluster manager for optimized AI model deployment

Unknown Language

Updated Dec 8, 2025

sglang-npu

★0

SGLang is a fast serving framework for large language models and vision language models.

Unknown Language

Updated Aug 12, 2025

Frequently asked questions

What does kvcache-ai build on GitHub?

kvcache-ai builds projects focused on efficient serving of large language models. Its most-starred repositories are ktransformers, a framework for heterogeneous LLM inference and fine-tuning optimizations, and Mooncake, the serving platform for Kimi.

Which programming languages does kvcache-ai use?

By repository count, the most common languages are Python (5 repositories), CUDA (2), C++ (1), Go (1), and JavaScript (1).

Are kvcache-ai's repositories public?

Yes. The repositories listed on this page are public on GitHub, so anyone can view their contents.
