Updated 7 min ago

Organization

kvcache.ai's public GitHub footprint

KVCache.AI is a joint research project between MADSys and top industry collaborators, focusing on efficient LLM serving.

Public repositories

22,973 total stars

1,146 followers

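kvcache.ai is a joint research project focused on efficient LLM serving, and it maintains multiple public GitHub repositories. Its main programming languages are Python, CUDA, C++, Go, and JavaScript. Its repositories include well-known projects such as ktransformers and Mooncake, reflecting its contributions to large language model optimization and serving platforms.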

Top languages

Python 5, Cuda 2, C++ 1, Go 1, JavaScript 1

Public repositories

ktransformers

★17,272

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python

Updated June 13, 2026

Mooncake

★5,567

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++

Updated June 13, 2026

TrEnv-X

★84

No description provided for this repository.

Updated June 1, 2026

vllm

★15

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

Updated May 26, 2026

kvcache-blog

★11

No description provided for this repository.

JavaScript

Updated June 12, 2026

sglang

★11

SGLang is a fast serving framework for large language models and vision language models.

Python

Updated June 5, 2026

custom_flashinfer

★7

FlashInfer: Kernel Library for LLM Serving

Cuda

Updated March 1, 2026

DeepEP_fault_tolerance

★3

DeepEP: an efficient expert-parallel communication library that supports fault tolerance

Cuda

Updated March 10, 2026

sglang_awq

★2

SGLang is a fast serving framework for large language models and vision language models.

Python

Updated March 2, 2026

accelerate

★1

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Unknown language

Updated April 13, 2026

Model-Optimizer

★0

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

Unknown language

Updated May 6, 2026

evalscope

★0

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

Python

Updated April 10, 2026

transformers

★0

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Unknown language

Updated April 7, 2026