Updated 2 minutes ago

Organization

kvcache.ai's public GitHub footprint

KVCache.AI is a joint research project between MADSys and top industry collaborators, focusing on efficient LLM serving.

Public repositories

Total stars: 22,973

Followers: 1,146

On GitHub, kvcache-ai publishes repositories written in Python, Cuda, C++, Go, and JavaScript. The most prominent projects are ktransformers and Mooncake.

Top languages

Python 5, Cuda 2, C++ 1, Go 1, JavaScript 1

Public repositories

ktransformers

★17,272

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python

Updated June 13, 2026

Mooncake

★5,567

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++

Updated June 13, 2026

TrEnv-X

★84

No description is provided for this repository.

Updated June 1, 2026

vllm

★15

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

Updated May 26, 2026

kvcache-blog

★11

No description is provided for this repository.

JavaScript

Updated June 12, 2026

sglang

★11

SGLang is a fast serving framework for large language models and vision language models.

Python

Updated June 5, 2026

custom_flashinfer

★7

FlashInfer: Kernel Library for LLM Serving

Cuda

Updated March 1, 2026

DeepEP_fault_tolerance

★3

DeepEP: an efficient expert-parallel communication library that supports fault tolerance

Cuda

Updated March 10, 2026

sglang_awq

★2

SGLang is a fast serving framework for large language models and vision language models.

Python

Updated March 2, 2026

accelerate

★1

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Unknown language

Updated April 13, 2026

Model-Optimizer

★0

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

Unknown language

Updated May 6, 2026

evalscope

★0

A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.

Python

Updated April 10, 2026

transformers

★0

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Unknown language

Updated April 7, 2026