KVCache.AI is a joint research project between MADSys and top industry collaborators, focusing on efficient LLM serving.
15 public repositories
22,973 total stars
1,146 followers
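kvcache.ai is a joint research project focused on efficient LLM serving, with multiple public GitHub repositories. Its main programming languages are Python, CUDA, C++, Go, and JavaScript, and its repositories include well-known projects such as ktransformers and Mooncake, reflecting its contributions to large language model optimization and serving platforms.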
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
No description provided for this repository.
A high-throughput and memory-efficient inference and serving engine for LLMs
No description provided for this repository.
SGLang is a fast serving framework for large language models and vision language models.
FlashInfer: Kernel Library for LLM Serving
DeepEP: an efficient expert-parallel communication library that supports fault tolerance
SGLang is a fast serving framework for large language models and vision language models.
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
GPU cluster manager for optimized AI model deployment
SGLang is a fast serving framework for large language models and vision language models.
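kvcache-ai has built multiple projects on GitHub, focused mainly on efficient serving and optimization of large language models, including key repositories such as ktransformers and Mooncake.
kvcache-ai develops its public repositories mainly in Python, CUDA, C++, Go, and JavaScript, supporting a variety of large language model applications.
Yes, all of kvcache-ai's repositories are public; anyone can access and use its open-source projects, which fosters community collaboration and development.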