Public GitHub footprint of NVIDIA Corporation

Megatron-LM

★17,239

Ongoing research training transformer models at scale

open-gpu-kernel-modules

★17,229

NVIDIA Linux open GPU kernel module source

TensorRT-LLM

★14,233

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

SkillSpector

★13,858

Security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks.

TensorRT

★13,192

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

cosmos

★11,266

NVIDIA Cosmos is an open platform of world models, datasets, and tools that enables developers to build Physical AI for robots, autonomous vehicles, smart infrastructure, and more.

personaplex

★10,269

PersonaPlex code.

cutlass

★10,148

CUDA Templates and Python DSLs for High-Performance Linear Algebra

cuda-samples

★9,429

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

apex

★8,986

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

garak

★8,598

The LLM vulnerability scanner

OpenShell

★7,837

OpenShell is the safe, private runtime for autonomous AI agents.

Isaac-GR00T

★7,694

NVIDIA Isaac GR00T N1.7 - A Foundation Model for Generalist Robots.

warp

★6,909

A Python framework for GPU-accelerated simulation, robotics, and machine learning.

FasterTransformer

★6,447

Transformer-related optimizations, including BERT and GPT

DALI

★5,730

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

tacotron2

★5,298

Tacotron 2 - PyTorch implementation with faster-than-realtime inference

nccl

★4,916

Optimized primitives for collective multi-GPU communication

nvidia-container-toolkit

★4,487

Build and run containers leveraging NVIDIA GPUs

k8s-device-plugin

★3,825

NVIDIA device plugin for Kubernetes

TransformerEngine

★3,449

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

cuda-python

★3,324

CUDA Python: Performance meets Productivity

Cython

Model-Optimizer

★3,318

A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

flownet2-pytorch

★3,289

PyTorch implementation of FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

physicsnemo

★3,100

Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods

NeMo-Retriever

★2,953

NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever Library uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.

MinkowskiEngine

★2,949

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors

gpu-operator

★2,807

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes

skills

★2,699

Agent Skills for NVIDIA products: install them into Claude Code, Codex, and other coding agents to run Physical AI, robotics, simulation, CUDA, and RAG workflows end to end.

NeMo-Agent-Toolkit

★2,537

The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.

CUDALibrarySamples

★2,468

CUDA Library Samples

cccl

★2,441

CUDA Core Compute Libraries

stdexec

★2,395

`std::execution`, the standard C++ framework for asynchronous and parallel programming.

cutile-python

★2,120

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Stable-Diffusion-WebUI-TensorRT

★1,990

TensorRT Extension for Stable Diffusion Web UI

aistore

★1,901

AIStore: scalable storage for AI applications

accelerated-computing-hub

★1,871

NVIDIA-curated collection of educational resources related to general-purpose GPU programming.

dcgm-exporter

★1,813

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM

nccl-tests

★1,605

NCCL Tests

RULER

★1,589

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

deepops

★1,463

Tools for building GPU clusters

Shell

MatX

★1,439

An efficient C++20 GPU numerical computing library with Python-like syntax

DLSS

★1,404

NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games

gdrcopy

★1,401

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

open-gpu-doc

★1,350

Documentation of NVIDIA chip/hardware interfaces

kvpress

★1,147

LLM KV cache compression made easy

libnvidia-container

★1,116

NVIDIA container runtime library

cuda-quantum

★1,101

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

jetson-gpio

★1,073

A Python library that enables the use of Jetson's GPIOs

earth2studio

★1,050

Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.

raft

★1,030

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

DreamDojo

★1,016

Official Codebase for "DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos" (ICML 2026)

cudf-spark

★990

NVIDIA cuDF for Apache Spark plugin - accelerate Apache Spark with GPUs

Scala

cuopt

★987

GPU-accelerated decision optimization

NVFlare

★951

NVIDIA Federated Learning Application Runtime Environment

nvbench

★912

CUDA Kernel Benchmarking Library

cudnn-frontend

★889

cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.

cuvs

★822

cuVS - a library for vector search and clustering on the GPU

flashdreams

★427

High-performance inference and serving library for interactive autoregressive video and world models

JAX-Toolbox

★425

JAX-Toolbox

cosmos-framework

★419

Our inference and training framework for running the Cosmos models

Audio2Face-3D

★383

Repository collection for NVIDIA Audio2Face-3D models and tools

Megatron-Energon

★374

Megatron's multi-modal data loader

aicr

★355

Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes

NVSentinel

★352

NVSentinel is a cross-platform fault remediation service designed to rapidly remediate runtime node-level issues in GPU-accelerated computing environments

IsaacTeleop

★324

The unified framework for sim & real robot teleoperation

nvidia-resiliency-ext

★311

NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves effective training time by minimizing downtime due to failures and interruptions.

VisRTX

★279

NVIDIA OptiX based implementation of ANARI

nvidia-kaggle

★272

NVIDIA Kaggle Plugin gives agents end-to-end Kaggle competition workflows through a single skill, nvidia-kaggle-skill. It can gather competition context, study public writeups and notebooks, reproduce kernels locally, submit to competitions, and manage Ka

G-Assist

★239

Help shape the future of Project G-Assist

infra-controller

★237

NVIDIA Infra Controller - Hardware Lifecycle Management and multitenant networking

cosmos-curator

★237

Cosmos Curator is a powerful video curation system that processes, analyzes, and organizes video content using advanced AI models and distributed computing.

nvalchemi-toolkit-ops

★216

ALCHEMI Toolkit-Ops is a collection of optimized batch kernels to accelerate computational chemistry and material science workflows.

OSMO

★198

The developer-first platform for scaling complex Physical AI workloads across heterogeneous compute, unifying training GPUs, simulation clusters, and edge devices in a simple YAML

TypeScript

TorchFort

★198

An Online Deep Learning Interface for HPC programs on NVIDIA GPUs

nvcf

★185

Platform for deploying and routing GPU-accelerated inference, streaming, and batch workloads at scale.

cudf-spark-examples

★169

A repository of Spark examples using the RAPIDS Accelerator, including ETL, ML/DL, etc.

vgpu-device-manager

★160

NVIDIA vGPU Device Manager manages NVIDIA vGPU devices on top of Kubernetes

topograph

★148

A toolkit for discovering cluster network topology.

diffusion-audio-restoration

★148

Audio-to-Audio Schrödinger Bridges is a diffusion-based audio restoration model for bandwidth extension and inpainting.

optix-toolkit

★138

Set of utilities supporting workflows common in GPU raytracing applications

ais-k8s

★131

Kubernetes Operator, Helm Charts, Ansible Playbooks, and utility scripts for large-scale AIStore deployments on Kubernetes.

instant-nurec

★121

InstantNuRec: Feed-Forward 3D Gaussian Reconstruction from Driving Logs

Ising-Decoding

★111

A set of training recipes for AI Quantum Error Correction Decoders

NV-Kernels

★109

Ubuntu kernels which are optimized for NVIDIA server systems

cudaqx

★108

Accelerated libraries for quantum-classical computing built on CUDA-Q.

doca-platform

★92

DOCA Platform manages provisioning and service orchestration for BlueField DPUs

NeMo-Relay

★84

Multi-language agent runtime and library for execution scope management, lifecycle events, and middleware on tool and LLM calls.

xr-ai

★67

XR AI

cudf-spark-jni

★62

RAPIDS Accelerator JNI For Apache Spark

halos-outside-in-safety

★58

NVIDIA Halos Outside-In Safety Blueprint extends robot perception beyond on-board sensors by using external infrastructure cameras and AI agents to dynamically control robot behavior and perform at maximum efficiency.

k8s-test-infra

★42

K8s-test-infra

srt-slurm

★39

NVIDIA Inference Benchmarks provide recipes in ready-to-use templates for evaluating platform speed. Validate your platform on specific AI use cases across hardware and software combinations.

nv-redfish

★32

NVIDIA's next-generation Redfish crate

nova

★28

Linux kernel source tree

nv-rms-client

★8

NVIDIA Rack Management Service Rust language client crate

Spatial-IQ

★5

A diagnostic framework that decomposes 3D object counting into nine hierarchical spatial perception and cognition sub-tasks. Analysis code for the paper Spatial-IQ: Deconstructing Spatial Intelligence via Hierarchical Capability Tests.

OpenShell-Research

★3

🧪 OpenShell's Research Journal

mctp-utils

★1

MCTP tool that runs on the host to communicate with devices over USB and other transport layers.