Updated 3 hours ago

Organization

LAION AI's public GitHub footprint

@LAION-AI

This is the repo of LAION, a non-profit organization to liberate machine learning research, models and datasets.

Germany

126

Public repositories

47,159

Total stars

4,257

Followers

LAION-AI is a non-profit organization active on GitHub, where it shares a range of public repositories related to machine learning research and models. Its main languages are Python, Jupyter Notebook, and TypeScript, and its most prominent repositories include Open-Assistant and CLAP.

Top languages

Python 50, Jupyter Notebook 12, TypeScript 6, HTML 4, Dart 2, JavaScript 1

Public repositories

Open-Assistant

★37,397

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Python

Updated Jun 12, 2026

CLAP

★2,178

Contrastive Language-Audio Pretraining

Python

Updated Jun 12, 2026

CLIP_benchmark

★812

CLIP-like model evaluation

Python

Updated Jun 11, 2026

audio-dataset

★740

Audio Dataset for training CLAP and other models

Python

Updated May 29, 2026

aesthetic-predictor

★708

A linear estimator on top of clip to predict the aesthetic quality of pictures

Jupyter Notebook

Updated Jun 12, 2026

dalle2-laion

★504

Pretrained Dalle2 from laion

Python

Updated May 28, 2026

natural_voice_assistant

★499

No description provided for this repository.

Python

Updated Jun 10, 2026

CLIP-based-NSFW-Detector

★466

No description provided for this repository.

Python

Updated Jun 10, 2026

lucidrains-projects

★357

A summary of all lucidrains repositories and links to training / research approaches by LAION or other communities.

Jupyter Notebook

Updated Jun 6, 2026

laion-3d

★296

Collect large 3d dataset and build models

Unknown language

Updated Jun 12, 2026

laion-datasets

★255

Description and pointers of laion datasets

HTML

Updated Mar 26, 2026

phenaki

★220

A phenaki reproduction using pytorch.

Python

Updated Mar 10, 2026

Open-Instruction-Generalist

★210

Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks

Python

Updated Apr 3, 2026

scaling-laws-openclip

★195

Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)

Jupyter Notebook

Updated May 25, 2026

ldm-finetune

★181

Home of `erlich` and `ongo`. Finetune latent-diffusion/glid-3-xl text2image on your own data.

Python

Updated Nov 19, 2025

laion-dreams

★167

Aim for the moon. If you miss, you may hit a star.

Unknown language

Updated May 29, 2026

LAION-5B-WatermarkDetection

★132

No description provided for this repository.

Python

Updated Jun 11, 2026

AIW

★129

Alice in Wonderland code base for experiments and raw experiments data

Python

Updated Jun 12, 2026

laion.ai

★123

No description provided for this repository.

HTML

Updated Jun 12, 2026

emotion-annotations

★110

No description provided for this repository.

Python

Updated Jun 12, 2026

Discord-Scrapers

★106

Implementation of a discord channel scraper to generate datasets.

Python

Updated May 23, 2026

video-clip

★97

Let's make a video clip

Unknown language

Updated Apr 9, 2026

Open-GIA

★87

O-GIA is an umbrella for a research, infrastructure, and project ecosystem that should provide open-source, reproducible datasets, models, applications, and safety tools for Open Generalist Interactive Agents (O-GIA). O-GIA systems will act in collaboration with humans or autonomously, supporting various kinds of validated decision-making and assistance.

Unknown language

Updated Dec 2, 2025

watermark-detection

★74

A repository containing datasets and tools to train a watermark classifier.

Python

Updated May 30, 2026

LAION-SAFETY

★65

An open toolbox for NSFW & toxicity detection

Jupyter Notebook

Updated Mar 30, 2026

General-GPT

★65

No description provided for this repository.

Jupyter Notebook

Updated Jul 16, 2025

Text-to-speech

★61

No description provided for this repository.

Python

Updated Nov 19, 2025

Big-Interleaved-Dataset

★59

Big-Interleaved-Dataset

Python

Updated Apr 3, 2026

interesting-text-datasets

★45

No description provided for this repository.

Unknown language

Updated May 21, 2026

riverbed

★45

Tools for content datamining and NLP at scale

Python

Updated Apr 3, 2026

Desktop_BUD-E

★42

BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the creation and integration of diverse skills for educational and research applications.

Python

Updated Feb 16, 2026

OCR-ensemble

★42

No description provided for this repository.

Jupyter Notebook

Updated Dec 28, 2025

blade2blade

★41

Adversarial Training and SFT for Bot Safety Models

Python

Updated Apr 3, 2026

Conditional-Pretraining-of-Large-Language-Models

★37

No description provided for this repository.

Python

Updated Jan 2, 2025

deep-image-diffusion-prior

★36

Inverts CLIP text embeds to image embeds and visualizes with deep-image-prior.

Jupyter Notebook

Updated Apr 28, 2026

laion5B-paper

★36

Building the laion5B paper

Unknown language

Updated Sep 4, 2025

emotional-speech-annotations

★35

This repository contains prompts and best practices for annotating audio clips in very fine detail using Audio-Language-Models

Unknown language

Updated Jul 1, 2025

temporal-embedding-aggregation

★32

Aggregating embeddings over time

Python

Updated Jun 23, 2025

medical

★30

This repository will be a summary and outlook on all our open, medical, AI advancements.

Jupyter Notebook

Updated Feb 13, 2026

conditioned-prior

★29

(wip) Use LAION-AI's CLIP "conditioned prior" to generate CLIP image embeds from CLIP text embeds.

Python

Updated Oct 13, 2025

Anh

★28

Anh - LAION's multilingual assistant datasets and models

Python

Updated Apr 3, 2026

scaled-echo-tts

★24

Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024 GPUs)

Python

Updated May 28, 2026

Desktop-BUD-E_V1.0

★24

Python

Updated Apr 7, 2026

laion50BU

★24

Un-*** 50 billions multimodality dataset

Unknown language

Updated Apr 3, 2026

scaling-laws-for-comparison

★22

No description provided for this repository.

Jupyter Notebook

Updated May 13, 2026

school-bud-e-frontend-old

★22

A frontend that is compatible with the school-bud-e-backend.

TypeScript

Updated Oct 9, 2025

math_problems-step-by-step_solutions

★19

Here we provide and collect many functions to generate math problems and step-by-step solutions for LLM training

Python

Updated May 31, 2026

laion-dedup

★18

No description provided for this repository.

Python

Updated Jun 10, 2026

bud-e

★18

A general human-ai interaction platform.

Dart

Updated May 27, 2026

univeral-audio-annotation-pipeline

★14

No description provided for this repository.

Python

Updated Jun 11, 2026

Vocalino-V0.1-Voice-Acting-Pipeline

★14

Open-weights voice acting pipeline combining zero-shot voice cloning with natural-language direction. Provide a reference voice (or generate one) and describe how the line should be performed. Produces speech that keeps the voice identity while following emotional and stylistic prompts—no training required.

HTML

Updated May 25, 2026

opendream

★14

Frontend (and soon also middleware and backend) for a new, open-source image generation platform.

TypeScript

Updated Nov 19, 2025

LAION-PEOPLE

★14

This project provides a data set with bounding boxes, body poses, 3D face meshes & captions of people from our LAION-2.2B. Additionally it provides clusters based on the poses and face meshes and pose-related captions based on these cluster assignments.

Unknown language

Updated Apr 14, 2025

worldsim

★13

No description provided for this repository.

Unknown language

Updated Dec 21, 2025

super-resolution

★13

This is the LAION repository for creating open super-resolution models with the help of LAION-5B subsets.

Unknown language

Updated Jul 20, 2025

laionide

★12

This repository contains training code and checkpoints for finetuning GLIDE.

Python

Updated Apr 3, 2026

project-menu

★12

Projects at LAION

Unknown language

Updated Aug 25, 2025

model-retrieval

★11

Easily compute model embeddings and save the embeddings.

Unknown language

Updated Apr 3, 2026

project-alexandria

★9

Official repo for Project Alexandria

Unknown language

Updated Mar 17, 2026

open-sci-ref-0.01

★8

No description provided for this repository.

Unknown language

Updated Feb 24, 2026

image-deduplication-testset

★8

No description provided for this repository.

HTML

Updated Jan 4, 2024

KAISER

★7

Knowledge Acquisition and Interlinking via Semantic Embeddings and Reasoning

Unknown language

Updated Apr 3, 2026

Megatron-LM-Open-Sci

★7

MegaTron open-sci fork

Python

Updated Apr 3, 2026

voice-taxonomies

★5

Collection of three complementary voice taxonomies: VoiceNet (59 speech dimensions), EmoNet (40 emotion categories), VocalBurst (82 non-speech sounds)

Unknown language

Updated Jun 9, 2026

Retrieval-Augmented-Voice-Cloning

★5

Retrieval-augmented voice cloning and emotion conditioning data generation pipeline. Combines Echo TTS, ChatterboxVC, and Empathic Insight Voice+ to generate large-scale datasets of emotionally conditioned speech with disentangled speaker identity and emotional prosody.

Python

Updated Jun 9, 2026

open_clip_mammut

★5

OpenCLIP fork with MaMMUT support

Python

Updated Nov 26, 2025

safety-pipeline

★5

A collection of safety classifiers and models to process images and text.

Python

Updated May 14, 2025

Dream-E

★4

No description provided for this repository.

TypeScript

Updated Jun 11, 2026

emonet-face

★4

Official repository for the NeurIPS 2025 paper “EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition.” Includes a 40-category emotion taxonomy, balanced synthetic datasets, expert annotations, and baseline models for fair and reproducible evaluation.

Jupyter Notebook

Updated May 28, 2026

annotate-collection

★3

A repository with data for annotation.

Python

Updated May 13, 2025

decentralized-learning

★3

A basic setup for decentralized-learning that can be used for training future DALLE/CLIP/CLAP models.

Unknown language

Updated Sep 8, 2024

chatterbox-voice-conversion

★2

High-level Python library for zero-shot voice conversion using Resemble AI's Chatterbox S3Gen model

Python

Updated Jun 9, 2026

BVD

★2

No description provided for this repository.

Python

Updated Jun 3, 2026

agent-bud-e

★2

Building an agentic voice assistant for mobile & desktop devices with episodic, semantic & procedural memories

Unknown language

Updated Apr 16, 2026

llm-template

★2

A template for procedural template generation using JSON outputs from LLMs.

TypeScript

Updated Apr 3, 2026

AIW_webpage

★2

Alice in Wonderland project and initiative webpage

Unknown language

Updated Apr 3, 2026

laion5b-subsets

★2

Creating subsets from laion5b via embeddings search

Jupyter Notebook

Updated Dec 2, 2025

Open-Sci-hf

★2

No description provided for this repository.

Python

Updated Aug 18, 2025

curiosit-e

★2

File server for curiosit-e content.

TypeScript

Updated Apr 17, 2025

images-for-slideshows

★2

No description provided for this repository.

Unknown language

Updated Apr 4, 2025

django-htmx-llm-streaming

★2

A prototype showing how to stream using Django x htmx.

JavaScript

Updated Apr 4, 2025

crawlingathome

★2

A client library for Crawling@Home's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.

Python

Updated Jun 19, 2024

school-bud-e-frontend

★1

School Bud-E is an intelligent and empathetic learning assistant designed to revolutionize the educational experience.

Dart

Updated May 27, 2026

Admin_Bud-E

★1

Admin Bud-E is a lightweight, privacy-first control center for AI chat, speech-to-text, and text-to-speech. Manage providers, routing, and costs with a simple Admin Console. Give users per-period credits, prices per model, and a shared Common Pool. EU-friendly via OpenAI-Format endpoints or our optional Google Cloud Vertex proxy.

Python

Updated May 18, 2026

transformers

★1

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python

Updated Apr 3, 2026

snac-to-llama3

★1

No description provided for this repository.

Jupyter Notebook

Updated Apr 4, 2025

bud-e-mobile

★1

Mobile app development of all bud-e derivatives.

Unknown language

Updated Apr 4, 2025

laionbox

★0

LaionBox: Fine-tuned DramaBox TTS with Multi-Auxiliary Differentiable Losses

Python

Updated Jun 13, 2026

open-sci-ref

★0

No description provided for this repository.

Python

Updated Jun 12, 2026

Voice-Acting-Pipeline

★0

Self-contained DramaBox voice acting pipeline: VoiceNet taxonomy, multi-GPU prompt generation, TTS synthesis, and audio refinement

Python

Updated Jun 5, 2026

emolia-bench

★0

Benchmark analysis

Python

Updated May 13, 2026

jax-dacvae-echotts

★0

JAX/TPU training code for EchoTTS with DACVAE latent codec

Python

Updated May 8, 2026

tunes

★0

No description provided for this repository.

Python

Updated May 7, 2026

scientific-summaries

★0

No description provided for this repository.

Python

Updated May 7, 2026

open-clap-scaling

★0

Multi-node scaling benchmarks for CLAP contrastive audio-language models on HPC clusters

Python

Updated Mar 29, 2026

vocolino

★0

No description provided for this repository.

Unknown language

Updated Mar 27, 2026

helden-bud-e-frontend

★0

No description provided for this repository.

TypeScript

Updated Sep 6, 2025

DSA-Wissen-BM25-Server

★0

No description provided for this repository.

Python

Updated Sep 6, 2025

StoryBuddy-frontend

★0

No description provided for this repository.

Unknown language

Updated Aug 31, 2025

Open-Sci-moe-hf

★0

No description provided for this repository.

Unknown language

Updated Jun 11, 2025

Frequently asked questions

What does LAION-AI build on GitHub?

LAION-AI develops many open-source machine learning projects, including Open-Assistant, a chat-based assistant, and CLAP (Contrastive Language-Audio Pretraining).

Which programming languages does LAION-AI use?

LAION-AI mainly uses Python (50 repositories), Jupyter Notebook (12), and TypeScript (6) in its projects.

Are LAION-AI's repositories public?

The LAION-AI repositories listed here are public on GitHub, so the community can access, use, and contribute to the organization's research and development projects.

Is this exposure intentional?

Track LAION AI with RepoGuard and get alerted as soon as a new public repository appears.