Updated 10 hours ago

Organization

Ai2's public GitHub footprint

@allenai

View profile on GitHub

Seattle, WA

584

Public repositories

77,201

Total stars

4,769

Followers

The allenai organization, based in Seattle, maintains a large set of public repositories on GitHub, most of them in Python, with a handful in Jupyter Notebook, Scala, Rust, and C#. Notable projects include NLP research libraries such as allennlp and data preparation tools such as olmocr, which linearizes PDFs for LLM training datasets. Its GitHub presence reflects a commitment to open source and to AI research.

Top languages

Python 83, Jupyter Notebook 3, Scala 2, Rust 2, C# 1, Lua 1, HTML 1, Java 1

Public repositories

olmocr

★17,387

Toolkit for linearizing PDFs for LLM datasets/training

Python

Updated June 13, 2026

allennlp

★11,892

An open-source NLP research library, built on PyTorch.

Python

Updated June 13, 2026

OLMo

★6,554

Modeling, training, eval, and inference code for OLMo

Python

Updated June 12, 2026

open-instruct

★3,752

AllenAI's post-training codebase

Python

Updated June 12, 2026

RL4LMs

★2,388

A modular RL library to fine-tune language models to human preferences

Python

Updated June 6, 2026

longformer

★2,196

Longformer: The Long-Document Transformer

Python

Updated June 5, 2026

scispacy

★1,964

A full spaCy pipeline and models for scientific/biomedical documents.

Python

Updated June 12, 2026

ai2thor

★1,739

An open-source platform for Visual AI.

Updated June 11, 2026

scibert

★1,703

A BERT model for scientific text.

Python

Updated June 10, 2026

dolma

★1,508

Data and tools for generating and inspecting OLMo pre-training data.

Python

Updated June 8, 2026

objaverse-xl

★1,297

🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!

Python

Updated June 13, 2026

OLMo-core

★1,289

PyTorch building blocks for the OLMo ecosystem

Python

Updated June 13, 2026

s2orc

★1,064

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

Python

Updated June 10, 2026

natural-instructions

★1,047

Expanding natural instructions

Python

Updated June 10, 2026

OLMoE

★1,026

OLMoE: Open Mixture-of-Experts Language Models

Jupyter Notebook

Updated June 9, 2026

molmo

★914

Code for the Molmo Vision-Language Model

Python

Updated June 11, 2026

XNOR-Net

★870

ImageNet classification using binary Convolutional Neural Networks

Lua

Updated June 9, 2026

papermage

★797

library supporting NLP and CV research on scientific papers

Python

Updated June 8, 2026

visprog

★773

Official code for VisProg (CVPR 2023 Best Paper!)

Python

Updated June 8, 2026

scitldr

★759

No description provided for this repository.

Python

Updated June 9, 2026

pdffigures2

★748

Given a scholarly PDF, extract figures, tables, captions, and section titles.

Scala

Updated June 7, 2026

reward-bench

★721

RewardBench: the first evaluation tool for reward models.

Python

Updated June 12, 2026

molmo2

★643

Code for the Molmo2 Vision-Language Model

Python

Updated June 12, 2026

molmoact2

★605

Official Repository for MolmoAct2

Python

Updated June 13, 2026

specter

★583

SPECTER: Document-level Representation Learning using Citation-informed Transformers

Python

Updated June 13, 2026

WildDet3D

★576

Allen Institute for AI: WildDet3D: Scaling Promptable 3D Detection in the Wild

Python

Updated June 12, 2026

molmoweb

★567

No description provided for this repository.

Python

Updated June 11, 2026

allennlp-models

★563

Officially supported AllenNLP models

Python

Updated June 9, 2026

Holodeck

★553

CVPR 2024: Language Guided Generation of 3D Embodied AI Environments.

Python

Updated June 6, 2026

dont-stop-pretraining

★543

Code associated with the Don't Stop Pretraining ACL 2020 paper

Python

Updated June 5, 2026

OLMoASR

★491

An open-source implementation of Whisper

Python

Updated June 3, 2026

s2orc-doc2json

★469

Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)

Python

Updated June 6, 2026

procthor

★441

🏘️ Scaling Embodied AI by Procedurally Generating Interactive 3D Houses

Python

Updated June 12, 2026

deep_qa

★403

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)

Python

Updated June 6, 2026

allenact

★382

An open source framework for research in Embodied-AI from AI2.

Python

Updated June 9, 2026

olmes

★379

Reproducible, flexible LLM evaluations

Python

Updated June 10, 2026

molmoact

★369

Official Repository for MolmoAct

Python

Updated June 12, 2026

vla-evaluation-harness

★368

One framework to evaluate any VLA model on any robot simulation benchmark.

Python

Updated June 12, 2026

ScienceWorld

★363

ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.

Scala

Updated June 10, 2026

molmospaces

★358

An end-to-end open ecosystem for robot learning

Python

Updated June 12, 2026

satlas-super-resolution

★341

No description provided for this repository.

Python

Updated June 10, 2026

ai2-scholarqa-lib

★281

Repo housing the open sourced code for the ai2 scholar qa app and also the corresponding library

Python

Updated June 7, 2026

satlas

★280

No description provided for this repository.

Python

Updated June 11, 2026

s2-folks

★275

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Unknown language

Updated June 10, 2026

scifact

★263

Data and models for the SciFact verification task.

Python

Updated June 10, 2026

WildBench

★254

Benchmarking LLMs with Challenging Tasks from Real Users

Python

Updated June 8, 2026

olmoearth_pretrain

★246

Earth system foundation model data, training, and eval

Python

Updated June 12, 2026

asta-paper-finder

★244

frozen-in-time version of our Paper Finder agent for reproducing evaluation results

Python

Updated June 12, 2026

real-toxicity-prompts

★233

No description provided for this repository.

Jupyter Notebook

Updated June 11, 2026

discoveryworld

★215

A virtual environment for developing and evaluating automated scientific discovery agents.

Python

Updated June 10, 2026

hidden-networks

★198

No description provided for this repository.

Python

Updated June 8, 2026

autodiscovery-neurips

★182

Official code for NeurIPS 2025 paper "AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise"

Python

Updated June 4, 2026

medicat

★176

Dataset of medical images, captions, subfigure-subcaption annotations, and inline textual references

Python

Updated June 12, 2026

pixmo-docs

★163

ACL 2025: Synthetic data generation pipelines for text-rich images.

Python

Updated June 5, 2026

discoverybench

★147

Discovering Data-driven Hypotheses in the Wild

Python

Updated June 12, 2026

SERA

★146

Data generation and training repository for SERA: Soft-Verified Efficient Repository Agents.

Python

Updated June 13, 2026

satlaspretrain_models

★144

No description provided for this repository.

Jupyter Notebook

Updated June 9, 2026

IFBench

★142

No description provided for this repository.

Python

Updated June 11, 2026

agent-baselines

★142

No description provided for this repository.

Python

Updated June 8, 2026

SPECTER2

★136

No description provided for this repository.

Python

Updated June 5, 2026

bolmo-core

★134

Code for Bolmo: Byteifying the Next Generation of Language Models

Python

Updated June 10, 2026

wildguard

★125

Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Python

Updated June 12, 2026

aokvqa

★116

Official repository for the A-OKVQA dataset

Python

Updated June 5, 2026

asta-bench

★109

No description provided for this repository.

Python

Updated June 13, 2026

S2AND

★109

Semantic Scholar's Author Disambiguation Algorithm & Evaluation Suite

Python

Updated June 4, 2026

infinigram-api

★101

No description provided for this repository.

Python

Updated June 12, 2026

DecomP

★99

Repository for Decomposed Prompting

Python

Updated June 9, 2026

robothor-challenge

★99

RoboTHOR Challenge

Python

Updated June 4, 2026

MolmoBot

★90

Code and website for "MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation".

Python

Updated June 10, 2026

rslearn

★89

A tool for developing remote sensing datasets and models.

Python

Updated June 11, 2026

duplodocus

★85

Tooling for exact and MinHash deduplication of large-scale text datasets

Rust

Updated June 5, 2026

olmoearth_projects

★74

OlmoEarth projects

Python

Updated June 12, 2026

codenav

★69

CodeNav is an LLM agent that navigates and leverages previously unseen code repositories to solve user queries.

Python

Updated June 6, 2026

atlantes

★66

Efficient and low latency real-time global-scale GPS trajectory modeling

Python

Updated June 10, 2026

phone2proc

★63

📱👉🏠 Perform conditional procedural generation to generate houses like your own!

Python

Updated June 10, 2026

paper-embedding-public-apis

★60

Collection of public APIs for embedding scientific papers

Unknown language

Updated June 7, 2026

ruletaker

★55

No description provided for this repository.

Python

Updated June 7, 2026

EMO

★42

No description provided for this repository.

HTML

Updated June 10, 2026

fermi

★37

No description provided for this repository.

Python

Updated June 3, 2026

artifact-linker

★36

ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery

Python

Updated June 10, 2026

c4-documentation

★33

No description provided for this repository.

Unknown language

Updated June 6, 2026

signal-and-noise

★30

Measuring the Signal to Noise Ratio in Language Model Evaluation

Python

Updated June 12, 2026

recoma

★30

Reasoning by Communicating with Agents

Python

Updated June 5, 2026

persona-bias

★29

No description provided for this repository.

Python

Updated June 9, 2026

natural-instructions-v1

★28

Benchmarking Generalization to New Tasks from Natural Language Instructions

Python

Updated June 11, 2026

grobid

★23

A machine learning software for extracting information from scholarly documents

Java

Updated June 12, 2026

rslearn_projects

★22

No description provided for this repository.

Python

Updated June 9, 2026

olmo-eval

★18

No description provided for this repository.

Python

Updated June 13, 2026

twentyquestions

★17

A web application for playing 20 Questions to crowdsource common sense. 🤖

Python

Updated June 7, 2026

asta-plugins

★16

No description provided for this repository.

Python

Updated June 12, 2026

MolmoPoint-GUISyn

★15

Synthetic GUI Pointing Data Generation

Python

Updated June 6, 2026

s6ui

★12

A fast AWS S3 browser, with inspiration from s5cmd

Rust

Updated June 5, 2026

layout-parser

★5

A Python Library for Document Layout Understanding

Python

Updated June 4, 2026

molmospaces-resources

★4

Resource manager for MolmoSpaces

Python

Updated June 11, 2026

skiff2-actions

★3

GitHub actions for skiff2 repositories.

TypeScript

Updated June 8, 2026

OlmoEarth-Feedback

★2

Repo for collection of feedback on OlmoEarth

Unknown language

Updated June 5, 2026

mujoco

★2

No description provided for this repository.

C++

Updated June 4, 2026

personalized-scholarqa-eval

★2

Evaluation code for the paper "Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users"

Python

Updated June 3, 2026

molmospaces_policy_zoo

★0

Policy zoo for data generation + evaluation in MolmoSpaces

Python

Updated June 12, 2026

fairseq

★0

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python

Updated June 3, 2026

Frequently asked questions

What does allenai build on GitHub?

allenai develops a range of projects, including natural language processing libraries such as allennlp and machine learning tools. Its work also covers OLMo (modeling, training, evaluation, and inference code for the OLMo language models) and scispacy (a spaCy pipeline for scientific and biomedical documents), supporting research and development in artificial intelligence.

Which programming languages does allenai use?

allenai uses Python for most of its projects, alongside other languages such as Scala, C#, and Rust. This mix lets the organization build tools suited to different applications in artificial intelligence and data processing.

Are allenai's repositories public?

Yes, every allenai repository listed on this page is public. Anyone can browse, use, and contribute to these projects, which encourages collaboration and innovation in artificial intelligence.
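
For readers who want to check the figures on this page themselves, the following is a minimal sketch, not the method this page uses. It assumes the GitHub REST API endpoint `GET /orgs/{org}/repos` with `type=public`, and the standard `stargazers_count` and `language` fields on each repository record. The helper names `fetch_public_repos` and `summarize` are hypothetical, and the sample rows are copied from the listing above.

```python
import json
import urllib.request
from collections import Counter

# Assumed endpoint: GitHub REST API, "List organization repositories".
API = "https://api.github.com/orgs/{org}/repos?type=public&per_page=100&page={page}"


def fetch_public_repos(org):
    """Page through an org's public repositories (unauthenticated, so rate-limited)."""
    repos, page = [], 1
    while True:
        req = urllib.request.Request(
            API.format(org=org, page=page),
            headers={"Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means there are no more results
            return repos
        repos.extend(batch)
        page += 1


def summarize(repos):
    """Return (total stars, per-language repository counts) for repo records."""
    total_stars = sum(r["stargazers_count"] for r in repos)
    # The API reports language as null when GitHub cannot detect one.
    languages = Counter(r["language"] or "Unknown" for r in repos)
    return total_stars, languages


# Three rows copied from the listing above, shaped like API records.
sample = [
    {"name": "olmocr", "stargazers_count": 17387, "language": "Python"},
    {"name": "pdffigures2", "stargazers_count": 748, "language": "Scala"},
    {"name": "s2-folks", "stargazers_count": 275, "language": None},
]
total, langs = summarize(sample)
print(total, dict(langs))
```

Calling `summarize(fetch_public_repos("allenai"))` would run the same tally against live data. The totals may differ from the snapshot on this page, since star counts and repository lists change over time.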
