Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from easy questions to hard

Python

Updated June 11, 2026

steering-llama3

★30

No description has been provided for this repository.

Python

Updated June 11, 2026

tokengrams

★27

Efficiently computing & storing token n-grams from large corpora

Rust

Updated June 11, 2026

training-jacobian

★24

No description has been provided for this repository.

Jupyter Notebook

Updated June 11, 2026

w2s

★24

No description has been provided for this repository.

Python

Updated June 11, 2026

deep-ignorance

★19

No description has been provided for this repository.

Python

Updated June 11, 2026

polyglot-data

★19

Data-related codebase for the Polyglot project

Python

Updated June 11, 2026

pile_dedupe

★18

Pile Deduplication Code

Python

Updated June 11, 2026

latent-video-diffusion

★16

Latent video diffusion

Python

Updated June 11, 2026

NeMo

★16

NeMo: a toolkit for conversational AI

Python

Updated June 11, 2026

attribute

★15

No description has been provided for this repository.

Python

Updated June 11, 2026

exploring-contrastive-topology

★15

No description has been provided for this repository.

Jupyter Notebook

Updated June 11, 2026

polyapprox

★13

Closed-form polynomial approximations to neural networks

Python

Updated June 11, 2026

pilev2

★13

No description has been provided for this repository.

Python

Updated June 11, 2026

lm_dataformat

★11

No description has been provided for this repository.

Python

Updated June 11, 2026

transformer-reasoning

★10

Experiments in transformer knowledge and reasoning

Jupyter Notebook

Updated June 11, 2026

architecture-objective

★10

No description has been provided for this repository.

Python

Updated June 11, 2026

attention-probes

★8

Linear probes with attention weighting

Python

Updated June 11, 2026

equinox-llama

★8

Equinox implementation of llama3 and llama3.1

Python

Updated June 11, 2026

GPTeacher

★8

A collection of modular datasets generated by GPT-4: General-Instruct, Roleplay-Instruct, Code-Instruct, and Toolformer

Unknown language

Updated June 11, 2026

minetest-baselines

★8

Baseline agents for Minetest tasks.

Python

Updated June 11, 2026

aria-utils

★6

MIDI tokenizers and pre-processing utils.

Python

Updated June 11, 2026

cupbearer

★6

A library for mechanistic anomaly detection

Jupyter Notebook

Updated June 11, 2026

weak-to-strong

★6

No description has been provided for this repository.

Python

Updated June 11, 2026

trlx

★6

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Python

Updated June 11, 2026

minetest-interpretabilty-notebook

★6

Jupyter notebook for the interpretability section of the minetester blog post

Jupyter Notebook

Updated June 11, 2026

CodeCARP

★6

Data collection pipeline for CodeCARP. Includes PyCharm plugins.

Unknown language

Updated June 11, 2026

clearnets

★5

No description has been provided for this repository.

Python

Updated June 11, 2026

optax-galore

★5

Adds GaLore style projection wrappers to optax optimizers

Python

Updated June 11, 2026

architecture-experiments

★5

Repository to host architecture experiments and development using Paxml and Praxis

Python

Updated June 11, 2026

FLAN

★5

No description has been provided for this repository.

Python

Updated June 11, 2026

thonkenizers

★5

yes

Unknown language

Updated June 11, 2026

scalable-elicitation

★4

The code used in "Balancing Label Quantity and Quality for Scalable Elicitation"

Jupyter Notebook

Updated June 11, 2026

monkfish

★4

No description has been provided for this repository.

Python

Updated June 11, 2026

alignment-handbook

★4

Robust recipes to align language models with human and AI preferences

Unknown language

Updated June 11, 2026

Unpaired-Image-Generation

★4

Project Repo for Unpaired Image Generation project

Unknown language

Updated June 11, 2026

lm-scope

★4

No description has been provided for this repository.

Jupyter Notebook

Updated June 11, 2026

sae_overlap

★3

Accompanying code for our research on SAE feature overlap when trained on different seeds.

Jupyter Notebook

Updated June 11, 2026

variance-across-time

★3

Studying the variance in neural net predictions across training time

Python

Updated June 11, 2026

EvilModel

★3

A replication of "EvilModel 2.0: Bringing Neural Network Models into Malware Attacks"

Unknown language

Updated June 11, 2026

eai-prompt-gallery

★3

Library of interesting prompt generations

JavaScript

Updated June 11, 2026

gamescope

★2

Can interpretability methods confer an advantage in competitive games?

Python

Updated June 11, 2026

fmri

★2

Analogue of fMRI on artificial neural networks

Unknown language

Updated June 11, 2026

rtopk

★2

PyTorch wrapper for RTopK (https://github.com/xiexi51/RTopK)

Cuda

Updated June 11, 2026

pd-books

★2

No description has been provided for this repository.

Jupyter Notebook

Updated June 11, 2026

tuned-lens

★2

Tools for understanding how transformer predictions are built layer-by-layer

Python

Updated June 11, 2026

tinydpo

★2

No description has been provided for this repository.

Unknown language

Updated June 11, 2026

eleutherai-instruct-dataset

★2

A large instruct dataset for open-source models (WIP).

Unknown language

Updated June 11, 2026

examples

★2

MosaicML example benchmarks + LLM scripts

Python

Updated June 11, 2026

minetest_game

★2

Minetest Game - The default game for the Minetest engine [https://github.com/minetest/minetest/]

Unknown language

Updated June 11, 2026

groupoid-rl

★2

No description has been provided for this repository.

Jupyter Notebook

Updated June 11, 2026

truffaldino

★1

Investigating goal instability in RL

Python

Updated June 11, 2026

rllm

★1

Democratizing Reinforcement Learning for LLMs

Jupyter Notebook

Updated June 11, 2026

bayesian-adam

★1

Exactly what it says on the tin

Python

Updated June 11, 2026

RWKV-LM

★1

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

Python

Updated June 11, 2026

conceptual-constraints

★1

Applying LEACE to models during training

Jupyter Notebook

Updated June 11, 2026

aria.cpp

★1

GGML implementation of https://github.com/EleutherAI/aria

CMake

Updated June 11, 2026

classifier-latent-diffusion

★1

No description has been provided for this repository.

Python

Updated June 11, 2026

language-adaptation

★1

No description has been provided for this repository.

Unknown language

Updated June 11, 2026

maxtext

★1

A simple, performant and scalable Jax LLM!

Unknown language

Updated June 11, 2026

irrlicht

★1

Minetest's fork of Irrlicht

C++

Updated June 11, 2026

lm-evaulation-ui

★1

App for generating an HTML table from LM evaluation JSONs

JavaScript

Updated June 11, 2026

gradient-routing

★0

No description has been provided for this repository.

Python

Updated June 11, 2026

rh-indicators

★0

No description has been provided for this repository.

Python

Updated June 11, 2026

hackable-bergson

★0

Simplified library for mapping out the "memory" of neural nets with data attribution

Unknown language

Updated June 11, 2026

vllm

★0

A high-throughput and memory-efficient inference and serving engine for LLMs

Unknown language

Updated June 11, 2026

verifiers

★0

Verifiers for LLM Reinforcement Learning

Python

Updated June 11, 2026

wmdp

★0

WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining general capabilities.

Jupyter Notebook

Updated June 11, 2026

Megatron-LM

★0

Ongoing research training transformer models at scale

Unknown language

Updated June 11, 2026

mixture-of-depths

★0

No description has been provided for this repository.

Unknown language

Updated June 11, 2026

llm-score-behavior

★0

No description has been provided for this repository.

Python

Updated June 11, 2026

TransformerEngine

★0

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Python

Updated June 11, 2026