Natural Language Processing Lab at Tsinghua University
269
Public repositories
83,923
Total stars
3402
Followers
THUNLP is a natural language processing lab at Tsinghua University with a significant presence on GitHub. Its public repositories span projects in Python, C++, TeX, and Java, and include tools such as GNNPapers and OpenPrompt, which are widely used by the research community.
Must-read papers on graph neural networks (GNN)
An open-source online reverse dictionary.
An Open-Source Framework for Prompt-Learning.
An Open-Source Package for Neural Relation Extraction (NRE)
Must-read papers on prompt-based tuning for pre-trained language models.
An Open-Source Package for Knowledge Embedding (KE)
Must-read Papers on pre-trained language models.
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Must-read papers on network representation learning (NRL) / network embedding (NE)
An Efficient Lexical Analyzer for Chinese
An Open-Source Package for Network Embedding (NE)
Must-read Papers on Textual Adversarial Attack and Defense
Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE)
Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
THUOCL (THU Open Chinese Lexicon), a Chinese lexicon
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
Must-read papers on neural relation extraction (NRE)
Open Chinese Language Pre-trained Model Zoo
No description provided for this repository.
Official codes for ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"
Must-read papers on Machine Reading Comprehension
No description provided for this repository.
An Efficient Lexical Analyzer for Chinese
Chinese rumor data
An Open-Source Package for Textual Adversarial Attack.
A Large-Scale Few-Shot Relation Extraction Dataset
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Dataset and codes for ACL 2019 DocRED: A Large-Scale Document-Level Relation Extraction Dataset.
Core Data of HowNet and OpenHowNet Python API
An LLM-based agent that predicts its tasks proactively.
An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow
Chinese AI & Law Challenge
Must-read Papers on Legal Intelligence
No description provided for this repository.
An Open-Source Package for Information Retrieval.
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
An Efficient implementation of TransE and its extended models for Knowledge Representation Learning
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Code and data of ACL 2021 paper "Few-NERD: A Few-shot Named Entity Recognition Dataset"
No description provided for this repository.
The repo for Tsinghua summer course: Interdisciplinary Seminar on Big Models
Open Platform for Embodied Agents
An Efficient Lexical Analyzer for Chinese
Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"
Neural Sentiment Classification
Must-read Papers of Parameter-Efficient Tuning (Delta Tuning) Methods on Pre-trained Models.
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Source code for "Packed Levitated Marker for Entity and Relation Extraction"
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
Improved Word Representation Learning with Sememes
Source code and checkpoints for legal pre-trained language models.
Code for Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention (AAAI18)
No description provided for this repository.
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
The source code of our COLING'18 paper "Few-Shot Charge Prediction with Discriminative Legal Attributes".
Source code and dataset for ACL2022 Findings Paper "LEVEN: A Large-Scale Chinese Legal Event Detection dataset"
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
No description provided for this repository.
Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"
Source code for ACL 2019 paper "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification"
No description provided for this repository.
On Transferability of Prompt Tuning for Natural Language Processing
Source code for "A Deep-learning System Bridging Molecule Structure and Biomedical Text with Comprehension Comparable to Human Professionals"
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
[ACL'25 Main] ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
Must-read Papers on Neural Information Retrieval
Max-margin DeepWalk
Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
Source code for EMNLP 2020 paper "Coreferential Reasoning Learning for Language Representation"
The official implementation of the paper: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
No description provided for this repository.
Delta-CoMe can achieve near-lossless 1-bit compression; the paper has been accepted by NeurIPS 2024
Evaluate Multimodal LLMs as Embodied Agents
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
No description provided for this repository.
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
No description provided for this repository.
Source codes of Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Papers on integrating large language models with embodied AI
Sequence-level 1F1B schedule for LLMs.
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
Code for EMNLP2020 paper "Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment".
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
Neuron Activation
ACL 2024: LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks
No description provided for this repository.
Official implementation for the paper "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs"
Learning to Generate STRUCTURED Output with Schema Reinforcement Learning
The official implementation of NOSA
No description provided for this repository.
[EMNLP 2025 Findings] ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation
A large-scale dataset of Chu bamboo slip scripts and a multi-granularity tokenizer for ancient Chinese scripts
Single-Shot Meta-Pruning (SMP) for attention heads of Transformers
Source code for paper "DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices".
No description provided for this repository.
No description provided for this repository.
THUNLP develops a range of tools for natural language processing. Its projects include GNNPapers, which collects papers on graph neural networks, and OpenPrompt, an open-source framework for prompt learning.
THUNLP mainly uses Python, C++, TeX, Java, JavaScript, and HTML in its projects. This mix lets the lab tackle complex problems in natural language processing and build effective solutions.
Yes, all THUNLP repositories are public on GitHub. This lets researchers and developers access the lab's tools and collaborate on projects related to natural language processing.