Natural Language Processing Lab at Tsinghua University
Public repositories: 269
Total stars: 83,916
Followers: 3,402
THUNLP, the Natural Language Processing Lab at Tsinghua University, maintains a large presence on GitHub. Its many public repositories are written mainly in Python, C++, TeX, and Java. Notable projects include GNNPapers and OpenPrompt, both widely used in the natural language processing community.
Must-read papers on graph neural networks (GNN)
An open-source online reverse dictionary.
An Open-Source Framework for Prompt-Learning.
An Open-Source Package for Neural Relation Extraction (NRE)
Must-read papers on prompt-based tuning for pre-trained language models.
An Open-Source Package for Knowledge Embedding (KE)
Must-read Papers on pre-trained language models.
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Must-read papers on network representation learning (NRL) / network embedding (NE)
An Efficient Lexical Analyzer for Chinese
An Open-Source Package for Network Embedding (NE)
Must-read Papers on Textual Adversarial Attack and Defense
Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE)
Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
THUOCL (THU Open Chinese Lexicon): a Chinese lexicon
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
Must-read papers on neural relation extraction (NRE)
Open Chinese Language Pre-trained Model Zoo
No description provided for this repository.
Official code for the ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"
Must-read papers on Machine Reading Comprehension
No description provided for this repository.
An Efficient Lexical Analyzer for Chinese
Chinese rumor dataset
An Open-Source Package for Textual Adversarial Attack.
A Large-Scale Few-Shot Relation Extraction Dataset
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Dataset and code for the ACL 2019 paper "DocRED: A Large-Scale Document-Level Relation Extraction Dataset".
Core Data of HowNet and OpenHowNet Python API
An LLM-based agent that proactively predicts its tasks.
An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow
Chinese AI & Law Challenge
Must-read Papers on Legal Intelligence
No description provided for this repository.
An Open-Source Package for Information Retrieval.
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
An Efficient implementation of TransE and its extended models for Knowledge Representation Learning
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Code and data of ACL 2021 paper "Few-NERD: A Few-shot Named Entity Recognition Dataset"
No description provided for this repository.
The repo for the Tsinghua summer course "Interdisciplinary Seminar on Big Models"
Open Platform for Embodied Agents
An Efficient Lexical Analyzer for Chinese
Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"
Neural Sentiment Classification
Must-read Papers of Parameter-Efficient Tuning (Delta Tuning) Methods on Pre-trained Models.
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Source code for "Packed Levitated Marker for Entity and Relation Extraction"
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
Improved Word Representation Learning with Sememes
Source code and checkpoints for legal pre-trained language models.
Code for Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention (AAAI18)
No description provided for this repository.
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
The source code of our COLING'18 paper "Few-Shot Charge Prediction with Discriminative Legal Attributes".
Source code and dataset for ACL2022 Findings Paper "LEVEN: A Large-Scale Chinese Legal Event Detection dataset"
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
No description provided for this repository.
Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"
Source code for ACL 2019 paper "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification"
No description provided for this repository.
On Transferability of Prompt Tuning for Natural Language Processing
Source code for "A Deep-learning System Bridging Molecule Structure and Biomedical Text with Comprehension Comparable to Human Professionals"
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
[ACL'25 Main] ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
Must-read Papers on Neural Information Retrieval
Max-margin DeepWalk
Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
Source code for EMNLP 2020 paper "Coreferential Reasoning Learning for Language Representation"
The official implementation of the paper: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
No description provided for this repository.
Delta-CoMe achieves near-lossless 1-bit compression (accepted at NeurIPS 2024)
Evaluate Multimodal LLMs as Embodied Agents
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
No description provided for this repository.
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
No description provided for this repository.
Source code for "Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction"
Papers on integrating large language models with embodied AI
Sequence-level 1F1B schedule for LLMs.
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
Code for EMNLP2020 paper "Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment".
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
Neuron Activation
ACL 2024: "LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks"
No description provided for this repository.
Official implementation for the paper "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs"
Learning to Generate Structured Output with Schema Reinforcement Learning
The official implementation of NOSA
No description provided for this repository.
[EMNLP 2025 Findings] ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation
A large-scale dataset of Chu bamboo slip scripts and a multi-granularity tokenizer for ancient Chinese scripts
Single-Shot Meta-Pruning (SMP) for attention heads of Transformers
Source code for paper "DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices".
No description provided for this repository.
No description provided for this repository.
THUNLP develops many natural language processing projects, including tools such as OpenPrompt for prompt learning and OpenNRE for relation extraction. These repositories are widely used in research and development in the field.
For its GitHub projects, THUNLP mainly uses Python, C++, TeX, Java, JavaScript, and HTML, which together cover a broad range of natural language processing applications.
All of THUNLP's repositories listed here are public. This gives the research community access to the lab's work and encourages collaboration and knowledge sharing in natural language processing.