Natural Language Processing Lab at Tsinghua University
269
Public repositories
83,916
Total stars
3,402
Followers
THUNLP is the Natural Language Processing Lab at Tsinghua University and maintains a substantial public presence on GitHub. Its repositories use programming languages such as Python, C++, and JavaScript, and include projects that are widely used in NLP research and development, such as GNNPapers, OpenPrompt, and WantWords.
Must-read papers on graph neural networks (GNN)
An open-source online reverse dictionary.
An Open-Source Framework for Prompt-Learning.
An Open-Source Package for Neural Relation Extraction (NRE)
Must-read papers on prompt-based tuning for pre-trained language models.
An Open-Source Package for Knowledge Embedding (KE)
Must-read Papers on pre-trained language models.
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Must-read papers on network representation learning (NRL) / network embedding (NE)
An Efficient Lexical Analyzer for Chinese
An Open-Source Package for Network Embedding (NE)
Must-read Papers on Textual Adversarial Attack and Defense
Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE)
Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
THUOCL (THU Open Chinese Lexicon): a Chinese lexicon
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
Must-read papers on neural relation extraction (NRE)
Open Chinese Language Pre-trained Model Zoo
No description provided for this repository.
Official codes for ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"
Must-read papers on Machine Reading Comprehension
No description provided for this repository.
An Efficient Lexical Analyzer for Chinese
Chinese rumor data
An Open-Source Package for Textual Adversarial Attack.
A Large-Scale Few-Shot Relation Extraction Dataset
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Dataset and codes for ACL 2019 DocRED: A Large-Scale Document-Level Relation Extraction Dataset.
Core Data of HowNet and OpenHowNet Python API
An LLM-based agent that proactively predicts its tasks.
An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow
Chinese AI & Law Challenge
Must-read Papers on Legal Intelligence
No description provided for this repository.
An Open-Source Package for Information Retrieval.
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
An Efficient implementation of TransE and its extended models for Knowledge Representation Learning
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Code and data of ACL 2021 paper "Few-NERD: A Few-shot Named Entity Recognition Dataset"
No description provided for this repository.
The repo for Tsinghua summer course: Interdisciplinary Seminar on Big Models
Open Platform for Embodied Agents
An Efficient Lexical Analyzer for Chinese
Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"
Neural Sentiment Classification
Must-read Papers of Parameter-Efficient Tuning (Delta Tuning) Methods on Pre-trained Models.
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Source code for "Packed Levitated Marker for Entity and Relation Extraction"
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
Improved Word Representation Learning with Sememes
Source code and checkpoints for legal pre-trained language models.
Code for Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention (AAAI18)
No description provided for this repository.
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
The source code of our COLING'18 paper "Few-Shot Charge Prediction with Discriminative Legal Attributes".
Source code and dataset for ACL2022 Findings Paper "LEVEN: A Large-Scale Chinese Legal Event Detection dataset"
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
No description provided for this repository.
Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"
Source code for ACL 2019 paper "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification"
No description provided for this repository.
On Transferability of Prompt Tuning for Natural Language Processing
Source code for "A Deep-learning System Bridging Molecule Structure and Biomedical Text with Comprehension Comparable to Human Professionals"
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
[ACL'25 Main] ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
Must-read Papers on Neural Information Retrieval
Max-margin DeepWalk
Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
Source code for EMNLP 2020 paper "Coreferential Reasoning Learning for Language Representation"
The official implementation of the paper: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
No description provided for this repository.
Delta-CoMe can achieve near-lossless 1-bit compression and has been accepted by NeurIPS 2024
Evaluate Multimodal LLMs as Embodied Agents
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
No description provided for this repository.
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
No description provided for this repository.
Source codes of Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Papers on integrating large language models with embodied AI
Sequence-level 1F1B schedule for LLMs.
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
Code for EMNLP2020 paper "Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment".
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
Neuron Activation
ACL 2024: LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks
No description provided for this repository.
Official implementation for the paper "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs"
Learning to Generate STRUCTURED Output with Schema Reinforcement Learning
The official implementation of NOSA
No description provided for this repository.
[EMNLP 2025 Findings] ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation
A large-scale dataset of Chu bamboo slip scripts and a multi-granularity tokenizer for ancient Chinese scripts
Single-Shot Meta-Pruning (SMP) for attention heads of Transformers
Source code for paper "DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices".
No description provided for this repository.
No description provided for this repository.
THUNLP builds projects focused on natural language processing, including repositories such as OpenPrompt, a framework for prompt-learning, and GNNPapers, a collection of must-read papers on graph neural networks.
THUNLP uses several programming languages across its repositories, mainly Python, C++, TeX, Java, and JavaScript, a mix that reflects the variety of its development and research work.
Yes, all repositories maintained by THUNLP on GitHub are public. This gives researchers and developers open access to use and contribute to the existing projects.