Natural Language Processing Lab at Tsinghua University
269
Public repositories
83,916
Total stars
3,402
Followers
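THUNLP is the natural language processing lab at Tsinghua University in Beijing, and it maintains a wide range of public repositories on GitHub. Its main programming languages are Python, C++, TeX, Java, JavaScript, and HTML, and its notable projects include GNNPapers and WantWords.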
Must-read papers on graph neural networks (GNN)
An open-source online reverse dictionary.
An Open-Source Framework for Prompt-Learning.
An Open-Source Package for Neural Relation Extraction (NRE)
Must-read papers on prompt-based tuning for pre-trained language models.
An Open-Source Package for Knowledge Embedding (KE)
Must-read Papers on pre-trained language models.
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Must-read papers on network representation learning (NRL) / network embedding (NE)
An Efficient Lexical Analyzer for Chinese
An Open-Source Package for Network Embedding (NE)
Must-read Papers on Textual Adversarial Attack and Defense
Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE)
Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
THUOCL (THU Open Chinese Lexicon): a Chinese lexicon
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
Must-read papers on neural relation extraction (NRE)
Open Chinese Language Pre-trained Model Zoo
No description has been provided for this repository.
Official codes for ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"
Must-read papers on Machine Reading Comprehension
No description has been provided for this repository.
An Efficient Lexical Analyzer for Chinese
Chinese rumor data
An Open-Source Package for Textual Adversarial Attack.
A Large-Scale Few-Shot Relation Extraction Dataset
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Dataset and codes for ACL 2019 DocRED: A Large-Scale Document-Level Relation Extraction Dataset.
Core Data of HowNet and OpenHowNet Python API
An LLM-based agent that predicts its tasks proactively.
An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow
Chinese AI & Law Challenge
Must-read Papers on Legal Intelligence
No description has been provided for this repository.
An Open-Source Package for Information Retrieval.
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
An Efficient implementation of TransE and its extended models for Knowledge Representation Learning
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Code and data of ACL 2021 paper "Few-NERD: A Few-shot Named Entity Recognition Dataset"
No description has been provided for this repository.
The repo for Tsinghua summer course: Interdisciplinary Seminar on Big Models
Open Platform for Embodied Agents
An Efficient Lexical Analyzer for Chinese
Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"
Neural Sentiment Classification
Must-read Papers of Parameter-Efficient Tuning (Delta Tuning) Methods on Pre-trained Models.
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Source code for "Packed Levitated Marker for Entity and Relation Extraction"
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
Improved Word Representation Learning with Sememes
Source code and checkpoints for legal pre-trained language models.
Code for Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention (AAAI18)
No description has been provided for this repository.
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
The source code of our COLING'18 paper "Few-Shot Charge Prediction with Discriminative Legal Attributes".
Source code and dataset for ACL2022 Findings Paper "LEVEN: A Large-Scale Chinese Legal Event Detection dataset"
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
No description has been provided for this repository.
Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"
Source code for ACL 2019 paper "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification"
No description has been provided for this repository.
On Transferability of Prompt Tuning for Natural Language Processing
Source code for "A Deep-learning System Bridging Molecule Structure and Biomedical Text with Comprehension Comparable to Human Professionals"
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
[ACL'25 Main] ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
Must-read Papers on Neural Information Retrieval
Max-margin DeepWalk
Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
Source code for EMNLP 2020 paper "Coreferential Reasoning Learning for Language Representation"
The official implementation of the paper: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
No description has been provided for this repository.
Delta-CoMe can achieve near-lossless 1-bit compression; the work has been accepted by NeurIPS 2024
Evaluate Multimodal LLMs as Embodied Agents
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
No description has been provided for this repository.
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
No description has been provided for this repository.
Source codes of Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Papers on integrating large language models with embodied AI
Sequence-level 1F1B schedule for LLMs.
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
Code for EMNLP2020 paper "Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment".
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
Neuron Activation
ACL 2024: LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks
No description has been provided for this repository.
Official implementation for the paper "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs"
Learning to Generate STRUCTURED Output with Schema Reinforcement Learning
The official implementation of NOSA
No description has been provided for this repository.
[EMNLP 2025 Findings] ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation
A large-scale dataset of Chu bamboo slip scripts and a multi-granularity tokenizer for ancient Chinese scripts
Single-Shot Meta-Pruning (SMP) for attention heads of Transformers
Source code for paper "DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices".
No description has been provided for this repository.
No description has been provided for this repository.
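thunlp develops a variety of projects related to natural language processing. Its main projects include GNNPapers and WantWords, both of which focus on research and practical use.
thunlp develops its repositories mainly in Python, C++, TeX, Java, JavaScript, and HTML. These languages suit a variety of research projects.
Yes, all of thunlp's repositories are public. This makes it easy for researchers and developers to access them and contribute.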