Natural Language Processing Lab at Tsinghua University
269 public repositories
83,916 total stars
3,402 followers
THUNLP is the Natural Language Processing Lab at Tsinghua University. Its public GitHub profile hosts a broad collection of repositories written mainly in Python, C++, TeX, Java, JavaScript, and HTML. THUNLP's top projects include GNNPapers, WantWords, and OpenPrompt, which reflect the lab's research and development work.
Must-read papers on graph neural networks (GNN)
An open-source online reverse dictionary.
An Open-Source Framework for Prompt-Learning.
An Open-Source Package for Neural Relation Extraction (NRE)
Must-read papers on prompt-based tuning for pre-trained language models.
An Open-Source Package for Knowledge Embedding (KE)
Must-read Papers on pre-trained language models.
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
Must-read papers on network representation learning (NRL) / network embedding (NE)
An Efficient Lexical Analyzer for Chinese
An Open-Source Package for Network Embedding (NE)
Must-read Papers on Textual Adversarial Attack and Defense
Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE)
Knowledge Graph Embeddings including TransE, TransH, TransR and PTransE
Source code and dataset for ACL 2019 paper "ERNIE: Enhanced Language Representation with Informative Entities"
THUOCL (THU Open Chinese Lexicon): a Chinese lexicon
A plug-and-play library for parameter-efficient-tuning (Delta Tuning)
Must-read papers on neural relation extraction (NRE)
Open Chinese Language Pre-trained Model Zoo
No description was provided for this repository.
Official codes for ACL 2023 paper "WebCPM: Interactive Web Search for Chinese Long-form Question Answering"
Must-read papers on Machine Reading Comprehension
No description was provided for this repository.
An Efficient Lexical Analyzer for Chinese
Chinese rumor data
An Open-Source Package for Textual Adversarial Attack.
A Large-Scale Few-Shot Relation Extraction Dataset
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Dataset and codes for ACL 2019 DocRED: A Large-Scale Document-Level Relation Extraction Dataset.
Core Data of HowNet and OpenHowNet Python API
An LLM-based agent that predicts its tasks proactively.
An implementation of TransE and its extended models for Knowledge Representation Learning on TensorFlow
Chinese AI & Law Challenge
Must-read Papers on Legal Intelligence
No description was provided for this repository.
An Open-Source Package for Information Retrieval.
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
An Efficient implementation of TransE and its extended models for Knowledge Representation Learning
The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"
Code and data of ACL 2021 paper "Few-NERD: A Few-shot Named Entity Recognition Dataset"
No description was provided for this repository.
The repo for Tsinghua summer course: Interdisciplinary Seminar on Big Models
Open Platform for Embodied Agents
An Efficient Lexical Analyzer for Chinese
Codes for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"
Neural Sentiment Classification
Must-read Papers of Parameter-Efficient Tuning (Delta Tuning) Methods on Pre-trained Models.
[ICLR 2026 Blogpost Track Poster] JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Source code for "Packed Levitated Marker for Entity and Relation Extraction"
An open-source toolkit for textual backdoor attack and defense (NeurIPS 2022 D&B, Spotlight)
Improved Word Representation Learning with Sememes
Source code and checkpoints for legal pre-trained language models.
Code for Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention (AAAI18)
No description was provided for this repository.
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
The source code of our COLING'18 paper "Few-Shot Charge Prediction with Discriminative Legal Attributes".
Source code and dataset for ACL2022 Findings Paper "LEVEN: A Large-Scale Chinese Legal Event Detection dataset"
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
No description was provided for this repository.
Code and data of the AAAI-20 paper "Multi-channel Reverse Dictionary Model"
Source code for ACL 2019 paper "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification"
No description was provided for this repository.
On Transferability of Prompt Tuning for Natural Language Processing
Source code for "A Deep-learning System Bridging Molecule Structure and Biomedical Text with Comprehension Comparable to Human Professionals"
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
[ACL'25 Main] ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation
Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP".
Must-read Papers on Neural Information Retrieval
Max-margin DeepWalk
Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
Source code for EMNLP 2020 paper "Coreferential Reasoning Learning for Language Representation"
The official implementation of the paper: H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs
No description was provided for this repository.
Delta-CoMe can achieve near-lossless 1-bit compression; accepted at NeurIPS 2024.
Evaluate Multimodal LLMs as Embodied Agents
[ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling
No description was provided for this repository.
Code and data of the ACL-IJCNLP 2021 paper "Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger"
No description was provided for this repository.
Source codes of Neural Quality Estimation with Multiple Hypotheses for Grammatical Error Correction
Papers on integrating large language models with embodied AI
Sequence-level 1F1B schedule for LLMs.
Code and models for the paper: Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
Code for EMNLP2020 paper "Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment".
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
Neuron Activation
ACL 2024: LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks
No description was provided for this repository.
Official implementation for the paper "KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs"
Learning to Generate STRUCTURED Output with Schema Reinforcement Learning
The official implementation of NOSA
No description was provided for this repository.
[EMNLP 2025 Findings] ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation
A large-scale dataset of Chu bamboo slip scripts and a multi-granularity tokenizer for ancient Chinese scripts
Single-Shot Meta-Pruning (SMP) for attention heads of Transformers
Source code for paper "DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices".
No description was provided for this repository.
No description was provided for this repository.
On GitHub, thunlp develops many natural language processing projects, including prominent ones such as GNNPapers and OpenPrompt. These projects serve both research and open-source development.
thunlp's projects are written mainly in Python, C++, TeX, Java, JavaScript, and HTML. These languages underpin the lab's research work and tool development.
Yes, all of thunlp's repositories listed here are public. They are open source and available for anyone to use and contribute to, which encourages knowledge sharing and innovation.