Organization

Public GitHub footprint of Tongyi Lab, Alibaba Group

@Alibaba-NLP

Our team at Tongyi Lab is dedicated to pioneering advancements in AI search technologies.

China

Public repositories: 43 (listed below)

Total stars: 25,446

Followers: 1,656

Alibaba-NLP, part of Tongyi Lab at Alibaba Group, publishes open-source work on GitHub with a focus on AI search technologies. Most of its repositories are written in Python. The most-starred projects are DeepResearch, an open-source deep research agent, and ZeroSearch, which incentivizes the search capability of LLMs without searching.

Top languages

Python 34

Public repositories

DeepResearch

★19,374

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python

Updated Jun 13, 2026

ZeroSearch

★1,291

ZeroSearch: Incentivize the Search Capability of LLMs without Searching

Python

Updated Jun 13, 2026

VRAG

★947

Multimodal Retrieval-augmented Generation Framework Built by Tongyi Lab, Alibaba Group.

Python

Updated Jun 12, 2026

ViDoRAG

★664

[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Python

Updated Jun 11, 2026

OmniSearch

★430

Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Python

Updated Jun 11, 2026

ACE

★313

[ACL-IJCNLP 2021] Automated Concatenation of Embeddings for Structured Prediction

Python

Updated Jun 1, 2026

CHRONOS

★300

Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization"

Python

Updated Jun 12, 2026

EcomGPT

★275

An Instruction-tuned Large Language Model for E-commerce

Python

Updated Jun 12, 2026

qqr

★254

qqr is an RL training framework for open-ended agents.

Python

Updated Jun 10, 2026

HiAGM

★230

Hierarchy-Aware Global Model for Hierarchical Text Classification

Python

Updated Jun 1, 2026

SeqGPT

★227

SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding

Python

Updated Jun 1, 2026

Multi-CPR

★206

[SIGIR 2022] Multi-CPR: A Multi Domain Chinese Dataset for Passage Retrieval

Python

Updated Jun 1, 2026

KB-NER

★186

Winning system (DAMO-NLP) in 10 of the 13 tracks of the SemEval 2022 MultiCoNER shared task.

Python

Updated May 22, 2026

MaskSearch

★155

Repo for "MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability"

Python

Updated Jun 6, 2026

CLNER

★93

[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

Python

Updated May 19, 2026

MultilangStructureKD

★74

[ACL 2020] Structure-Level Knowledge Distillation For Multilingual Sequence Labeling

Python

Updated Jun 1, 2026

E2Rank

★57

E2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker

Python

Updated Jun 10, 2026

LaRA

★51

Code for the LaRA benchmark

Python

Updated Jun 8, 2026

CoFE-RAG

★45

No description provided for this repository.

Python

Updated Jun 7, 2026

RankingGPT

★35

Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement"

Python

Updated Apr 9, 2026

ProtoRE

★32

Code for 'Prototypical Representation Learning for Relation Extraction'.

Python

Updated Jun 1, 2026

MuVER

★32

[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

Python

Updated Apr 9, 2026

AISHELL-NER

★25

[ICASSP 2022] AISHELL-NER: Named Entity Recognition from Chinese Speech

Unknown Language

Updated Jan 4, 2026

DAAT-CWS

★23

Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation

Python

Updated Jun 1, 2026

MANNER

★20

[ACL 2023] MANNER: A Variational Memory-Augmented Model for Cross Domain Few-Shot Named Entity Recognition

Python

Updated Jun 1, 2026

HLATR

★20

Hybrid List Aware Transformer Reranking

Unknown Language

Updated Apr 9, 2026

AIN

★20

Code for our EMNLP 2020 Paper "AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network"

Python

Updated Apr 9, 2026

CDQA

★18

CDQA: Chinese Dynamic Question Answering Benchmark

Python

Updated Apr 9, 2026

EBM-Net

★14

Code for the EMNLP 2020 paper "Predicting Clinical Trial Results by Implicit Evidence Integration".

Python

Updated Nov 27, 2024

StructuralKD

★11

[ACL-IJCNLP 2021] Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor

Python

Updated Jun 1, 2026

WebDetective

★7

A new evaluation paradigm for deep search that identifies specific LLM failure sources, introduces challenging hint-free datasets with holistic evaluation, and offers a strong baseline incorporating memory and verification.

Python

Updated Jun 1, 2026

Vec-RA-ODQA

★6

Source code of the paper "Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts"

Python

Updated Jun 1, 2026

IBKD

★3

This is the official repository for the IBKD knowledge distillation method, as described in the accompanying paper.

Python

Updated Jun 1, 2026

MarCo-Dialog

★3

No description provided for this repository.

Python

Updated Mar 17, 2022

VLLM-KB

★2

[EMNLP 2025] Code for "Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference"

Python

Updated Apr 9, 2026

Key-Point-Analysis

★1

No description provided for this repository.

Python

Updated Aug 29, 2024

Gumbel-CRF

★1

Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs

Unknown Language

Updated Mar 24, 2024

Partially-Observed-TreeCRFs

★1

Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs

Unknown Language

Updated Feb 28, 2023

hilichurl

★0

No description provided for this repository.

Unknown Language

Updated Jan 13, 2026

Triaffine-nested-ner

★0

[ACL 2022 Findings] Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition

Unknown Language

Updated May 1, 2022

ICD-MSMN

★0

[ACL 2022] Code Synonyms Do Matter: Multiple Synonyms Matching Network for Automatic ICD Coding

Unknown Language

Updated Apr 29, 2022

Alibaba-TREC-PM

★0

Code and data for Alibaba's winning systems at the TREC Precision Medicine Track 2020.

Unknown Language

Updated Aug 28, 2021

PoincareProbe

★0

Implementation of ICLR 21 paper: Probing BERT in Hyperbolic Spaces

Unknown Language

Updated Apr 7, 2021

Frequently asked questions

What does Alibaba-NLP build on GitHub?

Alibaba-NLP builds tools, frameworks, and research code focused on AI search technologies. Its most-starred repositories are DeepResearch, an open-source deep research agent, and ZeroSearch, which aims to strengthen the search capability of large language models.

Which programming languages does Alibaba-NLP use?

Alibaba-NLP primarily uses Python. It is the listed language for 34 of the organization's public repositories; the remaining repositories are shown with an unknown language.

Are Alibaba-NLP's repositories public?

Yes, the Alibaba-NLP repositories listed here are public on GitHub, so anyone can view and fork the code.
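The figures on this page (repository count, total stars, per-language counts) could in principle be recomputed from GitHub's public REST API. The following is a minimal sketch, assuming the `GET /orgs/{org}/repos` endpoint and its `stargazers_count` and `language` response fields; the function names `fetch_public_repos` and `summarize` are hypothetical, and unauthenticated requests are subject to rate limits.

```python
import json
import urllib.request
from collections import Counter

# GitHub REST API endpoint for an organization's repositories.
API = "https://api.github.com/orgs/{org}/repos"


def fetch_public_repos(org, per_page=100):
    """Fetch all public repositories of an organization, one page at a time."""
    repos, page = [], 1
    while True:
        url = f"{API.format(org=org)}?type=public&per_page={per_page}&page={page}"
        req = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"}
        )
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means there is nothing left to fetch
            return repos
        repos.extend(batch)
        page += 1


def summarize(repos):
    """Return (repository count, total stars, per-language counts)."""
    total_stars = sum(r.get("stargazers_count", 0) for r in repos)
    # Repositories with no detected language are grouped under one label.
    languages = Counter(r.get("language") or "Unknown Language" for r in repos)
    return len(repos), total_stars, languages
```

Calling `summarize(fetch_public_repos("Alibaba-NLP"))` would return the three totals in one tuple. Live values may differ from the snapshot shown on this page.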
