Updated 9 h ago

Organization

Public GitHub footprint of the Scrapy project

An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

Public repositories: 29

Total stars: 74,276

Followers: 808

The Scrapy project's public GitHub repositories are written mostly in Python (22 repositories), with a few listed as HTML, C++, DIGITAL Command Language, and Shell. Notable repositories include Scrapy, a high-level web crawling and scraping framework, and Scrapyd, a service daemon for running Scrapy spiders.

Top languages

Python 22, HTML 2, C++ 2, DIGITAL Command Language 1, Shell 1

Public repositories

scrapy

★62,224

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python

Updated Jun 13, 2026

scrapyd

★3,094

A service daemon to run Scrapy spiders

Python

Updated Jun 13, 2026

scrapely

★1,888

A pure-python HTML screen-scraping library

HTML

Updated Jun 9, 2026

dirbot

★1,628

Scrapy project to scrape public web directories (educational) [DEPRECATED]

Python

Updated Jun 12, 2026

quotesbot

★1,357

This is a sample Scrapy project for educational purposes

Python

Updated Jun 8, 2026

parsel

★1,333

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Python

Updated Jun 11, 2026

scrapyd-client

★773

Command line client for Scrapyd server

Python

Updated Jun 3, 2026

w3lib

★419

Python library of web-related functions

Python

Updated Jun 10, 2026

cssselect

★309

CSS Selectors for Python

Python

Updated Jun 1, 2026

queuelib

★299

Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python

Python

Updated Jun 1, 2026

loginform

★279

Fill HTML login forms automatically

Python

Updated Mar 29, 2026

slybot

★224

No description provided for this repository.

Unknown Language

Updated Jun 12, 2026

protego

★88

A pure-Python robots.txt parser with support for modern conventions.

DIGITAL Command Language

Updated Jun 11, 2026

itemadapter

★70

Common interface for data container classes

Python

Updated Jun 1, 2026

scrapy.org

★66

The scrapy.org website (old code)

HTML

Updated Jun 3, 2026

itemloaders

★49

Library to populate items using XPath and CSS with a convenient API

Python

Updated Jun 2, 2026

booksbot

★42

A crawler for http://books.toscrape.com

Python

Updated Dec 8, 2025

scrapy-bench

★32

A CLI for benchmarking Scrapy.

Python

Updated Sep 15, 2025

scrapy-lint

★22

A linter for Scrapy projects.

Python

Updated Apr 15, 2026

scurl

★21

Performance-focused replacement for Python urllib

Python

Updated May 26, 2026

pypydispatcher

★16

A fork of http://pydispatcher.sourceforge.net/ with PyPy support

Python

Updated Jun 12, 2024

xtractmime

★13

https://mimesniff.spec.whatwg.org/ implementation for Python

Python

Updated Jun 10, 2026

base-chromium

★8

base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/

C++

Updated Mar 10, 2026

scrapy-itemloader

★7

[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API

Python

Updated Mar 10, 2026

form2request

★5

Python library to build HTTP requests out of HTML forms

Python

Updated Jun 12, 2026

url-chromium

★4

url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url

C++

Updated Mar 10, 2026

gsoc2014-integration-tests

★3

GSoC2014 - Scrapy Integration tests project

Shell

Updated Jul 6, 2017

scrapy-bench-speedcenter

★2

Codespeed for scrapy-bench

Python

Updated May 26, 2026

sphinx-scrapy

★1

Sphinx extension for documentation in the Scrapy ecosystem

Python

Updated Jun 11, 2026

Frequently asked questions

What does Scrapy build on GitHub?

The Scrapy organization maintains web scraping tools and libraries on GitHub, including Scrapy, a web crawling and scraping framework; Scrapyd, a service daemon for running Scrapy spiders; and Parsel, a library for extracting data from XML/HTML documents using XPath or CSS selectors.

Which programming languages does Scrapy use?

Most of Scrapy's repositories on GitHub are written in Python (22). Other repositories are listed as HTML (2), C++ (2), DIGITAL Command Language (1), and Shell (1).

Are Scrapy's repositories public?

Yes, every repository listed on this page is public, so anyone can review the code, contribute to it, and use it for educational or practical purposes.
