RepoGuard
Updated 9 h ago
Scrapy project

Organization

Public GitHub footprint of Scrapy project

@scrapy
An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

29

Public repositories

74,276

Total stars

808

Followers

The Scrapy project maintains 29 public repositories on GitHub. The languages GitHub reports for them are mostly Python (22 repositories), followed by HTML, C++, DIGITAL Command Language, and Shell. Notable repositories include Scrapy, a high-level web crawling and scraping framework, and Scrapyd, a service daemon for running Scrapy spiders.

Top languages

Python 22, HTML 2, C++ 2, DIGITAL Command Language 1, Shell 1

Public repositories

scrapy

62,224

Scrapy, a fast high-level web crawling & scraping framework for Python.

Python
Updated Jun 13, 2026

scrapyd

3,094

A service daemon to run Scrapy spiders

Python
Updated Jun 13, 2026

scrapely

1,888

A pure-python HTML screen-scraping library

HTML
Updated Jun 9, 2026

dirbot

1,628

Scrapy project to scrape public web directories (educational) [DEPRECATED]

Python
Updated Jun 12, 2026

quotesbot

1,357

This is a sample Scrapy project for educational purposes

Python
Updated Jun 8, 2026

parsel

1,333

Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors

Python
Updated Jun 11, 2026

scrapyd-client

773

Command line client for Scrapyd server

Python
Updated Jun 3, 2026

w3lib

419

Python library of web-related functions

Python
Updated Jun 10, 2026

cssselect

309

CSS Selectors for Python

Python
Updated Jun 1, 2026

queuelib

299

Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python

Python
Updated Jun 1, 2026

loginform

279

Fill HTML login forms automatically

Python
Updated Mar 29, 2026

slybot

224

No description provided for this repository.

Unknown Language
Updated Jun 12, 2026

protego

88

A pure-Python robots.txt parser with support for modern conventions.

DIGITAL Command Language
Updated Jun 11, 2026

itemadapter

70

Common interface for data container classes

Python
Updated Jun 1, 2026

scrapy.org

66

The scrapy.org website (old code)

HTML
Updated Jun 3, 2026

itemloaders

49

Library to populate items using XPath and CSS with a convenient API

Python
Updated Jun 2, 2026

booksbot

42

A crawler for http://books.toscrape.com

Python
Updated Dec 8, 2025

scrapy-bench

32

A CLI for benchmarking Scrapy.

Python
Updated Sep 15, 2025

scrapy-lint

22

A linter for Scrapy projects.

Python
Updated Apr 15, 2026

scurl

21

Performance-focused replacement for Python urllib

Python
Updated May 26, 2026

pypydispatcher

16

A fork of http://pydispatcher.sourceforge.net/ with PyPy support

Python
Updated Jun 12, 2024

xtractmime

13

https://mimesniff.spec.whatwg.org/ implementation for Python

Python
Updated Jun 10, 2026

base-chromium

8

base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/

C++
Updated Mar 10, 2026

scrapy-itemloader

7

[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API

Python
Updated Mar 10, 2026

form2request

5

Python library to build HTTP requests out of HTML forms

Python
Updated Jun 12, 2026

url-chromium

4

url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url

C++
Updated Mar 10, 2026

gsoc2014-integration-tests

3

GSoC2014 - Scrapy Integration tests project

Shell
Updated Jul 6, 2017

scrapy-bench-speedcenter

2

Codespeed for scrapy-bench

Python
Updated May 26, 2026

sphinx-scrapy

1

Sphinx extension for documentation in the Scrapy ecosystem

Python
Updated Jun 11, 2026

Frequently asked questions

What does Scrapy build on GitHub?

The Scrapy organization publishes web scraping and data collection tools and libraries on GitHub, including Scrapy for web crawling, Scrapyd for running Scrapy spiders as a service, and Parsel for extracting data from XML/HTML documents with XPath or CSS selectors.

Which programming languages does Scrapy use?

Python is the primary language, reported for 22 of the 29 public repositories. HTML and C++ are each reported for 2 repositories, and DIGITAL Command Language and Shell for 1 each. One repository has no detected language.

Are Scrapy's repositories public?

Yes, all 29 repositories listed on this page are public, so anyone can review the code, contribute to it, and use it for educational or practical purposes.
