An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
Public repositories: 29
Total stars: 74,276
Followers: 808
The Scrapy project maintains its public repositories on GitHub, written primarily in Python and HTML, with some C++, DIGITAL Command Language, and Shell. Notable repositories include Scrapy, a high-level web crawling and scraping framework, and Scrapyd, a service daemon for running Scrapy spiders.
Scrapy, a fast high-level web crawling & scraping framework for Python.
A service daemon to run Scrapy spiders
A pure-python HTML screen-scraping library
Scrapy project to scrape public web directories (educational) [DEPRECATED]
This is a sample Scrapy project for educational purposes
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Command line client for Scrapyd server
Python library of web-related functions
CSS Selectors for Python
Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python
Fill HTML login forms automatically
No description provided for this repository.
A pure-Python robots.txt parser with support for modern conventions.
Common interface for data container classes
The scrapy.org website (old code)
Library to populate items using XPath and CSS with a convenient API
A crawler for http://books.toscrape.com
A CLI for benchmarking Scrapy.
A linter for Scrapy projects.
Performance-focused replacement for Python urllib
A fork of http://pydispatcher.sourceforge.net/ with PyPy support
https://mimesniff.spec.whatwg.org/ implementation for Python
base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/
[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API
Python library to build HTTP requests out of HTML forms
url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url
GSoC2014 - Scrapy Integration tests project
Codespeed for scrapy-bench
Sphinx extension for documentation in the Scrapy ecosystem
Scrapy builds a variety of tools and libraries on GitHub, including Scrapy for web crawling, Scrapyd for managing Scrapy spiders, and Parsel for data extraction. These projects are designed for web scraping and data collection.
The primary programming languages used by Scrapy on GitHub are Python and HTML, with additional code in C++, DIGITAL Command Language, and Shell.
Yes, all of Scrapy's repositories on GitHub are public. This transparency allows users and developers to collaborate, contribute, and review the code, making it accessible for educational and practical use.