An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
29 public repositories
74,276 total stars
808 followers
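Scrapy is an open source, collaborative framework for extracting the data you need from websites. Its public presence on GitHub includes several widely used projects, such as Scrapy, scrapyd, and scrapely, written in languages including Python, HTML, and C++. These projects are influential in the data scraping and web crawling field.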
Scrapy, a fast high-level web crawling & scraping framework for Python.
A service daemon to run Scrapy spiders
A pure-Python HTML screen-scraping library
Scrapy project to scrape public web directories (educational) [DEPRECATED]
This is a sample Scrapy project for educational purposes
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Command line client for Scrapyd server
Python library of web-related functions
CSS Selectors for Python
Collection of persistent (disk-based) and non-persistent (memory-based) queues for Python
Fill HTML login forms automatically
No description provided for this repository.
A pure-Python robots.txt parser with support for modern conventions.
Common interface for data container classes
The scrapy.org website (old code)
Library to populate items using XPath and CSS with a convenient API
A crawler for http://books.toscrape.com
A CLI for benchmarking Scrapy.
A linter for Scrapy projects.
Performance-focused replacement for Python urllib
A fork of http://pydispatcher.sourceforge.net/ with PyPy support
https://mimesniff.spec.whatwg.org/ implementation for Python
base component forked from Chromium source https://chromium.googlesource.com/chromium/src/base/
[Archived] Library to populate Scrapy items using XPath and CSS with a convenient API
Python library to build HTTP requests out of HTML forms
url component from Chromium source code, forked from https://chromium.googlesource.com/chromium/src/url
GSoC2014 - Scrapy Integration tests project
Codespeed for scrapy-bench
Sphinx extension for documentation in the Scrapy ecosystem
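On GitHub, Scrapy maintains a set of projects for web scraping and data extraction, chiefly the Scrapy framework, the scrapyd service daemon, and the scrapely HTML scraping library.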
Scrapy's main programming languages include Python, HTML, C++, DIGITAL Command Language, and Shell. Python is the core language and is used across many of the projects.
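Yes, all of Scrapy's repositories are public, so developers and users can freely access, use, and contribute to the code, which encourages community collaboration and innovation.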