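Apify maintains a wide range of public repositories on GitHub, written mainly in TypeScript, Python, and JavaScript. Its best-known projects include crawlee and crawlee-python, libraries that provide reliable web scraping and browser automation for use cases such as AI and data extraction.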
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
Browser fingerprinting tools for anonymizing your scrapers. Developed by Apify.
Collection of Apify Agent Skills
The Apify MCP server enables your AI agents to extract data from social media, search engines, maps, e-commerce sites, or any other website using thousands of ready-made scrapers, crawlers, and automation tools available on the Apify Store.
Node.js implementation of a proxy server (think Squid) with support for SSL, HTTP/HTTPS, SOCKS5, authentication, and upstream proxy chaining.
HTTP client made for scraping based on got.
A universal CLI client for MCP. mcpc supports persistent sessions, stdio/HTTP, OAuth 2.1, tasks, JSON output for code mode, proxy for AI sandboxes, x402, and more.
impit | Rust library for browser impersonation
Experimental Camoufox JS port
Apify command-line interface helps you create, develop, build and run Apify Actors, and manage the Apify cloud platform.
An MCP server for the RAG Web Browser Actor
Community collection of Apify agent skills for AI coding assistants
Apify SDK monorepo
Apify SDK for Python. The official library for building Apify Actors: serverless cloud programs for web scraping, browser automation, data processing, and AI agents. Manages the Actor lifecycle, storages (datasets, key-value stores, request queues), events, proxies, and pay-per-event monetization. Built on top of the Apify API Client.
Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSON-LD metadata, analyzes AJAX requests, etc.
House of Apify Scrapers. Generic scraping actors with a simple UI to handle complex web crawling and scraping use cases.
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Apify API client for Python—Programmatically run Actors, manage and stream data from storages (datasets, key-value stores, request queues), schedule and monitor runs, and access the full Apify platform API. Sync and async interfaces with automatic retries and pagination.
Base Docker images for Apify actors.
Generates realistic browser fingerprints
This whitepaper describes a new concept for building serverless microapps called Actors, which are easy to develop, share, integrate, and build upon. Actors are a reincarnation of the UNIX philosophy for programs running in the cloud.
Apify API client for JavaScript / Node.js.
Index of all Model Context Protocol (MCP) clients and their capabilities
Home of fingerprint injector.
RAG Web Browser is an Apify Actor to feed your LLM applications and RAG pipelines with up-to-date text content scraped from the web.
Node.js package for generating browser-like headers.
This project is the home of Apify's documentation.
This project is the 🏠 home of Apify Actor templates to help users quickly get started. Contributions welcome!
Generic REST API for scraping websites. Drop-in replacement for ScrapingBee, ScrapingAnt, and ScraperAPI services. And it is open-source!
n8n node to interact with Apify APIs
JavaScript / Node.js library to stream data into an XLSX file
The /llms.txt Generator Actor 🕸️📄 extracts website content to create an llms.txt file for AI apps 🤖✨ like LLM fine-tuning and indexing. Output is available 📥 in the Key-Value Store for easy download and integration into workflows. 🚀
OpenClaw extension integration
I Don't Care About Cookies extension compiled for use with Playwright/Puppeteer
A curated collection of awesome MCP servers, published and monetized as Actors on Apify
Utilities and constants shared across Apify projects.
Contains boilerplate for an Apify actor to help you quickly get started building your own actors.
Apify integration for Zapier
A GitHub Action to push an Actor to the Apify platform
Apify's reusable GitHub workflows
A Homebrew tap for Apify tools
Transfer data from Apify Actors to vector databases (Chroma, Milvus, Pinecone, PostgreSQL (PG-Vector), Qdrant, and Weaviate)
This is a template repo for the n8n single Actor apps.
The Finance Monitoring AI Agent 📊💹 analyzes specific tickers, gathering and processing data to generate insightful reports 📈📉. Designed for investors and analysts, this agent provides detailed performance analysis and trends. 🚀
Example Apify Actor written in Python
An example repository showcasing how you can scrape in parallel using one request queue
Apify integration for LangChain 🦜🔗
Documentation site for the Actor Programming Model – a fresh take on serverless microapps. Built with Astro.
Open-source Actor that provides a sandbox for secure execution of AI generated code. Supports Node.js, Python. Provides pre-configured Claude Code, Codex CLI, and OpenCode. 📦
This repository has no description.
An example repository with multiple Apify Actors sharing code between each other.
Patched fork of `rustls` for `impit`
This repository has no description.
Local emulation of the apify-client NPM package, which enables local use of Apify SDK.
Apify ESLint preset to be shared between projects
A Rust implementation of the filesystem storage used by the Crawlee web scraping framework
Example of Python Scrapy project. It scrapes book data from https://books.toscrape.com/.
The official integration for Apify and Haystack 2.0
Teach your agents to scrape real-time data with this self-guided workshop.
Apify nodes for n8n.
Constants and utilities shared across Apify's Python libraries and projects.
Get your documents ready for gen AI
🔎 Hunt down social media accounts by username across social networks
This repository has no description.
AI Agents & MCPs & AI Workflow Automation • (~400 MCP servers for AI agents) • AI Automation / AI Agent with MCPs • AI Workflows & AI Agents • MCPs for AI Agents
The GitHub Action that ensures each PR is correctly set up and has a milestone set.
This repository has no description.
Special, yet insignificant actors
All Dify Plugins listed in Dify Marketplace, plus illustrated plugin examples.
This repository has no description.
TypeScript configuration shared across projects in Apify.
Langflow is a powerful tool for building and deploying AI-powered agents and workflows.
This repository has no description.
This action simplifies creating a release PR.
This repository has no description.
Tools & lib to test actors on the Apify platform
A model-driven approach to building AI agents in just a few lines of code.
HTTP specific Tower utilities.
Patched fork of h2 for impit
Apify's custom GitHub Actions for internal use
Apify's fork of `docusaurus-plugin-typedoc-api`, customized for our Python documentation.
This repository has no description.
This Actor runs on a daily schedule and monitors the log for new slow queries.
Kilo is the all-in-one agentic engineering platform. Build, ship, and iterate faster with the most popular open source coding agent.
Documentation for the Strands Agents SDK. A model-driven approach to building AI agents in just a few lines of code.
This repository has no description.
A simple actor used to test the Apify MCP server
Template for Claude managed agents Actors
This repository has no description.
The agent that grows with you
This Actor maps your Apify dataset items into HubSpot company fields and performs imports
Apify oxlint preset to be shared between projects
Common utilities used with hyper.
Local service that mocks recombee, completely vibe coded 🤖
Official Apify powers for Kiro IDE — web scraping, data extraction, and Actor development
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skills, subagents, and a message gateway, it handles tasks of varying difficulty that can take minutes to hours.
Repository for defining organization-wide (or team-wide) GitHub Actions workflows
an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
A set of tools that gives agents powerful capabilities.
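Apify has built a variety of web scraping and automation tools on GitHub, most notably the crawlee and crawlee-python libraries, which extract data for AI applications and support downloading and processing many file formats.
Apify develops its public repositories mainly in TypeScript, Python, and JavaScript. Rust and MDX also appear in some projects, which reflects the breadth of its technology stack.
Apify's repositories are public, and anyone can access and use them on GitHub. They give developers and researchers a broad set of tools for web scraping and data processing.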