
Apify Actor

Apify Actors are cloud programs designed for a wide range of web scraping, crawling, and data extraction tasks. These actors facilitate automated data gathering from the web, enabling users to extract, process, and store information efficiently. Actors can be used to perform tasks like scraping e-commerce sites for product details, monitoring price changes, or gathering search engine results. They integrate seamlessly with Apify Datasets, allowing the structured data collected by actors to be stored, managed, and exported in formats like JSON, CSV, or Excel for further analysis or use.

This notebook shows how to use Apify Actors with LangChain.

Prerequisites

  • Apify account: Register your free Apify account here.
  • Apify API token: Learn how to get your API token in the Apify documentation.
%pip install langgraph langchain-apify langchain-openai

First, import ApifyActorsTool into your source code:

from langchain_apify import ApifyActorsTool

Find your Apify API token and OpenAI API key and set them as environment variables:

import os

os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

Using Apify Actors as LangChain agent tools

We will create an ApifyActorsTool instance so that we can call Apify Actors and pass it to an agent as a tool. Any Actor from the Apify Store can be used this way. In this example, we will use the Website Content Crawler to extract and analyze website content.

wcc = ApifyActorsTool("apify/website-content-crawler")
tools = [wcc]
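
The tool can also be called directly, outside of an agent, by passing the Actor's run input. The sketch below assumes the run_input argument shape shown in the agent's tool call later in this notebook; the startUrls and maxCrawlPages fields come from the Website Content Crawler's input shown in that same output:

# Direct tool call (sketch): pass the Actor input as `run_input`.
result = wcc.invoke(
    {"run_input": {"startUrls": [{"url": "https://whitepaper.actor"}], "maxCrawlPages": 1}}
)
print(result)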

Providing the Apify Actors tool to an agent

We can provide the created tool to an agent. When asked to summarize a website, the agent will call the Apify Actor to extract the content of the website and then return a summary of the retrieved content.

from langchain_core.messages import ToolMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o")
graph = create_react_agent(model, tools=tools)

Run the agent to summarize the content of whitepaper.actor.

inputs = {
    "messages": [("user", "get and summarize me content of this site whitepaper.actor")]
}
for s in graph.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    # skip tool messages
    if isinstance(message, ToolMessage):
        continue
    message.pretty_print()
================================ Human Message =================================

get and summarize me content of this site whitepaper.actor
================================== Ai Message ==================================
Tool Calls:
  apify_actor_apify_website-content-crawler (call_VkGocyUlaMUqhm8fHJhi5Z7p)
 Call ID: call_VkGocyUlaMUqhm8fHJhi5Z7p
  Args:
    run_input: {"startUrls":[{"url":"https://whitepaper.actor"}],"proxyConfiguration":{"useApifyProxy":true},"maxCrawlPages":5,"maxResults":3}
================================== Ai Message ==================================

The **Web Actor Programming Model Whitepaper** introduces a new way to build serverless microapps called Actors. These are cloud-based tools that allow developers to create and distribute software effortlessly. Here are the key takeaways:

1. **Overview of Actors:**
- Actors are serverless programs that perform various tasks from simple actions like sending an email to complex ones such as web crawling.
- They are packaged as Docker images to accept inputs, perform tasks, and possibly generate outputs.
- Features include a Dockerfile, input/output schema, out-of-the-box storage system, and metadata like Actor name and version.

2. **Apify Platform:**
- Apify hosts Actors, scales resources as needed, and provides a practical user interface.
- Actors can integrate with services like Zapier, allowing easy usage in workflows. Users can monetize Actors by setting usage fees.

3. **Key Concepts:**
- Actors have well-defined input and output schemas, making it easy to integrate them into other systems or workflows.
- They are designed for flexibility in integrations, offering interoperability with external systems.

4. **Philosophy:**
- Inspired by UNIX principles, Actors are meant to do one task well and can be reused in larger systems.
- While they share conceptual similarities with the Actor model in computing, Apify's Actors focus on practical software utility.

5. **Development and Usage:**
- Actors can be developed using Node.js or Python SDKs.
- Apify CLI supports local development and deployment to the Apify platform.
- Actors support various monetization options, allowing developers to earn from their creations.

6. **Schemas and Specifications:**
- **Dataset Schema**: Allows sequential storage and retrieval of data records in various formats. Users can assign schemas to datasets for compatibility.
- **Actor File Specification**: Defines core Actor properties such as name, description, version, environment variables, and links to other definitions.

Overall, the whitepaper provides in-depth guidance on using Actors within the Apify ecosystem, emphasizing their ease of deployment, flexibility, and potential for contributing to the software development landscape.
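
If you only need the final answer rather than the streamed intermediate messages, you can invoke the graph directly; a minimal sketch:

# Run the agent once and read the last message from the final state.
final_state = graph.invoke(inputs)
print(final_state["messages"][-1].content)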
