diff --git a/api-reference/endpoint/smartcrawler/start.mdx b/api-reference/endpoint/smartcrawler/start.mdx
index c28c7a6..cee1feb 100644
--- a/api-reference/endpoint/smartcrawler/start.mdx
+++ b/api-reference/endpoint/smartcrawler/start.mdx
@@ -229,7 +229,7 @@ sha256={your_webhook_secret}
To verify that a webhook request is authentic:
-1. Retrieve your webhook secret from the [dashboard](https://dashboard.scrapegraphai.com)
+1. Retrieve your webhook secret from the [dashboard](https://scrapegraphai.com/dashboard)
2. Compare the `X-Webhook-Signature` header value with `sha256={your_secret}`
@@ -305,5 +305,5 @@ The webhook POST request contains the following JSON payload:
| result | string | The crawl result data (null if failed) |
-Make sure to configure your webhook secret in the [dashboard](https://dashboard.scrapegraphai.com) before using webhooks. Each user has a unique webhook secret for secure verification.
+Make sure to configure your webhook secret in the [dashboard](https://scrapegraphai.com/dashboard) before using webhooks. Each user has a unique webhook secret for secure verification.
diff --git a/api-reference/errors.mdx b/api-reference/errors.mdx
index 5933a08..07846f7 100644
--- a/api-reference/errors.mdx
+++ b/api-reference/errors.mdx
@@ -139,17 +139,17 @@ except APIError as e:
```
```javascript JavaScript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
-const response = await smartScraper(apiKey, {
- website_url: 'https://example.com',
- user_prompt: 'Extract data',
-});
-
-if (response.status === 'error') {
- console.error('Error:', response.error);
+try {
+ const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract data',
+ });
+ console.log('Data:', data);
+} catch (error) {
+ console.error('Error:', error.message);
}
```
diff --git a/api-reference/introduction.mdx b/api-reference/introduction.mdx
index bee4eb5..872bcb1 100644
--- a/api-reference/introduction.mdx
+++ b/api-reference/introduction.mdx
@@ -9,7 +9,7 @@ The ScrapeGraphAI API provides powerful endpoints for AI-powered web scraping an
## Authentication
-All API requests require authentication using an API key. You can get your API key from the [dashboard](https://dashboard.scrapegraphai.com).
+All API requests require authentication using an API key. You can get your API key from the [dashboard](https://scrapegraphai.com/dashboard).
```bash
SGAI-APIKEY: your-api-key-here
diff --git a/cookbook/examples/pagination.mdx b/cookbook/examples/pagination.mdx
index e078401..df05074 100644
--- a/cookbook/examples/pagination.mdx
+++ b/cookbook/examples/pagination.mdx
@@ -349,22 +349,20 @@ if __name__ == "__main__":
## JavaScript SDK Example
```javascript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
import 'dotenv/config';
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
-const response = await smartScraper(apiKey, {
- website_url: 'https://www.amazon.in/s?k=tv&crid=1TEF1ZFVLU8R8&sprefix=t%2Caps%2C390&ref=nb_sb_noss_2',
- user_prompt: 'Extract all product info including name, price, rating, and image_url',
- total_pages: 3,
-});
+const { data } = await sgai.extract(
+ 'https://www.amazon.in/s?k=tv&crid=1TEF1ZFVLU8R8&sprefix=t%2Caps%2C390&ref=nb_sb_noss_2',
+ {
+ prompt: 'Extract all product info including name, price, rating, and image_url',
+ totalPages: 3,
+ }
+);
-if (response.status === 'error') {
- console.error('Error:', response.error);
-} else {
- console.log('Response:', JSON.stringify(response.data, null, 2));
-}
+console.log('Response:', JSON.stringify(data, null, 2));
```
## Example Output
diff --git a/cookbook/introduction.mdx b/cookbook/introduction.mdx
index b888df7..ca0c3e8 100644
--- a/cookbook/introduction.mdx
+++ b/cookbook/introduction.mdx
@@ -87,7 +87,7 @@ Each example is available in multiple implementations:
4. Experiment and adapt the code for your needs
-Make sure to have your ScrapeGraphAI API key ready. Get one from the [dashboard](https://dashboard.scrapegraphai.com) if you haven't already.
+Make sure to have your ScrapeGraphAI API key ready. Get one from the [dashboard](https://scrapegraphai.com/dashboard) if you haven't already.
## Additional Resources
diff --git a/dashboard/overview.mdx b/dashboard/overview.mdx
index df173d8..4c26957 100644
--- a/dashboard/overview.mdx
+++ b/dashboard/overview.mdx
@@ -19,21 +19,6 @@ The ScrapeGraphAI dashboard is your central hub for managing all your web scrapi
- **Last Used**: Timestamp of your most recent API request
- **Quick Actions**: Buttons to start new scraping jobs or access common features
-## Usage Analytics
-
-Track your API usage patterns with our detailed analytics view:
-
-
-
-
-
-The usage graph provides:
-- **Service-specific metrics**: Track usage for SmartScraper, SearchScraper, and Markdownify separately
-- **Time-based analysis**: View usage patterns over different time periods
-- **Interactive tooltips**: Hover over data points to see detailed information
-- **Trend analysis**: Identify usage patterns and optimize your API consumption
-
-
## Key Features
- **Usage Statistics**: Monitor your API usage and remaining credits
@@ -43,7 +28,7 @@ The usage graph provides:
## Getting Started
-1. Log in to your [dashboard](https://dashboard.scrapegraphai.com)
+1. Log in to your [dashboard](https://scrapegraphai.com/dashboard)
2. View your API key in the settings section
3. Check your available credits
4. Start your first scraping job
diff --git a/developer-guides/llm-sdks-and-frameworks/anthropic.mdx b/developer-guides/llm-sdks-and-frameworks/anthropic.mdx
index f01b8e5..902608b 100644
--- a/developer-guides/llm-sdks-and-frameworks/anthropic.mdx
+++ b/developer-guides/llm-sdks-and-frameworks/anthropic.mdx
@@ -27,24 +27,23 @@ If using Node < 20, install `dotenv` and add `import 'dotenv/config'` to your co
This example demonstrates a simple workflow: scrape a website and summarize the content using Claude.
```typescript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
import Anthropic from '@anthropic-ai/sdk';
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
-const scrapeResult = await smartScraper(apiKey, {
- website_url: 'https://scrapegraphai.com',
- user_prompt: 'Extract all content from this page',
+const { data } = await sgai.extract('https://scrapegraphai.com', {
+ prompt: 'Extract all content from this page',
});
-console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
+console.log('Scraped content length:', JSON.stringify(data).length);
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
max_tokens: 1024,
messages: [
- { role: 'user', content: `Summarize in 100 words: ${JSON.stringify(scrapeResult.data.result)}` }
+ { role: 'user', content: `Summarize in 100 words: ${JSON.stringify(data)}` }
]
});
@@ -56,12 +55,12 @@ console.log('Response:', message);
This example shows how to use Claude's tool use feature to let the model decide when to scrape websites based on user requests.
```typescript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
import { Anthropic } from '@anthropic-ai/sdk';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY
});
@@ -91,12 +90,11 @@ if (toolUse && toolUse.type === 'tool_use') {
const input = toolUse.input as { url: string };
console.log(`Calling tool: ${toolUse.name} | URL: ${input.url}`);
- const result = await smartScraper(apiKey, {
- website_url: input.url,
- user_prompt: 'Extract all content from this page',
+ const { data } = await sgai.extract(input.url, {
+ prompt: 'Extract all content from this page',
});
- console.log(`Scraped content preview: ${JSON.stringify(result.data.result)?.substring(0, 300)}...`);
+ console.log(`Scraped content preview: ${JSON.stringify(data)?.substring(0, 300)}...`);
// Continue with the conversation or process the scraped content as needed
}
```
@@ -106,11 +104,11 @@ if (toolUse && toolUse.type === 'tool_use') {
This example demonstrates how to use Claude to extract structured data from scraped website content.
```typescript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const CompanyInfoSchema = z.object({
@@ -119,9 +117,8 @@ const CompanyInfoSchema = z.object({
description: z.string().optional()
});
-const scrapeResult = await smartScraper(apiKey, {
- website_url: 'https://stripe.com',
- user_prompt: 'Extract all content from this page',
+const { data } = await sgai.extract('https://stripe.com', {
+ prompt: 'Extract all content from this page',
});
const prompt = `Extract company information from this website content.
@@ -135,7 +132,7 @@ Output ONLY valid JSON in this exact format (no markdown, no explanation):
}
Website content:
-${JSON.stringify(scrapeResult.data.result)}`;
+${JSON.stringify(data)}`;
const message = await anthropic.messages.create({
model: 'claude-haiku-4-5',
diff --git a/developer-guides/llm-sdks-and-frameworks/gemini.mdx b/developer-guides/llm-sdks-and-frameworks/gemini.mdx
index 0e710c3..1910ef2 100644
--- a/developer-guides/llm-sdks-and-frameworks/gemini.mdx
+++ b/developer-guides/llm-sdks-and-frameworks/gemini.mdx
@@ -27,22 +27,21 @@ If using Node < 20, install `dotenv` and add `import 'dotenv/config'` to your co
This example demonstrates a simple workflow: scrape a website and summarize the content using Gemini.
```typescript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
import { GoogleGenAI } from '@google/genai';
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
-const scrapeResult = await smartScraper(apiKey, {
- website_url: 'https://scrapegraphai.com',
- user_prompt: 'Extract all content from this page',
+const { data } = await sgai.extract('https://scrapegraphai.com', {
+ prompt: 'Extract all content from this page',
});
-console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
+console.log('Scraped content length:', JSON.stringify(data).length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
- contents: `Summarize: ${JSON.stringify(scrapeResult.data.result)}`,
+ contents: `Summarize: ${JSON.stringify(data)}`,
});
console.log('Summary:', response.text);
@@ -53,18 +52,17 @@ console.log('Summary:', response.text);
This example shows how to analyze website content using Gemini's multi-turn conversation capabilities.
```typescript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
import { GoogleGenAI } from '@google/genai';
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
-const scrapeResult = await smartScraper(apiKey, {
- website_url: 'https://news.ycombinator.com/',
- user_prompt: 'Extract all content from this page',
+const { data } = await sgai.extract('https://news.ycombinator.com/', {
+ prompt: 'Extract all content from this page',
});
-console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
+console.log('Scraped content length:', JSON.stringify(data).length);
const chat = ai.chats.create({
model: 'gemini-2.5-flash'
@@ -72,7 +70,7 @@ const chat = ai.chats.create({
// Ask for the top 3 stories on Hacker News
const result1 = await chat.sendMessage({
- message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${JSON.stringify(scrapeResult.data.result)}`
+ message: `Based on this website content from Hacker News, what are the top 3 stories right now?\n\n${JSON.stringify(data)}`
});
console.log('Top 3 Stories:', result1.text);
@@ -88,22 +86,21 @@ console.log('4th and 5th Stories:', result2.text);
This example demonstrates how to extract structured data using Gemini's JSON mode from scraped website content.
```typescript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
import { GoogleGenAI, Type } from '@google/genai';
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_APIKEY });
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
-const scrapeResult = await smartScraper(apiKey, {
- website_url: 'https://stripe.com',
- user_prompt: 'Extract all content from this page',
+const { data } = await sgai.extract('https://stripe.com', {
+ prompt: 'Extract all content from this page',
});
-console.log('Scraped content length:', JSON.stringify(scrapeResult.data.result).length);
+console.log('Scraped content length:', JSON.stringify(data).length);
const response = await ai.models.generateContent({
model: 'gemini-2.5-flash',
- contents: `Extract company information: ${JSON.stringify(scrapeResult.data.result)}`,
+ contents: `Extract company information: ${JSON.stringify(data)}`,
config: {
responseMimeType: 'application/json',
responseSchema: {
diff --git a/docs.json b/docs.json
index 282bcf6..60b1947 100644
--- a/docs.json
+++ b/docs.json
@@ -3,250 +3,313 @@
"theme": "mint",
"name": "ScrapeGraphAI",
"colors": {
- "primary": "#9333ea",
- "light": "#9f52eb",
- "dark": "#1f2937"
+ "primary": "#AC6DFF",
+ "light": "#AC6DFF",
+ "dark": "#AC6DFF"
},
"favicon": "/favicon.svg",
"navigation": {
- "tabs": [
+ "versions": [
{
- "tab": "Home",
- "groups": [
+ "version": "v2",
+ "default": true,
+ "tabs": [
{
- "group": "Get Started",
- "pages": [
- "introduction",
- "install",
+ "tab": "Home",
+ "groups": [
{
- "group": "Use Cases",
+ "group": "Get Started",
"pages": [
- "use-cases/overview",
- "use-cases/ai-llm",
- "use-cases/lead-generation",
- "use-cases/market-intelligence",
- "use-cases/content-aggregation",
- "use-cases/research-analysis",
- "use-cases/seo-analytics"
+ "introduction",
+ "install",
+ "transition-from-v1-to-v2",
+ {
+ "group": "Use Cases",
+ "pages": [
+ "use-cases/overview",
+ "use-cases/ai-llm",
+ "use-cases/lead-generation",
+ "use-cases/market-intelligence",
+ "use-cases/content-aggregation",
+ "use-cases/research-analysis",
+ "use-cases/seo-analytics"
+ ]
+ },
+ {
+ "group": "Dashboard",
+ "pages": [
+ "dashboard/overview",
+ "dashboard/settings"
+ ]
+ }
]
},
{
- "group": "Dashboard",
+ "group": "Services",
"pages": [
- "dashboard/overview",
- "dashboard/playground",
- "dashboard/settings"
+ "services/scrape",
+ "services/extract",
+ "services/search",
+ "services/crawl",
+ "services/monitor",
+ {
+ "group": "CLI",
+ "icon": "terminal",
+ "pages": [
+ "services/cli/introduction",
+ "services/cli/commands",
+ "services/cli/json-mode",
+ "services/cli/ai-agent-skill",
+ "services/cli/examples"
+ ]
+ },
+ {
+ "group": "MCP Server",
+ "icon": "/logo/mcp.svg",
+ "pages": [
+ "services/mcp-server/introduction",
+ "services/mcp-server/cursor",
+ "services/mcp-server/claude",
+ "services/mcp-server/smithery"
+ ]
+ },
+ "services/toonify",
+ {
+ "group": "Additional Parameters",
+ "pages": [
+ "services/additional-parameters/headers",
+ "services/additional-parameters/pagination",
+ "services/additional-parameters/proxy",
+ "services/additional-parameters/wait-ms"
+ ]
+ }
+ ]
+ },
+ {
+ "group": "Official SDKs",
+ "pages": [
+ "sdks/python",
+ "sdks/javascript",
+ "sdks/mocking"
+ ]
+ },
+ {
+ "group": "LLM SDKs & Frameworks",
+ "pages": [
+ "developer-guides/llm-sdks-and-frameworks/gemini",
+ "developer-guides/llm-sdks-and-frameworks/anthropic"
+ ]
+ },
+ {
+ "group": "Contribute",
+ "pages": [
+ "contribute/opensource"
]
}
]
},
{
- "group": "Services",
- "pages": [
- "services/smartscraper",
- "services/searchscraper",
- "services/markdownify",
- "services/scrape",
- "services/smartcrawler",
- "services/sitemap",
- "services/agenticscraper",
+ "tab": "Knowledge Base",
+ "groups": [
+ {
+ "group": "Knowledge Base",
+ "pages": [
+ "knowledge-base/introduction"
+ ]
+ },
+ {
+ "group": "Scraping Tools",
+ "pages": [
+ "knowledge-base/ai-tools/lovable",
+ "knowledge-base/ai-tools/v0",
+ "knowledge-base/ai-tools/bolt",
+ "knowledge-base/ai-tools/cursor"
+ ]
+ },
{
"group": "CLI",
- "icon": "terminal",
"pages": [
- "services/cli/introduction",
- "services/cli/commands",
- "services/cli/json-mode",
- "services/cli/ai-agent-skill",
- "services/cli/examples"
+ "knowledge-base/cli/getting-started",
+ "knowledge-base/cli/json-mode",
+ "knowledge-base/cli/ai-agent-skill",
+ "knowledge-base/cli/command-examples"
+ ]
+ },
+ {
+ "group": "Troubleshooting",
+ "pages": [
+ "knowledge-base/troubleshooting/cors-error",
+ "knowledge-base/troubleshooting/empty-results",
+ "knowledge-base/troubleshooting/rate-limiting",
+ "knowledge-base/troubleshooting/timeout-errors"
]
},
{
- "group": "MCP Server",
- "icon": "/logo/mcp.svg",
+ "group": "Scraping Guides",
"pages": [
- "services/mcp-server/introduction",
- "services/mcp-server/cursor",
- "services/mcp-server/claude",
- "services/mcp-server/smithery"
+ "knowledge-base/scraping/javascript-rendering",
+ "knowledge-base/scraping/pagination",
+ "knowledge-base/scraping/custom-headers",
+ "knowledge-base/scraping/proxy"
]
},
- "services/toonify",
{
- "group": "Additional Parameters",
+ "group": "Account & Credits",
"pages": [
- "services/additional-parameters/headers",
- "services/additional-parameters/pagination",
- "services/additional-parameters/proxy",
- "services/additional-parameters/wait-ms"
+ "knowledge-base/account/pricing",
+ "knowledge-base/account/api-keys",
+ "knowledge-base/account/credits",
+ "knowledge-base/account/rate-limits"
]
}
]
},
{
- "group": "Official SDKs",
- "pages": [
- "sdks/python",
- "sdks/javascript",
- "sdks/mocking"
- ]
- },
- {
- "group": "Integrations",
- "pages": [
- "integrations/langchain",
- "integrations/llamaindex",
- "integrations/crewai",
- "integrations/agno",
- "integrations/langflow",
- "integrations/vercel_ai",
- "integrations/google-adk",
- "integrations/x402"
- ]
- },
- {
- "group": "LLM SDKs & Frameworks",
- "pages": [
- "developer-guides/llm-sdks-and-frameworks/gemini",
- "developer-guides/llm-sdks-and-frameworks/anthropic"
- ]
- },
- {
- "group": "Contribute",
- "pages": [
- "contribute/opensource"
- ]
- }
- ]
- },
- {
- "tab": "Knowledge Base",
- "groups": [
- {
- "group": "Knowledge Base",
- "pages": [
- "knowledge-base/introduction"
- ]
- },
- {
- "group": "Scraping Tools",
- "pages": [
- "knowledge-base/ai-tools/lovable",
- "knowledge-base/ai-tools/v0",
- "knowledge-base/ai-tools/bolt",
- "knowledge-base/ai-tools/cursor"
- ]
- },
- {
- "group": "CLI",
- "pages": [
- "knowledge-base/cli/getting-started",
- "knowledge-base/cli/json-mode",
- "knowledge-base/cli/ai-agent-skill",
- "knowledge-base/cli/command-examples"
- ]
- },
- {
- "group": "Troubleshooting",
- "pages": [
- "knowledge-base/troubleshooting/cors-error",
- "knowledge-base/troubleshooting/empty-results",
- "knowledge-base/troubleshooting/rate-limiting",
- "knowledge-base/troubleshooting/timeout-errors"
- ]
- },
- {
- "group": "Scraping Guides",
- "pages": [
- "knowledge-base/scraping/javascript-rendering",
- "knowledge-base/scraping/pagination",
- "knowledge-base/scraping/custom-headers",
- "knowledge-base/scraping/proxy"
- ]
- },
- {
- "group": "Account & Credits",
- "pages": [
- "knowledge-base/account/api-keys",
- "knowledge-base/account/credits",
- "knowledge-base/account/rate-limits"
- ]
- }
- ]
- },
- {
- "tab": "Cookbook",
- "groups": [
- {
- "group": "Cookbook",
- "pages": [
- "cookbook/introduction"
+ "tab": "Cookbook",
+ "groups": [
+ {
+ "group": "Cookbook",
+ "pages": [
+ "cookbook/introduction"
+ ]
+ },
+ {
+ "group": "Examples",
+ "pages": [
+ "cookbook/examples/company-info",
+ "cookbook/examples/github-trending",
+ "cookbook/examples/wired",
+ "cookbook/examples/homes",
+ "cookbook/examples/research-agent",
+ "cookbook/examples/chat-webpage",
+ "cookbook/examples/pagination"
+ ]
+ }
]
},
{
- "group": "Examples",
- "pages": [
- "cookbook/examples/company-info",
- "cookbook/examples/github-trending",
- "cookbook/examples/wired",
- "cookbook/examples/homes",
- "cookbook/examples/research-agent",
- "cookbook/examples/chat-webpage",
- "cookbook/examples/pagination"
+ "tab": "API Reference",
+ "groups": [
+ {
+ "group": "API Documentation",
+ "pages": [
+ "api-reference/introduction",
+ "api-reference/errors"
+ ]
+ },
+ {
+ "group": "SmartScraper",
+ "pages": [
+ "api-reference/endpoint/smartscraper/start",
+ "api-reference/endpoint/smartscraper/get-status"
+ ]
+ },
+ {
+ "group": "SearchScraper",
+ "pages": [
+ "api-reference/endpoint/searchscraper/start",
+ "api-reference/endpoint/searchscraper/get-status"
+ ]
+ },
+ {
+ "group": "Markdownify",
+ "pages": [
+ "api-reference/endpoint/markdownify/start",
+ "api-reference/endpoint/markdownify/get-status"
+ ]
+ },
+ {
+ "group": "SmartCrawler",
+ "pages": [
+ "api-reference/endpoint/smartcrawler/start",
+ "api-reference/endpoint/smartcrawler/get-status"
+ ]
+ },
+ {
+ "group": "Sitemap",
+ "pages": [
+ "api-reference/endpoint/sitemap/start",
+ "api-reference/endpoint/sitemap/get-status"
+ ]
+ },
+ {
+ "group": "User",
+ "pages": [
+ "api-reference/endpoint/user/get-credits",
+ "api-reference/endpoint/user/submit-feedback"
+ ]
+ }
]
}
]
},
{
- "tab": "API Reference",
- "groups": [
- {
- "group": "API Documentation",
- "pages": [
- "api-reference/introduction",
- "api-reference/errors"
- ]
- },
- {
- "group": "SmartScraper",
- "pages": [
- "api-reference/endpoint/smartscraper/start",
- "api-reference/endpoint/smartscraper/get-status"
- ]
- },
- {
- "group": "SearchScraper",
- "pages": [
- "api-reference/endpoint/searchscraper/start",
- "api-reference/endpoint/searchscraper/get-status"
- ]
- },
+ "version": "v1",
+ "tabs": [
{
- "group": "Markdownify",
- "pages": [
- "api-reference/endpoint/markdownify/start",
- "api-reference/endpoint/markdownify/get-status"
- ]
- },
- {
- "group": "SmartCrawler",
- "pages": [
- "api-reference/endpoint/smartcrawler/start",
- "api-reference/endpoint/smartcrawler/get-status"
- ]
- },
- {
- "group": "Sitemap",
- "pages": [
- "api-reference/endpoint/sitemap/start",
- "api-reference/endpoint/sitemap/get-status"
+ "tab": "Home",
+ "groups": [
+ {
+ "group": "Get Started",
+ "pages": [
+ "v1/introduction",
+ "v1/quickstart"
+ ]
+ },
+ {
+ "group": "Services",
+ "pages": [
+ "v1/smartscraper",
+ "v1/searchscraper",
+ "v1/markdownify",
+ "v1/scrape",
+ "v1/smartcrawler",
+ "v1/sitemap",
+ "v1/agenticscraper",
+ {
+ "group": "CLI",
+ "icon": "terminal",
+ "pages": [
+ "v1/cli/introduction",
+ "v1/cli/commands",
+ "v1/cli/json-mode",
+ "v1/cli/ai-agent-skill",
+ "v1/cli/examples"
+ ]
+ },
+ {
+ "group": "MCP Server",
+ "icon": "/logo/mcp.svg",
+ "pages": [
+ "v1/mcp-server/introduction",
+ "v1/mcp-server/cursor",
+ "v1/mcp-server/claude",
+ "v1/mcp-server/smithery"
+ ]
+ },
+ "v1/toonify",
+ {
+ "group": "Additional Parameters",
+ "pages": [
+ "v1/additional-parameters/headers",
+ "v1/additional-parameters/pagination",
+ "v1/additional-parameters/proxy",
+ "v1/additional-parameters/wait-ms"
+ ]
+ }
+ ]
+ }
]
},
{
- "group": "User",
- "pages": [
- "api-reference/endpoint/user/get-credits",
- "api-reference/endpoint/user/submit-feedback"
+ "tab": "API Reference",
+ "groups": [
+ {
+ "group": "API Documentation",
+ "pages": [
+ "v1/api-reference/introduction"
+ ]
+ }
]
}
]
@@ -259,12 +322,7 @@
"href": "https://scrapegraphai.com/",
"icon": "globe"
},
- {
- "anchor": "Community",
- "href": "https://discord.gg/uJN7TYcpNa",
- "icon": "discord"
- },
- {
+    {
"anchor": "Blog",
"href": "https://scrapegraphai.com/blog",
"icon": "newspaper"
@@ -273,13 +331,24 @@
}
},
"logo": {
- "light": "https://raw.githubusercontent.com/ScrapeGraphAI/docs-mintlify/main/logo/light.svg",
- "dark": "https://raw.githubusercontent.com/ScrapeGraphAI/docs-mintlify/main/logo/dark.svg",
+ "light": "/logos/logo-light.svg",
+ "dark": "/logos/logo-dark.svg",
"href": "https://docs.scrapegraphai.com"
},
"background": {
"color": {
- "dark": "#101725"
+ "dark": "#242424",
+ "light": "#EFEFEF"
+ }
+ },
+ "fonts": {
+ "heading": {
+ "family": "IBM Plex Sans",
+ "weight": 500
+ },
+ "body": {
+ "family": "IBM Plex Sans",
+ "weight": 400
}
},
"navbar": {
@@ -293,14 +362,14 @@
"href": "mailto:contact@scrapegraphai.com"
},
{
- "label": "⭐ 23.2k+",
+ "label": "⭐ 23.3k+",
"href": "https://github.com/ScrapeGraphAI/Scrapegraph-ai"
}
],
"primary": {
"type": "button",
"label": "Dashboard",
- "href": "https://dashboard.scrapegraphai.com"
+ "href": "https://scrapegraphai.com/dashboard"
}
},
"footer": {
@@ -322,4 +391,4 @@
"vscode"
]
}
-}
\ No newline at end of file
+}
diff --git a/favicon.svg b/favicon.svg
index 33285d6..6fb828b 100644
--- a/favicon.svg
+++ b/favicon.svg
@@ -1,145 +1,15 @@
-
-
-
-
+
diff --git a/images/dashboard/dashboard-1.png b/images/dashboard/dashboard-1.png
index 1120f7e..2249c72 100644
Binary files a/images/dashboard/dashboard-1.png and b/images/dashboard/dashboard-1.png differ
diff --git a/images/dashboard/settings-1.png b/images/dashboard/settings-1.png
index 87ea08e..16f93b3 100644
Binary files a/images/dashboard/settings-1.png and b/images/dashboard/settings-1.png differ
diff --git a/install.md b/install.md
index 1f1165d..07cb1d5 100644
--- a/install.md
+++ b/install.md
@@ -1,11 +1,11 @@
---
title: Installation
-description: 'Install and get started with ScrapeGraphAI SDKs'
+description: 'Install and get started with ScrapeGraphAI v2 SDKs'
---
## Prerequisites
-- Obtain your **API key** by signing up on the [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com)
+- Obtain your **API key** by signing up on the [ScrapeGraphAI Dashboard](https://scrapegraphai.com/dashboard)
---
@@ -22,10 +22,10 @@ from scrapegraph_py import Client
client = Client(api_key="your-api-key-here")
-# Scrape a website
-response = client.smartscraper(
- website_url="https://scrapegraphai.com",
- user_prompt="Extract information about the company"
+# Extract data from a website
+response = client.extract(
+ url="https://scrapegraphai.com",
+ prompt="Extract information about the company"
)
print(response)
```
@@ -40,6 +40,8 @@ For more advanced usage, see the [Python SDK documentation](/sdks/python).
## JavaScript SDK
+Requires **Node.js >= 22**.
+
Install using npm, pnpm, yarn, or bun:
```bash
@@ -59,20 +61,16 @@ bun add scrapegraph-js
**Usage:**
```javascript
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
-const apiKey = "your-api-key-here";
+const sgai = scrapegraphai({ apiKey: "your-api-key-here" });
-const response = await smartScraper(apiKey, {
- website_url: "https://scrapegraphai.com",
- user_prompt: "What does the company do?",
-});
+const { data } = await sgai.extract(
+ "https://scrapegraphai.com",
+ { prompt: "What does the company do?" }
+);
-if (response.status === "error") {
- console.error("Error:", response.error);
-} else {
- console.log(response.data.result);
-}
+console.log(data);
```
@@ -85,17 +83,20 @@ For more advanced usage, see the [JavaScript SDK documentation](/sdks/javascript
## Key Concepts
-### SmartScraper
-Extract specific information from any webpage using AI. Provide a URL and a prompt describing what you want to extract. [Learn more](/services/smartscraper)
+### Scrape (formerly Markdownify)
+Convert any webpage into markdown, HTML, screenshot, or branding format. [Learn more](/services/scrape)
+
+### Extract (formerly SmartScraper)
+Extract specific information from any webpage using AI. Provide a URL and a prompt describing what you want to extract. [Learn more](/services/extract)
-### SearchScraper
-Search and extract information from multiple web sources using AI. Start with just a prompt - SearchScraper will find relevant websites and extract the information you need. [Learn more](/services/searchscraper)
+### Search (formerly SearchScraper)
+Search and extract information from multiple web sources using AI. Start with just a query - Search will find relevant websites and extract the information you need. [Learn more](/services/search)
-### SmartCrawler
-AI-powered extraction for any webpage with crawl capabilities. Automatically navigate and extract data from multiple pages. [Learn more](/services/smartcrawler)
+### Crawl (formerly SmartCrawler)
+Multi-page website crawling with flexible output formats. Traverse multiple pages, follow links, and return content in your preferred format. [Learn more](/services/crawl)
-### Markdownify
-Convert any webpage into clean, formatted markdown. Perfect for content aggregation and processing. [Learn more](/services/markdownify)
+### Monitor
+Scheduled web monitoring with AI-powered extraction. Set up recurring scraping jobs that automatically extract data on a cron schedule. [Learn more](/services/monitor)
### Structured Output with Schemas
Both SDKs support structured output using schemas:
@@ -119,34 +120,37 @@ class CompanyInfo(BaseModel):
industry: str = Field(description="Industry sector")
client = Client(api_key="your-api-key")
-result = client.smartscraper(
- website_url="https://scrapegraphai.com",
- user_prompt="Extract company information",
+response = client.extract(
+ url="https://scrapegraphai.com",
+ prompt="Extract company information",
output_schema=CompanyInfo
)
-print(result)
+print(response)
```
### JavaScript Example
```javascript
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
import { z } from "zod";
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
+
const CompanySchema = z.object({
- company_name: z.string().describe("The company name"),
+ companyName: z.string().describe("The company name"),
description: z.string().describe("Company description"),
website: z.string().url().describe("Company website URL"),
industry: z.string().describe("Industry sector"),
});
-const apiKey = "your-api-key";
-const response = await smartScraper(apiKey, {
- website_url: "https://scrapegraphai.com",
- user_prompt: "Extract company information",
- output_schema: CompanySchema,
-});
-console.log(response.data.result);
+const { data } = await sgai.extract(
+ "https://scrapegraphai.com",
+ {
+ prompt: "Extract company information",
+ schema: CompanySchema,
+ }
+);
+console.log(data);
```
---
diff --git a/integrations/claude-code-skill.mdx b/integrations/claude-code-skill.mdx
index 4f85e3c..fedb452 100644
--- a/integrations/claude-code-skill.mdx
+++ b/integrations/claude-code-skill.mdx
@@ -60,7 +60,7 @@ export SGAI_API_KEY="sgai-..."
```
-Get your API key from the [dashboard](https://dashboard.scrapegraphai.com).
+Get your API key from the [dashboard](https://scrapegraphai.com/dashboard).
## What's Included
diff --git a/integrations/crewai.mdx b/integrations/crewai.mdx
index dba500c..7288b59 100644
--- a/integrations/crewai.mdx
+++ b/integrations/crewai.mdx
@@ -100,7 +100,7 @@ SCRAPEGRAPH_API_KEY=your_api_key_here
```
-Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
+Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
## Use Cases
diff --git a/integrations/google-adk.mdx b/integrations/google-adk.mdx
index 1d3c7f9..dc87dee 100644
--- a/integrations/google-adk.mdx
+++ b/integrations/google-adk.mdx
@@ -84,7 +84,7 @@ SGAI_API_KEY = "your-api-key-here"
```
-Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
+Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
## Tool Filtering
diff --git a/integrations/langchain.mdx b/integrations/langchain.mdx
index aed504f..ad0a97f 100644
--- a/integrations/langchain.mdx
+++ b/integrations/langchain.mdx
@@ -25,20 +25,20 @@ pip install langchain-scrapegraph
## Available Tools
-### SmartScraperTool
+### ExtractTool
Extract structured data from any webpage using natural language prompts:
```python
-from langchain_scrapegraph.tools import SmartScraperTool
+from langchain_scrapegraph.tools import ExtractTool
# Initialize the tool (uses SGAI_API_KEY from environment)
-tool = SmartscraperTool()
+tool = ExtractTool()
# Extract information using natural language
result = tool.invoke({
- "website_url": "https://www.example.com",
- "user_prompt": "Extract the main heading and first paragraph"
+ "url": "https://www.example.com",
+ "prompt": "Extract the main heading and first paragraph"
})
```
@@ -46,60 +46,51 @@ result = tool.invoke({
Define the structure of the output using Pydantic models:
```python
-from typing import List
from pydantic import BaseModel, Field
-from langchain_scrapegraph.tools import SmartScraperTool
+from langchain_scrapegraph.tools import ExtractTool
class WebsiteInfo(BaseModel):
- title: str = Field(description="The main title of the webpage")
- description: str = Field(description="The main description or first paragraph")
- urls: List[str] = Field(description="The URLs inside the webpage")
+ title: str = Field(description="The main title of the page")
+ description: str = Field(description="The main description")
-# Initialize with schema
-tool = SmartScraperTool(llm_output_schema=WebsiteInfo)
+# Initialize with output schema
+tool = ExtractTool(llm_output_schema=WebsiteInfo)
result = tool.invoke({
- "website_url": "https://www.example.com",
- "user_prompt": "Extract the website information"
+ "url": "https://example.com",
+ "prompt": "Extract the title and description"
})
```
-### SearchScraperTool
+### SearchTool
-Process HTML content directly with AI extraction:
+Search the web and extract structured results using AI:
```python
-from langchain_scrapegraph.tools import SearchScraperTool
+from langchain_scrapegraph.tools import SearchTool
-
-tool = SearchScraperTool()
+tool = SearchTool()
result = tool.invoke({
- "user_prompt": "Find the best restaurants in San Francisco",
+ "query": "Find the best restaurants in San Francisco",
})
-
```
-
-```python
-from typing import Optional
-from pydantic import BaseModel, Field
-from langchain_scrapegraph.tools import SearchScraperTool
+### ScrapeTool
-class RestaurantInfo(BaseModel):
- name: str = Field(description="The restaurant name")
- address: str = Field(description="The restaurant address")
- rating: float = Field(description="The restaurant rating")
+Scrape a webpage and return it in the desired format:
+```python
+from langchain_scrapegraph.tools import ScrapeTool
-tool = SearchScraperTool(llm_output_schema=RestaurantInfo)
+tool = ScrapeTool()
-result = tool.invoke({
- "user_prompt": "Find the best restaurants in San Francisco"
-})
+# Scrape as markdown (default)
+result = tool.invoke({"url": "https://example.com"})
+# Scrape as HTML
+result = tool.invoke({"url": "https://example.com", "format": "html"})
```
-
### MarkdownifyTool
@@ -112,34 +103,146 @@ tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})
```
+### Crawl Tools
+
+Start and manage crawl jobs with `CrawlStartTool`, `CrawlStatusTool`, `CrawlStopTool`, and `CrawlResumeTool`:
+
+```python
+import time
+from langchain_scrapegraph.tools import CrawlStartTool, CrawlStatusTool
+
+start_tool = CrawlStartTool()
+status_tool = CrawlStatusTool()
+
+# Start a crawl job
+result = start_tool.invoke({
+ "url": "https://example.com",
+ "depth": 2,
+ "max_pages": 5,
+ "format": "markdown",
+})
+print("Crawl started:", result)
+
+# Check status
+crawl_id = result.get("id")
+if crawl_id:
+ time.sleep(5)
+ status = status_tool.invoke({"crawl_id": crawl_id})
+ print("Crawl status:", status)
+```
+
+### Monitor Tools
+
+Create and manage monitors (replaces scheduled jobs) with `MonitorCreateTool`, `MonitorListTool`, `MonitorGetTool`, `MonitorPauseTool`, `MonitorResumeTool`, and `MonitorDeleteTool`:
+
+```python
+from langchain_scrapegraph.tools import MonitorCreateTool, MonitorListTool
+
+create_tool = MonitorCreateTool()
+list_tool = MonitorListTool()
+
+# Create a monitor
+result = create_tool.invoke({
+ "name": "Price Monitor",
+ "url": "https://example.com/products",
+ "prompt": "Extract current product prices",
+ "cron": "0 9 * * *", # Daily at 9 AM
+})
+print("Monitor created:", result)
+
+# List all monitors
+monitors = list_tool.invoke({})
+print("All monitors:", monitors)
+```
+
+### HistoryTool
+
+Retrieve request history:
+
+```python
+from langchain_scrapegraph.tools import HistoryTool
+
+tool = HistoryTool()
+history = tool.invoke({})
+```
+
+### GetCreditsTool
+
+Check your remaining API credits:
+
+```python
+from langchain_scrapegraph.tools import GetCreditsTool
+
+tool = GetCreditsTool()
+credits = tool.invoke({})
+```
+
## Example Agent
Create a research agent that can gather and analyze web data:
```python
-from langchain.agents import initialize_agent, AgentType
-from langchain_scrapegraph.tools import SmartScraperTool
+from langchain.agents import AgentExecutor, create_openai_functions_agent
+from langchain_core.messages import SystemMessage
+from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
+from langchain_scrapegraph.tools import ExtractTool, GetCreditsTool, SearchTool
-# Initialize tools
+# Initialize the tools
tools = [
- SmartScraperTool(),
+ ExtractTool(),
+ GetCreditsTool(),
+ SearchTool(),
]
-# Create an agent
-agent = initialize_agent(
- tools=tools,
- llm=ChatOpenAI(temperature=0),
- agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
- verbose=True
-)
-
-# Use the agent
-response = agent.run("""
- Visit example.com, make a summary of the content and extract the main heading and first paragraph
-""")
+# Create the prompt template
+prompt = ChatPromptTemplate.from_messages([
+ SystemMessage(
+ content=(
+ "You are a helpful AI assistant that can analyze websites and extract information. "
+ "You have access to tools that can help you scrape and process web content. "
+ "Always explain what you're doing before using a tool."
+ )
+ ),
+ MessagesPlaceholder(variable_name="chat_history", optional=True),
+ ("user", "{input}"),
+ MessagesPlaceholder(variable_name="agent_scratchpad"),
+])
+
+# Initialize the LLM
+llm = ChatOpenAI(temperature=0)
+
+# Create the agent
+agent = create_openai_functions_agent(llm, tools, prompt)
+agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
+
+# Example usage
+response = agent_executor.invoke({
+ "input": "Extract the main products from https://www.scrapegraphai.com/"
+})
+print(response["output"])
```
+## Migration from v1
+
+If you're upgrading from v1, here are the key changes:
+
+| v1 Tool | v2 Tool |
+|---------|---------|
+| `SmartScraperTool` | `ExtractTool` |
+| `SearchScraperTool` | `SearchTool` |
+| `SmartCrawlerTool` | `CrawlStartTool` / `CrawlStatusTool` / `CrawlStopTool` / `CrawlResumeTool` |
+| `CreateScheduledJobTool` | `MonitorCreateTool` |
+| `GetScheduledJobsTool` | `MonitorListTool` |
+| `GetScheduledJobTool` | `MonitorGetTool` |
+| `PauseScheduledJobTool` | `MonitorPauseTool` |
+| `ResumeScheduledJobTool` | `MonitorResumeTool` |
+| `DeleteScheduledJobTool` | `MonitorDeleteTool` |
+| `MarkdownifyTool` | `MarkdownifyTool` (unchanged) |
+| `GetCreditsTool` | `GetCreditsTool` (unchanged) |
+| `AgenticScraperTool` | Removed |
+| -- | `HistoryTool` (new) |
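In code, the migration is mostly a rename. As a rough sketch (this helper is hypothetical, not part of `langchain-scrapegraph`), the table above can be applied mechanically to import lines and tool references:

```python
# Hypothetical rename helper mirroring the migration table above.
# It is not part of the package; removed tools (e.g. AgenticScraperTool)
# still need manual handling.
V1_TO_V2 = {
    "SmartScraperTool": "ExtractTool",
    "SearchScraperTool": "SearchTool",
    "SmartCrawlerTool": "CrawlStartTool",
    "CreateScheduledJobTool": "MonitorCreateTool",
    "GetScheduledJobsTool": "MonitorListTool",
    "GetScheduledJobTool": "MonitorGetTool",
    "PauseScheduledJobTool": "MonitorPauseTool",
    "ResumeScheduledJobTool": "MonitorResumeTool",
    "DeleteScheduledJobTool": "MonitorDeleteTool",
}

def migrate_line(line: str) -> str:
    """Rewrite v1 tool names in a source line to their v2 equivalents."""
    for old, new in V1_TO_V2.items():
        line = line.replace(old, new)
    return line

print(migrate_line("from langchain_scrapegraph.tools import SmartScraperTool"))
# from langchain_scrapegraph.tools import ExtractTool
```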
+
## Configuration
Set your ScrapeGraph API key in your environment:
@@ -156,7 +259,7 @@ os.environ["SGAI_API_KEY"] = "your-api-key-here"
```
-Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
+Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
## Use Cases
diff --git a/integrations/vercel_ai.mdx b/integrations/vercel_ai.mdx
index 889df6b..f57a290 100644
--- a/integrations/vercel_ai.mdx
+++ b/integrations/vercel_ai.mdx
@@ -5,19 +5,19 @@ description: "Integrate ScrapeGraphAI into Vercel AI"
## Overview
-[Vercel AI sdk](https://ai-sdk.dev/) is a very populate javascript/typescript framework to interact with various LLMs providers. This page shows how to integrate it with ScrapeGraph
+[Vercel AI SDK](https://ai-sdk.dev/) is a popular JavaScript/TypeScript framework to interact with various LLM providers. This page shows how to integrate it with ScrapeGraph.
- View the integration on LlamaHub
+ View the Vercel AI SDK documentation
## Installation
-Follow out [javascript sdk installation steps](/sdks/javascript) using your favourite package manager:
+Follow our [JavaScript SDK installation steps](/sdks/javascript) using your favourite package manager:
```bash
# Using npm
@@ -33,7 +33,7 @@ yarn add scrapegraph-js
bun add scrapegraph-js
```
-Then, install [vercel ai](https://ai-sdk.dev/docs/getting-started) with their [openai provider](https://ai-sdk.dev/providers/ai-sdk-providers/openai)
+Then, install [Vercel AI](https://ai-sdk.dev/docs/getting-started) with their [OpenAI provider](https://ai-sdk.dev/providers/ai-sdk-providers/openai):
```bash
# Using npm
@@ -51,15 +51,15 @@ bun add ai @ai-sdk/openai
## Usage
-ScrapeGraph sdk can be used like any other tools, see [vercel ai tool calling doc](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling)
+The ScrapeGraph SDK can be used like any other tool. See [Vercel AI tool calling docs](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling).
```ts
import { z } from "zod";
import { generateText, tool } from "ai";
import { openai } from "@ai-sdk/openai";
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: process.env.SGAI_API_KEY });
const ArticleSchema = z.object({
title: z.string().describe("The article title"),
@@ -77,15 +77,14 @@ const result = await generateText({
model: openai("gpt-4.1-mini"),
tools: {
scrape: tool({
- description: "Get articles information for a given url.",
+ description: "Extract articles information from a given URL.",
parameters: z.object({
- url: z.string().describe("The exact url."),
+ url: z.string().describe("The exact URL."),
}),
execute: async ({ url }) => {
- const response = await smartScraper(apiKey, {
- website_url: url,
- user_prompt: "Extract the article information",
- output_schema: ArticlesArraySchema,
+ const response = await sgai.extract(url, {
+ prompt: "Extract the article information",
+ schema: ArticlesArraySchema,
});
return response.data;
},
@@ -97,8 +96,6 @@ const result = await generateText({
console.log(result);
```
-**TODO ADD THE LOGS**
-
## Support
Need help with the integration?
@@ -107,7 +104,7 @@ Need help with the integration?
Report bugs and request features
diff --git a/introduction.mdx b/introduction.mdx
index d848a3c..ab84d8d 100644
--- a/introduction.mdx
+++ b/introduction.mdx
@@ -33,7 +33,7 @@ description: 'Welcome to ScrapeGraphAI - AI-Powered Web Data Extraction'
- Sign up and access your API key from the [dashboard](https://dashboard.scrapegraphai.com)
+ Sign up and access your API key from the [dashboard](https://scrapegraphai.com/dashboard)
Select from our specialized extraction services based on your needs
diff --git a/knowledge-base/account/api-keys.mdx b/knowledge-base/account/api-keys.mdx
index 71d593d..99a3bb0 100644
--- a/knowledge-base/account/api-keys.mdx
+++ b/knowledge-base/account/api-keys.mdx
@@ -7,7 +7,7 @@ Your API key authenticates every request you make to the ScrapeGraphAI API. Keep
## Finding your API key
-1. Log in to the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com).
+1. Log in to the [ScrapeGraphAI dashboard](https://scrapegraphai.com/dashboard).
2. Navigate to **Settings**.
3. Your API key is displayed in the **API Key** section.
@@ -24,9 +24,10 @@ client = Client(api_key="your-api-key")
```
```javascript JavaScript
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
-const result = await smartScraper("your-api-key", url, prompt);
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
+const { data } = await sgai.extract(url, { prompt });
```
```bash cURL
@@ -61,7 +62,7 @@ client = Client(api_key=os.getenv("SGAI_API_KEY"))
If your key has been exposed or you want to rotate it for security:
-1. Go to **Settings** in the [dashboard](https://dashboard.scrapegraphai.com).
+1. Go to **Settings** in the [dashboard](https://scrapegraphai.com/dashboard).
2. Click **Regenerate API Key**.
3. Copy the new key immediately — it will only be shown once.
4. Update all services and environment variables that use the old key.
diff --git a/knowledge-base/account/credits.mdx b/knowledge-base/account/credits.mdx
index 72d38f0..2fd375d 100644
--- a/knowledge-base/account/credits.mdx
+++ b/knowledge-base/account/credits.mdx
@@ -7,14 +7,29 @@ ScrapeGraphAI uses a credit system to measure API usage. Each successful API cal
## Credit costs per service
-| Service | Credits per request |
+| Service | Credits per request | Details |
+|---|---|---|
+| **Scrape** (markdown) | 1 | Basic page scrape returning markdown |
+| **Scrape** (screenshot) | 2 | Page scrape with a screenshot |
+| **Scrape** (branding analysis) | 25 | Full branding analysis of a page |
+| **Extract** | 5 | Structured data extraction |
+| **Search** (no prompt) | 2 per result | Search results without LLM processing |
+| **Search** (with prompt) | 5 per result | Search results processed by an LLM |
+| **Crawl** | 2 startup + per-page scrape cost | Startup fee plus scrape cost for each page |
+| **Monitor** | +5 | Additional credits when a change is detected |
+
+### Proxy modifiers
+
+Using a proxy adds extra credits on top of the base service cost:
+
+| Proxy mode | Additional credits |
|---|---|
-| SmartScraper | 1 |
-| SearchScraper | 5 |
-| Markdownify | 1 |
-| SmartCrawler | 1 per page crawled |
-| Sitemap | 1 |
-| AgenticScraper | Variable |
+| Fast / JS rendering | +0 |
+| Stealth | +4 |
+| JS + Stealth | +5 |
+| Auto (worst case) | +9 |
+
+For a full breakdown of plans and monthly credit allowances, see [Plans & Pricing](/knowledge-base/account/pricing).
Failed requests and requests that return an error are not charged.
@@ -22,7 +37,7 @@ ScrapeGraphAI uses a credit system to measure API usage. Each successful API cal
## Checking your credit balance
-Log in to the [dashboard](https://dashboard.scrapegraphai.com) to see:
+Log in to the [dashboard](https://scrapegraphai.com/dashboard) to see:
- **Remaining credits** for your current billing period
- **Usage history** broken down by service and date
@@ -54,7 +69,7 @@ When your credits are exhausted, the API returns an HTTP `402 Payment Required`
}
```
-Upgrade your plan or purchase additional credits from the [dashboard](https://dashboard.scrapegraphai.com).
+Upgrade your plan or purchase additional credits from the [dashboard](https://scrapegraphai.com/dashboard).
## Tips to reduce credit usage
diff --git a/knowledge-base/account/pricing.mdx b/knowledge-base/account/pricing.mdx
new file mode 100644
index 0000000..77124a4
--- /dev/null
+++ b/knowledge-base/account/pricing.mdx
@@ -0,0 +1,109 @@
+---
+title: Plans & Pricing
+description: 'Overview of ScrapeGraphAI plans, pricing, and what each tier includes'
+---
+
+ScrapeGraphAI offers flexible plans to fit teams of every size — from hobbyists to enterprises. All plans include access to every service; higher tiers unlock more credits, throughput, and support.
+
+## Plans
+
+
+### Free
+ **$0 / month**
+
+ - 500 API credits / month
+ - 10 requests / min
+ - 1 monitor
+ - 1 concurrent crawl
+
+
+### Starter
+ **$17 / month** (or $204 / year — save $36)
+
+ - 10,000 API credits / month
+ - 100 requests / min
+ - 5 monitors
+ - 3 concurrent crawls
+
+
+### Growth
+ **$85 / month** (or $1,020 / year — save $180)
+
+ - 100,000 API credits / month
+ - 500 requests / min
+ - 25 monitors
+ - 15 concurrent crawls
+ - Basic Proxy Rotation
+
+
+### Pro
+ **$425 / month** (or $5,100 / year — save $900)
+
+ - 750,000 API credits / month
+ - 5,000 requests / min
+ - 100 monitors
+ - 50 concurrent crawls
+ - Advanced Proxy Rotation
+ - Priority support
+
+
+
+Need more? **Enterprise** plans offer custom credit volumes, custom rate limits, dedicated support, and SLA guarantees. [Contact us](mailto:contact@scrapegraphai.com) for details.
+
+## Credit costs per service
+
+Every API call consumes credits. The exact cost depends on the service and the options you use.
+
+| Service | Base cost | Details |
+|---|---|---|
+| **Scrape** (markdown) | 1 credit | Basic page scrape returning markdown |
+| **Scrape** (screenshot) | 2 credits | Page scrape with a screenshot |
+| **Scrape** (branding analysis) | 25 credits | Full branding analysis of a page |
+| **Extract** | 5 credits | Structured data extraction |
+| **Search** (no prompt) | 2 credits / result | Search results without LLM processing |
+| **Search** (with prompt) | 5 credits / result | Search results processed by an LLM |
+| **Crawl** | 2 credits startup + per-page scrape cost | Startup fee plus scrape cost for each page |
+| **Monitor** | +5 credits | Additional credits charged when a change is detected |
+
+### Proxy modifiers
+
+Using a proxy adds extra credits on top of the base service cost:
+
+| Proxy mode | Additional credits |
+|---|---|
+| Fast / JS rendering | +0 |
+| Stealth | +4 |
+| JS + Stealth | +5 |
+| Auto (worst case) | +9 |
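The two tables combine additively: the proxy modifier is added on top of the base service cost, and a crawl pays its startup fee plus a scrape per page. A quick sketch of the arithmetic, using the documented values (illustrative only; your dashboard is authoritative if pricing changes):

```python
# Credit arithmetic from the tables above; values mirror this page.
BASE_COST = {
    "scrape-markdown": 1,
    "scrape-screenshot": 2,
    "scrape-branding": 25,
    "extract": 5,
}

PROXY_MODIFIER = {
    "fast": 0,          # Fast / JS rendering
    "js": 0,
    "stealth": 4,       # direct+stealth
    "js+stealth": 5,
    "auto": 9,          # worst case
}

def request_cost(service: str, proxy_mode: str = "fast") -> int:
    """Credits consumed by one successful request."""
    return BASE_COST[service] + PROXY_MODIFIER[proxy_mode]

def crawl_cost(pages: int, proxy_mode: str = "fast") -> int:
    """Crawl: 2-credit startup fee plus a markdown scrape per page."""
    return 2 + pages * request_cost("scrape-markdown", proxy_mode)

print(request_cost("extract", "js+stealth"))  # 10
print(crawl_cost(5))                          # 7
```

For example, a 5-page crawl in the default fast mode costs 7 credits (2 to start plus 1 per page), while an Extract request behind a `js+stealth` proxy costs 10.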
+
+
+ Failed requests and requests that return an error are **not** charged.
+
+
+## Comparing plans at a glance
+
+| | Free | Starter | Growth | Pro | Enterprise |
+|---|---|---|---|---|---|
+| **Monthly price** | $0 | $17 | $85 | $425 | Custom |
+| **Annual price** | $0 | $204 | $1,020 | $5,100 | Custom |
+| **Credits / month** | 500 | 10,000 | 100,000 | 750,000 | Custom |
+| **Requests / min** | 10 | 100 | 500 | 5,000 | Custom |
+| **Monitors** | 1 | 5 | 25 | 100 | Custom |
+| **Concurrent crawls** | 1 | 3 | 15 | 50 | Custom |
+| **Proxy rotation** | — | — | Basic | Advanced | Custom |
+| **Priority support** | — | — | — | Yes | Yes |
+| **SLA guarantee** | — | — | — | — | Yes |
+
+## Upgrading or downgrading
+
+You can change your plan at any time from the [dashboard](https://scrapegraphai.com/dashboard). When upgrading mid-cycle, you receive the additional credits immediately. Downgrades take effect at the start of the next billing period.
+
+## Annual billing
+
+All paid plans offer an annual billing option with significant savings:
+
+- **Starter** — save $36 / year
+- **Growth** — save $180 / year
+- **Pro** — save $900 / year
+
+Switch to annual billing from the [dashboard](https://scrapegraphai.com/dashboard).
diff --git a/knowledge-base/account/rate-limits.mdx b/knowledge-base/account/rate-limits.mdx
index 5a495d5..8c54205 100644
--- a/knowledge-base/account/rate-limits.mdx
+++ b/knowledge-base/account/rate-limits.mdx
@@ -7,12 +7,15 @@ ScrapeGraphAI enforces rate limits to ensure reliable performance for all users.
## Limits overview
-| Plan | Requests per minute | Concurrent jobs | Monthly credits |
-|---|---|---|---|
-| Free | 5 | 1 | 100 |
-| Starter | 30 | 5 | 5,000 |
-| Pro | 100 | 20 | 50,000 |
-| Enterprise | Custom | Custom | Custom |
+| Plan | Requests per minute | Concurrent crawls | Monitors | Monthly credits |
+|---|---|---|---|---|
+| Free | 10 | 1 | 1 | 500 |
+| Starter | 100 | 3 | 5 | 10,000 |
+| Growth | 500 | 15 | 25 | 100,000 |
+| Pro | 5,000 | 50 | 100 | 750,000 |
+| Enterprise | Custom | Custom | Custom | Custom |
+
+For full pricing details, see [Plans & Pricing](/knowledge-base/account/pricing).
Contact [support](mailto:contact@scrapegraphai.com) for custom limits or high-volume plans.
@@ -59,5 +62,5 @@ def scrape_with_backoff(client, url, prompt, max_retries=5):
## Increasing your limits
-- **Upgrade your plan** from the [dashboard](https://dashboard.scrapegraphai.com) to get higher limits immediately.
+- **Upgrade your plan** from the [dashboard](https://scrapegraphai.com/dashboard) to get higher limits immediately.
- **Enterprise customers** can request custom rate limit configurations by contacting [support](mailto:contact@scrapegraphai.com).
diff --git a/knowledge-base/ai-tools/cursor.mdx b/knowledge-base/ai-tools/cursor.mdx
index 017d321..a2ade0a 100644
--- a/knowledge-base/ai-tools/cursor.mdx
+++ b/knowledge-base/ai-tools/cursor.mdx
@@ -53,14 +53,14 @@ Ask Cursor:
> Write a JavaScript function using scrapegraph-js that extracts product details from an e-commerce page.
```javascript
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
+
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
async function extractProduct(url) {
- return await smartScraper(
- "your-api-key",
- url,
- "Extract the product name, price, and availability"
- );
+ return await sgai.extract(url, {
+ prompt: "Extract the product name, price, and availability",
+ });
}
```
diff --git a/knowledge-base/ai-tools/lovable.mdx b/knowledge-base/ai-tools/lovable.mdx
index ab252ff..4d8b928 100644
--- a/knowledge-base/ai-tools/lovable.mdx
+++ b/knowledge-base/ai-tools/lovable.mdx
@@ -13,7 +13,7 @@ Because Lovable apps run in the browser, API calls to ScrapeGraphAI must be made
### 1. Get your API key
-Log in to the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com) and copy your API key from the Settings page.
+Log in to the [ScrapeGraphAI dashboard](https://scrapegraphai.com/dashboard) and copy your API key from the Settings page.
### 2. Create a Supabase Edge Function
diff --git a/knowledge-base/cli/getting-started.mdx b/knowledge-base/cli/getting-started.mdx
index cb64ee3..d68c913 100644
--- a/knowledge-base/cli/getting-started.mdx
+++ b/knowledge-base/cli/getting-started.mdx
@@ -39,7 +39,7 @@ Package: [just-scrape](https://www.npmjs.com/package/just-scrape) on npm | [GitH
## Setting up your API key
-The CLI needs a ScrapeGraphAI API key. Get one from the [dashboard](https://dashboard.scrapegraphai.com). The CLI checks for it in this order:
+The CLI needs a ScrapeGraphAI API key. Get one from the [dashboard](https://scrapegraphai.com/dashboard). The CLI checks for it in this order:
1. **Environment variable** — `export SGAI_API_KEY="sgai-..."`
2. **`.env` file** — `SGAI_API_KEY=sgai-...` in the project root
@@ -53,19 +53,14 @@ The easiest approach for a new machine is to just run any command — the CLI wi
| Variable | Description | Default |
|---|---|---|
| `SGAI_API_KEY` | ScrapeGraphAI API key | — |
-| `JUST_SCRAPE_API_URL` | Override the API base URL | `https://api.scrapegraphai.com/v1` |
-| `JUST_SCRAPE_TIMEOUT_S` | Request/polling timeout in seconds | `120` |
-| `JUST_SCRAPE_DEBUG` | Set to `1` to enable debug logging to stderr | `0` |
+| `SGAI_API_URL` | Override the API base URL | `https://api.scrapegraphai.com` |
+| `SGAI_TIMEOUT_S` | Request timeout in seconds | `30` |
-## Verify your setup
-
-Run a quick health check to confirm the key is valid:
+Legacy variables (`JUST_SCRAPE_API_URL`, `JUST_SCRAPE_TIMEOUT_S`, `JUST_SCRAPE_DEBUG`) are still supported for backwards compatibility.
-```bash
-just-scrape validate
-```
+## Verify your setup
-Check your credit balance:
+Check your credit balance to confirm the key is valid:
```bash
just-scrape credits
@@ -74,7 +69,7 @@ just-scrape credits
## Your first scrape
```bash
-just-scrape smart-scraper https://news.ycombinator.com \
+just-scrape extract https://news.ycombinator.com \
-p "Extract the top 5 story titles and their URLs"
```
diff --git a/knowledge-base/scraping/custom-headers.mdx b/knowledge-base/scraping/custom-headers.mdx
index fd4b482..b69d919 100644
--- a/knowledge-base/scraping/custom-headers.mdx
+++ b/knowledge-base/scraping/custom-headers.mdx
@@ -26,19 +26,18 @@ response = client.smartscraper(
```
```javascript
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
-const result = await smartScraper(
- "your-api-key",
- "https://example.com/protected-page",
- "Extract the main content",
- {
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
+const { data } = await sgai.extract("https://example.com/protected-page", {
+ prompt: "Extract the main content",
+ fetchConfig: {
headers: {
Authorization: "Bearer your-token-here",
Cookie: "session=abc123",
},
- }
-);
+ },
+});
```
See the [headers parameter documentation](/services/additional-parameters/headers) for the full reference.
diff --git a/knowledge-base/scraping/javascript-rendering.mdx b/knowledge-base/scraping/javascript-rendering.mdx
index 5ab0afe..732b6d8 100644
--- a/knowledge-base/scraping/javascript-rendering.mdx
+++ b/knowledge-base/scraping/javascript-rendering.mdx
@@ -26,14 +26,13 @@ response = client.smartscraper(
```
```javascript
-import { smartScraper } from "scrapegraph-js";
-
-const result = await smartScraper(
- "your-api-key",
- "https://example.com/products",
- "Extract all product names and prices",
- { wait_ms: 2000 }
-);
+import { scrapegraphai } from "scrapegraph-js";
+
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
+const { data } = await sgai.extract("https://example.com/products", {
+ prompt: "Extract all product names and prices",
+ fetchConfig: { wait: 2000 },
+});
```
See the [wait_ms parameter documentation](/services/additional-parameters/wait-ms) for more details.
diff --git a/knowledge-base/scraping/pagination.mdx b/knowledge-base/scraping/pagination.mdx
index b086aaa..48466d3 100644
--- a/knowledge-base/scraping/pagination.mdx
+++ b/knowledge-base/scraping/pagination.mdx
@@ -43,19 +43,17 @@ print(f"Total products extracted: {len(all_results)}")
```
```javascript
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
-const apiKey = "your-api-key";
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
const allResults = [];
for (let page = 1; page <= 5; page++) {
const url = `https://example.com/products?page=${page}`;
- const result = await smartScraper(
- apiKey,
- url,
- "Extract all product names and prices on this page"
- );
- allResults.push(...(result?.products ?? []));
+ const { data } = await sgai.extract(url, {
+ prompt: "Extract all product names and prices on this page",
+ });
+ allResults.push(...(data?.products ?? []));
}
```
diff --git a/knowledge-base/scraping/proxy.mdx b/knowledge-base/scraping/proxy.mdx
index 1350b71..2713ad8 100644
--- a/knowledge-base/scraping/proxy.mdx
+++ b/knowledge-base/scraping/proxy.mdx
@@ -1,88 +1,131 @@
---
-title: Scraping behind a proxy
-description: 'Route requests through your own proxy for geo-targeting or privacy'
+title: Proxy & Fetch Configuration
+description: 'Control proxy routing, stealth mode, and geo-targeting with FetchConfig'
---
-Using a proxy lets you route ScrapeGraphAI requests through a specific IP address or geographic location. This is useful for accessing geo-restricted content, bypassing IP-based blocks, or testing region-specific pages.
+In v2, all proxy and fetch behaviour is controlled through the `FetchConfig` object. You can set the proxy strategy (`mode`), country-based geotargeting (`country`), wait times, scrolling, custom headers, and more.
-## How to pass a proxy
+See the [full proxy reference](/services/additional-parameters/proxy) for all available options.
-Use the `proxy` parameter available in SmartScraper, SearchScraper, and Markdownify:
+## Choosing a fetch mode
-```python
-from scrapegraph_py import Client
+The `mode` parameter controls how pages are retrieved:
+
+| Mode | Description |
+|------|-------------|
+| `auto` | Automatically selects the best strategy (default) |
+| `fast` | Direct HTTP fetch, no JS rendering — fastest option |
+| `js` | Headless browser for JavaScript-heavy pages |
+| `direct+stealth` | Residential proxy with stealth headers (no JS) |
+| `js+stealth` | JS rendering + residential/stealth proxy |
+
+## Examples
+
+### Geo-targeted content
+
+Access content from a specific country using the `country` parameter:
+
+
+
+```python Python
+from scrapegraph_py import Client, FetchConfig
client = Client(api_key="your-api-key")
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract the main content",
- proxy="http://username:password@proxy-host:8080",
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract the main content",
+ fetch_config=FetchConfig(country="de"), # Route through Germany
)
```
-```javascript
-import { smartScraper } from "scrapegraph-js";
-
-const result = await smartScraper(
- "your-api-key",
- "https://example.com",
- "Extract the main content",
- {
- proxy: "http://username:password@proxy-host:8080",
- }
-);
+```javascript JavaScript
+import { scrapegraphai } from 'scrapegraph-js';
+
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
+
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract the main content',
+ fetchConfig: { country: 'de' },
+});
```
-See the [proxy parameter documentation](/services/additional-parameters/proxy) for the full reference.
+
-## Proxy URL format
+### Stealth mode for protected sites
-```
-http://username:password@host:port
-socks5://username:password@host:port
-```
+Use stealth modes to bypass anti-bot protections:
-If the proxy does not require authentication:
+
-```
-http://host:port
+```python Python
+from scrapegraph_py import Client, FetchConfig
+
+client = Client(api_key="your-api-key")
+
+response = client.scrape(
+ url="https://protected-site.com",
+ format="markdown",
+ fetch_config=FetchConfig(
+ mode="js+stealth",
+ wait=3000,
+ scrolls=3,
+ country="us",
+ ),
+)
```
-## Common use cases
+```javascript JavaScript
+const { data } = await sgai.scrape('https://protected-site.com', {
+ format: 'markdown',
+ fetchConfig: {
+ mode: 'js+stealth',
+ wait: 3000,
+ scrolls: 3,
+ country: 'us',
+ },
+});
+```
-### Geo-targeted content
+
-Access content that is only available in a specific country:
+### Custom headers and cookies
-```python
-# Using a proxy located in Germany
-proxy = "http://user:pass@de-proxy.example.com:8080"
-```
+Pass custom HTTP headers or cookies with your requests:
-### Bypassing IP-based rate limits
+
-If the target website blocks your IP after too many requests, rotate through a pool of proxy IPs:
+```python Python
+from scrapegraph_py import Client, FetchConfig
-```python
-import itertools
+client = Client(api_key="your-api-key")
-proxies = itertools.cycle([
- "http://user:pass@proxy1.example.com:8080",
- "http://user:pass@proxy2.example.com:8080",
- "http://user:pass@proxy3.example.com:8080",
-])
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product details",
+ fetch_config=FetchConfig(
+ headers={"Accept-Language": "en-US"},
+ cookies={"session": "abc123"},
+ ),
+)
+```
-for url in urls_to_scrape:
- response = client.smartscraper(
- website_url=url,
- user_prompt="Extract the product details",
- proxy=next(proxies),
- )
+```javascript JavaScript
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract product details',
+ fetchConfig: {
+ headers: { 'Accept-Language': 'en-US' },
+ cookies: { session: 'abc123' },
+ },
+});
```
+
+
## Tips
-- Use a reputable proxy provider for reliable uptime and performance.
-- Test your proxy connection independently before passing it to ScrapeGraphAI to rule out proxy-side issues.
-- Do not use public/free proxies for sensitive data — they may log or modify your traffic.
+- Start with `mode: "auto"` and only switch to a specific mode if you need to.
+- Use `js+stealth` for sites with strong anti-bot protections.
+- Add `wait` time for pages that load content dynamically after the initial render.
+- Use `scrolls` to trigger lazy-loaded content on infinite-scroll pages.
+- The `country` parameter doesn't affect pricing — credits are charged the same regardless of proxy location.
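The first two tips can be combined into an escalation pattern: try the cheapest mode first and step up only when a fetch fails. A minimal, SDK-agnostic sketch; `fetch_page` is a hypothetical stand-in for your actual client call, not part of the SDK:

```python
# Escalate through fetch modes from cheapest to most robust.
MODE_LADDER = ["auto", "js", "js+stealth"]

def fetch_with_escalation(fetch_page, url):
    """Call fetch_page(url, mode=...) with each mode until one succeeds."""
    last_error = None
    for mode in MODE_LADDER:
        try:
            return fetch_page(url, mode=mode)
        except RuntimeError as err:  # swap in the SDK's real error type
            last_error = err
    raise last_error

# Demo with a fake fetcher that only succeeds in stealth mode:
def fake_fetch(url, mode):
    if mode != "js+stealth":
        raise RuntimeError(f"blocked in mode {mode}")
    return {"mode": mode, "url": url}

result = fetch_with_escalation(fake_fetch, "https://example.com")
print(result["mode"])  # js+stealth
```

This keeps per-request cost low on well-behaved sites while still handling protected pages without manual intervention.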
diff --git a/knowledge-base/troubleshooting/empty-results.mdx b/knowledge-base/troubleshooting/empty-results.mdx
index b74e467..0163d04 100644
--- a/knowledge-base/troubleshooting/empty-results.mdx
+++ b/knowledge-base/troubleshooting/empty-results.mdx
@@ -47,10 +47,10 @@ If you define an `output_schema` with required fields, the LLM will return `null
If you have exhausted your credits or are being rate-limited, the API may return an empty or error response.
-**Fix:** Check your [dashboard](https://dashboard.scrapegraphai.com) for remaining credits and current usage.
+**Fix:** Check your [dashboard](https://scrapegraphai.com/dashboard) for remaining credits and current usage.
## Debugging tips
- Log the full API response — the `result` key contains the extracted data; `status` and `error` keys may contain useful information.
- Test the URL with a simple prompt like `"What is the main heading of this page?"` to verify that extraction works at all.
-- Use the [interactive playground](https://dashboard.scrapegraphai.com) to test your URL and prompt before integrating.
+- Use the [interactive playground](https://scrapegraphai.com/dashboard) to test your URL and prompt before integrating.
diff --git a/knowledge-base/troubleshooting/rate-limiting.mdx b/knowledge-base/troubleshooting/rate-limiting.mdx
index 6a5732f..0363681 100644
--- a/knowledge-base/troubleshooting/rate-limiting.mdx
+++ b/knowledge-base/troubleshooting/rate-limiting.mdx
@@ -28,7 +28,7 @@ When you exceed the rate limit, the API returns an HTTP `429 Too Many Requests`
| Enterprise | Custom | Custom |
- Check the [dashboard](https://dashboard.scrapegraphai.com) for up-to-date limits for your current plan.
+ Check the [dashboard](https://scrapegraphai.com/dashboard) for up-to-date limits for your current plan.
## How to handle rate limits in code
@@ -56,12 +56,14 @@ def scrape_with_retry(url: str, prompt: str, max_retries: int = 3):
### JavaScript — with retry
```javascript
-import { smartScraper } from "scrapegraph-js";
+import { scrapegraphai } from "scrapegraph-js";
-async function scrapeWithRetry(apiKey, url, prompt, retries = 3) {
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
+
+async function scrapeWithRetry(url, prompt, retries = 3) {
for (let i = 0; i < retries; i++) {
try {
- return await smartScraper(apiKey, url, prompt);
+ return await sgai.extract(url, { prompt });
} catch (err) {
if (err.status === 429) {
const wait = Math.pow(2, i) * 1000;
diff --git a/logo/dark.svg b/logo/dark.svg
deleted file mode 100644
index 33285d6..0000000
--- a/logo/dark.svg
+++ /dev/null
@@ -1,145 +0,0 @@
-
-
-
-
diff --git a/logo/light.svg b/logo/light.svg
deleted file mode 100644
index 33285d6..0000000
--- a/logo/light.svg
+++ /dev/null
@@ -1,145 +0,0 @@
-
-
-
-
diff --git a/logos/logo-color.svg b/logos/logo-color.svg
new file mode 100644
index 0000000..6fb828b
--- /dev/null
+++ b/logos/logo-color.svg
@@ -0,0 +1,15 @@
+
diff --git a/logos/logo-dark-alt.svg b/logos/logo-dark-alt.svg
new file mode 100644
index 0000000..cd47e15
--- /dev/null
+++ b/logos/logo-dark-alt.svg
@@ -0,0 +1,15 @@
+
diff --git a/logos/logo-dark.svg b/logos/logo-dark.svg
new file mode 100644
index 0000000..8545571
--- /dev/null
+++ b/logos/logo-dark.svg
@@ -0,0 +1,15 @@
+
diff --git a/logos/logo-light.svg b/logos/logo-light.svg
new file mode 100644
index 0000000..6fb828b
--- /dev/null
+++ b/logos/logo-light.svg
@@ -0,0 +1,15 @@
+
diff --git a/resources/blog.mdx b/resources/blog.mdx
index 0d4aa0a..2076c5e 100644
--- a/resources/blog.mdx
+++ b/resources/blog.mdx
@@ -44,7 +44,7 @@ Master the art of prompt engineering for AI web scraping. This comprehensive gui
## Additional Resources
- **Complete Guide**: [The Art of Prompting](https://scrapegraphai.com/blog/prompt-engineering-guide)
-- **Practice in Playground**: [Test your prompts](https://dashboard.scrapegraphai.com/playground)
+- **Practice in Playground**: [Test your prompts](https://scrapegraphai.com/dashboard)
- **Community Support**: [Discord discussions](https://discord.gg/uJN7TYcpNa)
- **Examples**: Check our [Cookbook](/cookbook/introduction) for real-world implementations
diff --git a/sdks/javascript.mdx b/sdks/javascript.mdx
index ed4fb55..4dd33f2 100644
--- a/sdks/javascript.mdx
+++ b/sdks/javascript.mdx
@@ -1,6 +1,6 @@
---
title: "JavaScript SDK"
-description: "Official JavaScript/TypeScript SDK for ScrapeGraphAI"
+description: "Official JavaScript/TypeScript SDK for ScrapeGraphAI v2"
icon: "js"
---
@@ -22,8 +22,6 @@ icon: "js"
## Installation
-Install the package using npm, pnpm, yarn or bun:
-
```bash
# Using npm
npm i scrapegraph-js
@@ -38,82 +36,77 @@ yarn add scrapegraph-js
bun add scrapegraph-js
```
-## Features
+
+v2 requires **Node.js >= 22**.
+
-- **AI-Powered Extraction**: Smart web scraping with artificial intelligence
-- **Async by Design**: Fully asynchronous architecture
-- **Type Safety**: Built-in TypeScript support with Zod schemas
-- **Zero Exceptions**: All errors wrapped in `ApiResult` — no try/catch needed
-- **Developer Friendly**: Comprehensive error handling and debug logging
+## What's New in v2
-## Quick Start
+- **Factory pattern**: Create a client with `scrapegraphai({ apiKey })` instead of importing individual functions
+- **Renamed methods**: `smartScraper` → `extract`, `searchScraper` → `search`
+- **camelCase parameters**: All params are now camelCase (e.g., `fetchConfig` instead of `fetch_config`)
+- **Throws on error**: Methods return `{ data, requestId }` and throw on failure (no more `ApiResult` wrapper)
+- **Native Zod support**: Pass Zod schemas directly to `schema` parameter
+- **Namespace methods**: `crawl.start()`, `monitor.create()`, etc.
+- **Removed**: `agenticScraper`, `generateSchema`, `sitemap`, `checkHealth`, `markdownify`
-### Basic example
+
+v2 is a breaking release. If you're upgrading from v1, see the [Migration Guide](https://github.com/ScrapeGraphAI/scrapegraph-js/blob/main/MIGRATION.md).
+
-
- Store your API keys securely in environment variables. Use `.env` files and
- libraries like `dotenv` to load them into your app.
-
+## Quick Start
```javascript
-import { smartScraper } from "scrapegraph-js";
-import "dotenv/config";
+import { scrapegraphai } from "scrapegraph-js";
-const apiKey = process.env.SGAI_APIKEY;
+const sgai = scrapegraphai({ apiKey: "your-api-key" });
-const response = await smartScraper(apiKey, {
- website_url: "https://example.com",
- user_prompt: "What does the company do?",
-});
+const { data, requestId } = await sgai.extract(
+ "https://example.com",
+ { prompt: "What does the company do?" }
+);
-if (response.status === "error") {
- console.error("Error:", response.error);
-} else {
- console.log(response.data.result);
-}
+console.log(data);
```
+
+Store your API keys securely in environment variables. Use `.env` files and
+libraries like `dotenv` to load them into your app.
+
+
+### Client Options
+
+| Parameter | Type | Default | Description |
+| ---------- | ------ | -------------------------------- | ------------------------------- |
+| apiKey | string | Required | Your ScrapeGraphAI API key |
+| baseUrl | string | `https://api.scrapegraphai.com` | API base URL |
+| timeout | number | `30000` | Request timeout in ms |
+| maxRetries | number | `2` | Maximum number of retries |
+
## Services
-### SmartScraper
+### extract()
-Extract specific information from any webpage using AI:
+Extract structured data from any webpage using AI. Replaces the v1 `smartScraper` function.
```javascript
-const response = await smartScraper(apiKey, {
- website_url: "https://example.com",
- user_prompt: "Extract the main content",
-});
-```
-
-All functions return an `ApiResult` object:
-```typescript
-type ApiResult = {
- status: "success" | "error";
- data: T | null;
- error?: string;
- elapsedMs: number;
-};
+const { data, requestId } = await sgai.extract(
+ "https://example.com",
+ { prompt: "Extract the main heading and description" }
+);
```
#### Parameters
-| Parameter | Type | Required | Description |
-| --------------- | ------- | -------- | ----------------------------------------------------------------------------------- |
-| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
-| user_prompt | string | Yes | A textual description of what you want to extract. |
-| website_url | string | No* | The URL of the webpage to scrape. *One of `website_url`, `website_html`, or `website_markdown` is required. |
-| output_schema | object | No | A Zod schema (converted to JSON) that describes the structure of the response. |
-| number_of_scrolls | number | No | Number of scrolls for infinite scroll pages (0-50). |
-| stealth | boolean | No | Enable anti-detection mode (+4 credits). |
-| headers | object | No | Custom HTTP headers. |
-| mock | boolean | No | Enable mock mode for testing. |
-| wait_ms | number | No | Page load wait time in ms (default: 3000). |
-| country_code | string | No | Proxy routing country code (e.g., "us"). |
-
-
-Define a simple schema using Zod:
+| Parameter | Type | Required | Description |
+| -------------------- | ----------- | -------- | -------------------------------------------------------- |
+| url | string | Yes | The URL of the webpage to scrape |
+| options.prompt | string | Yes | A description of what you want to extract |
+| options.schema | ZodSchema / object | No | Zod schema or JSON schema for structured response |
+| options.fetchConfig | FetchConfig | No | Fetch configuration |
+| options.llmConfig | LlmConfig | No | LLM configuration |
+
+Define the response shape with a Zod schema:
+
```javascript
import { z } from "zod";
@@ -122,301 +115,222 @@ const ArticleSchema = z.object({
author: z.string().describe("The author's name"),
publishDate: z.string().describe("Article publication date"),
content: z.string().describe("Main article content"),
- category: z.string().describe("Article category"),
});
-const ArticlesArraySchema = z
- .array(ArticleSchema)
- .describe("Array of articles");
+const { data } = await sgai.extract(
+ "https://example.com/blog/article",
+ {
+ prompt: "Extract the article information",
+ schema: ArticleSchema,
+ }
+);
-const response = await smartScraper(apiKey, {
- website_url: "https://example.com/blog/article",
- user_prompt: "Extract the article information",
- output_schema: ArticlesArraySchema,
-});
-
-console.log(`Title: ${response.data.result.title}`);
-console.log(`Author: ${response.data.result.author}`);
-console.log(`Published: ${response.data.result.publishDate}`);
+console.log(`Title: ${data.title}`);
+console.log(`Author: ${data.author}`);
```
-
-
-Define a complex schema for nested data structures:
-
+
+Combine `fetchConfig` and `llmConfig` to control both fetching and extraction:
+
```javascript
-import { z } from "zod";
-
-const EmployeeSchema = z.object({
- name: z.string().describe("Employee's full name"),
- position: z.string().describe("Job title"),
- department: z.string().describe("Department name"),
- email: z.string().describe("Email address"),
-});
-
-const OfficeSchema = z.object({
- location: z.string().describe("Office location/city"),
- address: z.string().describe("Full address"),
- phone: z.string().describe("Contact number"),
-});
-
-const CompanySchema = z.object({
- name: z.string().describe("Company name"),
- description: z.string().describe("Company description"),
- industry: z.string().describe("Industry sector"),
- foundedYear: z.number().describe("Year company was founded"),
- employees: z.array(EmployeeSchema).describe("List of key employees"),
- offices: z.array(OfficeSchema).describe("Company office locations"),
- website: z.string().url().describe("Company website URL"),
-});
+const { data } = await sgai.extract(
+ "https://example.com",
+ {
+ prompt: "Extract the main heading",
+ fetchConfig: {
+ mode: 'js+stealth',
+ wait: 2000,
+ scrolls: 3,
+ },
+ llmConfig: {
+ temperature: 0.3,
+ maxTokens: 1000,
+ },
+ }
+);
+```
+
-const response = await smartScraper(apiKey, {
- website_url: "https://example.com/about",
- user_prompt: "Extract detailed company information including employees and offices",
- output_schema: CompanySchema,
-});
+### search()
-console.log(`Company: ${response.data.result.name}`);
-console.log("\nKey Employees:");
-response.data.result.employees.forEach((employee) => {
- console.log(`- ${employee.name} (${employee.position})`);
-});
+Search the web and extract information. Replaces the v1 `searchScraper` function.
-console.log("\nOffice Locations:");
-response.data.result.offices.forEach((office) => {
- console.log(`- ${office.location}: ${office.address}`);
-});
+```javascript
+const { data } = await sgai.search(
+ "What are the key features and pricing of ChatGPT Plus?",
+ { numResults: 5 }
+);
```
-
+#### Parameters
-
-For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:
+| Parameter | Type | Required | Description |
+| -------------------- | ----------- | -------- | -------------------------------------------------------- |
+| query | string | Yes | The search query |
+| options.numResults | number | No | Number of results (3-20). Default: 5 |
+| options.schema | ZodSchema / object | No | Schema for structured response |
+| options.fetchConfig | FetchConfig | No | Fetch configuration |
+| options.llmConfig | LlmConfig | No | LLM configuration |
+
```javascript
-import { smartScraper } from 'scrapegraph-js';
-import { z } from 'zod';
-
-const apiKey = 'your-api-key';
+import { z } from "zod";
const ProductSchema = z.object({
- name: z.string().describe('Product name'),
- price: z.string().describe('Product price'),
- description: z.string().describe('Product description'),
- availability: z.string().describe('Product availability status')
+ name: z.string().describe("Product name"),
+ price: z.string().describe("Product price"),
+ features: z.array(z.string()).describe("Key features"),
});
-const response = await smartScraper(apiKey, {
- website_url: 'https://example-react-store.com/products/123',
- user_prompt: 'Extract product details including name, price, description, and availability',
- output_schema: ProductSchema,
-});
+const { data } = await sgai.search(
+ "Find information about iPhone 15 Pro",
+ {
+ schema: ProductSchema,
+ numResults: 5,
+ }
+);
-if (response.status === 'error') {
- console.error('Error:', response.error);
-} else {
- console.log('Product:', response.data.result.name);
- console.log('Price:', response.data.result.price);
- console.log('Available:', response.data.result.availability);
-}
+console.log(`Product: ${data.name}`);
+console.log(`Price: ${data.price}`);
```
-
-### SearchScraper
+### scrape()
-Search and extract information from multiple web sources using AI:
+Convert any webpage to markdown, HTML, screenshot, or branding format.
```javascript
-const response = await searchScraper(apiKey, {
- user_prompt: "Find the best restaurants in San Francisco",
- location_geo_code: "us",
- time_range: "past_week",
-});
+const { data } = await sgai.scrape("https://example.com");
+console.log(data);
```
#### Parameters
-| Parameter | Type | Required | Description |
-| ------------------ | ------- | -------- | ---------------------------------------------------------------------------------- |
-| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
-| user_prompt | string | Yes | A textual description of what you want to achieve. |
-| num_results | number | No | Number of websites to search (3-20). Default: 3. |
-| extraction_mode | boolean | No | **true** = AI extraction mode (10 credits/page), **false** = markdown mode (2 credits/page). |
-| output_schema | object | No | Zod schema for structured response format (AI extraction mode only). |
-| location_geo_code | string | No | Geo code for location-based search (e.g., "us"). |
-| time_range | string | No | Time range filter. Options: "past_hour", "past_24_hours", "past_week", "past_month", "past_year". |
+| Parameter | Type | Required | Description |
+| -------------------- | ----------- | -------- | -------------------------------------------------------- |
+| url | string | Yes | The URL of the webpage to scrape |
+| options.format | string | No | `"markdown"`, `"html"`, `"screenshot"`, `"branding"` |
+| options.fetchConfig | FetchConfig | No | Fetch configuration |
-
-Define a simple schema using Zod:
+### crawl
-```javascript
-import { z } from "zod";
+Manage multi-page crawl operations asynchronously.
-const ArticleSchema = z.object({
- title: z.string().describe("The article title"),
- author: z.string().describe("The author's name"),
- publishDate: z.string().describe("Article publication date"),
- content: z.string().describe("Main article content"),
- category: z.string().describe("Article category"),
+```javascript
+// Start a crawl
+const job = await sgai.crawl.start("https://example.com", {
+ maxDepth: 2,
+ maxPages: 10,
+ includePatterns: ["/blog/*", "/docs/**"],
+ excludePatterns: ["/admin/*", "/api/*"],
});
+console.log(`Crawl started: ${job.data.id}`);
-const response = await searchScraper(apiKey, {
- user_prompt: "Find news about the latest trends in AI",
- output_schema: ArticleSchema,
- location_geo_code: "us",
- time_range: "past_week",
-});
+// Check status
+const status = await sgai.crawl.status(job.data.id);
+console.log(`Status: ${status.data.status}`);
-console.log(`Title: ${response.data.result.title}`);
-console.log(`Author: ${response.data.result.author}`);
-console.log(`Published: ${response.data.result.publishDate}`);
+// Stop / Resume
+await sgai.crawl.stop(job.data.id);
+await sgai.crawl.resume(job.data.id);
```
-
+### monitor
-
-Define a complex schema for nested data structures:
+Create and manage site monitoring jobs.
```javascript
-import { z } from "zod";
-
-const EmployeeSchema = z.object({
- name: z.string().describe("Employee's full name"),
- position: z.string().describe("Job title"),
- department: z.string().describe("Department name"),
- email: z.string().describe("Email address"),
+// Create a monitor
+const monitor = await sgai.monitor.create({
+ url: "https://example.com",
+ prompt: "Track price changes",
+ schedule: "daily",
});
-const OfficeSchema = z.object({
- location: z.string().describe("Office location/city"),
- address: z.string().describe("Full address"),
- phone: z.string().describe("Contact number"),
-});
-
-const RestaurantSchema = z.object({
- name: z.string().describe("Restaurant name"),
- address: z.string().describe("Restaurant address"),
- rating: z.number().describe("Restaurant rating"),
- website: z.string().url().describe("Restaurant website URL"),
-});
+// List all monitors
+const monitors = await sgai.monitor.list();
-const response = await searchScraper(apiKey, {
- user_prompt: "Find the best restaurants in San Francisco",
- output_schema: RestaurantSchema,
- location_geo_code: "us",
- time_range: "past_month",
-});
+// Get / Pause / Resume / Delete
+const details = await sgai.monitor.get(monitor.data.id);
+await sgai.monitor.pause(monitor.data.id);
+await sgai.monitor.resume(monitor.data.id);
+await sgai.monitor.delete(monitor.data.id);
```
-
+### credits()
-
-Use markdown mode for cost-effective content gathering:
+Check your account credit balance.
```javascript
-import { searchScraper } from 'scrapegraph-js';
+const { data } = await sgai.credits();
+console.log(`Remaining: ${data.remainingCredits}`);
+console.log(`Used: ${data.totalCreditsUsed}`);
+```
-const apiKey = 'your-api-key';
+### history()
-const response = await searchScraper(apiKey, {
- user_prompt: 'Latest developments in artificial intelligence',
- num_results: 3,
- extraction_mode: false,
- location_geo_code: "us",
- time_range: "past_week",
+Retrieve paginated request history.
+
+```javascript
+const { data } = await sgai.history({
+ endpoint: "extract",
+ status: "completed",
+ limit: 20,
+ offset: 0,
});
-if (response.status === 'error') {
- console.error('Error:', response.error);
-} else {
- const markdownContent = response.data.markdown_content;
- console.log('Markdown content length:', markdownContent.length);
- console.log('Reference URLs:', response.data.reference_urls);
- console.log('Content preview:', markdownContent.substring(0, 500) + '...');
-}
+data.items.forEach((entry) => {
+ console.log(`${entry.createdAt} - ${entry.endpoint} - ${entry.status}`);
+});
```
-**Markdown Mode Benefits:**
-- **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
-- **Full content**: Get complete page content in markdown format
-- **Faster**: No AI processing overhead
-- **Perfect for**: Content analysis, bulk data collection, building datasets
+## Configuration Objects
-
+### FetchConfig
-
-Filter search results by date range to get only recent information:
+Controls how pages are fetched. See the [proxy configuration guide](/services/additional-parameters/proxy) for details on modes and geotargeting.
```javascript
-import { searchScraper } from 'scrapegraph-js';
-
-const apiKey = 'your-api-key';
-
-const response = await searchScraper(apiKey, {
- user_prompt: 'Latest news about AI developments',
- num_results: 5,
- time_range: 'past_week', // Options: 'past_hour', 'past_24_hours', 'past_week', 'past_month', 'past_year'
-});
-
-if (response.status === 'error') {
- console.error('Error:', response.error);
-} else {
- console.log('Recent AI news:', response.data.result);
- console.log('Reference URLs:', response.data.reference_urls);
+{
+ mode: 'js+stealth', // Proxy strategy: auto, fast, js, direct+stealth, js+stealth
+ timeout: 15000, // Request timeout in ms (1000-60000)
+ wait: 2000, // Wait after page load in ms (0-30000)
+ scrolls: 3, // Number of scrolls (0-100)
+ country: 'us', // Proxy country code (ISO 3166-1 alpha-2)
+ headers: { 'X-Custom': 'header' },
+ cookies: { key: 'value' },
+ mock: false, // Enable mock mode for testing
}
```
-**Time Range Options:**
-- `past_hour` - Results from the past hour
-- `past_24_hours` - Results from the past 24 hours
-- `past_week` - Results from the past week
-- `past_month` - Results from the past month
-- `past_year` - Results from the past year
-
-**Use Cases:**
-- Finding recent news and updates
-- Tracking time-sensitive information
-- Getting latest product releases
-- Monitoring recent market changes
-
-
-
-### Markdownify
+### LlmConfig
-Convert any webpage into clean, formatted markdown:
+Controls LLM behavior for AI-powered methods.
```javascript
-const response = await markdownify(apiKey, {
- website_url: "https://example.com",
-});
+{
+ model: "gpt-4o-mini", // LLM model to use
+ temperature: 0.3, // Response creativity (0-1)
+ maxTokens: 1000, // Maximum response tokens
+ chunker: { // Content chunking strategy
+ size: "dynamic", // Chunk size (number or "dynamic")
+ overlap: 100, // Overlap between chunks
+ },
+}
```
-#### Parameters
-
-| Parameter | Type | Required | Description |
-| ----------- | ------- | -------- | ---------------------------------------------- |
-| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
-| website_url | string | Yes | The URL of the webpage to convert to markdown. |
-| wait_ms | number | No | Page load wait time in ms (default: 3000). |
-| stealth | boolean | No | Enable anti-detection mode (+4 credits). |
-| country_code| string | No | Proxy routing country code (e.g., "us"). |
+## Error Handling
-## API Credits
-
-Check your available API credits:
+v2 throws errors instead of returning `ApiResult`. Use try/catch:
```javascript
-import { getCredits } from "scrapegraph-js";
-
-const credits = await getCredits(apiKey);
-
-if (credits.status === "error") {
- console.error("Error fetching credits:", credits.error);
-} else {
- console.log("Remaining credits:", credits.data.remaining_credits);
- console.log("Total used:", credits.data.total_credits_used);
+try {
+ const { data, requestId } = await sgai.extract(
+ "https://example.com",
+ { prompt: "Extract the title" }
+ );
+ console.log(data);
+} catch (err) {
+ console.error(`Request failed: ${err.message}`);
}
```
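Since v2 surfaces failures as thrown errors, retry logic composes naturally with try/catch (see the rate-limiting guide for a full retry example). The backoff schedule itself is pure arithmetic and easy to unit test; a minimal, SDK-independent sketch:

```javascript
// Exponential backoff with a cap: 1s, 2s, 4s, ... up to maxMs.
function backoffMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

console.log([0, 1, 2, 3, 4, 5].map((i) => backoffMs(i)));
// [1000, 2000, 4000, 8000, 16000, 30000]
```

Sleep for `backoffMs(i)` between attempts, and retry only on transient failures such as HTTP 429 or 5xx responses.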
@@ -438,9 +352,3 @@ if (credits.status === "error") {
Get help from our development team
-
-
- This project is licensed under the MIT License. See the
- [LICENSE](https://github.com/ScrapeGraphAI/scrapegraph-js/blob/main/LICENSE)
- file for details.
-
diff --git a/sdks/mocking.mdx b/sdks/mocking.mdx
index 592de50..ef82ad4 100644
--- a/sdks/mocking.mdx
+++ b/sdks/mocking.mdx
@@ -1,594 +1,265 @@
---
title: 'Mocking & Testing'
-description: 'Test ScrapeGraphAI functionality in an isolated environment without consuming API credits'
+description: 'Test ScrapeGraphAI v2 functionality without consuming API credits'
icon: 'test-tube'
---
-
-
-
- Test your code without making real API calls
+
+ Use familiar testing tools for mocking
-
- Override responses for specific endpoints
+
+ Test without consuming API credits
## Overview
-A mock environment is an isolated test environment. You can use mock mode to test ScrapeGraphAI functionality in your application, and experiment with new features without affecting your live integration or consuming API credits. For example, when testing in mock mode, the scraping requests you create aren't processed by our servers or counted against your credit usage.
-
-## Use cases
-
-Mock mode provides an environment for testing various functionalities and scenarios without the implications of real API calls. Below are some common use cases for mocking in your ScrapeGraphAI integrations:
-
-| Scenario | Description |
-|----------|-------------|
-| **Simulate scraping responses to test without real API calls** | Use mock mode to test scraping functionality without real API calls. Create mock responses in your application to test data processing logic or use custom handlers to simulate various response scenarios. |
-| **Scale isolated testing for teams** | Your team can test in separate mock environments to make sure that data and actions are completely isolated from other tests. Changes made in one mock configuration don't interfere with changes in another. |
-| **Test without API key requirements** | You can test your integration without providing real API keys, making it easier for external developers, implementation partners, or design agencies to work with your code without access to your live API credentials. |
-| **Test in development or CI/CD pipelines** | Access mock mode from your development environment or continuous integration pipelines. Test ScrapeGraphAI functionality directly in your code or use familiar testing frameworks and fixtures. |
+In v2, the built-in mock mode (`mock=True`, `mock_handler`, `mock_responses`) has been removed from the SDKs. Instead, use standard mocking libraries for your language to test ScrapeGraphAI integrations without making real API calls or consuming credits.
-## Test in mock mode
+
+If you're migrating from v1, replace `Client(mock=True)` with the standard mocking patterns shown below.
+
-You can simulate scraping responses and use mock data to test your integration without consuming API credits. Learn more about using mock responses to confirm that your integration works correctly.
+## Python SDK Testing
-## Basic Mock Usage
-
-Enable mock mode by setting `mock=True` when initializing the client:
+### Using `unittest.mock`
```python
+from unittest.mock import patch, MagicMock
from scrapegraph_py import Client
-from scrapegraph_py.logger import sgai_logger
-
-# Set logging level for better visibility
-sgai_logger.set_logging(level="INFO")
-
-def basic_mock_usage():
- # Initialize the client with mock mode enabled
- client = Client.from_env(mock=True)
-
- print("\n-- get_credits (mock) --")
- print(client.get_credits())
-
- print("\n-- markdownify (mock) --")
- md = client.markdownify(website_url="https://example.com")
- print(md)
-
- print("\n-- get_markdownify (mock) --")
- md_status = client.get_markdownify("00000000-0000-0000-0000-000000000123")
- print(md_status)
-
- print("\n-- smartscraper (mock) --")
- ss = client.smartscraper(user_prompt="Extract title", website_url="https://example.com")
- print(ss)
-
-if __name__ == "__main__":
- basic_mock_usage()
-```
-
-
-When mock mode is enabled, all API calls return predefined mock responses instead of making real HTTP requests. This ensures your tests run quickly and don't consume API credits.
-
-## Custom Response Overrides
+def test_extract():
+ client = Client(api_key="test-key")
-You can override specific endpoint responses using the `mock_responses` parameter:
-
-```python
-def mock_with_path_overrides():
- # Initialize the client with mock mode and custom responses
- client = Client.from_env(
- mock=True,
- mock_responses={
- "/v1/credits": {"remaining_credits": 42, "total_credits_used": 58, "mock": true}
+ mock_response = {
+ "data": {
+ "title": "Test Page",
+ "content": "This is test content"
},
- )
-
- print("\n-- get_credits with override (mock) --")
- print(client.get_credits())
-```
+ "request_id": "test-request-123"
+ }
-
-You can override responses for any endpoint by providing the path and expected response:
+ with patch.object(client, "extract", return_value=mock_response):
+ response = client.extract(
+ url="https://example.com",
+ prompt="Extract title and content"
+ )
-```python
-client = Client.from_env(
- mock=True,
- mock_responses={
- "/v1/credits": {
- "remaining_credits": 100,
- "total_credits_used": 0,
- "mock": true
- },
- "/v1/smartscraper/start": {
- "job_id": "mock-job-123",
- "status": "processing",
- "mock": true
- },
- "/v1/smartscraper/status/mock-job-123": {
- "job_id": "mock-job-123",
- "status": "completed",
- "result": {
- "title": "Mock Title",
- "content": "Mock content from the webpage",
- "mock": true
- }
- },
- "/v1/markdownify/start": {
- "job_id": "mock-markdown-456",
- "status": "processing",
- "mock": true
- },
- "/v1/markdownify/status/mock-markdown-456": {
- "job_id": "mock-markdown-456",
- "status": "completed",
- "result": "# Mock Markdown\n\nThis is mock markdown content.",
- "mock": true
- }
- }
-)
+ assert response["data"]["title"] == "Test Page"
+ assert response["request_id"] == "test-request-123"
```
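If the test environment doesn't have `scrapegraph_py` installed at all, a bare `MagicMock` can stand in for the client. The method name and response shape below mirror the v2 examples on this page but are assumptions for illustration, not the SDK itself:

```python
from unittest.mock import MagicMock

# A stand-in client: no SDK import, no network, no credits consumed.
client = MagicMock()
client.extract.return_value = {
    "data": {"title": "Test Page"},
    "request_id": "test-request-123",
}

response = client.extract(url="https://example.com", prompt="Extract title")

assert response["data"]["title"] == "Test Page"
# MagicMock also records the call, so you can verify your code's inputs:
client.extract.assert_called_once_with(
    url="https://example.com", prompt="Extract title"
)
```

This pattern is useful in CI pipelines where installing the SDK or exposing an API key is undesirable.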
-
-## Custom Handler Functions
+### Using `responses` Library
-For more complex mocking scenarios, you can provide a custom handler function:
+Mock HTTP requests at the transport layer:
```python
-def mock_with_custom_handler():
- def handler(method, url, kwargs):
- return {"handled_by": "custom_handler", "method": method, "url": url}
-
- # Initialize the client with mock mode and custom handler
- client = Client.from_env(mock=True, mock_handler=handler)
+import responses
+from scrapegraph_py import Client
- print("\n-- searchscraper via custom handler (mock) --")
- resp = client.searchscraper(user_prompt="Search something")
- print(resp)
-```
+@responses.activate
+def test_extract_http():
+ responses.post(
+ "https://api.scrapegraphai.com/api/v2/extract",
+ json={
+ "data": {"title": "Mock Title"},
+ "request_id": "mock-123"
+ },
+ status=200,
+ )
-
-Create sophisticated mock responses based on request parameters:
+ client = Client(api_key="test-key")
+ response = client.extract(
+ url="https://example.com",
+ prompt="Extract the title"
+ )
-```python
-def advanced_custom_handler():
- def smart_handler(method, url, kwargs):
- # Handle different endpoints with custom logic
- if "/v1/credits" in url:
- return {
- "remaining_credits": 50,
- "total_credits_used": 50,
- "mock": true
- }
- elif "/v1/smartscraper" in url:
- # Extract user_prompt from kwargs to create contextual responses
- user_prompt = kwargs.get("user_prompt", "")
- if "title" in user_prompt.lower():
- return {
- "job_id": "mock-title-job",
- "status": "completed",
- "result": {
- "title": "Extracted Title",
- "content": "This is the extracted content",
- "mock": true
- }
- }
- else:
- return {
- "job_id": "mock-generic-job",
- "status": "completed",
- "result": {
- "data": "Generic extracted data",
- "mock": true
- }
- }
- else:
- return {"error": "Unknown endpoint", "url": url}
-
- client = Client.from_env(mock=True, mock_handler=smart_handler)
-
- # Test different scenarios
- print("Credits:", client.get_credits())
- print("Title extraction:", client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract the title"
- ))
- print("Generic extraction:", client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract some data"
- ))
+ assert response["data"]["title"] == "Mock Title"
```
-
-## Testing Best Practices
-
-### Unit Testing with Mocks
+### Using `pytest` Fixtures
```python
-import unittest
-from unittest.mock import patch
+import pytest
+from unittest.mock import MagicMock
from scrapegraph_py import Client
-class TestScrapeGraphAI(unittest.TestCase):
- def setUp(self):
- self.client = Client.from_env(mock=True)
-
- def test_get_credits(self):
- credits = self.client.get_credits()
- self.assertIn("remaining_credits", credits)
- self.assertIn("total_credits_used", credits)
-
- def test_smartscraper_with_schema(self):
- from pydantic import BaseModel, Field
-
- class TestSchema(BaseModel):
- title: str = Field(description="Page title")
- content: str = Field(description="Page content")
-
- response = self.client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract title and content",
- output_schema=TestSchema
- )
-
- self.assertIsInstance(response, TestSchema)
- self.assertIsNotNone(response.title)
- self.assertIsNotNone(response.content)
-
-if __name__ == "__main__":
- unittest.main()
-```
-
-### Integration Testing
-
-```python
-def test_integration_flow():
- """Test a complete workflow using mocks"""
- client = Client.from_env(
- mock=True,
- mock_responses={
- "/v1/credits": {"remaining_credits": 10, "total_credits_used": 90, "mock": true},
- "/v1/smartscraper/start": {
- "job_id": "test-job-123",
- "status": "processing",
- "mock": true
- },
- "/v1/smartscraper/status/test-job-123": {
- "job_id": "test-job-123",
- "status": "completed",
- "result": {
- "title": "Test Page",
- "content": "Test content",
- "mock": true
- }
- }
- }
- )
-
- # Test the complete flow
- credits = client.get_credits()
- assert credits["remaining_credits"] == 10
-
- # Start a scraping job
- job = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract title and content"
+@pytest.fixture
+def mock_client():
+ client = Client(api_key="test-key")
+ client.extract = MagicMock(return_value={
+ "data": {"title": "Mock Title"},
+ "request_id": "mock-123"
+ })
+ client.search = MagicMock(return_value={
+ "data": {"results": []},
+ "request_id": "mock-456"
+ })
+ client.credits = MagicMock(return_value={
+ "remaining_credits": 100,
+ "total_credits_used": 0
+ })
+ return client
+
+def test_extract(mock_client):
+ response = mock_client.extract(
+ url="https://example.com",
+ prompt="Extract the title"
)
-
- # Check job status
- status = client.get_smartscraper("test-job-123")
- assert status["status"] == "completed"
- assert "title" in status["result"]
-```
-
-## Environment Variables
-
-You can also control mocking through environment variables:
-
-```bash
-# Enable mock mode via environment variable
-export SGAI_MOCK=true
+ assert response["data"]["title"] == "Mock Title"
-# Set custom mock responses (JSON format)
-export SGAI_MOCK_RESPONSES='{"\/v1\/credits": {"remaining_credits": 100, "mock": true}}'
+def test_credits(mock_client):
+ credits = mock_client.credits()
+ assert credits["remaining_credits"] == 100
```
-```python
-# The client will automatically detect mock mode from environment
-client = Client.from_env() # Will use mock mode if SGAI_MOCK=true
-```
-
-## Async Mocking
-
-Mocking works seamlessly with async clients:
+### Async Testing with `aioresponses`
```python
+import pytest
-import asyncio
+from aioresponses import aioresponses
from scrapegraph_py import AsyncClient
-async def async_mock_example():
- async with AsyncClient(mock=True) as client:
- # All async methods work with mocks
- credits = await client.get_credits()
- print(f"Mock credits: {credits}")
-
- response = await client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract data"
+@pytest.mark.asyncio
+async def test_async_extract():
+ with aioresponses() as mocked:
+ mocked.post(
+ "https://api.scrapegraphai.com/api/v2/extract",
+ payload={
+ "data": {"title": "Async Mock"},
+ "request_id": "async-123"
+ },
)
- print(f"Mock response: {response}")
-
-# Run the async example
-asyncio.run(async_mock_example())
-```
-
-## HTTP Method Mocking with cURL
-
-You can also test ScrapeGraphAI endpoints directly using cURL with mock responses. This is useful for testing API integrations without using SDKs.
-
-### Basic cURL Mock Usage
-
-```bash
-# Enable mock mode via environment variable
-export SGAI_MOCK=true
-
-# Test credits endpoint with mock
-curl -X GET "https://api.scrapegraph.ai/v1/credits" \
- -H "Authorization: Bearer $SGAI_API_KEY" \
- -H "Content-Type: application/json"
-```
-### Custom Mock Responses with cURL
+ async with AsyncClient(api_key="test-key") as client:
+ response = await client.extract(
+ url="https://example.com",
+ prompt="Extract data"
+ )
-```bash
-# Set custom mock responses via environment variable
-export SGAI_MOCK_RESPONSES='{
- "/v1/credits": {
- "remaining_credits": 100,
- "total_credits_used": 0,
- "mock": true
- },
-}'
-
-# Test smartscraper endpoint
-curl -X POST "https://api.scrapegraph.ai/v1/smartscraper/" \
- -H "Authorization: Bearer $SGAI_API_KEY" \
- -H "Content-Type: application/json" \
- -d '{
- "website_url": "https://example.com",
- "user_prompt": "Extract title and content"
- "mock": true
- }'
+ assert response["data"]["title"] == "Async Mock"
```
-### Testing Different HTTP Methods
+## JavaScript SDK Testing
-```bash
-# POST request - to smartscraper
-curl --location 'https://api.scrapegraphai.com/v1/smartscraper' \
---data '{
- "website_url": "https://www.scrapegraphai.com//",
- "user_prompt": "Extract founder info ",
- "mock":true
-}'
-```
+### Using Jest / Vitest
-```bash
-# POST request - to Markdownify
-curl --location 'https://api.scrapegraphai.com/v1/markdownify' \
---data '{
- "website_url": "https://www.scrapegraphai.com//",
- "mock":true
-}'
-```
-
-```bash
-# POST request - to SearchScraper
-curl --location 'https://api.scrapegraphai.com/v1/searchscraper' \
---data '{
- "website_url": "https://www.scrapegraphai.com//",
- "mock":true
- "output_schema":{},
- "num_results":3,
-}'
+```javascript
+import { describe, it, expect, vi } from "vitest";
+import { scrapegraphai } from "scrapegraph-js";
+
+// Mock the module
+vi.mock("scrapegraph-js", () => ({
+ scrapegraphai: vi.fn(() => ({
+ extract: vi.fn().mockResolvedValue({
+ data: { title: "Mock Title" },
+ requestId: "mock-123",
+ }),
+ search: vi.fn().mockResolvedValue({
+ data: { results: [] },
+ requestId: "mock-456",
+ }),
+ credits: vi.fn().mockResolvedValue({
+ data: { remainingCredits: 100 },
+ }),
+ })),
+}));
+
+describe("ScrapeGraphAI", () => {
+ const sgai = scrapegraphai({ apiKey: "test-key" });
+
+ it("should extract data", async () => {
+ const { data } = await sgai.extract("https://example.com", {
+ prompt: "Extract the title",
+ });
+ expect(data.title).toBe("Mock Title");
+ });
+
+ it("should check credits", async () => {
+ const { data } = await sgai.credits();
+ expect(data.remainingCredits).toBe(100);
+ });
+});
```
+### Using MSW (Mock Service Worker)
-## JavaScript SDK Mocking
-
-The JavaScript SDK supports per-request mocking via the `mock` parameter. Pass `mock: true` in the params object of any function to receive mock data instead of making a real API call.
-
-### Per-Request Mock Mode
+Mock at the network level for more realistic testing:
```javascript
-import { smartScraper, scrape, searchScraper, getCredits } from 'scrapegraph-js';
-
-const API_KEY = 'your-api-key';
-
-// SmartScraper with mock
-const smartResult = await smartScraper(API_KEY, {
- website_url: 'https://example.com',
- user_prompt: 'Extract the title',
- mock: true,
-});
-console.log('SmartScraper mock:', smartResult.data);
-
-// Scrape with mock
-const scrapeResult = await scrape(API_KEY, {
- website_url: 'https://example.com',
- mock: true,
-});
-console.log('Scrape mock:', scrapeResult.data);
-
-// SearchScraper with mock
-const searchResult = await searchScraper(API_KEY, {
- user_prompt: 'Find AI news',
- mock: true,
+import { http, HttpResponse } from "msw";
+import { setupServer } from "msw/node";
+import { scrapegraphai } from "scrapegraph-js";
+
+const server = setupServer(
+ http.post("https://api.scrapegraphai.com/api/v2/extract", () => {
+ return HttpResponse.json({
+ data: { title: "MSW Mock Title" },
+ requestId: "msw-123",
+ });
+ }),
+ http.get("https://api.scrapegraphai.com/api/v2/credits", () => {
+ return HttpResponse.json({
+ data: { remainingCredits: 50, totalCreditsUsed: 50 },
+ });
+ })
+);
+
+beforeAll(() => server.listen());
+afterEach(() => server.resetHandlers());
+afterAll(() => server.close());
+
+test("extract returns mocked data", async () => {
+ const sgai = scrapegraphai({ apiKey: "test-key" });
+ const { data } = await sgai.extract("https://example.com", {
+ prompt: "Extract the title",
+ });
+ expect(data.title).toBe("MSW Mock Title");
});
-console.log('SearchScraper mock:', searchResult.data);
```
-
-The JavaScript SDK does not have global mock functions like `enableMock()` or `setMockResponses()`. Mock mode is controlled per-request via the `mock: true` parameter. All functions return `ApiResult` — errors are never thrown.
-
-
-## SDK Comparison
-
-
-
- - `Client(mock=True)` initialization
- - `mock_responses` parameter for overrides
- - `mock_handler` for custom logic
- - Environment variable: `SGAI_MOCK=true`
-
-
- - `mock: true` in per-request params
- - All functions support mock parameter
- - Native async/await
-
-
- - Environment variable: `SGAI_MOCK=true`
- - `SGAI_MOCK_RESPONSES` for custom responses
- - Direct HTTP method testing
- - No SDK dependencies required
-
-
-
-### Feature Comparison
-
-| Feature | Python SDK | JavaScript SDK | cURL/HTTP |
-|---------|------------|----------------|-----------|
-| **Global Mock Mode** | `Client(mock=True)` | N/A | `SGAI_MOCK=true` |
-| **Per-Request Mock** | `{mock: True}` in params | `mock: true` in params | N/A |
-| **Custom Responses** | `mock_responses` dict | N/A | `SGAI_MOCK_RESPONSES` |
-| **Custom Handler** | `mock_handler` function | N/A | N/A |
-| **Environment Variable** | `SGAI_MOCK=true` | N/A | `SGAI_MOCK=true` |
-| **Async Support** | `AsyncClient(mock=True)` | Native async/await | N/A |
-| **Dependencies** | Python SDK required | JavaScript SDK required | None |
-
-## Limitations
-
-* You can't test real-time scraping performance in mock mode.
-* Mock responses don't reflect actual website changes or dynamic content.
-* Rate limiting and credit consumption are not simulated in mock mode.
-* Some advanced features may behave differently in mock mode compared to live mode.
-
-## Troubleshooting
-
-
+## Testing with cURL
-### Mock responses not working
-- Ensure `mock=True` is set when initializing the client
-- Check that your mock response paths match the actual API endpoints
-- Verify the response format matches the expected schema
+Test API endpoints directly with cURL. To avoid consuming credits, point these requests at a staging environment or a local mock server rather than production:
-### Custom handler not being called
-- Make sure you're passing the `mock_handler` parameter correctly
-- Check that your handler function accepts the correct parameters: `(method, url, kwargs)`
-- Ensure the handler returns a valid response object
+```bash
+# Test extract endpoint
+curl -X POST "https://api.scrapegraphai.com/api/v2/extract" \
+ -H "Authorization: Bearer your-api-key" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "url": "https://example.com",
+ "prompt": "Extract the title"
+ }'
-### Schema validation errors
-- Mock responses must match the expected Pydantic schema structure
-- Use the same field names and types as defined in your schema
-- Test your mock responses with the actual schema classes
+# Test credits endpoint
+curl -X GET "https://api.scrapegraphai.com/api/v2/credits" \
+ -H "Authorization: Bearer your-api-key"
+```
-
+## SDK Comparison
-## Examples
+| Feature | Python | JavaScript |
+|---------|--------|------------|
+| **Mock library** | `unittest.mock`, `responses` | Jest/Vitest mocks, MSW |
+| **HTTP-level mocking** | `responses`, `aioresponses` | MSW (Mock Service Worker) |
+| **Async mocking** | `aioresponses`, `unittest.mock` | Native async/await |
+| **Fixture support** | pytest fixtures | beforeEach/afterEach |
-
-Here's a complete example showing all mocking features:
+## Best Practices
-```python
-from scrapegraph_py import Client
-from scrapegraph_py.logger import sgai_logger
-from pydantic import BaseModel, Field
-from typing import List
-
-# Set up logging
-sgai_logger.set_logging(level="INFO")
-
-class ProductInfo(BaseModel):
- name: str = Field(description="Product name")
- price: str = Field(description="Product price")
- features: List[str] = Field(description="Product features")
-
-def complete_mock_demo():
- # Initialize with comprehensive mock responses
- client = Client.from_env(
- mock=True,
- mock_responses={
- "/v1/credits": {
- "remaining_credits": 25,
- "total_credits_used": 75,
- "mock": true
- },
- "/v1/smartscraper/start": {
- "job_id": "demo-job-789",
- "status": "processing",
- "mock": true
- },
- "/v1/smartscraper/status/demo-job-789": {
- "job_id": "demo-job-789",
- "status": "completed",
- "result": {
- "name": "iPhone 15 Pro",
- "price": "$999",
- "features": [
- "A17 Pro chip",
- "48MP camera system",
- "Titanium design",
- "Action Button"
- ],
- "mock": true
- }
- }
- }
- )
-
- print("=== ScrapeGraphAI Mock Demo ===\n")
-
- # Test credits endpoint
- print("1. Checking credits:")
- credits = client.get_credits()
- print(f" Remaining: {credits['remaining_credits']}")
- print(f" Used: {credits['total_credits_used']}\n")
-
- # Test smartscraper with schema
- print("2. Extracting product information:")
- product = client.smartscraper(
- website_url="https://apple.com/iphone-15-pro",
- user_prompt="Extract product name, price, and key features",
- output_schema=ProductInfo
- )
-
- print(f" Product: {product.name}")
- print(f" Price: {product.price}")
- print(" Features:")
- for feature in product.features:
- print(f" - {feature}")
-
- print("\n3. Testing markdownify:")
- markdown = client.markdownify(website_url="https://example.com")
- print(f" Markdown length: {len(markdown)} characters")
-
- print("\n=== Demo Complete ===")
-
-if __name__ == "__main__":
- complete_mock_demo()
-```
-
+- Mock at the **client method level** for unit tests (fastest, simplest)
+- Mock at the **HTTP level** for integration tests (validates request/response shapes)
+- Use **fixtures** to share mock configurations across tests
+- Keep mock responses **realistic**: match the actual API response structure
+- Test both **success and error** scenarios
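+
+The last point deserves a sketch. A minimal error-path test using only `unittest.mock`, assuming the client raises an exception on failure (a generic exception is used here for illustration; substitute the SDK's real exception type in practice):
+
+```python
+from unittest.mock import MagicMock
+
+import pytest
+from scrapegraph_py import Client
+
+def test_extract_rate_limited():
+    client = Client(api_key="test-key")
+    # Simulate an API failure; swap in the SDK's real exception type in practice
+    client.extract = MagicMock(side_effect=Exception("429: rate limit exceeded"))
+
+    with pytest.raises(Exception, match="rate limit"):
+        client.extract(url="https://example.com", prompt="Extract data")
+```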
## Support
-
+
Report bugs or request features
@@ -596,4 +267,4 @@ if __name__ == "__main__":
-Need help with mocking? Check out our [Python SDK documentation](/sdks/python) or join our [Discord community](https://discord.gg/uJN7TYcpNa) for support.
+Need help with testing? Join our [Discord community](https://discord.gg/uJN7TYcpNa) for support.
diff --git a/sdks/python.mdx b/sdks/python.mdx
index 43da3f2..a780001 100644
--- a/sdks/python.mdx
+++ b/sdks/python.mdx
@@ -1,15 +1,9 @@
---
title: 'Python SDK'
-description: 'Official Python SDK for ScrapeGraphAI'
+description: 'Official Python SDK for ScrapeGraphAI v2'
icon: 'python'
---
-
-
[](https://badge.fury.io/py/scrapegraph-py)
@@ -21,23 +15,23 @@ icon: 'python'
## Installation
-Install the package using pip:
-
```bash
pip install scrapegraph-py
```
-## Features
+## What's New in v2
-- **AI-Powered Extraction**: Advanced web scraping using artificial intelligence
-- **Flexible Clients**: Both synchronous and asynchronous support
-- **Type Safety**: Structured output with Pydantic schemas
-- **Production Ready**: Detailed logging and automatic retries
-- **Developer Friendly**: Comprehensive error handling
+- **Renamed methods**: `smartscraper()` → `extract()`, `searchscraper()` → `search()`
+- **Unified config objects**: `FetchConfig` and `LlmConfig` replace scattered parameters
+- **Namespace methods**: `crawl.start()`, `crawl.status()`, `monitor.create()`, etc.
+- **New endpoints**: `credits()`, `history()`, `crawl.stop()`, `crawl.resume()`
+- **Removed**: `markdownify()`, `agenticscraper()`, `sitemap()`, `healthz()`, `feedback()`, built-in mock mode
-## Quick Start
+
+v2 is a breaking release. If you're upgrading from v1, see the [Migration Guide](https://github.com/ScrapeGraphAI/scrapegraph-py/blob/main/MIGRATION_V2.md).
+
-Initialize the client with your API key:
+## Quick Start
```python
from scrapegraph_py import Client
@@ -49,30 +43,54 @@ client = Client(api_key="your-api-key-here")
You can also set the `SGAI_API_KEY` environment variable and initialize the client without parameters: `client = Client()`
+### Client Options
+
+| Parameter | Type | Default | Description |
+| ------------- | ------ | -------------------------------- | ------------------------------- |
+| api_key | string | `SGAI_API_KEY` env var | Your ScrapeGraphAI API key |
+| base_url | string | `https://api.scrapegraphai.com` | API base URL |
+| verify_ssl | bool | `True` | Verify SSL certificates |
+| timeout | int | `30` | Request timeout in seconds |
+| max_retries | int | `3` | Maximum number of retries |
+| retry_delay | float | `1.0` | Delay between retries (seconds) |
+
+You can also use the `Client.from_env()` class method to create a client from the `SGAI_API_KEY` environment variable:
+
+```python
+client = Client.from_env()
+```
+
+Both `Client` and `AsyncClient` support context managers for automatic session cleanup:
+
+```python
+with Client(api_key="your-api-key") as client:
+ response = client.extract(url="https://example.com", prompt="Extract data")
+```
+
## Services
-### SmartScraper
+### Extract
-Extract specific information from any webpage using AI:
+Extract structured data from any webpage using AI. Replaces the v1 `smartscraper()` method.
```python
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract the main heading and description"
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract the main heading and description"
)
+print(response)
```
#### Parameters
-| Parameter | Type | Required | Description |
-| ---------------- | ------- | -------- | ---------------------------------------------------------------------------------- |
-| website_url | string | Yes | The URL of the webpage that needs to be scraped. |
-| user_prompt | string | Yes | A textual description of what you want to achieve. |
-| output_schema | object | No | The Pydantic object that describes the structure and format of the response. |
-
-
-Define a simple schema for basic data extraction:
+| Parameter | Type | Required | Description |
+| ------------ | ----------- | -------- | -------------------------------------------------------- |
+| url | string | Yes | The URL of the webpage to scrape |
+| prompt | string | Yes | A description of what you want to extract |
+| output_schema | object      | No       | Pydantic model for structured response                   |
+| fetch_config | FetchConfig | No | Fetch configuration (stealth, rendering, etc.) |
+
```python
from pydantic import BaseModel, Field
@@ -81,93 +99,38 @@ class ArticleData(BaseModel):
author: str = Field(description="The author's name")
publish_date: str = Field(description="Article publication date")
content: str = Field(description="Main article content")
- category: str = Field(description="Article category")
-response = client.smartscraper(
- website_url="https://example.com/blog/article",
- user_prompt="Extract the article information",
+response = client.extract(
+ url="https://example.com/blog/article",
+ prompt="Extract the article information",
output_schema=ArticleData
)
-print(f"Title: {response.title}")
-print(f"Author: {response.author}")
-print(f"Published: {response.publish_date}")
+print(f"Title: {response['data']['title']}")
+print(f"Author: {response['data']['author']}")
```
-
-Define a complex schema for nested data structures:
+### Search
-```python
-from typing import List
-from pydantic import BaseModel, Field
-
-class Employee(BaseModel):
- name: str = Field(description="Employee's full name")
- position: str = Field(description="Job title")
- department: str = Field(description="Department name")
- email: str = Field(description="Email address")
-
-class Office(BaseModel):
- location: str = Field(description="Office location/city")
- address: str = Field(description="Full address")
- phone: str = Field(description="Contact number")
-
-class CompanyData(BaseModel):
- name: str = Field(description="Company name")
- description: str = Field(description="Company description")
- industry: str = Field(description="Industry sector")
- founded_year: int = Field(description="Year company was founded")
- employees: List[Employee] = Field(description="List of key employees")
- offices: List[Office] = Field(description="Company office locations")
- website: str = Field(description="Company website URL")
-
-# Extract comprehensive company information
-response = client.smartscraper(
- website_url="https://example.com/about",
- user_prompt="Extract detailed company information including employees and offices",
- output_schema=CompanyData
-)
-
-# Access nested data
-print(f"Company: {response.name}")
-print("\nKey Employees:")
-for employee in response.employees:
- print(f"- {employee.name} ({employee.position})")
-
-print("\nOffice Locations:")
-for office in response.offices:
- print(f"- {office.location}: {office.address}")
-```
-
-
-### SearchScraper
-
-Search and extract information from multiple web sources using AI:
+Search the web and extract information from multiple sources. Replaces the v1 `searchscraper()` method.
```python
-from scrapegraph_py.models import TimeRange
-
-response = client.searchscraper(
- user_prompt="What are the key features and pricing of ChatGPT Plus?",
- time_range=TimeRange.PAST_WEEK # Optional: Filter results by time range
+response = client.search(
+ query="What are the key features and pricing of ChatGPT Plus?"
)
```
#### Parameters
-| Parameter | Type | Required | Description |
-| ---------------- | ------- | -------- | ---------------------------------------------------------------------------------- |
-| user_prompt | string | Yes | A textual description of what you want to achieve. |
-| num_results | number | No | Number of websites to search (3-20). Default: 3. |
-| extraction_mode | boolean | No | **True** = AI extraction mode (10 credits/page), **False** = markdown mode (2 credits/page). Default: True |
-| output_schema | object | No | The Pydantic object that describes the structure and format of the response (AI extraction mode only) |
-| location_geo_code| string | No | Optional geo code for location-based search (e.g., "us") |
-| time_range | TimeRange| No | Optional time range filter for search results. Options: TimeRange.PAST_HOUR, TimeRange.PAST_24_HOURS, TimeRange.PAST_WEEK, TimeRange.PAST_MONTH, TimeRange.PAST_YEAR |
-
-
-Define a simple schema for structured search results:
+| Parameter | Type | Required | Description |
+| ------------- | ----------- | -------- | -------------------------------------------------------- |
+| query | string | Yes | The search query |
+| num_results | number | No | Number of results (3-20). Default: 5 |
+| output_schema | object | No | Pydantic model for structured response |
+| fetch_config | FetchConfig | No | Fetch configuration |
+
```python
from pydantic import BaseModel, Field
from typing import List
@@ -177,174 +140,154 @@ class ProductInfo(BaseModel):
description: str = Field(description="Product description")
price: str = Field(description="Product price")
features: List[str] = Field(description="List of key features")
- availability: str = Field(description="Availability information")
-
-from scrapegraph_py.models import TimeRange
-response = client.searchscraper(
- user_prompt="Find information about iPhone 15 Pro",
+response = client.search(
+ query="Find information about iPhone 15 Pro",
output_schema=ProductInfo,
- location_geo_code="us", # Optional: Geo code for location-based search
- time_range=TimeRange.PAST_MONTH # Optional: Filter results by time range
+ num_results=5,
)
-print(f"Product: {response.name}")
-print(f"Price: {response.price}")
-print("\nFeatures:")
-for feature in response.features:
- print(f"- {feature}")
+print(f"Product: {response['data']['name']}")
+print(f"Price: {response['data']['price']}")
```
-
-Define a complex schema for comprehensive market research:
+### Scrape
-```python
-from typing import List
-from pydantic import BaseModel, Field
+Convert any webpage into markdown, HTML, screenshot, or branding format.
-class MarketPlayer(BaseModel):
- name: str = Field(description="Company name")
- market_share: str = Field(description="Market share percentage")
- key_products: List[str] = Field(description="Main products in market")
- strengths: List[str] = Field(description="Company's market strengths")
-
-class MarketTrend(BaseModel):
- name: str = Field(description="Trend name")
- description: str = Field(description="Trend description")
- impact: str = Field(description="Expected market impact")
- timeframe: str = Field(description="Trend timeframe")
-
-class MarketAnalysis(BaseModel):
- market_size: str = Field(description="Total market size")
- growth_rate: str = Field(description="Annual growth rate")
- key_players: List[MarketPlayer] = Field(description="Major market players")
- trends: List[MarketTrend] = Field(description="Market trends")
- challenges: List[str] = Field(description="Industry challenges")
- opportunities: List[str] = Field(description="Market opportunities")
-
-from scrapegraph_py.models import TimeRange
-
-# Perform comprehensive market research
-response = client.searchscraper(
- user_prompt="Analyze the current AI chip market landscape",
- output_schema=MarketAnalysis,
- location_geo_code="us", # Optional: Geo code for location-based search
- time_range=TimeRange.PAST_MONTH # Optional: Filter results by time range
+```python
+response = client.scrape(
+ url="https://example.com"
)
-
-# Access structured market data
-print(f"Market Size: {response.market_size}")
-print(f"Growth Rate: {response.growth_rate}")
-
-print("\nKey Players:")
-for player in response.key_players:
- print(f"\n{player.name}")
- print(f"Market Share: {player.market_share}")
- print("Key Products:")
- for product in player.key_products:
- print(f"- {product}")
-
-print("\nMarket Trends:")
-for trend in response.trends:
- print(f"\n{trend.name}")
- print(f"Impact: {trend.impact}")
- print(f"Timeframe: {trend.timeframe}")
```
-
-
-Use markdown mode for cost-effective content gathering:
+#### Parameters
-```python
-from scrapegraph_py import Client
+| Parameter | Type | Required | Description |
+| ------------- | ----------- | -------- | -------------------------------------------------------- |
+| url | string | Yes | The URL of the webpage to scrape |
+| format | string | No | Output format: `"markdown"`, `"html"`, `"screenshot"`, `"branding"` |
+| fetch_config | FetchConfig | No | Fetch configuration |
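+
+For example, to get raw HTML instead of the default markdown output (a sketch using the `format` values listed above):
+
+```python
+response = client.scrape(
+    url="https://example.com",
+    format="html",
+)
+```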
-client = Client(api_key="your-api-key")
+### Crawl
-from scrapegraph_py.models import TimeRange
+Manage multi-page crawl operations asynchronously.
-# Enable markdown mode for cost-effective content gathering
-response = client.searchscraper(
- user_prompt="Latest developments in artificial intelligence",
- num_results=3,
- extraction_mode=False, # Enable markdown mode (2 credits per page vs 10 credits)
- location_geo_code="us", # Optional: Geo code for location-based search
- time_range=TimeRange.PAST_WEEK # Optional: Filter results by time range
+```python
+# Start a crawl
+job = client.crawl.start(
+ url="https://example.com",
+ depth=2,
+ include_patterns=["/blog/*", "/docs/**"],
+ exclude_patterns=["/admin/*", "/api/*"],
)
+print(f"Crawl started: {job['id']}")
-# Access the raw markdown content
-markdown_content = response['markdown_content']
-reference_urls = response['reference_urls']
+# Check status
+status = client.crawl.status(job["id"])
+print(f"Status: {status['status']}")
-print(f"Markdown content length: {len(markdown_content)} characters")
-print(f"Reference URLs: {len(reference_urls)}")
+# Stop a crawl
+client.crawl.stop(job["id"])
-# Process the markdown content
-print("Content preview:", markdown_content[:500] + "...")
+# Resume a crawl
+client.crawl.resume(job["id"])
+```
-# Save to file for analysis
-with open('ai_research_content.md', 'w', encoding='utf-8') as f:
- f.write(markdown_content)
+#### crawl.start() Parameters
-print("Content saved to ai_research_content.md")
-```
+| Parameter | Type | Required | Description |
+| ---------------- | ----------- | -------- | -------------------------------------------------------- |
+| url | string | Yes | The starting URL to crawl |
+| depth | int | No | Crawl depth level |
+| include_patterns | list[str] | No | URL patterns to include (`*` any chars, `**` any path) |
+| exclude_patterns | list[str] | No | URL patterns to exclude |
+| fetch_config | FetchConfig | No | Fetch configuration |
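+
+Since crawls run asynchronously, a common pattern is to poll `crawl.status()` until the job settles (a sketch; the terminal status values shown are assumed):
+
+```python
+import time
+
+job = client.crawl.start(url="https://example.com", depth=2)
+while True:
+    status = client.crawl.status(job["id"])
+    if status["status"] in ("completed", "failed"):
+        break
+    time.sleep(5)  # back off between polls
+print(f"Final status: {status['status']}")
+```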
-**Markdown Mode Benefits:**
-- **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
-- **Full content**: Get complete page content in markdown format
-- **Faster**: No AI processing overhead
-- **Perfect for**: Content analysis, bulk data collection, building datasets
+### Monitor
-
+Create and manage site monitoring jobs.
+
+```python
+# Create a monitor
+monitor = client.monitor.create(
+ url="https://example.com",
+ prompt="Track price changes",
+ schedule="daily",
+)
+
+# List all monitors
+monitors = client.monitor.list()
-
-Filter search results by date range to get only recent information:
+# Get a specific monitor
+details = client.monitor.get(monitor["id"])
+
+# Pause / Resume / Delete
+client.monitor.pause(monitor["id"])
+client.monitor.resume(monitor["id"])
+client.monitor.delete(monitor["id"])
+```
+
+### Credits
+
+Check your account credit balance.
```python
-from scrapegraph_py import Client
-from scrapegraph_py.models import TimeRange
+credits = client.credits()
+print(f"Remaining: {credits['remaining_credits']}")
+print(f"Used: {credits['total_credits_used']}")
+```
-client = Client(api_key="your-api-key")
+### History
-# Search for recent news from the past week
-response = client.searchscraper(
- user_prompt="Latest news about AI developments",
- num_results=5,
- time_range=TimeRange.PAST_WEEK # Options: PAST_HOUR, PAST_24_HOURS, PAST_WEEK, PAST_MONTH, PAST_YEAR
-)
+Retrieve paginated request history, with optional filtering by endpoint and status.
-print("Recent AI news:", response['result'])
-print("Reference URLs:", response['reference_urls'])
+```python
+history = client.history(endpoint="extract", status="completed", limit=20, offset=0)
+for entry in history["items"]:
+ print(f"{entry['created_at']} - {entry['endpoint']} - {entry['status']}")
```
-**Time Range Options:**
-- `TimeRange.PAST_HOUR` - Results from the past hour
-- `TimeRange.PAST_24_HOURS` - Results from the past 24 hours
-- `TimeRange.PAST_WEEK` - Results from the past week
-- `TimeRange.PAST_MONTH` - Results from the past month
-- `TimeRange.PAST_YEAR` - Results from the past year
+## Configuration Objects
-**Use Cases:**
-- Finding recent news and updates
-- Tracking time-sensitive information
-- Getting latest product releases
-- Monitoring recent market changes
+### FetchConfig
-
+Controls how pages are fetched. See the [proxy configuration guide](/services/additional-parameters/proxy) for details on modes and geotargeting.
-### Markdownify
+```python
+from scrapegraph_py import FetchConfig
+
+config = FetchConfig(
+ mode="js+stealth", # Proxy strategy: auto, fast, js, direct+stealth, js+stealth
+ timeout=15000, # Request timeout in ms (1000-60000)
+ wait=2000, # Wait after page load in ms (0-30000)
+ scrolls=3, # Number of scrolls (0-100)
+ country="us", # Proxy country code (ISO 3166-1 alpha-2)
+ headers={"X-Custom": "header"},
+ cookies={"key": "value"},
+ mock=False, # Enable mock mode for testing
+)
+```
-Convert any webpage into clean, formatted markdown:
+### LlmConfig
+
+Controls LLM behavior for AI-powered methods.
```python
-response = client.markdownify(
- website_url="https://example.com"
+from scrapegraph_py import LlmConfig
+
+config = LlmConfig(
+ model="gpt-4o-mini", # LLM model to use
+ temperature=0.3, # Response creativity (0.0-2.0)
+ max_tokens=1000, # Maximum response tokens
+ chunker="auto", # Content chunking strategy ("auto" or custom config)
)
```
## Async Support
-All endpoints support asynchronous operations:
+All methods are available on the async client:
```python
import asyncio
@@ -352,38 +295,32 @@ from scrapegraph_py import AsyncClient
async def main():
async with AsyncClient() as client:
- response = await client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract the main content"
+ # Extract
+ response = await client.extract(
+ url="https://example.com",
+ prompt="Extract the main content"
)
print(response)
-asyncio.run(main())
-```
+ # Crawl
+ job = await client.crawl.start("https://example.com", depth=2)
+ status = await client.crawl.status(job["id"])
+ print(status)
-## Feedback
+ # Credits
+ credits = await client.credits()
+ print(credits)
-Help us improve by submitting feedback programmatically:
-
-```python
-client.submit_feedback(
- request_id="your-request-id",
- rating=5,
- feedback_text="Great results!"
-)
+asyncio.run(main())
```
## Support
-
+
Report issues and contribute to the SDK
Get help from our development team
-
-
- This project is licensed under the MIT License. See the [LICENSE](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/LICENSE) file for details.
-
diff --git a/services/additional-parameters/headers.mdx b/services/additional-parameters/headers.mdx
index 53446b5..0046076 100644
--- a/services/additional-parameters/headers.mdx
+++ b/services/additional-parameters/headers.mdx
@@ -77,9 +77,9 @@ response = client.markdownify(
```
```javascript JavaScript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
// Define custom headers
const headers = {
@@ -88,11 +88,10 @@ const headers = {
'Sec-Ch-Ua-Platform': '"Windows"',
};
-// Use with SmartScraper
-const response = await smartScraper(apiKey, {
- website_url: 'https://example.com',
- user_prompt: 'Extract the main content',
- headers: headers,
+// Use with extract (SmartScraper)
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract the main content',
+ fetchConfig: { headers },
});
```
@@ -139,9 +138,9 @@ response = client.smartscraper(
```
```javascript JavaScript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
// Example with session cookies
const headers = {
@@ -149,10 +148,9 @@ const headers = {
'Cookie': 'session_id=abc123; user_id=12345; theme=dark',
};
-const response = await smartScraper(apiKey, {
- website_url: 'https://example.com/dashboard',
- user_prompt: 'Extract user information',
- headers: headers,
+const { data } = await sgai.extract('https://example.com/dashboard', {
+ prompt: 'Extract user information',
+ fetchConfig: { headers },
});
```
diff --git a/services/additional-parameters/pagination.mdx b/services/additional-parameters/pagination.mdx
index 207833f..b4e312b 100644
--- a/services/additional-parameters/pagination.mdx
+++ b/services/additional-parameters/pagination.mdx
@@ -65,15 +65,14 @@ response = client.smartscraper(
### JavaScript SDK
```javascript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
// Basic pagination - scrape 3 pages
-const response = await smartScraper(apiKey, {
- website_url: 'https://example-store.com/products',
- user_prompt: 'Extract all product information',
- total_pages: 3,
+const { data } = await sgai.extract('https://example-store.com/products', {
+ prompt: 'Extract all product information',
+ totalPages: 3,
});
```
diff --git a/services/additional-parameters/proxy.mdx b/services/additional-parameters/proxy.mdx
index adcedd8..35be03f 100644
--- a/services/additional-parameters/proxy.mdx
+++ b/services/additional-parameters/proxy.mdx
@@ -1,6 +1,6 @@
---
title: 'Proxy Configuration'
-description: 'Configure proxy settings and geotargeting for web scraping requests'
+description: 'Configure proxy settings, fetch modes, and geotargeting for web scraping requests'
icon: 'globe'
---
@@ -10,10 +10,12 @@ icon: 'globe'
## Overview
-The ScrapeGraphAI API uses an intelligent proxy system that automatically handles web scraping requests through multiple proxy providers. The system uses a fallback strategy to ensure maximum reliability - if one provider fails, it automatically tries the next one.
+The ScrapeGraphAI API uses an intelligent proxy system that automatically handles web scraping requests through multiple proxy providers. The system uses a fallback strategy to ensure maximum reliability — if one provider fails, it automatically tries the next one.
**No configuration required**: The proxy system is fully automatic and transparent to API users. You don't need to configure proxy credentials or settings yourself.
+In v2, all proxy and fetch behaviour is controlled through the `FetchConfig` object, which you can pass to any service method (`extract`, `scrape`, `search`, `crawl`, etc.).
+
## How It Works
The API automatically routes your scraping requests through multiple proxy providers in a smart order:
@@ -21,11 +23,58 @@ The API automatically routes your scraping requests through multiple proxy provi
1. The system tries different proxy providers automatically
2. If one provider fails, it automatically falls back to the next one
3. Successful providers are cached for each domain to improve performance
-4. Everything happens transparently - you just make your API request as normal
+4. Everything happens transparently — you just make your API request as normal
+
+## Fetch Modes
+
+The `mode` parameter inside `FetchConfig` controls how pages are retrieved and which proxy strategy is used:
+
+| Mode | Description | JS Rendering | Stealth Proxy | Best For |
+|------|-------------|:------------:|:-------------:|----------|
+| `auto` | Automatically selects the best provider chain | Adaptive | Adaptive | General use (default) |
+| `fast` | Direct HTTP fetch via impit | No | No | Static pages, maximum speed |
+| `js` | Headless browser rendering | Yes | No | JavaScript-heavy SPAs |
+| `direct+stealth` | Residential proxy with stealth headers | No | Yes | Anti-bot sites (static) |
+| `js+stealth` | JS rendering + residential proxy | Yes | Yes | Anti-bot sites (dynamic) |
+
+
+
+```python Python
+from scrapegraph_py import Client, FetchConfig
+
+client = Client(api_key="your-api-key")
+
+# Use stealth mode with JS rendering
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product information",
+ fetch_config=FetchConfig(
+ mode="js+stealth",
+ wait=2000,
+ ),
+)
+```
+
+```javascript JavaScript
+import { scrapegraphai } from 'scrapegraph-js';
+
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
+
+// Use stealth mode with JS rendering
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract product information',
+ fetchConfig: {
+ mode: 'js+stealth',
+ wait: 2000,
+ },
+});
+```
+
+
## Country Selection (Geotargeting)
-You can optionally specify a country code to route requests through proxies in a specific country. This is useful for:
+You can optionally specify a two-letter country code via `FetchConfig.country` to route requests through proxies in a specific country. This is useful for:
- Accessing geo-restricted content
- Getting localized versions of websites
@@ -34,46 +83,46 @@ You can optionally specify a country code to route requests through proxies in a
### Using Country Code
-Include the `country_code` parameter in your API request:
-
```python Python
-from scrapegraph_py import Client
+from scrapegraph_py import Client, FetchConfig
client = Client(api_key="your-api-key")
-# Request with country code
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract product information",
- country_code="us" # Route through US proxies
+# Route through US proxies
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product information",
+ fetch_config=FetchConfig(country="us"),
)
```
```javascript JavaScript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
-// Request with country code
-const response = await smartScraper(apiKey, {
- website_url: 'https://example.com',
- user_prompt: 'Extract product information',
- country_code: 'us',
+// Route through US proxies
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract product information',
+ fetchConfig: { country: 'us' },
});
```
```bash cURL
curl -X 'POST' \
- 'https://api.scrapegraphai.com/v1/smartscraper' \
+ 'https://api.scrapegraphai.com/api/v2/extract' \
-H 'accept: application/json' \
+ -H 'Authorization: Bearer your-api-key' \
-H 'SGAI-APIKEY: your-api-key' \
-H 'Content-Type: application/json' \
-d '{
- "website_url": "https://example.com",
- "user_prompt": "Extract product information",
- "country_code": "us"
+ "url": "https://example.com",
+ "prompt": "Extract product information",
+ "fetchConfig": {
+ "country": "us"
+ }
}'
```
@@ -106,16 +155,52 @@ And many more! The API supports over 100 countries. Use standard ISO 3166-1 alph
-## Available Parameters
+## FetchConfig Reference
+
+All proxy and fetch behaviour is configured through the `FetchConfig` object:
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `mode` | string | `"auto"` | Fetch/proxy mode: `auto`, `fast`, `js`, `direct+stealth`, `js+stealth` |
+| `timeout` | int | `30000` | Request timeout in milliseconds (1000–60000) |
+| `wait` | int | `0` | Milliseconds to wait after page load before scraping (0–30000) |
+| `scrolls` | int | `0` | Number of page scrolls to perform (0–100) |
+| `country` | string | — | Two-letter ISO country code for geo-located proxy routing (e.g. `"us"`) |
+| `headers` | object | — | Custom HTTP headers to send with the request |
+| `cookies` | object | — | Cookies to send with the request |
+| `mock` | bool | `false` | Enable mock mode for testing (no real request is made) |
+
+
+
+```python Python
+from scrapegraph_py import FetchConfig
+
+config = FetchConfig(
+ mode="js+stealth", # Proxy strategy
+ timeout=15000, # 15s timeout
+ wait=2000, # Wait 2s after page load
+ scrolls=3, # Scroll 3 times
+ country="us", # Route through US proxies
+ headers={"Accept-Language": "en-US"},
+ cookies={"session": "abc123"},
+ mock=False,
+)
+```
-The following parameters in API requests can affect proxy behavior:
+```javascript JavaScript
+const fetchConfig = {
+ mode: 'js+stealth', // Proxy strategy
+ timeout: 15000, // 15s timeout
+ wait: 2000, // Wait 2s after page load
+ scrolls: 3, // Scroll 3 times
+ country: 'us', // Route through US proxies
+ headers: { 'Accept-Language': 'en-US' },
+ cookies: { session: 'abc123' },
+ mock: false,
+};
+```
-### `country_code` (optional)
-- **Type**: String
-- **Description**: Two-letter ISO country code to route requests through proxies in a specific country
-- **Example**: `"us"`, `"uk"`, `"de"`, `"it"`, `"fr"`
-- **Default**: No specific country (uses optimal routing)
-- **Format**: ISO 3166-1 alpha-2 (e.g., `us`, `gb`, `de`)
+
## Usage Examples
@@ -128,22 +213,21 @@ from scrapegraph_py import Client
client = Client(api_key="your-api-key")
-# Automatic proxy selection - no configuration needed
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract product information"
+# Automatic proxy selection — no configuration needed
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product information",
)
```
```javascript JavaScript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
// Automatic proxy selection
-const response = await smartScraper(apiKey, {
- website_url: 'https://example.com',
- user_prompt: 'Extract product information',
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract product information',
});
```
@@ -154,35 +238,79 @@ const response = await smartScraper(apiKey, {
```python Python
-from scrapegraph_py import Client
+from scrapegraph_py import Client, FetchConfig
client = Client(api_key="your-api-key")
# Route through US proxies
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract product information",
- country_code="us"
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product information",
+ fetch_config=FetchConfig(country="us"),
)
# Route through UK proxies
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract product information",
- country_code="uk"
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product information",
+ fetch_config=FetchConfig(country="gb"),
)
```
```javascript JavaScript
-import { smartScraper } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
// Route through US proxies
-const response = await smartScraper(apiKey, {
- website_url: 'https://example.com',
- user_prompt: 'Extract product information',
- country_code: 'us',
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract product information',
+ fetchConfig: { country: 'us' },
+});
+
+// Route through UK proxies
+const { data: ukData } = await sgai.extract('https://example.com', {
+ prompt: 'Extract product information',
+ fetchConfig: { country: 'gb' },
+});
+```
+
+
+
+### Stealth Mode with JS Rendering
+
+
+
+```python Python
+from scrapegraph_py import Client, FetchConfig
+
+client = Client(api_key="your-api-key")
+
+response = client.scrape(
+ url="https://heavily-protected-site.com",
+ format="markdown",
+ fetch_config=FetchConfig(
+ mode="js+stealth",
+ wait=3000,
+ scrolls=5,
+ country="us",
+ ),
+)
+```
+
+```javascript JavaScript
+import { scrapegraphai } from 'scrapegraph-js';
+
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
+
+const { data } = await sgai.scrape('https://heavily-protected-site.com', {
+ format: 'markdown',
+ fetchConfig: {
+ mode: 'js+stealth',
+ wait: 3000,
+ scrolls: 5,
+ country: 'us',
+ },
});
```
@@ -192,75 +320,105 @@ const response = await smartScraper(apiKey, {
#### Accessing Geo-Restricted Content
-```python
-from scrapegraph_py import Client
+
+
+```python Python
+from scrapegraph_py import Client, FetchConfig
client = Client(api_key="your-api-key")
# Access US-only content
-response = client.smartscraper(
- website_url="https://us-only-service.com",
- user_prompt="Extract available services",
- country_code="us"
+response = client.extract(
+ url="https://us-only-service.com",
+ prompt="Extract available services",
+ fetch_config=FetchConfig(country="us"),
)
```
+```javascript JavaScript
+const { data } = await sgai.extract('https://us-only-service.com', {
+ prompt: 'Extract available services',
+ fetchConfig: { country: 'us' },
+});
+```
+
+
+
#### Getting Localized Content
```python
+from scrapegraph_py import Client, FetchConfig
+
+client = Client(api_key="your-api-key")
+
# Get German version of a website
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract product prices in local currency",
- country_code="de"
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product prices in local currency",
+ fetch_config=FetchConfig(country="de"),
)
# Get French version
-response = client.smartscraper(
- website_url="https://example.com",
- user_prompt="Extract product prices in local currency",
- country_code="fr"
+response = client.extract(
+ url="https://example.com",
+ prompt="Extract product prices in local currency",
+ fetch_config=FetchConfig(country="fr"),
)
```
#### E-commerce Price Comparison
```python
+from scrapegraph_py import Client, FetchConfig
+
+client = Client(api_key="your-api-key")
+
# Compare prices from different regions
-countries = ["us", "uk", "de", "fr"]
+countries = ["us", "gb", "de", "fr"]
for country in countries:
- response = client.smartscraper(
- website_url="https://ecommerce-site.com/product/123",
- user_prompt="Extract product price and availability",
- country_code=country
+ response = client.extract(
+ url="https://ecommerce-site.com/product/123",
+ prompt="Extract product price and availability",
+ fetch_config=FetchConfig(country=country),
)
- print(f"{country}: {response['result']}")
+ print(f"{country}: {response['data']}")
```
## Best Practices
-### 1. Use Country Code When Needed
+### 1. Choose the Right Fetch Mode
+
+Pick the mode that matches your target site:
+- **`auto`** (default) — let the system decide; works for most sites
+- **`fast`** — use for simple, static HTML pages
+- **`js`** — use for SPAs and JavaScript-rendered content
+- **`direct+stealth`** — use for anti-bot sites that don't require JS
+- **`js+stealth`** — use for anti-bot sites with dynamic content
+
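When you don't know a site's protection level up front, the modes above can be tried in order of cost, cheapest first. A hedged sketch of that escalation pattern: the mode names come from this guide, while the helper function and its error handling are illustrative only.

```python
def extract_with_escalation(do_extract, modes=("fast", "js", "js+stealth")):
    """Try each fetch mode in order, returning the first successful result."""
    last_error = None
    for mode in modes:
        try:
            return do_extract(mode)
        except Exception as exc:  # a real integration would catch the SDK's specific errors
            last_error = exc
    raise last_error

# Usage with the v2 client (as shown elsewhere in this guide):
# result = extract_with_escalation(
#     lambda mode: client.extract(
#         url="https://example.com",
#         prompt="Extract product information",
#         fetch_config=FetchConfig(mode=mode),
#     )
# )
```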
+### 2. Use Country Code When Needed
Only specify a country code if you have a specific requirement:
-- ✅ Accessing geo-restricted content
-- ✅ Getting localized versions of websites
-- ✅ Complying with regional requirements
-- ❌ Don't specify if you don't need it - let the system optimize automatically
+- Accessing geo-restricted content
+- Getting localized versions of websites
+- Complying with regional requirements
+
+If none of these apply, leave it unset and let the system optimize routing automatically.
-### 2. Let the System Handle Routing
+### 3. Let the System Handle Routing
The API automatically selects the best proxy provider for each request:
- No manual proxy selection needed
- Automatic failover ensures reliability
- Performance is optimized automatically
-### 3. Handle Errors Gracefully
+### 4. Handle Errors Gracefully
If a request fails, the system has already tried multiple providers:
-```python
-from scrapegraph_py import Client
+
+
+```python Python
+from scrapegraph_py import Client, FetchConfig
import time
client = Client(api_key="your-api-key")
@@ -268,10 +426,10 @@ client = Client(api_key="your-api-key")
def scrape_with_retry(url, prompt, max_retries=3):
for attempt in range(max_retries):
try:
- response = client.smartscraper(
- website_url=url,
- user_prompt=prompt,
- country_code="us"
+ response = client.extract(
+ url=url,
+ prompt=prompt,
+ fetch_config=FetchConfig(country="us"),
)
return response
except Exception as e:
@@ -282,7 +440,29 @@ def scrape_with_retry(url, prompt, max_retries=3):
raise e
```
-### 4. Monitor Rate Limits
+```javascript JavaScript
+async function scrapeWithRetry(url, prompt, maxRetries = 3) {
+ for (let attempt = 0; attempt < maxRetries; attempt++) {
+ try {
+ return await sgai.extract(url, {
+ prompt,
+ fetchConfig: { country: 'us' },
+ });
+ } catch (err) {
+ if (attempt < maxRetries - 1) {
+ console.log(`Attempt ${attempt + 1} failed: ${err.message}`);
+ await new Promise((r) => setTimeout(r, 2 ** attempt * 1000));
+ } else {
+ throw err;
+ }
+ }
+ }
+}
+```
+
+
+
+### 5. Monitor Rate Limits
Be aware of your API rate limits:
- The proxy system respects these limits automatically
@@ -298,8 +478,9 @@ If your scraping request fails:
1. **Verify the URL**: Make sure the URL is correct and accessible
2. **Check the website**: Some websites may block automated access regardless of proxy
-3. **Retry the request**: The system uses automatic retries, but you can manually retry after a delay
-4. **Try a different country**: If geo-restriction is the issue, try a different `country_code`
+3. **Try a different mode**: Switch to `js+stealth` for heavily protected sites
+4. **Retry the request**: The system uses automatic retries, but you can manually retry after a delay
+5. **Try a different country**: If geo-restriction is the issue, try a different `country`
### Rate Limiting
@@ -318,21 +499,21 @@ If you receive rate limit errors (HTTP 429):
If you're trying to access geo-restricted content:
-- Use the `country_code` parameter to specify the required country
+- Use the `country` parameter inside `FetchConfig` to specify the required country
- Make sure the content is available in that country
- Some content may still be restricted regardless of proxy location
- Try multiple country codes if one doesn't work
-### Proxy Selection Issues
+### Anti-Bot Protection
-
-If you're experiencing proxy-related issues:
+
+If a website is blocking your requests:
-- The system automatically tries multiple providers
-- No manual configuration is needed
-- If issues persist, contact support with your request ID
-- Check if the issue is specific to certain websites or domains
+- Use `mode: "direct+stealth"` or `mode: "js+stealth"` in `FetchConfig`
+- Add a `wait` time to let the page fully load
+- Use `scrolls` to trigger lazy-loaded content
+- Add custom `headers` if the site expects specific ones
## FAQ
@@ -341,42 +522,30 @@ If you're experiencing proxy-related issues:
**A**: No, the proxy system is fully managed and automatic. You don't need to provide any proxy credentials or configuration.
-
-**A**: No, the system automatically selects the best proxy provider for each request. This ensures optimal performance and reliability.
+
+**A**: Use the `mode` parameter in `FetchConfig`. Set it to `auto` (default), `fast`, `js`, `direct+stealth`, or `js+stealth` depending on your needs.
-
-**A**: The proxy selection is handled automatically and transparently. You don't need to know which proxy was used - just use the API as normal.
-
-
-
-**A**: The API uses managed proxy services. If you have specific proxy requirements, please contact support.
+
+**A**: No, the system automatically selects the best proxy provider for each request. You can influence the strategy by setting the `mode` parameter.
**A**: The API will return an error. The system tries multiple providers with automatic fallback, so this is rare. If it happens, verify the URL and try again.
-
-**A**: No, the `country_code` parameter doesn't affect pricing. Credits are charged the same regardless of proxy location.
+
+**A**: No, the `country` parameter doesn't affect pricing. Credits are charged the same regardless of proxy location.
-
-**A**: Yes, `country_code` is available for all scraping services including SmartScraper, SearchScraper, SmartCrawler, and Markdownify.
+
+**A**: Yes, `FetchConfig` is available for all services including `extract`, `scrape`, `search`, `crawl`, and `monitor`.
**A**: Both `uk` and `gb` refer to the United Kingdom. The API accepts both codes for compatibility.
-## API Reference
-
-For detailed API documentation, see:
-- [SmartScraper Start Job](/api-reference/endpoint/smartscraper/start)
-- [SearchScraper Start Job](/api-reference/endpoint/searchscraper/start)
-- [SmartCrawler Start Job](/api-reference/endpoint/smartcrawler/start)
-- [Markdownify Start Job](/api-reference/endpoint/markdownify/start)
-
## Support & Resources
diff --git a/services/additional-parameters/wait-ms.mdx b/services/additional-parameters/wait-ms.mdx
index 45a4646..db961ea 100644
--- a/services/additional-parameters/wait-ms.mdx
+++ b/services/additional-parameters/wait-ms.mdx
@@ -67,27 +67,25 @@ response = client.markdownify(
### JavaScript SDK
```javascript
-import { smartScraper, scrape, markdownify } from 'scrapegraph-js';
+import { scrapegraphai } from 'scrapegraph-js';
-const apiKey = 'your-api-key';
+const sgai = scrapegraphai({ apiKey: 'your-api-key' });
-// SmartScraper with custom wait time
-const response = await smartScraper(apiKey, {
- website_url: 'https://example.com',
- user_prompt: 'Extract product information',
- wait_ms: 5000,
+// Extract with custom wait time
+const { data } = await sgai.extract('https://example.com', {
+ prompt: 'Extract product information',
+ fetchConfig: { wait: 5000 },
});
// Scrape with custom wait time
-const scrapeResponse = await scrape(apiKey, {
- website_url: 'https://example.com',
- wait_ms: 5000,
+const { data: scrapeData } = await sgai.scrape('https://example.com', {
+ fetchConfig: { wait: 5000 },
});
// Markdownify with custom wait time
-const mdResponse = await markdownify(apiKey, {
- website_url: 'https://example.com',
- wait_ms: 5000,
+const { data: mdData } = await sgai.scrape('https://example.com', {
+ format: 'markdown',
+ fetchConfig: { wait: 5000 },
});
```
diff --git a/services/agenticscraper.mdx b/services/agenticscraper.mdx
index 45e0c65..b2167d6 100644
--- a/services/agenticscraper.mdx
+++ b/services/agenticscraper.mdx
@@ -17,7 +17,7 @@ Agentic Scraper is our most advanced service for automating browser actions and
- **Optionally** use AI to extract structured data according to a schema
-Try it instantly in our [interactive playground](https://dashboard.scrapegraphai.com/) – no coding required!
+Try it instantly in our [interactive playground](https://scrapegraphai.com/dashboard) – no coding required!
## Difference: With vs Without AI Extraction
@@ -39,7 +39,7 @@ const apiKey = process.env.SGAI_APIKEY;
// Basic scraping without AI extraction
const response = await agenticScraper(apiKey, {
- url: 'https://dashboard.scrapegraphai.com/',
+ url: 'https://scrapegraphai.com/dashboard',
steps: [
'Type email@gmail.com in email input box',
'Type test-password@123 in password inputbox',
@@ -52,7 +52,7 @@ console.log(response.data);
// With AI extraction
const aiResponse = await agenticScraper(apiKey, {
- url: 'https://dashboard.scrapegraphai.com/',
+ url: 'https://scrapegraphai.com/dashboard',
steps: [
'Type email@gmail.com in email input box',
'Type test-password@123 in password inputbox',
@@ -86,7 +86,7 @@ curl -X 'POST' \
-H 'SGAI-APIKEY: your-api-key' \
-H 'Content-Type: application/json' \
-d '{
- "url": "https://dashboard.scrapegraphai.com/",
+ "url": "https://scrapegraphai.com/dashboard",
"use_session": true,
"steps": ["Type email@gmail.com in email input box", "Type test-password@123 in password inputbox", "click on login"],
"ai_extraction": false
@@ -99,7 +99,7 @@ curl -X 'POST' \
-H 'SGAI-APIKEY: your-api-key' \
-H 'Content-Type: application/json' \
-d '{
- "url": "https://dashboard.scrapegraphai.com/",
+ "url": "https://scrapegraphai.com/dashboard",
"use_session": true,
"steps": ["Type email@gmail.com in email input box", "Type test-password@123 in password inputbox", "click on login", "wait for dashboard to load completely"],
"user_prompt": "Extract user info, dashboard sections, and remaining credits",
@@ -132,7 +132,7 @@ client = Client(api_key=api_key)
# Basic example: login and scrape without AI
response = client.agenticscraper(
- url="https://dashboard.scrapegraphai.com/",
+ url="https://scrapegraphai.com/dashboard",
use_session=True,
steps=[
"Type email@gmail.com in email input box",
@@ -157,7 +157,7 @@ output_schema = {
}
}
ai_response = client.agenticscraper(
- url="https://dashboard.scrapegraphai.com/",
+ url="https://scrapegraphai.com/dashboard",
use_session=True,
steps=[
"Type email@gmail.com in email input box",
@@ -175,12 +175,12 @@ client.close()
```bash CLI
# Basic scraping without AI extraction
-just-scrape agentic-scraper https://dashboard.scrapegraphai.com/ \
+just-scrape agentic-scraper https://scrapegraphai.com/dashboard \
-s "Type email@gmail.com in email input box,Type test-password@123 in password inputbox,Click login" \
--use-session
# With AI extraction
-just-scrape agentic-scraper https://dashboard.scrapegraphai.com/ \
+just-scrape agentic-scraper https://scrapegraphai.com/dashboard \
-s "Type email@gmail.com in email input box,Type test-password@123 in password inputbox,Click login,wait for dashboard to load" \
--ai-extraction -p "Extract user info, dashboard sections, and remaining credits" \
--use-session
@@ -201,7 +201,7 @@ just-scrape agentic-scraper https://dashboard.scrapegraphai.com/ \
| ai_extraction | bool | No | true = AI extraction, false = raw content only |
-Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
+Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
## Use Cases
@@ -245,6 +245,6 @@ For technical details see:
-
+
Get your API key and start using Agentic Scraper now!
diff --git a/services/cli.mdx b/services/cli.mdx
index ab551d2..a0c8b4e 100644
--- a/services/cli.mdx
+++ b/services/cli.mdx
@@ -6,10 +6,10 @@ icon: 'terminal'
## Overview
-`just-scrape` is the official CLI for [ScrapeGraph AI](https://scrapegraphai.com) — AI-powered web scraping, data extraction, search, and crawling, straight from your terminal.
+`just-scrape` is the official CLI for [ScrapeGraph AI](https://scrapegraphai.com) — AI-powered web scraping, data extraction, search, and crawling, straight from your terminal. Uses the **v2 API**.
-Get your API key from the [dashboard](https://dashboard.scrapegraphai.com)
+Get your API key from the [dashboard](https://scrapegraphai.com/dashboard)
## Installation
@@ -58,110 +58,81 @@ The CLI needs a ScrapeGraph API key. Four ways to provide it (checked in order):
| Variable | Description | Default |
|---|---|---|
| `SGAI_API_KEY` | ScrapeGraph API key | — |
-| `JUST_SCRAPE_API_URL` | Override API base URL | `https://api.scrapegraphai.com/v1` |
-| `JUST_SCRAPE_TIMEOUT_S` | Request/polling timeout in seconds | `120` |
-| `JUST_SCRAPE_DEBUG` | Set to `1` to enable debug logging | `0` |
+| `SGAI_API_URL` | Override API base URL | `https://api.scrapegraphai.com` |
+| `SGAI_TIMEOUT_S` | Request timeout in seconds | `30` |
+
+Legacy variables (`JUST_SCRAPE_API_URL`, `JUST_SCRAPE_TIMEOUT_S`, `JUST_SCRAPE_DEBUG`) are still recognized as fallbacks for backward compatibility.
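For a typical setup, the variables above translate to a shell profile snippet like the following (all values are placeholders):

```shell
# In ~/.bashrc or ~/.zshrc; replace the key with your own
export SGAI_API_KEY="your-api-key"

# Optional overrides (defaults shown in the table above)
export SGAI_API_URL="https://api.scrapegraphai.com"
export SGAI_TIMEOUT_S=60
```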
## JSON Mode
All commands support `--json` for machine-readable output. Banner, spinners, and interactive prompts are suppressed — only minified JSON on stdout. Saves tokens when piped to AI agents.
```bash
-just-scrape credits --json | jq '.remaining_credits'
-just-scrape smart-scraper https://example.com -p "Extract data" --json > result.json
+just-scrape credits --json | jq '.remainingCredits'
+just-scrape extract https://example.com -p "Extract data" --json > result.json
```
## Commands
-### SmartScraper
-
-Extract structured data from any URL using AI. [Full docs →](/services/smartscraper)
-
-```bash
-just-scrape smart-scraper -p
-just-scrape smart-scraper -p --schema
-just-scrape smart-scraper -p --scrolls
-just-scrape smart-scraper -p --pages
-just-scrape smart-scraper -p --stealth
-just-scrape smart-scraper -p --cookies --headers
-just-scrape smart-scraper -p --plain-text
-```
-
-### SearchScraper
+### Extract
-Search the web and extract structured data from results. [Full docs →](/services/searchscraper)
+Extract structured data from any URL using AI (replaces `smart-scraper`). [Full docs →](/api-reference/extract)
```bash
-just-scrape search-scraper
-just-scrape search-scraper --num-results
-just-scrape search-scraper --no-extraction
-just-scrape search-scraper --schema
-just-scrape search-scraper --stealth --headers
+just-scrape extract <url> -p "<prompt>"
+just-scrape extract <url> -p "<prompt>" --schema <schema.json>
+just-scrape extract <url> -p "<prompt>" --scrolls <n>
+just-scrape extract <url> -p "<prompt>" --mode direct+stealth
+just-scrape extract <url> -p "<prompt>" --cookies <json> --headers <json>
+just-scrape extract <url> -p "<prompt>" --country <code>
```
-### Markdownify
-
-Convert any webpage to clean markdown. [Full docs →](/services/markdownify)
-
-```bash
-just-scrape markdownify
-just-scrape markdownify --stealth
-just-scrape markdownify --headers
-```
-
-### Crawl
+### Search
-Crawl multiple pages and extract data from each. [Full docs →](/services/smartcrawler)
+Search the web and extract structured data from results (replaces `search-scraper`). [Full docs →](/api-reference/search)
```bash
-just-scrape crawl -p
-just-scrape crawl -p --max-pages
-just-scrape crawl -p --depth
-just-scrape crawl --no-extraction --max-pages
-just-scrape crawl -p --schema
-just-scrape crawl -p --rules
-just-scrape crawl -p --no-sitemap
-just-scrape crawl -p --stealth
+just-scrape search "<query>"
+just-scrape search "<query>" --num-results <n>
+just-scrape search "<query>" -p "<prompt>"
+just-scrape search "<query>" --schema '<json>'
+just-scrape search "<query>" --headers <headers>
```
### Scrape
-Get raw HTML content from a URL. [Full docs →](/services/scrape)
+Scrape content from a URL in various formats: markdown (default), html, screenshot, or branding. [Full docs →](/api-reference/scrape)
```bash
just-scrape scrape <url>
-just-scrape scrape --stealth
-just-scrape scrape --branding
-just-scrape scrape --country-code
+just-scrape scrape <url> -f html
+just-scrape scrape <url> -f screenshot
+just-scrape scrape <url> -f branding
+just-scrape scrape <url> -m direct+stealth
+just-scrape scrape <url> --country <code>
```
-### Sitemap
-
-Get all URLs from a website's sitemap. [Full docs →](/services/sitemap)
-
-```bash
-just-scrape sitemap
-just-scrape sitemap --json | jq -r '.urls[]'
-```
-
-### Agentic Scraper
+### Markdownify
-Browser automation with AI — login, click, navigate, fill forms. [Full docs →](/services/agenticscraper)
+Convert any webpage to clean markdown (convenience wrapper for `scrape --format markdown`). [Full docs →](/api-reference/scrape)
```bash
-just-scrape agentic-scraper -s
-just-scrape agentic-scraper -s --ai-extraction -p
-just-scrape agentic-scraper -s --schema
-just-scrape agentic-scraper -s --use-session
+just-scrape markdownify <url>
+just-scrape markdownify <url> -m direct+stealth
+just-scrape markdownify <url> --headers <headers>
```
-### Generate Schema
+### Crawl
-Generate a JSON schema from a natural language description.
+Crawl multiple pages. The CLI starts the crawl and polls until completion. [Full docs →](/api-reference/crawl)
```bash
-just-scrape generate-schema
-just-scrape generate-schema --existing-schema
+just-scrape crawl <url>
+just-scrape crawl <url> --max-pages <n>
+just-scrape crawl <url> --max-depth <n>
+just-scrape crawl <url> --max-links-per-page <n>
+just-scrape crawl <url> --allow-external
+just-scrape crawl <url> -m direct+stealth
```
### History
@@ -176,7 +147,7 @@ just-scrape history --page-size
just-scrape history --json
```
-Services: `markdownify`, `smartscraper`, `searchscraper`, `scrape`, `crawl`, `agentic-scraper`, `sitemap`
+Services: `scrape`, `extract`, `search`, `monitor`, `crawl`
### Credits
@@ -184,14 +155,7 @@ Check your credit balance.
```bash
just-scrape credits
-```
-
-### Validate
-
-Validate your API key.
-
-```bash
-just-scrape validate
+just-scrape credits --json | jq '.remainingCredits'
```
## AI Agent Integration
@@ -214,7 +178,7 @@ bunx skills add https://github.com/ScrapeGraphAI/just-scrape
Join our Discord community
-
+
Get your API key
diff --git a/services/cli/ai-agent-skill.mdx b/services/cli/ai-agent-skill.mdx
index 50ee527..17ae68c 100644
--- a/services/cli/ai-agent-skill.mdx
+++ b/services/cli/ai-agent-skill.mdx
@@ -17,9 +17,10 @@ Browse the skill: [skills.sh/scrapegraphai/just-scrape/just-scrape](https://skil
Once installed, your coding agent can:
-- Scrape a website to gather data needed for a task
+- Extract structured data from any website using AI
- Convert documentation pages to markdown for context
- Search the web and extract structured results
+- Crawl multiple pages and collect data
- Check your credit balance mid-session
- Browse request history
@@ -28,13 +29,13 @@ Once installed, your coding agent can:
Agents call `just-scrape` in `--json` mode for clean, token-efficient output:
```bash
-just-scrape smart-scraper https://api.example.com/docs \
+just-scrape extract https://api.example.com/docs \
-p "Extract all endpoint names, methods, and descriptions" \
--json
```
```bash
-just-scrape search-scraper "latest release notes for react-query" \
+just-scrape search "latest release notes for react-query" \
--num-results 3 --json
```
@@ -76,15 +77,14 @@ This project uses `just-scrape` (ScrapeGraph AI CLI) for web scraping.
The API key is set via the SGAI_API_KEY environment variable.
Available commands (always use --json flag):
-- `just-scrape smart-scraper -p --json` — AI extraction from a URL
-- `just-scrape search-scraper --json` — search the web and extract data
+- `just-scrape extract <url> -p "<prompt>" --json` — AI extraction from a URL
+- `just-scrape search "<query>" --json` — search the web and extract data
- `just-scrape markdownify <url> --json` — convert a page to markdown
-- `just-scrape crawl -p --json` — crawl multiple pages
-- `just-scrape scrape --json` — get raw HTML
-- `just-scrape sitemap --json` — get all URLs from a sitemap
+- `just-scrape crawl <url> --json` — crawl multiple pages
+- `just-scrape scrape <url> --json` — get page content (markdown, html, screenshot, branding)
Use --schema to enforce a JSON schema on the output.
-Use --stealth for sites with anti-bot protection.
+Use --mode direct+stealth or --mode js+stealth for sites with anti-bot protection.
```
### Example prompts for Claude Code
@@ -120,7 +120,7 @@ claude -p "Use just-scrape to scrape https://example.com/changelog \
- Pass `--schema` with a JSON schema to get typed, predictable output:
```bash
-just-scrape smart-scraper https://example.com \
+just-scrape extract https://example.com \
-p "Extract company info" \
--schema '{"type":"object","properties":{"name":{"type":"string"},"founded":{"type":"number"}}}' \
--json
diff --git a/services/cli/commands.mdx b/services/cli/commands.mdx
index 566f827..524084a 100644
--- a/services/cli/commands.mdx
+++ b/services/cli/commands.mdx
@@ -3,110 +3,79 @@ title: 'Commands'
description: 'Full reference for every just-scrape command and its flags'
---
-## smart-scraper
+## extract
-Extract structured data from any URL using AI. [Full docs →](/services/smartscraper)
+Extract structured data from any URL using AI (replaces `smart-scraper`). [Full docs →](/api-reference/extract)
```bash
-just-scrape smart-scraper -p
-just-scrape smart-scraper -p --schema
-just-scrape smart-scraper -p --scrolls # infinite scroll (0-100)
-just-scrape smart-scraper -p --pages # multi-page (1-100)
-just-scrape smart-scraper -p --stealth # anti-bot bypass (+4 credits)
-just-scrape smart-scraper -p --cookies --headers
-just-scrape smart-scraper -p --plain-text # plain text instead of JSON
+just-scrape extract <url> -p "<prompt>"
+just-scrape extract <url> -p "<prompt>" --schema '<json>'
+just-scrape extract <url> -p "<prompt>" --scrolls <n>            # infinite scroll (0-100)
+just-scrape extract <url> -p "<prompt>" --mode js+stealth        # anti-bot bypass
+just-scrape extract <url> -p "<prompt>" --cookies <cookies> --headers <headers>
+just-scrape extract <url> -p "<prompt>" --country <code>         # geo-targeting
```
-## search-scraper
+## search
-Search the web and extract structured data from results. [Full docs →](/services/searchscraper)
+Search the web and extract structured data from results (replaces `search-scraper`). [Full docs →](/api-reference/search)
```bash
-just-scrape search-scraper
-just-scrape search-scraper --num-results # sources to scrape (3-20, default 3)
-just-scrape search-scraper --no-extraction # markdown only (2 credits vs 10)
-just-scrape search-scraper --schema
-just-scrape search-scraper --stealth --headers
+just-scrape search "<query>"
+just-scrape search "<query>" -p "<prompt>"         # extraction prompt for results
+just-scrape search "<query>" --num-results <n>     # sources to scrape (1-20, default 3)
+just-scrape search "<query>" --schema '<json>'
+just-scrape search "<query>" --headers <headers>
```
## markdownify
-Convert any webpage to clean markdown. [Full docs →](/services/markdownify)
+Convert any webpage to clean markdown (convenience wrapper for `scrape --format markdown`). [Full docs →](/api-reference/scrape)
```bash
just-scrape markdownify <url>
-just-scrape markdownify --stealth
+just-scrape markdownify <url> -m direct+stealth    # anti-bot bypass
just-scrape markdownify <url> --headers <headers>
```
-## crawl
-
-Crawl multiple pages and extract data from each. [Full docs →](/services/smartcrawler)
-
-```bash
-just-scrape crawl -p
-just-scrape crawl -p --max-pages # max pages (default 10)
-just-scrape crawl -p --depth # crawl depth (default 1)
-just-scrape crawl --no-extraction --max-pages # markdown only (2 credits/page)
-just-scrape crawl -p --schema
-just-scrape crawl -p --rules # include_paths, same_domain
-just-scrape crawl -p --no-sitemap # skip sitemap discovery
-just-scrape crawl -p --stealth
-```
-
## scrape
-Get raw HTML content from a URL. [Full docs →](/services/scrape)
-
-```bash
-just-scrape scrape
-just-scrape scrape --stealth # anti-bot bypass (+4 credits)
-just-scrape scrape --branding # extract branding (+2 credits)
-just-scrape scrape --country-code # geo-targeting
-```
-
-## sitemap
-
-Get all URLs from a website's sitemap. [Full docs →](/services/sitemap)
+Scrape content from a URL in various formats. [Full docs →](/api-reference/scrape)
```bash
-just-scrape sitemap
-just-scrape sitemap --json | jq -r '.urls[]'
+just-scrape scrape <url>                    # markdown (default)
+just-scrape scrape <url> -f html            # raw HTML
+just-scrape scrape <url> -f screenshot      # screenshot
+just-scrape scrape <url> -f branding        # extract branding info
+just-scrape scrape <url> -m direct+stealth  # anti-bot bypass
+just-scrape scrape <url> --country <code>   # geo-targeting
```
-## agentic-scraper
-
-Browser automation with AI — login, click, navigate, fill forms. [Full docs →](/services/agenticscraper)
-
-```bash
-just-scrape agentic-scraper -s
-just-scrape agentic-scraper -s --ai-extraction -p
-just-scrape agentic-scraper -s --schema
-just-scrape agentic-scraper -s --use-session # persist browser session
-```
-
-## generate-schema
+## crawl
-Generate a JSON schema from a natural language description.
+Crawl multiple pages. The CLI starts the crawl and polls until completion. [Full docs →](/api-reference/crawl)
```bash
-just-scrape generate-schema
-just-scrape generate-schema --existing-schema
+just-scrape crawl <url>
+just-scrape crawl <url> --max-pages <n>           # max pages (default 50)
+just-scrape crawl <url> --max-depth <n>           # crawl depth (default 2)
+just-scrape crawl <url> --max-links-per-page <n>  # max links per page (default 10)
+just-scrape crawl