Overview

Batch processing lets you scrape multiple URLs in a single request, reducing per-request overhead and improving throughput. WhizoAI’s batch scraping system handles concurrent processing, automatic retries, and progress tracking.

Key Benefits

  • 10% credit discount when processing multiple URLs
  • Concurrent processing for faster results
  • Automatic retry on failed pages
  • Progress tracking in real-time
  • Bulk export in multiple formats (JSON, CSV, XML)

Basic Usage

from whizoai import WhizoAI

client = WhizoAI(api_key="whizo_YOUR-API-KEY")

# Batch scrape multiple URLs
urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

result = client.batch_scrape(
    urls=urls,
    options={
        "format": "markdown",
        "includeScreenshot": False
    }
)

# Results will be processed asynchronously
print(f"Job ID: {result['jobId']}")
print(f"Status: {result['status']}")
print(f"Total URLs: {len(urls)}")

Advanced Configuration

Concurrent Processing

Control how many URLs are processed simultaneously:
result = client.batch_scrape(
    urls=url_list,
    options={
        "concurrency": 5,  # Process 5 URLs at once
        "maxRetries": 3,   # Retry failed pages up to 3 times
        "timeout": 30000   # 30 second timeout per page
    }
)

Progress Monitoring

Track batch progress in real-time:
import time

job_id = result['jobId']

while True:
    status = client.get_job_status(job_id)

    completed = status['pagesCompleted']
    failed = status['pagesFailed']

    print(f"Progress: {completed}/{status['totalPages']}")
    if completed > 0:  # Avoid division by zero before any pages have finished
        print(f"Success Rate: {(completed - failed) / completed * 100:.1f}%")

    if status['status'] == 'completed':
        break

    time.sleep(5)

Error Handling

Handle individual page failures gracefully:
results = client.get_job_results(job_id)

successful_pages = [p for p in results['pages'] if p['status'] == 'completed']
failed_pages = [p for p in results['pages'] if p['status'] == 'failed']

print(f"Successfully scraped: {len(successful_pages)}")
print(f"Failed pages: {len(failed_pages)}")

for page in failed_pages:
    print(f"Failed URL: {page['url']}")
    print(f"Error: {page['error']}")

Bulk Export

Export batch results in multiple formats:
# Export as JSON
client.export_job(job_id, format='json', file='results.json')

# Export as CSV
client.export_job(job_id, format='csv', file='results.csv')

# Export as XML
client.export_job(job_id, format='xml', file='results.xml')

# Compressed export
client.export_job(job_id, format='json', compressed=True, file='results.json.gz')

Best Practices

Choose a batch size that fits your workload:
  • Small batches (10-50 URLs): Best for quick processing
  • Medium batches (50-200 URLs): Balanced performance
  • Large batches (200+ URLs): Use lower concurrency to avoid rate limits

Batch processing respects your plan’s rate limits:
  • Free Plan: Max 10 concurrent requests
  • Starter Plan: Max 20 concurrent requests
  • Pro Plan: Max 50 concurrent requests
  • Enterprise Plan: Custom limits

For large batches (1000+ URLs):
  • Process in chunks (see the sketch below)
  • Stream results instead of loading everything at once
  • Use webhooks for completion notifications
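
One simple way to process in chunks is to split the URL list into fixed-size slices and submit each slice as its own batch job. This is a minimal sketch using only the batch_scrape call shown earlier; the chunk size and options are illustrative, not prescribed values.

def chunked(items, size):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

job_ids = []
for chunk in chunked(url_list, 100):  # 100 URLs per batch (illustrative)
    job = client.batch_scrape(
        urls=chunk,
        options={"format": "markdown", "concurrency": 5}
    )
    job_ids.append(job['jobId'])

print(f"Submitted {len(job_ids)} batch jobs")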

Credit Costs

Operation       | Base Cost       | Batch Discount
Single URL      | 1 credit        | -
10 URLs         | 10 credits      | 9 credits (10% off)
100 URLs        | 100 credits     | 90 credits (10% off)
Screenshots     | +1 credit/page  | Included in discount
PDF Generation  | +1 credit/page  | Included in discount
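
As a worked example (assuming the 10% discount applies to the combined per-page total): a 100-URL batch with screenshots enabled costs (100 × 1 credit) + (100 × 1 credit) = 200 base credits, or 180 credits after the batch discount.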

Webhooks Integration

Get notified when batch processing completes:
result = client.batch_scrape(
    urls=url_list,
    options={
        "webhook": "https://your-server.com/webhook"
    }
)

# Webhook receives:
# {
#   "event": "batch.completed",
#   "jobId": "job_123",
#   "status": "completed",
#   "summary": {
#     "totalPages": 100,
#     "successful": 98,
#     "failed": 2,
#     "creditsUsed": 90
#   }
# }
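
On the receiving side, the endpoint only needs to accept a JSON POST and inspect the event field. The sketch below uses Flask as an illustrative framework; the route path is an assumption, and fetching results with get_job_results on completion is one possible follow-up. A production handler should also verify the request’s authenticity.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_whizoai_webhook():
    payload = request.get_json(force=True)

    if payload.get("event") == "batch.completed":
        job_id = payload["jobId"]
        summary = payload.get("summary", {})
        print(f"Batch {job_id} finished: "
              f"{summary.get('successful')} ok, {summary.get('failed')} failed")
        # Fetch full results here, e.g. client.get_job_results(job_id)

    return jsonify({"received": True}), 200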

Common Use Cases

E-commerce Product Scraping

Scrape thousands of product pages efficiently with bulk export

Content Migration

Migrate entire websites by batch processing all pages

Competitive Analysis

Monitor multiple competitor websites simultaneously

Data Aggregation

Collect data from multiple sources for analysis

Job Management API

Learn how to monitor and manage batch jobs

Webhooks

Set up webhook notifications for batch completion

Error Handling

Best practices for handling failures in batch processing