Overview

Batch processing lets you scrape multiple URLs in a single request, reducing per-request overhead and improving throughput. WhizoAI’s batch scraping system handles concurrent processing, automatic retries, and progress tracking.

Key Benefits

  • 10% credit discount when processing multiple URLs
  • Concurrent processing for faster results
  • Automatic retry on failed pages
  • Progress tracking in real-time
  • Bulk export in multiple formats (JSON, CSV, XML)

Basic Usage

from whizoai import WhizoAI

client = WhizoAI(api_key="whizo_YOUR-API-KEY")

# Batch scrape multiple URLs
urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
]

result = client.batch_scrape(
    urls=urls,
    options={
        "format": "markdown",
        "includeScreenshot": False
    }
)

# Results will be processed asynchronously
print(f"Job ID: {result['jobId']}")
print(f"Status: {result['status']}")
print(f"Total URLs: {len(urls)}")

Advanced Configuration

Concurrent Processing

Control how many URLs are processed simultaneously:
result = client.batch_scrape(
    urls=url_list,
    options={
        "concurrency": 5,  # Process 5 URLs at once
        "maxRetries": 3,   # Retry failed pages up to 3 times
        "timeout": 30000   # 30 second timeout per page
    }
)

Progress Monitoring

Track batch progress in real-time:
import time

job_id = result['jobId']

while True:
    status = client.get_job_status(job_id)

    completed = status['pagesCompleted']
    failed = status['pagesFailed']

    print(f"Progress: {completed}/{status['totalPages']}")
    if completed > 0:  # Avoid division by zero before any pages have finished
        print(f"Success Rate: {(completed - failed) / completed * 100:.1f}%")

    if status['status'] == 'completed':
        break

    time.sleep(5)

Error Handling

Handle individual page failures gracefully:
results = client.get_job_results(job_id)

successful_pages = [p for p in results['pages'] if p['status'] == 'completed']
failed_pages = [p for p in results['pages'] if p['status'] == 'failed']

print(f"Successfully scraped: {len(successful_pages)}")
print(f"Failed pages: {len(failed_pages)}")

for page in failed_pages:
    print(f"Failed URL: {page['url']}")
    print(f"Error: {page['error']}")

Bulk Export

Export batch results in multiple formats:
# Export as JSON
client.export_job(job_id, format='json', file='results.json')

# Export as CSV
client.export_job(job_id, format='csv', file='results.csv')

# Export as XML
client.export_job(job_id, format='xml', file='results.xml')

# Compressed export
client.export_job(job_id, format='json', compressed=True, file='results.json.gz')

Best Practices

Choose a batch size that fits your workload:
  • Small batches (10-50 URLs): Best for quick processing
  • Medium batches (50-200 URLs): Balanced performance
  • Large batches (200+ URLs): Use lower concurrency to avoid rate limits

Batch processing respects your plan’s rate limits:
  • Free Plan: Max 10 concurrent requests
  • Starter Plan: Max 20 concurrent requests
  • Pro Plan: Max 50 concurrent requests
  • Enterprise Plan: Custom limits

For large batches (1000+ URLs):
  • Process in chunks (see the sketch below)
  • Stream results instead of loading everything at once
  • Use webhooks for completion notifications
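
One simple way to process in chunks is to split the URL list into fixed-size slices and submit each slice as its own batch job. This is a minimal sketch using only the batch_scrape call shown earlier; the chunk size and options are illustrative, not prescribed values.

def chunked(items, size):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

job_ids = []
for chunk in chunked(url_list, 100):  # 100 URLs per batch (illustrative)
    job = client.batch_scrape(
        urls=chunk,
        options={"format": "markdown", "concurrency": 5}
    )
    job_ids.append(job['jobId'])

print(f"Submitted {len(job_ids)} batch jobs")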

Credit Costs

Operation       | Base Cost       | Batch Discount
Single URL      | 1 credit        | -
10 URLs         | 10 credits      | 9 credits (10% off)
100 URLs        | 100 credits     | 90 credits (10% off)
Screenshots     | +1 credit/page  | Included in discount
PDF Generation  | +1 credit/page  | Included in discount
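
As a worked example (assuming the 10% discount applies to the combined per-page total): a 100-URL batch with screenshots enabled costs (100 × 1 credit) + (100 × 1 credit) = 200 base credits, or 180 credits after the batch discount.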

Webhooks Integration

Get notified when batch processing completes:
result = client.batch_scrape(
    urls=url_list,
    options={
        "webhook": "https://your-server.com/webhook"
    }
)

# Webhook receives:
# {
#   "event": "batch.completed",
#   "jobId": "job_123",
#   "status": "completed",
#   "summary": {
#     "totalPages": 100,
#     "successful": 98,
#     "failed": 2,
#     "creditsUsed": 90
#   }
# }
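
On the receiving side, the endpoint only needs to accept a JSON POST and inspect the event field. The sketch below uses Flask as an illustrative framework; the route path is an assumption, and fetching results with get_job_results on completion is one possible follow-up. A production handler should also verify the request’s authenticity.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def handle_whizoai_webhook():
    payload = request.get_json(force=True)

    if payload.get("event") == "batch.completed":
        job_id = payload["jobId"]
        summary = payload.get("summary", {})
        print(f"Batch {job_id} finished: "
              f"{summary.get('successful')} ok, {summary.get('failed')} failed")
        # Fetch full results here, e.g. client.get_job_results(job_id)

    return jsonify({"received": True}), 200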

Common Use Cases

E-commerce Product Scraping

Scrape thousands of product pages efficiently with bulk export

Content Migration

Migrate entire websites by batch processing all pages

Competitive Analysis

Monitor multiple competitor websites simultaneously

Data Aggregation

Collect data from multiple sources for analysis

Job Management API

Learn how to monitor and manage batch jobs

Webhooks

Set up webhook notifications for batch completion

Error Handling

Best practices for handling failures in batch processing