Overview

WhizoAI’s browser automation features allow you to scrape JavaScript-heavy websites, SPAs (Single Page Applications), and dynamic content that requires user interaction or waiting for elements to load.
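
The examples on this page use the Python SDK and assume a configured client. As a minimal setup sketch (the whizoai package name and WhizoAI client class are assumptions here; adjust to your actual SDK installation):

# Minimal setup sketch for the examples below.
# Assumption: the SDK is installed as `whizoai` and exposes a `WhizoAI`
# client class that accepts an API key.
from whizoai import WhizoAI

client = WhizoAI(api_key="YOUR_API_KEY")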

Supported Engines

Playwright

Recommended for most use cases
  • Fast and reliable
  • Chrome, Firefox, WebKit support
  • Excellent for modern SPAs

Puppeteer

Chrome-specific automation
  • Stable Chrome automation
  • Good for Chrome-specific features
  • Wide community support
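
Either engine is selected per request via the engine option (the examples below use "playwright"). A minimal sketch switching to Puppeteer; the "puppeteer" value is assumed to mirror the engine names above:

# Select the automation engine per request.
# Assumption: the engine option accepts "playwright" and "puppeteer".
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "engine": "puppeteer"
    }
)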

JavaScript Rendering

Enable JavaScript Rendering

result = client.scrape(
    url="https://spa-example.com",
    options={
        "javascript": True,  # Enable JS rendering
        "engine": "playwright",
        "waitTime": 5  # Wait 5 seconds for JS to execute
    }
)

Wait for Specific Elements

Wait for dynamic content to load:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "waitFor": {
            "selector": ".product-price",  # Wait for this element
            "timeout": 10000  # Max wait time (10 seconds)
        }
    }
)

Wait for Network Idle

Ensure all AJAX requests complete:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "waitUntil": "networkidle",  # Wait until network is idle
        "timeout": 30000
    }
)

Page Interactions

Click Elements

result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "actions": [
            {
                "type": "click",
                "selector": "#load-more-button"
            },
            {
                "type": "wait",
                "duration": 2000  # Wait 2 seconds
            }
        ]
    }
)

Fill Forms

result = client.scrape(
    url="https://example.com/search",
    options={
        "javascript": True,
        "actions": [
            {
                "type": "type",
                "selector": "#search-input",
                "text": "AI web scraping"
            },
            {
                "type": "click",
                "selector": "#search-button"
            },
            {
                "type": "waitForNavigation"
            }
        ]
    }
)

Scroll Page

Load infinite scroll content:
result = client.scrape(
    url="https://example.com/feed",
    options={
        "javascript": True,
        "actions": [
            {
                "type": "scroll",
                "direction": "down",
                "distance": 1000  # Pixels
            },
            {
                "type": "wait",
                "duration": 2000
            },
            {
                "type": "scroll",
                "direction": "down",
                "distance": 1000
            }
        ]
    }
)

Advanced Browser Options

Custom Viewport

Simulate different screen sizes:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "viewport": {
            "width": 1920,
            "height": 1080
        },
        "mobile": False
    }
)

# Mobile viewport
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "viewport": {
            "width": 375,
            "height": 667,
            "isMobile": True,
            "hasTouch": True
        },
        "mobile": True
    }
)

Geolocation

Set custom geolocation:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "geolocation": {
            "latitude": 40.7128,
            "longitude": -74.0060
        }
    }
)

Timezone

Set custom timezone:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "timezone": "America/New_York"
    }
)

Screenshot Capture

Full Page Screenshots

result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "includeScreenshot": True,
        "screenshotOptions": {
            "fullPage": True,
            "type": "png"
        }
    }
)

screenshot_url = result['screenshots'][0]

Element Screenshots

Capture specific elements:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "includeScreenshot": True,
        "screenshotOptions": {
            "selector": "#main-content",
            "type": "jpeg",
            "quality": 80
        }
    }
)

PDF Generation

Generate PDFs from rendered pages:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "includePdf": True,
        "pdfOptions": {
            "format": "A4",
            "printBackground": True,
            "margin": {
                "top": "20mm",
                "bottom": "20mm",
                "left": "15mm",
                "right": "15mm"
            }
        }
    }
)

pdf_url = result['pdf']

Complex Automation Workflows

Multi-step Interactions

result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "actions": [
            # Accept cookies
            {
                "type": "click",
                "selector": "#accept-cookies"
            },
            # Open dropdown
            {
                "type": "click",
                "selector": ".category-dropdown"
            },
            # Select option
            {
                "type": "click",
                "selector": "option[value='electronics']"
            },
            # Fill search
            {
                "type": "type",
                "selector": "#search",
                "text": "laptop"
            },
            # Submit
            {
                "type": "click",
                "selector": "button[type='submit']"
            },
            # Wait for results
            {
                "type": "waitFor",
                "selector": ".search-results"
            }
        ]
    }
)

Authentication & Sessions

Scrape authenticated pages by supplying session cookies:
result = client.scrape(
    url="https://example.com/dashboard",
    options={
        "javascript": True,
        "cookies": [
            {
                "name": "session_id",
                "value": "abc123...",
                "domain": ".example.com"
            },
            {
                "name": "auth_token",
                "value": "xyz789...",
                "domain": ".example.com"
            }
        ]
    }
)

Local Storage

Set local storage values:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "localStorage": {
            "user_preferences": "dark_mode",
            "auth_token": "token_value"
        }
    }
)

Performance Optimization

Block Resources

Speed up scraping by blocking unnecessary resources:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "blockResources": ["image", "stylesheet", "font"],  # Block images, CSS, fonts
        "blockDomains": ["analytics.google.com", "facebook.com"]  # Block tracking
    }
)

Request Interception

Intercept matching requests and serve custom responses:
result = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "interceptRequests": {
            "enabled": True,
            "rules": [
                {
                    "pattern": "*/api/data",
                    "response": {"custom": "data"}
                }
            ]
        }
    }
)

Error Handling

from whizoai import WhizoAIError  # assumed import path; adjust to match the SDK

try:
    result = client.scrape(
        url="https://example.com",
        options={
            "javascript": True,
            "waitFor": {"selector": ".content", "timeout": 15000}
        }
    )
except WhizoAIError as e:
    if e.code == 'TIMEOUT':
        print("Page took too long to load. Try increasing timeout.")
    elif e.code == 'SELECTOR_NOT_FOUND':
        print("Waited element never appeared on the page.")

Credit Costs

Feature                   Additional Cost
JavaScript Rendering      +1 credit
Screenshot (full page)    +1 credit
PDF Generation            +1 credit
Each Page Action          No additional cost
Mobile Rendering          No additional cost
Example: Scraping with JavaScript + Screenshot = 1 (base) + 1 (JS) + 1 (screenshot) = 3 credits

Best Practices

Optimize Wait Times
  • Use specific selectors instead of fixed wait times
  • Prefer waitFor over waitTime when possible (see the sketch after this list)
  • Use networkidle only when necessary, as it is slower

Timeout Management
  • Set reasonable timeouts (10-30 seconds)
  • Handle timeout errors gracefully
  • Don’t set timeouts too high; requests get slower and cost more
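
A short sketch contrasting the two approaches, using the waitTime and waitFor options shown above; waiting on a concrete selector usually returns sooner and fails faster than a fixed delay:

# Fixed delay: always waits the full 10 seconds, even if content loads sooner
result_fixed = client.scrape(
    url="https://example.com",
    options={"javascript": True, "waitTime": 10}
)

# Preferred: returns as soon as the element appears, fails after 15 seconds
result_selector = client.scrape(
    url="https://example.com",
    options={
        "javascript": True,
        "waitFor": {"selector": ".content", "timeout": 15000}
    }
)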

Common Use Cases

  • Single Page Applications: React, Vue, and Angular apps that load content dynamically
  • Infinite scroll: social media feeds and product listings that load content as you scroll
  • Authenticated pages: content behind a login, scraped using session cookies
  • Maps and geodata: data from Google Maps, Mapbox, and other mapping services

Anti-Bot Stealth

Combine with stealth mode for better success rates

Proxy Rotation

Use proxies with browser automation

Scrape API Reference

Complete API documentation