Web Archive API

The Web Archive API provides access to historical versions of websites through integration with the Internet Archive’s Wayback Machine. Retrieve archived content, search historical snapshots, and analyze website evolution over time.
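Credits Required: Archive operations consume 1 to 2 credits per request. The example responses below show 1 credit for a search or timeline and 2 credits for a snapshot retrieval. Availability checks are free.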
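If you need to budget credits before running a batch, multiply each request count by its per-operation cost. The sketch below is a minimal Python helper that does this. The per-operation costs are an assumption taken from the `creditsUsed` values in the example responses on this page, and `estimate_credits` is a hypothetical name.

```python
# Per-request credit costs, assumed from the "creditsUsed" fields in the
# example responses on this page; confirm against your own plan.
CREDIT_COST = {
    "search": 1,
    "snapshot": 2,
    "availability": 0,
    "timeline": 1,
}


def estimate_credits(operations: dict) -> int:
    """Return total credits for a batch given {operation: request_count}."""
    total = 0
    for op, count in operations.items():
        if op not in CREDIT_COST:
            raise ValueError(f"unknown operation: {op}")
        total += CREDIT_COST[op] * count
    return total


# One search, 10 availability checks and 5 snapshot retrievals.
print(estimate_credits({"search": 1, "availability": 10, "snapshot": 5}))  # 11
```

Free availability checks add nothing to the total, which is why the best practices recommend running them before snapshot retrievals.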

Authentication

All archive endpoints require authentication using one of the following:
  • API key: send it in the header Authorization: Bearer YOUR_API_KEY
  • Session token: use a Supabase session token (for dashboard access)
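For API-key access, every call is a JSON POST that carries the bearer header. The sketch below builds such a request with Python's standard `urllib.request` but does not send it. `build_archive_request` is a hypothetical helper name, and only the API-key path is shown (not Supabase sessions).

```python
import json
import urllib.request

API_BASE = "https://api.whizo.ai/v1/archive"


def build_archive_request(endpoint: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to an archive endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_archive_request("availability", {"url": "https://example.com"}, "YOUR_API_KEY")
# To send it: with urllib.request.urlopen(req) as resp: data = json.load(resp)
```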

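Endpoints Overview

  • POST /v1/archive/search: search for historical snapshots of a URL
  • POST /v1/archive/snapshot: retrieve the content of a specific snapshot
  • POST /v1/archive/availability: check whether snapshots exist (free)
  • POST /v1/archive/timeline: generate a timeline of a URL's archive history
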
Search Snapshots

Search for historical snapshots of a URL, with optional date filtering and result limiting.
curl -X POST "https://api.whizo.ai/v1/archive/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "dateRange": {
      "from": "20220101",
      "to": "20231231"
    },
    "limit": 50,
    "provider": "wayback",
    "includeMetadata": true
  }'

Request Body

  • url (string, required): The URL to search for archived snapshots.
  • timestamp (string): A specific date in YYYYMMDD format; the search returns snapshots around this date.
  • dateRange (object): Restricts the search to a date range, given as from and to dates in YYYYMMDD format.
  • limit (number, default: 50): Maximum number of snapshots to return (1 to 1000).
  • provider (string, default: "wayback"): Archive provider to search. Options: wayback, archive_today, memento.
  • includeMetadata (boolean, default: false): Include additional metadata for each snapshot.
  • fallbackToClosest (boolean, default: true): Return the closest available snapshots if no exact matches are found.
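Malformed parameters come back as request errors, so it can help to check the documented constraints client-side before spending a request. The following Python sketch checks the 1 to 1000 limit, the provider options, and the YYYYMMDD date format. `build_search_payload` is a hypothetical helper. Sending `dateRange` with both bounds or none is an assumption, since the page does not say whether one bound alone is accepted.

```python
import re

VALID_PROVIDERS = {"wayback", "archive_today", "memento"}


def build_search_payload(url, date_from=None, date_to=None, limit=50,
                         provider="wayback", include_metadata=False,
                         fallback_to_closest=True):
    """Assemble a /v1/archive/search request body, checking documented constraints."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    payload = {
        "url": url,
        "limit": limit,
        "provider": provider,
        "includeMetadata": include_metadata,
        "fallbackToClosest": fallback_to_closest,
    }
    # Assumption: dateRange is sent with both bounds or not at all.
    if (date_from is None) != (date_to is None):
        raise ValueError("provide both date_from and date_to, or neither")
    if date_from is not None:
        for d in (date_from, date_to):
            if not re.fullmatch(r"\d{8}", d):
                raise ValueError(f"dates must be YYYYMMDD, got {d!r}")
        # Equal-width digit strings compare chronologically.
        if date_from > date_to:
            raise ValueError("date_from must not be after date_to")
        payload["dateRange"] = {"from": date_from, "to": date_to}
    return payload


body = build_search_payload("https://example.com", "20220101", "20231231",
                            include_metadata=True)
```

The resulting `body` matches the JSON in the curl example above, plus an explicit `fallbackToClosest`.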

Response

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "provider": "wayback",
    "snapshots": [
      {
        "timestamp": "20230615120000",
        "url": "https://example.com",
        "mimetype": "text/html",
        "statuscode": "200",
        "digest": "sha1:ABCD1234...",
        "length": "15432"
      }
    ],
    "totalFound": 1,
    "query": {
      "dateRange": {
        "from": "20220101",
        "to": "20231231"
      },
      "limit": 50
    },
    "creditsUsed": 1
  },
  "metadata": {
    "searchTime": "2024-01-15T10:30:00Z",
    "provider": "wayback",
    "cached": false
  }
}
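In the search response, `timestamp` is a 14-digit string, and `statuscode` and `length` are returned as strings. The Python sketch below converts them to typed values and sorts the snapshots chronologically. It also keeps only the snapshots whose `digest` differs from the previous one, which is a simple way to spot content changes for evolution analysis. Several points are assumptions:

  • Timestamps are treated as UTC.
  • Non-numeric status or length values (such as "-") become `None`.
  • `digest` is treated as a content hash, as it is in Wayback CDX data.
  • `parse_snapshots` and `content_changes` are hypothetical names.

```python
from datetime import datetime, timezone


def parse_snapshots(response: dict) -> list:
    """Return snapshots from a search response, oldest first, with typed fields."""
    if not response.get("success"):
        raise RuntimeError(response.get("error", "search failed"))
    rows = []
    for snap in response["data"]["snapshots"]:
        # 14-digit YYYYMMDDHHMMSS timestamp, treated as UTC here.
        captured = datetime.strptime(snap["timestamp"], "%Y%m%d%H%M%S")
        status = str(snap.get("statuscode", ""))
        length = str(snap.get("length", ""))
        rows.append({
            "captured_at": captured.replace(tzinfo=timezone.utc),
            # Non-numeric values (e.g. "-") become None.
            "status": int(status) if status.isdigit() else None,
            "length": int(length) if length.isdigit() else None,
            "digest": snap.get("digest"),
        })
    rows.sort(key=lambda r: r["captured_at"])
    return rows


def content_changes(rows: list) -> list:
    """Keep only snapshots whose digest differs from the previous snapshot."""
    changes, last = [], None
    for row in rows:
        if row["digest"] != last:
            changes.append(row)
            last = row["digest"]
    return changes
```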

Retrieve Snapshot

Download the actual content from a specific archived snapshot.
curl -X POST "https://api.whizo.ai/v1/archive/snapshot" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "timestamp": "20230615120000",
    "provider": "wayback"
  }'

Request Body

  • url (string, required): The original URL of the archived page.
  • timestamp (string, required): Exact snapshot timestamp in YYYYMMDDHHMMSS format.
  • provider (string, default: "wayback"): Archive provider. Options: wayback, archive_today, memento.
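Unlike search and availability, this endpoint requires an exact 14-digit timestamp, so an 8-digit date is not enough. The Python sketch below formats a `datetime` into that shape and rejects shorter values before any credits are spent. `to_archive_timestamp` and `build_snapshot_payload` are hypothetical helper names.

```python
import re
from datetime import datetime

PROVIDERS = ("wayback", "archive_today", "memento")


def to_archive_timestamp(dt: datetime) -> str:
    """Format a datetime as the 14-digit YYYYMMDDHHMMSS string this endpoint expects."""
    return dt.strftime("%Y%m%d%H%M%S")


def build_snapshot_payload(url: str, timestamp, provider: str = "wayback") -> dict:
    """Build a /v1/archive/snapshot body from a datetime or a 14-digit string."""
    if isinstance(timestamp, datetime):
        timestamp = to_archive_timestamp(timestamp)
    # An 8-digit YYYYMMDD date is rejected: retrieval needs an exact timestamp.
    if not re.fullmatch(r"\d{14}", timestamp):
        raise ValueError(f"timestamp must be YYYYMMDDHHMMSS, got {timestamp!r}")
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return {"url": url, "timestamp": timestamp, "provider": provider}


payload = build_snapshot_payload("https://example.com", datetime(2023, 6, 15, 12, 0, 0))
```

Here `payload` carries `"timestamp": "20230615120000"`, the same body as the curl example above.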

Response

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "timestamp": "20230615120000",
    "provider": "wayback",
    "snapshot": {
      "content": "<!DOCTYPE html><html>...",
      "metadata": {
        "statusCode": 200,
        "contentType": "text/html"
      },
      "timestamp": "20230615120000",
      "url": "https://example.com",
      "available": true
    },
    "creditsUsed": 2
  },
  "metadata": {
    "retrievedAt": "2024-01-15T10:30:00Z",
    "provider": "wayback",
    "cached": false
  }
}

Check Availability

Check if archived versions exist for a URL without consuming credits.
curl -X POST "https://api.whizo.ai/v1/archive/availability" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "timestamp": "20230615"
  }'

Request Body

  • url (string, required): The URL to check for archived versions.
  • timestamp (string): Check availability around a specific date (YYYYMMDD format).

Response

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "timestamp": "20230615",
    "availability": {
      "hasSnapshots": true,
      "closestDate": "20230615120000"
    },
    "creditsUsed": 0
  },
  "metadata": {
    "checkedAt": "2024-01-15T10:30:00Z",
    "cached": false
  }
}
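Availability checks are free, so they can act as a gate before a credit-consuming retrieval. The Python sketch below chains the two calls. The `post` callable is a hypothetical transport that takes an endpoint name and a JSON body and returns the parsed response. The sketch also assumes `closestDate` is present whenever `hasSnapshots` is true and is a full 14-digit timestamp, as in the example response above.

```python
from typing import Callable, Optional

# Hypothetical transport: (endpoint name, JSON body) -> parsed JSON response.
PostFn = Callable[[str, dict], dict]


def fetch_closest_snapshot(post: PostFn, url: str, date: str) -> Optional[str]:
    """Check availability (free) first; retrieve content only if a snapshot exists."""
    avail = post("availability", {"url": url, "timestamp": date})
    info = avail["data"]["availability"]
    if not info.get("hasSnapshots"):
        return None  # nothing archived, so no credits are spent on retrieval
    # Assumption: closestDate is a full 14-digit timestamp, as in the example.
    snap = post("snapshot", {"url": url, "timestamp": info["closestDate"],
                             "provider": "wayback"})
    return snap["data"]["snapshot"]["content"]
```

Injecting `post` keeps the logic testable without network access. In production it would wrap the authenticated POST shown under Authentication.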

Generate Timeline

Generate a timeline showing the archive history of a URL with customizable granularity.
curl -X POST "https://api.whizo.ai/v1/archive/timeline" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "dateRange": {
      "from": "20220101",
      "to": "20231231"
    },
    "granularity": "month"
  }'

Request Body

  • url (string, required): The URL to generate a timeline for.
  • dateRange (object, required): Date range for the timeline, given as from and to dates in YYYYMMDD format.
  • granularity (string, default: "month"): Timeline granularity. Options: day, week, month, year.

Response

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "dateRange": {
      "from": "20220101",
      "to": "20231231"
    },
    "granularity": "month",
    "timeline": [
      {
        "period": "2022-01",
        "count": 3,
        "snapshots": ["20220105120000", "20220115080000", "20220128140000"]
      },
      {
        "period": "2022-02",
        "count": 2,
        "snapshots": ["20220207100000", "20220224160000"]
      }
    ],
    "totalSnapshots": 5,
    "creditsUsed": 1
  }
}
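This page does not document how the server groups snapshots into periods. The Python sketch below only illustrates how `period`, `count`, and `snapshots` relate to raw 14-digit timestamps, for example to rebuild a timeline client-side from search results. Given the five timestamps in the example response, it reproduces the month-level output above. The day, month, and year labels follow the example's style; the ISO-week label format (such as "2022-W05") is an assumption. `period_key` and `build_timeline` are hypothetical names.

```python
from datetime import datetime


def period_key(ts: str, granularity: str) -> str:
    """Map a 14-digit timestamp to a period label such as '2022-01'."""
    dt = datetime.strptime(ts, "%Y%m%d%H%M%S")
    if granularity == "day":
        return dt.strftime("%Y-%m-%d")
    if granularity == "week":
        iso_year, iso_week, _ = dt.isocalendar()
        return f"{iso_year}-W{iso_week:02d}"  # label format is an assumption
    if granularity == "month":
        return dt.strftime("%Y-%m")
    if granularity == "year":
        return dt.strftime("%Y")
    raise ValueError(f"unsupported granularity: {granularity!r}")


def build_timeline(timestamps, granularity: str = "month") -> list:
    """Group snapshot timestamps into chronologically ordered periods."""
    buckets = {}
    # 14-digit timestamps sort chronologically as strings; dicts keep insertion order.
    for ts in sorted(timestamps):
        buckets.setdefault(period_key(ts, granularity), []).append(ts)
    return [{"period": p, "count": len(s), "snapshots": s} for p, s in buckets.items()]
```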

Error Handling

The Web Archive API returns standard HTTP status codes and structured error responses:
{
  "success": false,
  "error": "Insufficient credits",
  "details": {
    "required": 2,
    "available": 0
  }
}

Common Error Codes

  • Invalid request parameters or malformed data
  • Invalid or missing authentication credentials
  • Insufficient credits for the requested operation
  • URL not found in the archive, or no snapshots available
  • Too many requests; slow down and retry later
  • Internal server error, or the Wayback Machine is unavailable
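Errors share one body shape (`success`, `error`, `details`), so a client can turn any failed response into a single exception type. The Python sketch below does that. Its retry classification treats 429 and 5xx as retryable, which follows general HTTP semantics and is an assumption about how this service maps the errors listed above. `ArchiveAPIError` and `unwrap` are hypothetical names.

```python
class ArchiveAPIError(Exception):
    """Failed archive request, carrying the structured error details."""

    def __init__(self, status: int, message: str, details=None):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.details = details or {}

    @property
    def retryable(self) -> bool:
        # General HTTP semantics (assumed here): 429 rate limiting and 5xx
        # server/upstream failures are worth retrying; other 4xx errors are not.
        return self.status == 429 or self.status >= 500


def unwrap(status: int, body: dict) -> dict:
    """Return body["data"] for a successful response, else raise ArchiveAPIError."""
    if 200 <= status < 300 and body.get("success"):
        return body["data"]
    raise ArchiveAPIError(status, body.get("error", "unknown error"), body.get("details"))
```

With the insufficient-credits body shown above, `err.details["required"] - err.details["available"]` gives a shortfall of 2 credits.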

Use Cases

Website Evolution Analysis

Track how a website has changed over time by analyzing archived snapshots

Content Recovery

Recover deleted content or previous versions of web pages

Competitor Research

Study competitor websites’ historical changes and strategies

SEO Analysis

Analyze historical SEO changes and their impact

Rate Limits

  • Search operations: 60 requests per minute
  • Snapshot retrieval: 30 requests per minute
  • Availability checks: 120 requests per minute (free operations)
  • Timeline generation: 20 requests per minute
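The service enforces these limits, but pacing requests client-side avoids hitting them in the first place. The Python sketch below uses a sliding window, one common approach. This page does not say whether the service counts requests in fixed or rolling windows. `SlidingWindowLimiter` is a hypothetical class, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque

# Per-minute limits from the list above.
RATE_LIMITS = {"search": 60, "snapshot": 30, "availability": 120, "timeline": 20}


class SlidingWindowLimiter:
    """Allow at most `limit` calls within any rolling `window`-second interval."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # start times of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until another call is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()  # drop calls that have left the window
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            sleep(delay)
        self.calls.append(self.clock())


# One limiter per operation; call limiters["snapshot"].acquire() before each request.
limiters = {name: SlidingWindowLimiter(n) for name, n in RATE_LIMITS.items()}
```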

Best Practices

  • Use free availability checks before credit-consuming snapshot retrievals
  • Handle archive service unavailability gracefully, for example by retrying with backoff
  • Cache results where possible to reduce credit usage
  • Use narrow date ranges to limit search scope
  • Expect gaps: archive coverage may be limited by robots.txt restrictions
  • Validate retrieved content, since some snapshots may be incomplete or corrupted
  • Allow for variable response times, which depend on the age and size of the archived content
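The caching and error-handling practices above can be combined in a thin wrapper around whatever transport you use. The Python sketch below caches responses in memory, keyed by endpoint and canonical JSON body, so a repeated identical request spends no credits. It also retries transient failures with exponential backoff. Several parts are assumptions:

  • The transport raises the hypothetical `TransientError` for retryable failures.
  • The cache is unbounded and never expires.
  • `make_cached_client` is a hypothetical name.

A snapshot at a fixed timestamp is unlikely to change, but search and timeline results can grow as new captures are archived, so a real cache may need expiry for those.

```python
import json
import time
from typing import Callable

# Hypothetical transport: (endpoint name, JSON body) -> parsed JSON response.
PostFn = Callable[[str, dict], dict]


class TransientError(Exception):
    """Raised by the transport for retryable failures (e.g. archive unavailable)."""


def make_cached_client(post: PostFn, retries: int = 3, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> PostFn:
    """Wrap `post` with an in-memory cache and exponential-backoff retries."""
    cache: dict = {}

    def cached_post(endpoint: str, payload: dict) -> dict:
        # sort_keys makes logically identical payloads share one cache entry.
        key = endpoint + ":" + json.dumps(payload, sort_keys=True)
        if key in cache:
            return cache[key]  # repeated request: no credits spent
        for attempt in range(retries + 1):
            try:
                result = post(endpoint, payload)
                break
            except TransientError:
                if attempt == retries:
                    raise  # out of retries; surface the failure
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        cache[key] = result
        return result

    return cached_post
```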