Overview
n8n is a powerful, self-hosted workflow automation platform. Combine it with WhizoAI for complete control over your web scraping workflows: no vendor lock-in, unlimited executions, and full data privacy.
Why n8n + WhizoAI?
Self-Hosted
Run on your own infrastructure for complete data control
Unlimited Executions
No execution limits, unlike cloud-based alternatives
Visual Workflows
Build complex workflows with a drag-and-drop interface
Open Source
Customize and extend to fit your exact needs
Installation
Quick Start with Docker
npm Installation
Set Up WhizoAI in n8n
Method 1: HTTP Request Node (Recommended)
- Add “HTTP Request” node
- Configure:
  - Method: POST
  - URL: https://api.whizo.ai/v1/scrape
  - Authentication: Generic Credential Type
  - Add header: Authorization = Bearer whizo_YOUR-API-KEY
  - Body: JSON with scraping options
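For testing the same call outside n8n, the configuration above can be sketched as a standalone Python request. This is a minimal sketch, not the official client: the endpoint and header come from the settings above, while the body field names `url` and `format` are assumptions made for illustration, so check them against the WhizoAI API reference. The request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.whizo.ai/v1/scrape"  # endpoint from the node configuration


def build_scrape_request(api_key: str, target_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request the HTTP Request node would issue."""
    # Assumption: "url" and "format" are illustrative body fields, not confirmed names.
    body = {"url": target_url, "format": "markdown"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scrape_request("whizo_YOUR-API-KEY", "https://example.com")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; keeping construction separate makes the header and body easy to inspect first.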
Method 2: Webhook Node (For Webhooks)
- Add “Webhook” node to receive WhizoAI webhook events
- Copy the webhook URL
- Register the URL in the WhizoAI dashboard
Common Workflows
1. Scheduled Website Monitoring
Use Case: Monitor a website daily and alert on changes.
Workflow:
- Daily schedule trigger
  - Mode: Every day
  - Hour: 9
  - Minute: 0
- Scrape request (HTTP Request)
  - Method: POST
  - URL: https://api.whizo.ai/v1/scrape
  - Headers:
  - Body:
- Set
  - Add field: currentContent = {{$json["content"]}}
- Read Binary File
  - File Path: /data/previous_scrape.json
- Change check
  - Condition: {{$node["Set"].json["currentContent"]}} ≠ {{$node["Read Binary File"].json["content"]}}
- Alert message
  - Channel: #alerts
  - Message: Website changed! Previous: {{$node["Read Binary File"].json["content"].slice(0, 100)}}... Current: {{$node["Set"].json["currentContent"].slice(0, 100)}}...
- Write Binary File
  - File Path: /data/previous_scrape.json
  - Data: {{$node["Set"].json}}
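The compare-then-store logic at the heart of this workflow can be sketched in a few lines of Python. This is an illustration under assumptions, not the workflow itself: the snapshot is stored as a JSON object with a `currentContent` key (mirroring the Set node's field), and a first run with no snapshot is treated as "no change" instead of raising an alert.

```python
import json
import tempfile
from pathlib import Path


def detect_change(current_content: str, snapshot_path: Path) -> bool:
    """Return True when content differs from the stored snapshot, then update it."""
    previous = None
    if snapshot_path.exists():
        stored = json.loads(snapshot_path.read_text(encoding="utf-8"))
        previous = stored.get("currentContent")
    # Assumption: with no baseline yet, report "unchanged" rather than alerting.
    changed = previous is not None and previous != current_content
    # Always store the latest content so the next run compares against it.
    snapshot_path.write_text(
        json.dumps({"currentContent": current_content}), encoding="utf-8"
    )
    return changed


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "previous_scrape.json"
    first = detect_change("v1", snapshot)   # no baseline yet
    second = detect_change("v1", snapshot)  # same content
    third = detect_change("v2", snapshot)   # content changed
    print(first, second, third)
```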
2. Form Submission → Research → CRM Update
Use Case: Auto-research companies submitted via a form.
Workflow:
- Webhook (form submission)
  - Path: /form-webhook
- Scrape request
  - URL: https://api.whizo.ai/v1/scrape
  - Body:
- Extract request
  - URL: https://api.whizo.ai/v1/extract
  - Body:
- CRM update request
  - Method: POST
  - URL: https://api.hubapi.com/crm/v3/objects/companies
  - Body: Mapped from AI extraction
- Email notification
  - To: {{$node["Webhook"].json["body"]["email"]}}
  - Subject: Company Research Complete: {{$node["HTTP Request 1"].json["extractedData"]["company_name"]}}
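The "mapped from AI extraction" step can be sketched as a small mapping function. HubSpot's CRM v3 object endpoints accept a body of the form `{"properties": {...}}`; `company_name` appears in the email subject above, while the `website` and `summary` extraction fields are hypothetical and depend on the schema you ask the extract endpoint to return.

```python
def to_hubspot_company(extracted: dict) -> dict:
    """Map extraction output onto the {"properties": {...}} body used by HubSpot CRM v3."""
    mapping = {
        "company_name": "name",     # field name taken from the workflow above
        "website": "domain",        # hypothetical extraction field
        "summary": "description",   # hypothetical extraction field
    }
    properties = {
        target: extracted[source]
        for source, target in mapping.items()
        if extracted.get(source)  # skip missing or empty values
    }
    return {"properties": properties}


payload = to_hubspot_company(
    {"company_name": "Acme", "website": "acme.example", "summary": ""}
)
print(payload)
```

Dropping empty values keeps the CRM record from being overwritten with blanks when the extraction misses a field.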
3. RSS Feed → Scrape → Summarize → Publish
Use Case: Aggregate industry news, scrape full articles, summarize them, and publish to a blog.
Workflow:
- RSS feed
  - URL: https://example.com/rss
- Batching
  - Batch Size: 5
- Batch scrape request
  - URL: https://api.whizo.ai/v1/batch
  - Body:
- Wait
  - Amount: 2
  - Unit: minutes
- Results request
  - URL: https://api.whizo.ai/v1/jobs/{{$node["HTTP Request"].json["jobId"]}}/results
- Extract request
  - URL: https://api.whizo.ai/v1/extract
  - Body:
- Blog post
  - Operation: Create Post
  - Title: {{$json["title"]}}
  - Content: Formatted with summary and key points
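The batching step (batch size 5) amounts to slicing the list of article URLs into fixed-size groups. A minimal Python sketch of that idea follows; the example URLs are placeholders.

```python
from typing import Iterator, List


def split_into_batches(urls: List[str], batch_size: int = 5) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield urls[start:start + batch_size]


urls = [f"https://example.com/article-{n}" for n in range(12)]
batch_sizes = [len(batch) for batch in split_into_batches(urls)]
print(batch_sizes)  # [5, 5, 2]
```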
4. Batch URL Processing
Use Case: Process a list of URLs from a CSV file or database.
Workflow:
- Read input file
  - File Path: /data/urls.csv
- Batch scrape request
  - URL: https://api.whizo.ai/v1/batch
  - Body: {{$json["urls"]}}
- Wait
  - Amount: {{Math.ceil($json["urls"].length / 5)}} minutes
- Results request
  - URL: https://api.whizo.ai/v1/jobs/{{$node["HTTP Request"].json["jobId"]}}/results
- Write output file
  - File Path: /data/results.csv
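The wait expression above allows one minute per five URLs, rounded up. The same arithmetic, together with the results URL pattern, can be sketched in Python; the job ID `job_123` is a made-up placeholder.

```python
import math


def wait_minutes(url_count: int, urls_per_minute: int = 5) -> int:
    """Mirror Math.ceil(urls.length / 5): one minute per five URLs, rounded up."""
    return math.ceil(url_count / urls_per_minute)


def results_url(job_id: str) -> str:
    """Build the results endpoint for a batch job."""
    return f"https://api.whizo.ai/v1/jobs/{job_id}/results"


print(wait_minutes(12))        # 3
print(results_url("job_123"))  # placeholder job ID
```

A fixed wait is a heuristic: if jobs can take longer than the estimate, the results request may need to be retried.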
Advanced Patterns
Error Handling
Add error handling to workflows:
Node: IF (Check Status)
- Condition 1: {{$node["HTTP Request"].statusCode}} = 200
  - Success path
- Condition 2: {{$node["HTTP Request"].statusCode}} ≠ 200
  - Error path → Log → Alert → Retry
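The success and error paths above can be sketched as a retry loop in Python. This is one possible reading of "Log → Alert → Retry", with assumptions: `send` is any callable returning a `(status_code, body)` pair, failures are collected in a list as the log, exponential backoff stands in for the retry delay, and the final exception plays the role of the alert.

```python
import time
from typing import Callable, List, Tuple


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, List[str]]:
    """Call send() until it returns status 200; return the body and the error log."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status == 200:
            return body, errors  # success path
        errors.append(f"attempt {attempt}: HTTP {status}")  # log
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))  # back off, then retry
    # All attempts failed: surface the collected log (the "alert" step).
    raise RuntimeError("; ".join(errors))
```

Injecting `sleep` keeps the function testable without real delays, for example `call_with_retry(send, sleep=lambda seconds: None)`.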
Webhook-Driven Workflows
Receive WhizoAI job completion webhooks:
Node 1: Webhook
- Path: /whizoai-webhook
- Authentication: Header Auth
  - Name: X-WhizoAI-Signature
  - Value: Verify with your secret
- Route by {{$json["event"]}}
  - job.completed → Success handler
  - job.failed → Error handler
  - credit.low → Alert handler
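Signature checking and event routing can be sketched as follows. The signing scheme is an assumption: this page only says to verify `X-WhizoAI-Signature` with your secret, so the sketch supposes a hex-encoded HMAC-SHA256 of the raw request body. Confirm the actual scheme in the WhizoAI webhook documentation before relying on it. The handler names are placeholders.

```python
import hashlib
import hmac
import json

# Event names from the routing rules above; handler names are placeholders.
HANDLERS = {"job.completed": "success", "job.failed": "error", "credit.low": "alert"}


def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    """Assumed scheme: signature is the hex HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


def route_event(raw_body: bytes) -> str:
    """Pick a handler name from the payload's "event" field."""
    event = json.loads(raw_body).get("event")
    return HANDLERS.get(event, "ignored")


secret = "example-secret"
payload = json.dumps({"event": "job.completed"}).encode("utf-8")
signature = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
print(verify_signature(payload, signature, secret), route_event(payload))
```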
Data Transformation
Transform scraped data before saving:
Node: Function (Transform Data)
Best Practices
Credential Management
Store API keys securely:
- Go to n8n Settings → Credentials
- Add “Header Auth” credential
- Name: WhizoAI API Key
- Value: Bearer whizo_YOUR-API-KEY
- Use in HTTP Request nodes
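The same principle applies to scripts that call the API outside n8n: keep the key out of source code. A minimal sketch, assuming the key lives in an environment variable (the name `WHIZOAI_API_KEY` is hypothetical) and that keys start with `whizo_` as in the placeholder above:

```python
import os


def auth_header(env_var: str = "WHIZOAI_API_KEY") -> dict:
    """Read the API key from the environment and build the Authorization header."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set")
    # Assumption: real keys share the "whizo_" prefix shown in the placeholder.
    if not api_key.startswith("whizo_"):
        raise ValueError(f"{env_var} does not look like a WhizoAI key")
    return {"Authorization": f"Bearer {api_key}"}
```

With the variable exported in the shell, `auth_header()` returns a dictionary ready to pass as request headers; a missing or malformed key fails early instead of producing an authentication error later.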
Error Logging
Always log errors:
- Add “Write Binary File” node to error paths
- Log to file: /logs/errors-{{$now.format("YYYY-MM-DD")}}.json
- Include full error context and request details
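A dated error log along these lines can be sketched in Python. The record fields (`timestamp`, `error`, `request`) are illustrative choices, and the sketch appends one JSON object per line so that several errors on the same day accumulate in the same file.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_error(log_dir: Path, error: str, request: dict, now: datetime = None) -> Path:
    """Append one JSON record per line to errors-YYYY-MM-DD.json in log_dir."""
    now = now or datetime.now(timezone.utc)
    path = log_dir / f"errors-{now:%Y-%m-%d}.json"
    # Illustrative record shape: what failed, when, and the request that caused it.
    record = {"timestamp": now.isoformat(), "error": error, "request": request}
    log_dir.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    fixed = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
    log_path = log_error(Path(tmp), "HTTP 500", {"url": "https://example.com"}, now=fixed)
    log_name = log_path.name
    logged = json.loads(log_path.read_text(encoding="utf-8").splitlines()[0])
    print(log_name)
```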
Performance Optimization
- Use batch operations when scraping multiple URLs
- Implement rate limiting with “Wait” nodes
- Cache results to avoid duplicate scraping
- Use lazy loading for large datasets
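Caching and rate limiting can be combined in one small wrapper. The sketch below is an in-memory illustration only: `scrape` is any function that takes a URL and returns content, results are reused for `ttl_seconds`, and calls are spaced at least `min_interval` seconds apart. The default values are arbitrary, not WhizoAI limits.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CachedScraper:
    """Wrap a scrape function with a time-limited cache and a minimum call interval."""

    def __init__(
        self,
        scrape: Callable[[str], str],
        ttl_seconds: float = 3600.0,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._scrape = scrape
        self._ttl = ttl_seconds
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Tuple[float, str]] = {}
        self._last_call: Optional[float] = None

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh cache hit: no request made
        if self._last_call is not None:
            remaining = self._min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)  # rate limit: wait out the interval
        result = self._scrape(url)
        self._last_call = self._clock()
        self._cache[url] = (self._last_call, result)
        return result
```

Inside n8n the equivalent pieces are a “Wait” node for spacing and a stored lookup (file or database) for the cache.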
Workflow Organization
- Use sticky notes to document complex logic
- Group related nodes together
- Name nodes descriptively
- Version control your workflows (export JSON)
Example Workflow JSON
Basic Scraping Workflow
Monitoring & Debugging
Enable Execution Logging
View Execution History
- Click “Executions” in n8n sidebar
- Filter by workflow
- Click an execution to see detailed logs
- Review each node’s input/output
Common Issues
| Issue | Solution |
|---|---|
| Authentication failed | Check API key format and credentials |
| Timeout errors | Increase timeout in HTTP Request node settings |
| Memory issues | Process data in smaller batches |
| Webhook not receiving | Verify webhook URL and check firewall rules |