Overview
WhizoAI’s AI extraction feature uses large language models (LLMs) to extract structured data from web pages. You define a schema, and the model handles the extraction logic.
Supported Models
GPT-3.5 Turbo
3 credits/page
Fast and cost-effective for simple extraction
GPT-4
6 credits/page
Advanced reasoning for complex data structures
Claude 3 Sonnet
5 credits/page
Excellent for long-form content and nuanced extraction
Basic Usage
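A minimal sketch of what a basic extraction request might look like. The field names (`url`, `schema`, `model`), the model identifier string, and the helper `build_extract_request` are assumptions for illustration, not the documented WhizoAI API; the Extract API Reference has the real request shape.

```python
import json


def build_extract_request(url, schema, model="gpt-3.5-turbo"):
    """Assemble a JSON-serializable extraction request.

    All field names and the default model identifier are hypothetical.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an absolute http(s) URL, got {url!r}")
    if not schema:
        raise ValueError("schema must describe at least one field")
    return {"url": url, "schema": schema, "model": model}


request = build_extract_request(
    "https://example.com/product/123",
    {"title": "string", "price": "number"},
)
print(json.dumps(request, indent=2))
```

The payload is built separately from any HTTP call, so it can be inspected or logged before it is sent.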
Advanced Schema Definitions
Complex Data Structures
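As an illustration of nested objects and arrays, the sketch below describes a product with a nested price object and a list of reviews. It assumes a JSON Schema style layout (`type`, `properties`, `items`); whether WhizoAI accepts exactly this layout is an assumption. The helper `leaf_paths` is hypothetical and only lists the scalar fields the schema asks for.

```python
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {
            "type": "object",
            "properties": {
                "amount": {"type": "number"},
                "currency": {"type": "string"},
            },
        },
        "reviews": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "author": {"type": "string"},
                    "rating": {"type": "integer"},
                },
            },
        },
    },
}


def leaf_paths(schema, prefix=""):
    """Yield dotted paths to every scalar field; array items are marked with []."""
    kind = schema.get("type")
    if kind == "object":
        for key, sub in schema.get("properties", {}).items():
            yield from leaf_paths(sub, f"{prefix}.{key}" if prefix else key)
    elif kind == "array":
        yield from leaf_paths(schema.get("items", {}), prefix + "[]")
    else:
        yield prefix


paths = list(leaf_paths(product_schema))
print(paths)
```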
Extract nested objects and arrays.
Type Hints and Validation
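One way to make use of declared types is to check the extracted record against them on the client side. The sketch below assumes flat type hints written as strings (`"string"`, `"number"`, `"integer"`, `"boolean"`); these names and the helper `type_errors` are illustrative assumptions, not part of the WhizoAI API.

```python
# Map each assumed type name to a predicate. bool is excluded from the
# numeric checks because bool is a subclass of int in Python.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def type_errors(record, field_types):
    """Return a list of human-readable problems, empty if the record conforms."""
    errors = []
    for field, expected in field_types.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif not TYPE_CHECKS[expected](record[field]):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected}, got {actual}")
    return errors


field_types = {"title": "string", "price": "number", "in_stock": "boolean"}
extracted = {"title": "Desk Lamp", "price": "19.99"}
problems = type_errors(extracted, field_types)
print(problems)
```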
Specify data types for better accuracy.
Extraction Options
Model Selection
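The guidance in the model descriptions can be turned into a small selection rule. This is a sketch only: the function `choose_model` and the priority it gives to long-form content over nesting are assumptions, and the returned strings are the display names used on this page rather than API identifiers.

```python
def choose_model(nested: bool, long_form: bool) -> str:
    """Pick a model display name from two coarse properties of the task."""
    if long_form:
        # Described on this page as suited to long-form, nuanced content.
        return "Claude 3 Sonnet"
    if nested:
        # Described on this page as suited to complex data structures.
        return "GPT-4"
    # Fast and cost-effective default for simple, flat data.
    return "GPT-3.5 Turbo"


print(choose_model(nested=False, long_form=False))
print(choose_model(nested=True, long_form=False))
print(choose_model(nested=True, long_form=True))
```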
Choose the best model for your use case.
Custom Instructions
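Custom instructions are free-text guidance sent alongside the schema. The sketch below attaches such guidance to a request dictionary; the `instructions` field name and the helper `with_instructions` are assumptions for illustration.

```python
def with_instructions(request, instructions):
    """Return a copy of `request` with whitespace-normalized guidance attached."""
    cleaned = " ".join(instructions.split())
    if not cleaned:
        raise ValueError("instructions must not be empty")
    # Build a new dict so the caller's request is left unchanged.
    return {**request, "instructions": cleaned}


base = {"url": "https://example.com/jobs/42", "schema": {"salary": "string"}}
request = with_instructions(
    base,
    """Report salary as an annual figure in USD.
       If only an hourly rate is listed, leave salary empty.""",
)
print(request["instructions"])
```

Instructions that state units, formats, and what to do when a value is absent tend to be more useful than general requests for accuracy.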
Add context for better extraction.
Output Formatting
Handling Large Pages
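One possible way to handle a content-heavy page is to split its text into smaller pieces before extraction. The sketch below groups paragraphs under a character budget. It is a client-side approach chosen for illustration; whether WhizoAI chunks pages itself, and what limits apply, is not stated here, and `chunk_paragraphs` and `max_chars` are hypothetical names.

```python
def chunk_paragraphs(text, max_chars=4000):
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


page_text = "\n\n".join(["a" * 30, "b" * 30, "c" * 30])
chunks = chunk_paragraphs(page_text, max_chars=70)
print([len(c) for c in chunks])
```

Splitting on paragraph boundaries keeps related sentences together, which matters when a field's value and its label sit in the same paragraph.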
For content-heavy pages, optimize extraction.
Batch AI Extraction
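A minimal sketch of extracting from several pages concurrently. The function `fake_extract` is a stand-in for a real extraction call and returns placeholder data; `extract_many` is a hypothetical helper built on the standard library thread pool, not a WhizoAI batch endpoint.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_extract(url):
    """Stand-in for a real extraction call; returns placeholder data."""
    return {"url": url, "title": url.rsplit("/", 1)[-1]}


def extract_many(urls, extract_one, max_workers=4):
    """Run extract_one over urls concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the input iterable.
        return list(pool.map(extract_one, urls))


urls = [
    "https://example.com/items/a",
    "https://example.com/items/b",
    "https://example.com/items/c",
]
results = extract_many(urls, fake_extract)
print([r["title"] for r in results])
```

Threads suit this workload because each call spends most of its time waiting on the network. A server-side batch endpoint, where available, would avoid holding connections open on the client.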
Extract from multiple pages efficiently.
Error Handling
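A common way to handle transient extraction failures is to retry with exponential backoff. In the sketch below, `ExtractionError` is a hypothetical exception type and `flaky` simulates a call that fails twice before succeeding; neither comes from the WhizoAI SDK.

```python
import time


class ExtractionError(Exception):
    """Hypothetical error raised when an extraction attempt fails."""


def extract_with_retry(extract_one, url, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call extract_one(url), retrying on ExtractionError with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return extract_one(url)
        except ExtractionError:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"n": 0}


def flaky(url):
    """Simulated extraction that fails on the first two calls."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ExtractionError("temporary failure")
    return {"url": url, "ok": True}


delays = []  # record the requested delays instead of actually sleeping
result = extract_with_retry(flaky, "https://example.com", sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.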
Handle extraction failures gracefully.
Validation and Confidence Scores
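Confidence scores can be used to flag fields for manual review. The sketch below assumes a response with a `data` object and a parallel `confidence` object holding a score between 0 and 1 per field; that shape, the 0.8 threshold, and the helper `low_confidence_fields` are all assumptions for illustration.

```python
def low_confidence_fields(result, threshold=0.8):
    """Return the sorted names of fields whose score is below threshold."""
    scores = result.get("confidence", {})
    return sorted(field for field, score in scores.items() if score < threshold)


# Hypothetical response shape with made-up values.
result = {
    "data": {"title": "Desk Lamp", "price": 19.99, "sku": "DL-100"},
    "confidence": {"title": 0.97, "price": 0.62, "sku": 0.79},
}
needs_review = low_confidence_fields(result)
print(needs_review)
```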
WhizoAI provides confidence scores for extractions.
Common Use Cases
E-commerce Product Data
Extract product names, prices, descriptions, specs, and reviews from online stores
Job Listings
Parse job titles, salaries, requirements, and company info from career pages
Real Estate Listings
Extract property details, prices, locations, and features from listing sites
News Articles
Extract headlines, authors, publish dates, and article content from news sites
Business Directories
Extract company names, addresses, phone numbers, and services from directories
Best Practices
Credit Costs
| Model | Cost per Page | Best For |
|---|---|---|
| GPT-3.5 Turbo | 3 credits | Simple, flat data structures |
| GPT-4 | 6 credits | Complex nested data, high accuracy needed |
| Claude 3 Sonnet | 5 credits | Long-form content, nuanced extraction |
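The per-page costs in the table make it straightforward to estimate the credits a job will consume. The helper `estimate_credits` below is a hypothetical sketch that multiplies the page count by the listed rate; it assumes every page is billed at the flat rate shown.

```python
# Credits per page, copied from the table above.
CREDITS_PER_PAGE = {
    "GPT-3.5 Turbo": 3,
    "GPT-4": 6,
    "Claude 3 Sonnet": 5,
}


def estimate_credits(pages, model):
    """Return the credits needed to extract `pages` pages with `model`."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if model not in CREDITS_PER_PAGE:
        raise KeyError(f"unknown model: {model!r}")
    return pages * CREDITS_PER_PAGE[model]


for name in CREDITS_PER_PAGE:
    print(name, estimate_credits(250, name))
```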
Comparison: AI vs Traditional Scraping
| Feature | Traditional Scraping | AI Extraction |
|---|---|---|
| Setup Time | High (write custom selectors) | Low (define schema) |
| Adaptability | Breaks when HTML changes | Adapts to layout changes |
| Complex Data | Manual nested parsing needed | Handles nesting automatically |
| Cost | 1 credit/page | 3-6 credits/page |
| Speed | Faster | Slower (LLM processing) |
| Best For | Static, predictable layouts | Dynamic, complex structures |
Related Resources
LLM SDK Integrations
Integrate WhizoAI with LangChain, LlamaIndex, and more
Batch Processing
Process thousands of extractions efficiently
Extract API Reference
Full API documentation for extraction endpoints