VLM and Claude Web Agents
Explore building reliable web‑scraping agents using a vision-language model, Claude reasoning, Selenium automation, and prompt engineering, demonstrated with flight price extraction.
Web scraping is broken. Companies spend millions maintaining brittle scrapers while developers waste countless hours rebuilding the same solutions. The emergence of powerful vision-language models (VLMs) and LLMs creates an opportunity to revolutionize this space.
I’ll demonstrate a novel architecture that combines:
- Microsoft VLM for visual understanding and DOM parsing
- Claude for reasoning and task planning
- Selenium for browser automation
- Custom prompt engineering for reliable structured output
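One way to picture how these four pieces could fit together is an observe, plan, act loop. The sketch below is an illustration under stated assumptions, not the talk's actual implementation: `Action`, `run_agent`, and the four callables are hypothetical names, and each component is a plain function so that a real VLM client, Claude client, or Selenium driver could be plugged in behind it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """A single step chosen by the planner (hypothetical shape)."""
    kind: str         # "click", "type", or "done"
    target: str = ""  # description of the element to act on
    text: str = ""    # text to type, or the final answer when kind == "done"


# Hypothetical component interfaces, kept as plain callables:
Observe = Callable[[], bytes]                   # browser: capture a screenshot
Describe = Callable[[bytes], str]               # VLM: screenshot -> page description
Plan = Callable[[str, str, List[str]], Action]  # LLM: goal, description, history -> action
Act = Callable[[Action], None]                  # browser: perform the action


def run_agent(goal: str, observe: Observe, describe: Describe,
              plan: Plan, act: Act, max_steps: int = 10) -> Optional[str]:
    """Loop until the planner reports it is done or the step budget runs out."""
    history: List[str] = []
    for _ in range(max_steps):
        description = describe(observe())          # visual understanding
        action = plan(goal, description, history)  # reasoning and task planning
        if action.kind == "done":
            return action.text
        act(action)                                # browser automation
        history.append(f"{action.kind}:{action.target}")
    return None  # budget exhausted without an answer
```

The `max_steps` budget is a design choice in this sketch: it keeps a confused planner from clicking indefinitely and makes failure an explicit `None` rather than a hang.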
We’ll explore:
- Why traditional scrapers fail
- How VLMs understand web interfaces
- Prompt engineering for reliable agents
- Live demo: Flight price comparison
- Challenges in hallucination prevention
- Open source architecture decisions
Key technical innovations:
- Vision-guided DOM traversal
- RAG memory during browsing
- Structured data extraction
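As a rough illustration of memory during browsing, the sketch below stores short notes about pages the agent has seen and retrieves the most relevant ones for a new query. It is an assumption-laden stand-in: `BrowsingMemory` is a hypothetical name, and simple word overlap is used here in place of the embedding-based retrieval a real RAG setup would likely use.

```python
import re
from collections import Counter
from typing import List, Tuple


def _tokens(text: str) -> Counter:
    """Lowercased word counts; a crude substitute for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


class BrowsingMemory:
    """Hypothetical note store the agent can query between steps."""

    def __init__(self) -> None:
        self._notes: List[Tuple[str, Counter]] = []

    def add(self, note: str) -> None:
        self._notes.append((note, _tokens(note)))

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        """Return up to k notes sharing the most words with the query."""
        q = _tokens(query)
        scored = [(sum((q & toks).values()), note) for note, toks in self._notes]
        scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated notes
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in scored[:k]]
```

Retrieved notes would then be appended to the planner's prompt, so that the model does not have to rediscover page layout on every step.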
This project started from personal frustration with repetitive research tasks. The goal: make web automation accessible to everyone while being reliable enough for production use.
The live demo will showcase the agent finding flight prices and returning structured JSON, all without human intervention.
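A minimal sketch of what checking that structured JSON might look like, assuming a hypothetical three-field schema (`airline`, `price`, `currency`) that is not taken from the talk. Besides validating types, it applies a simple grounding check as one possible guard against hallucination: the reply is rejected unless the price and airline actually appear in the page text.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlightPrice:
    """Hypothetical output schema for one extracted fare."""
    airline: str
    price: float
    currency: str


def parse_flight_price(reply: str, page_text: str) -> Optional[FlightPrice]:
    """Return a validated FlightPrice, or None if the reply cannot be trusted."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    airline = data.get("airline")
    price = data.get("price")
    currency = data.get("currency")
    if not isinstance(airline, str) or not isinstance(currency, str):
        return None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
        return None
    # Grounding check: the price must match a number visible on the page.
    page_numbers = {
        float(n.replace(",", ""))
        for n in re.findall(r"\d[\d,]*(?:\.\d+)?", page_text)
    }
    if float(price) not in page_numbers:
        return None
    if airline.lower() not in page_text.lower():
        return None
    return FlightPrice(airline, float(price), currency)
```

Returning `None` instead of raising lets the calling agent retry the extraction step or re-prompt the model when a reply fails validation.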
OneQuery.app: API for structured, asynchronous web data, no manual scraping.
OneQuery: an AI web agent that extracts structured data using Playwright and LLMs.