Token-Budgeted Extraction: Why Context Size Matters for LLM Cost
A typical news article is 1,500 words, or roughly 2,000 tokens. The raw HTML of that page — including navigation, ads, scripts, and boilerplate — is 40,000–100,000 tokens. If you send the raw HTML to your LLM, you're paying roughly 20–50x more than necessary and getting worse results (LLMs lose focus in long, noisy contexts).
The problem with naive extraction
Most web scraping approaches either (a) send raw HTML to the LLM and let it figure things out, or (b) apply a generic boilerplate remover that strips everything and loses structure. Neither approach is query-aware. QATBE (query-aware, token-budgeted extraction) takes a different approach.
How QATBE works
- Segment the extracted content into meaningful units (paragraphs, headings, lists, code blocks)
- Score each segment by BM25 relevance to your specific query
- Pack the highest-scoring segments into your token budget using a greedy knapsack algorithm
- Preserve the document order of selected segments so the LLM gets coherent context
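The four steps above can be sketched in a few dozen lines of Python. This is a simplified illustration, not Fetchium's implementation — a real system uses a proper tokenizer and HTML-aware segmentation rather than whitespace splitting:

```python
import math
import re

def bm25_scores(segments, query, k1=1.5, b=0.75):
    """Score each segment against the query terms with BM25."""
    docs = [re.findall(r"\w+", s.lower()) for s in segments]
    q_terms = re.findall(r"\w+", query.lower())
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency of each query term across segments
    df = {t: sum(1 for d in docs if t in d) for t in q_terms}
    scores = []
    for d in docs:
        score = 0.0
        for t in q_terms:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores

def pack_segments(segments, query, token_budget,
                  tokens=lambda s: len(s.split())):
    """Greedy knapsack: take the best score-per-token segments that
    fit the budget, then restore original document order."""
    ranked = sorted(
        enumerate(bm25_scores(segments, query)),
        key=lambda p: p[1] / max(tokens(segments[p[0]]), 1),
        reverse=True,
    )
    chosen, used = [], 0
    for i, score in ranked:
        cost = tokens(segments[i])
        if score > 0 and used + cost <= token_budget:
            chosen.append(i)
            used += cost
    # sorting the chosen indices preserves document order
    return [segments[i] for i in sorted(chosen)]
```

Ranking by score-per-token (rather than raw score) is the standard greedy heuristic for the knapsack step, and the final sort is what keeps the packed context coherent for the LLM.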
Real-world impact
In our internal benchmarks on 500 web pages, QATBE reduced context size by an average of 78% while preserving 94% of query-relevant information. At GPT-4o pricing ($5/M input tokens), that's a reduction from ~$250 to ~$55 for 10,000 pages.
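As a quick sanity check, here is the arithmetic behind those dollar figures. The ~5,000-token average context per page is inferred from the totals, not stated above:

```python
def llm_input_cost(pages, tokens_per_page, usd_per_million):
    """Total input-token spend for sending one context per page."""
    return pages * tokens_per_page * usd_per_million / 1_000_000

# Assumed ~5,000 tokens/page before trimming, GPT-4o input at $5/M tokens
before = llm_input_cost(10_000, 5_000, 5.0)  # -> 250.0
# A 78% reduction leaves ~1,100 tokens/page
after = llm_input_cost(10_000, 1_100, 5.0)   # -> 55.0
```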
Using QATBE via the API
curl -X POST https://api.fetchium.com/v1/scrape \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "query": "async rust patterns",
    "token_budget": 4096
  }'

The token_budget parameter sets your target. QATBE guarantees the response fits within it while maximizing query-relevant content.
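The same call from Python, using only the standard library. YOUR_KEY is a placeholder, and the endpoint and fields are taken from the curl example:

```python
import json
import urllib.request

payload = {
    "url": "https://example.com/article",
    "query": "async rust patterns",
    "token_budget": 4096,
}
req = urllib.request.Request(
    "https://api.fetchium.com/v1/scrape",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# body = urllib.request.urlopen(req).read()  # uncomment with a real key
```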