Token-Budgeted Extraction: Why Context Size Matters for LLM Cost
A typical news article is 1,500 words, or roughly 2,000 tokens. The raw HTML of that page — including navigation, ads, scripts, and boilerplate — is 40,000–100,000 tokens. If you send the raw HTML to your LLM, you're paying roughly 20–50x more than necessary and getting worse results (LLMs lose focus in long, noisy contexts).
The problem with naive extraction
Most web scraping approaches either (a) send raw HTML to the LLM and let it figure things out, or (b) apply a generic boilerplate remover that strips everything and loses structure. Neither approach is query-aware. QATBE (query-aware, token-budgeted extraction) takes a different approach.
How QATBE works
- Segment the extracted content into meaningful units (paragraphs, headings, lists, code blocks)
- Score each segment by BM25 relevance to your specific query
- Pack the highest-scoring segments into your token budget using a greedy knapsack algorithm
- Preserve the document order of selected segments so the LLM gets coherent context
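The four steps above can be sketched in a few dozen lines of Python. This is a simplified illustration, not Fetchium's implementation — a real system uses a proper tokenizer and HTML-aware segmentation rather than whitespace splitting:

```python
import math
import re

def bm25_scores(segments, query, k1=1.5, b=0.75):
    """Score each segment against the query terms with BM25."""
    docs = [re.findall(r"\w+", s.lower()) for s in segments]
    q_terms = re.findall(r"\w+", query.lower())
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency of each query term across segments
    df = {t: sum(1 for d in docs if t in d) for t in q_terms}
    scores = []
    for d in docs:
        score = 0.0
        for t in q_terms:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf * (k1 + 1) / (tf + norm)
        scores.append(score)
    return scores

def pack_segments(segments, query, token_budget,
                  tokens=lambda s: len(s.split())):
    """Greedy knapsack: take the best score-per-token segments that
    fit the budget, then restore original document order."""
    ranked = sorted(
        enumerate(bm25_scores(segments, query)),
        key=lambda p: p[1] / max(tokens(segments[p[0]]), 1),
        reverse=True,
    )
    chosen, used = [], 0
    for i, score in ranked:
        cost = tokens(segments[i])
        if score > 0 and used + cost <= token_budget:
            chosen.append(i)
            used += cost
    # sorting the chosen indices preserves document order
    return [segments[i] for i in sorted(chosen)]
```

Ranking by score-per-token (rather than raw score) is the standard greedy heuristic for the knapsack step, and the final sort is what keeps the packed context coherent for the LLM.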
Real-world impact
In our internal benchmarks on 500 web pages, QATBE reduced context size by an average of 78% while preserving 94% of query-relevant information. At GPT-4o pricing ($5/M input tokens), that's a reduction from ~$250 to ~$55 for 10,000 pages.
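As a quick sanity check, here is the arithmetic behind those dollar figures. The ~5,000-token average context per page is inferred from the totals, not stated above:

```python
def llm_input_cost(pages, tokens_per_page, usd_per_million):
    """Total input-token spend for sending one context per page."""
    return pages * tokens_per_page * usd_per_million / 1_000_000

# Assumed ~5,000 tokens/page before trimming, GPT-4o input at $5/M tokens
before = llm_input_cost(10_000, 5_000, 5.0)  # -> 250.0
# A 78% reduction leaves ~1,100 tokens/page
after = llm_input_cost(10_000, 1_100, 5.0)   # -> 55.0
```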
Using QATBE via the API
curl -X POST https://api.fetchium.com/v1/scrape \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "query": "async rust patterns",
    "token_budget": 4096
  }'

The token_budget parameter sets your target. QATBE guarantees the response fits within it while maximizing query-relevant content.
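The same call from Python, using only the standard library. YOUR_KEY is a placeholder, and the endpoint and fields are taken from the curl example:

```python
import json
import urllib.request

payload = {
    "url": "https://example.com/article",
    "query": "async rust patterns",
    "token_budget": 4096,
}
req = urllib.request.Request(
    "https://api.fetchium.com/v1/scrape",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_KEY",
        "Content-Type": "application/json",
    },
    method="POST",
)
# body = urllib.request.urlopen(req).read()  # uncomment with a real key
```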