How to Track OpenAI, Gemini, and Grok Pricing Automatically

LLM pricing changes constantly, there's no official feed, and most teams find out the hard way. One day your cost estimates are accurate; the next, OpenAI has adjusted its output token rates and nobody told you. By the time you notice, three weeks of deployments have blown your budget.
The Problem
Pricing changes happen more often than you'd expect. OpenAI launched GPT-4o in May 2024 at $5 per million input tokens, already half the GPT-4 Turbo rate, then halved it again to $2.50 later that year. Google has repriced Gemini tiers multiple times as the model lineup expanded. xAI launched Grok with one pricing structure and revised it as the API matured. None of these changes came with a notification. If you were routing traffic based on cost comparisons, your routing logic was wrong before you knew it needed updating.
Here's the problem: there's no webhook. No official changelog. No structured API endpoint for pricing data. The pricing page on each provider's website is the source of truth, and it changes without announcement. Most teams find out from a Hacker News comment, a Slack message from someone who happened to check, or worse, from a surprised accounting email at the end of the month.
What to Monitor
Start with the pages that actually matter:
- OpenAI — openai.com/api/pricing. Track input/output price per million tokens for GPT-4o, o1, and o3. Watch the context window limits and which models are deprecated.
- Google — ai.google.dev/gemini-api/docs/pricing. Gemini pricing varies by tier and model. Track per-million-token rates and free tier limits (they change).
- xAI — x.ai/api. Grok model pricing. Smaller provider, but if you're using Grok in production, you need to know when rates shift.
- Anthropic — anthropic.com/pricing. Claude pricing by model and token tier. Input vs. output rates, batch processing discounts.
For each page, the data that matters: input price per million tokens, output price per million tokens, context window size, and any free tier limits or rate restrictions.
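Those fields normalize cleanly into one record per model per observation. A minimal sketch — the field names are my own, not any provider's schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PricingRecord:
    """One observation of a model's published pricing (illustrative fields)."""
    provider: str           # e.g. "openai"
    model: str              # e.g. "gpt-4o"
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens
    context_window: Optional[int] = None   # tokens, if published
    free_tier_note: Optional[str] = None   # free-tier limits, if any

def price_changed(old: PricingRecord, new: PricingRecord) -> bool:
    """True if either tracked rate moved between two observations."""
    return (old.input_per_mtok, old.output_per_mtok) != (
        new.input_per_mtok, new.output_per_mtok)
```

Storing observations in a shape like this, rather than raw HTML, makes later diffs and cost reports trivial.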
Building It with Python
Let's build this step by step. Each step shows you the problem, then the solution.
Step 1: Fetch the Page
Start simple. Get the HTML.
```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://openai.com/api/pricing",
                        headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(response.text, "html.parser")
tables = soup.find_all("table")
print(tables)  # []
```

You get an empty list. The pricing tables aren't in the static HTML. OpenAI renders them with JavaScript after the page loads. You need a browser.
Step 2: Add JavaScript Rendering
Use Playwright to actually render the page.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://openai.com/api/pricing")
    page.wait_for_load_state("networkidle")
    content = page.content()
    browser.close()

print(content[:500])
```

This works intermittently. Cloudflare blocks most headless browsers on openai.com, so you'll get through occasionally, but not reliably.
Step 3: Add Stealth
Make the browser look more like a real user.
```python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    stealth_sync(page)  # patch fingerprints that give headless browsers away
    page.goto("https://openai.com/api/pricing")
    page.wait_for_load_state("networkidle")
    content = page.content()
    browser.close()
```

This is more reliable but still not consistent. Cloudflare's detection rules evolve. What works this week may not work next week. You're in an arms race with their bot detection.
Step 4: Store a Snapshot and Detect Changes
Assume you can fetch the page. Now track changes over time.
```python
import hashlib
import json
from datetime import datetime

def snapshot_hash(content):
    return hashlib.sha256(content.encode()).hexdigest()

def save_snapshot(url, content):
    record = {
        "url": url,
        "timestamp": datetime.now().isoformat(),
        "hash": snapshot_hash(content),
        "content": content
    }
    filename = f"snapshot_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    with open(filename, "w") as f:
        json.dump(record, f)
    return record

def check_for_changes(url, current_content, previous_hash):
    current_hash = snapshot_hash(current_content)
    if current_hash != previous_hash:
        print(f"[CHANGE DETECTED] {url}")
        return True
    return False
```

This hashes the content and compares it to the previous hash. If they differ, you've got a change. Store both the hash and the full content so you can diff it later.
Step 5: Send an Alert
When you detect a change, tell someone.
```python
import smtplib
from email.mime.text import MIMEText

def send_alert(url, subject="Pricing page changed"):
    msg = MIMEText(f"Change detected on: {url}")
    msg["Subject"] = subject
    msg["From"] = "monitor@yourdomain.com"
    msg["To"] = "team@yourdomain.com"
    with smtplib.SMTP("smtp.yourdomain.com", 587) as server:
        server.starttls()  # port 587 expects STARTTLS before sending
        # server.login(user, password)  # if your relay requires auth
        server.sendmail(msg["From"], [msg["To"]], msg.as_string())
```

Or use a Slack webhook instead:
```python
import requests

def send_slack_alert(url, webhook_url):
    payload = {
        "text": f"Pricing change detected on: {url}"
    }
    requests.post(webhook_url, json=payload)
```

This is the core loop. Fetch, hash, compare, alert.
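The pieces above can be tied together in one pass per page. A minimal sketch, with the fetch and alert functions injected so any of the earlier implementations can slot in:

```python
import hashlib

def monitor_once(url, fetch, alert, previous_hashes):
    """One fetch/hash/compare/alert cycle for a single URL.

    fetch(url) returns the page content; alert(url) sends the notification.
    previous_hashes maps url -> last seen hash and is updated in place.
    Returns True if a change was detected.
    """
    content = fetch(url)
    current = hashlib.sha256(content.encode()).hexdigest()
    prior = previous_hashes.get(url)
    changed = prior is not None and prior != current
    if changed:
        alert(url)
    previous_hashes[url] = current
    return changed
```

On the first run there is no prior hash, so nothing fires; every later run alerts only when the hash moves.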
Where It Gets Hard to Maintain
Building it is one thing. Running it for six months is another.
Cloudflare detection on OpenAI updates periodically. A script that worked last month stops working today because they changed their fingerprinting rules. Your monitor keeps reporting that nothing has changed, but in reality the page fetch is silently failing; from the outside, the two cases look identical.
Each provider page has a different structure, different bot detection, and different rendering behavior. OpenAI uses React. Google's page is mostly static but the pricing tier logic is JavaScript. X's pricing is embedded in a different URL structure. You can't write one extractor and apply it to all four. You need custom parsing for each.
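In practice that ends up as a per-provider parser registry: one extraction function per page, dispatched by domain. A minimal sketch — the parser bodies are stubs here, and the names are my own:

```python
# Each provider gets its own parser; dispatch keeps the core loop generic.
def parse_openai(html):
    """Extract pricing rows from OpenAI's rendered page (details omitted)."""
    raise NotImplementedError

def parse_google(html):
    """Extract pricing rows from Google's Gemini docs (details omitted)."""
    raise NotImplementedError

PARSERS = {
    "openai.com": parse_openai,
    "ai.google.dev": parse_google,
}

def parser_for(url):
    """Pick the parser whose domain appears in the URL."""
    for domain, parser in PARSERS.items():
        if domain in url:
            return parser
    raise ValueError(f"no parser registered for {url}")
```

Every provider redesign then touches exactly one function instead of the whole pipeline.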
Scheduling the script reliably is its own problem. Cron jobs fail silently. If the script crashes, the cron job just exits and you don't know about it. You need monitoring on top of your monitoring. You need to know that the fetch itself succeeded before you trust the "no changes" result.
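One common fix is a dead-man's switch: the script pings a heartbeat endpoint only after a fully successful run, and a separate service alerts you when the pings stop arriving. A sketch, with the ping injected as a function (the heartbeat URL in the comment is a placeholder):

```python
def run_with_heartbeat(job, ping):
    """Run the monitoring job; call ping() only after a fully successful run.

    If the job raises, no ping is sent, and the heartbeat service
    (which expects one ping per scheduled run) raises the alarm.
    """
    try:
        job()
    except Exception as exc:
        print(f"monitor run failed: {exc}")  # log locally; the silence does the alerting
        return False
    ping()
    return True

# In production the ping might be an HTTP request, e.g.:
#   ping = lambda: requests.post("https://hc-ping.com/<your-check-id>", timeout=10)
```

This inverts the failure mode: a crashed script produces an alert by default instead of silence.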
Handling failures well is critical. What happens when a page is temporarily down? If you're not careful, you send a false alarm that "pricing changed" when really the page was just unreachable for 30 seconds. Now your team doesn't trust the alerts anymore.
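A cheap guard is to treat fetch failures and content changes as distinct states, and only alert once the same new content has been seen on two consecutive successful fetches. A sketch of the state handling (my own design, not from the scripts above):

```python
import hashlib

def classify(content, previous_hash, pending_hash):
    """Classify one fetch result; returns (event, new_pending_hash).

    content is None when the fetch failed. Events are "fetch_error",
    "no_change", "candidate" (new content seen once), or "confirmed"
    (same new content seen twice in a row: safe to alert now).
    """
    if content is None:
        return "fetch_error", pending_hash  # leave change state untouched on errors
    current = hashlib.sha256(content.encode()).hexdigest()
    if current == previous_hash:
        return "no_change", None
    if current == pending_hash:
        return "confirmed", None
    return "candidate", current
```

A page that is briefly unreachable now produces a fetch-error log line instead of a false "pricing changed" alarm.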
Proxy rotation adds cost and another failure point. Cloudflare sees the same IP fetching every hour. Add proxies to reduce detection risk, but now you've got rotating proxy management, proxy failure handling, and an additional expense.
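The rotation itself is the easy part, round-robin over a pool per request; the operational burden is keeping that pool healthy and paid for. A minimal sketch (the proxy addresses are placeholders):

```python
from itertools import cycle

PROXIES = [  # placeholders -- substitute your proxy provider's endpoints
    "http://proxy-1.example.com:8080",
    "http://proxy-2.example.com:8080",
    "http://proxy-3.example.com:8080",
]
_pool = cycle(PROXIES)

def next_proxy_config():
    """Pick the next proxy in round-robin order, as a requests-style dict."""
    proxy = next(_pool)
    return {"http": proxy, "https": proxy}

# usage with requests:
#   requests.get(url, proxies=next_proxy_config(), timeout=30)
```

Real deployments also need to evict proxies that start failing and to handle a provider banning an entire subnet at once.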
Someone has to maintain all of this when it breaks. When Cloudflare changes their detection method next month, you'll be the one debugging why the script stopped working. When Google restructures their pricing page, you'll be the one rewriting the parser.
Making It Production-Ready
There are two ways forward. Either you own all of the above, or you use a service that does.
Single Page Extraction
Use Nimble to fetch and render a single pricing page.
```python
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")
result = nimble.extract(
    url="https://openai.com/api/pricing",
    render=True,
    driver="vx10",
    formats=["markdown"]
)
print(result.data.markdown)
```

The HTML is rendered, Cloudflare is handled, and you get back clean markdown of the pricing table. No bot detection arms race, no stealth browser configuration, no failed requests.
Scaling to All Providers with Async
Monitor all four pricing pages in parallel without blocking.
```python
from nimble_python import Nimble
import time
import hashlib
from datetime import datetime

nimble = Nimble(api_key="YOUR_API_KEY")

urls = [
    "https://openai.com/api/pricing",
    "https://ai.google.dev/gemini-api/docs/pricing",
    "https://x.ai/api",
    "https://www.anthropic.com/pricing"
]

# Submit all extractions at once
tasks = []
for url in urls:
    response = nimble.extract_async(
        url=url,
        render=True,
        driver="vx10",
        formats=["markdown"]
    )
    tasks.append({"url": url, "task_id": response.task_id})
    print(f"Submitted: {url} → {response.task_id}")

# Poll for results
previous_hashes = {}  # load from storage in practice
for task in tasks:
    while True:
        status = nimble.tasks.get(task["task_id"])
        if status.task.state == "success":
            result = nimble.tasks.results(task["task_id"])
            content = result.data.markdown
            current_hash = hashlib.sha256(content.encode()).hexdigest()
            if task["url"] in previous_hashes and previous_hashes[task["url"]] != current_hash:
                print(f"[{datetime.now()}] CHANGE DETECTED: {task['url']}")
                # trigger alert
            previous_hashes[task["url"]] = current_hash
            break
        elif status.task.state == "failed":
            print(f"Failed: {task['url']}")
            break
        time.sleep(2)
```

Submit all four URLs to Nimble at once, then poll for results. When they come back, compare hashes and alert on changes. For scheduled runs, you can use Nimble's Managed Service instead. Set it up once, and Nimble executes the extraction on your schedule and delivers results to your webhook. No cron job to maintain, no server to keep alive.
Getting Started with Nimble
Install the Python client:
```
pip install nimble_python
```

Authenticate and make your first extraction:
```python
from nimble_python import Nimble

nimble = Nimble(api_key="YOUR_API_KEY")
result = nimble.extract(
    url="https://openai.com/api/pricing",
    render=True,
    driver="vx10"
)
print(result.data.markdown)
```

Extraction starts at $0.90 per 1,000 URLs. Free trial: 5,000 pages.
Sign up at https://app.nimbleway.com/signup
Continue Exploring
Pricing is one piece of the picture. These posts cover other LLM data worth tracking and how to collect it.
- The Complete Guide to LLM Scraping — The full picture: scraping provider documentation pages and querying LLM interfaces, with DIY approaches for each.
- How to Extract ChatGPT Responses as Structured Data with Python — Collect answers, source citations, and links from ChatGPT's web interface programmatically.
- Scraping Google AI Mode for LLM Overviews and Sources — Pull Google's AI-generated overviews and their cited sources using Python.
- Scraping Grok: Real-Time Answers, Images, and Web Search Results — Extract structured responses from Grok, including answer HTML and image data.
- Scraping Gemini's Web Search Answers with Python — Collect Gemini's grounded answers with full source metadata and position data.
- How to Monitor AI Model Deprecations in Real-Time — Set up alerts for model deprecation notices so you're not caught off guard when a model gets turned off.