Collect and compare LLM responses

Ask any question. See what ChatGPT, Perplexity, and Gemini actually say.

Use case: Send any question to ChatGPT, Perplexity, and Gemini simultaneously and use Claude to determine whether the models agree.

Quick Start

Inputs

  1. Question: Any question to send to all three AI platforms simultaneously, typed into the Live tab in the dashboard.

Outputs

  • 100 pre-analyzed questions with per-model verdicts and consensus labels
  • Consensus breakdown across the full dataset: 44 Strong / 48 Moderate / 8 Split
  • Category breakdown: Tech & AI, Finance & Business, Health & Science, E-commerce & Consumer, Society & Work
  • Live tab: real-time consensus from any question in approximately 65 seconds
  • Per-question detail: all three raw responses plus Claude’s extracted verdict and reason per model

Sample dataset. 100 questions across the five categories listed above, pre-analyzed with Claude Haiku. Results: 44 strong consensus, 48 moderate, 8 split. No API key is needed to browse the pre-loaded dataset.
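The consensus breakdown is a simple tally over the per-question labels. A minimal sketch, assuming each record is a dict with a `consensus` field; that field name and the record shape are illustrative assumptions, not the dataset's confirmed schema:

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

LABELS = ("Strong", "Moderate", "Split")


def consensus_breakdown(records: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count how many records carry each consensus label."""
    counts = Counter(record["consensus"] for record in records)
    # Counter returns 0 for labels that never appear.
    return {label: counts[label] for label in LABELS}


# Synthetic records that only mirror the counts reported above;
# the real dataset's field names may differ.
records = (
    [{"consensus": "Strong"}] * 44
    + [{"consensus": "Moderate"}] * 48
    + [{"consensus": "Split"}] * 8
)

breakdown = consensus_breakdown(records)
print(breakdown)  # {'Strong': 44, 'Moderate': 48, 'Split': 8}
```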

View example on GitHub

How it works

A 5-phase pipeline. The accompanying blog post gives a deeper explanation.

  1. Send: The same question goes to ChatGPT, Perplexity, and Gemini through three parallel Nimble agent calls. All three fire at once via ThreadPoolExecutor, so round-trip time is bounded by the slowest platform.
  2. Collect: All three raw responses are stored as-is with no preprocessing. Claude Haiku receives the output exactly as each platform produced it, including Gemini's freeform markdown.
  3. Judge: Claude Haiku reads all three responses in a single prompt and extracts each model's core position regardless of format, tone, or verbosity.
  4. Label: Consensus is labeled Strong (all three agree), Moderate (two agree, one differs), or Split (clear disagreement). Claude assigns the label semantically, not by keyword matching.
  5. Render: Results are displayed with a per-model verdict, reason, and consensus label. Browse the 100 pre-loaded questions or enter any question in the Live tab (~65 seconds end-to-end).
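The Send and Collect steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real Nimble agent call, whose interface is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

PLATFORMS = ("chatgpt", "perplexity", "gemini")


def fetch_all(question: str, fetch: Callable[[str, str], str]) -> Dict[str, str]:
    """Send one question to every platform in parallel and keep the raw text."""
    with ThreadPoolExecutor(max_workers=len(PLATFORMS)) as pool:
        futures = {name: pool.submit(fetch, name, question) for name in PLATFORMS}
        # result() blocks until each call finishes, so total wall time is
        # roughly that of the slowest platform, not the sum of all three.
        return {name: future.result() for name, future in futures.items()}


def fake_fetch(platform: str, question: str) -> str:
    # Hypothetical stand-in for a Nimble agent call (assumption).
    return f"[{platform}] raw answer to: {question}"


responses = fetch_all("Is remote work here to stay?", fake_fetch)
for platform, raw_text in responses.items():
    print(platform, "->", raw_text)
```

Passing the fetch function in as an argument keeps the fan-out logic testable without network access; a real client would replace `fake_fetch`.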

Stack

Nimble primitives plus the full runtime stack.

Nimble APIs and what each does:

  1. chatgpt: Fetches a live response from ChatGPT’s web interface for any prompt. Returns structured text.
  2. perplexity: Fetches a live response from Perplexity’s search interface for any prompt. Returns structured text.
  3. gemini: Fetches a live response from Gemini’s interface for any prompt. Returns freeform markdown, with no structured format enforced.

3rd-party tools and their roles:

  1. claude-haiku-4-5: Anthropic Claude API. Reads all three raw responses per question in a single pass and judges consensus semantically.
  2. streamlit: Dashboard, with a Browse tab for the pre-loaded dataset and a Live tab for real-time queries.
  3. python 3.9+: Parallel fetch with ThreadPoolExecutor. 300 total agent calls across 100 questions.
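
The single-pass judging step could look something like the sketch below. The prompt wording, the JSON reply shape, and the helper names `build_judge_prompt` and `parse_judgement` are all assumptions made for illustration; the actual prompt sent to Claude Haiku is not documented here, and the API call itself is left out.

```python
import json
from typing import Any, Dict, Mapping

LABELS = {"Strong", "Moderate", "Split"}


def build_judge_prompt(question: str, responses: Mapping[str, str]) -> str:
    """Put the question and every raw response into one prompt, unmodified."""
    sections = [f"Question: {question}"]
    for platform, raw_text in responses.items():
        sections.append(f"### Response from {platform}\n{raw_text}")
    # Hypothetical instruction text and reply schema (assumption).
    sections.append(
        "For each response, state the model's core position and a one-line reason. "
        "Then label the consensus as Strong, Moderate, or Split. "
        'Reply with JSON only, shaped as {"verdicts": {...}, "consensus": "..."}.'
    )
    return "\n\n".join(sections)


def parse_judgement(reply: str) -> Dict[str, Any]:
    """Parse the judge's JSON reply and reject unknown consensus labels."""
    data = json.loads(reply)
    if data.get("consensus") not in LABELS:
        raise ValueError(f"unexpected consensus label: {data.get('consensus')!r}")
    return data


prompt = build_judge_prompt(
    "Is remote work here to stay?",
    {"chatgpt": "Yes, largely.", "perplexity": "Yes.", "gemini": "**Mostly yes**"},
)
judgement = parse_judgement('{"verdicts": {"chatgpt": "yes"}, "consensus": "Strong"}')
```

Validating the label on the way back in guards the dashboard against a reply that drifts from the three expected values.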