AI Consensus Dashboard

Ask any question. See what ChatGPT, Perplexity, and Gemini actually say.

A pipeline that sends the same question to three major AI platforms simultaneously — ChatGPT, Perplexity, and Gemini — via Nimble’s AI agents, then uses Claude Haiku to read all three raw responses and judge whether they agree. Ships with 100 pre-loaded questions and a live tab for real-time queries.

Inputs

Outputs

What you get after a full run.
  • 100 pre-analyzed questions with per-model verdicts and consensus labels
  • Consensus breakdown across the full dataset: 44 Strong / 48 Moderate / 8 Split
  • Category breakdown: Tech & AI, Finance & Business, Health & Science, E-commerce & Consumer, Society & Work
  • Live tab: real-time consensus from any question in approximately 65 seconds
  • Per-question detail: all three raw responses plus Claude’s extracted verdict and reason per model
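
The consensus breakdown above is a simple tally over the per-question labels. As a minimal sketch, assuming each record carries a `consensus` field (the field names and the four sample records here are hypothetical, not the bundled dataset's actual schema), the counts could be computed like this:

```python
from collections import Counter

# Hypothetical records; the field names ("question", "consensus") are
# assumptions for illustration, not the bundled dataset's real schema.
records = [
    {"question": "Q1", "consensus": "Strong"},
    {"question": "Q2", "consensus": "Moderate"},
    {"question": "Q3", "consensus": "Strong"},
    {"question": "Q4", "consensus": "Split"},
]

def consensus_breakdown(rows):
    """Count how many questions fall under each consensus label."""
    counts = Counter(row["consensus"] for row in rows)
    # Report in a fixed order so a label with no questions shows up as zero.
    return {label: counts.get(label, 0) for label in ("Strong", "Moderate", "Split")}

breakdown = consensus_breakdown(records)
print(breakdown)
```

Run against the full dataset, the same tally should add up to the total number of questions (44 + 48 + 8 = 100).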

Sample dataset: The 100 pre-analyzed questions are bundled, with all three raw responses, per-model verdicts, and consensus labels for each. No API key needed to explore the dashboard.

View dataset on GitHub

How it works

A five-phase pipeline. The accompanying blog post explains each phase in more depth.

  1. Send: The same question goes to ChatGPT, Perplexity, and Gemini via three parallel Nimble agent calls. Each call is independent: no shared session, no cross-platform contamination. All three requests fire at the same time via ThreadPoolExecutor, so the round-trip is bounded by the slowest platform, not the sum of all three.
  2. Collect: All three raw responses are returned as structured data. Gemini returns a freeform markdown essay; ChatGPT and Perplexity return structured text following the prompt format. Responses are stored as-is with no preprocessing or normalization, so Claude Haiku receives the raw output exactly as each platform produced it. This preserves edge cases like refusals, hedged answers, and format deviations that would be lost in a cleaning pass.
  3. Judge: Claude Haiku reads all three raw responses in a single prompt, with no regex and no parsing, and extracts each model’s core position regardless of format. The prompt instructs Claude to identify the substantive stance each model takes, ignoring differences in tone, formatting, and verbosity. This semantic extraction step is what makes the pipeline robust to Gemini’s freeform output and to platforms that change response structure over time.
  4. Label: Consensus is labeled as one of three outcomes: Strong (all three broadly agree), Moderate (two agree, one differs), or Split (clear disagreement across models). Labels are assigned by Claude based on the extracted positions, not keyword matching or cosine similarity. The three-tier scheme was calibrated against the 100-question dataset: Strong requires all models to share the same core conclusion, while Split requires a substantive disagreement, not just a difference in emphasis.
  5. Render: Results are displayed with per-model verdict, reason, and consensus label. The Browse tab loads the 100 pre-loaded questions from the pre-analyzed JSON dataset, so no API calls are required. The Live tab accepts any question, fires all three agent requests and the Claude judgment call on demand, and shows a progress indicator while the ~65-second pipeline runs.
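
The Send and Collect phases can be sketched with the standard library alone. The snippet below is illustrative only: `fetch_answer` is a hypothetical stand-in for a Nimble agent call (the real client is not shown in this page), and the latencies are simulated with `time.sleep`. It demonstrates why the round-trip tracks the slowest platform rather than the sum of all three.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PLATFORMS = ("chatgpt", "perplexity", "gemini")

# Simulated latencies in seconds, standing in for real network round-trips.
FAKE_LATENCY = {"chatgpt": 0.1, "perplexity": 0.2, "gemini": 0.3}

def fetch_answer(platform, question):
    """Hypothetical stand-in for one agent call; swap in the real client."""
    time.sleep(FAKE_LATENCY[platform])
    return f"[{platform}] raw answer to: {question}"

def ask_all(question):
    """Send the same question to every platform in parallel."""
    with ThreadPoolExecutor(max_workers=len(PLATFORMS)) as pool:
        # Submit all three calls before waiting on any of them.
        futures = {p: pool.submit(fetch_answer, p, question) for p in PLATFORMS}
        # Raw responses are kept as-is, with no cleaning pass.
        return {p: f.result() for p, f in futures.items()}

start = time.perf_counter()
responses = ask_all("Is remote work here to stay?")
elapsed = time.perf_counter() - start
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

With these simulated delays the total should land near the slowest call (about 0.3 s) instead of the 0.6 s a sequential loop would take.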

Stack

Nimble primitives plus the full runtime stack.
APIS & AGENTS: What it does
  1. chatgpt: Fetches a live response from ChatGPT’s web interface for any prompt. Returns structured text.
  2. perplexity: Fetches a live response from Perplexity’s search interface for any prompt. Returns structured text.
  3. gemini: Fetches a live response from Gemini’s interface for any prompt. Returns freeform markdown, with no structured format enforced.
RUNTIME STACK: Role
  1. claude-haiku-4-5: Anthropic Claude API. Reads all three raw responses per question in a single pass and judges consensus semantically.
  2. streamlit: Dashboard with a Browse tab for the pre-loaded dataset and a Live tab for real-time queries.
  3. python 3.9+: Parallel fetch with ThreadPoolExecutor; 300 total agent calls across 100 questions (three platforms per question).
  4. MIT license: Fork, modify, ship. The only condition is keeping the copyright and license notice.
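
To make the single-pass judging role concrete, here is a rough sketch of how one prompt could carry all three raw responses and how the reply could be validated. Everything specific in it is an assumption: the prompt wording, the JSON reply shape, and the helper names `build_judge_prompt` and `parse_judgment` are hypothetical, and the actual call to the Claude API is deliberately left out.

```python
import json

ALLOWED_LABELS = {"Strong", "Moderate", "Split"}

def build_judge_prompt(question, responses):
    """Put every raw response into one prompt, unmodified."""
    parts = [
        f"Question: {question}",
        "For each response below, state the model's core position, "
        "ignoring tone, formatting, and verbosity.",
        # Assumed reply format; the real pipeline's format is not documented here.
        'Reply with JSON: {"verdicts": {"<model>": {"verdict": "...", '
        '"reason": "..."}}, "consensus": "Strong" or "Moderate" or "Split"}',
    ]
    for model, text in responses.items():
        parts.append(f"--- {model} ---\n{text}")
    return "\n\n".join(parts)

def parse_judgment(reply_text, models):
    """Parse the judge's JSON reply and check it covers every model."""
    data = json.loads(reply_text)
    if data.get("consensus") not in ALLOWED_LABELS:
        raise ValueError(f"unexpected consensus label: {data.get('consensus')!r}")
    missing = set(models) - set(data.get("verdicts", {}))
    if missing:
        raise ValueError(f"no verdict for: {sorted(missing)}")
    return data

# Hand-written inputs for illustration; not real platform output.
raw = {
    "chatgpt": "Yes, broadly.",
    "perplexity": "Yes, with caveats.",
    "gemini": "## It depends\nProbably not.",
}
prompt = build_judge_prompt("Is remote work here to stay?", raw)

# A hand-written reply in the assumed format, standing in for the model's output.
fake_reply = json.dumps({
    "verdicts": {m: {"verdict": "see text", "reason": "illustrative"} for m in raw},
    "consensus": "Moderate",
})
judgment = parse_judgment(fake_reply, raw)
print(judgment["consensus"])
```

Validating the label against a fixed set keeps a malformed reply from silently reaching the dashboard, which matters when the labeling is delegated to a model rather than computed by rule.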