June 16, 2026

Nimble Now Fetches Media Files at Scale

Charlie Klein

Director of Product Marketing

Today we’re launching the Nimble Media API, a purpose-built endpoint for programmatically downloading images, videos, audio, and documents from any URL at scale.

Web data is not just structured HTML. For teams building AI training pipelines, product catalogs, or media archives, the most valuable assets on the web are binary files: product images, video clips, GIFs, and more. Until now, collecting those files reliably meant managing proxies, handling geo-restrictions, and writing custom validation logic. The Media API handles all of that.

What It Does

The Media API exposes a single POST endpoint that fetches a media file through Nimble’s infrastructure and returns the raw binary. It has two modes:

Realtime streams the file directly back in the HTTP response. The Content-Type header tells you the actual format. Use this for on-demand downloads where you need the file immediately.

Async accepts the same request parameters but routes delivery to your cloud storage bucket instead. Nimble uploads the file to S3, GCS, or DigitalOcean Spaces in the background and returns a task_id you can poll for status. Use this for large files or high-volume pipelines where you want decoupled delivery.
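Because realtime mode reports the actual format in the Content-Type header, a client can choose the saved file's extension from that header instead of hard-coding one. The following is a minimal sketch, assuming the header carries a standard MIME type. The helper name `extension_for`, the preferred-extension table, and the `.bin` fallback are illustrative choices, not part of the Media API.

```python
import mimetypes

# Explicit choices for common media types; anything else falls back to the
# standard library's lookup table.
PREFERRED_EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "video/mp4": ".mp4",
    "application/pdf": ".pdf",
}


def extension_for(content_type: str, default: str = ".bin") -> str:
    """Map a Content-Type header value to a file extension (hypothetical helper)."""
    # Drop parameters such as "; charset=binary" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in PREFERRED_EXTENSIONS:
        return PREFERRED_EXTENSIONS[mime]
    # guess_extension returns None for unknown types, so fall back to default.
    return mimetypes.guess_extension(mime) or default
```

With a `requests` response, `extension_for(media_response.headers.get("Content-Type", ""))` would give the suffix to append to the filename.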

Built-In Controls

expected_mime_types lets you allowlist acceptable formats before the file is accepted. Pass ["image/*"] to accept any image, or be more specific with ["image/webp", "image/jpeg"]. If the fetched file does not match, the request fails cleanly, so unexpected formats never reach your pipeline.
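The same wildcard idea can be mirrored on the client, for example as a second guard before a file is written. This sketch assumes shell-style matching in which `image/*` covers every `image/` subtype. How Nimble evaluates the patterns server-side is not specified here, and `mime_allowed` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def mime_allowed(content_type: str, expected_mime_types: list[str]) -> bool:
    """Return True if the Content-Type matches any allowed pattern."""
    # Ignore header parameters such as "; charset=utf-8" and compare lowercase.
    mime = content_type.split(";", 1)[0].strip().lower()
    return any(fnmatchcase(mime, pattern.lower()) for pattern in expected_mime_types)


print(mime_allowed("image/png", ["image/*"]))                  # → True
print(mime_allowed("text/html; charset=utf-8", ["image/*"]))   # → False
```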

country routes the request through a specific geography using ISO 3166-1 alpha-2 country codes, such as MY for Malaysia. Combine it with locale for full regional targeting. This is how you reach region-locked content without managing proxy infrastructure yourself.

storage.object_name (async only) lets you set a custom filename for each file saved to your bucket. The original extension is appended automatically. Use a unique prefix per request — such as product_{id} — to avoid overwrites.
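Put together, a request body using these controls might be assembled as in the sketch below. The field names `url`, `expected_mime_types`, `country`, `locale`, and `storage.object_name` come from the descriptions above, while the function name and the `product_{id}` naming scheme are illustrative. The fields that identify the destination bucket are not covered in this post, so they are left out rather than guessed.

```python
import json


def build_async_payload(url, product_id, country=None, locale=None):
    """Assemble a request body for one file (hypothetical helper)."""
    payload = {
        "url": url,
        "expected_mime_types": ["image/*"],
        # A unique name per request avoids overwriting earlier files;
        # per the post, the original extension is appended automatically.
        "storage": {"object_name": f"product_{product_id}"},
    }
    # Regional targeting is optional, so only include it when provided.
    if country:
        payload["country"] = country
    if locale:
        payload["locale"] = locale
    return payload


payload = build_async_payload(
    "https://example.com/product-image.jpg", 1042, country="MY"
)
print(json.dumps(payload, indent=2))
```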

Who It Is For

AI training data collection. Building image or video datasets for fine-tuning requires collecting large volumes of validated media from public URLs. The async mode saves files directly to S3 in the background, and expected_mime_types enforces format consistency across the dataset without post-processing.

Agentic pipelines that need media in the loop. Agents that reason over visual content — monitoring press coverage, detecting changes in competitor assets, or evaluating product imagery — need a reliable way to retrieve the actual files, not just URLs. The Media API gives agents a single call to fetch and validate any media file on demand, so the vision step has something real to work with.

Building data apps on top of media. Teams building applications that surface media at scale — competitive intelligence dashboards, IP monitoring tools, visual content trackers — need a collection layer that handles volume without breaking. The async endpoint delivers files directly to your storage bucket, so the app’s data layer stays clean and the collection pipeline stays decoupled from the front end.

Example: Saving Press Photos from a Corporate Newsroom

When a company announces a partnership, the press release usually includes signing ceremony photos or event imagery. Those images carry context the text alone doesn’t — who was in the room, what logos appeared together, how the announcement was staged. Saving them alongside the article text gives you a richer record.

The example below monitors the AirAsia Newsroom for partnership announcements and saves the accompanying press photos. It uses Nimble’s Search API with search_depth: "deep" to find and fully extract the article in a single call — deep mode automatically fetches the full page content, including all inline image URLs, so no separate extraction step is needed.

Step 1: Search with deep mode to find the article and extract its content

Python

import re

import requests

NIMBLE_API_KEY = "<YOUR-API-KEY>"

# Deep search: finds the article and returns its full page content in one call
response = requests.post(
    "https://sdk.nimbleway.com/v1/search",
    headers={
        "Authorization": f"Bearer {NIMBLE_API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "query": "AirAsia Buds Moonbug MiraiLab partnership announcement",
        "search_depth": "deep",
        "output_format": "markdown",
        "max_results": 1,
        "include_domains": ["newsroom.airasia.com"]
    }
)
response.raise_for_status()  # stop early on an HTTP error status

article = response.json()["results"][0]
print(f"Title: {article['title']}")
print(f"URL:   {article['url']}")

# Parse image URLs from the returned markdown content
image_urls = re.findall(r'!\[.*?\]\((https?://[^\)]+)\)', article["content"])
print(f"Found {len(image_urls)} images")

Output (real data fetched live from the AirAsia Newsroom)

Title: Capital A's AirAsia brand co. announces partnerships with Moonbug,
       MiraiLab to strengthen AirAsia Buds' IP presence
URL:   https://newsroom.airasia.com/news/...
Found 3 images

Deep mode extracts the full page content in the same call that finds the article. The image URLs embedded in the markdown — three signing ceremony photos from the Kre8tif! 2024 conference — are ready to download without any additional requests.
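Markdown image syntax also allows an optional title after the URL, as in `![alt](url "title")`, and the same image can appear more than once on a page. A slightly more defensive variant of the parsing step might look like the sketch below, shown on a made-up markdown string. The `example.com` URLs are placeholders rather than real newsroom assets, and `extract_image_urls` is a hypothetical helper.

```python
import re

# Capture the URL up to the first whitespace or closing parenthesis, so an
# optional markdown title ("...") after the URL is not included.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*(https?://[^\s)]+)')


def extract_image_urls(markdown: str) -> list[str]:
    """Return image URLs in document order, without duplicates."""
    # dict.fromkeys keeps first-seen order while dropping repeats.
    return list(dict.fromkeys(IMAGE_PATTERN.findall(markdown)))


sample = (
    'Intro text ![signing](https://example.com/a.jpg "Signing ceremony")\n'
    '![stage](https://example.com/b.png)\n'
    '![signing again](https://example.com/a.jpg)'
)
print(extract_image_urls(sample))
# → ['https://example.com/a.jpg', 'https://example.com/b.png']
```

One limitation of this pattern is that a URL containing a literal closing parenthesis would be cut short at that character.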

Step 2: Download each image using the Media API

Python

# Continues from Step 1: reuses requests, NIMBLE_API_KEY, and image_urls
for i, image_url in enumerate(image_urls):
    media_response = requests.post(
        "https://sdk.nimbleway.com/v1/media",
        headers={
            "Authorization": f"Bearer {NIMBLE_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "url": image_url,
            "expected_mime_types": ["image/*"],
            "country": "MY"
        }
    )
    # Raise on any HTTP error status so a failed request's error body
    # is never saved as an image
    media_response.raise_for_status()
    filename = f"press_photo_{i:02d}.jpg"
    with open(filename, "wb") as f:
        f.write(media_response.content)
    print(f"Saved {filename} ({len(media_response.content):,} bytes)")

Output

Saved press_photo_00.jpg (248,832 bytes)
Saved press_photo_01.jpg (231,456 bytes)
Saved press_photo_02.jpg (195,712 bytes)

The three images are now saved locally, each fetched through Malaysian infrastructure to match the article’s origin. The expected_mime_types filter ensures only actual image files are written to disk: if a URL resolves to an HTML error page or redirect, the request fails cleanly instead of saving junk.

For high-volume pipelines — archiving images from hundreds of articles per day, or saving files directly to S3 instead of locally — switch to the async endpoint and add a storage destination. The same two-step pattern applies; only the delivery mechanism changes.

Getting Started

The realtime endpoint is a single command:

CURL

curl -X POST 'https://sdk.nimbleway.com/v1/media' \
  --header 'Authorization: Bearer <YOUR-API-KEY>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "url": "https://example.com/product-image.jpg",
    "expected_mime_types": ["image/*"]
  }' \
  --output image.jpg