How it works — and the problems worth solving

This page is for the technically curious. The interesting part of this project
wasn't the happy path; it was the layers of real-world friction that had to be
worked through to make it reliable. A few highlights follow.

Architecture

The tool is a Model Context Protocol (MCP) server written in Node.js and
deployed in a Docker container on Railway. It runs in two modes from one shared
codebase: a local mode for desktop use, and a remote mode reachable from any
device, including mobile. The remote mode is secured by a full OAuth 2.0
implementation, since the client connector requires real OAuth rather than a
static token. The server integrates the official YouTube Data API for search,
playlists, and comments; an unofficial captions path plus an OpenAI Whisper
fallback for transcripts; and Claude Haiku for the self-correcting search
judgment.

Transcripts come from three fallback tiers, tried in order: captions first, via
the unofficial captions path (free and fast); then audio download plus Whisper
transcription (reliable, but it costs money per call); and finally a
metadata-only summary if both fail. The response always reports which tier was
used, so the result is never a silent black box.
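The tier order above can be sketched as a small fallback chain. This is a minimal illustration, not the project's actual code: the function name `getTranscript`, the tier labels, and the three injected fetchers (`fetchCaptions`, `transcribeAudio`, `fetchMetadata`) are all hypothetical.

```javascript
// Hypothetical sketch: names and return shape are assumptions for illustration.
async function getTranscript(videoId, { fetchCaptions, transcribeAudio, fetchMetadata }) {
  const tiers = [
    { tier: "captions", run: fetchCaptions },  // free and fast
    { tier: "whisper", run: transcribeAudio }, // reliable, billed per call
  ];
  const failures = [];

  for (const { tier, run } of tiers) {
    try {
      const text = await run(videoId);
      if (typeof text === "string" && text.trim() !== "") {
        // Report which tier produced the text, plus why earlier tiers failed.
        return { tier, text, failures };
      }
      failures.push({ tier, reason: "empty result" });
    } catch (err) {
      failures.push({ tier, reason: err.message });
    }
  }

  // Last resort: a summary built from metadata only.
  const meta = await fetchMetadata(videoId);
  return { tier: "metadata", text: `${meta.title}\n\n${meta.description}`, failures };
}
```

Returning the `tier` label and the list of `failures` alongside the text is what lets a caller see which path was taken and why the cheaper ones were skipped.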

Large platforms treat cloud servers with suspicion: requests from a datacenter
IP are often silently blocked or degraded, regardless of whether the code is
correct. Getting transcripts to work reliably from the cloud meant solving a
chain of real problems: authenticating requests with browser session cookies,
computing the specific signed authorization header that the platform's internal
API actually requires (cookies sent on their own are silently ignored),
handling cookie rotation, and even catching subtle bugs such as Windows-style
line endings corrupting the cookie data. Each layer looked like the final one,
until it wasn't.
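The line-ending bug is easy to reproduce and easy to guard against. The sketch below assumes the cookies are stored in the Netscape `cookies.txt` format (seven tab-separated fields per line), which the page does not confirm. The helpers `parseCookieFile` and `toCookieHeader` are hypothetical names. Splitting only on `\n` would leave a trailing `\r` glued to the last field of every line, which is the cookie value.

```javascript
// Assumption: Netscape cookies.txt input. Helper names are hypothetical.
function parseCookieFile(raw) {
  const cookies = [];
  // Accept both LF and CRLF so a stray "\r" never ends up inside a value.
  for (const line of raw.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    // "#HttpOnly_" marks an HttpOnly cookie; any other "#" line is a comment.
    const httpOnly = line.startsWith("#HttpOnly_");
    if (line.startsWith("#") && !httpOnly) continue;
    const fields = (httpOnly ? line.slice("#HttpOnly_".length) : line).split("\t");
    if (fields.length !== 7) continue; // skip malformed lines
    const [domain, , path, secure, expires, name, value] = fields;
    cookies.push({
      domain,
      path,
      secure: secure === "TRUE",
      expires: Number(expires),
      name,
      value,
      httpOnly,
    });
  }
  return cookies;
}

// Build a Cookie request header value from the parsed entries.
function toCookieHeader(cookies) {
  return cookies.map(({ name, value }) => `${name}=${value}`).join("; ");
}

// A file saved on Windows: note the "\r\n" line endings.
const sample =
  "# Netscape HTTP Cookie File\r\n" +
  ".example.com\tTRUE\t/\tTRUE\t1893456000\tsession\tabc123\r\n";
console.log(toCookieHeader(parseCookieFile(sample))); // session=abc123
```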

The smart-search feature sends the search results to a fast, inexpensive
language model (Claude Haiku) along with a plain-language description of the
goal, and asks it to judge relevance and, if needed, propose a better query.
The server repeats this loop up to a configured number of attempts, then
returns both the results and a full record of every query tried and the
reasoning behind each, turning an opaque "trust me" step into an auditable one.
Each judgment costs a fraction of a cent.
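The judge-and-retry loop described above might look roughly like this. It is a sketch under stated assumptions: `smartSearch`, the injected `search` and `judge` functions, the verdict shape `{ relevant, reasoning, betterQuery }`, and the default of three attempts are all hypothetical. In the real tool the `judge` step would be a call to the language model.

```javascript
// Hypothetical sketch: function names, verdict shape, and default limit are assumptions.
async function smartSearch(goal, initialQuery, { search, judge, maxAttempts = 3 }) {
  const attempts = []; // audit trail: every query tried and the reasoning for it
  let query = initialQuery;
  let results = [];

  for (let i = 0; i < maxAttempts; i++) {
    results = await search(query);
    // Ask the judge whether these results serve the stated goal.
    const verdict = await judge({ goal, query, results });
    attempts.push({ query, relevant: verdict.relevant, reasoning: verdict.reasoning });

    // Stop when the results are good enough or no better query is offered.
    if (verdict.relevant || !verdict.betterQuery) break;
    query = verdict.betterQuery;
  }

  // If every attempt was judged irrelevant, the last results are returned
  // anyway, and the attempts record makes that visible to the caller.
  return { results, attempts };
}
```

Injecting `search` and `judge` as parameters keeps the loop testable without network access, and capping it with `maxAttempts` bounds the per-request model cost.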

Reliability against an adversarial platform is an ongoing effort, not a solved
problem: cookies expire, detection changes, and the cheap captions path remains
less reliable than the paid transcription fallback. The project treats these as
managed trade-offs rather than pretending they're fixed. That honesty is part
of the point.
