This page is for the technically curious. The interesting part of this project
wasn't the happy path; it was the layers of real-world friction that had to be
worked through to make it reliable. A few highlights follow.
The tool is a Model Context Protocol (MCP) server written in Node.js, deployed
in a Docker container on Railway. It runs in two modes from one shared codebase:
a local mode for desktop use, and a remote mode reachable from any device —
including mobile — secured by a full OAuth 2.0 implementation, since the client
connector requires real OAuth rather than a static token. It integrates the
official YouTube Data API for search, playlists, and comments; an unofficial
captions path plus an OpenAI Whisper fallback for transcripts; and Claude Haiku
for the self-correcting search judgment.
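
As a rough illustration of "two modes from one shared codebase", the sketch below picks a mode from the environment at startup. The `MCP_MODE` variable, the `resolveMode` helper, the default port, and the choice of stdio for local use are all assumptions made for this example, not details taken from the project.

```javascript
// Hypothetical startup helper: decide which mode to run in from the environment.
// MCP_MODE and the returned shape are illustrative assumptions.
function resolveMode(env = process.env) {
  const mode = env.MCP_MODE === "remote" ? "remote" : "local";

  if (mode === "local") {
    // Assumed: desktop use talks to the server over stdio, with no OAuth.
    return { mode, transport: "stdio", requiresOAuth: false };
  }

  // Assumed: the hosting platform supplies the listening port via PORT.
  const port = Number.parseInt(env.PORT ?? "3000", 10);
  return { mode, transport: "http", port, requiresOAuth: true };
}
```

Everything after this decision (tool definitions, YouTube API calls, transcript logic) can then be shared, with only the transport and the OAuth layer differing between the two modes.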
Transcripts come from three tiers, tried in order: captions fetched through the
unofficial captions path (free and fast), then audio download plus Whisper
transcription (reliable, but each call costs money), and finally a
metadata-only summary if both fail.
The response always reports which tier was used, so the result is never a
silent black box.
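
The tier order described above can be sketched as a small loop over fetchers. This is a minimal sketch, not the project's actual code: `getTranscript`, the `{ name, fetch }` tier shape, and the `failures` field are hypothetical names chosen for illustration.

```javascript
// Try each transcript tier in order and report which one produced the result.
// Each tier is { name, fetch }: fetch(videoId) resolves to text, or throws
// (or resolves to an empty value) when that tier cannot produce a transcript.
async function getTranscript(videoId, tiers) {
  const failures = [];

  for (const tier of tiers) {
    try {
      const text = await tier.fetch(videoId);
      if (text) {
        // Success: name the tier so the caller knows where the text came from.
        return { tier: tier.name, text, failures };
      }
      failures.push({ tier: tier.name, reason: "empty result" });
    } catch (err) {
      // A failed tier is recorded, then the next (slower or costlier) one runs.
      failures.push({ tier: tier.name, reason: err.message });
    }
  }

  // Every tier failed, including the last-resort one.
  return { tier: "none", text: null, failures };
}
```

Called with tiers named, say, `captions`, `whisper`, and `metadata` in that order, the cheap path always runs first and the paid path only runs when it fails. Returning the tier name alongside the text is what keeps the result from being a black box.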
Cloud servers get treated with suspicion by large platforms — requests from a
datacenter IP are often silently blocked or degraded, independent of whether
the code is correct. Getting transcripts to work reliably from the cloud meant
solving a chain of real problems: authenticating requests with browser session
cookies, computing the specific signed authorization header the platform's
internal API actually requires (cookies sent without it are silently ignored),
handling cookie rotation, and even catching subtle bugs like Windows-style line
endings corrupting the cookie data. Each fix looked like the final one, until
the next failure surfaced.
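
Two of those layers are easy to show in code. The sketch below parses a Netscape-format `cookies.txt` export while tolerating Windows line endings, then builds a signed header from one cookie value. The header construction follows the `SAPISIDHASH` scheme as commonly described in community write-ups; it is undocumented by the platform, and whether the project uses exactly this form is an assumption. The function names are hypothetical.

```javascript
const crypto = require("node:crypto");

// Parse a Netscape-format cookies.txt export into a name -> value map.
// Splitting on /\r?\n/ removes Windows-style CRLF endings, which would
// otherwise leave a stray "\r" on the end of every cookie value.
function parseCookieFile(text) {
  const cookies = new Map();
  for (const line of text.split(/\r?\n/)) {
    if (line === "") continue;
    // "#HttpOnly_" prefixes a real cookie row; other "#" lines are comments.
    if (line.startsWith("#") && !line.startsWith("#HttpOnly_")) continue;
    // Columns: domain, subdomain flag, path, secure, expiry, name, value.
    const fields = line.split("\t");
    if (fields.length < 7) continue;
    cookies.set(fields[5], fields[6]);
  }
  return cookies;
}

// Build the Authorization value as "SAPISIDHASH <ts>_<sha1(ts sapisid origin)>".
// Assumed scheme: community-documented, not an official API contract.
function buildSapisidHash(sapisid, origin, nowMs = Date.now()) {
  const timestamp = Math.floor(nowMs / 1000);
  const digest = crypto
    .createHash("sha1")
    .update(`${timestamp} ${sapisid} ${origin}`)
    .digest("hex");
  return `SAPISIDHASH ${timestamp}_${digest}`;
}
```

A single trailing `\r` in the cookie value changes the hash input, so the header is computed without error but rejected by the server. That is why the line-ending bug is hard to spot from the client side.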
The smart-search feature sends the search results to a fast, inexpensive
language model (Claude Haiku) with a plain-language description of the goal,
and asks it to judge relevance and, if needed, propose a better query. The
server loops this up to a set number of attempts, then returns both the results
and a full record of every query tried and the reasoning behind each — turning
an opaque "trust me" step into an auditable one. Each judgment costs a fraction
of a cent.
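
The judge-and-retry loop described above might look roughly like this. It is a sketch under stated assumptions: `smartSearch`, the injected `search` and `judge` functions, and the verdict fields (`relevant`, `reasoning`, `betterQuery`) are hypothetical names. In the real project the judge would be a call to Claude Haiku; here it is passed in as a plain function so the loop itself stays visible.

```javascript
// Run a search, ask a judge whether the results fit the goal, and retry with
// the judge's suggested query until it is satisfied or attempts run out.
// `search(query)` and `judge({ goal, query, results })` are injected.
async function smartSearch(goal, initialQuery, { search, judge, maxAttempts = 3 }) {
  const attempts = []; // audit trail: every query tried and why it was judged so
  let query = initialQuery;
  let results = [];

  for (let i = 0; i < maxAttempts; i++) {
    results = await search(query);
    const verdict = await judge({ goal, query, results });
    attempts.push({ query, relevant: verdict.relevant, reasoning: verdict.reasoning });

    // Stop when the judge accepts the results or has no better query to offer.
    if (verdict.relevant || !verdict.betterQuery) break;
    query = verdict.betterQuery;
  }

  // Return the last results together with the full record of attempts.
  return { results, attempts };
}
```

Capping the loop with `maxAttempts` bounds the cost of the judge calls, and returning `attempts` alongside `results` is what makes the step auditable.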
Reliability against an adversarial platform is an ongoing effort, not a solved
problem — cookies expire, detection changes, and the cheap captions path
remains less reliable than the paid transcription fallback. The project treats
these as managed trade-offs rather than pretending they're fixed. That honesty
is part of the point.