MCP: The USB Port for LLM Integrations

How many times have you started a new AI project and found yourself re-wiring the same GitHub, Slack, or SQL integration? That annoying copy-paste repetition, and the lack of shared conventions across organizations and individuals, is exactly why the Model Context Protocol, or MCP, exists.

MCP is an open standard released by Anthropic in November 2024 to formalize how we feed large language models the three things they crave most: extra data, good prompts, and external tools. The goal is for everyone to adopt it, so that all tools and databases become accessible through the same protocol and the implementation complexity of almost anything LLM-related drops.

Under the hood it works like any basic two-party web exchange: a client (your chat window, IDE, or agent doing the asking) sends a request, and a server (the machine that actually holds the tools and data) answers with what the client needs.

Let’s sketch the full stack before diving into the details. Your app launches an LLM session and embeds a tiny MCP client next to it. The moment the user types a question, the LLM decides whether it can answer straight from its own weights or needs outside help. If it needs help, the MCP client calls tools/list on every server in its config — those servers might be local Docker containers, a team-hosted VM on the intranet, or a cloud function halfway across the world. Each server responds with a JSON catalogue: tool names, descriptions, and the JSON schema for accepted arguments. The LLM chooses one (or chains several), hands the chosen tool name plus arguments to the MCP client, and the client invokes tools/call. The results are streamed back, folded into the model’s context, and the final answer is generated for the user. In short, MCP slots cleanly into existing HTTP infrastructure — one lightweight client beside the model, many purpose-built servers behind standard web ports. If that was a bit too much, let me break it down further.
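
To make that loop concrete, here is a minimal client-side sketch using the official Python SDK (the mcp package), with the local stdio transport for simplicity. The weather.py server and the get_weather tool are the hypothetical examples we build later in this article, and exact class or method names may differ slightly between SDK versions.

```python
# A sketch of the discover-then-invoke loop, assuming the official Python SDK
# (pip install mcp) and the hypothetical weather.py server built later on.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server as a local subprocess over the stdio transport.
    server = StdioServerParameters(command="python", args=["weather.py"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Step 1 -- discovery: tools/list returns the server's catalogue.
            listing = await session.list_tools()
            print([tool.name for tool in listing.tools])

            # Step 2 -- invocation: tools/call runs whichever tool the LLM picked.
            result = await session.call_tool("get_weather", {"city": "Sacramento"})
            print(result.content)

asyncio.run(main())
```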

First of all, and super conveniently, our day-to-day applications (Claude Desktop, VS Code, Cursor, or even a bare-bones CLI) either boot up with a built-in MCP client or gain one through a one-click extension, which means they immediately know how to discover and call MCP servers with nothing extra to wire up. At launch the client pings one or more MCP servers, each of which can sit on your laptop, on your team’s intranet, or halfway across the planet. One server might expose a direct line to a production database, another publishes a stash of battle-tested team prompts, a third wraps a Brave Search scraper for finding sources. Because an MCP server is basically a small, self-contained web service (a “micro-service”) whose behavior is defined by the MCP specification, its feature list is limited only by what you code, and, once you publish it, any MCP-aware client with the proper credentials can use it.

Before MCP, every new app you built had to include its own GitHub integration. If you wrote three different assistants — say a VS Code extension, a Slack bot, and a command-line tool — you’d copy-paste the same GitHub API code into all three. With MCP you put that GitHub logic once in an MCP server, and all your apps simply call it, no rewiring required.

The best analogy I’ve seen to explain it is USB. Nobody invents a fresh cable and port for every gadget (except Apple); you slide the stick into the port and the OS figures it out. MCP aims to be the USB port for model context.

Here’s why that matters day-to-day. Your MCP servers live outside the editor itself, so if you bounce from VS Code to Cursor, the new editor pings the same server URL on startup and — boom — your full toolbox is already there. No reinstall and no re-import to do. Because a server is nothing more than a shareable link, you can hand that link to a teammate and, the moment they drop it into their config, they get the exact same capabilities.

In the future, we can imagine MCP marketplaces that sell access to premium servers; if your useful MCP server costs money to run, you can charge the people using it.

And even better, both OpenAI and Google have stated that their desktop ChatGPT and Gemini stacks will integrate MCP, meaning their models will soon be able to plug directly into third-party servers.

You might wonder why we need MCP when OpenAPI already exists. Picture OpenAPI as a printed instruction sheet taped to whatever service you’re exposing. If you hit OpenAI’s own Chat Completion endpoint, OpenAI publishes the sheet that spells out POST /v1/chat/completions, the fields you can pass, and the JSON you’ll get back. Now imagine you spin up your own little service, POST /translate-text, that first detects the language, then calls that same OpenAI route, cleans up the reply, and returns a clean chunk of text. Because you — not OpenAI — own /translate-text, you have to draft a fresh instruction sheet, bundle it into every client — your VS Code extension, the Slack bot, the little CLI your ops team relies on — and ship a new version each time you tweak the feature.

MCP flips the model. Instead of frozen paperwork, it serves a live digital menu. On startup the app’s built-in MCP client calls tools/list, grabs the current lineup — maybe translate_text, summarize_article, get_weather — and later hits tools/call to run whichever one it needs. If a teammate drops a brand-new translate_text tool onto the server at three in the morning, that menu refreshes automatically on the very next launch — no spec rewrites, no redeploys, no stale instructions, and most importantly, nothing breaking.
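
For a sense of what that menu looks like, each entry the server returns from tools/list is essentially a name, a human-readable description, and a JSON schema for the accepted arguments. Here is an illustrative entry for the hypothetical translate_text tool, written as a Python dict; the field names follow the MCP spec, but treat the values as a sketch.

```python
# Roughly the shape of one entry in a tools/list response (illustrative values).
translate_text_entry = {
    "name": "translate_text",
    "description": "Detect the source language and translate text into a target language.",
    "inputSchema": {  # standard JSON Schema describing the accepted arguments
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "target_language": {"type": "string", "default": "en"},
        },
        "required": ["text"],
    },
}
```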

Another cool point: without MCP you’d stitch each step together by hand. First you write code that hits the sales database API, wait for the rows to come back, then feed that data into a separate Python service you’ve also wired in, wait again, and finally ship the output back to the user. The language model can’t see or coordinate those services on its own; you act as the middleware every time.

With MCP the whole chain sits on the server’s menu. When a user asks, “Analyze this month’s sales by city,” the model’s very first turn can call a get_sales_data tool, grab the live rows, immediately pass them to a python_analyze tool in the same breath, and return a ready-made chart — all inside one round-trip. The LLM orchestrates the flow automatically because it discovers both tools at startup and can invoke them back-to-back without you writing a single line of extra glue. That makes it perfect for the agentic systems that are becoming increasingly popular. By the way, if you’d like to learn more about MCP and agents, we are releasing a special course with my friend Paul Iusztin from DecodingML teaching everything you need to know about agents, along with creating MCP servers and clients. Message me for a link to the course with a good discounted price!
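
Coming back to the sales example, here is a hedged sketch of what such a server could look like using the Python SDK’s FastMCP helper. get_sales_data and python_analyze are just the hypothetical tool names from above, and both bodies are stubs you would replace with a real database query and real analysis code.

```python
# A sketch of a server exposing the two hypothetical tools from the sales
# example, using the official Python SDK's FastMCP helper (pip install mcp).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sales-analytics")

@mcp.tool()
def get_sales_data(month: str) -> list[dict]:
    """Return the month's sales rows as a list of {city, amount} records."""
    # Stub: in a real server this would query your sales database.
    return [{"city": "Montreal", "amount": 1200.0}, {"city": "Quebec City", "amount": 800.0}]

@mcp.tool()
def python_analyze(rows: list[dict]) -> dict:
    """Aggregate sales rows by city and return the totals."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0.0) + row["amount"]
    return {"totals_by_city": totals}

if __name__ == "__main__":
    mcp.run()  # the LLM can now discover both tools and chain them in one turn
```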

Unfortunately, MCP is still in its early days, and those come with early headaches. Simon Willison showed how a malicious WhatsApp message could hijack a careless MCP client by smuggling injection instructions into tool descriptions. The problem is that there’s no authentication or fine-grained authorization yet: every user who reaches a server sees every tool, including the nuclear delete_record button. Context handling is also still quite rudimentary: hit the 128k-token ceiling and the client chops off history without you noticing. Load the menu with a hundred tools and the model’s odds of picking the right one plummet — and, because MCP offers no native logging or metrics, you won’t notice the misfires unless you add monitoring yourself. Fortunately, companies such as Descope are working on the authentication problem, but we need to be careful, as none of this is handled by default.

Still, the ecosystem around MCP is expanding quickly, with all the major model providers joining, better authentication systems in the works, and lots of other cool stuff. For ready-made servers you can explore Anthropic’s official examples, skim the community-maintained “MCP-Servers” list, or browse Smithery.ai, a site where developers publish endpoints for anyone to clone — early signs of the marketplaces that will likely follow.

Now, how do you actually spin up an MCP server? The official SDKs make it super easy. Pick your language — Python, TypeScript, or whichever — and follow the docs. The spec even suggests a nice trick for quick reference: grab the plain-text file llmspec.txt, which contains the entire protocol, paste it into a large-context model like Gemini, and just chat with it while you code. If you’re unsure about any field or header, Gemini has you covered.

Let’s see that in a concrete example. Say you want a server that fetches the weather for any city. In Python, you import the MCP library, create a tiny script, and mark your function with the @mcp.tool decorator. The function name becomes the tool name, the docstring becomes the description, and the type hints automatically fill the JSON schema — one function, one tool, job done. Fire up the script and you’re already hosting a compliant MCP server.
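
Here is what that looks like as a minimal sketch with the Python SDK’s FastMCP helper. The forecast itself is stubbed out, so you would swap in a call to a real weather API.

```python
# weather.py -- a minimal MCP server with one tool, using the official Python
# SDK's FastMCP helper (pip install mcp). The forecast itself is a stub.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a short weather summary for the given city."""
    # Stub: call a real weather API here instead of returning a canned answer.
    return f"Sunny and 22°C in {city}."

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```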

To wire that server into Claude Desktop, for example, you open claude_desktop_config.json, drop in a new entry under mcpServers, point it to your local command — maybe weather.py — and restart the app. From that moment on, every time you ask “What’s the weather in Sacramento?” the MCP client sees your get_weather tool, calls it, and replies with the forecast.
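
As a sketch, with a placeholder path, the entry looks something like this:

```json
{
  "mcpServers": {
    "weather": {
      "command": "python",
      "args": ["/absolute/path/to/weather.py"]
    }
  }
}
```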

Using servers inside custom code is just as painless. The OpenAI Agents SDK treats any MCP endpoint as a native tool. LangChain does the same through langchain-mcp-adapters, and LlamaIndex mirrors the pattern with llama-index-tools-mcp. Crew AI has built-in hooks too, so most of the agent ecosystem now speaks MCP out of the box.
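
For instance, here is a hedged sketch of plugging the same weather server into the OpenAI Agents SDK; import paths and parameter names may differ slightly between SDK versions, so treat this as the pattern rather than copy-paste-ready code.

```python
# A sketch of using an MCP server from the OpenAI Agents SDK
# (pip install openai-agents). Assumes the weather.py server from above.
import asyncio

from agents import Agent, Runner
from agents.mcp import MCPServerStdio

async def main() -> None:
    # Start the MCP server as a subprocess and hand it to the agent.
    async with MCPServerStdio(params={"command": "python", "args": ["weather.py"]}) as weather_server:
        agent = Agent(
            name="Forecaster",
            instructions="Answer weather questions using the available tools.",
            mcp_servers=[weather_server],
        )
        result = await Runner.run(agent, "What's the weather in Sacramento?")
        print(result.final_output)

asyncio.run(main())
```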

If you’d rather see raw code, Anthropic’s “quickstarts-agents” repo is a good place to start. One notebook defines three tools — a reflection helper, a calculator, and a Brave Search wrapper — plugs them into an agent, and then lets you ask questions like, “What’s the best restaurant in Québec City?” The agent decides on its own to hit the web-search tool and answers with a short list — Tanière, ARVI, and Le Saint-Amour.

In short, write a function, decorate it, list it in your client config, and every MCP-aware agent can call it. The rest of the ecosystem — Agents SDK, LangChain, LlamaIndex, Crew AI, LangGraph — takes care of the plumbing so you can stay focused on what the tool actually does.

Google also recently introduced the A2A protocol for agent-to-agent conversations. A2A fills the gap that MCP deliberately leaves open. MCP lets an agent reach out to tools — a weather API, a SQL query runner, a PDF parser. But sometimes the “thing you need” is another self-contained agent that can plan trips, optimise shipping routes, or handle accounting rules on its own stack. A2A gives those agents a common language: one agent bundles up a task, sends it across the network, and the remote agent answers with the result — no shared database or codebase required.

The recommended workflow is to use MCP’s resources endpoint to discover which remote agents exist, then switch to A2A to have the actual back-and-forth conversation.

If you want to get started right now, Google published three small reference projects to show the idea in practice — one written with their own lightweight SDK, one rebuilt in LangGraph, and one in Crew AI — so you can open a terminal, run both ends locally, and watch one agent delegate a job to the other in a few lines of code. It turns out that once discovery and message formatting are handled, agent-to-agent hand-offs are no harder than calling any other HTTP service.

So, if you were still wondering, this is what MCP, and now A2A, were built for: instead of pouring hours into bespoke integrations, you spin up an MCP server once and every compliant client — from your IDE to tomorrow’s ChatGPT desktop — plugs in and plays. We still need smarter ways to trim or summarise long conversations and, just as important, proper built-in monitoring — a simple log that shows which tools were called, how long they ran, and where they failed. But the core plumbing is solid, and you can ship with it surprisingly fast.

So the next time you’re about to copy-paste the same GitHub integration for the fourth production project, pause. Wrap that code in a tiny MCP server, add Descope’s three-line OAuth wrapper, and get back to the feature your users actually care about.

If this walkthrough shaved a few hours off your boilerplate, give it a thumbs-up, share it with the teammate still not using MCP, and subscribe. If you want to learn more, one of the original MCP authors gave a 1-hour-44-minute deep dive on YouTube. I also invite you to check out the agents course we are releasing with Towards AI and DecodingML, where we show how to best leverage MCP, deploy an agent with A2A, and use it all in practice! Message me if you’d like to learn more about it.

On that note, I hope you enjoyed this overview of MCP and now better understand its reason for being and its potential. Thanks for reading, and I’ll see you in the next one!