Building production MCP servers and clients with Python. Covers the JSON-RPC 2.0 wire protocol, transport layers (stdio, SSE, Streamable HTTP), filesystem tool implementation with path traversal protection, and connecting Claude to your custom tools.
Tyler McDaniel
AI Engineer & IBM Business Partner
Every few months the AI ecosystem produces a new "standard" that's really just one vendor's wrapper around JSON. The MCP protocol is different. Anthropic open-sourced the [Model Context Protocol](https://modelcontextprotocol.io/) in late 2024, and by mid-2025 it had adoption from OpenAI, Google, Microsoft, and every major IDE vendor. That almost never happens. I've been building MCP servers for production tooling since early 2025, and this is the guide I'd hand to anyone who's tired of gluing together bespoke function-calling implementations for every new model.
The MCP protocol solves a specific problem: giving LLMs a standardized way to discover, negotiate, and invoke external tools, resources, and prompts — regardless of which model is backing the application. If you've ever written the same tool-calling integration three times for three different providers, you know why this matters.
OpenAI's function calling works. So does Anthropic's tool use API. The problem isn't that they're broken — it's that they're silos. Every provider defines tool schemas slightly differently, handles streaming tool calls with different event types, and returns results in incompatible formats. You end up with an adapter layer per provider, and that adapter layer becomes the most fragile code in your stack.
The MCP protocol sits below the model layer. It defines a [JSON-RPC 2.0](https://www.jsonrpc.org/specification) interface between a host (your application), a client (the protocol handler), and a server (the thing that actually provides tools). The model never talks to MCP directly — your application translates between the model's native function-calling format and MCP's standardized tool interface.
This sounds like extra indirection, and it is. But it means:

- Tool servers are model-agnostic: write one server and use it with any provider.
- Tools can run out of process: in a subprocess, on a remote machine, or in another language.
- Tool discovery happens at runtime over the protocol, not at build time in code.
Compare this to LangChain's tool abstraction, which standardizes the Python interface but doesn't define a wire protocol. You can't run a LangChain tool in a separate process, on a remote server, or in a different language without building your own RPC layer. MCP gives you that for free.
The architecture has three actors:

- **Host**: your application, the thing the user interacts with (a chat app, an IDE, an agent runtime).
- **Client**: the protocol handler inside the host that maintains a connection to one server.
- **Server**: the process that actually provides tools, resources, and prompts.
A single host can run multiple clients, and each client connects to one server. If your application needs filesystem access, a database connection, and a web scraper, that's three servers, three clients, one host.
The three core primitives:
| Primitive | Direction | Purpose |
|-----------|-----------|---------|
| Tools | Server → Client (model-invoked) | Functions the LLM can call. Read a file, query a database, send an API request. |
| Resources | Server → Client (application-controlled) | Data the application can read. File contents, database schemas, API responses. Like GET endpoints. |
| Prompts | Server → Client (user-invoked) | Pre-built prompt templates with arguments. "Summarize this code" with a language parameter. |
Tools are the most commonly implemented. Resources and prompts are useful but adoption is still catching up.
Every message is a JSON-RPC 2.0 frame. Requests have method, params, and an id. Responses have result or error. Notifications are fire-and-forget (no id). The [full spec](https://spec.modelcontextprotocol.io/) is readable in one sitting — I recommend doing so before writing your first server.
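Concretely, the three frame shapes look like this. A sketch with illustrative values — the tool name matches the filesystem server built later in this guide, and `notifications/initialized` is the notification a client sends after the handshake:

```python
import json

# A request frame: method, params, and an id to match the response against.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "test.txt"}},
}

# A response frame carries the same id and either "result" or "error", never both.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Hello from MCP"}]},
}

# A notification is fire-and-forget: no id, so no response is expected.
notification = {"jsonrpc": "2.0", "method": "notifications/initialized"}

wire_frame = json.dumps(request)
```

Every MCP exchange — initialization, tool listing, tool calls — is built from these three shapes.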
MCP defines transports, not opinions about networking. The protocol doesn't care how bytes get from client to server. But three transports dominate:

- **stdio** — The default for local tools. The client spawns the server as a subprocess and talks to it over stdin/stdout. Zero network config, zero auth overhead. Every MCP-enabled IDE (VS Code, Cursor, Windsurf) uses this for local servers. If your tool runs on the same machine as the host, use stdio.
- **SSE (Server-Sent Events)** — The original remote transport. The client makes an HTTP POST to send messages, and the server pushes responses over an SSE stream. Works through firewalls and proxies. The downside: SSE is unidirectional, so you need two channels (POST for client→server, SSE for server→client), which complicates load balancing.
- **Streamable HTTP** — Introduced in the [2025-03-26 spec revision](https://spec.modelcontextprotocol.io/specification/2025-03-26/transport/streamable-http/). A single HTTP endpoint: the client sends JSON-RPC via POST, and the server responds directly or upgrades to SSE for streaming. This is the future for remote servers — simpler infrastructure, stateless-friendly, works with standard HTTP load balancers.
For this guide, I'll use stdio because it's the fastest path to a working implementation.
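As a reference point for what "zero network config" means in practice, an MCP-enabled host typically needs only a command line to launch a stdio server. For example, a Claude Desktop `claude_desktop_config.json` entry for the server built in the next section would look roughly like this (the path is a placeholder — adjust to your setup):

```json
{
  "mcpServers": {
    "filesystem-tools": {
      "command": "python",
      "args": ["/absolute/path/to/filesystem_server.py"]
    }
  }
}
```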
Here's a complete MCP server that gives an LLM access to read files and list directories. Install the SDK first:
```bash
pip install mcp anthropic
```
Save this as `filesystem_server.py`:
```python
import json
from pathlib import Path

from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

ALLOWED_ROOT = Path.home() / "mcp-sandbox"
ALLOWED_ROOT.mkdir(exist_ok=True)

server = Server("filesystem-tools")


def validate_path(requested: str) -> Path:
    resolved = (ALLOWED_ROOT / requested).resolve()
    # is_relative_to (Python 3.9+) avoids the prefix pitfall of a plain
    # startswith check, which would accept a sibling like mcp-sandbox-evil.
    if not resolved.is_relative_to(ALLOWED_ROOT.resolve()):
        raise ValueError(f"Path traversal blocked: {requested}")
    return resolved


@server.list_tools()
async def list_tools() -> list[Tool]:
    return [
        Tool(
            name="read_file",
            description="Read the contents of a text file within the sandbox directory",
            inputSchema={
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Relative path from sandbox root",
                    }
                },
                "required": ["path"],
            },
        ),
        Tool(
            name="list_directory",
            description="List files and subdirectories in a directory within the sandbox",
            inputSchema={
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Relative path from sandbox root (use '.' for root)",
                        "default": ".",
                    }
                },
                "required": [],
            },
        ),
    ]


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "read_file":
        target = validate_path(arguments["path"])
        if not target.is_file():
            return [TextContent(type="text", text=f"Error: {arguments['path']} is not a file")]
        content = target.read_text(encoding="utf-8", errors="replace")
        return [TextContent(type="text", text=content)]
    elif name == "list_directory":
        target = validate_path(arguments.get("path", "."))
        if not target.is_dir():
            return [TextContent(type="text", text=f"Error: {arguments.get('path', '.')} is not a directory")]
        entries = []
        for entry in sorted(target.iterdir()):
            kind = "dir" if entry.is_dir() else "file"
            size = entry.stat().st_size if entry.is_file() else 0
            entries.append({"name": entry.name, "type": kind, "size": size})
        return [TextContent(type="text", text=json.dumps(entries, indent=2))]
    return [TextContent(type="text", text=f"Unknown tool: {name}")]


async def main():
    async with stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            server.create_initialization_options(),
        )


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```
A few things worth noting:

- The `validate_path` function resolves symlinks and ensures every path stays within `ALLOWED_ROOT`. Without this, you've given an LLM arbitrary filesystem read access. The [OWASP Path Traversal](https://owasp.org/www-community/attacks/Path_Traversal) page isn't optional reading when you're building these.
- `list_tools` returns schemas, not functions. The LLM sees the `inputSchema` JSON Schema, not your Python code. Make descriptions precise — the model uses them to decide when and how to call the tool.
- `call_tool` is a dispatch function. One handler, switch on `name`. This is the pattern the SDK expects.

Test it locally by dropping a file into the sandbox:
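The containment check deserves its own sanity test before you trust it. A minimal standalone sketch (hypothetical temp-dir sandbox, not the server's real `~/mcp-sandbox`), which also shows why `Path.is_relative_to` (Python 3.9+) beats a plain string-prefix check:

```python
import tempfile
from pathlib import Path

# Hypothetical sandbox root for the demo, resolved up front so all
# comparisons use canonical absolute paths.
root = Path(tempfile.mkdtemp()).resolve() / "mcp-sandbox"
root.mkdir()

def is_contained(requested: str) -> bool:
    # Resolve symlinks and ".." segments, then check containment.
    resolved = (root / requested).resolve()
    return resolved.is_relative_to(root)

assert is_contained("notes/test.txt")        # normal relative path: allowed
assert not is_contained("../../etc/passwd")  # traversal attempt: blocked

# The pitfall: a bare string-prefix check wrongly accepts a sibling directory.
evil = root.parent / "mcp-sandbox-evil"
assert str(evil).startswith(str(root))            # prefix check is fooled
assert not evil.resolve().is_relative_to(root)    # is_relative_to is not
```

If your server must run on Python 3.8, the fallback is comparing against the resolved root plus a trailing `os.sep` rather than the bare prefix.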
```bash
echo "Hello from MCP" > ~/mcp-sandbox/test.txt
```
Here's a minimal client that spawns the server, sends a user query to Claude, and handles tool calls. Save as `mcp_client.py`:
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from anthropic import Anthropic

anthropic_client = Anthropic()


async def run():
    server_params = StdioServerParameters(
        command="python",
        args=["filesystem_server.py"],
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover tools at runtime and reshape into Anthropic's format.
            tools_response = await session.list_tools()
            anthropic_tools = [
                {
                    "name": tool.name,
                    "description": tool.description,
                    "input_schema": tool.inputSchema,
                }
                for tool in tools_response.tools
            ]

            user_message = "List the files in the sandbox and read any .txt file you find."
            messages = [{"role": "user", "content": user_message}]

            while True:
                response = anthropic_client.messages.create(
                    model="claude-sonnet-4-20250514",
                    max_tokens=4096,
                    tools=anthropic_tools,
                    messages=messages,
                )

                # Stop on any terminal stop_reason, not just end_turn --
                # otherwise max_tokens would loop forever with no tool calls.
                if response.stop_reason != "tool_use":
                    for block in response.content:
                        if hasattr(block, "text"):
                            print(block.text)
                    break

                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        print(f"→ Calling tool: {block.name}({block.input})")
                        result = await session.call_tool(block.name, block.input)
                        tool_results.append(
                            {
                                "type": "tool_result",
                                "tool_use_id": block.id,
                                "content": result.content[0].text,
                            }
                        )

                messages.append({"role": "assistant", "content": response.content})
                messages.append({"role": "user", "content": tool_results})


if __name__ == "__main__":
    asyncio.run(run())
```
The flow:

1. Connect to the server, call `list_tools`, and reshape the schemas into Anthropic's tool format.
2. Send the user message to Claude with those tools attached.
3. When the response contains `tool_use` blocks, forward them to the MCP server via `session.call_tool`.
4. Append the tool results to the conversation and loop until Claude stops with `end_turn`.

The key insight: if you wanted to swap Claude for GPT-4, you'd change the API call and reformat the tool schemas. The MCP server stays identical. That's the whole point.
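To make that concrete, here's a sketch of the only part that changes in a provider swap: reshaping the same MCP tool schema for each provider. Plain dicts stand in for the SDK's tool objects; the OpenAI shape follows its chat-completions `tools` format:

```python
# One MCP tool schema, as returned by tools/list (plain-dict stand-in).
mcp_tool = {
    "name": "read_file",
    "description": "Read a text file in the sandbox",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def to_anthropic(tool: dict) -> dict:
    # Anthropic: top-level name/description, JSON Schema under "input_schema".
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["inputSchema"],
    }

def to_openai(tool: dict) -> dict:
    # OpenAI chat completions: wrapped in {"type": "function", ...},
    # with the JSON Schema under "parameters".
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["inputSchema"],
        },
    }
```

The JSON Schema travels through untouched in both cases — only the envelope differs, which is exactly the adapter-layer tedium MCP pushes to the edge of your stack.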
| Dimension | MCP Protocol | OpenAI Function Calling | LangChain Tools |
|-----------|-------------|------------------------|----------------|
| Type | Wire protocol (JSON-RPC 2.0) | API feature (provider-specific) | Library abstraction |
| Model-agnostic | Yes — any model, any provider | OpenAI models only | Yes (within LangChain) |
| Tool discovery | Runtime via tools/list RPC | Defined per API request | Defined in code at build time |
| Cross-language | Any language with JSON-RPC | Through OpenAI SDKs | Python or JavaScript |
| Cross-process | Yes (stdio, HTTP, SSE) | No — in-process only | No — in-process only |
| Capability negotiation | Built-in (initialize handshake) | None | None |
| Resources (read-only data) | First-class primitive | Not applicable | Retrievers (different concept) |
| Prompt templates | First-class primitive | System prompts (manual) | PromptTemplate class |
| Ecosystem | [150+ servers](https://github.com/modelcontextprotocol/servers) | Deprecated plugin ecosystem | LangChain Hub |
| Latency overhead | Process spawn + JSON-RPC | Single HTTP call | Function call |
| Best for | Multi-model apps, IDE tooling, shared infrastructure | Single-model apps using OpenAI only | Rapid prototyping |
The MCP protocol wins when tools need to outlive a single project or serve multiple models. OpenAI function calling wins when you're shipping fast with GPT-4 only. [LangChain tools](https://python.langchain.com/docs/how_to/custom_tools/) win for prototyping but create coupling you'll regret later.
Not every tool needs MCP. If you're building a chatbot with two tools and one model provider, a JSON-RPC server is ceremony without payoff. Use your provider's native [function calling](https://platform.openai.com/docs/guides/function-calling) and move on.
MCP earns its complexity when:

- More than one model or provider needs the same tools.
- Tools need to outlive a single project or be shared across teams.
- Tools must run out of process, in another language, or on a remote machine.

If none of those apply, skip it. A well-typed Python function with a docstring is fine. Don't adopt infrastructure because it's new — adopt it because it solves a problem you have right now.
The spec moves fast. The [2025-03-26 revision](https://spec.modelcontextprotocol.io/specification/2025-03-26/) added Streamable HTTP transport, OAuth 2.1 authorization for remote servers, structured tool output (JSON results, not just text), and audio/image content types. The [Python SDK](https://github.com/modelcontextprotocol/python-sdk) tracks spec changes closely — pin your dependency version.
The trend is clear: MCP is becoming the USB-C of LLM tool integration. Not everyone likes it, and some things don't fit, but the ecosystem is converging. I'd rather write one good tool server than maintain five provider-specific adapters.
If you're building multi-agent systems, MCP gives you a clean interface between agents and their tools — I go deeper on that architecture in [Agentic AI: Building Multi-Agent Systems That Actually Work in 2026](https://tostupidtooquit.com/blog/agentic-ai-multi-agent-systems). And if your MCP servers are serving embeddings for RAG, the database layer matters more than you think — [Vector Databases: A Practitioner's Comparison](https://tostupidtooquit.com/blog/vector-databases-practitioner-comparison) has the honest benchmarks.