The Modern LLM Development Stack: Simplifying AI Integration
TL;DR
- LiteLLM: Universal gateway for 100+ LLM providers with OpenAI-compatible API
- Claudette: High-level Claude wrapper with stateful chat and tool integration
- Cosette: OpenAI equivalent of Claudette with GPT-optimized features
- Literate Programming: Documentation-first development approach used by these tools
- The Vision: “Lisette” - hypothetical hybrid combining LiteLLM’s flexibility with Claudette’s UX
Introduction
Working with Large Language Models today means navigating a fragmented ecosystem. OpenAI uses one API format, Anthropic another, Azure has its own authentication flow, AWS Bedrock requires different setup entirely, and each new provider brings fresh complexity. For developers building production applications, this creates a maintenance nightmare of conditional logic and provider-specific error handling.
Enter a new generation of tools designed to solve these integration challenges: LiteLLM for universal API access, Claudette and Cosette for provider-specific optimization, and the literate programming approach that’s changing how we document and develop these integrations.
LiteLLM: The Universal LLM Gateway
LiteLLM acts as a translation layer that converts any LLM API call into OpenAI's format. Instead of writing separate code paths for each provider, you make one standardized call that LiteLLM routes appropriately.
```python
# Installation: pip install litellm
import os
from litellm import completion

# Set up your API keys
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# Same interface, different providers
openai_response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)

claude_response = completion(
    model="claude-3-sonnet-20240229",  # Anthropic models use dated names
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)

# Production features: automatic fallbacks on failure
response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    fallbacks=["claude-3-sonnet-20240229", "gpt-3.5-turbo"],  # tried in order
)
```
Beyond API unification, LiteLLM provides production essentials: authentication management, load balancing across multiple instances, spend tracking, and automatic failover. This addresses the operational complexity that emerges when you move from prototype to production. For detailed documentation, see the LiteLLM docs.
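For heavier production use, LiteLLM ships a Router class that spreads traffic across multiple deployments registered under one alias. A minimal sketch of load balancing plus built-in cost accounting, with placeholder API keys:

```python
from litellm import Router, completion_cost

# Two deployments of the same model under one alias; the Router
# load-balances across them and retries on failure. Keys are placeholders.
router = Router(model_list=[
    {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4", "api_key": "key-1"}},
    {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4", "api_key": "key-2"}},
])

response = router.completion(
    model="gpt-4",  # the alias, resolved to one deployment per call
    messages=[{"role": "user", "content": "Hello"}],
)

# Spend tracking: estimate the dollar cost of this call
print(completion_cost(completion_response=response))
```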
The value proposition is immediate: eliminate conditional logic, reduce maintenance overhead, and gain operational visibility across all your LLM usage.
Claudette: Beyond API Compatibility
While LiteLLM focuses on universal compatibility, Claudette optimizes specifically for Anthropic's Claude. It's a high-level wrapper that automates the repetitive parts of Claude integration while preserving full control over advanced features.
The core abstraction is the Chat class, which maintains conversation state automatically:
```python
# Installation: pip install claudette
import os
from claudette import *

os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# Stateful conversation
chat = Chat('claude-3-5-sonnet-20241022')  # Anthropic models use dated names
response1 = chat("What's the capital of France?")
response2 = chat("What's its population?")  # Remembers previous context

# Tool use with plain Python functions
def get_weather(location: str):
    """Get weather for a location"""
    return f"It's sunny in {location}"

chat_with_tools = Chat('claude-3-5-sonnet-20241022', tools=[get_weather])
result = chat_with_tools("What's the weather in Paris?")
```
Claudette shines with Claude’s Tool Use API, automatically handling the request-response cycle that typically requires manual orchestration. It can call Python functions directly, bridging the gap between Claude’s reasoning and your application logic.
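For multi-step tool chains, Claudette also provides a toolloop helper that keeps feeding tool results back to Claude until it can produce a final answer. A short sketch; the get_population stub here is illustrative, not part of the library:

```python
# Illustrative stub for a second tool
def get_population(city: str):
    """Return a population figure for a city"""
    return {"Paris": 2_102_650}.get(city, "unknown")

chat = Chat('claude-3-5-sonnet-20241022', tools=[get_weather, get_population])
# toolloop runs the tool-call cycle repeatedly until Claude gives a final answer
result = chat.toolloop("Compare the weather and population of Paris")
```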
What sets Claudette apart is its "literate nbdev" implementation - the library's source code is a rendered Jupyter notebook with explanations, examples, and design rationale embedded directly in the codebase. The complete documentation showcases this approach beautifully.
Cosette: Claudette’s OpenAI Counterpart
Cosette applies the same high-level approach to OpenAI's GPT models. As "Claudette's sister," it provides a stateful Chat interface that simplifies GPT integration beyond what the raw OpenAI SDK offers.
```python
# Installation: pip install cosette
import os
from cosette import *

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# Stateful conversation with GPT
chat = Chat('gpt-4o')
response1 = chat("Explain quantum computing")
response2 = chat("Give me a practical example")  # Maintains conversation context

# System prompts via the sp argument
chat = Chat(
    'gpt-4o',
    sp="You are a helpful Python tutor. Always provide working code examples.",
)
code_help = chat("How do I read a CSV file?")
```
Like Claudette, Cosette handles the tedious aspects of API interaction - conversation history, model selection, and response streaming - while exposing the full power of OpenAI’s features when needed.
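Streaming is one example. The sketch below assumes Cosette mirrors Claudette's stream flag, which yields text chunks as they arrive:

```python
# Assumes a Claudette-style stream flag on Cosette's Chat
chat = Chat('gpt-4o')
for chunk in chat("Summarize literate programming in two sentences", stream=True):
    print(chunk, end='')
```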
The parallel between Cosette and Claudette reflects a broader design philosophy: provider-specific optimization beats one-size-fits-all approaches when you need to leverage unique capabilities.
Literate Programming: Documentation as Development
Literate programming, introduced by Donald Knuth in 1984, inverts the traditional relationship between code and documentation. Instead of adding comments to code, you write prose that explains your thinking, with code snippets embedded naturally within the narrative.
The philosophy: “concentrate rather on explaining to human beings what we want a computer to do” rather than instructing computers directly.
Modern implementations like nbdev bring this concept to Jupyter notebooks, where the entire development process happens in an environment that seamlessly blends explanation, code, tests, and documentation. The notebook becomes both the development environment and the final documentation site. The nbdev documentation itself demonstrates this approach in action.
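In practice, an nbdev notebook cell marks code for export with a directive comment, so the same cell serves as documentation, implementation, and test. A minimal sketch:

```python
#| export
def greet(name: str) -> str:
    """Exported to the package by nbdev; the surrounding prose becomes docs"""
    return f"Hello, {name}!"

# A cell below it doubles as both a usage example and a CI test
assert greet("world") == "Hello, world!"
```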
This approach addresses a chronic problem in software development: documentation that becomes outdated the moment it’s written. In literate programming, documentation and code evolve together because they’re the same artifact.
Claudette exemplifies this approach as the first “literate nbdev” project - its source code is literally a rendered notebook with callouts, explanations, and design rationale embedded throughout.
When to Use What: A Decision Framework
| Tool | Best For | Key Features |
|---|---|---|
| LiteLLM | Multi-provider apps, infrastructure | Universal API, load balancing, spend tracking |
| Claudette | Claude-focused development | Stateful chat, tool integration, literate docs |
| Cosette | OpenAI-focused development | GPT optimization, conversation management |
Choose LiteLLM when:
- You need to support multiple LLM providers
- You’re building infrastructure or platforms
- Operational concerns (cost tracking, failover) are priorities
- You want to avoid vendor lock-in
Choose Claudette when:
- You’re committed to Claude and want to leverage its specific strengths
- Tool use and function calling are central to your application
- You value deeply documented, literate code
- Conversation state management is important
Choose Cosette when:
- You’re building primarily on OpenAI’s ecosystem
- You want higher-level abstractions than the raw OpenAI SDK
- You need stateful chat interfaces with GPT models
The Literate Programming Angle: These tools represent more than API wrappers - they embody a philosophy of development where code explanation is as important as code execution. This becomes crucial as AI integration complexity grows.
Lisette: A Hypothetical Claudette for LiteLLM
Imagine combining Claudette’s elegant developer experience with LiteLLM’s provider flexibility. “Lisette” could offer the best of both worlds - a high-level Chat interface that works seamlessly across any LLM provider.
```python
# Hypothetical Lisette API
from lisette import *

# Provider-agnostic stateful chat
chat = Chat('gpt-4')  # or 'claude-3-sonnet', 'bedrock/anthropic.claude-v2'
response1 = chat("What's machine learning?")
response2 = chat("Give me a Python example")

# Automatic fallback on failure
robust_chat = Chat(
    models=['claude-3-sonnet', 'gpt-4'],
    fallback=True,
    load_balance=True,
)

# Universal tool calling
def analyze_data(data: list):
    """Analyze a list of numbers"""
    return {"mean": sum(data)/len(data), "max": max(data)}

chat_with_tools = Chat(
    models=['claude-3-sonnet', 'gpt-4'],
    tools=[analyze_data],
    fallback=True,
)
```
The compelling vision: provider-agnostic tool calling that adapts to each backend’s capabilities, conversation state that persists across provider switches, and cost optimization through intelligent model routing.
Such a tool would represent the logical evolution of the current ecosystem - abstracting not just API differences, but entire provider paradigms into a unified, stateful interface.
The Bigger Picture
The fragmented LLM ecosystem isn’t going away - if anything, it’s accelerating as new models and providers emerge. The tools we’ve explored represent a maturation of the space, moving beyond raw API access toward developer-centric abstractions.
LiteLLM, Claudette, and Cosette each solve different aspects of the same fundamental problem: making AI integration sustainable at scale. Whether through universal compatibility, provider optimization, or literate development practices, they share a common goal of reducing cognitive overhead for developers.
The literate programming approach underlying these tools suggests something deeper - that as AI becomes infrastructure, the quality of our explanations becomes as critical as the quality of our code. When systems become complex enough, documentation isn’t optional; it’s architectural.
For teams building on LLMs today, the question isn’t whether to adopt these patterns, but which combination best fits your specific constraints and ambitions.