Multi-Modal AI Architecture

Our three-layer architecture for designing scalable, decoupled, and robust multi-modal AI features.

To support a variety of AI-powered features, such as text generation and image analysis, we use a standardized three-layer architecture. This pattern keeps our AI integrations scalable, maintainable, and decoupled from any specific AI provider.

The Three-Layer Architecture

Our AI architecture is composed of three distinct layers, each with its own set of responsibilities.

1. Application Layer

This is the user-facing layer, consisting of React components that make up the UI.

  • Responsibilities: Renders the UI, handles user input, and displays the results of AI operations.
  • Implementation: Client components use hooks like useTransition to manage loading states during AI processing. They call Server Actions to initiate AI tasks.
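As a sketch of this pattern, the hypothetical component below (the `generateImage` action name and `ImageGenerator` component are invented for illustration) wraps a Server Action call in `startTransition` so the UI stays responsive and can show a pending state:

```tsx
"use client";

import { useState, useTransition } from "react";
// Hypothetical Server Action; the real action lives in an actions.ts file.
import { generateImage } from "./actions";

export function ImageGenerator() {
  const [isPending, startTransition] = useTransition();
  const [message, setMessage] = useState<string | null>(null);

  function handleSubmit(formData: FormData) {
    // startTransition exposes a pending flag while the Server Action runs.
    startTransition(async () => {
      const result = await generateImage(formData);
      setMessage(result.message);
    });
  }

  return (
    <form action={handleSubmit}>
      <input name="prompt" placeholder="Describe an image" />
      <button type="submit" disabled={isPending}>
        {isPending ? "Generating…" : "Generate"}
      </button>
      {message && <p>{message}</p>}
    </form>
  );
}
```

The component never talks to an AI provider directly; it only invokes the Server Action and renders its `{ success, message }` result.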

2. Business Logic Layer

This layer consists of Server Actions that orchestrate the entire workflow of an AI-powered feature.

  • Responsibilities:
    1. Validate user input (using Zod schemas).
    2. Create a record in the database to log the AI task and its initial status (e.g., GENERATING).
    3. Call the appropriate AI Service Layer function.
    4. Store any resulting artifacts (e.g., images, text) using @repo/storage.
    5. Update the database record with the final status (COMPLETED or FAILED) and the path to the artifact.
  • Implementation: Server Actions are defined in actions.ts files and always return a { success: boolean, message: string } object.
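The five steps above can be sketched as one orchestration function. This is a framework-free illustration, not the real implementation: the `db`, `ai`, and `storage` dependencies are hypothetical stand-ins for the database client, `@repo/ai`, and `@repo/storage`, injected here so the flow is easy to follow and test, and a minimal inline check stands in for the real Zod schema.

```typescript
type TaskStatus = "GENERATING" | "COMPLETED" | "FAILED";

interface TaskRecord {
  id: string;
  status: TaskStatus;
  artifactPath?: string;
}

// Hypothetical dependency shapes, standing in for the real packages.
interface Deps {
  db: {
    createTask(): Promise<TaskRecord>;
    updateTask(id: string, patch: Partial<TaskRecord>): Promise<void>;
  };
  ai: { generateImage(prompt: string): Promise<Uint8Array> };
  storage: { save(bytes: Uint8Array): Promise<string> }; // returns artifact path
}

type ActionResult = { success: boolean; message: string };

async function generateImageAction(
  prompt: unknown,
  deps: Deps,
): Promise<ActionResult> {
  // 1. Validate input (a Zod schema in the real actions).
  if (typeof prompt !== "string" || prompt.trim() === "") {
    return { success: false, message: "Prompt must be a non-empty string." };
  }

  // 2. Log the task with its initial GENERATING status.
  const task = await deps.db.createTask();

  try {
    // 3. Delegate to the AI Service Layer.
    const bytes = await deps.ai.generateImage(prompt);

    // 4. Persist the resulting artifact.
    const path = await deps.storage.save(bytes);

    // 5. Record the final status and artifact path.
    await deps.db.updateTask(task.id, {
      status: "COMPLETED",
      artifactPath: path,
    });
    return { success: true, message: "Image generated." };
  } catch {
    await deps.db.updateTask(task.id, { status: "FAILED" });
    return { success: false, message: "Generation failed." };
  }
}
```

Because every path through the function updates the task record, the database always reflects the true state of the operation, which is what enables the audit trail described under Key Principles.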

3. AI Service Layer

This is the abstraction layer that communicates with the actual AI providers, such as OpenAI and Anthropic.

  • Responsibilities: Provides a unified interface for different AI models and modalities. It handles the specifics of formatting requests for each provider and parsing their responses.
  • Implementation: All AI service calls go through our centralized AI package (@repo/ai). This package contains a router that delegates tasks to the appropriate provider-specific service based on the requested model.
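A minimal sketch of the routing idea follows. The service objects, model identifiers, and `generateText` signature are all illustrative, not the real `@repo/ai` API: the point is that callers name a model, and a registry decides which provider-specific service handles it.

```typescript
// Shared interface every provider-specific service implements.
interface AIService {
  generateText(model: string, prompt: string): Promise<string>;
}

// Illustrative provider services; the real ones call the providers' APIs.
const openAIService: AIService = {
  async generateText(model, prompt) {
    return `[openai:${model}] ${prompt}`;
  },
};

const anthropicService: AIService = {
  async generateText(model, prompt) {
    return `[anthropic:${model}] ${prompt}`;
  },
};

// The router: a registry mapping model identifiers to the owning service.
const modelRegistry = new Map<string, AIService>([
  ["gpt-4o", openAIService],
  ["claude-3-5-sonnet", anthropicService],
]);

async function generateText(model: string, prompt: string): Promise<string> {
  const service = modelRegistry.get(model);
  if (!service) throw new Error(`Unknown model: ${model}`);
  return service.generateText(model, prompt);
}
```

Callers depend only on `generateText` and a model name, so swapping a provider behind a model is invisible to the layers above.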

Key Principles

  • Abstraction & Decoupling: The application and business logic layers are never directly coupled to a specific AI provider. This allows us to switch providers or add new models without refactoring the entire feature.
  • Scalability & Extensibility: New AI modalities (e.g., audio processing, video analysis) and new providers can be supported simply by adding a service to the AI Service Layer.
  • Robustness & Observability: Every AI operation is logged in our database, providing a full audit trail. The status of each task is tracked, allowing for proper error handling and retry mechanisms.

Adding a New AI Provider

To add a new provider, you would:

  1. Create a new service file for the provider in packages/ai/services/.
  2. Update the AI router in @repo/ai to delegate to the new service when its models are requested.
  3. Add the new model definitions to packages/ai/services/models.ts.
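The three steps can be sketched in miniature. Everything here is hypothetical: the `MistralService` name, the `mistral-large` model id, and the registry shape are invented for the example, and the real shared interface lives in `@repo/ai`.

```typescript
// The shared interface defined by the AI package.
interface AIService {
  generateText(model: string, prompt: string): Promise<string>;
}

// Step 1: a new service file, e.g. under packages/ai/services/.
const mistralService: AIService = {
  async generateText(model, prompt) {
    // A real service would call the provider's API here.
    return `[mistral:${model}] ${prompt}`;
  },
};

// Steps 2 & 3: register the new model so the router can delegate to it.
const modelRegistry = new Map<string, AIService>();
modelRegistry.set("mistral-large", mistralService);

async function route(model: string, prompt: string): Promise<string> {
  const service = modelRegistry.get(model);
  if (!service) throw new Error(`No service registered for model: ${model}`);
  return service.generateText(model, prompt);
}
```

Nothing outside the AI package changes: the Business Logic Layer keeps calling the same routing function with a new model name.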

This decoupled approach ensures that the rest of the application remains unchanged.