Architecture

MIRA is a FastAPI application with event-driven architecture. PostgreSQL handles storage and vector search. Everything runs locally except the LLM calls.

System Requirements

Resource	Requirement
RAM	3GB total (including embedding model and all services, excluding LLM if running locally)
Disk	10GB minimum
GPU	Not required (CPU-only PyTorch)
OS	Linux or macOS

Core Stack

Component	Purpose
PostgreSQL	Memory storage, vector search (pgvector)
Valkey	Redis-compatible caching
HashiCorp Vault	Secrets management
sentence-transformers	Local embeddings
spaCy	Entity extraction (NER)
APScheduler	Background job scheduling

Model Downloads (One-Time)

spaCy en_core_web_lg: ~800MB
mdbr-leaf-ir-asym embedding model: ~300MB
Playwright (optional): ~300MB

Provider Support

MIRA works with any OpenAI-compatible endpoint. Internally follows Anthropic SDK conventions, but translation happens at the proper layer. No vendor lock-in.

Tested Models

Claude Sonnet 4.5 (best results)
Deepseek V3.2
Qwen 3
Ministral 3
Acceptable results down to 4b parameters

What You Lose with Local Models

Extended thinking disabled
cache_control stripped
Server-side code execution filtered out
File uploads become text warnings

Deployment

Single cURL command. The deploy.sh script is 2000+ lines of production-grade automation.

curl -fsSL https://raw.githubusercontent.com/taylorsatula/mira-OSS/refs/heads/main/deploy.sh -o deploy.sh && chmod +x deploy.sh && ./deploy.sh

What the Script Handles

Platform detection (Linux/macOS) with OS-specific service management
Pre-flight validation: 10GB disk space, port availability (1993, 8200, 6379, 5432), existing installation detection
Dependency installation with idempotency (skips what's already installed)
Python venv creation and package installation
Model downloads (~1.4GB total)
HashiCorp Vault initialization: AppRole creation, policy setup, automatic unseal, credential storage
PostgreSQL database and user creation
Valkey setup
API key configuration (interactive prompts or skip for later)
Offline mode with Ollama fallback
systemd service creation with auto-start on boot (Linux)
Cleanup and script archival when complete

Run with --loud for verbose output. Fully unattended-capable.

Token Overhead

Component	Tokens
System prompt	~1,100-1,500
Typical full context	~8,300
Cached portion on subsequent requests	~3,300

Content controlled via config limits (20 memories max, 5 summaries max).

Event-Driven Architecture

MIRA uses a synchronous event bus. All handlers complete before publish() returns.

Characteristics

100% synchronous (no async/await)
Single-threaded (handlers execute sequentially)
Error-isolated (one handler failure doesn't block others)
Ephemeral (no persistence, no replay)

Why synchronous? Guarantees ordering and eliminates race conditions. When TurnCompletedEvent fires, all cleanup completes before the next turn can begin. LLM calls dominate latency anyway, so trading parallelism for predictability is worth it.

Event Hierarchy

ContinuumEvent (frozen dataclass - immutable)
├── MessageEvent
├── ToolEvent
├── WorkingMemoryEvent
│   ├── ComposeSystemPromptEvent
│   ├── SystemPromptComposedEvent
│   ├── UpdateTrinketEvent
│   ├── TrinketContentEvent
│   └── WorkingMemoryUpdatedEvent
└── ContinuumCheckpointEvent
    ├── TurnCompletedEvent
    ├── SegmentTimeoutEvent
    ├── SegmentCollapsedEvent
    ├── ManifestUpdatedEvent
    └── PointerSummariesCollapsingEvent

Segment Collapse ("REM Sleep")

Every 5 minutes, APScheduler checks for inactive conversation segments. On timeout, the system loads segment messages, generates a summary and embedding, extracts tools used, submits to memory extraction via the Batch API, clears search results, and persists collapsed metadata.

Trinket System

Trinkets are modular prompt composition units. Each contributes content with its own cache policy.

Built-in Trinkets

TimeManager
ReminderManager
ManifestTrinket
ProactiveMemoryTrinket
ToolGuidanceTrinket
PunchclockTrinket
DomaindocTrinket
GetContextTrinket
LoraTrinket

Each trinket has a standard lifecycle: registration with factory, optional event subscription, receiving update requests during prompt composition, generating content, storing in Valkey, and emitting content events.