Memory System

Mirror Mate includes a memory system that enables persistent user context through RAG (Retrieval-Augmented Generation). The system stores user information, extracts memories from conversations, and provides relevant context to the AI.

Configuration

Memory settings are configured in:

  • config/providers.yaml - Provider and RAG settings
  • config/locales/[lang]/memory.yaml - Extraction prompts (locale-specific)

The database is also locale-specific: data/mirrormate.[lang].db

```yaml
providers:
  embedding:
    enabled: true
    provider: ollama  # PLaMo server provides an Ollama-compatible API
    ollama:
      model: plamo-embedding-1b
      baseUrl: "http://studio:8000"  # PLaMo embedding server

  memory:
    enabled: true
    # RAG settings
    rag:
      topK: 8           # Max memories to retrieve
      threshold: 0.3    # Minimum similarity score
    # Memory extraction settings
    extraction:
      autoExtract: true      # Auto-extract from conversations
      minConfidence: 0.5     # Minimum confidence threshold
```

Note: PLaMo-Embedding-1B is recommended for Japanese. See Recommended Setup for details. You can also use bge-m3 via Ollama as an alternative.

Memory Types

| Type | Description | Example |
| --- | --- | --- |
| profile | User preferences and traits | "Favorite color: blue" |
| episode | Recent interactions and events | "Asked about weather on 2024-01-01" |
| knowledge | Facts and learned information | "User works at ACME Corp" |

Profile Memories

Profile memories store persistent user information:

  • User preferences (language, style)
  • Personality traits
  • Communication preferences
  • Recurring topics of interest

Profile memories are always included in the RAG context.

Episode Memories

Episode memories capture recent interactions:

  • Recent conversations
  • Events and activities
  • Time-sensitive information

Episodes have a recency factor that prioritizes recent memories.
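
The exact decay is an implementation detail, but conceptually the similarity score is weighted down as an episode ages. A minimal TypeScript sketch, assuming a simple exponential half-life (the constant and the multiplicative combination are illustrative assumptions, not the actual implementation):

```ts
// Illustrative recency weighting for episode memories; the half-life
// and the multiplicative combination are assumptions for this sketch.
const HALF_LIFE_DAYS = 7; // hypothetical: an episode's weight halves weekly

function recencyWeight(createdAt: Date, now: Date = new Date()): number {
  const ageDays = (now.getTime() - createdAt.getTime()) / 86_400_000;
  return Math.pow(0.5, ageDays / HALF_LIFE_DAYS); // 1.0 today, 0.5 after a week
}

// Combined score: semantic similarity scaled by recency.
const score = (similarity: number, createdAt: Date) =>
  similarity * recencyWeight(createdAt);
```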

Knowledge Memories

Knowledge memories store factual information:

  • User's work, hobbies, relationships
  • Learned facts from conversations
  • Important dates and information

RAG (Retrieval-Augmented Generation)

The RAG system retrieves relevant memories to provide context-aware responses.

How It Works

  1. Embed Query: Convert the user input to a vector via the Ollama-compatible embeddings API
  2. Semantic Search: Find similar memories using cosine similarity
  3. Rank Results: Sort by similarity score and filter by threshold
  4. Format Context: Combine profiles and relevant memories into a prompt
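
In code, these four steps reduce to a few lines. A minimal sketch, assuming the Ollama-compatible /api/embeddings route and an in-memory list of embedded memories (the real system persists vectors in the memory_embeddings table):

```ts
// Minimal RAG retrieval sketch. The endpoint and payload follow Ollama's
// embeddings API; the Memory/Embedded shapes are assumptions for illustration.
type Memory = { id: string; kind: "profile" | "episode" | "knowledge"; content: string };
type Embedded = { memory: Memory; vector: number[] };

async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://studio:8000/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "plamo-embedding-1b", prompt: text }),
  });
  const { embedding } = await res.json(); // Ollama returns { embedding: number[] }
  return embedding;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function retrieve(query: string, all: Embedded[], topK = 8, threshold = 0.3) {
  const q = await embed(query); // 1. embed the query
  return all
    .map((e) => ({ memory: e.memory, score: cosine(q, e.vector) })) // 2. similarity
    .filter((r) => r.score >= threshold) // 3a. drop weak matches
    .sort((a, b) => b.score - a.score)   // 3b. rank by score
    .slice(0, topK); // cap at topK; profile memories are added to the context separately
}
```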

Configuration Options

| Option | Type | Description | Default |
| --- | --- | --- | --- |
| topK | number | Maximum memories to retrieve | 8 |
| threshold | number | Minimum similarity score (0.0-1.0) | 0.3 |

Example Context Output

```
[User Profile]
- Preferred language: Japanese
- Interests: programming, music

[Related Information]
- [Important] (Note) User works at a tech company
- (Recent) Asked about weather forecast yesterday
```

Memory Extraction

The system automatically extracts memories from conversations using the LLM.

How It Works

  1. Analyze Conversation: Send recent messages to LLM for analysis
  2. Extract Information: LLM identifies memorable facts and updates
  3. Validate Results: Filter by confidence score
  4. Store Memories: Save to database with embeddings
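
Put together, the flow is a short loop. A rough sketch, where chat, embed, and saveMemory are hypothetical stand-ins for the real LLM provider and persistence layer, and the JSON reply shape is an assumption:

```ts
// Hypothetical extraction loop; chat(), embed(), and saveMemory() stand in
// for the real LLM provider and persistence layer.
type Extracted = {
  kind: "profile" | "episode" | "knowledge";
  title: string;
  content: string;
  confidence: number;
};

declare function chat(system: string, user: string): Promise<string>;
declare function embed(text: string): Promise<number[]>;
declare function saveMemory(m: Extracted & { embedding: number[] }): Promise<void>;

async function extractMemories(
  systemPrompt: string, // loaded from config/locales/[lang]/memory.yaml
  conversation: string, // recent messages, formatted with the configured labels
  minConfidence = 0.5,
): Promise<void> {
  const reply = await chat(systemPrompt, conversation); // 1-2. analyze and extract
  const candidates: Extracted[] = JSON.parse(reply);    // assumes a JSON array reply
  for (const c of candidates) {
    if (c.confidence < minConfidence) continue;         // 3. validate by confidence
    await saveMemory({ ...c, embedding: await embed(c.content) }); // 4. store
  }
}
```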

Configuration Options

| Option | Type | Description | Default |
| --- | --- | --- | --- |
| autoExtract | boolean | Enable automatic extraction | true |
| minConfidence | number | Minimum confidence for saving (0.0-1.0) | 0.5 |

Prompt Configuration

Extraction prompts are configured in config/locales/[lang]/memory.yaml:

```yaml
memory:
  extraction:
    # System prompt for the LLM. The Japanese reads: "You are an expert
    # at extracting important information from conversations. ..."
    systemPrompt: |
      あなたは会話から重要な情報を抽出する専門家です。
      ...

    # Labels for the user prompt
    labels:
      user: ユーザー                            # "User"
      assistant: アシスタント                  # "Assistant"
      conversationHistory: "## 会話履歴"        # "Conversation history"
      existingProfiles: "## 既存の Profile"     # "Existing profiles"
      relatedMemories: "## 関連する既存の記憶"  # "Related existing memories"
      # The task block reads: "## Task — From the conversation above,
      # extract the information that should be saved as memories. ..."
      task: |
        ## タスク
        上記の会話から、記憶として保存すべき情報を抽出してください。
        ...
```

This allows customizing the extraction behavior without modifying code.

Extraction Process

The LLM is prompted to extract:

  • Profile Updates: Changes to user preferences or traits
  • New Memories: Facts worth remembering
  • Archive Candidates: Outdated or superseded information
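
The exact reply format isn't documented here, but a result type covering those three categories might look like this (all field names are assumptions for illustration):

```ts
// Assumed shape of an extraction result; field names are illustrative,
// not the actual wire format used by Mirror Mate.
interface ExtractionResult {
  profileUpdates: { title: string; content: string; confidence: number }[];
  newMemories: {
    kind: "episode" | "knowledge";
    title: string;
    content: string;
    confidence: number;
  }[];
  archiveCandidates: { id: string; reason: string }[]; // outdated or superseded
}
```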

Database Schema

Mirror Mate uses SQLite with Drizzle ORM for persistence.

Tables

| Table | Description |
| --- | --- |
| users | User accounts |
| sessions | Conversation sessions |
| messages | Chat messages |
| memories | Stored memories |
| memory_embeddings | Vector embeddings for semantic search |

Memory Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier |
| userId | string | Owner user ID |
| kind | enum | profile, episode, or knowledge |
| title | string | Memory title/key |
| content | string | Memory content |
| tags | string[] | Categorization tags |
| importance | number | Importance score (0.0-1.0) |
| status | enum | active, archived, or deleted |
| source | enum | manual or extracted |
| createdAt | datetime | Creation timestamp |
| updatedAt | datetime | Last update timestamp |
| lastUsedAt | datetime | Last retrieval timestamp |
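
As a rough Drizzle sketch of the memories table (column names follow the fields above; the actual schema in the repo may differ):

```ts
// Hypothetical Drizzle schema matching the fields above; the actual
// schema in the repo may differ in names and column types.
import { sqliteTable, text, integer, real } from "drizzle-orm/sqlite-core";

export const memories = sqliteTable("memories", {
  id: text("id").primaryKey(),
  userId: text("user_id").notNull(),
  kind: text("kind", { enum: ["profile", "episode", "knowledge"] }).notNull(),
  title: text("title").notNull(),
  content: text("content").notNull(),
  tags: text("tags", { mode: "json" }).$type<string[]>(),
  importance: real("importance"), // 0.0-1.0
  status: text("status", { enum: ["active", "archived", "deleted"] }).notNull(),
  source: text("source", { enum: ["manual", "extracted"] }).notNull(),
  createdAt: integer("created_at", { mode: "timestamp" }).notNull(),
  updatedAt: integer("updated_at", { mode: "timestamp" }).notNull(),
  lastUsedAt: integer("last_used_at", { mode: "timestamp" }),
});
```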

Memory Management UI

Access the memory management interface at /control/memory.

Features

  • View Memories: List all memories with filtering
  • Create Memory: Manually add new memories
  • Edit Memory: Update existing memories
  • Delete Memory: Soft delete or permanently remove
  • Filter: By type (profile/episode/knowledge) and status

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/memories | List memories |
| POST | /api/memories | Create memory |
| GET | /api/memories/[id] | Get memory details |
| PUT | /api/memories/[id] | Update memory |
| DELETE | /api/memories/[id] | Delete memory |

Query Parameters

GET /api/memories

| Parameter | Type | Description |
| --- | --- | --- |
| userId | string | Filter by user ID |
| kind | string | Filter by type (profile/episode/knowledge) |
| status | string | Filter by status (active/archived/deleted) |

DELETE /api/memories/[id]

| Parameter | Type | Description |
| --- | --- | --- |
| hard | boolean | If true, permanently delete |
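
For example, listing a user's active profile memories and then deleting one could look like this (the base URL is the default dev address from the setup below; "alice" and "mem_123" are placeholders):

```ts
// Example calls against the endpoints above; the base URL is the default
// dev server address, and the IDs are placeholders.
const BASE = "http://localhost:3000";

// List active profile memories for a user.
const memories = await fetch(
  `${BASE}/api/memories?userId=alice&kind=profile&status=active`,
).then((r) => r.json());

// Soft-delete a memory (omit hard=true to keep it recoverable).
await fetch(`${BASE}/api/memories/mem_123`, { method: "DELETE" });

// Permanently delete.
await fetch(`${BASE}/api/memories/mem_123?hard=true`, { method: "DELETE" });
```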

Setup

1. Set Up Embedding Service

Option A: PLaMo-Embedding-1B (Recommended for Japanese)

See Recommended Setup for PLaMo server setup on Mac Studio.

Option B: Ollama with bge-m3 (Alternative)

```bash
# Start Ollama
ollama serve

# Pull the embedding model
ollama pull bge-m3
```
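
To confirm the model responds before wiring it into Mirror Mate, a quick smoke test against the Ollama-compatible embeddings route (for PLaMo, swap in http://studio:8000 and plamo-embedding-1b):

```ts
// Smoke test for the embedding endpoint via Ollama's /api/embeddings API.
const res = await fetch("http://localhost:11434/api/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "bge-m3", prompt: "hello" }),
});
const { embedding } = await res.json();
console.log(`embedding dimension: ${embedding.length}`);
```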

2. Initialize Database

```bash
# Create data directory
mkdir -p data

# Run database migration
bun run db:push
```

3. Configure Providers

Edit config/providers.yaml:

```yaml
providers:
  embedding:
    enabled: true
    provider: ollama  # PLaMo server provides an Ollama-compatible API
    ollama:
      model: plamo-embedding-1b
      baseUrl: "http://studio:8000"  # PLaMo (or http://localhost:11434 for Ollama)

  memory:
    enabled: true
    rag:
      topK: 8
      threshold: 0.3
    extraction:
      autoExtract: true
      minConfidence: 0.5
```

4. Verify Setup

```bash
# Start the development server
bun run dev

# Open memory management
open http://localhost:3000/control/memory
```

Docker Setup

When running in Docker, the database is persisted in a volume:

```yaml
# compose.yaml
services:
  mirrormate:
    volumes:
      - mirrormate-data:/app/data

volumes:
  mirrormate-data:
```

Configure embedding to use PLaMo server:

```yaml
# config/providers.yaml
providers:
  embedding:
    enabled: true
    provider: ollama  # PLaMo server provides an Ollama-compatible API
    ollama:
      model: plamo-embedding-1b
      baseUrl: "http://studio:8000"  # PLaMo embedding server
```

See Docker Documentation and Recommended Setup for details.


Troubleshooting

Embedding Service Not Available

Error: Ollama embed API error: 404 or connection refused

Solution (PLaMo):

  1. Check that the PLaMo server is running: curl http://studio:8000/health
  2. View logs: docker compose -f compose.studio.yaml logs plamo-embedding

Solution (Ollama/bge-m3):

  1. Ensure Ollama is running: ollama serve
  2. Pull the model: ollama pull bge-m3
  3. Verify the model exists: ollama list

Database Not Found

Error: SQLITE_CANTOPEN

Solution:

  1. Create data directory: mkdir -p data
  2. Run migration: bun run db:push

Memory Not Being Extracted

Solution:

  1. Check memory.enabled is true in config
  2. Check extraction.autoExtract is true
  3. Verify LLM provider is working
  4. Check console logs for extraction errors

Low Quality Retrieval

Solution:

  1. Lower the threshold value (e.g., 0.2)
  2. Increase the topK value
  3. Add more profile memories for better context
  4. Use a higher quality embedding model

Best Practices

Memory Organization

  1. Use profile memories for persistent info: Things that rarely change
  2. Use episode memories for recent events: Time-sensitive information
  3. Use knowledge memories for facts: Learned information

Performance Tips

  1. Set appropriate thresholds: Too low = irrelevant results, too high = missing context
  2. Keep topK reasonable: 5-10 is usually sufficient
  3. Periodic cleanup: Archive or delete outdated memories

Privacy Considerations

  1. Review extracted memories: Check what the LLM is storing
  2. Manual cleanup: Remove sensitive information if needed
  3. User-specific memories: Memories are scoped to user IDs
