# Providers

Mirror Mate uses external providers for LLM (language model), TTS (text-to-speech), STT (speech-to-text), VLM (vision language model), and Embedding (vector generation). Providers are configured in `config/providers.yaml`.

## Configuration

```yaml
providers:
  llm:
    enabled: true
    provider: ollama  # openai or ollama
    # ...

  tts:
    enabled: true
    provider: voicevox  # openai or voicevox
    # ...

  stt:
    enabled: true
    provider: web  # openai, local, or web
    # ...

  vlm:
    enabled: true
    provider: ollama
    # ...

  embedding:
    enabled: true
    provider: ollama
    # ...

  memory:
    enabled: true
    # ...
```

## LLM Providers

| Provider | Description | API Key Required |
|----------|-------------|------------------|
| OpenAI | GPT-4o, GPT-4o-mini | Yes |
| Ollama | Local LLM hosting | No |

### OpenAI

1. Get an API key from OpenAI
2. Add it to `.env`:

   ```bash
   OPENAI_API_KEY=sk-your-api-key-here
   ```

3. Configure in `providers.yaml`:

   ```yaml
   providers:
     llm:
       enabled: true
       provider: openai
       openai:
         model: gpt-4o-mini  # or gpt-4o
         maxTokens: 300
         temperature: 0.7
   ```

#### Models

| Model | Description | Speed | Cost |
|-------|-------------|-------|------|
| gpt-4o | Most capable | Medium | Higher |
| gpt-4o-mini | Fast and efficient | Fast | Lower |

### Ollama

Ollama allows running LLMs locally without API costs.

1. Install Ollama:

   ```bash
   # macOS
   brew install ollama

   # Linux
   curl -fsSL https://ollama.com/install.sh | sh
   ```

2. Start the Ollama server:

   ```bash
   ollama serve
   ```

3. Pull a model:

   ```bash
   ollama pull gpt-oss:20b
   ```

4. Configure in `providers.yaml`:

   ```yaml
   providers:
     llm:
       enabled: true
       provider: ollama
       ollama:
         model: "gpt-oss:20b"
         baseUrl: "http://localhost:11434"
         maxTokens: 300
         temperature: 0.7
   ```

| Model | Size | Japanese Quality | Tool Calling | Speed |
|-------|------|------------------|--------------|-------|
| gpt-oss:20b | 20B | Excellent | Native | Medium |
| qwen2.5:14b | 14B | Very Good | Yes | Medium |
| qwen2.5:32b | 32B | Excellent | Yes | Slow |
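
Once a model is pulled, you can sanity-check the server with a direct call to Ollama's `/api/chat` endpoint, the same API the `baseUrl` above points at. A minimal sketch:

```typescript
// Minimal sketch: one non-streaming chat completion against a local Ollama server.
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-oss:20b",
    messages: [{ role: "user", content: "Hello!" }],
    stream: false, // return a single JSON object instead of NDJSON chunks
  }),
});
const data = await res.json();
console.log(data.message.content);
```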

### LLM Options

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| `provider` | string | `openai` or `ollama` | `openai` |
| `model` | string | Model name/ID | varies |
| `maxTokens` | number | Maximum response length | 300 |
| `temperature` | number | Creativity (0.0-1.0) | 0.7 |
| `baseUrl` | string | API endpoint (Ollama only) | `http://localhost:11434` |

## TTS Providers

| Provider | Description | API Key Required |
|----------|-------------|------------------|
| OpenAI | OpenAI TTS API | Yes |
| VOICEVOX | Free, local, Japanese voices | No |

### OpenAI TTS

```yaml
providers:
  tts:
    enabled: true
    provider: openai
    openai:
      voice: shimmer  # alloy, echo, fable, onyx, nova, shimmer
      model: tts-1    # tts-1 or tts-1-hd
      speed: 0.95
```

#### Voices

| Voice | Description |
|-------|-------------|
| alloy | Neutral, balanced |
| echo | Warm, conversational |
| fable | Expressive, narrative |
| onyx | Deep, authoritative |
| nova | Friendly, upbeat |
| shimmer | Clear, gentle (default) |
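
For reference, these options map onto the request body of OpenAI's `/v1/audio/speech` endpoint. A minimal sketch of an equivalent raw call (illustration only, not Mirror Mate's internal code):

```typescript
// Minimal sketch: synthesize speech with OpenAI's /v1/audio/speech endpoint.
const res = await fetch("https://api.openai.com/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // key from .env
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "tts-1",   // or tts-1-hd
    voice: "shimmer",
    input: "Hello from Mirror Mate!",
    speed: 0.95,
  }),
});
const audio = await res.arrayBuffer(); // MP3 bytes by default
```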

### VOICEVOX

1. Download and install VOICEVOX from voicevox.hiroshiba.jp
2. Start VOICEVOX (runs on port 50021 by default)
3. Configure:

   ```yaml
   providers:
     tts:
       enabled: true
       provider: voicevox
       voicevox:
         speaker: 3  # Speaker ID
         baseUrl: "http://localhost:50021"
   ```

#### Speaker IDs (Common)

| ID | Character (Style) |
|----|-------------------|
| 0 | 四国めたん (あまあま / Sweet) |
| 1 | ずんだもん (あまあま / Sweet) |
| 2 | 四国めたん (ノーマル / Normal) |
| 3 | ずんだもん (ノーマル / Normal) |
| 8 | 春日部つむぎ |
| 9 | 波音リツ |
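
For reference, VOICEVOX synthesis is a two-step HTTP exchange against the engine at `baseUrl`: `/audio_query` builds synthesis parameters from text, then `/synthesis` renders them to WAV. A minimal sketch for testing the engine directly (text and speaker are placeholders):

```typescript
// Minimal sketch of the two-step VOICEVOX engine API (audio_query -> synthesis).
const BASE = "http://localhost:50021";
const speaker = 3;
const text = "こんにちは"; // "Hello"

// Step 1: build synthesis parameters from the text.
const queryRes = await fetch(
  `${BASE}/audio_query?speaker=${speaker}&text=${encodeURIComponent(text)}`,
  { method: "POST" },
);
const query = await queryRes.json();

// Step 2: render the parameters to WAV audio.
const audioRes = await fetch(`${BASE}/synthesis?speaker=${speaker}`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(query),
});
const wav = await audioRes.arrayBuffer(); // WAV bytes ready for playback
```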

## STT Providers (Speech-to-Text)

STT providers enable speech recognition for voice input. Mirror Mate supports multiple providers with automatic silence detection.

> **Note:** STT language settings can be automatically configured based on your app locale using Locale Presets. When you change your locale (e.g., `ja` to `en`), the STT language is automatically updated.

| Provider | Description | API Key Required | Accuracy |
|----------|-------------|------------------|----------|
| Web Speech API | Browser native (Chrome/Edge) | No | Good |
| OpenAI Whisper | Cloud API | Yes | Excellent |
| Local Whisper | Self-hosted (faster-whisper) | No | Excellent |

### Web Speech API (Default)

Uses the browser's built-in speech recognition. Best for quick setup with no additional configuration.

```yaml
providers:
  stt:
    enabled: true
    provider: web
    web:
      language: ja-JP  # BCP 47 language tag
```

- **Pros:** Zero cost, instant setup, real-time interim results
- **Cons:** Browser-dependent quality, requires Chrome/Edge
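
For the curious, the browser API behind the `web` provider looks roughly like this; a minimal sketch using the (Chrome/Edge-prefixed) `SpeechRecognition` interface, not Mirror Mate's actual code:

```typescript
// Minimal sketch of the browser SpeechRecognition API.
// webkitSpeechRecognition is the prefixed name used by Chrome/Edge.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
const recognition = new SpeechRecognitionImpl();
recognition.lang = "ja-JP";        // matches web.language in providers.yaml
recognition.interimResults = true; // stream partial transcripts as you speak

recognition.onresult = (event: any) => {
  const result = event.results[event.results.length - 1];
  console.log(result.isFinal ? "final:" : "interim:", result[0].transcript);
};
recognition.start();
```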

### OpenAI Whisper

High-accuracy speech recognition using OpenAI's Whisper API.

```yaml
providers:
  stt:
    enabled: true
    provider: openai
    openai:
      model: whisper-1
      language: ja  # ISO 639-1 code (or omit for auto-detect)
      temperature: 0
```

- **Pros:** Excellent accuracy (especially Japanese), 99+ languages
- **Cons:** API cost ($0.006/minute), requires internet
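
Under the hood this is a single multipart upload. A minimal sketch of an equivalent request (`audioBlob` is a placeholder for captured audio); the same shape works against OpenAI-compatible servers such as faster-whisper-server:

```typescript
// Minimal sketch: transcribe one audio clip via the OpenAI-compatible
// /v1/audio/transcriptions endpoint.
async function transcribe(audioBlob: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audioBlob, "speech.webm");
  form.append("model", "whisper-1");
  form.append("language", "ja"); // omit to auto-detect

  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  const data = await res.json();
  return data.text;
}
```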

### Local Whisper (faster-whisper)

Self-hosted Whisper for privacy and cost savings. Uses faster-whisper-server, which exposes an OpenAI-compatible API.

```yaml
providers:
  stt:
    enabled: true
    provider: local
    local:
      baseUrl: "http://studio:8080"  # Your whisper server
      model: large-v3  # tiny, base, small, medium, large-v3
      language: ja
```

#### Setup with Docker

```bash
# On Mac Studio (or any server)
docker compose -f compose.studio.yaml up -d faster-whisper
```

See Docker Documentation for details.

#### Models

| Model | Size | Accuracy | Speed (30s audio) |
|-------|------|----------|-------------------|
| tiny | 39M | Low | ~2s |
| base | 74M | Medium | ~4s |
| small | 244M | Good | ~8s |
| medium | 769M | Very Good | ~12s |
| large-v3 | 1.5G | Excellent | ~15s |

*Speed measured on Apple M1/M2 Ultra (CPU mode).*

### Silence Detection

All STT providers support automatic silence detection to determine when the user has finished speaking.

```yaml
providers:
  stt:
    silenceDetection:
      silenceThreshold: 1.5       # Seconds of silence before sending
      volumeThreshold: 0.02       # RMS volume threshold (0-1)
      minRecordingDuration: 500   # Minimum recording time (ms)
      maxRecordingDuration: 60000 # Maximum recording time (ms)
```

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| `silenceThreshold` | number | Seconds of silence before auto-send | 1.5 |
| `volumeThreshold` | number | RMS volume below which audio counts as silence | 0.02 |
| `minRecordingDuration` | number | Min time before silence detection (ms) | 500 |
| `maxRecordingDuration` | number | Max recording duration (ms) | 60000 |
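
To see how these options interact, here is a schematic RMS-based detector using the Web Audio API; a simplified sketch, not Mirror Mate's actual implementation:

```typescript
// Simplified RMS-based silence detector using the Web Audio API.
// Thresholds mirror the silenceDetection defaults above; cleanup omitted.
async function detectSilence(onDone: () => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  ctx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Float32Array(analyser.fftSize);
  const start = Date.now();
  let silentSince: number | null = null;

  const timer = setInterval(() => {
    analyser.getFloatTimeDomainData(samples);
    const rms = Math.sqrt(samples.reduce((s, x) => s + x * x, 0) / samples.length);
    const elapsed = Date.now() - start;

    if (rms < 0.02) {                          // volumeThreshold
      silentSince ??= Date.now();
      const silentFor = (Date.now() - silentSince) / 1000;
      if (elapsed > 500 && silentFor > 1.5) {  // minRecordingDuration, silenceThreshold
        clearInterval(timer);
        onDone();
      }
    } else {
      silentSince = null; // speech resumed; reset the silence timer
    }
    if (elapsed > 60000) { clearInterval(timer); onDone(); } // maxRecordingDuration
  }, 100);
}
```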

### STT Options Summary

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| `provider` | string | `web`, `openai`, or `local` | `web` |
| `openai.model` | string | Whisper model | whisper-1 |
| `openai.language` | string | Language code (ISO 639-1) | auto |
| `local.baseUrl` | string | Whisper server URL | `http://localhost:8080` |
| `local.model` | string | Model name | base |
| `local.language` | string | Language code | auto |

## VLM Providers (Vision Language Model)

VLM providers enable visual understanding through the `see_camera` tool.

| Provider | Description | API Key Required |
|----------|-------------|------------------|
| Ollama | Local vision models (llava, moondream) | No |

### Ollama VLM

```yaml
providers:
  vlm:
    enabled: true
    provider: ollama
    ollama:
      model: llava:7b  # or moondream, granite3.2-vision
      baseUrl: "http://localhost:11434"
```

| Model | Size | Description | Speed |
|-------|------|-------------|-------|
| moondream | 1.8B | Lightweight, edge-friendly | Fast |
| llava:7b | 7B | Good balance of quality/speed | Medium |
| granite3.2-vision | 2B | Document understanding | Medium |

### Usage

When VLM is enabled and the user asks visual questions, the LLM will use the `see_camera` tool:

```
User: "何を持ってるかわかる?"          (Can you tell what I'm holding?)
AI:   [calls see_camera tool]
AI:   "スマートフォンを持っていますね!"  (You're holding a smartphone!)
```

## Embedding Providers

Embedding providers generate vector representations of text for semantic search.

| Provider | Description | API Key Required |
|----------|-------------|------------------|
| Ollama | Local embedding models | No |

### Ollama Embedding

```yaml
providers:
  embedding:
    enabled: true
    provider: ollama  # PLaMo server provides Ollama-compatible API
    ollama:
      model: plamo-embedding-1b
      baseUrl: "http://studio:8000"  # PLaMo embedding server
```

| Model | Dimensions | Description |
|-------|------------|-------------|
| plamo-embedding-1b | 2048 | Japanese-optimized, top JMTEB scores (recommended) |
| bge-m3 | 1024 | Multilingual, good quality (alternative) |
| nomic-embed-text | 768 | Fast, English-focused |
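
Since the PLaMo server advertises an Ollama-compatible API (see the config comment above), you should be able to probe either backend with an Ollama-style embeddings call. A minimal sketch (base URL taken from the PLaMo config; assumed to accept the same request shape as Ollama):

```typescript
// Minimal sketch: fetch one embedding vector via the Ollama-style API.
const res = await fetch("http://studio:8000/api/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "plamo-embedding-1b", prompt: "こんにちは" }),
});
const { embedding } = await res.json(); // number[]; 2048 dims for plamo-embedding-1b
console.log(embedding.length);
```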

### Setup

**Option 1: PLaMo Embedding Server (Recommended)**

PLaMo-Embedding-1B provides superior Japanese text embedding. See Recommended Setup for full instructions.

```bash
# On Mac Studio
docker compose -f compose.studio.yaml up -d
```

**Option 2: Ollama with bge-m3 (Alternative)**

```bash
ollama serve
ollama pull bge-m3
```

```yaml
providers:
  embedding:
    enabled: true
    provider: ollama
    ollama:
      model: bge-m3
      baseUrl: "http://localhost:11434"
```

## Memory Configuration

The memory system enables persistent user context through RAG (Retrieval-Augmented Generation).

```yaml
providers:
  memory:
    enabled: true
    # RAG settings
    rag:
      topK: 8           # Max memories to retrieve
      threshold: 0.3    # Minimum similarity score (0.0-1.0)
    # Memory extraction settings
    extraction:
      autoExtract: true      # Auto-extract from conversations
      minConfidence: 0.5     # Minimum confidence for extraction
```

### Options

| Option | Type | Description | Default |
|--------|------|-------------|---------|
| `enabled` | boolean | Enable memory system | true |
| `rag.topK` | number | Max memories to retrieve per query | 8 |
| `rag.threshold` | number | Similarity threshold (0.0-1.0) | 0.3 |
| `extraction.autoExtract` | boolean | Auto-extract memories from conversations | true |
| `extraction.minConfidence` | number | Minimum confidence for extraction | 0.5 |
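
To make `rag.topK` and `rag.threshold` concrete: retrieval scores stored memory vectors against the query embedding, drops anything below the threshold, and keeps the best `topK`. A schematic sketch, not the actual implementation:

```typescript
// Schematic RAG retrieval: cosine-score memories, filter by threshold, keep topK.
interface Memory { text: string; vector: number[]; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], memories: Memory[], topK = 8, threshold = 0.3) {
  return memories
    .map((m) => ({ memory: m, score: cosine(query, m.vector) }))
    .filter((r) => r.score >= threshold)  // rag.threshold
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);                      // rag.topK
}
```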

### Memory Types

| Type | Description |
|------|-------------|
| profile | User preferences, traits, persistent info |
| episode | Recent interactions and events |
| knowledge | Facts and learned information |

See Memory Documentation for details.


## Remote Server Configuration

Recommended setup: Run heavy services (Ollama, VOICEVOX, PLaMo) on a powerful server (e.g., Mac Studio) and connect via Tailscale:

```yaml
# config/providers.yaml
providers:
  llm:
    provider: ollama
    ollama:
      model: "gpt-oss:20b"
      baseUrl: "http://studio:11434"  # Tailscale hostname

  tts:
    provider: voicevox
    voicevox:
      speaker: 3
      baseUrl: "http://studio:50021"  # Tailscale hostname

  embedding:
    enabled: true
    provider: ollama  # PLaMo server provides Ollama-compatible API
    ollama:
      model: plamo-embedding-1b
      baseUrl: "http://studio:8000"  # PLaMo embedding server

  memory:
    enabled: true
    rag:
      topK: 8
      threshold: 0.3
    extraction:
      autoExtract: true
      minConfidence: 0.5
```
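
To verify the client machine can actually reach those services over Tailscale, a quick reachability sketch (Ollama's `/api/tags` lists installed models; VOICEVOX's `/version` returns the engine version):

```typescript
// Quick reachability check for the remote providers configured above.
// Hostname "studio" is assumed to resolve via Tailscale MagicDNS.
for (const url of [
  "http://studio:11434/api/tags", // Ollama: list installed models
  "http://studio:50021/version",  // VOICEVOX: engine version
]) {
  const res = await fetch(url);
  console.log(url, res.ok ? "OK" : `HTTP ${res.status}`);
}
```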

See Docker Documentation for details.
