⚙️ Configuration

Complete configuration reference


ROBO CODED — This documentation was made with AI and may not be 100% sane. But the code does work! 🎉

⚙️ Configuration Reference

All configuration is done via environment variables. This page documents every available option.


📞 SIP Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| SIP_USER | Yes | ai-assistant | SIP account username |
| SIP_PASSWORD | Yes | - | SIP account password |
| SIP_DOMAIN | Yes | localhost | SIP server domain/IP |
| SIP_PORT | No | 5060 | SIP server port |
| SIP_TRANSPORT | No | udp | Transport: udp, tcp, tls |
| SIP_REGISTRAR | No | - | Optional separate registrar |

Example:

# 📞 SIP Connection
SIP_USER=ai-assistant
SIP_PASSWORD=super-secret-password
SIP_DOMAIN=pbx.example.com
SIP_PORT=5060
SIP_TRANSPORT=udp

🎤 Speaches Settings (STT + TTS)

This project uses Speaches as a unified speech server.

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| SPEACHES_API_URL | Yes | http://localhost:8001 | Speaches server URL |
| STT_MODE | No | batch | batch or realtime |
| WHISPER_MODEL | No | Systran/faster-distil-whisper-small.en | Whisper model |
| WHISPER_LANGUAGE | No | en | Language code |

🎯 STT Modes

| Mode | Description | Recommended |
|------|-------------|-------------|
| batch | Buffer audio locally, send on silence | Yes |
| realtime | Stream continuously to server | ⚠️ Experimental |

Example:

# 🎤 Speech Recognition
SPEACHES_API_URL=http://speaches:8001
STT_MODE=batch
WHISPER_MODEL=Systran/faster-distil-whisper-small.en
WHISPER_LANGUAGE=en
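In batch mode, the agent buffers caller audio locally and sends a complete utterance for transcription once enough trailing silence accumulates (see SILENCE_TIMEOUT_MS below). A minimal sketch of that flow, assuming a 20 ms frame size and a stand-in `is_speech` check (neither is the project's actual VAD):

```python
# Sketch of the batch STT flow: buffer frames, flush an utterance to the
# transcriber once SILENCE_TIMEOUT_MS of silence has passed.
FRAME_MS = 20             # assumed frame duration
SILENCE_TIMEOUT_MS = 750  # default from the audio settings below

def batch_segments(frames, is_speech):
    """Yield buffered utterances, flushing after SILENCE_TIMEOUT_MS of silence."""
    buffer, silence_ms = [], 0
    for frame in frames:
        if is_speech(frame):
            buffer.append(frame)
            silence_ms = 0
        elif buffer:
            silence_ms += FRAME_MS
            if silence_ms >= SILENCE_TIMEOUT_MS:
                yield buffer
                buffer, silence_ms = [], 0
    if buffer:  # flush whatever remains at end of stream
        yield buffer

# Toy input: 1 = speech frame, 0 = silence frame
frames = [1] * 10 + [0] * 40 + [1] * 5
segments = list(batch_segments(frames, is_speech=lambda f: f == 1))
print([len(s) for s in segments])  # → [10, 5]
```

Each yielded segment would then be posted to the Speaches transcription endpoint as one request.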

🔊 TTS Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| TTS_MODEL | No | speaches-ai/Kokoro-82M-v1.0-ONNX | TTS model |
| TTS_VOICE | No | af_heart | Voice ID |
| TTS_SPEED | No | 1.0 | Speech speed (0.5-2.0) |
| TTS_RESPONSE_FORMAT | No | wav | Format: wav, mp3, opus |

Example:

# 🔊 Text-to-Speech
TTS_MODEL=speaches-ai/Kokoro-82M-v1.0-ONNX
TTS_VOICE=af_heart
TTS_SPEED=1.0
TTS_RESPONSE_FORMAT=wav

Available voices:

af_heart    - American Female (warm)
af_bella    - American Female (professional)
am_adam     - American Male (casual)
am_michael  - American Male (professional)
bf_emma     - British Female
bm_george   - British Male

🧠 LLM Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| LLM_BASE_URL | Yes | http://vllm:8000/v1 | OpenAI-compatible API URL |
| LLM_MODEL | Yes | openai-community/gpt2-xl | Model name |
| LLM_API_KEY | No | not-needed | API key (if required) |
| LLM_BACKEND | No | vllm | Backend type |
| LLM_MAX_TOKENS | No | 512 | Max response tokens |
| LLM_TEMPERATURE | No | 0.6 | Creativity (0.0-1.0) |
| LLM_TOP_P | No | 0.85 | Nucleus sampling |

Example configurations:

# Using vLLM with GPT-2 XL
LLM_BASE_URL=http://vllm:8000/v1
LLM_MODEL=openai-community/gpt2-xl
LLM_MAX_TOKENS=512
LLM_TEMPERATURE=0.6

# Using OpenAI API
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-4
LLM_API_KEY=sk-your-api-key

# Using Ollama
LLM_BASE_URL=http://ollama:11434/v1
LLM_MODEL=llama3.1
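Whichever backend you point at, the LLM_* settings map onto a standard OpenAI-compatible chat-completions request posted to LLM_BASE_URL. A sketch of that mapping (the helper name is illustrative, not the project's code; an empty env dict is passed here just to show the defaults):

```python
# How the LLM_* variables might become a chat-completions payload.
import os

def build_chat_request(messages, env=os.environ):
    return {
        "model": env.get("LLM_MODEL", "openai-community/gpt2-xl"),
        "messages": messages,
        "max_tokens": int(env.get("LLM_MAX_TOKENS", "512")),
        "temperature": float(env.get("LLM_TEMPERATURE", "0.6")),
        "top_p": float(env.get("LLM_TOP_P", "0.85")),
    }

payload = build_chat_request(
    [{"role": "user", "content": "What's the weather?"}], env={}
)
print(payload["model"], payload["max_tokens"])  # → openai-community/gpt2-xl 512
```

The payload would be POSTed to `{LLM_BASE_URL}/chat/completions` with LLM_API_KEY as the bearer token when the backend requires one.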

🎮 Recommended Models by GPU

NVIDIA H100 / A100 (80GB HBM)

Data center GPUs with maximum performance.

| Component | Model | Notes |
|-----------|-------|-------|
| LLM | meta-llama/Llama-3.1-70B-Instruct | Best quality |
| LLM | Qwen/Qwen2.5-72B-Instruct | Alternative |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm voice |

LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart

NVIDIA DGX Spark (128GB Unified)

Grace Blackwell GB10 with shared CPU/GPU memory.

| Component | Model | Notes |
|-----------|-------|-------|
| LLM | meta-llama/Llama-3.1-70B-Instruct | Fits unified memory |
| LLM | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | Reasoning focused |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm voice |

LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart

NVIDIA RTX 5090 (32GB GDDR7)

Next-gen consumer flagship.

| Component | Model | Notes |
|-----------|-------|-------|
| LLM | Qwen/Qwen2.5-32B-Instruct | Best for 32GB |
| LLM | mistralai/Mistral-Small-24B-Instruct-2501 | Good balance |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm voice |

LLM_MODEL=Qwen/Qwen2.5-32B-Instruct
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart

NVIDIA RTX 4090 (24GB GDDR6X)

Current consumer flagship.

| Component | Model | Notes |
|-----------|-------|-------|
| LLM | Qwen/Qwen2.5-14B-Instruct | Best for 24GB |
| LLM | meta-llama/Llama-3.1-8B-Instruct | Faster |
| STT | Systran/faster-whisper-large-v3 | Best accuracy |
| TTS | af_heart | Warm voice |

LLM_MODEL=Qwen/Qwen2.5-14B-Instruct
STT_MODEL=Systran/faster-whisper-large-v3
TTS_VOICE=af_heart

NVIDIA RTX 3090 / 4080 (16-24GB)

High-end consumer GPUs.

| Component | Model | Notes |
|-----------|-------|-------|
| LLM | meta-llama/Llama-3.1-8B-Instruct | Best for 16-24GB |
| LLM | Qwen/Qwen2.5-7B-Instruct | Faster |
| STT | Systran/faster-whisper-medium | Good balance |
| TTS | af_heart | Warm voice |

LLM_MODEL=meta-llama/Llama-3.1-8B-Instruct
STT_MODEL=Systran/faster-whisper-medium
TTS_VOICE=af_heart

NVIDIA RTX 3080 / 4070 (10-12GB)

Mid-range GPUs.

| Component | Model | Notes |
|-----------|-------|-------|
| LLM | Qwen/Qwen2.5-7B-Instruct | Best for 10-12GB |
| LLM | microsoft/Phi-3-mini-4k-instruct | Very fast |
| STT | Systran/faster-whisper-small | Low VRAM |
| TTS | af_heart | Warm voice |

LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
STT_MODEL=Systran/faster-whisper-small
TTS_VOICE=af_heart

Low-Latency Stack (Any GPU)

Optimized for fastest response times.

LLM_MODEL=Qwen/Qwen2.5-3B-Instruct
STT_MODEL=Systran/faster-whisper-tiny.en
TTS_VOICE=af_heart
TTS_SPEED=1.1

TTS Voice Options

| Voice | Style | Gender | Accent |
|-------|-------|--------|--------|
| af_heart | Warm, friendly | Female | American |
| af_bella | Professional | Female | American |
| af_sarah | Casual | Female | American |
| am_adam | Neutral | Male | American |
| am_michael | Professional | Male | American |
| bf_emma | Warm | Female | British |
| bm_george | Professional | Male | British |

🎚️ Audio & VAD Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| MIN_SPEECH_DURATION_MS | No | 200 | Min speech to process (ms) |
| MAX_SPEECH_DURATION_S | No | 10.0 | Max utterance length (s) |
| SILENCE_TIMEOUT_MS | No | 750 | Silence before end-of-speech (ms) |
| BARGE_IN_MIN_DURATION | No | 400 | Min duration to interrupt (ms) |
| BARGE_IN_ENERGY_THRESHOLD | No | 2000 | Energy threshold |

Example:

# 🎚️ Audio Processing
MIN_SPEECH_DURATION_MS=200
MAX_SPEECH_DURATION_S=10.0
SILENCE_TIMEOUT_MS=750
BARGE_IN_MIN_DURATION=400
BARGE_IN_ENERGY_THRESHOLD=2000
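The barge-in settings work together: audio only interrupts playback when its energy exceeds BARGE_IN_ENERGY_THRESHOLD for at least BARGE_IN_MIN_DURATION milliseconds. A sketch of the per-frame energy check, assuming signed 16-bit PCM samples and RMS as the energy measure (the agent's exact metric may differ):

```python
# Per-frame energy check behind barge-in detection (illustrative).
import math

BARGE_IN_ENERGY_THRESHOLD = 2000  # default from the table above

def rms_energy(samples):
    """RMS of a frame of signed 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_barge_in_frame(samples, threshold=BARGE_IN_ENERGY_THRESHOLD):
    return rms_energy(samples) >= threshold

quiet = [100, -120, 90, -80]       # background noise
loud = [9000, -8500, 9200, -8800]  # caller speaking over playback
print(is_barge_in_frame(quiet), is_barge_in_frame(loud))  # → False True
```

A single loud frame should not cut off the assistant; frames like these would need to stay above the threshold for BARGE_IN_MIN_DURATION (400 ms by default) before playback is interrupted.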

💬 Conversation Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| MAX_CONVERSATION_TURNS | No | 10 | Max turns before ending |
| CALLBACK_RING_TIMEOUT | No | 30 | Callback ring timeout (s) |

🌤️ Weather (Tempest) Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| TEMPEST_STATION_ID | No | - | WeatherFlow station ID |
| TEMPEST_API_TOKEN | No | - | WeatherFlow API token |

Get your credentials:

  1. Go to tempestwx.com
  2. Navigate to Settings → Data Authorizations
  3. Create a new token
  4. Find your station ID in the URL

Example:

# 🌤️ Weather Station
TEMPEST_STATION_ID=12345
TEMPEST_API_TOKEN=a1b2c3d4-e5f6-7890-abcd-ef1234567890

🔄 API Retry Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| API_RETRY_ATTEMPTS | No | 3 | Retry attempts |
| API_RETRY_BASE_DELAY_S | No | 0.5 | Base retry delay (s) |
| API_RETRY_MAX_DELAY_S | No | 5.0 | Max retry delay (s) |
| API_TIMEOUT_S | No | 30.0 | Request timeout (s) |
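A plausible reading of these settings is exponential backoff: start at the base delay, double per attempt, and cap at the max delay. The exact schedule in the agent may differ; this sketch just shows how the three values interact:

```python
# Backoff schedule implied by the retry settings (illustrative).
def retry_delays(attempts=3, base_s=0.5, max_s=5.0):
    """Delay before each retry: base * 2^i, capped at max_s."""
    return [min(base_s * (2 ** i), max_s) for i in range(attempts)]

print(retry_delays())            # → [0.5, 1.0, 2.0]
print(retry_delays(attempts=5))  # → [0.5, 1.0, 2.0, 4.0, 5.0]
```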

📊 Telemetry Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| LOG_LEVEL | No | INFO | DEBUG, INFO, WARNING, ERROR |
| OTEL_ENABLED | No | true | Enable OpenTelemetry |
| OTEL_EXPORTER_OTLP_ENDPOINT | No | http://otel-collector:4317 | OTLP endpoint |
| OTEL_SERVICE_NAME | No | sip-agent | Service name |

Example:

# 📊 Logging & Telemetry
LOG_LEVEL=INFO
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=sip-agent

💾 Storage Settings

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| DATA_DIR | No | ./data | Persistent data directory |
| REDIS_URL | No | redis://localhost:6379/0 | Redis URL |

🗣️ Phrases Configuration

Customize the assistant's pre-generated phrases for greetings, goodbyes, acknowledgments, and more.

Configuration Methods

Method 1: Environment Variables (JSON array)

PHRASES_GREETINGS=["Hello! How can I help?","Hi there!","Hey!"]
PHRASES_GOODBYES=["Goodbye!","Take care!","See ya!"]

Method 2: Environment Variables (comma-separated)

PHRASES_GREETINGS=Hello! How can I help?,Hi there!,Hey!
PHRASES_GOODBYES=Goodbye!,Take care!,See ya!
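A variable in either format can be handled by trying JSON first and falling back to comma splitting. This parser is a sketch of that idea (the function name is illustrative, and note the comma-separated form cannot contain phrases with embedded commas):

```python
# Accept either a JSON array or a comma-separated list of phrases.
import json

def parse_phrases(raw):
    raw = raw.strip()
    if raw.startswith("["):
        try:
            return [str(p) for p in json.loads(raw)]
        except json.JSONDecodeError:
            pass  # malformed JSON: fall through to comma splitting
    return [p.strip() for p in raw.split(",") if p.strip()]

greetings = parse_phrases('["Hello! How can I help?","Hi there!","Hey!"]')
goodbyes = parse_phrases("Goodbye!,Take care!,See ya!")
print(greetings)  # → ['Hello! How can I help?', 'Hi there!', 'Hey!']
print(goodbyes)   # → ['Goodbye!', 'Take care!', 'See ya!']
```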

Method 3: JSON File (recommended for complex setups)

Create data/phrases.json:

{
  "greetings": [
    "Hello! How can I help you today?",
    "Hi there! What can I do for you?",
    "Hey! What do you need?"
  ],
  "goodbyes": [
    "Goodbye!",
    "Take care!",
    "Have a great day!"
  ],
  "acknowledgments": [
    "Okay.",
    "Got it.",
    "One moment.",
    "Sure.",
    "Copy that."
  ],
  "thinking": [
    "Let me check.",
    "One moment.",
    "Working on it."
  ],
  "errors": [
    "Sorry, I didn't catch that.",
    "Could you repeat that please?",
    "I didn't quite get that."
  ],
  "followups": [
    "Is there anything else I can help with?",
    "Can I help with anything else?",
    "Anything else?"
  ],
  "precache_extra": [
    "Hello",
    "Goodbye",
    "Yes",
    "No",
    "Thank you"
  ]
}

Phrase Categories

| Variable | Category | Description |
|----------|----------|-------------|
| PHRASES_GREETINGS | 👋 Greetings | Played when call is answered |
| PHRASES_GOODBYES | 👋 Goodbyes | Played when ending call |
| PHRASES_ACKNOWLEDGMENTS | ✅ Acknowledgments | Quick responses while processing |
| PHRASES_THINKING | 🤔 Thinking | Played while waiting for LLM |
| PHRASES_ERRORS | ❌ Errors | Played when speech not understood |
| PHRASES_FOLLOWUPS | 🔄 Follow-ups | Played after completing a task |
| PHRASES_PRECACHE | ⚡ Pre-cache | Additional phrases to pre-synthesize |

Example: Custom Personality

Friendly Assistant:

{
  "greetings": [
    "Hey there, friend! What can I do for you?",
    "Hello! I'm so happy to help!",
    "Hi! Ready when you are!"
  ],
  "goodbyes": [
    "Take care! Talk soon!",
    "Bye bye! Have an awesome day!",
    "See you later, alligator!"
  ]
}

Professional Assistant:

{
  "greetings": [
    "Good day. How may I assist you?",
    "Hello. What can I help you with today?",
    "Greetings. Please state your request."
  ],
  "goodbyes": [
    "Thank you for calling. Goodbye.",
    "Have a pleasant day. Goodbye.",
    "Thank you. Take care."
  ]
}

Sassy Robot:

{
  "greetings": [
    "Beep boop! What do you want, human?",
    "State your business, meatbag!",
    "Oh great, another call. What is it?"
  ],
  "goodbyes": [
    "Finally! Goodbye!",
    "Don't let the door hit you!",
    "Bye! Try not to miss me too much!"
  ],
  "errors": [
    "Did you just make a sound? Try again.",
    "My audio sensors must be malfunctioning.",
    "I'm sorry, I don't speak mumble."
  ]
}

Pre-caching Behavior

All configured phrases are automatically pre-synthesized at startup for instant playback:

┌─────────────────────────────────────────────────────────────┐
│ 🚀 Startup                                                  │
├─────────────────────────────────────────────────────────────┤
│ 📄 Load phrases from config                                 │
│ 🎤 Pre-synthesize all phrases via TTS                      │
│ 💾 Cache audio in memory                                    │
│ ⚡ Ready for instant playback!                              │
└─────────────────────────────────────────────────────────────┘

Startup log:

INFO: Pre-caching 25 phrases...
INFO: Cached 25 phrases
INFO: Speaches TTS ready, 25 phrases cached
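The pre-caching step can be sketched as a simple loop: every configured phrase is synthesized once and the resulting audio kept in memory, keyed by text. Here `synthesize` is a stub; the real agent calls the Speaches TTS endpoint:

```python
# Startup pre-caching sketch with a stand-in synthesizer.
def precache(phrases, synthesize):
    print(f"Pre-caching {len(phrases)} phrases...")
    cache = {phrase: synthesize(phrase) for phrase in phrases}
    print(f"Cached {len(cache)} phrases")
    return cache

fake_tts = lambda text: f"<wav:{text}>".encode()  # stand-in for the TTS call
cache = precache(["Hello!", "Goodbye!", "One moment."], fake_tts)
print(cache["Hello!"])  # → b'<wav:Hello!>'
```

At call time, playing a cached phrase is then just a dictionary lookup instead of a round trip to the TTS server.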

📋 Complete Example

# =============================================================================
# 📞 SIP AI Assistant - Complete Configuration
# =============================================================================

# ──────────────────────────────────────────────────────────────────────────────
# 📞 SIP Connection
# ──────────────────────────────────────────────────────────────────────────────
SIP_USER=ai-assistant
SIP_PASSWORD=super-secret-password
SIP_DOMAIN=pbx.example.com
SIP_PORT=5060
SIP_TRANSPORT=udp

# ──────────────────────────────────────────────────────────────────────────────
# 🎤 Speaches (STT + TTS)
# ──────────────────────────────────────────────────────────────────────────────
SPEACHES_API_URL=http://speaches:8001
STT_MODE=batch
WHISPER_MODEL=Systran/faster-distil-whisper-small.en
WHISPER_LANGUAGE=en
TTS_MODEL=speaches-ai/Kokoro-82M-v1.0-ONNX
TTS_VOICE=af_heart
TTS_SPEED=1.0

# ──────────────────────────────────────────────────────────────────────────────
# 🧠 LLM (Language Model)
# ──────────────────────────────────────────────────────────────────────────────
LLM_BASE_URL=http://vllm:8000/v1
LLM_MODEL=openai-community/gpt2-xl
LLM_MAX_TOKENS=512
LLM_TEMPERATURE=0.6
LLM_TOP_P=0.85

# ──────────────────────────────────────────────────────────────────────────────
# 🎚️ Audio Processing
# ──────────────────────────────────────────────────────────────────────────────
MIN_SPEECH_DURATION_MS=200
MAX_SPEECH_DURATION_S=10.0
SILENCE_TIMEOUT_MS=750
BARGE_IN_MIN_DURATION=400
BARGE_IN_ENERGY_THRESHOLD=2000

# ──────────────────────────────────────────────────────────────────────────────
# 💬 Conversation
# ──────────────────────────────────────────────────────────────────────────────
MAX_CONVERSATION_TURNS=10
CALLBACK_RING_TIMEOUT=30

# ──────────────────────────────────────────────────────────────────────────────
# 🌤️ Weather Station (Optional)
# ──────────────────────────────────────────────────────────────────────────────
TEMPEST_STATION_ID=12345
TEMPEST_API_TOKEN=your-api-token

# ──────────────────────────────────────────────────────────────────────────────
# 📊 Logging & Telemetry
# ──────────────────────────────────────────────────────────────────────────────
LOG_LEVEL=INFO
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_SERVICE_NAME=sip-agent

# ──────────────────────────────────────────────────────────────────────────────
# 💾 Storage
# ──────────────────────────────────────────────────────────────────────────────
DATA_DIR=./data

# ──────────────────────────────────────────────────────────────────────────────
# 🗣️ Phrases (Optional - or use data/phrases.json)
# ──────────────────────────────────────────────────────────────────────────────
# PHRASES_GREETINGS=["Hello! How can I help?","Hi there!","Hey!"]
# PHRASES_GOODBYES=["Goodbye!","Take care!","Have a great day!"]
# PHRASES_ACKNOWLEDGMENTS=["Okay.","Got it.","One moment."]
# PHRASES_THINKING=["Let me check.","One moment."]
# PHRASES_ERRORS=["Sorry, I didn't catch that.","Could you repeat that?"]
# PHRASES_FOLLOWUPS=["Anything else?","Can I help with anything else?"]

📊 Grafana Dashboard

Import the included dashboard for monitoring:

# Dashboard JSON location
grafana/dashboards/sip-agent.json

Metrics available:

┌─────────────────────────────────────────────────────────────┐
│ 📊 SIP Agent Dashboard                                      │
├─────────────────────────────────────────────────────────────┤
│ 📞 Active Calls: 1                                          │
│ 📈 Total Calls Today: 47                                    │
│ ⏱️ Avg Call Duration: 2m 34s                                │
│ 🎤 STT Latency (p95): 245ms                                │
│ 🔊 TTS Latency (p95): 180ms                                │
│ 🧠 LLM Latency (p95): 890ms                                │
│ 🔧 Tool Executions: 23                                     │
└─────────────────────────────────────────────────────────────┘

🔐 Secrets Management

Docker Secrets

# docker-compose.yml
services:
  sip-agent:
    secrets:
      - sip_password
      - llm_api_key
    environment:
      - SIP_PASSWORD_FILE=/run/secrets/sip_password
      - LLM_API_KEY_FILE=/run/secrets/llm_api_key

secrets:
  sip_password:
    file: ./secrets/sip_password.txt
  llm_api_key:
    file: ./secrets/llm_api_key.txt
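The *_FILE convention shown above means the agent prefers a path to the secret over the secret itself: if SIP_PASSWORD_FILE is set, the value is read from that file; otherwise SIP_PASSWORD is used directly. A sketch of that resolution (the helper name is an assumption, and a temporary file stands in for /run/secrets/sip_password):

```python
# Resolve a secret via the *_FILE indirection, falling back to the plain var.
import os
import tempfile

def read_secret(name, env=os.environ):
    path = env.get(f"{name}_FILE")
    if path:
        with open(path, encoding="utf-8") as fh:
            return fh.read().strip()  # strip trailing newline from the file
    return env.get(name)

# Demo: write a throwaway secret file instead of touching real env vars
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("super-secret-password\n")

secret = read_secret("SIP_PASSWORD", env={"SIP_PASSWORD_FILE": fh.name})
print(secret)  # → super-secret-password
os.unlink(fh.name)  # clean up the demo file
```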

Environment Variable Precedence

1. 🥇 Direct environment variables (docker run -e)
2. 🥈 .env file in working directory
3. 🥉 Default values
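The precedence order above can be sketched as a lookup: the process environment wins, then values loaded from the .env file, then the built-in default (the function is illustrative; dicts stand in for the real environment and .env contents):

```python
# Settings precedence: environment > .env file > default.
import os

def resolve(name, dotenv, default=None, environ=os.environ):
    if name in environ:
        return environ[name]  # 🥇 docker run -e / exported variables
    if name in dotenv:
        return dotenv[name]   # 🥈 .env file in the working directory
    return default            # 🥉 built-in default

dotenv = {"SIP_PORT": "5061"}
print(resolve("SIP_PORT", dotenv, default="5060", environ={}))  # → 5061
print(resolve("SIP_PORT", {}, default="5060", environ={}))      # → 5060
print(resolve("SIP_PORT", dotenv, default="5060",
              environ={"SIP_PORT": "5070"}))                    # → 5070
```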