🤖 Multi-Channel AI Agent System

Sears KAIros - AI-Powered Home Manager Communication Platform

✨ Complete Multi-Step Orchestration Workflows

Overview

A unified agentic AI platform that orchestrates customer communications across multiple channels (voice, SMS, email) using the OpenAI Realtime API (low-latency speech-to-speech AI), WebSocket (real-time audio streaming protocol), and MongoDB (cloud database for state management). The system embodies the Home Manager role, making all customer calls for appointment confirmations, "on the way" notifications, and post-service follow-ups. It covers 12 communication scenarios with 60+ templates, so technicians focus solely on service delivery while the AI handles all customer interactions.

Core Architecture

4 Layers
graph TB
    subgraph orchestration["🎯 AGENTS"]
        Journey["🎯 Communication Orchestrator<br/>───────────────<br/>✓ Multi-step workflows<br/>✓ Parallel multi-channel<br/>✓ Context preservation<br/>✓ State management"]
        PreCall["📞 Pre-Call Agent<br/>───────────<br/>✓ Confirmation Call<br/>✓ Narrow Window<br/>✓ On the Way Call"]
        FollowUp["📋 Follow-Up Agent<br/>───────────<br/>✓ Post-Service SMS<br/>✓ Satisfaction Check<br/>✓ Service Complete"]
        Comms["💬 Communications Hub<br/>───────────<br/>✓ 12 Scenarios<br/>✓ Multi-Channel<br/>✓ Templates"]
    end

    subgraph context["💾 CONTEXT & STATE MANAGEMENT"]
        StateManager["🧠 Conversation State Manager<br/>───────────────────<br/>1. Cross-channel tracking<br/>2. Intent & sentiment<br/>3. History & timeline<br/>4. Next-best-action"]
        subgraph dataSources["📊 Data Sources"]
            direction LR
            MongoDB["🗄️ MongoDB Data Store<br/>───────────────<br/>1. Conversations<br/>2. Customers<br/>3. Interactions<br/>4. Context<br/>5. Analytics"]
            ServicePower["🔧 ServicePower/Snowflake<br/>───────────────<br/>1. Service orders<br/>2. Appointments<br/>3. Customer details<br/>4. Schedules<br/>5. Parts<br/>6. Tech info"]
        end
    end

    subgraph channels["🤖 CHANNEL AGENTS LAYER"]
        subgraph voice["🎤 Voice Agents"]
            OutVoice["📤 Outbound Voice<br/>──────────<br/>1. Initiate calls<br/>2. Real-time conv<br/>3. Voicemail<br/>4. Trigger actions"]
            InVoice["📥 Inbound Voice<br/>──────────<br/>1. Answer calls<br/>2. Context-aware<br/>3. KB-powered"]
        end
        subgraph sms["💬 SMS Agents"]
            OutSMS["📤 Outbound SMS<br/>──────────<br/>1. Send messages<br/>2. Link sharing<br/>3. Confirmations<br/>4. Trigger actions"]
            InSMS["📥 Inbound SMS<br/>──────────<br/>1. Process replies<br/>2. Context-aware<br/>3. Determine intent<br/>4. Auto-respond"]
        end
        subgraph email["📧 Email Agents"]
            OutEmail["📤 Outbound Email<br/>──────────<br/>1. Compose<br/>2. Attachments<br/>3. Follow-ups"]
            InEmail["📥 Inbound Email<br/>──────────<br/>1. Process replies<br/>2. Extract info<br/>3. Classify intent<br/>4. Auto-respond"]
        end
    end

    subgraph integration["🔌 INTEGRATION LAYER"]
        OpenAI["🤖 OpenAI Realtime<br/>──────────────<br/>1. Speech-to-Speech<br/>2. Function Calling<br/>3. Voice Detection<br/>4. Audio Streaming"]
        Twilio["📱 Twilio Voice/SMS<br/>──────────────<br/>1. Media Streams<br/>2. TwiML<br/>3. Messaging API<br/>4. Webhooks"]
        Graph["📮 Microsoft Graph<br/>──────────────<br/>1. Mail API<br/>2. Attachments"]
    end

    PreCall --> StateManager
    FollowUp --> StateManager
    Comms --> StateManager
    StateManager --> dataSources
    MongoDB ~~~ ServicePower
    dataSources --> OutVoice
    dataSources --> InVoice
    dataSources --> OutSMS
    dataSources --> InSMS
    dataSources --> OutEmail
    dataSources --> InEmail
    OutVoice --> Twilio
    InVoice --> Twilio
    OutVoice --> OpenAI
    InVoice --> OpenAI
    OutSMS --> Twilio
    InSMS --> Twilio
    OutEmail --> Graph
    InEmail --> Graph

    style orchestration fill:#e8f0f5,stroke:#1a1f3a,stroke-width:3px
    style context fill:#f5f0e8,stroke:#8b6f47,stroke-width:3px
    style channels fill:#f0e8f0,stroke:#4a2c5c,stroke-width:3px
    style integration fill:#e8f5f0,stroke:#2d5f4f,stroke-width:3px
    style dataSources fill:#f5ede5,stroke:#c9a961,stroke-width:2px
    style voice fill:#e8f0f5,stroke:#1e4d4d,stroke-width:2px
    style sms fill:#f5e8eb,stroke:#b76e79,stroke-width:2px
    style email fill:#f5f0e0,stroke:#d4a574,stroke-width:2px
โฌ†๏ธ Back to Top

Pre-Arrival Confirmation Flow

Home Manager Jessica makes ALL customer calls - technicians NEVER call customers. Two calls: 1) Confirmation + narrow window (1-2hrs before), 2) "On the way" (30min before arrival).

sequenceDiagram
    participant Scheduler as ⏰ Scheduler
    participant PreCall as 📞 Home Manager<br/>(Pre-Call Agent)
    participant DB as 🗄️ MongoDB
    participant GPT as 🤖 GPT-4o
    participant OutVoice as 📤 Outbound Voice
    participant Twilio as 📱 Twilio
    participant OpenAI as 🎤 OpenAI Realtime
    participant Customer as 👤 Customer
    participant SMS as 💬 SMS Agent

    Note over Scheduler,PreCall: 1-2 hours before appointment
    Scheduler->>PreCall: Trigger confirmation workflow
    PreCall->>DB: Load appointment data
    DB-->>PreCall: Service details, window, customer info

    PreCall->>GPT: Generate confirmation script
    Note right of GPT: • Technician name<br/>• Narrowed window (4-5pm)<br/>• Service type<br/>• Payment ($89)<br/>• Pet/access notes
    GPT-->>PreCall: Personalized script ready

    PreCall->>OutVoice: Initiate confirmation call
    OutVoice->>Twilio: Place call to +1234567890
    Twilio->>Customer: 📞 Calling...
    Customer-->>Twilio: Answer
    Twilio->>OpenAI: Connect audio stream

    OpenAI-->>Customer: 🗣️ "Hi! This is about your 4-5pm appointment..."
    Customer-->>OpenAI: 🗣️ "Yes, I'll be home"
    OpenAI->>Customer: 🗣️ "Adult 18+ present?"
    Customer->>OpenAI: 🗣️ "Yes"
    OpenAI->>Customer: 🗣️ "$89 trip charge, pets secured?"
    Customer->>OpenAI: 🗣️ "Got it, dog will be in backyard"
    OpenAI->>Customer: 🗣️ "I'll call when tech is 30 min away"
    Customer->>OpenAI: 🗣️ "Perfect, thanks!"

    OpenAI-->>OutVoice: Call completed (2 min 15 sec)
    OutVoice->>SMS: Send confirmation SMS
    SMS->>Customer: 📱 "Confirmed: Tech Tom, 4-5pm today. $89 trip charge."
    OutVoice->>DB: Log call details & confirmation
    DB-->>OutVoice: ✓ Saved

    Note over Scheduler,Customer: 30 minutes before arrival
    Scheduler->>PreCall: Trigger "on the way" call
    PreCall->>OutVoice: Make brief notification call
    OutVoice->>Twilio: Call customer
    Twilio->>OpenAI: Connect
    OpenAI->>Customer: 🗣️ "Tech finishing up, heading your way!"
    Customer->>OpenAI: 🗣️ "Great, ready!"
    OpenAI-->>OutVoice: Call complete (35 sec)
    OutVoice->>DB: Update: Customer ready ✓
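
The timing of the two calls can be sketched as a small scheduling helper. This is a minimal illustration, not the platform's scheduler: the function name `plan_pre_arrival_calls` and the 2-hour default lead time are assumptions (the source only says "1-2 hours before"); the 30-minute lead comes from the flow above.

```python
from datetime import datetime, timedelta


def plan_pre_arrival_calls(
    window_start: datetime,
    eta: datetime,
    confirm_lead: timedelta = timedelta(hours=2),        # assumed default
    on_the_way_lead: timedelta = timedelta(minutes=30),  # from the flow above
) -> tuple[datetime, datetime]:
    """Return (confirmation_call_time, on_the_way_call_time).

    The confirmation call is placed relative to the start of the appointment
    window; the "on the way" call is placed relative to the technician's ETA.
    """
    return window_start - confirm_lead, eta - on_the_way_lead


# Example: a 4-5pm window with the technician expected at 3:58 PM.
confirm_at, on_the_way_at = plan_pre_arrival_calls(
    datetime(2024, 1, 15, 16, 0), datetime(2024, 1, 15, 15, 58)
)
```

Keeping the two lead times as parameters lets a scheduler tune them per scenario without touching the call logic.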
โฌ†๏ธ Back to Top

Multi-Step Orchestration Flow: Overview

Multi-Channel Outreach

This workflow shows the full customer journey under a multi-channel outreach strategy. When the customer doesn't answer the call, the system leaves a voicemail and immediately sends an SMS and an email to maximize the available response channels. It demonstrates how the Communication Orchestrator manages parallel multi-channel communication with complete context preservation across all touchpoints. The customer can respond via ANY channel at any time.

Step 1: Initial Outbound Call & Multi-Channel Outreach

Communication Orchestrator initiates the workflow by triggering an outbound call. If customer answers, workflow completes immediately. If no answer, the system detects voicemail, leaves a contextual message, and immediately sends SMS and Email to provide multiple response channels.

sequenceDiagram
    participant Scheduler as ⏰ Scheduler
    participant Journey as 🎯 Communication Orchestrator
    participant OutVoice as 📤 Outbound Voice
    participant OutSMS as 📤 Outbound SMS
    participant OutEmail as 📤 Outbound Email
    participant DB as 🗄️ MongoDB State
    participant OpenAI as 🎤 OpenAI Realtime
    participant Twilio as 📱 Twilio
    participant Graph as 📮 MS Graph
    participant Customer as 👤 Customer

    Scheduler->>Journey: Trigger confirmation workflow
    Journey->>DB: Create communication state
    Note right of DB: State: workflow_initiated<br/>Status: in_progress<br/>Step: outbound_call

    Journey->>OutVoice: Initiate confirmation call
    OutVoice->>Twilio: Place call to customer
    Twilio->>Customer: 📞 Calling...

    alt Customer Answers Call
        Customer->>Twilio: ✓ Answers
        Twilio->>OpenAI: Connect audio stream
        OpenAI->>Customer: 🗣️ "Hi! This is about your appointment at 4-5pm today..."
        Customer->>OpenAI: 🗣️ "Yes, I'll be there!"
        OpenAI->>Customer: 🗣️ "Great! Tech Tom will arrive between 4-5pm. $89 trip charge."
        Customer->>OpenAI: 🗣️ "Perfect, see you then!"
        OpenAI-->>OutVoice: Call completed successfully
        OutVoice->>DB: Update: confirmed, workflow_complete
        Note right of DB: Status: confirmed<br/>Channel: voice_outbound<br/>Duration: 1m 45s
        Note over Journey,DB: ✅ SUCCESS - Workflow Complete
    else No Answer / Voicemail Detected
        Customer-->>Twilio: No answer (rings)
        Twilio->>OpenAI: Voicemail greeting detected
        Note right of OpenAI: AI detects:<br/>• Beep sound<br/>• Greeting patterns<br/>• No human response
        OpenAI->>Customer: 🗣️ Leave voicemail message
        Note right of OpenAI: "Hi, this is Jessica from<br/>Sears about your dishwasher<br/>appointment today 4-5pm.<br/>Please call back to confirm."
        OpenAI-->>OutVoice: Voicemail left (22 seconds)
        OutVoice->>DB: Update: voicemail_left
        Note right of DB: Status: voicemail_left<br/>Timestamp: 13:30

        Note over Journey,OutEmail: Immediately send SMS and Email (parallel)
        par Send SMS
            OutVoice->>Journey: Trigger SMS
            Journey->>OutSMS: Send confirmation SMS
            OutSMS->>Twilio: Send message
            Twilio->>Customer: 💬 "Hi, we called about your appointment today at 4-5pm. Reply YES to confirm or call us back."
            OutSMS->>DB: Log: sms_sent
        and Send Email
            OutVoice->>Journey: Trigger Email
            Journey->>OutEmail: Send confirmation email
            OutEmail->>Graph: Send message
            Graph->>Customer: 📧 "Appointment Confirmation - Please Respond"
            OutEmail->>DB: Log: email_sent
        end

        Note right of DB: All channels contacted:<br/>• Voicemail: 13:30<br/>• SMS: 13:30<br/>• Email: 13:30
        Journey->>Journey: Wait for response on ANY channel
        Note over Journey,DB: ⏳ Waiting for customer response
    end

Outcome: If customer answers, workflow completes immediately (85% success rate). If voicemail is detected, system immediately sends voicemail + SMS + Email simultaneously, providing customer with multiple convenient ways to respond.
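
The branch in Step 1 can be sketched with `asyncio`. This is a simplified illustration under stated assumptions: `send_sms` and `send_email` are hypothetical stand-ins for the Twilio and Microsoft Graph calls, the state is a plain dict rather than a MongoDB document, and the `awaiting_response` status is invented for the example.

```python
import asyncio
from datetime import datetime, timezone


def log_event(state: dict, channel: str, action: str) -> None:
    """Append one entry to the in-memory timeline (stand-in for a DB write)."""
    state["timeline"].append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "channel": channel,
        "action": action,
    })


async def send_sms(state: dict) -> None:
    await asyncio.sleep(0)  # placeholder for the real messaging call
    log_event(state, "sms_outbound", "sms_sent")


async def send_email(state: dict) -> None:
    await asyncio.sleep(0)  # placeholder for the real mail call
    log_event(state, "email_outbound", "email_sent")


async def run_confirmation_step(state: dict, call_outcome: str) -> dict:
    """Complete on an answered call; otherwise fan out to SMS and email."""
    if call_outcome == "answered":
        state["status"] = "confirmed"
        return state
    log_event(state, "voice_outbound", "voicemail_left")
    state["status"] = "voicemail_left"
    # SMS and email are dispatched concurrently, mirroring the `par` block.
    await asyncio.gather(send_sms(state), send_email(state))
    state["status"] = "awaiting_response"  # hypothetical status name
    return state


state = asyncio.run(run_confirmation_step({"timeline": []}, "no_answer"))
```

`asyncio.gather` fits here because neither message depends on the other, so a slow email provider does not delay the SMS.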

Step 2: Customer Response via Voice

After multi-channel outreach, customer can call back at any time. When they do, the inbound voice agent has full context about all previous contact attempts (voicemail, SMS, email) and can provide a seamless, personalized conversation.

sequenceDiagram
    participant Customer as 👤 Customer
    participant Twilio as 📱 Twilio
    participant InVoice as 📥 Inbound Voice
    participant DB as 🗄️ MongoDB State
    participant OpenAI as 🎤 OpenAI Realtime

    Note over Customer,InVoice: Customer received: Voicemail + SMS + Email

    Customer->>Twilio: 📞 Calls company number
    Twilio->>InVoice: Incoming call detected

    InVoice->>DB: Lookup by phone: +1234567890
    DB-->>InVoice: Load communication state
    Note right of DB: Context loaded:<br/>• Voicemail left at 13:30<br/>• SMS sent at 13:30<br/>• Email sent at 13:30<br/>• Appointment: 4-5pm today

    InVoice->>OpenAI: Connect with full context
    OpenAI->>Twilio: Ready for conversation
    Twilio->>Customer: ✓ Connected

    OpenAI->>Customer: 🗣️ "Hi! Thanks for calling back about your appointment today..."
    Customer->>OpenAI: 🗣️ "Yes, I got your messages. I'll be home."
    OpenAI->>Customer: 🗣️ "Perfect! Tech Tom will arrive between 4-5pm. $89 trip charge."
    Customer->>OpenAI: 🗣️ "Great, thanks!"

    OpenAI-->>InVoice: Call completed
    InVoice->>DB: Update: confirmed_via_callback, workflow_complete
    Note right of DB: Status: confirmed<br/>Channel: voice_inbound<br/>Response time: 25 min
    Note over InVoice,DB: ✅ SUCCESS - Workflow Complete

Key Feature: Inbound agent recognizes caller from phone number, loads complete context about all contact attempts (voicemail, SMS, email), and provides seamless conversation.
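
A rough sketch of the caller lookup, assuming communication states are available as plain dicts whose field names follow the Communication State Schema in this document. The helper names `find_active_state` and `summarize_attempts` are hypothetical, and a real deployment would query MongoDB rather than scan a list.

```python
from typing import Optional


def find_active_state(states: list[dict], caller_number: str) -> Optional[dict]:
    """Return the in-progress communication state for this caller, if any."""
    for state in states:
        prefs = state["context"]["customer_preferences"]
        if (prefs["callback_number"] == caller_number
                and state["communication_state"] == "in_progress"):
            return state
    return None


def summarize_attempts(state: dict) -> list[str]:
    """List prior outbound contact attempts for the voice agent's context."""
    return [
        f'{event["channel"]}: {event["action"]} at {event["timestamp"]}'
        for event in state["timeline"]
        if event["channel"].endswith("_outbound")
    ]
```

When no state is found the function returns `None`, and the agent would fall back to a generic greeting instead of referencing prior outreach.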

Step 3: Customer Response via SMS

Customer can respond to the SMS message sent in Step 1. The inbound SMS agent analyzes their reply, has full context about all previous contact attempts, and can confirm the appointment or handle other responses. Customer may also choose to call instead of texting.

sequenceDiagram
    participant Customer as 👤 Customer
    participant Twilio as 📱 Twilio
    participant InSMS as 📥 Inbound SMS
    participant InVoice as 📥 Inbound Voice
    participant DB as 🗄️ MongoDB State
    participant OpenAI as 🎤 OpenAI Realtime

    Note over Customer,InSMS: Customer received: Voicemail + SMS + Email

    alt Customer Replies to SMS
        Customer->>Twilio: 💬 "YES"
        Twilio->>InSMS: Incoming SMS webhook

        InSMS->>DB: Load communication state
        DB-->>InSMS: Full context loaded
        Note right of DB: Complete history:<br/>• Voicemail at 13:30<br/>• SMS at 13:30<br/>• Email at 13:30<br/>• Customer replied

        InSMS->>InSMS: Analyze response: "YES" = confirmation
        InSMS->>Twilio: Send confirmation reply
        Twilio->>Customer: 💬 "Great! Tech Tom arriving 4-5pm. $89 trip charge. See you then!"
        InSMS->>DB: Update: confirmed_via_sms, workflow_complete
        Note right of DB: Status: confirmed<br/>Channel: sms_inbound<br/>Response time: 12 min
        Note over InSMS,DB: ✅ SUCCESS - Workflow Complete
    else Customer Calls Instead of SMS Reply
        Customer->>Twilio: 📞 Calls company number
        Twilio->>InVoice: Incoming call
        InVoice->>DB: Load communication state
        DB-->>InVoice: Full context loaded
        Note right of DB: Context:<br/>• Voicemail at 13:30<br/>• SMS at 13:30<br/>• Email at 13:30<br/>• Customer calling back

        InVoice->>OpenAI: Connect with full context
        OpenAI->>Customer: 🗣️ "Hi! Thanks for calling about your appointment today..."
        Customer->>OpenAI: 🗣️ "Yes, I can confirm. I'll be home at 4pm."
        OpenAI->>Customer: 🗣️ "Perfect! We'll see you between 4-5pm."
        OpenAI-->>InVoice: Call completed
        InVoice->>DB: Update: confirmed_via_callback, workflow_complete
        Note right of DB: Status: confirmed<br/>Channel: voice_inbound
        Note over InVoice,DB: ✅ SUCCESS - Workflow Complete
    end

Flexibility: Customer can respond via ANY channel. If they call instead of texting, the inbound voice agent has complete context about all contact attempts.
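
The "Analyze response" step for SMS replies could look roughly like the keyword matcher below. This is only a sketch: the source does not say how intent is determined (an LLM call is equally plausible), and the keyword sets and the `needs_follow_up` and `unknown` labels are assumptions. Only `appointment_confirmed` appears in the document's state schema.

```python
import re

# Hypothetical keyword sets; a production system would likely use a model.
CONFIRM_WORDS = {"yes", "y", "confirm", "confirmed", "ok", "okay"}
FOLLOW_UP_WORDS = {"reschedule", "cancel", "change"}


def classify_sms_reply(text: str) -> str:
    """Map a free-text SMS reply to a coarse intent label."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Check follow-up words first so "yes, but reschedule" is not auto-confirmed.
    if words & FOLLOW_UP_WORDS:
        return "needs_follow_up"
    if words & CONFIRM_WORDS:
        return "appointment_confirmed"
    return "unknown"
```

Anything classified as `unknown` or `needs_follow_up` would be routed to a richer handler rather than answered with the stock confirmation reply.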

Step 4: Customer Response via Email

Customer can respond to the email sent in Step 1. The inbound email agent processes their reply, analyzes the content, and has full context about all previous contact attempts. Customer may also choose to call back instead of replying via email.

sequenceDiagram
    participant Customer as 👤 Customer
    participant Graph as 📮 MS Graph
    participant InEmail as 📥 Inbound Email
    participant Twilio as 📱 Twilio
    participant InVoice as 📥 Inbound Voice
    participant DB as 🗄️ MongoDB State
    participant OpenAI as 🎤 OpenAI Realtime

    Note over Customer,InEmail: Customer received: Voicemail + SMS + Email

    alt Customer Replies to Email
        Customer->>Graph: 📧 Reply: "Yes, I'll be there"
        Graph->>InEmail: Incoming email webhook

        InEmail->>DB: Load communication state
        DB-->>InEmail: Full context loaded
        Note right of DB: Complete history:<br/>• Voicemail at 13:30<br/>• SMS at 13:30<br/>• Email at 13:30<br/>• Customer replied

        InEmail->>InEmail: Analyze email: confirmation detected
        InEmail->>Graph: Send auto-reply
        Graph->>Customer: 📧 "Thank you! Confirmed for 4-5pm. Tech Tom will arrive with $89 trip charge."
        InEmail->>DB: Update: confirmed_via_email, workflow_complete
        Note right of DB: Status: confirmed<br/>Channel: email_inbound<br/>Response time: 35 min
        Note over InEmail,DB: ✅ SUCCESS - Workflow Complete
    else Customer Calls Instead of Email Reply
        Customer->>Twilio: 📞 Calls company number
        Twilio->>InVoice: Incoming call
        InVoice->>DB: Load communication state
        DB-->>InVoice: Complete communication history
        Note right of DB: All channels contacted:<br/>• Voicemail at 13:30<br/>• SMS at 13:30<br/>• Email at 13:30<br/>• Customer calling back

        InVoice->>OpenAI: Connect with full context
        OpenAI->>Customer: 🗣️ "Thanks for calling back about your appointment today!"
        Customer->>OpenAI: 🗣️ "Yes, I confirm, I'll be there at 4pm."
        OpenAI->>Customer: 🗣️ "Perfect! We'll see you between 4-5pm."
        OpenAI-->>InVoice: Call completed
        InVoice->>DB: Update: confirmed_via_callback, workflow_complete
        Note right of DB: Status: confirmed<br/>Channel: voice_inbound
        Note over InVoice,DB: ✅ SUCCESS - Workflow Complete
    end

Multi-Channel Advantage: Customer has multiple convenient ways to respond (voice, SMS, or email). Whichever channel they choose, the system has complete context and provides seamless confirmation.

Multi-Step Orchestration: Summary

Key Workflow Principles

  1. Parallel Multi-Channel Outreach: When customer doesn't answer, system immediately sends Voicemail + SMS + Email simultaneously
  2. Immediate Delivery:
    • Voicemail: Left during initial call attempt
    • SMS: Sent within seconds after voicemail
    • Email: Sent within seconds after voicemail
    • All messages delivered at same timestamp (~13:30)
  3. Context Preservation: Every agent has full history of all contact attempts across all channels (voicemail, SMS, email)
  4. Flexible Response: Customer chooses their preferred channel - Voice, SMS, or Email - workflow completes on first response
  5. Mid-Conversation Actions: During any call, agent can send SMS/Email with links, documents, or additional information in real-time
  6. Voicemail Intelligence: System detects voicemail via AI (beep sound, greeting patterns), leaves contextual message, triggers parallel outreach
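
Principle 4 ("workflow completes on first response") implies that a later reply on another channel must not reopen or overwrite a finished workflow. A minimal sketch of that guard, assuming a plain dict state and a hypothetical helper named `record_response`:

```python
def record_response(state: dict, channel: str) -> bool:
    """Mark the workflow confirmed on the first response only.

    Returns True if this call completed the workflow, False if it was
    already complete (for example, an email reply after an SMS "YES").
    """
    if state.get("status") == "confirmed":
        return False
    state["status"] = "confirmed"
    state["channel"] = channel
    return True


state = {"status": "voicemail_left"}
first = record_response(state, "sms_inbound")     # completes the workflow
second = record_response(state, "email_inbound")  # ignored, already confirmed
```

In a shared database the same check would need to be atomic (a conditional update) so two channel agents cannot both "win".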

🎯 Parallel Outreach

The Communication Orchestrator triggers parallel multi-channel communication (Voicemail + SMS + Email) simultaneously when customer doesn't answer, maximizing response opportunities and convenience.

🔄 Customer Choice

Customer chooses their preferred response channel - voice call back, SMS reply, or email response. The system maintains complete context regardless of which channel they select.

📊 Real-Time State

MongoDB stores complete communication state including all contact attempts (voicemail, SMS, email) at same timestamp. Every interaction updates state in real-time for perfect context awareness across channels.

โฌ†๏ธ Back to Top

Voicemail Detection & Handling

AI Voice Intelligence

When customers don't answer calls, the Outbound Voice Agent uses Voice Activity Detection (which detects when a person starts or stops speaking) to identify voicemail systems and leave contextual messages. This ensures customers receive appointment information even when they are unavailable to answer.

🎯 Detection Capabilities

  • Audio Pattern Analysis: Agent analyzes audio patterns in real-time during the call
  • Greeting Detection: Distinguishes voicemail greeting from human voice responses
  • Beep Signal Identification: Recognizes voicemail beep to start leaving message
  • Timeout Detection: Identifies no response after greeting (indicates voicemail)
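
How these signals might combine into a call outcome is sketched below. The source does not describe the actual decision logic, so everything here is an assumption: the `CallSignals` fields, the 4-second silence threshold, and the rule ordering. The `no_answer` fallback follows the error-handling rule stated in this document.

```python
from dataclasses import dataclass


@dataclass
class CallSignals:
    """Hypothetical per-call detection signals."""
    human_reply: bool
    greeting_detected: bool
    beep_detected: bool
    silence_after_greeting_s: float


def classify_call(signals: CallSignals, silence_timeout_s: float = 4.0) -> str:
    """Return 'answered', 'voicemail', or 'no_answer' (the fallback)."""
    if signals.human_reply:
        return "answered"
    timed_out = (signals.greeting_detected
                 and signals.silence_after_greeting_s >= silence_timeout_s)
    if signals.beep_detected or timed_out:
        return "voicemail"
    # Detection inconclusive: record as no_answer for tracking.
    return "no_answer"
```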

💬 Message Leaving

  • Scenario-Based Messages: Leaves contextual voicemail messages tailored to the specific scenario
  • Essential Information: Includes callback number, appointment details, and urgency level
  • Optimal Duration: Messages are 15-30 seconds for maximum effectiveness
  • Natural Voice: Uses OpenAI Realtime API for natural-sounding speech

Example Voicemail Message:

"Hi, this is Jessica from Sears Home Services calling about your dishwasher appointment today between 4 and 5 PM. Please call us back at 1-800-4-MY-HOME to confirm. Thanks!"

📊 Message Logging

  • Record Keeping: Stores the voicemail message sent for complete record-keeping
  • Tracking: Logs voicemail_left status, timestamp, message_type, and message_content
  • Context Availability: Available for all agents in future interactions for seamless customer experience
  • MongoDB Storage: Stored in communication state alongside SMS and email contact attempts

โš ๏ธ Error Handling

  • Detection Failure: If voicemail detection fails → log as "no_answer" for tracking
  • Parallel Outreach: SMS and Email already sent in parallel with voicemail (no escalation needed)
  • Workflow Management: Communication Orchestrator manages overall workflow timing and retry logic
  • State Tracking: All attempts tracked in MongoDB for complete visibility
โฌ†๏ธ Back to Top

Context Propagation Mechanics

State Management

The Conversation State Manager ensures seamless context flow across all channels using Communication State (the complete context of all customer interactions). When a customer is contacted via voicemail, SMS, and email simultaneously, all subsequent interactions, regardless of channel, have complete visibility into previous contact attempts, enabling natural and personalized conversations.

📋 Communication State Schema

MongoDB stores complete communication state for every customer interaction:

{
  "conversation_id": "conv_abc123",
  "customer_id": "cust_xyz789",
  "communication_type": "pre_call_confirmation",
  "communication_state": "in_progress",
  
  "timeline": [
    {
      "timestamp": "2024-01-15T13:30:00Z",
      "channel": "voice_outbound",
      "action": "call_initiated",
      "outcome": "voicemail_left",
      "duration_seconds": 22,
      "message": "Left voicemail about 4-5pm appointment"
    },
    {
      "timestamp": "2024-01-15T13:30:00Z",
      "channel": "sms_outbound",
      "action": "sms_sent",
      "message": "Confirmation SMS sent",
      "content": "Hi Sarah, please confirm your 4-5pm..."
    },
    {
      "timestamp": "2024-01-15T13:30:00Z",
      "channel": "email_outbound",
      "action": "email_sent",
      "message": "Confirmation email sent",
      "subject": "Appointment Confirmation - Please Respond"
    },
    {
      "timestamp": "2024-01-15T15:45:00Z",
      "channel": "voice_inbound",
      "action": "customer_callback",
      "outcome": "confirmed",
      "duration_seconds": 95
    }
  ],
  
  "context": {
    "last_contact_channel": "voice_inbound",
    "last_contact_time": "2024-01-15T15:45:00Z",
    "attempts": {
      "voice": 2,
      "sms": 1,
      "email": 1
    },
    "customer_preferences": {
      "preferred_channel": "voice",
      "callback_number": "+1234567890"
    },
    "sentiment": "positive",
    "intent": "appointment_confirmed"
  },
  
  "appointment_data": {
    "order_number": "0008169-12407837",
    "scheduled_window": "16:00-17:00",
    "technician": "Tom",
    "service_type": "dishwasher_repair"
  }
}
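
One way to keep `timeline` and `context` consistent is to route every write through a single helper. The sketch below works on an in-memory dict shaped like the document above; the function name `append_event` is hypothetical. Against MongoDB the same change would typically be a single update combining the `$push`, `$set`, and `$inc` operators, which is an assumption about the implementation rather than something the source states.

```python
def append_event(state: dict, channel: str, action: str,
                 timestamp: str, **extra) -> dict:
    """Append a timeline event and refresh the derived context fields."""
    event = {"timestamp": timestamp, "channel": channel,
             "action": action, **extra}
    state["timeline"].append(event)

    ctx = state["context"]
    ctx["last_contact_channel"] = channel
    ctx["last_contact_time"] = timestamp
    # "voice_outbound" and "voice_inbound" both count toward "voice",
    # which matches the example document (voice: 2, sms: 1, email: 1).
    family = channel.split("_")[0]
    ctx["attempts"][family] = ctx["attempts"].get(family, 0) + 1
    return event


state = {"timeline": [], "context": {"attempts": {}}}
append_event(state, "voice_outbound", "call_initiated",
             "2024-01-15T13:30:00Z", outcome="voicemail_left")
append_event(state, "sms_outbound", "sms_sent", "2024-01-15T13:30:00Z")
append_event(state, "email_outbound", "email_sent", "2024-01-15T13:30:00Z")
append_event(state, "voice_inbound", "customer_callback",
             "2024-01-15T15:45:00Z", outcome="confirmed")
```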

🔄 Context Flow Between Channels

1๏ธโƒฃ Voicemail โ†’ Customer Callback

  • Inbound agent receives: Voicemail left timestamp, message content, all parallel outreach (SMS, email)
  • Opens conversation with: "Hi! Thanks for calling back. I see we reached out about your appointment today..."
  • Context awareness: Knows exact message left, time elapsed since voicemail, customer called back proactively

2๏ธโƒฃ SMS โ†’ Customer Reply or Call

  • If customer texts back: Inbound SMS agent loads complete history (voicemail + email also sent)
  • If customer calls instead: Inbound voice agent knows SMS was sent, customer chose voice over text
  • Tone adjustment: More helpful (customer took initiative), references SMS if relevant

3๏ธโƒฃ Email โ†’ Any Channel Response

  • All agents see: Email sent, subject line, timestamp, attachments included
  • Can reference: "I sent you the appointment details by email..."
  • Flexibility: Customer can reply via email or choose to call/text instead

4๏ธโƒฃ Mid-Call โ†’ Multi-Channel Actions

  • During voice call: Agent can trigger SMS (send link) or Email (send document) in real-time
  • Actions logged immediately: State updated in MongoDB as events occur
  • Customer sees: SMS or email arrives while still on the call

⚡ Real-Time State Updates

  • Instant Updates: Every interaction (call, SMS, email) updates MongoDB state in real-time
  • Cross-Channel Access: All channel agents have immediate access to updated context
  • State Subscription: Agents can subscribe to state changes for real-time notifications
  • Communication Orchestrator: Monitors communication state for workflow management and timing decisions
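
The subscription idea can be illustrated with a tiny in-process publish/subscribe bus. This is a toy sketch, not the platform's mechanism: the `StateBus` class is hypothetical, and a deployment backed by MongoDB might instead rely on change streams (for example via PyMongo's `Collection.watch()`), which the source does not confirm.

```python
from collections import defaultdict
from typing import Callable

StateListener = Callable[[dict], None]


class StateBus:
    """Minimal in-memory pub/sub keyed by conversation_id."""

    def __init__(self) -> None:
        self._listeners: dict[str, list[StateListener]] = defaultdict(list)

    def subscribe(self, conversation_id: str, listener: StateListener) -> None:
        self._listeners[conversation_id].append(listener)

    def publish(self, conversation_id: str, change: dict) -> None:
        # Copy the list so listeners may subscribe during delivery.
        for listener in list(self._listeners.get(conversation_id, [])):
            listener(change)


bus = StateBus()
seen: list[dict] = []
bus.subscribe("conv_abc123", seen.append)
bus.publish("conv_abc123", {"status": "confirmed", "channel": "sms_inbound"})
```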

🎯 Key Benefits

  • No Repeated Questions: Agents never ask "How can I help you?" when customer is calling back about known issue
  • Seamless Handoffs: Customer can switch channels mid-communication without losing context
  • Personalized Interactions: Every touchpoint references previous attempts and customer history
  • Complete Visibility: Support teams can see entire customer journey across all channels
โฌ†๏ธ Back to Top

Customer Callback to Reschedule Flow

sequenceDiagram
    participant Customer as 👤 Customer
    participant Twilio as 📱 Twilio
    participant Webhook as 🔗 Webhook
    participant InVoice as 📥 Inbound Voice
    participant DB as 🗄️ MongoDB
    participant KB as 📚 Knowledge Base
    participant OpenAI as 🎤 OpenAI Realtime
    participant SMS as 💬 SMS Agent
    participant Email as 📧 Email Agent

    Customer->>Twilio: 📞 Calls company number
    Twilio->>Webhook: Incoming call webhook
    Webhook->>InVoice: Trigger inbound agent

    InVoice->>DB: Lookup by phone: +1234567890
    DB-->>InVoice: Customer ID: cust_xyz789

    InVoice->>DB: Get conversation history
    DB-->>InVoice: • AI called 2 hours ago for confirmation<br/>• Left voicemail: "Appt tomorrow 2-4pm"<br/>• Sent SMS + Email at same time<br/>• Customer calling back

    InVoice->>KB: Search relevant context
    KB-->>InVoice: Rescheduling policies loaded

    InVoice->>OpenAI: Connect with full context
    Note right of OpenAI: Context includes:<br/>• Confirmation call attempt<br/>• Voicemail + SMS + Email sent<br/>• Appt: Tomorrow 2-4pm<br/>• Customer history + policies
    OpenAI->>Twilio: Ready for conversation
    Twilio->>Customer: ✓ Connected

    OpenAI->>Customer: 🗣️ "Hi! Thanks for calling back. I see we reached out about confirming your appointment tomorrow 2-4pm..."
    Customer->>OpenAI: 🗣️ "Yes, I got your message. I need to reschedule - I can't make it tomorrow"
    OpenAI->>Customer: 🗣️ "I can help! What date works better?"
    Customer->>OpenAI: 🗣️ "Next Thursday if possible"
    OpenAI->>DB: Check availability
    DB-->>OpenAI: Thursday 2-4pm available ✓
    OpenAI->>Customer: 🗣️ "Thursday 2-4pm is open, should I confirm?"
    Customer->>OpenAI: 🗣️ "Perfect!"

    Note over OpenAI,SMS: Mid-call actions
    OpenAI->>SMS: Send confirmation SMS
    SMS->>Customer: 📱 "Rescheduled to Thu 2-4pm. Confirmation details sent."
    OpenAI->>Email: Send confirmation email
    Email->>Customer: 📧 "Appointment Confirmation Details"

    OpenAI->>Customer: 🗣️ "Sent confirmation via SMS and email. Anything else?"
    Customer->>OpenAI: 🗣️ "No, that's all. Thanks!"
    OpenAI-->>InVoice: Call complete
    InVoice->>DB: Update conversation
    Note right of DB: • Call duration: 1m 45s<br/>• Outcome: Rescheduled<br/>• Sentiment: Positive<br/>• Next action: None
โฌ†๏ธ Back to Top

Customer Journey: Same-Day Service Timeline

Real example: Sarah's dishwasher repair with Home Manager Jessica and Technician Tom. Shows the complete flow from confirmation call to service completion.

📅 Day 1: Home Manager Confirmation Call (1:30 PM)

graph LR
    A[🎧 Jessica<br/>Home Manager calls] --> B[📞 Hi Sarah, about<br/>your dishwasher today]
    B --> C[✓ Narrow window:<br/>1-5pm → 4-5pm]
    C --> D[✓ Tech Tom<br/>will arrive 4-5pm]
    D --> E[✓ Confirm: Adult 18+,<br/>pets secured, payment]
    E --> F[✓ Promise: I'll call<br/>when he's 30 min away]
    F --> G[💬 Send confirmation<br/>SMS immediately]
    G --> H[💾 Log: confirmed,<br/>customer prepared]

    style A fill:#e8f0f5,stroke:#1e4d4d,stroke-width:2px
    style C fill:#f5f0e8,stroke:#8b6f47,stroke-width:2px
    style G fill:#f5e8eb,stroke:#b76e79,stroke-width:2px
    style H fill:#e8f5f0,stroke:#2d5f4f,stroke-width:2px

Key: Home Manager Jessica (not the technician) makes this call to confirm the appointment, narrow the window, and prepare the customer. Call duration: 4 min 10 sec.

📅 Same Day: "On the Way" Call (3:28 PM)

graph LR
    A[⏰ 30 minutes<br/>before arrival] --> B[🎧 Jessica calls<br/>Sarah again]
    B --> C[📞 Tom finishing up,<br/>heading your way]
    C --> D[📞 Should arrive<br/>around 4pm]
    D --> E[👤 Customer:<br/>All set! Cat's ready]
    E --> F[💾 Log: on_way_call<br/>32 seconds]

    style A fill:#f5f0e8,stroke:#8b6f47,stroke-width:2px
    style B fill:#e8f0f5,stroke:#1a1f3a,stroke-width:2px
    style E fill:#e8f5f0,stroke:#2d5f4f,stroke-width:2px
    style F fill:#e8f5f0,stroke:#1e4d4d,stroke-width:2px

Key: Jessica (Home Manager) makes second brief call. Tech Tom does NOT call - he just drives to location. Call duration: 32 seconds.

📅 Same Day: Tech Arrival & Service (3:58 PM - 5:10 PM)

graph LR
    A[🚗 Tom arrives<br/>NO CALL NEEDED] --> B[🚪 Knocks on<br/>door]
    B --> C[👤 Sarah answers:<br/>Hi, you must be Tom]
    C --> D[🔧 Tom: Thanks for<br/>securing cat!]
    D --> E[🔧 Diagnoses:<br/>Failed drain pump]
    E --> F[💵 Quote: $285 total<br/>$89 + $196]
    F --> G[✅ Sarah approves<br/>immediately]
    G --> H[🔧 Repair completed<br/>tested working]
    H --> I[💳 Payment via<br/>credit card]
    I --> J[📱 Auto SMS follow-up:<br/>How did we do?]

    style A fill:#e8f0f5,stroke:#1e4d4d,stroke-width:2px
    style E fill:#f5f0e8,stroke:#8b6f47,stroke-width:2px
    style G fill:#e8f5f0,stroke:#2d5f4f,stroke-width:2px
    style J fill:#f5e8eb,stroke:#b76e79,stroke-width:2px

Result: No confusion, no disputes, smooth process. Customer was prepared because Jessica handled all communications. Total service time: 1hr 12min.

🎧 Home Manager Role

AI embodies Jessica the Home Manager - makes ALL customer calls. Technicians NEVER call customers, they focus 100% on service delivery.

📋 12 Communication Scenarios

Covers appointment confirmations, reminders, en route notifications, follow-ups, estimates, rescheduling, complaints, payments, parts delays, maintenance offers, referrals, and general inquiries.

✅ Zero Confusion

Customer prepared before tech arrives: knows timing, payment ($89 applied), has pets secured. Result: smooth service, no disputes, faster completion.

โฌ†๏ธ Back to Top

12 Communication Scenarios

60+ Templates

The system handles all customer communications across the service lifecycle with 60+ pre-written templates optimized for voice, SMS, email, and web chat channels.

Core Service Communications

  1. Appointment Confirmations - Scheduled service confirmations with window narrowing
  2. Appointment Reminders - Day-before and day-of reminders
  3. Technician En Route - "On the way" notifications 30 minutes before arrival
  4. Service Follow-ups - Post-completion satisfaction checks and ratings

Service Management

  1. Estimate Approvals - Repair quotes and customer approval workflow
  2. Rescheduling - Appointment changes and availability management
  3. Parts Delays - Managing expectations around part availability
  4. Payment & Billing - Financial communications and trip charge explanations

Customer Care & Growth

  1. Complaints & Issues - Professional problem resolution protocols
  2. Maintenance Offers - Preventive maintenance upsells and protection agreements
  3. Referrals - Referral program messaging and incentives
  4. General Inquiries - Questions, information requests, and support
โฌ†๏ธ Back to Top

Deployment Architecture

graph TB
    subgraph internet["🌐 Internet"]
        Users["👥 Users<br/>(Customers)"]
        Twilio["📱 Twilio<br/>Voice/SMS"]
        OpenAI["🤖 OpenAI<br/>Realtime API"]
        MSGraph["📮 Microsoft<br/>Graph API"]
    end

    subgraph aws["☁️ AWS Cloud"]
        subgraph network["VPC - Private Network"]
            ALB["⚖️ Application Load Balancer<br/>──────────────<br/>1. SSL/TLS Termination<br/>2. Health Checks<br/>3. Auto-scaling"]
            subgraph compute["Compute Layer (ECS/Fargate)"]
                API1["🚀 FastAPI Instance 1<br/>──────────────<br/>1. WebSocket Support<br/>2. Async I/O<br/>3. Agent Orchestration"]
                API2["🚀 FastAPI Instance 2<br/>──────────────<br/>1. WebSocket Support<br/>2. Async I/O<br/>3. Agent Orchestration"]
                API3["🚀 FastAPI Instance N<br/>──────────────<br/>1. Auto-scaled<br/>2. On-demand"]
            end
            subgraph cache["Caching Layer"]
                Redis["🔴 Redis<br/>──────────<br/>1. Session Cache<br/>2. Rate Limiting<br/>3. Queue System"]
            end
        end
        S3["📦 S3 Bucket<br/>──────────<br/>1. Call Recordings<br/>2. Attachments<br/>3. Static Assets"]
    end

    subgraph database["🗄️ Database Layer"]
        MongoDB["🍃 MongoDB Atlas<br/>(Managed Cluster)<br/>──────────────<br/>1. Conversations<br/>2. Customers<br/>3. Interactions<br/>4. Context<br/>5. Analytics<br/>6. Knowledge_base"]
    end

    subgraph monitoring["📊 Monitoring & Logging"]
        CloudWatch["📈 AWS CloudWatch<br/>──────────<br/>1. Logs<br/>2. Metrics<br/>3. Alarms"]
        Grafana["📊 KAIros Pulse<br/>──────────<br/>1. Dashboard<br/>2. Visualization"]
    end

    Users -->|HTTPS| ALB
    Twilio -->|Webhooks| ALB
    ALB --> API1
    ALB --> API2
    ALB --> API3
    API1 <--> Redis
    API2 <--> Redis
    API3 <--> Redis
    API1 --> MongoDB
    API2 --> MongoDB
    API3 --> MongoDB
    API1 --> S3
    API2 --> S3
    API3 --> S3
    API1 <-->|WebSocket| OpenAI
    API2 <-->|WebSocket| OpenAI
    API1 <-->|REST API| Twilio
    API2 <-->|REST API| Twilio
    API1 <-->|REST API| MSGraph
    API2 <-->|REST API| MSGraph
    API1 --> CloudWatch
    API2 --> CloudWatch
    API3 --> CloudWatch
    CloudWatch --> Grafana

    style internet fill:#e8f0f5,stroke:#1e4d4d,stroke-width:2px
    style aws fill:#f5f0e8,stroke:#8b6f47,stroke-width:3px
    style network fill:#f0e8f0,stroke:#4a2c5c,stroke-width:2px
    style compute fill:#e8f5f0,stroke:#2d5f4f,stroke-width:2px
    style cache fill:#f5e8eb,stroke:#6b2c3e,stroke-width:2px
    style database fill:#f5f0e0,stroke:#c9a961,stroke-width:3px
    style monitoring fill:#e8f0f0,stroke:#1e4d4d,stroke-width:2px

📚 Technical Glossary


TwiML (Twilio Markup Language)

Definition: XML-based language for controlling phone calls and SMS messages in Twilio.

Purpose: Instructs Twilio on how to handle incoming/outgoing calls and messages.

In This System:

  • Connects phone calls to WebSocket streams for OpenAI Realtime API
  • Controls call flow and routing
  • Manages voicemail recording and playback
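
As an illustration of the first bullet, the TwiML that hands a call over to a WebSocket stream is a short XML document built from the `<Response>`, `<Connect>`, and `<Stream>` verbs. The sketch below generates it with the Python standard library; the function name `build_stream_twiml` and the `wss://example.com/media-stream` URL are placeholders, not values from this system.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Build TwiML telling Twilio to stream call audio to a WebSocket URL."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # <Stream> inside <Connect> opens a bidirectional media stream.
    ET.SubElement(connect, "Stream", url=websocket_url)
    return ET.tostring(response, encoding="unicode")


# Placeholder URL; a real deployment would point at its own WebSocket endpoint.
twiml = build_stream_twiml("wss://example.com/media-stream")
print(twiml)
```

A webhook handler would return this string with an XML content type when Twilio requests instructions for the call.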

WebSocket

Definition: Full-duplex communication protocol for real-time bidirectional data transfer.

Purpose: Enables persistent, low-latency connection between server and client.

In This System:

  • Streams audio between Twilio and OpenAI Realtime API
  • Enables real-time voice conversations (~300-500ms latency)
  • Server-to-server audio streaming (not WebRTC)

OpenAI Realtime API

Definition: Low-latency speech-to-speech conversational AI API powered by GPT-4o.

Purpose: Enables natural voice conversations with AI without separate STT/TTS.

Key Features:

  • Ultra-low latency (~300-500ms response time)
  • Voice Activity Detection (VAD) for natural turn-taking
  • Function calling during live conversations
  • Support for interruptions and natural speech patterns
  • 6 preset voices available
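
Several of these features are selected per session by sending a configuration event over the WebSocket. The sketch below builds such an event as JSON. The field names follow the `session.update` shape of the beta Realtime API as commonly documented and may differ in newer API versions; the voice `alloy`, the `g711_ulaw` audio format, and the function name are assumptions rather than this system's actual settings.

```python
import json


def build_session_update(instructions: str, voice: str = "alloy") -> str:
    """Return a JSON session.update event (assumed beta Realtime API shape)."""
    event = {
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "instructions": instructions,
            "voice": voice,
            # G.711 mu-law matches the audio commonly carried on phone calls.
            "input_audio_format": "g711_ulaw",
            "output_audio_format": "g711_ulaw",
            # Server-side VAD lets the API decide when the caller has finished.
            "turn_detection": {"type": "server_vad"},
        },
    }
    return json.dumps(event)


session_event = build_session_update("Confirm the customer's appointment window.")
```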

MongoDB Atlas

Definition: Cloud-hosted NoSQL database service for flexible data storage.

Purpose: Stores conversation state, customer data, and interaction history.

In This System:

  • Complete communication state management
  • Real-time updates on every interaction
  • Cross-channel context preservation
  • Customer profiles and preferences

Function Calling

Definition: OpenAI capability that allows AI to invoke external functions/tools.

Purpose: Enables AI to take actions like sending SMS, emails, or updating databases.

In This System:

  • Mid-call actions (send SMS/email during voice conversation)
  • Database queries and updates
  • Calendar/scheduling operations
  • Multi-channel orchestration
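
A mid-call action typically has two parts: a JSON schema that describes the function to the model, and a dispatcher that runs the matching handler when the model asks for it. The sketch below shows both under stated assumptions: the flat tool layout is the one used for Realtime session tools, and `send_sms`, `HANDLERS`, and `dispatch` are hypothetical names. The handler only simulates sending.

```python
import json
from typing import Any, Callable, Dict

# Tool description offered to the model (assumed flat Realtime-style layout).
SEND_SMS_TOOL = {
    "type": "function",
    "name": "send_sms",
    "description": "Send a text message to the customer during the call.",
    "parameters": {
        "type": "object",
        "properties": {
            "to": {"type": "string"},
            "body": {"type": "string"},
        },
        "required": ["to", "body"],
    },
}


def send_sms(to: str, body: str) -> Dict[str, Any]:
    # Stand-in handler: a real one would call the SMS provider here.
    return {"status": "queued", "to": to, "chars": len(body)}


HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {"send_sms": send_sms}


def dispatch(name: str, arguments_json: str) -> Dict[str, Any]:
    """Run the handler the model requested and return a JSON-safe result."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown function: {name}"}
    try:
        return handler(**json.loads(arguments_json))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or missing/unexpected arguments from the model.
        return {"error": str(exc)}


result = dispatch("send_sms", '{"to": "+15550100", "body": "On the way"}')
```

Returning errors as data instead of raising keeps the call alive, so the model can apologize or retry rather than dropping the conversation.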

VAD (Voice Activity Detection)

Definition: Technology that detects when a person starts and stops speaking.

Purpose: Enables natural conversation flow without explicit turn-taking signals.

In This System:

  • Automatically detects when customer finishes speaking
  • AI knows when to respond without delays
  • Supports natural interruptions
  • Part of OpenAI Realtime API
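
To make the idea concrete, here is a toy energy-threshold detector. It is not how the OpenAI Realtime API implements VAD (the source does not describe that); it only illustrates the general technique of declaring end of speech after a run of quiet frames. The threshold, the `hangover` count, and the function names are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def find_end_of_speech(
    frames: List[Sequence[int]],
    threshold: float = 1_000_000.0,
    hangover: int = 3,
) -> Optional[int]:
    """Return the index of the frame where the speaker is judged finished.

    Speech starts at the first frame at or above the threshold and ends once
    `hangover` consecutive frames fall below it.
    """
    speaking = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_energy(frame) >= threshold:
            speaking = True
            silent_run = 0
        elif speaking:
            silent_run += 1
            if silent_run >= hangover:
                return index
    return None


loud = [2000] * 160   # energy 4,000,000: counted as speech
quiet = [10] * 160    # energy 100: counted as silence
end_index = find_end_of_speech(
    [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, quiet]
)
```

The brief pause at index 3 does not end the turn because speech resumes before three quiet frames accumulate, which is the behavior that allows natural pauses mid-sentence.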

Communication Orchestrator

Definition: Central agent that manages multi-step workflows across channels.

Purpose: Coordinates timing, sequencing, and routing of communications.

Responsibilities:

  • Trigger parallel multi-channel outreach (voicemail + SMS + email)
  • Manage workflow timing and retry logic
  • Route responses to appropriate channel agents
  • Track workflow completion
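
The first two responsibilities, parallel outreach and retry logic, can be sketched with `asyncio`. This is a simplified illustration, not the orchestrator's actual code: `with_retry`, `parallel_outreach`, and `make_flaky_sender` are hypothetical helpers, and the senders simulate failures instead of calling real channels.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Sender = Callable[[], Awaitable[str]]


async def with_retry(send: Sender, attempts: int = 3, delay: float = 0.0) -> str:
    """Call `send` up to `attempts` times, backing off linearly between tries."""
    last_error: Exception | None = None
    for attempt in range(1, attempts + 1):
        try:
            return await send()
        except Exception as exc:  # sketch only: real code would narrow this
            last_error = exc
            if attempt < attempts:
                await asyncio.sleep(delay * attempt)
    return f"failed: {last_error}"


async def parallel_outreach(channels: Dict[str, Sender]) -> Dict[str, str]:
    """Start every channel at once and collect one outcome per channel."""
    results = await asyncio.gather(*(with_retry(s) for s in channels.values()))
    return dict(zip(channels.keys(), results))


def make_flaky_sender(label: str, failures: int) -> Sender:
    """Simulated sender that fails `failures` times before succeeding."""
    remaining = {"count": failures}

    async def send() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ConnectionError("temporary failure")
        return f"{label} sent"

    return send


outcome = asyncio.run(
    parallel_outreach(
        {
            "voicemail": make_flaky_sender("voicemail", 0),
            "sms": make_flaky_sender("sms", 2),
            "email": make_flaky_sender("email", 5),
        }
    )
)
```

Here the SMS succeeds on its third attempt while the email exhausts its retries, and the per-channel results give the orchestrator what it needs to track workflow completion.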

Communication State

Definition: Complete context of all customer interactions across all channels.

Purpose: Ensures seamless experience regardless of which channel customer uses.

Includes:

  • Full timeline of all contact attempts
  • Customer preferences and sentiment
  • Appointment/service details
  • Next planned actions
  • Real-time updates across all agents
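
One way to picture this state is as a single record per customer and appointment that every agent appends to. The dataclasses below are a hypothetical shape derived from the bullets above; the field names are assumptions and do not reflect the actual MongoDB schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ContactAttempt:
    channel: str   # e.g. "voice", "sms", "email"
    outcome: str   # e.g. "voicemail_left", "confirmed"
    at: datetime


@dataclass
class CommunicationState:
    customer_id: str
    appointment_id: str
    preferred_channel: Optional[str] = None
    sentiment: Optional[str] = None
    next_action: Optional[str] = None
    timeline: List[ContactAttempt] = field(default_factory=list)

    def record(
        self,
        channel: str,
        outcome: str,
        next_action: Optional[str] = None,
        at: Optional[datetime] = None,
    ) -> None:
        """Append a contact attempt and update the next planned action."""
        when = at or datetime.now(timezone.utc)
        self.timeline.append(ContactAttempt(channel, outcome, when))
        self.next_action = next_action

    def last_channel(self) -> Optional[str]:
        """Channel of the most recent attempt, if any."""
        return self.timeline[-1].channel if self.timeline else None


state = CommunicationState(customer_id="cust-1", appointment_id="appt-1")
state.record("voice", "voicemail_left", next_action="send_sms")
state.record("sms", "confirmed")
```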

Microsoft Graph API

Definition: Unified API for accessing Microsoft 365 services including Outlook email.

Purpose: Send and receive emails through Microsoft/Outlook infrastructure.

In This System:

  • Send appointment confirmations and reminders
  • Attach service agreements and documents
  • Process email replies from customers
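
For the first bullet, Microsoft Graph accepts outgoing mail as a JSON body posted to a `sendMail` endpoint. The sketch below builds only that body. Authentication and the HTTP call are omitted, and the function name, address, and message text are placeholders.

```python
import json
from typing import Any, Dict


def build_send_mail_payload(
    to_address: str, subject: str, body_text: str
) -> Dict[str, Any]:
    """Build the JSON body for a Microsoft Graph sendMail request."""
    return {
        "message": {
            "subject": subject,
            "body": {"contentType": "Text", "content": body_text},
            "toRecipients": [{"emailAddress": {"address": to_address}}],
        },
        "saveToSentItems": True,
    }


payload = build_send_mail_payload(
    "customer@example.com",
    "Your appointment is confirmed",
    "Your technician is scheduled to arrive during your service window.",
)
request_body = json.dumps(payload)
```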

AWS CloudWatch

Definition: Amazon Web Services monitoring and observability platform.

Purpose: Track system performance, logs, and metrics in real-time.

In This System:

  • Monitor ECS/Fargate task performance (the FastAPI compute layer)
  • Track API response times and errors
  • Centralized logging for all services
  • Alerts for system issues

KAIros Pulse

Definition: Custom dashboard for visualizing system metrics and performance.

Purpose: Provide real-time visibility into AI agent operations and customer interactions.

Displays:

  • Call volume and success rates
  • Channel usage patterns
  • Customer sentiment trends
  • System health metrics

Media Streams

Definition: Twilio feature for streaming live audio from phone calls.

Purpose: Enable real-time audio processing and AI voice interactions.

In This System:

  • Stream call audio to OpenAI Realtime API via WebSocket
  • Bidirectional audio flow (customer ↔ AI)
  • μ-law or linear16 audio encoding
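
The relay between the two services amounts to unwrapping each Twilio `media` message and re-wrapping its base64 audio in an `input_audio_buffer.append` event for the Realtime API. The sketch below shows that translation for one message. It assumes both sides use the same μ-law encoding so no transcoding is needed, and `twilio_to_openai` is a hypothetical helper name.

```python
import base64
import json
from typing import Optional


def twilio_to_openai(raw_message: str) -> Optional[str]:
    """Translate one Twilio Media Streams message into a Realtime API event.

    Returns None for non-audio events such as "start" or "stop".
    """
    message = json.loads(raw_message)
    if message.get("event") != "media":
        return None
    payload = message["media"]["payload"]
    # Reject malformed base64 early instead of forwarding it upstream.
    base64.b64decode(payload, validate=True)
    return json.dumps({"type": "input_audio_buffer.append", "audio": payload})


# 160 bytes is 20 ms of 8 kHz mu-law audio; 0xFF encodes silence.
sample_audio = base64.b64encode(b"\xff" * 160).decode("ascii")
sample_message = json.dumps(
    {"event": "media", "streamSid": "MZ-example", "media": {"payload": sample_audio}}
)
forwarded = twilio_to_openai(sample_message)
```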