DeepLearningAI
#Voice AI #Vocal Bridge #AI Agents

Vocal Bridge: Build Voice AI UIs and Agents in Minutes

Vocal Bridge is a fully managed Voice-AI platform that simplifies the integration of voice interfaces into applications and AI agents. It allows developers to add voice capabilities to their apps, empower AI agents with voice, and enable agents to use voice as a tool for real-world interactions, significantly reducing development time.

5 min read · AI Guide

Introduction

Vocal Bridge is a fully managed Voice-AI platform that enables developers to add voice interfaces to their applications and AI agents. It simplifies the complex process of integrating speech-to-text, text-to-speech, and advanced dialogue management, allowing for rapid development of voice-enabled products.

Configuration Checklist

| Element | Version / Link |
| --- | --- |
| Language / Runtime | Node.js (React), Python |
| Main libraries | @vocalbridgeai/sdk, @vocalbridgeai/bridges/react, vocal-bridge (Python CLI) |
| Required APIs | Vocal Bridge API Key |
| Keys / credentials needed | Vocal Bridge API Key (for minting session tokens) |

Step-by-Step Guide

Step 0 — Install Vocal Bridge SDKs

Before you begin, install the necessary Vocal Bridge SDKs for your React application and the Python CLI for agent configuration.

npm install @vocalbridgeai/sdk @vocalbridgeai/bridges/react
pip install vocal-bridge

Step 1 — Wrap your app

Integrate the VocalBridgeProvider into your React application's root component. This provider establishes the connection to the Vocal Bridge platform and enables voice capabilities throughout your app. You'll need to point it to a server-side API endpoint that can mint Vocal Bridge tokens using your API key.

import { VocalBridgeProvider } from '@vocalbridgeai/bridges/react';

// The token-minting endpoint is not shown here. Typically a server-side
// route such as '/api/token' uses your API key to return a session token.
// Never expose your API key to the frontend.

export default function App({ children }) {
  return (
    <VocalBridgeProvider
      options={{ token: "TopSecretToken" }} // Replace with actual token fetching logic
    >
      {children}
    </VocalBridgeProvider>
  );
}
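
The hard-coded token above is only a placeholder. One way to replace it is a small helper that asks your own backend for a session token. This is a minimal sketch, not part of the Vocal Bridge SDK: the '/api/token' path comes from the comment above, while the `{ token }` response shape and the `fetchSessionToken` name are assumptions.

```javascript
// Hypothetical helper: request a session token from your own backend.
// Assumes the endpoint answers with JSON shaped like { "token": "..." }.
async function fetchSessionToken(endpoint = '/api/token', fetchImpl = fetch) {
  const response = await fetchImpl(endpoint, { method: 'POST' });
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const data = await response.json();
  if (typeof data.token !== 'string' || data.token.length === 0) {
    throw new Error('Token endpoint returned no token');
  }
  return data.token;
}
```

In a component, you could call this helper inside a `useEffect`, keep the result in state, and render the provider with `options={{ token }}` once the promise resolves.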

Step 2 — Wire actions both ways

This is the core step for enabling bidirectional communication between your voice agent and your application's UI. Vocal Bridge provides hooks to handle actions initiated by the agent and to send actions from your application back to the agent.

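import { useState } from 'react';
import { useAgent } from '@vocalbridgeai/bridges/react';

// Return a new board with `mark` placed at (row, col); the input is not mutated
function updateBoard(board, row, col, mark) {
  return board.map((cells, r) =>
    cells.map((cell, c) => (r === row && c === col ? mark : cell))
  );
}

// 3x3 Tic-Tac-Toe board, all cells empty
const EMPTY_BOARD = Array.from({ length: 3 }, () => Array(3).fill(null));

export function MyGameComponent() {
  const [board, setBoard] = useState(EMPTY_BOARD);
  const { onAction, sendAction } = useAgent();

  // When the agent decides to play (e.g., in Tic-Tac-Toe)
  onAction('place_mark', (res) => {
    // res.payload contains { row, col } from the agent
    // Update your UI based on the agent's move
    setBoard(prevBoard => updateBoard(prevBoard, res.payload.row, res.payload.col, 'O'));
  });

  // When the user clicks a cell, send this information back to the agent
  const handleUserMove = (row, col) => {
    // Compute the next board once so the UI and the agent see the same state
    const nextBoard = updateBoard(board, row, col, 'X');
    setBoard(nextBoard);
    // Send action to the agent, including the current board state
    sendAction('user_place_mark', { row, col, board: nextBoard });
  };

  return (
    // Your React UI for the game, with onClick handlers calling handleUserMove
    <button onClick={() => handleUserMove(0, 0)}>Top Left</button>
  );
}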
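
To reason about the two directions without a live voice session, it can help to model the flow in plain JavaScript. The sketch below is an illustrative stand-in, not Vocal Bridge's implementation: `createActionBus`, `deliverFromAgent`, and `sent` are hypothetical names, and only the `onAction(name, handler)` and `sendAction(name, payload)` call shapes are taken from the example above.

```javascript
// Hypothetical in-memory stand-in for the onAction/sendAction pair.
function createActionBus() {
  const handlers = new Map(); // action name -> handler registered by the app
  const sent = [];            // actions the app has sent toward the agent

  return {
    sent,
    // App side: register a handler for an agent-to-app action.
    onAction(name, handler) {
      handlers.set(name, handler);
    },
    // App side: record an app-to-agent action.
    sendAction(name, payload) {
      sent.push({ name, payload });
    },
    // Test side: simulate the agent emitting an action.
    // Returns true when a handler was registered for that name.
    deliverFromAgent(name, payload) {
      const handler = handlers.get(name);
      if (!handler) return false;
      handler({ payload });
      return true;
    },
  };
}

// Example: the agent plays (1, 1), then the user plays (0, 0).
const bus = createActionBus();
const moves = [];
bus.onAction('place_mark', (res) => moves.push({ ...res.payload, mark: 'O' }));
bus.deliverFromAgent('place_mark', { row: 1, col: 1 });
bus.sendAction('user_place_mark', { row: 0, col: 0 });
```

A stand-in like this lets you unit-test handlers such as the `place_mark` callback before wiring them to the real hook.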

Step 3 — Describe actions to the agent

To enable the voice agent to understand and interact with your application's actions, you need to provide a declarative JSON schema. This schema tells the agent about the available actions, their direction (agent to app or app to agent), and the expected payload.

// client-side-actions.json
[
  {
    "name": "place_mark",
    "direction": "agent_to_app",
    "description": "Agent decides to place a mark. Payload (row, col)",
    "behavior": "respond"
  },
  {
    "name": "user_place_mark",
    "direction": "app_to_agent",
    "description": "User placed a mark in a cell. Payload (row, col, board)",
    "behavior": "ingest"
  }
]

Configure the agent using the CLI:

vb push prompt --file prompt.md
vb push config set --file client-side-actions.json
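
Because the schema is plain JSON, a quick structural check before pushing can catch typos such as a misspelled direction. This is a minimal sketch that assumes only the four fields and two direction values shown above. `validateActions` is a hypothetical helper, and the platform may accept fields or values not covered here.

```javascript
// Hypothetical pre-push check for the action schema shown above.
const DIRECTIONS = ['agent_to_app', 'app_to_agent'];
const REQUIRED_FIELDS = ['name', 'direction', 'description', 'behavior'];

function validateActions(actions) {
  if (!Array.isArray(actions)) return ['schema must be an array'];
  const errors = [];
  const seenNames = new Set();

  actions.forEach((action, index) => {
    const label = `action[${index}]`;
    if (action === null || typeof action !== 'object') {
      errors.push(`${label}: must be an object`);
      return;
    }
    // Every field in the example is a non-empty string.
    for (const field of REQUIRED_FIELDS) {
      if (typeof action[field] !== 'string' || action[field] === '') {
        errors.push(`${label}: missing string field "${field}"`);
      }
    }
    // Only report an unknown direction when one was actually provided.
    if (typeof action.direction === 'string' && action.direction !== '' &&
        !DIRECTIONS.includes(action.direction)) {
      errors.push(`${label}: unknown direction "${action.direction}"`);
    }
    // Action names identify handlers, so duplicates are flagged.
    if (typeof action.name === 'string' && action.name !== '') {
      if (seenNames.has(action.name)) {
        errors.push(`${label}: duplicate name "${action.name}"`);
      }
      seenNames.add(action.name);
    }
  });
  return errors;
}
```

An empty array means the file passed these checks; it does not guarantee the platform will accept it.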

Give your agent a voice (AI Agent Mode)

To enable your voice agent to intelligently handle conversations and delegate complex queries to your LLM, configure its behavior using the CLI. The agent description helps Vocal Bridge understand when to delegate a query to your LLM versus handling small talk itself.

vb config set \
  --ai-agent.enabled true \
  --ai-agent.description "Your flash-powered expert"

Forward every query to your LLM:
On the client side, use the useAgent hook's onQuery callback to forward user queries to your LLM's endpoint. Vocal Bridge will then decide how to communicate the LLM's response back to the user conversationally.

import { useAgent } from '@vocalbridgeai/bridges/react';

export default function MyAgentComponent() {
  const { onQuery, listen } = useAgent();

  onQuery(async (question) => {
    // Forward the question to your LLM (e.g., Claude, GPT, or a custom model)
    const response = await fetch('/api/your-llm-endpoint', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ question }),
    });
    // Surface HTTP failures instead of parsing an error page as JSON
    if (!response.ok) {
      throw new Error(`LLM endpoint returned status ${response.status}`);
    }
    const data = await response.json();
    // Return the LLM's text response to Vocal Bridge
    return data.answer;
  });

  return (
    // Your chat interface, with a button to trigger listening
    <button onClick={() => listen()}>Speak</button>
  );
}
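
If the request to your LLM endpoint fails, the callback's promise rejects, and this guide does not say how Vocal Bridge treats a rejected `onQuery` callback. One defensive option is to always resolve with a spoken fallback. The sketch below assumes the `{ answer }` response shape from the example above; `createQueryHandler` and its options are hypothetical names.

```javascript
// Hypothetical factory for an onQuery callback that never rejects.
function createQueryHandler({
  endpoint,
  fetchImpl = fetch,
  fallback = 'Sorry, I could not get an answer right now.',
}) {
  return async function handleQuery(question) {
    try {
      const response = await fetchImpl(endpoint, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ question }),
      });
      if (!response.ok) return fallback;
      const data = await response.json();
      // Only hand back a real string; anything else becomes the fallback.
      return typeof data.answer === 'string' ? data.answer : fallback;
    } catch (error) {
      // Network errors and invalid JSON both end up here.
      return fallback;
    }
  };
}
```

It would be registered as `onQuery(createQueryHandler({ endpoint: '/api/your-llm-endpoint' }))`.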

Voice as a Function Call (Outbound Calling)

Vocal Bridge allows your AI agent to make real phone calls by defining a tool schema and configuring an outbound agent. This enables truly multimodal agents that can interact with the real world.

Declare the tool:
Define a JSON schema for the make_phone_call tool, specifying its parameters: phone_number (required), plus the optional name and purpose.

// tool_schema.js
export const makePhoneCallTool = {
  name: "make_phone_call",
  description: "Place an outbound phone call via Vocal Bridge.",
  input_schema: {
    type: "object",
    properties: {
      phone_number: { type: "string", description: "E.164 format" },
      name: { type: "string", description: "Name of the person to call" },
      purpose: { type: "string", description: "Reason for the call" }
    },
    required: ["phone_number"]
  }
};
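
The schema asks for an E.164 phone number, but a model can still produce a malformed value. A check before any call is placed is a cheap safeguard. This is a minimal sketch: `validateCallInput` is a hypothetical helper, and the pattern encodes only the general E.164 shape (a plus sign followed by at most 15 digits), not per-country numbering rules.

```javascript
// E.164 shape: a leading "+", then 2 to 15 digits, the first of which is not 0.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Hypothetical check of the tool input against the schema above.
// Returns a list of problems; an empty list means the input looks usable.
function validateCallInput(input) {
  const errors = [];
  if (typeof input.phone_number !== 'string' || !E164_PATTERN.test(input.phone_number)) {
    errors.push('phone_number must be in E.164 format, e.g. +14155550123');
  }
  // name and purpose are optional, but must be strings when present.
  for (const field of ['name', 'purpose']) {
    if (input[field] !== undefined && typeof input[field] !== 'string') {
      errors.push(`${field} must be a string when provided`);
    }
  }
  return errors;
}
```

Returning the errors to the LLM as the tool result gives it a chance to correct the number instead of dialing a bad one.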

On tool-use, shell out:
When the LLM decides to use the make_phone_call tool, your tool-use handler shells out to the Vocal Bridge command-line interface (CLI) to place the call.

// tool_use_handler.js
import { execFile } from 'child_process';

export async function runMakePhoneCallTool(client, toolCall) {
  const { phone_number, name, purpose } = toolCall.input;
  // Pass arguments as an array so model-supplied text is never parsed by a shell
  const args = ['call', '--phone-number', phone_number];
  // name and purpose are optional in the tool schema
  if (name) args.push('--name', name);
  if (purpose) args.push('--purpose', purpose);
  return new Promise((resolve, reject) => {
    // Execute the VB CLI command to make the call
    execFile('vb', args, (error, stdout, stderr) => {
      if (error) {
        console.error(`vb call failed: ${error}`);
        return reject(error);
      }
      console.log(`stdout: ${stdout}`);
      if (stderr) console.error(`stderr: ${stderr}`);
      resolve({ status: "Call initiated" });
    });
  });
}

Configure the outbound agent:
Set up your outbound agent's greeting and ensure terms of service are accepted for compliance.

vb config set \
  --outbound.enabled true \
  --outbound.accept-tos true \
  --outbound.greeting "Hi, this is an AI calling from home."

⚠️ Common Mistakes & Pitfalls

  • Manual Voice Stack Wiring: Attempting to manually integrate and manage all components of a voice AI stack (ASR, TTS, VAD, RTC, etc.) leads to significant development overhead and complexity. Vocal Bridge abstracts this complexity into a single platform.
  • LLM Overload with Conversation State: Relying on the LLM to manage the entire conversation state, including small talk and context, can crowd its context window and distract it from its primary task. Vocal Bridge's agent mode intelligently handles conversational flow, delegating only necessary queries to the LLM.
  • Exposing API Keys: Directly embedding your Vocal Bridge API key in client-side code is a security risk. Always use a server-side endpoint to mint session tokens, keeping your API key secure.

Glossary

  • Voice Activity Detection (VAD): A technique used to detect the presence or absence of human speech in an audio signal.
  • Prosody: The rhythm, stress, and intonation of speech, which convey meaning and emotion beyond the literal words.
  • Barge-in Handling: The ability of a voice AI system to detect when a user interrupts its speech and respond appropriately, creating a more natural conversational flow.
  • LLM (Large Language Model): A type of artificial intelligence model trained on vast amounts of text data to understand, generate, and respond to human language.

Key Takeaways

  • Vocal Bridge provides a fully managed platform to integrate voice into applications and AI agents with minimal code.
  • It supports three main interfaces: voice for your app, voice for your agent, and voice as a tool (function calls).
  • The platform handles the underlying voice AI infrastructure, including speech-to-text (STT/ASR), text-to-speech (TTS), voice activity detection (VAD), and real-time communication (RTC).
  • Declarative JSON schemas are used to define and configure client actions, simplifying bidirectional communication between the agent and the application UI.
  • Vocal Bridge agents can intelligently decide when to handle small talk internally and when to delegate complex queries to an external LLM, optimizing context window usage.
  • The "Voice as a Tool" feature allows AI agents to perform real-world actions like making phone calls, acting as a truly multimodal assistant.
  • The Python CLI and React SDK facilitate quick setup and configuration, drastically reducing development time from months to minutes.

Resources

  • Vocal Bridge Website: vocalbridgeai.com
  • NPM Package: npm install @vocalbridgeai/sdk @vocalbridgeai/bridges/react
  • PyPI Package: pip install vocal-bridge
  • Promo Code: AIDEV26 (for $20/month + 5000 minutes free for the first three months, expires Fri, Aug 1, 2026, 12:09:49 PM PDT)
  • Support: support@vocalbridgeai.com