# ElevenLabs Voice Interface UI for React: Build It

> By Lawrence Arya, Founder & CEO of VP0. Published 2026-06-03, updated 2026-06-04. 6 min read.
> Source: https://vp0.com/blogs/elevenlabs-voice-interface-ui-react

A voice interface is mostly state (idle, listening, thinking, speaking), and the UI has to make each one obvious or the whole thing feels broken.

**TL;DR.** The fastest free way to build an ElevenLabs voice interface in React is to start from a finished design on VP0, generate the mic button, state indicator and transcript UI in Cursor, then wire audio through a server that holds your ElevenLabs API key. VP0 is the free, AI-readable design library that AI builders copy from, so the AI nails the layout and states, and you focus on the streaming, the mic permissions and keeping the key off the client.

[VP0](https://vp0.com) supplies the concrete design target, and Cursor or Claude Code generates the mic button, state indicator and transcript UI from it. That leaves your attention on what matters: the streaming, the microphone permissions, and keeping the [ElevenLabs](https://elevenlabs.io/docs) key off the client.

## A voice UI is a state machine

The interaction has a handful of states: idle and ready, listening while capturing the mic, thinking while processing, and speaking while playing audio back, plus an error state. Each must be visually distinct, because a voice interface with no visible state feels broken. This is the kind of clear, stateful layout an AI editor builds well from a design, much like the player covered in the [AI-generated audio player for React](/blogs/ai-audio-player-component-react/), where the visible state drives trust.
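The states above can be sketched as a small reducer. This is a minimal illustration rather than a prescribed implementation: the event names (`MIC_STARTED`, `AUDIO_ENDED` and so on) are hypothetical and would map to whatever your audio pipeline actually emits.

```typescript
// The five states named in this section.
type VoiceState = "idle" | "listening" | "thinking" | "speaking" | "error";

// Hypothetical pipeline events; rename to match your own pipeline.
type VoiceEvent =
  | { type: "MIC_STARTED" }
  | { type: "MIC_STOPPED" }
  | { type: "AUDIO_STARTED" }
  | { type: "AUDIO_ENDED" }
  | { type: "FAILED" }
  | { type: "RESET" };

// Pure transition function: events that make no sense in the current
// state are ignored, so the indicator never shows an impossible state.
function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case "FAILED":
      return "error"; // any state can fail
    case "RESET":
      return "idle"; // recover from error or cancel
    case "MIC_STARTED":
      return state === "idle" ? "listening" : state;
    case "MIC_STOPPED":
      return state === "listening" ? "thinking" : state;
    case "AUDIO_STARTED":
      return state === "thinking" ? "speaking" : state;
    case "AUDIO_ENDED":
      return state === "speaking" ? "idle" : state;
    default:
      return state;
  }
}
```

In a component, this could back `useReducer(nextVoiceState, "idle")`, with the state indicator rendering directly from the returned value.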

## Keep the API key on the server

This is the rule that protects you. Never ship the ElevenLabs API key in client code, where it can be extracted and abused. Route requests through your own backend: the app sends audio or text, your server calls ElevenLabs with the key, and streams the result back. The browser owns the mic and playback through the [MediaRecorder and getUserMedia APIs](https://developer.mozilla.org/en-US/docs/Web/API/MediaStream_Recording_API), which are supported in over 97% of browsers in use per [caniuse](https://caniuse.com/mediarecorder). Your server owns the credential.
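As a rough sketch of the server side, the helper below builds the outbound request that only your backend should ever construct. It assumes the streaming text-to-speech endpoint and the `xi-api-key` header as described in the ElevenLabs docs; confirm both against the current API reference before relying on them. The function name `buildTtsRequest` and the `TtsRequest` shape are hypothetical.

```typescript
// Hypothetical shape for the outbound call; kept free of DOM/Node types.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Runs on the server only. The apiKey argument should come from server
// configuration (for example an environment variable), never from the client.
function buildTtsRequest(text: string, voiceId: string, apiKey: string): TtsRequest {
  if (!apiKey) {
    throw new Error("Missing ElevenLabs API key on the server");
  }
  return {
    // Assumed endpoint: streaming text-to-speech for a given voice ID.
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey, // assumed auth header name
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text }),
  };
}
```

On the server, the result could be passed to `fetch(req.url, req)` and the response body piped straight through to the app, so the browser only ever talks to your own route.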

## Map the voice UI to the work

| Part | Generate from design | Wire yourself |
|---|---|---|
| Mic button | Button with active state | getUserMedia, recording indicator |
| State indicator | Idle / listening / thinking / speaking | Drive from real pipeline events |
| Transcript | Message list | Stream partial text, scroll behavior |
| Audio playback | Player controls | Stream from your server, not directly from ElevenLabs |
| Permissions | Prompt UI | In-context request, consent each session |
| API call | None (server) | Backend holds the ElevenLabs key |

## A worked example

Open VP0, pick a voice or chat design, and paste it into Cursor. Ask for a typed React component with a mic button, an animated state indicator and a transcript list, using your tokens. Wire the mic with getUserMedia and show a clear recording indicator the moment capture starts. Send the captured audio to your backend, which calls ElevenLabs with the server-side key and streams audio back for playback. Drive the state indicator from real pipeline events so idle, listening, thinking and speaking are always accurate. Add an error state and an obvious stop control. The UI came from the design; your work was the streaming, the permissions and the key security. For a node-based agent flow behind it, see the [React Flow node editor AI generator](/blogs/react-flow-node-editor-ai-generator/).
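The mic step of this walkthrough might look like the sketch below. It is one possible arrangement, not the only one: the browser objects are passed in through a hypothetical `MicDeps` interface so the capture logic stays testable, and the names `startCapture`, `onChunk` and `onRecordingChange` are illustrative. The commented wiring uses the real `navigator.mediaDevices.getUserMedia` and `MediaRecorder` browser APIs.

```typescript
// Minimal shapes of the browser objects involved, declared locally so the
// sketch does not depend on DOM typings.
interface StreamLike {
  getTracks(): Array<{ stop(): void }>;
}
interface RecorderLike {
  start(timesliceMs?: number): void;
  stop(): void;
}
interface MicDeps {
  getStream(): Promise<StreamLike>;
  createRecorder(stream: StreamLike, onChunk: (chunk: unknown) => void): RecorderLike;
}

// Browser wiring, as an assumption about how you would supply MicDeps:
//   getStream: () => navigator.mediaDevices.getUserMedia({ audio: true })
//   createRecorder: (stream, onChunk) => {
//     const recorder = new MediaRecorder(stream);
//     recorder.ondataavailable = (event) => onChunk(event.data);
//     return recorder;
//   }

// Starts capture, flips the recording indicator on as soon as recording
// begins, and returns a stop function for the "obvious stop control".
async function startCapture(
  deps: MicDeps,
  onChunk: (chunk: unknown) => void,
  onRecordingChange: (recording: boolean) => void,
): Promise<() => void> {
  const stream = await deps.getStream(); // rejects if permission is denied
  const recorder = deps.createRecorder(stream, onChunk);
  recorder.start(250); // ask for a chunk roughly every 250 ms
  onRecordingChange(true);

  let stopped = false;
  return () => {
    if (stopped) return; // stopping twice is a no-op
    stopped = true;
    recorder.stop();
    stream.getTracks().forEach((track) => track.stop()); // release the mic
    onRecordingChange(false);
  };
}
```

Because a denied permission surfaces as a rejected promise, the caller can catch it and move the UI into its error state instead of leaving the indicator stuck.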

## Common mistakes

The first mistake is putting the ElevenLabs key in the client, where it leaks. The second is starting to listen with no visible recording indicator, which breaks trust and consent. The third is hiding the current state, so the user cannot tell if the app is listening or thinking. The fourth is buffering the entire audio response instead of streaming it, which adds latency. The fifth is ignoring the error state, so a failed call looks like a frozen app.

## Key takeaways

- Start free from a VP0 design so the AI nails the voice UI and its states.
- A voice interface is a state machine: make idle, listening, thinking, speaking and error distinct.
- Never ship the ElevenLabs key in the client; route through a server that holds it.
- Use getUserMedia and MediaRecorder for the mic, with a clear recording indicator and consent.
- Stream audio back rather than buffering, and always show an error state.

**Keep reading:** for a 3D interface see [the React Three Fiber AI 3D generator](/blogs/react-three-fiber-ai-3d-generator/), and for the generation workflow see [AI for generating React code](/blogs/ai-for-generating-react-code/).

## FAQ

### How do you build an ElevenLabs voice interface UI in React?

Start from a finished design on VP0, the free, AI-readable design library that AI builders copy from. Generate the mic button, the state indicator (idle, listening, thinking, speaking) and the transcript in Cursor or Claude Code, then stream audio through your own server that holds the ElevenLabs API key. The AI builds the UI; you own the streaming, the mic permissions and the key security.

### Should the ElevenLabs API key be in the React app?

No. Never put the ElevenLabs API key in client code, where it can be extracted and abused. Route requests through your own backend that holds the key, calls ElevenLabs, and streams audio back to the app. The browser handles the mic and playback; your server handles the credential and the API call.

### What states does a voice interface UI need?

At minimum: idle (ready), listening (capturing mic), thinking (processing), and speaking (playing audio back), plus an error state. Make each visually distinct, because a voice UI with no visible state feels broken. A clear indicator and an obvious mic control are what make the interaction feel responsive.

### How do I handle the microphone in a voice UI?

Request microphone permission in context, show a clear recording indicator when capturing, and never start listening silently. Use the browser MediaRecorder and getUserMedia APIs, which are supported in over 97% of browsers in use. Always give the user an obvious way to stop, and honor the user's consent choice in every session.

### Can AI generate a voice interface component?

It generates the UI well: the mic button, the animated state indicator, the transcript list and the layout, especially from a design. Treat the audio pipeline as your responsibility: streaming, permissions, error handling and keeping the API key server-side. The AI gives you a strong visual first draft you then wire up.

---
*Published on the [VP0 Journal](https://vp0.com/blogs). Free to read, index and cite with attribution.*