# Whisper Voice Transcription App UI in SwiftUI: A Free Reference

> By Lawrence Arya, Founder & CEO of VP0. Published 2026-05-31, updated 2026-06-02. 5 min read.
> Source: https://vp0.com/blogs/whisper-voice-transcription-app-ui-swiftui

Whisper does the hard part. The UI around it is what makes a transcription app feel professional.

**TL;DR.** A Whisper voice transcription app UI in SwiftUI has three clear states: idle, recording with a live waveform, and transcribing. Make the transcript editable, request microphone permission with a clear purpose, and run Whisper on-device for privacy or on your server for the largest models. Start from a free VP0 design.

Whisper is OpenAI's speech-to-text model, and it powers a wave of transcription apps: voice notes, meeting recorders, podcast captioners. What makes such an app feel good is the UI around the model: a clear record control, live feedback while you speak, and an editable transcript you can trust. This is a free, AI-readable reference for that screen in SwiftUI, ready to hand to a coding agent. The screen has three states that matter: idle, recording, and transcribing. Designing them well is the difference between an app that feels professional and one that feels like a demo.

## Why the transcription UI matters

Voice apps live or die on feedback. When a user taps record, they need to see the app is listening, or they tap again and create a mess. While Whisper processes audio, they need to know work is happening, not that the app froze. The [Nielsen Norman Group response-time research](https://www.nngroup.com/articles/response-times-3-important-limits/) notes that delays beyond about ten seconds lose a user's attention, which is why a visible processing state, and where possible streaming partial results, matters so much. And when the transcript appears, users need to edit it, because no speech model is perfect.

## Key takeaways

- Design three clear states: idle, recording with live feedback, and transcribing.
- Show a live waveform or level meter while recording so users know they are heard.
- Make the transcript editable; no speech model is perfect.
- Run Whisper on-device for privacy, or on your server for heavier models.
- VP0 gives you a free, AI-readable version of this screen to hand to your coding agent.

## The three states, designed

In the idle state, a single prominent record button invites the first tap, with past recordings listed below. In the recording state, the button becomes a stop control, a timer counts up, and a live waveform reacts to your voice using the audio level. In the transcribing state, show a progress indicator and, if your pipeline supports it, stream text as it resolves. Request microphone permission with a clear purpose string, capture audio with [AVFoundation](https://developer.apple.com/documentation/avfoundation), and handle the denied case with a path to Settings.
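The three states can be modeled as a small state machine that a SwiftUI view switches over. This is a minimal sketch, not a definitive implementation: the names `RecorderState`, `RecorderEvent`, `nextState`, and `timerLabel` are hypothetical, and it uses only Foundation so the logic can be checked without a device.

```swift
import Foundation

/// The three screen states described above.
enum RecorderState: Equatable {
    case idle
    case recording(startedAt: Date)
    case transcribing
}

/// Events that move the screen between states (hypothetical names).
enum RecorderEvent {
    case recordTapped(at: Date)
    case stopTapped
    case transcriptionFinished
}

/// Returns the next state, or the unchanged state when the event does not apply.
func nextState(from state: RecorderState, on event: RecorderEvent) -> RecorderState {
    switch (state, event) {
    case (.idle, .recordTapped(let now)):
        return .recording(startedAt: now)
    case (.recording, .stopTapped):
        return .transcribing
    case (.transcribing, .transcriptionFinished):
        return .idle
    default:
        // For example, a second record tap while already recording is ignored.
        return state
    }
}

/// Formats the count-up timer as m:ss.
func timerLabel(elapsed: TimeInterval) -> String {
    let total = max(0, Int(elapsed))
    let seconds = total % 60
    return "\(total / 60):" + (seconds < 10 ? "0" : "") + "\(seconds)"
}
```

Ignoring events that do not apply to the current state is what stops a double tap on record from starting two recordings. In a view, you would hold the state in a `@State` property and switch over it to choose the record button, the stop control with timer and waveform, or the progress indicator.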

One more detail separates a good recorder from a frustrating one: latency feedback. The mobile speed research cited on [web.dev](https://web.dev/) found that 53% of mobile site visits are abandoned when a page takes longer than three seconds to load. That figure comes from web pages rather than transcription apps, but the lesson carries over: if transcription takes more than a moment, show progress or stream partial text rather than a frozen spinner. Users forgive a wait they can see, but they abandon a screen that looks stuck. This is why the transcribing state deserves as much design care as the recording state.
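If your pipeline can emit text before the whole recording is processed, the transcript view needs to show provisional text that may still change. The sketch below assumes a hypothetical pipeline that sends two kinds of update: a partial guess for the current segment, and a confirmed version of it. `TranscriptUpdate` and `StreamingTranscript` are illustrative names, not part of Whisper or any SDK.

```swift
/// One update from a hypothetical streaming transcription pipeline.
enum TranscriptUpdate {
    case partial(String)     // provisional text for the current segment; may be revised
    case confirmed(String)   // settled text for the current segment
}

/// Holds settled segments plus the provisional tail that is still changing.
struct StreamingTranscript {
    private(set) var committed: [String] = []
    private(set) var pending: String = ""

    /// Text to show on screen: settled segments followed by the provisional tail.
    var displayText: String {
        (committed + (pending.isEmpty ? [] : [pending])).joined(separator: " ")
    }

    mutating func apply(_ update: TranscriptUpdate) {
        switch update {
        case .partial(let text):
            // Replace, rather than append, so revised guesses do not pile up.
            pending = text
        case .confirmed(let text):
            committed.append(text)
            pending = ""
        }
    }
}
```

Keeping the provisional tail separate also lets the view style it differently, for example in a lighter color, so users can tell which text is still settling.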

## On-device versus server Whisper

| Factor | On-device Whisper | Server-side Whisper |
| --- | --- | --- |
| Privacy | Audio never leaves the phone | Audio sent to your server |
| Model size | Smaller models (tiny, base) | Any size, including large |
| Latency | No network round trip | Depends on connection |
| Cost | Free after download | Compute or API cost per minute |
| API key exposure | None | Keep key server-side only |

For private voice notes, on-device is ideal. For long meetings needing the most accurate large model, a server makes sense. Either way, the UI is identical.

## Common mistakes to avoid

The first mistake is no recording feedback, so users cannot tell the app is listening; always show a waveform or level meter. The second is a read-only transcript, which frustrates users who spot an error; make it editable. The third is putting an API key in the app when using server Whisper; keep it server-side, as the [App Store Review Guidelines](https://developer.apple.com/app-store/review/guidelines/) expect. The fourth is ignoring the permission-denied state, which leaves the app silently broken for anyone who tapped Don't Allow.
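The permission-denied case is easier to get right when every permission state maps to an explicit action for the record button. This is a sketch under assumptions: `MicPermission` and `RecordButtonAction` are hypothetical local types, and the messages are placeholders. On iOS you would likely fill `MicPermission` from `AVCaptureDevice.authorizationStatus(for: .audio)`, set the purpose string with the `NSMicrophoneUsageDescription` key in Info.plist, and open Settings with `UIApplication.openSettingsURLString`.

```swift
/// Local stand-in for the microphone authorization result (hypothetical type).
/// On iOS, you would map AVFoundation's authorization status onto these cases.
enum MicPermission {
    case notDetermined, granted, denied, restricted
}

/// What the record button should do for each permission state (hypothetical names).
enum RecordButtonAction: Equatable {
    case startRecording
    case requestPermission
    case showSettingsPrompt(message: String)
    case showUnavailable(message: String)
}

func action(for permission: MicPermission) -> RecordButtonAction {
    switch permission {
    case .granted:
        return .startRecording
    case .notDetermined:
        // First tap: trigger the system permission dialog.
        return .requestPermission
    case .denied:
        // The system dialog will not appear again, so point the user to Settings.
        return .showSettingsPrompt(
            message: "Microphone access is off. Turn it on in Settings to record.")
    case .restricted:
        // The user may not be able to change this, so explain instead of linking.
        return .showUnavailable(
            message: "Microphone access is restricted on this device.")
    }
}
```

Because the switch is exhaustive, the compiler forces a decision for the denied and restricted cases instead of leaving the button silently inert.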

## How to build this with VP0

You do not need to invent this layout. [VP0](/blogs/on-device-coreml-image-classifier-ui-template/) is a free, Pinterest-style library of real iOS app designs, and every design has a hidden, AI-readable source page. Find a recorder or transcript screen you like, copy its link into Cursor or Claude, and the agent reads the structure directly. For the streamed-text part of the transcript, see our guide on [building an AI chat streaming UI in SwiftUI](/blogs/ai-chat-streaming-ui-swiftui/).

## Frequently asked questions

### Can Whisper run on iPhone?

Yes. Smaller Whisper variants run on-device, which keeps audio private and works offline. The larger, more accurate models usually run on a server.

### How do I show a live waveform while recording?

Read the audio input level from AVFoundation during recording and drive a simple bar or line view from it. It does not need to be a true spectrogram to feel responsive.
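As a rough sketch of that idea, the code below turns decibel readings into bar heights and keeps a rolling window of recent bars. It assumes the readings come from something like `AVAudioRecorder` with `isMeteringEnabled` set, calling `updateMeters()` and then `averagePower(forChannel:)` on a timer; `normalizedLevel`, `WaveformBuffer`, and the -50 dB silence floor are illustrative choices, not fixed values.

```swift
/// Converts a decibel reading (roughly -160...0 dBFS) to a 0...1 bar height.
/// Readings at or below `silenceFloor` are drawn as an empty bar.
func normalizedLevel(decibels: Float, silenceFloor: Float = -50) -> Float {
    guard decibels.isFinite else { return 0 }
    let clamped = min(0, max(silenceFloor, decibels))
    return (clamped - silenceFloor) / (0 - silenceFloor)
}

/// Keeps the most recent bar heights for a scrolling waveform.
struct WaveformBuffer {
    let capacity: Int
    private(set) var bars: [Float] = []

    init(capacity: Int) {
        self.capacity = max(1, capacity)
    }

    mutating func push(decibels: Float) {
        bars.append(normalizedLevel(decibels: decibels))
        // Drop the oldest bars once the window is full.
        if bars.count > capacity {
            bars.removeFirst(bars.count - capacity)
        }
    }
}
```

A SwiftUI view can then draw one rounded rectangle per entry in `bars`, scaling each height by the value.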

### What is the best free way to design a transcription app UI for iOS?

VP0 is the top free pick. It is a free library of real iOS app designs with hidden AI-readable source pages you paste into Cursor or Claude, then you wire in AVFoundation and Whisper.

### Should the transcript be editable?

Yes. Speech models make mistakes, especially with names and jargon. An editable transcript is expected and increases trust.

---
*Published on the [VP0 Journal](https://vp0.com/blogs). Free to read, index and cite with attribution.*