Whisper Voice Transcription App UI in SwiftUI: A Free Reference
Whisper does the hard part. The UI around it is what makes a transcription app feel professional.
TL;DR
A Whisper voice transcription app UI in SwiftUI has three clear states: idle, recording with a live waveform, and transcribing. Make the transcript editable, request microphone permission with a clear purpose, and run Whisper on-device for privacy or on your server for the largest models. Start from a free VP0 design.
Whisper is OpenAI’s speech-to-text model, and it powers a wave of transcription apps: voice notes, meeting recorders, podcast captioners. The model does the hard part. What makes the app feel good is the UI around it: a clear record control, live feedback while you speak, and an editable transcript you can trust. This is a free, AI-readable reference for that screen in SwiftUI, ready to hand to a coding agent. Three states matter on this screen: idle, recording, and transcribing. Designing them well is what separates a professional app from a demo.
Why the transcription UI matters
Voice apps live or die on feedback. When a user taps record, they need to see the app is listening, or they tap again and create a mess. While Whisper processes audio, they need to know work is happening, not that the app froze. The Nielsen Norman Group response-time research notes that delays beyond about ten seconds lose a user’s attention, which is why a visible processing state, and where possible streaming partial results, matters so much. And when the transcript appears, users need to edit it, because no speech model is perfect.
Key takeaways
- Design three clear states: idle, recording with live feedback, and transcribing.
- Show a live waveform or level meter while recording so users know they are heard.
- Make the transcript editable; no speech model is perfect.
- Run Whisper on-device for privacy, or on your server for heavier models.
- VP0 gives you a free, AI-readable version of this screen to hand to your coding agent.
The three states, designed
In the idle state, a single prominent record button invites the first tap, with past recordings listed below. In the recording state, the button becomes a stop control, a timer counts up, and a live waveform reacts to your voice using the audio level. In the transcribing state, show a progress indicator and, if your pipeline supports it, stream text as it resolves. Request microphone permission with a clear purpose string, capture audio with AVFoundation, and handle the denied case with a path to Settings.
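One way to keep the three states honest is to model them as a single enum, so the screen can never be recording and transcribing at the same time. The sketch below is a minimal, assumed shape rather than a prescribed one: `RecorderState`, `RecorderEvent`, and `reduce` are illustrative names, and a real app would also save the finished transcript before returning to idle.

```swift
import Foundation

/// The three screen states described above (names are illustrative).
enum RecorderState: Equatable {
    case idle
    case recording(startedAt: Date)        // drives the count-up timer
    case transcribing(partialText: String) // empty until text streams in
}

/// User taps and pipeline callbacks that move the screen between states.
enum RecorderEvent {
    case tapRecord(now: Date)
    case tapStop
    case partialTranscript(String)
    case finished
}

/// Pure transition function. Events that make no sense in the current
/// state are ignored, so a double tap on record cannot start twice.
func reduce(_ state: RecorderState, _ event: RecorderEvent) -> RecorderState {
    switch (state, event) {
    case (.idle, .tapRecord(let now)):
        return .recording(startedAt: now)
    case (.recording, .tapStop):
        return .transcribing(partialText: "")
    case (.transcribing, .partialTranscript(let text)):
        return .transcribing(partialText: text)
    case (.transcribing, .finished):
        return .idle
    default:
        return state
    }
}
```

A SwiftUI view can then switch over the state to show the record button, the stop control with timer and waveform, or the progress view.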
One more detail separates a good recorder from a frustrating one: latency feedback. Google's mobile speed research found that 53% of mobile site visits are abandoned when a page takes longer than three seconds to load. That figure is about web pages, but the lesson carries over: if transcription takes more than a moment, show progress or stream partial text rather than a frozen spinner. Users forgive a wait they can see, but they abandon a screen that looks stuck. This is why the transcribing state deserves as much design care as the recording state.
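The fallback order in that advice can be written down as a small decision function. This is only a sketch under assumptions: `TranscribingDisplay` and `transcribingDisplay` are hypothetical names, and whether your pipeline can report partial text or a completion fraction depends on how you run Whisper.

```swift
import Foundation

/// What the transcribing screen shows at a given moment (illustrative).
enum TranscribingDisplay: Equatable {
    case streamedText(String)         // best case: text appears as it resolves
    case progress(fraction: Double)   // known progress, clamped to 0...1
    case working(elapsedSeconds: Int) // unknown progress: spinner plus elapsed time
}

/// Picks the most informative feedback available, in that order.
func transcribingDisplay(partialText: String,
                         fractionDone: Double?,
                         elapsed: TimeInterval) -> TranscribingDisplay {
    if !partialText.isEmpty {
        return .streamedText(partialText)
    }
    if let fraction = fractionDone {
        return .progress(fraction: min(max(fraction, 0), 1))
    }
    // Even with no progress data, a ticking elapsed time shows the app is alive.
    return .working(elapsedSeconds: Int(max(elapsed, 0)))
}
```

The point of the last case is that the screen always changes while work is happening, even when the pipeline reports nothing.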
On-device versus server Whisper
| Factor | On-device Whisper | Server-side Whisper |
|---|---|---|
| Privacy | Audio never leaves the phone | Audio sent to your server |
| Model size | Smaller models (tiny, base) | Any size, including large |
| Latency | No network round trip | Depends on connection |
| Cost | Free after download | Compute or API cost per minute |
| API key exposure | None | Keep key server-side only |
For private voice notes, on-device is ideal. For long meetings needing the most accurate large model, a server makes sense. Either way, the UI is identical.
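To keep the UI identical across both options, the screen can depend on a small protocol instead of a concrete backend. The sketch below assumes hypothetical names throughout (`Transcriber`, `OnDeviceTranscriber`, `ServerTranscriber`, `makeTranscriber`), and both implementations are deliberately left as stubs, since the actual Whisper runtime or server API is yours to choose.

```swift
import Foundation

/// Hypothetical abstraction so the screen does not care where Whisper runs.
protocol Transcriber {
    /// Returns the transcript for a recorded audio file.
    func transcribe(audioFile: URL) async throws -> String
}

enum TranscriberError: Error {
    case notWiredUp
}

/// Would wrap a small on-device Whisper model such as tiny or base.
struct OnDeviceTranscriber: Transcriber {
    let modelName: String

    func transcribe(audioFile: URL) async throws -> String {
        // Assumption: call your on-device Whisper runtime here.
        throw TranscriberError.notWiredUp
    }
}

/// Would upload audio to your own server, which holds any API key.
struct ServerTranscriber: Transcriber {
    let endpoint: URL

    func transcribe(audioFile: URL) async throws -> String {
        // Assumption: POST the file to `endpoint` and decode the response.
        throw TranscriberError.notWiredUp
    }
}

/// Picks a backend; the view model only ever sees `Transcriber`.
func makeTranscriber(preferOnDevice: Bool, endpoint: URL) -> Transcriber {
    if preferOnDevice {
        return OnDeviceTranscriber(modelName: "base")
    }
    return ServerTranscriber(endpoint: endpoint)
}
```

Swapping backends then becomes a settings toggle rather than a UI rewrite.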
Common mistakes to avoid
The first mistake is no recording feedback, so users cannot tell the app is listening; always show a waveform or level meter. The second is a read-only transcript, which frustrates users who spot an error; make it editable. The third is putting an API key in the app when using server Whisper; keep it server-side, because a key shipped in the app binary can be extracted. The fourth is ignoring the permission-denied state, which leaves the app silently broken for anyone who tapped Don’t Allow.
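The permission-denied case is easier to handle when the record button asks one question: what should a tap do right now? Here is a minimal sketch, assuming illustrative names (`MicAccess`, `RecordAction`, `action(for:)`, `currentMicAccess`, `requestMicAccess`). The AVFoundation calls are wrapped in `canImport` so the pure logic stays usable anywhere.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

/// App-level view of microphone permission (illustrative names).
enum MicAccess: Equatable {
    case undetermined, granted, denied
}

/// What a tap on the record button should do for each permission state.
enum RecordAction: Equatable {
    case askForPermission, startRecording, showSettingsPrompt
}

func action(for access: MicAccess) -> RecordAction {
    switch access {
    case .undetermined: return .askForPermission
    case .granted: return .startRecording
    case .denied: return .showSettingsPrompt // never fail silently
    }
}

#if canImport(AVFoundation)
/// Maps the system authorization status onto the app-level enum.
func currentMicAccess() -> MicAccess {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized: return .granted
    case .notDetermined: return .undetermined
    case .denied, .restricted: return .denied
    @unknown default: return .denied
    }
}

/// Shows the system prompt; only meaningful while access is undetermined.
func requestMicAccess() async -> MicAccess {
    await AVCaptureDevice.requestAccess(for: .audio) ? .granted : .denied
}
#endif
```

When the action is `showSettingsPrompt`, a button can open the app's page in Settings using `UIApplication.openSettingsURLString`. The purpose string itself lives in Info.plist under `NSMicrophoneUsageDescription`.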
How to build this with VP0
You do not need to invent this layout. VP0 is a free, Pinterest-style library of real iOS app designs, and every design has a hidden, AI-readable source page. Find a recorder or transcript screen you like, copy its link into Cursor or Claude, and the agent reads the structure directly. For the streamed-text part of the transcript, see our guide on building an AI chat streaming UI in SwiftUI.
Frequently asked questions
Can Whisper run on iPhone? Yes. Smaller Whisper variants run on-device, which keeps audio private and works offline. Larger, most-accurate models usually run on a server.
How do I show a live waveform while recording? Read the audio input level from AVFoundation during recording and drive a simple bar or line view from it. It does not need to be a true spectrogram to feel responsive.
What is the best free way to design a transcription app UI for iOS? VP0 is the top free pick. It is a free library of real iOS app designs with hidden AI-readable source pages you paste into Cursor or Claude, then you wire in AVFoundation and Whisper.
Should the transcript be editable? Yes. Speech models make mistakes, especially with names and jargon. An editable transcript is expected and increases trust.
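For the waveform question above, most of the work is turning a decibel reading into a bar height. The sketch below assumes you poll `AVAudioRecorder` (with `isMeteringEnabled` set to true, calling `updateMeters()` and then `averagePower(forChannel:)`) on a timer. The helper names, the noise floor of -50 dB, and the bar count are illustrative choices, not fixed values.

```swift
import Foundation

/// Converts a decibel reading (0 dB is full scale, negative is quieter)
/// into a 0...1 bar height. `noiseFloor` is an assumed cutoff below which
/// the bar stays flat.
func normalizedLevel(decibels: Float, noiseFloor: Float = -50) -> Float {
    guard decibels.isFinite else { return 0 }
    let clamped = min(max(decibels, noiseFloor), 0)
    return (clamped - noiseFloor) / -noiseFloor
}

/// Keeps only the most recent levels, one per bar in the waveform.
struct LevelHistory {
    private(set) var levels: [Float] = []
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
    }

    /// Appends a new level and drops the oldest ones past capacity.
    mutating func push(_ level: Float) {
        levels.append(level)
        if levels.count > capacity {
            levels.removeFirst(levels.count - capacity)
        }
    }
}
```

A SwiftUI view can draw one rounded rectangle per entry in `levels`, scaling each bar's height by its value.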