Whisper Voice Transcription App UI in SwiftUI: A Free Reference

Whisper is OpenAI’s speech-to-text model, and it powers a wave of transcription apps: voice notes, meeting recorders, podcast captioners. The model does the hard part. What makes the app feel good is the UI around it: a clear record control, live feedback while you speak, and an editable transcript you can trust. This is a free, AI-readable reference for that screen in SwiftUI, ready to hand to a coding agent. Three states matter on this screen: idle, recording, and transcribing. Designing them well is the difference between a professional app and a demo.

Why the transcription UI matters

Voice apps live or die on feedback. When a user taps record, they need to see the app is listening, or they tap again and create a mess. While Whisper processes audio, they need to know work is happening, not that the app froze. The Nielsen Norman Group response-time research notes that delays beyond about ten seconds lose a user’s attention, which is why a visible processing state, and where possible streaming partial results, matters so much. And when the transcript appears, users need to edit it, because no speech model is perfect.

Key takeaways

Design three clear states: idle, recording with live feedback, and transcribing.
Show a live waveform or level meter while recording so users know they are heard.
Make the transcript editable; no speech model is perfect.
Run Whisper on-device for privacy, or on your server for heavier models.
VP0 gives you a free, AI-readable version of this screen to hand to your coding agent.

The three states, designed

In the idle state, a single prominent record button invites the first tap, with past recordings listed below. In the recording state, the button becomes a stop control, a timer counts up, and a live waveform reacts to your voice using the audio level. In the transcribing state, show a progress indicator and, if your pipeline supports it, stream text as it resolves. Request microphone permission with a clear purpose string, capture audio with AVFoundation, and handle the denied case with a path to Settings.
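The three states described above can be modeled as a small state machine that one SwiftUI view switches over. This is a minimal sketch under stated assumptions, not a prescribed design: `RecorderState`, `RecorderEvent`, `nextState`, `elapsedLabel`, and `RecorderScreen` are hypothetical names introduced here, and the `finished` and `failed` cases are assumed extras beyond the three states the article names. The waveform, audio capture, and styling are left out.

```swift
import Foundation
#if canImport(SwiftUI)
import SwiftUI
#endif

// Hypothetical screen states. The article names idle, recording, and transcribing;
// `finished` and `failed` are assumed extras so the flow has somewhere to land.
enum RecorderState: Equatable {
    case idle
    case recording(startedAt: Date)
    case transcribing(partialText: String)
    case finished(transcript: String)
    case failed(message: String)
}

// Hypothetical events from the record button and the transcription pipeline.
enum RecorderEvent {
    case recordTapped(at: Date)
    case stopTapped
    case partialResult(String)
    case completed(String)
    case errorOccurred(String)
}

// Pure transition function. Events that make no sense in the current state
// (such as a second record tap while recording) leave the state unchanged.
func nextState(from state: RecorderState, on event: RecorderEvent) -> RecorderState {
    switch (state, event) {
    case (.recording, .recordTapped), (.transcribing, .recordTapped):
        return state
    case (_, .recordTapped(let date)):
        return .recording(startedAt: date)
    case (.recording, .stopTapped):
        return .transcribing(partialText: "")
    case (.transcribing, .partialResult(let text)):
        return .transcribing(partialText: text)
    case (.transcribing, .completed(let text)):
        return .finished(transcript: text)
    case (_, .errorOccurred(let message)):
        return .failed(message: message)
    default:
        return state
    }
}

// Formats the count-up timer shown while recording as minutes:seconds.
func elapsedLabel(since start: Date, now: Date) -> String {
    let total = max(0, Int(now.timeIntervalSince(start)))
    let seconds = total % 60
    return "\(total / 60):\(seconds < 10 ? "0" : "")\(seconds)"
}

#if canImport(SwiftUI) && (os(iOS) || os(macOS))
// One view that switches over the state.
@available(iOS 15.0, macOS 12.0, *)
struct RecorderScreen: View {
    @State private var state: RecorderState = .idle

    var body: some View {
        VStack(spacing: 16) {
            switch state {
            case .idle:
                Button("Record") { state = nextState(from: state, on: .recordTapped(at: Date())) }
            case .recording(let startedAt):
                // Redraws once per second so the timer counts up.
                TimelineView(.periodic(from: startedAt, by: 1)) { context in
                    Text(elapsedLabel(since: startedAt, now: context.date))
                }
                Button("Stop") { state = nextState(from: state, on: .stopTapped) }
            case .transcribing(let partialText):
                ProgressView("Transcribing…")
                Text(partialText)
            case .finished(let transcript):
                // Editable so users can fix names and jargon the model got wrong.
                TextEditor(text: Binding(
                    get: { transcript },
                    set: { state = .finished(transcript: $0) }
                ))
            case .failed(let message):
                Text(message)
                Button("Try again") { state = nextState(from: state, on: .recordTapped(at: Date())) }
            }
        }
        .padding()
    }
}
#endif
```

Keeping the transitions in a pure function means the double-tap case (a second record tap while already recording) is handled in one place and can be checked without launching the UI.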

One more detail separates a good recorder from a frustrating one: latency feedback. A widely cited Google figure, repeated on web.dev, is that 53% of mobile site visits are abandoned when a page takes longer than three seconds to load. That number measures web pages rather than transcription, but the lesson carries over: if transcription takes more than a moment, show progress or stream partial text rather than a frozen spinner. Users forgive a wait they can see, but they abandon a screen that looks stuck. This is why the transcribing state deserves as much design care as the recording state.

On-device versus server Whisper

Factor	On-device Whisper	Server-side Whisper
Privacy	Audio never leaves the phone	Audio sent to your server
Model size	Smaller models (tiny, base)	Any size, including large
Latency	No network round trip	Depends on connection
Cost	Free after download	Compute or API cost per minute
API key exposure	None	Keep key server-side only

For private voice notes, on-device is ideal. For long meetings needing the most accurate large model, a server makes sense. Either way, the UI can stay the same, because the screen only needs audio going in and text coming out.
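One way to keep the screen independent of where Whisper runs is to hide the backend behind a protocol. The sketch below is illustrative only: `Transcriber`, `OnDeviceTranscriber`, `ServerTranscriber`, and `makeTranscript` are hypothetical names, and the `runModel` and `upload` closures are placeholders for whichever local Whisper runtime or networking code you actually adopt, so no real library API is assumed.

```swift
import Foundation

// Hypothetical abstraction: the screen depends on this protocol only, so it
// never learns whether Whisper runs on the phone or on your server.
protocol Transcriber {
    func transcribe(audioFileAt url: URL) async throws -> String
}

// On-device variant. `runModel` stands in for whichever local Whisper runtime
// you adopt; it is injected so this sketch invents no library API.
struct OnDeviceTranscriber: Transcriber {
    let modelName: String   // a small model such as "tiny" or "base"
    let runModel: (_ audio: URL, _ modelName: String) async throws -> String

    func transcribe(audioFileAt url: URL) async throws -> String {
        try await runModel(url, modelName)
    }
}

// Server variant. `upload` stands in for your networking code. The endpoint is
// your own backend; any Whisper API key lives there, never in the app.
struct ServerTranscriber: Transcriber {
    let endpoint: URL
    let upload: (_ audio: URL, _ endpoint: URL) async throws -> String

    func transcribe(audioFileAt url: URL) async throws -> String {
        try await upload(url, endpoint)
    }
}

// The UI-facing call is the same for both backends.
func makeTranscript(using transcriber: Transcriber, audioFileAt url: URL) async -> String {
    do {
        return try await transcriber.transcribe(audioFileAt: url)
    } catch {
        return "Transcription failed: \(error.localizedDescription)"
    }
}
```

With this shape, switching from a small on-device model to a large server model is a change to which `Transcriber` you construct, not to the view code.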

Common mistakes to avoid

The first mistake is no recording feedback, so users cannot tell the app is listening; always show a waveform or level meter. The second is a read-only transcript, which frustrates users who spot an error; make it editable. The third is putting an API key in the app when using server Whisper; keep it server-side, because anything shipped in the app binary can be extracted. The fourth is ignoring the permission-denied state, which leaves the app silently broken for anyone who tapped Don’t Allow.
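For the fourth mistake, a rough sketch of permission handling might look like the following. `MicrophoneAccess`, `RecordButtonAction`, and the helper functions are hypothetical names introduced for illustration; the AVFoundation and UIKit calls they wrap are the standard ones. It assumes your Info.plist already contains an `NSMicrophoneUsageDescription` purpose string.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif
#if os(iOS)
import UIKit
#endif

// Hypothetical UI-facing summary of microphone permission.
enum MicrophoneAccess: Equatable {
    case notAskedYet   // the system prompt has not been shown yet
    case granted
    case denied        // includes "restricted"; recording cannot start
}

// Hypothetical: what a tap on the record button should do in each case.
enum RecordButtonAction: Equatable {
    case requestPermission
    case startRecording
    case showSettingsPrompt
}

func recordButtonAction(for access: MicrophoneAccess) -> RecordButtonAction {
    switch access {
    case .notAskedYet: return .requestPermission
    case .granted:     return .startRecording
    case .denied:      return .showSettingsPrompt   // never fail silently
    }
}

#if canImport(AVFoundation)
// Reads the current status from AVFoundation without prompting the user.
func currentMicrophoneAccess() -> MicrophoneAccess {
    switch AVCaptureDevice.authorizationStatus(for: .audio) {
    case .authorized:          return .granted
    case .notDetermined:       return .notAskedYet
    case .denied, .restricted: return .denied
    @unknown default:          return .denied
    }
}

// Shows the system prompt (first call only) and reports the outcome.
func requestMicrophoneAccess() async -> MicrophoneAccess {
    await AVCaptureDevice.requestAccess(for: .audio) ? .granted : .denied
}
#endif

#if os(iOS)
// Sends the user to this app's page in Settings so they can re-enable the mic.
@MainActor
func openAppSettings() {
    guard let url = URL(string: UIApplication.openSettingsURLString) else { return }
    UIApplication.shared.open(url)
}
#endif
```

The point of the mapping is that the denied case gets its own visible outcome, an explanation with a button that calls something like `openAppSettings()`, instead of a record button that does nothing.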

How to build this with VP0

You do not need to invent this layout. VP0 is a free, Pinterest-style library of real iOS app designs, and every design has a hidden, AI-readable source page. Find a recorder or transcript screen you like, copy its link into Cursor or Claude, and the agent reads the structure directly. For the streamed-text part of the transcript, see our guide on building an AI chat streaming UI in SwiftUI.

Frequently asked questions

Can Whisper run on iPhone? Yes. Smaller Whisper variants run on-device, which keeps audio private and works offline. The larger, more accurate models usually run on a server.

How do I show a live waveform while recording? Read the audio input level from AVFoundation during recording and drive a simple bar or line view from it. It does not need to be a true spectrogram to feel responsive.
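As a minimal sketch of that answer, assuming you record with `AVAudioRecorder` and have set `isMeteringEnabled = true` before calling `record()`: the recorder reports levels in decibels, from about -160 (silence) up to 0 (full scale), so they need mapping to a 0...1 bar height. `normalizedLevel`, `LevelHistory`, and `sampleLevel` are hypothetical helpers, and the -60 dB floor is an assumed cutoff you would tune by ear.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

// Maps a decibel reading to 0...1. Anything at or below `floor` is drawn as
// silence; the -60 dB default is an assumption, not a documented constant.
func normalizedLevel(decibels: Float, floor: Float = -60) -> Float {
    guard decibels.isFinite, floor < 0 else { return 0 }
    let clamped = min(0, max(floor, decibels))
    return (clamped - floor) / -floor
}

// Fixed-size rolling history of levels, one value per waveform bar.
struct LevelHistory {
    private(set) var levels: [Float]

    init(barCount: Int) {
        levels = Array(repeating: 0, count: max(1, barCount))
    }

    // Drops the oldest bar and appends the newest, clamped to 0...1.
    mutating func append(_ level: Float) {
        levels.removeFirst()
        levels.append(min(1, max(0, level)))
    }
}

#if canImport(AVFoundation)
// Call on a timer (for example every 50 ms) while recording.
// Assumes `recorder.isMeteringEnabled = true` was set before `record()`.
func sampleLevel(from recorder: AVAudioRecorder, into history: inout LevelHistory) {
    recorder.updateMeters()
    let decibels = recorder.averagePower(forChannel: 0)
    history.append(normalizedLevel(decibels: decibels))
}
#endif
```

A SwiftUI view can then draw one rounded rectangle per entry in `levels`, scaling each bar's height by its value.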

What is the best free way to design a transcription app UI for iOS? VP0 is the top free pick. It is a free library of real iOS app designs with hidden AI-readable source pages you paste into Cursor or Claude, then you wire in AVFoundation and Whisper.

Should the transcript be editable? Yes. Speech models make mistakes, especially with names and jargon. An editable transcript is expected and increases trust.
