# Twilio video call interface in SwiftUI: a practical build

> By Lawrence Arya, Founder & CEO of VP0. Published 2026-06-10. 10 min read.
> Source: https://vp0.com/blogs/twilio-video-call-interface-swiftui

The SwiftUI call UI is the durable part. Host video in a UIViewRepresentable, keep secrets on your server, and pick a current media SDK.

**TL;DR.** A Twilio video call interface in SwiftUI is a provider-agnostic UI (a tile grid, a self-view, and a control bar) with video frames hosted in a `UIViewRepresentable`. Twilio is discontinuing Programmable Video, so a new build connects the same layout to a current provider like LiveKit or Agora, or to AVFoundation for simple cases. The one rule that matters most is that the provider secret never ships in the app; calls are authorized with short-lived tokens minted on your server. Starting from a free VP0 design and letting Claude Code or Cursor read its source page is the fastest way to a clean call UI.

A Twilio video call interface in SwiftUI is mostly a provider-agnostic UI: a grid of video tiles, a self-view, and a control bar, with the actual video frames hosted in a `UIViewRepresentable`. The important caveat up front is that Twilio is discontinuing Programmable Video, so a new build connects the same SwiftUI layout to a current provider such as [LiveKit](https://docs.livekit.io/home/), Agora, or Apple's own [AVFoundation](https://developer.apple.com/documentation/avfoundation) for simpler cases. The screen design carries over regardless of the SDK underneath, which is why starting from a free VP0 design and letting Claude Code or Cursor read its source page is the fastest way to a clean call UI.

One rule matters more than any layout detail: the provider's API secret never goes in the app. Calls are authorized with short-lived tokens minted on your server, and the app only ever holds a token. The sections below cover the call screen, that token boundary, and what to use now that Twilio Video is winding down.

## How do you build a Twilio video call interface in SwiftUI?

The interface is a layout problem layered over a media SDK. You build the SwiftUI screen (a grid of participant tiles, a picture-in-picture self-view, and controls for mute, camera flip, and hang up), then host each remote and local video track inside a `UIViewRepresentable`, because the SDKs render frames into a UIKit view rather than a native SwiftUI one. SwiftUI manages the layout, state, and controls; the representable bridges the video.
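A minimal sketch of that bridge is below. `RenderableTrack` and its `attach(to:)` and `detach(from:)` methods are hypothetical stand-ins, not any provider's real API; each SDK exposes its own renderer type, so treat these two methods as the seam you adapt. The part that carries over is the shape: create the UIKit view once, then swap the attached track in `updateUIView`.

```swift
import SwiftUI
import UIKit

/// Hypothetical abstraction over an SDK video track (an assumption for this
/// sketch). Adapt the two methods to whatever renderer API your SDK provides.
protocol RenderableTrack: AnyObject {
    func attach(to view: UIView)
    func detach(from view: UIView)
}

struct VideoTrackView: UIViewRepresentable {
    let track: (any RenderableTrack)?

    /// Remembers which track is attached so an update can swap it cleanly.
    final class Coordinator {
        private(set) var attached: (any RenderableTrack)?

        func bind(_ track: (any RenderableTrack)?, to view: UIView) {
            guard attached !== track else { return }   // same track: nothing to do
            attached?.detach(from: view)               // release the old track
            track?.attach(to: view)                    // render the new one
            attached = track
        }
    }

    func makeCoordinator() -> Coordinator { Coordinator() }

    // Called once: create the UIKit view that frames are rendered into.
    func makeUIView(context: Context) -> UIView {
        let view = UIView()
        view.backgroundColor = .black   // visible until the first frame arrives
        return view
    }

    // Called whenever SwiftUI passes new inputs, including a changed track.
    func updateUIView(_ uiView: UIView, context: Context) {
        context.coordinator.bind(track, to: uiView)
    }

    // Called when SwiftUI removes the view: detach so the track stops rendering.
    static func dismantleUIView(_ uiView: UIView, coordinator: Coordinator) {
        coordinator.bind(nil, to: uiView)
    }
}
```

Keeping the swap logic in the coordinator means it can be exercised with a fake track, without a live call.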

Historically that media layer was Twilio's Programmable Video SDK. Twilio has announced it is discontinuing that product, so while the search term is "Twilio video call interface," a new app should treat the UI as the durable part and pick a current SDK for the media. LiveKit (open source, with more than 19,000 stars on its server repository alone), Agora, and Daily all render into a UIKit view the same way, so the SwiftUI you write does not change much when you swap the provider.

For related real-time layouts, the [Discord voice channel user grid](/blogs/discord-voice-channel-user-grid-swiftui/) and a [multiplayer voice chat overlay](/blogs/multiplayer-game-voice-chat-overlay-swiftui/) use the same participant-tile thinking.

## The call screen layout in SwiftUI

The call screen is a tile grid that adapts to participant count, a small movable self-view, and a control bar pinned to the bottom. SwiftUI's `LazyVGrid` handles the tiles, and an overlay positions the self-view.

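```swift
import SwiftUI

struct CallView: View {
    // The view owns the @Observable model (assumes CallModel has a no-argument
    // init). CallModel, ControlBar, SelfView, VideoTrackView, and
    // gridColumns(for:) are defined elsewhere in the app.
    @State private var model = CallModel()

    var body: some View {
        ZStack(alignment: .bottom) {
            // Tile grid: the column count adapts to the number of participants.
            LazyVGrid(columns: gridColumns(for: model.participants.count)) {
                ForEach(model.participants) { p in        // elements must be Identifiable
                    VideoTrackView(track: p.videoTrack)   // UIViewRepresentable
                        .aspectRatio(3/4, contentMode: .fill)
                        .clipShape(RoundedRectangle(cornerRadius: 12))
                }
            }
            ControlBar(model: model)   // pinned to the bottom by the ZStack alignment
        }
        // The self-view floats over the grid in the top-trailing corner.
        .overlay(alignment: .topTrailing) { SelfView(track: model.localTrack) }
    }
}
```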

`VideoTrackView` is the representable that wraps the SDK's render view and updates it when the track changes. The grid columns adapt to the participant count so a one-on-one call fills the screen and a four-person call splits cleanly. The state the call needs (connection status, who is muted, who is speaking) lives in an `@Observable` model so the controls and tiles stay in sync.
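The `gridColumns(for:)` helper used in the call screen is not spelled out, so here is one possible version. The thresholds (one column up to two tiles, two columns up to four, three beyond that) and the 8-point spacing are assumptions for a portrait phone layout, not a rule; tune them to your design.

```swift
import SwiftUI

/// Column count for a given number of tiles.
/// The thresholds are illustrative assumptions for a portrait phone.
func columnCount(for participantCount: Int) -> Int {
    switch participantCount {
    case ...2: return 1      // one or two tiles stack and fill the width
    case 3...4: return 2     // a 2 x 2 grid for a four-person call
    default: return 3        // denser grid for larger calls
    }
}

/// Flexible columns for `LazyVGrid`, sized by the participant count.
func gridColumns(for participantCount: Int) -> [GridItem] {
    Array(
        repeating: GridItem(.flexible(), spacing: 8),
        count: columnCount(for: participantCount)
    )
}
```

Splitting the pure `columnCount(for:)` out of the SwiftUI wrapper keeps the layout rule easy to unit test.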

The states people forget are the non-ideal ones: connecting, reconnecting after a network drop, a participant with their camera off, and the call ending. Designing those in from the start, rather than only the happy path where everyone is connected, is what makes a call UI feel finished.
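One way to keep those states from being forgotten is to model them as an enum, so every `switch` in the UI has to handle each case. The sketch below is an assumption about how you might shape it: `CallPhase`, `CallStatus`, and the banner strings are hypothetical names, and in a real app this state would likely live inside your call model.

```swift
import Observation

/// Every phase the call screen has to draw, not only the happy path.
enum CallPhase: Equatable {
    case connecting
    case connected
    case reconnecting(attempt: Int)
    case ended
}

/// What a tile shows for one participant.
enum TileContent: Equatable {
    case video
    case cameraOffPlaceholder(name: String)
}

/// Picks the tile content so a camera-off participant never renders as a black box.
func tileContent(name: String, isCameraOff: Bool) -> TileContent {
    isCameraOff ? .cameraOffPlaceholder(name: name) : .video
}

@Observable
final class CallStatus {
    var phase: CallPhase = .connecting

    /// Banner text for the non-ideal phases; nil while connected.
    var banner: String? {
        switch phase {
        case .connecting: return "Connecting"
        case .connected: return nil
        case .reconnecting(let attempt): return "Reconnecting (attempt \(attempt))"
        case .ended: return "Call ended"
        }
    }

    /// Mute and camera controls only make sense on a live connection.
    var controlsEnabled: Bool { phase == .connected }
}
```

Because the enum has no default case, adding a new phase later produces a compiler error everywhere the UI has not accounted for it.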

## Why your video credentials must stay on the server

The single most important rule in any video call app is that the provider's API key or secret never ships inside the app. Anything embedded in a mobile binary can be extracted, and a leaked video API secret can be used to run up usage on your account or impersonate your service. This is the kind of credential handling that matters for any app touching billed or regulated infrastructure, and it is not optional.

The correct pattern is a token server. Your backend holds the secret, and when a user joins a call, your app asks your server for a short-lived access token scoped to that one room. LiveKit, Agora, and the others all document this access-token flow, and [LiveKit's docs](https://docs.livekit.io/home/) describe generating tokens on your server with the API secret that never leaves it. The app receives a token that expires quickly and grants access to a single room, so even if it were intercepted, the exposure is small and time-limited.
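On the client, that flow reduces to one authenticated request. The sketch below assumes a hypothetical endpoint (`https://api.example.com/call-token`), a hypothetical JSON response with `token` and `expiresAt` fields, and that your app already has its own user session token; none of these come from a provider's API. Note what is absent: the provider secret appears nowhere in the app.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Response shape of the hypothetical token endpoint (an assumption).
struct CallToken: Decodable {
    let token: String
    let expiresAt: Date

    /// True when the token is expired or about to expire.
    func needsRefresh(now: Date = Date(), leeway: TimeInterval = 30) -> Bool {
        expiresAt.timeIntervalSince(now) <= leeway
    }
}

enum TokenError: Error {
    case badStatus(Int)
}

/// Decodes the endpoint's JSON; `expiresAt` is assumed to be ISO 8601.
func decodeCallToken(_ data: Data) throws -> CallToken {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .iso8601
    return try decoder.decode(CallToken.self, from: data)
}

/// Asks your own backend for a short-lived token scoped to one room.
/// `sessionToken` is your app's user session credential, not a provider secret.
func fetchCallToken(
    room: String,
    sessionToken: String,
    endpoint: URL = URL(string: "https://api.example.com/call-token")!  // hypothetical
) async throws -> CallToken {
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.setValue("Bearer \(sessionToken)", forHTTPHeaderField: "Authorization")
    request.httpBody = try JSONEncoder().encode(["room": room])

    let (data, response) = try await URLSession.shared.data(for: request)
    let status = (response as? HTTPURLResponse)?.statusCode ?? -1
    guard status == 200 else { throw TokenError.badStatus(status) }
    return try decodeCallToken(data)
}
```

The returned token is what you hand to the media SDK when connecting; `needsRefresh` gives you a hook to request a new one before a long call outlives it.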

Build this boundary first, before the UI, because retrofitting it after hardcoding a key during prototyping is how secrets end up committed to a repo. Treat the token endpoint as the front door to the whole feature.

## Twilio is sunsetting Programmable Video: what to use now

Because Twilio is discontinuing Programmable Video, a new SwiftUI call app should start on a current provider, and the good news is the UI barely changes. The realistic options each suit a different situation.

| Provider | Best for | Notes |
|---|---|---|
| LiveKit | Open source, self-host or cloud, full control | Renders into a UIKit view like the others |
| Agora | Large-scale, low-latency global calls | Mature SDK, usage-based pricing |
| Daily | Fast setup with prebuilt pieces | Good when you want less to wire up |
| AVFoundation | One-to-one or local capture without an SFU | Apple-native, no third-party media server |

LiveKit is a strong default when you want an open-source media layer you can self-host or run in their cloud, and its Swift SDK renders frames into the same kind of view the others do. AVFoundation and AVKit cover the simpler end (local capture, recording, or basic one-to-one) without bringing in a selective forwarding unit at all; AVFoundation handles capture, and Apple's [AVKit](https://developer.apple.com/documentation/avkit) documentation covers the playback surfaces. The SwiftUI layout you build sits above all of them, so committing to the screen design first and choosing the SDK second is the safe order.

## Making it native with AI and a real design

AI builders are good at the SwiftUI layout and weak at the bridge to the media SDK. Claude Code and Cursor will produce a clean tile grid and control bar, then mishandle the `UIViewRepresentable` by creating the render view without updating it when the track changes, so the video never refreshes, or worse, hardcode an API key directly in the client because that is the shortest path to a working demo. The layout looks done while the security and bridging are wrong.

Giving the model a real design and explicit rules avoids most of it. When the tile grid, self-view, and control states are already decided, the model implements a real call screen instead of inventing one, and you keep your attention on the token flow and the representable. Starting from a free VP0 design provides that structure, since each design has a machine-readable source page that Claude Code, Cursor, or Rork can read from a pasted link. Always review the credential handling yourself, because "put the key in the app" is a mistake AI makes by default. Related streaming and capture layouts like the [spatial video recording UI](/blogs/spatial-video-recording-ui-clone-swiftui/) and a [Clubhouse-style audio room](/blogs/clubhouse-audio-room-ui-clone-swiftui/) show adjacent patterns.

## Common video call UI mistakes

A few mistakes recur in SwiftUI call screens. Embedding the API secret in the app is the most serious, and the fix is the token server above. Forgetting to request camera and microphone permission before joining is the second, and it produces a black tile and silent confusion; request and handle both, including the denied case, before connecting.
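A sketch of that permission check, using AVFoundation's `AVCaptureDevice.authorizationStatus(for:)` and `requestAccess(for:)`. The `JoinDecision` policy (join audio-only when the camera is denied, block when the microphone is denied) is an assumption for illustration; pick whatever rule fits your product. The app's Info.plist also needs `NSCameraUsageDescription` and `NSMicrophoneUsageDescription`, or the request will crash the app.

```swift
import AVFoundation

/// Returns true when access is granted, prompting only if the user
/// has not been asked yet.
func ensureAccess(for mediaType: AVMediaType) async -> Bool {
    switch AVCaptureDevice.authorizationStatus(for: mediaType) {
    case .authorized:
        return true
    case .notDetermined:
        return await AVCaptureDevice.requestAccess(for: mediaType)
    case .denied, .restricted:
        return false            // the system will not prompt again
    @unknown default:
        return false
    }
}

/// What the app does with the two results (hypothetical policy).
enum JoinDecision: Equatable {
    case join
    case joinAudioOnly              // camera denied: show a camera-off tile
    case blocked(reason: String)    // microphone denied: explain and link to Settings
}

func joinDecision(camera: Bool, microphone: Bool) -> JoinDecision {
    switch (camera, microphone) {
    case (true, true):
        return .join
    case (false, true):
        return .joinAudioOnly
    case (_, false):
        return .blocked(reason: "Microphone access is needed to join the call.")
    }
}

/// Runs both checks before connecting, so the denied case is handled up front.
func prepareToJoin() async -> JoinDecision {
    let camera = await ensureAccess(for: .video)
    let microphone = await ensureAccess(for: .audio)
    return joinDecision(camera: camera, microphone: microphone)
}
```

Calling `prepareToJoin()` before requesting a token or connecting means the user sees an explanation instead of a black tile.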

A video view that never updates is the third, almost always a `UIViewRepresentable` that creates the render view but does not update it when the track changes. The fourth is doing media work on the main thread, which stutters the whole UI during a call. And the fifth is shipping only the connected state, with no design for connecting, reconnecting, or a camera-off participant, which makes the call feel broken the first time the network hiccups.

## When AVFoundation alone is enough

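You do not always need a third-party media server. For local camera capture, recording, or a basic one-to-one call, AVFoundation handles capture and AVKit handles playback natively, without the cost or complexity of an SFU. A one-to-one call still needs some way to move media between the two devices, which these frameworks do not provide on their own, so plan for that transport separately. When the feature is genuinely simple, that is the lighter and cheaper path, and it keeps everything inside Apple's frameworks.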
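For the local-capture case, a minimal sketch looks like this: an `AVCaptureSession` fed by the front camera, shown through an `AVCaptureVideoPreviewLayer` hosted in a `UIViewRepresentable`. `PreviewView`, `CameraPreview`, and `CaptureError` are names chosen for this example. It covers the preview only and assumes camera permission has already been granted.

```swift
import SwiftUI
import UIKit
import AVFoundation

/// A UIView whose backing layer is the capture preview layer.
final class PreviewView: UIView {
    override class var layerClass: AnyClass { AVCaptureVideoPreviewLayer.self }

    var previewLayer: AVCaptureVideoPreviewLayer {
        layer as! AVCaptureVideoPreviewLayer   // safe: layerClass guarantees the type
    }
}

/// Hosts the local camera preview in SwiftUI.
struct CameraPreview: UIViewRepresentable {
    let session: AVCaptureSession

    func makeUIView(context: Context) -> PreviewView {
        let view = PreviewView()
        view.previewLayer.session = session
        view.previewLayer.videoGravity = .resizeAspectFill
        return view
    }

    func updateUIView(_ uiView: PreviewView, context: Context) {
        // Re-point the layer if SwiftUI hands us a different session.
        if uiView.previewLayer.session !== session {
            uiView.previewLayer.session = session
        }
    }
}

enum CaptureError: Error {
    case noCamera
    case cannotAddInput
}

/// Builds a session with the front wide-angle camera as its only input.
func makeFrontCameraSession() throws -> AVCaptureSession {
    guard let camera = AVCaptureDevice.default(
        .builtInWideAngleCamera, for: .video, position: .front
    ) else { throw CaptureError.noCamera }

    let session = AVCaptureSession()
    let input = try AVCaptureDeviceInput(device: camera)
    guard session.canAddInput(input) else { throw CaptureError.cannotAddInput }
    session.addInput(input)
    return session
}
```

`startRunning()` blocks until the session starts, so call it from a background queue, for example `DispatchQueue.global(qos: .userInitiated).async { session.startRunning() }`, rather than from the main thread.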

A hosted provider earns its place once you need reliable multiparty calls, where a selective forwarding unit routes each participant's video efficiently and handles the network conditions that peer-to-peer struggles with. The honest split is that AVFoundation suits simple or local video, and LiveKit or Agora suit real multiparty calling. Reaching for a full provider on a one-to-one feature is overkill; reaching for raw AVFoundation on a ten-person call is underbuilt.

## Key takeaways: a SwiftUI video call screen built right

Treat the SwiftUI layout, a tile grid, self-view, and control bar, as the durable part, and host video tracks in a `UIViewRepresentable`. Keep the provider secret on your server and authorize calls with short-lived tokens, never an embedded key. Since Twilio is winding down Programmable Video, start on LiveKit, Agora, or Daily, or use AVFoundation for simple one-to-one. Design the connecting and reconnecting states, not just the happy path. Let an AI builder implement the layout from a real design, then review the credential handling yourself. A commissioned calling feature can cost $5,000 or more once UI, media, and a token server are accounted for, while starting from a free VP0 design covers the UI for nothing.

You can [browse VP0 designs](/explore) to start your call screen from a real layout rather than a blank grid.

## Frequently asked questions

### How do you build a Twilio video call interface in SwiftUI?

Build the SwiftUI layout (a tile grid, a self-view, and a control bar) and host each video track in a `UIViewRepresentable`, since the SDKs render frames into a UIKit view. Keep the call state in an `@Observable` model so controls and tiles stay in sync. Because Twilio is discontinuing Programmable Video, connect the same UI to a current provider like LiveKit or Agora. Starting from a free VP0 design gets the layout and states right so you can focus on the media bridge.

### Is Twilio Programmable Video still available for new apps?

Twilio has announced it is discontinuing Programmable Video, so a new SwiftUI app should not start on it. The encouraging part is that the call UI is provider-agnostic: a LiveKit, Agora, or Daily SDK renders into the same kind of UIKit view, so the SwiftUI layout you build carries over with little change. Pick the media SDK based on whether you want open source, scale, or fast setup, and keep the screen design as the stable layer.

### How do I keep video call API keys secure in a SwiftUI app?

Never embed the provider's API secret in the app, because anything in a mobile binary can be extracted. Run a token server: your backend holds the secret and mints a short-lived access token scoped to one room when a user joins, and the app only ever receives that token. LiveKit, Agora, and the others all document this access-token flow. Build the token boundary before the UI so a key never gets hardcoded during prototyping.

### Can VP0 provide a free SwiftUI template for a video call screen?

Yes. VP0 is a free iOS app design library where every design has a machine-readable source page an AI builder reads from a pasted link, with SwiftUI and React Native variants. You start from the call screen design, hand its source to Claude Code, Cursor, or Rork, and wire it to your chosen media SDK and token server, rather than designing the grid and controls from a blank prompt.

### What common errors happen when vibe coding a video call UI?

The frequent ones are embedding the API secret in the app instead of using a token server, skipping the camera and microphone permission flow so tiles render black, a `UIViewRepresentable` that never updates so video freezes, doing media work on the main thread, and shipping only the connected state with no design for connecting or reconnecting. The security one is the most damaging, and it is also the default mistake AI builders make, so review the credential handling before anything else.

---
*Published on the [VP0 Journal](https://vp0.com/blogs). Free to read, index and cite with attribution.*