Camera Live Object Detection: The Bounding Box UI

The detection is the easy part, and the bug is almost always the coordinate space and the flipped Y axis.

Lawrence Arya Founder & CEO of VP0 · June 4, 2026 · 6 min read Updated June 19, 2026 View as Markdown

TL;DR

A live object-detection overlay draws boxes on top of a running camera feed, and the hard part is not the model, it is mapping the model's normalized coordinates to your view and keeping the overlay smooth. On iOS, Vision returns boxes in a normalized space with a bottom-left origin, so you convert to the view's top-left coordinate space, account for the preview's aspect fill, and throttle frames so the UI stays at 60fps. Build the overlay as a layer above the camera preview, not inside the frame loop. Start from a free VP0 camera layout at $0.

A live object-detection screen looks impressive and is mostly coordinate math. The camera runs, a model returns boxes, and you draw rectangles on top. The model is the easy part; the bug, every time, is the coordinate space. iOS Vision hands you boxes in a normalized, bottom-left coordinate system, while your views use top-left points, so without a careful conversion the boxes float off the objects. Here is how to map the results correctly and keep the overlay smooth. To skip the layout, start from a free VP0 camera design (the free iOS and React Native design library AI builders read from) at $0 and focus on the conversion.

The pipeline

Four stages, and you keep them on separate threads:

Stage	API	Thread
Capture frames	AVFoundation capture session	Capture queue
Detect	Vision request, often a Core ML model	Background
Convert coordinates	Normalized to view space	Background
Draw boxes	Overlay layer above preview	Main

Run the camera with AVFoundation, feed each sampled frame to a Vision request (a built-in VNDetectRectanglesRequest, or a Core ML detection model wrapped in a VNCoreMLRequest), and take the resulting observations. The detection itself is a few lines. Everything that makes it look right or wrong happens next.

The coordinate conversion, where the bug lives

Vision returns each box normalized from 0 to 1 with the origin at the bottom-left. UIKit and SwiftUI place things from the top-left in points. So three things have to happen before you draw: flip the Y axis, scale to the preview’s actual displayed size, and account for how aspect fill crops the frame so the visible area matches the detected area. The shortcut is to let the framework do it: VNImageRectForNormalizedRect converts to image pixels, and the capture preview layer can convert from metadata coordinates to layer coordinates directly. Skip this and your boxes will be mirrored, offset, or shrunk, which is the single most common “why is detection broken” report. The same overlay-on-camera discipline shows up in the SwiftUI camera with ARKit overlay, and the capture setup mirrors the SwiftUI AVFoundation custom camera UI.

Keeping it at 60fps

A live overlay drops frames the moment you do real work in the frame callback. Three habits keep it smooth. Detect off the main thread. Throttle the model: you rarely need every frame, and running detection a few times a second is usually enough for a stable box. And draw cheaply by moving and resizing existing box layers rather than tearing them down and rebuilding every update. Declare camera use with an NSCameraUsageDescription string, and prefer on-device detection so frames never leave the phone, which is faster and more private than a round trip. For loading or processing states while a model warms up, the iOS skeleton loaders pattern covers the gap, and if you scan QR codes during onboarding the same overlay feeds into a flow like the crypto wallet UI kit for iOS. For cross-platform, react-native-vision-camera exposes the same frame processors in React Native.

The classification cousin, where confidence is the whole UX, is built in the pet breed identifier camera AI.

The per-frame analysis behind these overlays runs in a VisionCamera frame processor, kept off the UI hot path.

Key takeaways

The model is easy; the bug is the coordinate space, every time.
Vision boxes are normalized with a bottom-left origin; flip Y and scale to the displayed preview.
Account for aspect-fill cropping, or use VNImageRectForNormalizedRect and the preview layer conversion.
Detect off the main thread, throttle the model, and reuse box layers to hold 60fps.
Start from a free VP0 camera layout at $0 and put your effort into the conversion.

Frequently asked questions

How do I draw bounding boxes over a live camera feed on iOS?

Run the camera with AVFoundation, pass each frame to a Vision request (VNDetectRectanglesRequest, a VNCoreMLRequest with a detection model, or VNRecognizedObjectObservation), then draw the returned boxes on an overlay layer above the preview. The key step is converting Vision’s normalized, bottom-left coordinates to your view’s top-left space and matching the preview’s aspect-fill scaling, or the boxes drift off the objects.

Why are my bounding boxes in the wrong place?

Almost always a coordinate-space mismatch. Vision returns boxes normalized from 0 to 1 with the origin at the bottom-left, while UIKit and SwiftUI use a top-left origin in points. You must flip the Y axis and scale by the preview’s actual displayed size, including how aspect fill crops the frame, before drawing. Use VNImageRectForNormalizedRect or the preview layer’s metadata conversion to get it right.

How do I keep the camera overlay smooth?

Do the detection off the main thread, throttle how often you run the model (you rarely need every frame), and only update the overlay with the latest result. Keep the drawing cheap: move and resize existing box layers instead of recreating them each frame. Heavy work in the frame callback is what drops frames.

Does object detection run on the device or in the cloud?

It can run fully on-device with Core ML and Vision, which is faster, works offline, and keeps the camera frames private. Cloud detection adds latency and sends images off the device, so for a live overlay on-device is the usual choice. Either way, declare camera use with an NSCameraUsageDescription string.

What is the best starting point for a camera detection UI?

A layout that already separates the camera preview from the overlay, so you can drop the detection results into the overlay without touching the capture code. A free VP0 design, the free iOS and React Native design library for AI builders, gives you that camera layout to generate in Cursor or Claude Code at $0.

Questions from the VP0 Vibe Coding community

How do I draw bounding boxes over a live camera feed on iOS?

Run the camera with AVFoundation, pass each frame to a Vision request (VNDetectRectanglesRequest, a VNCoreMLRequest with a detection model, or VNRecognizedObjectObservation), then draw the returned boxes on an overlay layer above the preview. The key step is converting Vision's normalized, bottom-left coordinates to your view's top-left space and matching the preview's aspect-fill scaling, or the boxes drift off the objects.

Why are my bounding boxes in the wrong place?

Almost always a coordinate-space mismatch. Vision returns boxes normalized from 0 to 1 with the origin at the bottom-left, while UIKit and SwiftUI use a top-left origin in points. You must flip the Y axis and scale by the preview's actual displayed size, including how aspect fill crops the frame, before drawing. Use VNImageRectForNormalizedRect or the preview layer's metadata conversion to get it right.

How do I keep the camera overlay smooth?

Does object detection run on the device or in the cloud?

What is the best starting point for a camera detection UI?

#ios #camera #computer-vision #swiftui #ui-components

Part of the Native Hardware, Sensors & Device Features hub. Browse all VP0 topics →

Keep reading

Guides 6 min read

Apple HealthKit Pedometer UI: Free Step Counter Templates

Build a step counter UI for Apple HealthKit: HealthKit for daily totals and charts, Core Motion's CMPedometer for the live number, from a free template.

Lawrence Arya · July 1, 2026

Guides 7 min read

Bluetooth Hearing Aid EQ Mixer UI for iOS

Build a Bluetooth hearing aid EQ mixer UI for iOS in SwiftUI, bound to AVAudioUnitEQ. Here is the audio path, the band sliders, and what to keep honest.

Lawrence Arya · June 18, 2026

Guides 7 min read

Bluetooth Mesh Network Chat Interface for iOS

Build a Bluetooth mesh network chat interface for iOS with MultipeerConnectivity. Here is the transport choice, the message UI, and the states it must show.

Lawrence Arya · June 18, 2026

Guides 10 min read

Build an Anonymous Voice Changer Pitch Slider on iOS

Route the mic through AVAudioUnitTimePitch and bind a slider to its pitch in cents. Here is the audio graph, the UI, and what anonymous really means.

Lawrence Arya · June 11, 2026

Guides 10 min read

XR fitness app companion UI for iOS: the SwiftUI screens

Build the iOS companion for an XR fitness app: setup, live HealthKit metrics, summary, and history that stay in sync with the headset workout.

Lawrence Arya · June 10, 2026

Guides 6 min read

Photomath Clone Camera Scanner UI in SwiftUI

Capture, recognize, solve, show steps: recognizing math is harder than text, the user confirms the parsed equation, and the worked steps are the product.

Lawrence Arya · June 7, 2026