Camera Live Object Detection: The Bounding Box UI
The detection is the easy part. The bug is always the coordinate space and the flipped Y axis.
TL;DR
A live object-detection overlay draws boxes on top of a running camera feed, and the hard part is not the model, it is mapping the model's normalized coordinates to your view and keeping the overlay smooth. On iOS, Vision returns boxes in a normalized space with a bottom-left origin, so you convert to the view's top-left coordinate space, account for the preview's aspect fill, and throttle frames so the UI stays at 60fps. Build the overlay as a layer above the camera preview, not inside the frame loop. Start from a free VP0 camera layout at $0.
A live object-detection screen looks impressive and is mostly coordinate math. The camera runs, a model returns boxes, and you draw rectangles on top. The model is the easy part; the bug, every time, is the coordinate space. iOS Vision hands you boxes in a normalized, bottom-left coordinate system, while your views use top-left points, so without a careful conversion the boxes float off the objects. Here is how to map the results correctly and keep the overlay smooth. To skip the layout, start from a free VP0 camera design (the free iOS and React Native design library AI builders read from) at $0 and focus on the conversion.
The pipeline
Four stages, and you keep them on separate threads:
| Stage | API | Thread |
|---|---|---|
| Capture frames | AVFoundation capture session | Capture queue |
| Detect | Vision request, often a Core ML model | Background |
| Convert coordinates | Normalized to view space | Background |
| Draw boxes | Overlay layer above preview | Main |
Run the camera with AVFoundation, feed each sampled frame to a Vision request (a built-in VNDetectRectanglesRequest, or a Core ML detection model wrapped in a VNCoreMLRequest), and take the resulting observations. The detection itself is a few lines. Everything that makes it look right or wrong happens next.
The coordinate conversion, where the bug lives
Vision returns each box normalized from 0 to 1 with the origin at the bottom-left. UIKit and SwiftUI place things from the top-left in points. So three things have to happen before you draw: flip the Y axis, scale to the preview’s actual displayed size, and account for how aspect fill crops the frame so the visible area matches the detected area. The shortcut is to let the framework do it: VNImageRectForNormalizedRect converts to image pixels, and the capture preview layer can convert from metadata coordinates to layer coordinates directly. Skip this and your boxes will be mirrored, offset, or shrunk, which is the single most common “why is detection broken” report. The same overlay-on-camera discipline shows up in the SwiftUI camera with ARKit overlay, and the capture setup mirrors the SwiftUI AVFoundation custom camera UI.
Keeping it at 60fps
A live overlay drops frames the moment you do real work in the frame callback. Three habits keep it smooth. Detect off the main thread. Throttle the model: you rarely need every frame, and running detection a few times a second is usually enough for a stable box. And draw cheaply by moving and resizing existing box layers rather than tearing them down and rebuilding every update. Declare camera use with an NSCameraUsageDescription string, and prefer on-device detection so frames never leave the phone, which is faster and more private than a round trip. For loading or processing states while a model warms up, the iOS skeleton loaders pattern covers the gap, and if you scan QR codes during onboarding the same overlay feeds into a flow like the crypto wallet UI kit for iOS. For cross-platform, react-native-vision-camera exposes the same frame processors in React Native.
The classification cousin, where confidence is the whole UX, is built in the pet breed identifier camera AI.
The per-frame analysis behind these overlays runs in a VisionCamera frame processor, kept off the UI hot path.
Key takeaways
- The model is easy; the bug is the coordinate space, every time.
- Vision boxes are normalized with a bottom-left origin; flip Y and scale to the displayed preview.
- Account for aspect-fill cropping, or use
VNImageRectForNormalizedRectand the preview layer conversion. - Detect off the main thread, throttle the model, and reuse box layers to hold 60fps.
- Start from a free VP0 camera layout at $0 and put your effort into the conversion.
Frequently asked questions
How do I draw bounding boxes over a live camera feed on iOS?
Run the camera with AVFoundation, pass each frame to a Vision request (VNDetectRectanglesRequest, a VNCoreMLRequest with a detection model, or VNRecognizedObjectObservation), then draw the returned boxes on an overlay layer above the preview. The key step is converting Vision’s normalized, bottom-left coordinates to your view’s top-left space and matching the preview’s aspect-fill scaling, or the boxes drift off the objects.
Why are my bounding boxes in the wrong place?
Almost always a coordinate-space mismatch. Vision returns boxes normalized from 0 to 1 with the origin at the bottom-left, while UIKit and SwiftUI use a top-left origin in points. You must flip the Y axis and scale by the preview’s actual displayed size, including how aspect fill crops the frame, before drawing. Use VNImageRectForNormalizedRect or the preview layer’s metadata conversion to get it right.
How do I keep the camera overlay smooth?
Do the detection off the main thread, throttle how often you run the model (you rarely need every frame), and only update the overlay with the latest result. Keep the drawing cheap: move and resize existing box layers instead of recreating them each frame. Heavy work in the frame callback is what drops frames.
Does object detection run on the device or in the cloud?
It can run fully on-device with Core ML and Vision, which is faster, works offline, and keeps the camera frames private. Cloud detection adds latency and sends images off the device, so for a live overlay on-device is the usual choice. Either way, declare camera use with an NSCameraUsageDescription string.
What is the best starting point for a camera detection UI?
A layout that already separates the camera preview from the overlay, so you can drop the detection results into the overlay without touching the capture code. A free VP0 design, the free iOS and React Native design library for AI builders, gives you that camera layout to generate in Cursor or Claude Code at $0.
Questions from the VP0 Vibe Coding community
How do I draw bounding boxes over a live camera feed on iOS?
Run the camera with AVFoundation, pass each frame to a Vision request (VNDetectRectanglesRequest, a VNCoreMLRequest with a detection model, or VNRecognizedObjectObservation), then draw the returned boxes on an overlay layer above the preview. The key step is converting Vision's normalized, bottom-left coordinates to your view's top-left space and matching the preview's aspect-fill scaling, or the boxes drift off the objects.
Why are my bounding boxes in the wrong place?
Almost always a coordinate-space mismatch. Vision returns boxes normalized from 0 to 1 with the origin at the bottom-left, while UIKit and SwiftUI use a top-left origin in points. You must flip the Y axis and scale by the preview's actual displayed size, including how aspect fill crops the frame, before drawing. Use VNImageRectForNormalizedRect or the preview layer's metadata conversion to get it right.
How do I keep the camera overlay smooth?
Do the detection off the main thread, throttle how often you run the model (you rarely need every frame), and only update the overlay with the latest result. Keep the drawing cheap: move and resize existing box layers instead of recreating them each frame. Heavy work in the frame callback is what drops frames.
Does object detection run on the device or in the cloud?
It can run fully on-device with Core ML and Vision, which is faster, works offline, and keeps the camera frames private. Cloud detection adds latency and sends images off the device, so for a live overlay on-device is the usual choice. Either way, declare camera use with an NSCameraUsageDescription string.
What is the best starting point for a camera detection UI?
A layout that already separates the camera preview from the overlay, so you can drop the detection results into the overlay without touching the capture code. A free VP0 design, the free iOS and React Native design library for AI builders, gives you that camera layout to generate in Cursor or Claude Code at $0.
Part of the Native Hardware, Sensors & Device Features hub. Browse all VP0 topics →
Keep reading
Photomath Clone Camera Scanner UI in SwiftUI
Capture, recognize, solve, show steps: recognizing math is harder than text, the user confirms the parsed equation, and the worked steps are the product.
Biological Age Calculator Dashboard UI for iOS: Honest
Design a biological age dashboard: the estimate framed honestly, trends over absolutes, factor breakdowns tied to evidence, and zero longevity fear-mongering.
SwiftUI NFC Reader with a Bottom Sheet Result
A free SwiftUI pattern for reading NFC tags with Core NFC and showing the result in a native bottom sheet, plus the entitlement and the tags-not-cards truth.
Apple HealthKit Step Counter in SwiftUI (Free Template)
Build a step-counter UI on HealthKit in SwiftUI: permission, today's steps, a trend chart, and goals, from a free VP0 design. Private, and not medical.
Bluetooth Device Pairing UI in SwiftUI (Free Template)
A BLE pairing screen scans, lists nearby devices, and walks through connecting with clear states. Build it with Core Bluetooth from a free VP0 design.
watchOS AI Agent Widget Template (SwiftUI)
Build an AI agent companion for Apple Watch in SwiftUI: a glanceable complication, quick actions, and a wrist-sized reply, from a free VP0 design.