# Camera Live Object Detection: The Bounding Box UI

> By Lawrence Arya, Founder & CEO of VP0. Published 2026-06-04. 6 min read.
> Source: https://vp0.com/blogs/camera-live-object-detection-bounding-box-ui

The detection is the easy part. The bug is always the coordinate space and the flipped Y axis.

**TL;DR.** A live object-detection overlay draws boxes on top of a running camera feed, and the hard part is not the model, it is mapping the model's normalized coordinates to your view and keeping the overlay smooth. On iOS, Vision returns boxes in a normalized space with a bottom-left origin, so you convert to the view's top-left coordinate space, account for the preview's aspect fill, and throttle frames so the UI stays at 60fps. Build the overlay as a layer above the camera preview, not inside the frame loop. Start from a free VP0 camera layout at $0.

A live object-detection screen looks impressive and is mostly coordinate math. The camera runs, a model returns boxes, and you draw rectangles on top. The model is the easy part; the bug, every time, is the coordinate space. iOS [Vision](https://developer.apple.com/documentation/vision) hands you boxes in a normalized, bottom-left coordinate system, while your views use top-left points, so without a careful conversion the boxes float off the objects. Here is how to map the results correctly and keep the overlay smooth. To skip the layout, start from a free [VP0](https://vp0.com) camera design (the free iOS and React Native design library AI builders read from) at $0 and focus on the conversion.

## The pipeline

Four stages, and you keep them on separate threads:

| Stage | API | Thread |
|---|---|---|
| Capture frames | AVFoundation capture session | Capture queue |
| Detect | Vision request, often a Core ML model | Background |
| Convert coordinates | Normalized to view space | Background |
| Draw boxes | Overlay layer above preview | Main |

Run the camera with [AVFoundation](https://developer.apple.com/documentation/avfoundation), feed each sampled frame to a Vision request (a built-in `VNDetectRectanglesRequest`, or a [Core ML](https://developer.apple.com/documentation/coreml) detection model wrapped in a `VNCoreMLRequest`), and take the resulting observations. The detection itself is a few lines. Everything that makes it look right or wrong happens next.

## The coordinate conversion, where the bug lives

Vision returns each box normalized from 0 to 1 with the origin at the **bottom-left**. UIKit and SwiftUI place things from the **top-left** in points. So three things have to happen before you draw: flip the Y axis, scale to the preview's actual displayed size, and account for how aspect fill crops the frame so the visible area matches the detected area. The shortcut is to let the framework do it: `VNImageRectForNormalizedRect` converts to image pixels, and the capture preview layer can convert from metadata coordinates to layer coordinates directly. Skip this and your boxes will be mirrored, offset, or shrunk, which is the single most common "why is detection broken" report. The same overlay-on-camera discipline shows up in [the SwiftUI camera with ARKit overlay](/blogs/swiftui-camera-arkit-overlay-code/), and the capture setup mirrors [the SwiftUI AVFoundation custom camera UI](/blogs/swiftui-avfoundation-custom-camera-ui/).

## Keeping it at 60fps

A live overlay drops frames the moment you do real work in the frame callback. Three habits keep it smooth. Detect off the main thread. Throttle the model: you rarely need every frame, and running detection a few times a second is usually enough for a stable box. And draw cheaply by moving and resizing existing box layers rather than tearing them down and rebuilding every update. Declare camera use with an `NSCameraUsageDescription` string, and prefer on-device detection so frames never leave the phone, which is faster and more private than a round trip. For loading or processing states while a model warms up, the [iOS skeleton loaders pattern](/blogs/ios-skeleton-loaders-ui-react-native/) covers the gap, and if you scan QR codes during onboarding the same overlay feeds into a flow like the [crypto wallet UI kit for iOS](/blogs/crypto-wallet-ui-kit-ios/). For cross-platform, [react-native-vision-camera](https://react-native-vision-camera.com/) exposes the same frame processors in React Native.

The classification cousin, where confidence is the whole UX, is built in [the pet breed identifier camera AI](/blogs/pet-breed-identifier-camera-ai-ui-swiftui/).

The per-frame analysis behind these overlays runs in a [VisionCamera frame processor](/blogs/react-native-vision-camera-frame-processor-ui/), kept off the UI hot path.

## Key takeaways

- The model is easy; the bug is the coordinate space, every time.
- Vision boxes are normalized with a bottom-left origin; flip Y and scale to the displayed preview.
- Account for aspect-fill cropping, or use `VNImageRectForNormalizedRect` and the preview layer conversion.
- Detect off the main thread, throttle the model, and reuse box layers to hold 60fps.
- Start from a free VP0 camera layout at $0 and put your effort into the conversion.

## Frequently asked questions

### How do I draw bounding boxes over a live camera feed on iOS?

Run the camera with AVFoundation, pass each frame to a Vision request (VNDetectRectanglesRequest, a VNCoreMLRequest with a detection model, or VNRecognizedObjectObservation), then draw the returned boxes on an overlay layer above the preview. The key step is converting Vision's normalized, bottom-left coordinates to your view's top-left space and matching the preview's aspect-fill scaling, or the boxes drift off the objects.

### Why are my bounding boxes in the wrong place?

Almost always a coordinate-space mismatch. Vision returns boxes normalized from 0 to 1 with the origin at the bottom-left, while UIKit and SwiftUI use a top-left origin in points. You must flip the Y axis and scale by the preview's actual displayed size, including how aspect fill crops the frame, before drawing. Use VNImageRectForNormalizedRect or the preview layer's metadata conversion to get it right.

### How do I keep the camera overlay smooth?

Do the detection off the main thread, throttle how often you run the model (you rarely need every frame), and only update the overlay with the latest result. Keep the drawing cheap: move and resize existing box layers instead of recreating them each frame. Heavy work in the frame callback is what drops frames.

### Does object detection run on the device or in the cloud?

It can run fully on-device with Core ML and Vision, which is faster, works offline, and keeps the camera frames private. Cloud detection adds latency and sends images off the device, so for a live overlay on-device is the usual choice. Either way, declare camera use with an NSCameraUsageDescription string.

### What is the best starting point for a camera detection UI?

A layout that already separates the camera preview from the overlay, so you can drop the detection results into the overlay without touching the capture code. A free VP0 design, the free iOS and React Native design library for AI builders, gives you that camera layout to generate in Cursor or Claude Code at $0.

## Frequently asked questions

### How do I draw bounding boxes over a live camera feed on iOS?

Run the camera with AVFoundation, pass each frame to a Vision request (VNDetectRectanglesRequest, a VNCoreMLRequest with a detection model, or VNRecognizedObjectObservation), then draw the returned boxes on an overlay layer above the preview. The key step is converting Vision's normalized, bottom-left coordinates to your view's top-left space and matching the preview's aspect-fill scaling, or the boxes drift off the objects.

### Why are my bounding boxes in the wrong place?

Almost always a coordinate-space mismatch. Vision returns boxes normalized from 0 to 1 with the origin at the bottom-left, while UIKit and SwiftUI use a top-left origin in points. You must flip the Y axis and scale by the preview's actual displayed size, including how aspect fill crops the frame, before drawing. Use VNImageRectForNormalizedRect or the preview layer's metadata conversion to get it right.

### How do I keep the camera overlay smooth?

Do the detection off the main thread, throttle how often you run the model (you rarely need every frame), and only update the overlay with the latest result. Keep the drawing cheap: move and resize existing box layers instead of recreating them each frame. Heavy work in the frame callback is what drops frames.

### Does object detection run on the device or in the cloud?

It can run fully on-device with Core ML and Vision, which is faster, works offline, and keeps the camera frames private. Cloud detection adds latency and sends images off the device, so for a live overlay on-device is the usual choice. Either way, declare camera use with an NSCameraUsageDescription string.

### What is the best starting point for a camera detection UI?

A layout that already separates the camera preview from the overlay, so you can drop the detection results into the overlay without touching the capture code. A free VP0 design, the free iOS and React Native design library for AI builders, gives you that camera layout to generate in Cursor or Claude Code at $0.

---
*Published on the [VP0 Journal](https://vp0.com/blogs). Free to read, index and cite with attribution.*