Midjourney-Style Image Grid Selector UI in SwiftUI

Generation is the model's job; the grid is where the human picks. That decision UX is the product.

Lawrence Arya, Founder & CEO of VP0 · June 7, 2026 · 6 min read · Updated June 7, 2026


TL;DR

A Midjourney-style image grid selector is the decision screen after generation, separate from the prompt input: it presents several candidate images, shows which are still rendering, and makes picking fast. Build it with a LazyVGrid where each cell is a state machine (queued, generating, loaded, failed) that loads via AsyncImage and holds its aspect ratio, so the grid fills in cell by cell and a single failure can be retried on its own. Selection is tap-to-focus (a matchedGeometryEffect zoom that lets the user judge the image), with upscale, vary, and save as first-class actions and re-roll and variation as primary buttons. Be honest about the loop: show the cost before the tap, show real per-cell progress, and make failures retryable. The app builds the grid; the model generates. A free VP0 design supplies the grid and detail screens.

What is the image-grid selector for, and why is it a distinct screen?

It is the moment after generation, where the model returns several candidates and the user picks. Midjourney made the 2x2 grid of variations famous: you prompt, four images come back, and you choose one to upscale, vary, or save. That selection screen is its own design problem, separate from the prompt input, because it has to present multiple in-progress and finished images at once, communicate which are still rendering, and make picking, comparing, and acting on a result feel fast. Generation is the model’s job; the grid is where the human makes the decision, and that decision UX is the product.

The honest scope: the app builds the grid, the actions, and the state handling. The images come from a generation API (your own model or a hosted one, commonly returning 1024×1024-pixel results), and the screen’s whole job is to make a slow, asynchronous, sometimes-failing process feel controllable.

How is the grid built in SwiftUI?

With LazyVGrid and asynchronous image loading, designed around the fact that the images do not all arrive at once. The structure:

A 2x2 (or NxN) LazyVGrid of result cells, each cell a state machine, not just an image: queued, generating (a progress shimmer), loaded, or failed.
Per-cell progress, because generation is slow and parallel: each of the four images may finish at a different time, so the grid fills in cell by cell rather than flashing complete all at once, and a single global spinner would hide that real progress.
AsyncImage or a caching loader for the finished images, with the cell holding its place at the right aspect ratio so the grid does not reflow as images pop in.
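
The structure above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `CellState`, `ResultCell`, `ResultGrid`, and the `onRetry` callback are hypothetical names, the square 1:1 slot assumes square results, and the progress value assumes your generation API reports one per image.

```swift
import SwiftUI

// Hypothetical per-cell state; the names are illustrative, not from any SDK.
enum CellState {
    case queued
    case generating(progress: Double)   // 0.0 ... 1.0, assumed to come from your API
    case loaded(URL)
    case failed(message: String)
}

struct ResultCell: Identifiable {
    let id: Int
    var state: CellState
}

struct ResultGrid: View {
    let cells: [ResultCell]
    let onRetry: (Int) -> Void          // hypothetical hook: retry one cell by id

    // Two flexible columns give the 2x2 layout when there are four cells.
    let columns = [GridItem(.flexible(), spacing: 8),
                   GridItem(.flexible(), spacing: 8)]

    var body: some View {
        LazyVGrid(columns: columns, spacing: 8) {
            ForEach(cells) { cell in
                // The placeholder reserves a square slot up front, so images
                // arriving later fill the slot instead of reflowing the grid.
                Color.gray.opacity(0.15)
                    .aspectRatio(1, contentMode: .fit)
                    .overlay { cellContent(cell) }
                    .clipShape(RoundedRectangle(cornerRadius: 12))
            }
        }
        .padding(8)
    }

    @ViewBuilder
    private func cellContent(_ cell: ResultCell) -> some View {
        switch cell.state {
        case .queued:
            Text("Queued").foregroundStyle(.secondary)
        case .generating(let progress):
            ProgressView(value: progress).padding()
        case .loaded(let url):
            AsyncImage(url: url) { phase in
                switch phase {
                case .success(let image):
                    image.resizable().scaledToFill()
                case .failure:
                    Image(systemName: "exclamationmark.triangle")
                default:
                    ProgressView()      // image bytes still downloading
                }
            }
        case .failed(let message):
            VStack(spacing: 8) {
                Text(message).font(.caption)
                Button("Retry") { onRetry(cell.id) }
            }
        }
    }
}
```

The placeholder-plus-overlay arrangement is one way to hold the aspect ratio; a caching loader could replace `AsyncImage` inside the `.loaded` branch without changing the rest of the cell.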

The cell-as-state-machine is the load-bearing idea: a grid that only knows “loading” versus “image” cannot show that image 3 failed while 1, 2, and 4 succeeded, which is exactly the case the user needs to see so they can retry just the one. This is the same honest-per-item-state discipline as any async collection. For browsing finished images rather than generating them, the same LazyVGrid pattern powers a SwiftUI photo gallery with pinch-to-zoom.
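
The transitions themselves can live in plain Swift, independent of the view. The sketch below is one possible shape under stated assumptions: `CellEvent`, `GenerationGrid`, and `failedIndices` are hypothetical names, and the state enum is declared again here only so the block stands alone.

```swift
import Foundation

// Hypothetical per-cell state and events; names are illustrative only.
enum CellState: Equatable {
    case queued
    case generating(progress: Double)
    case loaded(URL)
    case failed(message: String)
}

enum CellEvent {
    case started
    case progressed(Double)
    case finished(URL)
    case errored(String)
    case retryTapped
}

struct GenerationGrid {
    private(set) var cells: [CellState]

    init(count: Int = 4) {
        cells = Array(repeating: .queued, count: count)
    }

    /// Applies an event to a single cell. Transitions that make no sense
    /// for the cell's current state are ignored rather than trapped.
    mutating func apply(_ event: CellEvent, to index: Int) {
        guard cells.indices.contains(index) else { return }
        switch (cells[index], event) {
        case (.queued, .started):
            cells[index] = .generating(progress: 0)
        case (.generating, .progressed(let value)):
            cells[index] = .generating(progress: min(max(value, 0), 1))
        case (.generating, .finished(let url)):
            cells[index] = .loaded(url)
        case (.queued, .errored(let message)), (.generating, .errored(let message)):
            cells[index] = .failed(message: message)
        case (.failed, .retryTapped):
            cells[index] = .queued      // only this cell goes back in the queue
        default:
            break
        }
    }

    /// Indices of cells that currently need a retry affordance.
    var failedIndices: [Int] {
        cells.indices.filter { index in
            if case .failed = cells[index] { return true }
            return false
        }
    }
}
```

Because every event targets one index, the "image 3 failed while 1, 2, and 4 succeeded" case is just one cell in `.failed` next to three in `.loaded`, and a retry touches that cell alone.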

What actions does the grid need, and how does selection feel?

The Midjourney verbs, adapted: tap a result to select it, then act on it: upscale (get the full-resolution version), make variations (generate four more like this one), save, or share. The interaction details that make it feel good:

Tap to focus, not just select: tapping a cell should open it large (a matchedGeometryEffect zoom from the cell to a detail view) so the user can actually judge the image before committing. The actions live in that detail view.
Compare easily: four small images are hard to judge, so a quick way to enlarge each, or swipe between them full-screen, is what turns the grid from a gallery into a decision tool.
Re-roll and variation are first-class: “none of these, try again” and “more like this one” are the two most common next steps, so they are primary buttons, not buried.

The selection moment is the payoff, and a satisfying zoom-to-detail with the action set is what makes the whole generate-pick loop feel responsive even when generation itself is slow.
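
A tap-to-focus zoom along these lines could look like the sketch below. It is an illustration under assumptions, not the design's actual code: `FocusableGrid` is a hypothetical name, solid colors stand in for generated images, and the three action buttons are left as empty hooks.

```swift
import SwiftUI

struct FocusableGrid: View {
    // Placeholder tiles stand in for generated images; the colors are illustrative only.
    let tiles: [Color] = [.orange, .teal, .indigo, .pink]

    @State private var focused: Int? = nil
    @Namespace private var zoom

    private let columns = [GridItem(.flexible()), GridItem(.flexible())]

    var body: some View {
        ZStack {
            LazyVGrid(columns: columns, spacing: 8) {
                ForEach(tiles.indices, id: \.self) { index in
                    if focused == index {
                        // Keep the empty slot so the grid does not reflow during the zoom.
                        Color.clear.aspectRatio(1, contentMode: .fit)
                    } else {
                        tiles[index]
                            .aspectRatio(1, contentMode: .fit)
                            .matchedGeometryEffect(id: index, in: zoom)
                            .onTapGesture {
                                withAnimation(.spring()) { focused = index }
                            }
                    }
                }
            }
            .padding(8)

            if let index = focused {
                VStack(spacing: 16) {
                    // Same id and namespace as the grid tile, so SwiftUI animates
                    // the frame from the cell to this enlarged position.
                    tiles[index]
                        .aspectRatio(1, contentMode: .fit)
                        .matchedGeometryEffect(id: index, in: zoom)
                        .onTapGesture {
                            withAnimation(.spring()) { focused = nil }
                        }
                    HStack {
                        Button("Upscale") { /* hypothetical hook */ }
                        Button("Vary") { /* hypothetical hook */ }
                        Button("Save") { /* hypothetical hook */ }
                    }
                    .buttonStyle(.borderedProminent)
                }
                .padding()
            }
        }
    }
}
```

Only one view with a given id is in the hierarchy at a time (the grid tile or the enlarged tile), which is the usual way to get the matched zoom rather than a cross-fade.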

Where does the honesty live?

In the cost, the wait, and the failures, because generation is slow, metered, and unreliable. Each generation spends money or credits, so the UI shows the cost before the tap and never silently burns a re-roll; the wait is a real per-cell progress state, not a fake spinner that completes early; and a failed image is a retryable cell with an honest error, not a blank square that looks like a bug. This is the same cost-and-wait honesty as the image outpainting brush tool, where the model generates and the app’s job is making the slow, metered loop trustworthy.
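
One way to keep the cost visible and the spend explicit is to route both through a small model. The sketch below assumes a credit-based pricing scheme; the action names, the prices, and the `CreditWallet` type are all hypothetical and would come from your own billing rules.

```swift
// Hypothetical credit model; the prices and names are assumptions for illustration.
enum GenerationAction {
    case reroll, vary, upscale

    var title: String {
        switch self {
        case .reroll: return "Re-roll"
        case .vary: return "Vary"
        case .upscale: return "Upscale"
        }
    }

    var creditCost: Int {
        switch self {
        case .reroll, .vary: return 4   // assumed: four new images
        case .upscale: return 1         // assumed: one larger image
        }
    }
}

enum SpendError: Error, Equatable {
    case insufficientCredits(needed: Int, available: Int)
}

struct CreditWallet {
    private(set) var balance: Int

    init(balance: Int) {
        self.balance = balance
    }

    /// Button label text, so the cost is on screen before the tap.
    func label(for action: GenerationAction) -> String {
        let unit = action.creditCost == 1 ? "credit" : "credits"
        return "\(action.title) (\(action.creditCost) \(unit))"
    }

    /// Deducts credits only when called from an explicit, confirmed tap.
    /// Nothing else in this type changes the balance.
    mutating func spend(on action: GenerationAction) throws {
        let cost = action.creditCost
        guard balance >= cost else {
            throw SpendError.insufficientCredits(needed: cost, available: balance)
        }
        balance -= cost
    }
}
```

Keeping `spend(on:)` as the only mutation point makes a silent re-roll hard to write by accident: any code path that burns credits has to call it, and the same `creditCost` drives both the label and the deduction.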

One content-honesty note: generated images are synthetic, and any app where that matters (stock-style use, anything presented as real) should keep that clear. The screens (the result grid, the focused detail with actions, and the generation history) come as a free VP0 design, so an agent builds the LazyVGrid state machine and the generation-API wiring onto a UI already shaped for the generate, compare, pick loop.

Key takeaways: a Midjourney-style image grid selector

The grid is the decision screen, separate from the prompt: present multiple results, show which are still rendering, make picking fast.
Each cell is a state machine: queued, generating, loaded, failed, so the grid fills in cell by cell and one failure is retryable alone.
Build it with LazyVGrid + AsyncImage, cells holding aspect ratio so the grid does not reflow as images arrive.
Selection is tap-to-focus with first-class actions: zoom to judge, then upscale, vary, save; re-roll and variation are primary.
Be honest about cost, wait, and failures: show cost before the tap, real per-cell progress, retryable errors, never a fake spinner.

Frequently asked questions

How do I build a Midjourney-style image grid selector in SwiftUI?

Use a LazyVGrid of result cells where each cell is a state machine (queued, generating, loaded, failed) loading via AsyncImage, fill in cells as each generation finishes, and make tap-to-focus open a detail view with upscale, vary, and save actions. A free VP0 design supplies the grid, detail, and history screens to wire your generation API onto.

Why should each grid cell be a state machine?

Because the four images do not arrive together and any can fail: a grid that only knows loading versus image cannot show that image 3 failed while the others succeeded, which is exactly what the user needs to see to retry just that one. Per-cell states (queued, generating, loaded, failed) make the slow, parallel, fallible process legible.

How do I make selecting an image feel good?

Tap to focus, not just select: a matchedGeometryEffect zoom from the cell to a large detail view lets the user actually judge the image before committing, with the actions (upscale, make variations, save) living there. Four small thumbnails are hard to judge, so easy enlargement turns the grid into a real decision tool.

Does the app generate the images itself?

No: generation runs on a model API, your own or hosted, and the app builds the grid, the actions, and the state handling. The screen’s job is making a slow, asynchronous, sometimes-failing process feel controllable, which is a different and equally important problem from the generation itself.

How should the grid handle generation cost and failures?

Honestly: show the credit or money cost before the tap and never silently burn a re-roll, render a real per-cell progress state rather than a fake spinner, and make a failed image a retryable cell with a clear error instead of a blank square. Generation is metered and unreliable, so the UI earns trust by being candid about both.


#swiftui #ai #image-generation #lazyvgrid #ui-patterns
