
The Complete AI Image Generation Workflow: From Prompt to Production-Ready Asset

A practical workflow for creating consistent, high-quality AI images using Midjourney, Flux, and ComfyUI. Covers prompting, upscaling, editing, and batch production.

By EgoistAI

Making one good AI image is easy. Making twenty consistent, brand-aligned, production-ready images for a real project is hard. This guide covers the complete workflow — from initial concept to final delivery — that professional AI artists and creative teams use in 2026.

We’ll cover three generation tools (Midjourney v7, Flux 1.1 Pro, and ComfyUI with SDXL/Flux), because no single tool is best for every use case. The workflow principles apply regardless of which tool you choose.

The Workflow Overview

1. Brief → 2. Style Reference → 3. Prompt Engineering → 4. Generation →
5. Selection → 6. Upscaling → 7. Editing → 8. Format Export

Most people jump straight to step 4 (generation) and wonder why their results are inconsistent. The preparation steps (1-3) do most of the work of determining output quality.

Step 1: Define the Brief

Before touching any AI tool, answer these questions:

## Image Brief Template

**Purpose:** What is this image for? (blog header, social media, product mockup, etc.)
**Dimensions:** What aspect ratio and resolution do you need?
**Style:** What visual style? (photorealistic, illustration, 3D render, flat design)
**Mood:** What emotion should it convey? (professional, playful, dramatic, calm)
**Color Palette:** Any brand colors or restrictions?
**Subject:** What must be in the image?
**Exclusions:** What must NOT be in the image?
**Consistency:** Does this need to match existing images? (if so, attach references)
**Quantity:** How many variations do you need?

A clear brief prevents the most common waste: generating 50 images and using none because they don’t match what was actually needed.
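If the brief lives in code rather than a document, a small data structure can catch gaps before any generation budget is spent. The following is a minimal sketch, not a standard schema: the `ImageBrief` class, its field names, and the `problems` method are illustrative assumptions that mirror the template above.

```python
from dataclasses import dataclass, field


@dataclass
class ImageBrief:
    """Hypothetical container for one brief; fields mirror the template above."""
    purpose: str
    aspect_ratio: str          # e.g. "16:9"
    style: str
    mood: str
    subject: str
    palette: list[str] = field(default_factory=list)
    exclusions: list[str] = field(default_factory=list)
    reference_images: list[str] = field(default_factory=list)
    must_match_existing: bool = False
    quantity: int = 1

    def problems(self) -> list[str]:
        """Return human-readable gaps; an empty list means the brief is usable."""
        issues = []
        for name in ("purpose", "aspect_ratio", "style", "mood", "subject"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if self.must_match_existing and not self.reference_images:
            issues.append("consistency required but no reference images attached")
        if self.quantity < 1:
            issues.append("quantity must be at least 1")
        return issues


brief = ImageBrief(
    purpose="blog header",
    aspect_ratio="16:9",
    style="photorealistic",
    mood="professional",
    subject="developer at a standing desk",
    must_match_existing=True,
)
print(brief.problems())
```

Here the brief asks for consistency with existing images but attaches no references, so `problems()` reports that single gap.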

Step 2: Build a Style Reference Library

Consistency across multiple images requires style references. Collect 5-10 reference images that represent the visual language you want.

For Midjourney: Style Reference (--sref)

Midjourney v7’s --sref parameter lets you provide a URL to a style reference image:

A modern office workspace with natural lighting --sref https://example.com/your-style-ref.jpg --sw 100 --ar 16:9

  • --sref: URL to your style reference image
  • --sw 100: Style weight (0-1000, higher = stronger style influence)
  • --ar 16:9: Aspect ratio

For a series of images, use the same --sref across all generations. This is the simplest path to visual consistency.
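One way to guarantee the same parameters across a series is to build every prompt string from one helper. This is a small sketch that only assembles text to paste into Midjourney; the function name `midjourney_prompt` and the second scene description are made up for illustration, and the URL is the placeholder from the example above.

```python
def midjourney_prompt(description: str, sref_url: str,
                      style_weight: int = 100, aspect_ratio: str = "16:9") -> str:
    """Append the same --sref / --sw / --ar suffix to any description."""
    if not 0 <= style_weight <= 1000:
        raise ValueError("--sw must be between 0 and 1000")
    return f"{description.strip()} --sref {sref_url} --sw {style_weight} --ar {aspect_ratio}"


SERIES_SREF = "https://example.com/your-style-ref.jpg"  # placeholder URL

for scene in ["A modern office workspace with natural lighting",
              "A quiet meeting room with a whiteboard"]:
    print(midjourney_prompt(scene, SERIES_SREF))
```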

For Flux: IP-Adapter and ControlNet

Flux doesn’t have a built-in style reference parameter, but through ComfyUI, you can use IP-Adapter to inject style:

# ComfyUI workflow nodes:
1. Load Flux model
2. Load IP-Adapter model
3. Load style reference image
4. IP-Adapter Apply (strength: 0.6-0.8)
5. KSampler (steps: 28, cfg: 3.5, sampler: euler)
6. VAE Decode
7. Save Image

The IP-Adapter strength is crucial: 0.4-0.6 gives a subtle style influence; 0.8-1.0 closely mimics the reference. Start at 0.6 and adjust.

Step 3: Prompt Engineering for Images

Image prompts are fundamentally different from text prompts. Here’s the structure that works:

The Anatomy of an Effective Image Prompt

[Subject] + [Environment/Setting] + [Style/Medium] + [Lighting] + [Camera/Composition] + [Mood/Atmosphere] + [Quality Modifiers]

Example:

A senior software engineer working at a standing desk with three monitors, 
modern minimalist office with floor-to-ceiling windows showing a city skyline, 
editorial photography style, soft natural window light with subtle rim lighting, 
shot from a slight low angle with shallow depth of field, 
focused and confident atmosphere,
8K, ultra-detailed, professional color grading
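The slot order above can be enforced in code so that no component gets forgotten or shuffled. A minimal sketch, assuming a hypothetical `PromptParts` class whose fields follow the anatomy; empty slots are simply skipped.

```python
from dataclasses import dataclass, astuple


@dataclass
class PromptParts:
    # Field order matches the anatomy above; leave a slot empty to skip it.
    subject: str
    environment: str = ""
    style: str = ""
    lighting: str = ""
    composition: str = ""
    mood: str = ""
    quality: str = ""

    def render(self) -> str:
        """Join the non-empty slots, in order, into one comma-separated prompt."""
        return ", ".join(part.strip() for part in astuple(self) if part.strip())


parts = PromptParts(
    subject="A senior software engineer working at a standing desk with three monitors",
    style="editorial photography style",
    lighting="soft natural window light with subtle rim lighting",
    quality="8K, ultra-detailed, professional color grading",
)
print(parts.render())
```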

Prompt Component Library

Build a reusable library of prompt components:

Style modifiers:

Photorealistic:     "editorial photography, DSLR, 85mm lens"
Illustration:       "digital illustration, clean linework, vibrant colors"
3D Render:          "3D render, octane render, subsurface scattering"
Flat Design:        "flat design, vector art, geometric shapes, bold colors"
Cinematic:          "cinematic, anamorphic lens, film grain, color graded"

Lighting modifiers:

Natural:      "soft natural light, golden hour, window light"
Studio:       "studio lighting, three-point lighting, softbox"
Dramatic:     "dramatic chiaroscuro lighting, strong shadows"
Neon:         "neon lighting, cyberpunk, reflective surfaces"
Ambient:      "ambient occlusion, soft diffused light, overcast"

Composition modifiers:

Portrait:     "close-up, shallow depth of field, bokeh background"
Wide:         "wide angle, establishing shot, environmental"
Overhead:     "top-down, bird's eye view, flat lay"
Isometric:    "isometric view, 30-degree angle, technical illustration"
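A component library like this maps naturally onto lookup tables keyed by preset name. The sketch below copies a few entries from the lists above; the dictionary names, the lowercase keys, and the `from_library` helper are assumptions for illustration.

```python
STYLE = {
    "photorealistic": "editorial photography, DSLR, 85mm lens",
    "cinematic": "cinematic, anamorphic lens, film grain, color graded",
}
LIGHTING = {
    "natural": "soft natural light, golden hour, window light",
    "studio": "studio lighting, three-point lighting, softbox",
}
COMPOSITION = {
    "portrait": "close-up, shallow depth of field, bokeh background",
    "wide": "wide angle, establishing shot, environmental",
}


def from_library(subject: str, style: str, lighting: str, composition: str) -> str:
    """Expand preset names into modifier strings; unknown names raise KeyError."""
    return ", ".join([subject, STYLE[style], LIGHTING[lighting], COMPOSITION[composition]])


print(from_library("a barista pouring latte art", "cinematic", "natural", "portrait"))
```

Failing loudly on an unknown preset name is deliberate: a typo should stop the batch rather than silently produce an off-style image.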

Step 4: Generation — Tool Selection

When to Use Midjourney v7

Best for:

  • Quick concept exploration (30 seconds per generation)
  • Photorealistic images of people and environments
  • Art direction with style references
  • Non-technical users who prefer a chat interface

/imagine A minimalist SaaS dashboard UI with analytics charts and a clean 
sidebar navigation, on a MacBook Pro screen sitting on a wooden desk, 
soft natural lighting from the left, product photography style, 
shallow depth of field --ar 16:9 --sref [your-style-url] --v 7

When to Use Flux 1.1 Pro

Best for:

  • Text rendering in images (Flux is among the strongest models for legible text)
  • Precise prompt following (Flux is more literal than Midjourney)
  • API-based batch generation
  • Integration into automated pipelines

# Requires the `replicate` package and a REPLICATE_API_TOKEN environment variable
import replicate

output = replicate.run(
    "black-forest-labs/flux-1.1-pro",
    input={
        "prompt": "A professional blog header image showing a laptop with code on screen, minimalist desk setup, soft gradient background in deep blue to purple, the text 'AI Engineering' visible on the screen, editorial photography style",
        "aspect_ratio": "16:9",
        "output_format": "webp",
        "output_quality": 90,
        # Verify against the model's input schema; not every Flux variant exposes a step count
        "num_inference_steps": 28,
    }
)

When to Use ComfyUI

Best for:

  • Maximum control over every generation parameter
  • Custom workflows with ControlNet, IP-Adapter, inpainting
  • Batch processing with consistent settings
  • Running locally (free, no API costs)

ComfyUI has a steep learning curve but offers unmatched flexibility. A typical production workflow:

Load Checkpoint → CLIP Text Encode → KSampler → VAE Decode → Upscale → Save

Additional nodes for production:
- ControlNet (for pose/composition control)
- IP-Adapter (for style consistency)  
- Face Detailer (for face quality)
- Inpainting (for selective edits)

Step 5: Selection and Curation

Generate 4-8 variations per concept. Evaluate each against your brief:

## Image Evaluation Checklist
□ Does it match the brief's subject requirements?
□ Is the style consistent with references?
□ Are there any anatomical errors (hands, faces)?
□ Is the composition balanced?
□ Does the color palette match brand guidelines?
□ Are there any artifacts or visual glitches?
□ Is text (if any) readable and correctly spelled?
□ Would this look professional in the final context?

Be ruthless. It’s faster to generate a new batch than to fix a fundamentally flawed image.
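For larger batches, recording the checklist result per image makes the "be ruthless" rule mechanical. A minimal sketch: the check labels are shortened versions of the list above, and `failed_checks` and `keep` are hypothetical helper names.

```python
CHECKS = [
    "subject matches brief",
    "style matches references",
    "no anatomical errors",
    "balanced composition",
    "palette on brand",
    "no artifacts",
    "text readable and spelled correctly",
    "professional in final context",
]


def failed_checks(results: dict[str, bool]) -> list[str]:
    """Return the checks that failed; a check missing from results counts as failed."""
    return [check for check in CHECKS if not results.get(check, False)]


def keep(results: dict[str, bool]) -> bool:
    """Keep an image only if every check passed."""
    return not failed_checks(results)


candidate = {check: True for check in CHECKS}
candidate["no anatomical errors"] = False
print(keep(candidate), failed_checks(candidate))
```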

Step 6: Upscaling

Most AI generators output at 1024x1024 or similar resolutions. Production assets typically need 2x-4x more resolution.

1. AI Generation (1024x1024)

2. First upscale: Topaz Photo AI or Real-ESRGAN (4x → 4096x4096)

3. Optional: Magnific AI for creative detail enhancement

4. Final resize to target dimensions
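How much upscaling a given asset needs is simple arithmetic: the source must cover the target in both dimensions. The helper below is an illustrative sketch (the name `upscale_factor` is an assumption) that returns the smallest whole-number factor, since most upscalers work in 2x or 4x steps.

```python
import math


def upscale_factor(source: tuple[int, int], target: tuple[int, int]) -> int:
    """Smallest whole-number factor that makes the source cover the target in both dimensions."""
    src_w, src_h = source
    tgt_w, tgt_h = target
    return max(1, math.ceil(max(tgt_w / src_w, tgt_h / src_h)))


print(upscale_factor((1024, 1024), (1200, 630)))    # 2
print(upscale_factor((1024, 1024), (3508, 2480)))   # 4
```

A 1024x1024 generation needs 2x for a 1200x630 blog header and 4x for a 3508x2480 print asset.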

Using ComfyUI, you can integrate upscaling into the generation workflow:

KSampler (1024x1024) → Upscale Latent (2x) → KSampler (high-res fix, 0.4 denoise) → VAE Decode → Save (2048x2048)

The high-res fix approach (upscale in latent space, then re-denoise at low strength) adds coherent detail rather than just scaling pixels.

Step 7: Post-Processing

Photoshop / GIMP Fixes

Even the best AI images need minor fixes:

  • Color correction to match brand palette
  • Cropping to exact required dimensions
  • Spot removal for minor artifacts
  • Background cleanup for product images

Batch Processing with ImageMagick

For bulk post-processing:

# Resize all images to 1200x630 (blog headers)
mkdir -p output
for img in generated/*.webp; do
    magick "$img" -resize 1200x630^ -gravity center -extent 1200x630 \
        -quality 85 "output/$(basename "$img")"
done

# Add consistent color grading (warm tone) to the resized copies, in place
for img in output/*.webp; do
    magick "$img" -modulate 100,95,102 -brightness-contrast 2x5 \
        -quality 85 "$img"
done

# Convert to multiple formats
for img in output/*.webp; do
    base=$(basename "$img" .webp)
    magick "$img" -quality 85 "output/${base}.jpg"
    magick "$img" -resize 600x315 "output/${base}-thumb.webp"
done
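The `-resize 1200x630^ -gravity center -extent 1200x630` combination is a fill-then-center-crop: scale until the image covers the target, then trim the overflow evenly. The sketch below reproduces that geometry in plain Python so you can preview what will be cut before running a batch. The function name `cover_geometry` is an assumption, and ImageMagick's own rounding may differ by a pixel.

```python
def cover_geometry(src_w: int, src_h: int, tgt_w: int, tgt_h: int):
    """Return (resized_size, crop_box) for a fill-then-center-crop resize."""
    scale = max(tgt_w / src_w, tgt_h / src_h)      # fill: the tighter dimension decides
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    left, top = (new_w - tgt_w) // 2, (new_h - tgt_h) // 2
    # crop_box is (left, top, right, bottom) in the resized image's coordinates
    return (new_w, new_h), (left, top, left + tgt_w, top + tgt_h)


print(cover_geometry(1024, 1024, 1200, 630))
```

For a square 1024x1024 source this yields a 1200x1200 intermediate with 285 pixels trimmed from both the top and the bottom, which is worth knowing if the subject sits near an edge.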

Step 8: Format and Export

Different platforms require different formats:

Platform      Format     Dimensions   Max Size
Blog header   WebP       1200x630     200KB
Twitter/X     PNG/JPG    1200x675     5MB
Instagram     JPG        1080x1080    30MB
LinkedIn      PNG        1200x627     10MB
Thumbnail     WebP       400x225      50KB
Print (A4)    TIFF/PNG   3508x2480    N/A

Always export at the exact dimensions needed. Don’t rely on the platform to resize — they’ll compress aggressively and degrade quality.
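The export targets can be kept as data and checked automatically before delivery. This is a sketch under assumptions: the `EXPORT_SPECS` keys and the `export_problems` helper are illustrative names, and KB/MB are treated as powers of 1000, which may not match how a given platform counts.

```python
EXPORT_SPECS = {
    # platform: (width, height, max_bytes or None); KB/MB taken as powers of 1000 here
    "blog_header": (1200, 630, 200_000),
    "twitter_x": (1200, 675, 5_000_000),
    "instagram": (1080, 1080, 30_000_000),
    "linkedin": (1200, 627, 10_000_000),
    "thumbnail": (400, 225, 50_000),
    "print_a4": (3508, 2480, None),
}


def export_problems(platform: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Compare a finished file against its platform spec and list any mismatches."""
    spec_w, spec_h, max_bytes = EXPORT_SPECS[platform]
    issues = []
    if (width, height) != (spec_w, spec_h):
        issues.append(f"expected {spec_w}x{spec_h}, got {width}x{height}")
    if max_bytes is not None and size_bytes > max_bytes:
        issues.append(f"{size_bytes} bytes exceeds the {max_bytes}-byte limit")
    return issues


print(export_problems("blog_header", 1200, 630, 184_000))
print(export_problems("thumbnail", 600, 315, 61_000))
```

The width, height, and byte count would come from whatever image library or file-system call you already use; they are passed in as plain numbers here to keep the sketch dependency-free.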

Building a Consistent Image System

For a project that needs 20+ consistent images (like a blog, course, or marketing campaign), systematize:

1. Create a Prompt Template

[SUBJECT]: {varies per image}
[STYLE]: editorial photography, soft natural lighting, shallow depth of field
[PALETTE]: deep navy (#1a1a2e), warm gold (#e8d5b5), clean white (#fafafa)
[MOOD]: professional, approachable, modern
[COMPOSITION]: rule of thirds, negative space on left for text overlay
[QUALITY]: 8K, ultra-detailed, professional color grading

2. Generate a Style Reference Image First

Create one “hero” image that defines the visual language. Use this as the --sref (Midjourney) or IP-Adapter reference (ComfyUI) for all subsequent generations.

3. Batch Generate with Variables

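subjects = [
    "AI chatbot interface on a laptop screen",
    "data visualization dashboard with charts",
    "team meeting with AI assistant on screen",
    "developer coding with AI pair programming",
]

base_prompt = """
{subject}, modern minimalist office environment,
editorial photography, soft window lighting,
shallow depth of field, professional color grading,
clean and modern aesthetic
"""


def generate_image(prompt, style_ref):
    # Placeholder: replace with your own generator call
    # (for example a Replicate request or a ComfyUI workflow trigger).
    print(f"[{style_ref}] {prompt}")


for subject in subjects:
    # Fill in the subject and collapse the template's line breaks into single spaces
    prompt = " ".join(base_prompt.format(subject=subject).split())
    generate_image(prompt, style_ref="hero-image.webp")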

4. Quality Control Checklist

After each batch:

  • Lay out all images side by side
  • Check for visual consistency (same lighting, same color temperature)
  • Identify outliers and regenerate
  • Apply identical post-processing to all images
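Part of the side-by-side check can be automated as a first pass. The sketch below flags images whose overall warmth strays from the batch median, using the ratio of average red to average blue as a crude stand-in for color temperature. Everything here is an assumption for illustration: the function names, the 15% tolerance, and the sample values, which would normally come from averaging each image's pixels with an image library. It does not replace looking at the set.

```python
from statistics import median


def warmth(mean_rgb: tuple[float, float, float]) -> float:
    """Crude color-temperature proxy: ratio of average red to average blue."""
    r, _, b = mean_rgb
    return r / max(b, 1e-6)


def outliers(mean_colors: dict[str, tuple[float, float, float]],
             tolerance: float = 0.15) -> list[str]:
    """Names whose warmth differs from the batch median by more than the relative tolerance."""
    center = median(warmth(c) for c in mean_colors.values())
    return [name for name, c in mean_colors.items()
            if abs(warmth(c) - center) / center > tolerance]


# Made-up per-image average colors, for illustration only
batch = {
    "hero.webp": (142.0, 128.0, 110.0),
    "dashboard.webp": (140.0, 127.0, 112.0),
    "meeting.webp": (139.0, 130.0, 109.0),
    "coding.webp": (105.0, 125.0, 150.0),   # noticeably cooler than the rest
}
print(outliers(batch))
```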

Common Mistakes

  1. No style reference. Generating images without references produces random visual styles. Always have a reference.

  2. Over-prompting. Cramming 200 words into a prompt doesn’t make it better. Focus on the most important visual elements.

  3. Ignoring aspect ratio. Generating square images when you need 16:9 and then cropping wastes the most important parts of the composition.

  4. Skipping upscaling. 1024x1024 looks fine on a phone but terrible as a blog header on a desktop monitor.

  5. No batch consistency check. Looking at images individually instead of as a set. They might each look good but clash visually when displayed together.
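The cost of mistake 3 is easy to quantify. This small sketch (the helper name `crop_loss` is an assumption) computes the fraction of pixels thrown away when a generated image is center-cropped to a different aspect ratio.

```python
def crop_loss(src_w: int, src_h: int, ratio_w: int, ratio_h: int) -> float:
    """Fraction of pixels discarded when center-cropping to ratio_w:ratio_h."""
    target = ratio_w / ratio_h
    if src_w / src_h > target:           # source is wider: trim the sides
        kept = (src_h * target) * src_h
    else:                                # source is taller: trim top and bottom
        kept = src_w * (src_w / target)
    return 1 - kept / (src_w * src_h)


print(f"{crop_loss(1024, 1024, 16, 9):.2%}")
```

Cropping a square image to 16:9 keeps 9/16 of the height, so 7/16 of the pixels (about 44%) are discarded.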

The difference between amateur and professional AI image workflows isn’t the tool — it’s the process. Build the system, follow the steps, and your output will be consistent, on-brand, and production-ready every time.
