Nano Banana 2 Multi-Image Blending: How to Combine Up to 14 Reference Images
Learn how to blend up to 14 reference images in Nano Banana 2 with copy-paste prompts, blending weight controls, and a step-by-step editing workflow.
Nano Banana 2 — Google’s viral AI image generator built on Gemini 3.1 Flash Image — launched on February 26, 2026, and it came with a feature list that made the image-generation crowd do a double take. The headliner: you can now feed it up to 14 reference images at once and blend them into a single coherent output. That’s not a typo. Fourteen.
If you’ve ever tried to describe a complex visual concept in words and watched an AI confidently misunderstand you, you already know why this matters. Multi-image blending flips the workflow — instead of writing a novel-length prompt, you show the model what you mean. Reference a product shot, a mood board, a texture, a face, a logo, and let Nano Banana 2 do the synthesis. This tutorial walks you through exactly how to do that, from setup to final output.
What You’ll Achieve
By the end of this tutorial, you’ll know how to upload multiple reference images, structure your prompts to give the model clear blending instructions, control how much weight each reference gets, and avoid the common mistakes that turn a brilliant mood board into visual soup. The techniques here apply whether you’re working in the Gemini app, AI Studio, the Gemini API, or Vertex AI.
What You Need
Access to Nano Banana 2 via any of its four entry points: the Gemini app (easiest, best for casual use), Google AI Studio (free, great for prompt experimentation), the Gemini API (for developers who want programmatic control), or Vertex AI (enterprise tier, highest throughput and SLA). The multi-image upload feature is available across all four, though the UI looks different in each. AI Studio is the recommended sandbox for this tutorial — it lets you see exactly what you’re sending to the model without extra layers of abstraction.
Note 💡
In the Gemini app, image uploads live behind the paperclip icon in the prompt bar. In AI Studio, use the media panel on the left sidebar. In the API, images go into the
partsarray as base64-encoded strings or Google Cloud Storage URIs. Vertex AI follows the same API schema.
Step 1: Prepare Your Reference Images
Before you touch a prompt, get your references in order. Nano Banana 2 accepts up to 14 images, but more isn’t always better. The model weighs all inputs simultaneously, so feeding it 14 loosely related images tends to produce something that pleases nobody. Start with 3–6 tightly curated references, add more only when you need to introduce a specific detail the smaller set can’t capture.
Format-wise, JPEG and PNG both work fine. Keep individual files under 20MB. Aspect ratios don’t need to match — the model handles mixed orientations — but wildly different resolutions in the same batch can sometimes bias the output toward the higher-res images. When in doubt, resize everything to a consistent resolution before uploading. For 4K output targets, 2160p source images give the model more to work with.
Pro tip ✅
Label your references mentally before you write the prompt. Think of them as Reference A (primary subject), Reference B (lighting/mood), Reference C (texture or material), and so on. Your prompt should explicitly call out what role each image plays — the model responds much better to “use the lighting from the second image” than to “blend everything together.”
Step 2: Structure Your Blending Prompt
The prompt architecture for multi-image blending follows a consistent pattern: subject anchor → reference instructions → style and technical parameters. Deviating from this order doesn’t break anything, but it does tend to produce less predictable results.
Here’s a foundational blending prompt to start with:
Subject: the woman from image 1, full body, standing pose.
Lighting: match the golden hour rim lighting from image 2.
Background: replace with the architectural environment from image 3.
Texture: apply the fabric texture from image 4 to her jacket.
Style: editorial fashion photography, 4K resolution, sharp focus, shot on medium format.
That prompt gives the model five distinct references with five distinct jobs. Notice that it never says “blend images 1 through 4 together” — vague merge instructions produce vague results. Specific role assignments produce specific results.
For product photography, the structure shifts slightly:
Product: the perfume bottle from image 1.
Surface: place it on the marble surface from image 2, exact material and veining.
Lighting setup: three-point studio lighting matching image 3, soft shadows.
Background: solid color sampled from image 4, no texture.
Output: commercial product photography, 4K, white balance neutral, no post-processing artifacts.
Pro tip ✅
The phrase “exact material” is doing real work in that product prompt. Nano Banana 2’s material rendering is strong enough to distinguish between “marble-ish” and “Carrara marble with grey veining.” Be that specific and you’ll be surprised what comes back.
Step 3: Control Blending Weight
When you want one reference to dominate and others to play a supporting role, say so directly in the prompt. Nano Banana 2 responds to relative weight language more reliably than most models.
Primary reference: image 1 — maintain the subject's facial features and expression exactly.
Secondary reference: image 2 — adapt the color palette only, do not import any compositional elements.
Accent reference: image 3 — borrow the background bokeh style, keep it subtle.
Style: portrait photography, natural skin tones, 4K, 85mm equivalent focal length.
Words like “exactly,” “only,” “subtle,” and “do not import” function as soft weight controls. They’re not as precise as a numerical slider, but they’re effective enough for most creative workflows. If you need surgical precision over blending ratios, the Gemini API on Vertex AI lets you experiment with temperature and sampling parameters that influence how strictly the model adheres to reference constraints.
Blend: take the architectural silhouette from image 1 and the night sky from image 2.
Dominant element: the building (image 1) should occupy 70% of the frame.
Atmosphere: import fog and color grading from image 3, apply to entire scene.
Do not: introduce any elements not present in the three references.
Output: long-exposure night photography aesthetic, 4K, cinematic aspect ratio 2.39:1.
Warning ⚠️
Telling the model “do not introduce any elements not present in the references” is useful but not a guarantee. Nano Banana 2 will occasionally hallucinate environmental details — a lamp post that wasn’t there, a reflection that doesn’t match the light source. Always review outputs critically before using them professionally. SynthID watermarks are embedded in every generated image, so AI-origin is traceable even after edits.
Step 4: Subject Consistency Across Multiple Outputs
One of Nano Banana 2’s headline features is subject consistency for up to five characters — meaning you can generate multiple images and have the same person, creature, or object look like itself across all of them. When you’re blending references, you can anchor consistency by always including the same character reference image and explicitly naming it as the consistency anchor.
Character consistency anchor: image 1 — this is the primary character. Maintain exact facial features, hair color, and build across all variations.
Scene: place the character from image 1 in the cafe environment from image 2.
Lighting: overcast daylight from image 3.
Action: seated, reading a book, relaxed posture.
Style: lifestyle photography, candid aesthetic, 4K, 35mm focal length.
For a multi-character scene, add a second anchor:
Character A: image 1 — maintain facial features exactly.
Character B: image 2 — maintain facial features exactly.
Scene: both characters from images 1 and 2 together in the street environment from image 3.
Interaction: they are having a conversation, facing each other, mid-laugh.
Lighting: match the afternoon sun direction from image 4.
Style: street photography, documentary aesthetic, 4K, 50mm focal length, slight film grain.
Step 5: Text Rendering in Blended Outputs
Nano Banana 2’s text rendering is the best in its generation — which means it’s actually useful now, not just occasionally impressive. When blending references that include typography or brand elements, you can specify text directly in the prompt rather than hoping the model picks it up from an image.
Base: the poster layout from image 1, maintain composition and color blocks.
Text overlay: replace all text in image 1 with the following — headline: "OPEN LATE" in bold sans-serif, centered; subline: "Every Friday Until Midnight" in regular weight, same font family.
Color: text color sampled from the lightest element in image 2.
Style: modern event poster, print-ready, 4K resolution, clean edges on all text elements.
Pro tip ✅
For text rendering, shorter strings are more accurate than long ones. If you need a paragraph of body copy, generate the layout without text first, then use a separate generation pass — or an editing workflow — to add the text block. Nano Banana 2’s editing mode handles text insertion on existing images cleanly.
Step 6: Social Media and Editorial Formats
Different platforms want different dimensions, and Nano Banana 2 accepts aspect ratio instructions directly in the prompt. Here are format-specific blending prompts for common use cases:
For Instagram square (1:1):
References: product from image 1, flat-lay styling from image 2, color palette from image 3.
Composition: overhead flat-lay, product centered, styling elements arranged symmetrically.
Aspect ratio: 1:1 square format.
Style: lifestyle product photography, warm tones, clean whites, Instagram aesthetic, 4K.
For LinkedIn editorial (16:9):
References: speaker from image 1 (maintain likeness), conference environment from image 2, brand color scheme from image 3.
Composition: speaker in foreground left, environment in background right, depth of field separation.
Aspect ratio: 16:9 landscape.
Style: professional editorial photography, corporate but not sterile, 4K, 85mm focal length.
For portrait/Story (9:16):
References: model from image 1, fashion garment detail from image 2, urban background from image 3.
Composition: full-body portrait, model centered, background blurred f/1.8 equivalent.
Aspect ratio: 9:16 portrait for Stories/Reels.
Style: fashion editorial, high contrast, 4K, dramatic directional lighting from image 4.
Pro tip ✅
Real-time web grounding in Nano Banana 2 means you can reference current visual trends directly — “trending editorial color palette February 2026” or “current minimalist UI design conventions” — and the model pulls actual context rather than hallucinating what it thinks trends look like. This is genuinely useful for keeping social content current without building a separate trend-research step into your workflow.
Step 7: The Editing Workflow
Multi-image blending isn’t always a one-shot process. Nano Banana 2’s editing mode lets you take a generated image and use it as a new reference for a second round. This is useful when the first blend gets the composition right but misses on color, or nails the lighting but loses a detail from one of your references.
The edit workflow: generate your first blend → if 80% right, feed it back as the primary reference → add the specific reference image that carries the detail you lost → prompt the edit explicitly.
Base image: the generated image [upload previous output].
Problem: the jacket texture from the original reference image 3 was lost in the first generation.
Fix: reapply the leather texture from image 3 to the jacket only. Do not alter anything else — maintain the face, background, lighting, and pose exactly as they appear in the base image.
Output: 4K, match the exact resolution and aspect ratio of the base image.
This iterative approach gives you much tighter control than trying to nail everything in one massive multi-reference prompt.
Pro tip ✅
Every Nano Banana 2 output carries a SynthID watermark embedded at the pixel level — invisible to the eye, detectable by Google’s verification tools. This survives most common edits including cropping, color grading, and moderate compression. If you’re delivering images to clients, disclose the AI origin. If you’re building on top of the API, SynthID is there whether you want it or not.
Where to Go From Here
Multi-image blending in Nano Banana 2 is one of those features that sounds like a gimmick until you’re twenty minutes in and realize you’ve just collapsed three hours of Photoshop compositing into a single prompt iteration. The ceiling is high — 14 reference images is more than most creative workflows will ever need — but the floor is also accessible enough that you don’t need to be a prompt engineer to get useful results out of it on day one.
Start with three references, be specific about what role each one plays, and build up from there. The model rewards clarity far more than creativity in the prompt itself — save the creativity for the references you choose.


