Nano Banana 2 on Vertex AI: The Enterprise Image Generation Setup Guide
Step-by-step tutorial: set up Nano Banana 2 (Gemini 3.1 Flash Image) on Vertex AI for enterprise image generation, with 8 copy-paste prompts.
Nano Banana 2 — Google’s viral AI image generator built on Gemini 3.1 Flash Image — landed on February 26, 2026, and the enterprise crowd immediately started asking the right question: can we run this properly at scale, with access controls, billing visibility, and all the grown-up infrastructure that “just use the Gemini app” doesn’t provide? The answer is yes, and it lives on Vertex AI.
Vertex AI is Google Cloud’s managed AI platform, and it’s where Nano Banana 2 gets serious. You get SynthID watermarking baked in, model versioning, project-level IAM controls, regional data residency options, and API access that your legal team won’t have a panic attack over. This tutorial walks through the full setup — from provisioning a project to running prompts that actually produce the results you want, including subject consistency across up to five characters, 4K output, and precise text rendering.
Whether you’re a solo developer who wants API access without the Gemini app’s consumer guardrails, or an enterprise team building a product image pipeline, this is the setup guide. Bring your Google Cloud account and a decent prompt.
What You’ll Achieve
By the end of this tutorial, you’ll have a working Vertex AI project configured for Nano Banana 2, understand how to call the API with production-grade prompts, know how to handle subject consistency across multiple characters, and have at least eight copy-paste prompts ready to drop into your pipeline. You’ll also know which settings to tweak for 4K output and how the real-time web grounding feature changes what’s possible for editorial and news-adjacent use cases.
Requirements Before You Start
You need a Google Cloud account with billing enabled; Vertex AI won’t activate without it. A project already created is helpful but not required, since the tutorial covers that step. You’ll also want the Google Cloud CLI installed locally if you plan to script API calls, though the Vertex AI Studio interface in the browser works fine for testing. Basic familiarity with REST APIs or Python is useful for the integration section, but you can complete the setup entirely in the UI.
Step 1 — Create or Select a Google Cloud Project
Go to console.cloud.google.com and either create a new project or select an existing one. Give it a name that makes billing attribution obvious: “nanobanana-prod” or “imageGen-Q1-2026” beats “my-project-3” when your finance team asks questions. Note the Project ID; you’ll need it in every API call.
Navigate to the API Library and enable two APIs: the Vertex AI API and the Cloud Storage API (you’ll want Cloud Storage for batching output at scale). Search each by name and click Enable. This takes about thirty seconds per API.
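If you’d rather script this step than click through the console, the same two APIs can be enabled with the gcloud CLI. The sketch below only builds the command and prints it; it does not run anything. The service names (aiplatform.googleapis.com for the Vertex AI API, storage.googleapis.com for the Cloud Storage API) and the project ID are assumptions to check against the API Library before you use them.

```python
import shlex
from typing import List

# Assumed service names for the two APIs named above; verify in the API Library.
REQUIRED_SERVICES = ["aiplatform.googleapis.com", "storage.googleapis.com"]


def enable_apis_command(project_id: str) -> List[str]:
    """Build (but do not run) the gcloud command that enables both APIs."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return ["gcloud", "services", "enable", *REQUIRED_SERVICES,
            "--project", project_id]


# "your-project-id" is a placeholder, not a real project.
print(shlex.join(enable_apis_command("your-project-id")))
```

Printing the command first makes it easy to review in a pull request before anyone runs it against a billing-enabled project.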
Step 2 — Set Up IAM Permissions
Nano Banana 2 on Vertex AI respects Google Cloud’s IAM model, which is the main reason enterprises prefer this over direct Gemini API access. For a service account running your image generation pipeline, assign the Vertex AI User role at minimum. If your pipeline needs to write output to Cloud Storage automatically, add Storage Object Creator to the same service account.
Pro tip ✅
Create a dedicated service account specifically for image generation; don’t reuse your general Vertex AI service account. When you need to audit what generated what, or revoke access cleanly, you’ll thank yourself. Name it something explicit like nanobanana-image-sa@your-project.iam.gserviceaccount.com.
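The role assignments from this step can also be scripted. This is a minimal sketch that builds the gcloud commands without running them. It assumes the role IDs roles/aiplatform.user (Vertex AI User) and roles/storage.objectCreator (Storage Object Creator); the project ID and service account email are placeholders.

```python
from typing import List

# Assumed role IDs for "Vertex AI User" and "Storage Object Creator".
PIPELINE_ROLES = ["roles/aiplatform.user", "roles/storage.objectCreator"]


def iam_binding_commands(project_id: str, sa_email: str,
                         write_to_storage: bool = True) -> List[List[str]]:
    """Build one add-iam-policy-binding command per role the pipeline needs."""
    # Without Cloud Storage output, only the Vertex AI role is required.
    roles = PIPELINE_ROLES if write_to_storage else PIPELINE_ROLES[:1]
    return [
        ["gcloud", "projects", "add-iam-policy-binding", project_id,
         f"--member=serviceAccount:{sa_email}", f"--role={role}"]
        for role in roles
    ]


# Placeholder identifiers; substitute your own before running the output.
for cmd in iam_binding_commands(
        "your-project-id",
        "image-sa@your-project-id.iam.gserviceaccount.com"):
    print(" ".join(cmd))
```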
Download the service account key as a JSON file and store it somewhere your application can reach it — ideally via Secret Manager rather than committed to your repo. Set the environment variable: GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/key.json.
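Before you go further, it’s worth checking that the environment variable points at a real service account key. Here’s a small sketch that does that using only the standard library. It relies on the key file containing a "type" of "service_account" and a "client_email" field; the function names are made up for this example.

```python
import json
import os


def check_credentials_file(path: str) -> str:
    """Return the service account email if `path` looks like a key file."""
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    if key.get("type") != "service_account":
        raise ValueError(f"{path} is not a service account key")
    return key["client_email"]


def credentials_status(env=os.environ) -> str:
    """Summarize whether GOOGLE_APPLICATION_CREDENTIALS is usable."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    try:
        return f"Credentials belong to {check_credentials_file(path)}"
    except (OSError, ValueError, KeyError) as exc:
        # Missing file, malformed JSON, wrong key type, or missing field.
        return f"Credentials problem: {exc}"


print(credentials_status())
```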
Step 3 — Access Nano Banana 2 in Vertex AI Studio
In the Google Cloud Console, go to Vertex AI → Vertex AI Studio → Generate Images. Nano Banana 2 (Gemini 3.1 Flash Image) appears in the model selector. Select it. The Studio interface lets you test prompts before you commit them to code, which is exactly how you should use it — treat Studio as your prompt sandbox, not your production environment.
The Studio UI exposes the key parameters directly: output resolution (up to 4K), number of images per generation (1–4), safety filter level, and the web grounding toggle. Turn web grounding on when your prompts reference anything time-sensitive — current product designs, recent architectural styles, or anything where “latest” actually matters. Leave it off for purely stylistic or abstract prompts where you don’t need real-world accuracy.
Pro tip ✅
The web grounding feature in Nano Banana 2 pulls real-world visual references to anchor your output. It’s genuinely useful for prompts like “a 2026 electric vehicle dashboard UI” where the model would otherwise hallucinate controls that looked current in 2023. Toggle it on, compare outputs, and decide per use case.
Step 4 — Your First API Call
Once you’re happy with how a prompt performs in Studio, move it to code. Here’s the basic Python structure using the Vertex AI SDK:
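import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

# Initialize the SDK with your project and region.
vertexai.init(project="your-project-id", location="us-central1")

# Load the model by ID (confirm the exact ID in the Studio model selector).
model = ImageGenerationModel.from_pretrained("gemini-3-1-flash-image-001")

# Request a single 16:9 PNG.
images = model.generate_images(
    prompt="Your prompt here",
    number_of_images=1,
    aspect_ratio="16:9",
    output_mime_type="image/png",
)

# Save the first generated image to disk.
images[0].save(location="output.png")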
Replace your-project-id with your actual Project ID and adjust the location to whichever region you’ve provisioned. The aspect_ratio parameter accepts “1:1”, “9:16”, “16:9”, “4:3”, and “3:4” — pick based on your output format before you start batch processing, not after.
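To catch a bad aspect ratio before a batch starts rather than halfway through it, you can validate the value up front. This is a minimal sketch using the five ratios listed above; the helper name is hypothetical and not part of the SDK.

```python
# The five values listed in this guide for the aspect_ratio parameter.
SUPPORTED_ASPECT_RATIOS = ("1:1", "9:16", "16:9", "4:3", "3:4")


def validate_aspect_ratio(ratio: str) -> str:
    """Return the cleaned ratio, or raise if it is not one of the listed values."""
    normalized = ratio.strip()
    if normalized not in SUPPORTED_ASPECT_RATIOS:
        allowed = ", ".join(SUPPORTED_ASPECT_RATIOS)
        raise ValueError(f"Unsupported aspect ratio {ratio!r}; use one of: {allowed}")
    return normalized


print(validate_aspect_ratio("16:9"))
```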
Step 5 — Prompts That Actually Work
This is where most Vertex AI tutorials tap out and leave you with “generate an image of a cat.” That’s not useful. Here are eight production-grade prompts across different use cases, built for Nano Banana 2’s specific strengths.
Product Photography — E-commerce
Studio product photograph of a matte black insulated water bottle, 750ml, placed on a white marble surface with soft natural light from the left, minimal shadow, clean white background, sharp focus on label area, 4K resolution, commercial photography style
The explicit mention of “label area” in sharp focus triggers Nano Banana 2’s precise text rendering capability — critical if you need legible product labels in your output. The marble surface and lighting direction give the model enough environmental constraints to produce something consistent across multiple generations.
Portrait — Editorial Style
Editorial portrait of a woman in her early 40s, silver-streaked dark hair pulled back, wearing a structured navy blazer, seated at a minimalist concrete desk, shallow depth of field, natural window light, New York Times Magazine style photography, 4K, film grain texture
“New York Times Magazine style” is a known aesthetic shorthand that Nano Banana 2 recognizes well — it pushes toward high contrast, thoughtful composition, and journalistic restraint rather than over-processed Instagram aesthetics.
Subject Consistency — Two Characters
Two colleagues in a modern open-plan office, woman with short red hair and green blazer reviewing documents, man with dark beard and white shirt standing behind her pointing at laptop screen, warm afternoon light through floor-to-ceiling windows, photorealistic, 4K, candid workplace photography style
Nano Banana 2 supports up to five characters in a single scene while maintaining subject consistency. The key is giving each character a distinct visual identifier — hair color, clothing color, and one physical feature. Vague descriptors like “a woman” and “another woman” produce drift. Specific ones like “woman with short red hair” give the model anchor points.
Pro tip ✅
For multi-character scenes, always assign each person at least three distinct visual attributes: hair style/color, clothing color, and one physical detail like glasses or beard. With fewer anchors, Nano Banana 2 tends to blend characters together in complex compositions. Five attributes per character is overkill — three is the sweet spot.
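One way to enforce the three-anchor rule is to build multi-character prompts from structured data instead of typing them freehand. The sketch below is an illustration only: the Character class and build_scene_prompt are hypothetical helpers, and a couple of the attributes (glasses, rolled-up sleeves) are added here just to reach three anchors per person.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_CHARACTERS = 5   # upper limit stated in this guide
MIN_ATTRIBUTES = 3   # hair, clothing, one physical detail


@dataclass(frozen=True)
class Character:
    noun: str                    # e.g. "woman", "man"
    attributes: Tuple[str, ...]  # distinct visual anchors
    action: str = ""             # what the character is doing

    def describe(self) -> str:
        if len(self.attributes) < MIN_ATTRIBUTES:
            raise ValueError(f"{self.noun!r} needs at least {MIN_ATTRIBUTES} attributes")
        text = f"{self.noun} with " + ", ".join(self.attributes)
        return f"{text} {self.action}".strip()


def build_scene_prompt(setting: str, characters: List[Character], style: str) -> str:
    """Join setting, character descriptions, and style into one prompt."""
    if not 1 <= len(characters) <= MAX_CHARACTERS:
        raise ValueError(f"Use between 1 and {MAX_CHARACTERS} characters")
    people = ", ".join(c.describe() for c in characters)
    return f"{setting}, {people}, {style}"


woman = Character("woman", ("short red hair", "green blazer", "round glasses"),
                  "reviewing documents")
man = Character("man", ("dark beard", "white shirt", "rolled-up sleeves"),
                "pointing at a laptop screen")
print(build_scene_prompt("Two colleagues in a modern open-plan office",
                         [woman, man], "photorealistic, 4K"))
```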
Social Media — Square Format
Overhead flat lay, wooden table surface, ceramic coffee mug with latte art, open notebook with handwritten text reading "Monday Goals", small succulent plant, brass pen, morning light, warm tones, lifestyle photography, square format 1:1, 4K
Notice “handwritten text reading ‘Monday Goals’”: this is Nano Banana 2’s text rendering in action. Specify exactly what text you want, in quotes, and the model renders it. For longer text, keep it under six words per instance for best legibility.
Architectural Visualization
Exterior architectural render of a two-story Scandinavian-style family home, white rendered walls, dark timber accents, large floor-to-ceiling windows, surrounded by mature pine trees, golden hour lighting, fresh snow on roof, photorealistic, wide angle, 4K resolution
Architectural prompts benefit from specifying the time of day explicitly — “golden hour” changes shadow angle and warmth dramatically compared to “midday” or “overcast.” For client presentations, generate the same structure at three different lighting conditions and let stakeholders choose.
Infographic-Style Visual with Text
Clean minimal infographic poster, white background, three bold statistics arranged vertically: "47M users", "3.2B images", "99.1% uptime", each with a simple geometric icon, sans-serif typography, blue and charcoal color palette, professional design, 4K
This is Nano Banana 2’s text rendering pushed harder — multiple text elements in a structured layout. The model handles this well when you give it a clear typographic style (“sans-serif”), specific colors, and a layout description (“arranged vertically”). Avoid asking for more than five text elements in one image; accuracy degrades past that.
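The two text-rendering limits mentioned in this guide (under six words per text instance, no more than five text elements per image) are easy to check automatically. Here’s a rough sketch of a prompt linter; it assumes the text you want rendered is wrapped in straight double quotes, as in the prompts above, and the function names are invented for this example.

```python
import re
from typing import List

MAX_WORDS_PER_TEXT = 5   # "under six words per instance"
MAX_TEXT_ELEMENTS = 5    # accuracy degrades past five elements


def quoted_texts(prompt: str) -> List[str]:
    """Extract every straight-double-quoted string from the prompt."""
    return re.findall(r'"([^"]+)"', prompt)


def lint_prompt_text(prompt: str) -> List[str]:
    """Return a list of warnings; an empty list means the prompt passes."""
    texts = quoted_texts(prompt)
    problems = []
    if len(texts) > MAX_TEXT_ELEMENTS:
        problems.append(f"{len(texts)} text elements (limit {MAX_TEXT_ELEMENTS})")
    for text in texts:
        words = len(text.split())
        if words > MAX_WORDS_PER_TEXT:
            problems.append(f'"{text}" has {words} words (limit {MAX_WORDS_PER_TEXT})')
    return problems


infographic = ('Clean minimal infographic poster, three bold statistics: '
               '"47M users", "3.2B images", "99.1% uptime"')
print(lint_prompt_text(infographic))
```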
Warning ⚠️
SynthID watermarks are embedded in every image Nano Banana 2 generates — including on Vertex AI. They’re invisible to the eye and survive most common image transformations (compression, cropping, color adjustments). This is non-negotiable and not something Vertex AI configuration can disable. Factor this into your workflow if you’re generating images for contexts where watermark detection matters.
Portrait — Product Model Consistency
Young woman with curly auburn hair, light skin, freckles, wearing a cream linen summer dress, walking through a sunlit European cobblestone street, natural candid style, shallow depth of field, 4K, fashion editorial photography — image 1 of product campaign series
Adding “image 1 of product campaign series” in the prompt doesn’t literally generate a series, but it signals to the model that this is a character meant to recur, which produces slightly more stable feature rendering. When you generate subsequent images, repeat the full character description exactly — copy-paste the attribute block verbatim.
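Copy-pasting the attribute block by hand invites small edits that cause drift. A safer pattern is to keep the character description in one constant and vary only the scene. This sketch shows the idea; scenes other than the cobblestone street would be your own additions, and the helper name is hypothetical.

```python
from typing import List

# Character block copied verbatim from the prompt above; never edit per image.
CHARACTER = ("Young woman with curly auburn hair, light skin, freckles, "
             "wearing a cream linen summer dress")
STYLE = "natural candid style, shallow depth of field, 4K, fashion editorial photography"


def campaign_prompts(scenes: List[str]) -> List[str]:
    """Build one prompt per scene, reusing the identical character block."""
    return [
        f"{CHARACTER}, {scene}, {STYLE}, image {i} of product campaign series"
        for i, scene in enumerate(scenes, start=1)
    ]


for prompt in campaign_prompts(["walking through a sunlit European cobblestone street"]):
    print(prompt)
```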
Nano Banana 2 vs. Nano Banana (Original) — Quick Comparison Prompt
Hyperrealistic close-up of a vintage mechanical watch, rose gold case, ivory dial, arabic numerals, sapphire crystal, macro photography, studio lighting, 4K, product photography for luxury catalog
This prompt is a reliable benchmark between Nano Banana 2 and the original. Run it on both. The difference you’ll notice most in the output: Nano Banana 2 handles the text on the watch dial (numerals, brand text) with noticeably sharper accuracy, and the micro-texture on the watch case has more definition. The original isn’t bad; Nano Banana 2 is just better at fine material detail.
Step 6 — Scaling with Batch Prediction
For production pipelines generating dozens or hundreds of images, Vertex AI’s batch prediction endpoint is more cost-efficient than hitting the online prediction endpoint in a loop. Create a JSONL input file where each line is a separate generation request, point Vertex AI at a Cloud Storage bucket for output, and submit a batch job through the Console or SDK. Output lands in your bucket automatically, tagged with request metadata for traceability.
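Producing the JSONL input file is the part you’ll most likely script. The sketch below writes one JSON object per line using only the standard library. The field names (request_id, prompt, aspect_ratio) are assumptions made for illustration, not the confirmed batch schema, so check the batch prediction documentation for the exact request format before submitting a job.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable


def write_batch_file(path: Path, prompts: Iterable[str], **shared) -> int:
    """Write one request per line; `shared` settings are copied into each request."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as fh:
        for i, prompt in enumerate(prompts):
            # Hypothetical request shape; adapt keys to the real batch schema.
            request = {"request_id": f"req-{i:04d}", "prompt": prompt, **shared}
            fh.write(json.dumps(request, ensure_ascii=False) + "\n")
            count += 1
    return count


out_path = Path(tempfile.gettempdir()) / "batch_input.jsonl"
written = write_batch_file(
    out_path,
    ["Studio product photograph of a matte black insulated water bottle",
     "Overhead flat lay, wooden table surface, ceramic coffee mug"],
    aspect_ratio="1:1",
)
print(f"Wrote {written} requests to {out_path}")
```

The request_id in each line is what lets you match an output file back to the prompt that produced it.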
Pro tip ✅
Always include a seed parameter in batch jobs when you need reproducibility across runs. Nano Banana 2 supports deterministic generation via seed values: same prompt, same seed, same output. Essential for A/B testing where you want to isolate prompt changes as the only variable.
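For A/B tests, a handy trick is to derive the seed from the test name so every rerun reuses it without anyone storing a number. This is a sketch under assumptions: the 31-bit seed range and the request keys are illustrative, and the accepted seed range should be confirmed in the API documentation.

```python
import hashlib
from typing import Dict, List


def stable_seed(key: str, bits: int = 31) -> int:
    """Map a string to the same non-negative integer on every run."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (2 ** bits)


def ab_pair(test_name: str, prompt_a: str, prompt_b: str) -> List[Dict]:
    """Two requests sharing one seed, so the prompt is the only variable."""
    seed = stable_seed(test_name)
    return [
        {"variant": "A", "prompt": prompt_a, "seed": seed},
        {"variant": "B", "prompt": prompt_b, "seed": seed},
    ]


for request in ab_pair("bottle-lighting-test",
                       "water bottle, soft natural light from the left",
                       "water bottle, hard studio light from above"):
    print(request)
```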
Note 💡
Vertex AI gives you proper spend controls that the Gemini API and AI Studio don’t. Set a budget alert at 80% of your monthly cap before you run your first batch job. Image generation at 4K across hundreds of requests adds up faster than text generation, and Vertex AI will happily process everything you throw at it right up to your billing limit.
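It helps to know roughly how many images fit under the 80% alert before you queue a batch. Here’s a back-of-the-envelope sketch. The per-image price and monthly cap in the example are made-up numbers, not published pricing; plug in figures from your own billing page.

```python
from decimal import Decimal


def alert_threshold(monthly_cap: str, fraction: str = "0.8") -> Decimal:
    """Spend level at which the budget alert should fire."""
    return Decimal(monthly_cap) * Decimal(fraction)


def images_before_alert(monthly_cap: str, price_per_image: str,
                        spent: str = "0", fraction: str = "0.8") -> int:
    """How many more images fit before spend reaches the alert threshold."""
    remaining = alert_threshold(monthly_cap, fraction) - Decimal(spent)
    if remaining <= 0:
        return 0
    return int(remaining // Decimal(price_per_image))


# Hypothetical figures: 500 monthly cap, 0.25 per 4K image.
print(alert_threshold("500"))
print(images_before_alert("500", "0.25"))
```

Amounts are passed as strings and handled with Decimal so the currency arithmetic stays exact.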
What This Means for Your Stack
Nano Banana 2 on Vertex AI is the version you choose when the Gemini app is too casual and direct API access is too unmanaged. You get the same image quality — the same 4K output, the same multi-character consistency, the same text rendering that makes competing models look like they skipped typesetting class — but wrapped in the access controls, audit logging, and regional compliance options that production workloads actually require. The setup takes maybe two hours end-to-end for someone doing it for the first time. After that, you have an image generation pipeline that your cloud billing dashboard, your security team, and your design team can all live with simultaneously, which is rarer than it should be.


