Nano Banana 2 in AI Studio: Free Tier Tutorial for Developers
Step-by-step tutorial for Nano Banana 2 (Gemini 3.1 Flash Image) in AI Studio’s free tier — prompts, API setup, and tips for developers.
Google’s image generation story has been quietly getting good. Nano Banana 2 — built on Gemini 3.1 Flash Image — landed on February 26, 2026, and it fixes most of the things that made earlier Gemini image generators feel like a demo rather than a tool. Subject consistency across multiple characters, sharper text rendering, 4K output, and real-time web grounding are the headline features. The free tier in AI Studio means you can test all of this without spending a cent before you decide whether to wire it into a real project.
This tutorial walks through the full workflow: getting set up in AI Studio, structuring prompts that actually produce consistent results, using the API for developers who want to go beyond the UI, and a stack of copy-paste prompts covering the formats you’ll actually need — product shots, editorial images, social media content, portraits, and more.
What You’ll Achieve
By the end of this tutorial you’ll know how to access Nano Banana 2 through AI Studio’s free tier, write prompts that get consistent output across multiple subjects, render text inside images without it turning into gibberish, and make a basic API call to Gemini 3.1 Flash Image from your own code. You’ll also know the limits of the free tier so there are no surprises when you hit a rate ceiling mid-project.
Requirements
You need a Google account and access to Google AI Studio at aistudio.google.com. The free tier is available without a paid plan — you just sign in and you’re in. For API access, you’ll generate an API key directly inside AI Studio at no cost. If you’re on Vertex AI or want production-scale throughput, that’s a separate billing conversation, but everything in this tutorial runs on the free tier. No credit card required to follow along.
Step 1 — Access Nano Banana 2 in AI Studio
Head to aistudio.google.com and sign in. In the model selector, choose Gemini 3.1 Flash Image — that’s the underlying model powering Nano Banana 2. If you don’t see it immediately, check the “Image” or “Multimodal” filter in the model list. Once selected, you’ll see a prompt input that accepts both text and image uploads, which matters when you’re doing edit workflows rather than generation from scratch.
The interface gives you a system prompt field, a temperature slider, and output settings. For image generation specifically, leave temperature at default (1.0) unless you have a specific reason to push it higher for more experimental outputs. Higher temperature on image generation tends to produce creative chaos rather than creative variation — fun for exploration, not great for consistency.
Step 2 — Your First Nano Banana 2 Prompt
The fastest way to understand what Nano Banana 2 can do differently is to start with a prompt that would have tripped up earlier Gemini image models: multiple subjects in a consistent visual style, with text in the frame. Here’s a starting point:
A product photograph of three artisan coffee cups on a white marble surface, studio lighting from the left, shallow depth of field, each cup has a different latte art pattern, photorealistic, 4K resolution
This prompt does several things on purpose. It specifies a concrete surface and lighting direction so the model isn’t guessing at context. It asks for variation within consistency — three cups, different patterns — which tests subject consistency without requiring identical objects. “Photorealistic, 4K resolution” signals output quality. You should get a clean product shot that would work on an e-commerce page without much editing.
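If you plan to generate many variations, it can help to keep those components (subject, surface, lighting, details, quality cues) as separate fields and join them at the end. The sketch below is one way to do that; ShotSpec and build_prompt are hypothetical names for illustration, not part of any Google SDK.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """Hypothetical container for the prompt components discussed above."""
    subject: str
    surface: str
    lighting: str
    details: tuple[str, ...] = ()
    quality: tuple[str, ...] = ("photorealistic", "4K resolution")


def build_prompt(spec: ShotSpec) -> str:
    """Join the components into one comma-separated prompt string."""
    parts = [f"{spec.subject} on {spec.surface}", spec.lighting]
    parts.extend(spec.details)
    parts.extend(spec.quality)
    # Drop empty components so stray commas never reach the model.
    return ", ".join(p.strip() for p in parts if p and p.strip())


coffee = ShotSpec(
    subject="A product photograph of three artisan coffee cups",
    surface="a white marble surface",
    lighting="studio lighting from the left",
    details=("shallow depth of field", "each cup has a different latte art pattern"),
)
prompt = build_prompt(coffee)
print(prompt)
```

Swapping only the surface or lighting field then gives you controlled variations of the same shot instead of hand-edited strings.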
Now try adding text to the scene:
A product label design for a craft coffee brand called "PEAK ROAST", minimal sans-serif typography, the label shows elevation lines like a topographic map, warm brown and cream color palette, white background, print-ready quality
Text rendering is where Nano Banana 2 earns its upgrade status. Earlier Gemini image models frequently mangled custom text — letters would swap, spacing would collapse. With Gemini 3.1 Flash Image, short brand names and labels render correctly at a much higher rate. “PEAK ROAST” in all caps is a deliberate test: short, bold, should come back clean.
Pro tip ✅
For text rendering, use all-caps for brand names and short labels. Nano Banana 2 handles uppercase strings more reliably than mixed case, especially for 3–8 character strings. If you need mixed case, add “precise typography, clean letterforms” to your prompt.
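A small helper can apply that tip consistently across prompts. This is a sketch under the assumptions stated in the tip itself; text_cue is a hypothetical function name, and the exact wording of the fragment is a choice, not something the model requires.

```python
def text_cue(label: str, force_caps: bool = True) -> str:
    """Return a prompt fragment asking for `label` to be rendered in the image."""
    label = label.strip()
    if not label:
        raise ValueError("label must not be empty")
    if force_caps:
        # Uppercase strings are the more reliable case according to the tip.
        return f'the text reads "{label.upper()}"'
    # Mixed case: append the typography cues suggested in the tip.
    return f'the text reads "{label}", precise typography, clean letterforms'


print(text_cue("Peak Roast"))
print(text_cue("Peak Roast", force_caps=False))
```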
Step 3 — Portrait and Editorial Prompts
Portrait prompts benefit from Nano Banana 2’s subject consistency feature, which can maintain coherent character appearance across up to five subjects in a single scene. Here’s a prompt that tests that:
Editorial portrait of two entrepreneurs in a minimalist co-working space, natural light from floor-to-ceiling windows, one person standing reviewing documents, one seated at a laptop, business casual attire, medium format photography aesthetic, muted tones
The key structure here is “person A doing X, person B doing Y” — giving each subject a distinct action reduces the model’s tendency to blend or duplicate features. “Medium format photography aesthetic” pulls the output away from generic stock photo territory.
Close-up portrait of a woman in her 40s, architect, standing in front of a half-built wooden structure, golden hour lighting, thoughtful expression, Canon R5 aesthetic, 85mm lens, slight bokeh in background, editorial magazine style
Camera reference cues like “Canon R5 aesthetic” and “85mm lens” consistently improve portrait output quality in Gemini 3.1 Flash Image. They’re not technically processed as hardware specs — the model uses them as style anchors. “Golden hour lighting” combined with “slight bokeh” gives you that editorial-warm look that social media content teams want without needing a photographer on site.
Pro tip ✅
For editorial portraits, always specify the subject’s action or mental state (“thoughtful expression”, “mid-conversation”, “reviewing documents”) rather than just their appearance. Nano Banana 2 uses behavioral context to position subjects naturally in the frame — static “standing and smiling” prompts produce stock-photo stiffness.
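The "person A doing X, person B doing Y" structure is easy to enforce in code if you build multi-subject prompts programmatically. The following is a minimal sketch: describe_subjects and MAX_SUBJECTS are hypothetical names, and the limit of five simply mirrors the figure quoted earlier in this step rather than any API constant.

```python
MAX_SUBJECTS = 5  # figure quoted in this tutorial, not an API constant


def describe_subjects(subjects: list[tuple[str, str]]) -> str:
    """Turn (who, action) pairs into a clause with one distinct action each."""
    if not 1 <= len(subjects) <= MAX_SUBJECTS:
        raise ValueError(f"expected 1 to {MAX_SUBJECTS} subjects")
    actions = [action for _, action in subjects]
    if len(set(actions)) != len(actions):
        # Identical actions are what invites blended or duplicated subjects.
        raise ValueError("give each subject a distinct action")
    return ", ".join(f"{who} {action}" for who, action in subjects)


clause = describe_subjects([
    ("one person", "standing reviewing documents"),
    ("one", "seated at a laptop"),
])
print(clause)
```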
Step 4 — Social Media and Product Formats
Square and vertical formats matter for social, and you can specify aspect ratio intent directly in the prompt even though AI Studio’s default output is square:
Vertical social media post format, a smoothie bowl from directly above (flat lay), vibrant acai base topped with fresh strawberries, banana slices, granola, and edible flowers, white ceramic bowl on a light wood surface, natural morning light, food photography, Instagram-ready
“Vertical social media post format” pushes composition toward the tall 9:16 framing that Instagram Reels and Stories demand, while “flat lay” fixes the overhead camera angle. The model interprets format cues as compositional instructions, not as a guaranteed output size.
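If the output still comes back square, you can crop to 9:16 afterwards. The arithmetic is sketched below; center_crop_box is a hypothetical helper, and the 1024 by 1024 input size is only an example. The returned tuple follows the (left, upper, right, lower) order that Pillow's Image.crop expects.

```python
def center_crop_box(width: int, height: int, ratio_w: int, ratio_h: int) -> tuple[int, int, int, int]:
    """Largest centered (left, top, right, bottom) box with ratio_w:ratio_h."""
    if min(width, height, ratio_w, ratio_h) <= 0:
        raise ValueError("all arguments must be positive")
    # Compare aspect ratios by cross-multiplication to stay in integers.
    if width * ratio_h > height * ratio_w:
        # Source is too wide: keep the full height and trim the sides.
        new_w, new_h = height * ratio_w // ratio_h, height
    else:
        # Source is too tall or already matches: keep the full width.
        new_w, new_h = width, width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


box = center_crop_box(1024, 1024, 9, 16)
print(box)
```

Because the crop discards the left and right thirds of a square frame, it works best when the prompt has already pushed the subject toward the center.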
Product hero shot, a matte black water bottle standing upright against a gradient background transitioning from deep navy to electric blue, single dramatic light source from the right casting a sharp shadow left, minimal composition, luxury goods aesthetic
Product hero shots respond well to lighting specificity. “Single dramatic light source from the right casting a sharp shadow left” is more useful than “dramatic lighting” because it tells the model exactly what spatial relationship you want between light, subject, and shadow. The “luxury goods aesthetic” cue shifts material rendering toward higher specular detail.
Pro tip ✅
When generating product shots for e-commerce, include the shadow direction explicitly. Nano Banana 2 produces commercially usable results when you specify “hard shadow”, “soft diffused shadow”, or “no shadow, seamless white background” — leaving it unspecified usually gives you a generic studio look that clashes with existing product imagery on a site.
Step 5 — Real-Time Web Grounding in Prompts
One of the genuinely useful additions in Nano Banana 2 is real-time web grounding: the model can reference current visual contexts to inform image generation. In practical terms, you can reference current design trends, recent architectural styles, or contemporary visual references and get outputs that feel current rather than frozen at the training cutoff.
A café interior designed in the aesthetic that dominated upscale coffee shops in early 2026, warm terrazzo floors, arched plaster walls, pendant lighting with visible Edison filaments, a hand-lettered menu board, plants trailing from wooden shelving, photo-realistic interior design render
The phrase “aesthetic that dominated upscale coffee shops in early 2026” is a web grounding cue. The model draws on current indexed visual data to interpret that reference rather than falling back purely on training data. You won’t always get perfect results — it’s not magic — but for trend-adjacent content, it reduces how dated your outputs feel.
Note 💡
Web grounding works best for broad cultural and design references rather than highly specific current events. It’s genuinely useful for “current design trends” or “contemporary fashion aesthetic 2026” — it’s less useful for “reproduce the exact look of [specific recent movie]” where copyright and specificity collide.
Step 6 — API Access in AI Studio
For developers, the real value is in the API. In AI Studio, click “Get API key” in the left sidebar. You get a free-tier key immediately. The free tier has rate limits — the specific numbers can shift, so check the current quota page in AI Studio — but for prototyping and light production use, it’s enough to build with.
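Before wiring the key into code, consider reading it from an environment variable so it never lands in version control. A minimal sketch follows; load_api_key is a hypothetical helper, and the variable name GEMINI_API_KEY is an assumption you can change to match your own setup.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running this script")
    return key
```

You would then pass load_api_key() wherever the examples in this step show "YOUR_API_KEY".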
Here’s the minimal Python call to Gemini 3.1 Flash Image via the Gemini API:
import io

import google.generativeai as genai
import PIL.Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-3.1-flash-image")

# Adjacent string literals are concatenated, so the prompt can span lines.
response = model.generate_content(
    "A minimalist poster design for a jazz festival, bold geometric shapes, "
    "deep indigo and gold color palette, 1960s inspiration, "
    "the text reads BLUE NOTE FEST 2026, clean sans-serif typography"
)

# Text parts may expose an empty inline_data, so check for actual bytes.
for part in response.candidates[0].content.parts:
    inline = getattr(part, "inline_data", None)
    if inline is not None and inline.data:
        image = PIL.Image.open(io.BytesIO(inline.data))
        image.save("output.png")
This call generates a single image and saves it locally. The model name string “gemini-3.1-flash-image” is what targets Nano Banana 2 specifically. For Vertex AI access, the setup uses the vertexai SDK instead of google.generativeai, and you’ll need a GCP project with the Vertex AI API enabled — same prompts, different authentication path.
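If a response contains more than one image part, writing them all to the same filename keeps only the last one. The sketch below saves each image part to its own file. It assumes the response has the same shape as in the call above (candidates, content, parts, inline_data with data and mime_type); save_image_parts and the EXTENSIONS table are hypothetical names for illustration.

```python
from pathlib import Path

# Assumed mapping from MIME type to file extension; extend as needed.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image_parts(response, out_dir: str = ".", stem: str = "output") -> list[Path]:
    """Write every image part of the first candidate to disk and return the paths."""
    saved: list[Path] = []
    for part in response.candidates[0].content.parts:
        inline = getattr(part, "inline_data", None)
        data = getattr(inline, "data", None)
        if not data:
            continue  # text part or empty placeholder
        mime = getattr(inline, "mime_type", "") or "image/png"
        ext = EXTENSIONS.get(mime, ".bin")
        path = Path(out_dir) / f"{stem}_{len(saved)}{ext}"
        path.write_bytes(data)
        saved.append(path)
    return saved
```

Writing the raw bytes avoids a decode and re-encode round trip; open the files with PIL afterwards only if you need to process them.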
Pro tip ✅
When using the API for batch generation, add a short delay between calls even within free tier limits. If your script doesn’t catch the error, hitting a rate limit mid-batch means restarting the whole run. A 2–3 second sleep between calls costs you nothing in generation quality and saves significant frustration when you’re generating 50 product images at midnight.
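A batch loop with a fixed pause and a simple retry might look like the sketch below. run_batch is a hypothetical helper, and generate stands in for whatever function wraps your API call. Catching a bare Exception is a simplification: in real code you would catch only the rate-limit error your SDK raises.

```python
import time
from typing import Any, Callable, Iterable


def run_batch(
    prompts: Iterable[str],
    generate: Callable[[str], Any],
    delay_s: float = 2.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> list[tuple[str, Any]]:
    """Call generate(prompt) for each prompt, pausing between calls.

    Failed calls are retried with a doubling wait. A prompt that still fails
    is recorded with its exception so the rest of the batch can continue.
    """
    results: list[tuple[str, Any]] = []
    for prompt in prompts:
        for attempt in range(max_retries + 1):
            try:
                results.append((prompt, generate(prompt)))
                break
            except Exception as exc:  # simplification, see the note above
                if attempt == max_retries:
                    results.append((prompt, exc))
                else:
                    sleep(delay_s * 2 ** (attempt + 1))
        sleep(delay_s)  # fixed pause between prompts
    return results
```

Because failures are stored instead of raised, a rate-limit hit costs you a few retries on one prompt rather than the whole run.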
Step 7 — The Editing Workflow
Nano Banana 2 supports image editing via the same API endpoint — you pass an existing image alongside a text instruction. This is where the subject consistency feature becomes practically useful. You can generate a base image, then make targeted edits without the character or product drifting into something different.
INPUT IMAGE: [product shot of blue sneaker]
EDIT PROMPT: Change the colorway to all-white with a gum sole, keep identical lighting, shadow, and composition, photorealistic
The edit prompt structure is straightforward: describe what changes, then explicitly tell it what to preserve. “Keep identical lighting, shadow, and composition” is the consistency anchor. Without that instruction, edits have a tendency to reinterpret the whole scene rather than making the surgical change you asked for.
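In code, the "what changes, then what to preserve" structure can be captured in a small helper. This is a sketch, not the official editing API: build_edit_prompt and edit_image are hypothetical names, and edit_image assumes a model object whose generate_content method accepts a list containing an image and a text instruction, as the model created in Step 6 is expected to.

```python
def build_edit_prompt(change: str, preserve: list[str]) -> str:
    """Describe the change first, then name what must stay identical."""
    if not preserve:
        return change
    if len(preserve) == 1:
        kept = preserve[0]
    elif len(preserve) == 2:
        kept = f"{preserve[0]} and {preserve[1]}"
    else:
        kept = ", ".join(preserve[:-1]) + f", and {preserve[-1]}"
    return f"{change}, keep identical {kept}"


def edit_image(model, image, change: str, preserve: list[str]):
    """Send the source image together with the edit instruction in one request."""
    return model.generate_content([image, build_edit_prompt(change, preserve)])


edit_prompt = build_edit_prompt(
    "Change the colorway to all-white with a gum sole",
    ["lighting", "shadow", "composition"],
)
print(edit_prompt)
```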
Warning ⚠️
All images generated by Nano Banana 2 — whether through the Gemini app, AI Studio, or the API — carry a SynthID watermark embedded at the pixel level by Google DeepMind. SynthID is invisible to the eye and survives most common image edits including compression, cropping, and color adjustments. It doesn’t affect commercial use rights under Google’s terms, but it does mean every image is traceable to its AI origin. Worth knowing before you send anything to a client who asks “was this AI-generated?”
Free Tier vs. Nano Banana Pro — What Actually Differs
The free tier in AI Studio gives you access to the same Gemini 3.1 Flash Image model that Nano Banana Pro users hit — the difference is throughput, not capability. Free tier caps requests per minute and per day. Nano Banana Pro (accessed through a paid Gemini subscription or Vertex AI production billing) removes those caps and adds priority queue access, which matters when the servers are busy and your free-tier calls start timing out. For tutorials, side projects, and client prototypes, free tier is more than adequate. For production pipelines generating hundreds of images daily, you’ll want to budget for the paid tier before you build something that depends on it.
What This Means for You
Nano Banana 2 in AI Studio is the most accessible entry point Google has offered for serious image generation. The free tier is genuinely usable — not a crippled demo — and the API integration is clean enough that a developer can go from zero to a working image generation endpoint in under an hour. The subject consistency improvements and text rendering upgrades make it a practical tool for product work and branded content, not just art experimentation. If you’ve been waiting for Google’s image generation to become something you’d actually ship, this is the version where that argument gets easier to make.


