The Nano Banana Prompt Formula That Actually Works Every Time
Master the five-layer Nano Banana prompt formula — Subject, Action, Environment, Lighting, Spec — with 8 copy-paste prompts for portraits, products, and more.
Nano Banana has a dirty secret: it’s not the model that’s holding you back. It’s your prompts. Google’s AI image generator is genuinely capable — subject consistency across multiple characters, sharp text rendering, crisp high-resolution output — but none of that matters if you’re feeding it vague instructions and crossing your fingers. The model can only work with what you give it.
The good news is there’s a repeatable formula. Not a magic word list or a Reddit paste-and-pray thread — an actual structural approach to writing prompts that reliably produces what you’re picturing. This guide breaks it down layer by layer, with copy-paste prompts you can use right now across portrait, product, editorial, and social formats.
Why Prompt Structure Matters More Than Prompt Length
Nano Banana, Google’s Gemini-based image model, responds to clarity of intent, not word count. Dumping 200 words of adjectives into the prompt box doesn’t help; it leaves the model guessing about what actually matters. The trick is organizing your description so the model knows what the subject is, what it’s doing, where it is, what the mood and lighting are, and what the technical spec is. Those are five distinct layers, and keeping them distinct is the whole game.
The formula looks like this: [Subject] + [Action/State] + [Environment] + [Mood/Lighting] + [Technical spec]. That order isn’t arbitrary. In practice the model tends to give more weight to what appears early in the prompt, so your subject should always come first. Everything after it is context that shapes how the subject is rendered.
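If you generate prompts from a script or keep them in a spreadsheet, the same ordering can be enforced in code. The sketch below is a minimal illustration, not part of any Nano Banana tooling: `PromptLayers` is a hypothetical helper that joins the five layers with commas in a fixed order, skips any layer you leave blank, and refuses to render without a subject.

```python
from dataclasses import dataclass


@dataclass
class PromptLayers:
    """Hypothetical container for the five layers, rendered in formula order."""
    subject: str
    action: str = ""
    environment: str = ""
    lighting: str = ""
    spec: str = ""

    def render(self) -> str:
        # The subject is the one layer the formula never skips.
        if not self.subject.strip():
            raise ValueError("subject layer is required")
        # Fixed order: subject first, technical spec last.
        layers = [self.subject, self.action, self.environment, self.lighting, self.spec]
        # Drop empty layers so the output has no stray commas.
        return ", ".join(part.strip() for part in layers if part.strip())


prompt = PromptLayers(
    subject="a woman in her 40s with silver-streaked hair and paint-stained hands",
    action="reaching up to adjust a neon sign",
    environment="in a Tokyo convenience store at 2am, rain-streaked windows behind the register",
    lighting="neon backlight with lens flare, cinematic",
    spec="shot on 35mm film",
)
print(prompt.render())
```

Keeping each layer in its own field also makes it easy to swap one layer at a time, which is exactly how the variant prompts in this guide are built.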
The Five-Layer Formula, Layer by Layer
Layer 1 — Subject: Be specific. Not “a woman” but “a woman in her 40s with silver-streaked hair and paint-stained hands.” Not “a coffee cup” but “a ceramic espresso cup with a hairline crack along the handle.” The more precise your subject description, the less the model has to guess — and guessing is where consistency breaks down.
Layer 2 — Action or State: What is the subject doing, or what condition is it in? “Standing” versus “mid-stride” versus “frozen, looking over her shoulder” produces three very different images. Static descriptions (“sitting”) work fine for product shots. Dynamic action (“reaching up to adjust a neon sign”) works better for editorial and narrative images.
Layer 3 — Environment: Where is this happening? Include architectural details, weather, time of day, and surface textures. “In a busy Tokyo convenience store” is fine. “In a Tokyo convenience store at 2am, fluorescent lights flickering, rain-streaked windows behind the register” is better. Environment is where mood lives — don’t skip it.
Layer 4 — Mood and Lighting: This layer is where a lot of prompts drop the ball. Lighting is not decoration — it’s structure. Call out your light source explicitly: “single overhead tungsten bulb,” “soft north-facing window light,” “harsh midday sun from camera left,” “neon backlight with lens flare.” Add a mood word if it helps: cinematic, melancholic, tense, playful.
Layer 5 — Technical Spec: Close the prompt with format and quality cues. Aspect ratio, photographic style, resolution intent. “Shot on 35mm film” gives grain and warmth. “Studio photography, white backdrop, product shot” snaps the model into commercial mode. “Photorealistic, ultra-detailed, 4K” pushes for sharpness. Pick what fits your use case and make it the last thing in the prompt.
Ready-to-Copy Prompts — Paste These Now
Here are eight prompts built on the five-layer formula, covering the most common use cases. Each one is ready to drop into the Nano Banana prompt box.
Portrait — editorial magazine style:
Close-up portrait of a woman in her 50s with deep laugh lines and silver locs, sitting very still, hands folded in her lap, sparse white studio with a single grey seamless backdrop, soft diffused window light from camera left, dignified and quietly powerful mood, shot on medium format camera, photorealistic, sharp focus on eyes
This prompt works because the subject description is detailed enough that the model has no ambiguity to fill in with generic “pretty woman” defaults. The lighting instruction is explicit about direction, and the emotional tone (“dignified and quietly powerful”) guides expression and composition. The medium format callout pushes the model toward shallow depth of field and high-detail rendering.
Portrait — variant with environmental context:
Close-up portrait of a woman in her 50s with deep laugh lines and silver locs, gazing slightly off-camera, seated at a worn kitchen table with morning coffee, warm early light streaming through net curtains, intimate and lived-in atmosphere, photorealistic, documentary photography style, 4K resolution
Same subject, completely different feel. Swapping the white studio for a morning kitchen and the seamless backdrop for net curtains moves this from editorial to documentary. Notice that the subject description stays consistent — that’s intentional. Holding the subject constant while changing environment is the cleanest way to explore different visual directions without starting from scratch.
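Holding the subject constant is easy to do mechanically: write the subject once and vary only the other layers. In this minimal Python sketch (the names `SUBJECT`, `SCENES`, and `build_variant` are illustrative, not from any tool), joining the pieces with commas reproduces the two portrait prompts above word for word, which is a quick way to confirm the subject text never drifted between them.

```python
# Written once, never edited between variants.
SUBJECT = "Close-up portrait of a woman in her 50s with deep laugh lines and silver locs"

# Only action, environment, lighting/mood, and spec change per variant.
SCENES = {
    "editorial": [
        "sitting very still, hands folded in her lap",
        "sparse white studio with a single grey seamless backdrop",
        "soft diffused window light from camera left, dignified and quietly powerful mood",
        "shot on medium format camera, photorealistic, sharp focus on eyes",
    ],
    "documentary": [
        "gazing slightly off-camera",
        "seated at a worn kitchen table with morning coffee",
        "warm early light streaming through net curtains, intimate and lived-in atmosphere",
        "photorealistic, documentary photography style, 4K resolution",
    ],
}


def build_variant(subject: str, layers: list[str]) -> str:
    """Join the fixed subject with one set of scene layers."""
    return ", ".join([subject, *layers])


prompts = {name: build_variant(SUBJECT, layers) for name, layers in SCENES.items()}
for name, text in prompts.items():
    print(f"{name}: {text}")
```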
Product shot — minimal:
Ceramic matte-black espresso cup with a hairline crack along the handle, sitting on a rough concrete surface, single beam of natural light from above casting a dramatic shadow to the right, no background clutter, product photography, ultra-clean composition, 4K
Product prompts should strip the environment down and let light do the work. The crack detail is deliberate: it gives the image a story and keeps it from looking like a stock catalog photo. “No background clutter” keeps the model from decorating the scene, but it is still a negation; if stray props creep in anyway, swap it for a positive phrase such as “clean, empty background,” in line with the Avoid tip further down.
Product shot — lifestyle variant:
Ceramic matte-black espresso cup held in two hands by a person in a chunky oat-coloured knit sweater, standing near a rain-streaked window, autumn afternoon light, shallow depth of field blurring the window behind, warm and slightly melancholic mood, lifestyle product photography, photorealistic
The cup is still the subject, but now it’s embedded in a moment. Lifestyle product shots often perform better than clean product shots on social media because they give the viewer something to inhabit. The sweater detail adds texture contrast and seasonal context without needing to describe the full scene.
Social media — bold typographic concept:
The word BOLD in large serif type printed directly on a crumpled piece of kraft paper, dramatic raking light from the left creating deep shadows in the paper wrinkles, high contrast black and white, close-up macro shot, graphic and editorial mood, photorealistic, 4K
Nano Banana handles text rendering well when you set it up correctly. Keep the text short, specify the surface it’s printed or written on, and use a lighting setup that gives the text dimensional context. “Crumpled kraft paper with raking light” creates shadows that make the typography feel physical rather than pasted-on.
Architecture and space — cinematic wide:
Interior of an abandoned brutalist library, rows of empty concrete shelves stretching to a cracked glass ceiling, weak golden afternoon light filtering through dust particles, one shaft of light hitting the centre floor, cinematic wide angle, slightly desaturated colour grade, photorealistic, ultra-detailed
Architecture prompts benefit from naming the architectural style explicitly: “brutalist,” “Bauhaus,” and “Victorian gothic” each give the model a rich visual shorthand to draw from. The dust particles and single shaft of light are the storytelling details here. They take a static architectural shot and give it drama.
Multi-character scene — keeping consistency:
Two characters sitting across from each other at a diner booth: on the left, a tall lanky man in his 30s with red curly hair and round wire-framed glasses wearing a green jacket; on the right, a shorter woman in her late 20s with a shaved head and a gold nose ring wearing a denim jacket. Both looking down at a map spread across the table between them. Late-night American diner, fluorescent overhead light, photorealistic, cinematic framing, eye-level shot
Multi-character prompts live or die on specificity. Describe each character separately with distinct physical markers — height, hair, clothing colour. Giving them a shared action (looking at the map) ties them into a single scene instead of two unrelated people who happen to be in frame. For subject consistency across multiple images, keep this description block identical and change only environment or action.
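The same idea scales to a cast. The sketch below uses hypothetical helpers (`Character`, `cast_block`, `scene_prompt`) and assumes you keep the left/right layout from the diner prompt: each character has a fixed position and description, the cast block is generated the same way every time, and only the arrangement, shared action, setting, and spec change. The diner call reproduces the prompt above exactly; the train-platform scene is an invented second example for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """One character: where they are in frame plus their distinct physical markers."""
    position: str
    description: str


# Written once and reused unchanged in every scene.
CAST = (
    Character("on the left", "a tall lanky man in his 30s with red curly hair and round "
              "wire-framed glasses wearing a green jacket"),
    Character("on the right", "a shorter woman in her late 20s with a shaved head and a "
              "gold nose ring wearing a denim jacket"),
)


def cast_block(cast) -> str:
    """Describe each character separately, always in the same format."""
    return "; ".join(f"{c.position}, {c.description}" for c in cast)


def scene_prompt(cast, arrangement: str, shared_action: str, setting: str, spec: str) -> str:
    """Combine the unchanged cast block with the layers that vary per scene."""
    return f"{arrangement}: {cast_block(cast)}. {shared_action}. {setting}, {spec}"


diner = scene_prompt(
    CAST,
    "Two characters sitting across from each other at a diner booth",
    "Both looking down at a map spread across the table between them",
    "Late-night American diner, fluorescent overhead light",
    "photorealistic, cinematic framing, eye-level shot",
)
platform = scene_prompt(
    CAST,
    "Two characters standing side by side on a train platform",
    "Both looking up at a departures board",
    "Empty station at dawn, cold blue light",
    "photorealistic, cinematic framing, eye-level shot",
)
print(diner)
print(platform)
```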
Illustrative / stylized — not photorealistic:
A small orange fox sitting on top of a stack of old hardcover books, surrounded by soft floating glowing particles, dark forest background with moonlight, painterly illustration style, rich jewel-tone colours, detailed fur texture, storybook atmosphere, digital art
Switch to illustration mode by replacing “photorealistic” with “painterly illustration style” or “digital art.” The floating glowing particles are a useful trick for adding a magical atmosphere without overcomplicating the environment description, and a palette descriptor like “jewel-tone” steers colour direction without listing specific hex codes.
The Variables That Change Everything
Once you have a base prompt that works, the highest-leverage changes are lighting, aspect ratio, and the photographic style callout. Swapping “soft window light from the left” for “harsh direct flash” on the same subject gives you two completely different images. Changing the aspect ratio from square to 16:9 changes the composition logic. These are the three knobs to reach for first when iterating.
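When iterating, it helps to change one knob at a time so you can tell which change produced which difference. A minimal sketch, assuming you write the aspect ratio into the prompt text (if you work through an API, check whether your access point exposes aspect ratio as a separate setting instead); the knob values and helper names here are illustrative.

```python
# Fixed subject and action; these never change during the sweep.
BASE = "Close-up portrait of a woman in her 50s with deep laugh lines and silver locs, sitting very still"

# Starting values for the three knobs (illustrative choices).
BASELINE = {
    "lighting": "soft window light from the left",
    "style": "shot on medium format camera",
    "aspect": "square 1:1 aspect ratio",
}

# Alternative values to try, one knob at a time.
ALTERNATIVES = {
    "lighting": ["harsh direct flash"],
    "style": ["shot on 35mm film"],
    "aspect": ["16:9 widescreen aspect ratio"],
}


def render(settings: dict[str, str]) -> str:
    """Assemble the prompt with the knob values after the subject."""
    return f"{BASE}, {settings['lighting']}, {settings['style']}, {settings['aspect']}"


def one_knob_sweep(baseline, alternatives):
    """Return (label, prompt) pairs: the baseline, then each single-knob change."""
    runs = [("baseline", render(baseline))]
    for knob, values in alternatives.items():
        for value in values:
            changed = {**baseline, knob: value}  # copy the baseline, swap exactly one knob
            runs.append((f"{knob}: {value}", render(changed)))
    return runs


for label, text in one_knob_sweep(BASELINE, ALTERNATIVES):
    print(f"[{label}] {text}")
```

With one alternative per knob this yields four prompts: the baseline plus three single-knob changes. A full grid of every combination (for example with `itertools.product`) is the other option, but it multiplies the number of generations quickly and makes it harder to attribute a difference to a single change.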
Pro tip ✅
Always write your subject description first, before any environment or mood details. Nano Banana tends to weight the start of the prompt most heavily; if your subject is buried in the middle of a long sentence, expect generic results. Subject first, context after, always.
Pro tip ✅
For text rendering, keep it to five words or fewer per image. Nano Banana handles short text well, but the longer the phrase, the more likely you’ll see garbled letters. If you need a full sentence rendered accurately, split it across separate images and composite them afterwards.
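Splitting a longer line into image-sized pieces can be scripted. This is a naive sketch: the hypothetical `chunk_for_rendering` splits purely on word count using the five-word guideline above, so you may want to move the breaks to natural pauses by hand. The prompt template borrows the kraft-paper setup from the typographic example and is an assumption, not a required format.

```python
def chunk_for_rendering(text: str, max_words: int = 5) -> list[str]:
    """Split text into consecutive pieces of at most `max_words` words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


# Hypothetical template based on the typographic prompt; {text} is filled per chunk.
TEMPLATE = (
    "The words {text} in large serif type printed directly on a crumpled piece of "
    "kraft paper, dramatic raking light from the left, high contrast black and white, "
    "close-up macro shot, photorealistic"
)

line = "START WITH STRUCTURE THEN BREAK IT WHEN YOU HAVE A GOOD REASON"
for chunk in chunk_for_rendering(line):
    print(TEMPLATE.format(text=chunk))
```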
Warning ⚠️
Stacking too many style modifiers at the end — “photorealistic, cinematic, film grain, vintage, 4K, ultra-detailed, award-winning” — actually degrades output quality. Pick two or three that are truly relevant and cut the rest. More modifiers is not more control.
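A quick lint can catch this before you hit generate. The sketch below is a hypothetical helper, assuming you pass in only the final technical-spec layer (other layers contain commas too) and that each modifier is separated by a comma; the default limit of three follows the “two or three” guideline above.

```python
from typing import Optional


def split_modifiers(spec: str) -> list[str]:
    """Split a comma-separated technical spec into individual modifiers."""
    return [m.strip() for m in spec.split(",") if m.strip()]


def lint_spec(spec: str, limit: int = 3) -> Optional[str]:
    """Return a warning if the spec stacks more than `limit` modifiers, else None."""
    modifiers = split_modifiers(spec)
    if len(modifiers) > limit:
        return f"{len(modifiers)} modifiers ({', '.join(modifiers)}); keep the {limit} most relevant"
    return None


print(lint_spec("photorealistic, cinematic, film grain, vintage, 4K, ultra-detailed, award-winning"))
print(lint_spec("shot on 35mm film, photorealistic"))
```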
Pro tip ✅
For consistent character appearances across multiple images, save your subject description block as a text snippet and paste it unchanged into every prompt. Only edit the environment, action, and technical spec layers. This is the manual version of character consistency — it works reliably without any special settings.
Note 💡
All images generated through Nano Banana carry a SynthID watermark embedded at the pixel level — invisible to the naked eye but detectable by Google’s verification tools. This won’t affect your workflow, but it’s worth knowing if you’re generating content for clients who ask about provenance.
Pro tip ✅
When using Nano Banana via the Gemini app, you can iterate conversationally — generate an image, then type “make the lighting warmer and move the subject to the left third of the frame.” The model carries context from the previous generation. This is dramatically faster than rewriting the full prompt from scratch each time.
Avoid 🚫
Don’t describe what you don’t want in the main prompt. Phrases like “no blur,” “not dark,” or “without people” tend to backfire because the nouns can register more strongly than the negation around them. Instead, describe what you do want: “sharp focus throughout,” “bright ambient light,” “empty street.”
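If you keep prompts in files, a rough check can flag negations for review. This is a minimal regex heuristic, not anything built into Nano Banana: it only looks for “no,” “not,” or “without” followed by a word, and the rewrite table is limited to the three examples in the tip above (anything else gets a generic reminder).

```python
import re

# Matches "no X", "not X", or "without X" as whole words (a rough heuristic).
NEGATION = re.compile(r"\b(?:no|not|without)\s+\w+", re.IGNORECASE)

# Positive rewrites taken from the examples in the tip above.
REWRITES = {
    "no blur": "sharp focus throughout",
    "not dark": "bright ambient light",
    "without people": "empty street",
}


def find_negations(prompt: str) -> list[str]:
    """Return each negated phrase found in the prompt, in order."""
    return [match.group(0) for match in NEGATION.finditer(prompt)]


def suggest_rewrites(prompt: str) -> dict[str, str]:
    """Map each negated phrase to a positive rewrite, or a generic reminder."""
    return {
        phrase: REWRITES.get(phrase.lower(), "describe what you do want instead")
        for phrase in find_negations(prompt)
    }


print(suggest_rewrites("Rainy city street at night, no blur, without people"))
```

Because it is purely lexical, it will also flag phrases you may want to keep (for example “no background” in the minimal product prompt), so treat hits as prompts to review rather than hard errors.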
Access: Where to Actually Run These Prompts
Nano Banana runs through the Gemini app for casual use — fastest way to get started, no setup required. For API access and higher-volume work, Google AI Studio gives you direct access to the underlying model with more parameter control. Vertex AI is the enterprise route if you’re building this into a production pipeline. All three access points run the same model, so the prompt formula above applies regardless of where you’re generating.
The Part That Actually Matters
The formula — Subject, Action, Environment, Mood/Lighting, Technical Spec — isn’t a creative straitjacket. It’s a checklist. Once you’ve run a few dozen prompts with it, the structure becomes instinctive and you stop needing to think about it consciously. What you get instead is a reliable baseline: prompts that produce something coherent on the first try, leaving you to iterate on the interesting decisions rather than fighting to get the model to understand what you’re asking for in the first place. That’s the whole point. Start with structure, then break it when you have a good reason to.


