How to Use Nano Banana via the Gemini API — Developer Setup in 15 Minutes

Set up Nano Banana image generation via the Gemini API in 15 minutes with copy-paste Python and Node.js code, plus 8 production-ready prompts.

Nano Banana — Promptyze’s name for Google’s Gemini Flash image generation model — is one of those rare AI tools that’s actually as fast to integrate as it is to use in a browser. The Gemini API supports image generation through the gemini-2.0-flash-exp model (the engine powering Nano Banana), and if you’ve ever wrestled with Stable Diffusion’s dependency hell or Midjourney’s closed API situation, you’ll find this refreshingly straightforward. A working image generator in your app, in about 15 minutes, with official Python and JavaScript SDKs that don’t make you want to flip a table.

This tutorial walks through the full setup: getting your API key, making your first image generation call, structuring prompts that actually produce good results, and a few tricks that separate mediocre API output from something you’d actually ship. No PhD required. Basic Python or Node.js knowledge is enough.

What You’ll Actually Achieve

By the end of this, you’ll have a working local script that sends a text prompt to the Gemini API and saves a generated image to disk. From there, dropping it into a web app, a Slack bot, or a scheduled pipeline is your own adventure — but the hard part (the auth, the request structure, the response parsing) will already be solved.

Requirements

You need a Google account, access to Google AI Studio (aistudio.google.com) to generate a free API key, and either Python 3.9+ with the google-genai package, or Node.js 18+ with the @google/genai npm package. That’s genuinely the full list. No Docker, no GPU, no cloud account required for basic usage — the model runs on Google’s infrastructure.

Note 💡

The free tier at AI Studio gives you enough requests per day for testing. For production workloads, the paid tier through Google Cloud’s Vertex AI raises the rate limits significantly. Check the current quota page at ai.google.dev, because limits change frequently and the docs reflect the current values.

Step 1 — Get Your API Key

Head to aistudio.google.com, sign in, and click “Get API Key” in the left sidebar. Create a new key, copy it somewhere safe (a .env file, not hardcoded in your script — please). The key works immediately. No approval process, no waitlist as of early 2026.
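
Before making any API call, it helps to fail fast when the key is missing. The sketch below is one way to do that with only the standard library. The helper name require_api_key is hypothetical, and it assumes the key lives in the GEMINI_API_KEY environment variable used in the scripts in this tutorial.

```python
import os


def require_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error."""
    # Treat an unset variable and a whitespace-only value the same way
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or load it from a .env file."
        )
    return key
```

Calling require_api_key() at startup turns a confusing authentication failure later on into a one-line message at the top of the run.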

Step 2 — Install the SDK

For Python, one line does it:

pip install google-genai

For Node.js:

npm install @google/genai

Both are official Google packages, actively maintained, and the package names have stabilized after the earlier google-generativeai era — use the new google-genai packages for the latest models.

Step 3 — Your First Image Generation Call (Python)

Here’s the minimal working script. Set your API key in the GEMINI_API_KEY environment variable, then run it:

import os
from google import genai
from google.genai import types

# Read the key from the environment instead of hardcoding it
client = genai.Client(api_key=os.environ.get("GEMINI_API_KEY"))

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="A red fox sitting in a snowy forest at dusk, photorealistic, golden hour light",
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE", "TEXT"]  # ask for an image, not just text
    )
)

# The response can mix image parts and text parts, so check each one
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        # The Python SDK exposes inline_data.data as raw bytes, so no base64 decoding is needed
        image_data = part.inline_data.data
        with open("output.png", "wb") as f:
            f.write(image_data)
        print("Image saved to output.png")
    elif part.text:
        print(part.text)

The response_modalities parameter is the key thing people miss. You have to explicitly tell the API you want an image back, otherwise you get a text description of the image instead — which is technically a response, but deeply unsatisfying. The script iterates through response parts because the model can return both an image and a text caption in the same response.
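
Because a response can mix image and text parts, it can be convenient to separate them in one pass. Here is a minimal sketch of that idea. The helper split_parts is hypothetical, and the fake parts below are stand-ins shaped like the inline_data and text attributes used in the script above, so the example runs without a network call.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Separate image payloads from text captions in a list of response parts."""
    images, texts = [], []
    for part in parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and getattr(inline, "data", None):
            images.append(inline.data)
        elif getattr(part, "text", None):
            texts.append(part.text)
    return images, texts


# Stand-in objects (not real SDK types) so the sketch runs offline
fake_parts = [
    SimpleNamespace(inline_data=None, text="Here is your fox."),
    SimpleNamespace(inline_data=SimpleNamespace(data=b"\x89PNG..."), text=None),
]
images, texts = split_parts(fake_parts)
print(f"{len(images)} image(s), captions: {texts}")
```

In a real script you would pass response.candidates[0].content.parts instead of fake_parts.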

Step 4 — Node.js Version (Same Result, Different Syntax)

import { GoogleGenAI } from "@google/genai";
import * as fs from "fs";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function generateImage() {
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-exp",
    contents: "A red fox sitting in a snowy forest at dusk, photorealistic, golden hour light",
    config: {
      responseModalities: ["IMAGE", "TEXT"],
    },
  });

  for (const part of response.candidates[0].content.parts) {
    if (part.inlineData) {
      const imageBuffer = Buffer.from(part.inlineData.data, "base64");
      fs.writeFileSync("output.png", imageBuffer);
      console.log("Image saved to output.png");
    }
  }
}

generateImage();

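Save the file as script.mjs and run node script.mjs, or keep the .js extension and add "type": "module" to your package.json, since the script uses ES module imports. The result matches the Python version: same API endpoint, same model, and a comparable image (generation is not deterministic, so the exact pixels will differ).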

Pro tip ✅

Store your API key in a .env file and load it with python-dotenv (Python) or the dotenv npm package (Node). Then add .env to your .gitignore before you push to GitHub and spend a weekend rotating credentials.
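
If you would rather not add a dependency for a single variable, a .env file is simple enough to read by hand. The sketch below is a stdlib-only stand-in for python-dotenv, not a replacement for it: the function name load_env_file is hypothetical, and it only understands plain KEY=VALUE lines and # comments.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines and copy them into os.environ without overwriting."""
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        loaded[key] = value
        # Values already set in the real environment take priority
        os.environ.setdefault(key, value)
    return loaded
```

Call load_env_file() once at the top of your script, before creating the client.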

Prompts That Actually Work — Copy and Paste These

The difference between a mediocre API integration and a great one is prompt quality. Here are concrete, tested prompt structures for different use cases.

Portrait / Editorial:

Portrait of a 35-year-old architect, natural window light, shallow depth of field, shot on Leica, editorial magazine style, neutral concrete background, confident expression, business casual clothing

This works because it combines subject description, lighting source, camera reference, stylistic context, and background in one sentence. Gemini’s image model responds well to photography-vocabulary prompts — words like “shot on,” “f/1.8,” “bokeh,” and “editorial” steer the output toward photorealistic rather than illustrated.

Product Photography:

Minimalist product shot of a matte black water bottle, white studio background, soft diffused lighting from the left, sharp focus, commercial photography style, slight reflection on white surface below

Product shots live or die by background and lighting. Specifying the surface reflection gives the image physical plausibility — objects sitting in space look more real than objects floating against a void.

Overhead flat lay of coffee equipment: French press, ceramic mug, coffee beans scattered, linen napkin, warm morning light, food photography style, high-end lifestyle magazine aesthetic

Flat lay prompts benefit from listing every element you want visible, since the model has to decide what to include. Explicit enumeration beats vague descriptions like “coffee scene.”

Social Media / Lifestyle:

Young woman reading a book in a sun-drenched café, golden hour streaming through large windows, film grain texture, warm amber tones, candid lifestyle photography, bokeh background with blurred coffee cups

The phrase “film grain texture” is doing a lot of work here — it shifts the output from clinical digital photography toward something that looks intentionally artistic and platform-native for Instagram.

Aerial view of a coastal city at sunset, warm orange and pink sky, reflections in the water, cinematic wide angle, ultra-detailed, travel photography style

Aerial/drone perspectives are harder to photograph in real life, which makes them genuinely useful for AI generation. Always pair them with “cinematic” or a specific photography style, or you get a satellite map aesthetic.

Architectural / Interior:

Modern Scandinavian living room interior, floor-to-ceiling windows, oak wood floors, linen sofa, fiddle leaf fig plant, afternoon light casting long shadows, architectural digest style, ultra-clean composition

Interior design prompts need material specificity. “Sofa” generates a generic sofa. “Linen sofa with visible fabric texture” gets you something that looks like it belongs in a real shoot.

Illustration / Non-Photo Style:

Isometric illustration of a futuristic city block, clean vector art style, pastel color palette, tiny detailed characters, buildings with solar panels and rooftop gardens, white background, Studio Ghibli influence

Switching from photography to illustration requires explicitly naming the art direction. “Isometric,” “vector art,” and “Studio Ghibli influence” collectively anchor the style — vague prompts like “illustrated city” produce inconsistent results.

Pro tip ✅

Gemini’s image model responds strongly to aspect ratio cues embedded in the prompt. Writing “horizontal landscape composition” or “vertical portrait orientation” steers the framing even before you set any API parameters. It’s not a guaranteed pixel dimension, but it changes the crop meaningfully.
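
If you generate images for fixed layouts, appending the orientation cue in code keeps it consistent. This is a small sketch that reuses the two phrases quoted above. The names ORIENTATION_CUES and with_orientation are hypothetical.

```python
ORIENTATION_CUES = {
    "landscape": "horizontal landscape composition",
    "portrait": "vertical portrait orientation",
}


def with_orientation(prompt: str, orientation: str) -> str:
    """Append an orientation cue to a prompt; raises KeyError for unknown keys."""
    cue = ORIENTATION_CUES[orientation]
    # Drop trailing punctuation so the cue joins cleanly with a comma
    return f"{prompt.rstrip(' ,.')}, {cue}"


print(with_orientation("A red fox sitting in a snowy forest at dusk.", "landscape"))
```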

Handling Multiple Characters — Subject Consistency

One of Nano Banana’s practical strengths is keeping multiple characters readable within a scene. For multi-subject prompts, describe each character explicitly and assign them spatial positions:

Two colleagues collaborating at a standing desk, woman on the left with short natural hair pointing at a laptop screen, man on the right with glasses and a blue shirt looking thoughtful, modern open-plan office background, candid documentary photography style, natural fluorescent lighting

The model can handle up to around five subjects before compositional coherence starts to degrade. Beyond three, prioritize clarity of position (“far left,” “center foreground,” “background right”) over detailed individual descriptions.
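
The position-first approach is easy to template. The sketch below assembles a multi-subject prompt from (position, description) pairs. The function name multi_subject_prompt is hypothetical, and the cap of five subjects is simply the rough figure mentioned above, not a documented API limit.

```python
from typing import List, Tuple

MAX_SUBJECTS = 5  # assumption taken from the rough limit described in the text


def multi_subject_prompt(scene: str, subjects: List[Tuple[str, str]], style: str) -> str:
    """Join a scene, positioned subject descriptions, and a style into one prompt."""
    if not subjects:
        raise ValueError("at least one subject is required")
    if len(subjects) > MAX_SUBJECTS:
        raise ValueError(f"more than {MAX_SUBJECTS} subjects tends to hurt composition")
    # Lead with the position so each subject is anchored in the frame
    placed = [f"{position}, {description}" for position, description in subjects]
    return ", ".join([scene, *placed, style])


print(multi_subject_prompt(
    "Two colleagues collaborating at a standing desk",
    [
        ("on the left", "a woman with short natural hair pointing at a laptop screen"),
        ("on the right", "a man with glasses and a blue shirt looking thoughtful"),
    ],
    "candid documentary photography style",
))
```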

Warning ⚠️

Gemini’s safety filters apply to API calls exactly as they do in the Gemini app. Prompts involving real people by name, violent imagery, or explicit content will be blocked rather than producing an image. Build error handling into your API wrapper from day one: check for SAFETY finish reasons in the response, and handle a response that contains no image part, so your app doesn’t fail on a missing image.
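
A check along these lines can sit in your wrapper. It is a sketch under assumptions: the helper blocked_by_safety is hypothetical, it inspects a finish_reason attribute on each candidate by name, and it treats a response with no candidates as blocked, which is a design choice rather than documented behavior. The objects in the demo are stand-ins, not real SDK types.

```python
from types import SimpleNamespace


def blocked_by_safety(response) -> bool:
    """Return True if the response has no candidates or reports a safety finish reason."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return True  # nothing came back; treated as blocked in this sketch
    for candidate in candidates:
        reason = getattr(candidate, "finish_reason", None)
        # Works for enum-like values (via .name) and for plain strings
        name = getattr(reason, "name", None) or str(reason)
        if "SAFETY" in name.upper():
            return True
    return False


# Stand-in responses so the sketch runs offline
ok = SimpleNamespace(candidates=[SimpleNamespace(finish_reason="STOP")])
blocked = SimpleNamespace(candidates=[SimpleNamespace(finish_reason="SAFETY")])
print(blocked_by_safety(ok), blocked_by_safety(blocked))
```

When the check returns True, return a clear message to the user instead of trying to read an image part that is not there.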

Editing Workflow — Conversational Image Iteration

One feature that’s underused in API integrations is conversational image editing. The Gemini API supports multi-turn conversations that include images — meaning you can generate an image, then send it back with an edit instruction:

# After getting your initial image, send it back for editing.
# Assumes `client` and `image_data` (the PNG bytes of the first image) from Step 3.
from google.genai import types

# Wrap the image bytes so they can be sent as one part of the request
image_part = types.Part.from_bytes(
    data=image_data,
    mime_type="image/png"
)

edit_response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[
        image_part,
        "Change the background to a sunset beach scene, keep the subject identical"
    ],
    config=types.GenerateContentConfig(
        response_modalities=["IMAGE", "TEXT"]
    )
)

This is where Nano Banana genuinely pulls ahead for developer workflows — you can build iterative editing chains without re-describing the entire scene from scratch each time. Subject consistency across edits is solid for background swaps and lighting changes; more complex structural edits (changing clothing, adding objects) are less reliable but improving.
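
An editing chain is just a loop in which each step receives the previous step's output. The sketch below shows that shape with the API call injected as a function, so it can be exercised offline. The names run_edit_chain and edit_fn are hypothetical; in practice edit_fn would wrap a generate_content call like the one above and return the edited image bytes.

```python
from typing import Callable, List


def run_edit_chain(
    edit_fn: Callable[[bytes, str], bytes],
    image: bytes,
    instructions: List[str],
) -> List[bytes]:
    """Apply each instruction to the previous result and keep every intermediate image."""
    history = [image]
    for instruction in instructions:
        history.append(edit_fn(history[-1], instruction))
    return history


def fake_edit(image: bytes, instruction: str) -> bytes:
    """Offline stand-in for an API-backed edit: it just tags the bytes."""
    return image + b"|" + instruction.encode("utf-8")


versions = run_edit_chain(fake_edit, b"original", ["sunset beach background", "warmer light"])
print(len(versions), "versions kept")
```

Keeping the full history lets you roll back one step when an edit drifts, instead of regenerating from the first prompt.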

Pro tip ✅

Every image generated via the Gemini API includes a SynthID watermark — Google’s invisible digital watermark for AI-generated content. It’s imperceptible to the human eye and survives most image processing operations. This is not optional and not removable. Factor it into your use case if watermark detection matters for your downstream workflow.

Pro tip ✅

Wrap your API calls in a simple retry loop with exponential backoff. Even on paid tiers, occasional transient errors happen. Three retries with 1s, 2s, 4s delays handles the vast majority of them without any manual intervention.
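
A retry loop with the delays described above might look like the sketch below. The helper call_with_retries is hypothetical. It catches every exception for brevity, which is an assumption you would probably narrow to transient errors in real code, and the sleep function is a parameter so the loop can be tested without waiting.

```python
import time


def call_with_retries(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay, then double it, up to `retries` retries."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            # Out of retries: surface the last error to the caller
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s with the defaults
```

Usage is a one-line wrap around the existing call, for example call_with_retries(lambda: client.models.generate_content(...)) with your usual arguments.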

Quick Reference: Prompt Structure That Consistently Works

After testing dozens of prompts against the API, the structure that produces the most consistent quality is: [Subject + detail] + [Setting/background] + [Lighting] + [Style reference] + [Technical photography term]. Every element adds information the model uses. Stripping any one of them doesn’t break the prompt, but the output gets vaguer in exactly that dimension.
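
That structure maps directly onto a small builder function. Here is a minimal sketch, with the hypothetical name build_prompt, that joins the five elements in order and skips any that are left empty.

```python
def build_prompt(subject: str, setting: str = "", lighting: str = "",
                 style: str = "", technical: str = "") -> str:
    """Join subject, setting, lighting, style, and technical term into one prompt."""
    parts = [subject, setting, lighting, style, technical]
    # Empty or whitespace-only elements are dropped rather than leaving stray commas
    return ", ".join(p.strip() for p in parts if p and p.strip())


print(build_prompt(
    subject="Minimalist product shot of a matte black water bottle",
    setting="white studio background",
    lighting="soft diffused lighting from the left",
    style="commercial photography style",
    technical="sharp focus",
))
```

Keeping the elements as separate arguments also makes it easy to vary one dimension, such as lighting, while holding the rest fixed.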

Avoid 🚫

Don’t chain conflicting style references in one prompt. “Photorealistic Studio Ghibli anime oil painting” is asking the model to be four things at once. Pick one dominant aesthetic and one supporting influence maximum — “oil painting with photorealistic lighting” is coherent; “photorealistic anime watercolor” is not.
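
A rough lint for this can run before the prompt is sent. The sketch below counts style keywords using plain substring matching. The STYLE_TERMS list is an illustrative assumption, not an official vocabulary, and the limit of two reflects the one dominant plus one supporting guideline above.

```python
STYLE_TERMS = (
    "photorealistic", "studio ghibli", "anime",
    "oil painting", "watercolor", "vector art",
)


def find_style_terms(prompt: str) -> list:
    """Return the known style keywords that appear in the prompt."""
    lowered = prompt.lower()
    return [term for term in STYLE_TERMS if term in lowered]


def has_style_conflict(prompt: str, max_styles: int = 2) -> bool:
    """Flag prompts that name more styles than the allowed maximum."""
    return len(find_style_terms(prompt)) > max_styles


print(has_style_conflict("Photorealistic Studio Ghibli anime oil painting"))
print(has_style_conflict("oil painting with photorealistic lighting"))
```

A keyword match cannot judge whether two styles are actually compatible, so treat a flag as a prompt for a human look rather than a hard rejection.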

Get Building

The Gemini API’s image generation endpoint is, bluntly, one of the most accessible production-ready image generation APIs available right now. The SDK is clean, the documentation at ai.google.dev is accurate and kept up to date, and the prompting behavior is consistent enough that the examples above will work for you on first try. The 15-minute setup estimate in this tutorial’s title is not a stretch — it’s actually conservative if you already have Python or Node installed. Get the key, run the script, break things deliberately, then build something real.

promptyze
