Storyboarding has always been the gatekeeper of professional video production. You either know how to draw, or you pay someone who does. Google’s new Scene Graph mode for Veo 3, released on March 4, 2026, quietly dismantles that requirement. Instead of sketching shot sequences on paper or wrangling a timeline editor, you draw a node diagram — think flowchart, not art class — and Veo 3 handles the cinematography from there.
The idea is straightforward: each node represents a scene element (a location, a subject, a lighting state), and the connections between nodes tell Veo 3 how to move between them. The model infers camera angles, transition types, and lighting logic automatically. Real estate videographers and commercial producers are already the obvious early adopters, but the workflow applies to anyone who needs a structured multi-shot video without a full production crew.
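To make the node-and-connection idea concrete, here is a minimal Python sketch of how such a graph could be modeled while planning a sequence. The class and field names (`SceneNode`, `Connection`, `SceneGraph`) are illustrative assumptions for planning on your own machine; they are not part of any Veo 3 API or file format.

```python
from dataclasses import dataclass, field


@dataclass
class SceneNode:
    """One scene segment: a name plus the text prompt that describes it."""
    name: str
    prompt: str


@dataclass
class Connection:
    """A directed link asking for a transition from one node to another."""
    source: str
    target: str


@dataclass
class SceneGraph:
    """Hypothetical planning model, not Veo 3's internal representation."""
    nodes: dict[str, SceneNode] = field(default_factory=dict)
    connections: list[Connection] = field(default_factory=list)

    def add_node(self, name: str, prompt: str) -> None:
        self.nodes[name] = SceneNode(name, prompt)

    def connect(self, source: str, target: str) -> None:
        # Refuse links to nodes that have not been defined yet.
        for name in (source, target):
            if name not in self.nodes:
                raise KeyError(f"unknown node: {name}")
        self.connections.append(Connection(source, target))

    def shot_order(self, start: str) -> list[str]:
        """Follow connections from `start` and list nodes in playback order."""
        next_of = {c.source: c.target for c in self.connections}
        order, current = [], start
        # Stop at the end of the chain, or if a loop would revisit a node.
        while current is not None and current not in order:
            order.append(current)
            current = next_of.get(current)
        return order


graph = SceneGraph()
graph.add_node("Exterior", "Aerial establishing shot, slow descending drone movement")
graph.add_node("Entrance", "Ground-level shot, smooth dolly push-in")
graph.add_node("Living Room", "Interior living room, slow pan left to right")
graph.connect("Exterior", "Entrance")
graph.connect("Entrance", "Living Room")
print(graph.shot_order("Exterior"))  # ['Exterior', 'Entrance', 'Living Room']
```

Writing the sequence down this way before opening the editor is one way to confirm that every scene has a clear predecessor and successor.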
This tutorial walks you through the Scene Graph workflow step by step — from setting up your first node diagram to prompting individual scene nodes for maximum cinematic control. By the end, you’ll have a repeatable system for generating professional-looking video sequences in a fraction of the time it would take to shoot them.
After following this guide, you’ll be able to build a multi-scene video sequence in Veo 3 using Scene Graph mode, with consistent camera movement logic and automatic lighting transitions between nodes. The workflow here focuses on a real estate walkthrough — one of the cleanest use cases for this feature — but the same node structure applies directly to product commercials, brand videos, and event showcases.
Access to Veo 3 through Google AI Studio or VideoFX (Google’s consumer-facing video tool) is the baseline requirement. Scene Graph mode is available within the Veo 3 interface — look for the “Scene Graph” tab alongside the standard prompt input. You don’t need any video editing software, storyboarding tools, or design experience. A clear idea of your shot sequence is genuinely all the prep work required.
Note 💡
Scene Graph mode works best when your video has a defined narrative or spatial logic: a beginning, a middle, and an end. A random collection of unrelated shots will still generate, but the automatic camera transitions won’t have a coherent direction to work with.
Inside Veo 3’s interface, switch from the default “Prompt” tab to “Scene Graph.” You’ll see a canvas — blank by default — with a toolbar on the left for adding nodes, and a properties panel on the right for configuring each one. The canvas works like any basic node editor: drag to pan, scroll to zoom, click to select.
Your first node is always the “Entry” node, which Veo 3 places automatically. This represents the opening frame of your video. Every subsequent node you add will be a scene segment, and every connection (drawn by clicking and dragging between node outputs and inputs) tells Veo 3 to generate a transition between those two scenes.
Pro tip ✅
Keep your first Scene Graph to five nodes or fewer. Veo 3 handles the camera logic well at this scale, and it gives you a fast feedback loop to understand how your node descriptions translate into actual footage before you scale up to more complex sequences.
Each node takes its own text prompt describing what happens in that scene segment. This is where Veo 3’s generation quality lives — the node prompt controls the visual content, and the connections control the transitions. Think of node prompts as individual shot descriptions rather than a full video brief.
For a real estate walkthrough of a modern apartment, a five-node sequence might look like this:
Node 1 — Exterior Approach:
Aerial establishing shot of a modern glass-facade apartment building, golden hour lighting, slow descending drone movement, wide angle lens, cinematic color grade, 4K
This sets your opening frame. The “slow descending drone movement” instruction feeds directly into Veo 3’s camera motion system — it reads directional and speed language inside node prompts.
Node 2 — Building Entrance:
Ground-level shot of a sleek apartment building entrance, polished concrete floor, automatic glass doors opening, soft morning light, smooth dolly push-in, architectural photography style
Notice the transition logic: from aerial descending (Node 1) to ground-level push-in (Node 2). Veo 3 generates a cut or dissolve between these based on the connection type you set — more on that in Step 3.
Node 3 — Living Room:
Interior living room with floor-to-ceiling windows overlooking the city, natural daylight, slow pan left to right revealing sectional sofa and minimalist decor, warm interior lighting, shallow depth of field, real estate photography tone
Node 4 — Kitchen:
Modern open-plan kitchen, matte black cabinetry, quartz countertops, overhead pendant lights, smooth tracking shot moving along the island, late afternoon light casting long shadows, crisp detail
Node 5 — Balcony Closing Shot:
Wide shot from apartment balcony overlooking city skyline at dusk, subject standing at railing looking out, slow zoom out revealing full cityscape, blue hour lighting, cinematic widescreen aspect ratio

Pro tip ✅
Include a camera movement verb in every node prompt — “pan,” “dolly,” “tracking shot,” “zoom,” “crane up,” “push in.” Veo 3’s Scene Graph reads these as motion instructions and uses them to calculate transition logic between nodes. Nodes without explicit movement cues default to static shots, which can make the final sequence feel choppy.
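If you draft prompts in a text file first, a small check can flag nodes that lack a movement cue before you paste them in. The sketch below is a local helper only: the cue list is copied from the tip above, and the function names are hypothetical. It does not reproduce how Veo 3 itself parses prompts.

```python
import re

# Cues named in the tip above; add your own as needed.
MOVEMENT_CUES = ("pan", "dolly", "tracking shot", "zoom", "crane up", "push in")


def find_movement_cues(prompt: str) -> list[str]:
    """Return the cues that appear as whole words or phrases in the prompt."""
    # Treat "push-in" and "push in" alike by turning hyphens into spaces.
    text = prompt.lower().replace("-", " ")
    return [cue for cue in MOVEMENT_CUES
            if re.search(rf"\b{re.escape(cue)}\b", text)]


def static_nodes(prompts: dict[str, str]) -> list[str]:
    """Names of nodes whose prompts contain none of the known cues."""
    return [name for name, prompt in prompts.items()
            if not find_movement_cues(prompt)]


prompts = {
    "Building Entrance": "Ground-level shot of a sleek apartment building entrance, smooth dolly push-in",
    "Living Room": "Interior living room with floor-to-ceiling windows, slow pan left to right",
    "Lifestyle Context": "Person sitting in a minimalist apartment listening to headphones, medium shot",
}
print(static_nodes(prompts))  # ['Lifestyle Context']
```

The whole-word match matters here: a plain substring search for "pan" would also fire on words like "panel" or "companion".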
Connections between nodes aren’t just wires — they carry transition metadata. When you draw a connection, the properties panel on the right gives you three main transition options: Cut, Dissolve, and Match Cut. Each one tells Veo 3 something different about how to bridge the two scene segments.
Cut works exactly as it sounds — a direct frame-to-frame switch. Use it between scenes with a strong spatial jump (exterior to interior, for example). Dissolve generates a brief overlapping blend, which works well between scenes in the same space with a time-of-day shift. Match Cut is the most interesting option: Veo 3 attempts to align a visual element between the two nodes — a door frame, a vertical line, a color shape — to create a graphic match transition. Set it between Node 2 (entrance doors) and Node 3 (living room) and the model will try to find or generate a matching vertical frame element to bridge the cut.
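The guidance above can be condensed into a rule of thumb. The following sketch encodes it as a small Python function; the function name, its two boolean inputs, and the priority order are assumptions made for illustration, and the final choice is still made by hand in the properties panel.

```python
from enum import Enum


class Transition(Enum):
    CUT = "Cut"
    DISSOLVE = "Dissolve"
    MATCH_CUT = "Match Cut"


def suggest_transition(same_space: bool, shared_anchor: bool) -> Transition:
    """Rule of thumb distilled from the descriptions above (an assumption)."""
    if shared_anchor:
        # A door frame, vertical line, or color shape present in both scenes.
        return Transition.MATCH_CUT
    if same_space:
        # Same location, for example with a time-of-day shift.
        return Transition.DISSOLVE
    # Strong spatial jump, such as exterior to interior.
    return Transition.CUT


plan = {
    ("Exterior Approach", "Building Entrance"): suggest_transition(False, False),
    ("Building Entrance", "Living Room"): suggest_transition(False, True),
}
for (src, dst), transition in plan.items():
    print(f"{src} -> {dst}: {transition.value}")
```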
Warning ⚠️
Match Cut transitions are computationally heavier and don’t always succeed on the first generation. If the output looks like a broken dissolve rather than a clean graphic match, try reprompting Node 2 with a more explicit framing cue — “centered on a vertical doorframe, symmetrical composition” gives the model a cleaner anchor point to match against.
Above the canvas, the global settings panel controls parameters that apply across all nodes: aspect ratio, total duration, frame rate, and the overall cinematic style preset. For real estate content, these settings work well as a baseline:
Aspect ratio: 16:9
Duration per node: 4-6 seconds
Frame rate: 24fps
Style preset: Architectural / Commercial
Color science: Warm natural
For a product commercial, swap the style preset to “Studio / Product” and push the frame rate to 30fps for a cleaner, crisper look that reads better in digital ad placements. The style preset doesn’t override your node-level prompts — it acts as a global color and lighting bias that the node prompts then modify locally.
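If you keep notes on which settings worked, a small record type makes the two baselines easy to compare. This is a note-taking sketch in Python; `GlobalSettings` and its field names are hypothetical and do not correspond to a documented Veo 3 configuration format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GlobalSettings:
    """Hypothetical record of the global panel values described above."""
    aspect_ratio: str
    min_seconds_per_node: int
    max_seconds_per_node: int
    frame_rate: int
    style_preset: str
    color_science: str


# Real estate baseline, as listed above.
REAL_ESTATE = GlobalSettings(
    aspect_ratio="16:9",
    min_seconds_per_node=4,
    max_seconds_per_node=6,
    frame_rate=24,
    style_preset="Architectural / Commercial",
    color_science="Warm natural",
)

# Product variant: only the two fields named in the text change.
# Everything else is carried over as-is, which is an assumption.
PRODUCT = replace(REAL_ESTATE, frame_rate=30, style_preset="Studio / Product")

print(PRODUCT)
```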
Pro tip ✅
Set node duration to at least 4 seconds per node when using camera movement prompts. Veo 3 needs enough frames to complete a pan or dolly move — at 2-3 seconds, the movement either gets cut short or speeds up unnaturally.
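The reasoning behind that floor is simple frame arithmetic: frames available equal duration times frame rate. The sketch below works through the numbers and flags nodes under the 4-second guideline; the helper names are illustrative, not part of Veo 3.

```python
MIN_MOVING_SHOT_SECONDS = 4  # floor recommended in the tip above


def frame_count(seconds: float, fps: int) -> int:
    """Number of frames available for a shot of the given length."""
    return round(seconds * fps)


def too_short(durations: dict[str, float],
              minimum: float = MIN_MOVING_SHOT_SECONDS) -> list[str]:
    """Names of nodes whose planned duration is under the minimum."""
    return [name for name, secs in durations.items() if secs < minimum]


print(frame_count(4, 24))  # 96 frames to complete the move
print(frame_count(2, 24))  # 48 frames, half the budget for the same move
print(too_short({"Living Room Pan": 6, "Kitchen Reveal": 3}))  # ['Kitchen Reveal']
```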

Hit Generate. Veo 3 processes the entire Scene Graph as a single job, returning a stitched sequence that includes all your nodes and their connecting transitions. First-generation output takes roughly 2-4 minutes for a five-node sequence, depending on server load.
When the preview loads, review each node segment individually using the timeline at the bottom of the preview panel. You don’t need to regenerate the entire sequence if one node misses — click the node in question, adjust the prompt, and hit “Regenerate Node” to replace only that segment. The surrounding nodes stay locked.
Node 3 revised prompt (if the living room pan feels too fast):
Interior living room, floor-to-ceiling windows, city view, very slow deliberate pan left to right, 6-second duration, natural daylight, shallow depth of field, luxury real estate tone, unhurried pacing
Adding “very slow deliberate” and calling out pacing explicitly usually counteracts Veo 3’s tendency to rush camera movements in shorter node durations.
Pro tip ✅
After generating, download the sequence as individual node clips rather than the stitched composite if you plan to do any color grading or editing downstream. The individual clips give you clean edit points and avoid recompression artifacts from Veo 3’s internal stitching.
The same five-node logic applies cleanly to a product commercial. Here’s a compact node sequence for a premium headphone launch:
Node 1 — Product Reveal:
Extreme close-up of matte black over-ear headphones on a dark reflective surface, slow pull-back reveal, studio lighting with single key light, high contrast, product photography aesthetic
Node 2 — Detail Shots:
Macro tracking shot along the headphone ear cup, stitching and material texture detail, smooth lateral movement, soft rim lighting, dark background, luxury product feel
Node 3 — Lifestyle Context:
Person sitting in a minimalist apartment listening to headphones, bokeh city view through window, warm ambient light, medium shot, relaxed and cinematic, natural color grade
Node 4 — Feature Highlight:
Close-up of finger pressing noise-cancellation button on headphone ear cup, subtle LED indicator activating, macro lens, shallow depth of field, clinical white lighting
Node 5 — Closing Brand Shot:
Headphones centered on minimal white surface, slow top-down crane shot pulling up and away, clean white background, soft diffused lighting, advertising photography style, wide final frame

Avoid 🚫
Don’t mix wildly different lighting logic between adjacent nodes — jumping from a dark studio close-up directly to a bright outdoor lifestyle shot without a logical transition node in between will produce a jarring cut that no transition type fully fixes. Add an intermediate node with bridging lighting if your scenes live at opposite ends of the exposure spectrum.
Reuse node diagrams as templates. Once you’ve built a five-node real estate graph that generates well, export the graph structure (File → Export Scene Graph) and reload it for the next property. Swap the node prompts without rebuilding the connections and transition logic from scratch — the bones of a good sequence carry over regardless of the subject matter.
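The template idea boils down to swapping prompts while leaving connections alone. The Python sketch below illustrates that on a plain dictionary. The dictionary layout (`nodes`, `connections`, `transition`) is an assumption made for the example; the actual structure of an exported Scene Graph file is not described here, so treat this as a model of the workflow rather than a parser for the export.

```python
import copy


def retarget_template(template: dict, new_prompts: dict[str, str]) -> dict:
    """Return a copy of the template with selected node prompts replaced."""
    unknown = set(new_prompts) - set(template["nodes"])
    if unknown:
        raise KeyError(f"prompts for nodes not in template: {sorted(unknown)}")
    graph = copy.deepcopy(template)  # leave the saved template untouched
    for name, prompt in new_prompts.items():
        graph["nodes"][name]["prompt"] = prompt
    return graph


# Hypothetical template layout, for illustration only.
template = {
    "nodes": {
        "Exterior Approach": {"prompt": "Aerial establishing shot of a modern glass-facade apartment building"},
        "Living Room": {"prompt": "Interior living room with floor-to-ceiling windows"},
    },
    "connections": [
        {"from": "Exterior Approach", "to": "Living Room", "transition": "Cut"},
    ],
}

next_property = retarget_template(template, {
    "Exterior Approach": "Aerial establishing shot of a brick townhouse, golden hour lighting, slow descending drone movement",
})
print(next_property["nodes"]["Exterior Approach"]["prompt"])
```

Copying before editing keeps the original template reusable for the property after this one.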
Name your nodes descriptively in the properties panel. “Node 3” means nothing when you’re reviewing a twelve-node sequence. “Living Room Pan” and “Kitchen Reveal” let you jump directly to problem nodes during review without counting from the start.
Use the “Seed Lock” option on nodes that generate exactly right on the first pass. This pins the random seed for that node, so subsequent regenerations of adjacent nodes don’t accidentally alter your best clips.
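For readers unfamiliar with seeds, the general principle is that a pseudo-random generator started from the same seed produces the same sequence every time. The Python sketch below demonstrates that principle with the standard library; `fake_clip` is a stand-in invented for the example and says nothing about how Veo 3 implements Seed Lock internally.

```python
import random


def fake_clip(seed: int, frames: int = 5) -> list[float]:
    """Stand-in for a generator: the 'clip' is just a list of random numbers."""
    rng = random.Random(seed)  # a private generator pinned to this seed
    return [round(rng.random(), 3) for _ in range(frames)]


locked = fake_clip(seed=42)
print(fake_clip(seed=42) == locked)  # True: same seed, same output
print(fake_clip(seed=43) == locked)  # False: a new seed changes the output
```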
Pro tip ✅
For real estate use, generate two versions of each exterior node — one at golden hour and one at blue hour — then pick per-property based on which lighting suits the architecture. The interior nodes stay constant; only the exterior lighting swaps. Two Scene Graph runs, twice the usable content library.
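One way to keep the two exterior variants in sync is to write the prompt once with a lighting placeholder and fill it in per run. The helper below is a plain string-templating sketch in Python; the `{lighting}` placeholder and the function name are conventions invented for this example.

```python
def lighting_variants(template: str, options: list[str]) -> dict[str, str]:
    """Build one prompt per lighting option from a single template."""
    if "{lighting}" not in template:
        raise ValueError("template needs a {lighting} placeholder")
    return {option: template.replace("{lighting}", option) for option in options}


EXTERIOR = (
    "Aerial establishing shot of a modern glass-facade apartment building, "
    "{lighting} lighting, slow descending drone movement, wide angle lens, "
    "cinematic color grade, 4K"
)

variants = lighting_variants(EXTERIOR, ["golden hour", "blue hour"])
for name, prompt in variants.items():
    print(f"{name}: {prompt}")
```

Because only the lighting phrase differs, the camera movement and framing language stay identical across both runs.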
Scene Graph mode doesn’t replace a director’s eye or a DP’s instincts — but it does replace the two-week pre-production cycle that most small creators and commercial producers couldn’t afford anyway. For real estate agents generating weekly property videos, or brand teams producing product content at scale, the node-based workflow turns a multi-day shoot into a two-hour generation session.
The prompts above are starting points, not ceilings. Adjust the lighting language, swap the movement verbs, push the style presets — every parameter change produces a meaningfully different output. The system rewards experimentation precisely because regenerating a single node costs seconds, not reshoots. That’s the actual shift here: iteration speed. Build your Scene Graph once, iterate on individual nodes, and treat the whole thing as a living document rather than a finished storyboard.
