How to Build a 30-Second TikTok Ad in Kling 3.0 — Shot by Shot

Build a complete 30-second TikTok ad in Kling 3.0 using a five-shot storyboard workflow, with copy-paste prompts, transition timing, and audio sync tips.

Kling 3.0 — Kuaishou’s AI video generation platform — has quietly become one of the most practical tools for solo creators who need to produce short-form video ads without a production crew, a budget, or three weeks of back-and-forth with an agency. The storyboard-style workflow lets you plan, generate, and assemble individual shots in sequence, then push audio and transitions into the same interface before export. The result: a 30-second TikTok ad that looks like someone spent a day on it, built in under an hour.

This tutorial walks through the full process — from writing your first scene prompt to syncing audio on the final cut. Every prompt below has been structured around Kling 3.0’s text-to-video generation parameters. Work through the steps in order and you’ll have a complete, export-ready ad by the end.

What You’ll Achieve

By the end of this guide you’ll have a 30-second, multi-scene TikTok ad built entirely inside Kling 3.0. Specifically: five individually generated shots (each 4–8 seconds), joined with intentional transitions, with a music bed or voiceover dropped in via Kling’s audio sync panel. The workflow runs roughly 45 to 60 minutes on a first attempt and speeds up significantly once your prompt style is locked in.

What You Need Before You Start

You need an active Kling account at kling.kuaishou.com. The platform runs on a credit system, and a 30-second multi-scene project will consume somewhere between 15 and 30 credits depending on resolution and generation attempts. Set your output to 1080×1920 (9:16) from the start; footage that gets cropped or letterboxed after the fact loses framing and tends to perform worse in a full-screen vertical feed. Have your product or concept written down in one clear sentence before you open the tool. Vague briefs produce vague shots, and Kling is not a mind reader.

Note 💡

Kling 3.0 generates video clips, not a single continuous render. Think of it like editing: you’re producing individual shots and assembling them into a sequence, not pressing one button and walking away.

Step 1 — Map Your 5-Shot Structure Before Touching the Tool

A 30-second TikTok ad has almost no room for waste. The standard structure that performs well on short-form platforms follows a simple arc: hook (0–3s), problem or context (3–8s), product or solution in action (8–18s), social proof or result (18–24s), call to action (24–30s). That maps cleanly onto five shots in Kling’s storyboard. Write these five beats out as one sentence each before you generate anything — it will save you from burning credits on shots you don’t need.
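One way to sanity-check the beat map before generating anything is to write it down as data and confirm the beats run back to back from 0 to 30 seconds. The sketch below is a planning aid only: the `Beat` class and `validate_beats` function are hypothetical names, not part of Kling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    name: str
    start: float  # seconds from the start of the ad
    end: float    # seconds from the start of the ad


# The five-beat arc described above.
BEATS = [
    Beat("hook", 0, 3),
    Beat("problem or context", 3, 8),
    Beat("product in action", 8, 18),
    Beat("social proof or result", 18, 24),
    Beat("call to action", 24, 30),
]


def validate_beats(beats, total=30.0):
    """Return a list of problems; an empty list means the map is contiguous."""
    problems = []
    cursor = 0.0
    for beat in beats:
        if beat.start != cursor:
            problems.append(f"{beat.name}: starts at {beat.start}s, expected {cursor}s")
        if beat.end <= beat.start:
            problems.append(f"{beat.name}: has no duration")
        cursor = beat.end
    if cursor != total:
        problems.append(f"map ends at {cursor}s, expected {total}s")
    return problems


if __name__ == "__main__":
    for beat in BEATS:
        print(f"{beat.name:<24} {beat.end - beat.start:>4.0f}s")
    print(validate_beats(BEATS) or "beat map covers the full 30 seconds")
```

If you reshuffle the arc for a different brief, rerunning the check catches gaps or overlaps before any credits are spent.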

Pro tip ✅

Write your five-beat structure on paper or a notes app first. Creators who start prompting immediately almost always generate twice as many clips and end up with a disjointed sequence. Ten minutes of planning saves twenty minutes of regeneration.

Step 2 — Generate Shot 1: The Hook

The hook is the one shot that determines whether anyone watches the rest. In Kling 3.0, open a new project, select Text to Video, set duration to 5 seconds, aspect ratio to 9:16, and motion intensity to High. The hook needs movement and a visual surprise within the first two seconds.

Extreme close-up of a single coffee bean falling in slow motion into a porcelain espresso cup, dark moody kitchen background, warm amber rim lighting, cinematic grain, shallow depth of field, 4K quality, vertical 9:16 format

This prompt hits three things Kling responds well to: a specific subject (one object, not a scene), a defined camera style (extreme close-up), and lighting direction (amber rim). The slow-motion instruction also seems to push the output toward smoother motion. Change “coffee bean” to your product’s smallest recognizable element (a single drop of serum, a sneaker lace, a credit card edge) and the hook logic holds.
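The subject, camera style, and lighting pattern described above can be kept consistent across shots with a small helper. This is a sketch in plain Python for assembling prompt text to paste into Kling; `build_prompt` and its parameter names are illustrative, not a Kling API.

```python
def build_prompt(subject, camera, lighting, style=(), aspect="9:16 vertical"):
    """Join prompt parts in a fixed order: camera + subject, lighting, style, aspect."""
    parts = [f"{camera} of {subject}", lighting, *style, aspect]
    # Drop empty parts so optional fields do not leave stray commas.
    return ", ".join(part.strip() for part in parts if part and part.strip())


# Hypothetical usage: a shortened version of the hook prompt.
hook = build_prompt(
    subject="a single coffee bean falling in slow motion into a porcelain espresso cup",
    camera="Extreme close-up",
    lighting="warm amber rim lighting",
    style=("cinematic grain", "shallow depth of field"),
)
print(hook)
```

Keeping the order fixed makes it easier to compare takes: when a generation goes wrong, you change one field and regenerate instead of rewriting the whole sentence.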

Step 3 — Generate Shot 2: Context or Problem

Shot 2 establishes why the viewer should care. Keep it grounded — this is where lifestyle context lives. Set duration to 6 seconds, motion intensity to Medium. You want something that feels real, not like a stock photo came to life.

Medium shot of a tired young woman in her late 20s sitting at a cluttered home office desk, overcast morning light through a window, coffee mug in hand, looking slightly distracted, muted color palette, cinematic realism, 9:16 vertical

Notice the lack of brand language in this prompt — Kling doesn’t need it, and inserting product names usually produces inconsistent results. The “muted color palette” instruction helps this shot feel tonally distinct from the high-contrast hook, which creates a natural visual beat for the viewer.

Pro tip ✅

Use “cinematic realism” as a style anchor in every Shot 2 context clip. It tells Kling to pull back from the stylized look of the hook and ground the viewer in something that feels documentary rather than commercial — which paradoxically makes the product reveal feel more earned.

Step 4 — Generate Shot 3: Product in Action

This is your longest shot — 7 to 8 seconds — and the one you’ll regenerate most often. Set motion intensity to Medium-High. The goal is to show the product doing its job in a way that’s visually interesting without being confusing.

Close-up product shot of a sleek matte black thermos being filled with steaming coffee, smooth camera pull-back to reveal a clean modern kitchen, warm natural light, slow deliberate motion, minimal background clutter, soft focus background, 4K, 9:16 vertical, commercial photography style

The “camera pull-back” instruction is key here — it creates inherent motion without requiring complex scene changes, which Kling handles more reliably than cuts within a single generated clip. If your product isn’t a physical object (an app, a service, a course), substitute a screen recording aesthetic prompt instead:

Close-up of a smartphone screen showing a clean productivity app interface, hands navigating the app with confident gestures, soft office background slightly out of focus, bright airy lighting, 9:16 vertical, smooth motion

Warning ⚠️

Kling 3.0 struggles with text legibility in generated video — any on-screen text in your prompts will likely render blurry or inconsistent. Plan to add text overlays in post (CapCut, Premiere) rather than asking Kling to generate readable text within the clip.

Step 5 — Generate Shot 4: Result or Social Proof

This shot needs to communicate outcome. A face showing satisfaction, a before-after implied by body language, or a simple reaction moment. Duration: 5 seconds, motion intensity: Low to Medium (you want controlled, readable emotion here, not a busy frame).

Medium close-up of a confident young professional smiling slightly while looking at their laptop screen, bright well-lit home office, warm tones, subtle head movement, authentic candid feel, not overly staged, 9:16 vertical, soft depth of field

The “not overly staged” note in the prompt is a real lever with Kling — it nudges the generation away from the plastic-looking poses that AI video tools default to. It won’t always work perfectly, but it shifts the output toward something more usable. If you’re selling to a different demographic, swap the subject description directly: “confident woman in her 50s,” “young man in his early 20s,” “small business owner at a market stall.”

Pro tip ✅

Generate Shot 4 twice with the same prompt and pick the better take. Emotional expressions are the hardest thing for AI video to nail consistently, and a second generation costs one extra credit but frequently produces a noticeably different — sometimes much better — result.

Step 6 — Generate Shot 5: Call to Action

The final shot is visual punctuation. Short (4–5 seconds), clear, and high-contrast. This is where your brand color or product identity can be most explicit, because the viewer has already been sold on the problem and solution.

Product flat lay on a clean white marble surface, single thermos centered in frame, small green plant and coffee beans as minimal props, top-down overhead shot, bright even studio lighting, sharp focus, commercial product photography aesthetic, 9:16 vertical

A top-down overhead shot is one of Kling’s stronger angles for product stills-in-motion — it produces a stable, visually clean result that works well as a final frame. You’ll add your CTA text (“Shop now,” “Link in bio”) as a text overlay in post.

Step 7 — Assemble the Sequence and Set Transitions

Once all five clips are generated, bring them into Kling’s timeline editor and arrange them in your planned order. Kling 3.0 offers several transition types. For a TikTok ad, stick to Cut (no transition effect) between Shots 1–2 and 2–3, then use a soft Dissolve between Shots 3–4 and 4–5, keeping each dissolve to 0.3–0.5 seconds (0.4 seconds is a good default). The hard cuts early create energy; the dissolves in the back half slow the viewer down at the moment of decision.
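To see how the cut plan affects total runtime, the arithmetic can be sketched as below. It uses the shot lengths from the prompt stack later in this guide (5, 6, 7, 5, and 4 seconds) and assumes that a dissolve overlaps the two clips it joins, as cross-dissolves do in most editors; whether Kling’s timeline behaves the same way is an assumption to verify in the tool. All names are hypothetical.

```python
# Shot lengths in seconds, taken from the prompt stack in this guide.
SHOT_SECONDS = [5, 6, 7, 5, 4]

# Transition between shot i and shot i+1: 0.0 means a hard cut.
TRANSITION_SECONDS = [0.0, 0.0, 0.4, 0.4]


def shot_start_times(shots, transitions):
    """Start time of each shot on the timeline, in seconds."""
    starts = [0.0]
    for length, overlap in zip(shots, transitions):
        # Assumption: a dissolve starts the next clip `overlap` seconds early.
        starts.append(round(starts[-1] + length - overlap, 3))
    return starts


def total_runtime(shots, transitions):
    """Timeline length after subtracting dissolve overlaps."""
    return round(sum(shots) - sum(transitions), 3)


starts = shot_start_times(SHOT_SECONDS, TRANSITION_SECONDS)
runtime = total_runtime(SHOT_SECONDS, TRANSITION_SECONDS)
print("shot starts:", starts)
print("runtime:", runtime, "seconds; shortfall vs 30 s:", round(30 - runtime, 3))
```

With these numbers the timeline comes out under 30 seconds, so check the total and extend individual shots if you need to fill the full slot.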

Pro tip ✅

Resist the urge to use motion transitions (swipe, spin, zoom) between every clip. They read as filler on TikTok — experienced viewers have seen them ten thousand times and they signal “AI-made” faster than anything else in the sequence. Plain cuts are underrated.

Step 8 — Audio Sync

Kling 3.0 includes an audio panel where you can drop in a music track or voiceover file and adjust sync points against the timeline. Upload your audio file (MP3 or WAV), then use the waveform view to align beats or voiceover cues with specific shots. For a 30-second TikTok ad, the standard approach is a music bed at 60–70% volume running the full sequence, with any voiceover or caption-style audio sitting on top. If you’re using a royalty-free track, align the track’s first downbeat with the start of Shot 3 (the product-in-action clip). This is where you want the energy peak, and a beat drop or melodic lift at that moment tends to hold viewers through the middle of the ad.
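The downbeat alignment can be worked out ahead of time. The sketch below assumes hard cuts before Shot 3, so that it starts at 5 + 6 = 11 seconds (using the shot lengths from the prompt stack in this guide), and that you already know where the first downbeat falls in your audio file. The function name and the example downbeat positions are illustrative.

```python
def track_offset(shot_start, downbeat_in_track):
    """Return (timeline_offset, trim_from_track_start), both in seconds.

    If the downbeat arrives earlier in the track than the shot starts on the
    timeline, delay the track. Otherwise trim the front of the track.
    """
    delta = shot_start - downbeat_in_track
    if delta >= 0:
        return delta, 0.0
    return 0.0, -delta


SHOT_3_START = 5 + 6  # assumption: Shots 1 and 2 are joined with hard cuts

print(track_offset(SHOT_3_START, 2.5))   # example: downbeat 2.5 s into the file
print(track_offset(SHOT_3_START, 14.0))  # example: downbeat 14 s into the file
```

A positive timeline offset leaves the opening shots without music, so if you want the bed under the whole ad it is usually easier to pick a track whose downbeat falls at or after the 11-second mark and trim the front.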

Note 💡

TikTok’s Commercial Music Library is a valid source for music if you’re posting directly to the platform: its tracks are pre-cleared for commercial use on TikTok. Tracks from the general sound library are not automatically cleared for brand or business content. If you’re running paid ads, check the licensing terms separately before using any music bed.

Step 9 — Export and Final Check

Export at 1080×1920, H.264, at least 30fps. Before you post, run a quick checklist: does Shot 1 have visible motion in the first two seconds? Do all generated faces look natural enough that they won’t trigger viewer distrust? Is the audio loud enough to register without headphones (TikTok is frequently watched on phone speakers)? Does the sequence make sense with sound off (captions matter)? If yes to all of these, you’re done.
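The technical half of that checklist (resolution, codec, frame rate) can be verified from the file itself. The sketch below assumes you have FFmpeg’s `ffprobe` installed and checks its JSON output against the export settings above. `check_video_stream` and `probe` are hypothetical helper names, and the perceptual checks (faces, loudness, captions) still need a human.

```python
import json
import subprocess
from fractions import Fraction


def check_video_stream(stream):
    """Compare one ffprobe video stream dict against the export settings."""
    problems = []
    if (stream.get("width"), stream.get("height")) != (1080, 1920):
        problems.append(f"resolution is {stream.get('width')}x{stream.get('height')}")
    if stream.get("codec_name") != "h264":
        problems.append(f"codec is {stream.get('codec_name')}")
    # ffprobe reports frame rate as a fraction string such as "30/1".
    if Fraction(stream.get("r_frame_rate", "0/1")) < 30:
        problems.append(f"frame rate is {stream.get('r_frame_rate')}")
    return problems


def probe(path):
    """Read the first video stream's metadata with ffprobe."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height,codec_name,r_frame_rate",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)["streams"][0]


# Example with a hand-written dict in ffprobe's output shape.
# An empty list means all three settings match.
print(check_video_stream(
    {"width": 1080, "height": 1920, "codec_name": "h264", "r_frame_rate": "30/1"}
))
```

For a real file you would call `check_video_stream(probe("ad.mp4"))`, where `ad.mp4` stands in for your exported clip.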

Avoid 🚫

Don’t export and post straight from Kling without reviewing the ad once at full volume and once with the sound muted. AI-generated video often has subtle motion artifacts that are invisible in a small browser preview but obvious full-screen on a phone, and audio levels that sound fine in a browser can be muddy on a phone speaker. One minute of review prevents an embarrassing post.

The Full Prompt Stack — Copy and Use

Here’s the complete set of five prompts in sequence for a coffee product TikTok ad. Swap the product details to apply this to your own brief:

SHOT 1 (Hook, 5s, High motion):
Extreme close-up of a single coffee bean falling in slow motion into a porcelain espresso cup, dark moody kitchen background, warm amber rim lighting, cinematic grain, shallow depth of field, 4K quality, vertical 9:16 format

SHOT 2 (Context, 6s, Medium motion):
Medium shot of a tired young woman in her late 20s sitting at a cluttered home office desk, overcast morning light through a window, coffee mug in hand, looking slightly distracted, muted color palette, cinematic realism, 9:16 vertical

SHOT 3 (Product in action, 7s, Medium-High motion):
Close-up product shot of a sleek matte black thermos being filled with steaming coffee, smooth camera pull-back to reveal a clean modern kitchen, warm natural light, slow deliberate motion, minimal background clutter, soft focus background, 4K, 9:16 vertical, commercial photography style

SHOT 4 (Result, 5s, Low-Medium motion):
Medium close-up of a confident young professional smiling slightly while looking at their laptop screen, bright well-lit home office, warm tones, subtle head movement, authentic candid feel, not overly staged, 9:16 vertical, soft depth of field

SHOT 5 (CTA frame, 4s, Low motion):
Product flat lay on a clean white marble surface, single thermos centered in frame, small green plant and coffee beans as minimal props, top-down overhead shot, bright even studio lighting, sharp focus, commercial product photography aesthetic, 9:16 vertical

What to Do When You’re Done (and When to Do It Again)

The first time through this workflow takes 45–60 minutes. The second time, once your prompt structure is internalized and your shot map is a template you can duplicate, it takes about 25. The actual value here is not any single video but a repeatable process that produces something usable every time rather than something occasionally brilliant and usually unusable.

Kling 3.0 is not going to replace a production team for brand campaigns with serious budgets. What it does replace is the dead zone between having an idea and having a proof of concept — the stage where most solo creators and small brand teams either pay for something they can’t afford or abandon the project. For TikTok ads in particular, where the audience’s tolerance for polished-but-soulless content is historically low, a well-directed AI-generated sequence with honest prompts and smart editing holds up better than you’d expect. Build the five shots, assemble them tight, get the audio right, and ship it. The algorithm doesn’t care how you made it.

promptyze