Three years ago, getting a photorealistic 360-degree product render meant either hiring a 3D artist or spending three weekends learning Blender well enough to not embarrass yourself. Neither option was fun. Today, AI video generation tools have changed the math considerably — you can describe a product, a lighting setup, and a camera movement in plain English, and get back something that looks like it came out of a proper product photography studio.
This tutorial covers a practical workflow for generating product mockup visuals using AI video and image tools: specifically Kling 3.0, Runway Gen-4.5, and Midjourney V7, all of which are available and well suited to this work as of early 2026. We’ll cover how to write prompts that nail studio lighting, how to control camera angles, how to handle backgrounds, and how to pull clean stills for marketing materials. No mesh topology knowledge required.
What You’ll Actually Achieve
By the end of this tutorial, you’ll have a repeatable workflow for generating product visuals that work for social media, e-commerce listings, pitch decks, and ad creative. You’ll know how to prompt for specific lighting conditions, direct virtual camera angles like a cinematographer, and extract high-quality stills from generated video clips. The whole process, once you know it, takes under 90 minutes per product.
What You Need
You’ll need access to at least one of the following: Midjourney V7 (for hero stills and angle exploration), Kling 3.0 (for smooth orbital product video), or Runway Gen-4.5 (for motion and camera control). A free or paid account on any of these gets you started. For extracting stills, any basic video editor works — CapCut, DaVinci Resolve, or even QuickTime on Mac. That’s the whole toolkit.
Step 1 — Nail the Product Description First
The single biggest mistake people make with product mockup prompts is vagueness. “A bottle of perfume on a table” gets you something generic. You need to front-load your prompt with specificity: material, finish, color, shape, and scale cues. Think of it like briefing a photographer — they need to know what they’re shooting before they pick up a light meter.
Start with Midjourney V7 to rapidly explore angles and lighting before committing to video generation. Midjourney is faster and cheaper per iteration, which makes it ideal for the exploration phase. Once you’ve found a look you like, you port that visual language into Kling or Runway for motion.
Matte black cylindrical perfume bottle, brushed aluminum cap, 10cm tall, product photography, floating on white seamless studio background, three-point lighting setup, soft key light from upper left, subtle fill light from right, rim light highlighting metallic cap edge, sharp focus, ultra-detailed surface texture, 8K --ar 1:1 --v 7 --style raw
The --style raw flag in Midjourney V7 tells the model to stay close to photographic realism rather than drifting toward painterly aesthetics. Drop it if you want something more stylized. The --ar 1:1 gives you a square crop perfect for Instagram or product listing thumbnails. Swap to --ar 4:5 for portrait ads or --ar 16:9 for banner placements.
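If you end up writing many of these prompts, it can help to assemble them from named parts so the flags stay consistent. The following is a minimal sketch, not a Midjourney API: the `ProductPrompt` class and its field names are hypothetical, and it only builds the text you would paste into Midjourney.

```python
from dataclasses import dataclass, field


@dataclass
class ProductPrompt:
    """Hypothetical container for the parts of a product-shot prompt."""
    subject: str                      # material, finish, color, shape, scale cues
    background: str
    lighting: list[str] = field(default_factory=list)
    quality: list[str] = field(default_factory=list)
    aspect_ratio: str = "1:1"         # 1:1 square, 4:5 portrait ad, 16:9 banner
    raw_style: bool = True            # set False for a more stylized look

    def render(self) -> str:
        """Join the descriptive parts, then append the Midjourney flags."""
        parts = [self.subject, "product photography", self.background]
        parts += self.lighting + self.quality
        flags = [f"--ar {self.aspect_ratio}", "--v 7"]
        if self.raw_style:
            flags.append("--style raw")
        return ", ".join(parts) + " " + " ".join(flags)


perfume = ProductPrompt(
    subject="Matte black cylindrical perfume bottle, brushed aluminum cap, 10cm tall",
    background="floating on white seamless studio background",
    lighting=["three-point lighting setup", "soft key light from upper left"],
    quality=["sharp focus", "ultra-detailed surface texture"],
)
print(perfume.render())
```

Changing `aspect_ratio` to "4:5" or "16:9" reuses the same description for a portrait ad or a banner without retyping the prompt.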
Pro tip ✅
Run your first prompt four times (one generation batch in Midjourney) before tweaking anything. You’re looking for which lighting direction the model naturally favors for your product shape. That tells you where to push in subsequent iterations rather than fighting the model’s default tendencies.
Step 2 — Control Your Lighting Like a Director
Lighting is where amateur AI product shots fall apart. The model will default to “nice and bright” if you don’t specify, which gives you flat, uninspiring results. Learn four lighting setups and rotate through them depending on product type.
Three-point studio works for most consumer products — clean, professional, neutral. Side-raking light is great for textured surfaces like leather, wood grain, or embossed packaging — it makes surface detail pop dramatically. Backlit with lens flare suits beverages, glass bottles, and anything translucent. Dramatic single-source with deep shadows works for luxury and fragrance — it reads as aspirational rather than cataloguey.
Premium leather wallet, dark tan vegetable-tanned leather, hand-stitched edges, product photography, dark studio background, single strong side-raking light from camera left at 30 degrees, deep shadows on right side, visible grain and texture on leather surface, close-up shot showing stitching detail, no background distractions, cinematic quality --ar 4:5 --v 7 --style raw --q 2
The --q 2 flag increases render quality in Midjourney V7, using more compute per image. Worth it when you’re narrowing in on a final look. For fast iteration in early stages, --q 1 is fine.
Glass gin bottle with botanical label, crystal clear glass, amber liquid, product photography, pure white seamless background, strong backlight from behind creating translucent glow through liquid, gentle front fill light, slight condensation on glass surface, hyper-realistic, sharp label text, marketing campaign quality --ar 2:3 --v 7 --style raw
Pro tip ✅
For glass and liquid products, always add “strong backlight” or “transmitted light” to your prompt. AI models handle translucency beautifully when prompted correctly, but they default to opaque surface lighting if you don’t specify. The difference in output quality is striking.
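One way to keep the four lighting setups handy is a small lookup table of prompt fragments. This is only a sketch: the fragment wording is adapted from the example prompts in this step, and the surface-to-preset mapping and the `lighting_for` helper are hypothetical starting points rather than fixed rules.

```python
# Prompt fragments for the four lighting setups described in this step.
LIGHTING_PRESETS = {
    "three_point": (
        "three-point lighting setup, soft key light from upper left, "
        "subtle fill light from right, rim light"
    ),
    "side_raking": (
        "single strong side-raking light from camera left at 30 degrees, "
        "deep shadows on right side"
    ),
    "backlit": (
        "strong backlight from behind creating translucent glow, "
        "gentle front fill light"
    ),
    "dramatic": "dramatic single-source light, deep shadows, dark studio background",
}

# Hypothetical mapping from product surface or category to a starting preset.
SURFACE_TO_PRESET = {
    "leather": "side_raking",
    "wood": "side_raking",
    "embossed": "side_raking",
    "glass": "backlit",
    "liquid": "backlit",
    "luxury": "dramatic",
    "fragrance": "dramatic",
}


def lighting_for(surface: str) -> str:
    """Return a lighting fragment, falling back to three-point studio."""
    preset = SURFACE_TO_PRESET.get(surface.lower(), "three_point")
    return LIGHTING_PRESETS[preset]


print(lighting_for("glass"))
print(lighting_for("plastic"))  # unknown surfaces get the neutral default
```

Falling back to three-point lighting mirrors the advice above that it works for most consumer products.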
Step 3 — Generate Orbital Video in Kling 3.0
Once you have a Midjourney still you’re happy with, upload it to Kling 3.0 as an image-to-video starting frame. Kling 3.0 is particularly strong at smooth, controlled camera movements around static subjects — which is exactly what you want for a product orbit shot. The key is being explicit about the camera motion in your video prompt while keeping the product itself static.
Slow 360-degree orbital camera movement around the product, camera rotates clockwise at constant speed, product remains perfectly still in center frame, studio lighting maintained throughout rotation, seamless loop, no camera shake, smooth dolly motion, product photography video, professional commercial quality
In Kling 3.0’s interface, set motion intensity to low or medium — high motion intensity will deform the product shape over time, which is not what you want. Duration of 5-10 seconds gives you enough footage to extract multiple unique stills at different angles. Generate at the highest resolution your account tier allows.
Camera slowly pushes in toward the product from medium shot to close-up detail shot, focus pulls to surface texture, studio lighting unchanged, product stationary, cinematic easing in and out, commercial product video, 4K quality
This push-in variant is useful for social video ads where you want a reveal moment — starts wide showing the full product, ends tight on a key detail like a logo, texture, or feature.
Warning ⚠️
Kling 3.0 can drift on small text and logos during motion sequences. If your product has critical label text, generate the motion first, then composite the label back in Canva AI or Photoshop using the still as a reference. Trying to preserve fine text through AI video generation is a losing battle right now.
Step 4 — Camera Angles That Actually Sell Products
Most marketers default to straight-on hero shots and miss half the angles that convert. Here’s a prompt set covering three angles worth generating for every product campaign.
Overhead flat-lay shot, matte black coffee bag with gold foil logo, arranged on white marble surface, product photography, directly above camera angle, soft diffused daylight from upper left, minimal shadow, clean composition, no props, breathing room around product, e-commerce listing quality --ar 1:1 --v 7 --style raw
The overhead flat-lay is non-negotiable for e-commerce. It reads clearly at thumbnail size and works across every platform without cropping issues.
Low angle hero shot, sports water bottle, brushed stainless steel, looking up at bottle from 20 degrees below eye level, dramatic sky gradient background, strong rim lighting, aspirational athletic feel, fitness brand marketing, wide angle slight distortion adding drama --ar 9:16 --v 7 --style raw
Low-angle hero shots make products look imposing and premium. Works especially well for anything in the fitness, tech, or luxury categories. The --ar 9:16 ratio is pre-cropped for Instagram Stories and TikTok.
45-degree three-quarter angle, luxury skincare serum bottle, frosted glass, gold dropper cap, soft pink background, beauty product photography, shot from slightly above and to the right, catchlight reflected in glass, bokeh background, high-end cosmetics catalog quality --ar 4:5 --v 7 --style raw
Pro tip ✅
Generate the same product across three different background color variants in one session — neutral white, dark dramatic, and brand-color background. You’ll almost always use all three in a real campaign across different placements, and it costs you an extra two minutes of prompting now versus hours of reshooting later.
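The three-background habit is easy to script if you keep your prompts in text form. A minimal sketch, assuming a hypothetical `background_variants` helper and illustrative background wording; it only produces prompt strings to paste into Midjourney.

```python
def background_variants(
    base_prompt: str,
    brand_color: str,
    flags: str = "--ar 4:5 --v 7 --style raw",
) -> dict[str, str]:
    """Return one prompt per background: neutral white, dark, and brand color."""
    backgrounds = {
        "neutral": "pure white seamless background",
        "dark": "dark dramatic studio background",
        "brand": f"{brand_color} background",
    }
    return {
        name: f"{base_prompt}, {background} {flags}"
        for name, background in backgrounds.items()
    }


variants = background_variants(
    "luxury skincare serum bottle, frosted glass, gold dropper cap, "
    "beauty product photography",
    brand_color="soft pink",
)
for name, prompt in variants.items():
    print(f"{name}: {prompt}")
```

Keeping the product description identical across the three prompts makes it easier to compare how each background changes the mood.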
Step 5 — Background Control Without Fighting the Model
Background control is the area where most people waste the most time. The trick is to be specific about background texture and color rather than just saying “simple background” — the model’s interpretation of “simple” varies wildly.
Wireless earbuds in charging case, matte white, product photography, pure #FFFFFF white seamless paper backdrop, no texture, no gradient, no shadows, clean e-commerce white background, shot for Amazon listing, front-facing, perfectly centered --ar 1:1 --v 7 --style raw
Specifying the hex code for white (#FFFFFF) in your prompt is a useful trick — it anchors the model toward a cleaner, more literal interpretation. For dark backgrounds, #0A0A0A gets you closer to true studio black than just typing “black.”
Scented candle in amber glass jar, cream-colored wax, minimalist product photography, textured warm beige linen surface, shallow depth of field, background bokeh, lifestyle product photography, Scandinavian aesthetic, natural side window light, warm color temperature --ar 4:5 --v 7
For lifestyle product shots (as opposed to clean e-commerce), you want to introduce surface texture and material context. Linen, marble, slate, wood grain — name the material and the model handles it well.
Note 💡
When you need a completely clean background for compositing — like dropping the product into a brand template in Canva or Figma — generate on pure white first, then use Canva AI’s background remover or Adobe’s Generative Fill to place the product wherever you need it. Trying to generate on a complex custom background from scratch is slower than compositing in post.
Step 6 — Extract Stills from Video for Marketing
If you generated an orbital video in Kling or Runway, you now have a clip full of product angles you didn’t have to individually prompt. Extracting the best frames takes about five minutes. In DaVinci Resolve (free), open the clip on the Color page, scrub to the frame you want, then right-click the viewer and choose Grab Still; the still is saved to the Gallery, where you can right-click it and export it as an image file. In CapCut, the grab frame button is in the export menu. Aim for frames where the lighting creates a strong catchlight on the product surface, since these tend to be the most visually striking stills.
For a standard product campaign, a 10-second orbital clip typically yields six to eight usable still angles, which covers hero image, three social formats, and two ad variants. That’s a full campaign asset set from a single video generation run.
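If you would rather pull evenly spaced frames than scrub by hand, the timestamps are simple arithmetic. The sketch below assumes the orbit covers a full 360 degrees at constant speed, that ffmpeg is installed, and that `orbit.mp4` is a placeholder filename; it only prints the commands rather than running them.

```python
def orbit_timestamps(duration_s: float, n_stills: int) -> list[tuple[float, float]]:
    """Return (timestamp in seconds, approximate orbit angle in degrees) pairs."""
    step = duration_s / n_stills
    return [
        (round(i * step, 3), round(360 * i / n_stills, 1))
        for i in range(n_stills)
    ]


def ffmpeg_command(clip: str, timestamp_s: float, out_path: str) -> list[str]:
    """Build an ffmpeg call that seeks to a timestamp and writes one frame."""
    return [
        "ffmpeg", "-ss", f"{timestamp_s:.3f}", "-i", clip,
        "-frames:v", "1", out_path,
    ]


# Eight stills from a 10-second clip: one frame every 1.25 s, 45 degrees apart.
for timestamp, angle in orbit_timestamps(10, 8):
    out_file = f"still_{int(angle):03d}deg.png"
    print(" ".join(ffmpeg_command("orbit.mp4", timestamp, out_file)))
```

Evenly spaced frames are a starting point only; you will still want to nudge individual timestamps toward the frames with the strongest catchlight.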
Pro tip ✅
After extracting stills from video, run them through Midjourney’s image upscale feature or Runway’s upscaler to bring them up to print-ready resolution. Video frames at native resolution are usually fine for digital, but if a client needs files for large-format print, the upscale step takes two minutes and saves an awkward conversation about DPI.
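To know how much upscaling a print job needs, multiply the print size in inches by the target DPI and compare that with the frame size. A minimal sketch: the 1920x1080 frame size and the 300 DPI target are assumptions for illustration, so substitute your clip’s actual resolution and your printer’s requirement.

```python
import math


def required_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions needed to print at the given size and DPI."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


def upscale_factor(frame_px: tuple[int, int], target_px: tuple[int, int]) -> float:
    """Smallest uniform scale that makes the frame cover the target size."""
    return max(target_px[0] / frame_px[0], target_px[1] / frame_px[1])


frame = (1920, 1080)                    # assumed video frame size
target = required_pixels(16, 9, 300)    # 16 x 9 inch print at 300 DPI
print(target)                           # (4800, 2700)
print(upscale_factor(frame, target))    # 2.5
```

A factor at or below 1.0 means the frame already has enough pixels and the upscale step can be skipped.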
Putting It All Together — A Real Workflow
Here’s the sequence that works in practice: start with ten rapid Midjourney iterations to find your lighting and angle direction (20 minutes), pick your two best results, use one as an image-to-video starting frame in Kling 3.0 for an orbital shot (10 minutes of generation time), extract eight stills from the resulting clip, then use the second Midjourney result as your hero image. Run both through Canva AI to drop them into your brand template with your actual product copy. Total time: under 90 minutes for a complete product visual set.
Is it perfect? No. Fine text on labels still needs manual cleanup, and very complex geometric shapes occasionally deform under camera motion. But for roughly 80% of product categories (bottles, boxes, bags, devices, accessories) this workflow produces results that are hard to tell apart from a half-day product photography studio shoot. The other 20% of cases is why 3D artists still have jobs. For now.
Go Ship Something
The barrier to photorealistic product visuals dropped to basically zero in the last eighteen months, and most people are still acting like it didn’t happen. If you’re still waiting on a photographer’s availability or a 3D artist’s quote for assets you need this week, you’re leaving time on the table. Learn these six prompt patterns, run through the workflow once with a real product, and you’ll have a repeatable system you’ll use constantly. The tools are good enough. The limiting factor now is knowing how to ask.