A working headline has been floating around claiming that Kling 3.1 introduced ‘Depth-Aware Rendering’ that outperforms Runway Gen-4.5 on spatial consistency, complete with a stat putting the reduction in temporal glitches at 92%. It sounds compelling. It’s also not verifiable. The research team at Promptyze went looking for the Kling blog post, the independent test results, and any official documentation of Kling 3.1. None of it exists in any publicly accessible source as of early March 2026.
The latest confirmed Kling release is version 2.6, developed by Chinese tech company Kuaishou. There is no public announcement of Kling 3.1, no technical specification for a ‘Depth-Aware Rendering’ system, and no benchmark methodology behind that 92% figure. That number appeared without a source, and it still doesn’t have one. Publishing it as fact would be doing readers a disservice — so we’re not going to.
The underlying topic — spatial consistency in AI video generation — is genuinely one of the most important battlegrounds in the field right now. Camera movements that cause objects to warp, duplicate, or dissolve mid-shot are among the most common failure modes in tools like Kling, Runway Gen-4.5, and Sora. Anyone who has tried to push these models through a slow dolly shot or a parallax pan knows the feeling: great first frame, chaotic everything else.
Runway has documented significant investment in spatial coherence for Gen-4.5, and it shows in user tests circulating on YouTube and X. Kling 2.6 has its own strengths, particularly motion smoothness for certain scene types, but neither tool has solved the depth consistency problem cleanly. The idea that a depth map conditioning input could address this is technically plausible and genuinely interesting. That’s probably why the claim spread. Plausible plus specific-sounding number equals believable headline.

To be clear about the concept itself: depth map conditioning in video generation means feeding the model explicit per-pixel distance information — essentially a grayscale map where brightness encodes how far each element is from the camera. If a model understands depth at that level, it can theoretically maintain object scale and occlusion correctly as the virtual camera moves, rather than guessing from texture and context alone. Several research papers have explored this approach, and it’s a credible direction for the industry. The claim in the working brief just isn’t backed by any released product doing it right now.
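For readers who want the concrete version of that idea, here’s a minimal sketch of the preprocessing step a depth-conditioned pipeline would need: turning a per-pixel depth estimate into the normalized grayscale control image described above. Everything in it, including the function name, the near/far clipping range, and the brighter-means-closer convention, is an illustrative assumption rather than anything from a Kling or Runway spec.

```python
import numpy as np
from PIL import Image

def depth_to_conditioning_map(depth_m: np.ndarray,
                              near: float = 0.1,
                              far: float = 20.0) -> Image.Image:
    """Convert a per-pixel depth map (in meters) into an 8-bit grayscale
    conditioning image, where brighter pixels are closer to the camera.

    The near/far range and the brighter-means-closer convention are
    illustrative assumptions; a real depth-conditioned model would
    document its own expected encoding.
    """
    # Clamp depth to the working range so outliers don't crush contrast.
    depth = np.clip(depth_m, near, far)
    # Normalize to [0, 1], then invert so nearby pixels are bright.
    normalized = (depth - near) / (far - near)
    grayscale = (1.0 - normalized) * 255.0
    return Image.fromarray(grayscale.astype(np.uint8), mode="L")

# Example: a synthetic depth ramp standing in for a real depth estimate
# (from a monocular depth model or a renderer's z-buffer).
fake_depth = np.linspace(0.5, 15.0, 512 * 512).reshape(512, 512)
depth_to_conditioning_map(fake_depth).save("depth_condition.png")
```

In practice the depth map would come from a monocular depth estimator or a renderer’s z-buffer; the point is simply that the conditioning signal is a plain image the model can attend to alongside the text prompt.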
If Kling or Kuaishou does ship something like this, here’s what a well-constructed test prompt for spatial consistency would look like — useful for benchmarking any video model that claims to handle camera movement well:
A ceramic coffee mug on a wooden table, slow cinematic push-in from wide to close-up, natural window light from the left, shallow depth of field, 4K, no camera shake, objects maintain consistent scale and position throughout the move
That’s the kind of scene that breaks most current models. Static foreground object, deliberate camera movement, depth cues built into the environment. Run it on Kling 2.6 and Runway Gen-4.5 side by side and you’ll see exactly where each tool struggles. A second useful test for parallax specifically:
A city street with pedestrians in the foreground and buildings receding into the background, slow lateral tracking shot from left to right, realistic parallax movement between foreground and background layers, cinematic 24fps
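Whichever prompts you use, a side-by-side comparison only means something if the clips are scored the same way. Below is a minimal sketch of one possible proxy metric, mean frame-to-frame structural similarity, with placeholder file names for clips you generate yourself. It’s deliberately crude: it conflates camera motion with actual artifacts, which is exactly why any real benchmark, including the one supposedly behind that 92% figure, has to publish its methodology.

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def temporal_consistency_score(video_path: str) -> float:
    """Mean SSIM between consecutive grayscale frames.

    A crude proxy: sharp frame-to-frame drops in SSIM often coincide with
    warping, duplication, or popping artifacts. Illustrative only, not any
    vendor's published benchmark.
    """
    cap = cv2.VideoCapture(video_path)
    prev, scores = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            scores.append(ssim(prev, gray))
        prev = gray
    cap.release()
    return float(np.mean(scores)) if scores else 0.0

# Placeholder file names: clips you generate yourself from the prompts above.
for clip in ["kling26_mug_pushin.mp4", "gen45_mug_pushin.mp4"]:
    print(clip, round(temporal_consistency_score(clip), 4))
```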

Kuaishou has been moving fast on Kling — the jump from earlier versions to 2.6 was real and noticeable. If a 3.x release is coming with genuine depth conditioning, it would be worth watching closely. But the announcement needs to come from Kuaishou, not from a spec sheet that nobody can locate. Promptyze will cover it when there’s something to cover. Until then, the 92% stat lives in the same drawer as every other AI benchmark that appeared without a methodology attached — which is to say, the bin.
