Tencent has launched Voyager, a new AI model that can turn a single photo into a three-dimensional scene. The model generates an RGB video and depth information simultaneously, offering an approach to 3D reconstruction that does not rely on traditional modeling techniques. However, it requires a significant amount of hardware to run effectively.
How Voyager Works

The HunyuanWorld-Voyager model takes a single image and a user-defined camera path, such as a pan, tilt, or dolly-in motion, and generates a short video. It produces the video together with a matching depth map, so that the spatial relationships between objects in the scene remain consistent. The system maintains geometric coherence by comparing each new frame with the previous content using 3D point clouds. However, distortions can still occur with long or complex camera movements, particularly 360-degree rotations.
Tencent’s technical report highlights an additional component called the “world cache,” which stores data from each new frame. This data can be reused in subsequent frames, which helps preserve geometric consistency across videos that are several minutes long.
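To see why a per-frame depth map makes point-cloud comparison possible, here is a minimal sketch of back-projecting depth pixels into 3D points with a standard pinhole camera model. It is illustrative only and is not Voyager's code: the function name `backproject` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for this example.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map (list of rows, one depth value per pixel)
    into a list of (x, y, z) points using a pinhole camera model.

    fx, fy: focal lengths in pixels; cx, cy: principal point in pixels.
    All names here are hypothetical and chosen for illustration.
    """
    points = []
    for v, row in enumerate(depth):        # v = pixel row
        for u, z in enumerate(row):        # u = pixel column
            if z <= 0:                     # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Tiny 2x2 depth map; one pixel has no valid depth.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
cloud = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud)
```

Points accumulated this way from earlier frames could, in principle, be stored and compared against new frames, which is the general idea behind a cache of scene geometry.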
Training and Requirements

Voyager was trained on a large dataset of more than 100,000 real and synthetic video clips, including scenes from Unreal Engine environments. This extensive training helped the model learn a variety of camera movements. The training process used an automated depth estimation method, eliminating the need for manual labeling.
While technologically powerful, Voyager has high hardware requirements. Running the model at 540p resolution requires 60 GB of GPU memory, and optimal results need 80 GB. The system supports multi-GPU scaling, with an 8-GPU setup running roughly 6.7 times faster than a single GPU. The model weights have been made available to researchers on Hugging Face.
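As a quick reading of the scaling figure, the sketch below computes parallel efficiency, meaning the achieved speedup divided by the number of GPUs. Only the 6.7x and 8-GPU numbers come from the report; the function name `parallel_efficiency` is an assumption made for this example.

```python
def parallel_efficiency(speedup, n_gpus):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / n_gpus


# Reported figures: about 6.7x faster on 8 GPUs than on one.
eff = parallel_efficiency(6.7, 8)
print(f"Parallel efficiency: {eff:.2%}")
```

An efficiency of roughly 84 percent means each added GPU contributes somewhat less than a full GPU's worth of speed.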
Voyager vs. Other AI Models
Voyager’s approach sets it apart from existing video generation models. Unlike OpenAI’s Sora, which focuses on visual realism, Voyager prioritizes geometric consistency between frames. This focus helped it achieve a top score of 77.62 on Stanford’s WorldScore benchmark, outperforming competitors such as WonderWorld and CogVideoX-I2V. However, it still has some limitations in precise camera control.
Additionally, there are some licensing restrictions for Voyager. Its use is prohibited in the European Union, the United Kingdom, and South Korea. Commercial applications serving more than 100 million active users require an additional agreement.