[Figure: input image and its monocular 3D reconstruction]
Farm3D learns an articulated object category entirely from "free" virtual supervision provided by a 2D diffusion-based image generator.
We propose a framework that uses an image generator, such as Stable Diffusion, to produce the training data for learning a reconstruction network from scratch.
The same diffusion model also serves as a scoring mechanism that further improves learning.
Our method yields a monocular reconstruction network capable of generating controllable 3D assets from a single input image, whether real or generated, in a matter of seconds.
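
As a rough illustration of the data flow described above, the following Python sketch wires together the two roles of the generator: a source of training images and a scorer of the network's output. It is a toy under stated assumptions, not the Farm3D implementation. All names (`generate`, `reconstruct`, `render`, `critic`, `train`) are hypothetical, an "image" is just a short list of floats, the "network" has a single scalar weight, and the critic is a simple range penalty standing in for a diffusion-based score.

```python
import random
from typing import List

Image = List[float]   # toy stand-in for an image
Shape = List[float]   # toy stand-in for a 3D asset


def generate(seed: int) -> Image:
    """Hypothetical stand-in for the image generator: one virtual training image."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(4)]


def reconstruct(image: Image, w: float) -> Shape:
    """Toy reconstruction 'network' with a single learnable weight w."""
    return [w * p for p in image]


def render(shape: Shape) -> Image:
    """Toy renderer: maps the predicted shape back to image space."""
    return list(shape)


def critic(image: Image) -> float:
    """Toy scorer standing in for the diffusion model: penalizes out-of-range pixels."""
    return sum(max(0.0, abs(p) - 1.0) ** 2 for p in image)


def loss(w: float, image: Image, critic_weight: float) -> float:
    """Reconstruction error on the generated image plus the weighted critic score."""
    rendered = render(reconstruct(image, w))
    recon = sum((a - b) ** 2 for a, b in zip(rendered, image)) / len(image)
    return recon + critic_weight * critic(rendered)


def train(steps: int = 200, lr: float = 0.05,
          critic_weight: float = 0.1, eps: float = 1e-4) -> float:
    """Train from scratch on generated images only, using finite-difference gradients."""
    w = 0.0
    for step in range(steps):
        image = generate(step)  # virtual supervision: no real images are used
        grad = (loss(w + eps, image, critic_weight)
                - loss(w - eps, image, critic_weight)) / (2 * eps)
        w -= lr * grad
    return w


w = train()
print(f"learned weight: {w:.3f}")
```

In this toy setting the weight moves toward the value at which the rendered output matches the generated image; in a real system the generator, network, renderer, and critic would each be far richer, but the loop would have the same shape.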