Video-driven Neural Physically-based Facial Asset for Production
Description
Production-level workflows for producing convincing 3D dynamic human faces have long relied on an assortment of labor-intensive tools for geometry and texture generation, motion capture and rigging, and expression synthesis. Recent neural approaches automate individual components, but their latent representations cannot provide artists with the explicit controls offered by conventional tools. In this paper, we present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets. The two key components are well-structured latent spaces, enabled by dense temporal sampling from videos, and explicit facial expression controls that regulate these latent spaces. For data collection, we construct a hybrid multiview-photometric capture stage, coupled with ultra-fast video cameras, to obtain raw 3D facial assets. We then model the facial expression, geometry and physically-based textures using separate VAEs, and impose a global multi-layer perceptron (MLP)-based expression mapping across the latent spaces of the respective networks. This preserves characteristics across the respective attributes while maintaining explicit control over facial geometry and texture generation. We also introduce the idea of modeling the delta information as wrinkle maps for the physically-based textures in our texture VAE, achieving high-quality 4K rendering of dynamic textures. We demonstrate our approach in high-fidelity performer-specific facial capture and in cross-identity facial motion transfer and retargeting. In addition, our multi-VAE-based neural asset, along with fast adaptation schemes, can be deployed to handle in-the-wild videos. Finally, we demonstrate the utility of our explicit facial disentangling strategy through a range of physically-based editing results, such as geometry and material editing and wrinkle transfer, with high realism. 
Comprehensive experiments show that our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
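To make two ideas from the abstract more concrete, here is a minimal, hypothetical sketch in plain Python: a single global MLP that maps one expression code to separate latent codes for a geometry decoder and a texture decoder, and a dynamic texture formed as a neutral texture plus a weighted wrinkle (delta) map. It is not the authors' implementation. The VAEs themselves are omitted, the weights are random and untrained, and all names (`ExpressionMapping`, `apply_wrinkle_delta`), layer sizes and the clamping to [0, 1] are illustrative assumptions.

```python
import math
import random

# Hypothetical sketch only: names, sizes and weights are assumptions,
# not the architecture or parameters described in the paper.

def relu(x):
    return [max(0.0, v) for v in x]

def linear(weights, bias, x):
    """Dense layer; weights is a list of rows with shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def random_layer(rng, in_dim, out_dim):
    """Untrained layer with uniform weights scaled by 1/sqrt(in_dim) and zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

class ExpressionMapping:
    """Shared hidden layer followed by one output head per attribute latent space."""

    def __init__(self, expr_dim, hidden_dim, geo_dim, tex_dim, seed=0):
        rng = random.Random(seed)
        self.hidden = random_layer(rng, expr_dim, hidden_dim)
        self.geo_head = random_layer(rng, hidden_dim, geo_dim)
        self.tex_head = random_layer(rng, hidden_dim, tex_dim)

    def __call__(self, expr_code):
        h = relu(linear(*self.hidden, expr_code))
        # The same expression features drive both latent spaces.
        return linear(*self.geo_head, h), linear(*self.tex_head, h)

def apply_wrinkle_delta(neutral, delta, weight=1.0):
    """Dynamic texture = neutral texture + weight * wrinkle delta, clamped to [0, 1]."""
    return [
        [min(1.0, max(0.0, n + weight * d)) for n, d in zip(n_row, d_row)]
        for n_row, d_row in zip(neutral, delta)
    ]

if __name__ == "__main__":
    mapping = ExpressionMapping(expr_dim=4, hidden_dim=8, geo_dim=3, tex_dim=5)
    geo_z, tex_z = mapping([0.1, -0.3, 0.5, 0.0])
    print(len(geo_z), len(tex_z))  # 3 5
    neutral = [[0.5, 0.5], [0.5, 0.5]]
    wrinkles = [[0.25, -0.125], [0.75, 0.0]]
    print(apply_wrinkle_delta(neutral, wrinkles))  # [[0.75, 0.375], [1.0, 0.5]]
```

Sharing one hidden layer between the two heads is just one simple way to let a single expression code drive several latent spaces at once. Storing wrinkles as a delta on a neutral texture means the same map can be scaled or transferred independently of the base material.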
Event Type
Technical Papers
Time
Tuesday, 6 December 2022, 10:00am - 12:00pm KST