We present a self-supervised method to learn dynamic 3D deformations of garments worn by parametric human bodies. State-of-the-art data-driven approaches to model 3D garment deformations are trained using supervised strategies that require large datasets, usually obtained by expensive physics-based simulation methods or professional multi-camera capture setups. In contrast, we propose a new training scheme that removes the need for ground-truth samples, enabling self-supervised training of dynamic 3D garment deformations. Our key contribution is to realize that physics-based deformation models, traditionally solved on a frame-by-frame basis by implicit integrators, can be recast as an optimization problem. We leverage this optimization-based scheme to formulate a set of physics-based loss terms that can be used to train neural networks without precomputing ground-truth data. This allows us to learn models for interactive garments, including dynamic deformations and fine wrinkles, with a two-orders-of-magnitude speed-up in training time compared to state-of-the-art supervised methods.
@inproceedings{santesteban2022snug,
    title     = {{SNUG}: {S}elf-{S}upervised {N}eural {D}ynamic {G}arments},
    author    = {Santesteban, Igor and Otaduy, Miguel A. and Casas, Dan},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2022}
}
We present a self-supervised method to learn dynamic deformations of 3D garments worn by parametric human bodies. The key to our success is realizing that the solution to the equations of motion used in current physics-based methods can also be formulated as an optimization problem. More specifically, we show that the per-time-step numerical integration scheme used to update vertex positions (e.g., backward Euler) in physics-based simulators can be recast as an optimization problem, and demonstrate that the objective function of this minimization can become the central ingredient of a self-supervised learning scheme.
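As a rough illustration of this idea, a backward Euler step can be written as the minimizer of an incremental potential (an inertia term plus the internal energy of the garment), and that scalar can be back-propagated as a training loss. The PyTorch-style sketch below uses assumed, illustrative names (predicted vertex positions `x`, previous state `x_prev` and `v_prev`, lumped vertex masses `m`, a garment energy function `elastic_energy`, time step `dt`); it is a minimal sketch of the general principle, not the authors' implementation:

```python
import torch

def inertial_term(x, x_prev, v_prev, m, dt, gravity=(0.0, -9.81, 0.0)):
    # Positions the vertices would reach under free motion (no internal forces),
    # with gravity folded in as a constant external acceleration.
    g = torch.tensor(gravity, dtype=x.dtype, device=x.device)
    x_hat = x_prev + dt * v_prev + dt * dt * g
    # Inertia part of the incremental potential:
    # (1 / (2 dt^2)) * sum_i m_i * ||x_i - x_hat_i||^2
    return (0.5 / dt**2) * (m * (x - x_hat).pow(2).sum(dim=-1)).sum()

def physics_loss(x, x_prev, v_prev, m, dt, elastic_energy):
    # Backward Euler recast as minimization: the x that minimizes
    # inertia + internal energy approximates the implicit-integration step,
    # so this scalar can serve directly as a self-supervised loss.
    return inertial_term(x, x_prev, v_prev, m, dt) + elastic_energy(x)
```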
We show that, when trained using the same motions and the same architecture, direct supervision at the vertex level leads to smoothing artifacts. In contrast, our self-supervised physics-based loss is able to learn more realistic details, as shown in this test sequence.
In the video below we show a qualitative comparison of SNUG with state-of-the-art methods. SNUG generalizes well to unseen body shapes and motions, and produces detailed folds and wrinkles. SNUG results are at least on par with the realism of supervised methods that require large datasets (e.g., Santesteban et al. or TailorNet), and close to state-of-the-art offline physics-based simulation.
Furthermore, we also compared SNUG to PBNS, the only existing self-supervised method for garment deformations. In the video below we demonstrate that SNUG is capable of learning more realistic deformations, mainly due to two contributions: first, PBNS enforces static physical consistency, while SNUG formulates a full dynamic simulation as a self-supervised loss; and second, we formulate SNUG losses using the Saint Venant-Kirchhoff (StVK) material model, in contrast to simpler alternatives such as mass-spring models that lead to less expressive deformations.
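To make the StVK choice concrete, here is a minimal sketch of a membrane StVK energy for a triangle mesh; all names (rest-pose inverse edge matrices `Dm_inv`, per-triangle rest areas `areas`, Lamé parameters `lame_mu` and `lame_lambda`, cloth `thickness`) are illustrative assumptions and not taken from the paper's code:

```python
import torch

def stvk_membrane_energy(x, faces, Dm_inv, areas, lame_mu, lame_lambda, thickness=1e-3):
    # Deformed edge matrices Ds (3x2 per triangle) and deformation gradient F = Ds @ Dm_inv.
    v0, v1, v2 = x[faces[:, 0]], x[faces[:, 1]], x[faces[:, 2]]
    Ds = torch.stack((v1 - v0, v2 - v0), dim=-1)   # (T, 3, 2)
    F = Ds @ Dm_inv                                # (T, 3, 2)
    # Green strain E = 1/2 (F^T F - I), a 2x2 tensor per triangle.
    I = torch.eye(2, dtype=x.dtype, device=x.device)
    E = 0.5 * (F.transpose(-1, -2) @ F - I)
    # StVK energy density: mu * ||E||_F^2 + (lambda / 2) * tr(E)^2.
    trE = E.diagonal(dim1=-2, dim2=-1).sum(-1)
    psi = lame_mu * (E * E).sum((-1, -2)) + 0.5 * lame_lambda * trE**2
    # Integrate the density over the garment surface (rest area times thickness).
    return (areas * thickness * psi).sum()
```

Unlike per-edge mass-spring penalties, this energy couples stretching along both in-plane directions and shearing through the Green strain, which is what allows richer, more expressive deformations.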
Our method runs at interactive frame rates. Below we show a live recording of our demo, using test motion sequences from the AMASS dataset. Notice how our approach enables us to interactively manipulate the shape parameters of the subject, while producing highly realistic garment deformations, without using any supervision at train time.