Clothing plays a fundamental role in digital humans. Current approaches to animating 3D garments are mostly based on realistic physics simulation; however, they typically suffer from two main issues: high computational run-time cost, which hinders their deployment; and a simulation-to-real gap, which impedes the synthesis of specific real-world cloth samples. To circumvent both issues we propose PERGAMO, a data-driven approach to learn a deformable model for 3D garments from monocular images. To this end, we first introduce a novel method to reconstruct the 3D geometry of garments from a single image, and use it to build a dataset of clothing from monocular videos. We use these 3D reconstructions to train a regression model that accurately predicts how the garment deforms as a function of the underlying body pose. We show that our method is capable of producing garment animations that match the real-world behaviour, and that it generalizes to unseen body motions extracted from motion capture datasets.
@article{casado2022pergamo,
    journal = {Computer Graphics Forum (Proc. of SCA)},
    title = {{PERGAMO}: Personalized 3D Garments from Monocular Video},
    author = {Casado-Elvira, Andrés and Comino Trinidad, Marc and Casas, Dan},
    year = {2022}
}
We introduce an approach to learn a deformation model for 3D garments from a single monocular video.
PERGAMO is based on two key features: it is learned from casual real-world images, hence there is no
simulation-to-real gap and no need for multi-camera setups; and it is highly efficient to evaluate,
since at inference time a shallow neural network directly outputs garment deformations.
All in all, our main contribution is a 3D clothing reconstruction pipeline that recovers an explicit
garment layer from monocular RGB input alone. These reconstructions enable us to train a
data-driven model that infers how a specific garment deforms.
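For concreteness, the snippet below sketches what such a shallow pose-to-deformation regressor could
look like in PyTorch. The input size (SMPL-style pose parameters), the hidden-layer widths, and the
per-vertex displacement output over a fixed garment template are illustrative assumptions, not the
exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class GarmentRegressor(nn.Module):
    """Shallow MLP mapping body pose to per-vertex garment displacements.

    Hypothetical sketch: the pose is assumed to be SMPL-style joint rotations
    (24 joints x 3 axis-angle values) and the output is one 3D displacement
    per vertex of a fixed garment template mesh.
    """

    def __init__(self, pose_dim: int = 72, num_vertices: int = 5000, hidden: int = 256):
        super().__init__()
        self.num_vertices = num_vertices
        self.mlp = nn.Sequential(
            nn.Linear(pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_vertices * 3),
        )

    def forward(self, pose: torch.Tensor) -> torch.Tensor:
        # (batch, pose_dim) -> (batch, num_vertices, 3) displacements
        return self.mlp(pose).view(-1, self.num_vertices, 3)


# The deformed garment is the rest-pose template plus the predicted displacements.
regressor = GarmentRegressor()
pose = torch.zeros(1, 72)             # rest pose, for illustration only
template = torch.zeros(1, 5000, 3)    # placeholder template vertices
deformed_vertices = template + regressor(pose)
```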
To formulate PERGAMO, we use a novel two-stage approach where we first build a dataset by
reconstructing the 3D geometry of deformed garments, and then learn a nonlinear regressor from the
reconstructed meshes. More specifically, we first extract human-related features from the input
images, such as body segmentation, body pose, and body normals, and leverage them to deform a mesh
template into fine-scale detailed clothing using a differentiable rendering optimization.
Then, we use the reconstructed garments as ground truth data to train a 3D garment deformation
regressor. We show that the learned model outputs pose-dependent garment surface details, such as
folds and wrinkles, that closely match the real-world behaviour of the garment.
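As a rough illustration of this second stage, the sketch below fits a regressor of the kind shown
above directly to the reconstructed vertex positions. The random placeholder data, batch size, and
plain per-vertex L2 loss are assumptions for the example; only the overall idea of supervising the
network with the reconstructed meshes follows the pipeline described here.

```python
import torch
import torch.nn as nn

pose_dim, num_vertices = 72, 5000          # assumed pose size and garment resolution

# Placeholder training set: one pose vector and one reconstructed garment per frame.
poses = torch.randn(500, pose_dim)
gt_vertices = torch.randn(500, num_vertices, 3)
template = torch.zeros(num_vertices, 3)    # garment template in rest pose (placeholder)

regressor = nn.Sequential(
    nn.Linear(pose_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, num_vertices * 3),
)
optimizer = torch.optim.Adam(regressor.parameters(), lr=1e-4)

for epoch in range(10):
    for i in range(0, len(poses), 16):
        pose_batch = poses[i:i + 16]
        target = gt_vertices[i:i + 16]
        pred = template + regressor(pose_batch).view(-1, num_vertices, 3)
        loss = ((pred - target) ** 2).mean()   # supervise directly with the reconstructions
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```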
Below we depict an overview of our 3D garment reconstruction approach. Starting from an RGB frame,
we first extract human semantic and 3D information. We then fit a coarse garment template to the
estimated body normals, and finally we add fine-scale wrinkles by optimizing per-vertex displacements
using differentiable rendering.
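The fine-scale refinement step can be pictured as a short differentiable-rendering loop, sketched
below with PyTorch3D. A soft silhouette term against the garment segmentation mask stands in for the
normal-based objective, and the ico-sphere template and random target mask are placeholders; only the
overall pattern (optimizing per-vertex offsets through a differentiable renderer with a smoothness
regularizer) reflects the pipeline described above.

```python
import math
import torch
from pytorch3d.utils import ico_sphere
from pytorch3d.loss import mesh_laplacian_smoothing
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRenderer, MeshRasterizer,
    SoftSilhouetteShader, BlendParams, look_at_view_transform,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

coarse_garment = ico_sphere(4, device)                # placeholder for the fitted coarse template
target_mask = torch.rand(1, 128, 128, device=device)  # placeholder garment segmentation mask

# Differentiable silhouette renderer with a fixed (placeholder) camera.
R, T = look_at_view_transform(dist=2.7, elev=0, azim=0)
blend = BlendParams(sigma=1e-4, gamma=1e-4)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(
        cameras=FoVPerspectiveCameras(device=device, R=R, T=T),
        raster_settings=RasterizationSettings(
            image_size=128,
            blur_radius=math.log(1.0 / 1e-4 - 1.0) * blend.sigma,
            faces_per_pixel=50,
        ),
    ),
    shader=SoftSilhouetteShader(blend_params=blend),
)

# Per-vertex displacements are the only free variables of the optimization.
offsets = torch.zeros_like(coarse_garment.verts_packed(), requires_grad=True)
optimizer = torch.optim.Adam([offsets], lr=1e-3)

for _ in range(200):
    optimizer.zero_grad()
    refined = coarse_garment.offset_verts(offsets)
    silhouette = renderer(refined)[..., 3]             # alpha channel = soft silhouette
    data_term = ((silhouette - target_mask) ** 2).mean()
    smooth_term = mesh_laplacian_smoothing(refined)    # keeps the added wrinkles well-behaved
    loss = data_term + 0.1 * smooth_term
    loss.backward()
    optimizer.step()
```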
Below we show a visualization of the optimization process used to reconstruct garments.
The following animations show some video clips taken with a mobile phone camera side-by-side with
the corresponding reconstructions produced by our method.
Below are some comparisons against MonoClothCap, another monocular clothing reconstruction method.
After creating a dataset of reconstructions, we train a regressor to predict the garment deformation
for a given body pose. Here we show some examples for test motion sequences from the AMASS dataset.
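For completeness, the short sketch below shows how a trained regressor could be run over such a test
sequence. The file names, the saved-model format, and the assumption that the AMASS poses have
already been converted to the regressor's input layout are all hypothetical.

```python
import numpy as np
import torch

# Hypothetical artifacts: "regressor.pt" holds a trained network like the sketches above,
# "template.pt" the garment rest-pose vertices, and "amass_sequence.npz" a preprocessed motion
# with one pose vector per frame already in the regressor's input format.
regressor = torch.load("regressor.pt")
template = torch.load("template.pt")                                       # (V, 3)
poses = torch.from_numpy(np.load("amass_sequence.npz")["poses"]).float()   # (frames, pose_dim)

regressor.eval()
with torch.no_grad():
    animated = template + regressor(poses).view(len(poses), -1, 3)         # (frames, V, 3)
# Each frame of `animated` is a deformed garment mesh that can be rendered over the posed body.
```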