StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal

1FNii, The Chinese University of Hong Kong, Shenzhen, China
2SSE, The Chinese University of Hong Kong, Shenzhen, China
3Alibaba Group, China
4Max Planck Institute for Intelligent Systems, Germany
*Equal contribution, Corresponding Author
SIGGRAPH Asia 2024 (Journal Track)

This work addresses the challenge of high-quality surface normal estimation from monocular colored inputs (i.e., images and videos), a field that has recently been revolutionized by repurposing diffusion priors. However, these attempts still struggle with high-variance inference, which conflicts with the deterministic nature of the Image2Normal task. Our method, StableNormal, aims to reduce the inference variance, thus producing “stable” and “sharp” normal estimates, even under challenging imaging conditions such as extreme lighting, motion/defocus blur, and low-quality/compressed images. It is also robust against transparent and reflective surfaces, as well as cluttered scenes with numerous objects. Specifically, StableNormal employs a coarse-to-fine strategy: a one-step normal estimator (YOSO) first establishes a reliable but relatively coarse initial normal, and a semantic-guided refinement process (SG-DRN) then refines it to recover geometric details. The effectiveness of StableNormal is demonstrated through competitive performance on standard datasets such as DIODE-indoor, iBims, ScanNetV2, and NYUv2, and through its ability to enhance downstream tasks such as surface reconstruction and normal enhancement. These results show that StableNormal retains both the “stability” and “sharpness” necessary for accurate normal estimation, and that it is a step toward repurposing diffusion priors for deterministic estimation. To make this broadly accessible, code and models will be publicly available.
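To make the notion of "inference variance" concrete, the following is a minimal sketch of one way to quantify it for a single pixel: run a stochastic estimator several times and measure the mean angular deviation of the sampled normals from their normalized mean. The function names, the Gaussian noise model, and the noise levels are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def angular_spread_deg(samples):
    """Mean angle (degrees) between each sampled normal and the normalized mean."""
    mean = normalize(tuple(sum(s[i] for s in samples) / len(samples) for i in range(3)))
    angles = []
    for s in samples:
        cos = sum(a * b for a, b in zip(normalize(s), mean))
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


def noisy_estimator(true_normal, noise_std, rng):
    """Hypothetical stand-in for one stochastic inference run (not a real model)."""
    return normalize(tuple(c + rng.gauss(0.0, noise_std) for c in true_normal))


rng = random.Random(0)
true_n = (0.0, 0.0, 1.0)

# Assumed noise levels: a high-variance estimator versus a low-variance one.
high = [noisy_estimator(true_n, 0.5, rng) for _ in range(200)]
low = [noisy_estimator(true_n, 0.05, rng) for _ in range(200)]

spread_high = angular_spread_deg(high)
spread_low = angular_spread_deg(low)
print(f"high-variance spread: {spread_high:.2f} deg")
print(f"low-variance spread:  {spread_low:.2f} deg")
```

A deterministic task such as Image2Normal ideally yields a spread near zero across repeated runs on the same input; a large spread indicates that the sampled predictions disagree with one another.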

Pipeline


The overall pipeline is composed of two stages: 1) YOSO produces a confident initialization x_t, trained with a novel Noise-Decoupled MSE Loss; 2) SG-DRN performs stable denoising by leveraging stronger semantic control information extracted from DINO [Oquab et al. 2024]. The textual prompt for the U-Net in both stages is set to “The normal map”.
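The control flow of the two stages can be sketched as follows. This is a structural illustration only: the class name, the callable signatures, the number of refinement steps, and the toy stand-in functions are all assumptions made for the example, and none of them reflect the released StableNormal code or the actual networks.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

PROMPT = "The normal map"  # textual prompt stated in the caption above

Latent = List[float]


@dataclass
class TwoStagePipeline:
    """Hypothetical wrapper: coarse one-step estimate, then guided refinement."""
    one_step_estimator: Callable[[Sequence[float], str], Latent]  # stage 1 (YOSO role)
    semantic_encoder: Callable[[Sequence[float]], Latent]         # DINO role
    refine_step: Callable[[Latent, Latent, str, int], Latent]     # stage 2 (SG-DRN role)
    num_refine_steps: int = 4                                     # assumed value

    def __call__(self, image: Sequence[float]) -> Latent:
        # Stage 1: a single forward pass yields the initialization x_t.
        x_t = self.one_step_estimator(image, PROMPT)
        # Semantic features are extracted once and reused at every step.
        features = self.semantic_encoder(image)
        # Stage 2: iterative refinement conditioned on the semantic features.
        for step in range(self.num_refine_steps):
            x_t = self.refine_step(x_t, features, PROMPT, step)
        return x_t


# Toy stand-ins so the sketch runs end to end; they are not real models.
def toy_one_step(image, prompt):
    return [float(round(p)) for p in image]  # deliberately coarse estimate


def toy_encoder(image):
    return list(image)  # pretend the features encode the fine detail


def toy_refine(x, feats, prompt, step):
    return [xi + 0.5 * (fi - xi) for xi, fi in zip(x, feats)]  # move toward detail


pipe = TwoStagePipeline(toy_one_step, toy_encoder, toy_refine)
image = [0.2, 0.7, 0.4]
coarse = toy_one_step(image, PROMPT)
refined = pipe(image)
print("coarse :", coarse)
print("refined:", refined)
```

Keeping the two stages as separate callables mirrors the coarse-to-fine split described above: the first stage fixes a stable starting point in one pass, and the second stage only has to add detail rather than sample from pure noise.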

Comparison Gallery

Citation

@article{ye2024stablenormal,
title = {StableNormal: Reducing Diffusion Variance for Stable and Sharp Normal},
author = {Ye, Chongjie and Qiu, Lingteng and Gu, Xiaodong and Zuo, Qi and Wu, Yushuang and Dong, Zilong and Bo, Liefeng and Xiu, Yuliang and Han, Xiaoguang},
journal = {ACM Transactions on Graphics},
year = {2024},
}