IntrinsicDiffusion: Joint Intrinsic Layers from Latent Diffusion Models

1University of Bath, 2Adobe Research, 3Meta
*Part of this work was done while the first author was an intern at Adobe.

SIGGRAPH 2024

Abstract

Reasoning about the intrinsic properties of an image, such as albedo, illumination, and surface geometry, is a long-standing problem with many applications in image editing and compositing. Existing solutions to this ill-posed problem either rely heavily on manually designed priors or learn priors from limited datasets that lack diversity; hence, they fail to generalize to in-the-wild test scenarios. In this paper, we show that a large-scale text-to-image generation model trained on massive amounts of visual data can implicitly learn intrinsic image priors. In particular, we introduce a novel conditioning mechanism built on top of a pre-trained foundational image generation model to jointly predict multiple intrinsic modalities from an input image. We demonstrate that predicting the different modalities collaboratively improves overall quality. This design also allows mixing datasets that annotate only a subset of the modalities during training, which contributes to the generalizability of our approach. Our method achieves state-of-the-art performance in intrinsic image decomposition, both qualitatively and quantitatively. We also demonstrate downstream image editing applications, such as relighting and retexturing.
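To make the two design ideas concrete, the following is a minimal PyTorch sketch, not the paper's implementation: the names (JointIntrinsicDenoiser, masked_loss), the toy convolutional backbone, and all tensor shapes are our assumptions. The actual model builds on a pre-trained latent diffusion UNet; the sketch only illustrates (a) conditioning on the input image latent by channel-wise concatenation with the noisy modality latents, and (b) masking the loss per modality so datasets with partial annotations can be mixed.

import torch
import torch.nn as nn

class JointIntrinsicDenoiser(nn.Module):
    """Toy stand-in for a latent diffusion UNet (timestep embedding
    omitted for brevity). Predicts albedo, shading, and normal latents
    jointly, conditioned on the input image latent."""

    def __init__(self, latent_ch=4, n_modalities=3, width=64):
        super().__init__()
        in_ch = latent_ch * (n_modalities + 1)   # noisy modality latents + image condition
        out_ch = latent_ch * n_modalities        # joint denoised prediction
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.SiLU(),
            nn.Conv2d(width, out_ch, 3, padding=1),
        )

    def forward(self, noisy_modalities, image_latent):
        # One shared backbone sees all modalities at once, so the
        # prediction of, e.g., shading can exploit cues from normals.
        return self.net(torch.cat([noisy_modalities, image_latent], dim=1))

def masked_loss(pred, target, modality_mask, latent_ch=4):
    # modality_mask: (B, n_modalities), 1 where ground truth exists.
    # Averaging the squared error per modality and zeroing unlabeled
    # ones lets datasets with partial annotations be pooled.
    b, _, h, w = pred.shape
    per_mod = ((pred - target) ** 2).view(b, -1, latent_ch, h, w).mean(dim=(2, 3, 4))
    return (per_mod * modality_mask).sum() / modality_mask.sum().clamp(min=1)

For example, a batch whose samples have albedo and shading labels but no normals would use modality_mask = torch.tensor([[1., 1., 0.]]); the targets for missing modalities can be zeros, since their error is masked out.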

Results for 1K images

HDR reconstruction and brightness adjustment

Our model enables high-dynamic-range adjustment by effectively separating shading in saturated regions and recovering the albedo information lost there. Image sources: Laval Indoor HDR Dataset (top) and IIW benchmark (bottom).

[Interactive comparisons: for each of the two examples (Laval Indoor HDR Dataset, IIW benchmark), the given input image is shown alongside reconstructions with colorful shading and with gray shading.]
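To illustrate how the decomposition enables such edits, here is a minimal NumPy sketch assuming the classic intrinsic image model I = A ⊙ S; the function name and the exposure parameter are hypothetical, not part of the paper's pipeline.

import numpy as np

def adjust_brightness(albedo, shading, exposure=1.5):
    """Recompose an image from predicted intrinsic layers with a
    shading-only exposure change (hypothetical helper).

    albedo, shading: float arrays in linear RGB, shape (H, W, 3).
    """
    # Classic intrinsic model: image = albedo * shading. Scaling only
    # the shading layer changes illumination intensity while leaving
    # surface colors untouched; because the predicted albedo is not
    # clipped where the input was saturated, lowering the exposure can
    # reveal detail that was lost in the original highlights.
    return albedo * shading * exposure

For display, the recomposed linear-RGB result would still need tone mapping or gamma encoding.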

Comparison on the IIW benchmark


For each sample, the albedo (top) and shading (bottom) images are shown in linear RGB space. Our surface normal predictions are shown in the first column of each second row.

BibTeX

@inproceedings{Luo2024IntrinsicDiffusion,
  author    = {Luo, Jundan and Ceylan, Duygu and Yoon, Jae Shin and Zhao, Nanxuan and Philip, Julien and Fr{\"u}hst{\"u}ck, Anna and Li, Wenbin and Richardt, Christian and Wang, Tuanfeng Y.},
  title     = {{IntrinsicDiffusion}: Joint Intrinsic Layers from Latent Diffusion Models},
  booktitle = {SIGGRAPH 2024 Conference Papers},
  year      = {2024},
  doi       = {10.1145/3641519.3657472},
  url       = {https://intrinsicdiffusion.github.io},
}