PrOSe: Product of orthogonal spheres parameterization for disentangled representation learning

Ankita Shukla, Sarthak Bhagat, Shagun Uppal, Saket Anand, Pavan Turaga

Research output: Contribution to conference > Paper > peer-review

3 Scopus citations


Learning representations that disentangle the explanatory attributes underlying the data improves interpretability and provides control over data generation. Various learning frameworks such as VAEs, GANs and auto-encoders have been used in the literature to learn such representations. Most often, the latent space is constrained to a partitioned representation or structured by a prior to impose disentangling. In this work, we advance the use of a latent representation based on a Product of Orthogonal Spheres (PrOSe). The PrOSe model is motivated by the reasoning that latent variables related to the physics of image formation can, under certain relaxed assumptions, lead to spherical spaces. Orthogonality between the spheres is motivated via physical independence models. Imposing the orthogonal-sphere constraint is much simpler than other, more complicated physical models; it is also fairly general, flexible, and extensible beyond the factors used to motivate its development. Under the further relaxed assumption of equal-sized latent blocks per factor, the constraint can be written in closed form as an orthonormality term in the loss function. We show that our approach significantly improves the quality of disentanglement, with consistent gains over several state-of-the-art approaches across several benchmarks and metrics.
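As a rough illustration of what an orthonormality term over equal-sized latent blocks could look like, the sketch below reshapes a latent vector into one row per factor and penalizes the deviation of the Gram matrix from the identity. This is not the authors' implementation: the function name `orthonormality_penalty`, the squared Frobenius norm, and the row-per-factor layout are all assumptions made for illustration, since the abstract does not give the exact form of the term.

```python
import numpy as np


def orthonormality_penalty(z, num_blocks):
    """Hypothetical penalty: squared Frobenius norm of (Z Z^T - I).

    z is a flat latent vector; it is split into num_blocks equal-sized
    blocks (an assumption mirroring the "equal-sized latent blocks per
    factor" condition in the abstract), one block per row of Z.
    """
    z = np.asarray(z, dtype=float)
    if z.size % num_blocks != 0:
        raise ValueError("latent size must be divisible by num_blocks")
    Z = z.reshape(num_blocks, -1)  # row i holds the block for factor i
    gram = Z @ Z.T                 # diagonal: squared norms, off-diagonal: inner products
    # Zero exactly when every block has unit norm (lies on a unit sphere)
    # and distinct blocks are mutually orthogonal.
    return float(np.sum((gram - np.eye(num_blocks)) ** 2))


# Two unit-norm, orthogonal blocks: the penalty vanishes.
penalty_good = orthonormality_penalty([1.0, 0.0, 0.0, 1.0], num_blocks=2)

# Two identical blocks: unit norm, but not orthogonal, so the penalty is positive.
penalty_bad = orthonormality_penalty([1.0, 0.0, 1.0, 0.0], num_blocks=2)
```

In a training setup, a term of this kind would typically be added to the reconstruction or adversarial loss with a weighting coefficient; that weighting, too, is an assumption and is not specified in the abstract.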

Original language: English (US)
State: Published - 2020
Event: 30th British Machine Vision Conference, BMVC 2019 - Cardiff, United Kingdom
Duration: Sep 9 2019 to Sep 12 2019


Conference: 30th British Machine Vision Conference, BMVC 2019
Country/Territory: United Kingdom

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

