Combining computer vision and video processing to achieve immersive mobile videoconferencing

Jorge Caviedes, Sin Lin Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation


In this paper we provide a technical perspective supporting the view that mobile videoconferencing is evolving towards immersiveness, and we discuss in detail the technological readiness of the most critical components, namely the integration of depth and computer vision into the traditional video processing engine. Immersive user experiences are achieved by adding new dimensions to the perceptual space of interaction. Adding depth creates a sense of presence, and providing additional points of view creates the visual flexibility associated with real-life interaction. As quality of service improves along with the computational capabilities of mobile platforms, we expect to see an evolution towards immersive mobile videoconferencing. Depth capture or extraction is the first element that enables 3D experiences as well as view synthesis. Depth maps or 3D models may be obtained by multiple methods, but without application-driven post-processing there is no guarantee that the synthesized outputs delivered to end users will provide the expected quality of experience. The necessary convergence between video processing and computer vision also implies a shift from the error-based performance metrics commonly used in computer vision to visual quality metrics of the type used in video processing.

Original language: English (US)
Title of host publication: 2014 IEEE International Conference on Image Processing, ICIP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 5
ISBN (Electronic): 9781479957514
State: Published - Jan 28 2014
Externally published: Yes


Keywords

  • computer vision
  • immersive
  • mobile
  • quality of experience
  • videoconferencing

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition

