Learning action dictionaries from video

Pavan Turaga, Rama Chellappa

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Summarizing the contents of a video containing human activities is an important problem in computer vision, with applications in automated surveillance systems. Summarization requires identifying and learning a 'vocabulary' of action-phrases corresponding to specific events and actions occurring in the video. We propose a generative model for dynamic scenes containing human activities as a composition of independent action-phrases, each of which is derived from an underlying vocabulary. Given a long video sequence, we propose a completely unsupervised approach to learn the vocabulary. Once the vocabulary is learnt, a video segment can be decomposed into a collection of phrases for summarization. We then describe methods to learn the correlations between activities and the sequentiality of events. We also propose a novel method for building invariances to spatial transforms into the summarization scheme.

Original language: English (US)
Title of host publication: 2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings
Number of pages: 4
State: Published - 2008
Externally published: Yes
Event: 2008 IEEE International Conference on Image Processing, ICIP 2008 - San Diego, CA, United States
Duration: Oct 12 2008 to Oct 15 2008

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880


Other: 2008 IEEE International Conference on Image Processing, ICIP 2008
Country/Territory: United States
City: San Diego, CA


  • Activity analysis
  • Video summarization

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing

