Cost-Sensitive Top-down/Bottom-up Inference for Multiscale Activity Recognition

Mohamed R. Amer¹     Dan Xie²     Mingtian Zhao²     Sinisa Todorovic¹     Song-Chun Zhu²
¹Oregon State University       ²University of California, Los Angeles


This paper addresses a new problem, that of multiscale activity recognition. Our goal is to detect and localize a wide range of activities, including individual actions and group activities, which may simultaneously co-occur in high-resolution video. The video resolution allows for digital zoom-in (or zoom-out) for examining fine details (or coarser scales), as needed for recognition. The key challenge is how to avoid running a multitude of detectors at all spatiotemporal scales, and yet arrive at a holistically consistent video interpretation. To this end, we use a three-layered AND-OR graph to jointly model group activities, individual actions, and participating objects. The AND-OR graph allows a principled formulation of efficient, cost-sensitive inference via an explore-exploit strategy. Our inference optimally schedules the following computational processes: 1) direct application of activity detectors -- called \(\alpha\) process; 2) bottom-up inference based on detecting activity parts -- called \(\beta\) process; and 3) top-down inference based on detecting activity context -- called \(\gamma\) process. The scheduling iteratively maximizes the log-posteriors of the resulting parse graphs. For evaluation, we have compiled and benchmarked a new dataset of high-resolution videos of group and individual activities co-occurring in a courtyard of the UCLA campus.
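The scheduling idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it is a minimal epsilon-greedy explore-exploit loop, assuming each of the three processes (\(\alpha\), \(\beta\), \(\gamma\)) has a fixed, known cost and returns an observed gain in log-posterior each time it runs. The names `Process`, `schedule`, `run`, `budget`, and `epsilon`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Process:
    """One inference process (hypothetical stand-in for alpha, beta, or gamma)."""
    name: str
    cost: float                                 # assumed fixed cost of one run
    gains: list = field(default_factory=list)   # log-posterior gains seen so far

    def mean_gain_per_cost(self) -> float:
        # Untried processes score +inf so each one is tried at least once.
        if not self.gains:
            return float("inf")
        return sum(self.gains) / len(self.gains) / self.cost


def schedule(processes, run, budget, epsilon=0.1, seed=0):
    """Epsilon-greedy scheduling under a cost budget (illustrative only).

    `run(name)` is a caller-supplied function that executes the named
    process and returns the resulting gain in log-posterior.
    """
    rng = random.Random(seed)
    log_posterior = 0.0
    trace = []
    while True:
        affordable = [p for p in processes if p.cost <= budget]
        if not affordable:
            break
        if rng.random() < epsilon:
            chosen = rng.choice(affordable)                            # explore
        else:
            chosen = max(affordable, key=Process.mean_gain_per_cost)   # exploit
        gain = run(chosen.name)
        chosen.gains.append(gain)
        budget -= chosen.cost
        log_posterior += gain
        trace.append(chosen.name)
    return log_posterior, trace


def make_toy_processes():
    # Hypothetical costs: a direct detector is assumed to be the most expensive.
    return [Process("alpha", 4.0), Process("beta", 1.0), Process("gamma", 1.0)]


def toy_run(name):
    # Hypothetical constant gains; a real system would run detectors here.
    return {"alpha": 1.0, "beta": 0.5, "gamma": 0.2}[name]


if __name__ == "__main__":
    total, order = schedule(make_toy_processes(), toy_run, budget=8.0, epsilon=0.0)
    print(order, total)
```

With `epsilon=0.0` the loop is purely greedy: it tries each affordable process once, then keeps choosing the one with the best observed gain per unit cost until the budget is spent. A nonzero `epsilon` occasionally picks a random affordable process instead, which is the explore half of the trade-off.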

Paper and Slides

  • Cost-Sensitive Top-down/Bottom-up Inference for Multiscale Activity Recognition [pdf] [web with dataset]
    M. R. Amer, D. Xie, M. Zhao, S. Todorovic, and S.C. Zhu
    European Conf. on Computer Vision (ECCV), 2012.

  • BibTeX

    author = {Amer, Mohamed R. and Xie, Dan and Zhao, Mingtian and Todorovic, Sinisa and Zhu, Song-Chun},
    booktitle = {ECCV},
    title = {{Cost-sensitive top-down / bottom-up inference for multiscale activity recognition}},
    year = {2012}

    Dataset Download

    Please read this page before registering.

    The dataset is available free of charge to researchers at academic institutions (universities, schools, and government research labs) for non-commercial purposes.

    To obtain the dataset, we ask that you provide information about your research organization. Requests that do not identify the organization you belong to will be denied. We will review your request and, if you are approved, email the dataset to you within a day or so.

    Work based on this dataset should cite the paper: [Mohamed R. Amer, Dan Xie, Mingtian Zhao, Sinisa Todorovic, Song-Chun Zhu, “Detecting and Localizing Activities at Different Scales under Time Budget.” ECCV, 2012]. We greatly appreciate emails about bugs or suggestions.

    The information provided on this form is used solely to keep track of who has a copy of the dataset. We will never sell or distribute the information provided on this form.