Multi-view Multi-modality Data

Events are captured by a network of cameras with overlapping fields of view.

Data are captured by various devices including regular stationary cameras, cameras mounted on moving vehicles, and infrared cameras.

Challenges

Sudden Movements

Heavy Shadows

Illumination Variation

QA in Two Forms

Formal Language Queries

Formal language queries are composed as conjunctions of predicates, similar to first-order logic sentences. The answer to each query is either true or false.
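
As an illustration only, a conjunction-of-predicates query can be evaluated against a set of ground facts with a small backtracking matcher. The sketch below is not the dataset's actual query syntax or evaluation tool: the predicate names (`person`, `vehicle`, `entering`), the facts, the `?`-prefix convention for variables, and the functions `match` and `answer` are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# An atom is (predicate_name, arg1, arg2, ...). Hypothetical representation.
Atom = Tuple[str, ...]

# Hypothetical ground facts, standing in for annotations of a video scene.
FACTS: Set[Atom] = {
    ("person", "p1"),
    ("vehicle", "v1"),
    ("entering", "p1", "v1"),
}


def is_variable(term: str) -> bool:
    # Assumed convention: terms starting with "?" are variables.
    return term.startswith("?")


def match(atom: Atom, fact: Atom, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Try to unify one query atom with one fact; return the extended binding or None."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    new_binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_variable(term):
            # Bind the variable, or check it against its existing binding.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def answer(query: List[Atom], facts: Set[Atom],
           binding: Optional[Dict[str, str]] = None) -> bool:
    """Return True if some variable assignment satisfies every atom in the conjunction."""
    binding = {} if binding is None else binding
    if not query:
        return True  # every conjunct has been satisfied
    first, rest = query[0], query[1:]
    for fact in facts:
        extended = match(first, fact, binding)
        if extended is not None and answer(rest, facts, extended):
            return True
    return False


# "Is there a person entering a vehicle?"
query = [("person", "?x"), ("vehicle", "?y"), ("entering", "?x", "?y")]
print(answer(query, FACTS))
```

Variables are treated as existentially quantified here, so the query is true as soon as one consistent assignment is found; this is a design choice of the sketch, not something stated about the dataset.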

Natural Language Questions

Open-ended natural language questions and answers are composed by crowd-sourced human annotators.