Multi-view Multi-modality Data
Events are captured by a network of cameras with overlapping fields of view.
Data are captured by various devices including regular stationary cameras, cameras mounted on moving vehicles, and infrared cameras.
Challenges
Sudden Movements
Heavy Shadows
Illumination Variation
QA in Two Forms
Formal Language Queries
Formal language queries are composed as conjunctions of predicates, similar to sentences in first-order logic. The answer to each query is either true or false.
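To illustrate how a conjunction of predicates resolves to a true/false answer, here is a minimal Python sketch. The predicate names (person, vehicle, entering), the "?x" variable convention, and the fact set are hypothetical and chosen only for illustration; they are not the dataset's actual query vocabulary or evaluation procedure.

```python
from itertools import product

# Hypothetical scene facts, each written as (predicate_name, arg1, arg2, ...).
FACTS = {
    ("person", "p1"),
    ("vehicle", "v1"),
    ("entering", "p1", "v1"),
}


def is_variable(term):
    # Assumption: variables are strings that start with "?".
    return term.startswith("?")


def answer(query, facts):
    """Return True if some variable assignment makes every predicate a known fact."""
    constants = sorted({arg for fact in facts for arg in fact[1:]})
    variables = sorted({t for atom in query for t in atom[1:] if is_variable(t)})
    # Brute-force search over all assignments of constants to variables.
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        grounded = [
            (atom[0],) + tuple(binding.get(t, t) for t in atom[1:])
            for atom in query
        ]
        if all(g in facts for g in grounded):
            return True
    return False


# "Is there a person ?x entering a vehicle ?y?"
QUERY_TRUE = [("person", "?x"), ("vehicle", "?y"), ("entering", "?x", "?y")]
# "Is there a vehicle ?x entering a person ?y?"
QUERY_FALSE = [("vehicle", "?x"), ("person", "?y"), ("entering", "?x", "?y")]
```

With the facts above, answer(QUERY_TRUE, FACTS) evaluates to True and answer(QUERY_FALSE, FACTS) to False. The exhaustive search is only practical for small fact sets; it is used here to keep the sketch short.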
Natural Language Questions
Open-ended natural language questions and answers are composed by crowd-sourced human annotators.