Understanding Scenes and Events through Joint Parsing, Cognitive Reasoning and Lifelong Learning
The goal of this MURI team is to develop machines that have the following capabilities:
- Achieve deep understanding of scenes and events through joint parsing and cognitive reasoning about appearance, geometry, functions, physics, causality, and the intents and beliefs of agents, and use joint and long-range reasoning to close the performance gap with human vision;
- Represent visual knowledge in probabilistic compositional models across the spatial, temporal, and causal hierarchies augmented with rich relations, which are task-oriented, support efficient task-dependent inference from an agent’s perspective, and preserve uncertainties;
- Acquire massive visual commonsense through web-scale continuous lifelong learning from heterogeneous sources through weakly supervised HCI and dialogue with humans; and
- Understand human needs and values to interact with humans effectively and answer human queries about what, who, where, when, why and how in storylines through Turing tests.
We have assembled a multi-disciplinary team from the US and UK, including experts in experimental psychology and cognition, computer vision and learning, and cognitive and mathematical modeling. We take a multi-disciplinary approach that integrates four areas:
- Psychology and cognitive experiments;
- Knowledge representation;
- Lifelong learning; and
- Computer vision tasks in an inference engine.
Acknowledgments
This work is supported by the Office of Naval Research under grant ONR MURI N00014-16-1-2007.