SEE on a Unified Foundation for Representation, Inference and Learning

We will study a unified mathematical foundation of representation, inference and learning (focus area I) for Sensing, Exploitation and Execution (SEE). Based on this foundation, we will develop a two-way end-to-end SEE system (focus area II) with the following capabilities.

In one direction, from sensors to users, the system takes multi-modal inputs from heterogeneous sources --- videos from a network of optical, PTZ, and IR cameras, and text from COMINT --- and performs the following tasks:

  1. Goal-guided inference, which extracts semantic content through text and image parsing, optimally schedules sensors and top-down/bottom-up computing processes, and generates joint parse graphs of objects, scenes, and events in space and time;
  2. Translating the joint parse graphs into common XML and RDF formats, and generating narrative text reports for users and a domain Knowledge Base (KB) to support information retrieval;
  3. Answering user queries: "who" and "what" through object and scene recognition; "when", "where", and "how" through action and event recognition; and "why" through probabilistic causal reasoning.
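The joint parse graph and query-answering steps above can be sketched as follows. This is a minimal illustrative sketch, not the project's actual representation: all class names, fields, and the toy scene are assumptions made for exposition.

```python
# Hedged sketch of a spatio-temporal joint parse graph over objects,
# scenes, and events, with "who/what/when/where/why" query routing.
# All names and attributes here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    """An entity in the parse graph: an object, scene, or event."""
    node_id: str
    kind: str                  # "object", "scene", or "event"
    label: str                 # semantic label, e.g. "person"
    time: tuple = (0, 0)       # (start, end) frame interval
    location: str = ""         # coarse spatial tag, e.g. "gate"
    causes: list = field(default_factory=list)  # causal links for "why"

class ParseGraph:
    def __init__(self):
        self.nodes = {}

    def add(self, node):
        self.nodes[node.node_id] = node

    def answer(self, question, node_id):
        """Route a 5W query to the matching graph attribute."""
        n = self.nodes[node_id]
        if question in ("who", "what"):
            return n.label                      # object/scene recognition
        if question == "when":
            return n.time                       # temporal extent
        if question == "where":
            return n.location                   # spatial tag
        if question == "why":
            # Follow causal links to the labels of the causing events.
            return [self.nodes[c].label for c in n.causes]
        raise ValueError(f"unsupported question: {question}")

# A toy parse: a door-opening event causes a person-entering event.
g = ParseGraph()
g.add(Node("o1", "object", "person", (10, 90), "gate"))
g.add(Node("e1", "event", "door opens", (5, 10), "gate"))
g.add(Node("e2", "event", "person enters", (10, 30), "gate", causes=["e1"]))
print(g.answer("who", "o1"))   # person
print(g.answer("why", "e2"))   # ['door opens']
```

A graph like this is also straightforward to serialize to XML or RDF triples (node, attribute, value), which is one way the report-generation step could consume it.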
In the reverse direction, from users to sensors, the system takes user queries as input in natural language and performs the following tasks:
  1. Parsing user queries through lexical and syntactic processing, and semantic tagging;
  2. Translating user queries into utility functions over entities --- objects, scenes, and events (i.e., nodes) in the spatial-temporal graphical representation --- and propagating these utility functions through the deep hierarchy;
  3. Using the utility functions and computational costs to guide online inference (parsing), so that entities (nodes) related to the user's tasks are given higher priority;
  4. Updating the sensor efficacy functions, which measure how effective each sensor modality is for the various entities (nodes), and thus actively controlling the sensors (such as PTZ cameras) online for better information gain.
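Steps 2 and 3 above can be sketched as utility propagation down a part-of hierarchy followed by greedy scheduling by utility per unit cost. This is only a toy illustration under assumed names and numbers; the decay factor, the hierarchy, and the greedy rule are placeholders for whatever the actual system uses.

```python
# Hedged sketch: propagate query-derived utility from high-level entities
# to their sub-entities, then prioritize inference by utility/cost.
# All names, costs, and the decay constant are illustrative assumptions.

def propagate_utility(hierarchy, utilities, decay=0.5):
    """Push utility from each node to its children with geometric decay."""
    out = dict(utilities)
    frontier = list(utilities)
    while frontier:
        node = frontier.pop()
        for child in hierarchy.get(node, []):
            u = out[node] * decay
            if u > out.get(child, 0.0):  # keep the strongest support
                out[child] = u
                frontier.append(child)
    return out

def schedule(utilities, costs):
    """Order nodes by utility per unit computational cost (greedy)."""
    return sorted(utilities, key=lambda n: utilities[n] / costs[n],
                  reverse=True)

# Toy hierarchy: a "drop_off" event decomposes into sub-entities.
hierarchy = {"drop_off": ["vehicle", "person"],
             "vehicle": ["license_plate"]}
# A user query about drop-offs puts utility 1.0 on the event node.
utilities = propagate_utility(hierarchy, {"drop_off": 1.0})
costs = {"drop_off": 4.0, "vehicle": 1.0,
         "person": 2.0, "license_plate": 1.0}
order = schedule(utilities, costs)  # cheap, useful nodes come first
```

The same utility-per-cost ranking extends naturally to step 4: replacing the fixed `costs` with per-sensor efficacy estimates would let the scheduler also decide which sensor (e.g. which PTZ camera) to task for each entity.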


This work is supported by DARPA Award FA8650-11-1-7149.