Synergy between objects, scenes, and events
We have started a series of studies (in Fei-Fei’s group and Zhu’s group) that explore the interactions between humans and objects in scenes. Fei-Fei’s work has focused on sports video, where human actions (poses) and equipment (ball, racket) provide mutual context that improves the recognition of both. In Zhu’s work, human actions are defined by a set of spatial and temporal relations between humans, their parts, and objects in the scene. Action recognition therefore relies on object recognition. Event understanding, in turn, provides top-down information for action recognition, which further improves object recognition, especially for small objects such as the teacups and phones involved in drinking tea or making phone calls. Human activity, such as walking trajectories on the floor, also helps scene segmentation.
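The top-down effect of action context on small-object recognition can be illustrated with a toy Bayesian calculation. This is only a sketch under stated assumptions, not the model used in either group's work: the function name `posterior_object` and all probability values are hypothetical, chosen only to show how a recognized action ("drinking tea") can override weak appearance evidence for a small object.

```python
def posterior_object(det_likelihood, prior):
    """Combine detector likelihoods with a prior over object labels (Bayes' rule)."""
    unnorm = {label: det_likelihood[label] * prior[label] for label in det_likelihood}
    z = sum(unnorm.values())
    return {label: value / z for label, value in unnorm.items()}


# Hypothetical appearance evidence for a small, blurry object:
# the detector slightly prefers "phone" over "cup".
det_likelihood = {"cup": 0.4, "phone": 0.6}

# Without action context, both labels are assumed equally likely a priori.
context_free_prior = {"cup": 0.5, "phone": 0.5}

# Hypothetical top-down prior P(object | action = "drinking tea").
prior_given_action = {"cup": 0.9, "phone": 0.1}

without_context = posterior_object(det_likelihood, context_free_prior)
with_context = posterior_object(det_likelihood, prior_given_action)

print(max(without_context, key=without_context.get))
print(max(with_context, key=with_context.get))
```

With these illustrative numbers, the appearance-only decision favors "phone", while adding the action prior shifts the decision to "cup". A real system would estimate these quantities from data and would also pass information in the other direction, from detected objects to action hypotheses.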