Entry Date:
January 26, 2016

Multi-Sentence Event Recognition


Existing approaches to labeling images and videos with natural-language sentences generate either one sentence or a collection of unrelated sentences. Humans, however, produce a coherent set of sentences, which reference each other and describe the salient activities and relationships being depicted.