EMU:
A Corpus on Visual Misinformation

We present the task of Edited Media Understanding, which requires models to answer open-ended questions that capture the intent and implications of an image edit. EMU contains 48k question-answer pairs, written in rich natural language, covering the intent, implications, and potential for misinformation of edited images.

48k Grounded Explanations over 8k Edits

[Figure: an example from the EMU dataset]

Multimodal disinformation, from deepfakes to simple deceptive edits, is an important societal problem. Yet the vast majority of media edits are harmless -- such as a filtered vacation photo. What separates these harmless edits from harmful ones that spread disinformation is intent. Recognizing and describing this intent is a major challenge for today's AI systems.

We present the task of Edited Media Understanding, requiring models to answer open-ended questions that capture the intent and implications of an image edit. To support this task, we introduce EMU, a dataset with 48k question-answer pairs written in rich natural language. Our dataset serves as a testbed for the utility of artificial intelligence models in battling visual misinformation.

Paper

> Read on arXiv



Authors

This work was done by a team of researchers from the Allen Institute for AI, University of Washington, Stanford University, and the University of Michigan.