Visual Question Answering (VQA) systems combine advances in computer vision and natural language processing to enable machines to answer open-ended questions about images. At their core, these systems ...