ViInfographicVQA: A Benchmark for Single and Multi-Image Visual Question Answering on Vietnamese Infographics
A Vietnamese benchmark for evaluating single-image and multi-image reasoning on information-rich infographics.
Content tagged with "vision-language models"
An investigation of region-level descriptions from the Describe Anything Model for visual question answering on text-rich images.