ViInfographicVQA: A Benchmark for Single and Multi-Image Visual Question Answering on Vietnamese Infographics
A Vietnamese benchmark for evaluating single-image and multi-image reasoning on information-rich infographics.
Peer-reviewed papers and preprints in multimodal learning, visual question answering, and vision-language models.
An automated pipeline for constructing a Vietnamese visual question answering dataset with natural-language explanations.
An investigation of region-level descriptions from the Describe Anything Model for visual question answering on text-rich images.