ViInfographicVQA: A Benchmark for Single and Multi-Image Visual Question Answering on Vietnamese Infographics
A Vietnamese benchmark for evaluating single-image and multi-image reasoning on information-rich infographics.
Content tagged with "visual question answering"
An automated pipeline for constructing a Vietnamese visual question answering dataset with natural-language explanations.
An investigation of region-level descriptions from the Describe Anything Model for visual question answering on text-rich images.
A counterfactual training framework designed to reduce language bias and improve visual grounding in visual question answering models.