ViInfographicVQA: A Benchmark for Single and Multi-Image Visual Question Answering on Vietnamese Infographics
Summary
ViInfographicVQA is a benchmark for evaluating vision-language models on Vietnamese infographics.
The benchmark covers both single-image and multi-image question answering and requires models to combine text recognition, layout understanding, visual interpretation, and semantic reasoning.
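As a rough illustration of the two settings, the sketch below shows one way a question-answer item could be represented so that single-image and multi-image questions share a structure. The class `QAItem`, its field names, the helper `split_by_setting`, and the sample values are all hypothetical and do not reflect the benchmark's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QAItem:
    """Hypothetical record for one question; not the benchmark's real schema."""
    image_paths: List[str]  # one path for single-image, several for multi-image
    question: str
    answer: str

    @property
    def is_multi_image(self) -> bool:
        # A question counts as multi-image when it refers to more than one infographic.
        return len(self.image_paths) > 1


def split_by_setting(items: List[QAItem]) -> Dict[str, List[QAItem]]:
    """Group items into the 'single' and 'multi' settings."""
    groups: Dict[str, List[QAItem]] = {"single": [], "multi": []}
    for item in items:
        key = "multi" if item.is_multi_image else "single"
        groups[key].append(item)
    return groups


# Made-up placeholder items, for illustration only.
items = [
    QAItem(["a.png"], "placeholder question 1", "placeholder answer 1"),
    QAItem(["a.png", "b.png"], "placeholder question 2", "placeholder answer 2"),
]
groups = split_by_setting(items)
```

Keeping the image references in a list means both settings can pass through the same evaluation code, with the setting derived from the list length rather than stored separately.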
My Contribution
I spearheaded data acquisition for more than 6,000 infographics and contributed to the evaluation of state-of-the-art vision-language models.