ViInfographicVQA: A Benchmark for Single and Multi-Image Visual Question Answering on Vietnamese Infographics

January 2026

Tue-Thu Van-Dinh*, Hoang-Duy Tran*, Truong-Binh Duong, et al.

AAAI Workshop on AI for Scientific Research, 2026

Summary

ViInfographicVQA is a benchmark for evaluating vision-language models on Vietnamese infographics.

The benchmark covers both single-image and multi-image question answering and requires models to combine text recognition, layout understanding, visual interpretation, and semantic reasoning.

My Contribution

I led data acquisition for more than 6,000 infographics and contributed to the evaluation of state-of-the-art vision-language models.

Resources