ViInfographicVQA: A Benchmark for Single and Multi-Image Visual Question Answering on Vietnamese Infographics

January 2026
Tue-Thu Van-Dinh*, Hoang-Duy Tran*, Truong-Binh Duong, et al.
AAAI Workshop on AI for Scientific Research, 2026

#Vision-Language Models #Visual Question Answering #Multi-Image Reasoning #Vietnamese AI

Summary

ViInfographicVQA is a benchmark for evaluating vision-language models on Vietnamese infographics.

The benchmark covers both single-image and multi-image question answering and requires models to combine text recognition, layout understanding, visual interpretation, and semantic reasoning.

My Contribution

I spearheaded data acquisition for more than 6,000 infographics and contributed to the evaluation of state-of-the-art vision-language models.

Resources

Paper
Code and benchmark