#text-rich images

2025-10-01 Yen-Linh Vu*, Dinh-Thang Duong*, Truong-Binh Duong, et al. VisionDocs Workshop at ICCV 2025

An investigation into using region-level descriptions from the Describe Anything Model for visual question answering on text-rich images.