VNU-HCM, University of Science

Ho Chi Minh City, Vietnam

duongtruongbinh2003@gmail.com

Hello! I’m Truong-Binh Duong, a Computer Science graduate from the University of Science – VNU-HCM, specializing in multi-modal AI and Vision-Language Models.

I recently completed the university’s High-Quality Program with an exceptional GPA of 3.98/4.0, earning Dean’s List recognition for 5 consecutive semesters. My graduation thesis, Counterfactual Reasoning for Robust Visual Question Answering, explored counterfactual training strategies that boost out-of-distribution robustness without extra annotations.

🎯 Current Focus

I’m pursuing AI Research Intern roles where I can advance state-of-the-art multi-modal reasoning systems. I enjoy rigorous experimentation, benchmarking, and translating research insights into production-ready models using PyTorch, Transformers, and modern MLOps practices.

🔬 Research & Publications

I have 2 peer-reviewed publications spanning Vietnamese Visual Question Answering and text-rich image analysis:

ViVQA-X (ICISN 2025, LNNS by Springer Nature): introduced the first Vietnamese VQA dataset with natural language explanations, built with an automated multi-LLM pipeline and now officially published with DOI 10.1007/978-981-95-1746-6_18.
Describe Anything Model for VQA on Text-rich Images (ICCV Workshop 2025): led inference and benchmarking of Vision-Language Models across six datasets.

💼 Professional Experience

Currently serving as a Teaching Assistant at AI VIETNAM, where I design curriculum, author assessments, and produce teaching assets covering Machine Learning, Deep Learning, AI Agents, and LLM Reasoning.

🚀 Technical Expertise

Programming: Python, C/C++, SQL
Data Science: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, XGBoost
Deep Learning: PyTorch, Hugging Face, Transformers, CNN, RNN, LSTM, MLP
MLOps & Tools: Docker, MLflow, Weights & Biases, FastAPI, Streamlit, Google Cloud (Vertex AI)
Languages: Vietnamese (Native), English (Professional Working Proficiency – VSTEP B2)

🛠️ Featured Projects

Counterfactual Reasoning for Robust Visual Question Answering (Thesis, 2025): Developed batch-contrastive losses, counterfactual sample synthesis, and a 3-stage curriculum that achieved state-of-the-art performance among annotation-free methods on VQA-CP v2.
Heineken Image Analysis Tool (Hackathon, 2024): Engineered a multi-model AI pipeline (YOLOv10, Owlv2, PaddleOCR, CLIP) with a FastAPI backend to automate brand compliance and safety audits from images.

I’m passionate about turning research into practical solutions and have hands-on experience with the complete AI development pipeline—from data collection and model training to deployment and scaling.

Feel free to reach out through the social channels below to discuss research opportunities, collaborations, or potential internships!

news

Jan 14, 2026	Our paper “An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset” is now officially published in the Proceedings of the Fifth International Conference on Intelligent Systems and Networks (Lecture Notes in Networks and Systems, Springer Nature Singapore), pages 164–173, DOI: 10.1007/978-981-95-1746-6_18. Thrilled to finally share the camera-ready version; this milestone marks the official release of ViVQA-X, the first Vietnamese VQA dataset with natural language explanations.
Jul 31, 2025	Our paper “Describe Anything Model for Visual Question Answering on Text-rich Images” will be presented at ICCV Workshop 2025! Proud to contribute to this collaborative work on advancing VQA for text-rich images. 📊🔍
Feb 13, 2025	Excited to announce that our paper “An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset” has been accepted at ICISN 2025! This work introduces ViVQA-X, the first Vietnamese Visual Question Answering dataset with natural language explanations. 🇻🇳✨
Aug 31, 2024	I’m thrilled to start my role as a Teaching Assistant at AI VIETNAM! I’ll be developing curriculum and teaching materials for the AIO program, covering fundamental ML, Deep Learning, and advanced topics like AI Agents and LLM Reasoning. 🤖📚

selected publications

ICISN 2025

An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset

Truong-Binh Duong, Hoang-Minh Tran, Binh-Nam Le-Nguyen, and Dinh-Thang Duong

In Proceedings of the Fifth International Conference on Intelligent Systems and Networks, 2025

@inproceedings{duong2026vivqax,
  title = {An Automated Pipeline for Constructing a Vietnamese VQA-NLE Dataset},
  author = {Duong, Truong-Binh and Tran, Hoang-Minh and Le-Nguyen, Binh-Nam and Duong, Dinh-Thang},
  booktitle = {Proceedings of the Fifth International Conference on Intelligent Systems and Networks},
  series = {Lecture Notes in Networks and Systems},
  pages = {164--173},
  year = {2025},
  publisher = {Springer Nature Singapore},
  isbn = {978-981-95-1746-6},
  doi = {10.1007/978-981-95-1746-6_18},
  github = {duongtruongbinh/ViVQA-X},
}

ICCV Workshop

Describe Anything Model for Visual Question Answering on Text-rich Images

Yen-Linh Vu*, Dinh-Thang Duong*, Truong-Binh Duong, and one more author

ICCV Workshop, 2025

*Equal contribution

We propose DAM-QA, an approach that applies the Describe Anything Model to Visual Question Answering on text-rich images. Through a comprehensive evaluation of multiple Vision-Language Models, our approach demonstrates strong performance across six benchmark datasets.

@article{vu2025dam,
  title = {Describe Anything Model for Visual Question Answering on Text-rich Images},
  author = {Vu, Yen-Linh and Duong, Dinh-Thang and Duong, Truong-Binh and others},
  journal = {ICCV Workshop},
  year = {2025},
  publisher = {IEEE},
  github = {Linvyl/DAM-QA},
  note = {*Equal contribution}
}