Publications

(2026). CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning. European Chapter of the Association for Computational Linguistics (EACL) 2026 (oral presentation).
(2026). 360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method. Submitted to CVPR 2026 (two Weak Accepts and one Weak Reject); resubmitted to ECCV 2026.
(2025). TB-Bench: Training and Testing Multi-Modal AI for Understanding Spatio-Temporal Traffic Behaviors from Dashcam Images/Videos. CVPR 2025 Workshop on Autonomous Driving.
(2025). Multimodal artificial intelligence approaches using large language models for expert-level landslide image analysis. Computer-Aided Civil and Infrastructure Engineering (CACIE), 2025.
(2025). Critical Scenario Prediction Planning and Reasoning. In submission to IEEE Transactions on Intelligent Vehicles (TIV) 2025.
(2024). KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain. arXiv preprint, 2024.
(2024). Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction. IEEE Transactions on Intelligent Vehicles (TIV), 2024.
(2023). Visual Abductive Reasoning Meets Driving Hazard Prediction: Problem Formulation and Dataset. arXiv preprint, 2023.
(2023). Leveraging Video Coding Knowledge for Deep Video Enhancement. arXiv preprint, 2023.
(2022). GRIT: Faster and Better Image Captioning Transformer Using Dual Visual Features. European Conference on Computer Vision (ECCV) 2022.