CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning
Improving multimodal table understanding with code-driven reasoning.
A comprehensive benchmark and a training-free method for 360° image perception using MLLMs.
Introduces TB-Bench to train and evaluate multimodal agents for understanding complex traffic behaviors captured by dashcams.
Presents GRIT, a dual-feature transformer that improves both speed and accuracy for image captioning.
Enhances interactive instruction following agents with wide-context perception and iterative reasoning.
Introduces an efficient attention design capturing full interactions in visual dialog systems.
Revisits single-stage detectors and boosts their effectiveness on face detection benchmarks.
Applies capsule networks to the challenging task of recognizing subtle micro-expressions.
Proposes a semi-supervised multi-label learning framework that explicitly models label-feature relationships.
Introduces a lifelong topic modeling pipeline tailored for Vietnamese multi-label text classification.