Ngoc Dung Huynh
PhD Researcher · ML Engineer · Multi-modal AI Specialist

Building large-scale vision-language systems and internet-scale data pipelines. First-author publications at ICCV 2025 and CVPR 2026. Engineering consultant at TII Abu Dhabi and PhD researcher at Deakin University.

Melbourne, VIC, Australia
Top 9 Worldwide — Toloka VQA · Top 7 Globally — COVID Detection
4+ Years Experience · 8+ Publications

About

Who I am

I am a PhD researcher at Deakin University and an Engineering Consultant at the Technology Innovation Institute (TII, UAE). I specialise in multi-modal AI — the intersection of vision, language, and speech. My work spans architecting internet-scale data pipelines, training large vision-language models on GCP/AWS, building VQA benchmarks, and publishing at top-tier venues including ICCV and CVPR. I hold a BSc in Mathematics and an MSc in Data Science (GPA 86%) from Deakin University.

Core Competencies

What I work with

🤖

ML / AI

Vision-Language Models (VLMs), LLMs, Visual Question Answering, Multi-modal Reasoning, Speech-Vision-Language

⚙️

Frameworks

PyTorch, TensorFlow, Keras, HuggingFace Transformers, Weights & Biases

🗄️

Data Engineering

OCR Pipelines, ETL, Web Crawling, Deduplication, LLM Filtering, SFT Data Generation, Agent-Based Pipelines

☁️

Infrastructure

AWS, GCP, Docker, Linux, Slurm, Flask, React.js, Elasticsearch

💻

Programming

Python, JavaScript, SQL, R, C++

🎯

Specialties

Annotation Systems, STEM-VQA, Distributed Training, Benchmark Evaluation

Experience

Where I've worked

Engineering Consultant — Multi-modal AI & Data
Technology Innovation Institute (TII) · Abu Dhabi, UAE (Remote)
Jan 2025 – Present
  • Architected production-grade ETL pipelines to crawl, deduplicate, and normalise internet-scale multi-modal datasets supporting Falcon-H training.
  • Processed 3M+ PDFs via OCR, layout parsing, and CV-based structured text extraction with multi-stage cleaning and deduplication.
  • Synthesized large-scale SFT datasets using GPT-4, Gemini, Claude, and Qwen to accelerate Falcon-H model alignment.
  • Unified data from 10+ agent platforms into multi-modal corpora with content filtering and quality-scoring pipelines.
  • Engineered a React-based annotation platform for segmentation and bounding-box labelling.
  • Trained large-scale VLMs on GCP and AWS across distributed Slurm clusters.
  • Built end-to-end VQA training and evaluation pipelines for STEM, charts, equations, and scientific plots.
Research Intern — Multi-modal AI
Technology Innovation Institute (TII) · Abu Dhabi, UAE
Apr 2024 – Jan 2025
  • Led research on multi-modal reasoning across speech, vision, and language, contributing to an ICCV 2025 publication.
  • Developed and integrated ASR, VQA, OCR, and LLM inference components into unified end-to-end pipelines.
  • Contributed to multiple arXiv publications on VLM evaluation.
Research Assistant — Visual Question Answering
Deakin University · Melbourne, Australia
Mar 2022 – Oct 2022
  • Ranked Top 9 worldwide in the Toloka VQA Challenge (WSDM Cup 2023).
  • Achieved Top 7 globally in the COVID Detection Challenge using 3D CT medical imaging.
  • Founded a university-wide AI competition at Deakin to grow the campus ML community.
Software Engineer (Part-time)
Stealth Startup · Singapore (Remote)
Aug 2020 – Nov 2021
  • Designed and delivered a full-stack multi-user annotation platform (Flask + React + Elasticsearch) with RESTful APIs.

Education

Academic background

PhD in Computer Science
Deakin University · Melbourne, Australia
Thesis: Designing Scalable and Interpretable Vision–Language–Speech Systems for Generalised Multi-modal Reasoning
Oct 2022 – Present
MSc in Data Science — GPA: 86%
Deakin University · Melbourne, Australia
Thesis: Speech-to-CDQL — Context Definition and Query Language from Natural Language for Smart Home
Mar 2020 – Mar 2022
BSc in Mathematics
University of Education Hue · Vietnam
Sep 2014 – Sep 2018