Deyu Zhou 周德宇

dzhou861[at]connect.hkust-gz.edu.cn  
My research interests include interactive world models. My life goal is to pursue AI for social good.
Please feel free to reach out for any discussion, especially about crazy, 0-to-1-to-10-to-100 ideas.

Ph.D. candidate @ HKUST-GZ, supervised by Prof. Harry SHUM and Prof. Lionel NI.

Selected Projects

Huawei · Face Blur Detection, Virtual Makeup (2019)

Tencent AI Lab · Emotion Classification, Neural Machine Translation (2020-2021)

Xiaobing · Multimodal Conversation, Audio-driven Talking Head Generation (2021-2022)

Step Fun · Text-to-Video Generation, Autoregressive Video Generation (2024)

Research Highlights

30B Text-to-Video Generation · Open-source Foundation Model

Step-Video-T2V Technical Report

Technical Report

A state-of-the-art text-to-video model with 30B parameters that generates videos of up to 204 frames through a novel architecture design.

MAGI
Model Arch.
Autoregressive Video Generation · Novel Foundation Architecture

Taming Teacher Forcing for Masked Autoregressive Video Generation

CVPR 2025

A novel frame-level autoregressive video generation framework that combines masked and causal modeling with Complete Teacher Forcing, achieving a 23% improvement in FVD.

TH-PAD
Talking Head Generation · Diffusion Prior

TH-PAD: Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors

ICCV 2023

We introduce a novel framework for one-shot audio-driven talking head generation. Unlike prior works, which require additional driving sources to control synthesis deterministically, we sample all holistic lip-irrelevant facial motions (e.g., pose, expression, blink, and gaze) to semantically match the input audio, while still maintaining both the photo-realism of audio-lip synchronization and overall naturalness.