My main areas of expertise are generative AI and AI-based video understanding.
Current interests
- Agents for video creation (dataset and methods)
- Improving video generation quality
Generative AI
My early works include:
- Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation
- Fast Diffusion Probabilistic Model Sampling through the lens of Backward Error Analysis
- Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model
- GBSD: Generative Bokeh with Stage Diffusion
These works investigate the statistical aspects of style in diffusion models, faster sampling, the generation of motion sequences, and bokeh effects produced with time-varying prompts.
Once SD models became widely available, we turned to applications and focused on customization and consistency methods. We were among the first to investigate inference-time customization, which has since become a major area in the field.
- Retrieving Conditions from Reference Images for Diffusion Models
- Multimodal Agent Collaborations for Video Generation
In these more recent works, my focus has been on generation consistency, video generation, and leveraging agent abilities to generate video data.
Video Understanding
- Feature Combination Meets Attention
- ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization
- SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos
- SoccerNet 2022 Challenges Results
- SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimap
In these works, we processed videos with distributed pipelines, applying transformers, state-of-the-art multimodal models, detection models, and other methods to soccer videos.