<p>My main areas of expertise are generative AI and AI-based video understanding.</p>


<h2>Current interests</h2>
<ul>
    <li>Agents for video creation (datasets and methods)</li>
    <li>Improving video generation quality</li>
</ul>

<h2>Generative AI</h2>
<p>My early work includes:</p>
<ul>
    <li>Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation</li>
    <li>Fast Diffusion Probabilistic Model Sampling through the lens of Backward Error Analysis</li>
    <li>Diffusion Motion: Generate Text-Guided 3D Human Motion by Diffusion Model</li>
    <li>GBSD: Generative Bokeh with Stage Diffusion</li>
</ul>
<p>These works investigate the statistical aspects of style in diffusion models, faster sampling, the generation of text-guided motion sequences, and producing bokeh effects with time-varying prompts.</p>
<p>Once SD models became widely available, we focused on customization and consistency methods for applications. We were among the first to investigate inference-time customization, which has since become a major area in the field.</p>
<ul>
    <li>Retrieving Conditions from Reference Images for Diffusion Models</li>
    <li>Multimodal Agent Collaborations for Video Generation</li>
</ul>
<p>In these more recent works, my focus has been on generation consistency, video generation, and leveraging agent abilities to generate video data.</p>

<h2>Video Understanding</h2>
<ul>
    <li>Feature Combination Meets Attention</li>
    <li>ASM-Loc: Action-Aware Segment Modeling for Weakly-Supervised Temporal Action Localization</li>
    <li>SoccerNet-Tracking: Multiple Object Tracking Dataset and Benchmark in Soccer Videos</li>
    <li>SoccerNet 2022 Challenges Result</li>
    <li>SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimap</li>
</ul>
<p>In these works, we processed videos with distributed pipelines. We applied transformers, state-of-the-art multimodal models, and detection models to soccer videos.</p>

<!-- <h2>Current Works</h2>
<ul>
    <li>Video generation: Solving the object rigidity problem. Diffusion UNet (DiT to be investigated) are not good at maintaining structure. VLM agent-driven 3D generation enables easy generation of consistent long videos.</li>
    <li>Structure vs texture: Image and videos can be decomposed as structures and textures. 3D engine-driven videos can provide structural input into video generation. A lot of success has been had with works such as ControlNet, Animate Anyone, StableTryOn. Another possibility is to accelerate the diffusion process when structures are present.</li>
    <li>Agent for GUI workflow: Extracting GUI -> structured -> actions workflows from existing tutorials and make useful workflows. This also ties in with 1 for more powerful applications of 3D engines. (Action models should replace GUI eventually).</li>
</ul> -->