type: Page
status: Published
date: May 18, 2022
slug: about
summary: Accessible at /about; not shown in the menu bar
tags:
category:
icon:
password:

Research Interests
My current research interests lie at the intersection of reinforcement learning and robotics. My long-term goal is to build robots capable of sustainable, lifelong learning, allowing them to operate reliably in the real world, collaborate naturally with humans, and provide meaningful assistance in everyday environments.
Research Experience


Undergraduate Researcher, AMI Group, Nanyang Technological University
Feb 2026 - Present
Advisor: Bo An
Topic: Agentic Reinforcement Learning


Undergraduate Researcher, ICON Lab (BAIR), UC Berkeley
May 2025 - Present
Advisor: Negar Mehr
Topic: Multi-Agent Systems, Continual Robot Learning


Undergraduate Researcher, LAMDA, Nanjing University
Jan 2024 – Present
Advisor: De-Chuan Zhan
Topic: Reinforcement Learning, World Model
Publications
- Hongrui Zhao, Xunlan Zhou, Boris Ivanovic, Negar Mehr.

We present UDON, a real-time multi-agent neural implicit mapping framework that introduces a novel combination of uncertainty-weighted consensus optimization and edge-based updates to achieve high mapping quality under severe communication deterioration. The uncertainty weighting prioritizes more reliable portions of the map, while the edge-based framework isolates and penalizes mapping disagreement between individual pairs of communicating agents.
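As a rough illustration of the idea above, the sketch below implements an uncertainty-weighted consensus step between the map parameters of two communicating agents. The function name, the inverse-variance weighting, and the step rule are illustrative assumptions, not UDON's exact formulation.

```python
import numpy as np

def consensus_step(theta_i, theta_j, var_i, var_j, rho=0.5):
    """One uncertainty-weighted consensus step on a single edge (i, j).

    theta_*: local map parameters of agents i and j
    var_*:   per-parameter uncertainty (variance) estimates
    rho:     step size of the pairwise (edge-based) disagreement penalty
    """
    # Inverse-variance weights: more reliable parameters dominate the target.
    w_i = 1.0 / (var_i + 1e-8)
    w_j = 1.0 / (var_j + 1e-8)
    target = (w_i * theta_i + w_j * theta_j) / (w_i + w_j)
    # Each agent is pulled toward the consensus target on this edge only.
    return theta_i - rho * (theta_i - target), theta_j - rho * (theta_j - target)

# Agent i is confident about dimension 0, agent j about dimension 1.
theta_i, theta_j = np.array([1.0, 0.0]), np.array([0.0, 1.0])
var_i, var_j = np.array([0.01, 1.0]), np.array([1.0, 0.01])
ni, nj = consensus_step(theta_i, theta_j, var_i, var_j)
```

Each agent's confident dimension barely moves, while its uncertain dimension is pulled strongly toward the other agent's reliable estimate.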
- Shaowei Zhang, Jiahan Cao, Dian Cheng, Xunlan Zhou, Shenghua Wan, Le Gan, De-Chuan Zhan.
Leveraging Conditional Dependence for Efficient World Model Denoising. In: Advances in Neural Information Processing Systems 38 (NeurIPS-2025), San Diego, California, USA, 2025. [Paper] [Code]

We introduce CsDreamer, a model-based RL approach built upon the Collider-Structure Recurrent State-Space Model (CsRSSM) world model. CsRSSM incorporates colliders to comprehensively model the denoising inference process and explicitly capture the conditional dependence. It further employs a decoupling regularization to balance the influence of this conditional dependence. By accurately inferring a task-relevant state space, CsDreamer improves learning efficiency during rollouts.
Preprints
- Xunlan Zhou, Xuanlin Chen, Shaowei Zhang, Xiankun Li, Shenghua Wan, Xiaohai Hu, Lei Yuan, Le Gan, De-Chuan Zhan.
MARVL: Multi-Stage Guidance of Reinforcement Learning via Fine-Tuned Visual Language Models. [Paper] [Code]

Vision–language models (VLMs) offer a promising route to zero-shot reward design, but naïve CLIP-based rewards often misalign with task progress and yield spurious matches. We present MARVL (Multi-stAge guidance of Reinforcement Learning via fine-tuned Visual Language models), which fine-tunes a VLM for spatial grounding and trajectory sensitivity, and decomposes tasks into multi-stage subtasks rewarded by task-direction-projected similarity. On the Meta-World benchmark, MARVL significantly improves success rate, sample efficiency, and camera robustness over VLM-reward baselines, with the largest gains on semantically well-specified manipulation tasks.
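To make the multi-stage similarity reward concrete, here is a toy sketch in the same spirit. The `staged_reward` function, the fixed embeddings, and the threshold-based stage advance are hypothetical stand-ins for the fine-tuned VLM and the paper's actual reward.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def staged_reward(obs_emb, stage_embs, stage, threshold=0.9):
    """Reward similarity to the current subtask's text embedding; advance
    the stage once similarity exceeds a threshold, so the reward follows
    task progress rather than one global goal description."""
    sim = cosine(obs_emb, stage_embs[stage])
    if sim > threshold and stage < len(stage_embs) - 1:
        stage += 1  # subtask considered complete: guide toward the next one
    return sim, stage

# Two subtask embeddings (e.g. "reach the handle", then "pull the drawer").
stage_embs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
sim, stage = staged_reward(np.array([0.99, 0.1]), stage_embs, stage=0)
```

Once the observation embedding closely matches the first subtask, guidance shifts to the next one instead of rewarding the same match forever.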
- Xunlan Zhou, Hongrui Zhao, Negar Mehr.
TACO: Temporal Consensus Optimization for Continual Neural Mapping. [Paper] [Code]

We introduce TACO, a Temporal Consensus Optimization framework for continual neural mapping. By incorporating uncertainty awareness into the consensus mechanism, TACO naturally modulates the influence of past knowledge based on its reliability, avoiding both catastrophic forgetting and over-conservative preservation of outdated maps. To the best of our knowledge, this work presents the first formulation of continual neural mapping as a temporal consensus optimization problem.
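A minimal sketch of how an uncertainty-aware temporal consensus penalty could look follows; the quadratic penalty and inverse-variance weighting here are illustrative assumptions, not TACO's exact objective.

```python
import numpy as np

def temporal_consensus_update(theta, theta_past, var_past, lr=0.005, lam=1.0):
    """One gradient step on an uncertainty-weighted consensus penalty
    lam * w * (theta - theta_past)^2 with w = 1 / var_past: reliable past
    knowledge pulls hard, while uncertain, outdated regions barely do."""
    w = 1.0 / (var_past + 1e-8)
    grad = lam * w * (theta - theta_past)
    return theta - lr * grad

theta = np.array([1.0, 1.0])        # current map parameters
theta_past = np.array([0.0, 0.0])   # past (consensus) snapshot
var_past = np.array([0.01, 10.0])   # dim 0 reliable, dim 1 uncertain
new = temporal_consensus_update(theta, theta_past, var_past)
```

The reliable dimension is pulled strongly back toward the past map (avoiding forgetting), while the uncertain dimension is left nearly free to change (avoiding over-conservative preservation).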
Unsupervised Reinforcement Learning (URL) allows agents to develop general behaviors and representations without relying on external rewards. This survey provides a clear overview of the field, organizing recent advances through a unified framework along four axes: learning paradigm, optimization objective, data regime, and agent quantity. We examine paradigms ranging from exploration driven by intrinsic rewards to competence-based skill discovery. We classify optimization objectives, including entropy maximization, mutual-information maximization, and self-supervised consistency. We also review data regimes, from online exploration from scratch to offline datasets. By tracing the field's development from early curiosity-based methods to recent foundation-model-driven pretraining, we highlight emerging trends toward data-driven, scalable designs. Finally, we discuss key challenges, such as long-horizon exploration, sim-to-real skill transfer, and open-ended learning, and outline a path for future research on autonomous agents.
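As one concrete member of the entropy-maximization family surveyed above, a particle-based intrinsic reward can score a state by its distance to the k-th nearest neighbor among recently visited states; the exact form below is illustrative, not any single method's definition.

```python
import numpy as np

def knn_intrinsic_reward(state, memory, k=3):
    """Entropy-style intrinsic reward: distance to the k-th nearest
    neighbor among recently visited states (larger = more novel)."""
    dists = np.sort(np.linalg.norm(memory - state, axis=1))
    return float(np.log(1.0 + dists[min(k, len(dists) - 1)]))

memory = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
r_dense = knn_intrinsic_reward(np.array([0.05, 0.05]), memory)  # well-visited region
r_novel = knn_intrinsic_reward(np.array([5.0, 5.0]), memory)    # far from everything
```

States in sparsely visited regions earn larger rewards, driving the agent to spread its state distribution and raise its entropy.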
We propose Building Reusability via Interface Composition Kinetics for Structured World Models (BRICKS-WM), a framework for the modular assembly of structured world models. We hypothesize that global dynamics can be decomposed into distinct subsystems interacting via shared protocols. As a minimal instantiation of this framework, we factorize the latent state space into an actuated Agent module and an external Background module, bridged by a learned latent interface. Distinct from prior object-centric methods that prioritize visual segmentation, BRICKS-WM enforces a functional separation in transition dynamics, ensuring that background physics remains agnostic to the agent's embodiment.
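The factorized transition described above can be caricatured as follows; the linear maps, tanh nonlinearities, and dimensions are all placeholders, and the only point is structural: the Background module never sees the agent's state directly, only the shared interface vector.

```python
import numpy as np

rng = np.random.default_rng(0)
d_a, d_b, d_i = 4, 6, 2                   # agent, background, interface sizes
W_a = rng.normal(size=(d_a, d_a + d_i))   # Agent dynamics
W_b = rng.normal(size=(d_b, d_b + d_i))   # Background dynamics
W_ia = rng.normal(size=(d_i, d_a))        # agent -> interface
W_ib = rng.normal(size=(d_i, d_b))        # background -> interface

def step(z_a, z_b):
    # The interface is the only channel between the two modules.
    iface = np.tanh(W_ia @ z_a + W_ib @ z_b)
    z_a_next = np.tanh(W_a @ np.concatenate([z_a, iface]))
    # Background never consumes z_a directly, so its dynamics stay
    # agnostic to the agent's embodiment.
    z_b_next = np.tanh(W_b @ np.concatenate([z_b, iface]))
    return z_a_next, z_b_next

za2, zb2 = step(rng.normal(size=d_a), rng.normal(size=d_b))
```

Swapping in a different agent module then only requires retraining the agent-side maps, since the background transition is defined purely over its own latent and the interface protocol.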