QUEST | ICML 2026

Abstract

Reinforcement learning from demonstrations (RLfD) is a promising approach to robotic manipulation with sparse rewards. However, with only a few demonstrations, agents often reach out-of-distribution states where the world model predicts poorly. In multi-stage tasks, jointly optimizing a learned reward function and a policy creates a moving-target problem, and the resulting non-stationarity amplifies the effect of uncertainty on policy learning.

We propose QUEST, a model-based RL framework that adaptively switches between exploration and exploitation guided by uncertainty to achieve stable and efficient learning. QUEST employs intrinsic rewards to capture environmental stochasticity, leverages ensemble dynamics for uncertainty-guided planning, and introduces a hybrid sampling strategy to prioritize rare successful stage transitions. Experiments show that QUEST outperforms state-of-the-art methods by 17% on average, with gains increasing to 60% on difficult tasks, and enables zero-shot sim-to-real transfer on three real-world tasks.

Method Overview Figure

Figure: Overview of the QUEST framework for uncertainty-guided exploration, stable model-based planning, and sparse-reward manipulation.

Key Contributions

QUEST, a framework that adaptively switches between exploration and conservative planning under uncertainty for sparse-reward manipulation from limited demonstrations.
Intrinsic rewards based on Random Network Distillation (RND), applied to Q-function updates, improving exploration while keeping the world model aligned with the true environment.
Ensemble dynamics for uncertainty quantification, combined with a hybrid sampling strategy that prioritizes rare successful stage transitions.
Evaluation on 16 visual multi-stage manipulation tasks with only 10 expert demonstrations, plus zero-shot transfer to real-world robot tasks.

Method

Intrinsic Reward Learning

QUEST computes RND intrinsic rewards for Q-function updates, encouraging exploration in novel states without adding intrinsic rewards directly to world-model learning.
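The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the QUEST implementation: linear maps stand in for the neural networks, and the class name `RND`, the helper `q_target`, and the weight `beta` are hypothetical names introduced here. The point it shows is that the novelty bonus enters only the Q-function target, while the reward used for world-model learning stays extrinsic.

```python
import numpy as np


class RND:
    """Random Network Distillation with linear networks (illustrative only)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network; it is never trained.
        self.target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network, trained to imitate the target on visited states.
        self.predictor = np.zeros((obs_dim, feat_dim))
        self.lr = lr

    def intrinsic_reward(self, obs):
        # Novelty = mean squared error between predictor and target features.
        err = obs @ self.predictor - obs @ self.target
        return np.mean(err ** 2, axis=-1)

    def update(self, obs):
        # One gradient step on the mean squared prediction error.
        err = obs @ self.predictor - obs @ self.target
        grad = 2.0 * obs.T @ err / err.size
        self.predictor -= self.lr * grad


def q_target(r_ext, r_int, next_q, gamma=0.99, beta=0.1):
    # The intrinsic bonus is added only when forming the Q-function target.
    # A world model's reward head would still be trained on r_ext alone.
    return r_ext + beta * r_int + gamma * next_q


# Usage: visited states live in a 4-D subspace of an 8-D observation space.
rng = np.random.default_rng(1)
seen = rng.normal(size=(64, 8))
seen[:, 4:] = 0.0
novel = rng.normal(size=(64, 8))  # also excites the unvisited dimensions

rnd = RND(obs_dim=8)
bonus_before = rnd.intrinsic_reward(seen).mean()
for _ in range(300):
    rnd.update(seen)
bonus_seen = rnd.intrinsic_reward(seen).mean()
bonus_novel = rnd.intrinsic_reward(novel).mean()
```

After training, the bonus on visited states shrinks toward zero while states that excite unvisited directions keep a large bonus, which is what drives exploration through the Q-function.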

Uncertainty-Aware Planning

Ensemble dynamics quantify model uncertainty, allowing planning to adaptively switch between exploration and conservative behavior in out-of-distribution regions.
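As a rough sketch of how ensemble disagreement can shape planning, the snippet below scores candidate action sequences with a signed uncertainty term. Everything here is an assumption for illustration: the function names, the toy dynamics models, and the use of a single coefficient `coef` whose sign selects exploration or conservative behavior. The rule QUEST uses to switch between the two modes is not described in this summary and is not reproduced here.

```python
import numpy as np


def ensemble_disagreement(models, state, action):
    """Mean prediction and std-dev across ensemble members (averaged over dims)."""
    preds = np.stack([m(state, action) for m in models])  # (K, state_dim)
    return preds.mean(axis=0), preds.std(axis=0).mean()


def score_plan(models, reward_fn, state, actions, coef, gamma=0.99):
    """Discounted return of an action sequence, shaped by model uncertainty.

    coef > 0 rewards uncertain rollouts (exploration);
    coef < 0 penalises them (conservative planning).
    """
    total, discount = 0.0, 1.0
    for a in actions:
        next_state, unc = ensemble_disagreement(models, state, a)
        total += discount * (reward_fn(state, a) + coef * unc)
        state, discount = next_state, discount * gamma
    return total


def choose_plan(models, reward_fn, state, candidates, coef):
    """Index of the candidate action sequence with the highest shaped score."""
    scores = [score_plan(models, reward_fn, state, acts, coef) for acts in candidates]
    return int(np.argmax(scores))


# Toy ensemble: members agree for small actions and diverge for large ones.
def make_model(k):
    return lambda s, a: s + a + 0.1 * k * a ** 2


models = [make_model(k) for k in range(5)]
sparse_reward = lambda s, a: 0.0  # no extrinsic signal along either plan

start = np.zeros(2)
cautious = [np.full(2, 0.1)] * 3  # stays where the ensemble agrees
bold = [np.full(2, 1.0)] * 3      # moves where the ensemble disagrees

explore_choice = choose_plan(models, sparse_reward, start, [cautious, bold], coef=1.0)
conservative_choice = choose_plan(models, sparse_reward, start, [cautious, bold], coef=-1.0)
```

With a positive coefficient the planner prefers the high-disagreement plan, and with a negative coefficient it prefers the plan that stays where the ensemble agrees.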

Hybrid Replay Sampling

The replay pipeline increases the influence of rare successful transitions, helping multi-stage policies preserve progress across long horizons.
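One simple way to raise the influence of rare transitions is to reserve part of each batch for them and fill the rest uniformly. The sketch below assumes that scheme; the function name `hybrid_sample`, the boolean `stage_success` flags, and the fixed `success_frac` are hypothetical, and the actual QUEST sampling rule may differ.

```python
import numpy as np


def hybrid_sample(stage_success, batch_size, success_frac=0.25, rng=None):
    """Return replay-buffer indices for one training batch.

    A fixed share of the batch is drawn (with replacement) from transitions
    flagged as successful stage transitions; the rest is drawn uniformly
    from the whole buffer.
    """
    if rng is None:
        rng = np.random.default_rng()
    stage_success = np.asarray(stage_success, dtype=bool)
    success_idx = np.flatnonzero(stage_success)

    # Fall back to purely uniform sampling while no success has been stored.
    n_success = int(round(batch_size * success_frac)) if success_idx.size else 0

    if n_success:
        prioritized = rng.choice(success_idx, size=n_success, replace=True)
    else:
        prioritized = np.empty(0, dtype=int)
    uniform = rng.integers(0, stage_success.size, size=batch_size - n_success)
    return np.concatenate([prioritized, uniform])


# Usage: only 2 of 1000 stored transitions completed a stage.
flags = np.zeros(1000, dtype=bool)
flags[[3, 500]] = True
batch = hybrid_sample(flags, batch_size=64, rng=np.random.default_rng(0))
```

Under uniform sampling these two transitions would appear in a 64-sample batch only about once every eight batches; here they fill a quarter of every batch.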

Benchmark Results

QUEST is evaluated on 16 visual multi-stage sparse-reward tasks: 5 manipulation tasks and 2 humanoid tasks from ManiSkill3, 5 tasks from Meta-World, and 4 tasks from Robosuite, using only 10 expert demonstrations per task.

Figure: Average success rates across benchmark suites.

Figure: Learning curves averaged across benchmark tasks.

Ablation Study

Ablations isolate the contribution of intrinsic rewards, adaptive uncertainty-guided planning, hybrid sampling, uncertainty penalty strength, and the number of demonstrations.

Figure: Ablation analysis for the major QUEST components.

Figure: Additional ablations on the number of demonstrations and the uncertainty penalty strength.

Real-World Experiments

QUEST transfers zero-shot to a real Franka Research 3 robot on Pick Cube, Stack Cube, and Lift Peg Upright. The sim-to-real setup uses hand-eye calibration and real background textures to align simulation observations with the deployment scene.

Figure: Real-world setup, background scene, and simulator.

Videos

Poke Cube

Pull Cube Tool

Place Sphere

Citation

@inproceedings{sun2026quest,
  title     = {Uncertainty-Guided Exploration and Stable Planning for Sparse-Reward Manipulation from Limited Demonstrations},
  author    = {Sun, Haowen and Huang, Liqi and Li, Mingyang and Ren, Sihua and Chen, Xinzhe and Ma, Chengzhong and Liu, Zeyang and Chen, Xingyu and Lan, Xuguang},
  booktitle = {International Conference on Machine Learning},
  year      = {2026}
}

Acknowledgements / Contact

We thank the project collaborators and reviewers for their feedback. For questions about QUEST, please contact the authors.