Introduction


In this work, we study the problem of modeling and understanding the structure of motion for arbitrary categories of deformable objects without prior knowledge. Due to the inherent low-dimensionality of motion structures, we learn to discover a low-dimensional pose space for dynamic objects by encoding them into a set of blobs. This representation captures interpretable structures for a diverse range of object categories, enabling intuitive object pose manipulation through explicit blob editing.
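To make the blob representation concrete, here is a minimal sketch of how such a blob could be parameterized; the `Blob` class, its fields, and the Gaussian falloff are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

# Hypothetical minimal blob parameterization: each blob is an anisotropic
# 3D Gaussian with an interpretable center, per-axis scales, a rotation,
# and a latent feature vector (all names and fields are illustrative).
class Blob:
    def __init__(self, center, scales, rotation, feature):
        self.center = np.asarray(center, dtype=float)      # (3,) position
        self.scales = np.asarray(scales, dtype=float)      # (3,) axis extents
        self.rotation = np.asarray(rotation, dtype=float)  # (3, 3) rotation matrix
        self.feature = np.asarray(feature, dtype=float)    # (d,) latent feature

    def density(self, points):
        """Evaluate the blob's Gaussian falloff at query points of shape (N, 3)."""
        # Transform points into the blob's local, axis-aligned frame.
        local = (np.asarray(points) - self.center) @ self.rotation
        # Squared Mahalanobis distance under the per-axis scales.
        d2 = np.sum((local / self.scales) ** 2, axis=-1)
        return np.exp(-0.5 * d2)

# A pose edit is then just a change to the interpretable parameters:
blob = Blob(center=[0.0, 0.0, 0.0], scales=[1.0, 0.5, 0.5],
            rotation=np.eye(3), feature=np.zeros(8))
blob.center += np.array([0.3, 0.0, 0.0])  # translate one "part" of the object
```

Because the parameters are explicit (position, size, orientation), moving or rotating a blob corresponds directly to an intuitive edit of the underlying object part.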

Try animating with blobs yourself!


Our architecture learns to disentangle object parts and represent them as blobs in a feed-forward and fully self-supervised manner. The blobs can then be used for editing the pose and shape of the objects.
Try out this interactive 3D demo: select an object (Quadruped, Fish, or Glasses) and drag the sliders!
The demo walks through four stages: (1) the input mesh, (2) the predicted blobs, (3) the edited blobs, and (4) the edited mesh. Left-click and drag to rotate, right-click and drag to move, and scroll to zoom.

Animating "Clay-Monster"


We capture a real-world category called "clay-monster", with 12 clay figures scanned in 3 to 5 poses each using only an iPhone.
Using this simple scanning pipeline, we are able to train our model for pose manipulation of such artificial "clay-monsters".

Approach


We represent the pose space of deformable objects with a set of feature-embedded blobs. An encoder takes an object point cloud as input and maps it into blobs using a learnable codebook of query tokens that cross-attend to semantic point-wise features. Once generated, these blobs can be edited by users to adjust the object's pose. To decode the edited blobs, we voxelize them into a feature volume and map it to a 3D occupancy volume with a transformer architecture. Finally, we query the decoded volume at sampled 3D coordinates to predict occupancy values, from which the edited mesh is extracted.
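The encode-edit-decode flow above can be sketched schematically as follows. This is a toy NumPy stand-in with random, untrained weights: the cross-attention, the linear head that produces blob centers, and the voxel splatting that replaces the transformer decoder are all simplified assumptions chosen to mirror the pipeline's structure, not the model itself.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K, d = 1024, 32, 8, 16   # points, point-feature dim, #blobs, blob-feature dim

points_feat = rng.standard_normal((N, D))   # per-point semantic features
queries = rng.standard_normal((K, D))       # learnable codebook of query tokens

# 1) Encode: query tokens cross-attend over point features, one token per blob.
attn = queries @ points_feat.T / np.sqrt(D)            # (K, N) attention logits
attn = np.exp(attn - attn.max(axis=1, keepdims=True))  # softmax over points
attn /= attn.sum(axis=1, keepdims=True)
blob_tokens = attn @ points_feat                       # (K, D) blob tokens

# 2) Map blob tokens to explicit, editable parameters (centers only, for brevity).
W_center = rng.standard_normal((D, 3)) * 0.01
centers = blob_tokens @ W_center                       # (K, 3) blob positions

# 3) Edit: the user moves a blob directly.
centers[0] += np.array([0.2, 0.0, 0.0])

# 4) Decode: splat blob features into a coarse voxel grid (a crude stand-in
#    for the transformer decoder), then query occupancy at sampled 3D points.
R = 16
grid = np.zeros((R, R, R, d))
blob_feats = blob_tokens[:, :d]
idx = np.clip(((centers + 1.0) * 0.5 * (R - 1)).astype(int), 0, R - 1)
for k in range(K):
    grid[tuple(idx[k])] += blob_feats[k]

coords = rng.uniform(-1.0, 1.0, size=(5, 3))           # sampled query coordinates
q = np.clip(((coords + 1.0) * 0.5 * (R - 1)).astype(int), 0, R - 1)
w_occ = rng.standard_normal(d) * 0.1
occupancy = 1.0 / (1.0 + np.exp(-(grid[q[:, 0], q[:, 1], q[:, 2]] @ w_occ)))
```

The key property this sketch preserves is that editing happens purely in the blob parameter space (step 3), while encoding and decoding stay fixed, so any user edit to a blob propagates to the reconstructed occupancy.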

Miscellaneous



Bibtex


@article{he2025canor,
  title={Category-Agnostic Neural Object Rigging},
  author={Guangzhao He and Chen Geng and Shangzhe Wu and Jiajun Wu},
  journal={arXiv preprint arXiv:2505.20283},
  year={2025}
}


Acknowledgements