RPMArt: Towards Robust Perception and Manipulation for Articulated Objects

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

1Shanghai Jiao Tong University, 2Stanford University, 3Hefei University of Technology

Abstract

Articulated objects are commonly found in daily life. It is essential that robots exhibit robust perception and manipulation skills for articulated objects in real-world robotic applications. However, existing methods for articulated objects insufficiently address noise in point clouds and struggle to bridge the gap between simulation and reality, thus limiting their practical deployment in real-world scenarios. To tackle these challenges, we propose a framework towards Robust Perception and Manipulation for Articulated Objects (RPMArt), which learns to estimate the articulation parameters and manipulate the articulated part from a noisy point cloud. Our primary contribution is a Robust Articulation Network (RoArtNet) that robustly predicts both joint parameters and affordable points via local feature learning and point tuple voting. Moreover, we introduce an articulation-aware classification scheme to enhance its sim-to-real transfer ability. Finally, with the estimated affordable point and the articulation joint constraint, the robot can generate robust actions to manipulate articulated objects. After learning only from synthetic data, RPMArt transfers zero-shot to real-world articulated objects. Experimental results confirm our approach's effectiveness, with our framework achieving state-of-the-art performance in both noise-added simulation and real-world environments.

Video

Framework

  • Robust articulation network
    • Input: single-view point cloud
    • Output: joint parameters and affordable point
  • Affordance-based physics-guided manipulation
    • Affordable grasp pose selection
    • Articulation joint constraint

During training, RoArtNet is supervised with voting targets generated from the part segmentation, joint parameters, and affordable points provided by the simulator. Given a real-world noisy point cloud observation, RoArtNet can still produce robust estimates of joint parameters and affordable points via point tuple voting. Affordable initial grasp poses are then selected from AnyGrasp-generated candidates based on the estimated affordable points, and subsequent actions are constrained by the estimated joint parameters, as sketched below.
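
To make the grasp selection step concrete, here is a minimal sketch of choosing an affordable grasp from AnyGrasp candidates by proximity to the estimated affordable point. The function name and the max_dist threshold are illustrative assumptions, not the paper's implementation.

import numpy as np

def select_affordable_grasp(grasp_poses, grasp_centers, affordable_point, max_dist=0.05):
    """Pick the grasp candidate closest to the estimated affordable point.

    grasp_poses:      (N, 4, 4) homogeneous grasp poses (e.g., from AnyGrasp)
    grasp_centers:    (N, 3) grasp center positions
    affordable_point: (3,) affordable point estimated by RoArtNet
    max_dist:         assumed distance threshold in meters
    """
    dists = np.linalg.norm(grasp_centers - affordable_point, axis=1)
    best = int(np.argmin(dists))
    if dists[best] > max_dist:
        return None  # no candidate lies close enough to the affordable region
    return grasp_poses[best]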

Framework
RoArtNet

Our primary contribution is the robust articulation network, carefully designed for robustness and sim-to-real transfer through local feature learning, point tuple voting, and an articulation-awareness scheme. First, a collection of point tuples is uniformly sampled from the point cloud. For each point tuple, a neural network predicts several voting targets from the tuple's local context features. Further, an articulation score supervises the network so that it becomes aware of the articulation structure. Then, multiple candidates are generated from the predicted voting targets under the one degree-of-freedom ambiguity constraint. The candidate joint origin, joint direction, and affordable point receiving the most votes, counted only over point tuples with high articulation scores, are selected as the final estimates.
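
As a simplified illustration of the vote-counting idea, the sketch below accumulates per-tuple joint-origin proposals in a discretized grid, discards tuples with low articulation scores, and returns the mean of the winning bin. The bin_size and score_thresh values are assumptions, and RoArtNet's actual voting additionally covers the joint direction and the affordable point.

import numpy as np

def vote_joint_origin(proposals, scores, bin_size=0.01, score_thresh=0.5):
    """Hough-style voting over joint-origin proposals from point tuples.

    proposals: (T, 3) one joint-origin candidate per point tuple
    scores:    (T,) articulation scores used to filter out bad tuples
    """
    kept = proposals[scores > score_thresh]
    # Discretize 3D space into voxels and count votes per voxel.
    keys = np.floor(kept / bin_size).astype(np.int64)
    bins, counts = np.unique(keys, axis=0, return_counts=True)
    winner = bins[np.argmax(counts)]
    # Refine by averaging all proposals that fall into the winning voxel.
    mask = np.all(keys == winner, axis=1)
    return kept[mask].mean(axis=0)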

Experimental Results

We evaluate RPMArt in both noise-added simulation and real-world environments. In simulation, we add different levels of noise to the point clouds to test the models' robustness. We also test the sim-to-real transfer ability of RPMArt by directly applying models trained on synthetic data to real-world scenarios.

Simulation Perception

sim_perception_origin_all

Joint origin estimation results

sim_perception_direction_all

Joint direction estimation results

sim_perception_affordance_all

Affordable point estimation results

We gradually add higher levels of noise to the input point clouds and test the joint parameter and affordable point estimation performance. Lower is better. Error bars represent the standard deviation. Results are averaged across six object categories. They show that RoArtNet is robust to noise in the input point clouds.
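
For reference, one simple way to simulate such corruption is per-point Gaussian perturbation, sketched below; the exact noise model used in the experiments may differ, and sigma0 is an assumed per-level scale.

import numpy as np

def add_noise(points, level, sigma0=0.002):
    """Perturb a point cloud with zero-mean Gaussian noise.

    points: (N, 3) input point cloud in meters
    level:  integer noise level; higher means a larger standard deviation
    sigma0: assumed per-level noise scale in meters
    """
    noise = np.random.normal(scale=level * sigma0, size=points.shape)
    return points + noise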


Results for each object category.
Category Joint origin Joint direction Affordable point
Microwave sim_perception_origin_microwave sim_perception_direction_microwave sim_perception_affordance_microwave
Refrigerator sim_perception_origin_refrigerator sim_perception_direction_refrigerator sim_perception_affordance_refrigerator
Safe sim_perception_origin_safe sim_perception_direction_safe sim_perception_affordance_safe
Storage Furniture sim_perception_origin_storagefurniture sim_perception_direction_storagefurniture sim_perception_affordance_storagefurniture
Drawer sim_perception_origin_drawer sim_perception_direction_drawer sim_perception_affordance_drawer
Washing Machine sim_perception_origin_washingmachine sim_perception_direction_washingmachine sim_perception_affordance_washingmachine

Results show that all baselines and RoArtNet achieve high estimation precision across all object categories when no noise is added. However, as the noise level increases, all three baselines exhibit a pronounced increase in estimation errors, while the mean estimation error of RoArtNet grows very slowly. The baselines also show much higher standard deviations than RoArtNet at high noise levels. Some interesting phenomena appear in the results. The StorageFurniture and Drawer categories yield much higher estimation errors for ANCSH and GAMMA than the other categories. This is possibly because these two categories contain relatively small articulation parts, while ANCSH and GAMMA rely on part segmentation to complete the estimation. In addition, both methods contain an optimization-based procedure, i.e., RANSAC transformation estimation for ANCSH and DBSCAN part clustering for GAMMA, which may fail when the part segmentation is inaccurate. It is also interesting that the WashingMachine category partly shows a decreasing error trend at high noise levels.


Simulation Manipulation

We also add different levels of noise to the input point clouds and report manipulation success rates, running around 100 trials per object instance for each task. Higher is better. Results are averaged across six tasks. For reference, manipulation with ground-truth joint parameters and affordable points achieves average success rates of 96.694% for pulling and 99.627% for pushing. This demonstrates that RPMArt can still manipulate articulated objects stably despite inaccurate perception results. We also show twelve example tasks completed by RPMArt under noise level 2: the perception results remain robust to noise yet are not perfect, but RPMArt still manipulates the articulated objects successfully.
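
To illustrate how the estimated joint parameters constrain the actions, the sketch below generates end-effector waypoints along the arc of a revolute joint via Rodrigues' rotation formula; names and parameters here are our own assumptions, not the exact trajectory generator of RPMArt.

import numpy as np

def revolute_waypoints(grasp_pos, joint_origin, joint_dir, delta_angle, steps=10):
    """Rotate the grasp position about the estimated joint axis.

    Returns waypoints on the circular arc that keeps the end effector
    consistent with the articulation constraint while opening the part.
    """
    k = joint_dir / np.linalg.norm(joint_dir)
    waypoints = []
    for t in np.linspace(0.0, delta_angle, steps):
        v = grasp_pos - joint_origin
        # Rodrigues' rotation formula: rotate v by angle t about axis k.
        v_rot = (v * np.cos(t)
                 + np.cross(k, v) * np.sin(t)
                 + k * np.dot(k, v) * (1.0 - np.cos(t)))
        waypoints.append(joint_origin + v_rot)
    return np.asarray(waypoints)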

sim_manipulation_all

Manipulation results


Pull Microwave


Pull Refrigerator


Pull StorageFurniture (Prismatic)


Pull Safe


Pull StorageFurniture (Revolute)


Pull Drawer


Push Microwave


Push Refrigerator


Push StorageFurniture (Prismatic)


Push Safe


Push StorageFurniture (Revolute)


Push Drawer

Results for each task.
Tasks Results Tasks Results
Pull/Push Microwave sim_manipulation_microwave Pull/Push Refrigerator sim_manipulation_refrigerator
Pull/Push Safe sim_manipulation_safe Pull/Push StorageFurniture (Revolute) sim_manipulation_storagefurniture_revolute
Pull/Push StorageFurniture (Prismatic) sim_manipulation_storagefurniture_prismatic Pull/Push Drawer sim_manipulation_drawer

Results show that RPMArt achieves the highest success rate under noise level 4 across almost all tasks except Push Refrigerator, and exhibits the least performance degradation as noise increases. Notably, when no noise is added, RPMArt achieves only comparable or even slightly worse performance than the baselines, especially GAMMA. Some interesting phenomena appear in the results. Under noise level 4, the PointNet++-based method consistently outperforms ANCSH and GAMMA in pushing tasks, while ANCSH and GAMMA often achieve better results in pulling tasks. Note that pulling tasks are generally considered harder than pushing tasks, since pulling requires first grasping the target part, which is unnecessary for pushing. However, ANCSH and GAMMA often obtain higher success rates than the PointNet++-based method when no noise is added. In addition, for Pull/Push StorageFurniture (Prismatic) and Pull/Push Drawer under noise level 4, ANCSH and GAMMA achieve much lower success rates, which may again be attributed to the relatively small parts, as discussed in the simulation perception results.


Real-world Perception

Category Method Error
Orig. (cm) Dir. (°) Afford. (cm)
Microwave PointNet++ 4.495±3.573 9.273±5.828 15.443±4.730
ANCSH 5.103±5.522 9.166±9.557 12.711±7.927
GAMMA 2.531±2.901 9.911±10.671 7.242±10.191
RoArtNet (ours) 3.830±2.372 5.189±3.619 6.754±3.275
Refrigerator PointNet++ 5.210±4.274 9.605±5.340 12.475±9.505
ANCSH 5.938±5.798 7.998±5.910 12.814±13.604
GAMMA 4.019±4.580 8.684±6.455 12.331±9.974
RoArtNet (ours) 2.111±1.701 8.491±4.270 5.849±2.797
Safe PointNet++ 5.985±4.162 5.936±2.861 9.235±5.630
ANCSH 5.167±6.758 7.706±14.275 8.505±9.768
GAMMA 3.179±3.857 8.156±13.737 9.062±9.667
RoArtNet (ours) 4.116±2.428 5.878±2.769 8.349±4.391
StorageFurniture PointNet++ 7.542±4.517 8.776±4.991 10.634±4.025
ANCSH 6.408±4.222 9.612±6.404 5.176±6.023
GAMMA 3.481±2.275 12.672±10.186 4.742±6.664
RoArtNet (ours) 4.604±2.050 9.682±5.449 7.946±3.402
Drawer PointNet++ 8.331±3.377 7.862±5.295 10.227±4.465
ANCSH 13.849±3.764 12.143±8.030 7.725±4.695
GAMMA 5.058±2.360 14.666±6.767 6.967±3.114
RoArtNet (ours) 5.992±3.063 11.315±5.596 7.733±5.246
WashingMachine PointNet++ 8.845±6.795 37.497±20.679 19.972±9.440
ANCSH 5.162±4.918 16.243±12.089 11.541±8.232
GAMMA 6.488±6.178 28.441±14.866 15.964±13.286
RoArtNet (ours) 1.578±1.201 5.600±2.711 3.250±0.669

We present perception results on real-world point clouds of six articulated objects, using models trained only on synthetic data. We collect data by capturing depth images of each object at 5 uniformly selected joint states, each from 20 randomly selected camera views, and discard bad views that fail to capture enough of the target part. The table shows the mean estimation errors and standard deviations with backgrounds and distractors excluded. Lower is better. RoArtNet demonstrates more stable performance than the baselines. However, some performance degradation appears in the StorageFurniture and Drawer categories for RoArtNet, as well as for ANCSH and GAMMA. This can possibly be attributed to the relatively small parts of these two objects, since all three models rely to some extent on part segmentation to complete the estimation. Another noteworthy observation concerns WashingMachine: only RoArtNet estimates the targets accurately, while the three baselines exhibit significantly large estimation errors. A potential reason is that we use a relatively small washing machine toy as the object, so the influence of noisy points is relatively significant.

real_perception_with_with

We also visualize some qualitative results of real-world articulation perception, with both background and distractors included. Color is used only for visualization. Red arrows represent the estimated articulation joints, and blue points represent the estimated affordable points. RoArtNet still generates robust estimates of joint parameters and affordable points even in the presence of unrelated points.


Results under other conditions.
Category Method Error (w/o bkg., w/ distr.)
Orig. (cm) Dir. (°) Afford. (cm)
Microwave PointNet++ 4.221±3.345 12.868±7.452 18.109±6.077
ANCSH 5.537±4.723 9.881±9.128 10.637±9.727
GAMMA 3.617±5.011 12.688±10.575 6.027±6.062
RoArtNet (ours) 3.729±2.093 5.545±3.692 6.680±3.307
Refrigerator PointNet++ 4.438±3.450 8.391±4.060 17.996±9.864
ANCSH 6.539±5.800 11.547±11.638 15.266±25.889
GAMMA 3.727±5.176 13.007±10.917 6.150±6.283
RoArtNet (ours) 2.423±1.826 9.273±4.481 6.289±2.891
Safe PointNet++ 2.946±2.487 7.398±4.105 13.130±6.828
ANCSH 5.582±7.415 5.952±8.616 10.024±11.781
GAMMA 4.255±6.745 8.417±15.101 9.379±11.808
RoArtNet (ours) 4.167±2.404 6.397±3.522 8.467±4.172
StorageFurniture PointNet++ 8.470±3.952 10.673±7.465 12.671±5.034
ANCSH 7.913±4.695 10.727±9.722 6.545±9.330
GAMMA 3.790±3.232 14.266±11.795 5.964±8.158
RoArtNet (ours) 4.683±2.285 9.953±5.314 8.195±3.538
Drawer PointNet++ 8.070±2.994 10.079±8.011 9.893±4.564
ANCSH 13.950±4.327 16.978±10.289 8.993±5.578
GAMMA 5.435±4.522 21.352±10.900 9.292±4.868
RoArtNet (ours) 5.987±2.948 12.260±6.181 7.604±5.094
WashingMachine PointNet++ 7.164±5.784 38.032±7.299 22.339±7.954
ANCSH 5.317±5.283 22.207±11.601 20.071±12.927
GAMMA 10.251±6.197 29.490±14.238 33.499±10.165
RoArtNet (ours) 1.931±1.515 6.610±3.584 3.045±0.815
Category Method Error (w/ bkg., w/o distr.)
Orig. (cm) Dir. (°) Afford. (cm)
Microwave PointNet++ 3.836±3.277 16.282±6.682 29.862±7.500
ANCSH 4.374±3.235 16.242±8.642 10.206±8.468
GAMMA 8.818±6.972 30.027±14.918 10.924±6.667
RoArtNet (ours) 4.090±2.654 8.795±5.443 10.669±4.989
Refrigerator PointNet++ 9.099±6.254 8.776±4.015 37.845±8.664
ANCSH 6.827±6.272 13.108±8.639 40.830±18.239
GAMMA 7.728±4.400 17.593±11.808 36.257±9.741
RoArtNet (ours) 1.759±1.226 13.378±4.940 22.344±13.188
Safe PointNet++ 4.451±4.064 9.227±4.830 27.606±10.407
ANCSH 4.187±3.796 8.189±6.302 22.706±22.378
GAMMA 4.770±4.042 12.588±11.190 28.760±21.317
RoArtNet (ours) 4.080±2.451 9.110±5.211 14.177±11.297
StorageFurniture PointNet++ 8.824±3.337 15.280±7.453 19.263±3.996
ANCSH 9.545±5.131 14.756±8.220 17.906±14.541
GAMMA 7.035±6.011 17.755±12.999 11.537±6.942
RoArtNet (ours) 5.412±2.317 15.231±7.233 16.178±7.671
Drawer PointNet++ 11.941±2.792 15.317±11.009 10.482±4.331
ANCSH 14.548±5.177 25.986±19.512 16.010±9.062
GAMMA 8.278±5.202 24.222±11.627 11.612±5.162
RoArtNet (ours) 5.421±2.640 31.047±14.201 6.975±3.863
WashingMachine PointNet++ 13.173±5.409 28.902±36.464 21.926±7.421
ANCSH 14.284±9.836 35.215±11.183 37.200±15.121
GAMMA 16.715±4.757 31.681±12.746 44.088±17.161
RoArtNet (ours) 2.558±1.548 30.313±6.631 6.937±7.197
Category Method Error (w/ bkg., w/ distr.)
Orig. (cm) Dir. (°) Afford. (cm)
Microwave PointNet++ 3.205±3.742 16.407±6.116 30.523±8.827
ANCSH 4.531±4.053 18.080±9.565 11.850±11.519
GAMMA 8.880±6.624 30.164±14.783 10.646±6.950
RoArtNet (ours) 4.041±2.994 9.096±5.514 11.433±6.030
Refrigerator PointNet++ 8.183±5.680 8.922±4.013 37.193±8.515
ANCSH 7.645±6.023 14.254±8.707 42.693±16.013
GAMMA 8.011±4.291 18.199±12.155 36.599±9.876
RoArtNet (ours) 1.825±1.510 12.609±5.196 28.155±15.367
Safe PointNet++ 4.563±3.826 9.821±5.710 27.856±10.055
ANCSH 4.287±4.162 8.827±7.460 30.179±29.790
GAMMA 4.679±3.051 13.028±10.528 29.171±20.909
RoArtNet (ours) 3.874±3.145 9.723±5.760 16.075±13.660
StorageFurniture PointNet++ 9.131±3.238 15.990±7.914 19.557±4.144
ANCSH 9.035±4.796 15.634±8.659 18.934±16.764
GAMMA 8.079±6.764 19.655±14.928 13.063±8.537
RoArtNet (ours) 5.509±2.312 17.014±12.322 16.960±8.073
Drawer PointNet++ 12.179±2.296 15.733±10.773 10.610±4.371
ANCSH 13.466±3.782 23.667±15.122 16.450±8.489
GAMMA 8.099±5.016 24.720±13.237 12.067±5.979
RoArtNet (ours) 5.644±2.866 24.953±19.676 7.116±3.809
WashingMachine PointNet++ 12.637±5.667 49.705±39.372 21.780±8.179
ANCSH 13.566±9.472 33.461±12.217 37.235±14.823
GAMMA 16.057±4.941 32.291±12.797 41.861±16.963
RoArtNet (ours) 2.673±1.723 30.318±6.537 7.438±6.025

Results show that RoArtNet outperforms the other models even in the presence of unrelated points. This is attributed to our articulation-awareness scheme, which selects good point tuples for voting and filters out tuples drawn from the background and distractors. It is particularly evident when distractors are included without the background, where RoArtNet's performance barely degrades. However, when the background is included, it occupies a large portion of the point cloud, and RoArtNet also shows a performance drop. This is especially pronounced for the WashingMachine category, since the object itself is relatively small.


Real-world Manipulation

Tasks PointNet++ ANCSH GAMMA RPMArt (ours) (each cell: success / half-success / failure out of 10 trials)
Microwave Pull 6/2/2 4/0/6 8/1/1 9/1/0
Push 5/4/1 3/4/3 6/3/1 7/1/2
Refrigerator Pull 2/1/7 1/1/8 3/1/6 7/0/3
Push 0/0/10 1/0/9 2/0/8 8/1/1
Safe Pull 7/0/3 5/2/3 5/1/4 7/0/3
Push 7/0/3 7/1/2 7/1/2 7/1/2
StorageFurniture Pull 1/0/9 3/1/6 2/1/7 4/0/6
Push 2/2/6 6/2/2 2/3/5 5/2/3
Drawer Pull 1/1/8 2/1/7 0/2/8 2/2/6
Push 2/0/8 2/1/7 0/0/10 3/2/5
WashingMachine Pull 0/0/10 0/1/9 0/0/10 3/3/4
Push 0/0/10 0/0/10 0/0/10 1/2/7

We also apply the models to manipulate the real articulated objects, with backgrounds and distractors excluded. We run 10 trials per task and count the numbers of successful, half-successful, and failed trials. Half-successful trials include behaviors such as detaching from the object during pulling or pushing with excessive force. RPMArt outperforms its counterparts, especially on Refrigerator and WashingMachine: the Refrigerator has a glossy surface, and the WashingMachine is relatively small, both of which make depth noise more prominent.

Some half-successful example trials.

Due to inaccurate joint parameter estimation, the robot occasionally detaches from the object during manipulation. And due to incomplete point clouds and inaccurate affordable point estimation, the selected grasp pose may be unsuitable for manipulation.

We demonstrate three consecutive Pull Microwave trials completed by RPMArt, each with a random initial pose of the microwave. The robot successfully pulls the microwave open every time.


We also list demo videos for each task completed successfully by RPMArt. The Pull Microwave video shows a different trial from a closer camera view, since the task is already shown above. These results validate the sim-to-real transfer capability of RPMArt.


A typical failed trial.

Because the affordable point estimate is inaccurate and the captured points on the handle are sparse, the selected grasp pose can land on the base instead of the target part.


BibTeX

If you find it helpful, please consider citing our work:

@article{wang2024rpmart,
  title={RPMArt: Towards Robust Perception and Manipulation for Articulated Objects},
  author={Wang, Junbo and Liu, Wenhai and Yu, Qiaojun and You, Yang and Liu, Liu and Wang, Weiming and Lu, Cewu},
  journal={arXiv preprint arXiv:2403.16023},
  year={2024}
}

If you have further questions, please feel free to drop an email to sjtuwjb3589635689@sjtu.edu.cn.