Articulated objects are commonly found in daily life. It is essential that robots exhibit robust perception and manipulation skills for articulated objects in real-world robotic applications. However, existing methods for articulated objects insufficiently address noise in point clouds and struggle to bridge the gap between simulation and reality, thus limiting practical deployment in real-world scenarios. To tackle these challenges, we propose a framework towards Robust Perception and Manipulation for Articulated Objects (RPMArt), which learns to estimate articulation parameters and manipulate articulated parts from noisy point clouds. Our primary contribution is a Robust Articulation Network (RoArtNet) that robustly predicts both joint parameters and affordable points via local feature learning and point tuple voting. Moreover, we introduce an articulation-aware classification scheme to enhance its sim-to-real transfer ability. Finally, with the estimated affordable point and the articulation joint constraint, the robot can generate robust actions to manipulate articulated objects. After learning only from synthetic data, RPMArt transfers zero-shot to real-world articulated objects. Experimental results confirm our approach's effectiveness, with our framework achieving state-of-the-art performance in both noise-added simulation and real-world environments.
During training, voting targets are generated from the part segmentation, joint parameters, and affordable points provided by the simulator to supervise RoArtNet. Given a noisy real-world point cloud observation, RoArtNet can still robustly estimate joint parameters and affordable points via point tuple voting. Affordable initial grasp poses are then selected from AnyGrasp-generated candidates based on the estimated affordable points, and subsequent actions are constrained by the estimated joint parameters.
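The grasp selection step above can be sketched as follows. This is a minimal illustration, not the actual implementation: the function name, the distance threshold, and the flat `(position, score)` representation of AnyGrasp candidates are all assumptions made for clarity.

```python
import numpy as np

def select_affordable_grasp(grasp_positions, grasp_scores, affordable_point,
                            max_dist=0.05):
    """Pick the highest-scoring grasp candidate near the affordable point.

    grasp_positions: (N, 3) candidate grasp centers (e.g. from AnyGrasp)
    grasp_scores:    (N,) candidate quality scores
    affordable_point: (3,) estimated affordable point
    max_dist: radius (m, assumed here) within which a grasp is "affordable"
    Returns the index of the selected grasp candidate.
    """
    dists = np.linalg.norm(grasp_positions - affordable_point, axis=1)
    near = dists <= max_dist
    if not near.any():
        # no candidate close enough: fall back to the nearest one
        return int(np.argmin(dists))
    idx = np.flatnonzero(near)
    return int(idx[np.argmax(grasp_scores[idx])])
```

The key design choice is to filter candidates spatially by the estimated affordable point first, and only then rank by grasp quality, so that high-scoring grasps on irrelevant parts are rejected.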
Our primary contribution is the Robust Articulation Network (RoArtNet), carefully designed for robustness and sim-to-real transfer through local feature learning, point tuple voting, and an articulation awareness scheme. First, a collection of point tuples is uniformly sampled from the point cloud. For each point tuple, a neural network predicts several voting targets from the tuple's local context features. Further, an articulation score supervises the network so that it is aware of the articulation structure. Multiple candidates are then generated from the predicted voting targets under the one-degree-of-freedom ambiguity constraint. Finally, the candidate joint origin, joint direction, and affordable point receiving the most votes, counted only over point tuples with high articulation scores, are selected as the final estimation.
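A much-simplified sketch of the voting idea, for the joint origin only: each high-articulation-score tuple casts a vote, votes are accumulated in a discretized grid, and the densest bin wins. RoArtNet's actual voting parameterization (per-target quantities under the one-DoF ambiguity constraint) differs; the direct per-tuple offset used here is an assumption for illustration.

```python
import numpy as np

def vote_joint_origin(tuple_centers, pred_offsets, art_scores,
                      score_thresh=0.5, bin_size=0.02):
    """Hough-style voting for the joint origin.

    tuple_centers: (N, 3) centers of the sampled point tuples
    pred_offsets:  (N, 3) predicted offsets from center to joint origin
    art_scores:    (N,) predicted articulation scores in [0, 1]
    Only tuples with high articulation score vote; the densest bin wins.
    """
    keep = art_scores >= score_thresh
    votes = tuple_centers[keep] + pred_offsets[keep]   # candidate origins
    bins = np.round(votes / bin_size).astype(np.int64)  # discretize votes
    _, inverse, counts = np.unique(bins, axis=0,
                                   return_inverse=True, return_counts=True)
    best = np.argmax(counts)
    # refine: average the votes that fell into the winning bin
    return votes[inverse == best].mean(axis=0)
```

Because the final estimate is a mode over many independent votes, a noisy or outlier tuple shifts the result far less than it would shift a single global regression, which is the intuition behind RoArtNet's robustness.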
We evaluate RPMArt in both noise-added simulation and real-world environments. In the simulation environment, we add different levels of noise to the point clouds to test the robustness of the models. We also test the sim-to-real transfer ability of RPMArt by directly applying the model trained on synthetic data to real-world scenarios.
Joint origin estimation results
Joint direction estimation results
Affordable point estimation results
We gradually add higher levels of noise to the input point clouds and measure the estimation performance for joint parameters and affordable points. Lower is better. Error bars represent standard deviations. Results are averaged across six object categories, showing that RoArtNet is robust to noise in the input point clouds.
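The noise-injection protocol can be sketched as below. The exact noise model and base standard deviation used in our experiments may differ; isotropic Gaussian perturbation with a level-scaled sigma is an illustrative assumption.

```python
import numpy as np

def add_point_noise(points, noise_level, base_sigma=0.005, seed=None):
    """Perturb each point with isotropic Gaussian noise.

    points: (N, 3) input point cloud
    noise_level: integer level; scales base_sigma (meters, assumed value)
    Level 0 returns an unchanged copy of the cloud.
    """
    sigma = noise_level * base_sigma
    if sigma == 0:
        return points.copy()
    rng = np.random.default_rng(seed)
    return points + rng.normal(0.0, sigma, size=points.shape)
```

Sweeping `noise_level` from 0 upward and re-running estimation on each perturbed cloud reproduces the robustness curves described here.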
Results show that, without added noise, all baselines and RoArtNet achieve high estimation precision across all object categories. Nevertheless, as the noise level increases, all three baselines exhibit a pronounced increase in estimation errors, while the mean estimation error of RoArtNet grows very slowly. The baselines also show much higher standard deviations than RoArtNet under high noise levels.
The results also reveal some interesting phenomena. The StorageFurniture and Drawer categories yield much higher estimation errors for ANCSH and GAMMA than the other categories. This is possibly because these two categories comprise relatively small articulation parts, while ANCSH and GAMMA both rely on part segmentation to complete the estimation. In addition, both methods contain an optimization-based procedure, i.e., RANSAC transformation estimation for ANCSH and DBSCAN part clustering for GAMMA, which may fail when the part segmentation is inaccurate. It is also interesting that the WashingMachine category shows a partly decreasing error trend at high noise levels.
Results for each object category.
Category
Joint origin
Joint direction
Affordable point
Microwave
Refrigerator
Safe
Storage Furniture
Drawer
Washing Machine
We also add different levels of noise to the input point clouds and report the manipulation success rates, running around 100 trials per object instance for each task. Higher is better. Results are averaged across six tasks. We additionally run manipulation experiments using the ground-truth joint parameters and affordable points; the average success rates are 96.694% and 99.627% for pulling and pushing, respectively. This demonstrates that RPMArt can still manipulate articulated objects stably despite inaccurate perception results. We also show the twelve task examples completed by RPMArt under noise level 2; the perception results remain robust to noise yet are not perfect, but RPMArt still successfully manipulates the articulated objects.
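For a revolute joint, the joint-constrained action generation can be sketched as rotating the grasp position about the estimated axis using Rodrigues' rotation formula. This is an illustrative sketch; the actual controller, step sizes, and end-effector orientation handling in our system are not shown.

```python
import numpy as np

def revolute_pull_waypoints(grasp_pos, joint_origin, joint_dir,
                            total_angle=np.deg2rad(45), steps=10):
    """Waypoints that swing the grasped part about the estimated revolute joint.

    grasp_pos:    (3,) initial grasp position on the part
    joint_origin: (3,) estimated joint origin
    joint_dir:    (3,) estimated joint axis direction
    Returns (steps, 3) end-effector positions along the arc.
    """
    k = joint_dir / np.linalg.norm(joint_dir)   # unit rotation axis
    r = grasp_pos - joint_origin                # lever arm from the axis
    waypoints = []
    for i in range(1, steps + 1):
        theta = total_angle * i / steps
        # Rodrigues' rotation formula: rotate r by theta about k
        rot = (r * np.cos(theta)
               + np.cross(k, r) * np.sin(theta)
               + k * np.dot(k, r) * (1.0 - np.cos(theta)))
        waypoints.append(joint_origin + rot)
    return np.array(waypoints)
```

For a prismatic joint, the analogue is simply translating the grasp position along the estimated axis, so accurate joint parameters directly determine whether the motion stays compatible with the part's kinematics.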
Manipulation results
Pull Microwave
Pull Refrigerator
Pull StorageFurniture (Prismatic)
Pull Safe
Pull StorageFurniture (Revolute)
Pull Drawer
Push Microwave
Push Refrigerator
Push StorageFurniture (Prismatic)
Push Safe
Push StorageFurniture (Revolute)
Push Drawer
Results show that RPMArt achieves the highest success rate under noise level 4 across almost all tasks, except Push Refrigerator. RPMArt also exhibits the least performance degradation as noise increases. Notably, when no noise is added, RPMArt achieves only comparable or even slightly worse performance than the baselines, especially GAMMA.
The results also reveal some interesting phenomena. Under noise level 4, the PointNet++-based method consistently outperforms ANCSH and GAMMA in pushing tasks, while ANCSH and GAMMA often obtain better results in pulling tasks. Note that pulling tasks are generally considered harder than pushing tasks, since pulling requires first grasping the target part, which is unnecessary for pushing. However, ANCSH and GAMMA often achieve higher success rates than the PointNet++-based method when no noise is added. In addition, for Pull/Push StorageFurniture (Prismatic) and Pull/Push Drawer under noise level 4, ANCSH and GAMMA achieve much lower success rates, which may again be attributed to these parts being relatively small, as discussed for simulation perception.
Results for each task.
Tasks
Results
Pull/Push Microwave
Pull/Push Refrigerator
Pull/Push Safe
Pull/Push StorageFurniture (Revolute)
Pull/Push StorageFurniture (Prismatic)
Pull/Push Drawer
| Category | Method | Orig. (cm) | Dir. (°) | Afford. (cm) |
|---|---|---|---|---|
| Microwave | PointNet++ | 4.495±3.573 | 9.273±5.828 | 15.443±4.730 |
| | ANCSH | 5.103±5.522 | 9.166±9.557 | 12.711±7.927 |
| | GAMMA | 2.531±2.901 | 9.911±10.671 | 7.242±10.191 |
| | RoArtNet (ours) | 3.830±2.372 | 5.189±3.619 | 6.754±3.275 |
| Refrigerator | PointNet++ | 5.210±4.274 | 9.605±5.340 | 12.475±9.505 |
| | ANCSH | 5.938±5.798 | 7.998±5.910 | 12.814±13.604 |
| | GAMMA | 4.019±4.580 | 8.684±6.455 | 12.331±9.974 |
| | RoArtNet (ours) | 2.111±1.701 | 8.491±4.270 | 5.849±2.797 |
| Safe | PointNet++ | 5.985±4.162 | 5.936±2.861 | 9.235±5.630 |
| | ANCSH | 5.167±6.758 | 7.706±14.275 | 8.505±9.768 |
| | GAMMA | 3.179±3.857 | 8.156±13.737 | 9.062±9.667 |
| | RoArtNet (ours) | 4.116±2.428 | 5.878±2.769 | 8.349±4.391 |
| StorageFurniture | PointNet++ | 7.542±4.517 | 8.776±4.991 | 10.634±4.025 |
| | ANCSH | 6.408±4.222 | 9.612±6.404 | 5.176±6.023 |
| | GAMMA | 3.481±2.275 | 12.672±10.186 | 4.742±6.664 |
| | RoArtNet (ours) | 4.604±2.050 | 9.682±5.449 | 7.946±3.402 |
| Drawer | PointNet++ | 8.331±3.377 | 7.862±5.295 | 10.227±4.465 |
| | ANCSH | 13.849±3.764 | 12.143±8.030 | 7.725±4.695 |
| | GAMMA | 5.058±2.360 | 14.666±6.767 | 6.967±3.114 |
| | RoArtNet (ours) | 5.992±3.063 | 11.315±5.596 | 7.733±5.246 |
| WashingMachine | PointNet++ | 8.845±6.795 | 37.497±20.679 | 19.972±9.440 |
| | ANCSH | 5.162±4.918 | 16.243±12.089 | 11.541±8.232 |
| | GAMMA | 6.488±6.178 | 28.441±14.866 | 15.964±13.286 |
| | RoArtNet (ours) | 1.578±1.201 | 5.600±2.711 | 3.250±0.669 |
We present perception results on real-world point clouds of six articulated objects, using models trained only on synthetic data. We collect data by capturing depth images of each object at 5 uniformly selected joint states, each from 20 randomly selected camera views, discarding bad views that fail to capture enough of the target part. Here we show the mean estimation errors and standard deviations with backgrounds and distractors excluded. Lower is better. RoArtNet demonstrates more stable performance than the baselines. However, some performance degradation is observed in the StorageFurniture and Drawer categories for RoArtNet, as well as for ANCSH and GAMMA. This can possibly be attributed to the relatively small parts of these two objects, since all three models rely to some extent on part segmentation to complete the estimation. Another noteworthy observation concerns WashingMachine: only RoArtNet estimates the targets accurately, while the three baselines exhibit significantly large estimation errors. A potential reason is that we use a relatively small washing machine toy as the object, so the influence of noisy points is relatively significant.
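The error metrics reported in these tables can be computed as sketched below. We assume the common convention of measuring joint origin error as the point-to-line distance to the ground-truth axis (the origin is only defined up to translation along the axis) and direction error as the direction-agnostic angle between axes; the paper's exact metric definitions may differ.

```python
import numpy as np

def joint_errors(pred_origin, pred_dir, gt_origin, gt_dir):
    """Return (origin error, direction error in degrees).

    Origin error: distance from the predicted origin to the ground-truth
    joint line. Direction error: angle between the two axes, ignoring sign.
    """
    d = gt_dir / np.linalg.norm(gt_dir)
    diff = pred_origin - gt_origin
    # remove the component along the axis before taking the norm
    origin_err = np.linalg.norm(diff - np.dot(diff, d) * d)
    cos = abs(np.dot(pred_dir, d)) / np.linalg.norm(pred_dir)
    dir_err = np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
    return origin_err, dir_err
```

Affordable point error, by contrast, is a plain Euclidean distance between the predicted and ground-truth points, since it is a full 3D target.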
We also visualize some qualitative results of real-world articulation perception, with both background and distractors included. Color is used only for visualization here. Red arrows represent the estimated articulation joints, and blue points represent the estimated affordable points. RoArtNet can still robustly estimate joint parameters and affordable points even in the presence of unrelated points.
Results show that RoArtNet outperforms the other models even in the presence of unrelated points. This is attributed to our articulation awareness scheme, which selects good point tuples for voting and filters out tuples from the background and distractors. This is particularly evident when distractors are included without the background, where RoArtNet's performance barely degrades. However, when the background is included, since it occupies a large portion of the point cloud, RoArtNet also shows a performance drop, particularly for the WashingMachine category, whose object is itself relatively small.
Results under other conditions.
Error (w/o bkg., w/ distr.):

| Category | Method | Orig. (cm) | Dir. (°) | Afford. (cm) |
|---|---|---|---|---|
| Microwave | PointNet++ | 4.221±3.345 | 12.868±7.452 | 18.109±6.077 |
| | ANCSH | 5.537±4.723 | 9.881±9.128 | 10.637±9.727 |
| | GAMMA | 3.617±5.011 | 12.688±10.575 | 6.027±6.062 |
| | RoArtNet (ours) | 3.729±2.093 | 5.545±3.692 | 6.680±3.307 |
| Refrigerator | PointNet++ | 4.438±3.450 | 8.391±4.060 | 17.996±9.864 |
| | ANCSH | 6.539±5.800 | 11.547±11.638 | 15.266±25.889 |
| | GAMMA | 3.727±5.176 | 13.007±10.917 | 6.150±6.283 |
| | RoArtNet (ours) | 2.423±1.826 | 9.273±4.481 | 6.289±2.891 |
| Safe | PointNet++ | 2.946±2.487 | 7.398±4.105 | 13.130±6.828 |
| | ANCSH | 5.582±7.415 | 5.952±8.616 | 10.024±11.781 |
| | GAMMA | 4.255±6.745 | 8.417±15.101 | 9.379±11.808 |
| | RoArtNet (ours) | 4.167±2.404 | 6.397±3.522 | 8.467±4.172 |
| StorageFurniture | PointNet++ | 8.470±3.952 | 10.673±7.465 | 12.671±5.034 |
| | ANCSH | 7.913±4.695 | 10.727±9.722 | 6.545±9.330 |
| | GAMMA | 3.790±3.232 | 14.266±11.795 | 5.964±8.158 |
| | RoArtNet (ours) | 4.683±2.285 | 9.953±5.314 | 8.195±3.538 |
| Drawer | PointNet++ | 8.070±2.994 | 10.079±8.011 | 9.893±4.564 |
| | ANCSH | 13.950±4.327 | 16.978±10.289 | 8.993±5.578 |
| | GAMMA | 5.435±4.522 | 21.352±10.900 | 9.292±4.868 |
| | RoArtNet (ours) | 5.987±2.948 | 12.260±6.181 | 7.604±5.094 |
| WashingMachine | PointNet++ | 7.164±5.784 | 38.032±7.299 | 22.339±7.954 |
| | ANCSH | 5.317±5.283 | 22.207±11.601 | 20.071±12.927 |
| | GAMMA | 10.251±6.197 | 29.490±14.238 | 33.499±10.165 |
| | RoArtNet (ours) | 1.931±1.515 | 6.610±3.584 | 3.045±0.815 |
Error (w/ bkg., w/o distr.):

| Category | Method | Orig. (cm) | Dir. (°) | Afford. (cm) |
|---|---|---|---|---|
| Microwave | PointNet++ | 3.836±3.277 | 16.282±6.682 | 29.862±7.500 |
| | ANCSH | 4.374±3.235 | 16.242±8.642 | 10.206±8.468 |
| | GAMMA | 8.818±6.972 | 30.027±14.918 | 10.924±6.667 |
| | RoArtNet (ours) | 4.090±2.654 | 8.795±5.443 | 10.669±4.989 |
| Refrigerator | PointNet++ | 9.099±6.254 | 8.776±4.015 | 37.845±8.664 |
| | ANCSH | 6.827±6.272 | 13.108±8.639 | 40.830±18.239 |
| | GAMMA | 7.728±4.400 | 17.593±11.808 | 36.257±9.741 |
| | RoArtNet (ours) | 1.759±1.226 | 13.378±4.940 | 22.344±13.188 |
| Safe | PointNet++ | 4.451±4.064 | 9.227±4.830 | 27.606±10.407 |
| | ANCSH | 4.187±3.796 | 8.189±6.302 | 22.706±22.378 |
| | GAMMA | 4.770±4.042 | 12.588±11.190 | 28.760±21.317 |
| | RoArtNet (ours) | 4.080±2.451 | 9.110±5.211 | 14.177±11.297 |
| StorageFurniture | PointNet++ | 8.824±3.337 | 15.280±7.453 | 19.263±3.996 |
| | ANCSH | 9.545±5.131 | 14.756±8.220 | 17.906±14.541 |
| | GAMMA | 7.035±6.011 | 17.755±12.999 | 11.537±6.942 |
| | RoArtNet (ours) | 5.412±2.317 | 15.231±7.233 | 16.178±7.671 |
| Drawer | PointNet++ | 11.941±2.792 | 15.317±11.009 | 10.482±4.331 |
| | ANCSH | 14.548±5.177 | 25.986±19.512 | 16.010±9.062 |
| | GAMMA | 8.278±5.202 | 24.222±11.627 | 11.612±5.162 |
| | RoArtNet (ours) | 5.421±2.640 | 31.047±14.201 | 6.975±3.863 |
| WashingMachine | PointNet++ | 13.173±5.409 | 28.902±36.464 | 21.926±7.421 |
| | ANCSH | 14.284±9.836 | 35.215±11.183 | 37.200±15.121 |
| | GAMMA | 16.715±4.757 | 31.681±12.746 | 44.088±17.161 |
| | RoArtNet (ours) | 2.558±1.548 | 30.313±6.631 | 6.937±7.197 |
Error (w/ bkg., w/ distr.):

| Category | Method | Orig. (cm) | Dir. (°) | Afford. (cm) |
|---|---|---|---|---|
| Microwave | PointNet++ | 3.205±3.742 | 16.407±6.116 | 30.523±8.827 |
| | ANCSH | 4.531±4.053 | 18.080±9.565 | 11.850±11.519 |
| | GAMMA | 8.880±6.624 | 30.164±14.783 | 10.646±6.950 |
| | RoArtNet (ours) | 4.041±2.994 | 9.096±5.514 | 11.433±6.030 |
| Refrigerator | PointNet++ | 8.183±5.680 | 8.922±4.013 | 37.193±8.515 |
| | ANCSH | 7.645±6.023 | 14.254±8.707 | 42.693±16.013 |
| | GAMMA | 8.011±4.291 | 18.199±12.155 | 36.599±9.876 |
| | RoArtNet (ours) | 1.825±1.510 | 12.609±5.196 | 28.155±15.367 |
| Safe | PointNet++ | 4.563±3.826 | 9.821±5.710 | 27.856±10.055 |
| | ANCSH | 4.287±4.162 | 8.827±7.460 | 30.179±29.790 |
| | GAMMA | 4.679±3.051 | 13.028±10.528 | 29.171±20.909 |
| | RoArtNet (ours) | 3.874±3.145 | 9.723±5.760 | 16.075±13.660 |
| StorageFurniture | PointNet++ | 9.131±3.238 | 15.990±7.914 | 19.557±4.144 |
| | ANCSH | 9.035±4.796 | 15.634±8.659 | 18.934±16.764 |
| | GAMMA | 8.079±6.764 | 19.655±14.928 | 13.063±8.537 |
| | RoArtNet (ours) | 5.509±2.312 | 17.014±12.322 | 16.960±8.073 |
| Drawer | PointNet++ | 12.179±2.296 | 15.733±10.773 | 10.610±4.371 |
| | ANCSH | 13.466±3.782 | 23.667±15.122 | 16.450±8.489 |
| | GAMMA | 8.099±5.016 | 24.720±13.237 | 12.067±5.979 |
| | RoArtNet (ours) | 5.644±2.866 | 24.953±19.676 | 7.116±3.809 |
| WashingMachine | PointNet++ | 12.637±5.667 | 49.705±39.372 | 21.780±8.179 |
| | ANCSH | 13.566±9.472 | 33.461±12.217 | 37.235±14.823 |
| | GAMMA | 16.057±4.941 | 32.291±12.797 | 41.861±16.963 |
| | RoArtNet (ours) | 2.673±1.723 | 30.318±6.537 | 7.438±6.025 |
| Task | Action | PointNet++ | ANCSH | GAMMA | RPMArt (ours) |
|---|---|---|---|---|---|
| Microwave | Pull | 6/2/2 | 4/0/6 | 8/1/1 | 9/1/0 |
| | Push | 5/4/1 | 3/4/3 | 6/3/1 | 7/1/2 |
| Refrigerator | Pull | 2/1/7 | 1/1/8 | 3/1/6 | 7/0/3 |
| | Push | 0/0/10 | 1/0/9 | 2/0/8 | 8/1/1 |
| Safe | Pull | 7/0/3 | 5/2/3 | 5/1/4 | 7/0/3 |
| | Push | 7/0/3 | 7/1/2 | 7/1/2 | 7/1/2 |
| StorageFurniture | Pull | 1/0/9 | 3/1/6 | 2/1/7 | 4/0/6 |
| | Push | 2/2/6 | 6/2/2 | 2/3/5 | 5/2/3 |
| Drawer | Pull | 1/1/8 | 2/1/7 | 0/2/8 | 2/2/6 |
| | Push | 2/0/8 | 2/1/7 | 0/0/10 | 3/2/5 |
| WashingMachine | Pull | 0/0/10 | 0/1/9 | 0/0/10 | 3/3/4 |
| | Push | 0/0/10 | 0/0/10 | 0/0/10 | 1/2/7 |
We also apply the models to manipulate real articulated objects, with backgrounds and distractors excluded. We run 10 trials for each task and count the numbers of successful, half-successful, and failed trials (reported as success/half-success/failure). Half-successful trials include behaviors such as detaching during pulling and pushing forcefully. RPMArt outperforms its counterparts, especially for Refrigerator and WashingMachine: Refrigerator has a glossy surface, while WashingMachine is relatively small, which makes noise more prominent.
Due to inaccurate joint parameter estimation, the robot occasionally detaches from the object during manipulation. Likewise, due to incomplete point clouds and inaccurate affordable point estimation, the selected grasp pose may be unsuitable for manipulation.
Some half-successful example trials.
We demonstrate three consecutive Pull Microwave tasks completed by RPMArt, each with a random initial pose of the microwave. The robot successfully pulls the microwave open each time.
We also list the demo videos for each task completed successfully by RPMArt. The Pull Microwave video shows another trial from a closer camera view, since this task is already shown above. These results validate the sim-to-real transfer capability of RPMArt.
Since the affordable point estimation is inaccurate and few points of the handle are captured, the selected grasp pose can land on the base instead of the target part.
A typical failed trial.
If you find it helpful, please consider citing our work:
@article{wang2024rpmart,
title={RPMArt: Towards Robust Perception and Manipulation for Articulated Objects},
author={Wang, Junbo and Liu, Wenhai and Yu, Qiaojun and You, Yang and Liu, Liu and Wang, Weiming and Lu, Cewu},
journal={arXiv preprint arXiv:2403.16023},
year={2024}
}
If you have further questions, please feel free to drop an email to sjtuwjb3589635689@sjtu.edu.cn.