RPMArt: Towards Robust Perception and Manipulation for Articulated Objects

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

Junbo Wang¹, Wenhai Liu¹, Qiaojun Yu¹, Yang You², Liu Liu³, Weiming Wang¹, Cewu Lu¹

¹Shanghai Jiao Tong University, ²Stanford University, ³Hefei University of Technology

Abstract

Articulated objects are commonly found in daily life. It is essential that robots can exhibit robust perception and manipulation skills for articulated objects in real-world robotic applications. However, existing methods for articulated objects insufficiently address noise in point clouds and struggle to bridge the gap between simulation and reality, thus limiting the practical deployment in real-world scenarios. To tackle these challenges, we propose a framework towards Robust Perception and Manipulation for Articulated Objects (RPMArt), which learns to estimate the articulation parameters and manipulate the articulation part from the noisy point cloud. Our primary contribution is a Robust Articulation Network (RoArtNet) that is able to predict both joint parameters and affordable points robustly by local feature learning and point tuple voting. Moreover, we introduce an articulation-aware classification scheme to enhance its ability for sim-to-real transfer. Finally, with the estimated affordable point and articulation joint constraint, the robot can generate robust actions to manipulate articulated objects. After learning only from synthetic data, RPMArt is able to transfer zero-shot to real-world articulated objects. Experimental results confirm our approach's effectiveness, with our framework achieving state-of-the-art performance in both noise-added simulation and real-world environments.

Robust articulation network

Input: single-view point cloud
Output: joint parameters and affordable point

Affordance-based physics-guided manipulation

Affordable grasp pose selection
Articulation joint constraint

During training, several voting targets are generated by part segmentation, joint parameters and affordable points from the simulator to supervise RoArtNet. When given the real-world noisy point cloud observation, RoArtNet can still generate robust joint parameters and affordable points estimation by point tuple voting. Then, affordable initial grasp poses can be selected from AnyGrasp-generated grasp poses based on the estimated affordable points, and subsequent actions can be constrained by the estimated joint parameters.

Our primary contribution is the robust articulation network, which is carefully designed to be robust and sim-to-real, by local feature learning, point tuple voting, and an articulation awareness scheme. First, a collection of point tuples are uniformly sampled from the point cloud. For each point tuple, we predict several voting targets with a neural network from the local context features of the point tuple. Further, an articulation score is applied to supervise the neural network so that the network is aware of the articulation structure. Then, we can generate multiple candidates using the predicted voting targets, given the one degree-of-freedom ambiguity constraint. The candidate joint origin, joint direction and affordable point with the most votes, from only point tuples with high articulation score, are selected as the final estimation.

Experimental Results

We evaluate RPMArt in both noise-added simulation and real-world environments. In the simulation environment, we add different levels of noise to the point clouds to test the robustness of models. And we also test the sim-to-real transfer ability of RPMArt by directly applying the model trained on synthetic data to real-world scenarios.

Simulation Perception

Joint origin estimation results

Joint direction estimation results

Affordable point estimation results

Simulation Manipulation

We also add different levels of noise to the input point clouds, and report the manipulation success rates. We run around 100 trials per object instance for each task. Higher is better. Results are averaged across six tasks. We also run manipulation experiments using the ground truth joint parameters and affordable points, and the average success rates are 96.694% and 99.627% for pulling and pushing respectively. It demonstrates that RPMArt can still stably manipulate articulated objects with inaccurate perception results. We also show the twelve task examples finished by RPMArt under noise level 2. We can observe that the perception results are still robust to noise but actually not perfect, but RPMArt can still successfully manipulate the articulated objects.

Manipulation results

Pull Microwave

Pull Refrigerator

Pull StorageFurniture (Prismatic)

Pull Safe

Pull StorageFurniture (Revolute)

Pull Drawer

Push Microwave

Push Refrigerator

Push StorageFurniture (Prismatic)

Push Safe

Push StorageFurniture (Revolute)

Push Drawer

Tasks	Results	Tasks	Results
Pull/Push Microwave		Pull/Push Refrigerator
Pull/Push Safe		Pull/Push StorageFurniture (Revolute)
Pull/Push StorageFurniture (Prismatic)		Pull/Push Drawer

Real-world Perception

Category	Method	Error
Category	Method	Orig. (cm)	Dir. (^o)	Afford. (cm)
Microwave	PointNet++	4.495±3.573	9.273±5.828	15.443±4.730
	ANCSH	5.103±5.522	9.166±9.557	12.711±7.927
	GAMMA	2.531±2.901	9.911±10.671	7.242±10.191
	RoArtNet (ours)	3.830±2.372	5.189±3.619	6.754±3.275
Refrigerator	PointNet++	5.210±4.274	9.605±5.340	12.475±9.505
	ANCSH	5.938±5.798	7.998±5.910	12.814±13.604
	GAMMA	4.019±4.580	8.684±6.455	12.331±9.974
	RoArtNet (ours)	2.111±1.701	8.491±4.270	5.849±2.797
Safe	PointNet++	5.985±4.162	5.936±2.861	9.235±5.630
	ANCSH	5.167±6.758	7.706±14.275	8.505±9.768
	GAMMA	3.179±3.857	8.156±13.737	9.062±9.667
	RoArtNet (ours)	4.116±2.428	5.878±2.769	8.349±4.391
StorageFurniture	PointNet++	7.542±4.517	8.776±4.991	10.634±4.025
	ANCSH	6.408±4.222	9.612±6.404	5.176±6.023
	GAMMA	3.481±2.275	12.672±10.186	4.742±6.664
	RoArtNet (ours)	4.604±2.050	9.682±5.449	7.946±3.402
Drawer	PointNet++	8.331±3.377	7.862±5.295	10.227±4.465
	ANCSH	13.849±3.764	12.143±8.030	7.725±4.695
	GAMMA	5.058±2.360	14.666±6.767	6.967±3.114
	RoArtNet (ours)	5.992±3.063	11.315±5.596	7.733±5.246
WashingMachine	PointNet++	8.845±6.795	37.497±20.679	19.972±9.440
	ANCSH	5.162±4.918	16.243±12.089	11.541±8.232
	GAMMA	6.488±6.178	28.441±14.866	15.964±13.286
	RoArtNet (ours)	1.578±1.201	5.600±2.711	3.250±0.669

We present perception results on real-world point clouds of six articulated objects, using the models trained only on synthetic data. We collect data by capturing depth images for each object with 5 uniformly selected joint states, each from 20 randomly selected camera views, and throw away some bad views which cannot capture enough area of the target part. Here shows the mean estimation errors and standard deviations under the condition of excluding backgrounds and distractors. Lower is better. We can find that RoArtNet demonstrates more stable performance compared to other baselines. However, some performance degradation is found in the StorageFurniture and Drawer categories for RoArtNet, as well as for ANCSH and GAMMA. This could possibly be attributed to the relatively small size of parts in these two objects, where all three models somewhat rely on part segmentation to complete the estimation. Another noteworthy observation pertains to the performance on WashingMachine. Specifically, only RoArtNet successfully estimates targets accurately, while the other three baselines exhibit significantly large estimation errors. We find a potential reason that we take a relatively small washing machine toy as the object, then the influence of noisy points is relative significant.

We also visualize some qualitative results of the performance on real-world articulation perception, with both background and distractors included. Color is used only for visualization here. Red arrows represent the estimated articulation joints, and blue points represent the estimated affordable points. RoArtNet can still generate robust joint parameters and affordable points estimation even with the presence of some unrelated points.

Category	Method	Error (w/o bkg., w/ distr.)
Category	Method	Orig. (cm)	Dir. (^o)	Afford. (cm)
Microwave	PointNet++	4.221±3.345	12.868±7.452	18.109±6.077
	ANCSH	5.537±4.723	9.881±9.128	10.637±9.727
	GAMMA	3.617±5.011	12.688±10.575	6.027±6.062
	RoArtNet (ours)	3.729±2.093	5.545±3.692	6.680±3.307
Refrigerator	PointNet++	4.438±3.450	8.391±4.060	17.996±9.864
	ANCSH	6.539±5.800	11.547±11.638	15.266±25.889
	GAMMA	3.727±5.176	13.007±10.917	6.150±6.283
	RoArtNet (ours)	2.423±1.826	9.273±4.481	6.289±2.891
Safe	PointNet++	2.946±2.487	7.398±4.105	13.130±6.828
	ANCSH	5.582±7.415	5.952±8.616	10.024±11.781
	GAMMA	4.255±6.745	8.417±15.101	9.379±11.808
	RoArtNet (ours)	4.167±2.404	6.397±3.522	8.467±4.172
StorageFurniture	PointNet++	8.470±3.952	10.673±7.465	12.671±5.034
	ANCSH	7.913±4.695	10.727±9.722	6.545±9.330
	GAMMA	3.790±3.232	14.266±11.795	5.964±8.158
	RoArtNet (ours)	4.683±2.285	9.953±5.314	8.195±3.538
Drawer	PointNet++	8.070±2.994	10.079±8.011	9.893±4.564
	ANCSH	13.950±4.327	16.978±10.289	8.993±5.578
	GAMMA	5.435±4.522	21.352±10.900	9.292±4.868
	RoArtNet (ours)	5.987±2.948	12.260±6.181	7.604±5.094
WashingMachine	PointNet++	7.164±5.784	38.032±7.299	22.339±7.954
	ANCSH	5.317±5.283	22.207±11.601	20.071±12.927
	GAMMA	10.251±6.197	29.490±14.238	33.499±10.165
	RoArtNet (ours)	1.931±1.515	6.610±3.584	3.045±0.815

Category	Method	Error (w/ bkg., w/o distr.)
Category	Method	Orig. (cm)	Dir. (^o)	Afford. (cm)
Microwave	PointNet++	3.836±3.277	16.282±6.682	29.862±7.500
	ANCSH	4.374±3.235	16.242±8.642	10.206±8.468
	GAMMA	8.818±6.972	30.027±14.918	10.924±6.667
	RoArtNet (ours)	4.090±2.654	8.795±5.443	10.669±4.989
Refrigerator	PointNet++	9.099±6.254	8.776±4.015	37.845±8.664
	ANCSH	6.827±6.272	13.108±8.639	40.830±18.239
	GAMMA	7.728±4.400	17.593±11.808	36.257±9.741
	RoArtNet (ours)	1.759±1.226	13.378±4.940	22.344±13.188
Safe	PointNet++	4.451±4.064	9.227±4.830	27.606±10.407
	ANCSH	4.187±3.796	8.189±6.302	22.706±22.378
	GAMMA	4.770±4.042	12.588±11.190	28.760±21.317
	RoArtNet (ours)	4.080±2.451	9.110±5.211	14.177±11.297
StorageFurniture	PointNet++	8.824±3.337	15.280±7.453	19.263±3.996
	ANCSH	9.545±5.131	14.756±8.220	17.906±14.541
	GAMMA	7.035±6.011	17.755±12.999	11.537±6.942
	RoArtNet (ours)	5.412±2.317	15.231±7.233	16.178±7.671
Drawer	PointNet++	11.941±2.792	15.317±11.009	10.482±4.331
	ANCSH	14.548±5.177	25.986±19.512	16.010±9.062
	GAMMA	8.278±5.202	24.222±11.627	11.612±5.162
	RoArtNet (ours)	5.421±2.640	31.047±14.201	6.975±3.863
WashingMachine	PointNet++	13.173±5.409	28.902±36.464	21.926±7.421
	ANCSH	14.284±9.836	35.215±11.183	37.200±15.121
	GAMMA	16.715±4.757	31.681±12.746	44.088±17.161
	RoArtNet (ours)	2.558±1.548	30.313±6.631	6.937±7.197

Category	Method	Error (w/ bkg., w/ distr.)
Category	Method	Orig. (cm)	Dir. (^o)	Afford. (cm)
Microwave	PointNet++	3.205±3.742	16.407±6.116	30.523±8.827
	ANCSH	4.531±4.053	18.080±9.565	11.850±11.519
	GAMMA	8.880±6.624	30.164±14.783	10.646±6.950
	RoArtNet (ours)	4.041±2.994	9.096±5.514	11.433±6.030
Refrigerator	PointNet++	8.183±5.680	8.922±4.013	37.193±8.515
	ANCSH	7.645±6.023	14.254±8.707	42.693±16.013
	GAMMA	8.011±4.291	18.199±12.155	36.599±9.876
	RoArtNet (ours)	1.825±1.510	12.609±5.196	28.155±15.367
Safe	PointNet++	4.563±3.826	9.821±5.710	27.856±10.055
	ANCSH	4.287±4.162	8.827±7.460	30.179±29.790
	GAMMA	4.679±3.051	13.028±10.528	29.171±20.909
	RoArtNet (ours)	3.874±3.145	9.723±5.760	16.075±13.660
StorageFurniture	PointNet++	9.131±3.238	15.990±7.914	19.557±4.144
	ANCSH	9.035±4.796	15.634±8.659	18.934±16.764
	GAMMA	8.079±6.764	19.655±14.928	13.063±8.537
	RoArtNet (ours)	5.509±2.312	17.014±12.322	16.960±8.073
Drawer	PointNet++	12.179±2.296	15.733±10.773	10.610±4.371
	ANCSH	13.466±3.782	23.667±15.122	16.450±8.489
	GAMMA	8.099±5.016	24.720±13.237	12.067±5.979
	RoArtNet (ours)	5.644±2.866	24.953±19.676	7.116±3.809
WashingMachine	PointNet++	12.637±5.667	49.705±39.372	21.780±8.179
	ANCSH	13.566±9.472	33.461±12.217	37.235±14.823
	GAMMA	16.057±4.941	32.291±12.797	41.861±16.963
	RoArtNet (ours)	2.673±1.723	30.318±6.537	7.438±6.025

Real-world Manipulation

Tasks		PointNet++	ANCSH	GAMMA	RPMArt (ours)
Microwave	Pull	6/2/2	4/0/6	8/1/1	9/1/0
Microwave	Push	5/4/1	3/4/3	6/3/1	7/1/2
Refrigerator	Pull	2/1/7	1/1/8	3/1/6	7/0/3
Refrigerator	Push	0/0/10	1/0/9	2/0/8	8/1/1
Safe	Pull	7/0/3	5/2/3	5/1/4	7/0/3
Safe	Push	7/0/3	7/1/2	7/1/2	7/1/2
StorageFurniture	Pull	1/0/9	3/1/6	2/1/7	4/0/6
StorageFurniture	Push	2/2/6	6/2/2	2/3/5	5/2/3
Drawer	Pull	1/1/8	2/1/7	0/2/8	2/2/6
Drawer	Push	2/0/8	2/1/7	0/0/10	3/2/5
WashingMachine	Pull	0/0/10	0/1/9	0/0/10	3/3/4
WashingMachine	Push	0/0/10	0/0/10	0/0/10	1/2/7

We also apply the models to manipulate the real articulated objects, under the condition of excluding backgrounds and distractors. We run 10 trials for each task, and count the number of successful, half-successful and failed trials. Here, half-successful trials include behaviors like detaching during pulling and pushing forcefully. RPMArt outperforms other counterparts, especially for Refrigerator and WashingMachine. Refrigerator has a glossy surface, while WashingMachine is relatively small, which makes the noise more prominent.

Some half-successful example trials.

Due to inaccurate joint parameters estimation, the robot occasionally detaches the object during manipulation. And due to incomplete point cloud and inaccurate affordable point estimation, the selected grasp pose may not be suitable for manipulation.

BibTeX

If you find it helpful, please consider citing our work:

@inproceedings{wang2024rpmart,
  title={RPMArt: Towards Robust Perception and Manipulation for Articulated Objects},
  author={Wang, Junbo and Liu, Wenhai and Yu, Qiaojun and You, Yang and Liu, Liu and Wang, Weiming and Lu, Cewu},
  booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2024},
  pages={7270-7277},
  doi={10.1109/IROS58592.2024.10802368}
}

If you have further questions, please feel free to drop an email to sjtuwjb3589635689@sjtu.edu.cn.