Learning task-relevant parameters for human and robot motion primitives to acquire complex manipulation skills on a humanoid robot
Final Report Abstract
Reinforcement learning has great potential to increase the autonomy, adaptivity and robustness of robots operating in everyday human environments. The designer no longer has to provide a set of skills; instead, the robot learns to refine and optimize these skills based on a cost function that tells it how well it is achieving the task. In this project, we have made substantial advances in efficient, robust reinforcement learning of robot motion primitives, in particular in the manipulation domain. We have shown that:
• Direct reinforcement learning based on path integrals (the PI² algorithm) enables robots to learn skills in very high-dimensional action spaces, e.g. on a 34-DOF humanoid robot performing a dynamic task that requires a full-body motion.
• Reinforcement learning enables robots to simultaneously learn both the trajectory generated by a motion primitive and the end-point of the movement. In particular, we have shown that robots are able to learn to grasp objects robustly, even if the robot is not certain about the position of the object.
• By simultaneously learning end-points and shapes in sequences of motion primitives, robots are able to optimize more complex tasks for which one motion primitive alone does not suffice. We applied this so-called hierarchical reinforcement learning to an everyday pick-and-place task.
• Reinforcement learning is applicable to learning both reference trajectories (the motion that the robot plans to execute) and control parameters (which determine how the robot achieves that plan). In particular, we showed that robots are able to learn variable stiffness control, where the robot stiffens up only when the task requires it and is otherwise as compliant as possible. This is more energy efficient, and also much safer for humans in the robot’s workspace, as the robot does not exert high forces when it unexpectedly makes physical contact.
• Sets of similar Dynamic Movement Primitives can be represented more compactly by using dimensionality reduction methods, such as Point Distribution Models.
Although it was not intended as a modelling approach, it is very interesting to see that the behaviors our robots learn are often very similar to those observed in human subjects in psychophysics experiments, for instance in pick-and-place tasks, when grasping under uncertainty, or in force field experiments. We have demonstrated that reinforcement learning enables the robot to dramatically improve the robustness and safety of its manipulation skills. We therefore believe our contributions will have a substantial impact on enabling robots to operate in human environments, where robustness and safety are essential properties. Such human-centered robots, for instance robots that assist people with disabilities, have the potential to make a large impact on our society and economy. I am therefore very pleased that this project has led to an industrial cooperation, in which I am implementing and evaluating motion primitives and reinforcement learning on the partner’s state-of-the-art commercial humanoid robot ‘REEM’.
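As an illustration only, the cost-driven learning described in the abstract above can be sketched with the probability-weighted averaging that underlies path-integral methods such as PI². The sketch below is a simplified, episode-level variant and not the implementation used in the project: the function names, the toy cost, and all parameter values (number of rollouts, exploration noise `sigma`, and the sharpness constant `h`) are assumptions chosen for readability.

```python
import numpy as np


def pi2_style_update(theta, cost_fn, rng, n_rollouts=20, sigma=0.5, h=10.0):
    """One simplified, episode-level reward-weighted averaging step.

    Assumption: this is a didactic reduction of path-integral policy
    improvement; the full algorithm weights perturbations per time step.
    """
    # Explore: perturb the policy parameters with Gaussian noise.
    eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
    costs = np.array([cost_fn(theta + e) for e in eps])

    # Rescale costs to [0, 1] so that h does not depend on the cost scale.
    span = costs.max() - costs.min()
    normalized = (costs - costs.min()) / (span + 1e-12)

    # Low-cost rollouts receive exponentially larger weights.
    weights = np.exp(-h * normalized)
    weights /= weights.sum()

    # Update: probability-weighted average of the explored perturbations.
    return theta + weights @ eps


def toy_cost(theta):
    # Hypothetical stand-in for a task cost: squared distance to a target.
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))


def run(n_updates=100, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(n_updates):
        theta = pi2_style_update(theta, toy_cost, rng)
    return theta


if __name__ == "__main__":
    final_theta = run()
    print("final parameters:", final_theta, "cost:", toy_cost(final_theta))
```

In a robot setting, `theta` would correspond to the parameters of a motion primitive and `cost_fn` to the cost of executing one rollout on the robot or in simulation. Because the update only averages explored perturbations and needs no gradient of the cost, this family of methods scales to the high-dimensional parameter spaces mentioned above.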
Publications
- (2009). Compact models of motor primitive variations for predictable reaching and obstacle avoidance. In 9th IEEE-RAS International Conference on Humanoid Robots
Stulp, F., Oztop, E., Pastor, P., Beetz, M., and Schaal, S.
- (2010). Reinforcement learning of full-body humanoid motor skills. In 10th IEEE-RAS International Conference on Humanoid Robots
Stulp, F., Buchli, J., Theodorou, E., and Schaal, S.
- (2010). Variable impedance control - a reinforcement learning approach. In Robotics: Science and Systems Conference (RSS)
Buchli, J., Theodorou, E., Stulp, F., and Schaal, S.
- (2011). An iterative path integral stochastic optimal control approach for learning robotic tasks. In 18th World Congress of the International Federation of Automatic Control
Theodorou, E., Stulp, F., Buchli, J., and Schaal, S.
- (2011). Hierarchical reinforcement learning with motion primitives. In 11th IEEE-RAS International Conference on Humanoid Robots
Stulp, F. and Schaal, S.
- (2011). Learning motion primitive goals for robust manipulation. In International Conference on Intelligent Robots and Systems (IROS)
Stulp, F., Theodorou, E., Kalakrishnan, M., Pastor, P., Righetti, L., and Schaal, S.
- (2011). Learning to grasp under uncertainty. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Stulp, F., Theodorou, E., Buchli, J., and Schaal, S.
- (2011). Learning variable impedance control. International Journal of Robotics Research
Buchli, J., Stulp, F., Theodorou, E., and Schaal, S.
- (2011). Movement segmentation using a primitive library. In International Conference on Intelligent Robots and Systems (IROS)
Meier, F., Theodorou, E., Stulp, F., and Schaal, S.
- (2011). Reinforcement learning of impedance control in stochastic force fields. In International Conference on Development and Learning (ICDL)
Stulp, F., Buchli, J., Ellmer, A., Mistry, M., Theodorou, E., and Schaal, S.