Designing a stable walking gait for biped robots with point-feet is formulated as a constrained nonlinear optimization problem, which is normally solved offline by a numerical optimization method. If an unknown modeling error or a change in the environment occurs, the designed gait may become ineffective, and it cannot be improved online once the optimization is complete. In this paper, we apply Generalized Path Integral Stochastic Optimal Control to the closed-loop model of planar biped robots with point-feet, which leads to an online Reinforcement Learning algorithm for designing the walking gait. We study the ability of the proposed method to adapt the controller of RABBIT, a planar biped robot with point-feet, for stable walking. The simulation results show that the method, starting from a dynamically unstable initial gait, quickly compensates for the modeling error and reaches an exponentially stable walking gait with the desired features in a new situation, which was not possible with previous methods.
Keywords: Legged locomotion, gait optimization, orbital stability
@article{COCV_2019__25__A81_0,
  author = {Anjidani, Majid and Jahed Motlagh, M.R. and Fathy, M. and Nili Ahmadabadi, M.},
  title = {A novel online gait optimization approach for biped robots with point-feet},
  journal = {ESAIM: Control, Optimisation and Calculus of Variations},
  publisher = {EDP-Sciences},
  volume = {25},
  year = {2019},
  doi = {10.1051/cocv/2017034},
  zbl = {1437.49002},
  mrnumber = {4043860},
  language = {en},
  url = {http://www.numdam.org/articles/10.1051/cocv/2017034/}
}
Anjidani, Majid; Jahed Motlagh, M.R.; Fathy, M.; Nili Ahmadabadi, M. A novel online gait optimization approach for biped robots with point-feet. ESAIM: Control, Optimisation and Calculus of Variations, Volume 25 (2019), article no. 81. doi: 10.1051/cocv/2017034. http://www.numdam.org/articles/10.1051/cocv/2017034/