[en] In this paper, a reinforcement learning structure is proposed to auto-tune
PID gains by solving an optimal tracking control problem for robot manipulators. Taking advantage of the actor-critic framework implemented by
neural networks, optimal tracking performance is achieved while unknown
system dynamics are estimated. The critic network is used to learn the optimal cost-to-go function, while the actor network converges to the optimal policy and thereby learns the optimal PID gains. Furthermore, Lyapunov's direct method is utilized to
prove the stability of the closed-loop system. In this way, an analytical procedure is provided for systematically adjusting the PID gains of a stable robot manipulator system, avoiding the ad-hoc and painstaking tuning process. The
resultant actor-critic PID-like control exhibits stable adaptive and learning
capabilities while retaining a simple structure and inexpensive online computational demands. Numerical simulations are performed to illustrate the
effectiveness and advantages of the proposed actor-critic neural network PID
control.
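As a rough illustration of the mechanism summarized above, the sketch below pairs a temporal-difference critic, which learns a quadratic cost-to-go approximation over the tracking-error vector, with an actor that adapts the PID gains of a single-joint manipulator. The plant model, feature choice, stage cost, exploration-based (CACLA-style) gain update, and all learning rates are illustrative assumptions only; they are not the update laws or the Lyapunov-based design developed in the paper.

```python
# Minimal actor-critic PID gain tuning sketch (illustrative assumptions throughout,
# not the paper's formulation).
import numpy as np

rng = np.random.default_rng(0)

# Assumed single-joint manipulator: m*l^2 * q_ddot + b*q_dot + m*g*l*sin(q) = tau
m, l, b, g, dt = 1.0, 0.5, 0.2, 9.81, 0.01

def plant_step(q, qd, tau):
    qdd = (tau - b * qd - m * g * l * np.sin(q)) / (m * l ** 2)
    return q + qd * dt, qd + qdd * dt

# Critic features: quadratic terms of the error vector x = [e, integral(e), de]
def phi(x):
    return np.outer(x, x)[np.triu_indices(3)]

w = np.zeros(6)                        # critic weights, V(x) ~= w @ phi(x)
K = np.array([5.0, 0.0, 0.5])          # actor output: current [Kp, Ki, Kd]
alpha_c, alpha_a, gamma, sigma = 0.05, 0.05, 0.98, 0.3
q_ref = 1.0                            # constant joint-angle reference (rad)

for episode in range(200):
    q, qd, ie = 0.0, 0.0, 0.0
    for k in range(400):
        e, de = q_ref - q, -qd
        x = np.array([e, ie, de])

        K_try = K + sigma * rng.standard_normal(3)    # explore around current gains
        tau = float(K_try @ x)                         # PID law with perturbed gains
        q, qd = plant_step(q, qd, tau)
        ie += e * dt

        x_next = np.array([q_ref - q, ie, -qd])
        cost = e ** 2 + 0.001 * tau ** 2               # assumed quadratic stage cost
        delta = cost + gamma * (w @ phi(x_next)) - w @ phi(x)   # TD error

        # critic: normalized temporal-difference update of the cost-to-go weights
        w += alpha_c * delta * phi(x) / (1.0 + phi(x) @ phi(x))
        if delta < 0:                                  # explored gains did better than expected,
            K += alpha_a * (K_try - K)                 # so move the actor toward them (CACLA-style)
        K = np.clip(K, 0.0, 100.0)                     # keep gains positive and bounded

print("learned PID gains [Kp, Ki, Kd]:", np.round(K, 2))
```

In the paper's formulation the actor and critic are realized with neural networks and the gain updates are derived so that closed-loop stability can be proven; the sketch only conveys the overall actor-critic tuning loop.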
Disciplines:
Engineering, computing & technology: Multidisciplinary, general & others
Author, co-author:
RAHIMI NOHOOJI, Hamed ✱; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Automation
Zaraki, Abolfazl ✱; University of Hertfordshire
VOOS, Holger ✱; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Automation
✱ These authors contributed equally to this work.
External co-authors:
yes
Document language:
English
Title:
Actor–Critic Learning Based PID Control for Robotic Manipulators