Keywords:
Reinforcement learning, Actor-critic, PID control, Neural network, Robot manipulators
Abstract:
In this paper, a reinforcement learning structure is proposed to auto-tune
PID gains by solving an optimal tracking control problem for robot manipulators. Taking advantage of the actor-critic framework implemented by
neural networks, optimal tracking performance is achieved while unknown
system dynamics are estimated. The critic network learns the optimal cost-to-go function, while the actor network converges to it and learns the
optimal PID gains. Furthermore, Lyapunov's direct method is utilized to
prove the stability of the closed-loop system. By that means, an analytical
procedure is delivered for a stable robot manipulator system to systematically adjust PID gains without an ad-hoc, painstaking tuning process. The
resulting actor-critic PID-like controller exhibits stable adaptive and learning
capabilities, while retaining a simple structure and inexpensive online
computational demands. Numerical simulation is performed to illustrate the
effectiveness and advantages of the proposed actor-critic neural network PID
control.
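To make the idea concrete, the following is a minimal, hypothetical sketch of actor-critic PID gain tuning, not the paper's exact algorithm: the plant is a simplified unit-mass joint standing in for a manipulator, the "actor" holds the mean PID gains (perturbed for exploration), and the "critic" is a learned scalar baseline estimating the expected tracking cost. All names, the plant model, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def run_episode(gains, ref=1.0, dt=0.02, steps=200):
    """Simulate PID control of a unit-mass joint (x'' = u); return tracking cost.

    A stand-in plant, not a real manipulator model.
    """
    kp, ki, kd = gains
    x = v = integ = 0.0
    prev_e = ref - x
    cost = 0.0
    for _ in range(steps):
        e = ref - x
        integ += e * dt
        u = kp * e + ki * integ + kd * (e - prev_e) / dt  # PID control law
        v += u * dt          # semi-implicit Euler integration
        x += v * dt
        cost += e * e * dt   # quadratic tracking cost
        prev_e = e
    return min(cost, 100.0)  # clip so unstable rollouts don't dominate updates

def tune_pid(iters=300, sigma=0.2, lr_actor=0.02, lr_critic=0.1, seed=0):
    """Actor-critic-style gain tuning: perturb gains, score by rollout cost,
    update the actor toward below-baseline samples (policy-gradient step)."""
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, 0.0, 0.0])  # actor: mean (Kp, Ki, Kd)
    baseline = run_episode(theta)      # critic: scalar cost-to-go estimate
    for _ in range(iters):
        g = np.clip(theta + sigma * rng.standard_normal(3), 0.0, 20.0)
        c = run_episode(g)
        adv = c - baseline                                 # advantage vs. critic
        theta -= lr_actor * adv * (g - theta) / sigma**2   # actor update
        theta = np.clip(theta, 0.0, 20.0)                  # keep gains feasible
        baseline += lr_critic * (c - baseline)             # critic update
    return theta
```

With the default seed, `tune_pid()` discovers a positive derivative gain that damps the initially undamped closed loop and lowers the tracking cost relative to the starting gains; the paper replaces this crude baseline critic with a neural-network approximation of the cost-to-go and proves closed-loop stability via Lyapunov's direct method.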
Scopus® citations (without self-citations): 33