Power System Load Frequency Active Disturbance Rejection ...

This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail.

This material is protected by copyright and other intellectual property rights, and duplication or sale of all or part of any of the repository collections is not permitted, except that material may be duplicated by you for your research use or educational purposes in electronic or print form. You must obtain permission for any other use. Electronic or print copies may not be offered, whether for sale or otherwise to anyone who is not an authorised user.

Zheng, Yuemin; Huang, Zhaoyang; Tao, Jin; Sun, Hao; Sun, Qinglin; Dehmer, Matthias; Sun, Mingwei; Chen, Zengqiang
Power system load frequency active disturbance rejection control via reinforcement learning-based memetic particle swarm optimization

Published in: IEEE Access
DOI: 10.1109/ACCESS.2021.3099904
Published: 01/08/2021
Document Version: Publisher's PDF, also known as Version of record
Published under the following license: CC BY

Please cite the original version: Zheng, Y., Huang, Z., Tao, J., Sun, H., Sun, Q., Dehmer, M., Sun, M., & Chen, Z. (2021). Power system load frequency active disturbance rejection control via reinforcement learning-based memetic particle swarm optimization. IEEE Access, 9, 116194-116206. https://doi.org/10.1109/ACCESS.2021.3099904

Received July 9, 2021, accepted July 21, 2021, date of publication July 26, 2021, date of current version August 26, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3099904

Power System Load Frequency Active Disturbance Rejection Control via Reinforcement Learning-Based Memetic Particle Swarm Optimization

YUEMIN ZHENG 1, ZHAOYANG HUANG 1, JIN TAO 1,2 (Member, IEEE), HAO SUN 1, QINGLIN SUN 1, MATTHIAS DEHMER 3, MINGWEI SUN 1, AND ZENGQIANG CHEN 1,4 (Member, IEEE)

1 College of Artificial Intelligence, Nankai University, Tianjin 300350, China
2 Department of Electrical Engineering and Automation, Aalto University, 02500 Espoo, Finland
3 Department of Computer Science, Swiss Distance University of Applied Sciences, 3900 Brig, Switzerland
4 Key Laboratory of Intelligent Robotics of Tianjin, Nankai University, Tianjin 300350, China

Corresponding author: Jin Tao ([email protected])

This work was supported in part by the National Natural Science Foundation of China under Grant 61973172, Grant 61973175, Grant 62003175, and Grant 62003177; in part by the National Key Research and Development Project under Grant 2019YFC1510900; in part by the Key Technologies Research and Development Program of Tianjin under Grant 19JCZDJC32800; in part by the China Postdoctoral Science Foundation under Grant 2020M670633; and in part by the Academy of Finland under Grant 315660.

ABSTRACT Load frequency control (LFC) is necessary to guarantee the safe operation of power systems. Aiming at the frequency and power stability problems caused by load disturbances in interconnected power systems, active disturbance rejection control (ADRC) was designed. There are eight parameters that need to be adjusted for an ADRC, which are challenging to adjust manually, thus limiting the development of this approach in industrial applications. Regardless of the theory or application, there is still no unified and efficient parameter optimization method. The traditional particle swarm optimization (PSO) algorithm suffers from premature convergence and a high computational cost. Therefore, in this paper, we utilize an improved PSO algorithm, a reinforcement-learning-based memetic particle swarm optimization (RLMPSO), for the parameter tuning of ADRC to obtain better control performance for the controlled system. Finally, to highlight the advantages of the proposed RLMPSO-ADRC method and to prove its superiority, the results were compared with other control algorithms in both a traditional non-reheat two-area thermal power system and a non-linear power system with a governor dead band (GDB) and a generation rate constraint (GRC). Moreover, the robustness of the proposed method was tested by simulations with parameter perturbations and different working conditions. The simulation results showed that the proposed method can meet the demand for the frequency deviation to stabilize to 0 in LFC with higher performance, and it is worthy of popularization and application.

INDEX TERMS Interconnected power system, load frequency control, active disturbance rejection control, parameter optimization, reinforcement-learning-based memetic particle swarm optimization.

The associate editor coordinating the review of this manuscript and approving it for publication was Seyedali Mirjalili.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
Maintaining the relative stability of the frequency and voltage is a prerequisite for ensuring the safe operation of a power system. As a well-known issue in power systems research, how to implement load frequency control (LFC) to ensure a constant frequency of a power system and improve the power quality and economic benefits has been widely studied. Furthermore, as the demand for power quality has increased, interconnected power systems have emerged. However, the exchange power of tie-lines in an interconnected power system is susceptible to disturbances, which may cause unnecessary economic losses. Therefore, determining how to design a controller to ensure the safe, stable, and efficient operation of the power system is of great significance.

The essential requirement of a control strategy is the ability to handle parameter uncertainty and achieve good anti-disturbance performance while obtaining the desired dynamic performance to the greatest possible extent. Moreover, the design of the controller should not be too complicated, so that it provides practical solutions for engineering debugging. At present, as an important means of active frequency modulation in power systems, LFC control strategies cover sliding mode control (SMC) [1], linear operator inequality [2], robust control [3], and predictive control [4]. As the most traditional control methods, proportional–integral (PI) [5] and proportional–integral–derivative (PID) [6] controllers have been the most widely used in the LFC field due to their clear principles, simple implementation, and certain degree of robustness. They can restore the stability of a power system, but their performance is poor. Other control strategies, such as adaptive control, can make corresponding real-time adjustments to controller parameters or rules, but the design is complex, and they are not easy to apply in industry [7]. A variable-structure controller can respond to system disturbances and parameter changes at faster speeds, thereby significantly improving the dynamic performance of the controlled system, but the practical applications of such a controller are limited [8]. For LFC, a robust control strategy can theoretically cope with the problems caused by system disturbances and parameter modeling errors. However, the controller order is high, and the design algorithm is relatively complex and relies on engineering experience [9]. Model predictive control (MPC) exhibits strong robustness and adaptability, and it does not require high model accuracy. However, the stability analysis and robust performance detection of multivariable MPC need further study [10].

In practical applications, PID still occupies a dominant position in engineering. Han [11], [12] first proposed the active disturbance rejection control (ADRC) technology, which builds on the PID idea of eliminating the effects of disturbances based on errors. It lumps all uncertain factors and external disturbances of the system into the total disturbance of the system, based on the system inputs and outputs. The disturbance can then be eliminated by designing a control law. ADRC has been widely used in many industrial fields, such as aircraft control [13], motor control [14], ship control [15], and vehicle control [16]. ADRC has also been applied to LFC. For example, Rahman and Chowdhury [17] compared the control effects of ADRC and PID for an LFC system, and the simulation results showed that ADRC is a powerful substitute for PID and has significant performance advantages for LFC. Zheng et al. [18] applied ADRC to a three-area interconnected power system in both regulated and deregulated environments. It is worth mentioning that most of the ADRC controllers currently designed for LFC use linear ADRC (LADRC), which was proposed by Gao [19] and in which the internal structures of the ADRC method are greatly simplified. To a certain extent, this was caused by the difficulty of ADRC parameter tuning. Therefore, determining how to tune the parameters for ADRC is of great significance to promote its application.

Intelligent optimization algorithms are derived from the observation and simulation of biological systems in the natural world, and they are a new class of strategies suitable for optimization problems. In terms of parameter tuning, intelligent optimization algorithms, such as particle swarm optimization (PSO) [20], the genetic algorithm (GA) [21], simulated annealing (SA) [22], and ant colony optimization (ACO) [23], have shown good optimization capabilities. However, some limitations and shortcomings of intelligent optimization algorithms cannot be ignored. As stated by the "No Free Lunch" (NFL) theorem, no single algorithm can be designated as the best algorithm for all optimization problems. Therefore, various optimized and improved intelligent algorithms are constantly being proposed. For example, the standard PSO, on the one hand, can quickly fall into local optima at the beginning of the search process; on the other hand, its computational cost increases with the sample population size [24]. Therefore, improved PSO algorithms such as the differential evolution particle swarm optimization (DEPSO) [25] and reinforcement-learning-based memetic particle swarm optimization (RLMPSO) [26] came into being. RLMPSO is an improved algorithm from a memetic algorithm (MA) perspective, where the MA is a hybrid algorithm that consists of a local search method, reinforcement learning (RL), and a globally optimal PSO algorithm.

In this study, the proposed ADRC is evaluated on a two-area thermal power system, which mainly contains a non-reheat turbine and nonlinear links with a generation rate constraint (GRC) and a governor dead band (GDB). For the first time, RLMPSO is used to adjust the controller's parameters. To verify the effectiveness of the proposed method, simulation analysis on both linear and non-linear power systems with a GRC and a GDB was conducted, and the results were compared with those from other methods. Moreover, robustness tests were also carried out for uncertain parameter values and disturbances. The main contributions of this study are summarized as follows:

(1) Aiming at the two-area interconnected power system with non-reheat turbines, two third-order ADRC controllers were designed.

(2) The RLMPSO algorithm was used to optimize the eight parameters in the ADRC, and the effectiveness of the designed method was verified by comparison with other methods.

The rest of the paper is arranged as follows. Section 2 describes the mathematical model of the LFC system. In Section 3, the ADRC is designed for LFC. Section 4 introduces the parameter optimization process based on RLMPSO. Section 5 shows the simulation results, and the corresponding analysis is presented. Section 6 concludes this paper.

FIGURE 1. Structure diagram of multi-area interconnected power system.

II. POWER SYSTEM MODEL DESCRIPTION
LFC is a popular research subject for power systems, where the stability of the frequency is a prerequisite for the safe and reliable operation of the power grid. Fig. 1 shows a schematic diagram of the linear structure of the i-th area in the interconnected power system, which mainly includes three links: the governor $G_{gi}$, the turbine $G_{ti}$, and the generator $G_{pi}$. The governor mainly controls the guide vane or intake valve through the feedback of the speed deviation to control the speed and load of the turbine. The turbine acts as a prime mover to generate mechanical power to drive the generator to convert mechanical energy into electrical energy. Generally, due to the strong coupling between interconnected power systems, when a load disturbance occurs in one area, other areas will also be affected by this area, resulting in instability of the entire grid. Therefore, the main goal of LFC is to control the frequency deviation $\Delta f_i$ within a safe range by overcoming the influence of load disturbances.

Furthermore, in the interconnected power system, the power $\Delta P_{tie,i}$ exchanged over the tie-line between two adjacent areas also needs to be stabilized at the planned value. This study does not consider the load frequency control problem under deregulated environments involving economic benefits, so the planned value of the exchange power of the tie-line is 0. The mathematical expressions of the above three links are introduced as

$$
G_{gi} = \frac{1}{T_{gi}s + 1}, \qquad
G_{ti} = \frac{1}{T_{ti}s + 1}, \qquad
G_{pi} = \frac{K_{pi}}{T_{pi}s + 1}, \tag{1}
$$

where $T_{gi}$, $T_{ti}$, and $T_{pi}$ are the time constants of the governor, the non-reheat turbine, and the generator, respectively. $K_{pi}$ represents the gain of the generator.

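According to Fig. 1, $\Delta f_i$ can be derived as

$$
\Delta f_i = \frac{G_{gi}G_{ti}G_{pi}}{1 + G_{gi}G_{ti}G_{pi}/R_i}\,u_i
- \frac{G_{pi}}{1 + G_{gi}G_{ti}G_{pi}/R_i}\,\Delta P_{Li}
- \frac{G_{pi}}{1 + G_{gi}G_{ti}G_{pi}/R_i}\,\Delta P_{tie,i}. \tag{2}
$$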

TABLE 1. Meanings of model symbols.

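The expression of the area control error (ACE) is written as

$$
\begin{aligned}
ACE_i &= B_i \Delta f_i + \Delta P_{tie,i} \\
&= \frac{G_{gi}G_{ti}G_{pi}B_i}{1 + G_{gi}G_{ti}G_{pi}/R_i}\,u_i
- \frac{G_{pi}B_i}{1 + G_{gi}G_{ti}G_{pi}/R_i}\,\Delta P_{Li}
+ \left(1 - \frac{G_{pi}B_i}{1 + G_{gi}G_{ti}G_{pi}/R_i}\right)\Delta P_{tie,i},
\end{aligned} \tag{3}
$$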

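where $u_i$ is the increment of the position change, which needs to be adjusted by the controller. $\Delta P_{Li}$ denotes the load disturbance. $\Delta P_{tie,i}$ is expressed as

$$
\Delta P_{tie,i} = \sum_{j \neq i}^{n} \frac{T_{ij}}{s}\left(\Delta f_i - \Delta f_j\right). \tag{4}
$$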

The meanings of the symbols shown in Fig. 1 are describedin Table 1.

III. DESIGN OF ACTIVE DISTURBANCE REJECTION CONTROL (ADRC)
ADRC is derived from the combination of the classic PID and modern control theory, which has the advantages of not relying on model information and eliminating unknown disturbances. It only needs to know the order of the system. Therefore, this section mainly focuses on the design of the ADRC for the above model. For the multi-area interconnected power system, to stabilize the frequency deviation and the exchange power of the tie-line to 0, tie-line bias control (TBC) mode is adopted, which uses $ACE_i$ as the controller input. In addition, in view of the coupling problem between areas, Tan [27] proposed a decentralized control method and carried out a theoretical proof, in which the controller in each area can be designed separately on the premise of ignoring the tie-line exchanged power.

FIGURE 2. Structure diagram of active disturbance rejection control (ADRC).

A. TRANSFER FUNCTION PROCESSING
According to the decentralized control mentioned above, ignoring $\Delta P_{tie,i}$, Eq. 3 can be written as

$$
Y_i(s) = G_i(s)U_i(s) + G_{di}(s)D_i(s), \tag{5}
$$

where $Y_i(s)$, $U_i(s)$, and $D_i(s)$ are the Laplace transforms of $ACE_i$, $u_i$, and $\Delta P_{Li}$. Combining this with Eq. 1, the expressions of $G_i(s)$ and $G_{di}(s)$ can be presented as

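$$
\begin{aligned}
G_i(s) &= \frac{K_{pi}B_iR_i}{(T_{gi}s + 1)(T_{ti}s + 1)(T_{pi}s + 1)R_i + K_{pi}}, \\
G_{di}(s) &= \frac{(T_{gi}s + 1)(T_{ti}s + 1)K_{pi}B_iR_i}{(T_{gi}s + 1)(T_{ti}s + 1)(T_{pi}s + 1)R_i + K_{pi}}.
\end{aligned} \tag{6}
$$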
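As a numerical illustration of Eq. 6, the following minimal sketch assembles the numerator and denominator polynomials of $G_i(s)$ and $G_{di}(s)$ with NumPy. The function name `area_transfer_functions` and the parameter values are assumptions chosen for illustration only; they are not taken from this paper.

```python
import numpy as np

def area_transfer_functions(Tg, Tt, Tp, Kp, R, B):
    """Return (num_G, num_Gd, den) of Eq. 6 as polynomial coefficients in s,
    highest power first."""
    governor = np.array([Tg, 1.0])      # Tg*s + 1
    turbine = np.array([Tt, 1.0])       # Tt*s + 1
    generator = np.array([Tp, 1.0])     # Tp*s + 1
    gov_turb = np.polymul(governor, turbine)
    # Common denominator: (Tg s + 1)(Tt s + 1)(Tp s + 1) R + Kp
    den = R * np.polymul(gov_turb, generator)
    den[-1] += Kp
    num_G = np.array([Kp * B * R])      # numerator of G_i(s)
    num_Gd = Kp * B * R * gov_turb      # numerator of G_di(s) as written in Eq. 6
    return num_G, num_Gd, den

# Illustrative (assumed) parameter values, not values reported in this paper.
num_G, num_Gd, den = area_transfer_functions(
    Tg=0.08, Tt=0.3, Tp=20.0, Kp=120.0, R=2.4, B=0.425)
dc_gain = np.polyval(num_G, 0.0) / np.polyval(den, 0.0)
print("denominator coefficients:", den)
print("steady-state gain of G_i:", dc_gain)
```

The denominator has four coefficients, which reflects the third-order plant that the ADRC design below relies on.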



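Converting the above equation into a differential equation form, we obtain

$$
\begin{aligned}
& T_{gi}T_{ti}T_{pi}\,y_i^{(3)}(t) + \left(T_{gi}T_{ti} + T_{gi}T_{pi} + T_{ti}T_{pi}\right)\ddot{y}_i(t) + \left(T_{gi} + T_{ti} + T_{pi}\right)\dot{y}_i(t) + y_i(t) \\
&\quad = K_{pi}B_iR_i\,u(t) + T_{gi}T_{ti}K_{pi}B_iR_i\,\ddot{d}(t) + \left(T_{gi} + T_{ti}\right)K_{pi}B_iR_i\,\dot{d}(t) + K_{pi}B_iR_i\,d(t).
\end{aligned} \tag{7}
$$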

Therefore, Eq. 3 can be organized into the following third-order system as

$$
\begin{aligned}
y^{(3)} &= f_0\left(y, \dot{y}, \ddot{y}, d, \dot{d}, \ddot{d}\right) + bu \\
&= \left(f_0 + (b - b_0)u\right) + b_0u \\
&= f + b_0u,
\end{aligned} \tag{8}
$$

where $b = \frac{K_{pi}B_iR_i}{T_{gi}T_{ti}T_{pi}}$. $f$ is the total disturbance containing the uncertainties caused by parameter perturbations of the system model and external disturbances caused by load disturbances. Since the actual value of $b$ cannot be known in practice, the adjustable parameter $b_0$ is used to substitute $b$.

B. DESIGN OF ADRC FOR LOAD FREQUENCY CONTROL (LFC)
ADRC is composed of a tracking differentiator (TD), extended state observer (ESO), and nonlinear state error feedback (NLSEF), where the TD can suppress the noise amplification effect in the input signal, the ESO can estimate the total unknown disturbance $f$, and the NLSEF can eliminate the estimated disturbance. Moreover, the ADRC structure diagram corresponding to the third-order system is shown in Fig. 2.

Because the control goal in this study seeks to make $ACE_i(t) = 0$ at steady state, the tracking reference input $v = 0$. Moreover, since the load disturbance of the power system does not contain white noise, there is no need for a filter. Thus, the design of the TD can be ignored.

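According to Eq. 8, the state can be defined as

$$
x_1 = y, \quad x_2 = \dot{y}, \quad x_3 = \ddot{y}, \quad x_4 = f. \tag{9}
$$

The corresponding state equation is

$$
\begin{cases}
\dot{x}_1 = x_2 \\
\dot{x}_2 = x_3 \\
\dot{x}_3 = f + b_0u \\
\dot{x}_4 = \dot{f} \\
y = x_1.
\end{cases} \tag{10}
$$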


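The ESO can be expressed as

$$
\begin{cases}
e = z_1 - y \\
\dot{z}_1 = z_2 - \beta_{01}e \\
\dot{z}_2 = z_3 - \beta_{02}\,\mathrm{fal}(e, 0.5, \delta) \\
\dot{z}_3 = z_4 - \beta_{03}\,\mathrm{fal}(e, 0.25, \delta) + b_0u \\
\dot{z}_4 = -\beta_{04}\,\mathrm{fal}(e, 0.125, \delta),
\end{cases} \tag{11}
$$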


where $z_1$, $z_2$, $z_3$, and $z_4$ are the estimated values of $x_1$, $x_2$, $x_3$, and $x_4$, respectively. $\beta_{01}$, $\beta_{02}$, $\beta_{03}$, and $\beta_{04}$ represent the observer gains; when they are chosen appropriately, the state estimates stay close to the actual values. The nonlinear function fal(·) has the following form:

$$
\mathrm{fal}(e, a, \delta) =
\begin{cases}
e/\delta^{1-a}, & |e| \le \delta \\
|e|^{a}\,\mathrm{sign}(e), & |e| > \delta,
\end{cases} \tag{12}
$$


where $a$ and $\delta$ are the adjustable parameters.

After the disturbance state is estimated, it needs to be eliminated. We define $e_i = v_i - z_i$, $i = 1, 2, 3$, and then the control law can be designed as

$$
u_0 = \sum_{i=1}^{3} \beta_i\,\mathrm{fal}(e_i, a_i, \delta_0), \tag{13}
$$

where $\beta_i$, $i = 1, 2, 3$ are the feedback control gains, and fal(·) has the same form as Eq. 12.
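The nonlinear gain and the feedback law of Eq. 13 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `fal` and `nlsef` are assumptions, and the inner branch uses the standard continuous form $e/\delta^{1-a}$, which makes the two branches meet at $|e| = \delta$. The default exponents and $\delta_0$ are the values quoted in this section.

```python
import math

def fal(e, a, delta):
    """Nonlinear gain: linear inside |e| <= delta, power law outside."""
    if abs(e) <= delta:
        return e / delta ** (1.0 - a)
    return math.copysign(abs(e) ** a, e)

def nlsef(z, betas, a=(0.3, 0.8, 1.1), delta0=0.005, v=(0.0, 0.0, 0.0)):
    """Eq. 13: u0 = sum_i beta_i * fal(v_i - z_i, a_i, delta0).

    z holds the first three ESO states (z1, z2, z3); v is the reference and
    its derivatives, all zero here because the target is ACE_i = 0.
    """
    return sum(b * fal(vi - zi, ai, delta0)
               for b, vi, zi, ai in zip(betas, v, z, a))

# Hypothetical gains, for illustration only.
u0 = nlsef(z=(0.01, 0.0, 0.0), betas=(10.0, 5.0, 1.0))
print(u0)
```

With $a < 1$ the function gives a high gain for small errors and a lower gain for large ones, which is the usual motivation for using fal(·) in place of a purely linear feedback.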


The control input $u$ in Eq. 8 is then expressed as

$$
u = \frac{u_0 - z_4}{b_0}. \tag{14}
$$

By substituting Eqs. 13 and 14 into Eq. 8, under the premise of $f \approx z_4$, we can obtain

$$
y^{(3)} \approx u_0. \tag{15}
$$
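Written out step by step, the substitution of the control law of Eq. 14 into the plant of Eq. 8 reads:

```latex
\begin{aligned}
y^{(3)} &= f + b_0 u \\
        &= f + b_0 \cdot \frac{u_0 - z_4}{b_0} \\
        &= u_0 + (f - z_4) \\
        &\approx u_0 \qquad \text{when } z_4 \approx f .
\end{aligned}
```

The residual $f - z_4$ is the ESO estimation error of the total disturbance, so the approximation holds only to the extent that the observer tracks $f$.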

Therefore, appropriate feedback control gains can ensure that $ACE_i$ converges to 0. In this study, $\delta = 0.05$, $\delta_0 = 0.005$, $a_1 = 0.3$, $a_2 = 0.8$, and $a_3 = 1.1$.

The above design process shows that for a third-order controlled system, the parameters that need to be adjusted in the ADRC are $\beta_{01}$, $\beta_{02}$, $\beta_{03}$, $\beta_{04}$, $\beta_1$, $\beta_2$, $\beta_3$, and $b_0$. It is very difficult to manually adjust eight parameters at the same time, and an effective optimization method is urgently needed.
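To show how the observer of Eq. 11 and the control law of Eq. 14 could be evaluated in a simulation loop, the sketch below advances the ESO by one forward-Euler step. The discretization, the step size `dt`, and the function names are assumptions made for illustration; the paper does not specify a numerical integration scheme.

```python
import math

def fal(e, a, delta):
    """Nonlinear gain (standard continuous form assumed)."""
    if abs(e) <= delta:
        return e / delta ** (1.0 - a)
    return math.copysign(abs(e) ** a, e)

def eso_step(z, y, u, gains, b0, dt, delta=0.05):
    """One forward-Euler step of the fourth-order ESO of Eq. 11.

    z = (z1, z2, z3, z4) are the current estimates, y is the measured ACE,
    u is the control applied during the step, gains = (b01, b02, b03, b04).
    """
    z1, z2, z3, z4 = z
    b01, b02, b03, b04 = gains
    e = z1 - y
    dz1 = z2 - b01 * e
    dz2 = z3 - b02 * fal(e, 0.5, delta)
    dz3 = z4 - b03 * fal(e, 0.25, delta) + b0 * u
    dz4 = -b04 * fal(e, 0.125, delta)
    return (z1 + dt * dz1, z2 + dt * dz2, z3 + dt * dz3, z4 + dt * dz4)

def control(u0, z4, b0):
    """Eq. 14: cancel the estimated total disturbance z4."""
    return (u0 - z4) / b0

# Hypothetical values, only to exercise the functions.
z_next = eso_step(z=(0.0, 0.0, 0.0, 0.0), y=0.0, u=1.0,
                  gains=(1.0, 1.0, 1.0, 1.0), b0=2.0, dt=0.1)
print(z_next, control(u0=1.0, z4=0.5, b0=2.0))
```

The eight tuned quantities appear here as the four observer gains, the three feedback gains that produce `u0`, and `b0`.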

IV. REINFORCEMENT-LEARNING-BASED MEMETIC PARTICLE SWARM OPTIMIZATION (RLMPSO)-OPTIMIZED ADRC
As a control technology for estimating and compensating for uncertain disturbances, ADRC has attracted widespread attention in industry and academia since its proposal. However, there has not been a set of efficient and unified tuning rules for parameter adjustment. In this study, the RLMPSO algorithm is applied for the parameter tuning problem of the nonlinear ADRC to test the superiority of the combination of RL and the intelligent optimization algorithm in a new field.

A. BASICS OF PARTICLE SWARM OPTIMIZATION (PSO) AND REINFORCEMENT LEARNING (RL)
1) PRINCIPLE OF PSO
The PSO originated from the simulation of bird predation behavior. The main concepts of PSO include a population, potential solutions (called particles), and an iterative search space, where each particle is composed of a position, speed, and fitness value. Each particle moves at an adaptive speed in the search space and retains the best position it has ever visited, that is, the position with the lowest function value (in general, only the minimization problem is considered). At the same time, it tracks the current optimal particle in the solution space to realize the information exchange between particles and then adjusts the flight direction and distances between the particles to complete the optimization search.

In the PSO algorithm, each iteration needs to complete two main processes. One is the determination of the individual optimal position $p_{best}$ of each particle, and the other is the determination of the optimal particle position $g_{best}$ of the group. The position and velocity of each particle in the population are updated according to Eq. 16.

$$
\begin{cases}
V_{k+1} = \omega_k V_k + c_1r_1(p_{best} - X_k) + c_2r_2(g_{best} - X_k) \\
X_{k+1} = V_{k+1} + X_k,
\end{cases} \tag{16}
$$


where $k$ is the current iteration number, $X_k$ is the current position of the particle, and $V_k$ is the current velocity of the particle. $c_1$ and $c_2$ represent the learning factors. $\omega_k$ denotes the inertia weight. $r_1$ and $r_2$ are random numbers in [0, 1].

The initial population is generally randomly generated by

$$
\begin{cases}
X_i = R_0(U - L) + L \\
V_i = R_0'(V_{max} - V_{min}) + V_{min},
\end{cases} \tag{17}
$$


where $X_i$ and $V_i$ represent the position and velocity of the particle, respectively. $R_0$ and $R_0'$ are random vectors with the same dimension as $X_i$, and each of their components lies in [0, 1]. $U$ and $L$ represent the upper and lower bounds of the particle position (that is, the parameter solution), respectively. $V_{max}$ and $V_{min}$ denote the upper and lower bounds of the particle velocity, respectively.
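The initialization of Eq. 17 and the update of Eq. 16 translate almost directly into array operations. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and $r_1$, $r_2$ are drawn independently for every particle and dimension, which is one common reading of "random numbers in [0, 1]".

```python
import numpy as np

def init_swarm(n, lower, upper, v_min, v_max, rng):
    """Eq. 17: uniform random positions in [L, U] and velocities in [Vmin, Vmax]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    X = rng.random((n, lower.size)) * (upper - lower) + lower
    V = rng.random((n, lower.size)) * (v_max - v_min) + v_min
    return X, V

def pso_step(X, V, pbest, gbest, w, c1, c2, rng):
    """Eq. 16: velocity update followed by the position update."""
    r1 = rng.random(X.shape)
    r2 = rng.random(X.shape)
    V_new = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    return X + V_new, V_new

# Illustrative bounds for a two-parameter problem (assumed values).
rng = np.random.default_rng(0)
X, V = init_swarm(5, [0.0, 0.0], [1.0, 2.0], -0.5, 0.5, rng)
X_next, V_next = pso_step(X, V, pbest=X, gbest=X[0], w=0.7, c1=2.0, c2=2.0, rng=rng)
print(X_next.shape, V_next.shape)
```

For the ADRC tuning problem each row of `X` would hold the eight parameters $\beta_{01}, \ldots, \beta_{04}, \beta_1, \beta_2, \beta_3, b_0$.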

2) PRINCIPLE OF RL
RL is a method to find the optimal strategy through the continuous interaction between an agent and an uncertain environment. When the intelligent agent "communicates" with the environment through actions, the environment will return the current reward to the agent, through which the action can be evaluated [28]. The basic framework is shown in Fig. 3.

FIGURE 3. Basic framework of reinforcement learning (RL).

We assume that the environment produces state $s_t$ at time $t$, and the reward value $r_t$ can be obtained based on the reward function. The agent can obtain the optimal action $a_t$ through the state-action value function based on the cumulative reward $R_c$, where the state-action value function can be regarded as the evaluation value of the action.

Q-learning is the most common RL algorithm, first proposed by Watkins in 1989 [29]. As an RL method based on a time difference, the selection of the current state and action in Q-learning can be regarded as an event, and any event corresponds to a state-action value function $Q(s_t, a_t)$, which is stored in the Q table. Through the updated iteration of the Q table, the intelligent agent in Q-learning will gradually approach the optimal strategy for sequential decision-making problems in the continuous in-depth interaction between the agent and the environment. A typical learning process of Q-learning can be described as follows:

Step 1: Model initialization. For all discrete states $s \in S$ and actions $a \in A$, initialize their corresponding value functions $Q(s, a) \in Q$.


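Step 2: Initialize state $s$. On the basis of a Q table, select the initial action value $a$ by an $\varepsilon$-greedy policy, shown as

$$
\pi(a, s) \leftarrow
\begin{cases}
1 - \varepsilon + \dfrac{\varepsilon}{|A(s)|}, & \text{if } a = \arg\max_{a \in A} Q(s, a) \\
\dfrac{\varepsilon}{|A(s)|}, & \text{otherwise},
\end{cases} \tag{18}
$$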

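where $\varepsilon$ represents the probability of exploration.

From Eq. 18, we observe that the action corresponding to the largest Q value is selected with probability $1 - \varepsilon + \varepsilon/|A(s)|$, while each of the other actions in the action space is selected with probability $\varepsilon/|A(s)|$. In other words, with probability $\varepsilon$ an action is drawn uniformly at random from the action space, and otherwise the action with the largest Q value is taken.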

Step 3: Perform action $a_t$, thus obtaining the corresponding reward value $r_t$ and the state $s_{t+1}$ at the next moment.

Step 4: Update $Q(s_t, a_t)$ according to

$$
Q_{t+1}(s_t, a_t) = Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right], \tag{19}
$$

where α is the learning rate. γ represents the discount factor,which reflects the importance of rewards for future moments.

Step 5: Determine whether to end the process. If yes, output the optimal strategy; otherwise, return to Step 3.
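Steps 2 and 4 above can be sketched compactly in Python. This is an illustrative tabular implementation under stated assumptions: the Q table is a plain list of lists, the function names are hypothetical, and exploration is realized by drawing uniformly from all actions with probability $\varepsilon$, which reproduces the probabilities of Eq. 18.

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Eq. 18: explore uniformly with probability epsilon, else act greedily.

    Because the uniform draw can also return the greedy action, the greedy
    action is chosen with total probability 1 - epsilon + epsilon / |A|.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)

def q_update(Q, s, a, reward, s_next, alpha, gamma):
    """Eq. 19: move Q(s, a) toward reward + gamma * max_a' Q(s_next, a')."""
    td_target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]

# Tiny two-state, two-action example with assumed numbers.
Q = [[0.0, 0.0], [0.0, 1.0]]
new_value = q_update(Q, s=0, a=1, reward=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(new_value, epsilon_greedy(Q[0], epsilon=0.0))
```

In the example the target is $1 + 0.9 \times 1 = 1.9$, so with $\alpha = 0.5$ the entry moves halfway from 0 to 1.9.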

B. RLMPSO
The particle swarm algorithm suffers from premature convergence and a high computational cost. Improving the particle swarm algorithm from the perspective of a memetic algorithm is a significant research direction.

1) PRINCIPLE OF RLMPSO
The main idea of RLMPSO is to embed RL into the operation of each particle search stage in the particle swarm algorithm. Under RL control, each particle performs one of five possible operations: exploration, convergence, high-jump, low-jump, and fine-tuning. Moreover, each action will be rewarded or punished based on the performance. In addition, the population size in the RLMPSO algorithm is small, and each particle evolves independently. For example, one particle performs exploration, while other particles perform their own operations. The schematic diagram of the RLMPSO structure is shown in Fig. 4, where Q-learning is adopted.

Fig. 4 shows the structure of the entire RLMPSO that integrates RL and PSO. Particles in the PSO act as agents in the Q-learning, and the search space of particles is used as the environment in Q-learning. The state is expressed as the current operation of each particle, namely exploration, convergence, high-jump, low-jump, or fine-tuning. Actions are defined as changes from one state to another. In other words, Q-learning controls the operation of each particle in the PSO group. Specifically, RL adaptively switches particles from one operation (state) to another operation (state) based on the performances of the particles. Positive rewards are given to particles that perform well, and particles that perform poorly are punished.

FIGURE 4. Structure diagram of reinforcement-learning-based memetic particle swarm optimization (RLMPSO).

2) DEFINITION OF ACTION
As mentioned earlier, there are five operations that each particle can perform. The five operations are introduced in this section.

Exploration and convergence are two operations defined by the difference in the power of the particle's global search during the search process and the preference of whether to track the current global optimal position $g_{best}$, which is realized by the differences of the values of $\omega$, $c_1$, and $c_2$. For the exploration, the particle probes the solution space with a larger $\omega$, and to maximize the global search, the particle will be far from the current global optimal position $g_{best}$, where $c_1 > c_2$. The convergence operation is the opposite. The search power of the particles in the solution space will be attenuated. Thus, particles will converge in the direction of $g_{best}$. At this time, $c_1 < c_2$, and $\omega$ is small.

The main idea of the jump operation is to avoid premature convergence of particles by changing the individual optimal particle $p_{best,i}$ ($i$ is the particle subscript) to escape the possible local optima. Specifically, a random value is added to each dimension (that is, each parameter value to be optimized) of $p_{best,i}$, shown as

Xi = pbest,i + rn(U − L), (20)

where $r_n$ is a normally distributed random number with zero mean, that is, $r_n \sim N(0, \sigma^2)$, and the standard deviation $\sigma$ lies in the range [0, 1]. For the high-jump operation, the particle changes $p_{best,i}$ with a larger step length, and $\sigma$ is close to 1, while for the low-jump operation, the particle step size is smaller, and $\sigma$ is close to 0.

Similar to the jump operation, the fine-tuning operation also adds a random value to each dimension of the individual optimal particle $p_{best,i}$, so that the entire population performs a local search within the neighborhood of the current global optimal solution. The difference is that each dimension of the particle is fine-tuned independently, and the fine-tuning operation performs a certain number of fitness evaluations in a loop for each dimension. During the evaluation process, $p_{best,i}$ and the corresponding speed variable $V_{i,d}$ ($i$ represents the current particle subscript, and $d$ is the dimension subscript) will change as the fitness value changes.
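The jump operation of Eq. 20 can be sketched as follows. The function name `jump`, the clipping of the result to the search bounds, and the two example values of $\sigma$ are assumptions for illustration; the text only states that $\sigma$ is close to 1 for a high-jump and close to 0 for a low-jump.

```python
import numpy as np

def jump(pbest_i, lower, upper, sigma, rng):
    """Eq. 20: X_i = pbest_i + r_n * (U - L), with r_n ~ N(0, sigma^2) drawn
    independently for every dimension. Clipping to [L, U] is an added
    safeguard, not part of Eq. 20."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    r_n = rng.normal(0.0, sigma, size=lower.shape)
    candidate = np.asarray(pbest_i, dtype=float) + r_n * (upper - lower)
    return np.clip(candidate, lower, upper)

rng = np.random.default_rng(1)
pbest = [0.5, 1.0]
high = jump(pbest, [0.0, 0.0], [1.0, 2.0], sigma=0.9, rng=rng)  # high-jump (assumed sigma)
low = jump(pbest, [0.0, 0.0], [1.0, 2.0], sigma=0.1, rng=rng)   # low-jump (assumed sigma)
print(high, low)
```

Scaling the perturbation by $U - L$ makes the jump size relative to the width of each parameter's search interval.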

For example, suppose that particle $i$ performs the fine-tuning operation, the maximum dimension (the number of parameters to be optimized) is $D$, and the maximum number of fitness evaluations is $E_m$. For the current dimension $d \in [1, D]$, the current fitness evaluation number of this dimension is $e \in [1, E_m]$, and the objective function to be minimized is $f(x)$. The realization process (∗) is described as follows:

Step 1: Update the speed using the following equation, where $L_{i,d}$ is the step size, $a$ is the acceleration factor, $p$ is the parameter that controls the speed attenuation, and $r$ is a uniformly distributed random number in [−0.5, 0.5]:

$$
V_{i,d} = \frac{a}{e^{p}}\,r + L_{i,d}. \tag{21}
$$

Step 2: Record the original best fitness value $f_{best}$ and calculate $f(p_{best,i} + V_{i,d})$.

Step 3: Update $p_{best,i}$ as follows:

$$
p_{best,i} =
\begin{cases}
p_{best,i} + V_{i,d}, & \text{if } f_{best} > f(p_{best,i} + V_{i,d}) \\
p_{best,i}, & \text{otherwise}.
\end{cases} \tag{22}
$$

Step 4: Update $L_{i,d}$ with the following equation and let $e = e + 1$:

$$
L_{i,d} =
\begin{cases}
2V_{i,d}, & \text{if } f_{best} > f(p_{best,i} + V_{i,d}) \\
\dfrac{L_{i,d}}{2}, & \text{otherwise}.
\end{cases} \tag{23}
$$

Step 5: If $e \le E_m$, return to Step 1; otherwise, let $d = d + 1$.
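The five steps above can be sketched as a per-dimension local search. Several details are assumptions and not taken from the paper: Eq. 21 is read as $V_{i,d} = (a/e^{p})\,r + L_{i,d}$ with $e$ the evaluation counter, the step size $L_{i,d}$ starts at 0 for each dimension, and the default values of `a`, `p`, and `e_max` are placeholders.

```python
import random

def fine_tune(pbest, f, a=1.0, p=1.0, e_max=5, rng=random):
    """Per-dimension fine-tuning of a personal best (Steps 1-5, Eqs. 21-23)."""
    pbest = list(pbest)
    f_best = f(pbest)
    for d in range(len(pbest)):
        step = 0.0                               # L_{i,d}; initial value assumed
        for e in range(1, e_max + 1):
            r = rng.uniform(-0.5, 0.5)
            v = a / (e ** p) * r + step          # Step 1, Eq. 21
            trial = pbest.copy()
            trial[d] += v                        # perturb only dimension d
            f_trial = f(trial)                   # Step 2
            if f_trial < f_best:                 # Step 3, Eq. 22: keep improvement
                pbest, f_best = trial, f_trial
                step = 2.0 * v                   # Step 4, Eq. 23: enlarge the step
            else:
                step = step / 2.0                # Step 4, Eq. 23: halve the step
    return pbest, f_best

def sphere(x):
    """Simple test objective, used here only as a stand-in for the real cost."""
    return sum(v * v for v in x)

random.seed(0)
start = [1.0, -2.0]
best, best_value = fine_tune(start, sphere)
print(best, best_value)
```

Because a trial point is accepted only when it lowers the objective, the returned fitness can never be worse than that of the starting point.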

3) Q TABLE
Unlike the standard PSO, RLMPSO can perform any operation at any stage of the search process, namely exploration, convergence, high-jump, low-jump, and fine-tuning. RL is responsible for tracking the best performance of each particle.

Each particle has a Q table that only belongs to itself and not to the population during the search process. The dimension of the Q table is 5 × 5:

$$
\begin{array}{c|ccccc}
  & E & C & H & L & F \\ \hline
E & a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
C & a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
H & a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
L & a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\
F & a_{51} & a_{52} & a_{53} & a_{54} & a_{55}
\end{array} \tag{24}
$$

In Eq. 24, rows represent states, and columns representactions. E, C, H, L, and F represent the five operations,exploration, convergence, high-jump, low-jump, and fine-tuning, respectively. The state of E indicates that the currentparticle execution operation is exploration, and the action ofE indicates that the next operation performed by the particleis exploration.

FIGURE 5. Schematic diagram of particle motion adjustment.

Because the fitness evaluations performed for each dimension in the fine-tuning operation are counted toward the number of iterations, the fine-tuning operation requires a large number of global iterations, while each of the other operations requires only one. At the same time, since the fine-tuning operation acts on the best position of the individual, its execution must be postponed until after the global operations, that is, after the exploration, convergence, and jump operations. Therefore, to delay fine-tuning at the beginning of the search process and give higher priority to the other operations, the initial Q table entries of action F (the last column of the Q table) are set to an infinitesimal negative value. RLMPSO must perform M iterations before the fine-tuning operation is activated; at that point, the minimum value (which may be negative) of the other four columns in the Q table is taken as the initial value of the entries in the column where action F is located. During the RLMPSO execution, the best action for the current state can be retrieved from the Q table as

$$a_{t+1} = \arg\max_{a_i} \{Q(s_t, a_i)\}, \tag{25}$$

where a_{t+1} is the action to be performed next, that is, the best action for the current state, s_t represents the current state, a_i ∈ A denotes an action, and A is the action space.

In summary, the principle diagram of the movement adjustment of each particle in the RLMPSO is shown in Fig. 5.
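The per-particle Q table and the greedy selection of Eq. (25) can be sketched as below. Several details are assumptions made for illustration only: the non-F entries start at zero, the initial F entries are represented by negative infinity so that F cannot win the argmax before activation, the activation value is the minimum over all entries of the other four columns, and ties are broken in favour of the first column. The helper names are hypothetical.

```python
import math

OPS = ["E", "C", "H", "L", "F"]  # exploration, convergence, high-jump, low-jump, fine-tuning


def new_q_table():
    """5 x 5 table (rows: states, columns: actions); the zero start is an assumption."""
    q = [[0.0] * len(OPS) for _ in OPS]
    for row in q:
        row[OPS.index("F")] = -math.inf   # assumption: keeps F from being selected early
    return q


def activate_fine_tuning(q):
    """After M iterations, seed column F with the minimum of the other four columns."""
    f_col = OPS.index("F")
    floor = min(v for row in q for j, v in enumerate(row) if j != f_col)
    for row in q:
        row[f_col] = floor


def best_action(q, state):
    """Eq. (25): the action with the largest Q value in the row of the current state."""
    row = q[OPS.index(state)]
    return OPS[max(range(len(OPS)), key=row.__getitem__)]


q_table = new_q_table()
next_op = best_action(q_table, "E")
```

Keeping one table per particle, as the text describes, means each particle learns its own preferred sequence of operations.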

C. DESIGN OF RLMPSO-OPTIMIZED ADRC
As reported previously [26], RLMPSO performs significantly better than many other PSO variants in optimizing multiple unimodal functions, multimodal functions, composite functions, and two practical optimization problems involving train gear and pressure vessel design. This section uses the RLMPSO algorithm to optimize the ADRC parameters to improve the dynamic performance of the control system. The following integral of time-weighted absolute error (ITAE) is selected as the objective function:




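$$\mathrm{ITAE} = \int_{0}^{T} \left( \sum_{i=1}^{n} |\Delta f_i| + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} |\Delta P_{tie,ij}| \right) t \, dt, \tag{26}$$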


FIGURE 6. Schematic diagram of ADRC based on RLMPSO.

TABLE 2. Parameters of RLMPSO.

TABLE 3. Ranges of ADRC parameters.

where n is the total number of controlled system areas, and T is the simulation time. When the ITAE decreases, the controller receives an instant reward of 1; otherwise, it receives −1. The schematic diagram of the ADRC based on the RLMPSO algorithm is shown in Fig. 6.

The parameters of the RLMPSO are shown in Table 2. In addition, the ranges [L, U] of the controller parameters are given in Table 3. The corresponding speed range is 0.2[L, U]. The description of the optimization process is shown in Fig. 7.
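The objective and the reward rule can be evaluated from sampled simulation output roughly as follows. This sketch assumes uniformly or non-uniformly sampled time series and approximates the integral with the trapezoidal rule; the function names and the way the tie-line series are passed in (one list per tie line) are illustrative choices, not part of the paper.

```python
def itae(t, delta_f, delta_p_tie):
    """Time-weighted integral of the summed absolute deviations (trapezoidal rule).

    t            list of sample times from 0 to T
    delta_f      list of frequency-deviation series, one per area
    delta_p_tie  list of tie-line power-deviation series, one per tie line
    """
    integrand = [
        tk * (sum(abs(s[k]) for s in delta_f) + sum(abs(s[k]) for s in delta_p_tie))
        for k, tk in enumerate(t)
    ]
    return sum(
        0.5 * (integrand[k] + integrand[k + 1]) * (t[k + 1] - t[k])
        for k in range(len(t) - 1)
    )


def reward(previous_itae, new_itae):
    """Instant reward: 1 when the ITAE decreases, otherwise -1."""
    return 1 if new_itae < previous_itae else -1


# A constant unit deviation over 0..2 s gives the integral of t dt, which is 2.
example = itae([0.0, 1.0, 2.0], [[1.0, 1.0, 1.0]], [])
```

Weighting the error by t penalizes deviations that persist late in the simulation more heavily than the initial transient.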

V. NUMERICAL SIMULATION RESULTS AND ANALYSIS
A. TRADITIONAL TWO-AREA NON-REHEAT THERMAL POWER SYSTEM
In this section, the two-area non-reheat thermal power system [30] is used as the simulation model, where the two areas of the interconnected power system have the same structure, and each area contains a non-reheat turbine. The model parameters are listed as

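$$
\begin{aligned}
&B_1 = B_2 = 0.425\ \text{p.u. MW/Hz};\quad R_1 = R_2 = 2.4\ \text{Hz/p.u.};\quad T_{g1} = T_{g2} = 0.03\ \text{s};\\
&T_{t1} = T_{t2} = 0.3\ \text{s};\quad K_{p1} = K_{p2} = 120\ \text{Hz/p.u.};\quad T_{p1} = T_{p2} = 20\ \text{s};\\
&T_{12} = 0.545\ \text{p.u.};\quad a_{12} = -1.
\end{aligned} \tag{27}
$$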
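As a quick consistency check on these values, the listed frequency bias matches the common choice B = 1/R + 1/Kp (droop plus load damping). That relation is not stated in this section; it is offered here only as an observation about the numbers, and the variable names are illustrative.

```python
# Nominal parameters listed above (identical for both areas)
R = 2.4            # speed regulation, Hz/p.u.
Kp = 120.0         # power system gain, Hz/p.u.
B_listed = 0.425   # frequency bias, p.u. MW/Hz

# Assumed relation: bias equals droop term plus load-damping term
B_from_droop = 1.0 / R + 1.0 / Kp
difference = abs(B_from_droop - B_listed)
```

The difference is at floating-point level, so the three listed values are mutually consistent under this assumed relation.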


TABLE 4. ADRC optimization results of two-area non-reheat thermal power system based on RLMPSO.

TABLE 5. Comparison of performance indices of a two-area non-reheat thermal power system with different controllers.

1) SIMULATION ANALYSIS OF NOMINAL PARAMETERS
Suppose that at t = 0, the first area is subjected to a step-load disturbance (SLP) ΔPL1 = 0.1 p.u. According to the discussion above, each area adopts a third-order ADRC. The optimized parameters are shown in Table 4. Fig. 8 shows the responses of the frequency deviation and tie-line exchanged power, together with the results of other optimized controllers, including BFOA-PID [31], HBFOA-PID [32], hPSO-PS-FUZZY-PID [33], TLBO-PID [34], ISFS-PID [35], DSA-FOPID [36], and DSA-FOPI-FOPD [36].

Table 5 shows the adjustment time and ITAE performance index results of the frequency deviation in each area and the system tie-line exchanged power.

Fig. 8 shows that the frequency deviation and tie-line power deviation in each area settled at zero in the steady state, which means that the steady-state performances of the compared controllers were the same. As shown in Table 5, the proposed LFC controller with RLMPSO-ADRC achieved a significantly smaller ITAE value than the LFC controllers from previous studies. The ITAE value under this method was 0.00015, which was 1/120 of the best ITAE value of the other methods (0.018 for DSA-FOPI-FOPD). This means that the proposed method has better control performance.

FIGURE 7. Flowchart of parameter optimization of RLMPSO-based ADRC.

FIGURE 8. Time-domain response curve of two-area non-reheat thermal power system when ΔPL1 = 0.1 p.u.

2) ROBUSTNESS ANALYSIS
In modern complex power systems, the uncertainty of the system parameters is a crucial issue. Therefore, the LFC controller must be robust to the uncertainty of the parameters in the system. To examine the robustness of the proposed control strategy, Fig. 9 and 10 show the time-domain response curves when the system model parameters B, R, Tp, Kp, Tg, and Tt were changed by +30% and −30%, respectively. The response of the proposed method was better than those of the other methods, with the smallest overshoot and undershoot and the shortest settling time, thus verifying the robustness of the proposed method.

FIGURE 9. Time-domain response curve when system model parameters change by +30%.
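The two perturbed parameter sets used in such a robustness test can be generated from the nominal values as sketched below. The sketch assumes that all six parameters are scaled together by the same fraction, which the text does not state explicitly; `perturb` is a hypothetical helper name.

```python
# Nominal values of the perturbed parameters (identical for both areas)
NOMINAL = {"B": 0.425, "R": 2.4, "Tp": 20.0, "Kp": 120.0, "Tg": 0.03, "Tt": 0.3}


def perturb(params, fraction):
    """Return a copy of params with every value scaled by (1 + fraction)."""
    return {name: value * (1.0 + fraction) for name, value in params.items()}


plus_30 = perturb(NOMINAL, +0.30)    # parameters for the +30% case
minus_30 = perturb(NOMINAL, -0.30)   # parameters for the -30% case
```

The controller parameters stay at their nominal optimized values in both cases; only the plant model changes.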


FIGURE 10. Time-domain response curve when system model parameters change by −30%.

In addition, to test the ability of the proposed method to suppress different load disturbances, Fig. 11 shows the output response of the system controlled by the proposed method under different load disturbances, where ΔPL1 = 0 p.u. and ΔPL2 = 0.1 p.u. at t = 0–10 s, and ΔPL1 = 0.1 p.u. and ΔPL2 = 0.1 p.u. at t = 10–20 s.

As shown in Fig. 11, the RLMPSO-ADRC control system could always suppress the system frequency deviation and the fluctuations of the tie-line power deviation under different load disturbances and restore the system to a stable state in a very short time, thus demonstrating that the proposed LFC controller had good disturbance rejection capabilities and robustness.

B. TWO-AREA POWER SYSTEM WITH NONLINEARITY
In practice, the system is inevitably subject to the internal constraints of the physical system dynamics. The nonlinearities caused by the GRC and the GDB are considered here; they are shown in Fig. 12 and 13, respectively. The GRC is a saturation nonlinearity that arises when the generator cannot respond on demand to a large load disturbance because it cannot provide a sufficient rate of change. The GDB can avoid the loss of control due to the excessive response of the governor valve. The model parameters are shown in Eq. 28, where ld and ud are the bounds of the GDB, and lg and ug are the bounds of the GRC.

FIGURE 11. Time-domain response curve of the system under different load disturbances: ΔPL1 = 0 p.u., ΔPL2 = 0.1 p.u. at t = 0–10 s and ΔPL1 = 0.1 p.u., ΔPL2 = 0.1 p.u. at t = 10–20 s.

TABLE 6. ADRC optimization results of two-area non-reheat thermal power system with GRC and GDB based on RLMPSO.

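$$
\begin{aligned}
&B_1 = 0.3483\ \text{p.u. MW/Hz};\quad T_{g1} = 0.08\ \text{s};\quad T_{t1} = 0.40\ \text{s};\quad T_{p1} = 11.1333\ \text{s};\\
&K_{p1} = 66.67\ \text{Hz/p.u.};\quad T_{12} = 0.20\ \text{p.u.};\quad R_1 = 3\ \text{Hz/p.u.};\\
&l_{d1} = -0.05;\quad u_{d1} = 0.05;\quad l_{g1} = -0.167;\quad u_{g1} = 0.167;\\
&B_2 = 0.3823\ \text{p.u. MW/Hz};\quad T_{g2} = 0.06\ \text{s};\quad T_{t2} = 0.44\ \text{s};\quad T_{p2} = 13.5625\ \text{s};\\
&K_{p2} = 62.50\ \text{Hz/p.u.};\quad T_{21} = 0.20\ \text{p.u.};\quad R_2 = 2.73\ \text{Hz/p.u.};\\
&l_{d2} = -0.0005;\quad u_{d2} = 0.0005;\quad l_{g2} = -0.00167\ \text{p.u./s};\quad u_{g2} = 0.00167\ \text{p.u./s}.
\end{aligned} \tag{28}
$$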
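One common way to model the two nonlinearities with bounds of this kind is a dead-zone function for the GDB and a saturation on the rate of change for the GRC. The sketch below follows that reading; the exact block structure in Fig. 12 and 13 may differ, and the function names are hypothetical.

```python
def dead_band(u, ld, ud):
    """GDB as a dead zone: no output while the input stays within [ld, ud]."""
    if u > ud:
        return u - ud
    if u < ld:
        return u - ld
    return 0.0


def rate_limit(rate, lg, ug):
    """GRC as a saturation: clip the requested rate of change to [lg, ug]."""
    return min(max(rate, lg), ug)


# Area 1 bounds from Eq. 28
inside = dead_band(0.01, -0.05, 0.05)     # input inside the dead band
clipped = rate_limit(1.0, -0.167, 0.167)  # requested rate above the upper bound
```

With these definitions, a small governor input produces no valve movement, and a large demanded change in generation is followed only at the bounded rate.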



TABLE 7. Comparison of performance indices of two-area power systems with GDB and GRC under different controllers.

FIGURE 12. Model diagram of generation rate constraint (GRC).

FIGURE 13. Model diagram of GDB.

FIGURE 14. Time-domain response curves of the system considering GRC and GDB when ΔPL2 = 0.02 p.u.

Similarly, a third-order ADRC is used, with the controller parameter ranges still as shown in Table 3; the optimized parameters are shown in Table 6. For a step load disturbance of ΔPL2 = 0.02 p.u. added to the second area at t = 0, the simulation results are shown in Fig. 14. To facilitate the performance comparison with other algorithms, this section uses the following performance indicators for evaluation: integral absolute error (IAE), overshoot Osh, undershoot Ush, and settling time Ts, as shown in Table 7, where

$$\mathrm{IAE} = \int_{0}^{\infty} |e(t)|\, dt.$$
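These four indicators can be computed from a sampled response along the following lines. The sketch makes several assumptions that are not taken from the paper: the integral is truncated at the last sample and approximated with the trapezoidal rule, the steady-state value is zero, and the settling time is taken as the last sample time at which the response lies outside a tolerance band whose width (`band`) is an arbitrary illustrative value.

```python
def performance_indices(t, e, band=0.0005):
    """Return (IAE, overshoot, undershoot, settling time) for samples e at times t."""
    iae = sum(
        0.5 * (abs(e[k]) + abs(e[k + 1])) * (t[k + 1] - t[k])
        for k in range(len(t) - 1)
    )
    overshoot = max(max(e), 0.0)     # largest positive excursion
    undershoot = min(min(e), 0.0)    # largest negative excursion
    settling_time = t[0]
    for tk, ek in zip(t, e):
        if abs(ek) > band:           # still outside the band at this sample
            settling_time = tk
    return iae, overshoot, undershoot, settling_time


iae, osh, ush, ts = performance_indices([0.0, 1.0, 2.0, 3.0], [0.0, -1.0, 0.5, 0.0])
```

For the short example series, the area under |e(t)| is 1.5, the peaks are 0.5 and −1.0, and the response is last outside the band at t = 2.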

As shown by the time-domain response curves of the system in Fig. 14, the RLMPSO-ADRC had the smallest fluctuations of Δf1, Δf2, and ΔPtie and the fastest recovery speed compared with the other four algorithms. Table 7 also shows numerically the effectiveness of the proposed method. The transient performance and response speed were significantly better than those of the other algorithms, thus verifying the effectiveness of the proposed method for the LFC control system.

VI. CONCLUSION
In this study, a load-frequency active disturbance rejection controller was designed to address the frequency instability caused by load disturbances in a power system. Because the controller parameters are difficult to determine, we introduced the RLMPSO algorithm, which combines RL with PSO and has a better convergence speed. To verify the effectiveness of the proposed RLMPSO-ADRC, we conducted many simulations and comparison experiments. We first applied the method to the traditional two-area non-reheat thermal power system and compared its performance with those of previously reported methods. Compared with the other strategies, the proposed method showed a smaller response deviation and settling time, thus highlighting its advantages. The robustness of the method in the power system was then studied. The system response under critical parameter states and different disturbance conditions maintained good dynamics, showing that the proposed method is robust to the uncertainties of the system. Finally, the proposed method was applied to a nonlinear power system containing a GDB and a GRC, and the dynamic response of the system was compared with those of control strategies such as predictive control and conventional control. The results showed that the method had a stronger ability to suppress system load disturbances, which indicates that the proposed method is an effective solution to LFC problems. Therefore, the proposed method has theoretical significance and research value for the application of ADRC in actual LFC systems.


REFERENCES
[1] J. Guo, “Application of full order sliding mode control based on different areas power system with load frequency control,” ISA Trans., vol. 92, pp. 23–34, Sep. 2019.

[2] C. Hua and Y. Wang, “Delay-dependent stability for load frequency control system via linear operator inequality,” IEEE Trans. Cybern., early access, Dec. 14, 2021, doi: 10.1109/TCYB.2020.3037113.

[3] H. Zhang, J. Liu, and S. Xu, “H-infinity load frequency control of networked power systems via an event-triggered scheme,” IEEE Trans. Ind. Electron., vol. 67, no. 8, pp. 7104–7113, Aug. 2020.

[4] Y. Jia, Z. Y. Dong, C. Sun, and K. Meng, “Cooperation-based distributed economic MPC for economic load dispatch and load frequency control of interconnected power systems,” IEEE Trans. Power Syst., vol. 34, no. 5, pp. 3964–3966, Sep. 2019.

[5] H. Shayeghi, H. A. Shayanfar, and A. Jalili, “Load frequency control strategies: A state-of-the-art survey for the researcher,” Energy Convers. Manage., vol. 50, no. 2, pp. 344–353, Feb. 2009.

[6] W. Tan, “Unified tuning of PID load frequency controller for power systems via IMC,” IEEE Trans. Power Syst., vol. 25, no. 1, pp. 341–350, Feb. 2010.

[7] R. Ramachandran, B. Madasamy, V. Veerasamy, and L. Saravanan, “Load frequency control of a dynamic interconnected power system using generalised Hopfield neural network based self-adaptive PID controller,” IET Gener., Transmiss. Distrib., vol. 12, no. 21, pp. 5713–5722, Nov. 2018.

[8] P. Dahiya, V. Sharma, and R. Naresh, “Optimal sliding mode control for frequency regulation in deregulated power systems with DFIG-based wind turbine and TCSC–SMES,” Neural Comput. Appl., vol. 31, no. 7, pp. 3039–3056, Jul. 2019.

[9] C. Komboigo, U. Naomitsu, S. Tomonobu, E. L. Mohammed, and L. Liu, “Robust load frequency control schemes in power system using optimized PID and model predictive controllers,” Energies, vol. 11, no. 11, pp. 1–18, Nov. 2018.

[10] J. Liu, Q. Yao, and Y. Hu, “Model predictive control for load frequency of hybrid power system with wind power and thermal power,” Energy, vol. 172, pp. 555–565, Apr. 2019.

[11] J. Han, “Auto-disturbance-rejection controller and its applications,” Control Decis., vol. 13, no. 1, pp. 19–23, 1998.

[12] J. Han, “From PID to active disturbance rejection control,” IEEE Trans. Ind. Electron., vol. 56, no. 3, pp. 900–906, Mar. 2009.

[13] Z. Wang, R. Zu, D. Duan, and J. Li, “Tuning of ADRC for QTR in transition process based on NBPO hybrid algorithm,” IEEE Access, vol. 7, pp. 177219–177240, 2019.

[14] W. Lu, Q. Li, K. Lu, Y. Lu, L. Guo, W. Yan, and F. Xu, “Load adaptive PMSM drive system based on an improved ADRC for manipulator joint,” IEEE Access, vol. 9, pp. 33369–33384, 2021.

[15] J. Tao, L. Du, M. Dehmer, Y. Wen, G. Xie, and Q. Zhou, “Path following control for towing system of cylindrical drilling platform in presence of disturbances and uncertainties,” ISA Trans., vol. 95, pp. 185–193, Dec. 2019.

[16] Y. Xia, F. Pu, S. Li, and Y. Gao, “Lateral path tracking control of autonomous land vehicle based on ADRC and differential flatness,” IEEE Trans. Ind. Electron., vol. 63, no. 5, pp. 3091–3099, May 2016.

[17] M. M. Rahman and A. H. Chowdhury, “Comparative study of ADRC and PID based load frequency control,” in Proc. Int. Conf. Electr. Eng. Inf. Commun. Technol. (ICEEICT), Savar, Bangladesh, May 2015, pp. 21–23.

[18] Y. Zheng, Z. Chen, Z. Huang, M. Sun, and Q. Sun, “Active disturbance rejection controller for multi-area interconnected power system based on reinforcement learning,” Neurocomputing, vol. 425, pp. 149–159, Feb. 2021.

[19] Z. Gao, “On the centrality of disturbance rejection in automatic control,” ISA Trans., vol. 53, no. 4, pp. 850–857, Feb. 2014.

[20] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proc. IEEE Int. Conf. Neural Netw., Denver, CO, USA, 2002.

[21] Y. Zhang, P. Li, and X. Wang, “Intrusion detection for IoT based on improved genetic algorithm and deep belief network,” IEEE Access, vol. 7, pp. 31711–31722, 2019.

[22] W. L. Jorgensen, “Perspective on equation of state calculations by fast computing machines,” Theor. Chem. Acc., vol. 103, nos. 3–4, pp. 225–227, 2000.

[23] W. Deng, J. Xu, and H. Zhao, “An improved ant colony optimization algorithm based on hybrid strategies for scheduling problem,” IEEE Access, vol. 7, pp. 20281–20292, 2019.

[24] Y. Zhang, S. Wang, P. Phillips, and G. Ji, “Binary PSO with mutation operator for feature selection using decision tree applied to spam detection,” Knowl.-Based Syst., vol. 64, pp. 22–31, Jul. 2014.

[25] W.-J. Zhang and X.-F. Xie, “DEPSO: Hybrid particle swarm with differential evolution operator,” in Proc. Conf. IEEE Int. Conf. Syst., Man Cybern. Conf. Theme Syst. Secur. Assurance (SMC), Washington, DC, USA, Oct. 2003, pp. 3816–3821.

[26] H. Samma, C. P. Lim, and J. M. Saleh, “A new reinforcement learning-based memetic particle swarm optimizer,” Appl. Soft Comput., vol. 43, pp. 276–297, Jun. 2016.

[27] W. Tan, “Tuning of PID load frequency controller for power systems,” Energy Convers. Manage., vol. 50, no. 6, pp. 1465–1472, Jun. 2009.

[28] D. Dewey, “Reinforcement learning and the reward engineering principle,” in Proc. AAAI Spring Symp. Ser., Jun. 2014, vol. 1, no. 1, pp. 13–16.

[29] C. Watkins, Learning From Delayed Rewards. Cambridge, U.K.: Cambridge Univ. Press, 1989.

[30] M. R. Sathya and M. M. T. Ansari, “Load frequency control using bat inspired algorithm based dual mode gain scheduling of PI controllers for interconnected power system,” Int. J. Elect. Power Energy Syst., vol. 64, pp. 365–374, Jan. 2015.

[31] E. S. Ali and S. M. Abd-Elazim, “Bacteria foraging optimization algorithm based load frequency controller for interconnected power system,” Int. J. Elect. Power Energy Syst., vol. 33, no. 3, pp. 633–638, Mar. 2011.

[32] S. Panda, B. Mohanty, and P. K. Hota, “Hybrid BFOA–PSO algorithm for automatic generation control of linear and nonlinear interconnected power systems,” Appl. Soft Comput., vol. 13, no. 12, pp. 4718–4730, Dec. 2013.

[33] R. K. Sahu, S. Panda, and G. T. C. Sekhar, “A novel hybrid PSO-PS optimized fuzzy PI controller for AGC in multi area interconnected power systems,” Int. J. Electr. Power Energy Syst., vol. 64, pp. 880–893, Jan. 2015.

[34] R. K. Sahu, S. Panda, U. K. Rout, and D. K. Sahoo, “Teaching learning based optimization algorithm for automatic generation control of power system using 2-DOF PID controller,” Int. J. Electr. Power Energy Syst., vol. 77, pp. 287–301, May 2016.

[35] E. Celik, “Improved stochastic fractal search algorithm and modified cost function for automatic generation control of interconnected electric power systems,” Eng. Appl. Artif. Intell., vol. 88, no. 2, pp. 1–20, Feb. 2020.

[36] E. Çelik, “Design of new fractional order PI–fractional order PD cascade controller through dragonfly search algorithm for advanced load frequency control of power systems,” Soft Comput., vol. 25, no. 2, pp. 1193–1217, Jan. 2021.

[37] G. Q. Zeng, X.-Q. Xie, M.-R. Chen, and J. Weng, “Adaptive population extremal optimization-based PID neural network for multivariable nonlinear control systems,” Swarm Evol. Comput., vol. 44, pp. 320–334, Feb. 2019.

[38] T. H. Mohamed, H. Bevrani, A. A. Hassan, and T. Hiyama, “Decentralized model predictive based load frequency control in an interconnected power system,” Energy Convers. Manage., vol. 52, no. 2, pp. 1208–1214, 2011.

[39] G.-Q. Zeng, J. Chen, M.-R. Chen, Y.-X. Dai, L.-M. Li, K.-D. Lu, and C.-W. Zheng, “Design of multivariable PID controllers using real-coded population-based extremal optimization,” Neurocomputing, vol. 151, pp. 1343–1353, Mar. 2015.

YUEMIN ZHENG was born in 1996. She received the B.E. degree from Shijiazhuang Tiedao University, Shijiazhuang, China, in 2018. She is currently a Graduate Student with Nankai University, Tianjin, China. Her current research interests include active disturbance rejection control and reinforcement learning.


ZHAOYANG HUANG was born in 1995. He received the B.Sc. degree in automation and the M.E. degree in control science and engineering from Nankai University, Tianjin, China, in 2018 and 2021, respectively. His research interests include intelligent algorithms and intelligent control.

JIN TAO (Member, IEEE) received the B.Sc. degree in automation from Qingdao University of Science and Technology, Qingdao, China, in 2008, the M.Sc. degree in control theory and control engineering from Guangxi University of Science and Technology, Liuzhou, China, in 2011, and the Ph.D. degree in control science and engineering from Nankai University, Tianjin, China, in 2017. He is currently an Associate Professor with the College of Artificial Intelligence, Nankai University. He is also with the Department of Electrical Engineering and Automation, Aalto University. He has published more than 50 peer-reviewed articles in international journals and conferences. His research interests include intelligent control, evolutionary optimization, and multi-agent systems.

HAO SUN received the B.Sc. degree in information security from Yunnan University, Yunnan, China, in 2013, the M.Sc. degree in mechatronic engineering from Tianjin University of Technology, Tianjin, China, in 2016, and the Ph.D. degree in control science and engineering from the College of Artificial Intelligence, Nankai University, Tianjin, in 2019. He is currently a Research Assistant with Nankai University. His current research interests include intelligent control, evolutionary optimization, dynamic modeling, and their application in parafoil systems.

QINGLIN SUN received the B.E. and M.E. degrees in control theory and control engineering from Tianjin University, Tianjin, China, in 1985 and 1990, respectively, and the Ph.D. degree in control science and engineering from Nankai University, Tianjin, in 2003. He is currently a Professor with the Intelligence Predictive Adaptive Control Laboratory, Nankai University, and the Associate Dean of the College of Artificial Intelligence. His research interests include self-adaptive control, the modeling and control of flexible spacecraft, and embedded control systems.

MATTHIAS DEHMER received the B.Sc. degree in mathematics from the University of Siegen, Germany, in 1997, and the Ph.D. degree in computer science from Darmstadt University of Technology, Germany, in 2005. He is currently a Professor with the Department of Biomedical Computer Science and Mechatronics, UMIT-The Health and Life Sciences University, Tirol, Austria. He has authored and coauthored more than 200 journal articles. His H-index is 28 and his i10-index is 78. His research interests include data science, cybersecurity, disaster management, complex networks, risk analysis, information systems, machine learning, information theory, bioinformatics, visual analytics, and computational statistics.

MINGWEI SUN received the Ph.D. degree in control theory and control engineering from the Department of Computer and Systems Science, Nankai University, Tianjin, China, in 2000. From 2000 to 2008, he was a Flight Control Engineer with Beijing Electro-Mechanical Engineering Research Institute, Beijing, China. Since 2009, he has been with Nankai University, where he is currently a Professor. His research interests include flight control, guidance, model predictive control, active disturbance rejection control, and nonlinear optimization.

ZENGQIANG CHEN (Member, IEEE) was born in 1964. He received the B.S., M.E., and Ph.D. degrees from Nankai University, in 1987, 1990, and 1997, respectively. He is currently a Professor of control theory and engineering with Nankai University and a Deputy Director of the Institute of Robotics and Information Automation. His current research interests include intelligent predictive control, chaotic systems and complex dynamic networks, and multi-agent system control.
