# Genetic least square estimation approach to wind power curve modelling and wind power prediction

Scientific Reports volume 13, Article number: 9188 (2023)

The wind power curve (WPC) is an important index of wind turbine performance and plays an important role in wind power prediction and turbine condition monitoring. Parameter estimation for logistic-function WPC models is sensitive to the choice of initial values and can converge to a local optimum. To address this, a genetic least square estimation (GLSE) method is proposed that combines a genetic algorithm with least square estimation so that the global optimum estimate can be obtained. Six evaluation indices, namely the root mean square error, the coefficient of determination R2, the mean absolute error, the mean absolute percentage error, the improved Akaike information criterion and the Bayesian information criterion, are used to select the optimal power curve model among the candidate models and to avoid over-fitting. Finally, to predict the annual energy production and output power of wind turbines, a two-component Weibull mixture distribution wind speed model and a five-parameter logistic function power curve model are applied to a wind farm in Jiangsu Province, China. The results show that the proposed GLSE approach is feasible and effective in WPC modelling and wind power prediction and improves the accuracy of model parameter estimation, and that the five-parameter logistic function is preferable to high-order polynomials and the four-parameter logistic function when the fitting accuracy is close.

With social and economic development, urbanization has become inseparable from the massive use of energy. The dramatic increase in energy demand has led to substantial consumption of non-renewable resources such as coal and oil. However, traditional energy reserves are limited and cannot be exploited without restraint. At the same time, burning coal, oil and other fossil fuels seriously harms the atmosphere, for example by causing haze in cities, and high emissions of carbon dioxide and other greenhouse gases have caused global environmental problems. Therefore, the demand for green, clean and renewable energy such as wind and solar energy is increasing.

In the past decades, wind energy has developed rapidly, but it carries many more uncertainties than traditional fossil energy. Therefore, an accurate and effective method for assessing wind energy is of great importance for studying large-scale wind power grid connection and wind farm site selection1,2,3,4. To estimate wind turbine power, the volatility and intermittency of the wind power system are generally investigated by establishing a statistical model. Nevertheless, the modelling process is complicated by the stochastic nature and the bimodal or multimodal distribution of wind speed5. The wind power curve (WPC), which expresses the nonlinear relationship between the hub-height wind speed and the actual power output of a wind turbine, is commonly used to estimate the wind resource of a wind farm6,7,8,9,10,11,12,13. Besides wind resource assessment and prediction, the WPC also plays an important role in the status monitoring of wind turbines. The WPC is therefore an important performance metric of wind turbines, and it is crucial to establish an accurate and reliable WPC14,15,16,17. Raw wind data contain outliers caused by faults of wind turbines and measurement equipment, extreme weather, and other factors. To improve the prediction accuracy of wind power, these abnormal data must be cleaned before WPC modelling18,19. Two kinds of methods are widely used in the literature for cleaning abnormal data: (1) clustering or image recognition methods applied to wind speed-power data, and (2) mean value, variance, and probability distribution methods based on the distribution characteristics of the abnormal data. In this study, a combined Bayesian change point-quartile method is used to clean abnormal data20.

According to the modelling theory of WPC, the WPC modelling methods in the literature are divided into two categories: parametric and nonparametric methods10,21. Parametric models, such as the piecewise cubic polynomial model, are the most widely applied. The advantage of the cubic polynomial is that its S shape conforms to the theoretical power curve of wind turbines. Before modelling, the WPC is often divided into three segments by the cut-in, rated, and cut-out wind speeds, and a segmented WPC is then obtained by cubic polynomial fitting2,3. Figure 1 shows the power curve of a specific wind turbine fitted by a high-order polynomial after cleaning abnormal data.

Wind turbine power curve. Figure created using Matlab R2014a (8.3.0.532). (https://www.mathworks.com).

Wang et al.4 compared the performance of various types of power curves at different wind farms and in different seasons. They pointed out that no universal WPC model always outperforms the other models under all environmental conditions, that each model has its own advantages, and that the three main factors affecting the final WPC are the abnormal data cleaning method, the WPC model and the choice of optimization strategy. Carrillo et al.1 compared the high-order polynomial with exponential and cubic polynomial power curves, and found that the high-order polynomial brings little improvement in modelling accuracy because of its sensitivity to the values of the model parameters, especially the rated wind speed. The logistic function (LF), which includes the three-parameter (3PLF), four-parameter (4PLF) and five-parameter (5PLF) logistic functions, is widely applied in WPC modelling because of its S shape, continuity and low errors2,22,23. The LF, also called the S-shaped or sigmoid function, was originally applied to fit S-shaped curves in models of population growth and the spread of epidemic diseases35. In these cases, growth is initially exponential in time; competition then appears among the members of the population, so growth slows and the population size finally reaches its limit. The shape of the WPC follows the same pattern, and there is an analogy between population growth and output power: as wind speed, which plays the role of time, increases gradually, the output power rises and finally levels off at the rated power. For this reason, the LF is now widely used in WPC modelling. Villanueva and Feijoo studied LFs from the 3PLF to the 6PLF and used the mean MAPE as the indicator to compare their performance. They concluded that the errors made by the 3PLF are approximately the same as those made by the 4PLF and that the 6PLF is the best option to model a WPC.
However, dealing with six parameters is cumbersome23. Like Villanueva and Feijoo, Zou et al. also studied the LF in WPC modelling and found that the 3PLF and 6PLF models perform slightly worse than the other models, regardless of the loss function used29. Therefore, in this paper, we study the 4PLF and 5PLF models of the WPC. However, it is not easy to estimate the model parameters of the LF, especially the 5PLF, which has more parameters. Because the LF is nonlinear, parametric estimation of its parameters generally needs an effective initial value. With traditional optimization methods such as the steepest descent, Levenberg–Marquardt, Newton, and quasi-Newton methods, the model parameters are estimated iteratively. However, these methods are very sensitive to the initial values of the unknown parameters and often fail to converge to the global optimum: the quality of the final solution depends on the position of the initial value in the search space, and there is no guarantee that the procedure fits the model successfully. The initial value is therefore an important factor affecting the convergence of nonlinear model fitting, and a poorly chosen initial value can trap the final result in a local optimum. The least square estimation (LSE) method is also used in the literature to estimate the model parameters of the WPC8. With this method, the optimal power curve is obtained by minimizing the sum of squared differences between the predicted and observed power values. However, for a complex nonlinear model, the partial derivatives of the function with respect to the model parameters are difficult to calculate and estimate1,24,25,26.
To solve this problem, LSE is often combined with an optimization algorithm, such as the whale optimization algorithm (WOA), particle swarm optimization algorithm (PSOA), genetic algorithm (GA), differential evolution algorithm (DEA), or evolutionary algorithm (EA)2,27,28,29. GA is a robust probabilistic search algorithm based on the mechanics of genetics; it searches for the optimal solution with a population rather than a single point. Therefore, when GA is used to estimate model parameters, it can escape from a local optimum and find the global optimum with a certain probability. Instead of an initial value, GA requires only an estimate of the parameter range in which the solution lies, because it maintains many candidate solutions and searches multiple points simultaneously. To improve the predictive accuracy of wind power, other factors including air density, wind shear, turbine age and wind curtailment are also considered in WPC modelling30,31,32.

Compared with the parametric method, the non-parametric method needs no assumption about the distribution of the data, so it is more flexible, but it requires much more data and training time and, because of its "black box" nature, cannot give an explicit expression for the relationship between wind speed and power5,29. The artificial neural network (ANN) method has an extremely wide range of applications and has been used in WPC modelling, where its advantages include small errors and simple parameter estimation25,33. However, the ANN method depends on training with massive data to obtain reliable results, and its slow training speed and high data volume requirements are significant disadvantages. The fuzzy clustering (FC) method can be used in WPC modelling by finding cluster centers, and its modelling accuracy can be further improved by increasing the number of clusters to reduce the root mean squared error (RMSE) between the observed and predicted values. However, the FC method converges slowly, and its efficiency and effectiveness depend strongly on the choice of model parameters, so it is often applied in combination with other methods24.

Consequently, in this paper, to model the WPC using the LF, the global optimization algorithm GA is combined with the LSE method, giving the GLSE method, to estimate the model parameters. This method obtains an effective initial value and ensures that the estimates of the logistic model parameters are effective and reliable. The overall flow chart of this study is given in Fig. 2.

The overall flow chart of the study.

The main contributions of this paper are listed below:

(i) To address the problem of selecting the initial value of model parameter estimation for the LF in WPC modelling with the LSE method, a GLSE method is proposed, which can obtain a globally optimal estimate.

(ii) In addition to RMSE, the coefficient of determination R2, mean absolute error (MAE) and mean absolute percentage error (MAPE), which are used as criteria for model selection, the improved Akaike information criterion (AIC) and Bayesian information criterion (BIC) are also used to select the best wind power model and avoid over-fitting.

The rest of this paper is organized as follows. In "Methodology", WPC model parameter estimation, selection and validation are given. Wind power estimation is described in "Wind speed modelling and wind power prediction". Information about the observed wind farms and wind data is provided in "Case study". Results and a comparison of the different models are presented in "Results and discussion". Conclusions are drawn in "Conclusion".

To establish a WPC, it is necessary to select fitting points from a large amount of normal wind speed and power data. These wind data may come from experimental wind farms or from the Supervisory Control and Data Acquisition (SCADA) system5,34. At present, the main methods for choosing the fitting points of a WPC include the bins method, the maximum value method, and the maximum likelihood method5. Among them, the bins method is the most widely used. According to the IEC-61400-12-2 standard, after the outliers of the wind power data are eliminated, the average wind speed and power are computed in each wind speed interval (the interval size is 0.5 m/s), and these averages are used as the sample points for fitting the power curve. They are given by5

where vi and Pi are the average wind speed and power in the ith interval, ni is the number of wind data in the ith interval, vi, j and Pi, j are the jth wind speed and power in the ith interval.
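The bin averaging described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the function name `bin_average`, the placement of the bin edges at multiples of the bin width, and the toy data are all assumptions.

```python
import numpy as np

def bin_average(speed, power, width=0.5):
    """Average wind speed and power inside each wind-speed bin of the given width."""
    speed = np.asarray(speed, dtype=float)
    power = np.asarray(power, dtype=float)
    # Assumption: bin i covers [i*width, (i+1)*width)
    idx = np.floor(speed / width).astype(int)
    v_bar, p_bar, counts = [], [], []
    for i in np.unique(idx):                 # visit non-empty bins only
        mask = idx == i
        v_bar.append(speed[mask].mean())     # v_i = (1/n_i) * sum_j v_ij
        p_bar.append(power[mask].mean())     # P_i = (1/n_i) * sum_j P_ij
        counts.append(int(mask.sum()))       # n_i
    return np.array(v_bar), np.array(p_bar), np.array(counts)

# Toy data (hypothetical values, not taken from the case study)
v, p, n = bin_average([3.1, 3.3, 3.6, 3.9, 4.2], [50, 60, 90, 110, 150])
```

Each returned pair (v[i], p[i]) is one fitting point; bins without data are simply skipped.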

As mentioned above, the parametric model of WPC is most widely utilized10,21. At present, the commonly used parametric models are high-order polynomial and LF, which describe the relationship between wind speed and power of a specific wind turbine with a mathematical formula. The power expression with an m-order polynomial is given by3,4:

where P(v) is the power value corresponding to wind speed v, m is the order of polynomial, and α = [a0, a1, …, am] is the coefficient.

The expressions of the 4PLF and 5PLF are given respectively by35

and

where θ4 = [a, b, c, d] are the model parameters that determine the shape of the 4PLF, of which a is the maximum value and the other three parameters have no specific meaning; θ5 = [u, l, x, y, z] are the model parameters of the 5PLF, where u and l are the maximum and minimum values, respectively, x is the inflection point, y is the hill slope, and z is the asymmetry factor, with x ≥ 0 and z ≥ 0.
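As a minimal illustration, the two logistic power curves can be written as short functions. The functional forms and parameter orderings below are assumptions: they are common four- and five-parameter logistic forms from the literature and may differ from the exact expressions of Eqs. (3) and (4). Under these assumptions, a is the upper asymptote of the 4PLF, and l and u are the values of the 5PLF at zero wind speed and at high wind speed. The parameter values in the example are the GLSE estimates reported for the 1# farm in the case study.

```python
import numpy as np

def plf4(v, theta):
    """Assumed 4PLF form: P(v) = a * (1 + b*exp(-v/d)) / (1 + c*exp(-v/d))."""
    a, b, c, d = theta
    e = np.exp(-np.asarray(v, dtype=float) / d)
    return a * (1.0 + b * e) / (1.0 + c * e)

def plf5(v, theta):
    """Assumed 5PLF form: P(v) = u + (l - u) / (1 + (v/x)**y)**z, for v >= 0."""
    u, l, x, y, z = theta
    v = np.asarray(v, dtype=float)
    return u + (l - u) / (1.0 + (v / x) ** y) ** z

# GLSE estimates reported for the 1# farm
theta4 = [1851.0, -3.887, 345.3, 1.092]
theta5 = [1832.0, -13.9, 34.55, 4.016, 608.5]

p4_rated = float(plf4(10.0, theta4))   # power at the rated speed of 10 m/s
p5_rated = float(plf5(10.0, theta5))
```

With these assumed forms, both curves evaluate close to the 1800 kW rated power at the rated speed, which is a plausibility check on the parameter ordering rather than a confirmation of it.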

A polynomial is a linear function of the unknown model parameters α = [a0, a1, …, am], so the LSE techniques for fitting a linear model can be used to fit a polynomial model. Suppose that n wind speed and power data pairs (vi, Pi) (i = 1, 2, …, n) are obtained using the bins method. According to Eq. (2), the system of equations of the m-th order polynomial is given by

Rewriting Eq. (5) in matrix form gives

Thus

Therefore, the model parameters of m-th order polynomial can be obtained by4
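Because the polynomial is linear in its coefficients, the estimate is the ordinary least-squares solution, which in the usual notation reads \(\hat{\alpha} = (V^{T}V)^{-1}V^{T}P\), with V the matrix whose columns are the powers of the wind speed. The sketch below is an illustration under that assumption; the names `fit_polynomial` and `V` and the synthetic data are not from the paper, and `numpy.linalg.lstsq` is used instead of forming the inverse explicitly because it is numerically more stable for high orders.

```python
import numpy as np

def fit_polynomial(v, p, m):
    """Least-squares coefficients [a0, ..., am] of an m-th order polynomial P(v)."""
    # Design matrix with columns 1, v, v^2, ..., v^m
    V = np.vander(np.asarray(v, dtype=float), m + 1, increasing=True)
    alpha, *_ = np.linalg.lstsq(V, np.asarray(p, dtype=float), rcond=None)
    return alpha

# Synthetic data generated from a known quadratic (hypothetical values)
v = np.linspace(0.0, 4.0, 9)
p = 1.0 + 2.0 * v - 0.5 * v**2
alpha = fit_polynomial(v, p, 2)
```

No initial value is needed here, which is the contrast with the logistic functions treated next.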

Because the LF is highly nonlinear, its parameters are estimated iteratively when the non-linear least squares estimation (NLLSE) method is used to fit the WPC model. The aim of the iteration is to minimize the sum of squared residuals between the real and estimated values. This sum, also called the objective function, is defined as

where Pi is the real wind power, P(vi; θ) is the estimated power, θ is the vector of unknown model parameters.

The NLLSE method is a form of LSE used to fit a nonlinear model with np unknown model parameters to n observations (n > np). Computationally, NLLSE is solved through successive iterations of a two-step process. First, the selected nonlinear model is approximately linearized around an arbitrary value θ(k) of the model parameters using a first-order Taylor expansion as follows:

Secondly, after the estimators of the model parameters are solved using the LSE method, the error between the real and estimated values is calculated. The two steps are repeated until an allowable minimum error is obtained. Note that the first-order Taylor expansion of P(vi; θ) at a point θ(k), where it is differentiable, is a good approximation only when θ is close to θ(k). The details of the NLLSE method are given as follows:

Step 1. Approximate linearization of the model by Taylor expansion.

Replacing the left-hand side of Eq. (10) with Pi and rearranging gives Eq. (11):

Let \(\Delta P_{i}^{\left( k \right)} = P_{i} - P\left( {v_{i} ;{\varvec{\theta}}^{\left( k \right)} } \right), \, D_{i,j}^{\left( k \right)} = \left[ {\frac{{\partial P(v_{i} ;{\varvec{\theta}})}}{{\partial \theta_{j} }}} \right]_{{{\varvec{\theta}} = {\varvec{\theta}}^{\left( k \right)} }} , \, \Delta \theta_{j}^{\left( k \right)} = \left( {\theta_{j} - \theta_{j}^{\left( k \right)} } \right)\), then

Equation (12) is a linear combination, which is rewritten in matrix form as

Step 2. Parameter estimation and error calculation.

Using the LSE method, Eq. (14) can be obtained as

At last, the (k + 1)th approximate estimators of the model parameters are calculated by

The iterative process is stopped when an allowable minimum error is reached.
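The two-step iteration can be sketched as a plain Gauss-Newton loop: linearize, solve the linear least-squares problem for the increment, update, and repeat. The forward-difference Jacobian, the stopping rule and the three-parameter logistic test model `logistic3` are assumptions made for illustration. The start value is deliberately chosen close to the true parameters, since this loop has no safeguard against a poor initial value.

```python
import numpy as np

def gauss_newton(model, v, p, theta0, tol=1e-10, max_iter=100, h=1e-6):
    """Repeat theta <- theta + delta, where delta solves D * delta = P - P(v; theta) in the LSE sense."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        resid = p - model(v, theta)                 # Delta P^(k)
        D = np.empty((v.size, theta.size))          # D_ij = dP(v_i; theta)/d theta_j
        for j in range(theta.size):                 # forward-difference approximation
            step = np.zeros_like(theta)
            step[j] = h * max(1.0, abs(theta[j]))
            D[:, j] = (model(v, theta + step) - model(v, theta)) / step[j]
        delta, *_ = np.linalg.lstsq(D, resid, rcond=None)
        theta = theta + delta                       # theta^(k+1) = theta^(k) + Delta theta^(k)
        if np.linalg.norm(delta) < tol * (1.0 + np.linalg.norm(theta)):
            break
    return theta

def logistic3(v, theta):
    """Hypothetical three-parameter logistic curve used only as a test model."""
    a, b, c = theta
    return a / (1.0 + np.exp(-(v - b) / c))

v = np.linspace(0.0, 18.0, 37)
true_theta = np.array([1800.0, 7.0, 1.2])
p = logistic3(v, true_theta)                        # noise-free synthetic data
theta_hat = gauss_newton(logistic3, v, p, theta0=[1780.0, 6.9, 1.15])
```

Starting the same loop far from the true parameters may diverge or stall, which is the sensitivity to the initial value that motivates the use of GA.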

To reduce the computation time and ensure that the iteration converges, a good initial value θ(0) should be chosen before the iterative regression process. Unlike those of a high-order polynomial, the model parameters of the 4PLF and 5PLF cannot be estimated directly, because an initial value is needed and its choice has a great impact on the final fitting result. If the initial value is not chosen properly, a local optimum instead of the global optimum is obtained. Therefore, before an LF is used to fit the power curve, an appropriate initial value is found by GA. GA is a global search technique based on a combination of natural laws and genetics, including competition, variation and evolution. In contrast to most optimization methods, GAs do not require an initial guess, since they start the heuristic solution procedure from a randomly generated population within the solution space. GAs also do not require exact or approximate calculations of function derivatives. According to the rule of survival of the fittest, individuals with higher fitness are more likely to be passed on to the next population. The repeated iterations ultimately yield an optimal individual whose phenotype reaches or approaches the optimal solution. Therefore, GA is used to solve the initial value problem of parameter estimation for LF fitting. The flow chart of GA is shown in Fig. 3.

Flow chart of GA approach.

The parameter set of the LF can be regarded as an individual of the population. For an LF with np parameters, the individual is represented as a vector of length np. Suppose that there are M individuals in the population; the whole population is then given by a matrix as follows:

where \({\text{X}}_{j} = \left[ {{\text{X}}_{1j} ,{\text{X}}_{2j} , \cdots ,{\text{X}}_{{n_{p} j}} } \right]^{T}\) represents the jth individual, Xij is the jth estimation solution of the ith parameter.

The objective function is the same as Eq. (9). The main steps of GA are as follows:

Step 1 Initialization: The population is initialized randomly within the minimum and maximum limits of the parameters of LF. The constraints of LF are given as θl ≤ θ ≤ θu.

Step 2 Evaluation: Because GA handles maximization problems, the fitness value is taken as the inverse of the objective function. Therefore, Eq. (17) is selected as the fitness function to evaluate each individual in the population.

Step 3 Selection: M individuals are selected by stochastic uniform selection according to their fitness values. Individuals with better fitness values have a higher chance of being selected as parents, which form the new population through crossover and mutation.

Step 4 Crossover: The crossover operator combines two parents to produce children for the next generation. Let the parent chromosomes X1 and X2 be selected randomly for crossover, let r be a random number chosen from [0, 1], and let pc be the crossover probability, usually between 0.6 and 0.9. The arithmetic crossover operator is used here. If r ≤ pc, the offspring Y1 and Y2 are created as follows36:

Step 5 Mutation: The mutation operator introduces new genetic structures into the population by generating a few random changes in the individuals. It helps to avoid the trap of a local minimum and provides genetic diversity in the population. The mutation probability pm is generally between 0.01 and 0.1. The non-uniform mutation operator is applied, and a new mutated offspring is generated as36

where \(f(g) = \left[ r_{2} \left( 1 - g/G_{\max } \right) \right]^{h}\), g is the current generation, h is the shape parameter, and Gmax is the maximum number of generations.

If the maximum number of iterations is not reached, the above procedure is repeated from step 2. Otherwise, the best individual of the current population is the optimum parameter.
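The five steps can be sketched as a small real-coded GA. This is an illustration under stated assumptions, not the implementation used in the study: roulette-wheel selection stands in for stochastic uniform selection, the arithmetic crossover uses a separate random mixing weight `lam`, the mutation direction is chosen with probability 0.5, one elite individual is kept per generation, and the function name, default settings and sphere test objective are all hypothetical.

```python
import numpy as np

def genetic_search(objective, lower, upper, pop_size=60, generations=200,
                   pc=0.8, pm=0.03, h=2.0, seed=0):
    """Minimise `objective` inside the box [lower, upper] with a real-coded GA."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pop = rng.uniform(lower, upper, size=(pop_size, lower.size))    # Step 1: initialization
    best, best_cost = None, np.inf
    for g in range(generations):
        cost = np.array([objective(x) for x in pop])                # Step 2: evaluation
        if cost.min() < best_cost:
            best, best_cost = pop[cost.argmin()].copy(), float(cost.min())
        fitness = 1.0 / (cost + 1e-12)                              # inverse of the objective
        prob = fitness / fitness.sum()
        parents = pop[rng.choice(pop_size, size=pop_size, p=prob)]  # Step 3: selection
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):                         # Step 4: arithmetic crossover
            if rng.random() <= pc:
                lam = rng.random()
                children[i] = lam * parents[i] + (1 - lam) * parents[i + 1]
                children[i + 1] = (1 - lam) * parents[i] + lam * parents[i + 1]
        for x in children:                                          # Step 5: non-uniform mutation
            for j in range(x.size):
                if rng.random() < pm:
                    f = (rng.random() * (1 - g / generations)) ** h # shrinks as g grows
                    if rng.random() < 0.5:
                        x[j] += (upper[j] - x[j]) * f               # move towards the upper bound
                    else:
                        x[j] -= (x[j] - lower[j]) * f               # move towards the lower bound
        children[0] = best                                          # elitism (added assumption)
        pop = children
    return best, best_cost

# Hypothetical test objective with its minimum at (1, 2)
target = np.array([1.0, 2.0])
sphere = lambda x: float(np.sum((x - target) ** 2))
best, best_cost = genetic_search(sphere, lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

In the GLSE setting the objective would be the sum of squared residuals of Eq. (9), and the returned individual would serve as the initial value of the least-squares iteration.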

In this study, the GA parameters are selected as follows: maximum number of iterations = 5000; population size = 300; crossover rate = 0.80; mutation rate = 0.03.

When evaluating the accuracy of model fitting, statistical indices including RMSE, the coefficient of determination R2, the mean absolute error (MAE) and the mean absolute percentage error (MAPE) are used to judge the goodness-of-fit of different models to the wind speed and power data. RMSE is the root mean square error between the predicted and observed values, and the coefficient of determination R2 quantifies the correlation between the predicted and observed values. They are defined respectively by

and

where RSS is the residual sum of squares, yi is the ith observed value, ym is the mean value of all observations, and \(\hat{y}_{i}\) is the ith estimated value, respectively. The high values of RMSE, MAE and MAPE indicate a poor fit, and the smaller these values are, the higher the model fitting accuracy is. Unlike RMSE, MAE and MAPE, a larger R2 value indicates that the proposed model fits the wind data well in all candidate models and has a higher fitting accuracy.
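The four accuracy indices can be computed directly from the observed and estimated values. The sketch below assumes the standard definitions, with RMSE taken as the square root of RSS/n and MAPE reported as a fraction rather than a percentage; the function name `fit_metrics` and the sample values are hypothetical. MAPE requires all observed values to be non-zero.

```python
import numpy as np

def fit_metrics(y, y_hat):
    """RMSE, R2, MAE and MAPE of estimated values y_hat against observations y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    resid = y - y_hat
    rss = float(np.sum(resid**2))                       # residual sum of squares
    return {
        "RMSE": float(np.sqrt(rss / y.size)),
        "R2": 1.0 - rss / float(np.sum((y - y.mean())**2)),
        "MAE": float(np.mean(np.abs(resid))),
        "MAPE": float(np.mean(np.abs(resid / y))),      # fraction, not percent
    }

# Hypothetical observed and estimated power values in kW
metrics = fit_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```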

To avoid overfitting, the AIC and BIC are also used to select the best model among the candidate models. The AIC considers both model complexity and fitting accuracy, and a lower AIC value indicates a relatively better model. Compared with the AIC, the BIC introduces a penalty term for the extra number of parameters, which avoids the overfitting caused by increasing the number of model parameters. Similarly, the model with the lowest BIC value is preferred as the best model. The expressions of the AIC and BIC are given as2,3:

where q is the number of model parameters to be estimated, n is the number of all samples to be fitted, and maxlnL is the maximum log-likelihood of the model.

However, in nonlinear regression applications, instead of using the maximum log-likelihood, the RSS is used as a reference37,38. In this case, the improved AIC and BIC are rewritten as follows:

Thus, the optimal model can be selected by comparing the calculated results of the various types of power curves using the four evaluation indices RMSE, R2, and the improved AIC and BIC.
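A short numerical check illustrates how the RSS-based criteria behave. The sketch assumes the commonly used forms AIC = n ln(RSS/n) + 2q and BIC = n ln(RSS/n) + q ln n, together with RSS = n · RMSE²; these forms are an assumption here, and the function name is hypothetical. The inputs are the 36 fitting points and the RMSE of 12.1018 reported for the 5PLF of the 1# farm in the case study.

```python
import math

def aic_bic_from_rss(rss, n, q):
    """RSS-based AIC and BIC for a model with q parameters fitted to n points (assumed forms)."""
    base = n * math.log(rss / n)
    return base + 2 * q, base + q * math.log(n)

n = 36                      # number of fitting points for the 1# farm
rmse_5plf = 12.1018         # RMSE reported for the 5PLF fitted by GLSE
rss_5plf = n * rmse_5plf**2
aic, bic = aic_bic_from_rss(rss_5plf, n, q=5)
```

Under these assumptions the computed values agree closely with the AIC and BIC of 189.5216 and 197.4392 reported for the 5PLF, which suggests that the assumed forms match the ones used for the tables.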

Using the WPC model and the wind speed model, the annual energy production (AEP) of wind turbines can be calculated4, and the working status of wind turbines can also be monitored. The working status monitoring is mainly applied to real-time monitoring and fault determination of wind turbines. This section focuses on the application of the WPC to AEP and wind power prediction.

The two-parameter Weibull distribution is most commonly used in wind speed probability distribution modelling because of its simplicity and generality. The probability density function (pdf) of a two-parameter Weibull distribution is39:

where β is shape parameter, η is scale parameter, β and η > 0, and v is wind speed.

When the distribution of wind speed is bimodal or multimodal, a mixture distribution fits better than a single distribution. A mixture distribution is a linear combination of two or more single distributions. Based on Eq. (26), the pdf of an M-component Weibull mixture distribution is3,40:

where M denotes the number of Weibull components, wi is the weight of the ith component, and the following relationship needs to be satisfied:

Therefore, more parameters need to be estimated in a mixture distribution than in a single distribution. The maximum likelihood estimators of the model parameters can be obtained by an EM algorithm3.

AEP of a specific wind turbine is predicted based on the pdf of wind speed and WPC model, and can be given by3:

where Nh is the number of hours of power generation per year, taken as 8760 h.
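The AEP calculation amounts to integrating the product of the power curve and the wind speed pdf and multiplying by the number of hours. The sketch below assumes the form AEP = Nh ∫ P(v) f(v) dv with the standard two-parameter Weibull pdf, evaluated by the trapezoidal rule. The cubic-ramp power curve `toy_power_curve` is hypothetical and is not the fitted 5PLF; the mixture parameters are the values listed for the 1# farm in the case study, taken in the order given.

```python
import numpy as np

def weibull_pdf(v, beta, eta):
    """Two-parameter Weibull pdf with shape beta and scale eta."""
    v = np.asarray(v, dtype=float)
    return (beta / eta) * (v / eta) ** (beta - 1) * np.exp(-((v / eta) ** beta))

def mixture_pdf(v, weights, betas, etas):
    """Weighted sum of Weibull components; the weights are expected to sum to 1."""
    return sum(w * weibull_pdf(v, b, e) for w, b, e in zip(weights, betas, etas))

def annual_energy(power_curve, pdf, v_max=30.0, hours=8760.0, n=6001):
    """AEP = hours * integral of P(v) * f(v) dv on [0, v_max], trapezoidal rule."""
    v = np.linspace(0.0, v_max, n)
    y = power_curve(v) * pdf(v)
    return hours * float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(v)))

def toy_power_curve(v, cut_in=2.0, rated_v=10.0, cut_out=18.0, rated_p=1800.0):
    """Hypothetical cubic ramp from cut-in to rated speed, in kW."""
    v = np.asarray(v, dtype=float)
    ramp = rated_p * (v**3 - cut_in**3) / (rated_v**3 - cut_in**3)
    p = np.where(v < rated_v, ramp, rated_p)
    return np.where((v < cut_in) | (v > cut_out), 0.0, p)

pdf = lambda v: mixture_pdf(v, (0.8726, 0.1274), (2.5368, 6.1139), (4.8927, 4.5783))
aep_kwh = annual_energy(toy_power_curve, pdf)                 # energy in kWh per year
hours_check = annual_energy(lambda v: np.ones_like(v), pdf)   # should be close to 8760
```

Replacing `toy_power_curve` with a fitted WPC model gives the prediction for a specific turbine; the constant-power check only confirms that the mixture pdf integrates to one.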

SCADA data were collected over a period of one year (January 1, 2018 to December 31, 2018) from the 1# wind farm at Maling Mountain (34°31′ N and 118°44′ E), located in Jiangsu Province, China. The wind data used in this study consist of the daily averaged 10-min wind speed and wind power output of 28 wind turbines of the same type. The hub height is 85 m, the cut-in speed is 2 m/s, the cut-out speed is 18 m/s, the rated speed is 10 m/s, and the rated power is 1800 kW. In this case study, the wind speed range is 0 ~ 18 m/s, and the average wind speed is 5.62 m/s. The optimal wind speed model is a two-component Weibull mixture distribution; for more details, see our previous work in Reference 3. The estimated parameters are as follows: w1 = 0.8726, β1 = 2.5368, η1 = 4.8927; w2 = 0.1274, β2 = 6.1139, η2 = 4.5783. Therefore, the pdf of wind speed is

Based on the analysis results of the bins method, a total of 36 fitting points of wind data including wind speed and wind power are taken. The wind data are shown in Table 1.

For comparative analysis and to verify the scope of application of the proposed method, another group of wind data was collected from the 2# wind farm at Mishan Mountain (45°42′ N and 132°16′ E), located in Heilongjiang Province, China. The hub height is 70 m, the cut-in speed is 3 m/s, the cut-out speed is 18 m/s, the rated speed is 10.5 m/s, and the rated power is 1500 kW. In this case study, the wind speed range is 0 ~ 18 m/s, and the average wind speed is 7.48 m/s. The wind speed-power data are shown in Table 2.

First, the 5th to 9th order polynomial models are all used to fit the WPC for wind farms 1# and 2#. The model parameters are estimated directly by the LSE method, and the six evaluation metrics of the different high-order polynomial models are calculated; the results are shown in Tables 3 and 4. Table 3 shows that as the order increases, the value of R2 increases gradually, while the values of RMSE, AIC and BIC decrease monotonically, which indicates that the fitting accuracy of the WPC model improves steadily. It is also found that the 8th order polynomial has the lowest values of MAE and MAPE. Therefore, the 8th and 9th order polynomials are often used to model WPC5,10,41.

As for the 1# farm, Table 4 shows that for the 2# farm the 9th order polynomial still gives the best wind speed-power model: except for MAPE, it has the highest value of R2 and the lowest values of RMSE, MAE, AIC and BIC.

For the 1# wind farm, the initial values of the 4PLF and 5PLF models obtained using GA are θ4 = [1839, −5, 589, 1] and θ5 = [1820, −17, 10, 4, 5]. To analyze the effect of the number of iterations on the estimation results for the same model, the 5PLF is compared for 5000 and 10,000 iterations; the results are shown in Figs. 4 and 5. Increasing the number of iterations decreases the final value of the objective function from \(2.965 \times 10^{5}\) to \(4.605 \times 10^{4}\). The estimation accuracy is slightly improved, but this improvement is limited and extremely time-consuming.

GA iteration results for five-parameter logistic function with 5000 generations for 1# wind farm. The partial enlarged detail is also given in the top left corner. Figure created using Matlab R2014a (8.3.0.532). (https://www.mathworks.com).

GA iteration results for five-parameter logistic function with 10,000 generations for 1# wind farm. The partial enlarged detail is also given in the top left corner. Figure created using Matlab R2014a (8.3.0.532). (https://www.mathworks.com).

For the 1# farm, based on the initial values obtained by GA, the parameter estimates of the 4PLF and 5PLF using the GLSE method are θ4 = [1851, −3.887, 345.3, 1.092] and θ5 = [1832, −13.9, 34.55, 4.016, 608.5]; the evaluation indices are shown in Table 5. Table 5 shows that the parameter estimates given by the GLSE method are more accurate. The RMSE of the 4PLF is 26.4906 when the model parameters are estimated by GA only, and it becomes 20.5201 with the GLSE method, a reduction of 5.9705. The RMSE of the 5PLF is likewise reduced from 35.7661 to 12.1018 after using GLSE, a reduction of 23.6643. The values of R2 all increase after using the GLSE method. It is also found that the 9th order polynomial has the lowest values of MAE and MAPE, which are 9.3156 and 0.3374, respectively. It is worth noting that if only GA is used to estimate the model parameters, accurate results are still not achieved even after 10,000 generations, because of premature convergence in the fitness-based selection procedure. Comparing the values of AIC and BIC for the different WPC models in Tables 3 and 5, the optimal model is the 5PLF (highlighted in bold in Table 5), with the lowest AIC and BIC values of 189.5216 and 197.4392. It is followed by the ninth-order polynomial, whose AIC and BIC values are 201.5374 and 217.3726. The AIC and BIC values are 212.0065 and 226.2581 for the eighth-order polynomial, and 225.5411 and 231.8751 for the 4PLF. Therefore, the eighth-order polynomial and the 4PLF are ranked third and fourth, respectively.

The fitting results of the four models, namely the eighth-order and ninth-order polynomials, the 4PLF and the 5PLF, for the 1# farm are shown in Fig. 6. A partial enlarged detail is also given in the bottom right corner. The estimates given by the high-order polynomial models fluctuate within a range of 40 kW and show overfitting. Even when the wind speed exceeds the rated wind speed, the estimates of the high-order polynomial models still fluctuate with the wind data points, and this phenomenon is more obvious when there are fewer wind data points. Compared with the high-order polynomial models, the estimates of the LF models obtained by the GLSE method are smoother and more stable, and avoid this overfitting. This is because the former have more model parameters than the latter. Hence, the LF is preferable to the high-order polynomial when the fitting accuracy is close. On the other hand, the 5PLF is better than the 4PLF, which agrees with the conclusion of Villanueva and Feijoo23. In this study, the 4PLF model overestimates wind power by about 10 kW at each data point above the rated speed. A possible explanation is that the 4PLF model assumes a curve that is symmetrical around the inflexion point and is not sufficient when the sigmoidal curve is asymmetrical, which is very likely for the variation of wind power around the inflexion point. Fitting curves obtained by the 4PLF are point symmetric on the semi-log axis about their midpoint and therefore cannot accurately fit power curves with asymmetric features22. The 5PLF model, which allows an asymmetrical trend around the inflexion point, could be a better choice42.

Comparison of the fitting results for 1# wind farm. Figure created using Matlab R2014a (8.3.0.532). (https://www.mathworks.com).

For 2# wind farm, the initial values for the 4PLF and 5PLF models obtained using GA are θ4 = [1722, 7, 93, 2] and θ5 = [1530, 23, 15, 5, 23]. Based on these initial values, the parameter estimates for the 4PLF and 5PLF models using the GLSE method are θ4 = [1545, −0.1184, 585.8, 1.163] and θ5 = [1530, 20.03, 53.92, 4.621, 6420]; the evaluation indices are shown in Table 6. Except for MAPE, the 5PLF model has a higher value of R2 and lower values of RMSE, MAE, AIC and BIC than the 4PLF model, which shows that 5PLF is superior to 4PLF. Comparing the values of AIC and BIC for the different WPC models in Tables 4 and 6, it is found that, unlike 1# farm, the optimal model for 2# farm is the ninth-order polynomial, with the lowest AIC and BIC values of 200.5966 and 216.4318, followed by 5PLF, with AIC and BIC values of 217.6608 and 225.5783. The AIC and BIC values of the eighth-order polynomial are 230.2387 and 244.4904, and those of 4PLF are 240.4844 and 246.8185. Therefore, the eighth-order polynomial and 4PLF are ranked third and fourth, respectively.

The fitting results of the four models, namely the eighth-order and ninth-order polynomials, 4PLF and 5PLF, for 2# farm are shown in Fig. 7, with a partial enlarged detail in the bottom right corner. Fig. 7 leads to the same conclusion as for 1# farm: the estimates given by the high-order polynomial models fluctuate within a range, which indicates overfitting, whereas the estimates of the LF models obtained by the GLSE method are smoother and more stable, and avoid this overfitting. In this case, the LF model is recommended when the modelling accuracy is close.

Comparison of the fitting results for 2# wind farm. Figure created using Matlab R2014a (8.3.0.532). (https://www.mathworks.com).

Consequently, the 5PLF model gives better fitting results than the 4PLF model for 1# farm, and the wind power model is given by

Using Eq. (31), the estimated cut-in speed and rated speed are 2.07 m/s and 9.93 m/s, respectively. Compared with the actual cut-in speed of 2 m/s and rated speed of 10 m/s, the maximum relative error is 0.035. Finally, using Eqs. (29), (30) and (31), the predicted AEP results for 1# farm are shown in Table 7.

To verify the calculated results, the actual total energy production of 1# farm, obtained from one year of wind power data, is used as a reference; it is 3.1725 GWh. The estimated AEP of 3.2360 GWh is close to this value, with a relative error of only 2.00%, which validates the method proposed in this paper. On the other hand, the utilization rate of wind energy is 81.87%, which means that wind energy resources are not used efficiently; possible reasons include failures of wind turbines and measuring instruments, wind curtailment or power rationing, environmental conditions, and maintenance5,21.
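The AEP calculation discussed above combines a wind speed distribution with a power curve. A minimal sketch of that idea follows, assuming the standard expected-power integral AEP = 8760 · ∫ P(v) f(v) dv with a two-component Weibull mixture for f(v) and a five-parameter logistic for P(v). Eqs. (29) to (31) are not reproduced in this section, so the exact form used in the paper may differ; every numerical value below, including the mixture weight, the Weibull parameters, the curve parameters and the 25 m/s upper limit, is a hypothetical placeholder.

```python
import math


def weibull_pdf(v, shape, scale):
    """Two-parameter Weibull density; zero for non-positive wind speed."""
    if v <= 0.0:
        return 0.0
    z = v / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


def mixture_pdf(v, weight, comp1, comp2):
    """Two-component Weibull mixture; weight is the share of the first component."""
    return weight * weibull_pdf(v, *comp1) + (1.0 - weight) * weibull_pdf(v, *comp2)


def pl5(v, a, b, c, d, g):
    """Five-parameter logistic d + (a - d) / (1 + (v / c)**b)**g."""
    return d + (a - d) / (1.0 + (v / c) ** b) ** g


def integrate(f, lo, hi, steps=5000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# All numbers below are hypothetical placeholders, not the paper's estimates.
def wind_pdf(v):
    return mixture_pdf(v, 0.6, (2.0, 5.0), (3.0, 9.0))


def power_kw(v):
    return pl5(v, 0.0, 6.0, 7.5, 1500.0, 1.0)


CUT_OUT = 25.0            # m/s, assumed upper limit of operation
HOURS_PER_YEAR = 8760.0

mean_power_kw = integrate(lambda v: power_kw(v) * wind_pdf(v), 0.0, CUT_OUT)
aep_gwh = HOURS_PER_YEAR * mean_power_kw / 1.0e6   # kWh -> GWh
print(round(aep_gwh, 3), "GWh for one turbine under the assumed parameters")
```

The integral is the mean power over the wind speed distribution, so the result is bounded above by rated power times 8760 h; comparing an estimate of this kind with metered energy is what the relative error quoted above measures.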

To predict wind power, a wind speed prediction is required once the power curve for a specific turbine has been obtained. In this paper, a data set of real wind speeds from 1# farm is used directly to verify the accuracy of the power curve. To this end, 100 wind speed data points are randomly selected from the real wind data, and substituting each wind speed into the 5PLF WPC model gives the predicted output power at that wind speed. The actual and predicted power values are shown in Fig. 8. The actual power values at the different wind speed data points are very close to the predicted values: the total actual output power of these 100 data points is 40,042 kW and the total predicted output power is 38,403 kW, a low relative error of 4.27%, indicating that the WPC model has high prediction accuracy. This also shows that the proposed method of power prediction based on the wind turbine power curve is feasible.
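The relative errors quoted in this part of the results can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the text; the helper name `relative_error` is introduced here for illustration. The text does not state which total serves as the denominator for the 4.27% figure: dividing the difference by the predicted total reproduces 4.27%, while dividing by the actual total gives about 4.09%.

```python
def relative_error(reference, estimate):
    """Absolute difference divided by the magnitude of the reference value."""
    return abs(estimate - reference) / abs(reference)


# Cut-in and rated speed (m/s): actual value vs estimate from the fitted curve.
speed_errors = [relative_error(2.0, 2.07), relative_error(10.0, 9.93)]

# Annual energy (GWh): measured total vs estimated AEP.
aep_error = relative_error(3.1725, 3.2360)

# Summed power over the 100 sampled points (kW): actual vs predicted.
err_vs_actual = relative_error(40042.0, 38403.0)
err_vs_predicted = relative_error(38403.0, 40042.0)

print(round(max(speed_errors), 3))        # 0.035
print(round(100 * aep_error, 2))          # 2.0
print(round(100 * err_vs_actual, 2))      # 4.09
print(round(100 * err_vs_predicted, 2))   # 4.27
```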

Wind power prediction with five-parameter logistic model for 1# farm. Figure created using Matlab R2014a (8.3.0.532). (https://www.mathworks.com).

A GLSE approach, which combines GA with LSE to model the wind turbine power curve and predict wind turbine AEP, is proposed in this study. It solves the problem of selecting the initial values for LF model parameter estimation, and its effectiveness and correctness are validated using two different sets of field wind data. The main conclusions are as follows:

(i) The polynomial and LF models for WPC modelling were compared using six evaluation indices, namely RMSE, R2, MAE, MAPE, improved AIC and BIC. The 5PLF model outperforms the 4PLF model, and both the ninth-order polynomial and 5PLF have a higher fitting accuracy. However, the power values estimated by the high-order polynomial still fluctuate even when the wind speed far exceeds the rated wind speed. The LF model best describes the trend of wind power with wind speed and can be adopted to fit the relationship between wind speed and wind power. Therefore, the LF model is recommended when the modelling accuracy is close.

(ii) Although the LF is more suitable than the high-order polynomial for WPC modelling, it requires initial values when estimating the model parameters, and if the initial values are not selected appropriately, the estimation will fall into a local optimum. Therefore, another algorithm needs to be combined with it to search for reasonable initial values. If an optimization algorithm alone is used to estimate the model parameters, convergence is time-consuming; combining GA with LSE not only estimates the model parameters effectively but also significantly improves the estimation accuracy.

(iii) Based on the models of wind speed and power curve, AEP can be obtained. The results also show that, combined with wind speed estimation, accurate wind power prediction using the WPC is possible, providing reliable support for wind power grid connection and dispatching.
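The two-stage idea in conclusion (ii) can be sketched in a few dozen lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the GA operators (truncation selection, arithmetic crossover, Gaussian mutation), the damped Gauss-Newton refinement (a Levenberg-Marquardt-style stand-in for the least square estimation step), the four-parameter logistic form, the search bounds and the synthetic data are all choices made here for illustration.

```python
import math
import random

def pl4(v, theta):
    """Four-parameter logistic d + (a - d) / (1 + (v / c)**b)."""
    a, b, c, d = theta
    return d + (a - d) / (1.0 + (v / c) ** b)

def sse(theta, data):
    """Sum of squared residuals; inf when theta is outside the usable domain."""
    if not theta[2] > 0.0:
        return math.inf
    try:
        return sum((p - pl4(v, theta)) ** 2 for v, p in data)
    except (OverflowError, ZeroDivisionError):
        return math.inf

def ga_search(data, bounds, pop_size=40, generations=60, seed=0):
    """Stage 1: a small real-coded GA that returns a coarse starting point."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: sse(t, data))
        parents = pop[: pop_size // 2]                 # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()
            child = [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]  # crossover
            i = rng.randrange(len(child))              # mutate one gene
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0.0, 0.1 * (hi - lo))))
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda t: sse(t, data))

def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x

def lse_refine(theta, data, iterations=200):
    """Stage 2: damped Gauss-Newton least squares started from the GA result."""
    theta, lam = list(theta), 1.0
    best, n = sse(theta, data), len(theta)
    for _ in range(iterations):
        try:
            resid = [p - pl4(v, theta) for v, p in data]
            J = []                                     # forward-difference Jacobian
            for v, _p in data:
                row = []
                for i in range(n):
                    h = 1e-6 * max(1.0, abs(theta[i]))
                    shifted = theta[:]
                    shifted[i] += h
                    row.append((pl4(v, shifted) - pl4(v, theta)) / h)
                J.append(row)
            A = [[sum(r[i] * r[j] for r in J) + (lam if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
            g = [sum(r[i] * e for r, e in zip(J, resid)) for i in range(n)]
            step = solve(A, g)
        except (OverflowError, ZeroDivisionError):
            break
        trial = [t + s for t, s in zip(theta, step)]
        trial_sse = sse(trial, data)
        if trial_sse < best:                           # accept only improvements
            theta, best, lam = trial, trial_sse, lam * 0.5
        else:
            lam *= 4.0
    return theta, best

# Synthetic, noise-perturbed data from a hypothetical curve (not the paper's data).
noise = random.Random(1)
true_theta = [0.0, 6.0, 7.5, 1500.0]
data = [(float(v), pl4(float(v), true_theta) + noise.gauss(0.0, 10.0)) for v in range(1, 21)]
bounds = [(-100.0, 100.0), (1.0, 15.0), (1.0, 20.0), (1000.0, 2000.0)]

theta_ga = ga_search(data, bounds)
sse_ga = sse(theta_ga, data)
theta_glse, sse_glse = lse_refine(theta_ga, data)
print("GA only SSE:", sse_ga, "GA + LSE SSE:", sse_glse)
```

Because the refinement accepts a step only when it lowers the sum of squared residuals, the second stage can never be worse than the GA starting point; how much it improves depends on how close the GA gets to the basin of the global optimum.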

All data generated or analysed during this study are included in this published article.

1. Carrillo, C., Montaño, A. F. O., Cidrás, J. & Dorado, E. D. Review of power curve modelling for wind turbines. Renew. Sustain. Energy Rev. 21, 572–581 (2013).

2. Lydia, M., Kumar, S. S., Selvakumar, A. I. & Kumar, G. E. P. Wind resource estimation using wind speed and power curve models. Renew. Energy 83, 425–434 (2015).

3. Wang, Z. & Liu, W. Wind energy potential assessment based on wind speed, its direction and power data. Sci. Rep. 11, 1–15 (2021).

4. Wang, Y., Hu, Q., Li, L., Foley, A. M. & Srinivasan, D. Approaches to wind power curve modeling: A review and discussion. Renew. Sustain. Energy Rev. 116, 109422 (2019).

5. Yan, J., Zhang, H., Liu, Y., Han, S. & Li, L. Uncertainty estimation for wind energy conversion by probabilistic wind turbine power curve modelling. Appl. Energy 239, 1356–1370 (2019).

6. Gill, S., Stephen, B. & Galloway, S. Wind turbine condition assessment through power curve copula modeling. IEEE Trans. Sustain. Energ. 3, 94–101 (2011).

7. Kusiak, A., Zheng, H. & Song, Z. Models for monitoring wind farm power. Renew. Energy 34, 583–590 (2009).

8. Kusiak, A., Zheng, H. & Song, Z. On-line monitoring of power curves. Renew. Energy 34, 1487–1493 (2009).

9. Liang, T., Meng, Z., Cui, J., Li, Z. & Shi, H. Health assessment of wind turbine based on laplacian eigenmaps. Energ. Source. Part A 11, 1–15 (2020).

10. Lydia, M., Kumar, S. S., Selvakumar, A. I. & Kumar, G. E. P. A comprehensive review on wind turbine power curve modeling techniques. Renew. Sustain. Energy Rev. 30, 452–460 (2014).

11. Park, J. Y., Lee, J. K., Oh, K. Y. & Lee, J. S. Development of a novel power curve monitoring method for wind turbines and its field tests. IEEE Trans. Energy Conver. 29, 119–128 (2014).

12. Sun, Q., Liu, C. & Zhen, C. Abnormal detection of wind turbine operating conditions based on state curves. J. Energ. Eng. 145, 06019001 (2019).

13. Ye, X. W., Ding, Y. & Wan, H. P. Statistical evaluation of wind properties based on long-term monitoring data. J. Civ. Struct. Health 10, 987–1000 (2020).

14. Hagspiel, S., Papaemannouil, A., Schmid, M. & Andersson, G. Copula-based modeling of stochastic wind power in Europe and implications for the Swiss power grid. Appl. Energy 96, 33–44 (2012).

15. Thapar, V., Agnihotri, G. & Sethi, V. K. Critical analysis of methods for mathematical modelling of wind turbines. Renew. Energy 36, 3166–3177 (2011).

16. Xu, K. et al. Quantile based probabilistic wind turbine power curve model. Appl. Energy 296, 116913 (2021).

17. Yun, E. & Hur, J. Probabilistic estimation model of power curve to enhance power output forecasting of wind generating resources. Energy 223, 120000 (2021).

18. Shen, X., Fu, X. & Zhou, C. A combined algorithm for cleaning abnormal data of wind turbine power curve based on change point grouping algorithm and quartile algorithm. IEEE Trans. Sustain. Energ. 10, 46–54 (2018).

19. Wang, Z., Wang, L. & Huang, C. A fast abnormal data cleaning algorithm for performance evaluation of wind turbine. IEEE Trans. Instrum. Meas. 70, 1–12 (2020).

20. Wang, Z., Liu, W. & Wang, X. Abnormal data cleaning of wind turbine power curve using Bayesian change point-quartile combined algorithm. Proc. Instit. Mech. Eng. A J. Power https://doi.org/10.1177/09576509221119563 (2022).

21. Taslimi-Renani, E., Modiri-Delshad, M., Elias, M. F. M. & Rahim, N. A. Development of an enhanced parametric model for wind turbine power curve. Appl. Energy 177, 544–552 (2016).

22. Jing, B., Qian, Z., Zareipour, H., Pei, Y. & Wang, A. Wind turbine power curve modelling with logistic functions based on quantile regression. Appl. Sci. 11, 3048 (2021).

23. Villanueva, D. & Feijoo, A. Comparison of logistic functions for modeling wind turbine power curves. Electr. Power Syst. Res. 155, 281–288 (2018).

24. Adedeji, P. A., Akinlabi, S., Madushele, N. & Olatunji, O. O. Wind turbine power output very short-term forecast: A comparative study of data clustering techniques in a PSO-ANFIS model. J. Clean. Prod. 254, 120135 (2020).

25. Manobel, B. et al. Wind turbine power curve modeling based on Gaussian processes and artificial neural networks. Renew. Energy 125, 1015–1020 (2018).

26. Wang, W. et al. A study of function-based wind profiles based on least squares method: A case in the suburbs of Hohhot. Energy Rep. 8, 4303–4318 (2022).

27. Liu, Y., Ćetenović, D., Li, H., Gryazina, E. & Terzija, V. An optimized multi-objective reactive power dispatch strategy based on improved genetic algorithm for wind power integrated systems. Int. J. Elec. Power 136, 107764 (2022).

28. Lydia, M., Selvakumar, A. I., Kumar, S. S. & Kumar, G. E. P. Advanced algorithms for wind turbine power curve modeling. IEEE Trans. Sustain. Energ. 4, 827–835 (2013).

29. Zou, R. et al. Wind turbine power curve modeling using an asymmetric error characteristic-based loss function and a hybrid intelligent optimizer. Appl. Energy 304, 117707 (2021).

30. Mehrjoo, M., Jozani, M. J. & Pawlak, M. Toward hybrid approaches for wind turbine power curve modeling with balanced loss functions and local weighting schemes. Energy 218, 119478 (2021).

31. Saint-Drenan, Y. M. et al. A parametric model for wind turbine power curves incorporating environmental conditions. Renew. Energy 157, 754–768 (2020).

32. Zhao, Y. et al. Data-driven correction approach to refine power curve of wind farm under wind curtailment. IEEE Trans. Sustain. Energ. 9, 95–105 (2017).

33. Pelletier, F., Masson, C. & Tahan, A. Wind turbine power curve modelling using artificial neural network. Renew. Energy 89, 207–214 (2016).

34. Morrison, R., Liu, X. & Lin, Z. Anomaly detection in wind turbine SCADA data for power curve cleaning. Renew. Energy 184, 473–486 (2022).

35. Gottschalk, P. G. & Dunn, J. R. The five-parameter logistic: A characterization and comparison with the four-parameter logistic. Anal. Biochem. 343, 54–65 (2005).

36. Erodotou, P., Voutsas, E. & Sarimveis, H. A genetic algorithm approach for parameter estimation in vapour-liquid thermodynamic modelling problems. Comput. Chem. Eng. 134, 106684 (2020).

37. Motulsky, H. & Christopoulos, A. Fitting Models to Biological Data Using Linear and Nonlinear Regression-A Practical Guide to Curve Fitting (GraphPad Software Inc., 2003).

38. Savegnago, R. P., Cruz, V. A. R., Ramos, S. B., Caetano, S. L. & Munari, D. P. Egg production curve fitting using nonlinear models for selected and nonselected lines of white leghorn hens. Poult. Sci. 91, 2977–2987 (2012).

39. Deep, S., Sarkar, A., Ghawat, M. & Rajak, M. K. Estimation of the wind energy potential for coastal locations in India using the Weibull model. Renew. Energy 161, 319–339 (2020).

40. Wang, Z., Yang, J., Wang, G. & Zhang, G. Application of three-parameter Weibull mixture model for reliability assessment of NC machine tools-A case study. Proc. Instit. Mech. Eng. C.-J. Mec. 225, 2718–2726 (2011).

41. Wang, Y., Hu, Q., Srinivasan, D. & Wang, Z. Wind power curve modeling and wind power forecasting with inconsistent data. IEEE Trans. Sustain. Energ. 10, 16–25 (2019).

42. Cai, J., Liu, R., Xiong, J. & Cui, Q. A new five-parameter logistic model for describing the evolution of energy consumption. Energy Sour. B Energy Econ. Plann. 11, 176–181 (2016).

We would like to thank the anonymous reviewers and the editor for their valuable comments and suggestions, which have greatly enhanced the clarity of the paper.

School of Mechanical and Electrical Engineering, Lanzhou University of Technology, Lanzhou, 730050, China

Zhiming Wang, Xuan Wang & Weimin Liu

W.Z. is responsible for developing methods, drawing charts, methodology improvements and writing the original draft. W.X. is responsible for data curation and analyzing results. L.W. is responsible for drawing charts, methodology improvements.

Correspondence to Zhiming Wang.

The authors declare no competing interests.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Wang, Z., Wang, X. & Liu, W. Genetic least square estimation approach to wind power curve modelling and wind power prediction. Sci Rep 13, 9188 (2023). https://doi.org/10.1038/s41598-023-36458-w

Received: 16 December 2022

Accepted: 04 June 2023

Published: 06 June 2023

DOI: https://doi.org/10.1038/s41598-023-36458-w
