M. Y. Mashor* and H. A. Hamid
Abstract
This paper describes the architecture of a modified radial basis function (RBF) network called the hybrid-RBF network. The orthogonal least squares (OLS) algorithm has been used to detect the structure of the modified RBF network. The algorithm automatically selects the significant RBF centres from the training data set and, at the same time, estimates the weights of the network. The capability of the architecture and the suitability of the centres selected by the OLS algorithm were demonstrated using a simulated data set and a real data set. The results showed that hybrid-RBF networks trained using the OLS algorithm were adequate to model both data sets.
1. Introduction
Neural networks have been shown to be capable of performing non-linear mapping (Cybenko, 1989; Funahashi, 1989; Hornik, 1991). A number of studies have modelled non-linear systems using radial basis function networks (e.g. Chen et al., 1990, 1992; Elanayar and Shin, 1994; Longinov, 1994; Mashor, 1995). Neural networks therefore offer an alternative approach to the identification of non-linear systems. However, this approach is normally associated with several problems, such as slow parameter convergence and heavy computation.
Several authors (Powell, 1987; Moody and Darken, 1989) have suggested RBF networks to overcome these problems. RBF is a traditional technique for strict interpolation in multi-dimensional space that is capable of representing almost any system to a certain accuracy (Powell, 1987). Mashor (1997) used an RBF network with linear input connections to further improve the performance of the standard RBF network and showed that this network is more efficient than the standard RBF network. In the current study, the orthogonal least squares algorithm proposed by Korenberg et al. (1988) is used to select the structure of the hybrid-RBF network and to estimate its weights.
2. Hybrid-RBF Networks
An RBF network with $m$ output nodes and $n_h$ hidden nodes can be expressed by the following equation:
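$$\hat{y}_i(t) = w_{i0} + \sum_{j=1}^{n_h} w_{ij}\, f\big(\lVert v(t) - c_j(t) \rVert\big), \quad i = 1, 2, \ldots, m \qquad (1)$$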
where $w_{ij}$, $w_{i0}$ and $c_j(t)$ are the weights, bias weights and RBF centres, respectively. $v(t)$ is the input vector, which consists of input, output and prediction error lags. $f(\cdot)$ is the activation function and $\lVert\cdot\rVert$ is a distance measure, normally taken to be the Euclidean norm.
The activation function $f(\cdot)$ can be selected from a set of basis functions, and every hidden node can have a different basis function. Many basis functions can be used, such as the linear, cubic, thin-plate spline, multiquadric, inverse multiquadric and Gaussian functions. In the present study, the thin-plate spline function has been used as the activation function because of its good modelling capability (Powell, 1987). The function can be expressed as:
$$f(a) = a^2 \log(a) \qquad (2)$$
where $a = \lVert v(t) - c_j(t) \rVert$.
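As an illustration, the following minimal Python/NumPy sketch evaluates equation (1) with the thin-plate spline of equation (2). It is not the authors' implementation; the function names `thin_plate_spline` and `rbf_output` and the array shapes are assumptions made for this example. Since $a^2 \log(a) \to 0$ as $a \to 0$, the sketch sets $f(0) = 0$ so that $\log(0)$ is never evaluated when an input coincides with a centre.

```python
import numpy as np

def thin_plate_spline(a):
    """Thin-plate spline f(a) = a^2 log(a) of equation (2), with f(0) = 0."""
    a = np.asarray(a, dtype=float)
    safe = np.where(a > 0, a, 1.0)  # log(1) = 0, so zero distances map to 0
    return np.where(a > 0, a ** 2 * np.log(safe), 0.0)

def rbf_output(v, centres, W, w0):
    """Outputs y_i(t), i = 1..m, of the RBF network in equation (1).

    v       : (n,) input vector v(t)
    centres : (nh, n) RBF centres c_j(t), one per row
    W       : (m, nh) weights w_ij
    w0      : (m,) bias weights w_i0
    """
    dist = np.linalg.norm(centres - v, axis=1)  # Euclidean ||v(t) - c_j(t)||
    return w0 + W @ thin_plate_spline(dist)

# One centre at the origin and an input at distance 5 from it
y = rbf_output(np.array([3.0, 4.0]), np.array([[0.0, 0.0]]),
               W=np.array([[2.0]]), w0=np.array([1.0]))
print(y)  # 1 + 2 * 25 * log(5), about 81.47
```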
Since neural networks are normally highly non-linear, even a linear system has to be approximated using a non-linear network model. However, better results can be achieved if the linear part is approximated using a linear model. Based on this argument, an RBF network with additional linear input connections has been used. This network allows direct linear connections from the input nodes to the output nodes, forming a linear model in parallel with the normal non-linear RBF network. The network is called the hybrid-RBF network, and the network with one hidden layer is shown in Figure 1.
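A hybrid-RBF network with $m$ outputs, $n$ inputs, $n_h$ hidden nodes and $n_l$ linear connections can be expressed by the following equation:
$$\hat{y}_i(t) = w_{i0} + \sum_{j=1}^{n_h} w_{ij}\, f\big(\lVert v(t) - c_j(t) \rVert\big) + \sum_{j=1}^{n_l} l_{ij}\, v_{l,j}(t), \quad i = 1, 2, \ldots, m \qquad (3)$$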
where $l_{ij}$ and $v_l(t)$ are the weights and the input vector for the linear connections. The values of the $l_{ij}$'s can be estimated using the same algorithm that is used to estimate the $w_{ij}$'s of the standard RBF network. In fact, both the $l_{ij}$'s and the $w_{ij}$'s can be estimated simultaneously.
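Because equation (3) is linear in $w_{i0}$, $l_{ij}$ and $w_{ij}$ once the centres are fixed, all three sets of weights can be stacked into one regressor matrix and estimated together. The sketch below assumes a single output ($m = 1$) and uses ordinary least squares via `numpy.linalg.lstsq` in place of the OLS algorithm of Section 4; the helper names are hypothetical.

```python
import numpy as np

def thin_plate_spline(a):
    # f(a) = a^2 log(a), with f(0) = 0 (its limit as a -> 0)
    safe = np.where(a > 0, a, 1.0)
    return np.where(a > 0, a ** 2 * np.log(safe), 0.0)

def hybrid_regressors(V, Vl, centres):
    """Rows [1, v_l(t), f(||v(t) - c_1||), ..., f(||v(t) - c_nh||)].

    V       : (N, n)  network inputs v(t), one row per time step
    Vl      : (N, nl) linear-connection inputs v_l(t)
    centres : (nh, n) fixed RBF centres
    """
    dist = np.linalg.norm(V[:, None, :] - centres[None, :, :], axis=2)  # (N, nh)
    return np.hstack([np.ones((len(V), 1)), Vl, thin_plate_spline(dist)])

def fit_hybrid_rbf(V, Vl, y, centres):
    """Estimate w_0, the l_j's and the w_j's simultaneously by least squares."""
    P = hybrid_regressors(V, Vl, centres)
    theta, *_ = np.linalg.lstsq(P, y, rcond=None)
    nl = Vl.shape[1]
    return theta[0], theta[1:1 + nl], theta[1 + nl:]  # w0, l, w

def predict_hybrid_rbf(V, Vl, centres, w0, l, w):
    """Single-output form of equation (3)."""
    return hybrid_regressors(V, Vl, centres) @ np.concatenate([[w0], l, w])
```

Estimating everything in one solve is what "simultaneously" means here; fixing the linear weights first and fitting the RBF part to the residual would generally give a different, suboptimal fit.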
3. Modelling Non-linear Systems using Hybrid-RBF Networks
A wide class of non-linear systems can be represented by the non-linear auto-regressive moving average model with exogenous inputs (NARMAX) (Leontaritis and Billings, 1985). The NARMAX model can be expressed as a non-linear function expansion of lagged input, output and noise terms as follows:
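$$y(t) = f_s\big(y(t-1), \ldots, y(t-n_y),\, u(t-1), \ldots, u(t-n_u),\, e(t-1), \ldots, e(t-n_e)\big) + e(t) \qquad (4)$$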
Figure 1: Hybrid-RBF network with one hidden layer
where $y(t)$, $u(t)$ and $e(t)$ are the system output, input and noise vectors, respectively, and $n_y$, $n_u$ and $n_e$ are the maximum lags in the output, input and noise vectors, respectively.
The non-linear function $f_s(\cdot)$ will be modelled using equation (3), where $f(\cdot)$ has been selected to be the thin-plate spline function. The input vector of the network, $v(t)$, consists of the input, output and noise lags $u(t-1), \ldots, u(t-n_u)$, $y(t-1), \ldots, y(t-n_y)$ and $e(t-1), \ldots, e(t-n_e)$ appearing in equation (4). Equation (4) can then be represented by the following equation:
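$$y(t) = \sum_{i=0}^{M} \theta_i\, p_i(t) + \varepsilon(t) \qquad (5)$$
where $p_0(t) = 1$, the regressors $p_i(t)$, $i = 1, \ldots, M$, are the linear input terms and the thin-plate spline outputs of the hybrid-RBF network, $\theta_i$ are the corresponding weights and $\varepsilon(t)$ is the prediction error.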
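Assembling $v(t)$ from lagged data is mechanical, and the sketch below stacks one row per time step. The ordering of the lags inside $v(t)$ (inputs, then outputs, then noise) and the name `lagged_inputs` are assumptions for illustration; with `nu=1, ny=2, ne=0` it reproduces the configuration $v(t) = [u(t-1)\ y(t-1)\ y(t-2)]$ used in Example 1.

```python
import numpy as np

def lagged_inputs(u, y, e, nu, ny, ne):
    """Stack v(t) = [u(t-1..t-nu), y(t-1..t-ny), e(t-1..t-ne)] row by row.

    Rows start at t = max(nu, ny, ne) so that every lag exists; the
    matching targets y(t) are returned alongside.
    """
    u, y, e = (np.asarray(x, dtype=float) for x in (u, y, e))
    start = max(nu, ny, ne)
    # x[t - k:t][::-1] gives x(t-1), x(t-2), ..., x(t-k)
    rows = [np.concatenate([u[t - nu:t][::-1], y[t - ny:t][::-1], e[t - ne:t][::-1]])
            for t in range(start, len(y))]
    return np.array(rows), y[start:]

X, target = lagged_inputs([0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [0] * 5,
                          nu=1, ny=2, ne=0)
print(X)       # first row is [u(1), y(1), y(0)] = [1, 11, 10]
print(target)  # [12. 13. 14.]
```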
The final stage of system identification is model validation. There are several ways of testing a model, such as one-step-ahead predictions (OSA), model predicted outputs (MPO), the mean squared error (MSE), correlation tests and chi-squared tests. In the present study, OSA, MPO and correlation tests were used to assess the performance of the fitted network models. OSA is a common measure of the predictive accuracy of a model and has been used by many researchers. The OSA can be expressed as:
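$$\hat{y}(t) = \hat{f}_s\big(y(t-1), \ldots, y(t-n_y),\, u(t-1), \ldots, u(t-n_u),\, \hat{\varepsilon}(t-1), \ldots, \hat{\varepsilon}(t-n_e)\big) \qquad (6)$$
and the residual or prediction error is defined as:
$$\hat{\varepsilon}(t) = y(t) - \hat{y}(t) \qquad (7)$$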
where $\hat{f}_s(\cdot)$ is a non-linear function, in this case the hybrid-RBF network.
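Another test that often gives a better measure of the fitted model's predictive capability is the model predicted output (MPO). In general, the MPO can be expressed as:
$$\hat{y}_d(t) = \hat{f}_s\big(\hat{y}_d(t-1), \ldots, \hat{y}_d(t-n_y),\, u(t-1), \ldots, u(t-n_u),\, 0, \ldots, 0\big) \qquad (8)$$
and the deterministic error or deterministic residual is:
$$\varepsilon_d(t) = y(t) - \hat{y}_d(t) \qquad (9)$$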
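The difference between OSA and MPO is only in what is fed back: OSA uses measured outputs and past residuals, while MPO feeds back its own predictions and sets the noise lags to zero. The sketch below shows this for a generic fitted model `f_hat` (a hypothetical callable taking $v(t)$ ordered as input lags, output lags, then noise lags); taking the first $\max(n_u, n_y, n_e)$ samples from the data as initial conditions is an assumption of this example.

```python
import numpy as np

def osa_predict(f_hat, u, y, nu, ny, ne):
    """One-step-ahead predictions (6) and residuals (7)."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    start = max(nu, ny, ne)
    y_hat = y.copy()          # first `start` samples copied from the data
    eps = np.zeros(len(y))    # residuals, zero before `start`
    for t in range(start, len(y)):
        v = np.concatenate([u[t - nu:t][::-1], y[t - ny:t][::-1], eps[t - ne:t][::-1]])
        y_hat[t] = f_hat(v)
        eps[t] = y[t] - y_hat[t]
    return y_hat, eps

def mpo_predict(f_hat, u, y, nu, ny, ne):
    """Model predicted output (8) and deterministic residual (9)."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    start = max(nu, ny, ne)
    y_d = y.copy()            # initial conditions taken from the data
    for t in range(start, len(y)):
        # past *predicted* outputs are fed back and noise lags are set to zero
        v = np.concatenate([u[t - nu:t][::-1], y_d[t - ny:t][::-1], np.zeros(ne)])
        y_d[t] = f_hat(v)
    return y_d, y - y_d
```

Because MPO errors accumulate through the fed-back predictions, a model with small OSA residuals can still show large deterministic residuals, which is why MPO is described above as the more demanding test.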
A good model will normally give good predictions; however, a model with good one-step-ahead predictions and model predicted outputs may still be biased. Such bias is often revealed when the model predicts over a different data set, so it can be tested by splitting the data into two sets: a training set and a testing set.
MSE is an iterative method of model validation in which the model is tested by calculating the mean squared error after each training step. The MSE test indicates how fast the prediction error, or residual, converges as training proceeds. The MSE at the $t$-th training step is given by:
$$E_{MSE}\big(t, \hat{\Theta}(t)\big) = \frac{1}{n_d} \sum_{i=1}^{n_d} \Big(y(i) - \hat{y}\big(i, \hat{\Theta}(t)\big)\Big)^2 \qquad (10)$$
where $E_{MSE}(t, \hat{\Theta}(t))$ and $\hat{y}(i, \hat{\Theta}(t))$ are the MSE and the OSA, respectively, for the set of estimated parameters $\hat{\Theta}(t)$ after $t$ training steps, and $n_d$ is the number of data used to calculate the MSE.
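Equation (10) simply averages the squared one-step-ahead residuals over $n_d$ samples, recomputed after every training step. A minimal sketch, assuming the OSA predictions obtained after each step have already been collected in a list (`osa_history` and `mse_curve` are hypothetical names):

```python
import numpy as np

def mse_curve(y, osa_history):
    """E_MSE(t) of equation (10) for each training step t.

    y           : (nd,) measured outputs used for the test
    osa_history : sequence of (nd,) arrays, the OSA predictions obtained
                  with the parameter estimates after steps 1, 2, ...
    """
    y = np.asarray(y, dtype=float)
    return np.array([np.mean((y - np.asarray(y_hat, dtype=float)) ** 2)
                     for y_hat in osa_history])

print(mse_curve([1, 2, 3], [[0, 0, 0], [1, 2, 3]]))  # [14/3, 0]
```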
An alternative method of model validation is to use correlation tests to determine whether there is any predictive information left in the residual after model fitting (Billings and Voon, 1986). The residual will be unpredictable from all linear and non-linear combinations of past inputs and outputs if the following conditions hold (Billings and Voon, 1986):
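$$\begin{aligned}
\Phi_{\varepsilon\varepsilon}(\tau) &= E[\varepsilon(t-\tau)\,\varepsilon(t)] = \delta(\tau) \\
\Phi_{u\varepsilon}(\tau) &= E[u(t-\tau)\,\varepsilon(t)] = 0, \quad \forall\, \tau \\
\Phi_{u^{2\prime}\varepsilon}(\tau) &= E\big[\big(u^2(t-\tau) - \overline{u^2(t)}\big)\,\varepsilon(t)\big] = 0, \quad \forall\, \tau \\
\Phi_{u^{2\prime}\varepsilon^2}(\tau) &= E\big[\big(u^2(t-\tau) - \overline{u^2(t)}\big)\,\varepsilon^2(t)\big] = 0, \quad \forall\, \tau \\
\Phi_{\varepsilon(\varepsilon u)}(\tau) &= E[\varepsilon(t)\,\varepsilon(t-1-\tau)\,u(t-1-\tau)] = 0, \quad \tau \ge 0
\end{aligned} \qquad (11)$$
where $\overline{u^2(t)}$ and $E[\cdot]$ are the mean value of $u^2(t)$ and the expectation, respectively, and $\delta(\tau)$ is the impulse function. In practice, if the correlation tests lie within the 95% confidence limits, $\pm 1.96/\sqrt{N}$, then the model is regarded as adequate, where $N$ is the number of data used to train the network.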
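In practice, the correlation functions in equation (11) are replaced by normalised sample estimates with the means removed and compared with the $\pm 1.96/\sqrt{N}$ band. The sketch below computes all five for non-negative lags only (the tests marked "for all $\tau$" also involve negative lags, which are omitted here for brevity); the function and dictionary key names are hypothetical.

```python
import numpy as np

def xcorr(a, b, max_lag):
    """Normalised sample estimate of E[a(t - tau) b(t)], tau = 0..max_lag.

    Means are removed first, so the value at tau = 0 for a == b is 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    n = len(a)
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return np.array([np.sum(a[:n - tau] * b[tau:]) / denom
                     for tau in range(max_lag + 1)])

def correlation_tests(u, eps, max_lag=20):
    """Sample versions of the five correlation functions in (11), tau >= 0."""
    u = np.asarray(u, dtype=float)
    eps = np.asarray(eps, dtype=float)
    u2 = u ** 2  # xcorr removes the mean, giving u^2(t) - mean(u^2)
    return {
        "ee": xcorr(eps, eps, max_lag),
        "ue": xcorr(u, eps, max_lag),
        "u2e": xcorr(u2, eps, max_lag),
        "u2e2": xcorr(u2, eps ** 2, max_lag),
        # pairs eps(s) u(s) with eps(s + 1 + tau): E[eps(t) eps(t-1-tau) u(t-1-tau)]
        "e_eu": xcorr(eps[:-1] * u[:-1], eps[1:], max_lag),
    }

def confidence_limit(n):
    """Approximate 95% confidence limit, +/- 1.96 / sqrt(N)."""
    return 1.96 / np.sqrt(n)
```

A residual that still contains a delayed copy of the input, for example, produces a $\Phi_{u\varepsilon}$ value far outside the band at the corresponding lag.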
4. Orthogonal Least Squares Algorithm
The orthogonal least squares algorithm will be used to estimate the parameters $\theta_i$, $i = 0, 1, 2, \ldots, M$. The algorithm can be explained using the auxiliary model:
$$y(t) = \sum_{i=0}^{M} g_i\, w_i(t) + \varepsilon(t) \qquad (12)$$
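where $g_i$ are the auxiliary model parameters and $w_i(t)$, $i = 0, 1, \ldots, M$, are constructed to be orthogonal over the data record such that:
$$\sum_{t=1}^{N} w_i(t)\, w_j(t) = 0, \quad i \neq j \qquad (13)$$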
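A procedure to construct an orthogonal estimation algorithm can be described by defining:
$$w_0(t) = p_0(t) = 1 \qquad (14)$$
$$w_m(t) = p_m(t) - \sum_{r=0}^{m-1} \alpha_{rm}\, w_r(t), \quad m = 1, 2, \ldots, M \qquad (15)$$
$$\alpha_{rm} = \frac{\sum_{t=1}^{N} p_m(t)\, w_r(t)}{\sum_{t=1}^{N} w_r^2(t)}, \quad 0 \le r \le m-1 \qquad (16)$$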
Setting
(17)
and using equation (5) and the orthogonality of $w_m(t)$ in equation (12) gives the parameter estimates:
$$g_m = \frac{\sum_{t=1}^{N} y(t)\, w_m(t)}{\sum_{t=1}^{N} w_m^2(t)}, \quad m = 1, \ldots, M \qquad (18)$$
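After the parameters $g_m$ are obtained, the values of the actual parameters $\theta_m$ of the NARMAX model in equation (5) can be calculated using:
$$\theta_m = \sum_{i=m}^{M} g_i\, \nu_i \qquad (19)$$
where
$$\nu_m = 1, \qquad \nu_i = -\sum_{r=m}^{i-1} \alpha_{ri}\, \nu_r, \quad m < i \le M \qquad (20)$$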
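Equations (14) to (20) amount to a classical Gram-Schmidt orthogonalisation of the regressors followed by a back-substitution. The sketch below follows them literally with 0-based column indices, where `P[:, 0]` is the constant regressor $p_0(t) = 1$; the function name is hypothetical, and $g_0$ is computed with the same formula as equation (18). In exact arithmetic the recovered $\theta$ equals the ordinary least squares solution, which gives a convenient check.

```python
import numpy as np

def orthogonal_estimation(P, y):
    """Equations (14)-(20): auxiliary model and recovery of theta.

    P : (N, M+1) regressors p_0(t) = 1, p_1(t), ..., p_M(t) as columns
    y : (N,) measured output
    Returns W (columns w_i(t)), alpha, g and theta.
    """
    N, K = P.shape                       # K = M + 1
    W = np.zeros((N, K))
    alpha = np.zeros((K, K))             # alpha[r, m] used for 0 <= r < m
    W[:, 0] = P[:, 0]                    # (14): w_0(t) = p_0(t) = 1
    for m in range(1, K):
        W[:, m] = P[:, m]
        for r in range(m):               # (15)-(16), numerator uses p_m(t)
            alpha[r, m] = (P[:, m] @ W[:, r]) / (W[:, r] @ W[:, r])
            W[:, m] -= alpha[r, m] * W[:, r]
    # (18), also applied to m = 0, which gives g_0 = mean of y
    g = np.array([(y @ W[:, i]) / (W[:, i] @ W[:, i]) for i in range(K)])
    theta = np.zeros(K)
    for m in range(K):                   # (19)-(20)
        nu = np.zeros(K)                 # nu[i]: coefficient of g_i in theta_m
        nu[m] = 1.0
        for i in range(m + 1, K):
            nu[i] = -sum(alpha[r, i] * nu[r] for r in range(m, i))
        theta[m] = sum(g[i] * nu[i] for i in range(m, K))
    return W, alpha, g, theta
```

The prediction error of equation (21) is then `y - W @ g`.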
The prediction error in equation (12) can be calculated as:
$$\varepsilon(t) = y(t) - \sum_{i=0}^{M} g_i\, w_i(t) \qquad (21)$$
Term selection is very useful when a parsimonious representation of the identified system is required. To achieve the required accuracy, the orders of the dynamic terms ($n_y$, $n_u$, $n_e$) in equation (4) can be increased. However, this results in a highly complex model and heavy computation. For a polynomial expansion of degree $l$, the maximum number of terms of the NARMAX model in equation (4), including the constant term, is given by:
$$\sum_{i=0}^{l} n_i, \qquad n_i = \frac{n_{i-1}\,(n_y + n_u + n_e + i - 1)}{i}, \qquad n_0 = 1 \qquad (22)$$
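The recursion in equation (22) grows quickly with the lags and the degree, which is the motivation for term selection. The short sketch below (a hypothetical helper, not from the original) evaluates it and checks it against the standard closed form $\binom{n_y + n_u + n_e + l}{l}$ for the number of monomials of degree at most $l$.

```python
from math import comb

def max_terms(ny, nu, ne, l):
    """Candidate terms of a degree-l polynomial NARMAX model, equation (22)."""
    n = ny + nu + ne
    counts = [1]  # n_0 = 1, the constant term
    for i in range(1, l + 1):
        # n_i = n_{i-1} (n + i - 1) / i; the integer division is always exact
        counts.append(counts[-1] * (n + i - 1) // i)
    return counts, sum(counts)

counts, total = max_terms(ny=2, nu=2, ne=2, l=3)
print(counts, total)            # [1, 6, 21, 56] 84
print(total == comb(6 + 3, 3))  # True
```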
Term selection in this case is the selection of the RBF network centres, and the procedure can be derived from the orthogonal least squares algorithm. Squaring the auxiliary model in equation (12) and taking the time average gives:
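$$\frac{1}{N}\sum_{t=1}^{N} y^2(t) = \sum_{i=0}^{M} g_i^2\, \frac{1}{N}\sum_{t=1}^{N} w_i^2(t) + \frac{1}{N}\sum_{t=1}^{N} \varepsilon^2(t) \qquad (23)$$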
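$\varepsilon(t)$ is assumed to be a zero-mean white noise sequence. The maximum mean squared prediction error occurs when no terms are included in the model ($M = 0$), and in this case:
$$\frac{1}{N}\sum_{t=1}^{N} \varepsilon^2(t) = \frac{1}{N}\sum_{t=1}^{N} y^2(t) \qquad (24)$$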
From equations (12) and (23), the reduction in the mean squared error obtained by including the $i$-th term, $g_i w_i(t)$, in the model is:
$$g_i^2\, \frac{1}{N}\sum_{t=1}^{N} w_i^2(t) \qquad (25)$$
This reduction can also be expressed as a percentage of the total mean squared error in equation (24), giving the error reduction ratio:
$$ERR_i = \frac{g_i^2 \sum_{t=1}^{N} w_i^2(t)}{\sum_{t=1}^{N} y^2(t)} \times 100\% \qquad (26)$$
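From equation (17), the influence of $g_0$ can be ignored by rewriting equation (23) as:
$$\frac{1}{N}\sum_{t=1}^{N} y^2(t) - g_0^2 = \sum_{i=1}^{M} g_i^2\, \frac{1}{N}\sum_{t=1}^{N} w_i^2(t) + \frac{1}{N}\sum_{t=1}^{N} \varepsilon^2(t) \qquad (27)$$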
Rewriting equation (26) with the d.c. term eliminated gives:
$$ERR_i = \frac{g_i^2 \sum_{t=1}^{N} w_i^2(t)}{\sum_{t=1}^{N} y^2(t) - N g_0^2} \times 100\%, \quad i = 1, 2, \ldots, M \qquad (28)$$
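The error reduction ratios can be computed without forming $g_i$ and $w_i(t)$ explicitly. The sketch below swaps in NumPy's QR factorisation for equations (14) to (16): column $i$ of $Q$ is $w_i(t)$ scaled to unit length (up to sign), so $(q_i^\top y)^2 = g_i^2 \sum_t w_i^2(t)$. The denominator is written as $\sum_t (y(t) - \bar{y})^2$, where $\bar{y}$ is the sample mean; this equals $\sum_t y^2(t) - N g_0^2$ because $g_0 = \bar{y}$ when $w_0(t) = 1$. The function name is hypothetical, and $P$ is assumed to have full column rank.

```python
import numpy as np

def error_reduction_ratios(P, y):
    """ERR_i of equation (28), in percent, for the regressors p_1..p_M.

    P[:, 0] must be the constant regressor p_0(t) = 1.
    """
    y = np.asarray(y, dtype=float)
    Q, _ = np.linalg.qr(P)               # orthonormal columns, same order as P
    contrib = (Q.T @ y) ** 2             # g_i^2 * sum_t w_i(t)^2, i = 0..M
    total = np.sum((y - y.mean()) ** 2)  # = sum_t y^2(t) - N g_0^2
    return 100.0 * contrib[1:] / total   # d.c. term (i = 0) excluded
```

Because the orthogonal decomposition splits the d.c.-free sum of squares exactly, the ratios of all included terms plus the residual share always add up to 100%.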
The values of $ERR_i$ indicate which terms should be included in the model: terms with small $ERR_i$ contribute little to the reduction of the mean squared error and can be excluded. The threshold value of $ERR_i$ for terms that do not involve $\varepsilon(\cdot)$, denoted $C_d$, is normally selected between 0.05 and 0.5 (Korenberg et al., 1988). For terms that involve $\varepsilon(\cdot)$, the threshold value is denoted $C_{de}$ and is normally selected between 0.001 and 0.05. This range is chosen to ensure that only adequate noise terms are included in the fitted model, so that the prediction error is reduced to a white noise sequence.
The orthogonal least squares algorithm for parameter estimation and term selection can be summarised as follows (Korenberg et al., 1988):
(1) Select the values of $n_y$, $n_u$, $n_e$, $d$ and $l$ in equation (4), set $\varepsilon(t) = 0$ for $t = 1, \ldots, N$, and select $C_d$ and $C_{de}$.
(2) Estimate all the parameters of the terms that do not contain $\varepsilon(t)$ using equations (14) to (18).
(3) If $\varepsilon(t) = 0$ for all $t = 1, 2, \ldots, N$, go to step (4). Otherwise, use the current $\varepsilon(t)$ to estimate the parameters of the terms that contain $\varepsilon(t)$ using equations (14) to (18).
(4) Calculate $ERR_i$ using equation (28) and test it against $C_d$ and $C_{de}$.
(5) Estimate the prediction error using equation (21).
(6) If any process term was excluded in step (4), repeat from step (2). Otherwise, go to step (3) and repeat until convergence.
(7) Estimate the NARMAX model coefficients using equations (19) and (20).
In step (1), $n_y$, $n_u$, $n_e$ and $l$ should be chosen large enough that all possible terms are included in the selection process. If $n_u$ is sufficiently large, a value of $d = 1$ is adequate.
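Since term selection here amounts to choosing RBF centres, the procedure can be specialised to a forward-selection loop over candidate centres. The sketch below is a simplified, hypothetical variant rather than the full algorithm summarised above: it has no noise terms or linear connections, always includes the constant term, orthogonalises each candidate with modified Gram-Schmidt instead of equations (14) to (16), and stops when the largest remaining ERR (in percent, d.c. term removed) falls below a threshold `cd` that plays the role of $C_d$. The candidate centres would typically be rows of the training data, as in Section 5.

```python
import numpy as np

def thin_plate_spline(a):
    # f(a) = a^2 log(a), with f(0) = 0
    safe = np.where(a > 0, a, 1.0)
    return np.where(a > 0, a ** 2 * np.log(safe), 0.0)

def select_centres(V, y, candidates, cd=0.05, max_centres=50, tol=1e-10):
    """Forward selection of RBF centres by error reduction ratio (percent).

    V          : (N, n) network inputs v(t)
    y          : (N,) target outputs
    candidates : (K, n) candidate centres
    Returns the indices of the selected candidates and their ERR values.
    """
    y = np.asarray(y, dtype=float)
    dist = np.linalg.norm(V[:, None, :] - candidates[None, :, :], axis=2)
    P = thin_plate_spline(dist)                    # candidate regressors
    total = np.sum((y - y.mean()) ** 2)            # d.c. term removed
    basis = [np.ones(len(y)) / np.sqrt(len(y))]    # unit-length w_0(t) = 1
    selected, errs = [], []
    remaining = list(range(P.shape[1]))
    while remaining and len(selected) < max_centres:
        best, best_err, best_q = None, -1.0, None
        for j in remaining:
            w = P[:, j].copy()
            for q in basis:                        # orthogonalise against chosen terms
                w -= (q @ w) * q
            norm = np.linalg.norm(w)
            if norm < tol * max(1.0, np.linalg.norm(P[:, j])):
                continue                           # (nearly) dependent on chosen terms
            q_new = w / norm
            err = 100.0 * (q_new @ y) ** 2 / total
            if err > best_err:
                best, best_err, best_q = j, err, q_new
        if best is None or best_err < cd:          # no significant candidate left
            break
        selected.append(best)
        errs.append(best_err)
        basis.append(best_q)
        remaining.remove(best)
    return selected, errs
```

After selection, the weights of the chosen centres can be estimated by least squares on the selected regressors.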
5. Application Examples
Hybrid-RBF networks trained using the OLS algorithm have been used to model two non-linear systems. In both examples, the RBF centres were initialised to the first few input and output data. To test the accuracy of the fitted models compared with the actual systems, model validity tests were carried out (Mashor, 1997). In the present study, OSA, MPO and correlation tests will be used to validate the fitted network models. These values are taken to be the same for both examples,
.
Example 1
System S1 is a simulated system defined by the following difference equation:
where $e(t)$ is a Gaussian white noise sequence with zero mean and variance 0.05, and the input $u(t)$ is a uniformly distributed random sequence in $(-1, +1)$. System S1 was used to generate 1000 pairs of input and output data. The first 600 pairs were used to train the network and the remaining 400 pairs were used to test the fitted model. The network was trained using the following configuration:
$v(t) = [u(t-1)\;\; y(t-1)\;\; y(t-2)]$ and
$v_l(t) = [u(t-1)\;\; y(t-1)\;\; y(t-2)]$;
where $v_l(t)$ represents the linear input connection vector. The OLS algorithm was used to select the best centres from the first 100 input-output data pairs. After 5 iterations with
, 42 centres have been selected.
The OSA and MPO produced by the network model are shown in Figures 2 and 3, respectively. These plots show that the network model predicts the system output quite well. The correlation tests in Figure 4 are adequate, where only
and
are marginally outside the 95% confidence limits. Since the network model predicts quite well and has acceptable correlation tests, the model can be considered adequate to represent the system dynamics.
Figure 2: OSA test for example 1
Figure 3: MPO test for example 1
Figure 4: Correlation tests for example 1
Example 2
The second data set was taken from a heat exchanger system and consists of 1000 samples. The first 500 samples were used to train the network and the remaining 500 samples were used to test the fitted network model. The network was trained using the following specification:


with bias input
The OLS algorithm was used to select the best centres from the first 200 input-output data pairs. After 5 iterations with
, the 12 most significant centres were selected.
The OSA and MPO generated by the network model over both the training and testing data sets are shown in Figures 5 and 6, respectively. The plots show that the model predicts quite well over both data sets. The correlation tests shown in Figure 7 are adequate. Since the model predicts quite well and has adequate correlation tests, it was considered sufficient to represent the identified system.

Figure 5: OSA test for example 2

Figure 6: MPO test for example 2

Figure 7: Correlation tests for example 2
6. Conclusion
A hybrid-RBF network has been trained using the OLS algorithm for weight estimation and automatic selection of the RBF centres. A simulated data set and a real data set were used to test the performance of the hybrid-RBF network. The results from the two examples show that the hybrid-RBF network predicts quite well and produces adequate correlation tests. It can therefore be concluded that hybrid-RBF networks trained using the OLS algorithm can be used to model non-linear systems.
REFERENCES
Billings, S.A., and Voon, W.S.F., (1986), “Structure detection and model validity tests in the identification of non-linear systems”, Proc. IEE, Part D, 127, 272-285.
Chen, S., Cowan, C.F.N., Billings, S.A., and Grant, P.M., (1990), “A parallel recursive prediction error algorithm for training layered neural networks”, Int. J. of Control, 51 (6), 1215-1228.
Chen, S., Billings, S.A. and Grant, P.M., (1992), “Recursive hybrid algorithm for non-linear system identification using radial basis function networks”, Int. J. of Control, 55, 1051-1070.
Cybenko, G., (1989), “Approximation by superpositions of a sigmoidal function”, Mathematics of Control, Signals and Systems, 2, 303-314.
Elanayar, S.V.T., and Shin, Y.C., (1994), “Radial basis function neural network for approximation and estimation of non-linear stochastic dynamic systems”, IEEE Trans. on Neural Networks, 5 (4), 594-603.
Funahashi, K., (1989), “On the approximate realisation of continuous mappings by neural networks”, Neural Networks, 2, 183-192.
Hornik, K., (1991), “Approximation Capabilities of Multilayer Feed Forward Neural Networks”, Neural Networks, 4, 251-257.
Korenberg, M., Billings, S.A., Liu, Y.P., and McIlroy, P.J., (1988), “Orthogonal parameter estimation algorithm for non-linear stochastic systems”, Int. J. of Control, 48 (1), 193-210.
Leontaritis, I.J., and Billings, S.A., (1985), “Input-output parametric models for non-linear systems. Part I: Deterministic non-linear systems; Part II: Stochastic non-linear systems”, Int. J. of Control, 41, 303-359.
Longinov, N.E., (1994), “Predicting pilot look-angle with a radial basis function network”, IEEE Trans. on Systems, Man and Cybernetics, 24 (10), 1511-1518.
Mashor, M.Y., (1995), System identification using radial basis function network, PhD thesis, University of Sheffield, United Kingdom.
Mashor, M.Y., (1997), “Nonlinear System Identification Using RBF Networks With Linear Input Connections”, Technical J. School of Electrical & Electronic Eng., University Science Malaysia, 3, 49-56.
Moody, J., and Darken, C.J., (1989), “Fast Learning in Neural Networks of Locally-Tuned Processing Units”, Neural Computation, 1, 281-294.
Powell, M.J.D., (1987), “Radial Basis Function Approximation to Polynomials”, Proc. 12th Biennial Numerical Analysis Conf., Dundee, 223-241.