Model Validity Tests for RBF Network
M. Y. Mashor
Abstract

Model validation is an important step in the system identification process. However, the theoretical derivation of model validity tests for neural networks such as the RBF network is very complicated. The current study investigates the capability of some widely used model validity tests, namely one step ahead prediction, model predicted output, mean squared error and correlation tests. This paper also explores how well these validity tests provide insight into network model deficiencies.

Key Words: Model validation, system identification, RBF network, prediction, mean square error, correlation tests.

1. Introduction

Radial basis function (RBF) networks have been proved theoretically to possess a universal approximation property by Poggio and Girosi (1990). The networks have also been successfully applied in various fields such as system identification (Chen et al., 1991, 1992; Elanayar and Shin, 1994; Pottmann and Seborg, 1992; Ye and Loh, 1993), pattern recognition (Arad et al., 1994; Galicki et al., 1997; Jonathan and Buxton, 1997), robotics (Feng, 1993; Gorinevsky and Connolly, 1994), medicine (Linken and Nie, 1993) and business (Tan et al., 1992).

In the present study, some model validity tests are investigated for the RBF network in relation to system identification. Unlike conventional parametric models, neural network models are highly non-linear, hence a definitive proof of network validity tests is very difficult. However, identification using neural networks involves learning or estimating mathematical descriptions of systems. Thus, the fundamental results of model validation methods for conventional system identification may also be used for the RBF network.
Model validity tests such as one step ahead prediction, model predicted output, mean square error, Chi-square and correlation tests are widely used, especially in parametric system identification (Chen et al., 1990; Korenberg et al., 1988; Billings and Voon, 1986; Billings et al., 1992). Because of the complex structure of the RBF network, a mathematical proof that these tests are suitable for validating the RBF network model would be very complicated. In the present study, the suitability of these validity tests is investigated using simulation.

In this study, the RBF network is trained using a hybrid training algorithm similar to the method introduced by Chen et al. (1992). It uses exactly the same clustering algorithm and weight estimation method as Chen et al. (1992), but the training is performed off-line, i.e. the RBF centres are positioned before the weights are estimated. Off-line training was selected to reduce the effect of bad initial centre positions. The hybrid training algorithm was selected because many results that have been reported on the RBF network are based on similar training algorithms, so the discussion in this paper will be particularly applicable to those works.

2. RBF Network with Linear Input Connections

An RBF network with m outputs and nh hidden nodes can be expressed as:
$$\hat{y}_i(t) = w_{i0} + \sum_{j=1}^{n_h} w_{ij}\,\phi\left(\lVert v(t) - c_j(t) \rVert\right), \qquad i = 1, \ldots, m \tag{1}$$

where $w_{ij}$, $w_{i0}$ and $c_j(t)$ are the connection weights, bias connection weights and RBF centres respectively, $v(t)$ is the input vector to the RBF network, composed of lagged input, lagged output and lagged prediction error, and $\phi(\cdot)$ is a non-linear basis function.

Since neural networks are highly non-linear, even a linear system has to be approximated using the non-linear neural network model. However, modelling a linear system using a non-linear model can never be better than using a linear model. For this reason, the RBF network with additional linear input connections is used. The proposed network allows the network inputs to be connected directly to the output node via weighted connections, forming a linear model in parallel with the standard non-linear RBF model, as shown in Figure 1. The new RBF network with m outputs, n inputs, nh hidden nodes and nl linear input connections can be expressed as:

$$\hat{y}_i(t) = w_{i0} + \sum_{j=1}^{n_l} \lambda_{ij}\, vl_j(t) + \sum_{j=1}^{n_h} w_{ij}\,\phi\left(\lVert v(t) - c_j(t) \rVert\right), \qquad i = 1, \ldots, m \tag{2}$$
where the λ's and vl's are the weights and the input vector for the linear connections respectively. The input vector for the linear connections may consist of past inputs, outputs and noise lags. Since the λ's enter the network linearly, they can be estimated using the same algorithm as for the w's. As the additional linear connections only introduce a linear model, no significant computational load is added to the standard RBF network training. Furthermore, the number of required linear connections is normally much smaller than the number of hidden nodes in the RBF network. In the present study, the Givens least squares algorithm with additional linear input connection features is used to estimate the w's and λ's. Refer to Chen et al. (1992) or Mashor (1995) for the implementation of the Givens least squares algorithm.
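The network with linear input connections described above can be sketched numerically. The following is only a minimal sketch, not the implementation used in this study: it assumes fixed centres (taken here as the first few training inputs instead of coming from the clustering algorithm), it uses NumPy's general least squares solver in place of the Givens least squares algorithm, and the names `thin_plate_spline`, `design_matrix`, `fit_weights` and `predict` are hypothetical. The thin-plate-spline is used as the basis function, as stated in section 3.3.

```python
import numpy as np

def thin_plate_spline(r):
    """phi(r) = r^2 * log(r), with phi(0) = 0 by continuity."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    mask = r > 0
    out[mask] = r[mask] ** 2 * np.log(r[mask])
    return out

def design_matrix(V, centres, VL):
    """Stack [bias | hidden node outputs | linear inputs] column-wise."""
    # Euclidean distance between every input vector and every centre
    dist = np.linalg.norm(V[:, None, :] - centres[None, :, :], axis=2)
    bias = np.ones((V.shape[0], 1))
    return np.hstack([bias, thin_plate_spline(dist), VL])

def fit_weights(V, centres, VL, Y):
    """Estimate bias, w's and lambda's jointly by linear least squares."""
    Z = design_matrix(V, centres, VL)
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return theta

def predict(V, centres, VL, theta):
    return design_matrix(V, centres, VL) @ theta

# Toy data (assumption for illustration): a purely linear system
rng = np.random.default_rng(0)
V = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = 2.0 * V[:, 0] - V[:, 1]

centres = V[:5]                       # stand-in for the clustering step
theta = fit_weights(V, centres, V, Y) # linear inputs VL = V here
max_err = np.max(np.abs(predict(V, centres, V, theta) - Y))
```

Because the toy system is linear, the linear input connections alone can represent it, so the fitted model reproduces the data to numerical precision (`max_err` is close to zero) regardless of how the five centres are placed.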
Figure 1. The RBF network with linear input connections

3. Model Validity Tests

A poorly fitted neural network model often predicts badly and can be biased. This may occur due to incorrect input node assignments, noisy data, insufficient hidden nodes, inappropriate values of design parameters etc. Model validity tests are procedures designed to detect model deficiencies. There are several ways of testing a model. In the present study one step ahead prediction (OSA), model predicted output (MPO), mean squared error (MSE) and correlation tests will be used to test the fitted network model.

3.1 One Step Ahead Prediction and Model Predicted Output

One step ahead prediction has been used by many authors to measure the predictive capability of a fitted model. OSA is given as:
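$$\hat{y}(t) = f_s\big(u(t-1), \ldots, u(t-n_u),\; y(t-1), \ldots, y(t-n_y),\; \hat{\varepsilon}(t-1), \ldots, \hat{\varepsilon}(t-n_e)\big) \tag{3}$$

and the residual or prediction error is defined as

$$\hat{\varepsilon}(t) = y(t) - \hat{y}(t) \tag{4}$$

where $f_s(\cdot)$ is the non-linear function realised by the fitted model, here the RBF network. The model predicted output (MPO) replaces the measured past outputs by past predictions and sets the noise terms to zero:

$$\hat{y}_d(t) = f_s\big(u(t-1), \ldots, u(t-n_u),\; \hat{y}_d(t-1), \ldots, \hat{y}_d(t-n_y),\; 0, \ldots, 0\big) \tag{5}$$

and the deterministic error or deterministic residual is:

$$\hat{\varepsilon}_d(t) = y(t) - \hat{y}_d(t) \tag{6}$$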
It is normal for OSA to be good even when the model is biased, underfitted or overfitted, because each one step ahead prediction is computed from the measured past outputs, so the model's own errors are not fed back into later predictions. To illustrate these concepts, consider system S1, which is a tension leg data set. 1000 data samples were taken from the system for analysis. Initially the network was trained using an adequate input vector, where the input vector was assigned as:

and

Other parameters were set as

while leaving other specifications unchanged, the OSA and MPO in Figures (3a) and (3b) were produced. In this case the OSA plot is good but the MPO plot becomes unstable over the testing data set (data samples 601 to 1000). This example shows that OSA cannot always detect the deficiency in a fitted model. On the other hand, MPO computed over the testing data set will normally reveal model deficiency.
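The difference between the two predictions can be illustrated with a short sketch. This is an illustration under stated assumptions, not the procedure used for system S1: the helper names `osa` and `mpo`, the first-order toy system and the deliberately mis-specified `biased_model` are all hypothetical, and noise lags are omitted from the regressors for brevity.

```python
import numpy as np

def osa(model, u, y, n_u, n_y):
    """One step ahead prediction: regressors use measured past outputs."""
    start = max(n_u, n_y)
    y_hat = np.array(y, dtype=float)   # first `start` samples kept as measured
    for t in range(start, len(y)):
        past_u = [u[t - k] for k in range(1, n_u + 1)]
        past_y = [y[t - k] for k in range(1, n_y + 1)]
        y_hat[t] = model(past_u, past_y)
    return y_hat

def mpo(model, u, y, n_u, n_y):
    """Model predicted output: regressors use the model's own past predictions."""
    start = max(n_u, n_y)
    y_d = np.array(y, dtype=float)     # measured values serve only as initial conditions
    for t in range(start, len(y)):
        past_u = [u[t - k] for k in range(1, n_u + 1)]
        past_y = [y_d[t - k] for k in range(1, n_y + 1)]
        y_d[t] = model(past_u, past_y)
    return y_d

# Hypothetical toy system: y(t) = 0.9 y(t-1) + u(t-1)
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.9 * y[t - 1] + u[t - 1]

# A "fitted" model with a slightly wrong output coefficient
biased_model = lambda past_u, past_y: 0.95 * past_y[0] + past_u[0]

mse_osa = float(np.mean((y - osa(biased_model, u, y, 1, 1)) ** 2))
mse_mpo = float(np.mean((y - mpo(biased_model, u, y, 1, 1)) ** 2))
```

In this toy case the OSA error stays small because every prediction restarts from a measured output, whereas the MPO error is larger because the coefficient error is fed back through the model's own predictions.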
Figure 2. Predicted outputs of the network model with proper input vector for system S1: (a) one step ahead prediction; (b) model predicted output
Figure 3. Predicted outputs of the network model with improper input vector for system S1
Network overfitting is also hard to detect using the OSA test. For example, suppose system S1 is now over-specified by assigning:
The MPO is now very bad, especially over the testing data set; refer to Figure (4b). In this case, the network tends to include noise as part of the process model, which often leads to a more complex model. This type of overfitted network model is often harder to detect because the network normally produces a reasonable OSA while the MPO is very bad. The MPO in Figure (4b) is much worse than the MPO produced using the network with the appropriate structure in Figure (2b). Since the network is trained to minimise the residual over the training data set, it is not surprising that even an overfitted network produces a good OSA over the training data set. However, OSA over the testing data set will normally deteriorate if the model is heavily overfitted. This situation is illustrated by the OSA plot in Figure (4a).
Figure 4. Predicted outputs of the RBF network model with the overfitted structure: (a) one step ahead prediction; (b) model predicted output
3.2 Mean Squared Error

Mean squared error (MSE) is an iterative method of model validation in which the model is tested by calculating the mean squared error at each training step. The mean squared error at the t-th training step is given by:
$$E_{MSE}\big(t, \hat{\theta}(t)\big) = \frac{1}{n_d} \sum_{i=1}^{n_d} \Big(y(i) - \hat{y}\big(i, \hat{\theta}(t)\big)\Big)^2 \tag{7}$$

where $E_{MSE}(t, \hat{\theta}(t))$ and $\hat{y}(i, \hat{\theta}(t))$ are the MSE and the OSA prediction for the parameter estimates $\hat{\theta}(t)$ obtained after t training steps, and $n_d$ is the number of data used to calculate the MSE.

The MSE plot indicates how fast the prediction error and the network parameters converge with the number of training data. It can also be used to determine the number of data samples required to train a network. MSE will normally decrease with the number of data, but beyond a certain number of data it no longer decreases significantly. However, the number of data required to train the network depends on the training algorithm as well as the network architecture. An example of MSE evolution, generated for system S1, is shown in Figure (5). From the plot, the MSE converges after about 300 data samples, which indicates that the network requires about 300 data samples to be trained properly.
Figure 5. MSE evolution for system S1.

In general, a good model will produce a good MSE; however, a good MSE does not always imply that the model is good. For instance, an overfitted model will normally produce a good MSE although the model cannot predict very well and may be biased. This problem may be avoided by splitting the data into two sets, the training set and the testing set. The MSE calculated using the testing data set often provides a better measurement of the predictive capability of a fitted model. This can be seen in Figures (6a) and (6b), where the networks have overfitted input nodes and hidden nodes respectively. The plots show that if the MSE is calculated using the testing data set then the effect of overfitting is clear. In the case of hidden node overfitting (refer to Figure 6b), the MSE decreases with an increasing number of hidden nodes over the training data set but increases over the testing data set. In other words, the network loses its generalisation property. Thus, computing the MSE over the testing data set is more helpful for avoiding overfitting.
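The comparison of training and testing MSE described above can be sketched as follows. This is a minimal sketch under assumptions, not the experiment behind Figure 6: the data are a hypothetical noisy static function instead of system S1, the centres are simply the first training inputs, NumPy's least squares solver replaces the Givens algorithm, and the names `mse` and `rbf_features` are hypothetical.

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error between measured and predicted outputs."""
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))

def rbf_features(v, centres):
    """Bias column plus thin-plate-spline hidden node outputs (1-D inputs)."""
    r = np.abs(v[:, None] - centres[None, :])
    safe_r = np.where(r > 0, r, 1.0)          # log(1) = 0, so phi(0) = 0
    return np.hstack([np.ones((len(v), 1)), r ** 2 * np.log(safe_r)])

# Hypothetical data: noisy static non-linearity
rng = np.random.default_rng(1)
v = rng.uniform(-1.0, 1.0, 300)
y = np.sin(3.0 * v) + 0.1 * rng.standard_normal(300)

# Split into training and testing sets
v_tr, y_tr = v[:200], y[:200]
v_te, y_te = v[200:], y[200:]

results = []   # (hidden nodes, training MSE, testing MSE)
for n_h in (2, 5, 10, 20):
    centres = v_tr[:n_h]                      # nested centre sets
    Z_tr = rbf_features(v_tr, centres)
    w, *_ = np.linalg.lstsq(Z_tr, y_tr, rcond=None)
    results.append((n_h,
                    mse(y_tr, Z_tr @ w),
                    mse(y_te, rbf_features(v_te, centres) @ w)))
```

With nested centre sets the training MSE cannot increase as hidden nodes are added, which is why it is a poor guide to overfitting on its own. The testing MSE in the third column is the quantity to inspect when choosing the number of hidden nodes.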
Figure 6. Variation of MSE with the overfitted input nodes and hidden nodes for system S1: (a) input nodes; (b) hidden nodes

The MSE plot also gives a rough idea of the appropriate number of hidden nodes and maximum input lag. The MSE plots for S1 against the maximum input lag and the number of hidden nodes in Figures (6a) and (6b) indicate that the network should have 80 hidden nodes and a maximum lag of 8, that is nu = 8 and ny = 8. With these specifications the network gives the optimum MSE over both the training and testing data sets. Therefore, the MSE test can be used to avoid underfitting and overfitting in the RBF network.

3.3 Correlation Tests

A non-linear model is considered unbiased if the residual, ε(t), is unpredictable from, or uncorrelated with, all linear and non-linear combinations of past inputs and outputs. Billings and Voon (1986) proved that for a certain class of non-linear systems the following conditions should hold if the fitted non-linear model is adequate:
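$$\begin{aligned}
\Phi_{\varepsilon\varepsilon}(\tau) &= \delta(\tau) \\
\Phi_{u\varepsilon}(\tau) &= 0 \quad \forall\, \tau \\
\Phi_{u^{2'}\varepsilon}(\tau) &= 0 \quad \forall\, \tau \\
\Phi_{u^{2'}\varepsilon^{2}}(\tau) &= 0 \quad \forall\, \tau \\
\Phi_{\varepsilon(\varepsilon u)}(\tau) &= 0 \quad \tau \ge 0
\end{aligned} \tag{8}$$

where $\Phi_{xy}(\tau)$ is the cross-correlation function between $x(t)$ and $y(t)$, $\varepsilon u(t) = \varepsilon(t+1)\,u(t+1)$, $u^{2'}(t) = u^2(t) - \overline{u^2(t)}$ and $\delta(\tau)$ is an impulse function.

The correlation test between two sequences

The term

Noise can enter a system internally or externally. For a linear system, internal noise can always be translated to be additive at the output, whereas for a non-linear system internal noise can introduce cross-product terms between the input, output and noise. Both types of noise can create problems and will normally induce bias, but internal noise is often more difficult to handle. The problem of noisy data can normally be eliminated by fitting an appropriate noise model.

The theoretical analysis of this problem can be presented with a few assumptions. Assume that the centres of the RBF network have been fixed and correct design parameters have been specified; then the problem of RBF weight estimation reduces to a linear least squares problem. An RBF network with a single output and nh hidden nodes can be expressed as: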
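$$y(t) = \sum_{j=1}^{n_h} w_j\,\phi\left(\lVert v(t) - c_j(t) \rVert\right) + \varepsilon(t) \tag{10}$$

where $w_j$, $c_j$ and $\varepsilon(t)$ are the connection weights, RBF centres and prediction error respectively; $\phi(\cdot)$ is a basis function, selected to be the thin-plate-spline; and $v(t)$ is an input vector that may consist of past inputs, past outputs or past prediction errors. The term $\phi(\lVert v(t) - c_j(t) \rVert)$ is the output of the hidden nodes, which becomes available before the w's are estimated. If this term is represented by $z_j(t)$, equation (10) can be expressed as a general regression model:

$$y(t) = \sum_{j=1}^{n_h} z_j(t)\, w_j + \varepsilon(t) \tag{11}$$

where $z_j(t)$ and $\varepsilon(t)$ are the regressors and the prediction errors respectively. In matrix form equation (11) can be written as:

$$Y = ZW + \Xi \tag{12}$$

where $Y = [\, y(1) \ldots y(N) \,]^T$, $W = [\, w_1 \ldots w_{n_h} \,]^T$, $\Xi = [\, \varepsilon(1) \ldots \varepsilon(N) \,]^T$, N is the number of training data and

$$Z = \begin{bmatrix} z_1(1) & \cdots & z_{n_h}(1) \\ \vdots & & \vdots \\ z_1(N) & \cdots & z_{n_h}(N) \end{bmatrix} \tag{13}$$

The solution of equation (12) can be deduced from the least squares estimate (Goodwin and Payne, 1977) to give:

$$\hat{W} = (Z^T Z)^{-1} Z^T Y \tag{14}$$

Substitute equation (12) into equation (14) and rearrange the terms to yield:

$$\hat{W} = W + (Z^T Z)^{-1} Z^T \Xi \tag{15}$$

It is clear from equation (15) that for the estimate of $W$ to be unbiased, the term $(Z^T Z)^{-1} Z^T \Xi$ must have zero mean, which requires the residuals to be uncorrelated with the regressors.

The capability of the correlation tests to detect model deficiencies will be illustrated by using the following system (called system S2) for the node assignment problem: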
1000 data pairs were generated using a zero-mean white noise input sequence u(t), uniformly distributed over [-1, +1]. The output of the system was corrupted by a coloured noise formed by a function of a Gaussian white noise, e(t), that has zero mean and variance of 0.05. A network with 50 hidden nodes,

Initially the network was trained using the correct input vector, where the input vector was assigned as:

The correlation tests in Figure (7a) were produced, where all the tests are satisfied; hence the model can be considered adequate. When the input vector was assigned by excluding the u(t-2) term, the correlation tests in Figure (7b) were produced.
Figure 7. Correlation tests for the input node assignment problem: (a) correct input vector; (b) without u(t-2)

When the input vector was assigned by excluding the y(t-2) term, the correlation tests in Figure (7c) were produced. The plots of

This example suggests that the correlation tests can be used to detect missing terms in the RBF network input vector. For a simulation example, it is easy to get a good model that satisfies all the correlation tests, provided an appropriate number of hidden nodes and other design parameters are specified correctly, because the input vector is known. However, in practice, where the true system is unknown, the detection of network deficiency is more challenging and interpretation of the correlation tests is not always so straightforward. Model deficiency occurs not only because of an incorrect input vector but may be due to any incorrect design parameter of the RBF network. However, correlation tests are designed to detect all possible deficiencies in the model irrespective of the cause of the deficiency.

Correlation tests are quite reliable for detecting bias in an RBF network model in cases where OSA and MPO normally fail. To illustrate this idea, consider system S3, which is the heat exchanger data (Billings and Fadhil, 1986). The network was trained using 40 centres, β0 = 0.99, β(0) = 0.95, η(0) = 0.9, nu = 3, ny = 2 plus a bias input, and 600 data were used for training. The MPO of the model is shown in Figure (8a), where the network gives a reasonable prediction over both the training and testing data sets. However, the correlation tests in Figure (8b) suggest that the model is biased, since almost all the correlation plots lie outside the 95% confidence limits.
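The correlation tests of Billings and Voon (1986) discussed in this section can be sketched in a few lines. The following is a hedged illustration, not the code used to produce Figures 7 and 8: the function names `xcorr` and `correlation_tests` are hypothetical, the normalised sample estimate with mean removal applied to every sequence is an assumption of this sketch, only non-negative lags are computed, and the 95% limits are taken as the usual large-sample approximation ±1.96/√N.

```python
import numpy as np

def xcorr(a, b, max_lag):
    """Normalised sample cross-correlation of a(t) and b(t + tau), tau = 0..max_lag."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    n = len(a)
    return np.array([np.sum(a[: n - tau] * b[tau:]) / denom
                     for tau in range(max_lag + 1)])

def correlation_tests(u, e, max_lag=20):
    """Return the five correlation functions and the 95% confidence bound."""
    u = np.asarray(u, dtype=float)
    e = np.asarray(e, dtype=float)
    eu = (e * u)[1:]                       # eu(t) = e(t+1) u(t+1)
    tests = {
        "ee": xcorr(e, e, max_lag),
        "ue": xcorr(u, e, max_lag),
        "u2e": xcorr(u ** 2, e, max_lag),
        "u2e2": xcorr(u ** 2, e ** 2, max_lag),
        "e(eu)": xcorr(e[:-1], eu, max_lag),
    }
    bound = 1.96 / np.sqrt(len(e))
    return tests, bound

# Hypothetical residuals: one white, one with an unmodelled u(t-1) term
rng = np.random.default_rng(0)
N = 2000
u = rng.uniform(-1.0, 1.0, N)
noise = 0.1 * rng.standard_normal(N)

e_good = noise
e_bad = noise.copy()
e_bad[1:] += 0.5 * u[:-1]                  # residual still contains u(t-1)

tests_good, bound = correlation_tests(u, e_good)
tests_bad, _ = correlation_tests(u, e_bad)
```

For the residual that still contains the u(t-1) term, the input-residual correlation at lag 1 lies far outside the confidence bound, which is the kind of signature a missing input term leaves in these tests. For the white residual the same function stays close to zero at all lags, although a few points may fall marginally outside a 95% band by chance.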
Figure 8. Validation tests for system S3: (a) model predicted output; (b) correlation tests

4. Conclusions

It is very difficult to prove definitively for neural networks that the model validation procedures will detect network deficiency. Model validity tests such as correlation tests, OSA, MPO and mean squared error have been shown to provide some helpful information about the fitted network model. OSA, which is commonly used to measure the predictive capability of a fitted model, has been shown to be inadequate for validating the network model. The OSA plot remained good even though the network model was highly biased, and even the OSA plot over the testing data set deteriorated only slightly. A better validation test for the predictive capability of an RBF network model is MPO, which normally detects most of the model deficiencies.

MSE can be used to indicate how fast the parameters of the network converge to their final values. Thus, it indicates the minimum number of data samples that should be used to train the network. The results also suggest that the MSE plot over the testing data set can be used to find the optimum number of input nodes and hidden nodes for the RBF network.

Results in section 3.3 suggest that the correlation tests are adequate to detect missing input terms in the RBF network input vector. In practice, where the true system is unknown, the detection of a deficiency is more challenging and interpretation of the correlation tests is not always so straightforward. Model deficiency may occur not only because of an incorrect input vector but may be due to any incorrect design parameter or RBF centres. Fortunately, correlation tests are designed to detect all possible deficiencies in the model irrespective of the cause of the deficiency.

References
Assumption University of Thailand