A Multi-Level And Multi-Scale Evolutionary Modeling System For Scientific Data Zhou Kangl Yan Lil.’ Hugo de Garis’ Li-Shan Kang’ 1 .Computation Center, Wuhan University, Wuhan, 430072, China 2. State Key Laboratory of Software Engineering, Wuban University, Wuhan, 430072, China 3. Computer Science Department, Utah State University, Logan, Utah 84322, USA Email: kang@whu.edu.cn Abstract - The discovery of scientific laws is always built on the basis of scientific experiments and observed data. Any real world complex system must be controlled by some basic laws, including macroscopic level, submicroscopic level and microscopic level laws. How to discover its necessity-laws from these observed data is the most important task of data mining macroscopic behavior of the complex system, to describe its microscopic behavior, while we use the natural fractal models (a kind of natural discrete wavelets). In this way, we build a multi-level and multi-scale evolutionary modeling system for observed data. This system provides a strong tool for the analysis and prediction of complex time series. The observed data of the sunspot and the precipitation of flood season are used as the test data of this system. The rest of the paper is arranged as follows: in section 2, we introduce the macroscopic ODE model; in section 3, we introduce the microscopic natural fractal model; numerical experiments are given in section 4; and in the end, section 5 @M) and KDD. Based on the evolutionary computation, this paper proposes a multi-level and multiscale evolutionary modeling system which models the maerebehavior of the system by ordinary dillerential equations while models the micm behavior of the system by natural fractals. This system can be used to model and predict the scientific observed time series, such as observed data of sunspot and precipitation of flood season, and always get good results. I. I”R0DUCITON is some conclusions. n. MACROSCOPIC ODE MODEL The complex time series are usually characteristics of multi-level and multkscale. Assume that it has two levels: macro and micro. In order to describe it in macroscopic level, the researchers take many kinds of methods to pre-handle the time series. The scientific observed data or scientific experimental data get from some reakworld systems such as physical, chemical or engineering system. These data usually are very complex, but any complex system is hound to be controlled by some basic laws, including macroscopic level laws, submicroscopic level laws and microscopic level laws. Consider the following observed data as the time series: x(to),x(t,),...,x(t,) A. Decomposition of Original Data In order to find out macroscopic laws from complex data, the first step is to decompose the original data x(ti), (1) i=O,I,Z,..m~asin(l)intotwoparts: thesmoothpartandthe coarse part (non-smooth part). We assume that the evolutionary process of the smooth part is controlled by macroscopic factors, and the evolutionary process of the coarse part is controlled by microscopic factors. The smooth part will he modeled by ordinary differential equations where to= to + ik, k is the time stepsize. As for the time series, except for the traditional method of time series analysis[’], evolutionary algorithm are used to cope with these data[21L31[41. Suppose that the observed data are controlled by macroscopic, sub-macroscopic and microscopic laws. We take the multi-level, multi-scale models for analyzing and predicting the observed data. In this paper, we use the ordinary differential equation (ODE) model to describe the 0-7803-7278-6/02/$10.00 02002 IEEE (ODE), while the coarse part he modeled by natural fractals (a kind ofmultkcale discrete wavelets). For the time series (l), we decompose it into two parts: 737 x(t,) =a(ti)+x\"(tj) i=O,l;..,m where the smooth part ?(ti) is (2) defined as: That is to say, to findfin function space F, such that This problem is solved by evolutionary modeling algorithm described in [7]. The main idea of the algorithm is to embed a genetic algorithm in genetic programming used to discover and optimize the structure of a model, while GA is used to optimize its parameters. The evolutionary modeling algorithm for higher-order and the coarse part: ordinary differential equation can he simply described as follows: x\"(ti) = x(t,) - T(t,), i = 0,1,... ,m (4) Notice: (3) and (4) is related to the value of the smooth parameter PROCEDURE 1 begin Initialize population P(0) = @,(O), pz(O), ..., ph(0)); braduce N parse trees randomly) 1 which is often proportional to m, when 1 is {?(t,)} is more smooth. bigger, the time series B. Macroscopic Higher-Order ODE Model In this subsection, we will mainly introduce how to model and predict the smooth data Since the smooth 1:=0; Evaluate the fitness ofp,(f ), i = I, 2, . . .N, while not terminate do begin data describe the macro behavior of dynamic system and determine the macroscopic tendency of the system, it is the essential part of observed data. Because it is the smooth part of observed data, we assume that T(t) is sufficiently smooth, that is, assume that X(f) E C.[fo,Tj, 10n04. The modeling problem of the dynamic system Pc(f ) := crossover {P(f )}; P,,,(f) :=mutation (!'At)}; Evaluate P,,,(f ); P(f + 1) := selection (P,,,(f ), P(f )); f :=f + 1; end Outputsolutionof pa,, :Y(ti),i = O,l;..,m+q; end X(f) is to For the details of the process ofmodeling, please refer to find an initial value problem of the nth-order ordinary differential equation: IU MICROSCOPIC FRACTAL MODEL For the coarse part {x\"(tj)}Eo of the time series (1): such that the mean square error between the values of its at fi, i = OJ, ...q and the series {.?(ti)} as solution x*(f) small as possible. ;c\"(ti)=x(tj)-Y(tj), i= O,l;..,m, we are going to build a multi-scale micro natural fractal model. A. Construcfion ofNafural Wavelefs 0-7803-7278-6/02/$10.00 02002 IEEE 738 of series (4), we divide the series {x\"(t;)},\" into I groups (fkom row to column to form the following matrix (see Table I), each column as a group, including /groups), where xi denotes ?(ti) TABLE I -- In order to search an /scale basic natural wavelet Assume that E; has an Fdistrihution with (l-1, m-/+l) degrees of freedom. For different confidence level d and (I- 1, m-l+l) degrees of freedom, we can get FA (/-I, m-l+l) from a Fdistribution table. (/-I, m-/+l), then the time series (4) exists /scale If EI0F~ basic natural wavelet x' (2) in confidence level 6. If E, < FB (/-I, m-/+I), then the series (4) does not exist /-scale basic natural wavelet in confidence level a. 2 XI Xli, ... x1,s ... xi,., B. Microscopic Natural Fractal Model In order to build the mathematical model for the coarse part ?(t) of the time series (4), we construct a multi-scale average I x,' natural fractal model with scale /=2, 3, _.., L. The process can be described as follows: --I 2; ... Xs+! ... Y/! PROCEDURE 2 begin for i=O, m do %(i) := x\"(ti) ; where k,. irS+1 k-1 endfor for i=O,mtq i>S+1 do x *(i):= WhenS=/-I, k,*=k, then (hly/=k. Through the points (ti_,,?: endfor 0; ),i =1,2;..,1 in the xClt for / =2, L, do plane, we can get a polygonal line as follows: using (16) to calculate E, ; if E, OFa(/-l, m-/+I) then begin P E, = otherwise for i=O,m+q do Evidently, the function x'(/ ) has a local compact support [to, t/./], we call it/-scale basic natural wavelet. In order to test whether x'(t ) is a basic natural wavelet of time series (4), we introduce the variance ratio: Ak:(Z/ -?)'/(/-I) i=1 endfor for i=O,m do (12) endfor end ~2(x,,,,,+,, -x, ) @-/+I) e, j4 --I ./ 0-7803-7278-6/02/$10.00 02002 IEEE 739 endfor E we can build a multi-level and multi-scale evolutionay modeling system for fitting and prediction of complicated time series. Firstly, call PROCEDURE 1 to build the ODE (5), and use Runge-Kutta method to solve it to get the fitting :=o; E(i) :=?(ti) - x* (i) ; E:= &+ &(i); for i=O,m do and prediction values of the smooth part x*(ti),i=o,1;..,m+q, then call PROCEDURE 2 to get the fitting and prediction values of the coarse part: endfor E = &/(m +l); for i=O,m+q Q x\"'(t),i=O,l;..,m+q. Adding up these data, we can get the needed fitting and prediction values: x\" * (ti) := x* (i) +8; endfor output end Remark 1: The output ?'(tj)+Z*(ti) = x'(ti), i =O,l;..,m+q This procedure can be described as follows: PROCEDURE 3 begin Decomposedatax [O,m] into qO,fE] and Y[O,m]; ?'(t,),i= O,I;..,m+q; {?*(ti )}Eo is the fitting part of is the prediction part of {x\"(ti)ho and {?*(ti)}:=,+, C~~IPROCEDUREI toget x'[O,m+q]; C~IIPROCED~~ to get ?*[o,m +q]; x\"(t). E is the average fitting error, and it has been eliminated as the correction (random error). Remark 2: The first part of the procedure is to test successively whether the time series exists basic natural wavelet with scale I which is less than L, usually LO(m+l)/3, for some special problems, where m is relatively small, L can be magnified to LO(m+1)/2. if it exists, then prolong it periodically to whole interval series for i=O,m Q x'(t,) :=X'(i)+?'(z]; e(i ) : =x(i ) -x*(tj ); endfor for i=m+l,m+q do r,,, f,+] and add to the time x'(ti) :=X*(i)+x\"*(i]; endfor O,l,. . .@q; output x*(tj). i = output e(i),i=O,l, ..q; end Remark 1: The first step of the procedure is to decompose the original time series into two parts: the smooth part and the coarse(non-smooth) part. Remark 2: The second step of the procedure is to call PROCEDURE 1 to deal with the smooth data (.F(t)} and get an ODE model of and the values of its solution: (x * (i)},\"\" . Remark 3: The second part of the procedure is to evaluate the random error of fitting and prediction. (?(ti)}, and then correct it when C. The Mulli-level and Multi-scale Evolutionary Modeling System Using the macroscopic modeling PROCEDURE 1 of ntharder ordinary differential equation (S), and the microscopic modeling PROCEDURE 2 of natural fractal n'(to),X'(t,),...,X'(t,+q) , where the first m+l values 0-7803-7278-6/02/$10.00 02002 IEEE 740 are the fitting values ofF(t) , and the later q values are the prediction values ofF(t) . Remark 3: The third step of the procedure is to call PROCEDURE 2 to deal with the coarse data x\"(t) and get a multiscale natural fractal model and its solution: ODE model as follows: dx -67.466530 -64.671898 *-+52368.285156 *cos@) -= d'x dt dt sin(exp(t)) dt The results are shown in Fig.] x\"'(to),~'(tl),...,~'(tm+q) , where the first m+l values are the fitting values of x\"(t), and the laterq values are the prediction values ofX(t) at tm+l. tm+z .... fmiq Remark 4: 7he fourth step of the procedure is to combine the fitting values of the smooth part with those of the coarse part of x(t) to get the time series {x * (ti)}rand the fitting The fifth step of the procedure is to conhine error {e(i)}y. the prediction values of the smooth part with those of the coarse part of x(t) to get the prediction values of x(t) at the time tm+,, t,,,+z, .../m+q, where the prediction lengthq can be decided by the users. - 0- - . . . . .. . nanyrnavu Fig. I. the fitting and prediction curves for sunspot numbers B. Modeling of the Precipitation of Flood Season The observed data of precipitation of flood season from 1880 to 1973 are taken from Weather Center of Wuhan Station shown in Fig.2. We take the observed data of the first 89 years as the training data to build models which are used to predict the precipitation of flood season of the last 5 years. Parameter settings of the modeling experiments are N. NUMERICAL- In this section, we mainly study the applications of multi-level and multkcale evolutionary modeling system to the scientific observed data of the sunspot and the precipitation of flood season. m94, 1=20 for smoothing, m=89, q=5, fo=O, At =O.Ol(one year), N=lOO, n=l (the first-order ODE) for macroscopic ODE model, and m89, q=5, L=44, h=O.l for microscopic natural fractal model. We get a first-order ODE model as follows: A. Modeling of the Observed Data of Sunspot The observed data shown in Fig.1 are taken from [8] giving the annual sunspot numbers over 176 years, from _- dr -16.915123 +(2~-129.304474)*sin(19.496265*~) dt The results are shown in Fig.2. 1749 to 1924. We take the observed data of the first 172 years as historical data (training data) to build models which are used to predict the sunspot numbers of the last 4 years. Parameter settings of the modeling experiments are m176, /=IO for smoothing, m=166, q=lO, tpO, At = 0.01 (one year), N=100, n=2 (the second-order ODE) for 0.05 macroscopic ODE model, and -166, q=10, L=83, Q= for microscopic natural fractal model. We get a second-order 0-7803-7278-6/02/$10.00 02002 IEEE 741 [4] T-C.Fu, F-L.Chung, V.Ng and R.Luk, Evolutionary segmentation of financial time series into subsequences, in Proceedings of the 2001 IEEE Congress on Evolutionary CompuIation, Seoul, Korea, 426430, May 27-30,2001. [5] C.Hafner and J.Frohlich, Generalized function analysis using hybrid evolutionary algorithms,Proceedings of the Congress on Evolutionary Computation, Washington D.C, USA, Vol. I, 287-294,1uly&9,1999. complexity of [6] L.Kang, Y.Li and Y. Chen, A tentative research on automatic programming, Wuhan University Journal ofNalural Sciences, Va1.6, No.1-2, 59-62, 2001. [7] H.Cao, L.Kang and Y. Chen, Evolutionary modeling of systems of Fig. 2. the fitting and prediction curves for precipitation of flood season ordinary differential equations with genetic programming, Genetic Programming and Evolvable Machines, Vol.1, No.4, 309-337,2000. [8] R.C. Gan, The Statistical Analysis of Dynamic Data, Beijing, Beijing ' V. CONCLUSION Compared with most available modeling methods, the multi-level and multkcale evolutionary modeling system has the following advantages: Firstly, the entire process is automatic and requires little information in the way of the real-world system or expertise. Secondly, it allows one to model the macro-behavior of the system by ordinary differential equations and to model University of Science and Technology Press, China, 1991. the micro-behavior of the system by multi-scale natural fractals simultaneously. Finally,. the models discovered by computers from the observed data can fit and predict the annual data quite well, and the structures of the models are unimaginably to humans. Acknowledgment This work is supported by the National Natural Science Foundation . of China (Nos. 60133010,, 60073043, 70071042). References [I] C.Chatfield, The Analysis of Time Series: An Introduction, Fifth Edition, Chapman & Hall, 1996. [2] K.R.Vazquez, Genetic programming in the time series modeling An application lo meteorological data, in Proceedings of the 2001 IEEE Congress on Evolutionary Computation, Seoul, Korea, 261-266,May 27-30, 2001. [3] Y.Liu and X.Ya0, Evolving neural network for Hang Seng stock index forecast, in Proceedings of the 2001 IEEE congress on Evolutionary Computation, Seoul, Korea, 256-260, May 27-30.2001, 0-7803-7278-6/02/$10.00 02002 IEEE 742