Prediction of a model enzymatic acidolysis system using neural networks

This study presents a model for the acidolysis of triolein and palmitic acid catalyzed by an immobilized sn-1,3 specific lipase. A neural network (NN) based model was developed to predict the concentrations of the major products of this reaction: 1-palmitoyl-2,3-oleoyl-glycerol (POO), 1,3-dipalmitoyl-2-oleoyl-glycerol (POP) and triolein (OOO). Substrate ratio (SR), reaction temperature (T) and reaction time (t) were used as input parameters. The optimal architecture of the proposed NN model, which consists of an input layer with three inputs, a hidden layer with seven neurons and an output layer with three outputs, was able to predict the concentrations of the reaction products with a mean square error (MSE) of less than 1.5 and an R of 0.999. An explicit formulation of the proposed NN model is presented. The results show that neural networks predict acidolysis reactions very well.


INTRODUCTION
Among the methods used to improve the nutritional and functional properties of fats and oils, enzymatic interesterification has the greatest potential (Goderis et al., 1987). One of the most popular of these methods is the acidolysis of fats and oils by sn-1,3 specific lipases to produce structured lipids, cocoa butter equivalents, human milk fat substitutes, and so on (Undurraga et al., 2001; Quinlan and Moore, 1993). These kinds of lipids are produced by the incorporation of specific fatty acids at specific positions of the triacylglycerols or glycerides (Xu, 2003). The functional properties of lipids depend not only on their fatty acid composition, but also on the distribution of the fatty acids on the glycerol backbone (Iwasaki and Yamane, 2000).
The use of specific lipases offers an advantage in controlling the product and its composition (Xu, 2000). Several factors affect the composition of the final product of enzymatic acidolysis. Substrate ratio, temperature and time are the major parameters that must be controlled (Willis and Marangoni, 2002). However, biological systems of this kind are generally complex, and it is difficult to estimate the results under any given condition. The relationship between parameters in biological systems is generally nonlinear (Braake et al., 1998).
Estimating the progress of the reaction is required both for improving product quality and for the highest economic return. This is usually an engineering task involving chemical modelling and mathematical manipulation. In many circumstances, developing a model that expresses physical, chemical or thermodynamic laws is very difficult and probably unjustified (Xu, 2003).
In all these situations, the practical estimation of a real system can be made with mathematical tools. In the past decade, some well-established modelling techniques, such as artificial intelligence and evolutionary computing, were developed for the analysis of biological phenomena (Manohar and Divakar, 2005). They can handle the highly nonlinear problems typical of enzymatic processes. Neural network (NN) modeling in the estimation and prediction of food properties and process parameters is of increasing interest.
OZAN NAZIM ÇİFTÇİ, SİBEL FADILOĞLU, FAHRETTİN GÖĞÜŞ AND AYTAÇ GÜVEN
NN modeling has been used in the prediction of the thermal conductivity of bakery products (Sablani et al., 2002), dough rheological properties (Ruan et al., 1995), physical properties of ground wheat (Fang et al., 1998), thermal conductivity of fruits and vegetables (Hussain and Rahman, 1999), antioxidant capacity of cruciferous sprouts (Bucinski et al., 2004) and, more recently, esterification of anthranilic acid with methanol (Manohar and Divakar, 2005). There are papers about the use of neural networks in enzymology (Linko and Zhu, 1992; Linko et al., 1997; Zhu et al., 1996). Baş et al. (2007) investigated the use of artificial neural networks (ANN) for the estimation of the enzymatic reaction rate for amyloglucosidase hydrolysed maltose. Bhagwat et al. (2004) studied the kinetic and molecular modeling of the transesterification of ethyl acetate and substituted ethanols with porcine pancreatic lipase (PPL) and Candida cylindracea lipase (YL). Rajendran and Thangavelu (2007) used a predictive model including neural networks for a sequential optimization strategy to enhance lipase (triacylglycerol acylhydrolases, EC 3.1.1.3) production by Bacillus sphaericus in submerged cultivation. Boareto et al. (2007) proposed a hybrid neural model (HNM) for the on-line monitoring of lipase production by Candida rugosa. Basri et al. (2007) compared the estimation capabilities of response surface methodology (RSM) and artificial neural network (ANN) in the lipase-catalyzed synthesis of palm-based wax ester. They observed a clear superiority of ANN over RSM.
Traditionally, NNs are used as black-box models in which no one is interested in the fundamental hidden formulation. Input data are usually introduced into the black box and the output is obtained without any understanding of what happens inside the box (Guven et al., 2006). The performance of the model is then evaluated by comparing the output of the model with the observed data. The question to be answered is how one can apply this kind of model in any other study when the model has not been formulated explicitly.
The objective of this study is to carry out a multi-layer perceptron NN analysis and to develop an explicit formulation of the production of the major acidolysis reaction products (1-palmitoyl-2,3-oleoyl-glycerol (POO), 1,3-dipalmitoyl-2-oleoyl-glycerol (POP) and triolein (OOO)) as a function of the experimental variables: substrate ratio (SR), temperature (T) and reaction time (t). In this system POP is the main product, POO is the intermediate (side) product, and OOO is the residual reaction output. The main advantage of this study over existing NN studies is that an explicit formulation is derived to estimate the concentrations of POO, POP and OOO produced.

Enzymatic acidolysis
Triolein (0.1 mM) and palmitic acid (0.2-0.6 mM) were dissolved in 5 mL hexane in 50 mL Erlenmeyer flasks. Reactions were carried out with 10% enzyme concentration (based on the weight of substrates) in a rotary shaking incubator (New Brunswick Scientific, model Nova 40, USA) at 200 rpm and at 40, 50 and 60°C. The progress of acidolysis was followed over a period of 72 h. Aliquots of 50 µL were withdrawn from the reaction mixtures at certain time intervals into glass vials and stored at -20°C prior to analysis.

Triacylglycerol analysis by HPLC
The triacylglycerol (TAG) composition of the reaction products was followed by reversed-phase high performance liquid chromatography (HPLC). Samples were diluted in acetone, filtered and injected into the HPLC. The HPLC system consisted of a quadratic pump (model LC-10ADVP; Shimadzu, Japan) equipped with a column (Sphereclone 5 µ ODS (2), 250 x 4.6 mm; Phenomenex, USA) with an accompanying guard column (40 x 3 mm i.d.) of the same phase and an ultraviolet (UV) detector (Hewlett Packard Series 1100). Elution was monitored by UV absorbance at 215 nm. The mobile phase consisted of acetone and acetonitrile (50:50, v/v) at a flow rate of 1.0 mL/min. The column temperature was set at 50°C with a column heater (Eppendorf CH-30 column heater).

Overview of NNs
The NN technique is a data processing tool that mimics the function of the human brain and nerves. It is built on so-called neurons (processing elements) connected to each other. Artificial neurons are organized in such a way that the structure resembles a network. This technique differs from traditional data processing in that it learns the relationship between the input and output data (Hecht-Nielsen, 1990).
Multilayer network models usually consist of three layers: the input, hidden and output layers. The input layer consists of input nodes representing the input variables. The outputs of the input nodes are normalized and transferred to the hidden layer, in which they are processed through a transfer function. The output layer consists of the output variables.

The basic element of an NN is an artificial neuron, as shown in Figure 1, which consists of three main components: weight, bias and an activation function. Each neuron receives inputs $x_j$ ($j = 1, 2, \ldots, n$), each attached with a weight $w_{ij}$ that shows the connection strength for that particular input. Every input is then multiplied by the corresponding weight of the neuron connection and summed as

$$W_i = \sum_{j=1}^{n} w_{ij}\, x_j \tag{1}$$

A bias $b_i$, a type of correction weight with a constant non-zero value, is added to the summation in Equation (1) as

$$U_i = W_i + b_i \tag{2}$$

In other words, $W_i$ in Equation (1) is the weighted sum of the $i$th neuron for the input received from the preceding layer with $n$ neurons, $w_{ij}$ is the weight between the $i$th neuron in the hidden layer and the $j$th neuron in the preceding (input) layer, and $x_j$ is the output of the $j$th neuron in the input layer. After being corrected by a bias as in Equation (2), the summation is transferred using a scalar-to-scalar function called an "activation or transfer function", $f(U_i)$, to yield a value called the unit's "activation", given as

$$y_i = f(U_i) \tag{3}$$

Activation functions serve to introduce nonlinearity into NNs, which makes them more powerful than a linear transformation.
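The three steps described for a single neuron (weighted sum, bias correction, activation) can be sketched in a few lines of Python. This is only an illustration: the function name `neuron_output`, the default choice of `math.tanh` as the activation, and the numerical values are assumptions made here, not values from the study.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Return f(U) for one neuron, with U = sum_j w_j * x_j + b."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))  # Equation (1)
    corrected = weighted_sum + bias                             # Equation (2)
    return activation(corrected)                                # Equation (3)


# Hypothetical inputs, weights and bias, chosen only for illustration.
example = neuron_output(inputs=[0.5, -0.2, 0.1],
                        weights=[1.0, 2.0, -1.0],
                        bias=0.0)
```

Passing a different callable as `activation` (for example the identity) makes it easy to inspect the bias-corrected sum on its own.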

NN architecture
In this study, the usual feed-forward multilayer NN (Rumelhart and McClelland, 1986) with a single hidden layer was considered. One of the most important tasks in NN studies is to determine the optimal network architecture, which is related to the number of neurons in the hidden layer. Generally, a trial and error approach is used. In this study, the best architecture of the network was obtained by trying different numbers of neurons. The trial started from two neurons, and the performance of each network was checked using the Mean Absolute Percentage Error (MAPE), defined as

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{d_i - y_i}{d_i}\right| \times 100$$

where $N$ is the number of exemplars in the training set, $d_i$ is the desired output, and $y_i$ is the computed output. The goal is to minimize MAPE to obtain a network with the best generalization.
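As a small illustration of the error measure, the sketch below computes MAPE from lists of desired and computed outputs. It assumes the standard percentage form of MAPE (absolute relative error averaged over the exemplars and multiplied by 100); the function name `mape` is chosen here for illustration.

```python
def mape(desired, computed):
    """Mean absolute percentage error between desired and computed outputs."""
    if len(desired) != len(computed) or not desired:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each desired value d must be non-zero, since the error is relative to d.
    total = sum(abs((d - y) / d) for d, y in zip(desired, computed))
    return 100.0 * total / len(desired)
```

For example, desired values of 10 and 20 with computed values of 9 and 22 are each off by 10% of the desired value, so the function returns a MAPE of 10.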
The relationship between the number of neurons, ranging from 2 to 10, and the corresponding MAPE values is presented in Figure 2. Figure 2a shows that MAPE values decrease with an increasing number of neurons in the training stage. Therefore, the fit of the network improves in the learning process as the number of neurons increases. In the testing process, however, MAPE values decrease with the increasing number of neurons until the number of neurons reaches seven and then start to increase, which implies that the network generalizes better with an increasing number of neurons until an optimum value is reached. Beyond this optimum point the network becomes specialized on the training set only and no longer produces reasonable results in the testing stage. This behavior is commonly observed in NN studies. The coefficient of determination, $R^2$, is also shown in Figure 2. $R^2$ is only slightly affected by the increasing number of neurons in the training stage (Figure 2b) up to six neurons, beyond which no change was noticed. However, Figure 2b shows that $R^2$ starts to decrease with the increase in the number of neurons after the seventh neuron. These findings are in agreement with the MAPE results. Based on these analyses, the optimal architecture of the NN was constructed as 3-7-3, representing the number of inputs, hidden neurons, and outputs, respectively (Figure 3).
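The selection rule described above (keep the hidden-layer size at which the testing MAPE is lowest) can be written as a one-line search. The sketch below is illustrative only: the function name `select_hidden_size` is an assumption, and the MAPE numbers are invented placeholders that merely mimic the reported trend of a minimum at seven neurons. They are not the values plotted in Figure 2.

```python
def select_hidden_size(test_mape_by_size):
    """Return the hidden-layer size with the smallest testing MAPE."""
    if not test_mape_by_size:
        raise ValueError("at least one candidate size is required")
    return min(test_mape_by_size, key=test_mape_by_size.get)


# Placeholder testing MAPE values (hypothetical, NOT data from the study).
hypothetical_test_mape = {2: 9.1, 3: 7.4, 4: 6.0, 5: 4.8, 6: 3.9,
                          7: 3.1, 8: 3.6, 9: 4.2, 10: 5.0}
best_size = select_hidden_size(hypothetical_test_mape)
```

Selecting on the testing error rather than the training error is what guards against the over-specialization noted in the text, since the training error keeps falling as neurons are added.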
In this architecture, the tangent-sigmoid transfer function is used as

$$f(U_i) = \frac{2}{1 + e^{-2U_i}} - 1$$

Most engineering applications of NNs are based on the back-propagation training algorithm [14]. In this study, the Levenberg-Marquardt backpropagation algorithm was employed to minimize the mean square error (MSE) of the network.
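To make the transfer function and the 3-7-3 layout concrete, the following sketch implements the tangent-sigmoid and a plain forward pass. Several things here are assumptions for illustration: the names `tansig` and `forward`, a linear (identity) output layer, and the placeholder weights, which are not the trained weights of the study.

```python
import math


def tansig(u):
    """Tangent-sigmoid 2 / (1 + exp(-2u)) - 1, mathematically equal to tanh(u)."""
    # For very negative u, math.exp(-2u) can overflow; math.tanh avoids that.
    return 2.0 / (1.0 + math.exp(-2.0 * u)) - 1.0


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: tansig hidden layer, then an assumed linear output layer."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w_out, b_out)]


# Placeholder 3-7-3 weights (hypothetical values, for shape illustration only).
w_hidden = [[0.1, 0.1, 0.1] for _ in range(7)]   # 7 hidden neurons x 3 inputs
b_hidden = [0.0] * 7
w_out = [[0.5] * 7 for _ in range(3)]            # 3 outputs x 7 hidden neurons
b_out = [0.0] * 3
example_output = forward([0.2, -0.1, 0.4], w_hidden, b_hidden, w_out, b_out)
```

The training step itself (Levenberg-Marquardt) is not sketched here; the code only shows how a trained set of weights would be evaluated.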

The data (99 points in total) taken from the experimental study were used as training and testing sets for the chosen NN architecture. Among these, 20 (20% of the total) were reserved for the test set and the remaining data were used for training. The overall performance on both sets was evaluated by the MSE, the slope a and intercept b of the best-fit line, and the determination coefficient ($R^2$).
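A minimal sketch of this evaluation protocol is given below. It assumes that the 20 test points are drawn at random (the text does not state how they were chosen), that the best-fit line regresses predicted values on experimental values by ordinary least squares, and that $R^2$ is the squared Pearson correlation. The names `split_indices` and `fit_metrics` are hypothetical.

```python
import random


def split_indices(n_total, n_test, seed=0):
    """Randomly partition range(n_total) into (train, test) index lists."""
    rng = random.Random(seed)
    test = sorted(rng.sample(range(n_total), n_test))
    test_set = set(test)
    train = [i for i in range(n_total) if i not in test_set]
    return train, test


def fit_metrics(observed, predicted):
    """Return MSE, slope a, intercept b and R^2 for predicted vs observed."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two or more paired values")
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("observed and predicted values must not be constant")
    mse = sum((x - y) ** 2 for x, y in zip(observed, predicted)) / n
    slope = sxy / sxx
    return {"mse": mse, "a": slope, "b": mean_y - slope * mean_x,
            "r2": sxy ** 2 / (sxx * syy)}


train_idx, test_idx = split_indices(n_total=99, n_test=20)
```

A perfect model would give a = 1, b = 0, MSE = 0 and $R^2$ = 1, which is the reference against which the reported slopes and intercepts can be read.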

RESULTS AND DISCUSSION
Acidolysis reactions were carried out at different substrate ratios, temperatures and times. The enzyme load used was 10% of the substrates, following recommendations for this kind of acidolysis reaction (Xu et al., 1998). The major factors affecting product formation were investigated. Here, SR is the ratio of the moles of triolein to the moles of palmitic acid. The SR range was kept between 0.17 and 0.5, because SR = 0.5 is the limiting value for incorporating all of the palmitic acid into the 1 and 3 positions of triolein, and it is unnecessary to decrease SR below 0.17. It is known that excess free fatty acids in the medium acidify the enzyme layer because of high levels of free or ionized carboxylic acid groups, or may cause desorption of water from the interface, and this decreases the activity of the enzyme (Kuo and Parkin, 1993). The temperature range recommended for Lipozyme IM is from 30 to 70°C (Xu et al., 1998). However, at 30°C the solubility of the substrates is low, and 70°C is above the boiling point of hexane. High temperatures may also initiate acyl migration (Xu, 2000). Therefore, a temperature range of 40 to 60°C was selected in our study.
Explicit neural network formulations (ENNFs) for the prediction of the formation of the major acidolysis products of triolein and palmitic acid (POO, POP and OOO) as a function of the input parameters were derived. The input parameters and weights of the trained NN were extracted to form an explicit expression in the following manner.
Each input was multiplied by a connection weight (Equation 1), the biases were then added to this product (Equation 2), and finally the sum was transformed through a transfer function (sigmoid) (Equation 3) to generate an output. In order to acquire accurate results from the ENNF, prior to the training process of the NN, the input and output parameters were normalized to the range (-0.95, 0.95) using normalization coefficients c and d specific to each parameter used in the NN training process. The normalization coefficients of each parameter used in the ENNF are given in Table 1. These parameters are substrate ratio (SR), temperature (T) and time (t). Taking these independent parameters into account, POO, POP and OOO can functionally be expressed as

$$(\mathrm{POO},\ \mathrm{POP},\ \mathrm{OOO}) = f(SR,\ T,\ t)$$

The ENNFs for POO, POP and OOO derived from the proposed NN model are given by Equations 9-11, in which the terms $U_i$ ($i = 1, \ldots, 7$) are given as

$$U_1 = 3.0838\,SR + 0.3764\,T - 0.1675\,t - 24.6233 \tag{12a}$$
$$U_2 = -3.0959\,SR - 0.0704\,T - 0.0730\,t + 5.8976 \tag{12b}$$
$$U_3 = -21.7471\,SR - 1.9690\,T + 1.1340\,t + 118.7453 \tag{12c}$$
$$U_4 = -5.0523\,SR - 0.0117\,T - 0.0069\,t + 2.0016 \tag{12d}$$
$$U_5 = 5.9807\,SR - 0.3080\,T + 0.0767\,t - 18.2323 \tag{12e}$$

It should be noted that the ENNFs in Equations 9-11 are valid for parameters ranging between the maximum and minimum values given in Table 1.
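The hidden-layer expressions listed as Equations 12a-12e can be evaluated directly, which is the sense in which the formulation is explicit. The sketch below does this for the five neurons whose coefficients appear in the text. It assumes that SR, T and t are entered in the physical units of the experiments (T in °C, t in h) and that the hidden activation is the tangent-sigmoid, computed here with `math.tanh`. The function and variable names are hypothetical, and the example operating point is chosen only for illustration. Because the output-layer weights are not listed in this text, the sketch stops at the hidden layer and does not compute POO, POP or OOO.

```python
import math

# Coefficients (SR, T, t, constant) copied from Equations 12a-12e.
HIDDEN_COEFFS = {
    1: (3.0838, 0.3764, -0.1675, -24.6233),
    2: (-3.0959, -0.0704, -0.0730, 5.8976),
    3: (-21.7471, -1.9690, 1.1340, 118.7453),
    4: (-5.0523, -0.0117, -0.0069, 2.0016),
    5: (5.9807, -0.3080, 0.0767, -18.2323),
}


def hidden_preactivations(sr, temp_c, time_h):
    """Evaluate U_i for the listed hidden neurons at one operating point."""
    return {i: a * sr + b * temp_c + c * time_h + d
            for i, (a, b, c, d) in HIDDEN_COEFFS.items()}


def hidden_activations(sr, temp_c, time_h):
    """Apply the tangent-sigmoid (tanh) to each U_i."""
    return {i: math.tanh(u)
            for i, u in hidden_preactivations(sr, temp_c, time_h).items()}


# Illustrative operating point: SR = 0.5, T = 50 degrees C, t = 24 h.
u_values = hidden_preactivations(0.5, 50.0, 24.0)
h_values = hidden_activations(0.5, 50.0, 24.0)
```

The same arithmetic fits in a spreadsheet row: one cell per $U_i$ and one cell per hyperbolic tangent.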
The ENNF shows close agreement with the experimental values, and its accuracy is quite high. The ENNF gives a fast and practical formulation of the concentrations of acidolysis products, which encourages the use of ENNFs in other food engineering and biotechnology studies. In current applications of NN technology, biological processes are usually modeled by an NN that is presented as a black box. This study therefore differs from others in that it presents the NN as an explicit formulation.
The robustness of the proposed NNs was demonstrated by plotting three-dimensional graphs of randomly selected hypothetical SR, T and t values versus POO, POP and OOO concentrations (Figures 6 and 7). Experimental data were used to plot Figure 6 to show the effect of SR, T and t on the reaction. Figures 6a and 6b show the change in POP and OOO concentrations, respectively, with changing SR and t values at 50°C. Figure 6c shows the effect of temperature and time on POO concentration at an SR value of 0.33. Figures 7a and 7b are plotted using ENNF-estimated data to show the change in POP and OOO concentrations, respectively, with changing SR and t values at a hypothetical temperature of 48°C. Figure 7c shows the relationship between T, t and POO yield at a hypothetical SR value of 0.25. The experimental data show that product yields (POP and POO %) increase with decreasing SR. This increase continues up to a certain reaction time, after which the yields stay nearly constant. Temperature has little effect in this range, but lower temperatures gave a higher yield of the main product of the system (POP). The same conclusions can be drawn from the graphs obtained with ENNF-estimated data. The figures show that the ENNF captured the nonlinearity of the system very well.

CONCLUSION
NN predictions for the test data demonstrate a high generalization capacity of the proposed model, with relatively low error and high correlation, indicating successful performance of the NN model in predicting POO, POP and OOO concentrations in both the training and testing stages.
The proposed ENNF is so simple that it can be used by any scientist, even one unfamiliar with NNs, in a spreadsheet on a very simple PC or even on a hand calculator once the input parameters are measured.
The artificial neural network (ANN) is a convenient and inexpensive tool and a promising method for modeling the biological properties of food. The ANN therefore appears suited to the control of complex enzyme-catalyzed reactions. This can reduce the cost of processing and increase the quality of a product.

Figure 1. Basic elements of an artificial neuron.
Figure 3. The optimal NN architecture.

Figures 4 and 5 compare the NN predictions with the experimental data via scatter plots for the training (Figure 4a, b, c) and testing (Figure 5a, b, c) sets. Figure 4 shows that the proposed NN model learned the nonlinear relationship between the input and output variables very well, with MSE = 1.455 (a = 0.994, b = 0.237) and $R^2$ = 0.995 (Figure 4a); MSE = 1.027 (a = 1.000, b = 0.052) and $R^2$ = 0.999 (Figure 4b); and MSE = 0.795 (a = 0.998, b = 0.010) and $R^2$ = 0.995 (Figure 4c). Comparing the NN predictions with the experimental data for the test stage demonstrates the high generalization capacity of the proposed model, with relatively low error and high correlation: MSE = 2.423, a = 0.955, b = 2.232 and $R^2$ = 0.994 (Figure 5a); MSE = 3.690, a = 0.994, b = 0.005 and $R^2$ = 0.995 (Figure 5b); and MSE = 0.957, a = 0.980, b = 0.357 and $R^2$ = 0.995 (Figure 5c). This indicates successful performance of the NN model in predicting POO, POP and OOO concentrations in both the training and testing stages. Table 2 shows the NN predictions of POO, POP and OOO and the experimental results together with the input parameters.
Figure 4. NN predictions (POO, POP and OOO) and experimental values for training sets.
Figure 6. (a) Surface plot showing the effect of SR and t on POP concentration at 50°C, plotted using experimental data; (b) surface plot showing the effect of SR and t on OOO concentration at 50°C, plotted using experimental data; (c) surface plot showing the effect of T and t on POO concentration at SR = 0.33, plotted using experimental data.