Application of artificial neural networks to determine the authentication of fattening diets of Iberian pigs according to their triacylglycerol profiles

By M. Narváez-Rivas¹, E. Gallardo¹, J.M. Jurado², I. Viera-Alcaide¹, F. Pablos² and M. León-Camacho¹,*

The triacylglycerols of the subcutaneous fat of Iberian pigs were determined by gas chromatography with flame ionization detection. The pigs had been fattened under four types of feeding: montanera, recebo, extensive cebo and intensive cebo. The analyses were performed on a column with a chemically bonded stationary phase (50% phenyl-50% methyl polysiloxane), using hydrogen as the carrier gas. The subcutaneous fat was extracted by melting in a microwave oven, then filtered and dissolved in hexane. A total of 2783 samples from several campaigns were analyzed. Using the triacylglycerols as chemical descriptors, their ability to discriminate the type and system of feeding of the pigs was studied. To this end, pattern recognition techniques such as linear discriminant analysis (LDA) and multilayer perceptron artificial neural networks (ANN-MLP) were applied. ANNs gave better results than LDA, with an average prediction ability of approximately 97% in differentiating among montanera, extensive cebo and intensive cebo. When recebo was included, the average performance of the model was 82%. The fattening system was also differentiated by means of ANN, with an average performance of 96%.


INTRODUCTION
The Iberian pig is an autochthonous porcine breed from the southwestern Iberian Peninsula, traditionally fattened in a free-range (extensive) system. The type of feeding in the final stage of fattening determines the quality of the resulting Iberian dry-cured products, and therefore their market price. Consumers value most highly the products from pigs fattened in the extensive montanera system, which is based on the exploitation of acorns and grass. These products therefore reach higher prices than those from pigs fed mixed diets. Quality control measures have been proposed and established to avoid fraud in the marketing of the three commercial types of Iberian pig meat products. These types depend on the production background: montanera, recebo (fed with acorn, pasture and formulated feed) or cebo (fed with mixed feed) (López-Bote, 1998). In 2004, the Spanish government established a regulation that defined these three Iberian pig products according to the fatty acid composition (palmitic, stearic, oleic and linoleic acids) of the total lipids of the subcutaneous adipose tissue (BOE, 2004). Two practices then undermined this approach: feeds enriched with fatty acids similar to those found in acorns came into use, and formulated feeds were increasingly given in extensive systems during the final fattening period. The Spanish government therefore set a new regulation for the quality standards of Iberian pig products. It takes into account the extensive and intensive systems and distinguishes four types of products: montanera, recebo, extensive cebo and intensive cebo (BOE, 2007). However, this latest regulation does not specify any analytical method, although an efficient one is needed to prevent the wrongful use of the commercial names of higher-quality products.
Several approaches have been applied to determine the final fattening diet received by Iberian pigs. The following profiles have been related to the fattening diets of Iberian pigs:
- fatty acids (Cava et al., 1997; Flores et al., 1988; Ruiz et al., 1998);
- triacylglycerols (Díaz et al., 1996; Narváez-Rivas et al., 2009; Viera-Alcaide et al., 2007, 2008);
- hydrocarbons (Viera-Alcaide et al., 2009; Gamero Pasadas et al., 2006; Narváez-Rivas et al., 2008);
- volatile compounds (Narváez-Rivas et al., 2010, 2011).
These profiles can be used as chemical descriptors to differentiate among the feeding backgrounds of the animals. Near-infrared spectroscopy (NIRS) has also been used to authenticate the fattening diet (Arce et al., 2009; Hervás et al., 1994; Zamora Rojas et al., 2011, 2012). This technique is simple, fast, cheap and non-destructive. However, it does not perfectly differentiate among the fattening systems and gives no information about the chemical composition of the sample.
The analysis of triacylglycerols would be a useful tool to authenticate Iberian pig products from different fattening diets, because it shows several advantages:
i) it is faster than methods that require relatively long derivatization or recovery steps;
ii) it needs no previous treatment, and hence causes no losses of the analytes, since the fat dissolved in hexane is injected directly;
iii) it allows differentiation among the three fattening diets (montanera, recebo and cebo) (Viera-Alcaide et al., 2008) and between extensive and intensive systems (Gallardo et al., 2012).
The triacylglycerol profile has previously been used extensively to characterize and authenticate several foodstuffs (Bosque-Sendra et al., 2012; González et al., 2001). These include olive oil, vegetable oils, animal fats, fish oils, milk and dairy products, cocoa and coffee.
In this work, the triacylglycerols of a large number of Iberian pig subcutaneous fat samples from different campaigns have been determined by gas chromatography with flame ionization detection (GC-FID). Differences in the triacylglycerol profiles have been used to differentiate among four fattening diets (montanera, recebo, extensive cebo and intensive cebo). With the triacylglycerols as chemical descriptors, pattern recognition (PR) techniques have been applied: principal component analysis (PCA), linear discriminant analysis (LDA) and artificial neural networks (ANN).

MATERIALS AND METHODS

Samples
A total of 2783 samples of subcutaneous fat from castrated pure Iberian male pigs were analyzed (Table 1). They came from the 2002, 2003, 2004, 2005, 2009, 2010 and 2011 campaigns and were divided into four groups:
- montanera (M): 1169 samples from animals fed a fattening diet based exclusively on acorn and pasture;
- extensive cebo (EC): 448 samples from animals fed commercial feed and pasture;
- intensive cebo (IC): 134 samples from animals fed concentrated feed in an intensive system;
- recebo (R): 1032 samples from animals reared in montanera that also received commercial feeds in the final fattening period.
The outdoor rearing systems (montanera, recebo and extensive cebo) are called extensive systems (ES). When the animals are fed only with concentrated feed on farms (intensive cebo), the system is called intensive (IS). Samples were kindly provided by the Designation of Origin "Jamón de Huelva" and INIA (Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria). They were taken from different production zones in Spain (Huelva, Cádiz, Sevilla, Cáceres, Badajoz, Salamanca, Córdoba and Málaga). The animals were assigned to the groups according to the field notes taken during the final fattening period by the veterinary inspectors of the Designation of Origin and INIA.

Extraction of the subcutaneous fat
Two samples were taken from each slaughtered animal following a standardized procedure described in the literature (BOE, 2004). Briefly, a piece of approximately 3 × 3 cm and at least 6 mm thick, containing skin, adipose tissue and some loin, is cut from the rump. The cut is made about 10 cm from the tail, following the line of the back. At the laboratory, the skin and loin were carefully removed (BOE, 2004). All the chunks from the animals of a slaughter lot were minced and homogenized before extraction. The representative sample of the lot was then obtained by melting the fat in a microwave oven for 3 minutes at 360 W (De Pedro et al., 1997). The fat samples were then filtered, and 0.05 mg of fat was dissolved in 1.5 mL of n-hexane for the gas chromatographic analysis. Three replicates were determined for each sample.

Determination of the triacylglycerols by GC
Triacylglycerols were analyzed and identified by gas chromatography as described previously (Viera-Alcaide et al., 2007). The instrument was a Varian 3800 gas chromatograph (Varian Co., Palo Alto, CA, USA) fitted with a DB-17HT fused silica capillary column (Agilent J&W, Loveland, CO, USA; 30 m × 0.25 mm i.d., 0.15 µm film thickness). The oven temperature was held at 320 °C, then raised to 350 °C at 2.0 °C min⁻¹ and held isothermally for 10 min. The injector and detector temperatures were 360 °C and 370 °C, respectively. Hydrogen (2.1 mL min⁻¹, constant column flow) was used as the carrier gas and nitrogen as the make-up gas. Aliquots of 2 µL were injected.
Seventeen triacylglycerol species were identified in two ways: with standards of trilinolein (LLL), triolein (OOO), tripalmitin (PPP) and tristearin (SSS), and by comparison with relative retention times described previously in the literature (Viera-Alcaide et al., 2007). Table 2 lists the triacylglycerol species identified in the chromatograms shown in Figure 1. Palmitodiolein (POO) was used as the reference to calculate the relative retention times.

Data analysis
The seventeen triacylglycerols were used as chemical descriptors, and their peak areas as analytical signals. Each triacylglycerol was quantified as a relative percentage by area normalization, assuming an equal response factor for all species. A data matrix was built with the samples as rows and the variables as columns. Each element xᵢⱼ of this matrix is the content of triacylglycerol j in sample i. The PR calculations were made with the statistical package CSS:STATISTICA from StatSoft (Tulsa, OK, USA).

RESULTS AND DISCUSSION

Fat samples analysis
Table 3 shows the mean, standard deviation, median and range of each triacylglycerol content. These statistics are given for all the samples analyzed over the seven campaigns and for the samples grouped by the four fattening types. The results for the different triacylglycerol species are in good agreement with those reported by other authors (Viera-Alcaide et al., 2008).
- The most abundant triacylglycerol is POO, with a median value of 31.87%.
- Other major triacylglycerols are PSO, OOO and PLO, with medians of 14.05, 10.39 and 10.27%, respectively.
- POP, POPo+PLP, SOO, SOL and OOL have medians between 3.67 and 6.63%.
- The remaining compounds range from 0.21% for PPP to 1.30% for SOS.
For PLO, SOO, OOO, SOL, OOL and OLL, samples from pigs fattened in extensive systems (montanera, recebo and extensive cebo) have higher mean and median values than those from intensive fattening (intensive cebo). For the remaining triacylglycerols, the mean and median contents are higher in the intensive cebo samples.
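The quantification by area normalization and the summary statistics reported in Table 3 can be sketched in a few lines. This is a minimal illustration with the Python standard library, not the authors' STATISTICA workflow. The dictionary layout (one dict per sample, i.e. one row of the data matrix), the function names `area_normalize` and `summarize`, and the peak areas are all hypothetical.

```python
from statistics import mean, median, stdev


def area_normalize(peak_areas):
    """Relative percentage of each triacylglycerol by area normalization.

    Assumes, as in the text, an equal detector response factor for every
    species, so each peak area is simply divided by the total area.
    The returned dict plays the role of one row of the data matrix.
    """
    total = sum(peak_areas.values())
    return {tag: 100.0 * area / total for tag, area in peak_areas.items()}


def summarize(samples, tag):
    """Mean, standard deviation, median and range of one triacylglycerol
    over a group of normalized samples (the statistics listed in Table 3)."""
    values = [sample[tag] for sample in samples]
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
        "median": median(values),
        "range": (min(values), max(values)),
    }


# Hypothetical raw peak areas for three samples of one fattening group.
group = [
    area_normalize({"POO": 3.0, "OOO": 1.0}),
    area_normalize({"POO": 1.0, "OOO": 1.0}),
    area_normalize({"POO": 1.0, "OOO": 3.0}),
]
print(summarize(group, "POO"))
```

Grouping the normalized rows by fattening type and calling `summarize` for each triacylglycerol would give one block of a table laid out like Table 3.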

Classification of the different fattening types
The Kruskal-Wallis test was applied to find significant differences among the triacylglycerol contents of samples from the montanera, extensive cebo and intensive cebo fattening diets. This test calculates the H statistic and compares it with the chi-squared distribution with n − 1 degrees of freedom at α = 0.05, where n is the number of groups considered (Muth, 1999). A post hoc comparison was also carried out to identify which pairs of fattening diets differ significantly. The test was applied to samples from the 2002-2005 campaigns, because these cases were initially used to build a first model that was later extended to the 2009-2011 campaigns. Table 4 shows the H values obtained for the seventeen variables and the results of the post hoc analysis. The main findings are:
- The largest H values are obtained for OOO, PSO, OOL, POP and OLL. These variables, together with PPP, MOP, PPS, POPo+PLP, PSS and SOS, differ significantly between montanera and the two types of cebo.
- Intensive cebo differs significantly from both extensive cebo and montanera in the contents of PLL+PoLO and SOL, although with lower H values.
- POO and PLPo+MLO differ significantly between montanera and extensive cebo.
- SOO is the only variable that differs significantly in all the pairwise comparisons.
- PLO does not have a significant H value and was not considered in further calculations.
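The H statistic and the 5.99 threshold used in Table 4 can be reproduced with a short sketch. It is an illustration with the Python standard library, not the STATISTICA routine used by the authors, and it omits the post hoc comparison. The function names and the three groups of values are hypothetical. For three groups the chi-squared distribution has 2 degrees of freedom, whose upper-tail probability is exp(−x/2). The critical value at α = 0.05 is therefore −2 ln 0.05 ≈ 5.99.

```python
import math
from collections import Counter


def average_ranks(values):
    """Ranks 1..N of the pooled values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for ties, for k groups of values."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # pooled list keeps group order
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied; H is undefined")
    return h / correction


# Chi-squared with 2 degrees of freedom (3 groups): P(X > x) = exp(-x/2).
CRITICAL_H_3_GROUPS = -2 * math.log(0.05)  # about 5.99, as in Table 4

# Hypothetical contents (%) of one triacylglycerol in three small groups.
montanera = [10.8, 11.2, 11.5]
extensive_cebo = [9.9, 10.3, 10.6]
intensive_cebo = [8.7, 9.1, 9.4]
h = kruskal_wallis_h([montanera, extensive_cebo, intensive_cebo])
p_value = math.exp(-h / 2)  # valid only for 3 groups (2 degrees of freedom)
print(f"H = {h:.2f}, critical = {CRITICAL_H_3_GROUPS:.2f}, p = {p_value:.4f}")
```

With more than three groups the closed-form tail no longer applies, and a general chi-squared survival function would be needed instead.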
In light of the Kruskal-Wallis test results, the triacylglycerol contents could be considered potential chemical descriptors to differentiate the feed types of Iberian pigs. To corroborate this, principal component analysis (PCA) was applied to find trends in the sample distribution. PCA sequentially obtains new variables, called principal components (PCs), as linear combinations of the original ones. Each successive PC condenses as much of the remaining variance of the data set as possible. A plot of the cases in the space of the first two PCs therefore makes data trends easier to see in a lower dimensionality (Jolliffe, 2002).
In this case, the first three PCs account for 52.53%, 13.78% and 10.64% of the total variance, respectively. The variables contributing most to PC1 were PPP, MOP, PPS, POP, PSS, PSO, SOS, OOO, OOL and OLL. PC2 is highly correlated with PLO and PLL+PoLO, and PC3 with SOO. Figure 2 shows the distribution of the data in the plane of the first two PCs. Montanera and extensive cebo samples are partly separated by their PC1 scores. This is expected from the variables contributing most to this PC and from the Kruskal-Wallis results. Samples from intensive cebo appear completely separated from montanera in the space of these two PCs.
In light of these results, linear discriminant analysis was applied to build an adequate classification model. LDA computes discriminant functions (DFs) as linear combinations of the original variables. These DFs separate the considered categories by minimizing the ratio of the within-class to the between-class sum of squares. The model can be built stepwise, selecting only the most discriminating variables and so reducing the number of chemical descriptors (Massart, 1998). In this case, a forward stepwise approach was used, giving two DFs to differentiate montanera, extensive cebo and intensive cebo from the 2002-2005 campaigns. This model eliminates the variables POO and PLL+PoLO and uses the remaining 14 variables to solve the classification problem. The recognition ability of the model is the percentage of the samples used to compute the DFs that are correctly assigned to their class. It was 98.1%, 92.8% and 100% for montanera, extensive cebo and intensive cebo, respectively. The distribution of the samples in the plane of the two computed DFs (Figure 3) shows the three classes as separate groups.
According to these results, the contents of PPP, MOP, PPS, POP, POPo+PLP, PLPo+MLO, PSS, PSO, SOS, SOO, OOO, SOL, OOL and OLL could be used as discriminant variables to differentiate the three classes. Using these variables, different models were built to classify samples from the 2009, 2010 and 2011 campaigns. Each model was obtained from a training set of cases and evaluated on a test set. The training set established the relationship between the discriminant functions and the original variables, and the test set measured the model performance. Two parameters, sensitivity (SENS) and specificity (SPEC), were computed from the classification matrix of the test set. The SENS of a class is the percentage of its cases that are correctly classified in it. The SPEC of a class is the percentage of objects not belonging to it that are classified as pertaining to another class (Forina et al., 1991). The results of all the models are shown in Table 5.
The LDA models for the three classes performed as follows:
- LDA1a was trained with montanera, extensive cebo and intensive cebo samples from 2002-2005 and tested with samples from 2009. Prediction ability was 84.9% for montanera, 41.7% for extensive cebo and 65.6% for intensive cebo. SPEC ranged from 84.4% (intensive cebo) to 88.6% (extensive cebo).
- LDA1b added a randomly selected 2/3 of the 2009 samples to the training set and was tested on the remaining 1/3. SENS rose to 91.8% for montanera and 100% for intensive cebo. Extensive cebo had the lowest SENS (44.4%). The SPEC of all three classes also improved.
- LDA2a was trained with samples from 2002-2005 and 2009 and tested with samples from 2010. The SENS of montanera and extensive cebo were 96.4% and 74.7%, respectively. Intensive cebo had a SPEC of 100%, so no samples were wrongly classified into it, but its SENS was very low (4%).
- LDA2b added 2/3 of the 2010 samples to the training set and reached overall SENS and SPEC of 94.1% and 95.9%, respectively.
- LDA3a was trained with samples from 2002-2005 and 2009-2010 and tested with samples from 2011. It gave poor results.
- LDA3b added 2/3 of the 2011 samples to the same training cases, but the overall prediction ability hardly improved.
LDA2b might therefore seem to solve the classification problem, but the 2011 results show that it does not.
Since a linear approach did not adequately solve the classification problem, a nonlinear model was applied. Artificial neural networks (ANN) mimic the biological nervous system and can be used where other modeling techniques fail to predict complex phenomena (Zupan et al., 1993). In this case, multilayer perceptron (MLP) ANNs were applied. MLP-ANNs are feed-forward networks of neurons arranged in layers: an input layer, one or more hidden layers and an output layer, with unidirectional connections from input to output. Like LDA, ANNs learn from the data. A training set is needed to find the relationships between inputs and outputs, and a test set is used to assess the prediction ability. The basic structure of the models was a 14:8:3 MLP-ANN: 14 input neurons (one per variable used in the LDA models), 8 hidden neurons and 3 outputs (one per class). The networks were trained by back-propagation, minimizing the error made by the network. To avoid overfitting, a third set of cases, the verification set, is required to cross-validate the models (Tetko et al., 1995). The ANN models performed as follows:
- ANN1a used the same data matrix as LDA1a. The 2002-2005 samples were randomly divided into training (2/3) and verification (1/3) cases, and all 2009 samples were used as test cases. The SENS of montanera, extensive cebo and intensive cebo were 98.6%, 61.1% and 71.9%, respectively.
- ANN1b also divided the 2009 data into 1/3 training, 1/3 verification and 1/3 test cases. Prediction ability reached 100% for montanera and intensive cebo and 91.7% for extensive cebo. SPEC was 100% for both extensive and intensive cebo.
- ANN2a used the 2010 data as test cases and the previous years as training (2/3) and verification (1/3) cases. It failed to classify any intensive cebo sample correctly.
- ANN2b added samples from 2010 to the training and verification sets, which solved this problem. Overall SENS and SPEC were 98.7% and 99.0%, respectively.
- ANN3a was trained and verified with data from 2002-2005 and 2009-2010 and tested with samples from 2011. Performance declined again.
- ANN3b divided the 2011 samples into training, verification and test sets, which again improved performance. Overall SENS and SPEC were 94.2% and 97.8%, respectively.
These results show that nonlinear models, such as MLP-ANN, classify Iberian pig fat samples according to the fattening diet better than linear ones. In addition, the relationships between triacylglycerol contents and fattening diet must take into account the annual variability between campaigns. For this reason, every new model must include training samples from the campaign to be studied.
Once the classification of montanera, extensive cebo and intensive cebo is solved, the recebo samples must be considered. The developed models did not include recebo in the training set. When they are used to classify recebo samples, most of these are assigned to the montanera and extensive cebo classes. To include recebo, a new forward stepwise LDA model was computed on the 2002-2005 campaigns to find the most discriminant variables. The selected variables were the same fourteen used in the previous models. This initial LDA model has recognition abilities of 65%, 92.1%, 100% and 51.7% for montanera, extensive cebo, intensive cebo and recebo, respectively. As Figure 4 shows, recebo samples overlap with montanera and extensive cebo in the plane formed by the first two DFs. On the basis of these results, twelve new models were developed including recebo samples in the different subsets of cases. These models use the same campaigns and the same proportions of training, verification and test cases as the twelve models explained above. They therefore keep the same codes, with the letter r added to indicate that recebo is included. ANN models including recebo improve on the LDA results, especially when samples from the tested campaign are used to build the model (series br). The best result was obtained by ANN1br, with a SENS of 91.7% and a SPEC of 97.4%. SENS falls to 85.4% when samples from 2010 are incorporated (ANN2br). It falls again with samples from 2011 (ANN3br), to 68.6%; this last model correctly classified only 33.3% of recebo and 60% of extensive cebo samples. In conclusion, nonlinear models and the inclusion of samples from the tested campaign in the training set both improve on the LDA results. Considering recebo as a separate class, however, leads to unsatisfactory results.

Classification of different fattening systems
The same strategy was used to study how well LDA and ANN differentiate the fattening systems (extensive and intensive). Only SENS was computed because, with only two classes, the SPEC of one class equals the SENS of the other. Results are shown in Table 6. An initial LDA model was built from the 2002-2005 samples using a forward stepwise approach. The selected variables were PPP, MOP, PPS, POP, POPo+PLP, SOS, SOL, OOL and OLL. Figure 5 shows the sample distribution according to the scores for the computed DF. The recognition ability was 100% and 96.1% for the intensive and extensive systems, respectively.
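The SENS and SPEC values reported in Tables 5 and 6 follow directly from a classification matrix. The sketch below computes both per class. It assumes a matrix whose rows are true classes and whose columns are predicted classes; this layout, the function name `sens_spec` and the counts are hypothetical choices for illustration. The two-class example shows why only SENS needs to be reported for the fattening systems.

```python
def sens_spec(matrix, labels):
    """Per-class sensitivity and specificity (%) from a classification matrix.

    matrix[i][j] is the number of test cases whose true class is labels[i]
    and that the model assigned to labels[j] (rows: true, columns: predicted).
    """
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        in_class = sum(matrix[k])                 # cases truly in class k
        hits = matrix[k][k]                       # ... correctly assigned to k
        assigned = sum(row[k] for row in matrix)  # all cases assigned to k
        others = total - in_class                 # cases of the other classes
        intruders = assigned - hits               # other-class cases put in k
        sens = 100.0 * hits / in_class
        spec = 100.0 * (others - intruders) / others
        results[label] = (sens, spec)
    return results


# Hypothetical two-class test set: extensive (ES) vs intensive (IS) systems.
two_class = sens_spec([[90, 10],
                       [4, 46]], ["ES", "IS"])
for label, (sens, spec) in two_class.items():
    print(f"{label}: SENS = {sens:.1f}%, SPEC = {spec:.1f}%")
```

In this example the SPEC of ES equals the SENS of IS and vice versa. With two classes, every case not in one class belongs to the other, so this equality always holds.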

The LDA models for the fattening systems performed as follows:
- LDA4a used samples from 2009 as the test set, with sensitivities of 74.4% and 84.4% for the extensive and intensive systems, respectively. When 2/3 of the 2009 samples are included in the training set, SENS rises to 100% for the intensive system and 81.0% for the extensive system.
- LDA5a was trained with samples from 2002-2005 and 2009. All 2010 samples from extensive systems are correctly classified, but SENS is only 12% for the intensive system. This value grows to 100% when 2/3 of the 2010 samples are used as training cases, while SENS for the extensive system falls to 96.7%.
- LDA6a tested the 2011 samples after training with cases from 2002-2005 and 2009-2010. SENS was 91.6% for the extensive system and 50% for the intensive system. When samples from 2011 are included as training cases, these values become 71.2% and 82.6%, respectively.
ANN models were built in the same way as LDA, but 1/3 of the training cases was used as the verification set. Model ANN4a used samples from 2002-2005 to obtain a 9:10:2 network with 100% prediction ability […]

CONCLUSIONS
LDA and MLP-ANN have been applied to classify Iberian pig fat samples according to their fattening diet. Forward stepwise LDA selected fourteen variables to build the models: the contents of PPP, MOP, PPS, POP, POPo+PLP, PLPo+MLO, PSS, PSO, SOS, SOO, OOO, SOL, OOL and OLL. The models were built with samples from some campaigns and tested with cases from new campaigns. This chemometric study shows that a nonlinear approach, such as MLP-ANN, classifies better than linear ones. Because of between-year variability, the training set must include samples from the campaign to be tested. This variability may be due to factors such as the pasture composition or the amount of acorn consumed by the animals. These factors are intrinsically related to the climate and rainfall of each year (Narváez-Rivas et al., 2009; Viera-Alcaide et al., 2008). In addition, recebo samples appeared mixed with montanera and extensive cebo. This is related to the feeding procedure and sequence: feed and acorn are alternated, a practice usually followed to mimic products of higher quality. LDA and ANN models were also trained to differentiate the fattening systems. In this case, forward stepwise LDA selected the contents of PPP, MOP, PPS, POP, POPo+PLP, SOS, SOL, OOL and OLL. The same conclusions can be drawn: ANN models perform better than LDA, and samples from the tested campaign must be included to account for between-year variability.

Figure 1 Chromatogram of the triacylglycerol profile of an Iberian pig subcutaneous fat sample. Peak identification: see Table 2.
Figure 2 Score plot in the plane of the first two PCs.
Figure 3 Distribution of the samples according to their scores for the discriminant functions.

Figure 4 Distribution of the samples (including recebo) according to their scores for the discriminant functions.

Figure 5 Distribution of samples from extensive (ES) and intensive (IS) fattening systems according to the scores obtained for the computed discriminant function.

Table 3 Triacylglycerol contents (%) of Iberian pig subcutaneous fat samples analyzed during seven campaigns and grouped by fattening diet (PPP, MOP, PPS, POP, POPo+PLP, PLPo+MLO, PSS, PSO, POO, PLO, PLL+PoLO, SOS, SOO, OOO, SOL, OOL, OLL)
M: montanera; R: recebo; EC: extensive cebo; IC: intensive cebo. n: number of samples of each type.

Table 4 Kruskal-Wallis test results (columns: variable; Hᵃ; significant difference in post hoc comparison)
ᵃ Significant difference for H > 5.99

Table 5 Sensitivities and specificities (%) obtained for the models built

The LDA models including recebo performed as follows:
- LDA1ar was trained with samples from 2002 to 2005 and tested with samples from 2009. Overall SENS was 50.3% and SPEC 83.7%, and recebo samples were confused with the other three classes.
- LDA1br also included samples from 2009 in the training set, but recebo SENS was only 18.2%.
- LDA2ar was trained with the 2002-2005 and 2009 campaigns. Prediction ability for recebo samples from 2010 rose to 84%, but that of intensive cebo fell to 4%.
- LDA2br gave the best LDA results, with overall SENS and SPEC of 82.4% and 93.3%, respectively.
- LDA3ar and LDA3br, using data from 2011, reversed this improvement, with overall SENS values of 37.6% and 46.7%, respectively.