A chemometrics study of analytical panels in virgin olive oil. An approach for evaluating panels in training

A mathematical procedure comprising Canonical Correlation, Factor and Cluster Analysis, and Multiple Regression was designed to study whether two analytical panels with different levels of training could be considered identical. To evaluate whether a panel in training, or one that has just completed it, performs as well as an expert panel, the correlations between the attributes of both panels, the partial correlation matrices, the variance explained by the most important factors and the similarity between the clusters were analysed, among other things. In addition, the three most important factors were correlated with the final quality scores, to reveal the interrelation between the profile sheet and the grading table. Finally, a multiple regression analysis shows the model that the panels have built for the virgin olive oil samples evaluated. Trends for future research are also suggested.

In 1974 Stoner et al. published the quantitative descriptive analysis (QDA) method, opening a new way in sensory evaluation. The consequence was an increasing use of descriptive and quantitative multiscalar tests in the sensory analysis of foods, which have become as widely used as Bengtsson's triangular test (Bengtsson, 1943). However, the amount of information gathered by QDA could not be analysed mathematically without multivariate statistical algorithms. Undoubtedly, sensory and quality problems are generally multivariate problems (Resurreccion, 1988b; Powers, 1988). Thus, factor analysis procedures such as principal components or principal factors (Clapperton et al., 1979), stepwise discriminant analysis (Ennis et al., 1980), different kinds of cluster programs (Sheppard, 1980; Resurreccion, 1988a) and multiple regression (Resurreccion, 1988b) have been among the statistical methods most widely used in reaching conclusions.
As far as the authors are aware, those statistical studies have generally been used to find interrelations among attributes, between their intensities and the final evaluation, or between objective measures and a sensory measure. This work follows a different approach: it discusses a methodology for determining whether the level of training of a panel is similar to that of an expert panel, using a panel with long experience as the standard and a less experienced one as the instrument to be calibrated. The authors consider that the panel, made up of human panelists, is the sensory measurement instrument, and they try to evaluate it.
according to the research goal and the number of DVs, IVs and covariates. However, the authors' intention is not only to discuss differences between the panels but also to try to understand where and why the differences arise and how to correct them, if possible. The authors' approach therefore follows a stepwise methodology in which the results of the statistical procedures of principal components, regression and cluster analysis are used as clues to decide the next step.
Two panels of the Instituto de la Grasa, both trained by the same method, one with more than ten years' experience in evaluating the quality of virgin olive oil and the other with only three months', were used to develop the proposed methodology. The paper initially considers the hypothesis "The two panels are different from one another" and, treating it as a null hypothesis, attempts to demonstrate that there is only one panel. Other conclusions drawn from the statistical procedures relate to the sensory terms of the food evaluated (virgin olive oil) and to the similarities and dissimilarities between the panels. Thus, the intention of the paper is to present research results and, at the same time, to explain the statistical methodology applied.

Sensory evaluation of oils
A data set of 24 samples of virgin olive oil from more than one crop was collected from different zones all over Spain. The samples were evaluated by two panels according to the standard COI/T20 Doc. 3 (C.O.I., 1987). The panelists evaluated each sample in duplicate, on different dates and making only one evaluation per session (Gutierrez et al., 1989). One of the panels (henceforth, panel A) consisted of twelve panelists with more than ten years' experience; the other (panel B) consisted of fifteen panelists with only three months' experience after training.
Figure 1 shows, on the left, the olfactory-gustatory-tactile notes of virgin olive oil. Besides other free sensory variables, eighteen attributes with five levels of intensity can be detected and evaluated in virgin olive oil by the panelists. The attributes that can have a positive influence on the quality of the oil are described at the top of the figure, while those with a negative influence are at the bottom. The nine points of the grading table are shown on the right. The whole range is divided into five broad groups according to the defects and characteristics of the evaluated virgin olive oil.
All panelists were trained to detect whether one or more of those attributes were perceptible in each sample, from a barely perceptible level to an extreme one. They were also taught to assign the best point of the grading table. The data that the paper analyses are the mean intensity of each attribute for each sample by every panelist. Only four attributes of the whole set were not detected in the samples: sour, vinegary, acid and humidity.

Statistical programs
Multivariate data evaluation was carried out with the BMDP (BMDP, 1981) and SPSS (SPSS, 1986) packages running under VMS on a VAX 8550. Six different programs were used: Canonical Correlation Analysis, Bivariate (Scatter) Plots, Stepwise Regression, Factor Analysis, Cluster Analysis of Cases and K-Means Clustering.
Factor Analysis was run with the following options: Principal Components Analysis (PCA) and Varimax rotation with Kaiser's normalization (VMAX). Three tests, Bartlett's, Kaiser-Meyer-Olkin and anti-image covariance (AIC), were computed beforehand to check whether this procedure could be applied.
The options assigned to K-Means Clustering were two clusters, with centers and distances computed from the data standardized by the pooled within-cluster covariance matrix, using Euclidean distances.
Cluster Analysis of Cases was performed to compute the similarity matrix on the basis of the Euclidean distance among the variables, which were the factors obtained by PCA after double cross-validation.
Stepwise Regression was run to estimate the parameters of a multiple linear regression by entering or removing variables according to the RSWAP method, with F-to-enter set to 4.30 and F-to-remove set to 4.28. The Canonical Correlation procedure was run with the preassigned conditions of BMDP6M (BMDP, 1981).
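As an illustration of how entry and removal thresholds of this kind operate, the following minimal sketch computes a partial F statistic from two residual sums of squares and compares it with the F-to-enter and F-to-remove values quoted above. The function names and the numerical inputs are hypothetical, and the sketch makes no attempt to reproduce the variable-swapping logic of the RSWAP method in BMDP.

```python
F_TO_ENTER = 4.30   # threshold quoted in the text
F_TO_REMOVE = 4.28  # threshold quoted in the text


def partial_f(rss_without, rss_with, n_cases, n_predictors_with):
    """Partial F statistic for a single variable.

    rss_without: residual sum of squares of the model lacking the variable
    rss_with: residual sum of squares of the model containing it
    n_predictors_with: predictors in the larger model (intercept excluded)
    """
    df_resid = n_cases - n_predictors_with - 1
    if df_resid <= 0:
        raise ValueError("not enough cases for this number of predictors")
    return (rss_without - rss_with) / (rss_with / df_resid)


def should_enter(f_value):
    """A candidate variable enters when its F reaches the entry threshold."""
    return f_value >= F_TO_ENTER


def should_remove(f_value):
    """A variable already in the model leaves when its F drops below the removal threshold."""
    return f_value < F_TO_REMOVE


# Hypothetical example: 24 samples, candidate would be the third predictor.
f_value = partial_f(rss_without=10.0, rss_with=8.0, n_cases=24, n_predictors_with=3)
print(f_value, should_enter(f_value))
```

Keeping F-to-remove slightly below F-to-enter prevents a variable from being entered and removed again in successive steps.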

RESULTS AND DISCUSSION
The panelists were independently taught and multivariate methods were used in training them. The outliers among panelists were previously identified and deleted from subsequent analysis.
To determine whether the two panels held different opinions about the attributes, a correlation analysis was performed. Table I shows, using every sample, the correlation coefficient between the mean levels of intensity that the two panels gave to each attribute described in Figure 1, together with its level of significance. A good correlation was found between the panels for the following taste and smell attributes: fruity, green, bitter, pungent, mustiness, muddy, fusty and rancid, which are the most important in evaluating the quality of virgin olive oil. This fact is even more interesting because the mean intensities of the perceptible attributes had similar ranges for every sample evaluated by both panels. The difference between the mean evaluations of each attribute by the two panels was generally less than 1.0. Only 3.4% of the differences were greater than that value, and they were always less than 1.5.

Table I. Correlation coefficients and their levels of significance.
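The per-attribute comparison between the panels can be sketched as follows. For one attribute, the mean intensities given by panel A and panel B to the same samples are correlated, and a t statistic with n - 2 degrees of freedom is derived for judging significance. The intensities below are hypothetical and are not the data behind Table I; `pearson_r` and `t_statistic` are illustrative names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for the hypothesis of no correlation."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical mean intensities of one attribute for six samples.
panel_a = [2.1, 0.4, 3.0, 1.2, 0.0, 2.6]
panel_b = [1.8, 0.6, 2.7, 1.5, 0.2, 2.9]

r = pearson_r(panel_a, panel_b)
print(f"r = {r:.3f}, t = {t_statistic(r, len(panel_a)):.2f}")
```

Repeating the calculation for each attribute gives one coefficient and one significance level per row, which is the layout Table I describes.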

Canonical Correlation
The first statistical procedure selected was canonical correlation. The goal was to analyse the relationships between two sets of variables and to learn how the two panels relate to each other. The first pair of canonical variates (linear combinations of attributes) maximizes the correlation between a linear combination of panel A and a linear combination of panel B. Successive pairs explain the correlation that remains after the variance due to previous pairs has been removed. After applying Bartlett's test, three canonical variates were considered necessary at the 0.01 level. One of the canonical correlations was 0.99 (p<0.001), representing 99% overlapping variance between the corresponding pair of canonical variates.
The procedure also detected the attributes of panel A with the highest squared multiple correlations with the chosen canonical variates of panel B: pungent (0.83, p=0.04) and mustiness (0.85, p=0.03). The attributes of panel B with the highest correlations with the canonical variates of panel A were bitter (0.93, p=0.001) and, again, pungent (0.89, p=0.008).
However, Tabachnick and Fidell (1983) suggest reconsidering the use of canonical correlation when a group of variables within a set is identifiable but its members are correlated with one another. In this study, fruity (R-squared=0.94) in panel A and fusty (R-squared=0.92) in panel B can be considered nearly linear combinations of the other attributes in their panels.

Principal Components.
At this point, the authors thought there was enough background to consider that both panels were, in fact, a single unit from a mathematical point of view. Even so, a null hypothesis was proposed, "There are two different panels", as the hypothesis to be verified (accepted or rejected), and every statistical program was designed to verify it.
The statistical procedure selected was Factor Analysis. Factor analysis is the technique most commonly used to reduce a large number of variables to the smallest set of factors that explains the variance in the experiment. Principal components analysis was used as the method of factor extraction, and the resulting configuration was rotated by varimax. However, three different tests were computed beforehand to check whether factor analysis could be applied. The results of the Kaiser-Meyer-Olkin test (0.76 for panel A, 0.69 for panel B) showed that the matrices were adequate for factor analysis. In addition, Bartlett's test of sphericity rejected the null hypothesis (260.1 for panel A, 243.7 for panel B), and only 26 (14.3%) of the off-diagonal elements of the Anti-Image Correlation Matrix were greater than 0.09 in panel B, and even fewer, 14 (7.7%), in panel A.
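The percentages quoted for the anti-image matrix can be checked with a short calculation. The sketch assumes that the factor analysis used the 14 attributes actually detected in the samples (the eighteen profile attributes minus the four never detected) and that the percentages refer to all off-diagonal elements of the resulting 14 x 14 matrix. Both assumptions are inferred from the figures and are not stated explicitly in the text.

```python
def off_diagonal_count(p):
    """Number of off-diagonal elements in a p x p matrix."""
    return p * (p - 1)


def share_percent(count, p):
    """Percentage of the off-diagonal elements represented by `count`."""
    return 100.0 * count / off_diagonal_count(p)


P_ATTRIBUTES = 18 - 4  # assumed: attributes detected in the samples

print(off_diagonal_count(P_ATTRIBUTES))            # 182
print(round(share_percent(26, P_ATTRIBUTES), 1))   # 14.3 (panel B)
print(round(share_percent(14, P_ATTRIBUTES), 1))   # 7.7 (panel A)
```

Under these assumptions the calculation reproduces both percentages given in the text.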
The measure of sampling adequacy showed that the attributes "other ripe fruit" in panel A and "winey" in panel B had low values. However, they were not removed.
The first information given by this procedure was the covariance matrix. Tables II and III show the matrices obtained by analysing the panels independently, in agreement with the proposed null hypothesis. As far as the authors can see, the values are, in general, very similar.
However, some drawbacks were found. The correlations between the pairs of attributes rough-fusty, rough-rancid, metallic-fusty, metallic-rancid and metallic-mustiness are rather different. This discrepancy could be due to the characteristics of two of those attributes: rough (a tactile note) and metallic (an olfactory-gustatory-tactile note). The second attribute has a clear relation with storage, fustiness and the kind of technological process used, which is quite unlike the first attribute. Thus, the correlations given by panel A are closer to the real meaning of both attributes.
At first glance, the discrepancy indicates that panel B has not attained an adequate knowledge of those attributes, because they are not easy to understand and a great number of samples with those attributes must be evaluated to gain good experience. Further studies could be made of other relationships, for instance sweet-vinegary, although the level of confidence in the conclusions would probably be lower.

Table IV. Proportion of variance explained by each factor.

Table V. Sorted rotated factor loadings of both panels.

Many analysts use an explained variance (eigenvalue) of 1.0 as the threshold for deciding how many factors are adequate to model the data. However, we applied double cross-validation to define the number of significant factors; by this means, it was found that the first three factors, explaining more than 70.0% of the total variance, were significant enough in sensory analysis. Table IV shows that the successive addition of only three factors accounts for that share of the variance explained in data space, and for 1.0 in factor space. Moreover, the first factor explains 52.04% (panel A) and 52.89% (panel B) of the total variance. Thus, the variance explained by the first factor of panel B is only slightly greater than that of panel A, which shows a certain degree of mathematical similarity between the panels, contrary to the proposed hypothesis.
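The two ways of counting factors mentioned here can be contrasted with a small sketch: the eigenvalue-greater-than-1.0 rule and a cumulative share of explained variance. The eigenvalues are hypothetical values for a five-variable problem, chosen only to make the arithmetic easy to follow, and the function names are illustrative. The double cross-validation used in the paper is not reproduced.

```python
def explained_variance(eigenvalues):
    """Return per-factor and cumulative proportions of total variance."""
    total = sum(eigenvalues)
    proportions = [value / total for value in eigenvalues]
    cumulative = []
    running = 0.0
    for share in proportions:
        running += share
        cumulative.append(running)
    return proportions, cumulative


def kaiser_count(eigenvalues):
    """Number of factors retained by the eigenvalue-greater-than-1.0 rule."""
    return sum(1 for value in eigenvalues if value > 1.0)


def count_for_share(eigenvalues, target):
    """Smallest number of leading factors whose cumulative share reaches `target`."""
    _, cumulative = explained_variance(eigenvalues)
    for index, share in enumerate(cumulative, start=1):
        if share >= target:
            return index
    return len(eigenvalues)


# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
eigenvalues = [2.6, 1.2, 0.7, 0.3, 0.2]
print(kaiser_count(eigenvalues), count_for_share(eigenvalues, 0.85))
```

With these hypothetical values the eigenvalue rule retains two factors while an 85% cumulative target requires three, which illustrates why the two criteria need not agree.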
The loadings of each attribute on the three factors are given in Table V. These loadings were obtained after applying a varimax rotation, which reduces the mathematical noise by maximizing the variance of the loadings across attributes within factors and also tends to reapportion the variance among factors so that they become relatively equal in importance.
In Table V, loadings less than 0.25 have been replaced by zero. Factor 1 of panel B appears to increase with similarity to the attributes described as "fusty", "muddy" and "mustiness". These attributes, which belong to the non-positive group, are closely related. Moreover, the signs of the attributes came out opposite to what logic would suggest, so that the positive notes have negative coefficients and the negative notes positive ones. In the authors' opinion, the panelists were more concerned with looking for negative notes in the samples than for positive ones, which can be a psychological problem of beginners. Other partial conclusions could be drawn from factors 2 and 3: for example, the signs of factor 2 do follow the logic, and factor 3 is composed of negative attributes except for "other ripe fruit" (here "other").
In the other panel (Table V), the positive and negative attributes selected by the three factors are more balanced, owing to its long experience. In contrast to panel B, factor 1 appears to be strictly related to the positive attributes with a close profile, so that a virgin olive oil with the attributes fruity and green logically also has the attributes bitter and pungent, as the loadings of panel A on this factor show. Despite these differences, the general performance of both panels seems similar, because they are like a figure and its mirror image. Panel A evaluates the presence of positive attributes and the absence of negative ones, while panel B evaluates the presence of negative attributes and the absence of positive ones. Therefore, a cluster analysis could help us to accept or reject the null hypothesis, since cluster analysis is a general term for procedures that group cases according to some measure of similarity. This kind of analysis can provide important clues to the most probable groups of samples, independently of the panel that evaluated them.
Before that, it was important to know whether there were multivariate outliers in panels A and B and to delete them from subsequent analysis.
The ratio CHISQ/DF (BMDP, 1981) is relevant for identifying outliers among cases. This ratio contains the Mahalanobis distance, evaluated as a Chi-Square, of each case from the centroid of the cases for the original data. Because the Chi-Square has been divided by the degrees of freedom, the critical Chi-Square value for p=0.01 with 14 df is divided in the same way for each panel. Therefore, values in excess of 2.08 indicate outliers. Applying this criterion, no samples were found to be outliers in either panel.
Following a similar procedure, the ratio between the Mahalanobis distance of each case from the centroid of the factor scores and the degrees of freedom makes it possible to identify outliers with respect to the solution. The new cutoff value was obtained by taking the chosen number of factors as the degrees of freedom. With this new criterion, two cases, sample number 4 in panel A and sample number 7 in panel B, were found to be near-outliers in the solution space.
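The cutoff of 2.08 can be reproduced by dividing the upper 1% point of the Chi-Square distribution with 14 degrees of freedom by 14. The sketch below does this with standard-library functions only, using the finite-series expressions for the Chi-Square tail probability with integer degrees of freedom and a bisection search for the critical value. The function names are illustrative. The same function is then applied with three degrees of freedom, on the assumption that the three retained factors were used; the text does not report that second cutoff, so the value is an output of this sketch only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite series
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * normal_pdf * total


def chi2_quantile(p_upper, df):
    """Value x with P(X > x) = p_upper, found by bisection."""
    low, high = 0.0, 1.0
    while chi2_sf(high, df) > p_upper:
        high *= 2.0
    for _ in range(100):
        mid = (low + high) / 2.0
        if chi2_sf(mid, df) > p_upper:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def outlier_cutoff(p_upper, df):
    """Critical chi-square divided by its degrees of freedom (CHISQ/DF scale)."""
    return chi2_quantile(p_upper, df) / df


print(round(outlier_cutoff(0.01, 14), 2))  # 2.08, the value quoted in the text
print(round(outlier_cutoff(0.01, 3), 2))   # assumed df = 3 retained factors
```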

Cluster Analysis
The non-hierarchical cluster procedure "K-Means Clustering" was the first attempt to determine whether there were two panels, which would mean that each panel evaluated the same sample in a different way, or only one panel, which would mean that the attributes detected by both panels were similar in general.
Thus, to verify the similarity among the samples, we fixed at two the number of homogeneous groups of cases to be built, without considering whether the samples were evaluated by panel A or panel B. The result is shown in Figure 2. There are two groups, as requested in the statistical procedure; one of them consists of four samples and the other of the rest. But the samples of the small group belong to both panels and are coupled two by two (Table VI).
Hence, this result shows that initially there were not two panels, because the cluster analysis was not able to cluster panel A independently of panel B. The instruction to build two clusters was executed but, in both clusters, each sample of one panel is accompanied by its counterpart in the other panel, even though sample number 7 was a near-outlier only in panel B.
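The logic of this test can be sketched in a few lines: the evaluations of both panels are stacked into one data set, two clusters are requested without using the panel labels, and each sample is then checked to see whether its two evaluations fall in the same cluster. The sketch uses a plain K-Means with Euclidean distance on raw values and a farthest-first choice of starting centers; it does not standardize by the pooled within-cluster covariance matrix as the BMDP run did. The two-attribute profiles and all function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then add the farthest ones."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return [list(c) for c in centers]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_pairs(labels_a, labels_b):
    """Indices of samples whose two evaluations fell in different clusters."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical (positive note, negative note) intensities for five samples.
panel_a = [(3.0, 0.2), (2.8, 0.1), (0.3, 2.9), (2.6, 0.4), (0.5, 3.1)]
panel_b = [(2.7, 0.5), (3.1, 0.3), (0.6, 2.6), (2.9, 0.2), (0.2, 2.8)]

labels = kmeans(panel_a + panel_b, k=2)
labels_a, labels_b = labels[: len(panel_a)], labels[len(panel_a):]
print(split_pairs(labels_a, labels_b))  # empty list when every pair stays together
```

An empty list corresponds to the situation described above, in which each sample of one panel is accompanied by its counterpart from the other panel.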
Despite this good result, which verified that panel B was as expert as panel A, a hierarchical clustering technique was used to examine the similarity among samples more deeply and to single out some categories. We applied cluster analysis to the information held by the three significant factors computed by PCA. The similarity values of the linkages, based on the Euclidean distance, are represented by the dendrogram shown in Fig. 3; at a similarity value of 0.6 three large groups are separated, two of them closer to each other. The results are worse with the hierarchical technique than with the non-hierarchical one, which is normal because the former joins the samples without predetermined premises. However, this is a good result, in the authors' opinion, because only 25% of the evaluations of the same sample were classified in different clusters. In the largest group, which gathers the first and second, it is possible to find groups made up only of samples evaluated by the same panel, but at a similarity value between 0.8 and 1.0. In the third group, there are only two samples of panel A without their counterparts from panel B.

Figure 3. Dendrogram of both panels using complete linkage by the SPSS+ statistical library.
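The share of samples whose two evaluations end up in different clusters can be computed as in the following sketch, which uses a small agglomerative clustering with complete linkage on Euclidean distances and cuts the tree at a fixed number of clusters. The three-dimensional factor scores are hypothetical and were constructed so that one of the four samples is split between clusters; they are not the scores behind the dendrogram. The function names are illustrative, and cutting at a cluster count rather than at a similarity value is a simplification.

```python
import math


def complete_linkage(points, n_clusters):
    """Agglomerative clustering: merge the two clusters whose farthest
    members are closest, until n_clusters remain. Returns a label per point."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return max(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        a, b = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def split_fraction(labels, n_samples):
    """Share of samples whose panel A and panel B rows differ in cluster.
    Rows 0..n_samples-1 are panel A, the following n_samples rows are panel B."""
    split = sum(1 for i in range(n_samples) if labels[i] != labels[i + n_samples])
    return split / n_samples


# Hypothetical factor scores (three factors) for four samples per panel.
panel_a = [(1.0, 0.1, 0.0), (0.9, -0.2, 0.1), (-1.1, 0.3, 0.2), (-0.9, 0.0, -0.1)]
panel_b = [(1.1, 0.0, 0.1), (-1.0, 0.1, 0.0), (-1.0, 0.2, 0.1), (-1.0, -0.1, 0.0)]

labels = complete_linkage(panel_a + panel_b, n_clusters=2)
print(split_fraction(labels, len(panel_a)))  # 0.25 for these constructed scores
```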

Correlation and Regression Analysis.
Until now, the paper has analysed the panels by the attributes and their intensities detected in the samples. However, this analysis could be considered incomplete without a study of the final evaluation of every sample given by each panel.
A correlation program was run, and its result, a correlation coefficient of 0.82 with a level of significance below 0.001, can be considered good enough in general (Powers, 1984). However, the authors considered that further information could be obtained by matching those values with the information held in the factors.
Three correlation coefficients were also computed for each panel, comparing its three factors with its final evaluation. The results (Table VII) show that the greatest correlation is with the first factor, which explains more variance than the others (Table IV). The fact that the best correlation is with the first factor shows that the panelists were well trained to assign the best level of the grading table according to the attributes previously identified, following only fuzzy rules. Thus, it is important to remark that fuzzy logic and statistics give similar results in this case. The authors are therefore working on new research in which they hope to explain, in fuzzy linguistic terms, how the panelists assign the points of the grading table.
Meanwhile, a multiple linear regression was used to improve the correlation coefficients displayed in Table VII. It was performed using the RSWAP method for entering and removing the attributes, with the final evaluation as the dependent variable and the attri-
Consejo Superior de Investigaciones Científicas Licencia Creative Commons 3.0 España (by-nc) http://grasasyaceites.revistas.csic.es
Figure 2. A scatter plot of the orthogonal projection of cases onto the plane defined by the centers of the two clusters built by the K-Means Clustering procedure.

Table II. Matrix of partial correlations of each pair of attributes of panel A.

Table III. Matrix of partial correlations of each pair of attributes of panel B.

Table VII. Correlation coefficients between the final evaluation and the factors for each panel.