Selecting and training a panel to evaluate the rancid defect in soybean oil and fish hamburgers

C. Marques, A.S. dos Reis, L.D. da Silva, S.T. Carpes and M.L. Mitterer-Daltoé*

Graduate Program in Chemical and Biochemical Technology Process, Chemistry Department, Federal University of Technology, Km 01, 85503-390 - Paraná, Pato Branco, PR, Brazil.

*Corresponding author:



Due to their composition of unsaturated and polyunsaturated fatty acids, oils and fats are very susceptible to oxidation, with rancidity being one of the main defects. Among the several existing methodologies to monitor oxidation in foods, sensory analysis stands out because of the sensitivity of its responses. Accordingly, this study aimed to select and train a panel of expert assessors to identify the rancid flavor, showing the statistical steps in the process. Assessors were selected according to their individual performance, which was analyzed statistically by ANOVA with Tukey’s mean comparison, Wald Sequential Analysis and the chi-square test. The trained panel was validated through the sensory analysis of fish burgers and soybean oil. F-value and box-plot graphic methods were effective for better visualization of the results when used along with the mean and standard deviation tables. The final trained panel consisted of seven assessors, who were able to identify and differentiate rancid taste in both samples used for validation.



Selección y entrenamiento de un panel para evaluar el defecto rancio en aceite de soja y hamburguesas de pescado. Debido a su composición en ácidos grasos insaturados y poliinsaturados, los aceites y grasas son muy susceptibles a la oxidación, siendo la rancidez uno de los principales defectos. Entre las diversas metodologías existentes para seguir la oxidación en los alimentos, el análisis sensorial destaca por la sensibilidad de las respuestas. Este estudio tuvo como objetivo seleccionar y capacitar a un panel de catadores entrenados para identificar el sabor rancio y se muestra estadisticamente los pasos del proceso. Los catadores fueron seleccionados de acuerdo a su capacidad individual y estadísticamente analizados por ANOVA comparando las medias mediante Tukey, análisis secuencial de Wald y prueba de chi-cuadrado. La validación del panel entrenado se realizó con el análisis sensorial de hamburguesas de pescado y aceite de soja. El valor F y la representación gráfica boxplot fueron eficaces para una mejor visualización de los resultados cuando se utilizan junto con las tablas de desviación media y estándar. El panel final estuvo formado por siete catadores que fueron capaces de identificar y diferenciar el sabor rancio en ambas muestras utilizadas para su validación.


Submitted: 19 January 2017; Accepted: 27 March 2017

ORCID: Marques C, dos Reis AS, da Silva LD, Carpes ST, Mitterer-Daltoé ML

KEYWORDS: Bioactive Phytochemicals; Fish; Flavors; Food Quality; Sensory Analysis

PALABRAS CLAVE: Análisis sensorial; Calidad de los alimentos; Fitoquímicos Bioactivos; Pescado; Sabores

Citation/Cómo citar este artículo: Marques C, dos Reis AS, da Silva LD, Carpes ST, Mitterer-Daltoé ML. 2017. Selecting and training a panel to evaluate the rancid defect in soybean oil and fish hamburgers. Grasas Aceites 68 (3), e203.

Copyright: ©2017 CSIC. This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-by) Spain 3.0 License.




The sensory perception of food is based first on in-mouth transformation, which is why it is so dynamic (Ares et al., 2015) and dependent on the response of each individual. However, consumer preferences, acceptance and feedback are very important to the market (Ares et al., 2015), which demonstrates the importance of sensory analysis in many areas, from the development of products to quality control (Ares et al., 2015; Latreille et al., 2006).

Therefore, selection and training are required to obtain reliable measurements from individual reactions (Latreille et al., 2006; Etaio et al., 2010). To accredit a trained panel, the assessors must show repeatability (consensus), reproducibility and discrimination ability; they must also be able to notice differences that might seem small to consumers, and all the technical competence acquired must be maintained over time (Etaio et al., 2010; González et al., 2007; López-Aguilar et al., 2007). Five to eight trained assessors are sufficient for a reliable evaluation (Dutcosky, 2007).

The steps to be followed to achieve a reliable trained sensory panel are basically: assessor selection, basic and specific training, assessor qualification and method validation (Etaio et al., 2010). However, this entire process is known to be time consuming, expensive and not practical due to several aspects (Kamruzzaman et al., 2013). Many researchers who require precise measurements commonly hire an accredited trained panel (Sinesio et al., 1990; Campo et al., 2006; Kamruzzaman et al., 2013; Borràs et al., 2015) or simply use instrumental methods for flavor testing (Lee and Choe, 2012) instead of training their own assessors.

Several studies, from the oldest (Banfield and Harries, 1975; Sinesio et al., 1990; Lea et al., 1995) to the newest (Latreille et al., 2006), address the statistical performance of trained assessors. However, references on selection and training that demonstrate the details of the process, including the temperature and time of rancidity, are very limited in the literature (Latreille et al., 2006; Elortondo et al., 2007; Etaio et al., 2010).

To our knowledge, there are no published studies reporting the selection and training of assessors specifically for the rancid defect in oils and fats, including the time and temperature of rancidity, with results demonstrated statistically. The closest is the sunflower oil shelf-life estimation detailed by Hough and Fiszman (2005), where the focus is a demonstration of the cut-off point methodology.

The undesirable compounds known as off-flavors in oil and fat are products of oxidation reactions, which also destroy essential fatty acids, resulting in a loss in nutritional value in addition to sensory rejection (Lee and Choe, 2012). The most common off-flavor is rancidity. Two of the many contributors to the rancid flavor are hexanal and nonanal (Campo et al., 2006; Ibrahim, 2001; López-Aguilar et al., 2007), which lend a distinct odor to this defect. Human perception of oxidized flavors in food with high fat content is more accurate than chemical methods, and aids in evaluating the extent of deterioration when a well-trained panel is available (Sinesio et al., 1990).

One strategy to encourage the consumption of fish, given the high nutritional value of this type of meat, is to turn the fish into a practical product, such as the hamburger (Corbo et al., 2008; Del Nobile et al., 2009). One of the most critical parameters for the stability of a fish product is lipid oxidation: its high levels of polyunsaturated fatty acids are more susceptible to oxidation because of the double bonds in the chain, a reaction that occurs even at low temperatures (Soares and Gonçalves, 2012; Wu and Mao, 2008).

Given this importance, the aim of this study is to select and train a panel of assessors specialized in recognizing the rancid defect taste in fish hamburgers and oils, and to demonstrate the entire process, with emphasis on the statistical treatment of data.


Approved by the Ethics Committee – CAAE number 48687815.0.0000.5547 – UTFPR, Pato Branco/PR, the study was performed with professors (6), undergraduates (12) and graduate students (8) of UTFPR, 26 subjects in total. All participants had prior contact with the Sensory Analysis discipline, which facilitated their understanding of the analysis and the terms involved, but none had previously participated in a rancid flavor defect training session. Each assessor performed the analysis in a sensory cabin, properly lit and isolated from the others and from the sample preparation area, with access to a sink for sample disposal and water at will.

2.1. Global performance at selection

The selection procedure included an interview before the difference test applied to the product (Dutcosky, 2007). Questions about allergies and availability for training were asked, and the form required by the Ethics Committee was filled in.

The selection of assessors was performed through the triangle test, a discriminative sensory analysis method that differentiates two samples that received different treatments (ASTM, 2010). The probability of a correct answer by chance is one-third. It is recommended to use 20 to 40 subjects for a solid result (Næs et al., 2010).

The test consisted of two samples: rancid and regular sunflower oil. The rancid oil was produced in an oven at 60 °C for 14 days (Hough and Fiszman, 2005) with air circulation, in an open amber glass container with 10% head space. Oils of the same brand were purchased at the local market in Pato Branco – PR. In each of ten replicates, three samples were presented to the assessors in random order; two of the samples were equal, and the assessors had to identify the different one by circling it. They were instructed to taste the samples and not to smell them or try to differentiate them by color, which was masked by black cups.

The main conditions were kept constant: 15 mL of oil (Borràs et al., 2015) at 50 °C ± 2 °C (Hough and Fiszman, 2005) in a 50 mL plastic cup coded with three random digits. Each replicate contained either two equal samples of regular oil (no rancidity) and one different (rancid) sample, or two equal rancid samples and one different (no rancidity) sample, in alternation. Warm distilled water kept at 40 °C and plain crackers were provided to clean the palate between samples (Hough and Fiszman, 2005; Borràs et al., 2015). Each session lasted from 10 to 15 minutes and took place between 9 am and 12 pm.

The number of correct answers an assessor needed for a significant difference between the samples was taken from a table based on the chi-square test; an assessor who reached this minimum was selected (10 replicates require 7 correct answers; p < 0.05). Wald Sequential Analysis was also applied to the selection, according to the graphical method (ISO, 2004), to further identify the assessors who were approved or rejected and those who required training (Santana et al., 2006).
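The threshold of seven correct answers in ten replicates can be cross-checked with a short calculation. The sketch below is only an illustration: it assumes an exact one-sided binomial test with a chance probability of 1/3, not the chi-square-based table used in the study, and the function names are hypothetical.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct(n, p_chance, alpha=0.05):
    """Smallest number of correct answers whose chance probability is below alpha."""
    for k in range(n + 1):
        if upper_tail(n, k, p_chance) < alpha:
            return k
    return None  # no count reaches significance for this n


# Triangle test: ten replicates per assessor, guessing probability 1/3.
triangle_min = min_correct(10, 1 / 3)
print(triangle_min, round(upper_tail(10, triangle_min, 1 / 3), 4))
```

Under these assumptions the exact test gives the same minimum of seven correct answers as the tabulated criterion.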

The decision system was obtained through hypothesis testing (ISO, 2004), with H0: p1 ≤ p0, using the values p0 = 0.33 (probability of a correct response when no perceptible difference exists) and p1 = 0.67 (probability of a correct response when a perceptible difference does exist), for α risk = 0.05 (probability of concluding that a perceptible difference exists when one does not) and β risk = 0.05 (probability of concluding that no perceptible difference exists when one does).
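For illustration, the two decision lines of a Wald sequential test can be computed from p0, p1, α and β. The sketch below assumes the standard formulation in which the cumulative number of correct responses is plotted against the number of trials; the function names and the response sequences are hypothetical and do not come from the study data.

```python
from math import log


def wald_lines(p0=0.33, p1=0.67, alpha=0.05, beta=0.05):
    """Return (accept_intercept, reject_intercept, slope) of the decision lines."""
    denom = log(p1 / p0) + log((1 - p0) / (1 - p1))
    slope = log((1 - p0) / (1 - p1)) / denom
    accept_intercept = log((1 - beta) / alpha) / denom
    reject_intercept = -log((1 - alpha) / beta) / denom
    return accept_intercept, reject_intercept, slope


def sequential_decision(responses, **params):
    """Walk through 0/1 responses and stop at the first line that is crossed."""
    accept, reject, slope = wald_lines(**params)
    correct = 0
    for n, hit in enumerate(responses, start=1):
        correct += hit
        if correct >= accept + slope * n:
            return "accepted", n
        if correct <= reject + slope * n:
            return "rejected", n
    return "undecided", len(responses)


accept, reject, slope = wald_lines()
print(round(accept, 4), round(reject, 4), round(slope, 4))
print(sequential_decision([1, 0] * 5))  # hypothetical assessor, alternating answers
```

With the parameters above, the slope is 0.5 and the intercepts have a magnitude of about 2.0789; an assessor whose path stays between the two lines after ten replicates remains undecided, which in this study meant eligible for training.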

2.2. Training process

An unstructured scale of 10 cm was used for training, presented with the numbers 0 and 10 at the extremes (Hough and Fiszman, 2005), on which the assessors marked the intensity of the rancid defect of the sample at any point.

The training procedure consisted of three different days/stages of analysis, to calculate the accuracy of the answers and the consistency of the team. On each day, four dilutions of rancid oil (0%, 10%, 50% and 100%) were provided to the selected assessors in portions of 15 mL (Borràs et al., 2015), using plastic cups coded with three-digit random numbers.

Dilutions of 0% and 100% were presented as the extremes of the scale, where 100% represented the sample at its maximum rancidity (14 days – 60 °C) and 0% represented the regular oil sample with no rancidity. The remaining 10% and 50% dilutions were placed between 0 and 10 cm by the assessors, on a scale anchored in little–none/much rancid flavor. This procedure was repeated three times within the same day in order to obtain a mean and a standard deviation for each day. Each session lasted 10-15 minutes.

2.3. Ability to discriminate between dilutions in training

Assuming that samples were only 10% and 50%, a paired test was applied to check whether there was a difference between them, and those who inverted the order of samples on the scale (placed 50% before 10%) had their responses considered incorrect. To check the difference, the bilateral paired test table was consulted (p < 0.05) (ASTM, 2010).

2.4. Individual performance of assessors

The responses were measured in centimeters along the 10 cm scale. ANOVA was used to evaluate individual results, means and standard deviations, and the precision across the three days was assessed using Tukey’s mean comparison test (p < 0.05), performed with Statistica® software 12.7.

2.5. Panel performance and homogeneity

Similarly to the individual performance, the mean of each day’s responses was calculated, with respective standard deviations, to evaluate panel homogeneity. Assessors that did not differ statistically (p > 0.05) from each other, by Tukey’s mean comparison test, coinciding in the analysis of both samples, 10% and 50%, were selected for the final trained sensory panel.

2.6. Trained panel validation

Validation is important to test the panel reproducibility, which means that if the test is repeated after some time, or by another sensory panel trained exactly as in the present study, the results would not differ significantly (Lea et al., 1995).

The validation was performed eight times with the products under study, fish burgers which had been stored for 30 days, and soybean oil with two distinct antioxidants. The burgers were made with grass carp fish meat (79.00%), where 33% of the total fatty acids were polyunsaturated (Wu and Mao, 2008); ice (10.00%), vegetable fat (5.00%), textured soy protein - TSP (3.00%), spices (2.99%) and BHT (0.01%). The water to hydrate the TSP was discounted from the ice. They were vacuum-packed, and stored under refrigeration until the days of sampling, then frozen each day (initial – 0 days; 7, 14, 17, 21, 23, 25 and 30 days).

The burgers were thawed and grilled before being served to the assessors. The samples were cut into uniform pieces of about 1.5 cm³ and maintained at 75 °C (internal center) until the time of delivery (Mitterer-Daltoé et al., 2012) in plastic cups coded with three-digit random numbers. Water at room temperature was provided to clean the mouth between samples.

Samples of soybean oil containing the antioxidant tertiary butylhydroquinone (TBHQ) at 200 mg/Kg, or Quassia amara (Q. a.) extract at 100 or 200 mg/Kg, were tested after 96 hours of rancidity (60 °C, oven) to detect any difference among them regarding the rancid flavor.

An unstructured scale of 10 cm was applied again for the distribution of burger and oil samples within the range (on different sheets), anchored in little–none/much rancid flavor. ANOVA was applied to the trained team’s results to check for differences between samples (p < 0.05). The differences recognized between samples were compared for equivalence with the training rancid oil standards to validate the trained panel.


3.1. Global performance at selection

Twenty-six people attended the selection (9 males; 17 females; ages ranging from 20 to 50), all of whom were coded as assessors (A). Fifteen of them gave seven or more correct responses in the ten replicates and, based on the chi-square table for the triangle discriminatory test, were considered suitable for training. By means of Wald Sequential Analysis, eight more assessors fell between the acceptance line (ax = 2.0789 + 0.5n) and the rejection line (rx = −2.0789 + 0.5n), which made them eligible for training within the applied statistics. At this point, three people were excluded, as they were found at or below the rejection line (Fig. 1).

Figure 1. Wald Sequential Analysis for selection of assessors; α=β=0.05; p0 = 0.33; p1 = 0.67.


3.2. Training process, ability to discriminate between dilutions in training

Of the 23 assessors selected, 18 agreed to continue the training. According to the unilateral paired test table (p < 0.05), at least 13 assessors had to place the samples in the right order of concentration, 10% before 50%, on the unstructured 10 cm scale for the standard dilutions to differ significantly (Table 1) and become standards for the rest of the training process. Fourteen assessors got all the orders correct, confirming a significant difference between dilutions.

Table 1. Number of correct responses regarding the order of samples in each triplicate, per day of training
Assessor Day 1 Day 2 Day 3 Total
A1, A3, A4, A5, A6, A7, A8, A9, A10, A12, A13, A15, A17, A18 3 3 3 9 (no incorrect answers regarding the order of samples)
A2*, A16* 3 2 2 7
A11* 3 2 3 8
A14* 2 3 3 8
* Eliminated

To be approved, the assessors should have shown a total of nine correct responses, which means no change in the order of sample concentrations inside the triplicate, for every day of training.
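The minimum of 13 correct orderings among 18 assessors can be checked in the same spirit. The sketch below assumes an exact one-sided binomial test with a chance probability of 0.5, which is a stand-in for the tabulated paired test and not necessarily the exact table consulted; the function name is hypothetical.

```python
from math import comb


def paired_upper_tail(n, k):
    """P(X >= k) when each of n assessors orders the pair correctly by chance (p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


n_assessors = 18

# Smallest count of correct orderings that is significant at the 5% level (one-sided).
minimum = next(
    k for k in range(n_assessors + 1) if paired_upper_tail(n_assessors, k) < 0.05
)

# Chance probability of the observed result: 14 assessors with all orders correct.
p_observed = paired_upper_tail(n_assessors, 14)
print(minimum, round(p_observed, 4))
```

Under these assumptions the minimum is 13, and the observed 14 consistent assessors correspond to a one-sided probability of about 0.015.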

3.3. Individual performance of assessors

From the results of each assessor, the mean and standard deviation of the triplicate responses were calculated for each day of training. ANOVA with Tukey’s mean comparison was then applied to the three daily means to verify which assessors showed homogeneity among the days (Table 2).

Table 2. Inter-day precision of the rancid flavor (cm) in 10% and 50% oil dilutions
Assessor Dilution (%) Day 1 Day 2 Day 3
A1* 10 3.47a ± 0.46 3.53a ± 0.06 1.67b ± 0.06
  50 8.07a ± 0.21 8.37a ± 0.15 6.87b ± 0.11
A3 10 1.70a ± 0.36 1.07a ± 0.74 0.37a ± 0.23
  50 4.33a ± 2.40 6.30a ± 1.15 5.83a ± 4.37
A4 10 1.50a ± 0.30 1.53a ± 1.10 1.27a ± 0.68
  50 5.77a ± 1.42 5.33a ± 1.15 5.90a ± 0.66
A5 10 2.50a ± 1.28 3.63a ± 1.91 4.67a ± 1.53
  50 5.93a ± 2.50 6.83a ± 0.76 6.80a ± 0.72
A6 10 2.23a ± 0.25 2.03a ± 0.84 2.13a ± 0.60
  50 7.30a ± 0.75 7.47a ± 0.50 6.93a ± 0.93
A7 10 3.53a ± 2.15 4.43a ± 2.18 4.93a ± 7.20
  50 4.93a ± 2.41 7.70a ± 0.62 7.24a ± 0.80
A8 10 3.43a ± 1.85 3.60a ± 0.17 1.63a ± 0.23
  50 7.10a ± 2.13 7.33a ± 0.47 7.13a ± 0.32
A9 10 1.50a ± 0.44 0.93a ± 0.06 1.03a ± 0.15
  50 4.73a ± 0.67 4.00a ± 0.62 4.73a ± 0.46
A10 10 1.43a ± 0.31 1.77a ± 0.67 3.40a ± 1.40
  50 8.10a ± 0.78 5.67a ± 2.75 8.50a ± 0.87
A12 10 2.47a ± 0.49 2.50a ± 1.04 2.33a ± 0.29
  50 5.57a ± 1.20 5.80a ± 1.21 5.73a ± 0.68
A13 10 2.93a ± 0.59 1.63a ± 0.38 1.77a ± 1.29
  50 7.10a ± 0.79 5.40a ± 2.33 5.87a ± 1.27
A15 10 1.60a ± 0.53 1.50a ± 1.21 2.47a ± 1.27
  50 6.50a ± 1.50 6.90a ± 0.10 8.37a ± 0.85
A17 10 2.37a ± 0.15 2.23a ± 0.40 2.40a ± 0.53
  50 6.23a ± 0.64 6.00a ± 1.81 5.00a ± 0.62
A18 10 1.17a ± 0.42 1.57a ± 0.35 1.43a ± 0.21
  50 5.57a ± 0.51 5.60a ± 0.10 6.10a ± 0.30
*A1: Eliminated; Same letters in the same line: means do not differ (Tukey p > 0.05).

Assessor A1, whose day 3 differed significantly from the others (p < 0.01), was eliminated at this stage of the statistical analysis. The remaining assessors exhibited homogeneity among days, with no significant differences among means.

For better visualization of this outcome, the F value of ANOVA (one-way) was calculated along with the p-value to test the individual performance of each assessor. F values higher than Fcritical (5.1432) demonstrate significant differences among days of training.

The poor performance of A1 (F10 = 45.87 and F50 = 70.87) was also evident from the F values (Figure 2), which tested the ability and the homogeneity among the days of the other assessors in training.
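As an illustration of how such F values arise, a one-way ANOVA F statistic can be recovered from the daily means and standard deviations in Table 2 when every day has the same number of replicates. The sketch below is an approximation built from the rounded table values rather than the raw scores, so it is not expected to match the reported F values exactly; the function name is hypothetical.

```python
F_CRITICAL = 5.1432  # critical value quoted in the text (5% level)


def f_from_summary(means, sds, n_per_group):
    """One-way ANOVA F statistic from group means and SDs with equal group sizes."""
    k = len(means)
    grand_mean = sum(means) / k
    ms_between = n_per_group * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(s**2 for s in sds) / k  # pooled variance when group sizes are equal
    return ms_between / ms_within


# Rounded Table 2 values (Day 1, Day 2, Day 3), three replicates per day.
f10_a1 = f_from_summary([3.47, 3.53, 1.67], [0.46, 0.06, 0.06], 3)
f50_a1 = f_from_summary([8.07, 8.37, 6.87], [0.21, 0.15, 0.11], 3)
f10_a6 = f_from_summary([2.23, 2.03, 2.13], [0.25, 0.84, 0.60], 3)

for label, f in [("A1 10%", f10_a1), ("A1 50%", f50_a1), ("A6 10%", f10_a6)]:
    print(label, round(f, 2), "differs among days" if f > F_CRITICAL else "homogeneous")
```

The values obtained for A1 are close to the reported ones and far above the critical value, while an assessor such as A6 stays well below it.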

Figure 2. Column chart for F values with logarithmic scale (significance level: 5%). The F values were divided by assessors and samples with concentrations of 10% (F10) and 50% (F50).


3.4. Panel performance and homogeneity

The individual performance, in terms of homogeneity among the days and differentiation of samples, removed five assessors thus far. The thirteen remaining were evaluated according to panel homogeneity. Those who did not differ from each other for both samples (rancid oil; 10% and 50% standard solution) were considered part of the final trained panel (Tukey p > 0.05; n = 9).

By analyzing the p-values, only seven (A3, A4, A9, A12, A13, A17 and A18) of the thirteen remaining assessors showed homogeneity in their responses (Table 3).

Table 3. Panel consistency for 10% and 50% standard solutions of sunflower rancid oil
  10% 50%
A3 1.04c ± 0.67 5.49ab ± 1.03
A4 1.43c ± 0.14 5.67ab ± 0.30
A5* 3.60ab ± 1.08 6.52ab ± 0.52
A6* 2.13bc ± 0.10 7.23a ± 0.27
A7* 4.30a ± 0.71 6.62ab ± 1.48
A8* 2.88abc ± 1.09 7.19a ± 0.13
A9 1.15c ± 0.30 4.49b ± 0.42
A10* 2.20bc ± 1.05 7.42a ± 1.53
A12 2.43abc ± 0.09 5.70ab ± 0.12
A13 2.11bc ± 0.71 6.12ab ± 0.88
A15* 1.85bc ± 0.53 7.25a ± 0.98
A17 2.33bc ± 0.09 5.74ab ± 0.65
A18 1.39c ± 0.20 5.75ab ± 0.30
*Eliminated. Same letters in the same column: means do not differ (Tukey p > 0.05).

Figures 3 and 4 exhibit a graphical representation through box-plots of the variation in assessors’ behavior regarding the given dilutions. The trained panel (7 female assessors; ages ranging from 20 to 40) gave responses close to 1 cm of the scale for the 10% dilution and to 5 cm for the 50% dilution: 1.70 ± 0.58 for the 10% standard and 5.57 ± 0.51 for the 50% standard. The box-plot charts show that the trained panel presented the combination of factors required for selection: low variability of data, means (□) in the center of responses and proximity between means and medians (—–).
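The panel values quoted above can be reproduced from the Table 3 means of the seven retained assessors. The sketch below assumes that the panel mean and standard deviation were taken across assessor means (sample standard deviation); the variable names are hypothetical.

```python
from statistics import mean, stdev

# Table 3 means (cm) of the seven assessors kept in the final panel.
panel_10 = {"A3": 1.04, "A4": 1.43, "A9": 1.15, "A12": 2.43,
            "A13": 2.11, "A17": 2.33, "A18": 1.39}
panel_50 = {"A3": 5.49, "A4": 5.67, "A9": 4.49, "A12": 5.70,
            "A13": 6.12, "A17": 5.74, "A18": 5.75}

summary = {
    "10%": (round(mean(panel_10.values()), 2), round(stdev(panel_10.values()), 2)),
    "50%": (round(mean(panel_50.values()), 2), round(stdev(panel_50.values()), 2)),
}
for dilution, (m, s) in summary.items():
    print(f"{dilution} standard: {m:.2f} ± {s:.2f} cm")
```

Under this assumption the calculation returns 1.70 ± 0.58 cm for the 10% standard and 5.57 ± 0.51 cm for the 50% standard, matching the values reported for the trained panel.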

Although A3 showed a large standard deviation for the 50% dilution (Fig. 4), this assessor was consistent with the panel. The assessor with the best results (box-plot and Table 3) was A9, revealing very low data variability, high precision and accuracy. The box-plot makes it possible not only to visualize the behavior of each assessor, but also to see the influence of each standard used. A better data distribution, with less variation and values closer to 1 cm, was found for the 10% standard solution of rancid oil (Fig. 3), which was not observed for the 50% standard.

Figure 3. Box-plot for rancid oil sample diluted to 10%, from three-day training, separated by assessor; N = 9. Means (□); 1–99% Range├—┤; Medians (—–).


Figure 4. Box-plot for rancid oil sample diluted to 50%, from three-day training, separated by assessor; N = 9. Means (□); 1–99% Range├—┤; Medians (—–).


3.5. Trained panel validation

The validation of the trained panel was carried out in the analysis of samples: fish hamburgers and soybean oil with Quassia amara extract and synthetic antioxidant TBHQ.

Table 4 shows the equivalence of the rancid taste among samples and standards. Burgers H1 and H2 did not differ from each other in rancid taste, nor from H3, H4 and H5. However, they differed significantly from H6, H7 and H8, the last one being the most pronounced in this parameter. Sample H8, with 30 days of storage under refrigeration, differed significantly from all the other days except the 25th and was equivalent to the 50% standard. H3, H4 and H5 had a rancid flavor defect corresponding to the 10% rancid oil standard.

Table 4. Values (mean ± standard deviation) for the rancid flavor of samples equivalent to standards
Sample Rancid flavor (cm)
10% standard 1.70deB ± 0.58
50% standard 5.57abA ± 0.51
H1 – initial 0.00e ± 0.00
H2 – 7 days 0.16e ± 0.30
H3 – 14 days 0.76de ± 0.62
H4 – 17 days 1.54de ± 1.03
H5 – 21 days 2.30cde ± 1.46
H6 – 23 days 3.31bcd ± 2.02
H7 – 25 days 4.77abc ± 2.60
H8 – 30 days 5.97a ± 2.67
Soybean Oil + TBHQ 2.32B ± 1.75
Soybean Oil + 100 mg/KgQ. a. 2.92B ± 2.35
Soybean Oil + 200 mg/KgQ. a. 2.10B ± 2.19
Equal letters in the column show that the means do not differ significantly (Tukey p < 0.05; n = 7). Capital letters indicate oil samples; lower case letters for burger samples.

The assessors’ perception showed an equivalent degree of difference in standards and hamburger samples. In the first phase of training it became clear that the 10% and 50% standards are different, and the same holds for the hamburgers. The trained panel was able to find, in samples with a more complex matrix, differences equivalent to those of the training standards, which validates the assessors and the method.

For oil samples, after 96 hours in an oven at 60 °C, there was no significant difference among the samples tested (soybean oil with TBHQ, soybean oil with 100 mg/Kg of Quassia amara and soybean oil with 200 mg/Kg of Quassia amara) regarding the rancid flavor (Table 4). These results indicate that the concentrations of Quassia amara extract used were as effective as the synthetic antioxidant in preventing rancidity, even at the lowest concentration tested, which suggests that it is possible to use 100 mg/Kg for economic purposes with a similar effect and no significant difference. Also, the rancidity found in all three samples corresponds to the 10% rancid oil standard.


The sequence of analyses applied was efficient for selection, training and panel validation for the rancid taste in oil and fish hamburgers. Wald Sequential Analysis proved to be more effective than the chi-square method for selection, especially since eight important assessors would have been automatically excluded without the Wald test. Each assessor has a different sensitivity to flavors, and the Wald Analysis shows the individual performance graphically, giving a chance to those with potential.

Another point to be emphasized about the Wald Sequential Analysis is the evaluation of assessors who had acuity at first but saturated before the end of the repetitions, such as A20, who showed saturation on repetitions 5 to 10, answering correctly only on the 9th, and even then was selected for training.

As with selection, training also proved to be an important part of the process, as shown by A1 and A2: both had the best results in the triangle test (9 correct answers) but did not pass training. This demonstrates that an optimum performance during selection does not guarantee the same during the second phase, which confirms the importance of training in order to select a panel.

ANOVA with Tukey mean comparisons, summarized as means and standard deviations, together with the F value column plot and the box-plots, completed the statistical analysis and proved effective in the training process. Graphic representations, such as the F value plot and box-plot, provided a better visualization of why only seven assessors were chosen in the end and why others were eliminated. Tables of means and standard deviations can be confusing at times, especially with a larger amount of data.

F values in table format were used by Braghieri et al. (2012) to verify panel agreement for five attributes during a meat evaluation, proving to be an effective method for this purpose, since they found a high level of homogeneity between the panels tested. The column plot of F values is considered a suitable method to compare assessors’ ability to detect differences between products. One-way ANOVA is the most standard way to treat the raw data (Næs et al., 2010) and had proven valuable even earlier, in studies such as Lea et al. (1995), which measured validity in a sensory analysis.

The box-plot in the present work allowed better understanding and visualization of the variation in assessors’ responses for each standard, also emphasizing the difference between standards. Regarding box-plot graphics, Williamson et al. (1989) and Coli et al. (2015) mention the importance of the box-plot graphic method to highlight the data distribution, means and medians, and extreme values, which gives a better idea of the response variation than tables.

By means of the box-plot it is noted that the standard deviations shown by the 50% dilution (Figure 4) were higher than those shown by the 10% dilution (Figure 3), which emphasizes the perception limit of the judges for rancid taste. There is the possibility that the 50% standard alters or saturates the assessors’ sensations, as it is near the saturation threshold of the rancid flavor.

Sinesio et al. (1990) found a similar intensity of rancid flavor (around 6 cm on a 9 cm scale) for sausages with 63.5% added rancid fat. In beef steaks, after only 9 days under refrigeration, a nine-assessor trained panel reported 11.4 to 21.4 points on a 100-point unstructured scale for rancid flavor (Campo et al., 2006). These data are consistent with the rancidity levels found in the analyzed burger (around 6 on a 10 cm scale for H8).

For soybean oil, López-Aguilar et al. (2007) found 5.89 to 6.58 cm (on a 10-cm scale) for rancid flavor in commercial oils, with 12 weeks of analysis. In this study, trained assessors assigned lower values to the oils with the synthetic antioxidant TBHQ and Quassia amara extract (100 and 200 mg/Kg) in 96 hours of analysis, confirming the efficiency of the tested antioxidants. Considering the results, because of their composition, the soybean oil and the fish burgers were highly susceptible to oxidation. A trained sensory panel for the rancid flavor would be an appropriate method for quality control, despite the time, effort and money applied.

The validation of the trained panel with a product is essential, because the product’s matrix is often more complex than the solutions used in selection and training, and this may confuse the panel. Elortondo et al. (2007) and Mitterer-Daltoé et al. (2012) validated the sensory panel with a product that recalled the sensation for which the panel had been trained. Elortondo et al. (2007) did not use ANOVA for this purpose, only percentages: a sensation mentioned by 66.6% of assessors was considered a parameter, in which they were trained afterwards.

Seven proved to be a good number of trained assessors in the rancid flavor defect case. This number has not been explored in depth in the literature, appearing in studies with other flavors and defects in descriptive analyses (Sinesio et al., 1990; Campo et al., 2006; López-Aguilar et al., 2007). The trained panel proved accurate in differentiating samples with the rancid flavor defect according to the combined methods applied. In another study, Etaio et al. (2010) started the selection for a trained wine panel with 31 assessors, continued with 13 in training and ended with seven experts, as in the present study, but used only percentages of success; no ANOVA was applied. This illustrates the difficulty in obtaining a large number of expert assessors.

Because of this difficulty, researchers hire trained panels or use untrained assessors/consumers. Wang et al. (2012) used only 4 trained assessors to analyze the tenderness, chewiness and juiciness of pork meat samples. Borrás et al. (2015) used 8 trained assessors to evaluate defects like rancidity and metallic in olive oils. None of the studies trained the assessors; they used a trained panel that already existed. For fish burgers, Corbo et al. (2008) and Del Nobile et al. (2009) used five and ten untrained assessors, respectively, to determine color, odor, texture, drip loss and general appearance values in the first case and overall quality in the second case.

The International Organization for Standardization gives some directions for the selection and training of sensory panels. Companies and universities in some countries follow these methods and undergo a lengthy training process in order to later lend or rent their teams for research, showing that training a panel for a particular research project is unusual for the reasons already mentioned. However, according to Resolution CNS/MS 196 (BRASIL, 1996) on the conduct of research involving humans in Brazil, the participation of the research subject must be free and voluntary, and any type of compensation is prohibited, except for personal damages. Nevertheless, it is possible to outsource the service, receiving only the outcome.

Therefore, the value and importance of a trained panel, as justified above, together with the fact that in some countries such as Brazil a trained panel cannot be hired without involving third parties or additional financial resources, highlight the relevance of the results presented here, which detail and discuss the selection and training of a sensory panel.


It was possible to select and train a panel for the rancid flavor. The seven assessors were able to distinguish the target products through triangle discriminative tests and unstructured and hedonic scales, with the data analyzed by ANOVA in addition to Wald Sequential Analysis, box plots and bar graphs of F values. This study is useful for other studies that require trained panels, especially when training is necessary, because it demonstrates clearly and reproducibly, step by step, how to obtain a trained panel.
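As an illustration of how Wald Sequential Analysis can be applied to triangle test results during assessor selection, a minimal sketch is given below. It is not the procedure or the parameter set used in this study: the function names are hypothetical, and the values p0 = 1/3 (chance level in a triangle test), p1 = 2/3, alpha = 0.05 and beta = 0.05 are assumptions chosen only to keep the arithmetic simple. The sketch follows the usual sequential probability ratio formulation, in which the cumulative number of correct answers is compared with two parallel decision lines.

```python
import math


def wald_boundaries(p0, p1, alpha, beta):
    """Return (slope, lower_intercept, upper_intercept) of the decision lines.

    p0: highest proportion of correct answers still considered unacceptable
    p1: lowest proportion of correct answers considered acceptable
    alpha: risk of accepting an assessor whose true ability is p0
    beta: risk of rejecting an assessor whose true ability is p1
    """
    k = math.log(p1 * (1 - p0) / (p0 * (1 - p1)))
    slope = math.log((1 - p0) / (1 - p1)) / k
    lower = math.log(beta / (1 - alpha)) / k
    upper = math.log((1 - beta) / alpha) / k
    return slope, lower, upper


def classify_assessor(results, p0=1 / 3, p1=2 / 3, alpha=0.05, beta=0.05):
    """Classify one assessor from a sequence of triangle test outcomes.

    results: iterable of booleans, True for a correct identification.
    Returns (decision, number_of_tests_used), where decision is
    "accept", "reject" or "continue" (no line crossed yet).
    All default parameter values are illustrative assumptions.
    """
    slope, lower, upper = wald_boundaries(p0, p1, alpha, beta)
    correct = 0
    n = 0
    for n, hit in enumerate(results, start=1):
        correct += int(hit)
        if correct >= upper + slope * n:
            return "accept", n
        if correct <= lower + slope * n:
            return "reject", n
    return "continue", n


# Hypothetical outcomes for three candidate assessors (not data from this study)
print(classify_assessor([True] * 8))
print(classify_assessor([False] * 8))
print(classify_assessor([True, False] * 3))
```

With these assumed parameters, a candidate who answers every test correctly crosses the acceptance line at the fifth test, one who fails every test crosses the rejection line at the fifth test, and one who alternates remains in the indecision region and must continue testing. Different choices of p0, p1, alpha and beta shift the lines and therefore the number of tests required.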


The authors are grateful to the National Council for Scientific and Technological Development – CNPq – Brazil (Universal Process no. 456102/2014-0).



Ares G, Jaeger SR, Antúnez L, Vidal L, Giménez A, Coste B, Picallo A, Castura JC. 2015. Comparison of TCATA and TDS for dynamic sensory characterization of food products. Food Res. Int. 78, 148–158.
ASTM. 2010. Standard Test Method for Sensory Analysis - Triangle Test. ASTM Int. 1–8.
Banfield CF, Harries JM. 1975. A technique for comparing judges’ performance in sensory tests. J. Food Technol. 10, 1–10.
Braghieri A, Piazzolla N, Carlucci A, Monteleone E, Girolami A, Napolitano F. 2012. Development and validation of a quantitative frame of reference for meat sensory evaluation. Food Qual. Prefer. 25, 63–68.
BRASIL. 1996. Diretrizes e normas regulamentadoras de pesquisas envolvendo os seres humanos, in: CNS/MS 196/96. Diário Oficial da República Federativa do Brasil, Brasília, DF: Ministério da Saúde.
Borrás E, Mestres M, Aceña L, Busto O, Ferré J, Boqué R, Calvo A. 2015. Identification of olive oil sensory defects by multivariate analysis of mid infrared spectra. Food Chem. 187, 197–203.
Campo MM, Nute GR, Hughes SI, Enser M, Wood JD, Richardson RI. 2006. Flavour perception of oxidation in beef. Meat Sci. 72, 303–311.
Coli MS, Gil A, Rangel P, Souza ES, Oliveira MF, Cristina A, Chiaradia N. 2015. Chloride concentration in red wines: influence of terroir and grape type. Food Sci. Technol. 35, 95–99.
Corbo MR, Speranza B, Filippone A, Granatiero S, Conte A, Sinigaglia M, Del Nobile MA. 2008. Study on the synergic effect of natural compounds on the microbial quality decay of packed fish hamburger. Int. J. Food Microbiol. 127, 261–267.
Del Nobile MA, Corbo MR, Speranza B, Sinigaglia M, Conte A, Caroprese M. 2009. Combined effect of MAP and active compounds on fresh blue fish burger. Int. J. Food Microbiol. 135, 281–287.
Dutcosky SD. 2007. Análise sensorial de alimentos, 2nd ed. Champagnat, Curitiba.
Elortondo FJP, Ojeda M, Albisu M, Salmerón J, Etayo I, Molina M. 2007. Food quality certification: An approach for the development of accredited sensory evaluation methods. Food Qual. Prefer. 18, 425–439.
Etaio I, Albisu M, Ojeda M, Gil PF, Salmerón J, Pérez Elortondo FJ. 2010. Sensory quality control for food certification: A case study on wine. Panel training and qualification, method validation and monitoring. Food Cont. 21, 542–548.
González MM, Navarro T, Gómez G, Pérez RA, de Lorenzo C. 2007. Sensory assessment of table olive: I. Set up of a panel test and use of standarised scales. Grasas Aceites 58.
Hough G, Fiszman S. 2005. Estimación de la vida útil sensorial de los alimentos, 1st ed. Martín Impresores, S. L., Valencia, Spain.
Ibrahim HMA. 2001. Acceleration of curing period of pastrami manufactured from buffalo meat: II-Fatty acids, amino acids, nutritional value and sensory evaluation. Grasas Aceites 52.
ISO. 2004. Sensory analysis - Methodology - Sequential analysis, in: ISO 16820:2004. ANSI, New York, p. 10.
Kamruzzaman M, ElMasry G, Sun DW, Allen P. 2013. Non-destructive assessment of instrumental and sensory tenderness of lamb meat using NIR hyperspectral imaging. Food Chem. 141, 389–396.
Latreille J, Mauger E, Ambroisine L, Tenenhaus M, Vincent M, Navarro S, Guinot C. 2006. Measurement of the reliability of sensory panel performances. Food Qual. Prefer. 17, 369–375.
Lea P, Rødbotten M, Næs T. 1995. Measuring validity in sensory analysis. Food Qual. Prefer. 6, 321–326.
Lee E, Choe E. 2012. Changes in oxidation-derived off-flavor compounds of roasted sesame oil during accelerated storage in the dark. Biocatal. Agric. Biotechnol. 1, 89–93.
López-Aguilar JR, Valerio-Alfaro G, Monroy-Rivera JA, Medina-Juárez LA, O’Mahony M, Angulo O. 2007. Evaluation of a simple and sensitive sensory method for measuring rancidity in soybean oils. Grasas Aceites 57, 149–154.
Mitterer-Daltoé ML, Treptow R, De O, Martins E, Martins VM V, Queiroz MI. 2012. Selecting and Training a Panel to Evaluate the Metallic Sensation of Meat. Food Sci. Technol. Res. 18, 279–286.
Næs T, Brockhoff PB, Tomic O. 2010. Statistics for Sensory and Consumer Science, Wiley, Chichester.
Santana LRR, Santos LCS, Natalicio MA, Mondragon-Bernals OL, Elias EM, Silva CB, Zepka LQ, Martins ISL, Vernaza MG, Castillo-Pizarro C, Bolini HMA. 2006. Perfil sensorial de iogurte light, sabor pêssego. Ciência e Tecnol. Aliment. 26, 619–625.
Sinesio F, Risvik E, Rødbotten M. 1990. Evaluation of panelist performance in descriptive profiling of rancid sausages: a multivariate study. J. Sens. Stud. 5, 33–52.
Soares KMP, Gonçalves AA. 2012. Qualidade e segurança do pescado. Rev. Inst. Adolfo Lutz 71, 1–10.
Wang Q, Lonergan SM, Yu C. 2012. Rapid determination of pork sensory quality using Raman spectroscopy. Meat Sci. 91, 232–239.
Williamson DF, Parker RA, Kendrick JS. 1989. The box plot: A simple visual method to interpret data. Ann. Intern. Med. 110, 916–921.
Wu T, Mao L. 2008. Influences of hot air drying and microwave drying on nutritional and odorous properties of grass carp (Ctenopharyngodon idellus) fillets. Food Chem. 110, 647–653.

Copyright (c) 2017 Consejo Superior de Investigaciones Científicas (CSIC)

This work is licensed under a Creative Commons Attribution 4.0 International License.
