Olive mill wastewater characteristics: modelling and statistical analysis

We present a synthesis of the work carried out over the last 50 years on the characterisation of olive mill wastewater (OMW). We compiled the published data and sought correlations between the OMW data and the phenolic compound content. This allows the characteristics of an OMW to be determined from a single measurement: the concentration of phenolic compounds. We propose two models, one based on data from six countries and a second applied only to Portugal. Statistical analysis of the correlations obtained indicates that the chemical oxygen demand of a given OMW is a second-degree polynomial function of its phenolic compound concentration. The significance of this correlation was tested by multivariable analysis of variance (ANOVA), and the distribution of the residuals and their mean values were also evaluated at 95 and 99% confidence levels. This work will assist the future design of OMW treatment plants, as well as their operation and control.


INTRODUCTION
Olive mill wastewater (OMW) is generated in the production of olive oil. Its treatment is a major environmental problem in Mediterranean countries, where the generation rate is very high and concentrated in a short period of time (November to February). Annual OMW production is estimated to exceed 30 × 10⁶ m³ (Hamdi, 1993a; Yesilada et al., 1995; Paredes et al., 1996), despite efforts to implement clean, two-phase extraction technology.
The composition of OMW is very variable and depends on the olive variety, the ripeness of the fruit and the extraction process (press or centrifuge) (Lopez & Ramos-Cormenzana, 1996).
In the literature survey, most of the articles found reported Chemical Oxygen Demand (COD) determinations, which ranged from 1.9 to 220 kg m⁻³ (Alba, 1994; Passarinho, 2002). The Biochemical Oxygen Demand (BOD5) is less commonly determined, but was found to vary from 16.0 to 93.5 kg m⁻³ (Fernandéz et al., 1989; Saviozi et al., 1991). Likewise, Total Solids (TS) varied from 5.9 to 103.2 kg m⁻³, Volatile Solids (VS) from 2.4 to 89.9 kg m⁻³ (Alba, 1994; Hamdi, 1993b), and the polyphenol content (PhC) from 0.1 to 17.5 kg m⁻³ (Alba, 1994; Hamdi, 1993a). The high content of organic matter and polyphenols, together with the very large volumes produced and the seasonality of the industry, has led to considerable pollution and has limited the application of conventional wastewater treatment methods (Yesilada et al., 1995).
In this article we present a compilation of all the data we could find from OMW studies published over the last 50 years. From this survey, only studies in which the polyphenol concentration (PhC) was expressed in caffeic acid equivalents were selected. Characterisations of Portuguese OMWs were also included. The final compilation comprises 85 different evaluations of OMW composition from more than six Mediterranean countries.
The data are summarised and correlations between the most commonly measured parameters are sought. A mathematical model relating COD to PhC is obtained and tested. The aim is to estimate the characteristics of an OMW from one simple measurement, the phenolic compound concentration, with the eventual goal of using this estimate in the planning, operation and monitoring of OMW treatment plants.

EXPERIMENTAL PROCEDURE
A literature survey was carried out and the parameters usually used to characterise OMW were selected. The most common parameters found were COD, BOD5, PhC, TS and VS, with PhC and COD by far the most frequently determined. The Portuguese OMW characterisation was carried out by determining these parameters according to Standard Methods (APHA, 1995).

Folin-Denis Method
The PhC determination was based on oxidation-reduction reactions between the phenolic compounds and metallic ions, adapted from the method described by Maestro-Durán et al. (1991). Caffeic acid was used as the reference substance, so results were expressed as milligrams of caffeic acid equivalent.
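The Folin-Denis reaction yields a coloured product whose absorbance is read spectrophotometrically, so "caffeic acid equivalent" means that a sample's absorbance is converted to a concentration through a calibration line prepared with caffeic acid standards. The Python sketch below illustrates that conversion only; the standard concentrations, absorbances and dilution factor are hypothetical, the function names are illustrative, and the authors' actual calibration details are not given in the text.

```python
def fit_calibration(conc, absorbance):
    """Ordinary least-squares calibration line: absorbance = slope * conc + intercept."""
    n = len(conc)
    mx = sum(conc) / n
    my = sum(absorbance) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, my - slope * mx


def caffeic_acid_equivalent(sample_abs, slope, intercept, dilution=1.0):
    """Invert the calibration line and undo the sample dilution.

    The result is in the same units as the caffeic acid standards.
    """
    return (sample_abs - intercept) / slope * dilution


# Hypothetical caffeic acid standards (kg m^-3) and their absorbances
standards = [0.0, 0.1, 0.2, 0.3]
absorbances = [0.01, 0.21, 0.41, 0.61]
slope, intercept = fit_calibration(standards, absorbances)

# Hypothetical OMW sample read after a 1:50 dilution
print(caffeic_acid_equivalent(0.41, slope, intercept, dilution=50))
```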

RESULTS AND DISCUSSION
The modelling of OMW characteristics followed two strategies. First, an exhaustive literature survey yielded 45 OMW characterisations, which were used to develop the model. Second, the proposed model was validated by introducing 40 characterisations of Portuguese OMW collected over the last 8 years in traditional and continuous olive mills.

Grasas y Aceites
It is important to take into account that the values collected in the literature survey (Table I) concern different source countries, fruit varieties, degrees of ripeness and extraction systems (pressing and centrifuging). The Portuguese data span the same variables, except that the source country was necessarily the same, as were the analytical procedures used for OMW characterisation, described above.
Note that, for the modelling work, besides the Portuguese values found in the literature, experimental values for Portuguese OMW from the 2002/2003 campaign were also considered. These values are summarised in Table II.
The first step was to compile all the data, after which some useful ratios were calculated from the average values of COD, BOD, PhC, TS and VS, as shown in Table III.
It is worth highlighting the good agreement between the standard deviations of the ratios found in the literature and those determined for the Portuguese case. The average BOD represents 40-49% of the average COD, the average PhC about 5%, and the average TS and VS 55-66%. The ratio of VS to TS is roughly 70-80%.
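As a rough illustration of how these average ratios could be used, the sketch below estimates COD and a BOD range from a single PhC value, and a VS range from a measured TS value. It is not the regression model developed later: it simply inverts the average ratios quoted above, which are ratios of averages and may not hold for an individual sample. The TS and VS fractions of COD are left out because the text gives them only as a shared range. The function names are hypothetical.

```python
# Average ratios quoted for Table III; ratios of averages give only rough estimates
PHC_FRACTION_OF_COD = 0.05          # average PhC is about 5% of average COD
BOD_FRACTION_OF_COD = (0.40, 0.49)  # average BOD is 40-49% of average COD
VS_FRACTION_OF_TS = (0.70, 0.80)    # VS is roughly 70-80% of TS


def estimate_from_phc(phc):
    """Rough COD estimate and BOD range (kg m^-3) from a PhC value (kg m^-3)."""
    cod = phc / PHC_FRACTION_OF_COD
    bod_range = tuple(f * cod for f in BOD_FRACTION_OF_COD)
    return cod, bod_range


def estimate_vs_from_ts(ts):
    """Rough VS range (kg m^-3) from a measured TS value (kg m^-3)."""
    return tuple(f * ts for f in VS_FRACTION_OF_TS)


# Hypothetical OMW with PhC = 4 kg m^-3
cod, bod = estimate_from_phc(4.0)
print(f"COD ~ {cod:.0f} kg m^-3, BOD ~ {bod[0]:.0f}-{bod[1]:.0f} kg m^-3")
```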
During olive oil extraction, the phenolic compounds reach a heterogeneous equilibrium and partition between the oil and aqueous phases according to their affinities. Phenolic compounds in olive oil are credited with a beneficial effect on consumers' health because of their antioxidant activity as free-radical scavengers; free radicals are implicated in chronic diseases (Luchetti, 2001). The phenolic compounds remaining in the aqueous phase have antimicrobial and phytotoxic effects; they are the most recalcitrant compounds in OMW (Ranalli, 1991; Fadil et al., 2003) and one of the factors limiting the efficiency of conventional treatments, both chemical (Chackchouk et al., 1994) and biological (Hamdi & Garcia, 1991; Borja et al., 1995; Beccari et al., 1996).

Model development
The literature values and the Portuguese values were considered separately (Figure 1-A) and together (Figure 1-B), and were fitted with a second-degree polynomial function. The coefficient of determination (R²) was 0.4229 for the literature values, considerably higher at 0.7976 for the Portuguese values, and 0.5401 for all values together.
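The second-degree fit and its R² can be reproduced with ordinary least squares. The pure-Python sketch below solves the normal equations for Y = β2x² + β1x + β0 and computes R² = 1 - SS_res/SS_tot. It is a generic illustration, not the software used by the authors, and the (PhC, COD) pairs are hypothetical placeholders rather than the compiled data.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b2*x**2 + b1*x + b0; returns (b2, b1, b0)."""
    # Normal equations (X^T X) b = X^T y for design-matrix columns [1, x, x^2]
    s = [sum(xi ** k for xi in x) for k in range(5)]                  # sums of x^0 .. x^4
    t = [sum(xi ** k * yi for xi, yi in zip(x, y)) for k in range(3)]  # sums of x^k * y
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]      # augmented 3x4 matrix
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[col])]
    b0, b1, b2 = (a[i][3] / a[i][i] for i in range(3))
    return b2, b1, b0


def r_squared(x, y, coeffs):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    b2, b1, b0 = coeffs
    pred = [b2 * xi ** 2 + b1 * xi + b0 for xi in x]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical (PhC, COD) pairs in kg m^-3, for illustration only
phc = [0.5, 1.5, 2.5, 3.5, 5.0, 7.0]
cod = [20.0, 45.0, 60.0, 85.0, 95.0, 110.0]
coeffs = fit_quadratic(phc, cod)
print("b2, b1, b0 =", coeffs, " R^2 =", round(r_squared(phc, cod, coeffs), 4))
```

With NumPy available, `numpy.polyfit(x, y, 2)` returns the same coefficients (highest degree first); the explicit version is shown only to make the normal equations visible.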
To define the boundaries of the model, tables of absolute frequencies of PhC and COD values were constructed, using class intervals of 1 kg m⁻³ and 20 kg m⁻³, respectively. Bar charts are shown in Figure 2, and the intervals are given in Table IV.
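A minimal sketch of such an absolute-frequency table, assuming half-open class intervals [a, a + w) starting at zero (the text does not state how boundary values were assigned); `frequency_table` is a hypothetical helper and the values are illustrative.

```python
import math


def frequency_table(values, width, lower=0.0):
    """Absolute frequencies in half-open classes [lower + i*width, lower + (i+1)*width).

    Only occupied classes are reported; empty intervals are omitted.
    """
    counts = {}
    for v in values:
        i = math.floor((v - lower) / width)
        counts[i] = counts.get(i, 0) + 1
    # Report each occupied class by its (lower, upper) bounds, in increasing order
    return {(lower + i * width, lower + (i + 1) * width): n for i, n in sorted(counts.items())}


# Hypothetical PhC values (kg m^-3) with 1 kg m^-3 classes, COD values with 20 kg m^-3 classes
print(frequency_table([0.5, 0.7, 3.2, 3.9, 7.1], 1.0))
print(frequency_table([45.0, 62.0, 118.0], 20.0))
```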
For PhC (Figure 2-A), the values are concentrated between 0 and 8 kg m⁻³. The distribution between 0 and 6 kg m⁻³ appears approximately uniform, given the expected relative statistical fluctuation of 1/√N, where N is the number of values in the interval. This range would define the normal working limits of an OMW treatment plant.
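The uniformity argument can be checked numerically: a bin with N values carries a counting fluctuation of about √N, i.e. a relative fluctuation of 1/√N. The sketch below flags bins whose deviation from the mean bin count exceeds k·√N. Comparing against the mean count and using k = 1 are assumptions made for illustration, not a procedure stated by the authors; the counts are hypothetical.

```python
import math


def consistent_with_uniform(counts, k=1.0):
    """Flag, per bin, whether its count agrees with the mean count within k*sqrt(N).

    For N > 0, |N - mean| <= k*sqrt(N) is the same as a relative deviation
    |N - mean| / N <= k / sqrt(N), i.e. the 1/sqrt(N) fluctuation quoted in the text.
    """
    mean = sum(counts) / len(counts)
    return [abs(n - mean) <= k * math.sqrt(n) for n in counts]


# Hypothetical counts for six 1 kg m^-3 PhC classes between 0 and 6 kg m^-3
print(consistent_with_uniform([12, 9, 10, 13, 8, 10]))
```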
There is a clear but small decrease in the number of values in the range 6 to 8 kg m⁻³, followed by a very sharp fall: of the 85 values, only 8 lie in the range 8 to 18 kg m⁻³. To explain these higher values, a link with the extraction system was sought, but no plausible explanation was found, so they may be attributed to statistical fluctuation. There are noticeable peaks in the PhC values in the ranges 0 to 1 and 3 to 4 kg m⁻³, but only the lower peak may be considered statistically significant. Portuguese OMWs with PhC in this interval had largely been collected at the outlet of the centrifuge of a two-phase extraction system, where the olive oil is separated. Two-phase systems have two exit lines: one for the olive husk, a combination of olive pulp and stone with the olive vegetation water, and one for the olive oil. To increase the olive oil extraction yield, water, often called olive oil washing water, is injected into the centrifuge, giving a dilute OMW. The current tendency in olive mills is to convert to two-phase extraction systems, which could explain this peak in PhC absolute frequency.
In Figure 2-B the distribution is approximately Gaussian, with COD values from 40 to 100 kg m⁻³ occurring most frequently. This range is crucial for the conception, design, scale-up and optimisation of OMW treatment plants.
A matrix combining the PhC and COD absolute frequencies was constructed (Table V). It clearly shows a correlation between PhC and COD values: for the lower values, highlighted in dark grey in Table V, the correlation is clearly linear, whereas for higher PhC values COD increases more slowly.
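Table V is a two-way table of absolute frequencies. Assuming the class widths used for Table IV (1 kg m⁻³ for PhC, 20 kg m⁻³ for COD) and intervals starting at zero, such a matrix could be built as follows; the helper name and the paired values are hypothetical.

```python
import math
from collections import Counter


def joint_frequency(phc, cod, phc_width=1.0, cod_width=20.0):
    """Count (PhC class, COD class) pairs; class i covers [i*width, (i+1)*width)."""
    cells = Counter()
    for p, c in zip(phc, cod):
        cells[(math.floor(p / phc_width), math.floor(c / cod_width))] += 1
    return cells


# Hypothetical paired measurements in kg m^-3
table = joint_frequency([0.5, 0.8, 3.5, 3.7], [15.0, 25.0, 70.0, 75.0])
print(sorted(table.items()))  # ((PhC class index, COD class index), count)
```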
The low frequency of high PhC values, together with the absence of values in some intervals at high PhC concentrations, led us to base the model only on PhC concentrations up to 8 kg m⁻³, as presented in Figure 3 A-B.
The main feature of these representations is that, in this range, both the second-degree polynomial and the linear function fit well, with very similar R² values. However, when PhC increases beyond this range, COD does not increase correspondingly, and a saturation point is reached. This behaviour is reproduced over the full range of values by the second-degree polynomial (Figure 1).
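The saturation described above can be located explicitly for the general quadratic Y = β2x² + β1x + β0. Assuming β2 < 0 (a concave fit, as the slower increase of COD at high PhC implies), the fitted COD is maximal where its derivative vanishes. The fitted coefficients are not reproduced in this text, so only the general result is given:

```latex
\frac{d\,\mathrm{COD}}{d\,\mathrm{PhC}} = 2\beta_2\,\mathrm{PhC} + \beta_1 = 0
\quad\Longrightarrow\quad
\mathrm{PhC}^{*} = -\frac{\beta_1}{2\beta_2},
\qquad
\mathrm{COD}_{\max} = \beta_0 - \frac{\beta_1^{2}}{4\beta_2}
```

Beyond PhC*, a quadratic predicts a decreasing COD, so the polynomial should be read as describing saturation within the range of the data rather than extrapolated past it.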
Although linearity is observed for low PhC concentrations, the second-degree polynomial model is preferable because it fits well for high and low concentrations.

Test for the Significance of the Regressions
R² gives the proportion of the variability in the data explained by the regression model. The regressions presented in Figure 1 account for 42.29%, 79.76% and 54.01% of the variability in the data for the literature values, the Portuguese values and all values together, respectively. However, a large value of R² does not necessarily imply a good model, because R² does not measure the statistical significance of a regression. For example, a straight line fitted to two points has an R² of 1 but no statistical significance. To assess the adequacy of these empirical models, an ANOVA table and a residual analysis were prepared for each data set (literature, Portuguese and all values). The models are summarised in Figure 4.
There is good agreement between the regressions for the literature values and for all values together.
The regressions obtained for the literature values, the Portuguese values and all the values together are equations (1), (2) and (3), respectively, summarised in Figure 4. A test for significance of regression, as suggested by Montgomery and Runger (1999), was applied to each of them. For a regression of the type

$$Y = \beta_2 x^2 + \beta_1 x + \beta_0$$

this test determines whether a linear relationship exists between the response variable $y$ (COD) and a subset of the regressor variables $x$ (PhC) and $x^2$ (PhC$^2$). The appropriate hypotheses are:

$$H_0:\ \beta_1 = \beta_2 = 0 \qquad (4)$$

$$H_1:\ \beta_j \neq 0 \ \text{for at least one } j \qquad (5)$$

Rejection of $H_0$ implies that at least one of the regressor variables contributes significantly to the model.
The parameters obtained from a multivariable analysis are given in Table VI.
Since the P-value is considerably smaller than α = 0.05, we reject the null hypothesis and conclude that COD is linearly related to PhC, to PhC², or to both.
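The quantities behind a significance-of-regression ANOVA table can be computed from the measured and predicted COD values. The sketch below follows the standard decomposition described by Montgomery and Runger, with k = 2 regressors (PhC and PhC²) for the quadratic model. It assumes a least-squares fit that includes an intercept; the function name and the COD values are hypothetical. The P-value is the upper-tail probability of the F distribution with (k, n - k - 1) degrees of freedom, which with SciPy could be obtained as `scipy.stats.f.sf(F, k, n - k - 1)`; H0 is rejected when it falls below α = 0.05.

```python
def regression_anova(y, y_pred, k):
    """Return (SSR, SSE, F) for a least-squares fit with intercept and k regressors.

    SSR = sum((y_pred - mean(y))**2), SSE = sum((y - y_pred)**2),
    F = (SSR / k) / (SSE / (n - k - 1)).
    """
    n = len(y)
    y_mean = sum(y) / n
    ssr = sum((yp - y_mean) ** 2 for yp in y_pred)          # regression sum of squares
    sse = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))  # error (residual) sum of squares
    f_stat = (ssr / k) / (sse / (n - k - 1))
    return ssr, sse, f_stat


# Hypothetical measured and predicted COD values (kg m^-3); k = 2 for PhC and PhC^2
measured = [30.0, 52.0, 61.0, 80.0, 88.0, 97.0]
predicted = [33.0, 48.0, 63.0, 77.0, 89.0, 98.0]
print(regression_anova(measured, predicted, k=2))
```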
Further tests of model adequacy, such as residual analysis, are required before the model can be used with confidence in practice.

Residual Analysis
Standardized residuals from the multiple regression model are defined by

$$d_i = \frac{e_i}{\sqrt{MSE}} \qquad (6)$$

where $e_i = \mathrm{COD}_{i,\text{measured}} - \mathrm{COD}_{i,\text{predicted}}$ and MSE is the mean square error.

Standardized residuals were calculated and plotted against the predicted COD values, as shown in Figure 5. Visual inspection indicates that the residuals are independently distributed. The mean residual values are -5.34 × 10⁻² ± 0.29, 4.48 × 10⁻⁵ ± 0.31 and -2.95 × 10⁻⁵ ± 0.21 for the literature values, the Portuguese values and all values, respectively, at a 95% confidence level, and -5.34 × 10⁻² ± 0.39, 4.48 × 10⁻⁵ ± 0.42 and -2.95 × 10⁻⁵ ± 0.28 at a 99% confidence level. In all cases the mean residual values are almost zero, supporting the use of a second-degree polynomial. As the regressions found for the literature and the Portuguese values are similar, it makes sense to consider only one model, accounting for all of the values.
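Equation (6) and the confidence intervals on the mean residual can be sketched as follows. The text does not state how the intervals were computed, so this version assumes a normal approximation (z = 1.960 and 2.576) applied to the standard error of the mean of the standardized residuals, and MSE = SSE/(n - p) with p = 3 fitted parameters. Function names and data are hypothetical.

```python
import math
import statistics

# Two-sided normal quantiles (assumption: the source does not say whether z or t was used)
Z = {0.95: 1.960, 0.99: 2.576}


def standardized_residuals(measured, predicted, n_params=3):
    """d_i = e_i / sqrt(MSE), with e_i = measured - predicted and MSE = SSE / (n - p)."""
    e = [m - p for m, p in zip(measured, predicted)]
    mse = sum(ei ** 2 for ei in e) / (len(e) - n_params)
    return [ei / math.sqrt(mse) for ei in e]


def mean_with_ci(d, level=0.95):
    """Mean of the standardized residuals and the half-width of its confidence interval."""
    half_width = Z[level] * statistics.stdev(d) / math.sqrt(len(d))
    return statistics.mean(d), half_width


# Hypothetical measured and predicted COD values (kg m^-3)
d = standardized_residuals([30.0, 52.0, 61.0, 80.0, 88.0, 97.0],
                           [33.0, 48.0, 63.0, 77.0, 89.0, 98.0])
for level in (0.95, 0.99):
    mean, hw = mean_with_ci(d, level)
    print(f"{level:.0%}: mean residual {mean:.3f} +/- {hw:.3f}")
```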
Both the residual distribution analysis and the mean residual values, with errors calculated at the 95 and 99% confidence levels, indicate that the model is most accurate when all of the values are used; equation (3) is therefore preferred.
However, when a Portuguese OMW is being considered, equation 2 may be preferred, as it refers specifically to this case, and the error is only marginally higher than for equation 3.

CONCLUSIONS AND FUTURE WORK
Despite the fact that the OMW characteristics were obtained from different countries, with different olive varieties, from different years, from mills with different extraction processes and were obtained using different techniques in different laboratories, a good correlation between COD and PhC is found using a second-degree polynomial.
The model developed may be useful not only in the conception of OMW treatment plants but also for their monitoring, allowing less time to be spent on analysis.
The multivariable analysis gave P-values considerably smaller than α = 0.05, visual residual analysis showed independently distributed residuals, and the mean residual values were close to zero at the 95 and 99% confidence levels; on this basis, the models proposed in this study were validated. In the future the models will be improved by including more OMW characterisations, and their optimisation will be pursued.
Figure 2 Histograms of A) PhC and B) COD absolute frequencies

Figure 3 Linear and second-degree polynomial functions applied to A) literature and Portuguese values and B) all values together, for PhC concentrations up to 8 kg m⁻³

Figure 5 Standardized residuals from multiple regression model against COD predicted values for A) literature B) Portuguese and C) all values together

Table II Portuguese OMW characterisation from 6 olive mills operating in the 2002/2003 campaign (Olive mill I, Olive mill II, Olive mill III, Olive mill IV)
nd: not determined.

Table III Useful ratios of BOD, PhC, TS and VS to COD, and of VS to TS
Figure 1 Second-degree polynomial function applied to A) literature and Portuguese values separately and B) literature and Portuguese values together