An exploration of the prices of avocados, what has influenced their prices, and what their prices will be in the future.
This project is mostly an opportunity for me to try out some custom themes from my new package DGThemes, but it is also an interesting exploration of avocado prices and their influences, and practice in forecasting. The main statistical exploration tries to find recurring seasonal patterns and identify their causes. There is clear evidence for such patterns in the numbers, with logical real-world support for these price fluctuations as well.
It looks like avocados cost more when they are organic, but not that much more!
Very few avocados sold are organic; a surprisingly large share, 97.2% by average volume, is conventional.
Type | Average Volume | Percent |
---|---|---|
Organic | 47,811.21 | 2.8% |
Conventional | 1,653,212.90 | 97.2% |
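The percentages in the table follow directly from the two average volumes. As a quick check, here is a minimal Python sketch (not the R code used for the original analysis) that recomputes each type's share of the total; the variable names are my own.

```python
# Average volumes taken from the table above.
avg_volume = {"Organic": 47_811.21, "Conventional": 1_653_212.90}

total = sum(avg_volume.values())

# Share of the total average volume per type, as a percentage.
share = {kind: 100 * vol / total for kind, vol in avg_volume.items()}

for kind, pct in share.items():
    print(f"{kind}: {pct:.1f}%")
# Organic: 2.8%
# Conventional: 97.2%
```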
The two plots below are two different ways of looking at organic and conventional prices over time. It is clear that organic prices have consistently been higher and that the two series mostly mirror each other, except for one period in late 2016 when organic avocado prices spiked significantly relative to conventional avocados.
It appears that the peak monthly prices for conventional and organic avocados are in September and October.
The yearly distribution of prices makes this clearer.
You can actually see a difference if you zoom in on this, though: conventional avocados peak in price in October, while organic avocados appear to peak in September.
Does that pattern appear the same over every year though? It looks like it is a trend of generally increasing prices with peaks at the same times each year, although there is an outlier year for conventional avocados in 2015, where the price steadily decreased almost all year.
Here’s one more way to look at this. The prices are almost exactly mirroring each other.
This is perhaps the best way to look at monthly price change, but I threw a polar twist on there as well to make this a slightly more interesting graphic. The late summer to early fall months clearly show the largest price peaks.
The prices of avocados increased significantly in both years, although the percentage increase varies by month, and some months even show an overall decrease.
This is true for organic avocados as well as conventional avocados.
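To make the month-by-month comparison concrete, here is a minimal Python sketch of a year-over-year percent change. The prices are hypothetical placeholders, not values from the avocado dataset, and the function name is my own.

```python
def yoy_percent_change(prev_year, this_year):
    """Percent change for each month: 100 * (new - old) / old."""
    return [100 * (new - old) / old for old, new in zip(prev_year, this_year)]

# Hypothetical monthly average prices in USD (NOT taken from the dataset).
prices_2016 = [1.00, 1.10, 1.25, 1.50]
prices_2017 = [1.20, 1.10, 1.00, 1.80]

changes = yoy_percent_change(prices_2016, prices_2017)
print([round(c, 1) for c in changes])  # [20.0, 0.0, -20.0, 20.0]
```

A negative entry corresponds to a month where the price fell compared with the same month a year earlier.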
Fall prices are consistently higher than winter prices as has been observed previously.
Average monthly price looks like it mirrors volume, which makes sense: as supply decreases, price increases, in line with the law of supply and demand.
So far we know that 2017 clearly had the highest prices, that price increases for organic and conventional avocados tend to mirror each other, and that the lowest prices occurred in 2015.
Fun with polar again!
Another way of looking at seasonality trends is subseries plots, which are depicted below with:
- Blue lines: the mean avocado price for a specific month.
- Black lines: a way to see the fluctuations throughout each month.
The upward trend in price usually runs from June to September for organic avocados and from June to October for conventional avocados, with very similar trends. The graph also shows that each month has three peaks, and the highest peak falls in a different part of the month each time.
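The mean lines in a subseries plot are per-month averages taken across years. Here is a minimal Python sketch of that grouping step, assuming the data arrives as (month, price) pairs; the input values are hypothetical and the function name is my own.

```python
from collections import defaultdict
from statistics import mean

def monthly_means(observations):
    """Group (month, price) pairs by month and average each group."""
    by_month = defaultdict(list)
    for month, price in observations:
        by_month[month].append(price)
    return {month: mean(prices) for month, prices in sorted(by_month.items())}

# Hypothetical (month, price) pairs across three years (NOT the real data).
obs = [(9, 1.50), (9, 1.70), (9, 1.90), (10, 1.40), (10, 1.60), (10, 1.80)]
means = monthly_means(obs)
print({m: round(v, 2) for m, v in means.items()})  # {9: 1.7, 10: 1.6}
```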
Here I run some autocorrelations to look at the lags. They show a high autocorrelation until the third lag in March, telling us that there is a strong linear relationship between these weeks for the first quarter. There is much less evidence for a linear relationship in other months, leading to the conclusion that there are no consistent patterns that show a linear relationship with prices for either conventional or organic avocados.
Autocorrelation formula:
\(r_{k} = \frac{\sum_{i=1}^{N-k}(Y_{i} - \bar{Y})(Y_{i+k} - \bar{Y})} {\sum_{i=1}^{N}(Y_{i} - \bar{Y})^{2} }\)
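As a sanity check on the formula, here is a minimal Python sketch that computes \(r_{k}\) term by term. It uses 0-based indexing where the formula uses 1-based, and the example series is a toy sequence, not avocado prices.

```python
def autocorrelation(y, k):
    """Lag-k sample autocorrelation r_k, following the formula above."""
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    y_bar = sum(y) / n
    # Numerator: products of deviations that are k steps apart.
    numerator = sum((y[i] - y_bar) * (y[i + k] - y_bar) for i in range(n - k))
    # Denominator: total sum of squared deviations (zero for a constant series).
    denominator = sum((v - y_bar) ** 2 for v in y)
    return numerator / denominator

series = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data
r0 = autocorrelation(series, 0)
r1 = autocorrelation(series, 1)
print(r0, r1)  # 1.0 0.4
```

At lag 0 the numerator equals the denominator, so \(r_{0}\) is always 1; a constant series would make the denominator zero and is not handled here.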
The price autocorrelations suggest that the relationship is not linear, given the cubic pattern of the autocorrelation itself.
This has a more linear relationship, but the relationship is weak.
Each of the following models has key differences that make it more or less effective at forecasting.
Checking out the summary of the “best model” as given by auto.arima:
“ARIMA(0,1,1)(1,1,0)[12]”
ARIMA(0,1,0)(0,1,0)[12] : 26.09415
ARIMA(0,1,1)(0,1,0)[12] : 22.65824
ARIMA(0,1,2)(0,1,0)[12] : 24.31751
ARIMA(0,1,3)(0,1,0)[12] : Inf
ARIMA(0,1,4)(0,1,0)[12] : Inf
ARIMA(0,1,5)(0,1,0)[12] : Inf
ARIMA(1,1,0)(0,1,0)[12] : 21.36947
ARIMA(1,1,1)(0,1,0)[12] : 24.08884
ARIMA(1,1,2)(0,1,0)[12] : Inf
ARIMA(1,1,3)(0,1,0)[12] : Inf
ARIMA(1,1,4)(0,1,0)[12] : Inf
ARIMA(2,1,0)(0,1,0)[12] : 23.87939
ARIMA(2,1,1)(0,1,0)[12] : 27.59858
ARIMA(2,1,2)(0,1,0)[12] : Inf
ARIMA(2,1,3)(0,1,0)[12] : Inf
ARIMA(3,1,0)(0,1,0)[12] : 27.45598
ARIMA(3,1,1)(0,1,0)[12] : 32.10778
ARIMA(3,1,2)(0,1,0)[12] : Inf
ARIMA(4,1,0)(0,1,0)[12] : 32.03487
ARIMA(4,1,1)(0,1,0)[12] : Inf
ARIMA(5,1,0)(0,1,0)[12] : 35.72966
Best model: ARIMA(1,1,0)(0,1,0)[12]
ARIMA(0,1,0)(0,1,0)[12] : 37.55211
ARIMA(0,1,1)(0,1,0)[12] : Inf
ARIMA(0,1,2)(0,1,0)[12] : Inf
ARIMA(0,1,3)(0,1,0)[12] : Inf
ARIMA(0,1,4)(0,1,0)[12] : Inf
ARIMA(0,1,5)(0,1,0)[12] : Inf
ARIMA(1,1,0)(0,1,0)[12] : 33.97369
ARIMA(1,1,1)(0,1,0)[12] : Inf
ARIMA(1,1,2)(0,1,0)[12] : Inf
ARIMA(1,1,3)(0,1,0)[12] : Inf
ARIMA(1,1,4)(0,1,0)[12] : Inf
ARIMA(2,1,0)(0,1,0)[12] : 31.71183
ARIMA(2,1,1)(0,1,0)[12] : 35.5061
ARIMA(2,1,2)(0,1,0)[12] : Inf
ARIMA(2,1,3)(0,1,0)[12] : Inf
ARIMA(3,1,0)(0,1,0)[12] : 35.49753
ARIMA(3,1,1)(0,1,0)[12] : Inf
ARIMA(3,1,2)(0,1,0)[12] : Inf
ARIMA(4,1,0)(0,1,0)[12] : 39.99761
ARIMA(4,1,1)(0,1,0)[12] : 45.78658
ARIMA(5,1,0)(0,1,0)[12] : 45.41766
Best model: ARIMA(2,1,0)(0,1,0)[12]
Series: conv_ts
ARIMA(1,1,0)(0,1,0)[12]
Coefficients:
ar1
-0.6738
s.e. 0.2027
sigma^2 estimated as 0.1794: log likelihood=-8.18
AIC=20.37 AICc=21.37 BIC=21.79
Training set error measures:
ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 |
---|---|---|---|---|---|---|
0.009446519 | 0.2995327 | 0.1849857 | -2.174167 | 15.81608 | 0.6231098 | -0.1040902 |
Ljung-Box test
data: Residuals from ARIMA(1,1,0)(0,1,0)[12]
Q* = 4.7453, df = 5, p-value = 0.4477
Model df: 1. Total lags used: 6
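The numbers in the summary above can be cross-checked by hand. The Python sketch below recomputes AIC, AICc, and BIC from the printed log likelihood, and the Ljung-Box p-value from Q* and its degrees of freedom. Two values are assumptions on my part rather than something the output prints: k = 2 parameters (ar1 plus the innovation variance) and an effective sample size of n = 15, which is the value implied by the gap between AIC and AICc.

```python
import math

log_lik = -8.18  # log likelihood from the summary above
k = 2            # ASSUMED: ar1 plus the variance sigma^2
n = 15           # ASSUMED: effective sample size, inferred from AICc - AIC

aic = 2 * k - 2 * log_lik
aicc = aic + 2 * k * (k + 1) / (n - k - 1)
bic = aic + k * (math.log(n) - 2)

def chi2_sf_df5(x):
    """Upper-tail probability of a chi-squared variable with 5 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * (
        math.sqrt(x) + x ** 1.5 / 3
    )

# Ljung-Box: Q* = 4.7453 on 5 degrees of freedom.
p_value = chi2_sf_df5(4.7453)

print(round(aic, 2), round(aicc, 2), round(bic, 2), round(p_value, 3))
```

The results land within rounding distance of the reported AIC=20.37, AICc=21.37, BIC=21.79, and p-value = 0.4477; the small gaps come from the log likelihood being printed to only two decimals. A p-value this far above 0.05 means the test finds no evidence of leftover autocorrelation in the residuals.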
It seems pretty clear that the best model is the seasonal naive.
There is good evidence for the mean model being the best by this method of comparison.
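For readers unfamiliar with the two benchmark models mentioned here, this is a minimal Python sketch of a mean forecast and a seasonal naive forecast, compared by RMSE on a held-out stretch. The series is hypothetical with a season length of 4, so the outcome says nothing about which model wins on the avocado data; the function names are my own.

```python
import math

def mean_forecast(history, horizon):
    """Forecast every future point as the mean of the history."""
    avg = sum(history) / len(history)
    return [avg] * horizon

def seasonal_naive_forecast(history, horizon, period=12):
    """Forecast each future point as the value observed one season earlier."""
    if len(history) < period:
        raise ValueError("need at least one full season of history")
    last_season = history[-period:]
    return [last_season[h % period] for h in range(horizon)]

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical series with a season length of 4 (NOT the avocado data).
train = [1.0, 1.2, 1.6, 1.1, 1.1, 1.3, 1.7, 1.2]
test = [1.2, 1.4, 1.8, 1.3]

snaive = seasonal_naive_forecast(train, len(test), period=4)
flat = mean_forecast(train, len(test))

print("seasonal naive RMSE:", round(rmse(test, snaive), 3))
print("mean RMSE:", round(rmse(test, flat), 3))
```

On this toy series the seasonal naive forecast has the lower error because the data repeats a strong seasonal shape; with a weaker seasonal pattern the mean forecast can come out ahead.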
Residuals show the difference between the model and reality, which helps us understand accuracy. The scales on the y axis alone indicate that the model works much better for predicting prices for conventional avocados.