Introduction

This project is mostly an opportunity for me to try out some custom themes from my new package DGThemes, but it is also an exploration of avocado prices and their influences, and practice in forecasting. The main statistical question is whether there are recurring seasonal patterns and, if so, what causes them. There is clear evidence for seasonality in the numbers, and these price fluctuations have logical real-world explanations as well.

Summary Statistics

Frequency of Avocado Prices by Type

It looks like avocados cost more when they are organic, but not that much more!

By the Numbers

Almost no avocados are organic: a surprisingly large share, 97.2% by average volume, is conventional.

Type	Average Volume	Percent
Organic	47,811.21	2.8%
Conventional	1,653,212.90	97.2%
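
As a quick sanity check, the percentages in the table can be recomputed from the two average volumes. This is a minimal Python sketch rather than the code behind the table, and the variable names are only illustrative:

```python
# Average volumes copied from the table above.
avg_volume = {"Organic": 47_811.21, "Conventional": 1_653_212.90}

total = sum(avg_volume.values())

# Share of the combined average volume, in percent, rounded to one decimal.
share = {kind: round(100 * vol / total, 1) for kind, vol in avg_volume.items()}

for kind, pct in share.items():
    print(f"{kind}: {pct}%")
```

The rounded shares agree with the 2.8% and 97.2% reported in the table.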

Price over Time

The two plots below show organic and conventional prices over time in two different ways. It is clear that organic prices have been consistently higher and that the two series mostly mirror each other, except for one period in late 2016 when organic avocado prices spiked significantly relative to conventional avocados.

Monthly Prices

Organic and Conventional

It appears that the peak monthly prices for conventional and organic avocados are in September and October.

The yearly distribution of prices makes this more clear.

You can actually see a difference if you zoom in, though: conventional avocados peak in price in October, while organic avocados appear to peak in September.

Does that pattern hold every year, though? Prices generally trend upward with peaks at the same times each year, although 2015 is an outlier for conventional avocados: the price steadily decreased for almost the whole year.

Here’s one more way to look at this. The prices are almost exactly mirroring each other.

Zooming In

This is perhaps the best way to look at monthly price change, but I threw a polar twist on there as well to make this a slightly more interesting graphic. The late summer to early fall months clearly show the largest price peaks.

Price Increases Year-to-Year

The prices of avocados have increased significantly in both years, although the percentage increase varies by month, and some months even show an overall decrease.

This is true for organic avocados as well as conventional avocados.
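
The month-by-month comparison boils down to a percent change between the same month in consecutive years. Here is a small Python sketch of that arithmetic. The prices in it are made-up placeholders, not values from the dataset, and `pct_change` is just a helper name chosen for this example:

```python
def pct_change(previous, current):
    """Percent change from `previous` to `current`."""
    return 100 * (current - previous) / previous

# HYPOTHETICAL average prices for two months in consecutive years.
# These numbers are placeholders for illustration only.
prices_year_1 = {"Jan": 1.00, "Sep": 1.20}
prices_year_2 = {"Jan": 0.95, "Sep": 1.50}

yoy = {
    month: pct_change(prices_year_1[month], prices_year_2[month])
    for month in prices_year_1
}

for month, change in yoy.items():
    direction = "increase" if change > 0 else "decrease"
    print(f"{month}: {change:+.1f}% ({direction})")
```

With placeholder numbers like these, one month can show a decrease even when the year as a whole is more expensive, which is the kind of month-level variation described above.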

Seasonal Charting

Fall prices are consistently higher than winter prices as has been observed previously.

Monthly Analyses

Average monthly price looks like it mirrors volume, which makes sense: as supply decreases, price increases, just as the law of supply and demand predicts.

Higher Price Lower Volume or Lower Volume Higher Price?

Using Autoplot for Time Series Forecasting

So far we know that 2017 clearly had the highest prices, that price increases for organic and conventional avocados tend to mirror each other, and that the lowest prices occurred in 2015.

Polar plots

Fun with polar again!

Seasonal Subseries Plots

Another way of looking at seasonality is subseries plots, which are depicted below with:

- Blue lines: the mean price for a specific month.
- Black lines: the fluctuations throughout each month.

The upward trend in price appears to be between June - September most of the time for organic avocados, and June - October for conventional avocados, with very similar trends. The graph also shows that each month has three peaks, with different parts representing the highest peak of the month each time.

Stationary Time Series

Here I run some autocorrelations to look at the lags which show a high autocorrelation until the third lag in March, telling us that there is a strong linear relationship between these weeks for the first quarter. There is much less evidence for a linear relationship in other months, leading to the conclusion that there are no consistent patterns that show a linear relationship with prices for either conventional or organic avocados.

Autocorrelation formula:

\(r_{k} = \frac{\sum_{i=1}^{N-k}(Y_{i} - \bar{Y})(Y_{i+k} - \bar{Y})} {\sum_{i=1}^{N}(Y_{i} - \bar{Y})^{2} }\)
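
To make the formula concrete, here is a direct translation into plain Python. It is a sketch for illustration, not the code used for the plots, and the toy series is made up rather than taken from the avocado data:

```python
def autocorrelation(y, k):
    """Lag-k sample autocorrelation r_k, following the formula above."""
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    mean = sum(y) / n
    # Numerator: sum over i = 1..N-k of (Y_i - mean) * (Y_{i+k} - mean).
    numerator = sum((y[i] - mean) * (y[i + k] - mean) for i in range(n - k))
    # Denominator: sum over all N points of squared deviations from the mean.
    # (A constant series makes this zero, so the function would fail there.)
    denominator = sum((value - mean) ** 2 for value in y)
    return numerator / denominator

# Made-up toy series, NOT avocado prices.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
r0 = autocorrelation(toy, 0)  # lag 0 is always 1
r1 = autocorrelation(toy, 1)
print(r0, r1)
```

Note that the denominator always uses all N points while the numerator only has N - k terms, so r_k shrinks toward zero at long lags even for a strongly trending series.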

The cubic pattern of the autocorrelation plot itself indicates that the price correlations are not linear.

Weekly Autocorrelations

This has a more linear relationship, but there is little strength to the relationship.

Forecasting Methods

Smoothing Average

Naive and Drift Methods

Each of the following models has key differences that make it more or less effective at forecasting:

Average Method: This is literally the average of the three years, 2015, 2016 and 2017.
Naive Method: This just sets the forecast to the last value that was seen. If prices are above this line it indicates an upward trend, and the reverse if they are below it.
Seasonal Naive Method: This uses the last observed value from the same season of the previous year; what counts as a season depends on whether the data is monthly or weekly.
Drift Method: This is probably the best method here, since it extends the last observed value by the average change in the historical data. One downside is that it ignores seasonal patterns.
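
The four methods above are simple enough to sketch in a few lines of Python. This is an illustration of the textbook definitions, not the code that produced the plots. The function names, the toy series, and the season length `m` are all assumptions made for the example:

```python
def mean_forecast(y, h):
    """Average method: every forecast equals the historical mean."""
    return [sum(y) / len(y)] * h

def naive_forecast(y, h):
    """Naive method: every forecast equals the last observed value."""
    return [y[-1]] * h

def seasonal_naive_forecast(y, h, m):
    """Seasonal naive: repeat the last full season of length m (needs len(y) >= m)."""
    start = len(y) - m
    return [y[start + (i % m)] for i in range(h)]

def drift_forecast(y, h):
    """Drift: extend the last value by the average historical change per step."""
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return [y[-1] + (i + 1) * slope for i in range(h)]

# Made-up toy series with a season length of 4, NOT avocado prices.
toy = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0]
h = 2

print("mean:          ", mean_forecast(toy, h))
print("naive:         ", naive_forecast(toy, h))
print("seasonal naive:", seasonal_naive_forecast(toy, h, m=4))
print("drift:         ", drift_forecast(toy, h))
```

The drift forecast is just a straight line through the first and last observations, which is why it picks up the overall trend but cannot reproduce the seasonal peaks.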

ARIMA Model

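Checking out the summary of the “best model” as given by auto.arima. According to the trace output below, the best model is ARIMA(1,1,0)(0,1,0)[12] for the first series (conv_ts) and ARIMA(2,1,0)(0,1,0)[12] for the second.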


 ARIMA(0,1,0)(0,1,0)[12]                    : 26.09415
 ARIMA(0,1,1)(0,1,0)[12]                    : 22.65824
 ARIMA(0,1,2)(0,1,0)[12]                    : 24.31751
 ARIMA(0,1,3)(0,1,0)[12]                    : Inf
 ARIMA(0,1,4)(0,1,0)[12]                    : Inf
 ARIMA(0,1,5)(0,1,0)[12]                    : Inf
 ARIMA(1,1,0)(0,1,0)[12]                    : 21.36947
 ARIMA(1,1,1)(0,1,0)[12]                    : 24.08884
 ARIMA(1,1,2)(0,1,0)[12]                    : Inf
 ARIMA(1,1,3)(0,1,0)[12]                    : Inf
 ARIMA(1,1,4)(0,1,0)[12]                    : Inf
 ARIMA(2,1,0)(0,1,0)[12]                    : 23.87939
 ARIMA(2,1,1)(0,1,0)[12]                    : 27.59858
 ARIMA(2,1,2)(0,1,0)[12]                    : Inf
 ARIMA(2,1,3)(0,1,0)[12]                    : Inf
 ARIMA(3,1,0)(0,1,0)[12]                    : 27.45598
 ARIMA(3,1,1)(0,1,0)[12]                    : 32.10778
 ARIMA(3,1,2)(0,1,0)[12]                    : Inf
 ARIMA(4,1,0)(0,1,0)[12]                    : 32.03487
 ARIMA(4,1,1)(0,1,0)[12]                    : Inf
 ARIMA(5,1,0)(0,1,0)[12]                    : 35.72966



 Best model: ARIMA(1,1,0)(0,1,0)[12]


 ARIMA(0,1,0)(0,1,0)[12]                    : 37.55211
 ARIMA(0,1,1)(0,1,0)[12]                    : Inf
 ARIMA(0,1,2)(0,1,0)[12]                    : Inf
 ARIMA(0,1,3)(0,1,0)[12]                    : Inf
 ARIMA(0,1,4)(0,1,0)[12]                    : Inf
 ARIMA(0,1,5)(0,1,0)[12]                    : Inf
 ARIMA(1,1,0)(0,1,0)[12]                    : 33.97369
 ARIMA(1,1,1)(0,1,0)[12]                    : Inf
 ARIMA(1,1,2)(0,1,0)[12]                    : Inf
 ARIMA(1,1,3)(0,1,0)[12]                    : Inf
 ARIMA(1,1,4)(0,1,0)[12]                    : Inf
 ARIMA(2,1,0)(0,1,0)[12]                    : 31.71183
 ARIMA(2,1,1)(0,1,0)[12]                    : 35.5061
 ARIMA(2,1,2)(0,1,0)[12]                    : Inf
 ARIMA(2,1,3)(0,1,0)[12]                    : Inf
 ARIMA(3,1,0)(0,1,0)[12]                    : 35.49753
 ARIMA(3,1,1)(0,1,0)[12]                    : Inf
 ARIMA(3,1,2)(0,1,0)[12]                    : Inf
 ARIMA(4,1,0)(0,1,0)[12]                    : 39.99761
 ARIMA(4,1,1)(0,1,0)[12]                    : 45.78658
 ARIMA(5,1,0)(0,1,0)[12]                    : 45.41766



 Best model: ARIMA(2,1,0)(0,1,0)[12]

Series: conv_ts
ARIMA(1,1,0)(0,1,0)[12]

Coefficients:
          ar1
      -0.6738
s.e.   0.2027

sigma^2 estimated as 0.1794:  log likelihood=-8.18
AIC=20.37   AICc=21.37   BIC=21.79

Training set error measures:
                      ME      RMSE       MAE       MPE     MAPE
Training set 0.009446519 0.2995327 0.1849857 -2.174167 15.81608
                  MASE       ACF1
Training set 0.6231098 -0.1040902


    Ljung-Box test

data:  Residuals from ARIMA(1,1,0)(0,1,0)[12]
Q* = 4.7453, df = 5, p-value = 0.4477

Model df: 1.   Total lags used: 6
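
As a cross-check on the Ljung-Box output, the p-value can be reproduced by hand. The test statistic is compared against a chi-square distribution with df = 5, which matches the 6 lags used minus the 1 model parameter reported above. The sketch below is an independent Python check, not what R runs internally. It uses the closed-form upper tail of the chi-square distribution for odd degrees of freedom, specialised to df = 5, and `chi2_sf_df5` is a name invented for this example:

```python
import math

def chi2_sf_df5(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 5 df."""
    chi = math.sqrt(x)
    # Standard normal density evaluated at chi.
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    # Closed form for odd df, specialised to df = 5:
    #   2 * Q(chi) + 2 * phi(chi) * (chi + chi^3 / 3),
    # where 2 * Q(chi) = erfc(chi / sqrt(2)) for the standard normal tail Q.
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * (chi + chi ** 3 / 3)

q_star = 4.7453  # Q* from the Ljung-Box output above
p_value = chi2_sf_df5(q_star)
print(f"p-value = {p_value:.4f}")
```

A p-value this far above 0.05 means the test finds no significant autocorrelation left in the residuals, so they are consistent with white noise.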

Comparing Forecasts

It seems pretty clear that the best model is the seasonal naive.

Comparing Methods Again

There is good evidence for the mean model being the best by this method of comparison.

Naïve Residuals

Residuals tell us the difference between the model and reality, which helps us understand its accuracy. The scales on the y-axis alone indicate that the model works much better for predicting prices of conventional avocados.

ARIMA Model Forecasting

Conclusions

Expensive Organic Avocados: As noted at the start, organic avocados are more expensive than conventional avocados.
Similar patterns by type: Most price patterns between organic and conventional avocados are matched, but there are some slight differences.
Volatility: Volatility increased each year, and the greatest volatility coincided with the highest prices in 2017.
Buy avocados before fall!: The best season to buy either type of avocado is any season but fall; all other times of the year have pretty reasonable prices.
Downward trend in the long run: Based on the ARIMA model, for both types of avocados we can expect in the long run a downward trend in prices.

Avocado Prices: Now, Then, and in the Future