The easiest way to model seasonality is with seasonal dummies. A more scientific method is to apply the Fast Fourier Transform (FFT).
Definition of seasonality
Seasonality is a pattern that repeats over a fixed time period. In sales data, for example, it reflects recurring patterns in customers' shopping behaviour and product availability. It is therefore an important feature to capture in a time-series machine learning model.
Ways to model seasonality
The simplest way to model seasonality is to add time-related features. Examples of such seasonality indicators are:
- Quarter of the year
- Day of the week
- ISO week of the year
- Month of the year
- Warm/cold weather indicator: a single feature that flags relatively warm months (quarters 2 and 3) versus relatively cold months (quarters 1 and 4)
These features are very easy to implement. The downside is that they also capture, undesirably, the effect of recurring business decisions, such as promotions that always take place in the same month.
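The calendar features listed above can be sketched in a few lines of pandas. This is a minimal illustration, assuming a DataFrame with a datetime column named `date`; the column names and the one-year daily date range are invented for the example.

```python
import pandas as pd

# Illustrative daily dates for one year; real data would come from the dataset
df = pd.DataFrame({"date": pd.date_range("2021-01-01", "2021-12-31", freq="D")})

df["quarter"] = df["date"].dt.quarter
df["day_of_week"] = df["date"].dt.dayofweek              # Monday = 0, Sunday = 6
df["iso_week"] = df["date"].dt.isocalendar().week.astype(int)
df["month"] = df["date"].dt.month
# Warm/cold indicator: 1 for quarters 2 and 3 (relatively warm), 0 otherwise
df["warm"] = df["quarter"].isin([2, 3]).astype(int)

# Seasonal dummies: one 0/1 column per quarter and per month
dummies = pd.get_dummies(df[["quarter", "month"]].astype("category"), dtype=int)
print(dummies.shape)
```

Note that 1 January 2021 falls in ISO week 53 of 2020, which is why the ISO week is taken from `isocalendar()` rather than computed from the day of the year.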
Fast Fourier Transform (FFT)
A more scientific method of modelling seasonality is to create Fourier terms. The Fast Fourier Transform (FFT) decomposes a time series into sinusoids of different frequencies; each of these sinusoids, a Fourier term, repeats itself over a specific period of time.
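Written out, a single Fourier term at time step t is a cosine with amplitude A, frequency f (in cycles per observation), period T and phase φ. For a real-valued series x_0, ..., x_{N-1}, the FFT coefficient X_k at frequency f_k = k/N gives the amplitude and phase of the matching term (for 0 < k < N/2). These are the standard textbook relations; the cosine terms rebuilt in the code of the practical example below follow the same form, with the angular frequency equal to 2π times the frequency.

```latex
s(t) = A \cos\left(2\pi f t + \varphi\right), \qquad T = \frac{1}{f}

X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad
f_k = \frac{k}{N}, \qquad
A_k = \frac{2\,\lvert X_k \rvert}{N}, \qquad
\varphi_k = \arg X_k
```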
Basic components of a Fourier term
A Fourier term is composed of the following components:
- Amplitude: the magnitude of the peak value
- Period: the amount of time required to complete one full cycle
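- Frequency: the number of cycles completed per unit of time (for example, per observation); it is the reciprocal of the period
- Phase: the shift of the sinusoid along the time axis, i.e. where in its cycle it starts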
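To make these components concrete, the sketch below builds a synthetic cosine with a known amplitude, period and phase (values chosen purely for illustration) and reads them back from the FFT output with NumPy and SciPy. The phase is where in its cycle the sinusoid starts at time step 0.

```python
import numpy as np
from scipy import fft

# Synthetic series: one cosine with known components (illustrative values)
N = 208                       # e.g. four years of weekly observations
A_true, period_true, phase_true = 3.0, 52, 0.5
n = np.arange(N)
x = A_true * np.cos(2 * np.pi * n / period_true + phase_true)

X = fft.fft(x)
freqs = fft.fftfreq(N)        # frequency of each coefficient, in cycles per observation

# Strongest positive-frequency coefficient (index 0 is the mean, so skip it)
k = 1 + np.argmax(np.abs(X[1:N // 2]))
frequency = freqs[k]
period = 1 / frequency
amplitude = 2 * np.abs(X[k]) / N   # factor 2: half the energy sits at the negative frequency
phase = np.angle(X[k])

print(f"frequency={frequency:.4f}, period={period:.1f}, "
      f"amplitude={amplitude:.2f}, phase={phase:.2f}")
```

Because 208 is an exact multiple of the period 52, the cosine falls on a single FFT bin; with real data the energy usually spreads over neighbouring bins, which is one reason to look for peaks in the amplitude spectrum rather than single values.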
Practical example
A practical example makes the theory easier to understand. This example uses Google Trends data for searches of the keyword “meloen” (Dutch for “melon”). Here is a plot of the time series over a period of four years.
The plot above shows that in every year, searches for melon start to increase around June and peak in July. This suggests that there is seasonality around summer.
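This hypothesis can be tested with the following seasonality test, which applies the Friedman chi-square test to the monthly values of the last three complete years: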
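import pandas as pd
from scipy.stats import friedmanchisquare


def check_seasonality(input_data, label_field="searches", alpha=0.05):
    # Friedman chi-square test on the monthly values of the last three complete years.
    # H0: the years share the same distribution (the pattern repeats every year).
    input_data = input_data.copy()
    input_data["year"] = pd.DatetimeIndex(input_data["date"]).year
    input_data["month"] = pd.DatetimeIndex(input_data["date"]).month
    input_data.reset_index(drop=True, inplace=True)
    # Keep complete years only: from the first year containing January
    # to the last year containing December
    min_year = input_data[input_data["month"] == 1]["year"].min()
    max_year = input_data[input_data["month"] == 12]["year"].max()
    input_data = input_data[(input_data["year"] >= min_year) &
                            (input_data["year"] <= max_year)]
    # One row per month, one column per year (values averaged per month)
    input_data = input_data.pivot_table(index="month", columns="year",
                                        values=label_field)
    year_values = list(input_data.columns)
    number_of_years = len(year_values)
    stat, p = friedmanchisquare(
        input_data[year_values[number_of_years - 3]],
        input_data[year_values[number_of_years - 2]],
        input_data[year_values[number_of_years - 1]])
    if p > alpha:
        print("Same distributions (fail to reject H0)")
        season = True
    else:
        print("Different distributions (reject H0)")
        season = False
    return season


# input_data: DataFrame with a "date" column and a "searches" column
# (check_seasonality is an illustrative function name)
season = check_seasonality(input_data, label_field="searches")
print(f"Seasonality is: {season}")
>> Same distributions (fail to reject H0)
>> Seasonality is: True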
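The test fails to reject the null hypothesis H0 that the three years have the same distribution of monthly values. In other words, the yearly pattern does not differ significantly from one year to the next, which indicates that seasonality is present in the dataset.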
This is confirmed by the melon season in the Netherlands, which lasts three months: June, July and August. This is when many people in the Netherlands search for where to buy melons or for information on how to grow them.
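To see what the Friedman test responds to, the sketch below runs `scipy.stats.friedmanchisquare` on two synthetic sets of three years of monthly values (all numbers are invented for illustration). The test ranks the years within each month, so it rejects H0 when some years rank consistently higher or lower than others across the months.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Illustrative monthly profile: low in winter, peaking in July (month index 6)
months = np.arange(12)
profile = 10 + 40 * np.exp(-((months - 6) ** 2) / 2.0)

# Case 1: the same profile every year, with small offsets that rotate between
# the years, so no year is consistently higher than another
offsets = np.array([[(m + j) % 3 for j in range(3)] for m in months])
same_years = [profile + offsets[:, j] for j in range(3)]
stat_same, p_same = friedmanchisquare(*same_years)

# Case 2: the same profile, but each year is higher than the previous one
growing_years = [profile + 5 * j for j in range(3)]
stat_growing, p_growing = friedmanchisquare(*growing_years)

print(f"rotating offsets: statistic={stat_same:.2f}, p={p_same:.3f}")
print(f"yearly growth:    statistic={stat_growing:.2f}, p={p_growing:.6f}")
```

In the first case the test does not reject H0. In the second case the monthly pattern is identical but shifted upward each year, and the test rejects H0, so a series with a strong trend may need to be detrended before this check.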
The FFT can be used to model the seasonality indicated by the test above.
import math
from cmath import phase

import numpy as np
import pandas as pd
from scipy import fft
from scipy import signal as sig
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
import seaborn as sns


def add_trend_term(pdf):
    # Linear time index 1, 2, ..., N
    pdf["trend"] = np.arange(1, len(pdf) + 1)
    return pdf


def fourier_seasonality(pdf, column_name, period_min, period_max):
    N = len(pdf)
    # Fourier transform of the series
    fft_output = fft.fft(pdf[column_name].to_numpy())
    freq = fft.fftfreq(N)
    # Keep the non-negative frequencies; the negative half mirrors them
    mask = freq >= 0
    amplitude = np.abs(fft_output)[mask]
    freq = freq[mask]
    # Local maxima of the amplitude spectrum (never index 0, the mean)
    peaks = sig.find_peaks(amplitude)[0]

    fourier_output = pd.DataFrame()
    fourier_output["index"] = peaks
    fourier_output["freq"] = freq[peaks]
    fourier_output["period"] = 1 / freq[peaks]
    fourier_output["fft"] = fft_output[peaks]
    # Scale to the amplitude of a real cosine: 2 * |X_k| / N
    fourier_output["amplitude"] = fourier_output["fft"].apply(lambda z: 2 * abs(z) / N)
    fourier_output["phase"] = fourier_output["fft"].apply(lambda z: phase(z))
    # Keep only the Fourier terms whose period (in observations) lies in range
    fourier_output = fourier_output[(fourier_output["period"] >= period_min) &
                                    (fourier_output["period"] <= period_max)]

    # Rebuild each selected term a * cos(w * t + p) and sum them
    fourier_output_dict = fourier_output.to_dict("index")
    pdf_temp = pdf[["trend"]].copy()
    pdf_temp["FT_All"] = 0.0
    for key in fourier_output_dict:
        a = fourier_output_dict[key]["amplitude"]
        w = 2 * math.pi * fourier_output_dict[key]["freq"]
        p = fourier_output_dict[key]["phase"]
        # trend starts at 1 while the FFT time origin is 0, hence trend - 1
        pdf_temp[key] = a * np.cos(w * (pdf_temp["trend"] - 1) + p)
        pdf_temp["FT_All"] = pdf_temp["FT_All"] + pdf_temp[key]
    pdf["seasonality"] = pdf_temp["FT_All"].round(4)

    # Baseline: linear regression of the series on trend and seasonality
    predictors = ["trend", "seasonality"]
    X = pdf[predictors]
    y = pdf[column_name]
    model = LinearRegression().fit(X, y)
    pdf["baseline"] = model.predict(X).round(4)
    return (fourier_output, pdf)


# pdf: DataFrame indexed by "date" with a "searches" column
# period_min / period_max are in observations; 40 and 60 are example values
period_min, period_max = 40, 60
pdf = pdf.reset_index()
pdf = add_trend_term(pdf=pdf)
fourier_output, pdf = fourier_seasonality(
    pdf,
    column_name="searches",
    period_min=period_min,
    period_max=period_max
)
pdf = pdf.set_index("date")

fig, axs = plt.subplots(figsize=(12, 4))
sns.lineplot(data=pdf, x="date", y="searches",
             label="searches", color="grey", ax=axs)
sns.lineplot(data=pdf, x="date", y="baseline",
             label="baseline", color="black", ax=axs)
axs.legend()
plt.show()
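By adjusting the period_min and period_max parameters, only the Fourier terms whose period (in number of observations) falls within that range are kept, so the sinusoid can be fitted at the level that suits the underlying data, for example a short cycle or a yearly one.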
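The effect of the period range can be sketched on a synthetic series that mixes a yearly cycle with a short four-week cycle (both invented for illustration). The helper `fourier_terms` is hypothetical and simplified: it keeps only the spectral peaks whose period, in observations, lies between `period_min` and `period_max`, and `min_amplitude` is an assumed threshold that ignores peaks caused by numerical noise.

```python
import numpy as np
from scipy import fft
from scipy import signal as sig


def fourier_terms(x, period_min, period_max, min_amplitude=0.5):
    """Hypothetical helper: (period, amplitude, phase) of the spectral peaks
    whose period, in observations, lies between period_min and period_max."""
    N = len(x)
    X = fft.fft(x)
    freq = fft.fftfreq(N)
    positive = freq > 0                  # skip the mean (0) and the mirrored half
    pos_X, pos_freq = X[positive], freq[positive]
    amplitude = 2 * np.abs(pos_X) / N    # amplitude of the matching real cosine
    peaks, _ = sig.find_peaks(amplitude, height=min_amplitude)
    terms = []
    for i in peaks:
        period = 1 / pos_freq[i]
        if period_min <= period <= period_max:
            terms.append((period, amplitude[i], np.angle(pos_X[i])))
    return terms


# Synthetic weekly series (illustrative): a yearly and a four-week cycle
n = np.arange(208)
x = 5 * np.cos(2 * np.pi * n / 52) + np.cos(2 * np.pi * n / 4 + 1.0)

yearly = fourier_terms(x, period_min=40, period_max=60)
short = fourier_terms(x, period_min=2, period_max=10)
print("yearly range:", yearly)
print("short range:", short)
```

With the yearly range only the 52-observation cycle is kept, and with the short range only the four-observation cycle is kept; the same series yields a different seasonal term depending on the chosen window.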
Experiment #1: Create a sinusoid on a weekly level
The first experiment is not satisfactory: it overfits the underlying data and captures noise as seasonality. Melon is used in some Christmas dishes, so searches also peak in December. This peak is considered noise and should not be modelled as seasonality.
Experiment #2: Create a sinusoid on a yearly level
The Fourier term in the second experiment captures the yearly seasonality of melon searches. This Fourier term can be used as a feature when building a model, so that the model captures seasonality.
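Because a Fourier term is a deterministic function of time, it can also be evaluated for dates that have not been observed yet, which is what makes it usable as a feature in a forecasting model. A minimal sketch, assuming a single yearly term in weekly data whose amplitude, frequency and phase (illustrative values here) have already been estimated from the FFT:

```python
import numpy as np
import pandas as pd

# Illustrative components of a single yearly Fourier term (weekly data)
amplitude, freq, phase = 5.0, 1 / 52, 0.3   # freq in cycles per week
n_history, horizon = 208, 26                 # 4 years observed, 26 weeks ahead

t = np.arange(n_history + horizon)           # time index continues past the data
features = pd.DataFrame({
    "trend": t + 1,
    "seasonality": amplitude * np.cos(2 * np.pi * freq * t + phase),
})
history = features.iloc[:n_history]          # rows used to fit the model
future = features.iloc[n_history:]           # rows fed to the fitted model to forecast
```

A model fitted on `history` (for example a linear regression on `trend` and `seasonality`) can then be applied to `future` directly, since both columns are known in advance.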
Conclusions
Seasonality is an important aspect of a time-series dataset. An easy way to model it is with seasonal dummies. A more scientific method is to apply the FFT and keep the Fourier terms whose period falls within a specified range.
References
Melon seasonality in the Netherlands: https://www.vangeldernederland.nl/nl_NL/blog/item/meloen-seizoen-op-zn-top-241/
Google trends data: https://trends.google.com/trends/
GitHub code: https://github.com/frida-ah/fourier