Bitcoin Price Forecasting¶
Milestone Project 3¶
The goal of this notebook is to explore time series forecasting by predicting Bitcoin prices (BitPredict).
What is a time series problem?¶
Time series deals with data over time:
- Predicting weather (short- and medium-term forecasts) and climate (long-term forecasts)
- Predicting sales across different geographies for different products in the catalog (might require a bit of geostatistics)
- Identifying special events in time, such as anomalies:
- Anomalous weather events (maybe temperature at different pressure levels is changing), SST anomalies in the equatorial Pacific Ocean
- Sensor data anomalies in an aircraft, which has thousands of sensors
The problems can be broadly categorized into two categories:
Problem Type | Examples | Output |
---|---|---|
Classification | Anomaly detection, time series identification (where did this time series come from?) | Discrete (a label) |
Forecasting | Predicting stock market prices, forecasting future demand for a product, stocking inventory requirements | Continuous (a number) |
Contents of this Notebook¶
- Acquire the bitcoin time series data csv file
- Format data for time series problem
- Split into training and test sets the wrong way
- Split into training and test sets the right way
- Visualizing the data
- Converting time series forecasting into a supervised learning task
- Preparing univariate and multivariate time series data
- Evaluate a time series forecasting model
- Generally, this depends on how far ahead the forecast we are evaluating is (1-day error, 2-day error, ...)
- Evaluation can also compare models that were trained differently. For example, we train a window-3, horizon-1 `model 1` and a window-7, horizon-2 `model 2`, and we want to compare their 2-day-ahead forecast error. In this case `model 1` would have to forecast recursively to produce the 2-day-ahead forecast.
- Setting up different deep learning modelling experiments:
- Dense models
- Sequence models (CNN/LSTM)
- Ensembling methods
- Multivariate models
- Re-implementing the N-BEATS algorithm using TensorFlow
- Creating modelling checkpoints to save the best model
- Forecasting using a time series model
- Estimation of prediction confidence intervals (quantifying forecast uncertainty)
- Discussing two types of uncertainty in machine learning
- Data uncertainty
- Model uncertainty
- Demonstrating why forecasting in an open system is BS (the turkey problem)
Get the data¶
We have the bitcoin price data downloaded from https://www.coindesk.com/price/bitcoin:
- Click on all and export data as csv
Change project directory¶
import os
os.chdir('/content/drive/MyDrive/projects/Tensorflow-tutorial-Daniel-Bourke/notebooks')
import sys
sys.path.append('../')
!pip install scikit-learn==0.24
Collecting scikit-learn==0.24 Downloading https://files.pythonhosted.org/packages/b1/ed/ab51a8da34d2b3f4524b21093081e7f9e2ddf1c9eac9f795dcf68ad0a57d/scikit_learn-0.24.0-cp37-cp37m-manylinux2010_x86_64.whl (22.3MB) |████████████████████████████████| 22.3MB 7.0MB/s Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.7/dist-packages (from scikit-learn==0.24) (1.19.5) Collecting threadpoolctl>=2.0.0 Downloading https://files.pythonhosted.org/packages/f7/12/ec3f2e203afa394a149911729357aa48affc59c20e2c1c8297a60f33f133/threadpoolctl-2.1.0-py3-none-any.whl Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn==0.24) (1.0.1) Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from scikit-learn==0.24) (1.4.1) Installing collected packages: threadpoolctl, scikit-learn Found existing installation: scikit-learn 0.22.2.post1 Uninstalling scikit-learn-0.22.2.post1: Successfully uninstalled scikit-learn-0.22.2.post1 Successfully installed scikit-learn-0.24.0 threadpoolctl-2.1.0
!nvidia-smi
Sun Jun 27 08:01:04 2021 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 465.27 Driver Version: 460.32.03 CUDA Version: 11.2 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 Tesla P100-PCIE... Off | 00000000:00:04.0 Off | 0 | | N/A 45C P0 31W / 250W | 0MiB / 16280MiB | 0% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | No running processes found | +-----------------------------------------------------------------------------+
DATA_FILE = '../data/BTC_USD_2013-10-01_2021-06-24-CoinDesk.csv'
import pandas as pd
df = pd.read_csv(DATA_FILE, parse_dates=['Date'], index_col=['Date'])
df.head()
Currency | Closing Price (USD) | 24h Open (USD) | 24h High (USD) | 24h Low (USD) | |
---|---|---|---|---|---|
Date | |||||
2013-10-01 | BTC | 123.65499 | 124.30466 | 124.75166 | 122.56349 |
2013-10-02 | BTC | 125.45500 | 123.65499 | 125.75850 | 123.63383 |
2013-10-03 | BTC | 108.58483 | 125.45500 | 125.66566 | 83.32833 |
2013-10-04 | BTC | 118.67466 | 108.58483 | 118.67500 | 107.05816 |
2013-10-05 | BTC | 121.33866 | 118.67466 | 121.93633 | 118.00566 |
df.info()
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 2823 entries, 2013-10-01 to 2021-06-24 Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Currency 2823 non-null object 1 Closing Price (USD) 2823 non-null float64 2 24h Open (USD) 2823 non-null float64 3 24h High (USD) 2823 non-null float64 4 24h Low (USD) 2823 non-null float64 dtypes: float64(4), object(1) memory usage: 132.3+ KB
# No of samples
df.shape
(2823, 5)
# Date range
df.index.min(), df.index.max()
(Timestamp('2013-10-01 00:00:00'), Timestamp('2021-06-24 00:00:00'))
Let's check whether any days are missing.
date_diff = pd.Series(df.index).diff()
date_diff.value_counts()
1 days 2821 2 days 1 Name: Date, dtype: int64
There is one data point where the difference is 2 days, so except for that single gap the time series is sampled at daily frequency (a quick fix for the gap is sketched after the table below). In general, a time series can be sampled at various frequencies:
1 sample per timeframe | Number of samples per year |
---|---|
Second | 31,536,000 |
Hour | 8,760 |
Day | 365 |
Week | 52 |
Month | 12 |
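Since exactly one gap is 2 days wide, we could (optionally) reindex the series to a strict daily frequency and fill the hole. A minimal sketch, assuming we simply carry the previous day's values forward (this fill strategy is a choice made here for illustration, not something done in the rest of the notebook):
import pandas as pd
# Reindex to a strict daily frequency; the one missing day becomes a NaN row,
# which we fill by carrying the previous day's values forward.
df_daily = df.asfreq('D').ffill()
# Sanity check: every consecutive date difference should now be exactly 1 day.
assert pd.Series(df_daily.index).diff().dropna().eq(pd.Timedelta('1D')).all()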
Components of a time series¶
- Trend: The long term movement of the data (whether going higher or lower).
- Seasonality - Periodic pattern in the data (which occurs at fixed time interval) - less than a year
- Weather obviously has a seasonal component
- Electricity load also has a strong seasonal component, where in equatorial countries, especially in urban areas, summers will lead to peak consumption
- Cyclicity - Rise and fall cyclic patterns in the data (not at fixed intervals, often at longer time scales)
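If we want to see these components explicitly for our data, one option (not used in the rest of this notebook) is a classical decomposition from statsmodels; a rough sketch, assuming statsmodels is installed and taking a yearly period of 365 days as a guess:
from statsmodels.tsa.seasonal import seasonal_decompose
# Split the daily closing price into trend + seasonal + residual components.
# period=365 assumes a yearly cycle, which is only a rough guess for Bitcoin.
decomposition = seasonal_decompose(df['Closing Price (USD)'], model='additive', period=365)
decomposition.plot();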
We will only be concerned with the closing price of Bitcoin, so we will remove all other columns.
SELECT_COLUMN = 'Closing Price (USD)'
bitcoin_prices = df[[SELECT_COLUMN]].rename(columns={SELECT_COLUMN: 'price'})
bitcoin_prices.index.name = 'date'
bitcoin_prices.head()
price | |
---|---|
date | |
2013-10-01 | 123.65499 |
2013-10-02 | 125.45500 |
2013-10-03 | 108.58483 |
2013-10-04 | 118.67466 |
2013-10-05 | 121.33866 |
def get_date_range(ser, format=None):
ser = pd.to_datetime(ser, format=format)
ser = pd.Series(ser)
return ser.agg(['min', 'max'])
dt_range = get_date_range(bitcoin_prices.index)
dt_range
min 2013-10-01 max 2021-06-24 Name: date, dtype: datetime64[ns]
Let us plot it
import matplotlib.pyplot as plt
bitcoin_prices.plot(figsize=(12, 7))
plt.ylabel(SELECT_COLUMN)
plt.title('Price of Bitcoin from {min} to {max}'.format(**dt_range.dt.strftime('%d %B %Y')),
fontdict=dict(weight='bold', size=15));
Importing data using Python's `csv` module¶
For no apparent reason, let us also try to read the data using Python's built-in csv module.
import csv
with open('../data/BTC_USD_2013-10-01_2021-06-24-CoinDesk.csv', 'r') as f:
csv_reader = csv.reader(f)
header = next(csv_reader)
lines = []
for line in csv_reader:
lines.append(line)
pd.DataFrame(lines, columns=header).head()
Currency | Date | Closing Price (USD) | 24h Open (USD) | 24h High (USD) | 24h Low (USD) | |
---|---|---|---|---|---|---|
0 | BTC | 2013-10-01 | 123.65499 | 124.30466 | 124.75166 | 122.56349 |
1 | BTC | 2013-10-02 | 125.455 | 123.65499 | 125.7585 | 123.63383 |
2 | BTC | 2013-10-03 | 108.58483 | 125.455 | 125.66566 | 83.32833 |
3 | BTC | 2013-10-04 | 118.67466 | 108.58483 | 118.675 | 107.05816 |
4 | BTC | 2013-10-05 | 121.33866 | 118.67466 | 121.93633 | 118.00566 |
I am not so sure why we did this all of a sudden. But okay, moving on.
Formatting Data - 1. Creating training and tests sets the wrong way¶
Let's first turn our timesteps and prices into numpy arrays
timesteps, prices = bitcoin_prices.index.to_numpy(), bitcoin_prices['price'].to_numpy()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(timesteps, prices, test_size=0.2, random_state=42)
X_train.shape, X_test.shape
((2258,), (565,))
plt.figure(figsize=(12, 7))
plt.scatter(X_train, y_train, s=5, label='train')
plt.scatter(X_test, y_test, s=5, label='test')
plt.legend()
plt.title('Splitting time series data into train and test sets (the wrong way)',
fontdict=dict(size=15, weight='bold'));
The training and test sets overlap heavily. In the real world, the test set for time series data would be data that is yet to come (the future). Hence the above split is wrong: some test data points fall before training data points, which means we would be using future (training) data to predict the past.
2. Create train and test sets the right way¶
The right way to create the test set is to emulate how real-world test data will arrive, i.e. from the future. So we take the earliest data points (80%) as the training set and the most recent data points (20%) as the test set.
train_prop = 0.8
train_size = int(train_prop * len(prices))
X_train, y_train = timesteps[:train_size], prices[:train_size]
X_test, y_test = timesteps[train_size:], prices[train_size:]
len(X_train), len(X_test)
(2258, 565)
Now let us plot it
plt.figure(figsize=(12, 7))
plt.scatter(X_train, y_train, label='train', s=5)
plt.scatter(X_test, y_test, label='test', s=5)
plt.xlabel('Date')
plt.ylabel(SELECT_COLUMN);
plt.legend()
plt.title('Splitting time series data into train and test sets (the right way)',
fontdict=dict(size=15, weight='bold'));
Note: The split can vary (80/20, 90/10, 95/5) depending on how much data you have. We might not have much data at all, in which case we could rely on model-fit metrics such as AIC/BIC instead of a large hold-out set.
Resource: The trickier aspects of time series modelling. See: 3 facts about time series forecasting that surprise experienced machine learning practitioners
def plot_bitcoin(tsteps, prices, linetype='.', start=0, end=None, label=None):
plt.plot(tsteps[start:end], prices[start:end], linetype, label=label)
plt.xlabel('Date')
plt.ylabel(SELECT_COLUMN)
if label:
plt.legend(fontsize=15)
plt.grid(True)
plt.figure(figsize=(12, 7))
plot_bitcoin(X_train, y_train, linetype='-', label='Train data')
plot_bitcoin(X_test, y_test, linetype='-', label='Test data')
Modelling Experiments¶
Two terms are really important in the type of forecasting model we are going to build:
- Window - The number of past timesteps we use as input to predict into the future
- Horizon - The number of timesteps ahead into the future we predict
For example, a weather forecasting model with Window = 7 and Horizon = 2 uses the last 7 days' weather information to predict the next two days' weather.
Model Number | Model Type | Horizon size | Window size | Extra data |
---|---|---|---|---|
0 | Naïve forecast model (baseline) | NA | NA | NA |
1 | Dense model | 1 | 7 | NA |
2 | Dense model variation | 1 | 30 | NA |
3 | Dense model variation | 7 | 30 | NA |
4 | Conv1D | 1 | 7 | NA |
5 | LSTM | 1 | 7 | NA |
6 | Dense model variation (but with multivariate data) | 1 | 7 | Block reward size |
7 | N-BEATs Algorithm | 1 | 7 | NA |
8 | Ensemble (multiple models optimized on different loss functions) | 1 | 7 | NA |
9 | Future prediction model (model to predict future values) | 1 | 7 | NA |
10 | Dense Model variation (but with turkey 🦃 data introduced) | 1 | 7 | NA |
MODELS = {}
MODELS_INFO = {}
RESULTS = {}
PREDICTIONS = {}
Model 0: Naive Forecast (Baseline)¶
This is the most common baseline model for time series forecasting, which just sets the future one-day ahead forecast as the current day value.
$$\hat{y}_t = y_{t-1}$$
The forecast for timestep $t$ is simply the observed value at timestep $t-1$.
WINDOW = 1
HORIZON = 1
model_name = f'naive-model-baseline_W{WINDOW}H{HORIZON}'
def naive_model(y):
return y[:-1]
naive_forecast = naive_model(y_test)
PREDICTIONS[model_name] = naive_forecast
MODELS[model_name] = naive_model
plt.figure(figsize=(12, 7))
plot_bitcoin(tsteps=X_test, prices=y_test, label='test data')
plot_bitcoin(tsteps=X_test[1:], prices=naive_forecast, linetype='-', label='Naive forecast')
Evaluating a time series model¶
Evaluation of a time series model can be carried out at various forecast horizons (n steps ahead). A model trained for one horizon (say, 1) can be used to forecast an arbitrary number of timesteps ahead by feeding its own predictions back in recursively; a sketch of this follows. For the model above, we can calculate various continuous-value evaluation metrics such as MAE and RMSE.
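To make the recursive idea concrete, here is a minimal sketch of autoregressive multi-step forecasting with a horizon-1 model: each new prediction is appended to the window and fed back in. The names model and WINDOW stand for whichever horizon-1 model and window size is being evaluated; this helper is an illustration, not part of the models trained below.
import numpy as np
def recursive_forecast(model, last_window, n_ahead):
    '''Forecast n_ahead steps by repeatedly feeding the model its own predictions.'''
    window = list(last_window)
    forecasts = []
    for _ in range(n_ahead):
        # Predict one step ahead from the most recent WINDOW-sized slice (shape (1, WINDOW))
        pred = model.predict(np.array(window[-len(last_window):])[None, ...]).flatten()[0]
        forecasts.append(pred)
        window.append(pred)  # feed the prediction back in for the next step
    return np.array(forecasts)
# e.g. a 2-day-ahead forecast from a horizon-1 model:
# two_day_forecast = recursive_forecast(model, y_test[-7:], n_ahead=2)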
Scale-dependent errors¶
These are metrics which can be used to compare time series values and forecasts that are on the same scale.
For example, Bitcoin historical prices in USD versus Bitcoin forecast values in USD.
Metric | Details | Code |
---|---|---|
MAE (mean absolute error) | Easy to interpret (a forecast is X amount different from the actual amount). Forecast methods which minimise the MAE lead to forecasts of the median. | tf.keras.metrics.mean_absolute_error() |
RMSE (root mean squared error) | Forecasts which minimise the RMSE lead to forecasts of the mean. | tf.sqrt( tf.keras.metrics.mean_squared_error() ) |
Percentage errors¶
Percentage errors do not have units, which means they can be used to compare forecasts across different datasets.
Metric | Details | Code |
---|---|---|
MAPE (mean absolute percentage error) | Most commonly used percentage error. May explode (not work) if y=0. | tf.keras.metrics.mean_absolute_percentage_error() |
sMAPE (symmetric mean absolute percentage error) | Recommended not to be used by Forecasting: Principles and Practice, though it is used in forecasting competitions. | Custom implementation (see the sketch below) |
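Since the table lists sMAPE as a custom implementation, here is one common formulation as a sketch (several variants exist in the literature; this one divides the absolute error by the mean of the absolute actual and predicted values and reports a percentage):
import tensorflow as tf
def symmetric_mean_absolute_percentage_error(y_true, y_pred):
    '''sMAPE: mean of |error| / ((|y_true| + |y_pred|) / 2), expressed in percent.'''
    y_true = tf.cast(y_true, dtype=tf.float32)
    y_pred = tf.cast(y_pred, dtype=tf.float32)
    numerator = tf.abs(y_true - y_pred)
    denominator = (tf.abs(y_true) + tf.abs(y_pred)) / 2.0
    return tf.reduce_mean(numerator / denominator) * 100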
Scaled errors¶
Scaled errors are an alternative to percentage errors when comparing forecast performance across different time series.
Metric | Details | Code |
---|---|---|
MASE (mean absolute scaled error). | MASE equals one for the naive forecast (or very close to one). A forecast which performs better than the naïve should get <1 MASE. | See sktime's mase_loss() |
import tensorflow as tf
# Compared to the naive forecast, what is my MAE?
def mean_absolute_scaled_error(y_true, y_pred):
    '''Return MASE (assuming no seasonality in the data)'''
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    # MAE of the naive (previous-value) forecast on the same series
    mae_naive_no_seasonality = tf.reduce_mean(tf.abs(y_true[1:] - y_true[:-1]))
    return mae / mae_naive_no_seasonality
- For a naive forecast the MASE will be 1. For a model better than the naive forecast, it should be less than 1.
- Worse model: MASE > 1
- Better model: MASE < 1
from sklearn import metrics as skmetrics
import numpy as np
def evaluate_preds(y_true, y_pred):
# Make sure the predictions are in float32 format
y_true = np.broadcast_to(tf.cast(y_true, dtype=tf.float32), y_pred.shape)
y_pred = tf.cast(y_pred, dtype=tf.float32).numpy()
# Calculate all the metrics
mae = skmetrics.mean_absolute_error(y_true, y_pred, multioutput='raw_values')
mse = skmetrics.mean_squared_error(y_true, y_pred, multioutput='raw_values')
rmse = np.sqrt(mse)
mape = skmetrics.mean_absolute_percentage_error(y_true, y_pred, multioutput='raw_values')
# mase = mean_absolute_scaled_error(y_true, y_pred)
return {
'mae': mae,
'mse': mse,
'rmse': rmse,
'mape': mape,
}
RESULTS[model_name] = evaluate_preds(y_test[1:], naive_forecast)
RESULTS[model_name]
{'mae': array([645.9816], dtype=float32), 'mape': array([0.02631178], dtype=float32), 'mse': array([1368900.2], dtype=float32), 'rmse': array([1170.0001], dtype=float32)}
# Find the average price of Bitcoin in the test dataset
tf.reduce_mean(y_test).numpy()
21796.140229169752
We can interpret this as follows:
- The average price of Bitcoin in the test dataset is about \$21,796 (and since there is one large peak almost 3x that average, the average can give a skewed view)
- Each prediction is off by about \$645 on average
Depending on how close to the true value we need our estimates to be, this may or may not be a good model.
Various time series models which we can use for a baseline estimation¶
Model/Library Name | Resource |
---|---|
Moving average | https://machinelearningmastery.com/moving-average-smoothing-for-time-series-forecasting-python/ |
ARIMA (Autoregression Integrated Moving Average) | https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/ |
sktime (Scikit-Learn for time series) | https://github.com/alan-turing-institute/sktime |
TensorFlow Decision Forests (random forest, gradient boosting trees) | https://www.tensorflow.org/decision_forests |
Facebook Kats (purpose-built forecasting and time series analysis library by Facebook) | https://github.com/facebookresearch/Kats |
LinkedIn Greykite (flexible, intuitive and fast forecasts) | https://github.com/linkedin/greykite |
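As a quick illustration of the first entry in the table, a moving-average baseline can be written with pandas in a couple of lines; a rough sketch (the 7-day window length is an arbitrary choice for illustration):
# Forecast each day as the mean of the previous 7 days' closing prices.
moving_average_forecast = bitcoin_prices['price'].rolling(window=7).mean().shift(1)
# Compare against the actual prices over the whole series (the first few NaN days are skipped).
ma_mae = (bitcoin_prices['price'] - moving_average_forecast).abs().mean()
print(f'7-day moving-average MAE over the whole series: {ma_mae:.2f}')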
Formatting data - 2. Windowing the dataset¶
- Windowing is a method for turning a time series dataset into a supervised learning problem.
For example, for a model with `window=7` and `horizon=1`, the window-horizon pairs will look like:
Data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6] -> [7]
[1, 2, 3, 4, 5, 6, 7] -> [8]
[2, 3, 4, 5, 6, 7, 8] -> [9]
HORIZON = 1
WINDOW = 7
def get_labelled_windows(x, horizon=1):
return x[:, :-horizon], x[:, -horizon:]
test_window, test_label = get_labelled_windows(tf.expand_dims(tf.range(8) +1, axis=0), horizon=HORIZON)
print(f'Window: {tf.squeeze(test_window).numpy()} -> Label: {tf.squeeze(test_label.numpy())}')
Window: [1 2 3 4 5 6 7] -> Label: 8
import numpy as np
def make_window_horizon_pairs(x, window, horizon):
window_step = np.expand_dims(np.arange(window + horizon), axis=0)
window_adders = np.expand_dims(np.arange(len(x) - (window + horizon - 1)), axis=0).T
window_indices = window_step + window_adders
windowed_array = x[window_indices]
windows, labels = get_labelled_windows(windowed_array, horizon)
return windows, labels
full_windows, full_labels = make_window_horizon_pairs(prices, window=WINDOW, horizon=HORIZON)
len(full_windows), len(full_labels)
(2816, 2816)
# View first three window horizon pairs
for i in range(3):
print(f'Window: {full_windows[i]} -> Label: {full_labels[i]}')
Window: [123.65499 125.455 108.58483 118.67466 121.33866 120.65533 121.795 ] -> Label: [123.033] Window: [125.455 108.58483 118.67466 121.33866 120.65533 121.795 123.033 ] -> Label: [124.049] Window: [108.58483 118.67466 121.33866 120.65533 121.795 123.033 124.049 ] -> Label: [125.96116]
Note: The function `tf.keras.preprocessing.timeseries_dataset_from_array()` does a similar job of preparing the dataset in this format, with the added advantage of returning a `tf.data.Dataset` object, which offers several benefits over plain NumPy arrays (see the sketch below).
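For reference, here is a rough sketch of the same windowing done with that utility; the key detail is that the targets are offset by WINDOW so that each window of prices is paired with the value that immediately follows it:
# Build (window, label) pairs directly as a batched tf.data.Dataset.
windowed_ds = tf.keras.preprocessing.timeseries_dataset_from_array(
    data=prices[:-HORIZON],   # inputs stop HORIZON step(s) before the end
    targets=prices[WINDOW:],  # the label for the window starting at index i is prices[i + WINDOW]
    sequence_length=WINDOW,
    batch_size=128)
for window_batch, label_batch in windowed_ds.take(1):
    print(window_batch.shape, label_batch.shape)  # e.g. (128, 7) (128,)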
Turning windows into training and test sets¶
def make_train_test_splits(windows, labels, test_prop=0.2):
split_size = int(len(windows)*(1-test_prop))
train_windows, train_labels = windows[:split_size], labels[:split_size]
test_windows, test_labels = windows[split_size:], labels[split_size:]
return train_windows, train_labels, test_windows, test_labels
train_windows, train_labels, test_windows, test_labels = make_train_test_splits(full_windows, full_labels)
len(train_windows), len(test_windows)
(2252, 564)
So now, we have split the time series data into 80%-20% splits.
Make a modelling checkpoint¶
We want to compare each model's best performance against the others. For this reason we will use the `ModelCheckpoint` callback to save the best model weights during training. This will also help us resume training.
import tensorflow as tf
TASK = 'bitcoin_time_series_prediction'
def create_model_checkpoint(model_name):
return tf.keras.callbacks.ModelCheckpoint(f'../checkpoints/{TASK}/{model_name}', verbose=0, save_best_only=True)
from src.visualize import plot_learning_curve
Function for easy training and reloading of a Keras Model¶
CHECKPOINT_DIR = f'../checkpoints/bitcoin_time_series_prediction/'
HISTORY_LOG_DIR = f'../history_logs/bitcoin_time_series_prediction/'
import json
def train_keras_model(training_func, model_name,
model_or_checkpoint_dir=CHECKPOINT_DIR, history_log_dir=HISTORY_LOG_DIR):
model_or_checkpoint_dir = os.path.join(model_or_checkpoint_dir, model_name)
history_log_dir = os.path.join(history_log_dir, f'{model_name}.json')
try:
model = tf.keras.models.load_model(model_or_checkpoint_dir)
print('loaded model successfully!')
with open(history_log_dir, 'r') as f:
history_dict = json.load(f)
print('loaded history_dict successfully!')
except Exception:
history = training_func()
history_dict = history.history
model = history.model
with open(history_log_dir, 'w') as f:
json.dump(history_dict, f)
return model, history_dict # Because I know the format of return value
Model 1: Dense Model (Window = 7, Horizon = 1)¶
We will create a simple Dense model which takes in 7 preceding timesteps as input and predicts 1 day ahead.
This will be the configuration of the simple dense model:
- A single dense layer with 128 neurons
- An output layer with no activation
- Adam optimizer and MAE loss function
- Batch size of 128
- 100 epochs
model_name = 'simple-dense_W7H1'
from tensorflow.keras import layers
model = tf.keras.Sequential([
layers.Dense(128, activation='relu', input_shape=(WINDOW,)),
layers.Dense(HORIZON, activation='linear')
], name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
model.summary()
Model: "simple-dense_W7H1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 128) 1024 _________________________________________________________________ dense_1 (Dense) (None, 1) 129 ================================================================= Total params: 1,153 Trainable params: 1,153 Non-trainable params: 0 _________________________________________________________________
Training function¶
def training_func():
history = model.fit(train_windows, train_labels,
validation_data=(test_windows, test_labels), # Ideally these should have been called validation_windows, validation_labels
epochs=100,
batch_size=128,
callbacks=[create_model_checkpoint(model_name=model_name)])
return history
model, history_dict = train_keras_model(training_func, model_name)
MODELS[model_name] = model
loaded model successfully! loaded history_dict successfully!
Since the dataset is so small, the model trains really fast, even on a CPU.
Learning curve¶
plot_learning_curve(history_dict, extra_metric='mse');
Making forecasts with a time series model¶
def make_preds(model, data):
return tf.squeeze(model.predict(data))
test_windows.shape
(564, 7)
test_labels.shape
(564, 1)
preds = make_preds(model, test_windows)
PREDICTIONS[model_name] = preds
results = evaluate_preds(tf.squeeze(test_labels), preds)
RESULTS[model_name] = results
results
{'mae': array([657.3211], dtype=float32), 'mape': array([0.02734248], dtype=float32), 'mse': array([1407883.4], dtype=float32), 'rmse': array([1186.5426], dtype=float32)}
plt.figure(figsize=(12, 7))
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=test_labels[:, 0], label='test data')
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=preds, linetype='-', label='prediction')
These are not actual future forecasts but predictions on the test dataset for Window = 7 and Horizon = 1.
Model 2: Dense (Window = 30, horizon = 1)¶
We will keep the previous model architecture, but make the window size = 30 i.e. we will use the past 30 timesteps to predict the next day.
HORIZON = 1
WINDOW = 30
Make window-horizon pair dataset¶
full_windows, full_labels = make_window_horizon_pairs(prices, window=WINDOW, horizon=HORIZON)
full_windows.shape, full_labels.shape
((2793, 30), (2793, 1))
Make train test splits¶
train_windows, train_labels, test_windows, test_labels = make_train_test_splits(full_windows, full_labels)
len(train_windows), len(test_windows)
(2234, 559)
model_name = f'simple-dense_W{WINDOW}H{HORIZON}'
tf.random.set_seed(42)
model = tf.keras.Sequential([
layers.Dense(128, activation='relu', input_shape=(WINDOW,)),
layers.Dense(HORIZON, activation='linear')
], name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
model.summary()
Model: "simple-dense_W30H1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_2 (Dense) (None, 128) 3968 _________________________________________________________________ dense_3 (Dense) (None, 1) 129 ================================================================= Total params: 4,097 Trainable params: 4,097 Non-trainable params: 0 _________________________________________________________________
def training_func():
history = model.fit(train_windows, train_labels,
validation_data=(test_windows, test_labels),
epochs=100, batch_size=128,
callbacks=[create_model_checkpoint(model_name=model_name)])
return history
model, history_dict = train_keras_model(training_func, model_name)
MODELS[model_name] = model
loaded model successfully! loaded history_dict successfully!
Learning curve¶
plot_learning_curve(history_dict, extra_metric='mse');
Get predictions¶
preds = make_preds(model, test_windows)
PREDICTIONS[model_name] = preds
Evaluate the model¶
results = evaluate_preds(tf.squeeze(test_labels), preds)
RESULTS[model_name] = results
results
{'mae': array([695.2544], dtype=float32), 'mape': array([0.02886072], dtype=float32), 'mse': array([1513971.9], dtype=float32), 'rmse': array([1230.4357], dtype=float32)}
pd.set_option('display.float_format', lambda x: '%.3f'%x)
pd.DataFrame(RESULTS)
naive-model-baseline_W1H1 | simple-dense_W7H1 | simple-dense_W30H1 | |
---|---|---|---|
mae | [645.9816] | [657.3211] | [695.2544] |
mse | [1368900.2] | [1407883.4] | [1513971.9] |
rmse | [1170.0001] | [1186.5426] | [1230.4357] |
mape | [0.026311785] | [0.027342476] | [0.028860716] |
The 30 window model performed worse than the 7 window model!
plt.figure(figsize=(10, 7))
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=test_labels[:, 0], label='test data')
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=preds, linetype='-', label='predicted')
Model 3: Dense (Window = 30, horizon = 7)¶
Now we will predict a week ahead (horizon = 7) using all the past month data (window = 30)
Make window horizon pairs¶
HORIZON = 7
WINDOW = 30
full_windows, full_labels = make_window_horizon_pairs(prices, window=WINDOW, horizon=HORIZON)
full_windows.shape, full_labels.shape
((2787, 30), (2787, 7))
Make train test splits¶
train_windows, train_labels, test_windows, test_labels = make_train_test_splits(full_windows, full_labels)
len(train_windows), len(test_windows)
(2229, 558)
We will again use the same architecture as model 1
model_name = f'simple-dense_W{WINDOW}H{HORIZON}'
model = tf.keras.Sequential([
layers.Dense(128, activation='relu', input_shape=(WINDOW,)),
layers.Dense(HORIZON, activation='linear')
], name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
model.summary()
Model: "simple-dense_W30H7" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_4 (Dense) (None, 128) 3968 _________________________________________________________________ dense_5 (Dense) (None, 7) 903 ================================================================= Total params: 4,871 Trainable params: 4,871 Non-trainable params: 0 _________________________________________________________________
def training_func():
return model.fit(train_windows, train_labels, validation_data=(test_windows, test_labels),
batch_size=128, epochs=100, verbose=0,
callbacks=[create_model_checkpoint(model_name=model_name)])
model, history_dict = train_keras_model(training_func, model_name)
MODELS[model_name] = model
loaded model successfully! loaded history_dict successfully!
Make predictions¶
preds = make_preds(model, test_windows)
preds.shape
TensorShape([558, 7])
Evaluate the predictions¶
We have predictions for each time step in the horizon. Hence our error metrics will also be for 7 timesteps in the horizon.
results = evaluate_preds(tf.broadcast_to(test_labels, preds.shape), preds)
RESULTS[model_name] = results
results
{'mae': array([ 765.6764, 985.5112, 1252.9808, 1483.6759, 1665.6678, 1832.3729, 2038.6433], dtype=float32), 'mape': array([0.03302484, 0.04054034, 0.0505597 , 0.05939244, 0.06612409, 0.07350221, 0.08160087], dtype=float32), 'mse': array([ 1736351.6, 3018528. , 4652685.5, 6375201. , 8422235. , 10376389. , 12933791. ], dtype=float32), 'rmse': array([1317.7069, 1737.3911, 2157.0085, 2524.916 , 2902.1086, 3221.2402, 3596.358 ], dtype=float32)}
1-Day ahead forecast¶
plt.figure(figsize=(12, 7))
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=test_labels[:, 0], label='test data')
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=preds[:, 0], linetype='-', label='prediction')
plt.title('1-Day ahead forecast', fontdict=dict(weight='bold', size=20))
Text(0.5, 1.0, '1-Day ahead forecast')
2-Day ahead forecast¶
plt.figure(figsize=(12, 7))
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=test_labels[:, 0], start=300, label='test data')
plot_bitcoin(tsteps=X_test[-len(test_windows):], prices=preds[:, 1], start=300, linetype='-', label='prediction')
plt.title('2-Day ahead forecast', fontdict=dict(weight='bold', size=20))
Text(0.5, 1.0, '2-Day ahead forecast')
Which Dense model is performing best?¶
def _get_day1_pred(x):
try:
return x[0]
except Exception:
return x
result_df = pd.DataFrame(RESULTS)
result_df.applymap(_get_day1_pred).loc[['mae']].plot(figsize=(12, 7), kind='bar');
- The Simple Dense model with Window = 30 and Horizon = 7 has the highest 1-day-ahead `mean_absolute_error`, i.e. it performs the worst of all.
- Still, the naive baseline model has the best (lowest) `mean_absolute_error` of all. But remember, you can't really use it to forecast far ahead: every prediction is just the last observed value repeated, so the forecast is flat.
- One of the main reasons the naive forecast performs so well is that time series data is autocorrelated, i.e. `y_t` correlates strongly with `y_{t-1}` (see the quick check below).
Model 4: Conv1D¶
HORIZON = 1
WINDOW = 7
Make window horizon pairs¶
full_windows, full_labels = make_window_horizon_pairs(prices, window=WINDOW, horizon=HORIZON)
full_windows.shape, full_labels.shape
((2816, 7), (2816, 1))
Create train test splits¶
train_windows, train_labels, test_windows, test_labels = make_train_test_splits(full_windows, full_labels)
len(train_windows), len(test_windows)
(2252, 564)
Alright, now our data is windowed and split into train and test sets. Are we ready to feed it into the CNN model? Not quite. Here is why:
The `Conv1D` layer in TensorFlow expects its input to have the shape `(batch_size, timesteps, num_features)`. So in our case:
- `batch_size` is chosen by us (128 in the fit call below) and handled by Keras
- `timesteps` is the window size, which is set to 7
- `num_features` is 1, i.e. just the price at each timestep; this dimension exists conceptually but is not reflected in the shape of our data, so we need to expand the dimensions
Let us create an `expand_dims` layer for this.
x = tf.constant(train_windows[0])
expand_dims_layer = layers.Lambda(lambda x: tf.expand_dims(x, axis=-1))
print(f'Original shape: {x.shape}')
print(f'Expanded shape: {expand_dims_layer(x).shape}')
Original shape: (7,) Expanded shape: (7, 1)
Create the model¶
model_name = f'Conv1D_W{WINDOW}H{HORIZON}'
model = tf.keras.Sequential([
layers.InputLayer(input_shape=(WINDOW,)),
expand_dims_layer,
layers.Conv1D(filters=128, kernel_size=5, padding='causal', activation='relu'),
layers.Flatten(),
layers.Dense(HORIZON)
], name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
model.summary()
Model: "Conv1D_W7H1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= lambda (Lambda) (None, 7, 1) 0 _________________________________________________________________ conv1d (Conv1D) (None, 7, 128) 768 _________________________________________________________________ flatten (Flatten) (None, 896) 0 _________________________________________________________________ dense_6 (Dense) (None, 1) 897 ================================================================= Total params: 1,665 Trainable params: 1,665 Non-trainable params: 0 _________________________________________________________________
def training_func():
return model.fit(train_windows, train_labels, validation_data=(test_windows, test_labels),
batch_size=128, epochs=100, callbacks=[create_model_checkpoint(model_name=model_name)])
model, history_dict = train_keras_model(training_func, model_name)
MODELS[model_name] = model
loaded model successfully! loaded history_dict successfully!
Learning Curve¶
plot_learning_curve(history_dict);
Highly, highly overfitted!
Make predictions¶
preds = make_preds(model, test_windows)
PREDICTIONS[model_name] = preds
preds
<tf.Tensor: shape=(564,), dtype=float32, numpy= array([ 7541.4507, 7538.419 , 7369.965 , 7237.2026, 7184.724 , 7187.412 , 7223.9355, 7114.046 , 7084.255 , 6921.393 , 6631.929 , 7186.379 , 7276.2285, 7163.8823, 7198.8374, 7248.9175, 7206.078 , 7187.368 , 7214.822 , 7191.6914, 7201.186 , 7285.3677, 7373.8467, 7282.6475, 7175.0635, 7152.6455, 7003.6787, 7191.7744, 7337.9785, 7364.6484, 7612.7056, 7979.748 , 8113.6357, 7875.896 , 7981.4604, 8091.0244, 8157.432 , 8111.8335, 8574.723 , 8873.057 , 8779.799 , 8846.266 , 8930.719 , 8732.35 , 8607.697 , 8668.239 , 8669.643 , 8429.824 , 8387.077 , 8361.985 , 8510.619 , 8810.689 , 9100.28 , 9304.295 , 9503.933 , 9445.041 , 9359.061 , 9356.293 , 9298.008 , 9188.378 , 9492.299 , 9700.635 , 9790.46 , 9883.514 , 10085.147 , 9929.859 , 10114.081 , 10340.126 , 10287.595 , 10307.725 , 10007.113 , 9859.378 , 9656.468 , 10041.84 , 9855.681 , 9610.647 , 9598.509 , 9686.635 , 9883.508 , 9699.7705, 9420.492 , 8874.922 , 8719.231 , 8766.307 , 8683.228 , 8544.305 , 8793.549 , 8849.852 , 8764.271 , 8967.116 , 9133.852 , 8996.733 , 8304.699 , 7863.1914, 7879.417 , 7980.547 , 6333.364 , 5519.192 , 5242.038 , 5365.313 , 5047.3066, 5287.1494, 5410.579 , 6084.9463, 6253.743 , 6222.9976, 5943.9224, 6291.8306, 6770.8853, 6788.26 , 6710.742 , 6665.292 , 6339.411 , 5959.5957, 6294.047 , 6509.752 , 6548.7217, 6748.091 , 6810.837 , 6854.688 , 6801.7354, 7163.369 , 7242.215 , 7330.912 , 7325.197 , 6970.078 , 6827.244 , 6975.6484, 6962.3354, 6874.709 , 6720.8228, 7046.123 , 7122.7676, 7233.6885, 7200.423 , 6924.5923, 6855.8154, 7051.611 , 7497.005 , 7556.7773, 7529.1836, 7585.683 , 7747.339 , 7787.009 , 8530.646 , 8852.533 , 8889.89 , 8918.209 , 8918.396 , 8905.227 , 8937.319 , 9283.38 , 9813.696 , 9969.903 , 9701.797 , 8936.364 , 8563.64 , 8724.303 , 9254.482 , 9713.696 , 9483.283 , 9367.356 , 9585.998 , 9760.102 , 9739.926 , 9551.698 , 9204.241 , 9133.665 , 9202.079 , 9102.982 , 8908.516 , 8806.854 , 9036.312 , 9438.19 , 9484.705 , 9619.711 , 9482.682 , 10061.312 , 9750.354 , 9602.463 , 9720.47 , 9724.098 , 9680.5 , 9675.544 , 9786.042 , 9788.949 , 9837.089 , 9398.01 , 9355.39 , 9440.887 , 9402.332 , 9398.074 , 9467.065 , 9481.166 , 9395.695 , 9281.652 , 9317.174 , 9300.972 , 9581.144 , 9645.531 , 9381.216 , 9214.246 , 9158.783 , 9064.132 , 9058.662 , 9153.888 , 9164.439 , 9200.016 , 9116.636 , 9079.3955, 9100.436 , 9068.736 , 9215.672 , 9254.261 , 9417.577 , 9292.253 , 9219.013 , 9209.201 , 9270.924 , 9253.142 , 9235.816 , 9209.796 , 9139.027 , 9135.913 , 9162.723 , 9184.195 , 9165.945 , 9328.718 , 9509.221 , 9611.161 , 9578.073 , 9659.367 , 9886.473 , 10920.689 , 11117.455 , 11240.623 , 11142.498 , 11313.994 , 11701.682 , 11319.678 , 11185.884 , 11180.248 , 11579.72 , 11801.186 , 11679.086 , 11691.686 , 11668.261 , 11799.029 , 11453.717 , 11446.039 , 11688.332 , 11810.38 , 11867.688 , 11871.409 , 12277.097 , 12184.1455, 11808.047 , 11740.367 , 11632.802 , 11654.115 , 11641.29 , 11716.611 , 11456.929 , 11387.672 , 11317.536 , 11432.908 , 11506.293 , 11616.4375, 11670.207 , 11885.209 , 11566.882 , 10829.965 , 10495.851 , 10125.296 , 10149.144 , 10321.08 , 10136.389 , 10171.461 , 10293.698 , 10396.121 , 10416.16 , 10321.857 , 10578.838 , 10811.986 , 11011.717 , 10965.449 , 10915.205 , 11005.288 , 10903.886 , 10583.1875, 10470.812 , 10306.683 , 10561.194 , 10727.828 , 10752.0625, 10727.887 , 10814.707 , 10796.883 , 10731.072 , 10629.982 , 10561.65 , 10541.093 , 10623.125 , 10734.497 , 10626.674 , 10602.295 , 10820.072 , 11041.059 , 11304.902 , 11351.824 , 11587.145 , 11507.463 , 
11394.204 , 11445.915 , 11392.298 , 11350.012 , 11414.252 , 11708.442 , 11924.427 , 12838.35 , 13227.715 , 13044.728 , 12996.316 , 13009.268 , 13056.212 , 13542.195 , 13432.408 , 13408.624 , 13501.606 , 13834.5205, 13804.771 , 13623.514 , 13746.213 , 14094.468 , 15171.939 , 15624.715 , 15023.148 , 15226.63 , 15336.139 , 15420.499 , 15674.178 , 16161.474 , 16391.78 , 16071.379 , 15876.004 , 16520.6 , 17485.62 , 17891.977 , 17950.715 , 18437.865 , 18653.504 , 18653.883 , 18472.152 , 18876.436 , 18860.46 , 17532.754 , 16887.86 , 17518.027 , 18184.225 , 19143.953 , 19120.512 , 19143.867 , 19355.191 , 18988.797 , 18953.152 , 19043.594 , 19138.79 , 18763.326 , 18487.143 , 18357.648 , 18167.207 , 18669.91 , 19054.184 , 19243.746 , 19376.053 , 20866.787 , 22661.082 , 23214.098 , 23747.691 , 23681.602 , 23295.291 , 23307.537 , 23264.996 , 23543.523 , 24339.664 , 26039.207 , 26578.246 , 26698.455 , 26880.615 , 28378.078 , 29224.055 , 29382.15 , 31464.473 , 33033.61 , 32064.059 , 33544.68 , 35957.75 , 39181.25 , 40475.383 , 40552.406 , 39166.594 , 35279.008 , 33885.63 , 36160.625 , 38516.574 , 37395.1 , 35880.98 , 36032.992 , 36459.37 , 36586.58 , 35280.08 , 31475.176 , 32207.078 , 32447.938 , 32412.895 , 32254.977 , 32256.316 , 30995.361 , 32479.887 , 34683.004 , 35003.863 , 33398.773 , 33212.48 , 35179.527 , 37280.684 , 37506.32 , 37676.35 , 39691.418 , 39154.89 , 43202.21 , 46325.125 , 46146.38 , 46919.914 , 47854.812 , 47512.81 , 48524.88 , 48381.45 , 48723.805 , 51284.734 , 52128.38 , 54924.555 , 55183.75 , 56702.62 , 54947.613 , 49416.992 , 47976.594 , 48124.273 , 46627.68 , 46081.348 , 45206.684 , 48267.203 , 48453.85 , 50139.547 , 48904.758 , 48797.68 , 48847.254 , 50209.16 , 51456.664 , 53803.785 , 56587.883 , 57786.133 , 57513.473 , 59832.324 , 60590.742 , 57368.785 , 56041.695 , 57880.297 , 58544.902 , 58304.68 , 58310.992 , 57961.39 , 55038.566 , 54234.633 , 53143.727 , 52259.53 , 53727.582 , 55937.477 , 55828.508 , 56921.89 , 58482.844 , 58969.355 , 58906.6 , 58738.94 , 57797.094 , 57851.277 , 58635.87 , 58317.816 , 56717.52 , 57241.008 , 58159.215 , 59150.22 , 59675.15 , 59821.586 , 62395.65 , 63224.28 , 63373.188 , 62152.445 , 60709.473 , 57533.613 , 55949.484 , 56332.777 , 54803.555 , 52205.19 , 50473.22 , 50540.72 , 49079.32 , 52139.016 , 55018.727 , 55109.867 , 53440.16 , 56035.332 , 57992.34 , 57001.152 , 56779.117 , 54358.098 , 56231.7 , 56845.254 , 57128.453 , 58244.49 , 58274.3 , 56315.215 , 55979.496 , 53115.375 , 50083.406 , 49446.57 , 48362.41 , 46060.49 , 43274.54 , 42795.914 , 40290.367 , 39354.453 , 37356.01 , 37162.293 , 34919.887 , 36893.395 , 38433.51 , 38975.24 , 38707.758 , 35654.906 , 34360.273 , 35196.938 , 36908.902 , 36713.86 , 37013.395 , 38566.18 , 37801.68 , 35384.46 , 34863.027 , 34309.74 , 33639.746 , 36094.535 , 37077.523 , 37340.305 , 35915.684 , 37968.64 , 40134.31 , 40465.836 , 38909.88 , 37630.83 , 35908.43 , 35460.816 , 35610.227 , 32616.32 , 31842.602 ], dtype=float32)>
Evaluate predictions¶
preds.shape
TensorShape([564])
test_labels.shape
(564, 1)
results = evaluate_preds(tf.squeeze(test_labels), preds)
RESULTS[model_name] = results
results
{'mae': array([658.0913], dtype=float32), 'mape': array([0.02705742], dtype=float32), 'mse': array([1412351.2], dtype=float32), 'rmse': array([1188.4238], dtype=float32)}
Did our CNN model work well?¶
result_df = pd.DataFrame(RESULTS)
result_df.applymap(_get_day1_pred).loc[['mae']].plot(figsize=(12, 7), kind='bar');
Not really an improvement :(
Model 5: LSTM¶
The `LSTM` layer in Keras also expects the input tensor to have the shape `[batch, timesteps, num_features]`
WINDOW = 7
HORIZON = 1
model_name = f'LSTM_W{WINDOW}H{HORIZON}'
tf.random.set_seed(42)
model = tf.keras.Sequential([
layers.InputLayer(input_shape=(WINDOW,)),
expand_dims_layer,
layers.LSTM(128, activation='relu'),
layers.Dense(HORIZON)
], name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
model.summary()
WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. Model: "LSTM_W7H1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= lambda (Lambda) (None, 7, 1) 0 _________________________________________________________________ lstm (LSTM) (None, 128) 66560 _________________________________________________________________ dense_7 (Dense) (None, 1) 129 ================================================================= Total params: 66,689 Trainable params: 66,689 Non-trainable params: 0 _________________________________________________________________
def training_func():
return model.fit(train_windows, train_labels, validation_data=(test_windows, test_labels),
batch_size=128, epochs=100, callbacks=[create_model_checkpoint(model_name)])
model, history_dict = train_keras_model(training_func, model_name)
MODELS[model_name] = model
WARNING:tensorflow:Layer lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU. loaded model successfully! loaded history_dict successfully!
Make predictions¶
preds = make_preds(model, test_windows)
preds.shape
TensorShape([564])
Evaluate predictions¶
results = evaluate_preds(tf.squeeze(test_labels), preds)
RESULTS[model_name] = results
results
{'mae': array([763.69543], dtype=float32), 'mape': array([0.03152551], dtype=float32), 'mse': array([1799688.5], dtype=float32), 'rmse': array([1341.5247], dtype=float32)}
Nope, just blindly using LSTM did not improve our result, even over the baseline naive forecast model.
Multivariate time series prediction¶
So far, none of the models has even beaten the baseline naive forecast model, which just uses the previous timestep's value as the forecast for the next timestep.
Maybe giving the model a bit more information than just the time will help explain the variation in bitcoin prices.
Almost anything, e.g. tweets, financial news, world news, could help increase the predictive power of a Bitcoin time series forecasting model. It is just up to us how we feed this information to the model.
Here, for now, we will include one more variable, called the Bitcoin block reward size. It is the number of bitcoins one receives from mining a Bitcoin block.
Initially the Bitcoin block reward was 50, but approximately every 4 years the reward is halved.
bitcoin_prices.head()
price | |
---|---|
date | |
2013-10-01 | 123.655 |
2013-10-02 | 125.455 |
2013-10-03 | 108.585 |
2013-10-04 | 118.675 |
2013-10-05 | 121.339 |
Bitcoin block reward and start dates of introduction:
Block Reward | Start Date |
---|---|
50 | 3 January 2009 (2009-01-03) |
25 | 28 November 2012 |
12.5 | 9 July 2016 |
6.25 | 11 May 2020 |
3.125 | TBA (expected 2024) |
1.5625 | TBA (expected 2028) |
bitcoin_price_block_df = bitcoin_prices.copy()
bitcoin_price_block_df['block_reward'] = np.nan
bitcoin_price_block_df
price | block_reward | |
---|---|---|
date | ||
2013-10-01 | 123.655 | nan |
2013-10-02 | 125.455 | nan |
2013-10-03 | 108.585 | nan |
2013-10-04 | 118.675 | nan |
2013-10-05 | 121.339 | nan |
... | ... | ... |
2021-06-20 | 35656.305 | nan |
2021-06-21 | 35681.134 | nan |
2021-06-22 | 31659.542 | nan |
2021-06-23 | 32404.330 | nan |
2021-06-24 | 33532.258 | nan |
2823 rows × 2 columns
bitcoin_price_block_df.loc['28 November 2012': '9 July 2016', 'block_reward'] = 25
bitcoin_price_block_df.loc['9 July 2016': '11 May 2020', 'block_reward'] = 12.5
bitcoin_price_block_df.loc['11 May 2020':, 'block_reward'] = 6.25
bitcoin_price_block_df
price | block_reward | |
---|---|---|
date | ||
2013-10-01 | 123.655 | 25.000 |
2013-10-02 | 125.455 | 25.000 |
2013-10-03 | 108.585 | 25.000 |
2013-10-04 | 118.675 | 25.000 |
2013-10-05 | 121.339 | 25.000 |
... | ... | ... |
2021-06-20 | 35656.305 | 6.250 |
2021-06-21 | 35681.134 | 6.250 |
2021-06-22 | 31659.542 | 6.250 |
2021-06-23 | 32404.330 | 6.250 |
2021-06-24 | 33532.258 | 6.250 |
2823 rows × 2 columns
from sklearn.preprocessing import minmax_scale
bitcoin_price_block_scaled_df = pd.DataFrame(minmax_scale(bitcoin_price_block_df),
columns=bitcoin_price_block_df.columns,
index=bitcoin_price_block_df.index)
bitcoin_price_block_scaled_df.plot(figsize=(12, 7))
<matplotlib.axes._subplots.AxesSubplot at 0x7f7fed0a5f10>
Make a windowed dataset for multivariate data¶
HORIZON = 1
WINDOW = 7
bitcoin_prices_windowed = bitcoin_price_block_df.copy()
for i in range(WINDOW):
bitcoin_prices_windowed[f'price_lag{i+1}'] = bitcoin_prices_windowed['price'].shift(periods=i+1)
bitcoin_prices_windowed.head()
price | block_reward | price_lag1 | price_lag2 | price_lag3 | price_lag4 | price_lag5 | price_lag6 | price_lag7 | |
---|---|---|---|---|---|---|---|---|---|
date | |||||||||
2013-10-01 | 123.655 | 25.000 | nan | nan | nan | nan | nan | nan | nan |
2013-10-02 | 125.455 | 25.000 | 123.655 | nan | nan | nan | nan | nan | nan |
2013-10-03 | 108.585 | 25.000 | 125.455 | 123.655 | nan | nan | nan | nan | nan |
2013-10-04 | 118.675 | 25.000 | 108.585 | 125.455 | 123.655 | nan | nan | nan | nan |
2013-10-05 | 121.339 | 25.000 | 118.675 | 108.585 | 125.455 | 123.655 | nan | nan | nan |
bitcoin_prices_windowed.columns
Index(['price', 'block_reward', 'price_lag1', 'price_lag2', 'price_lag3', 'price_lag4', 'price_lag5', 'price_lag6', 'price_lag7'], dtype='object')
The multivariate model we are going to train will be a Dense model, which doesn't really care about the ordering of the sequence.
So we can pass the features as:
`['block_reward', 'price_lag1', 'price_lag2', 'price_lag3', 'price_lag4', 'price_lag5', 'price_lag6', 'price_lag7']`
Let us make the $X$ and $y$ for this.
X = bitcoin_prices_windowed.dropna().drop('price', axis=1).astype(np.float32)
y = bitcoin_prices_windowed.dropna()['price'].astype(np.float32)
X.head()
block_reward | price_lag1 | price_lag2 | price_lag3 | price_lag4 | price_lag5 | price_lag6 | price_lag7 | |
---|---|---|---|---|---|---|---|---|
date | ||||||||
2013-10-08 | 25.000 | 121.795 | 120.655 | 121.339 | 118.675 | 108.585 | 125.455 | 123.655 |
2013-10-09 | 25.000 | 123.033 | 121.795 | 120.655 | 121.339 | 118.675 | 108.585 | 125.455 |
2013-10-10 | 25.000 | 124.049 | 123.033 | 121.795 | 120.655 | 121.339 | 118.675 | 108.585 |
2013-10-11 | 25.000 | 125.961 | 124.049 | 123.033 | 121.795 | 120.655 | 121.339 | 118.675 |
2013-10-12 | 25.000 | 125.280 | 125.961 | 124.049 | 123.033 | 121.795 | 120.655 | 121.339 |
X.shape
(2816, 8)
So we have 7 + 1 features, i.e. the past 7 days' Bitcoin prices plus the block_reward
y.head()
date 2013-10-08 123.033 2013-10-09 124.049 2013-10-10 125.961 2013-10-11 125.280 2013-10-12 125.927 Name: price, dtype: float32
Now let us make the train, test splits
train_prop = 0.8
train_size = int(len(X)*train_prop)
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]
len(X_train), len(X_test)
(2252, 564)
Model 6: Dense (multivariate time series)¶
model_name = f'multivariate-dense_W{WINDOW}H{HORIZON}'
tf.random.set_seed(42)
model = tf.keras.Sequential([
layers.Dense(128, activation='relu', input_shape=(X_train.shape[1],)),
layers.Dense(HORIZON)
], name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
model.summary()
Model: "multivariate-dense_W7H1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_8 (Dense) (None, 128) 1152 _________________________________________________________________ dense_9 (Dense) (None, 1) 129 ================================================================= Total params: 1,281 Trainable params: 1,281 Non-trainable params: 0 _________________________________________________________________
There is a better way to feed the block_reward size to the model. Since there is a definite order to the past 7 days' Bitcoin prices, they could be processed by a sequence model (or by a transformer, which encodes position without needing to process the sequence step by step, and can therefore process it in parallel!).
For the block reward size, we can learn an embedding, i.e. a numerical representation, which we can then concatenate with the representation of the price sequence to build our architecture; a rough sketch follows.
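A rough sketch of that idea with the Keras functional API is below. The layer sizes, the use of an LSTM for the price lags, and the small Dense branch standing in for a learned block-reward representation are all assumptions for illustration; this is not the model trained next.
# Two-branch model: an LSTM encodes the ordered price lags, a small Dense branch learns a
# representation of the block reward, and the two representations are concatenated.
price_input = layers.Input(shape=(WINDOW, 1), name='price_lags')
reward_input = layers.Input(shape=(1,), name='block_reward')
price_repr = layers.LSTM(32)(price_input)                        # sequence-aware encoding of the 7 lags
reward_repr = layers.Dense(8, activation='relu')(reward_input)   # learned representation of the reward
combined = layers.Concatenate()([price_repr, reward_repr])
output = layers.Dense(HORIZON)(combined)
two_branch_model = tf.keras.Model(inputs=[price_input, reward_input], outputs=output)
two_branch_model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam())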
def training_func():
return model.fit(X_train, y_train, validation_data=(X_test, y_test),
batch_size=128, epochs=100, callbacks=[create_model_checkpoint(model_name)])
model, history_dict = train_keras_model(training_func, model_name)
MODELS[model_name] = model
loaded model successfully! loaded history_dict successfully!
Learning Curve¶
plot_learning_curve(history_dict, extra_metric='mse');
Make predictions¶
preds = make_preds(model, X_test)
preds.shape
TensorShape([564])
Evaluate predictions¶
results = evaluate_preds(y_test, preds)
RESULTS[model_name] = results
results
{'mae': array([651.629], dtype=float32), 'mape': array([0.02681939], dtype=float32), 'mse': array([1388462.9], dtype=float32), 'rmse': array([1178.3306], dtype=float32)}
Comparing model performance¶
result_df = pd.DataFrame(RESULTS)
result_df.applymap(_get_day1_pred).loc[['mae']].plot(figsize=(14, 7), kind='bar');
Model 7: N-BEATS Algorithm¶
So far we have only tried smaller models. Now we will re-implement the architecture from the paper N-BEATS (Neural Basis Expansion Analysis for Interpretable Time Series Forecasting).
The N-BEATS model was trained on univariate time series prediction tasks and achieved state-of-the-art winning performance in the M4 forecasting competition.
- Replicating the model architecture
- Hyperparameters are listed in Appendix D of the N-BEATS paper
To replicate the above architecture, we will:
- Create a custom layer for the `NBeatsBlock` by subclassing `tf.keras.layers.Layer`
- Implement the full architecture using the Keras functional API
Building the N-BEATS block layer¶
class NBeatsBlock(tf.keras.layers.Layer):
def __init__(self,
input_size: int,
theta_size: int,
horizon: int,
n_neurons: int,
n_layers: int,
**kwargs):
super().__init__(**kwargs)
self.input_size = input_size
self.theta_size = theta_size
self.horizon = horizon
self.n_neurons = n_neurons
self.n_layers = n_layers
self.hidden = [tf.keras.layers.Dense(n_neurons, activation='relu') for _ in range(n_layers)]
self.theta_layer = tf.keras.layers.Dense(theta_size, activation='linear', name='theta')
def call(self, inputs):
x = inputs
for layer in self.hidden:
x = layer(x)
theta = self.theta_layer(x)
backcast, forecast = theta[:, :self.input_size], theta[:, -self.horizon:]
return backcast, forecast
def get_config(self):
attrs = ['input_size', 'theta_size', 'horizon', 'n_neurons', 'n_layers']
config = {attr: self.__getattribute__(attr) for attr in attrs}
return config
class NBeatsStack(tf.keras.layers.Layer):
def __init__(self, block, n_blocks, **kwargs):
super().__init__(**kwargs)
self.block = block
self.block_config = block.get_config()
self.n_blocks = n_blocks
def call(self, inputs):
block_config = {**self.block_config, 'name': f'{self.name}_block0'}
initial_block = NBeatsBlock.from_config(block_config)
stack_residuals, stack_forecast = initial_block(inputs)
for block_num in range(1, self.n_blocks):
block_config = {**self.block_config, 'name': f'{self.name}_block{block_num}'}
block = NBeatsBlock.from_config(block_config)
block_backcast, block_forecast = block(stack_residuals)
stack_residuals = layers.subtract([stack_residuals, block_backcast], name=f'{block.name}_subtract')
stack_forecast = layers.add([stack_forecast, block_forecast], name=f'{block.name}_add')
return stack_residuals, stack_forecast
dummy_nbeats_block_layer = NBeatsBlock(input_size=WINDOW,
theta_size=WINDOW+HORIZON,
horizon=HORIZON,
n_neurons=128,
n_layers=4)
dummy_nbeats_block_layer.trainable_weights
[]
dummy_inputs = tf.expand_dims(tf.range(WINDOW) + 1, axis=0)
dummy_inputs
<tf.Tensor: shape=(1, 7), dtype=int32, numpy=array([[1, 2, 3, 4, 5, 6, 7]], dtype=int32)>
backcast, forecast = dummy_nbeats_block_layer(dummy_inputs)
print(f'Backcast: {tf.squeeze(backcast).numpy()}')
print(f'Forecast: {tf.squeeze(forecast).numpy()}')
Backcast: [ 0.19014986 0.8379836 -0.32870018 0.2515991 -0.4754027 -0.7783665 -0.5299448 ] Forecast: -0.7554213404655457
dummy_nbeats_stack_layer = NBeatsStack(dummy_nbeats_block_layer, n_blocks=4)
dummy_nbeats_stack_layer(dummy_inputs)
(<tf.Tensor: shape=(1, 7), dtype=float32, numpy= array([[-0.49055243, -0.61159587, 0.11933358, 0.23455854, -0.0260229 , -0.78948456, 0.36006966]], dtype=float32)>, <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.22062236]], dtype=float32)>)
Preparing data for the N-BEATS algorithm using `tf.data`¶
To ensure model training runs as fast as possible, we will prepare our dataset using the `tf.data` API.
HORIZON = 1
WINDOW = 7
bitcoin_prices.head()
price | |
---|---|
date | |
2013-10-01 | 123.655 |
2013-10-02 | 125.455 |
2013-10-03 | 108.585 |
2013-10-04 | 118.675 |
2013-10-05 | 121.339 |
# Add windowed columns
bitcoin_prices_nbeats = bitcoin_prices.copy()
for i in range(WINDOW):
bitcoin_prices_nbeats[f'price_lag{i+1}'] = bitcoin_prices_nbeats['price'].shift(periods=i+1)
bitcoin_prices_nbeats.head()
price | price_lag1 | price_lag2 | price_lag3 | price_lag4 | price_lag5 | price_lag6 | price_lag7 | |
---|---|---|---|---|---|---|---|---|
date | ||||||||
2013-10-01 | 123.655 | nan | nan | nan | nan | nan | nan | nan |
2013-10-02 | 125.455 | 123.655 | nan | nan | nan | nan | nan | nan |
2013-10-03 | 108.585 | 125.455 | 123.655 | nan | nan | nan | nan | nan |
2013-10-04 | 118.675 | 108.585 | 125.455 | 123.655 | nan | nan | nan | nan |
2013-10-05 | 121.339 | 118.675 | 108.585 | 125.455 | 123.655 | nan | nan | nan |
Make $X$ and $y$
X = bitcoin_prices_nbeats.dropna().drop('price', axis=1)
y = bitcoin_prices_nbeats.dropna()['price']
X.shape
(2816, 7)
Split into train and test sets
train_prop = 0.8
train_size = int(len(X)*train_prop)
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]
len(X_train), len(X_test)
(2252, 564)
X_train.shape, y_train.shape
((2252, 7), (2252,))
Now make the tf.data.Dataset
# Turn train features and labels into tensor Datasets
train_features_tfdata = tf.data.Dataset.from_tensor_slices(X_train)
train_labels_tfdata = tf.data.Dataset.from_tensor_slices(y_train)
# Turn test features and labels into tensor Datasets
test_features_tfdata = tf.data.Dataset.from_tensor_slices(X_test)
test_labels_tfdata = tf.data.Dataset.from_tensor_slices(y_test)
# Combine features and labels into a single dataset
train_tfdata = tf.data.Dataset.zip((train_features_tfdata, train_labels_tfdata))
test_tfdata = tf.data.Dataset.zip((test_features_tfdata, test_labels_tfdata))
# Batch and prefetch
BATCH_SIZE = 1024
train_tfdata = train_tfdata.batch(BATCH_SIZE).prefetch(buffer_size=tf.data.AUTOTUNE)
test_tfdata = test_tfdata.batch(BATCH_SIZE).prefetch(buffer_size=tf.data.AUTOTUNE)
train_tfdata, test_tfdata
(<PrefetchDataset shapes: ((None, 7), (None,)), types: (tf.float64, tf.float64)>, <PrefetchDataset shapes: ((None, 7), (None,)), types: (tf.float64, tf.float64)>)
Setting up hyperparameters and extra layers for N-BEATS algorithm¶
Source: Figure 1 and Table 18/Appendix D of the N-BEATS paper
# Hyperparameter values (some differ from the N-BEATS paper, Figure 1 and Table 18/Appendix D)
LOOKBACK = 7 # By what factor of the horizon should we look back when forecasting H timesteps ahead?
HORIZON = 1 # How many timesteps ahead do we want to forecast?
N_EPOCHS = 5000
N_NEURONS = 512
N_LAYERS = 4
N_STACKS = 30
N_BLOCKS = 1
INPUT_SIZE = LOOKBACK*HORIZON
THETA_SIZE = LOOKBACK + HORIZON
INPUT_SIZE, THETA_SIZE
(7, 8)
Now that the hyperparameters are ready, we need to create one important part of the architecture: double residual stacking (section 3.2 of the N-BEATS paper), using the tf.keras.layers.subtract and tf.keras.layers.add layers.
# Random features
tensor_1 = tf.range(10)
tensor_2 = tf.range(10) + 10
# Subtract
subtracted = tf.keras.layers.subtract([tensor_2, tensor_1])
# Add
added = tf.keras.layers.add([tensor_2, tensor_1])
print(f'Input tensors:\n tensor_1: {tensor_1.numpy()}\n tensor_2: {tensor_2.numpy()}\n')
print(f'Subtracted: {subtracted.numpy()}')
print(f'Added: {added.numpy()}')
Input tensors: tensor_1: [0 1 2 3 4 5 6 7 8 9] tensor_2: [10 11 12 13 14 15 16 17 18 19] Subtracted: [10 10 10 10 10 10 10 10 10 10] Added: [10 12 14 16 18 20 22 24 26 28]
model_name= f'NBeatsGeneric_W{WINDOW}H{HORIZON}'
Creating the NBeats Model¶
Here, the number of blocks (NBeatsBlock) per stack is 1, with each such block having 4 block layers. These blocks form part of an NBeatsStack, of which there are 30. So, as per my understanding, the model architecture is as follows:
- The inputs are taken as model_inputs.
- These are passed to an NBeatsStack.
- Inside the NBeatsStack we have 1 NBeatsBlock, i.e. Block=1.
- Inside each NBeatsBlock we have 4 FC layers, i.e. Block-layers=4, with each block outputting a block residual/backcast (subtracted cumulatively) and a block forecast (aggregated cumulatively).
- Finally, the cumulatively subtracted residual is passed on to the next stack, while the forecast is aggregated cumulatively across each of the 30 stacks.
# Set up the stack input layer
model_inputs = layers.Input(shape=INPUT_SIZE, name='model_input')
# Set up a template block
template_block = NBeatsBlock(input_size=INPUT_SIZE, theta_size=THETA_SIZE,
horizon=HORIZON, n_neurons=N_NEURONS, n_layers=N_LAYERS)
# Set up the initial stack
initial_stack = NBeatsStack(template_block, n_blocks=N_BLOCKS, name='stack0')
# Get residuals and forecast from the initial stack
residuals, forecast = initial_stack(model_inputs)
for stack_num in range(1, N_STACKS):
stack = NBeatsStack(template_block, n_blocks=N_BLOCKS, name=f'stack{stack_num}')
residuals, stack_forecast = stack(residuals)
forecast = layers.add([forecast, stack_forecast], name=f'{stack.name}_add')
model = tf.keras.Model(inputs=model_inputs, outputs=forecast, name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001), metrics=['mse'])
model.summary()
Model: "NBeatsGeneric_W7H1" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== model_input (InputLayer) [(None, 7)] 0 __________________________________________________________________________________________________ stack0 (NBeatsStack) ((None, 7), (None, 1 0 model_input[0][0] __________________________________________________________________________________________________ stack1 (NBeatsStack) ((None, 7), (None, 1 0 stack0[0][0] __________________________________________________________________________________________________ stack1_add (Add) (None, 1) 0 stack0[0][1] stack1[0][1] __________________________________________________________________________________________________ stack2 (NBeatsStack) ((None, 7), (None, 1 0 stack1[0][0] __________________________________________________________________________________________________ stack2_add (Add) (None, 1) 0 stack1_add[0][0] stack2[0][1] __________________________________________________________________________________________________ stack3 (NBeatsStack) ((None, 7), (None, 1 0 stack2[0][0] __________________________________________________________________________________________________ stack3_add (Add) (None, 1) 0 stack2_add[0][0] stack3[0][1] __________________________________________________________________________________________________ stack4 (NBeatsStack) ((None, 7), (None, 1 0 stack3[0][0] __________________________________________________________________________________________________ stack4_add (Add) (None, 1) 0 stack3_add[0][0] stack4[0][1] __________________________________________________________________________________________________ stack5 (NBeatsStack) ((None, 7), (None, 1 0 stack4[0][0] __________________________________________________________________________________________________ stack5_add (Add) (None, 1) 0 stack4_add[0][0] stack5[0][1] __________________________________________________________________________________________________ stack6 (NBeatsStack) ((None, 7), (None, 1 0 stack5[0][0] __________________________________________________________________________________________________ stack6_add (Add) (None, 1) 0 stack5_add[0][0] stack6[0][1] __________________________________________________________________________________________________ stack7 (NBeatsStack) ((None, 7), (None, 1 0 stack6[0][0] __________________________________________________________________________________________________ stack7_add (Add) (None, 1) 0 stack6_add[0][0] stack7[0][1] __________________________________________________________________________________________________ stack8 (NBeatsStack) ((None, 7), (None, 1 0 stack7[0][0] __________________________________________________________________________________________________ stack8_add (Add) (None, 1) 0 stack7_add[0][0] stack8[0][1] __________________________________________________________________________________________________ stack9 (NBeatsStack) ((None, 7), (None, 1 0 stack8[0][0] __________________________________________________________________________________________________ stack9_add (Add) (None, 1) 0 stack8_add[0][0] stack9[0][1] __________________________________________________________________________________________________ stack10 (NBeatsStack) ((None, 7), (None, 1 0 stack9[0][0] __________________________________________________________________________________________________ 
stack10_add (Add) (None, 1) 0 stack9_add[0][0] stack10[0][1] __________________________________________________________________________________________________ stack11 (NBeatsStack) ((None, 7), (None, 1 0 stack10[0][0] __________________________________________________________________________________________________ stack11_add (Add) (None, 1) 0 stack10_add[0][0] stack11[0][1] __________________________________________________________________________________________________ stack12 (NBeatsStack) ((None, 7), (None, 1 0 stack11[0][0] __________________________________________________________________________________________________ stack12_add (Add) (None, 1) 0 stack11_add[0][0] stack12[0][1] __________________________________________________________________________________________________ stack13 (NBeatsStack) ((None, 7), (None, 1 0 stack12[0][0] __________________________________________________________________________________________________ stack13_add (Add) (None, 1) 0 stack12_add[0][0] stack13[0][1] __________________________________________________________________________________________________ stack14 (NBeatsStack) ((None, 7), (None, 1 0 stack13[0][0] __________________________________________________________________________________________________ stack14_add (Add) (None, 1) 0 stack13_add[0][0] stack14[0][1] __________________________________________________________________________________________________ stack15 (NBeatsStack) ((None, 7), (None, 1 0 stack14[0][0] __________________________________________________________________________________________________ stack15_add (Add) (None, 1) 0 stack14_add[0][0] stack15[0][1] __________________________________________________________________________________________________ stack16 (NBeatsStack) ((None, 7), (None, 1 0 stack15[0][0] __________________________________________________________________________________________________ stack16_add (Add) (None, 1) 0 stack15_add[0][0] stack16[0][1] __________________________________________________________________________________________________ stack17 (NBeatsStack) ((None, 7), (None, 1 0 stack16[0][0] __________________________________________________________________________________________________ stack17_add (Add) (None, 1) 0 stack16_add[0][0] stack17[0][1] __________________________________________________________________________________________________ stack18 (NBeatsStack) ((None, 7), (None, 1 0 stack17[0][0] __________________________________________________________________________________________________ stack18_add (Add) (None, 1) 0 stack17_add[0][0] stack18[0][1] __________________________________________________________________________________________________ stack19 (NBeatsStack) ((None, 7), (None, 1 0 stack18[0][0] __________________________________________________________________________________________________ stack19_add (Add) (None, 1) 0 stack18_add[0][0] stack19[0][1] __________________________________________________________________________________________________ stack20 (NBeatsStack) ((None, 7), (None, 1 0 stack19[0][0] __________________________________________________________________________________________________ stack20_add (Add) (None, 1) 0 stack19_add[0][0] stack20[0][1] __________________________________________________________________________________________________ stack21 (NBeatsStack) ((None, 7), (None, 1 0 stack20[0][0] __________________________________________________________________________________________________ stack21_add (Add) (None, 1) 0 
stack20_add[0][0] stack21[0][1] __________________________________________________________________________________________________ stack22 (NBeatsStack) ((None, 7), (None, 1 0 stack21[0][0] __________________________________________________________________________________________________ stack22_add (Add) (None, 1) 0 stack21_add[0][0] stack22[0][1] __________________________________________________________________________________________________ stack23 (NBeatsStack) ((None, 7), (None, 1 0 stack22[0][0] __________________________________________________________________________________________________ stack23_add (Add) (None, 1) 0 stack22_add[0][0] stack23[0][1] __________________________________________________________________________________________________ stack24 (NBeatsStack) ((None, 7), (None, 1 0 stack23[0][0] __________________________________________________________________________________________________ stack24_add (Add) (None, 1) 0 stack23_add[0][0] stack24[0][1] __________________________________________________________________________________________________ stack25 (NBeatsStack) ((None, 7), (None, 1 0 stack24[0][0] __________________________________________________________________________________________________ stack25_add (Add) (None, 1) 0 stack24_add[0][0] stack25[0][1] __________________________________________________________________________________________________ stack26 (NBeatsStack) ((None, 7), (None, 1 0 stack25[0][0] __________________________________________________________________________________________________ stack26_add (Add) (None, 1) 0 stack25_add[0][0] stack26[0][1] __________________________________________________________________________________________________ stack27 (NBeatsStack) ((None, 7), (None, 1 0 stack26[0][0] __________________________________________________________________________________________________ stack27_add (Add) (None, 1) 0 stack26_add[0][0] stack27[0][1] __________________________________________________________________________________________________ stack28 (NBeatsStack) ((None, 7), (None, 1 0 stack27[0][0] __________________________________________________________________________________________________ stack28_add (Add) (None, 1) 0 stack27_add[0][0] stack28[0][1] __________________________________________________________________________________________________ stack29 (NBeatsStack) ((None, 7), (None, 1 0 stack28[0][0] __________________________________________________________________________________________________ stack29_add (Add) (None, 1) 0 stack28_add[0][0] stack29[0][1] ================================================================================================== Total params: 0 Trainable params: 0 Non-trainable params: 0 __________________________________________________________________________________________________
Note: the summary above reports 0 total parameters, which suggests that the blocks created inside the custom NBeatsStack layer are not being tracked as weights of the functional model. As a workaround, the model is rebuilt below directly from NBeatsBlock layers.
tf.random.set_seed(42)
# 1. Setup N-BEATS Block layer
nbeats_block_layer = NBeatsBlock(input_size=INPUT_SIZE,
theta_size=THETA_SIZE,
horizon=HORIZON,
n_neurons=N_NEURONS,
n_layers=N_LAYERS,
name="InitialBlock")
# 2. Create input to stacks
stack_input = layers.Input(shape=(INPUT_SIZE), name="stack_input")
# 3. Create initial backcast and forecast input (backwards predictions are referred to as residuals in the paper)
residuals, forecast = nbeats_block_layer(stack_input)
# 4. Create stacks of blocks
for i in range(N_STACKS-1): # the first stack is already created in step (3)
# 5. Use the NBeatsBlock to calculate the backcast as well as block forecast
backcast, block_forecast = NBeatsBlock(
input_size=INPUT_SIZE,
theta_size=THETA_SIZE,
horizon=HORIZON,
n_neurons=N_NEURONS,
n_layers=N_LAYERS,
name=f"NBeatsBlock_{i}"
)(residuals) # pass it in residuals (the backcast)
# 6. Create the double residual stacking
residuals = layers.subtract([residuals, backcast], name=f"subtract_{i}")
forecast = layers.add([forecast, block_forecast], name=f"add_{i}")
# 7. Put the stack model together
model = tf.keras.Model(inputs=stack_input,
outputs=forecast,
name=model_name)
# 8. Compile with MAE loss and Adam optimizer
model.compile(loss="mae",
optimizer=tf.keras.optimizers.Adam(0.001),
metrics=["mae", "mse"])
model.summary()
Model: "NBeatsGeneric_W7H1" __________________________________________________________________________________________________ Layer (type) Output Shape Param # Connected to ================================================================================================== stack_input (InputLayer) [(None, 7)] 0 __________________________________________________________________________________________________ InitialBlock (NBeatsBlock) ((None, 7), (None, 1 796168 stack_input[0][0] __________________________________________________________________________________________________ NBeatsBlock_0 (NBeatsBlock) ((None, 7), (None, 1 796168 InitialBlock[0][0] __________________________________________________________________________________________________ subtract_0 (Subtract) (None, 7) 0 InitialBlock[0][0] NBeatsBlock_0[0][0] __________________________________________________________________________________________________ NBeatsBlock_1 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_0[0][0] __________________________________________________________________________________________________ subtract_1 (Subtract) (None, 7) 0 subtract_0[0][0] NBeatsBlock_1[0][0] __________________________________________________________________________________________________ NBeatsBlock_2 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_1[0][0] __________________________________________________________________________________________________ subtract_2 (Subtract) (None, 7) 0 subtract_1[0][0] NBeatsBlock_2[0][0] __________________________________________________________________________________________________ NBeatsBlock_3 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_2[0][0] __________________________________________________________________________________________________ subtract_3 (Subtract) (None, 7) 0 subtract_2[0][0] NBeatsBlock_3[0][0] __________________________________________________________________________________________________ NBeatsBlock_4 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_3[0][0] __________________________________________________________________________________________________ subtract_4 (Subtract) (None, 7) 0 subtract_3[0][0] NBeatsBlock_4[0][0] __________________________________________________________________________________________________ NBeatsBlock_5 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_4[0][0] __________________________________________________________________________________________________ subtract_5 (Subtract) (None, 7) 0 subtract_4[0][0] NBeatsBlock_5[0][0] __________________________________________________________________________________________________ NBeatsBlock_6 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_5[0][0] __________________________________________________________________________________________________ subtract_6 (Subtract) (None, 7) 0 subtract_5[0][0] NBeatsBlock_6[0][0] __________________________________________________________________________________________________ NBeatsBlock_7 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_6[0][0] __________________________________________________________________________________________________ subtract_7 (Subtract) (None, 7) 0 subtract_6[0][0] NBeatsBlock_7[0][0] __________________________________________________________________________________________________ NBeatsBlock_8 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_7[0][0] __________________________________________________________________________________________________ subtract_8 (Subtract) (None, 7) 0 subtract_7[0][0] 
NBeatsBlock_8[0][0] __________________________________________________________________________________________________ NBeatsBlock_9 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_8[0][0] __________________________________________________________________________________________________ subtract_9 (Subtract) (None, 7) 0 subtract_8[0][0] NBeatsBlock_9[0][0] __________________________________________________________________________________________________ NBeatsBlock_10 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_9[0][0] __________________________________________________________________________________________________ subtract_10 (Subtract) (None, 7) 0 subtract_9[0][0] NBeatsBlock_10[0][0] __________________________________________________________________________________________________ NBeatsBlock_11 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_10[0][0] __________________________________________________________________________________________________ subtract_11 (Subtract) (None, 7) 0 subtract_10[0][0] NBeatsBlock_11[0][0] __________________________________________________________________________________________________ NBeatsBlock_12 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_11[0][0] __________________________________________________________________________________________________ subtract_12 (Subtract) (None, 7) 0 subtract_11[0][0] NBeatsBlock_12[0][0] __________________________________________________________________________________________________ NBeatsBlock_13 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_12[0][0] __________________________________________________________________________________________________ subtract_13 (Subtract) (None, 7) 0 subtract_12[0][0] NBeatsBlock_13[0][0] __________________________________________________________________________________________________ NBeatsBlock_14 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_13[0][0] __________________________________________________________________________________________________ add_0 (Add) (None, 1) 0 InitialBlock[0][1] NBeatsBlock_0[0][1] __________________________________________________________________________________________________ subtract_14 (Subtract) (None, 7) 0 subtract_13[0][0] NBeatsBlock_14[0][0] __________________________________________________________________________________________________ add_1 (Add) (None, 1) 0 add_0[0][0] NBeatsBlock_1[0][1] __________________________________________________________________________________________________ NBeatsBlock_15 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_14[0][0] __________________________________________________________________________________________________ add_2 (Add) (None, 1) 0 add_1[0][0] NBeatsBlock_2[0][1] __________________________________________________________________________________________________ subtract_15 (Subtract) (None, 7) 0 subtract_14[0][0] NBeatsBlock_15[0][0] __________________________________________________________________________________________________ add_3 (Add) (None, 1) 0 add_2[0][0] NBeatsBlock_3[0][1] __________________________________________________________________________________________________ NBeatsBlock_16 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_15[0][0] __________________________________________________________________________________________________ add_4 (Add) (None, 1) 0 add_3[0][0] NBeatsBlock_4[0][1] __________________________________________________________________________________________________ subtract_16 (Subtract) (None, 7) 0 subtract_15[0][0] 
NBeatsBlock_16[0][0] __________________________________________________________________________________________________ add_5 (Add) (None, 1) 0 add_4[0][0] NBeatsBlock_5[0][1] __________________________________________________________________________________________________ NBeatsBlock_17 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_16[0][0] __________________________________________________________________________________________________ add_6 (Add) (None, 1) 0 add_5[0][0] NBeatsBlock_6[0][1] __________________________________________________________________________________________________ subtract_17 (Subtract) (None, 7) 0 subtract_16[0][0] NBeatsBlock_17[0][0] __________________________________________________________________________________________________ add_7 (Add) (None, 1) 0 add_6[0][0] NBeatsBlock_7[0][1] __________________________________________________________________________________________________ NBeatsBlock_18 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_17[0][0] __________________________________________________________________________________________________ add_8 (Add) (None, 1) 0 add_7[0][0] NBeatsBlock_8[0][1] __________________________________________________________________________________________________ subtract_18 (Subtract) (None, 7) 0 subtract_17[0][0] NBeatsBlock_18[0][0] __________________________________________________________________________________________________ add_9 (Add) (None, 1) 0 add_8[0][0] NBeatsBlock_9[0][1] __________________________________________________________________________________________________ NBeatsBlock_19 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_18[0][0] __________________________________________________________________________________________________ add_10 (Add) (None, 1) 0 add_9[0][0] NBeatsBlock_10[0][1] __________________________________________________________________________________________________ subtract_19 (Subtract) (None, 7) 0 subtract_18[0][0] NBeatsBlock_19[0][0] __________________________________________________________________________________________________ add_11 (Add) (None, 1) 0 add_10[0][0] NBeatsBlock_11[0][1] __________________________________________________________________________________________________ NBeatsBlock_20 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_19[0][0] __________________________________________________________________________________________________ add_12 (Add) (None, 1) 0 add_11[0][0] NBeatsBlock_12[0][1] __________________________________________________________________________________________________ subtract_20 (Subtract) (None, 7) 0 subtract_19[0][0] NBeatsBlock_20[0][0] __________________________________________________________________________________________________ add_13 (Add) (None, 1) 0 add_12[0][0] NBeatsBlock_13[0][1] __________________________________________________________________________________________________ NBeatsBlock_21 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_20[0][0] __________________________________________________________________________________________________ add_14 (Add) (None, 1) 0 add_13[0][0] NBeatsBlock_14[0][1] __________________________________________________________________________________________________ subtract_21 (Subtract) (None, 7) 0 subtract_20[0][0] NBeatsBlock_21[0][0] __________________________________________________________________________________________________ add_15 (Add) (None, 1) 0 add_14[0][0] NBeatsBlock_15[0][1] 
__________________________________________________________________________________________________ NBeatsBlock_22 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_21[0][0] __________________________________________________________________________________________________ add_16 (Add) (None, 1) 0 add_15[0][0] NBeatsBlock_16[0][1] __________________________________________________________________________________________________ subtract_22 (Subtract) (None, 7) 0 subtract_21[0][0] NBeatsBlock_22[0][0] __________________________________________________________________________________________________ add_17 (Add) (None, 1) 0 add_16[0][0] NBeatsBlock_17[0][1] __________________________________________________________________________________________________ NBeatsBlock_23 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_22[0][0] __________________________________________________________________________________________________ add_18 (Add) (None, 1) 0 add_17[0][0] NBeatsBlock_18[0][1] __________________________________________________________________________________________________ subtract_23 (Subtract) (None, 7) 0 subtract_22[0][0] NBeatsBlock_23[0][0] __________________________________________________________________________________________________ add_19 (Add) (None, 1) 0 add_18[0][0] NBeatsBlock_19[0][1] __________________________________________________________________________________________________ NBeatsBlock_24 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_23[0][0] __________________________________________________________________________________________________ add_20 (Add) (None, 1) 0 add_19[0][0] NBeatsBlock_20[0][1] __________________________________________________________________________________________________ subtract_24 (Subtract) (None, 7) 0 subtract_23[0][0] NBeatsBlock_24[0][0] __________________________________________________________________________________________________ add_21 (Add) (None, 1) 0 add_20[0][0] NBeatsBlock_21[0][1] __________________________________________________________________________________________________ NBeatsBlock_25 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_24[0][0] __________________________________________________________________________________________________ add_22 (Add) (None, 1) 0 add_21[0][0] NBeatsBlock_22[0][1] __________________________________________________________________________________________________ subtract_25 (Subtract) (None, 7) 0 subtract_24[0][0] NBeatsBlock_25[0][0] __________________________________________________________________________________________________ add_23 (Add) (None, 1) 0 add_22[0][0] NBeatsBlock_23[0][1] __________________________________________________________________________________________________ NBeatsBlock_26 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_25[0][0] __________________________________________________________________________________________________ add_24 (Add) (None, 1) 0 add_23[0][0] NBeatsBlock_24[0][1] __________________________________________________________________________________________________ subtract_26 (Subtract) (None, 7) 0 subtract_25[0][0] NBeatsBlock_26[0][0] __________________________________________________________________________________________________ add_25 (Add) (None, 1) 0 add_24[0][0] NBeatsBlock_25[0][1] __________________________________________________________________________________________________ NBeatsBlock_27 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_26[0][0] 
__________________________________________________________________________________________________ add_26 (Add) (None, 1) 0 add_25[0][0] NBeatsBlock_26[0][1] __________________________________________________________________________________________________ subtract_27 (Subtract) (None, 7) 0 subtract_26[0][0] NBeatsBlock_27[0][0] __________________________________________________________________________________________________ add_27 (Add) (None, 1) 0 add_26[0][0] NBeatsBlock_27[0][1] __________________________________________________________________________________________________ NBeatsBlock_28 (NBeatsBlock) ((None, 7), (None, 1 796168 subtract_27[0][0] __________________________________________________________________________________________________ add_28 (Add) (None, 1) 0 add_27[0][0] NBeatsBlock_28[0][1] ================================================================================================== Total params: 23,885,040 Trainable params: 23,885,040 Non-trainable params: 0 __________________________________________________________________________________________________
Error: training initially raised a ValueError related to a tf.function-decorated function. Solved here: https://stackoverflow.com/questions/58352326/running-the-tensorflow-2-0-code-gives-valueerror-tf-function-decorated-functio. I don't know how, but forcing eager execution (below) solves it.
tf.config.run_functions_eagerly(True)
%%time
history = model.fit(train_tfdata, epochs=N_EPOCHS, validation_data=test_tfdata, verbose=0,
callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=200, restore_best_weights=True),
tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', patience=100, verbose=1)])
Epoch 01020: ReduceLROnPlateau reducing learning rate to 0.00010000000474974513. Epoch 01445: ReduceLROnPlateau reducing learning rate to 1.0000000474974514e-05. Epoch 01545: ReduceLROnPlateau reducing learning rate to 1.0000000656873453e-06. CPU times: user 22min 52s, sys: 16.4 s, total: 23min 8s Wall time: 22min 59s
MODELS[model_name] = model
Learning Curve¶
plot_learning_curve(model, extra_metric='mse', include_validation=True);
Why is there a spike in the learning curve? Hmm...
Way too heavily overfitted and noisy!
Make predictions¶
preds = make_preds(model, test_tfdata)
preds.shape
TensorShape([564])
Evaluate predictions¶
results = evaluate_preds(y_test, preds)
RESULTS[model_name] = results
results
{'mae': array([657.32544], dtype=float32), 'mape': array([0.02764818], dtype=float32), 'mse': array([1393407.6], dtype=float32), 'rmse': array([1180.4269], dtype=float32)}
from tensorflow.keras.utils import plot_model as plot_keras_model
plot_keras_model(model, to_file='/tmp/model.png')
Model 8: Creating an ensemble (stacking different models together)¶
- Ensembles of uncorrelated models (i.e. somewhat different learners), when combined, often yield better results. This is the "wisdom of the crowd" effect.
- Ensembles win Kaggle competitions
In the N-BEATS paper, they trained an ensemble of models (180 in total, see section 3.4) to achieve the results they did using a combination of:
- Different loss functions (sMAPE, MASE and MAPE)
- Different window sizes (2 x horizon, 3 x horizon, 4 x horizon...)
But but but: because these models are so big and prone to overfitting, and because their loss surfaces contain a plethora of local minima, even deep learning models with the same architecture can produce different results when their weights get different random initializations.
- This is a trick used by [Chris Deotte](https://www.kaggle.com/cdeotte): training the same model several times with different random seeds.
We will create our ensemble as follows:
- Using different loss functions (MAE, MSE, MAPE)
- Randomly initializing the models
- Using He normal initialization (draws from a scaled random normal distribution)
base_model = tf.keras.Sequential([
layers.Dense(128, kernel_initializer='he_normal', activation='relu'),
layers.Dense(128, kernel_initializer='he_normal', activation='relu'),
layers.Dense(HORIZON)
], name='base_model')
def get_ensemble_models(base_model,train_data=train_tfdata, test_data=test_tfdata,
n_models=10, n_epochs=100, loss_funcs=['mae', 'mse', 'mape']):
ensemble_models = []
ensemble_predictions = []
ensemble_results = []
y_test = list(map(lambda x: x[1], test_data)) # Get target from PrefetchDataset
for loss in loss_funcs:
for i in range(1, n_models+1):
model = tf.keras.models.clone_model(base_model)
model._name = base_model.name + f'_{loss}{i}'
model.compile(loss=loss, optimizer=tf.keras.optimizers.Adam(),
metrics=['mae', 'mse'])
print(f'Training model with loss: {loss}, variant: {i}', end='\t')
model.fit(train_data, validation_data=test_data, epochs=n_epochs, verbose=0,
callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=200, restore_best_weights=True),
tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', patience=100, verbose=0)])
preds = make_preds(model, test_data)
results = evaluate_preds(tf.squeeze(y_test), tf.squeeze(preds))
s = []
for metric, val in results.items():
s.append(f'{metric} = {float(val):.2f}')
print(', '.join(s))
ensemble_models.append(model)
ensemble_predictions.append(preds)
ensemble_results.append(results)
return ensemble_models, ensemble_predictions, ensemble_results
%%time
ensemble_models, ensemble_predictions, ensemble_results = get_ensemble_models(base_model, n_models=10, n_epochs=1000)
Training model with loss: mae, variant: 1 mae = 670.35, mse = 1435482.25, rmse = 1198.12, mape = 0.03 Training model with loss: mae, variant: 2 mae = 671.43, mse = 1465856.62, rmse = 1210.73, mape = 0.03 Training model with loss: mae, variant: 3 mae = 661.25, mse = 1445027.75, rmse = 1202.09, mape = 0.03 Training model with loss: mae, variant: 4 mae = 702.21, mse = 1536612.00, rmse = 1239.60, mape = 0.03 Training model with loss: mae, variant: 5 mae = 652.07, mse = 1424520.50, rmse = 1193.53, mape = 0.03 Training model with loss: mae, variant: 6 mae = 667.51, mse = 1452609.50, rmse = 1205.24, mape = 0.03 Training model with loss: mae, variant: 7 mae = 662.11, mse = 1429428.12, rmse = 1195.59, mape = 0.03 Training model with loss: mae, variant: 8 mae = 654.03, mse = 1364435.12, rmse = 1168.09, mape = 0.03 Training model with loss: mae, variant: 9 mae = 685.72, mse = 1528113.50, rmse = 1236.17, mape = 0.03 Training model with loss: mae, variant: 10 mae = 661.16, mse = 1421004.50, rmse = 1192.06, mape = 0.03 Training model with loss: mse, variant: 1 mae = 656.91, mse = 1419966.62, rmse = 1191.62, mape = 0.03 Training model with loss: mse, variant: 2 mae = 681.11, mse = 1476751.88, rmse = 1215.22, mape = 0.03 Training model with loss: mse, variant: 3 mae = 657.55, mse = 1386588.25, rmse = 1177.53, mape = 0.03 Training model with loss: mse, variant: 4 mae = 666.25, mse = 1391233.62, rmse = 1179.51, mape = 0.03 Training model with loss: mse, variant: 5 mae = 663.43, mse = 1423982.12, rmse = 1193.31, mape = 0.03 Training model with loss: mse, variant: 6 mae = 663.20, mse = 1425852.50, rmse = 1194.09, mape = 0.03 Training model with loss: mse, variant: 7 mae = 656.18, mse = 1415798.12, rmse = 1189.87, mape = 0.03 Training model with loss: mse, variant: 8 mae = 666.21, mse = 1425088.50, rmse = 1193.77, mape = 0.03 Training model with loss: mse, variant: 9 mae = 654.67, mse = 1381196.88, rmse = 1175.24, mape = 0.03 Training model with loss: mse, variant: 10 mae = 663.06, mse = 1405337.12, rmse = 1185.47, mape = 0.03 Training model with loss: mape, variant: 1 mae = 648.79, mse = 1380018.12, rmse = 1174.74, mape = 0.03 Training model with loss: mape, variant: 2 mae = 647.48, mse = 1388015.75, rmse = 1178.14, mape = 0.03 Training model with loss: mape, variant: 3 mae = 649.64, mse = 1385607.00, rmse = 1177.12, mape = 0.03 Training model with loss: mape, variant: 4 mae = 651.81, mse = 1373586.50, rmse = 1172.00, mape = 0.03 Training model with loss: mape, variant: 5 mae = 642.23, mse = 1359732.12, rmse = 1166.08, mape = 0.03 Training model with loss: mape, variant: 6 mae = 654.79, mse = 1384865.75, rmse = 1176.80, mape = 0.03 Training model with loss: mape, variant: 7 mae = 647.65, mse = 1361251.38, rmse = 1166.73, mape = 0.03 Training model with loss: mape, variant: 8 mae = 652.77, mse = 1394521.25, rmse = 1180.90, mape = 0.03 Training model with loss: mape, variant: 9 mae = 653.88, mse = 1381597.00, rmse = 1175.41, mape = 0.03 Training model with loss: mape, variant: 10 mae = 643.60, mse = 1344850.25, rmse = 1159.68, mape = 0.03 CPU times: user 29min 4s, sys: 1min 28s, total: 30min 33s Wall time: 28min 47s
Ensemble mean¶
ensemble_mean_results = evaluate_preds(y_test, np.mean(ensemble_predictions, axis=0))
ensemble_mean_results
{'mae': array([644.93396], dtype=float32), 'mape': array([0.02686146], dtype=float32), 'mse': array([1369131.6], dtype=float32), 'rmse': array([1170.099], dtype=float32)}
Ensemble median¶
ensemble_median_results = evaluate_preds(y_test, np.median(ensemble_predictions, axis=0))
ensemble_median_results
{'mae': array([643.4394], dtype=float32), 'mape': array([0.02665836], dtype=float32), 'mse': array([1370662.8], dtype=float32), 'rmse': array([1170.753], dtype=float32)}
RESULTS['ensemble-mean_W7H1'] = ensemble_mean_results
RESULTS['ensemble-median_W7H1'] = ensemble_median_results
Plotting the prediction interval (quantifying ensemble uncertainty)¶
- We are assuming the distribution of the ensemble predictions to be Gaussian. This may or may not be true (likely not true).
import scipy.stats
# z-critical value for a 95% two-sided confidence interval
scipy.stats.norm.ppf(0.975)
1.959963984540054
def get_confidence_interval(preds, conf=0.95):
    # Two-sided z-critical value for the desired confidence level (1.96 for 95%)
    z_critical = scipy.stats.norm.ppf(conf + (1-conf)/2)
    # Spread of the ensemble members' predictions at each timestep
    std = tf.math.reduce_std(preds, axis=0)
    interval = z_critical*std
    # Centre the interval on the ensemble mean prediction
    preds_mean = tf.reduce_mean(preds, axis=0)
    lower, upper = preds_mean - interval, preds_mean + interval
    return preds_mean, lower, upper
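Since the Gaussian assumption may not hold, a minimal alternative sketch (not used further in this notebook) is to take empirical quantiles of the ensemble predictions directly (np is numpy, imported earlier):
def get_empirical_interval(preds, conf=0.95):
    # preds: array-like of shape (n_models, n_timesteps) holding the ensemble predictions
    lower_pct = (1 - conf) / 2 * 100   # e.g. 2.5 for a 95% interval
    upper_pct = (1 + conf) / 2 * 100   # e.g. 97.5 for a 95% interval
    lower = np.percentile(preds, lower_pct, axis=0)
    upper = np.percentile(preds, upper_pct, axis=0)
    preds_median = np.median(preds, axis=0)
    return preds_median, lower, upper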
preds_mean, lower, upper = get_confidence_interval(ensemble_predictions)
X_test.index
DatetimeIndex(['2019-12-09', '2019-12-10', '2019-12-11', '2019-12-12', '2019-12-13', '2019-12-14', '2019-12-15', '2019-12-16', '2019-12-17', '2019-12-18', ... '2021-06-15', '2021-06-16', '2021-06-17', '2021-06-18', '2021-06-19', '2021-06-20', '2021-06-21', '2021-06-22', '2021-06-23', '2021-06-24'], dtype='datetime64[ns]', name='date', length=564, freq=None)
plt.figure(figsize=(12, 7))
offset = 500
plt.plot(X_test.index[offset:], y_test[offset:], 'g', label='test data')
plt.plot(X_test.index[offset:], preds_mean[offset:], 'k-', label='ensemble')
plt.fill_between(X_test.index[offset:],
lower[offset:],
upper[offset:], label='prediction interval')
plt.legend(fontsize=20)
plt.xlabel('Date')
plt.ylabel('BTC Price');
Oddly, it seems like the model is lagging a bit in its predictions. The Daniel Bourke notebook mentions this may be due to overfitting, but why would overfitting cause a lag in the predictions? One possible explanation: on a highly autocorrelated series, a model can drive its loss down by essentially reproducing the most recent observed value (a persistence forecast), and such predictions look like the true series shifted one step to the right.
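As a quick check of this idea (a minimal sketch using the y_test Series and ensemble_mean_results from above), we can compare the ensemble against a naive persistence baseline that simply predicts yesterday's price; if the two MAEs are close, the model has largely learned a lagged copy of the series:
naive_preds = y_test.shift(1).dropna()
naive_mae = np.mean(np.abs(y_test[1:].values - naive_preds.values))
print(f'Naive persistence MAE: {naive_mae:.2f}')
print(f"Ensemble mean MAE: {float(ensemble_mean_results['mae']):.2f}")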
Model 9: Train a Dense model on the whole historical data to predict Bitcoin prices¶
After we make our artificial splits and find the best model, the final step is always to retrain the model on all of the available data.
Note: Forecasting models need to be retrained every time new data arrives (either through full retraining or online learning to adjust the weights). But care must always be taken not to spoil the model's weights just because of one highly unusual observation (i.e. an anomaly), as sketched below.
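As an aside, a minimal sketch of what such a cautious online update might look like (new_window and new_target are hypothetical names for the latest windowed observation and its label; illustrative only, not used elsewhere in this notebook):
def online_update(model, new_window, new_target, lr=1e-4):
    # Shrink the learning rate so one anomalous observation cannot
    # drag the already-trained weights too far from their current values
    tf.keras.backend.set_value(model.optimizer.lr, lr)
    # Take a single small gradient step on the newest (window, target) pair
    model.fit(np.expand_dims(new_window, axis=0),
              np.expand_dims(new_target, axis=0),
              epochs=1, verbose=0)
    return model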
bitcoin_prices_windowed.head()
date | price | block_reward | price_lag1 | price_lag2 | price_lag3 | price_lag4 | price_lag5 | price_lag6 | price_lag7
---|---|---|---|---|---|---|---|---|---
2013-10-01 | 123.655 | 25.000 | nan | nan | nan | nan | nan | nan | nan
2013-10-02 | 125.455 | 25.000 | 123.655 | nan | nan | nan | nan | nan | nan
2013-10-03 | 108.585 | 25.000 | 125.455 | 123.655 | nan | nan | nan | nan | nan
2013-10-04 | 118.675 | 25.000 | 108.585 | 125.455 | 123.655 | nan | nan | nan | nan
2013-10-05 | 121.339 | 25.000 | 118.675 | 108.585 | 125.455 | 123.655 | nan | nan | nan
X_all = bitcoin_prices_windowed.drop(['price', 'block_reward'], axis=1).dropna().to_numpy()
y_all = bitcoin_prices_windowed.dropna()['price'].to_numpy() # dropna keeps y_all aligned with X_all
Great, we have the $X$ and the $y$ now. Next we will prepare a dataset using the tf.data.Dataset API.
# Make feature and label tfdata
features_tfdata = tf.data.Dataset.from_tensor_slices(X_all)
labels_tfdata = tf.data.Dataset.from_tensor_slices(y_all)
# Zip them together
dataset_all = tf.data.Dataset.zip((features_tfdata, labels_tfdata))
# Batch and prefetch for optimal performance
BATCH_SIZE = 1024 # See Appendix D of N-Beats paper
dataset_all = dataset_all.batch(BATCH_SIZE).prefetch(buffer_size=tf.data.AUTOTUNE)
dataset_all
<PrefetchDataset shapes: ((None, 7), (None,)), types: (tf.float64, tf.float64)>
model_name = 'dense-all-data_W7H1'
WINDOW = 7
HORIZON = 1
tf.random.set_seed(42)
model = tf.keras.Sequential([
layers.Dense(128, activation='relu', input_shape=(WINDOW,)),
layers.Dense(128, activation='relu'),
layers.Dense(HORIZON)
], name=model_name)
model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
model.summary()
Model: "dense-all-data_W7H1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_291704 (Dense) (None, 128) 1024 _________________________________________________________________ dense_291705 (Dense) (None, 128) 16512 _________________________________________________________________ dense_291706 (Dense) (None, 1) 129 ================================================================= Total params: 17,665 Trainable params: 17,665 Non-trainable params: 0 _________________________________________________________________
history = model.fit(dataset_all, epochs=100, verbose=0)
Learning Curve¶
plot_learning_curve(history, extra_metric='mse');
Predicting into the future¶
We will predict 14 days ahead into the future with our model with WINDOW=7 and HORIZON=1.
N_AHEAD = 14
def forecast_ahead_recursively(input_values, n_ahead, window_size, model):
forecast = []
last_window = input_values[-window_size:]
for _ in range(n_ahead):
pred = model.predict(tf.expand_dims(last_window, axis=0))
forecast.append(tf.squeeze(pred).numpy())
print(f'Predicting on window: {last_window} -> {tf.squeeze(pred).numpy()}\n')
last_window = np.append(last_window, pred)[-window_size:]
return forecast
forecasts = forecast_ahead_recursively(y_all, n_ahead=N_AHEAD, window_size=7, model=model)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py:3704: UserWarning: Even though the `tf.config.experimental_run_functions_eagerly` option is set, this option does not apply to tf.data functions. To force eager execution of tf.data functions, please use `tf.data.experimental.enable.debug_mode()`. "Even though the `tf.config.experimental_run_functions_eagerly` "
Predicting on window: [37722.02018463 35520.45103404 35656.30473708 35681.13395636 31659.54172976 32404.33035622 33532.25784581] -> 32636.619140625 Predicting on window: [35520.45103404 35656.30473708 35681.13395636 31659.54172976 32404.33035622 33532.25784581 32636.61914062] -> 32201.40625 Predicting on window: [35656.30473708 35681.13395636 31659.54172976 32404.33035622 33532.25784581 32636.61914062 32201.40625 ] -> 32116.45703125 Predicting on window: [35681.13395636 31659.54172976 32404.33035622 33532.25784581 32636.61914062 32201.40625 32116.45703125] -> 32228.9921875 Predicting on window: [31659.54172976 32404.33035622 33532.25784581 32636.61914062 32201.40625 32116.45703125 32228.9921875 ] -> 31990.830078125 Predicting on window: [32404.33035622 33532.25784581 32636.61914062 32201.40625 32116.45703125 32228.9921875 31990.83007812] -> 31747.494140625 Predicting on window: [33532.25784581 32636.61914062 32201.40625 32116.45703125 32228.9921875 31990.83007812 31747.49414062] -> 31682.65234375 Predicting on window: [32636.61914062 32201.40625 32116.45703125 32228.9921875 31990.83007812 31747.49414062 31682.65234375] -> 31590.365234375 Predicting on window: [32201.40625 32116.45703125 32228.9921875 31990.83007812 31747.49414062 31682.65234375 31590.36523438] -> 31450.38671875 Predicting on window: [32116.45703125 32228.9921875 31990.83007812 31747.49414062 31682.65234375 31590.36523438 31450.38671875] -> 31308.4765625 Predicting on window: [32228.9921875 31990.83007812 31747.49414062 31682.65234375 31590.36523438 31450.38671875 31308.4765625 ] -> 31202.412109375 Predicting on window: [31990.83007812 31747.49414062 31682.65234375 31590.36523438 31450.38671875 31308.4765625 31202.41210938] -> 31093.3046875 Predicting on window: [31747.49414062 31682.65234375 31590.36523438 31450.38671875 31308.4765625 31202.41210938 31093.3046875 ] -> 30968.29296875 Predicting on window: [31682.65234375 31590.36523438 31450.38671875 31308.4765625 31202.41210938 31093.3046875 30968.29296875] -> 30848.626953125
Plot future forecasts¶
def get_future_dates(start_date, n_ahead, offset=1):
start_date = start_date + np.timedelta64(offset, 'D')
end_date = start_date + np.timedelta64(n_ahead, 'D')
return np.arange(start_date, end_date, dtype='datetime64[D]')
last_timestep = bitcoin_prices.index[-1]
last_timestep
Timestamp('2021-06-24 00:00:00')
forecast_timesteps = get_future_dates(start_date=last_timestep, n_ahead=N_AHEAD)
forecast_timesteps
array(['2021-06-25', '2021-06-26', '2021-06-27', '2021-06-28', '2021-06-29', '2021-06-30', '2021-07-01', '2021-07-02', '2021-07-03', '2021-07-04', '2021-07-05', '2021-07-06', '2021-07-07', '2021-07-08'], dtype='datetime64[D]')
Incorporate the last bitcoin price into the forecast so that the plot doesn't break at the transition
forecast_timesteps = np.insert(forecast_timesteps, 0, last_timestep)
forecasts = np.insert(forecasts, 0, bitcoin_prices['price'][-1])
plt.figure(figsize=(12, 7))
plot_bitcoin(bitcoin_prices.index, bitcoin_prices['price'], start=2700, linetype='-', label='BTC Price (historical)')
plot_bitcoin(forecast_timesteps, forecasts, linetype='-', label='BTC Price (forecasted)');
plt.xticks(rotation=45);
Building a turkey model and why forecasting in an open system is BS¶
Introducing a turkey problem into our BTC data (the BTC price falls 100x in one day)
btc_price_turkey = bitcoin_prices['price'].copy()
btc_price_turkey[-1] = btc_price_turkey[-1] / 100
# Get the timesteps for the turkey problem
btc_timesteps_turkey = np.array(bitcoin_prices.index)
btc_timesteps_turkey[-10:]
array(['2021-06-15T00:00:00.000000000', '2021-06-16T00:00:00.000000000', '2021-06-17T00:00:00.000000000', '2021-06-18T00:00:00.000000000', '2021-06-19T00:00:00.000000000', '2021-06-20T00:00:00.000000000', '2021-06-21T00:00:00.000000000', '2021-06-22T00:00:00.000000000', '2021-06-23T00:00:00.000000000', '2021-06-24T00:00:00.000000000'], dtype='datetime64[ns]')
Let's see how it looks
plt.figure(figsize=(10, 7))
plot_bitcoin(tsteps=btc_timesteps_turkey,
prices=btc_price_turkey,
linetype="-",
label="BTC Price + Turkey Problem",
start=2500)
Create train and test sets¶
WINDOW = 7
HORIZON = 1
# Create train and test sets for turkey problem data
full_windows, full_labels = make_window_horizon_pairs(np.array(btc_price_turkey), window=WINDOW, horizon=HORIZON)
len(full_windows), len(full_labels)
X_train, y_train, X_test, y_test = make_train_test_splits(full_windows, full_labels)
len(X_train), len(X_test)
(2252, 564)
model_name = 'turkey-model_W7H1'
turkey_model = tf.keras.models.clone_model(MODELS['simple-dense_W7H1'])
turkey_model._name = model_name
turkey_model.compile(loss='mae', optimizer=tf.keras.optimizers.Adam(), metrics=['mse'])
turkey_model.fit(X_train, y_train, epochs=100, verbose=0, validation_data=(X_test, y_test), callbacks=[create_model_checkpoint(model_name)])
/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py:3704: UserWarning: Even though the `tf.config.experimental_run_functions_eagerly` option is set, this option does not apply to tf.data functions. To force eager execution of tf.data functions, please use `tf.data.experimental.enable.debug_mode()`. "Even though the `tf.config.experimental_run_functions_eagerly` "
INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets INFO:tensorflow:Assets written to: ../checkpoints/bitcoin_time_series_prediction/turkey-model_W7H1/assets
<tensorflow.python.keras.callbacks.History at 0x7f7febe11990>
Make predictions¶
preds = make_preds(turkey_model, X_test)
preds.shape
/usr/local/lib/python3.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py:3704: UserWarning: Even though the `tf.config.experimental_run_functions_eagerly` option is set, this option does not apply to tf.data functions. To force eager execution of tf.data functions, please use `tf.data.experimental.enable.debug_mode()`. "Even though the `tf.config.experimental_run_functions_eagerly` "
TensorShape([564])
Evaluate predictions¶
results = evaluate_preds(tf.squeeze(y_test), tf.squeeze(preds))
RESULTS[model_name] = results
results
{'mae': array([780.02155], dtype=float32), 'mape': array([0.19891468], dtype=float32), 'mse': array([3393848.], dtype=float32), 'rmse': array([1842.24], dtype=float32)}
plt.figure(figsize=(12, 7))
offset=300
plot_bitcoin(tsteps=btc_timesteps_turkey[-len(X_test):],
prices=btc_price_turkey[-len(y_test):],
linetype="-",
label="test data", start=offset)
plot_bitcoin(tsteps=btc_timesteps_turkey[-len(X_test):],
prices=preds,
label="prediction",
start=offset);
plt.title('Turkey Problem - Forecasting', fontdict=dict(weight='bold', size=20));
Beautiful image from mrdbourke illustrating the concept of the Turkey problem | Black Swan Event¶
Compare model predictions¶
result_df = pd.DataFrame(RESULTS)
result_df.applymap(_get_day1_pred).loc[['mae']].plot(figsize=(18, 7), kind='bar');
plt.title('Bitcoin time series forecasting', fontdict=dict(size=20, weight='bold'));