Using artificial intelligence and statistical methods to predict the future is very helpful, but the results are not always easy to understand. The way a model arrives at a prediction can be difficult for humans to comprehend. It can therefore be hard to trust a model to make the right decisions, especially if the model is used for operating the electrical grid.
An inevitable fact about all models is that they are always uncertain, to some extent. Prediction intervals answer the important question:
How certain or uncertain is the model?
Imagine a model trying to predict the electricity consumption for some geographical region. Here is an example table of predictions, using made-up data.
| Datetime | Model predictions | Observations |
| --- | --- | --- |
| 06.07.2024 00:00 | 4869 | 5023 |
| 06.07.2024 00:15 | 4956 | 4938 |
| 06.07.2024 00:30 | 5001 | 5049 |
| 06.07.2024 00:45 | 5016 | 5056 |
| ….. | ….. | ….. |
One way of evaluating how uncertain the model is, is to plot the prediction errors (the predictions minus the observations) in a histogram.
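As a minimal sketch, assuming predictions and observations are NumPy arrays of equal length, such a histogram could be drawn with matplotlib:

import matplotlib.pyplot as plt

# Assumed inputs: predictions and observations as equal-length arrays.
prediction_errors = predictions - observations

# Plot the distribution of the prediction errors.
plt.hist(prediction_errors, bins=50)
plt.xlabel("Prediction error")
plt.ylabel("Count")
plt.show()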
[Figure 1: Histogram of prediction errors, centred at 0]
A histogram like this can be very informative and tell us a lot about the model. By the looks of it, the prediction errors in this case are normally distributed, with a mean of 0. If the histogram instead looked like this
[Figure 2: Histogram of prediction errors, centred at 30]
we would know that the model’s predictions are generally too high, because the histogram is centred at 30. Or, if the shape were nothing like a normal distribution, it could indicate that the model was missing some information.
Assuming that our prediction errors are normally distributed, as in Figure 1, we can calculate prediction intervals for the model predictions. Depending on the use case, we can use either one-sided or two-sided intervals. For a one-sided interval, we call the values percentiles. For example, for the 95th percentile, it is expected that 95 % of the observations will fall below that value, and the remaining 5 % above it.
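As a minimal sketch of the difference between the two, assuming the errors are normally distributed around 0 with a known standard deviation:

from scipy.stats import norm

sigma = 20  # assumed standard deviation of the prediction errors

# One-sided: 95 % of observations are expected to fall below
# prediction + upper_margin.
upper_margin = norm(0, sigma).ppf(0.95)

# Two-sided 90 % interval: 5 % expected below the lower bound
# and 5 % above the upper bound.
lower_margin, upper_margin = norm(0, sigma).ppf([0.05, 0.95])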
Calculating percentiles
To calculate the percentiles, we first compute the margin of error and then add it to the predictions.
import numpy as np
import pandas as pd
from scipy.stats import norm
from sklearn import linear_model

# Make mock data and a simple linear regression model that uses the
# last timestep to predict the next.
observations = pd.DataFrame(np.sin(np.linspace(0, 100, 3000)) + np.random.normal(
    loc=0, scale=np.random.uniform(low=1, high=5), size=3000))

model = linear_model.LinearRegression()
model.fit(X=observations[:1999], y=observations[1:2000])

# Predict one step ahead on the held-out data, aligning each prediction
# with the observation it targets.
predictions = model.predict(observations[2000:-1])
observations = observations[2001:]

# Calculate the margin of error for the 5th percentile.
q = 0.05
prediction_errors = predictions - observations.to_numpy()
std_errors = prediction_errors.std()
margin_of_error = norm(0, std_errors).ppf(q)
In this particular case, if we assume the standard deviation is 20, the margin of error for the 5th percentile is -32. The 95th percentile can be calculated with the same method, and because the normal distribution is symmetrical, its margin of error is 32. Adding these margins to the predictions results in the following table, where the 95th and 5th percentiles are denoted by P95 and P05, respectively (a code sketch for building such a table follows it).
| Datetime | Percentile | Prediction |
| --- | --- | --- |
| 06.07.2024 00:00 | P05 | 4837 |
| 06.07.2024 00:00 | P95 | 4901 |
| 06.07.2024 00:00 | | 4869 |
| 06.07.2024 00:15 | P05 | 4924 |
| 06.07.2024 00:15 | P95 | 4988 |
| 06.07.2024 00:15 | | 4956 |
| …. | …… | …… |
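Reusing predictions and std_errors from the code above, such a table could be assembled with pandas roughly like this (the column names are illustrative):

# Margins of error for the 5th and 95th percentiles.
p05_margin, p95_margin = norm(0, std_errors).ppf([0.05, 0.95])

# Point predictions together with the P05 and P95 percentiles.
percentile_table = pd.DataFrame({
    "P05": predictions.ravel() + p05_margin,
    "Prediction": predictions.ravel(),
    "P95": predictions.ravel() + p95_margin,
}, index=observations.index)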
Assuming a t-distribution instead
In general, we expect prediction errors to be normally distributed. However, sometimes that assumption does not hold, as was the case in one of our projects at Statnett.
We were given the task of evaluating and potentially improving the existing way of calculating prediction intervals, which was essentially the process described above. After some analysis, it seemed that assuming the prediction errors instead follow a t-distribution could yield better results.
In short, the t-distribution is more flexible than the normal distribution and tends towards the normal distribution as the degrees of freedom tend towards infinity. Here is an example of how the data seemed to fit the (scaled) t-distribution better than the normal distribution, especially in the tails and the peak.
[Figure 3: Prediction errors with fitted normal and scaled t-distribution curves]
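The convergence towards the normal distribution is easy to check numerically; a quick sketch with SciPy:

import numpy as np
from scipy.stats import norm, t

x = np.linspace(-5, 5, 101)

# With few degrees of freedom the t-distribution has heavier tails than
# the normal distribution; with many, the two are nearly indistinguishable.
print(np.abs(t.pdf(x, df=3) - norm.pdf(x)).max())     # noticeable difference
print(np.abs(t.pdf(x, df=1000) - norm.pdf(x)).max())  # close to zero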
One way of fitting the prediction errors to the scaled and shifted t-distribution is to fit the probability density function (pdf) to a histogram of the errors. In this project, we instead fitted the cumulative distribution function (cdf) curve to the errors. The reason for using curve_fit with the cdf is that histograms introduce an extra parameter (the bin size) and are more sensitive to outliers.
A simplified version of the implemented code is:

import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t

def fit_t_dist_cdf_curve(difference):
    # Fit the cdf of a shifted and scaled t-distribution to the
    # empirical cdf of the sorted prediction errors.
    difference = np.sort(difference)
    y_data = np.linspace(0.0, 1.0, len(difference))
    p_optimal, *_ = curve_fit(t.cdf, xdata=difference, ydata=y_data,
                              p0=[1, 1, 1])
    degrees_of_freedom, loc_value, scale_value = p_optimal
    return degrees_of_freedom, loc_value, scale_value

q = 0.05
prediction_errors = (predictions - observations.to_numpy()).ravel()
parameters = fit_t_dist_cdf_curve(prediction_errors)
margin_of_error = t.ppf(q, *parameters)

Note that the data is fitted to a t-distribution cdf with parameters df (degrees of freedom), loc (shifting) and scale (scaling). However, it is not a noncentral t-distribution being used, but (the cdf of) a scaled and shifted version of the standard t-distribution. The documentation states: "Specifically, t.pdf(x, df, loc, scale) is identically equivalent to t.pdf(y, df) / scale with y = (x - loc) / scale".
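That equivalence is easy to verify numerically; a small sketch with arbitrary parameter values:

from scipy.stats import t

x, df, loc, scale = 1.5, 4, 0.2, 1.3
y = (x - loc) / scale

# The shifted and scaled pdf equals the standard pdf evaluated at y,
# divided by scale.
assert abs(t.pdf(x, df, loc, scale) - t.pdf(y, df) / scale) < 1e-12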
Evaluating the prediction intervals
In our project it was necessary to evaluate the different prediction intervals in order to see which one performed the best. We chose to do so using heatmaps.
The model we chose for comparing the prediction intervals is a short-term consumption model that predicts the electrical consumption in the five bid zones in Norway, NO1-NO5. For each bid zone, the model produces a forecast every five minutes, with 24 timesteps of five minutes each, giving a horizon of two hours. This is displayed in a table with 24 columns, where each column represents a five-minute timestep and every row is a new forecast (a sketch of this layout follows the table).
| Datetime | 1 | 2 | 3 | … | 24 |
| --- | --- | --- | --- | --- | --- |
| 06.07.2024 00:00 | 4869 | 4957 | 4935 | … | 5370 |
| 06.07.2024 00:15 | 4956 | 4938 | 4990 | … | 5377 |
| 06.07.2024 00:30 | 5001 | 5049 | 5034 | … | 5236 |
| 06.07.2024 00:45 | 5016 | 5056 | 5092 | … | 5045 |
| ….. | ….. | ….. | ….. | … | ….. |
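A sketch of that layout with pandas, where forecast_values and issue_times are illustrative placeholders for the forecast data and its issue timestamps:

import pandas as pd

# One row per forecast issue time, columns 1..24 for the horizon steps.
forecasts = pd.DataFrame(
    forecast_values,    # assumed shape: (number_of_forecasts, 24)
    index=issue_times,  # assumed forecast issue timestamps
    columns=range(1, 25),
)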
Adding in the prediction intervals gives the following table for one bid zone.
| Datetime | Percentile | 1 | 2 | 3 | … | 24 |
| --- | --- | --- | --- | --- | --- | --- |
| 06.07.2024 00:00 | P05 | 4837 | 4922 | 4892 | … | 5304 |
| 06.07.2024 00:00 | P25 | 4855 | 4940 | 4914 | … | 5330 |
| 06.07.2024 00:00 | P75 | 4883 | 4974 | 4956 | … | 5410 |
| 06.07.2024 00:00 | P95 | 4901 | 4992 | 4978 | … | 5436 |
| 06.07.2024 00:00 | | 4869 | 4957 | 4935 | … | 5370 |
| 06.07.2024 00:15 | P05 | 4924 | 4903 | 4947 | … | 5311 |
| 06.07.2024 00:15 | P25 | 4942 | 4923 | 4969 | … | 5337 |
| 06.07.2024 00:15 | P75 | 4970 | 4952 | 5011 | … | 5317 |
| 06.07.2024 00:15 | P95 | 4988 | 4973 | 5033 | … | 5443 |
| 06.07.2024 00:15 | | 4956 | 4938 | 4990 | … | 5377 |
| …. | ….. | ….. | ….. | ….. | … | ….. |
Using these tables and a given test period, we counted the fraction of observations that fell below each percentile prediction (a sketch of this counting follows the table below). That resulted in a new table where, for all rows with P05, the target value is 0.05, for all rows with P25 the target value is 0.25, and so on.
| Bid zone | Percentile | 1 | 2 | 3 | … | 24 |
| --- | --- | --- | --- | --- | --- | --- |
| NO1 | P05 | 0.057 | 0.058 | 0.059 | … | 0.075 |
| NO1 | P25 | 0.225 | 0.224 | 0.226 | … | 0.207 |
| NO1 | P75 | 0.777 | 0.781 | 0.782 | … | 0.817 |
| NO1 | P95 | 0.959 | 0.961 | 0.962 | … | 0.985 |
| NO2 | P05 | 0.041 | 0.039 | 0.038 | … | 0.046 |
| NO2 | P25 | 0.219 | 0.219 | 0.226 | … | 0.229 |
| …. | ….. | ….. | ….. | ….. | … | ….. |
| NO5 | P95 | 0.937 | 0.937 | 0.938 | … | 0.935 |
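As a sketch of the counting, assuming observed and p05_forecasts are aligned arrays with one row per forecast and one column per timestep (both names are illustrative):

import numpy as np

# Fraction of observations falling below the P05 forecast, per timestep.
# Well-calibrated intervals should give values close to the 0.05 target.
p05_score = (observed < p05_forecasts).mean(axis=0)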
Using the known target values, we made a heatmap of the absolute difference between the percentile scores and the target values. Therefore, the lower and bluer the value, the better the result. Figure 4 is the heatmap for percentiles calculated assuming normally distributed prediction errors, and Figure 5 is the heatmap assuming a t-distribution.
[Figure 4: Heatmap of absolute deviations from the targets, normal distribution]
[Figure 5: Heatmap of absolute deviations from the targets, t-distribution]
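A sketch of how such a heatmap could be produced with seaborn, assuming percentile_scores is a DataFrame like the table above and targets holds the matching target value for each row (both names are illustrative):

import seaborn as sns

# Absolute deviation from the target per percentile and timestep;
# lower values mean better-calibrated prediction intervals.
deviations = percentile_scores.sub(targets, axis=0).abs()
sns.heatmap(deviations, cmap="coolwarm")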
Visually, it is clear that for this given model and test period, the method using the shifted and scaled t-distribution yielded the best results.
Summary
- Prediction intervals are important because they indicate how certain or uncertain a model is.
- Depending on your assumptions, there are several ways to calculate prediction intervals.
- The t-distribution is more flexible than the normal distribution and gave more precise prediction intervals in our specific case.