The green transition has overwhelmed the power grid connection process for new customers. Swift and cost-effective connection relies on the expertise of power system analysts. To achieve this, Statnett believes we must enable analysts to adopt a data-driven approach akin to that of data scientists.
In 2019, Statnett therefore established the FRIDA program with a cross-disciplinary product team whose objective was to empower analysts by making it much easier to find and use power system data for analysis. In this post we’ll present how the team accomplished this task. The first part focuses on the current industry challenge faced by Statnett and network operators around the world. The second part presents the tools and processes that were created to tackle it.
Overwhelming demand for grid capacity poses challenges for the power system
Statnett holds three key roles in the power system: grid owner, system operator, and transmission system network planner. We currently face challenges in:
Grid Connection Requests: An increasing number of requests from power producers and industrial projects, such as battery factories and data centers, demand timely responses, proper guidance for optimal project planning, and accelerated grid connection processes.
High Power System Utilization: During certain times of the year or when components are disconnected, the power system operates at maximum capacity. Consequently, optimizing incident management, maintenance planning, and disconnections is becoming increasingly vital.
Overall, Statnett’s decisions and services significantly impact both our customers’ value creation and the security of supply. Even minor improvements can yield considerable value.
Power system analysts require access to vast amounts of data
To achieve these improvements, Statnett’s power system engineers need easy access to reliable information from various areas of the organization. They analyze vast quantities of data, including:
Graph model: The Common Information Model (CIM) is a comprehensive graph model that details all relevant components within the power system. It outlines connections, electrical attributes, and metadata, such as ownership. However, its complexity, consisting of millions of triplets and hundreds of data classes, requires a level of IT knowledge that most analysts don’t have, which prevents efficient data exploration.
Time series: Long-term data, especially extreme values, is crucial to identify stress scenarios in the system. Given the significant influence of weather on the Norwegian power system, analysts typically require a decade’s worth of data to build confidence. For most equipment Statnett receives measurements from sensors and estimates from a network solver about every ten seconds.
Events: Information on faults, maintenance, and special regulations affecting system utilization is vital, along with other market data and the cost of non-delivered energy that influences bottleneck-related costs.
Traditionally, this information was only accessible through source systems or data warehouse reports. Analysts spent 30-50% of their time locating, extracting, and ensuring the quality of data. To get customers connected, analysts should spend their time finding solutions, not data.
The biggest challenge is just the sheer volume of projects. There are only so many power engineers out there who can do the sophisticated studies we need to do to ensure the system stays reliable, and everyone else is trying to hire them, too.
The Power SDK makes power system data instantly available in Python
As a team, we worked closely with the power system engineers to identify the areas and user stories that would be most useful to focus on. We mostly focused on creating a web app to meet their needs for finding components and studying observations related to them. This is further explained in the video below (in Norwegian).
In the remainder of this post, we will focus on what we did to meet the power system engineers’ wish to have power system data instantly available in Python. The Statnett Power SDK is a Python package built on top of Statnett’s data platform that makes it easier to retrieve data from multiple sources and analyze this data to make better decisions.
The Power SDK was initially developed in cooperation with Cognite. Cognite Data Fusion is part of Statnett’s data platform and the primary source of information in the Power SDK. You may find the original source code here. The Power SDK has since been further developed in Statnett. It aims to address three common issues that made the initial user threshold too high:
End users needed to connect to different databases and understand the data structure and format, which typically differed from database to database.
Related data could be identified through code, but this required knowledge of graph queries and graph traversals, which most analysts lacked. For instance, finding all components that belong to the same price area.
The standard SDK provided by the platform used unfamiliar terms and lacked functions for typical use cases within the power analysis domain.
Below we have provided examples of how the Power SDK addresses these issues. Let’s say that an analyst needs to know the power produced in price area NO5. With the Power SDK, she can query in terms that match how she mentally navigates the power system.
To get all substations in a price area you may write:
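(The snippet below is a sketch; the method and argument names are illustrative and may not match the verbatim Power SDK API.)
Python
# Illustrative: list all substations within price area NO5
substations = client.substations.list(price_area="NO5")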
You may also specify the substations you want by name, voltage level, grid type and whether to include historic substations as well. You may even define areas based on the power lines entering the area. The area object will contain all the substations and power lines within the area.
From the area or list of substations you may find all connected components, such as generating units or power transformers:
Python
generating_units = substations.generating_units()
From these components you may retrieve all related time series. Here, I retrieve all time series of ThreePhaseActivePower for the generators:
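(The call below is a sketch; the exact Power SDK signature may differ.)
Python
# Illustrative: retrieve ThreePhaseActivePower time series for the generating units
active_power_ts = generating_units.time_series(measurement_type="ThreePhaseActivePower")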
Once you have the time series or events, a user with basic Python skills can easily do powerful analysis. The example below shows how to find the total production in a power area over time. The area object referred to in the example could be defined by (using fictitious substation names):
Python
area = client.power_area(substations=['Substation1','Substation2']).expand_area(level=2)
or by specifying which power lines define the electric area.
The code below finds all relevant time series associated with the generators in the power area defined earlier, sums them, and plots hourly values over the past year.
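(A sketch of how this could look; the retrieval calls mirror the underlying Cognite SDK and the exact Power SDK method names may differ.)
Python
# Illustrative sketch: total hourly production in the area over the past year
generators = area.generating_units()
ts = generators.time_series(measurement_type="ThreePhaseActivePower")

# Retrieve hourly averages for the past year as a pandas DataFrame,
# one column per generator, then sum across generators and plot.
df = client.datapoints.retrieve_dataframe(
    id=[t.id for t in ts],
    start="365d-ago",
    end="now",
    aggregates=["average"],
    granularity="1h",
)
df.sum(axis=1).plot()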
The analysts need guidance and support to embrace new tools
Getting busy analysts to change how they work is challenging unless you provide easy-to-use tools with reliable access to support. Otherwise, the barrier to trying new ways of doing things can be too high, especially in a busy work environment. Nevertheless, we believe the effort required to make a change does not have to be large.
To provide support, we have established a common chat for users to facilitate discussions with the team and other users. Both the team and expert users can quickly provide feedback, making it easier for users to explore new approaches. Moreover, we have set up JupyterHub, enabling users without Python experience to get started faster. This also allows departments to use notebooks as lightweight dashboards, facilitating the sharing of insights.
As a result, the adoption of our Power SDK has increased dramatically, and the service has also reduced the cost and time to market for integrating other data sources for analysts. For instance, the Entsoe-py package has become popular and reduced the load on our internal data platform team.
Furthermore, helping users who encounter challenges provides us with a better understanding of their needs and pain points, facilitating the development of our products. Through our dialogue with users, we can capture the value of what we do and share our knowledge with others. For example, feedback from a user states:
Previously, I would have spent a week extracting data from the data warehouse to investigate load and flow into and through the Oslo area. Now I can create good assumptions within 1 hour.
Power System Analyst in Statnett’s Regional Grid Planning department
The feedback we receive from users is useful for sharing knowledge and gaining support from management to continue our development efforts.
The analysts have used the tools to improve their methods
We are thrilled to see that analysts have a keen interest in using statistical methods to make better decisions. With our experience and expertise, we can increase the scope of data analysis and data science in Statnett, empowering analysts to generate valuable insights.
The Power SDK can also load weather data from Statnett’s substations. This figure shows the relationship between the peak hourly load per day and the average outdoor temperature. The figure was made by an analyst in our long-term planning department and is used to assess the expected peak load in a region of Norway.
These assessments may have great economic and environmental impact as well: as the figure shows, the load increases at a lower rate as it gets colder. Compared to a linear trend, which was common historically, this leads to a lower expected peak demand and hence a lower demand for grid capacity.
If you want more information or have ideas for cooperation, please don’t hesitate to contact us.
We are looking into sensor- and weather-based approaches to operating the grid more efficiently using dynamic line rating
This winter, the electricity price in the south of Norway has reached levels more than ten times higher than the price in the north. These differences are a result of capacity constraints, and not surprisingly, Statnett is considering increasing the grid capacity. An obvious way would be to build new transmission lines, but this is very expensive and takes years to complete. The other option is to utilize the existing grid more efficiently.
What factors are limiting the transmission capacity? How do we estimate these limits, and can more data help us get closer to their true values? Together with some of my colleagues at the Data science team, I’ve been putting weather data to work in an attempt to calculate transmission capacities in a smarter way. Because if we can safely increase the capacities of individual transmission lines, we might be able to increase the overall capacity between regions as well.
Too hot to handle
Transmission capacity is limited for a number of reasons. If too much power flows through this single transmission line connecting a power producer and a consumer, there are at least three things that could go really wrong:
A wind power plant connected to a consumer through a transmission line. The line consists of three towers and four spans.
The voltage at the consumer could drop below acceptable levels
The conductor could suffer material damage if the temperature goes beyond its thermal limit
The conductor could sag too close to the ground as it expands with higher temperature
The sagging problem must be taken seriously to avoid safety hazards and wildfire risk, and regulations specify a minimum allowed clearance from the conductor to the ground.
Statnett calculates a thermal rating for each line in the transmission grid. These ratings restrict the power flow in the network in an attempt to always respect thermal limits and minimum clearance regulations. Today, these calculations rely on some consequential simplifications, and the resulting limits are generally considered to be on the conservative side.
Fresh weather data enables a different kind of calculation of these limits, and the ambition of our current work is to increase the current-carrying capacity, or ampacity, of Norwegian transmission lines when it is safe to do so. But before attempting to calculate better limits indirectly, using weather data, we first ran a project to estimate ampacity directly, using sensors.
Estimating ampacity with sensors
A dynamic line rating (DLR) sensor is a piece of hardware that provides ampacity values based on measurements. The sensor cannot measure the ampacity directly; it can only estimate it based on what it observes, which is typically the sag or the clearance to ground.
One of the spans where we have tested DLR sensors.
We recently tested and evaluated DLR sensors from three different suppliers. The results were mixed: we experienced large ampacity deviations between the three.
DLR sensors come at a cost, and the reach of a sensor is limited to the single span where it is located. This is the motivation behind our indirect approach to calculating ampacity ratings, without using DLR sensors.
Calculating the ampacity
A typical span. Regulations specify the minimum allowed clearance between the ground and the conductor.
In our indirect model, we base our calculations on the method described in CIGRE 601, Guide for thermal rating calculations of overhead lines, to calculate the ampacity. In short, we need to solve a steady-state heat balance equation that accounts for joule losses, solar heating, convective cooling and radiative cooling:
Joule losses heat the conductor and increase as the current through the conductor increases. The losses are caused by the resistive and magnetic properties of the conductor.
Solar radiation heats the conductor and depends on the solar position, which varies with the time of day and the date of year. Cloud cover, the orientation of the line, the type of terrain on the ground and the properties of the conductor also influence the heating effect.
Convection cools the conductor. The effect is caused by the movement of the surrounding air and increases with the temperature difference between the conductor and the air. Forced convection on a conductor depends on the wind speed and the wind direction relative to the orientation of the span. A wind direction perpendicular to the conductor has the largest cooling effect. And even without wind, there will still be natural convection.
Radiation cools the conductor. The temperature difference between the conductor temperature and the air temperature causes a transmission of heat from the conductor to the surroundings and the sky.
Convective cooling and joule heating have the most significant effect on the heat balance, as the figure below shows. The solar heating and the radiative cooling are smaller in magnitude.
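To make the balance concrete, here is a minimal sketch of the steady-state heat balance solved for current, assuming the individual heating and cooling terms have already been computed; the function and variable names are our own illustration, not from any library or standard.

import math

def ampacity(p_convective: float, p_radiative: float, p_solar: float, r_ac_max: float) -> float:
    """Steady-state ampacity from the heat balance P_J + P_S = P_C + P_R.

    Joule heating is P_J = I**2 * R_ac(T_max), so the largest allowed current is
    I = sqrt((P_C + P_R - P_S) / R_ac(T_max)). All powers are per metre of
    conductor, and r_ac_max is the AC resistance per metre at the maximum
    allowed conductor temperature.
    """
    net_cooling = p_convective + p_radiative - p_solar
    if net_cooling <= 0:
        return 0.0  # no current can be carried without exceeding the temperature limit
    return math.sqrt(net_cooling / r_ac_max)

# Example with made-up values: 30 W/m convective and 10 W/m radiative cooling,
# 12 W/m solar heating, and an AC resistance of 70e-6 ohm/m at the temperature limit.
print(round(ampacity(30.0, 10.0, 12.0, 70e-6)))  # roughly 630 A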
As it turns out, many of the parameters in the equation are weather-dependent. In the following sections I will explain three indirect models that go to different lengths in applying weather data when calculating thermal ratings.
The simplest weather-based ampacity calculation: a static seasonal limit
Historically, a simple and common approach to the problem has been to establish seasonal ampacity ratings for a transmission line by considering the worst-case seasonal weather conditions of the least advantageous span. This approach can lead to limits such as:
Summer: 232 A
Spring and autumn: 913 A
Winter: 1913 A
Such a model requires very little data, but has the disadvantage of being too conservative, providing a lower ampacity than necessary in all but the worst-case weather conditions. The true ampacities would vary over time.
Including air temperature in the calculation
Statnett includes air temperatures in their ampacity calculations. The temperatures can come from measuring units or a forecast service. This approach is vastly more sophisticated than the static seasonal limits, but there are still many other factors where Statnett resorts to simplifying assumptions.
The wind speed and wind direction are recognized as the most difficult to forecast. The wind can vary a lot along the length of a span and over time. To overcome the modeling challenge, the traditional Statnett solution has been to assume a constant wind speed from a fixed wind direction. The effect of solar heating is also reduced to worst-case values for a set of different temperatures. The end result is ampacity values that depend only on the air temperature. The resulting ampacities are believed to be conservative, especially since the solar heating is often lower than the worst-case values.
However, counterexamples can be constructed where the true ampacity would be lower than the values calculated by Statnett. Combinations of low wind speed, unfavourable wind direction and solar heating close to the worst case can occur in rare cases, especially for spans where winds are shielded by terrain. And of course, errors in the input air temperature data can also give an ampacity that is too high.
Including more weather parameters
Our full weather-based model not only considers air temperature, but also wind speed, wind direction, date and time (to calculate solar radiation) and the configuration of the transmission line span.
Applying the wind speed and direction data is not straightforward. At the same time, the consequences of incorrect ampacities are asymmetric: the additional capacity can never justify clearance violations or material damage to the conductor, so being right on average may not be a good strategy.
Comparing the models
We compared the ampacity from DLR sensors to the three indirect, weather-based calculation approaches for a specific line in the summer, autumn and winter. The DLR sensor estimate should not be accepted as the true ampacity, but is considered to be the most accurate and relevant point of reference.
In summer, the static seasonal limit results in low utilization of the line and low risk, as expected. The current Statnett approach allows higher utilization, but without particularly high risk, since the limits are still conservative compared to the DLR sensor estimate. The weather-based approach overestimates the ampacity most of these four days in July, but drops below the DLR sensor estimate a few times.
Summer period.
Our autumn calculations tell a similar story. However, the difference between the current Statnett approach and the DLR sensors is even larger, indicating high potential for capacity increases with alternative methods.
Autumn period.
In winter, the static seasonal rating and the current Statnett approach are still more conservative than the DLR sensor, but the weather-based estimate is more conservative than the estimate of the DLR sensor in the chosen period.
Winter period.
The static seasonal limits and Statnett’s air temperature dependent limits are consistently more conservative than the approach with a DLR sensor. Our weather-based approach, on the other hand, gives too optimistic ampacity ratings, with a few exceptions.
We think we have an idea why.
Weather-based models require high quality weather data or forecasts
For all seasons, the weather-based approach stands out with its large ampacity variations. It is great that our model better reflects the weather-driven temperature dynamics, but the large variations also show the huge impact weather parameters (and their inaccuracies) have on the calculations.
The weather data we used to calculate ampacity were to a large extent forecasts, and even the best forecasts will be inaccurate. But even more importantly: the weather data were never calibrated to local conditions. This will overestimate the cooling effect under some conditions, since in reality, terrain and vegetation in the immediate surroundings of a power line can deflect much of the wind.
To calculate more accurate weather-based limits, we believe the weather parameters from a forecast cannot be used directly without some form of adjustment.
In the future, we would like to find a way to make these adjustments, so that we can increase the ampacity compared to Statnett’s current approach while reducing the risk of overestimation. Additionally, the Norwegian Meteorological Institute provides an ensemble forecast that gives additional information about the uncertainty of the weather forecast. In future work, we would like to use this uncertainty information in our indirect weather-based model.
Together with almost half of the Data science group at Statnett, I spend my time building automatic systems for congestion management. This job is fascinating and challenging at the same time, and I would love to share some of what our cross-functional team has done so far. But before diving in, let me first provide some context.
A good day at Statnett’s control centre. Photo: Trond Isaksen
The balancing act
Like other European transmission system operators (TSOs), Statnett is keeping the frequency stable at 50 Hz by making sure generation always matches consumption. The show is run by the human operators in Statnett’s control centre. They monitor the system continuously and instruct flexible power producers or consumers to increase or decrease their power levels when necessary.
These balancing adjustments are handled through a balancing energy market. Flexible producers and consumers offer their reserve capacity as balancing energy bids (in price-volume pairs). The operators select and activate as many of them as needed to balance the system. To minimize balancing costs, they try to follow the price order, utilizing the least expensive resources first.
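As a simplified illustration of this merit-order principle (this is not Statnett’s actual balancing code), selecting up-regulation bids could look like this:

def select_bids(bids, imbalance_mw):
    """Select up-regulation bids in price order until the imbalance is covered.

    bids is a list of (price, volume_mw) pairs; returns the activated
    (price, volume_mw) pairs.
    """
    activated = []
    remaining = imbalance_mw
    for price, volume in sorted(bids):  # cheapest first
        if remaining <= 0:
            break
        take = min(volume, remaining)
        activated.append((price, take))
        remaining -= take
    return activated

# Example: cover a 120 MW deficit from three bids
print(select_bids([(35.0, 50), (30.0, 100), (60.0, 200)], 120))
# [(30.0, 100), (35.0, 20)]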
While busy balancing the system, control centre operators also need to keep an eye on the network flows. If too much power is injected in one location, the network will be congested, meaning there are overloads that could compromise reliable operation of the grid. When an operator realizes a specific bid will cause congestion, she will mark it as unavailable and move on to use more expensive bids to balance the system.
In the Norwegian system, congestion does not only occur in a few well-known locations. Due to highly distributed generation and a relatively weak grid, there are hundreds of network constraints that could cause problems, and the Norwegian operators often need to be both careful and creative when selecting bids for activation.
A filtering problem
The Nordic TSOs are transitioning to a new balancing model in the upcoming year. A massive change is that balancing bids will no longer be selected by humans, but by an auction-like algorithm, just as in many other electricity markets. This algorithm (unfortunately, but understandably) uses a highly aggregated zonal structure, meaning that it will consider capacity restrictions between the 12 Nordic bidding zones, but not within them.
Consequently, the market algorithm will disregard all of the more obscure (but still important) intra-zonal constraints. This will, of course, lead to market results that simply do not fit inside the transmission grid, and there is neither time nor opportunity after the market clearing to modify the outcome in any substantial way.
My colleagues and I took on the task of creating a bid filtering system. This means predicting which bids would cause congestion and marking them as unavailable to prevent them from being selected by the market algorithm.
Filtering the correct bids is challenging. Network congestions depend on the situation in the grid, which is anything but constant due to variations in generation, consumption, grid topology and exchange with other countries. The unavailability decision must be made something like 15-25 minutes before the bid is to be used, and there is plenty of room for surprises during that period.
How it works
Although I enjoy discussing the details, I will give only a short summary of how the system works. To decide the availability of each bid in the balancing market, we have created a Python universe with custom-built libraries and microservices that follow these steps:
Assemble a detailed model of the Norwegian power system in its current state. Here, we take the grid model from Statnett’s equipment database and combine it with fresh data from Statnett’s SCADA system.
Adjust the model to reflect the expected state. Since we are looking up to 30 minutes ahead, we offset the effect of current balancing actions, and apply generation schedules and forecasts to update all injections in the model.
Prepare to be surprised. To make more robust decisions in the face of high uncertainty in exchange volumes, we even apply a hundred or more scenarios representing different exchange patterns on the border.
Find the best balancing actions for each scenario of the future. Interpreting the results of an optimal power flow calculation provides lots of insight into which bids should be activated (and which should not) in each exchange scenario.
Agree on a decision. In the final step, the solution from each scenario is used to form a consensus decision on which bids to make unavailable for the balancing market algorithm.
An example result and how to read it
My friend and mentor Gerard Doorman recently submitted a paper for the 2022 CIGRE session in Paris, explaining the bid filtering system in more detail. I will share one important figure here to illustrate the final step of the bid filtering method. The figure shows the simulated result of running the bid filtering system at 8 AM on August 23, 2021.
Simulated bid filtering results from August 23, 2021. Bids are sorted horizontally, according to price, and grouped by their bidding zone (NO1 through 5) and direction (up/down).
Before you cringe from information overload, let me assist you by explaining that the abundance of green cells in the horizontal bar on top shows that the vast majority of balancing bids were decided to be available.
There are also yellow cells, showing bids that likely need to be activated to keep the system operating within its security limits, no matter what happens.
The red cells are bids that have been made unavailable to prevent network congestion. To understand why, we need to look at the underlying results in the lower panel. Here, each row presents the outcome of one scenario, and purple cells show the bids that were rejected, i.e. not activated in the optimal solution, although being less expensive than other ones that were activated for balancing in the same scenario (in pink).
The different scenarios often do not tell the same story: a bid that is rejected in one scenario can be perfectly fine in the next; it all depends on the situation in the grid and which other bids are also activated. Because of this ambiguity, business rules are necessary to create a reasonable aggregate result, and the final outcome will generally be imperfect.
So, does this filtering system have any positive impact on network congestions at Statnett? I will leave the answer for later, but if you’re curious to learn more, don’t hesitate to leave a comment.
We use pydantic models to share data requirements and metadata between ML applications. Here’s how.
In our previous post, we wrote about how we use the Python package pydantic to validate input data. We serve our machine learning models with APIs written using the FastAPI library, which uses pydantic to validate and parse the input data received through the API. FastAPI was our introduction to pydantic. We started out with some pydantic models for validating input data to the API. We saw that sharing the models across several applications would lead to less boilerplate code, as well as more explicit handling of our data requirements.
The data models we share across our ML applications serve two purposes:
Sharing metadata, i.e. tidy models declaring what data a specific machine learning model requires, including the logic for naming standards and formats for exporting data. This is also used as the foundation for a mapping of how to select data from the training database.
Validating that data requirements are fulfilled. When training our models, we have certain requirements to the training data, for example that there is no missing data. The data models allow us to validate these requirements across other applications as well, such as in the API.
In this post, we will show how we share metadata and data requirements using pydantic. For an introduction to pydantic syntax and concepts, take a look at our previous post, or the excellent pydantic documentation.
A quick overview of applications
Before we go into detail, we’ll start with a quick overview of how data flows and what applications are at play.
Outline of architecture with the relevant components
Our forecast models are served by APIs (the ML API application in the figure). Each model specifies the data it requires to make a prediction at a GET endpoint. The ML / stream integration application requests the feature metadata specification from the ML API, fetches the data from the streaming platform and posts the observations to the ML API’s prediction endpoint. The response is the prediction, which the ML / stream integration application publishes to the streaming data platform for users to consume.
The data from the streaming platform is persisted to a database. To ensure we have valid training data, data cleaning applications read the raw data and validate that it meets the requirements. These applications also try to fill small holes of missing data according to domain-specific rules, and warn if there are large holes that need manual processing. Data validation runs on a regular schedule, as data is continuously persisted from the streaming platform to the database.
When training the models, validated data from the data cleaning applications is used.
When reading data, the applications use the metadata models to select the correct data. The ML API specifies what data should be fetched from the streaming platform using the metadata model. The data cleaning applications use the metadata model to select the correct data from the database. When training models, the data is fetched from the database according to the metadata model.
When operating on data, the applications use the data requirements models to ensure the validity of the input data. Before making a prediction, data is validated in the ML API. The prognosis response is also validated before it is returned to the ML / stream integration application. The data cleaning applications also validate data according to the data requirements models before writing to the database for training.
The metadata models are used for reading the correct data, shown with a yellow star. The data requirements model is used for validating that data meets requirements, shown with a blue triangle.
Sharing metadata with pydantic
Below is an example of our BaseFeatureMetaData class, which we use for metadata on the base features in our models, i.e., the features as they come from our sources. This metadata model specifies all necessary details on the features a model requires:
The name of the feature.
The source we require for the feature, as several features are found in multiple sources. An example is weather forecasts, which are available from different vendors, but it could also be different estimates for measurements or unobservable values.
Which locations we require data from.
history_period and future_period together specify the time series the model requires to provide its prediction.
The time resolution required for this feature, as many features are available in different time aggregations.
from datetime import timedelta
from typing import List

from pydantic import BaseModel

from our_internal_utils import timedelta_isoformat_


class BaseFeatureMetaData(BaseModel):
    name: str
    source: str
    locations: List[str]
    history_period: timedelta
    future_period: timedelta
    resolution: timedelta

    class Config:
        json_encoders = {timedelta: timedelta_isoformat_}

    def __str__(self):
        res_str = timedelta_isoformat_(self.resolution)
        return f"{self.name}_{self.source}_{res_str}"
We can now create a metadata specification for temperature data, from the meteorological institute, or met as we call them, from two weather stations, SN18700 (Blindern) and SN27500 (Lillehammer), for 48 hours of weather observations and 72 hours of forecast values, using the hourly resolution data:
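(The instantiation below is a sketch; the field values follow the description above.)

temperature_metadata = BaseFeatureMetaData(
    name="temperature",
    source="met",
    locations=["SN18700", "SN27500"],
    history_period=timedelta(hours=48),
    future_period=timedelta(hours=72),
    resolution=timedelta(hours=1),
)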
We use this metadata model in several places:
By the ML / stream integration application that fetches data to the models for live prediction. We provide the metadata in the API which serves our ML models, specifying all details of the features a model requires. The application uses the metadata to collect the required time series from our streaming platform.
When we train our models, to fetch the correct training data from the database for each model. The metadata specification of each model, which is exposed in the API, is also what we use to map a base feature to which data to read from the database.
When validating training data in the data cleaning applications. Each base feature has its own data cleaning application configured with the metadata model, used to map the base feature to which data to read from the database, as for training.
In order to provide the metadata model in json format from the API, we use pydantic’s Config class, which we can use to customize the behaviour of the model. The json_encoders entry in our Config specifies that all timedelta fields should be converted using our utility function timedelta_isoformat_, to provide valid json. Pydantic provides a .json() method for exporting models to json, so temperature_metadata.json() returns a json representation where the timedelta fields follow the representation standard we have agreed on in the interface between the ML API and the ML / stream integration application. We also override the __str__ representation of the class to provide our internal naming standard for features. str(temperature_metadata) returns:
'temperature_met_PT1H'
We use this naming standard in column names for the feature data in internal applications, and in the naming of each data cleaning application, as each base feature has its own cleaning application.
Validating data requirements across applications
Our BaseFeature class explicitly states our data requirements through validators: that time steps must be evenly spaced, that we don’t accept any NaNs in the time series, and other basic requirements. This model is used:
in our ML API, to validate input data before running predictions. This use case was our first introduction to pydantic data models, and is described in FastAPI’s excellent documentation.
in our data cleaning applications, to validate that new potential training data, persisted from the streaming platform, meets the requirements we have in training.
Since we use the BaseFeature model when reading data from the database to train our models, the validation can be done before training as well, or we can use the construct() method for already validated sources, which creates models without validation.
This use of pydantic data models is analogous to the input validation example shared in our previous post. Using the data model in the API serving the models, in the models themselves and in the data cleaning applications ensures that we collect all our data requirements in one place. This way we avoid multiple .dropna() lines spread throughout our code, which can lead to different treatment of data during training and prediction and make it difficult to refactor the code. Using the data requirements in the data cleaning applications also ensures that there is validated data available for training at any time.
Below is a shortened example with a few of the validators of a parent model, DataFrameTimeSeries, which the data requirement models inherit from.
In the DataFrameTimeSeries model, missing values are allowed. Our BaseFeature model inherits from DataFrameTimeSeries and inherits all its validators, but we override the data field, because missing data is not allowed for our BaseFeature input. When an instance of a pydantic BaseModel object is created, the validator-decorated functions are run, as well as validation that the type hints are adhered to. An exception is raised if any of the validations fail.
from typing import List, Optional

from pydantic import BaseModel, validator


class DataFrameTimeSeries(BaseModel):
    columns: List[str]
    timestamps: List[int]
    # Some time series are allowed to have missing values
    data: List[List[Optional[float]]]

    @validator("columns")
    def at_least_one_column(cls, columns: List[str]) -> List[str]:
        if len(columns) < 1:
            raise ValueError("No column names specified")
        return columns

    @validator("timestamps", "data")
    def len_at_least(cls, value: List) -> List:
        minlen = 1
        if len(value) < minlen:
            raise ValueError(f"length is less than {minlen}")
        return value

    @validator("timestamps")
    def index_has_proper_spacing(cls, value: List[int]) -> List[int]:
        diffs = [x[0] - x[1] for x in zip(value[1:], value)]
        if len(diffs) < 1:
            return value
        if any(diff <= 0 for diff in diffs):
            raise ValueError("Index is not monotonically increasing")
        if any(diff != diffs[0] for diff in diffs):
            raise ValueError("Index has unequal time steps")
        return value


class BaseFeature(DataFrameTimeSeries):
    # BaseFeatures inherit validators from DataFrameTimeSeries,
    # but are not allowed to have missing values, so the data
    # field does not have Optional floats
    data: List[List[float]]
In our API, the validation is done immediately when a new request is made, as the pydantic validation is enforced through type hints in the API routes.
Our data cleaning applications run validation once per day, validating all BaseFeatures in the training database. There is one application running per BaseFeature. New data from the streaming platform is stored in the database continuously. The cleaning application reads the new data since the last cleaning and, if there are small holes, attempts to fill them according to domain-specific rules. The number of time steps that can be filled automatically, as well as the interpolation method, is set per base feature. The joint dataset of original and processed data is then converted to BaseFeatures through a utility function. As this creates a BaseFeature instance, validation is performed immediately. A view containing only validated data is updated to include the newly added data if it meets the requirements. This view is the source for training data.
We could create additional validation functions, for example creating subclasses of BaseFeature with domain-specific validation, such as checking whether temperature values are within normal ranges.
Going further
As of now, our use of data models ensures that the data meets the minimum requirements for our specific use case. What this doesn’t provide is a broader evaluation of data quality: are values drifting, how often is data missing from this source, and so on. For this kind of overview, we are working on a pipeline for monitoring input data closer to the source and providing overviews for other data consumers. Our initial investigations of combining pydantic models with Great Expectations, a data testing, documentation and profiling framework, are promising.
We use the Python package pydantic for fast and easy validation of input data. Here’s how.
We discovered the Python package pydantic through FastAPI, which we use for serving machine learning models. Pydantic is a Python package for data parsing and validation, based on type hints. We use pydantic because it is fast, does a lot of the dirty work for us, provides clear error messages and makes it easy to write readable code.
Two of our main use cases for pydantic are:
Validation of settings and input data.
We often read settings from a configuration file and use them as inputs to our functions. These settings typically require quite a bit of validation to ensure they are valid for further processing. To avoid starting our functions with a long list of validations and assertions, we use pydantic to validate the input.
Sharing data requirements between machine learning applications.
Our ML models have certain requirements to the data we use for training and prediction, for example that no data is missing. These requirements are used when we train our models, when we run online predictions and when we validate newly added training data. We use pydantic to specify the requirements we have and ensure that the same requirements are used everywhere, avoiding duplication of error-prone code across different applications.
This post will focus on the first use case, validation of settings and input data. A later post will cover the second use case.
Validation of settings and input data
In some cases, we read settings from a configuration file, such as a toml file, to be parsed as nested dictionaries. We use the settings as inputs to different functions. We often end up doing quite a bit of input validation to ensure the settings parsed from file are valid for further processing. A concrete example is settings for machine learning models, where we use toml files for defining model parameters, features and training details for the models.
This is quite similar to how FastAPI uses pydantic for input validation: the input to the API call is json, which in Python translates to a dictionary, and input validation is done using pydantic.
In this post we will go through input validation for a function interpolating a time series to a higher frequency. If we want to do interpolation, we set the interpolation factor, i.e., the factor of upsampling, the interpolation method, and an option to interpolate on the integral. Our interpolation function is just a wrapper around pandas interpolation methods, including the validation of input and some data wrangling. The input validation code started out looking a bit like this:
from typing import Dict


def validate_input_settings(params_in: Dict) -> Dict:
    params_validated = {}
    for key, value in params_in.items():
        if key == "interpolation_factor":
            if not int(value) == value:
                raise ValueError(f"{key} has a non-int value")
            if not int(value) >= 2:
                raise ValueError(f"{key}: {value} should be >= 2")
            value = int(value)
        elif key == "interpolation_method":
            allowed_set = {
                "repeat", "distribute", "linear", "cubic", "akima"
            }
            if value not in allowed_set:
                raise ValueError(f"{key} should be one of {allowed_set}, got {value}")
        elif key == "interpolate_on_integral":
            if not isinstance(value, bool):
                raise ValueError(f"{key} should be bool, got {value}")
        else:
            raise ValueError(f"{key} not a recognized key")
        params_validated[key] = value
    return params_validated
This is heavily nested, which in itself makes it hard to read, and perhaps you find that the validation rules aren’t crystal clear at first glance. We use SonarQube for static code quality analysis, and this piece of code results in a code smell, complaining that the code is too complex. In fact, this already has a cognitive complexity of 18 as SonarQube counts it, above the default threshold of 15. Cognitive complexity is a measure of how difficult it is to read code, and it increments for each break in linear flow, such as an if statement or a for loop. Nested breaks in the flow are incremented again.
Let’s summarize what we check for in validate_input_settings:
interpolation_factor is an integer
interpolation_factor is greater than or equal to 2
interpolation_method is in a set of allowed values
interpolate_on_integral is boolean
The keys in our settings dictionary are among the three mentioned above
In addition to the code above, we have a few more checks:
if an interpolation_factor is given, but no interpolation_method, use the default method linear
if an interpolation_factor is given, but no interpolate_on_integral, set the default option False
check for the invalid combination interpolate_on_integral = False and interpolation_method = "distribute"
At the end of another three if statements inside the for loop, we end up at a cognitive complexity of 24.
Pydantic to the rescue
We might consider using a pydantic model for the input validation.
Minimal start
We can start out with the simplest form of a pydantic model, with field types:
from pydantic import BaseModel


class InterpolationSetting(BaseModel):
    interpolation_factor: int
    interpolation_method: str
    interpolate_on_integral: bool
Pydantic models are simply classes inheriting from the BaseModel class. We can create an instance of the new class as:
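(The field values below are just for illustration.)

InterpolationSetting(
    interpolation_factor=2,
    interpolation_method="linear",
    interpolate_on_integral=True,
)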
This automatically does two of the checks we had implemented:
interpolation_factor is an int
interpolate_on_integral is a bool
In the original script, the fields are in fact optional, i.e., it is possible to provide no interpolation settings, in which case we do not do interpolation. We will set the fields to optional later, and then implement the additional necessary checks.
We can verify the checks we have enforced now by supplying non-valid input:
from pydantic import ValidationError

try:
    InterpolationSetting(
        interpolation_factor="text",
        interpolation_method="linear",
        interpolate_on_integral=True,
    )
except ValidationError as e:
    print(e)
which outputs:
1 validation error for InterpolationSetting
interpolation_factor
value is not a valid integer (type=type_error.integer)
Pydantic raises a ValidationError when the validation of the model fails, stating which field, i.e. attribute, raised the error and why. In this case, interpolation_factor raised a type error because the value "text" is not a valid integer. The validation is performed on instantiation of an InterpolationSetting object.
Validation of single fields and combinations of fields
Our original code also had some additional requirements:
interpolation_factor should be greater than or equal to two.
interpolation_method must be chosen from a set of valid methods.
We do not allow the combination of interpolate_on_integral=False and interpolation_method="distribute"
The first restriction can be implemented using pydantic types. Pydantic provides many different types; we will use a constrained type for this requirement, namely conint, a constrained integer type providing automatic restrictions such as lower limits.
The remaining two restrictions can be implemented as validators. We decorate our validation functions with the validator decorator. The input argument to the validator decorator is the name of the attribute(s) to perform the validation for. All validators are run automatically when we instantiate an object of the InterpolationSetting class, as for the type checking. Our validation functions are class methods, and the first argument is the class, not an instance of the class. The second argument is the value to validate, and it can be named as we wish. We implement two validators, method_is_valid and valid_combination_of_method_and_on_integral:
from typing import Dict

from pydantic import BaseModel, conint, validator, root_validator


class InterpolationSetting(BaseModel):
    interpolation_factor: conint(gt=1)
    interpolation_method: str
    interpolate_on_integral: bool

    @validator("interpolation_method")
    def method_is_valid(cls, method: str) -> str:
        allowed_set = {"repeat", "distribute", "linear", "cubic", "akima"}
        if method not in allowed_set:
            raise ValueError(f"must be in {allowed_set}, got '{method}'")
        return method

    @root_validator()
    def valid_combination_of_method_and_on_integral(cls, values: Dict) -> Dict:
        on_integral = values.get("interpolate_on_integral")
        method = values.get("interpolation_method")
        if on_integral is False and method == "distribute":
            raise ValueError(
                f"Invalid combination of interpolation_method "
                f"{method} and interpolate_on_integral {on_integral}"
            )
        return values
There are a few things to note here:
Validators should return a validated value. The validators are run sequentially, and populate the fields of the data model if they are valid.
Validators should only raise ValueError, TypeError or AssertionError. Pydantic will catch these errors to populate the ValidationError and raise one exception regardless of the number of errors found in validation. You can read more about error handling in the docs.
When we validate a field against another, we can use the root_validator, which runs validation on the entire model. Root validators are a little different: they have access to the values argument, which is a dictionary containing all fields that have already been validated. When the root validator runs, interpolation_method may have failed to validate, in which case it will not be added to the values dictionary. Here, we handle that by using values.get("interpolation_method"), which returns None if the key is not in values. The docs contain more information on root validators and field ordering, which is important to consider when we are using the values dictionary.
Again, we can verify by choosing input parameters to trigger the errors:
from pydantic import ValidationError

try:
    InterpolationSetting(
        interpolation_factor=1,
        interpolation_method="distribute",
        interpolate_on_integral=False,
    )
except ValidationError as e:
    print(e)
which outputs:
2 validation errors for InterpolationSetting
interpolation_factor
ensure this value is greater than 1 (type=value_error.number.not_gt; limit_value=1)
__root__
Invalid combination of interpolation_method distribute and interpolate_on_integral False (type=value_error)
As we see, pydantic raises a single ValidationError regardless of the number of ValueErrors raised in our model.
Implementing dynamic defaults
We also had some default values if certain parameters were not given:
If an interpolation_factor is given, set the default value linear for interpolation_method if none is given.
If an interpolation_factor is given, set the default value False for interpolate_on_integral if none is given.
In this case, we have dynamic defaults dependent on other fields. This can also be achieved with root validators, by returning a conditional value. As this means validating one field against another, we must take care to ensure our code runs whether or not the two fields have passed validation and been added to the values dictionary. We will now also use Optional types, because we will handle the cases where not all values are provided. We add the new validators set_method_given_interpolation_factor and set_on_integral_given_interpolation_factor:
from typing import Dict, Optional

from pydantic import BaseModel, conint, validator, root_validator


class InterpolationSetting(BaseModel):
    interpolation_factor: Optional[conint(gt=1)]
    interpolation_method: Optional[str]
    interpolate_on_integral: Optional[bool]

    @validator("interpolation_method")
    def method_is_valid(cls, method: Optional[str]) -> Optional[str]:
        allowed_set = {"repeat", "distribute", "linear", "cubic", "akima"}
        if method is not None and method not in allowed_set:
            raise ValueError(f"must be in {allowed_set}, got '{method}'")
        return method

    @root_validator()
    def valid_combination_of_method_and_on_integral(cls, values: Dict) -> Dict:
        on_integral = values.get("interpolate_on_integral")
        method = values.get("interpolation_method")
        if on_integral is False and method == "distribute":
            raise ValueError(
                f"Invalid combination of interpolation_method "
                f"{method} and interpolate_on_integral {on_integral}"
            )
        return values

    @root_validator()
    def set_method_given_interpolation_factor(cls, values: Dict) -> Dict:
        factor = values.get("interpolation_factor")
        method = values.get("interpolation_method")
        if method is None and factor is not None:
            values["interpolation_method"] = "linear"
        return values

    @root_validator()
    def set_on_integral_given_interpolation_factor(cls, values: Dict) -> Dict:
        on_integral = values.get("interpolate_on_integral")
        factor = values.get("interpolation_factor")
        if on_integral is None and factor is not None:
            values["interpolate_on_integral"] = False
        return values
We can verify that the default values are set only when interpolation_factor is provided, running InterpolationSetting(interpolation_factor=3) returns:
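(Roughly the following; the exact representation may vary between pydantic versions.)

InterpolationSetting(interpolation_factor=3, interpolation_method='linear', interpolate_on_integral=False)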
Note: If we have static defaults, we can simply set them for the fields:
class InterpolationSetting(BaseModel):
    interpolation_factor: Optional[int] = 42
Final safeguard against typos
Finally, we had one more check in our previous script: that no unknown keys were provided. If we provide unknown keys to our data model now, nothing really happens; for example, InterpolationSetting(hello="world") outputs:
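(Something like the following, where the unknown field is silently dropped; the exact representation may vary.)

InterpolationSetting(interpolation_factor=None, interpolation_method=None, interpolate_on_integral=None)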
Often, an unknown field name is the result of a typo in the toml file. Therefore we want to raise an error to alert the user. We do this using the model Config class, which controls the behaviour of the model. The extra attribute of the config determines what happens with extra fields. The default is ignore, as we can see in the example above: the field is simply ignored and not added to the model (the allow option would add it). We can use the forbid option to raise an exception when extra fields are supplied.
from typing import Dict, Optional

from pydantic import BaseModel, conint, validator, root_validator


class InterpolationSetting(BaseModel):
    interpolation_factor: Optional[conint(gt=1)]
    interpolation_method: Optional[str]
    interpolate_on_integral: Optional[bool]

    class Config:
        extra = "forbid"

    @validator("interpolation_method")
    def method_is_valid(cls, method: Optional[str]) -> Optional[str]:
        allowed_set = {"repeat", "distribute", "linear", "cubic", "akima"}
        if method is not None and method not in allowed_set:
            raise ValueError(f"must be in {allowed_set}, got '{method}'")
        return method

    @root_validator()
    def valid_combination_of_method_and_on_integral(cls, values: Dict) -> Dict:
        on_integral = values.get("interpolate_on_integral")
        method = values.get("interpolation_method")
        if on_integral is False and method == "distribute":
            raise ValueError(
                f"Invalid combination of interpolation_method "
                f"{method} and interpolate_on_integral {on_integral}"
            )
        return values

    @root_validator()
    def set_method_given_interpolation_factor(cls, values: Dict) -> Dict:
        factor = values.get("interpolation_factor")
        method = values.get("interpolation_method")
        if method is None and factor is not None:
            values["interpolation_method"] = "linear"
        return values

    @root_validator()
    def set_on_integral_given_interpolation_factor(cls, values: Dict) -> Dict:
        on_integral = values.get("interpolate_on_integral")
        factor = values.get("interpolation_factor")
        if on_integral is None and factor is not None:
            values["interpolate_on_integral"] = False
        return values
If we try again with an unknown key, we now get a ValidationError:
from pydantic import ValidationError

try:
    InterpolationSetting(hello=True)
except ValidationError as e:
    print(e)
This raises a validation error for the unknown field:
1 validation error for InterpolationSetting
hello
extra fields not permitted (type=value_error.extra)
Adapting our existing code is easy
Now we have implemented all our checks and can go on to adapt our existing code to use the new data model. In our original implementation, we would do something like this:
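(A sketch; the exact surrounding code and the interpolate_result signature are illustrative.)

# params_in: nested settings dictionary parsed from the toml file
params_validated = validate_input_settings(params_in)
result = interpolate_result(params_validated)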
We can replace the call to validate_input_settings with instantiation of the pydantic model: params_validated = InterpolationSetting(**params_in). Each pydantic data model has a .dict() method that returns the parameters as a dictionary, so we can pass it directly to interpolate_result: interpolate_result(params_validated.dict()). Another option is to refactor interpolate_result to use the attributes of the InterpolationSetting object, such as params_validated.interpolation_method, instead of the values of a dictionary.
Conclusion
In the end, we can replace a 43-line function (for the full functionality) with a cognitive complexity of 24 with a 40-line class containing six methods, each with a cognitive complexity below 4. The pydantic data models will not necessarily be shorter than the custom validation code they replace, and since there are a few quirks and concepts to pay attention to, they are not necessarily easier to read on the first try. However, as we use the library for validation in our APIs, we are getting familiar with it, and the models become easier to understand.
Some of the benefits of using pydantic for this are:
Type checking (and in fact also some type conversion), which we previously did ourselves, is now done automatically for us, saving us the work of repeating lines of error-prone code in many different functions.
Each validator has a name which, if we put a little thought into it, makes it very clear what we are trying to achieve. In our previous example, the purpose of each nested condition had to be deduced from the many if clauses and error messages. This should be a lot clearer now, especially if we use pydantic across different projects.
If speed is important, pydantic’s benchmarks show that it is fast compared to similar libraries.
Hopefully this will help you determine whether or not you should consider using pydantic models in your projects. In a later post, we will show how we use pydantic data models to share metadata between machine learning applications, and to share data requirements between the applications.
In Statnett, we collect large amounts of sensing data from our transmission grid. This includes both electric parameters such as power and current, and parameters more directly related to the individual components, such as temperatures, gas concentrations and so on.
Nevertheless, the state and behaviour of many of our assets are largely unmonitored and, outside of regular maintenance and inspection rounds, to some extent unobservable. We believe that more data can give us a better estimate of the health of each component, allowing for better utilization of the grid, more targeted maintenance, and reduced risk of component failure.
About a year ago, we therefore acquired a Pilot Kit from Disruptive Technologies, packed with a selection of miniature sensors that are simple to deploy and well-suited for retrofitting on existing infrastructure. We set up a small project in collaboration with Statnett R&D, where we set about testing the capabilities of this technology, and its potential value for Statnett.
Since then we’ve experimented with deploying these tiny IoT-enabled devices on a number of things, ranging from coffee machines to 420 kV power transformers.
To gauge the value and utility of these sensors in our transmission grid, we had to determine what to measure, how to measure, and how to gather and analyze the data.
This blog post summarizes what we’ve learnt so far and evaluates some of the main use cases we’ve identified for instrumenting the transmission grid. The process is by no means finished, but we will describe the steps we have taken so far.
Small, Low-cost IoT-Sensors
Disruptive Technologies is a fairly young company whose main product is a range of low-cost, IoT-enabled sensors. Their lineup includes temperature sensors, touch sensors, proximity sensors, and more. In addition to sensors, you need one or more cloud connectors per area you are looking to instrument. The sensors transmit their signals to a cloud connector, which in turn streams the data to the cloud through mobile broadband or the local area network. Data from the sensors are encrypted, so a sensor can safely transmit through any nearby cloud connector without prior pairing or configuration.
The devices and the accompanying technology have some characteristics that make them interesting for power grid instrumentation, in particular for retrofitting sensors to already existing infrastructure:
Cost: The low price of sensors makes experimental or redundant installation a low-risk endeavour.
Size: Each sensor is about the size of a coin, including battery and the wireless communication layer.
Simplicity: Each sensor will automatically connect to any nearby cloud connector, so installation and configuration amounts to simply sticking the sensor onto the surface you want to monitor, and plugging the cloud connector into a power outlet.
Battery life: The expected lifetime of the integrated battery is 15 years at 100 sensor readings per day.
Security: Data is transmitted with full end-to-end encryption from sensor to Disruptive’s cloud service.
Open API: Data are readily available for download or streaming to an analytics platform via a REST API.
Finding Stuff to Measure
Disruptive develops several sensor types, including temperature, proximity and touch. So far we have chosen to focus primarily on the temperature sensors, as this is the area where we see the most potential value for the transmission grid. We have considered use cases in asset health monitoring, where temperature is a well-established indicator of weaknesses or incipient failure. Depending on the component being monitored, unusual heat patterns may indicate poor electrical connections, improper arc quenching in switchgear, damaged bushings, and a number of other failure modes.
Asset management and thermography experts in Statnett helped us compile an initial list of components where we expect temperature measurements to be useful:
Transformers and transformer components. At higher voltage levels, transformers typically have built-in sensors for oil temperature, and modern transformers also tend to monitor winding and hotspot temperatures. Measurements on sub-components such as bushings and fans may, however, prove to be very valuable.
Circuit breakers. Ageing GIS facilities are of particular interest, both due to their importance in the grid and to the risk of environmental consequences in case of SF6 leakage. Other switchgear may also be of interest, since intermittent heat development during breaker operation will most likely not be uncovered by traditional thermography.
Disconnectors. Disconnectors (isolator switches) come in a number of flavors, and we often see heat development in joints and connection points. However, we know from thermography that hotspots are often very local, and it may be hard to predict in advance where on the disconnector the sensor should be placed.
Voltage and current transformers. Thermographic imaging has shown heat development in several of our instrument transformers. Continuous monitoring of temperature would enable us to better track this development and understand the relationship between power load, air temperature and transformer heating.
Capacitor banks. Thermography often reveals heat development at one or more capacitors in capacitor banks. However, fully monitoring all potential weak spots of a capacitor bank would require a very large number of sensors.
A typical use case for the proximity sensors in the power system is open door or window alarms. Transmission level substations are typically equipped with alarms and video surveillance, but proximity sensors might be relevant for other types of equipment in the field, or at lower voltage levels.
The touch sensors may for instance be used to confirm operator presence at regular inspection or maintenance intervals. Timestamping and georeferencing as part of an integrated inspection reporting application is a more likely approach for us, so we have not pursued this further.
Deployment on Transformer
Our first pilot deployment (not counting the coffee machine) was on three transformers and one reactor in a 420 kV substation. The sensors were deployed in winter when all components were energized, so we could only access the lower part of the main transformer and reactor bodies. This was acceptable, since the primary intention of the deployment was to gain experience with the process and hopefully avoid a few pitfalls in the future.
Moreover, the built-in temperature sensors in these components gave us a chance to compare readings from Disruptive sensors with the “true” inside temperature, giving us an impression of both the reliability of readings from Disruptive sensors and the ability to estimate oil temperature based on measurements on the outside of the transformer housing. We also experimented with different types of insulation covering the sensor, in order to gauge the effect of air temperature variations on sensor readings.
Deployment in Indoor Substation
Following the initial placement on the transformer bodies, we instrumented an indoor GIS facility, where we deployed sensors on both circuit breakers and disconnectors, plus one additional sensor to measure the ambient temperature in the room. Since the facility is indoors and all energized components are fully insulated, this deployment was fairly straightforward. Our main challenge was that the cloud connector had a hard time finding a cellular signal, but with a bit of fiddling we eventually found a few locations in the room with sufficient signal strength.
Concrete buildings can make it hard to find a good signal for mobile broadband. In this case we raised the cloud connector to the ceiling in search of a signal.
Deployment on Air Insulated Breakers
Finally, we took advantage of a planned disconnection of one of the busbars at a 300 kV facility to instrument all poles of an outdoor SF6 circuit breaker. As mentioned above, disconnectors and instrument transformers were also candidates for instrumentation. However, due to the layout of the substation, these were still energized, so the circuit breakers were the only components we could gain access to.
Apart from monitoring the breakers, this deployment enabled us to test how the sensors reacted to being placed directly on uninsulated high-voltage equipment, and to check for any negative side-effects such as corona discharges.
Developing a Microservice for Data Ingestion
The sensors from Disruptive Technologies work by transmitting data from the sensor, via one or more cloud connectors, to Disruptive’s cloud software solution. The data are encrypted end-to-end, so the sensors may use any reachable cloud connector to transmit data.
As a precautionary measure, we opted to maintain an in-house mapping between sensor device ID and placement in the grid. This way, there is nothing outside Statnett’s systems to identify where the sensor data are measured.
Disruptive provides various REST APIs for streaming and downloading data from their cloud solution. For internal technical reasons, we chose to use a “pull” architecture, where we download new sensory readings every minute and pass them on to our internal data platform. We therefore developed a microservice that:
Pulls data from Disruptive’s web service at regular intervals.
Enriches the data with information about which component the sensor is placed on and how it is positioned.
Produces each sensor reading as a message to our internal Kafka cluster.
From Kafka, the data are consumed and stored in a TimescaleDB database. Finally, we display and analyze the data using a combination of Grafana and custom-built dashboards.
The microservice runs on our internal OpenShift Container Platform (PaaS).
We wrote a custom adapter that ingests data from Disruptive’s web services and produces messages on our internal Kafka cluster. From here, the data flows to Timescale and dashboards. The data are anonymous on Disruptive’s web service, so the adapter also adds contextual information to the messages.
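To make the adapter’s responsibilities concrete, here is a minimal sketch of such a polling loop. The REST endpoint, field names and topic name are placeholders, not the real Disruptive Technologies API or our internal configuration; requests and kafka-python are used for illustration:

import json
import time

import requests
from kafka import KafkaProducer

API_URL = "https://api.example.com/sensor-events"  # placeholder endpoint
# In-house mapping from device ID to placement in the grid
SENSOR_PLACEMENT = {"sensor-123": {"station": "substation A", "component": "breaker, phase L2"}}

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # 1. Pull the latest readings from the cloud service
    readings = requests.get(API_URL, timeout=30).json()
    for reading in readings:
        # 2. Enrich with the component the sensor is placed on and how it is positioned
        enriched = {**reading, **SENSOR_PLACEMENT.get(reading["device_id"], {})}
        # 3. Produce each reading as a message on the internal Kafka cluster
        producer.send("sensor-readings", enriched)
    producer.flush()
    time.sleep(60)  # the service polls roughly once a minute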
The Value of Data
Do these newly acquired data help us take better decisions in the operation of the grid, and hence operate the grid in a smarter, safer, and more cost-effective way? This is really the litmus test for the value of retrofitting sensors and gathering more data about the components in the grid.
In this pilot project, we consulted field personnel and component experts regularly for advice on where and how to place sensors. However, it was the FRIDA project, a large cross-disciplinary digitalization project at Statnett, that really enabled and inspired the relevant switchgear expert to analyze the data we had collected in more detail.
Once he looked at the data, he discovered heat generation in one of the breakers, with temperatures that significantly exceeded what would be expected under normal operation. A thermographic imaging inspection was immediately ordered, which confirmed the readings from the Disruptive sensors.
The temperature sensors made us aware of high temperatures in one of the breakers, indicating a possible incipient failure of a critical component. As a result, the control centre changed the operating pattern of the substation and planned for maintenance of the unhealthy component. The figure shows the temperature on each phase of the breaker, with busbar A in the top panel and busbar B in the bottom one. The room temperature is shown as a dotted orange line.
Based on the available data, the breaker and thermography experts concluded that the breaker, although apparently operating normally, shows signs of weakness and possibly incipient failure. This in turn led to new parts being ordered and maintenance work being planned for the near future.
While waiting for the necessary maintenance work to be performed, the operation of the substation has been adapted to reduce the stress on the weakened equipment. Until maintenance is performed, the limits for maximum amount of power flowing through the switchgear are now updated on a regular (and frequent) basis, based on the latest temperature readings from our sensors. Having access to live component state monitoring has also made the control centre able to make other changes to the operating pattern on the substation.
The control centre now continuously monitors the heat development in the substation, using the sensors from this pilot project. The new data has thus not only helped discover an incipient failure in a critical component, it also allows us to keep operating the substation in a safe and controlled way while we wait for an opportunity to repair or replace the troublesome components.
Lessons Learned in the Field
Under optimal conditions, the wireless range, i.e. the maximum distance between sensors and cloud connectors, is 1000 meters with line of sight. Indoors, signals are reliably transmitted over ranges of 20+ meters in normal mode when conditions are favorable. A weak signal will make the sensor transmit in “Boost mode”, which quickly drains the battery.
High-voltage power transformers are huge oil-filled steel constructions, often surrounded by thick concrete blast walls. We quickly learned that these are not the best conditions for low-power wireless signal transfer. When attaching the sensors to the main body of the transformer, we observed that the communication distance was reduced to less than 10 meters and required line of sight between the sensors and the cloud connector.
One reason for the short transmission distance is that the metal body of the transformer absorbs most of the RF signal from the sensor. When mounting sensors on large metal bodies, Disruptive therefore advised us to use a bracket so that the sensors could be mounted at a 90 degree angle to the surface. We used Lego blocks for this purpose. Disruptive have since developed a number of range extenders that are arguably better-looking.
Lego bracket vs. surface range extender. If you roll your own, keep in mind where on the sensor the actual temperature measurement is made, and rotate it accordingly. On the sensors in our Pilot Kit, this was the upper right corner of the device (not in the corner where the small orange dot is located).
Although we did experience an improvement in signal transmission range when mounting the sensors at an angle to the transformer body, sending signals around the corner of the transformer still turned out to be very challenging.
Disruptive have yet to develop a ruggedized cloud connector. The need to position the cloud connector with line of sight to the sensors limited our ability to place it inside existing shelters such as fuse cabinets. We therefore developed a weather-proof housing for outdoor use, so we could position the cloud connectors for optimal transmission conditions.
A weather-proof housing allowed us to position the cloud connectors for optimal transmission conditions.
All sensors have their device ID printed on the device itself. However, the text is tiny and the ID is long and complex. We opted to put a short label on each sensor using a magic marker in order to simplify the deployment process in the field. This simplistic approach was satisfactory for our pilot project, and required minimal support system development, but obviously does not scale to larger deployments.
As mentioned above, we have deployed a number of sensors directly on energized, uninsulated 300 kV components. We were quite curious to see how the sensors would cope with being mounted directly on high-voltage equipment. So far, the measurements from the sensors seem to be unaffected by the voltage. However, we have lost about 25% of these sensors in less than a year due to high battery drainage. We suspect that this may be related to the environment in which they operate, but it may also be bad luck, or related to the fact that our sensors are part of a pre-production pilot kit.
Finally, sticking coin-sized sensors onto metal surfaces is easy in the summer. It’s not always equally easy on rainy days or in winter, with cold fingers and icy or snow-covered components.
Other Things we Learned Along the Way
So far, our impression is that the data quality is good. The sensor readings are precise, and the sensors are mostly reliable. The snapshot from Grafana gives an impression of the data quality: as can be seen, there is very good correspondence between the temperature readings from the three different phases on busbar A after switching, when the busbar has been disconnected.
However, both the cloud connectors and the SaaS-based architecture are weak spots where redundancy is limited. If a cloud connector fails, we risk losing data from all the sensors communicating with that connector. This can to some extent be alleviated by using more cloud connectors. The SaaS architecture is more challenging: downtime on Disruptive’s servers sometimes affects the entire data flow from all their sensors.
Deployment is super easy, but an automated link to our ERP system would ease installation further, and significantly reduce the risk of human error when mapping sensors to assets.
The sensors are very small. This is cool, but if they were 10x larger with 10x stronger wireless signal, this would probably be better for many of our use cases.
Finally, communication can be challenging when dealing with large metal and concrete constructions. This goes for both the communication between sensor and cloud connector, and for the link between the cloud connector and the outside world. This is in most cases solvable, but may require some additional effort during installation.
Next Step: Monitor All Ageing GIS Facilities?
This pilot project has demonstrated that increased component monitoring can have a high value in the transmission grid.
Our main focus has been on temperature monitoring to assess component health, but the project has spurred quite a lot of enthusiasm in Statnett, and a number of other application areas have been suggested to us during the project.
One of the prime candidates for further rollout is to increase the monitoring of GIS substations, either by adding sensors to other parts of the facility, by selecting other substations for instrumentation, or both.
A further deployment of sensors must be aligned with other ongoing activities at Statnett, such as rollout of improved wireless communication at the substations. Nonetheless, we have learned that there are many valuable use cases for retrofitting sensors to the transmission grid, and we expect to take advantage of this kind of technology in the years to come.
If your goal is a more data driven organization, a group of six people in the Data Science department cannot do the task alone. In this post we describe how we developed a data science academy where a group of colleagues attended three months of training in data science.
How it all started
A few years ago, we established a Data Science department in Statnett. Being the transmission system operator of the Norwegian power system, Statnett operates approximately 11,000 km of power lines and cables, with about 150 substations. Every single day we gather vast amounts of data from our assets in the power system, and our goal is to enable more data-driven decisions.
We have made progress working together with core departments in Statnett to develop tools and machine learning models that solve advanced challenges. But we quickly realized that to become a more data-driven organization we needed to develop new skills throughout the company.
Luckily, we had a lot of colleagues who wanted to learn. Our Python classes were full, people attended our lectures and tested their skills on code katas.
But that wasn’t enough. A 2-3-hour course might be inspiring, but to truly be able to use data for daily analytical tasks our colleagues asked for more training. An idea was born.
Developing relevant data science skills
We could have chosen another strategy – to concentrate the data skills in the Data Science department. Issues around data quality and securing the right interpretation of large data sets would perhaps be more easily handled that way, but we truly believed in a decentralized approach. By building the right skills and tools in the core business, we could create more value and scale faster.
We set an ambitious goal: a group of employees in Statnett would attend a full-time Data Science academy for three months. They would have to leave their regular tasks behind and be fully committed to the classes.
The HR department supported the whole process, including the selection of employees. The application was open to everyone, but applicants needed support from their leaders. This was an important aspect, as we wanted the department of each attendee to commit to the program as well. There is no use in learning new skills if you are expected to go back to old habits after the program!
From Python to data wrangling
The content of the course was developed by us, and tightly linked to Statnett’s core business. That way we ensured a more interesting academy for our experienced colleagues. It quickly became clear to them how they could use data science tools in their daily work routine.
The academy was a mix of coding sessions, lectures followed by workshops, self-studies and group work. We focused on a group of important skills:
Basic programming, especially focusing on Python
Learning about and getting access to data sources, both inside Statnett and external
Data visualization and communicating results of data analysis
Data prep/data wrangling – the process of transforming raw data and gaining control over missing data
The classes were taught by employees in the Data Science department, in addition to shorter workshops and lectures by colleagues from across the company.
What we learned
Our goal was to use Statnett’s own data in all case studies and tasks. But at the beginning of the course we realized that some of the attendees needed more basic coding and data visualization skills, so we adjusted the progression: we focused more on the basic skills and saved the more advanced use cases from the organization for the end.
This is one of the reasons why we believe it’s fundamental to build data skills through an internally developed program. We were able to evaluate continuously and adjust the program to ensure the right outcome.
Participants from different parts of the company combine their specific domain knowledge with their new data science skills to solve use cases related to their day-to-day work.
We still haven’t decided what the next academy will be like, but most certainly we will start with the basic skills and build on that. A possible way forward is to admit more employees from the departments that have started using the data science tools to the next academy, to further build on what’s working.
Long term effects of learning new skills
Our attendees are now able to use data science tools to solve their day to day tasks more effectively, so they can focus their creativity on the more demanding tasks. Our goal is to ensure that employees across Statnett are inspired to learn more about the possibilities in coding and data visualization.
A relevant example is the vast number of Excel workbooks in use. They may have started as an effective way to solve certain tasks, but many have grown way past their reasonable size. These tasks can be solved more effectively with Python tools, where you update data across data sets all at once, always ensuring that every user knows which version is the latest. The hope is that these quite simple examples will inspire more people across the organization to learn.
Our first group of attendees is devoted to sharing their knowledge. In addition to initiating projects related to their new skill set, our ambition is that they will also involve other colleagues interested in learning. The course administrator will still be available as a resource for our alumni, helping them put their newly acquired skills to use in their daily work.
Better collaboration between IT and core business
A side effect of the academy is that data skills make you a better product owner. When employees in core business know more about the possibilities in programming, data visualization and machine learning, they can work better together with the IT department in further developing the products and services. In fact, this is an important aspect of digital transformation. Companies all over are looking to utilize their data assets in new ways. Some of this is best solved through large scale IT projects, but a lot of the business value is gained through simple data tools for continuous improvement.
What happens to the Data Science department when our colleagues can solve their own problems using data and coding? Well, our ambitions are to further build our capacity as a leader in our field. We just started working on a research and development project with reinforcement learning, and we will keep working on better methods and smarter use of data.
Our strategy was to build on talent from the business and hire fast learners when we started building a data science department at Statnett two years ago. Now, we have operated for a year with great achievements as well as setbacks and learning points. Read about our story in this post.
About two years ago I was tasked with building a data science team at Statnett, a government owned utility. Interesting data science tasks abounded, but most data science work was done by consultants or external parties as R&D projects. This approach has its merits, but there were two main reasons why I advocated building an in-house data science team:
Domain-specific knowledge, as well as knowing the ins and outs of the company, is important to achieve speed and efficiency. An in-house data science team can do a proof of concept in the time it takes to hire an external team, and an external team often needs a lot of time to get to know the data sources and domain before they start delivering. It is also a burden for the few people who have to teach new consultants the same things time and again; they can become a bottleneck.
Data science is about making the company more data driven, it is an integral part of developing our core business. If we are to succeed, we must change how people work. I believe this is easier to achieve if you are intimately familiar with the business and are in it for the long haul.
But how to build a thriving data science department from scratch? Attracting top talent can be tough, especially so when you lack a strong professional environment where your new hires can learn and develop their skills.
Build on talent from the business
Drew Conway’s Venn diagram: the primary colors of data are hacking skills, math and stats knowledge, and substantive expertise.
Prior to establishing the data science team, I led a team of power system analysts. These analysts work in the department for power system planning, which has strong traditions for working data-driven. Back in 2008 we adopted Python to automate our work with simple scripts; we didn’t use version control or write tests. Our use of Python grew steadily, and eventually we hired a software development consultant to aid in the development and help us establish better software development practices.
This was long before I was aware that there was anything called data science and machine learning. However, the skillset we developed – a mix of coding skills, domain expertise and math – is just what Drew Conway described as the “primary colors” of data science in his now famous Venn diagram from 2010.
Even the Economist commented on the rise of Python in a 2018 article.
A background as an analyst with Python skills proved to be a very good starting point for doing data science. The first two data scientists in Statnett came from the power system analysis department. I guess we were lucky to have started coding Python back in 2008, as this coincided perfectly with the rise of Python as the programming language for doing data science.
Still, most large companies have some sort of business analyst function, and this is a good place to find candidates for data science jobs. Even if they are unfamiliar with (advanced) coding, you get people that already know the business and should know basic math and statistics as well.
Hire for aptitude rather than skill
Having two competent analysts to kickstart the data science department also vastly improved our recruitment position. In my experience it is important for potential hires to know that they will have competent colleagues and a good work environment for learning. Therefore, Øystein, one of the “analyst-turned-data scientists”, attended all interviews and helped develop our recruitment strategy.
Basically, our strategy was to hire and develop fast learners, albeit with some background in at least two of the “bubbles” from the Venn diagram. Preferably the candidates should have complementary backgrounds. E.g. some strong coders, some more on the statistics side and some with a background in power systems.
We deliberately didn’t set any hard skill requirements for applicants. Requiring experience in machine learning or power systems would have narrowed the field down too much. Also, the technical side of data science is in rapid development. What is in vogue today, will be outdated in a year. So, it makes more sense to hire for aptitude rather than look for certifications and long experience with some particular technology. This emphasis on learning is also more attractive for the innovative talent that you need in a data science position.
Use interview-cases with scoring to avoid bias
During the first four months we conducted about 60 interviews with potential candidates. I had good experience using tests and cases in previous interview situations. First off, this forces you to be very specific about what you are looking for in a candidate and how to evaluate this. Secondly, it helps to avoid bias if you can develop an objective scoring system and stick to it. Thirdly, a good selection of tailored cases (not standard examples from the internet) serves to show off some interesting tasks that are present in the company. We used the ensuing discussions to talk about how we approached the cases in our company, thus advertising our own competence and work environment.
Such a process is not without flaws. Testing and cases can put some people off, and you are at risk of evaluating how candidates deal with a stressful test situation rather than how they will perform at work. Also, expect to spend a lot of time recruiting. I still think the benefits clearly outweigh the downsides. (By the way, maybe an aspiring data scientist should expect a data-driven recruitment process.)
We ended up with a set of tasks and cases covering:
Basic statistics/logic
Evaluate a data science project proposal
How to communicate a message with a figure
Statistical data visualization
Coding skills test (in language of choice)
Standardized aptitude tests
We did two rounds of interviews and spent about four hours in total per candidate for the full test set. In addition, some of the cases were prepared beforehand by the candidate. The cases separated the candidates well, and in retrospect there was little need for adjustments. In any case, it would have been problematic to change anything once we started: how you explain the tasks, what hints you give and so on should be consistent from candidate to candidate.
Focus on a few tasks at a time
The first year of operation has necessarily been one of learning. Everyone has had to learn the ropes of our new infrastructure. We have recently started using a Hadoop stack for our streaming data and data at rest. And we have been working hard to get our CI/CD pipelines working and to learn how to deploy containerized data science applications on OpenShift.
Even though the team has spent a lot of time learning the technology and domain, this was as expected. I would say the major learning point from the first year is one of prioritization: We took on too many tasks at once and we underestimated the effort of getting applications into production.
As I stated initially, interesting data science tasks abound, and it is hard to say no to a difficult problem if you are a data scientist. When you couple this with a propensity to underestimate the effort needed to get applications into production (especially on new infrastructure for the first time), you have a recipe for unfinished work and inefficient task switching.
So, we have made adjustments along the way to focus on fewer tasks and taking the time needed to complete projects and get them in production. However, this is a constant battle; there are always new proposals/needs/problems from the business side that are hard to say no to. At times, having a team of competent data scientists feels like a double-edged sword because there are so many interesting problems left unsolved. Setting clear goals and priorities, as simple as that sounds, is probably where I can improve the most going forward.
Get strong business involvement
I would also stress that it is paramount to have a strong presence from the business side to get the prioritizations right and to solve the right problems. This holds for all digital transformation, where, in my experience, the goal often has to be adjusted along the way. The most rewarding form of work is when our deliveries are continuously asked for and the results critiqued and questioned.
On the other hand, one of the goals with the data science department was to challenge the business side to adopt new data driven practices. This might entail doing proof of concepts that haven’t been asked for by the business side to prove that another way of operating is viable.
It can be difficult to strike the right balance between the two forms of work. In retrospect, we spent too much time initiating our own pilot projects and proofs of concept in the beginning. While they provide ample learning opportunities – which was important during our first year – it is hard to get such work implemented, and there is a real risk that you solve the wrong problem. A rough estimate, therefore, is that about 80% of our work should be initiated and led by the business units, with the representative from the data science team being an active participant, always looking for better ways to solve the issues at hand.
Scaling data science competency
From the outset, we knew that there would be more work than a central data science department could take on. Also, to become a more data-driven organization, we needed to develop new skills and a new mindset throughout the company, not only in one central department.
Initially we started holding introductory Python classes and data science discussions. We had a lot of colleagues who were eager to learn, but after a while it became clear that this approach had its limits. The lectures and classes were inspiring, but for the majority it wasn’t enough to change the way they work.
That’s why we developed a data science academy: ten employees leave their work behind for three months and spend all their time learning about data science. The lectures and cases are tailored to be relevant for Statnett and are held mainly by data scientists. The hope is that the attendees learn enough to spread data science competence in their own departments (with help from us). In this way we hope to spread data science competency across the company.
What have we achieved?
At times I have felt that progress was slow. Challenges with data quality have been ubiquitous. Learning unfamiliar technology by trial and error has at times brought development to a standstill. Very often, projects that we think are close to production ready have weeks of work left. This has given rise to the term “ten minutes left”, meaning an unspecified amount of work, often weeks or months.
But looking back, the accomplishments become clearer. There is a saying that “people overestimate what can be done in the short term, and underestimate what can be done in the long term” which applies here as well.
Some of the work that we have done builds on ideas from our time working on reliability analysis in power systems and is covered in other posts on this blog.
We have also spent a considerable effort on communication, both to teach data science to new audiences at Statnett, exchange knowledge and to promote all the interesting work that happens at Statnett. This blog, scientific papers, our data science academy and talks at meetups and conferences are examples of this.
Load forecasts are important input to operating the power system, especially as renewable energy sources become a bigger part of our power mix and new electrical interconnectors link the Norwegian power system to neighbouring systems. That is why we have been developing machine learning models to make predictions for power consumption. This post describes the process for developing models as well as how we have deployed the models to provide online predictions.
Load forecasting, also referred to as consumption prognosis, combines techniques from machine learning with knowledge of the power system. It’s a task that fits like a glove for the Data Science team at Statnett and one we have been working on for the last months.
We started with a three week long Proof of Concept to see if we could develop good load forecasting models and evaluate the potential of improving the models. In this period, we only worked on offline predictions, and evaluated our results against past power consumption.
The results from the PoC were good, so we were able to continue the work. The next step was to find a way to go from offline predictions to making online predictions based on near real time data from our streaming platform. We also needed to do quite a bit of refactoring and testing of the code and to make it production ready.
After a few months of work, we have managed to build an API to provide online predictions from our scikit learn and Tensorflow models, and deploy our API as a container application on OpenShift.
Good forecasts are necessary to operate a green power system
Balancing power consumption and power production is one of the main responsibilities of the National Control Centre in Statnett. A good load forecast is essential to obtain this balance.
The balancing services in the Nordic countries consist of automatic Frequency Restoration Reserves (aFRR) and manual Frequency Restoration Reserves (mFRR). These are activated to contain and restore the frequency. Until the Frequency Restoration Reserves have been activated, the inertia (rotating mass) response and system damping limit the frequency drop and keep it from reaching unacceptable levels.
The growth in renewable energy sources alters the generation mix; in periods, a large share of the generators will produce power without simultaneously providing inertia. Furthermore, imports on HVDC lines do not contribute inertia. Therefore, situations with large imports to the Nordic system can result in an insufficient amount of inertia in the system. A power system with low inertia is more sensitive to disturbances, i.e. it has smaller margins for keeping the frequency stable.
That is why Statnett is developing a new balancing concept called Modernized ACE (Area Control Error) together with the Swedish TSO Svenska kraftnät. Online prediction of power consumption is an integral part of this concept and is the reason we are working on this problem.
Can we beat the existing forecast?
Statnett already has load forecasts. The forecasts used today are proprietary software, so they are truly black box algorithms. If a model performs poorly in a particular time period, or in a specific area, confidence in the model can be hard to restore. In addition, there is a need for predictions for longer horizons than the current software is providing.
We wanted to see if we could measure up to the current software, so we did a three week stunt, hacking away and trying to create the best possible models in the limited time frame. We focused on exploring different features and models as well as evaluating the results to compare with the current algorithm.
At the end of this stunt, we had developed two models: one simple model and one complex black box model. We had collected five years of historical data for training and testing, and used the final year for testing of the model performance.
Base features from open data sources
The current models provide predictions for the next 48 hours for each of the five bidding zones in Norway. Our strategy is to train one model for each of the zones. All the data sources we have used so far are openly available:
weather observations can be downloaded from the Norwegian Meteorological Institute’s Frost API. We retrieve the parameters wind speed and air temperature from the observations endpoint, for specific weather stations that we have chosen for each bidding zone.
weather forecasts are available from the Norwegian Meteorological Institute’s thredds data server. There are several different forecasts, available as grid data, along with further explanation of the different forecasts and the post processing of parameters. We use the post processed unperturbed member (the control run) of the MetCoOp Ensemble Prediction System (MEPS), the meps_det_pp_2_5km-files. From the datasets we extract wind speed and air temperature at 2 m height for the closest points to the weather stations we have chosen for each bidding zone.
historical hourly consumption per bidding area is available from Nord Pool
The five bidding zones in Norway
Among the features of the models were:
weather observations and forecasts of temperature and wind speed from different stations in the bidding zones
weather parameters derived from observations and forecasts, such as windchill
historical consumption and derived features such as linear trend, cosine yearly trend, square of consumption and so on
calendar features, such as whether or not the day is a holiday, a weekday, the hour of the day and so on
Prediction horizon
As mentioned, we need to provide load forecasts for the next 48 hours, to replace the current software. We have experimented with two different strategies:
letting one model predict the entire load time series
train one model for each hour in the prediction horizon, that is one model for the next hour, one model for two hours from now and so on.
Our experience so far is that one model per time step performs better, but of course it takes more time to train and is more complex to handle. This gets worse if the frequency of the time series increases or the horizon is extended: a week-ahead load forecast, for example, would result in 168 models for each bidding zone.
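A minimal sketch of the one-model-per-time-step strategy, assuming a pandas DataFrame df with a consumption column and a list of feature columns (the Ridge model and the names are illustrative, not our production code):

import pandas as pd
from sklearn.linear_model import Ridge

HORIZON = 48  # hours ahead

def train_models_per_step(df, feature_cols):
    """Train one Ridge model per hour in the prediction horizon."""
    models = {}
    for step in range(1, HORIZON + 1):
        # The target is consumption `step` hours after the time of prediction
        target = df["consumption"].shift(-step)
        mask = target.notna()
        model = Ridge(alpha=1.0)
        model.fit(df.loc[mask, feature_cols], target[mask])
        models[step] = model
    return models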
Choosing a simple model for correlated features
We did some initial experiments using simple linear regression, but quickly experienced the problem of bloated weights: The model had one feature with a weight of about one billion, and another with a weight of about minus one billion. This resulted in predictions in the right order of magnitude, but with potential to explode if one feature varied more than usual.
Ridge regression is a form of regularization of a linear regression model, which is designed to mitigate this problem. For correlated features, the unbiased estimates for least squares linear regression have a high variance. This can lead to bloated weights making the prediction unstable.
In ridge regression, we penalize large weights, thereby introducing a bias in the estimates, but hopefully reducing the variance enough to produce reliable forecasts. In linear least squares we find the weights that minimize the error; in ridge regression we additionally penalize the error function for having large weights. That is, we add a term proportional to the sum of squares of the weights x1 and x2 to the error. This extra penalty can be illustrated with circles in the plane: the farther you are from the origin, the bigger the penalty.
In ridge regression, the idea is to minimize the error, but penalize large weight values. Larger weights correspond to larger circles in the plane.
For our Ridge regression model, we used the Ridge class from the Python library scikit learn.
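As an illustration, fitting such a model with scikit-learn can look like the snippet below. The data is synthetic and the alpha value is a placeholder; in practice the regularization strength is tuned:

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the real feature matrix and consumption series
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))  # e.g. temperatures, wind speeds, calendar features
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=1000)

# Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2, so a larger alpha shrinks the
# weights towards zero and tames the "bloated weights" problem described above
model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
model.fit(X, y)
print(model.predict(X[:5]))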
Choosing a neural network that doesn’t forget
Traditional neural networks are unable to remember information from earlier time steps, but recurrent neural networks (RNNs) are designed to do exactly that. Unlike the simpler feed-forward networks, where information can only propagate in one direction, RNNs introduce loops that carry information from earlier time steps. An analogy is seeing a series of images as a video, as opposed to considering each image separately. The loop of the RNN is unrolled to form the figure below, showing the structure of the repeating modules:
An RNN contains a single network layer, represented by the blue box, where information from the previous module is incorporated.
In practice, plain RNNs are only able to learn dependencies over short time periods. Long short-term memory networks (LSTMs) are a type of RNN that uses the same concept of loops, but the value lies in how the repeating module is structured.
One important distinction from plain RNNs is that the LSTM has a cell state that flows through the module affected only by linear operations. This is why an LSTM network doesn’t “forget” long-term dependencies.
The repeating layer in an LSTM network, where the state propagates through the module, shown by the grey line. The blue boxes are network layers and blue circles are pointwise operations.
For a thorough explanation of LSTM, check out this popular blog post by Chris Olah: Understanding LSTMs.
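For reference, a bare-bones LSTM regression model in Keras might look like the following. The window length, feature count and layer sizes are illustrative, not the configuration we use:

import numpy as np
from tensorflow import keras

TIMESTEPS, N_FEATURES = 24, 8  # e.g. 24 hourly steps of 8 base features

model = keras.Sequential([
    keras.layers.Input(shape=(TIMESTEPS, N_FEATURES)),
    keras.layers.LSTM(32),   # recurrent layer with the cell state described above
    keras.layers.Dense(1),   # predicted consumption for one time step
])
model.compile(optimizer="adam", loss="mse")

# Synthetic stand-in data, just to show the expected shapes
X = np.random.normal(size=(500, TIMESTEPS, N_FEATURES))
y = np.random.normal(size=(500, 1))
model.fit(X, y, epochs=2, batch_size=32, verbose=0)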
Results
To evaluate the models, we used the mean absolute percentage error, MAPE, which is a common metric in load forecasting. Our proof-of-concept models outperformed the current software when we used one year of test data, reducing the MAPE by 28% on average over all bidding zones.
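For completeness, MAPE is the mean of the absolute errors expressed as a percentage of the actual consumption. A small numpy helper along these lines computes it (our internal evaluation code may differ in details):

import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

print(mape([100, 200], [110, 190]))  # -> 7.5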
The average MAPE over all bidding zones was reduced from 5.1 % to 4.0 % when forecasting one day ahead (labeled D-1) and from 6.2 % to 4.2 % when forecasting two days ahead (labeled D-2).
We thought this was quite good for such a short PoC, especially considering that the test data was an entire year. However, we see that for the bidding zone NO1 (Østlandet), where the power consumption is largest, we need to improve our models.
How to evaluate fairly when using weather forecasts in features
We trained the models using only weather observations, but when we evaluated performance, we used weather forecasts for time steps after the prediction time. In the online prediction situation, observations are available for the past, but predictions must be used for the future. To compare our models with the current model, we must simulate the same conditions.
We have also considered training our model using forecasts, but we only have access to about a year of forecasts, therefore we have chosen to train on observations. Another possible side effect of training on forecasts, is the possibility of learning the strengths and weaknesses of a specific forecasting model from The Norwegian Meteorological Institute. This is not desirable, as we would have to re-learn the patterns when a new model is deployed. Also, the model version (from the Meteorological Institute) is not available as of now, so we do not know when the model needs re-training.
Our most recent load forecast
By now we have made some adjustments to the model, and our forecasts for NO1 from the LSTM model and Ridge regression model are shown below. Results are stored in a PostgreSQL database and we have built a Grafana dashboard visualizing results.
The plot shows our newest load forecasts for the area NO1 (Østlandet) as of 20.12 17:00. For the hours in the past, the forecast from the hour before is shown. The shaded areas are the minimum and maximum values the previous forecasts have been, the bold green line is the actual consumption. The dashed line shows the forecast from 24 hours ago.
How to deploy real time ML models to OpenShift
The next step in our development process was to create a framework for delivering online predictions and deploy it. This also included refactoring the code from the PoC. We worked together with a development team to sketch out an architecture for fetching our data from different sources to delivering a prediction. The most important steps so far have been:
Developing and deploying the API early
Setting up a deployment pipeline that automatically deploys our application
Managing version control for the ML models
Develop and deploy the API early
“API first” development (or at least API early, as in our case) is often referenced as an addition to the twelve-factor app concepts, as it helps both client and server side define the direction of development. A process with contracts and requirements sounds like a waterfall process, but our experience from this project is that in reality, it supports an agile process as the requirements are adapted during development. Three of the benefits were:
API developers and API users can work in parallel. Starting with the API has let the client and server side continue work on both sides, as opposed to the client side waiting for the server side to implement the API. In reality, we altered the requirements as we developed the API and the client side implemented their use of it. This has been a great benefit, as we have uncovered problems, and we have been able to adjust the code early in the development phase.
Real time examples to end users. Starting early with the API allows us to demonstrate the models with real time examples to the operators who will be using the solution, before the end of model development. The operators will be able to start building experience and understanding of the forecasts, and give feedback early.
Brutally honest status reports. Since the results of the current models are available at any time, this provides a brutally honest status report of the quality of the models for anyone who wants to check. With the models running live on the test platform, we have some time to discover peculiarities of different models that we didn’t explore in the machine learning model development phase, or that occur in online prediction and not necessarily in the offline model development.
Separate concerns for the ML application and integration components
Our machine learning application provides an API that only performs predictions when a client sends a request with input data. Another option could be to have the application itself fetch the data and stream the results back to our streaming platform, Kafka.
We chose not to include fetching and publishing data inside our application in order to separate concerns of the different components in the architecture, and hopefully this will be robust to changes in the data sources.
The load forecasts will be extended to include Sweden as well. The specifics of how and where the models for Sweden will run are not decided, but our current architecture does not require the Swedish data input. Output is written back to Kafka topics on the same streaming platform.
The many components in our load forecast system architecture. The machine learning application only delivers predictions when a request is made.
The tasks we rely on are:
Fetching data from the different data sources we use. These data are published to our Kafka streaming platform.
Combine relevant data and request a forecast when new data is available. A Java application is scheduled to run each hour, when new data arrives.
Make the predictions when requested. This is where the ML magic happens, and this is the application we are developing.
Publish the prediction response for real time applications. The Java application that requests forecasts also publishes the results to Kafka. We imagine this is where the end-user applications will subscribe to data.
Store the prediction response for post-analysis and monitoring.
How to create an API for your ML models
Our machine learning application provides an API where users can supply the base features each model uses and receive the load forecast for the input data they provided. We created the API using the Python library Flask, a minimal web application framework.
Each algorithm runs as a separate instance of the application, on the same code base. Each of these algorithm instances has one endpoint for prediction. The endpoint takes the bidding zone as an input parameter and routes the request to the correct model.
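A minimal sketch of such an endpoint is shown below. The module layout, the MODELS registry and the payload format are made up for illustration; the real application is organized with blueprints, as described next:

import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical registry, populated at startup with one trained model per bidding zone
MODELS = {}

@app.route("/predict/<bidding_zone>", methods=["POST"])
def predict(bidding_zone):
    model = MODELS.get(bidding_zone)
    if model is None:
        return jsonify({"error": f"unknown bidding zone {bidding_zone}"}), 404
    # The base features arrive as JSON and are converted to a data frame
    features = pd.DataFrame(request.get_json())
    forecast = model.predict(features)
    return jsonify({"bidding_zone": bidding_zone, "forecast": forecast.tolist()})

if __name__ == "__main__":
    app.run(port=5000)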
Two tips for developing APIs
Modularize the app and use blueprints with Flask to prevent chaos. Flask makes it easy to quickly create a small application, but the downside is that it is also easy to create a messy application. After first creating an application that worked, we refactored the code to modularize it. Blueprints are templates that allow us to create re-usable components.
Test your API using mocking techniques. As we refactored the code and tried writing good unit tests for the API, we used the mock library to mock out objects and functions so that we can separate which functionality the different tests test and run our tests faster.
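As an example of the second tip, a unit test can patch out the model registry so that only the routing and response handling are exercised. The module name forecast_api refers to the hypothetical sketch above, not our actual code base:

from unittest import mock

import numpy as np

def test_predict_endpoint_returns_forecast():
    fake_model = mock.Mock()
    fake_model.predict.return_value = np.array([42.0])

    # Patch the registry so no real model is needed for the test
    with mock.patch.dict("forecast_api.MODELS", {"NO1": fake_model}):
        from forecast_api import app
        client = app.test_client()
        response = client.post("/predict/NO1", json=[{"temperature": -5.0}])

    assert response.status_code == 200
    assert response.get_json()["forecast"] == [42.0]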
Where should derived features be calculated?
We have chosen to calculate derived features (i.e. rolling averages of temperature, lagged temperature, calendar features and so on) within the model, based on the base features provided as input data. This could have been done outside of the models, for example by the Java application that integrates the ML and streaming, or by the Kafka producers. There are two main reasons why we have chosen to keep these calculations inside our ML API:
The feature data is mostly model specific, and lagged data causes data duplication. This is not very useful for anyone outside of the specific model, and therefore it doesn’t make sense to create separate Kafka topics for these data or calculate them outside of the application.
Since we developed the API at an early stage, we had (and still have) some feature engineering to do, and we did not want to be dependent on someone else to provide all the different features that we wanted to explore before we could deploy a new model.
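A sketch of what such feature operators can look like with pandas; the operator names and window lengths are illustrative:

import pandas as pd

def rolling_mean(series, hours):
    """Rolling average of the series over the given number of hours."""
    return series.rolling(window=hours, min_periods=1).mean()

def lag(series, hours):
    """The value of the series the given number of hours earlier."""
    return series.shift(hours)

def calendar_features(index):
    """Simple calendar features derived from a DatetimeIndex."""
    return pd.DataFrame(
        {"hour": index.hour, "weekday": index.weekday, "is_weekend": index.weekday >= 5},
        index=index,
    )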
Class hierarchy connects general operators to specific feature definitions
The calculations of derived features are modularized in a feature operator module to ensure that the same operations are done in the same way everywhere. The link that applies these feature operators to the base features is implemented using class methods in a class hierarchy.
We have created a class hierarchy as illustrated in the figure above. The top class, ForecastModels, contains class methods for applying the feature operators to the input data in order to obtain the feature data frame. Which base features to use, and which feature operators to apply to them, are specified in the child classes.
The child classes at the very bottom, LstmModel and RidgeRegression, inherit from their respective Keras and scikit learn classes as well, providing the functionality for the prediction and training.
As Tensorflow models have a few specific requirements, such as normalising the features, this is implemented as a class method for the class TensorFlowModels. In addition, the LSTM models require persisting the state for the next prediction. This is handled by class methods in the LstmModel class.
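A simplified sketch of this hierarchy, using the class names from the text but with made-up method bodies and attributes:

import pandas as pd
from sklearn.linear_model import Ridge

class ForecastModels:
    """Top class: applies feature operators to the base features."""

    base_features = []      # names of base feature columns, set in child classes
    feature_operators = []  # e.g. [(rolling_mean, 24), (lag, 1)], set in child classes

    @classmethod
    def build_features(cls, base_data: pd.DataFrame) -> pd.DataFrame:
        features = base_data[cls.base_features].copy()
        for operator, hours in cls.feature_operators:
            for column in cls.base_features:
                features[f"{column}_{operator.__name__}_{hours}"] = operator(base_data[column], hours)
        return features

class TensorFlowModels(ForecastModels):
    """Adds TensorFlow-specific needs, such as normalising the features."""

    @classmethod
    def normalise(cls, features: pd.DataFrame) -> pd.DataFrame:
        return (features - features.mean()) / features.std()

class LstmModel(TensorFlowModels):
    """LSTM model; in the real code this also inherits from the Keras model class
    and persists the cell state between predictions."""

class RidgeRegression(ForecastModels, Ridge):
    """Ridge regression; prediction and training come from scikit-learn's Ridge."""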
Establish automatic deployment of the application
Our deployment pipeline consists of two steps, which run each time we push to our version control system:
run tests
deploy the application to the test environment if all tests pass
We use GitLab for version control, and have set up our pipeline using the integrated continuous integration/continuous deployment pipelines. GitLab also integrates nicely with the container platform we use, OpenShift, allowing for automatic deployment of our models through a POST request to the OpenShift API, which is part of our pipeline.
Automatic deployment has been crucial in the collaboration with the developers who make the integrations between the data sources and our machine learning framework, as it lets us test different solutions easily and get results of improvements immediately. Of course, we also get immediate feedback when we make mistakes, and then we can hopefully fix it while we remember what the breaking change was.
How we deploy our application on OpenShift
Luckily, we had colleagues who were a few steps ahead of us and had just deployed a Flask application to OpenShift. We used the source-to-image build strategy, which builds a Docker image from source code, given a git repository. The Python source-to-image builder is even compatible with the environment management package we have used, Pipenv, and automatically starts the container running the script app.py. This allowed us to deploy our application without any adaptations to our code base other than setting an environment variable to enable Pipenv. The source-to-image strategy was perfect for this simple use case, but we are now moving towards a Docker build strategy because we need more options in the build process.
For the LSTM model, we also needed a way to store the state independently of deployments of the application. As a first step, we have chosen to store the states as flat files in persistent storage that survives new deployments of our application.
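As an illustration, persisting the state could be as simple as the sketch below, assuming a persistent volume mounted at a path like /data; the path, file name and state format are illustrative.

[code]
import numpy as np

STATE_FILE = "/data/lstm_state_no1.npz"  # on a volume that survives redeployments

def save_state(hidden: np.ndarray, cell: np.ndarray) -> None:
    np.savez(STATE_FILE, hidden=hidden, cell=cell)

def load_state():
    state = np.load(STATE_FILE)
    return state["hidden"], state["cell"]
[/code]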
How do we get from test to production?
For deployment from the test to the production environment, we will set up a manual step to be triggered after verifying that everything is working as expected in the test environment. The plan as of now is to publish the image running in the test environment in OpenShift to Artifactory. Artifactory is a repository manager we use for hosting internally developed Python packages and Docker images that our OpenShift deployments can subscribe to. The applications running in the OpenShift production environment can then be deployed from the updated image.
Flask famously warns us not to use its built-in server in production because it doesn’t scale. In our setting, one Flask application will receive one request per hour per bidding zone, so the load is modest. However, if need be, we plan to use the Twisted web server for our Flask application. We have used Twisted previously when deploying a Dash web application, which is also based on Flask, and found that it worked well without too much effort.
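A sketch of what that could look like is shown below, assuming the Flask app object from app.py; the port number is arbitrary.

[code]
from twisted.internet import reactor
from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource

from app import app  # the Flask application

# Wrap the Flask WSGI app in a Twisted resource and serve it.
resource = WSGIResource(reactor, reactor.getThreadPool(), app)
reactor.listenTCP(8080, Site(resource))
reactor.run()
[/code]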
Future improvements
Choosing the appropriate evaluation metric for models
In order to create a forecast that helps the balancing operators as much as possible, we need to optimize our models using a metric that reflects their needs. For example, what is most important: predicting the level of the daily peak load, or predicting when the peak occurs? Is it more important to predict the outliers correctly, or should we focus on the overall level? This must be decided in close cooperation with the operators, by illustrating how different models perform both in the general and in the special cases.
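As an illustration of this trade-off, the sketch below computes two candidate metrics per day: the error in the level of the daily peak and the error in its timing. The function and column names are ours.

[code]
import pandas as pd

def daily_peak_errors(actual: pd.Series, forecast: pd.Series) -> pd.DataFrame:
    """actual and forecast are hourly load series sharing a DatetimeIndex."""
    rows = []
    for day, observed in actual.groupby(actual.index.date):
        predicted = forecast.loc[observed.index]
        rows.append({
            "day": day,
            "peak_level_error_mw": predicted.max() - observed.max(),
            "peak_timing_error_h": abs((predicted.idxmax() - observed.idxmax()).total_seconds()) / 3600,
        })
    return pd.DataFrame(rows)
[/code]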
Understanding model behavior
Understanding model behavior is important both when developing models and for building trust in them. We are looking into the SHAP framework, which connects Shapley values with LIME to explain individual model predictions. The framework provides plots that we think will be very useful for both balancing operators and data scientists.
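A sketch of how this could look is given below, with a stand-in ridge model and random data so the example is self-contained; in practice the explainer would wrap one of our forecast models and the real feature frame.

[code]
import numpy as np
import pandas as pd
import shap
from sklearn.linear_model import Ridge

# Stand-in data and model so the example runs on its own.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["temperature", "temp_lag_24h", "hour"])
y = 2 * X["temperature"] - X["temp_lag_24h"] + rng.normal(size=500)
model = Ridge().fit(X, y)

explainer = shap.KernelExplainer(model.predict, shap.sample(X, 100))
shap_values = explainer.shap_values(X.iloc[:24])   # explain the next 24 hours
shap.summary_plot(shap_values, X.iloc[:24])
[/code]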
Including uncertainty in predictions
There is of course uncertainty in the weather forecasts, and the Norwegian Meteorological Institute provides ensemble forecasts. These 10 forecasts are obtained by running the same model with perturbed initial states, thereby illustrating the uncertainty in the weather forecast. For certain applications of the load forecast, it could be helpful to get an indication of its uncertainty as well, for example by using these ensemble forecasts to create different predictions and illustrating the spread.
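The sketch below shows one possible way to do this, assuming a fitted model with a predict method and one feature frame per ensemble member; the names and percentiles are our choice.

[code]
import numpy as np

def forecast_with_spread(model, ensemble_features):
    """ensemble_features: a list of feature frames, one per ensemble member."""
    predictions = np.column_stack([model.predict(x) for x in ensemble_features])
    return {
        "median": np.median(predictions, axis=1),
        "p10": np.percentile(predictions, 10, axis=1),
        "p90": np.percentile(predictions, 90, axis=1),
    }
[/code]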
An important aspect of digital transformation for a utility company is to make data-driven decisions about risk. In electric power systems, the hardest part in almost any decision is to quantify the reliability of supply. That is why we have been working on methodology, data quality and simulation tools with this in mind for several years at Statnett. This post describes our simulation tool for long-term probabilistic reliability analysis called MONSTER.
The need for reliable power system services is ever increasing as technology advances. At the same time, the focus on carefully analysed and socio-economically profitable investments has perhaps never been more essential. In light of this, several initiatives focus on using probabilistic criteria as the framework for reliability analysis.
Here I’ll repeat the basics, ignore most of the maths, and expand on some details that were left out of the paper.
Our tool consists of two main parts: a Monte Carlo simulation tool and a Contingency analysis module. The main idea in the simulation part is to use hourly time series of failure probability for each component.
We do Monte Carlo simulations of the power system where we draw component failure times and durations, and then let the Contingency analysis module calculate the consequence of each of these failures. The Contingency evaluation module is essentially a power system simulation using optimal loadflow to find the appropriate remedial actions, and the resulting “lost load” (if any).
The end result of the whole simulation is typically expressed as expected loss of load, expected socio-economic costs or distributions of the same parameters. Such results can, for instance, be used to make investment decisions for new transmission capacity.
Time series manager
The main principle of the Time series manager is to first use failure observations to calculate individual failure rates for each component by using a Bayesian updating scheme, and then to distribute the failure rates in time by using historical weather data.  We have described the procedure in detail in a previous blog post.
We first calculate individual, annual failure rates using observed failure data from the power system. Then, for each component, we calculate an hourly time series for the probability of failure. The core step in this approach is to calculate the failure probabilities from reanalysis weather data such that the resulting expected number of failures is consistent with the long-term annual failure rate calculated in the Bayesian step. This amounts to spreading the failure rate in time so that we get one probability of failure at each time step. In this way, components that experience the same adverse weather will have elevated probabilities of failure at the same time. This phenomenon is called failure bunching, and its effect is illustrated in the figure below. As a result, we get a consistent and realistic geographical and temporal development of failure probabilities throughout the simulation period.
Failure bunching dramatically increases the probability of multiple independent failures happening at the same time.
In our model we divide each fault into eight different categories: a fault is either temporary or permanent, and each failure can be due to wind, lightning, snow/icing, or be unrelated to weather. Thus, for each component we get eight historical time series to be used in the simulations.
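Returning to the normalisation described above, a minimal sketch of that step is shown below, assuming we already have an hourly "exposure" series derived from the reanalysis weather data. The function name and the exposure input are illustrative; the actual weather-to-hazard mapping is described in the earlier blog post.

[code]
import numpy as np

def hourly_failure_probabilities(annual_rate: float, exposure: np.ndarray) -> np.ndarray:
    """Spread an annual failure rate over the weather history so that the
    expected number of failures equals annual_rate * number_of_years."""
    years = len(exposure) / 8760
    return exposure / exposure.sum() * annual_rate * years
[/code]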
Biases in the fault statistics
There are two data sources for this part of the simulation: observed failure data from the national fault statistics database FASIT, and approximately 30 years of reanalysis weather data calculated by Kjeller Vindteknikk. The reanalysis dataset is nice to work with from a data science perspective: as it is simulated data, there are no missing values, no outliers, and the data quality is uniform.
The fault statistics, however, contain lots of free-text columns, missing values and inconsistent reporting (done by different people across the nation over two decades). So a lot of data preparation is necessary to go from reported disturbances to statistics relevant for simulation. There are also some biases in the data to be aware of:
There is a bias towards too long repair/outage times. This happens because most failures don’t lead to any critical consequences, so we spend more time than strictly necessary repairing the component or otherwise restoring normal operation. For our simulation we want the repair time we would have needed in case of a critical failure. We have remedied this to some extent by eliminating some of the more extreme outliers in our data.
There is a bias towards reporting too low failure rates; only failures that lead to an immediate disturbance in the grid count as failures in FASIT. Failures that lead to controlled disconnections are not counted as failures in this database, but are relevant for our simulations if the disconnection is forced (as opposed to postponable).
The two biases counteract each other, and in addition this kind of epistemic uncertainty can be mitigated by doing sensitivity analysis.
Improving the input-data: Including asset health
Currently, we don’t use component specific failure rates. For example, all circuit breakers have the same failure rate regardless of age, brand, operating history etc. For long term simulations (like new power line investment decisions) this is a fair assumption as components are regularly maintained and swapped out as time passes by.
The accuracy of the simulations, especially in the short to medium term, can be greatly improved by including some sort of component-specific failure rate. Indeed, this is an important use case in its own right for risk-based asset maintenance.
Monte Carlo simulation
A Monte Carlo simulation consists of a large number of random possible/alternative realizations of the world, typically on the order of 10,000 or more. In our tool, we do Monte Carlo simulations for the time period where we have weather data, denoted by k years, where k ≈ 30. For each simulation, we first draw the number N of failures in this period for each component and each failure type, using the negative binomial distribution. Then we distribute these failures in time by drawing with replacement from the k × 8760 hours in the simulation period, such that the probability of drawing any particular hour is proportional to the time series of probabilities calculated earlier. As the time resolution for the probabilities is hourly, we also draw a uniform random start time within the hour. Finally, for each of these failures we draw a random outage duration:
[code]
for each simulation do
    for each component and failure type in study do
        draw the number N of failures
        draw N failures with replacement
        for each failure do
            draw random uniform start time within hour
            draw duration of failure
[/code]
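For illustration, a runnable numpy version of this sampling loop for a single component and failure type could look like the sketch below. The negative binomial parametrisation and the exponential duration model are our simplifications.

[code]
import numpy as np

rng = np.random.default_rng(42)

def draw_failures(p_hourly, mean_failures, dispersion, mean_duration_h, n_sims=10_000):
    """p_hourly: hourly failure probabilities over the k-year weather period."""
    weights = p_hourly / p_hourly.sum()
    simulations = []
    for _ in range(n_sims):
        # number of failures in the period, negative binomial with the given mean
        n = rng.negative_binomial(dispersion, dispersion / (dispersion + mean_failures))
        # place the failures in time, proportionally to the hourly probabilities
        hours = rng.choice(len(p_hourly), size=n, replace=True, p=weights)
        starts = hours + rng.uniform(0, 1, size=n)             # uniform start within the hour
        durations = rng.exponential(mean_duration_h, size=n)   # illustrative duration model
        simulations.append((starts, durations))
    return simulations
[/code]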
Convergence of the most and least frequent single- and double failure rates for the first 1000 simulations.
Outage manager
In addition to drawing random failures, we also have to incorporate maintenance planning. This can for example be done either as an explicit manual plan or through an optimization routine that distributes the maintenance outages according to needs and failure probabilities.
How detailed the maintenance plan is modelled typically depends on the specific area of study. In some areas, the concern is primarily a failure occurring during a few peak load hours when margins are very tight. At such times it is safe to assume that there is no scheduled maintenance, and we disregard it entirely in the model. In other areas, the critical period might be when certain lines are out for maintenance, and so it becomes important to model maintenance realistically.
For each simulation, we step through the points in time where something happens and collect which components are disconnected. In this part we also add points in time where an outage passes some predefined duration, called an outage phase in our terminology. We do this to be able to apply different remedial measures depending on how much time has passed since the failure occurred. The figure below illustrates the basic principles.
Collecting components and phases into the contingency list. Dashed lines indicate phase 1 and solid lines phase 2. In this example the two outages contribute to five different contingencies. In phase 1, only automatic remedial measures (like system protection) can be applied. In phase 2, manual measures, like activation of tertiary reserves, can be applied.
One important assumption is that each contingency (like the five contingencies in the figure above) can be evaluated independently. This makes the contingency evaluation, which is the computationally heavy part, embarrassingly parallel. Finally, we eliminate all redundant contingencies. We end up with a pool of unique contingencies that we send to the Contingency Analysis module for load flow calculations in order to find interrupted power.
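As an illustration, eliminating redundant contingencies can be as simple as keying each contingency on the set of disconnected components and the outage phase, as in the sketch below; the names are ours.

[code]
def unique_contingencies(events):
    """events: iterable of (disconnected_components, phase) pairs collected
    while stepping through the simulated outage history."""
    seen = set()
    pool = []
    for components, phase in events:
        key = (frozenset(components), phase)
        if key not in seen:
            seen.add(key)
            pool.append(key)
    return pool
[/code]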
Contingency analysis
Our task in the Contingency analysis module is to mimic the system operator’s response to failures. This is done in four steps:
Fast AC screening to filter out contingencies that don’t result in any thermal overloads. If there are overloads, proceed to the next step.
Solve the initial AC load flow; if it converges, move to the next step. In case of divergence (due to voltage collapse), attempt to emulate the effect of the collapse with a heuristic and report the consequence.
Apply remedial measures with a DC optimal load flow, report all measures applied, and go to the next step.
Shed load iteratively using AC load-flow until no thermal overloads remain. Report lost load.
In step 3 above, there are many different remedial measures that can be applied:
Reconfiguration of the topology of the system
Reschedule production from one generator to another
Reduce or increase production in isolation
Change the load at given buses
For the time being, we do not consider voltage or stability problems in our model apart from the rudimentary emulation of full voltage collapse. However, it is possible to add system protection to shed load for certain contingencies (which is how voltage stability problems are handled in real life).
As a result from the contingency analysis, we get the cost of applying each measure and the interrupted power associated with each contingency.
Load flow models and OPF algorithm
Currently we use Siemens PTI PSS/E to hold our model files and to solve the AC load flow in steps 1, 2 and 4 above. For the DC optimal load flow, we express the problem with power transfer distribution factors (PTDF) and solve it as a mixed integer linear program using a commercial solver.
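To give an idea of the formulation, the sketch below solves a heavily simplified version of this problem with load shedding as the only remedial measure, using scipy's LP solver instead of the commercial MILP solver. The PTDF matrix is assumed to be given relative to a slack bus, and all names are illustrative.

[code]
import numpy as np
from scipy.optimize import linprog

def minimal_load_shed(ptdf, base_flow, flow_limit, load, voll):
    """ptdf: (lines x buses) matrix, base_flow and flow_limit per line (MW),
    load and voll (value of lost load) per bus. Returns MW shed at each bus."""
    # Shedding s at a bus changes the line flows by ptdf @ s; keep all flows within limits.
    a_ub = np.vstack([ptdf, -ptdf])
    b_ub = np.concatenate([flow_limit - base_flow, flow_limit + base_flow])
    result = linprog(c=voll, A_ub=a_ub, b_ub=b_ub,
                     bounds=list(zip(np.zeros_like(load), load)), method="highs")
    return result.x
[/code]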
Cost evaluation – post-processing
Finally, having established the consequence of each failure, we calculate the socio-economic costs associated with each interruption, following the Norwegian regulation. The general principle is described in this 2008 paper, but the interruption costs have been updated since then. For each failure, we know where in the grid there are disconnections and how long they last, and assuming we know the type of users connected to the buses in our load-flow model, the reliability indices for the different buses are easy to calculate. Due to the Monte Carlo approach, we get a distribution of each of the reliability indices.
Result example: failure rate and lost load averaged over all simulations for some contingencies (black dots).
A long and winding development history
MONSTER has been in development for several years at Statnett, after an initial study found that no commercial tools were available that fit the bill. At first we developed two separate tools: a Monte Carlo simulation tool in Matlab, used to find the probability of multiple contingencies, and a Python AC contingency evaluation module built on top of PSS/E, used to do contingency analysis with remedial measures. Both tools were built to solve specific tasks in the power system analysis department. However, combining the tools was a cumbersome process that involved large spreadsheets.
At some point, we decided to do a complete rewrite of the code. The rewriting process has taught us much of what we know today about modern coding practices (like 12-factor apps), and it has helped build the data science team at Statnett. Now, the backend is written in Python, the front end is a web app built around Node.js and Vue, and we use a SQL database to persist settings and results.
A screenshot from the application where the user defines substation configuration.