Is bid filtering effective against network congestion?

Earlier this year, I wrote an introduction to the bid filtering problem and explained how my team at Statnett are trying to solve it. The system we’ve built combines data from various sources in its attempt to make the right call. But how well is it doing its job? Or, more precisely, what is the effect on network congestion of applying our bid filtering system in its current form?

Kyoto. Photo: Belle Co

Without calling it a definitive answer, a paper I wrote for the CIGRE Symposium contains research results that provide new insight. The symposium was in Kyoto, but a diverse list of reasons (including a strict midwife) forced me to leave the cherry blossom to my imagination and test my charming Japanese phrases from a meeting room in Trondheim.

A quick recap

European countries are moving toward a new, more integrated way of balancing their power systems. In a country with highly distributed electricity generation, we want to automatically identify power reserves that should not be used in a given situation due to their location in the grid. If you would like to learn the details about the approach, you are likely to enjoy reading the paper. Here is the micro-version:

To identify bids in problematic locations, we start from a detailed network model, try to predict the future situation in the power grid, and then apply a nodal market model that gives us the optimal plan of balancing activations for that specific situation. But since we don’t really know how much power will flow into or out of the country, we optimize many times, with different assumptions about cross-border flows. Each of the exchange scenarios tells its own story about which bids should, and shouldn’t, be activated. The scenarios don’t always agree, but in aggregate they let us form a consensus result that determines which bids will be made unavailable for selection in the balancing market.
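To make the consensus step concrete, here is a minimal sketch in Python. The two-thirds voting threshold and the bid IDs are assumptions for illustration, not the logic of our production system:

```python
from collections import Counter

def consensus_filter(scenario_results, threshold=2 / 3):
    """scenario_results: one set of 'problematic' bid IDs per exchange scenario.
    Returns the bids to make unavailable in the balancing market."""
    votes = Counter(bid for flagged in scenario_results for bid in flagged)
    n_scenarios = len(scenario_results)
    # A bid is filtered out when enough scenarios agree that it is problematic.
    return {bid for bid, count in votes.items() if count / n_scenarios >= threshold}

# Three exchange scenarios voting on which bids look problematic
scenarios = [{"bid_17", "bid_23"}, {"bid_17"}, {"bid_17", "bid_42"}]
print(consensus_filter(scenarios))  # -> {'bid_17'}
```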

An unfair competition

Today, human operators at Statnett select power reserves for activation when necessary to balance the system, always mindful of their locations in the grid and potential bottlenecks. Their decisions on which balancing bids to activate (and not activate) often build on years of operational experience and an abundance of real-time data.

Before discussing whether our machine can beat the human operators, it’s important to keep in mind that the bid filtering system will operate in a different context: the new balancing market, where everyday balancing will take place without the involvement of human operators. This will change the rules of the balancing game completely. While human operators constantly make a stream of integrated, last-minute decisions, the new automatic processes are split into separate concerns and must often act much earlier to respect strict timelines.

Setting up simulations

The quantitative results in our paper come from simulating one day in the Norwegian power grid, using our detailed, custom-built Python model together with recorded data. The balancing actions, and the way they are selected, differ between the simulations.

The first simulation is Historical operation. Here, we simply replay the historical balancing decisions of the human operators.

The second simulation is Bid filtering. Here, we replace the historical human decisions with balancing actions selected by a zonal market mechanism that doesn’t see the internal network constraints or respect the laws of physics. The balancing decisions will often differ from the human ones in order to save some money. But before the market selects any bids, some of them are removed from the list by our bid filtering machine in order to prevent network congestion. We try not to cheat: the bid filtering takes place using only data and forecasts available 30 minutes before the balancing actions take effect.

The third simulation is No filtering. Here we try to establish the impact on congestion of moving from today’s manual, but flexible operation to zonal, market-based balancing. This simulation is a parallel run of the market-based selection, but without pre-filtering any bids, and it provides a second, possibly more relevant benchmark.
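To make the difference between the two market-based simulations concrete, here is a toy merit-order selection in Python. The bids, prices and the simple cheapest-first rule are illustrative assumptions, not the actual market algorithm:

```python
# Toy zonal merit-order activation (illustrative assumptions only).
def select_bids(bids, demand_mw, unavailable=frozenset()):
    """Pick the cheapest upward bids until demand is covered,
    skipping any bids the filter has made unavailable."""
    selected, covered = [], 0.0
    for bid in sorted(bids, key=lambda b: b["price"]):
        if bid["id"] in unavailable or covered >= demand_mw:
            continue
        selected.append(bid["id"])
        covered += bid["volume_mw"]
    return selected

bids = [
    {"id": "A", "price": 35.0, "volume_mw": 50},
    {"id": "B", "price": 40.0, "volume_mw": 50},
    {"id": "C", "price": 55.0, "volume_mw": 50},
]

print(select_bids(bids, demand_mw=100))                     # No filtering: ['A', 'B']
print(select_bids(bids, demand_mw=100, unavailable={"B"}))  # Bid filtering: ['A', 'C']
```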

Example from 09:30 on August 25, 2021. Red cells are balancing bids made unavailable in the bid filtering simulation. As a result, the market-based balancing will not select exactly the same bids in the Bid filtering scenario (black dots) and the No filtering scenario (white dots).

Power flow analyses

The interesting part of the simulation is when we inject the balancing decisions into the historical system state and calculate all power flows in the network. Comparing these flows to the operational limits reveals which balancing approaches are doing a better job at avoiding overloads in the network.
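The sketch below shows the flavour of that calculation using a linearized (DC) power flow based on a PTDF matrix. The matrix, injections and limits are made up for illustration; the analyses behind the paper are far more detailed:

```python
import numpy as np

# Illustrative PTDF (power transfer distribution factor) matrix:
# rows = monitored branches, columns = network nodes. All numbers are made up.
ptdf = np.array([
    [0.6, -0.2, 0.1],
    [0.3, 0.4, -0.3],
])
limits_mw = np.array([200.0, 250.0])

base_injection_mw = np.array([300.0, -150.0, -150.0])  # historical system state
balancing_mw = np.array([80.0, 0.0, -80.0])            # activated balancing bids

flows_mw = ptdf @ (base_injection_mw + balancing_mw)
overload_mw = np.clip(np.abs(flows_mw) - limits_mw, 0.0, None)
print(flows_mw)     # [235. 123.]
print(overload_mw)  # [35.  0.]
```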

Example from 09:30 on August 25, 2021 showing reliability limits. Reliability limits in Norway restrict the flow on a combination of transmission lines, so-called Power Transfer Corridors (PTCs). These 13 PTC constraints are violated in one or more of the simulations.

The overloads are similar across the simulations, but they are not identical. To better understand the big picture, we created a congestion index that summarizes the resulting overload situation in a single value. The number doesn’t have a physical interpretation, but it gives a relative indication of how severe the overload situation is.
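The exact definition is in the paper; as a purely illustrative stand-in, one could imagine summing the relative overloads of all violated constraints, along the lines of:

```python
def congestion_index(flows_mw, limits_mw):
    """Hypothetical index (not the formula from the paper): the sum of
    relative overloads across all monitored PTC constraints."""
    index = 0.0
    for flow, limit in zip(flows_mw, limits_mw):
        overload = max(abs(flow) - limit, 0.0)
        index += overload / limit
    return index

print(congestion_index([420.0, 180.0], [400.0, 250.0]))  # 0.05
```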

Congestion index for reliability limits in the Norwegian system from August 25, 2021

When we run the simulation for 24 historical hours, we see that with market-based balancing, there would be overloads throughout the day. When we apply bid filtering and remove the bids expected to be problematic, overloads are reduced in 9 of the 24 hours, and we’re able to avoid the most serious problems in the afternoon.

No matter the balancing mechanism, the congestion index virtually never touches zero. Even the human operators with all their extra information and experience run into many of the same congestion problems. This shows that balancing activations play a role in the amount of congestion, but they are just one part of the story, along with several other factors.

With that in mind, if you’re going to let a zonal market mechanism make your balancing decisions, it seems that bid filtering can have a clear, positive effect in reducing network overloads.

What do you think? Do you read the results differently? Don’t be afraid to get in touch, my team and I are always happy to discuss.

ありがとうございました (Thank you very much!)

Automatic data quality validations with Great Expectations: An Introduction to DQVT

Hi, I’m Patrick, a Senior Data Engineer at Statnett. I’m happy to present some of our work that has proven useful recently: automatic validation of data quality.

We have created the Data Quality Validation Tool (DQVT), which helps us define the content of our datasets by testing it against a set of expectations on a regular basis. It is built on top of some cool open-source packages: Great Expectations, streamlit, FastAPI and D-Tale.

In this post, I will explain what DQVT actually does, and why we built it the way we did. But first, let me just mention why Statnett takes data quality so seriously.

Monitor your data assets

History has shown us that cascading blackouts of the power grid can result from a single failure, often caused by extreme weather conditions or a defective component. Statnett and other transmission system operators (TSOs) learn continuously from these failures, adapt to them and prepare for the next time these physical assets fail. This is probably true in your job as well. Everyone experiences failures, but not everyone is prepared.

Data quality is important in the same way. Not very long ago, data could be mere logs, archived in case you might need to dig into them once in a while. Today, complex automated flows of information are crucial in our decision processes. Just like defective physical assets, unexpected data may, at some point, break data pipelines, possibly with huge consequences. Statnett operates critical infrastructure for an entire country, and in this context, high-quality data isn’t just gold; it is a necessity.

Always know what to expect from your data

The motto of Great Expectations hints at a basic but beautiful principle: you prepare against data surprises by testing and documenting your data. And when data surprises do arise, you want to get notified quickly and trigger a plan B, such as switching to an alternative data pipeline.

By analyzing your data, you can often figure out what kind of values (formats, ranges, rules, etc.) you are supposed to get under normal conditions, and how this might change over time. This knowledge allows you to test periodically that you always get what you expect.
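As a tiny illustration of that workflow, here is what a couple of expectations look like with the classic Great Expectations Pandas interface. The file and column names are invented, and the exact API depends on the version you run:

```python
import great_expectations as ge

# Wrap an ordinary CSV file as a dataset that understands expectations.
# File and column names are invented for the example.
df = ge.read_csv("grid_measurements.csv")

df.expect_column_values_to_not_be_null("station_id")
df.expect_column_values_to_be_between("frequency_hz", min_value=49.5, max_value=50.5)

# Validate every registered expectation and react to data surprises.
results = df.validate()
if not results["success"]:
    print("Data surprise detected, time for plan B")
```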

So, a great principle, and a great package. How did we make this work at Statnett?

Understanding what DQVT is

Like many organisations, Statnett uses lots of different data sources, some well known (Oracle/PostgreSQL databases, Kafka Streams, …) and others more domain-specific (IBM Big SQL instance, e-terra data platform, …). Needless to say, a consequence of this diversity is an abundance of data quality horror stories.

In order to understand our issues with data and improve the quality of our datasets, we wanted a dedicated tool able to

  1. profile and document the content of datasets stored in different data sources
  2. check the data periodically
  3. identify mismatches between the data and what we expect from it, and
  4. help us include data quality checks in our data pipelines

So we built the Data Quality Validation Tool (DQVT).

It is not a data catalog. Rather, it aims at documenting what the content of a dataset is expected to look like. DQVT helps us define tests on the data, called expectations, which are turned into documentation (thanks to Great Expectations). DQVT validates these expectations on a regular basis and reports any mismatch between the data and its documentation. Finally, DQVT computes scores on data quality metrics defined through our internal data standard.
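To give a flavour of the scoring idea (the real metrics follow Statnett’s internal data standard and are more nuanced), a simple score could be the share of expectations that pass in a validation result:

```python
def quality_score(validation_result):
    """Hypothetical metric: the fraction of expectations that passed.
    DQVT's real scores are defined by our internal data standard."""
    results = validation_result["results"]
    if not results:
        return None
    return sum(1 for r in results if r["success"]) / len(results)

# e.g. quality_score(df.validate()) -> 1.0 when every expectation holds
```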

By filling these roles, DQVT takes us towards better data quality, and consequently also more reliable and more performant software systems.

The story of DQVT

Faced with several high-profile digitalization projects, Statnett recently ramped up its data quality initiatives. Around that time, the Python Bytes podcast presented Great Expectations in episode #115 (we highly recommend this podcast if you are a Pythonista🐍).

We tested Great Expectations and became fans pretty quickly, for several reasons:

  • its simplicity of use: a command-line interface that provides guidance and supports various SQL databases, Pandas and Spark
  • a beautiful concept in line with development best practices (documentation-as-code); in the words of Great Expectations, tests are docs and docs are tests
  • extremely detailed user documentation
  • an active and inclusive open-source community (Slack, Discuss)

We were interested to see if this tool could help us monitor data quality on our own infrastructure at Statnett, which includes two particularly important platforms. We use the GitLab devops platform to host our code and provide continuous integration and deployment pipelines, and we use OpenShift as our on-premises Platform-as-a-Service to run and orchestrate our Docker containers, managed by Kubernetes.

The time came to build a proof of concept, and we started lean: a limited amount of time and resources to reduce technology risk. The main goals and scope revolved around a handful of features and requirements:

The goal of our first demo was to document the content of our datasets: not what the columns and fields of a table are (that is the job of a data catalog), but what we expect from the values in those fields. We were also keen on having this documentation be human-readable and kept automatically up to date. Finally, we wanted to get notified when data expectations were not met, indicating either problems in the data or that our expectations needed adjustment.

At the time, we weren’t sure how we would deploy validations on a schedule, or whether Great Expectations would be able to fetch data from our Big Data Lake (an IBM Big SQL instance), a high-performance massively parallel processing (MPP) SQL engine for Hadoop. Failing at either of these integrations would have ended the experiment.

Despite having to do a small hack to connect to our Big Data Lake, we were able to have our data quality validations run periodically on OpenShift in less than a month! 🎉
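For the curious, the rough shape of such a connection is sketched below, using SQLAlchemy with the ibm_db_sa dialect (Big SQL speaks the Db2 protocol). This is not our actual workaround; the host, credentials, table and column names are placeholders:

```python
from sqlalchemy import create_engine
from great_expectations.dataset import SqlAlchemyDataset

# Placeholder connection string; Big SQL is reachable through the Db2 dialect.
engine = create_engine("db2+ibm_db://user:password@bigsql-host:32051/BLUDB")

# Classic (pre-1.0) Great Expectations interface for SQL tables.
measurements = SqlAlchemyDataset(table_name="measurements", engine=engine)
measurements.expect_table_row_count_to_be_between(min_value=1)
measurements.expect_column_values_to_not_be_null("station_id")
```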

What’s next?

At the end of the Python Bytes episode, host Brian Okken wonders how data engineers might include the Great Expectations tool in their data pipelines. I will be back soon to show you how to do just that! I’m creating a tutorial that details the individual steps and technologies we use in DQVT, but the structure of DQVT is quite simple, so you would likely be able to reproduce it on your own infrastructure.

And if you have some experience of your own or are just curious to learn more, you’re more than welcome to leave a comment!
