Cimsparql: Loading power system data into pandas dataframes in Python

In 2019, we started working on a model that should be able to handle intra-zonal constraints in the upcoming balancing market. That methodology has been presented in a previous post in January 2022. In this post, we will focus on an open source Python library called cimsparql that we have developed to support this model. For the model to be able to perform any analysis, it needs data that describe the state of the power system. At Statnett, these data are available as CIM (Common Information Model) profiles. The data is made available through a triple store (GraphDB/Blazegraph/rdf4j), using a resource description framework which is a standard model for data interchange.

The information about the power system available in these CIM profiles can be used for different purposes, and what information should be extracted depends on the requirement of your model. In the previously presented post, a DC optimal power flow model is used. Thus we need data on generation, demand and transmission lines. The purpose of the cimsparql package is to extract this information from the triple store, through a set of predefined sparql queries, and loading them into Python as pandas dataframes. Cimsparql will also make sure that columns in the dataframes have correct types, either string, float or integer, as defined by the CIM standard.

Cimsparql uses the SPARQLWrapper library to remotely execute sparql queries, and extends it with extra functionality, assuming the data conform to the CIM standard. Even though the package is an important part of the balancing market model, it is open source available from github and can be installed using pip.

~/pip install cimsparql

Once the library is installed, it must be configured to query a triple store using the ServiceConfig class in cimsparql.graphdb. The example below assumes you have a graphdb server with a CIM model in a repository called “micro_t1_nl”. This test case, available at the cimsparql repository on github, is used to test the development of the predefined queries.

  >>> service_cfg = ServiceConfig(repo="micro_t1_nl")
  >>> model = get_cim_model(service_cfg)

If you need to provide other configurations such as server, username and password, this can be done with the same ServiceConfig class.

Once the model is configured, the data can be loaded into a pandas dataframe using the predefined queries. In the example below, topological node information is extracted from the triple store.

>>> bus = model.bus_data()
>>> print(bus.to_string())
                                           busname      un
795a117d-7caf-4fc2-a8d9-dc8f4cf2344a  NL_Busbar__4  220.00
6bdc33de-d027-49b7-b98f-3b3d87716615   N1230822413   15.75
81b0e447-181e-4aec-8921-f1dd7813bebc   N1230992195  400.00
afddd60d-f7e6-419a-a5c2-be28d29beaf9   NL-Busbar_2  220.00
97d7d14a-7294-458f-a8d7-024700a08717    NL_TR_BUS2   15.75

Here the values in the nominal voltage column has been converted to float values as defined by the CIM standard, while node and bus names are strings.

All the predefined queries can be executed using the cimsparql.model.CimModel class. Examples are the already shown bus_data as well as loads, synchronous_machines, ac_lines and coordinates. The latter extracts coordinates of all equipment in the model from the CIM Geographical Location profile. Cimsparql orders the rows in the dataframe such that it is straightforward to use with plotly’s map functionality. The example below was made in a Jupyter notebook.

df = model.coordinates()
lines = df.query("rdf_type == ''")
stations = df.query("rdf_type == ''")
center_x, center_y = df["x"].mean(), df["y"].mean()

fig = px.line_mapbox(lines, lon="x", lat="y", color="mrid", height=1000)
fig2 = px.scatter_mapbox(stations, lon="x", lat="y", color="mrid", size=[1]*len(stations))
fig.update_geos(countrycolor="black", showcountries=True, showlakes=True, showrivers=True, fitbounds="locations")

all_fig = go.Figure( +, layout = fig.layout)
AC line segments and substations included in the model

The main goal of cimsparql is to read data for the purpose of running power flow analysis using sparql queries to read data from triple store into pandas dataframes in Python. Currently the package is used internally at Statnett, where we also have some data which is yet not covered by the CIM standard. Thus some of the queries contains a namespace which will probably only be used by Statnett. However, this should not pose any problem for the use of this package elsewhere, as these namespaces or columns have been made optional. So any query towards a data set that does not contain these, will just produce a column for the given namespace with NaN values.

The package can also be uses in cases where the predefined queries does not produce data for a specific purpose. In this case, the user can provide their own queries as a string argument to the get_table_and_convert method. The example below list out the numbers of ac line segments for each voltage level in your data.

>>> query='''
PREFIX cim: <>
PREFIX rdf: <>
select ?un (count(?mrid) as ?n) where { 
?mrid rdf:type cim:ACLineSegment;
   cim:ConductingEquipment.BaseVoltage/cim:BaseVoltage.nominalVoltage ?un.
} group by ?un'''
>>> df = model.get_table_and_convert(query)

So to summarize, the main contribution of cimsparl is a set of predefined queries for the purpose of running power flow simulations and type conversion of data that follows the CIM standard.

Author: Leif Warland

Member of Data Science department at Statnett

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: