In 2019, we started working on a model to handle intra-zonal constraints in the upcoming balancing market. The methodology was presented in a previous post in January 2022. In this post, we will focus on an open source Python library called cimsparql that we have developed to support this model. For the model to perform any analysis, it needs data that describe the state of the power system. At Statnett, these data are available as CIM (Common Information Model) profiles. The data are made available through a triple store (GraphDB/Blazegraph/rdf4j), using the Resource Description Framework (RDF), a standard model for data interchange.
The information about the power system available in these CIM profiles can be used for different purposes, and what information should be extracted depends on the requirements of your model. The previously mentioned post uses a DC optimal power flow model, so we need data on generation, demand and transmission lines. The purpose of the cimsparql package is to extract this information from the triple store, through a set of predefined SPARQL queries, and load it into Python as pandas dataframes. Cimsparql also makes sure that the columns in the dataframes have the correct types, either string, float or integer, as defined by the CIM standard.
Cimsparql uses the SPARQLWrapper library to remotely execute SPARQL queries, and extends it with extra functionality, assuming the data conform to the CIM standard. Even though the package is an important part of the balancing market model, it is open source, available on GitHub, and can be installed using pip:
pip install cimsparql
Once the library is installed, it must be configured to query a triple store, using the ServiceConfig class in cimsparql.graphdb. The example below assumes you have a GraphDB server with a CIM model in a repository called “micro_t1_nl”. This test case, available in the cimsparql repository on GitHub, is used to test the development of the predefined queries.
>>> from cimsparql.graphdb import ServiceConfig
>>> from cimsparql.model import get_cim_model
>>> service_cfg = ServiceConfig(repo="micro_t1_nl")
>>> model = get_cim_model(service_cfg)
If you need to provide other configuration, such as server address, user name and password, this can be done with the same ServiceConfig class.
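A sketch of what this might look like; the keyword names below are assumptions made for illustration, so check the ServiceConfig definition in cimsparql.graphdb for the actual field names:

>>> service_cfg = ServiceConfig(
...     repo="micro_t1_nl",
...     server="127.0.0.1:7200",  # assumed field names, for illustration only
...     user="someuser",
...     passwd="somepassword",
... )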
Once the model is configured, the data can be loaded into a pandas dataframe using the predefined queries. In the example below, topological node information is extracted from the triple store.
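The original output is not reproduced here, but a minimal sketch using the predefined bus_data query (introduced below) looks like this:

>>> bus = model.bus_data()
>>> bus.dtypes  # nominal voltage as float, node and bus names as strings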
Here the values in the nominal voltage column have been converted to float values, as defined by the CIM standard, while node and bus names are strings.
All the predefined queries can be executed using the cimsparql.model.CimModel class. Examples are the already shown bus_data as well as loads, synchronous_machines, ac_lines and coordinates. The latter extracts coordinates of all equipment in the model from the CIM Geographical Location profile. Cimsparql orders the rows in the dataframe such that it is straightforward to use with plotly's map functionality. The example below was made in a Jupyter notebook.
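The notebook code itself is not reproduced here, but a sketch along these lines should give a similar map. The column names (x, y, rdf_type) and the per-line grouping key (mrid) are assumptions about the coordinates frame, so inspect the returned dataframe first:

import plotly.express as px

crd = model.coordinates()
# Keep only the AC line segments and draw one line per mrid.
lines = crd[crd["rdf_type"].str.contains("ACLineSegment")]
fig = px.line_geo(lines, lat="y", lon="x", line_group="mrid")
fig.show()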
AC line segments and substations included in the model
The main goal of cimsparql is to read data for power flow analysis from a triple store into pandas dataframes in Python, using SPARQL queries. Currently the package is used internally at Statnett, where we also have some data which is not yet covered by the CIM standard. Thus some of the queries contain a namespace which will probably only be used by Statnett. However, this should not pose any problem for the use of this package elsewhere, as these namespaces or columns have been made optional. Any query towards a data set that does not contain them will just produce a column of NaN values for the given namespace.
The package can also be used in cases where the predefined queries do not produce the data needed for a specific purpose. In that case, the user can provide their own query as a string argument to the get_table_and_convert method. The example below lists the number of AC line segments per voltage level in your data.
>>> query = '''
... PREFIX cim: <http://iec.ch/TC57/2013/CIM-schema-cim16#>
... PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
... select ?un (count(?mrid) as ?n) where {
...   ?mrid rdf:type cim:ACLineSegment;
...     cim:ConductingEquipment.BaseVoltage/cim:BaseVoltage.nominalVoltage ?un .
... } group by ?un
... '''
>>> df = model.get_table_and_convert(query)
So to summarize, the main contribution of cimsparql is a set of predefined queries for the purpose of running power flow simulations, together with type conversion of data that follows the CIM standard.
Almost three years ago, I started on my journey towards an industrial PhD in
reinforcement learning applied to power systems. At around that time, I had used
Python daily for 6-7 years. But one day during the autumn of 2018, Thomas,
the founder of our Data Science team, threw out a new idea to me, like he used
to do at the time: "Have you ever tried Julia? You really should! Some guys at
MIT were tired of the slowness and problems with Python so they decided to create
a new language. And there’s a module for power systems too, the PowerModels.jl."
So there it started for me. And at the start of my PhD project, I thought it was a
good opportunity to learn a new programming language from scratch along the way.
Getting started with Julia
If you don’t have Julia installed already, I suggest you install it now so you
can follow along through the rest of this post. If you’re on Linux, the easiest
way to get the installation of Julia up and running is to use
juliaup.
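At the time of writing, the juliaup README suggests this one-liner (check there for the current command):

curl -fsSL https://install.julialang.org | sh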
Once you’re up and running, the easiest way to install a package when you’re in a Julia session is to just start using it by typing using <package name>, as for example in
julia> using Plots
 │ Package Plots not found, but a package named Plots is available from a registry.
 │ Install package?
 │   (@v1.8) pkg> add Plots
 └ (y/n/o) [y]:
Julia seeks to solve the two-language problem
Julia is an open-source, high-level, dynamically-typed language. It was started
back in 2009 by a group of people at MIT: Stefan Karpinski, Jeff Bezanson, Viral
B. Shah and Alan Edelman. The first version was released in 2012. In their
blogpost
from February 2012, the authors stated they wanted a programming
language "… [with] the speed of C… as usable for general programming as
Python…, [and] as easy for statistics as Matlab". Following this, Julia seeks to
solve the two-language problem. If you want a language that is dynamic, flexible
and easy to prototype in, you usually have to sacrifice speed, as in Python and
R. On the other hand, if you want performance-critical code you have to
resort to fast, static and compiled languages such as C and C++.
The way Julia solves this is by having a native, feature-rich REPL (Read-Eval-Print Loop), and by having JIT (just-in-time) compilation together with a flexible, parametric type system and multiple dispatch. More on this later!
The Julia REPL is your best friend
When prototyping in Julia, the natural starting point is the Julia REPL, which
in many ways behaves pleasantly like the IPython interface, but offers so much
more. The Julia REPL has four main modes (together with full history search):
The Julia code prompt
Help mode
Package manager
Shell mode
1) The standard Julia code prompt
This mode is invoked at startup and is where you do all your prototyping. For example, here are three different ways to write a function:
add_one_line(x) = x + 1        # one-line definition
add_one_anon = x -> x + 1      # anonymous function

function add_one_full(x)       # full function definition
    return x + 1
end
2) The help mode
This is invoked by typing ? and gives you quick access to the docstring of a
function. So if we are in Julia prompt mode and type
""" myadd(x,y)Add x and y."""functionmyadd(x,y)return x + yend
we can activate the help mode by typing ? and then type the function name:
help?> myadd
search: myadd

  myadd(x, y)

  Add x and y.
3) The package manager
Apart from the speed, this is perhaps my favourite feature of the Julia
language. In Python, there are a number of different environment and package
managers, like "pip", "pipenv", "virtualenv" and "poetry". Choosing between them
and understanding how they work together can be confusing and
time-consuming. In Julia, the package manager is built into the language and it
just works. The package manager is invoked by typing ] and you leave it by
typing backspace. In an empty directory you can create a new environment like this:
(@v1.7) pkg> activate .
  Activating new project at `~/JuliaProjects/sandbox/blog`

(blog) pkg> add UnicodePlots
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
    Updating `~/JuliaProjects/sandbox/blog/Project.toml`
  [b8865327] + UnicodePlots v3.1.0
    Updating `~/JuliaProjects/sandbox/blog/Manifest.toml`
  ...

(blog) pkg>

julia>
This can also be done programmatically in the Julia REPL.
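A sketch of the equivalent calls using the Pkg standard library:

julia> using Pkg

julia> Pkg.activate(".")

julia> Pkg.add("UnicodePlots")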
An environment is defined by two .toml-files. The Project.toml contains the
packages you want in the environment together with version restrictions. The
Project.toml is the file to commit to GitHub. The file Manifest.toml is the
detailed machine-generated file which contains all the information necessary to
exactly reproduce a specific environment. Both files are constructed when you
instantiate an environment and updated automatically when you add new packages.
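To reproduce such an environment on another machine, a couple of Pkg calls should be all you need; a sketch, assuming the two .toml files are in the current directory:

julia> using Pkg

julia> Pkg.activate(".")   # use the environment defined by Project.toml here

julia> Pkg.instantiate()   # install exactly what Manifest.toml records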
4) The shell mode
By typing ; in the REPL you get instant access to a shell that behaves like a
bash shell, which on many occasions can be really handy. But in practice I
usually just open a new terminal with my regular bash shell.
Julia is not object-oriented – it relies on multiple dispatch
On the surface, the Julia syntax doesn’t differ too much from Python, the main
differences being 1-based indexing and that functions and control structures in
Julia are always closed by the end keyword. And as in Python, you don’t need
to annotate the type of the variable. In this context, an important part of Julia is type inference.
This means that the types are inferred from the actual variable values. For example,
julia> a = 1; typeof(a)
Int64

julia> a = 1.0; typeof(a)
Float64

julia> a = "Hello, world"; typeof(a)
String

julia> a = 1 + 1.0f0; typeof(a)
Float32
One main difference from Python is that Julia is not object-oriented, so there
are no classes and no inheritance. Instead, Julia has a very flexible type
hierarchy, with C-like structs as the main building block. This means that
functions don't belong to certain types or objects as they do in Python.
Instead, Julia has polymorphism where functions can share the same name, and
which of them is used, or dispatched, at runtime depends on the types of the
arguments of the function call. I write arguments in plural, because Julia
has so-called multiple dispatch, meaning that the language considers the types
of all the arguments when choosing which function to execute. This is a
remarkable feature. Let me show an example:
struct MyNumber
    x::Float64
end

f(x) = "I am the fallback for one argument"
f(x::Integer) = "I'm dispatched on integers"
f(x::Float64) = "I'm dispatched on floats"
f(x, y) = "Fallback for two arguments"
f(x::Int, y::Float64) = "First is Integer, second argument is Float64"
f(x::MyNumber) = "Hello, mynumber!"

mynumber = MyNumber(1.0)

f("Hello, world!")  # I am the fallback for one argument
f(1)                # I'm dispatched on integers
f(1.0)              # I'm dispatched on floats
f("Hello", 1)       # Fallback for two arguments
f(1, 2.0)           # First is Integer, second argument is Float64
f(mynumber)         # Hello, mynumber!
Julia is compiled just-in-time
This brings me to the second main difference: Julia is compiled. The compilation
is done at runtime by a so-called JIT (just-in-time) compiler. JIT-compilation
is also found in for example the Numba library in Python. In the example above,
we have six different versions of the function called f. These versions are
called methods in Julia. We can examine what methods of the function f above
are actually defined by calling the methods function on f:
julia> methods(f)
# 6 methods for generic function "f":
[1] f(x::Int64, y::Float64) in Main at REPL[6]:1
[2] f(x::Integer) in Main at REPL[3]:1
[3] f(x::Float64) in Main at REPL[4]:1
[4] f(x::MyNumber) in Main at REPL[7]:1
[5] f(x) in Main at REPL[2]:1
[6] f(x, y) in Main at REPL[5]:1
So at runtime, when the program is about to execute a specific function call,
the compiler checks the types of all the actual input arguments of the function.
If it has been called and compiled before with this combination of argument
types, it uses a cached version, which of course is very fast. If not, a new
version based on all the argument types of the current call is compiled and
stored. This means we only need to compile functions actually used in the code.
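You can see this caching in action with the @time macro. The timings below are illustrative, from one of my sessions; yours will differ:

julia> g(x) = x^2 + 1;

julia> @time g(2.0)   # first call: includes compiling g for Float64
  0.003423 seconds (537 allocations: 36.406 KiB, 99.21% compilation time)
5.0

julia> @time g(3.0)   # second call: the cached compiled method is reused
  0.000002 seconds
10.0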
Julia is fast because of code specialization
In principle, Julia code can get the speed of C
for any user-defined type. This is in contrast to for example numpy in Python,
where the speedups only apply to a limited collection of predefined datatypes. Compared
to native Python, the speedup naturally varies a lot depending on the task, but
figures of 2-10 times faster are not unheard of.
Even though modern data science tools in Python like Numpy and Tensorflow are
wrappers around C/C++ and with that bring speed to the table, it is still useful
to understand why pure Python is slow.
Coming from Python to Julia, the biggest change for me was to become more aware
of the types of the variables and objects you are actually doing things with.
"Is this an array of 64-bit floats?" "What are the types of the class members?"
Working in pure Python, none of these questions seemed relevant, because in
Python a variable can basically be of any type and things still just work.
So for example, when you want to sum an array of values in Python, each and
every time you add two values the language needs to check their types, find
the right +-function to use at runtime, and store the resulting value together
with its type. There is no way to inform the language that all the values of
an array have the same type, for example 64-bit precision floats.
How slow this is can actually be illustrated in Julia, where the type Any is
the mother of all types and of which all other types are subtypes. The Julia
type Any is equivalent to the generic type in Python. By using the Julia
package BenchmarkTools we can compute statistics on the time spent on summing
10^7 random numbers in an array.
julia> using BenchmarkTools

julia> N = 10^7;

julia> arr_any = Any[rand() for i in 1:N];

julia> arr_float64 = Float64[rand() for i in 1:N];

julia> @benchmark sum($arr_any)
BenchmarkTools.Trial: 31 samples with 1 evaluation.
 Range (min … max):  135.170 ms … 169.061 ms  ┊ GC (min … max):  0.00% … 16.89%
 Time  (median):     165.452 ms               ┊ GC (median):    16.91%
 Time  (mean ± σ):   163.840 ms ±   7.513 ms  ┊ GC (mean ± σ):  16.03% ±  4.23%

 Memory estimate: 152.59 MiB, allocs estimate: 9999999.

julia> @benchmark sum($arr_float64)
BenchmarkTools.Trial: 1193 samples with 1 evaluation.
 Range (min … max):  4.047 ms …  4.706 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.165 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.181 ms ± 78.071 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

 Memory estimate: 0 bytes, allocs estimate: 0.
As we can see, the compiler does its job and the median execution time is about 40
times slower with the generic type where the compiler cannot specialize on the
type of the argument.
SINTEF
and viralinstruction have
written two very helpful blogposts that elaborate on how to get performant Julia
code. In the Julia manual,
there is a dedicated section on performance tips.
I also highly recommend Steven Johnson's MIT
lecture
on Julia.
Julia has a growing ecosystem
The ecosystem of Python is impressive and mature. The current era of data
science has more or less evolved hand in hand with Python and its scientific
and machine learning libraries, such as SciPy and Tensorflow. Python was first
released in 1991, more than twenty years before Julia. As of September 2022,
there are 400 000 projects on PyPI. In comparison, Julia has at the same time
about 8 000 registered packages. In fact, already in 2017, Stefan Karpinski
stated that:
Even NumPy itself was a bit rough back then. It’s entirely possible that if the
SciPy ecosystem had been as well developed in 2009 as it is today, we never
would have started Julia.
But the number of users and packages in Julia is growing, and I’ve definitely
found what I need. To name a few I've used, there are Plots.jl and
Makie.jl for plotting. There's the Pandas
equivalent DataFrames.jl for
dataframes. There’s Flux.jl for machine
learning. There’s
ReinforcementLearning.jl
for reinforcement learning, obviously. And there’s the
PowerModels.jl for dealing
with power systems.
Even though not all packages are as mature as their Python counterparts, this
has not been a major problem for me. And there are benefits, too, to entering a
package in its early phase. Your issues and requests have a better chance of
being taken care of, and it's quite possible you'll be able to contribute to the
project yourself, either with code or documentation. The two packages I've been
most deeply involved with,
ReinforcementLearning.jl
and
GraphNeuralNetworks.jl,
both have very friendly and helpful core developers.
A nice little Julia package: UnicodePlots.jl
I have to mention this neat little package in Julia,
UnicodePlots.jl, which allows
you to do simple plotting in the command line. For example,
using UnicodePlots

lineplot([cos, sin], 0:0.01:2pi)

barplot(["Oslo", "Stavanger", "Bergen", "Trondheim"],
        [2.244, 8.406, 4.92, 0.1],
        title = "Population")
which renders as text plots directly in my terminal.
How can Julia be used in power systems research?
My PhD project is about how to use reinforcement learning to handle failures in
the power system. This involves a lot of load flow calculations. This is super
easy in PowerModels.jl:
julia> using PowerModels

# Casefile "case14.m" downloaded from https://egriddata.org/sites/default/files/case14.m
julia> results = compute_dc_pf("case14.m")
...
Dict{String, Any} with 5 entries:
  "optimizer"          => "\\"
  "termination_status" => true
  "objective"          => 0.0
  "solution"           => Dict{String, Any}("bus"=>Dict{String, Any}("4"=>Dict("va"=>-0.200698), "1"=>Dict("va"=>-0.0), "12"=>Dic…
  "solve_time"         => 0.000584841
Perhaps even more useful is to do an optimal power flow using the
Ipopt.jl optimizer:
julia> using PowerModels

julia> using Ipopt

julia> solve_dc_opf("case14.m", Ipopt.Optimizer)
...
Number of nonzeros in equality constraint Jacobian...:      106
Number of nonzeros in inequality constraint Jacobian.:       80
Number of nonzeros in Lagrangian Hessian.............:        5

Total number of variables............................:       39
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        5
                     variables with only upper bounds:        0
Total number of equality constraints.................:       35
Total number of inequality constraints...............:       40
        inequality constraints with only lower bounds:       20
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:       20

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  1.6032287e+02 9.32e-01 1.89e+01  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
   1  8.6948059e+03 1.46e-16 6.86e+01  -1.0 7.93e-01    -  1.41e-02 1.00e+00h  1
   2  7.6460391e+03 6.66e-16 2.59e+00  -1.0 1.43e+00    -  5.95e-01 9.60e-01f  1
   3  7.6605209e+03 5.48e-16 3.39e+00  -1.0 1.48e-01    -  9.76e-01 5.00e-01f  2
   4  7.6544392e+03 6.66e-16 1.00e-06  -1.0 3.40e-02    -  1.00e+00 1.00e+00f  1
   5  7.6457228e+03 5.76e-16 2.83e-08  -2.5 5.25e-02    -  1.00e+00 1.00e+00f  1
   6  7.6432829e+03 4.44e-16 2.83e-08  -2.5 1.85e-02    -  1.00e+00 1.00e+00f  1
   7  7.6426423e+03 3.89e-16 1.50e-09  -3.8 5.43e-03    -  1.00e+00 1.00e+00f  1
   8  7.6425922e+03 6.66e-16 1.84e-11  -5.7 4.35e-04    -  1.00e+00 1.00e+00f  1
   9  7.6425918e+03 3.37e-16 2.51e-14  -8.6 3.75e-06    -  1.00e+00 1.00e+00f  1

Number of Iterations....: 9

                                   (scaled)                 (unscaled)
Objective...............:   1.9106479435807935e+02    7.6425917743231739e+03
Dual infeasibility......:   2.5059035640133008e-14    1.0023614256053203e-12
Constraint violation....:   3.3653635433950058e-16    3.3653635433950058e-16
Variable bound violation:   8.9305948524944555e-09    8.9305948524944555e-09
Complementarity.........:   2.6421734022358593e-09    1.0568693608943436e-07
Overall NLP error.......:   2.6421734022358593e-09    1.0568693608943436e-07

Number of objective function evaluations             = 11
Number of objective gradient evaluations             = 10
Number of equality constraint evaluations            = 11
Number of inequality constraint evaluations          = 11
Number of equality constraint Jacobian evaluations   = 1
Number of inequality constraint Jacobian evaluations = 1
Number of Lagrangian Hessian evaluations             = 1
Total seconds in IPOPT                               = 0.005

EXIT: Optimal Solution Found.
Dict{String, Any} with 8 entries:
  "solve_time"         => 0.00622511
  "optimizer"          => "Ipopt"
  "termination_status" => LOCALLY_SOLVED
  "dual_status"        => FEASIBLE_POINT
  "primal_status"      => FEASIBLE_POINT
  "objective"          => 7642.59
  "solution"           => Dict{String, Any}("baseMVA"=>100, "branch"=>Dict{String, Any}("4"=>Dict{String, Any}("qf"=>NaN, "qt"=>N…
  "objective_lb"       => -Inf
Deep learning in Julia? Use Flux!
Flux.jl is the go-to library for deep
learning in Julia. It is the equivalent of Tensorflow and PyTorch.
Let's do a simple example of training a linear regression model with intercept
1.5 and slope 2.0. We start by importing the necessary libraries and defining
our data.
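The original data-definition snippet is not reproduced here; a minimal sketch consistent with the rest of the example (intercept 1.5, slope 2.0, and the data stored in a 1×N Float32 matrix, the layout Flux's Dense layer expects) could be:

julia> using Flux

julia> x = Float32.(collect(0:0.1:3)) |> permutedims;  # a 1×31 matrix

julia> y = 1.5f0 .+ 2.0f0 .* x .+ 0.3f0 .* randn(Float32, size(x));  # noisy line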
The piping operator |> is very handy in Julia, and so is the broadcasting done
by the .-operator. See the manual for more information about these.
So now we have linear data with some noise added. Let’s also define a very
simple neural network, with one layer and only one node. As we can see, there are
two parameters in this model:
julia> model = Dense(1, 1)
Dense(1 => 1)       # 2 parameters
We can get the parameters of the model, a.k.a. the weights and the biases, with the Flux.params function.
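For example (the printed weight is illustrative, since it is initialized randomly):

julia> Flux.params(model)
Params([Float32[0.7494747;;], Float32[0.0]])

Next, we define a mean squared error loss and evaluate it on our data: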
julia> loss(input, output) = Flux.Losses.mse(model(input), output)
loss (generic function with 1 method)

julia> loss(x, y)
11.150247f0
So with random weight and bias, the loss is initially quite large. Let's
implement a simple gradient descent, loop over the data and train on the CPU
(which is the default).
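A minimal sketch of such a training loop, using Flux's gradient and the Descent optimiser; the learning rate and the number of epochs are arbitrary choices here:

julia> θ = Flux.params(model);  # the weight and the bias

julia> opt = Descent(0.05);     # plain gradient descent

julia> for epoch in 1:500
           grads = gradient(() -> loss(x, y), θ)
           Flux.Optimise.update!(opt, θ, grads)
       end

julia> loss(x, y)  # should now be down to roughly the noise level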
Finally, let's plot the results using the standard Plots.jl library with the GR backend:
julia> using LaTeXStrings

julia> using Plots

julia> gr()   # activate the GR backend (not necessary, as it is the default)
Plots.GRBackend()

julia> p = scatter(x[1,:], y[1,:], label="observations", legend=:topleft, xlabel=L"x", ylabel=L"y")

julia> plot!(p, x[1,:], model(x)[1,:], label="model")
In the last function call, we see an example of a Julia convention: Any function
that changes the argument in place has an "!" at the end of the identifier.
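This convention shows up all over the standard library; compare sort and sort!:

julia> v = [3, 1, 2];

julia> sort(v)    # returns a sorted copy; v itself is unchanged
3-element Vector{Int64}:
 1
 2
 3

julia> sort!(v);  # sorts v in place

julia> v == [1, 2, 3]
true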
What’s not to like?
Time to first plot
The first time a Julia method is called, it is compiled, and this obviously
prolongs the execution time. Sometimes this can lead to some delays (and
frustration) when you're working interactively. Every time you close the Julia
session and start a new one, you have to go through the compilations again. This
manifests itself also in the so-called Time To First Plot (TTFP). Although this
has been improved in almost every release of Julia, you still have to wait some
long seconds. For example, these lines of code take about 8 seconds in my
terminal from a fresh Julia session:
using Plots

x = range(0, 2pi, length=100)
plot(x, sin.(x))
Of course, the second time these lines are run from the same session it runs instantaneously.
In Python, the equivalent code took much less than 1s:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2*np.pi, num=100, endpoint=True)
plt.plot(x, np.sin(x))
plt.show()
Debugging
The Julia debugger is also one part of Julia that has not brought forth the best
of my feelings. Although it also has improved recently, to be frank I don’t use
it much. In Python I used to step through the code in for example the Pycharm
IDE. This gave much insight. In Julia, I code much more interactively
through the REPL. I also implement shorter functions that are more testable and in a functional
programming style. In addition, I use tools like
Infiltrator.jl. After a year of
coding like this, I hardly miss a debugger.
Closing remarks
To me, learning Julia has been a very positive experience. Julia is fast, fun to read and
write, and it is applicable to a wide range of computational problems. But maybe more importantly,
by learning a new language I have learned a lot about how computers and
programming languages work together. And I have learned that there are alternatives to the Pythonic way.
So if in doubt: Give Julia a try!