Physics-informed machine learning for Power Systems

Environmental and sustainability concerns are transforming the US electricity sector, with aggressive targets to achieve 100% carbon pollution-free electricity by 2035. Achieving this objective while maintaining a safe and reliable power grid in the presence of intermittent renewable generation requires new operating paradigms that support computationally fast and accurate decision making in dynamic and service-critical environments. Optimization and machine learning emerge as key approaches, but neither alone is sufficient for future power system control and decision making. Classical approaches for optimization (particularly for large-scale problems and in the presence of mixed integer variables) remain prohibitively slow for dynamic applications where fast and frequent decision making is required. Data-driven methods offer a significant speed-up, but out-of-the-box implementations typically cannot enforce hard constraints or address mixed integer variables. To combine the accuracy of model-based approaches with the speed of data-driven methods, we propose the domain of physics-informed machine learning.

Physics-informed ML: This domain aims to bridge the gap between model-based and data-driven methods. We take the knowledge of our system (e.g. physical constraints, observed patterns and behaviours) and embed it strategically within ML frameworks. The benefit is that data-driven methods can provide large speed-ups in decision making by shifting some of the computational burden offline. For example, neural networks train offline to learn the patterns that exist in the dataset. They then use this information in real time to make decisions.

AI + Optimization: Neural networks can learn the patterns that emerge when an optimization problem is solved repeatedly.

Neural networks: Think of these as excellent function approximators. With different activation functions and numbers of layers, neural networks can represent complex relationships with highly nonlinear functions.

End-to-end learning-based optimization framework for dynamic grid reconfiguration

The problem: In this project, we consider a problem in power grid operations: topology optimization. Imagine the power grid as a graph consisting of nodes (electrical buses where loads and generators are connected) and edges (the physical power lines connecting buses to one another). In topology optimization, we can reorganize the graph to connect different nodes to one another by changing which edges are present. In the physical power grid, this consists of opening or closing switches on lines to add or remove electrical connections, a process sometimes called grid reconfiguration.

The math in words: To carry out topology optimization, we can write the decision as a mixed integer optimization problem: we want to decide which switches to turn on or off (the integer, or binary, decision) and also decide how much power each generator should produce (the continuous decision). We make these decisions to minimize a system cost (the objective function of the optimization problem). Let's consider the distribution system which distributes power from the transmission grid down to end use customers, like to the room you're sitting in right now. The distribution grid consists of lossy power lines, so a utility company may be interested in delivering power as efficiently as possible -- to maximize efficiency, they can minimize the line losses incurred from transporting power. Our objective function is then to minimize line losses, typically written as a quadratic function. The remainder of the optimization problem consists of the constraints:

Power physics: how power flows from one node to another and how voltages are related across the grid. These relationships must be satisfied at all times. We consider the Linearized DistFlow model, which is commonly used to model distribution grids.

Device/generator operating constraints: for example, the generation available according to solar energy forecasts

Topological constraints: the distribution grid must be connected and radial, i.e. a tree connecting all nodes together
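To make the topological constraint above more concrete, here is a minimal sketch of a radiality check. It relies on a standard graph fact: a graph on N nodes is a tree exactly when it has N - 1 edges and no cycles. The function name `is_radial`, the integer node labels, and the single-feeder assumption (one tree rather than a forest with several substations) are illustrative choices, not part of our framework.

```python
def is_radial(num_nodes, closed_lines):
    """Return True if the closed lines form a tree spanning all nodes.

    num_nodes    -- number of buses, labelled 0 .. num_nodes - 1 (assumed labelling)
    closed_lines -- iterable of (i, j) pairs, one per closed (energized) line
    """
    lines = list(closed_lines)
    # A spanning tree on N nodes has exactly N - 1 edges.
    if len(lines) != num_nodes - 1:
        return False

    # Union-find: each node starts in its own component.
    parent = list(range(num_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in lines:
        root_i, root_j = find(i), find(j)
        if root_i == root_j:
            return False  # this line would close a loop
        parent[root_i] = root_j

    # N - 1 edges and no cycle implies the graph is connected.
    return True
```

For example, `is_radial(4, [(0, 1), (1, 2), (2, 3)])` holds for a simple chain, while a configuration containing a loop or an isolated node is rejected.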

Our method: We propose a physics-informed ML approach. Specifically, we want to embed the discrete decisions of switch on/off status directly within the machine learning loop, so that our framework can be trained end-to-end. Our ML framework consists of a very simple neural network (just 2 layers, motivated by the universal approximation theorem) which predicts the probability that each switch is closed, along with the voltages at all nodes in the grid. Then, we design a physics-informed rounding (PhyR) layer which converts the switch open/close probabilities to discrete decisions. We leverage the topological constraints (mainly the radiality) to design the PhyR layer. The radiality constraint explicitly fixes the number of switches that must be closed in the grid -- let's call this X. We use this information to consider the relative probabilities of the switches: we close the X switches with the highest probability of being closed (i.e. the power line is added) and we open the remaining switches (i.e. the power line is removed). This simple approach allows us to effectively learn how to solve the topology optimization problem. Additional details of the framework can be found in our paper.
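The rounding rule described above can be sketched in a few lines. This is only an illustration of the forward pass (close the X most likely switches, open the rest); the function name `phyr_round`, the NumPy implementation, and the index-based tie-breaking are assumptions made for this sketch, and it does not show how gradients are handled during training.

```python
import numpy as np

def phyr_round(switch_probs, num_closed):
    """Round switch probabilities to binary statuses (1 = closed, 0 = open).

    switch_probs -- predicted probability that each switch is closed
    num_closed   -- X, the number of switches to close (from radiality)
    """
    probs = np.asarray(switch_probs, dtype=float)
    if not 0 <= num_closed <= probs.size:
        raise ValueError("num_closed must be between 0 and the number of switches")

    status = np.zeros(probs.size, dtype=int)
    # Sort by descending probability; a stable sort breaks ties by index.
    order = np.argsort(-probs, kind="stable")
    status[order[:num_closed]] = 1
    return status
```

With probabilities `[0.9, 0.2, 0.7, 0.4]` and X = 2, the first and third switches are closed and the other two are opened. Note that only the relative ordering of the probabilities matters, not their absolute values.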

Some results: We test our framework on distribution grids with different load and generation profiles. Our results show that we can learn to solve the topology optimization problem. To provide some intuitive understanding of why our framework with PhyR works, we summarize some key results here:

Sequential decision making simplifies the problem: Our method first selects a grid topology using PhyR, then enforces power flow on that topology via variable space completion (i.e. the equality constraints are satisfied). This sequential decision making improves prediction performance because every prediction resembles the simpler optimal power flow problem on a fixed topology.

Embedding discrete decisions within the loop permits feasible solutions: The sequential decision making described above happens within the training loop. If we don't use PhyR, the neural predictor is solving a relaxation of the topology optimization task, where the neural network remains in a probabilistic framework and each switch has a probability of being open or closed. Rather than considering a single topology, power flow must be satisfied across a (potentially large) set of possible topologies. Not only is this a challenging task, but there may not exist a power flow solution which remains feasible across multiple topologies. This prevents the network from effectively learning the power flow representation, resulting in higher inequality violations.

Physics-informed ML enables explainable solutions that are meaningful for power systems operators: The inequality constraint errors incurred by our method can be easily described. The neural predictor is trying to minimize line losses by supplying loads with local generation, and thereby reducing the amount of power transmitted through the grid. The constraint violations happen when local generation is not available.
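To give some intuition for why fixing the topology first simplifies things, here is a minimal sketch of power flow on a single radial topology. It uses the textbook form of Linearized DistFlow, where the squared voltage magnitude drops along a line as v_j = v_i - 2(r P + x Q), and a quadratic loss estimate r(P^2 + Q^2) at nominal voltage. The function name, the per-unit inputs, the node ordering convention, and the loss expression are assumptions for illustration; this is not the variable space completion from our paper.

```python
import numpy as np

def lindistflow_on_tree(parent, p_demand, q_demand, r, x, v_root=1.0):
    """Line flows, squared voltages and approximate losses on a fixed radial grid.

    parent     -- parent[j] is the upstream node of node j; node 0 is the root
                  (parent[0] is ignored). Assumes parent[j] < j for all j >= 1.
    p_demand   -- net active demand at each node (load minus local generation)
    q_demand   -- net reactive demand at each node
    r, x       -- r[j], x[j]: resistance/reactance of line parent[j] -> j
    v_root     -- squared voltage magnitude at the root
    """
    n = len(parent)
    # P[j], Q[j]: flow on the line feeding node j (for j = 0: total feeder demand).
    P = np.array(p_demand, dtype=float)
    Q = np.array(q_demand, dtype=float)

    # Backward sweep: each line carries its node's demand plus everything downstream.
    for j in range(n - 1, 0, -1):
        P[parent[j]] += P[j]
        Q[parent[j]] += Q[j]

    # Forward sweep: propagate squared voltage magnitudes from the root.
    v = np.empty(n)
    v[0] = v_root
    for j in range(1, n):
        v[j] = v[parent[j]] - 2.0 * (r[j] * P[j] + x[j] * Q[j])

    # Quadratic line-loss estimate at nominal voltage (assumed form).
    loss = sum(r[j] * (P[j] ** 2 + Q[j] ** 2) for j in range(1, n))
    return P, Q, v, loss
```

Once the tree is fixed, the nodal demands determine every line flow and voltage directly. Supplying a load with local generation lowers its net demand, which lowers the upstream flows and therefore the loss estimate, consistent with the behaviour described above.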

GraPhyR -- Embedding more physics: In the framework above we used a very simple neural network architecture, a feedforward neural network with just 2 layers. But can we embed more physics using another architecture? Yes -- let's consider graph neural networks (GNNs)! With a GNN, we can embed the underlying graph describing the power grid into the neural architecture. Rather than trying to learn how nodes are connected to one another, we explicitly tell the GNN these connections. In the framework above, we replace the neural network with a GNN, and continue to use the embedded discrete decisions permitted by PhyR.

To embed the switch decisions into the GNN, we use a few tricks:

Weights on the edges of the graph: Wherever we have a switch, we add a weight on the edge, in addition to weights at every node. During training, the neural network will update these weights as it tries to minimize the loss function. We randomly initialize the weights on the edges when we begin training.

Gated message passing: This is the key to embedding switches into the GNN. Mainly, we use gates which model each switch as a continuous switch, with a value between 0 and 1. Higher values correspond to higher probabilities that the switch is closed, and more information is passed through the gate. This mimics the action of a continuous switch and models the control of physical power flow between nodes. Higher probabilities = more information shared = stronger coupling of nodal voltages.

Local predictors: Rather than making predictions for the entire grid using a single predictor, we design a switch predictor and a line predictor. Switch predictors act only on pairs of nodes connected by a switch, and the line predictor acts on pairs of nodes connected by power lines without a switch. This allows the two predictors to decouple decisions between these different grid elements. This also permits a scalable solution.
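The gating idea above can be illustrated with a single, heavily simplified message passing step. Here each edge has a gate between 0 and 1 obtained from a sigmoid, and neighbour features are scaled by that gate before being added. The function names, the sigmoid parameterization, and the plain sum aggregation are all assumptions for this sketch; the actual GraPhyR layers are described in our paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_message_passing(node_features, edges, gate_logits):
    """One round of gated message passing on an undirected graph.

    node_features -- array of shape (num_nodes, feature_dim)
    edges         -- list of (i, j) node index pairs
    gate_logits   -- one real number per edge; sigmoid maps it to a gate in (0, 1)
    """
    h = np.asarray(node_features, dtype=float)
    gates = sigmoid(np.asarray(gate_logits, dtype=float))

    out = h.copy()  # each node keeps its own features
    for (i, j), g in zip(edges, gates):
        # A gate near 1 (switch likely closed) passes the neighbour's
        # information almost fully; a gate near 0 blocks it.
        out[i] += g * h[j]
        out[j] += g * h[i]
    return out, gates
```

A gate close to 0 leaves the two end nodes nearly decoupled, while a gate close to 1 couples them as strongly as a line without a switch would in this sketch.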

You can see more details and simulation results in our paper.