
Scaling Laws in Power Markets Forecasting

  • Writer: David Murray
  • Jun 11
  • 4 min read

How bigger data and bigger models deliver better results

A large hydro-electric plant benefits from economies of scale. So do machine learning models.

The prices to which generators and loads are exposed in wholesale power markets are driven by the marginal cost of producing power and moving it to where and when it’s needed, subject to constraints on the grid. That process, security-constrained economic dispatch (SCED), is run sub-hourly by independent system operators to determine prices (or LMPs) for each resource on the grid. Security-constrained unit commitment (SCUC) is the similar process used to clear the day-ahead market. The mathematical framework that underpins the clearing of prices is optimal power flow (OPF), and it is used in both real-time (SCED) and day-ahead (SCUC) price formation.
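To make the mechanics concrete, here is a minimal sketch of a single-interval economic dispatch posed as a linear program, using Python and SciPy purely for illustration (the two generators, their costs, and the load are invented numbers, not any ISO's data). The dual of the power-balance constraint plays the role of a system-wide marginal price; real SCED adds network constraints, losses, and thousands of resources.

```python
import numpy as np
from scipy.optimize import linprog

# Toy single-interval dispatch: minimize total generation cost
# subject to supply meeting a 150 MW load and unit capacity limits.
costs = np.array([20.0, 45.0])      # $/MWh marginal cost of gen A and gen B (illustrative)
capacity = [(0, 100), (0, 120)]     # MW output limits for each generator
load = 150.0                        # MW to serve this interval

# Equality constraint: gen_A + gen_B == load
res = linprog(c=costs, A_eq=[[1.0, 1.0]], b_eq=[load], bounds=capacity, method="highs")

print("dispatch (MW):", res.x)       # cheapest unit at its limit, then the next unit
print("total cost ($/h):", res.fun)
# Dual (shadow price) of the power-balance constraint; sign convention follows SciPy/HiGHS.
print("balance-constraint dual:", res.eqlin.marginals)
```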


Any forward-looking price forecast approximates how each market will clear, which is why any machine learning model should know both where things are on the grid and when various inputs to price formation will change.


Simulating SCED


There is a large difference between predicting the next value in a series (next day’s prices) and predicting the outputs of price formation for any given hour in power markets. Many linear regression or basic time series models can approximate the next value from historical data. Compare today’s day-ahead market clears to tomorrow’s, and you have a very good forecast. Models that predict the output of OPF require more data inputs and more parameters to simulate the complex interactions in optimal power flow.
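As a point of reference, the naive baseline described above is just a persistence forecast, and it takes only a few lines of code. The file and column names below are hypothetical placeholders, not a real data feed.

```python
import pandas as pd

# Persistence baseline: use today's day-ahead clears as the forecast for tomorrow.
# 'da_prices.csv' and its columns (hour_ending, node, da_lmp) are hypothetical.
da = pd.read_csv("da_prices.csv", parse_dates=["hour_ending"])
da = da.sort_values(["node", "hour_ending"])

# Shift each node's price series forward by 24 hours to form the forecast.
da["forecast_lmp"] = da.groupby("node")["da_lmp"].shift(24)

# Mean absolute error of the persistence forecast, per node.
mae = (da["da_lmp"] - da["forecast_lmp"]).abs().groupby(da["node"]).mean()
print(mae.head())
```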


Tomorrow's forecasts (at time of writing) for congestion in ERCOT. Large models can pick up on localized congestion in McAllen, higher prices in Houston, and depressed prices in the Panhandle because they incorporate a layer of the grid into the model, and scaling laws handle complex interactions.

The first of those data inputs is an approximation of the grid itself, including the locations of generators, large load centres, and the transmission lines that move power, along with the probabilities of how that grid can change before the forecast hour.

The second is an approximation of the inputs to SCED, including import-adjusted volumes for the marginal cost of power at each point on the grid (based on solar, wind, and expected thermal variable costs). These inputs are layered on top of the static grid information as time series data and are the dynamic inputs the machine learning models use to predict the next three days of prices.
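One way to picture that layering is a training table that joins static node attributes to hourly time series of forecast drivers. The sketch below uses an entirely hypothetical schema and invented values; the real feature set is far wider.

```python
import pandas as pd

# Static grid layer: one row per node (hypothetical schema and values).
nodes = pd.DataFrame({
    "node": ["HOUSTON_HUB", "MCALLEN_LZ", "PANHANDLE_WND"],
    "zone": ["Houston", "South", "Panhandle"],
    "installed_wind_mw": [0, 150, 2400],
    "installed_solar_mw": [300, 500, 100],
})

# Dynamic layer: hourly forecast drivers per node (hypothetical values).
hourly = pd.DataFrame({
    "node": ["HOUSTON_HUB", "MCALLEN_LZ", "PANHANDLE_WND"],
    "hour_ending": pd.to_datetime(["2024-06-12 15:00"] * 3),
    "load_forecast_mw": [18500, 2100, 900],
    "wind_forecast_mw": [0, 90, 1900],
})

# The model's training table: static attributes repeated for every hour,
# with the dynamic time series layered on top.
features = hourly.merge(nodes, on="node", how="left")
print(features)
```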


Section Recap


  • Price formation in power markets is based on optimal power flow (OPF), implemented through SCED and SCUC to determine real-time and day-ahead prices.

  • Accurate forecasting requires understanding both grid structure and the timing of key inputs.

  • Large models can simulate grid dynamics by layering static infrastructure with dynamic time series inputs.


Bigger Data Is Needed


Any change in load or generation in one part of an ISO’s footprint will impact prices in another, so it follows that a model that can include the entire power grid will do a better job predicting the price at any one point on it. The problem is the size of the data.


Large ISOs have mapped over 100,000 transmission facilities. There are thousands of injection points where many weather variables (wind speed, humidity, direction, temperature) will impact the amount of power produced. Any dataset that includes all these variables, over time, can grow to over 1 TB. In the same way that large language models perform better by training on the entire internet, price forecasts are more accurate by training on every aspect of the power grid.
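A rough back-of-envelope calculation shows how quickly the volume compounds. Every count below is an illustrative assumption rather than any ISO's actual figures.

```python
# Illustrative only: every count here is an assumption, not an ISO's actual figures.
grid_points = 100_000                # transmission facilities and injection points
variables = 4                        # e.g. wind speed, direction, humidity, temperature
intervals_per_year = 365 * 24 * 12   # 5-minute SCED intervals
years = 3
bytes_per_value = 8                  # 64-bit float

total_bytes = grid_points * variables * intervals_per_year * years * bytes_per_value
print(f"{total_bytes / 1e12:.1f} TB")   # ~1.0 TB before any feature engineering
```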


For more information on the relationship between accuracy, dataset size, parameter counts, and compute in other domains, review Scaling Laws for Neural Language Models.

There are trade-offs between dataset size and the granularity of inputs: by clustering transmission outages, aggregating farm-level weather inputs, and being intelligent in training, dataset sizes can be reduced. On the flip side, larger datasets provide more information about how SCED has been solved in the past and generally improve the accuracy of the models, provided the models are optimized properly and are an appropriate size themselves.
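As one hedged example of that kind of aggregation, farm-level wind forecasts could be rolled up to a zone level before training. The zones, farms, and numbers below are hypothetical.

```python
import pandas as pd

# Hypothetical farm-level wind forecast data.
farms = pd.DataFrame({
    "zone": ["Panhandle", "Panhandle", "Coastal", "Coastal"],
    "farm": ["A", "B", "C", "D"],
    "hour_ending": pd.to_datetime(["2024-06-12 15:00"] * 4),
    "wind_forecast_mw": [420.0, 310.0, 150.0, 95.0],
})

# Aggregate farm-level forecasts to zone level: one feature per zone-hour
# instead of one per farm-hour, shrinking the training set considerably.
zone_wind = (
    farms.groupby(["zone", "hour_ending"], as_index=False)["wind_forecast_mw"].sum()
)
print(zone_wind)
```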


Section Recap


  • Forecast accuracy improves when models consider the full power grid, but this leads to extremely large datasets.

  • Especially in large ISOs, data scale is a major computational challenge.

  • Data scale can be managed through intelligent input aggregation and clustering, and larger, well-optimized datasets tend to yield better model performance.



Model Size Matters


In the same way that larger animals typically have larger brains, larger datasets require more parameters to simulate the complex interactions between where power is injected and where it needs to go. A parameter is a value that gets adjusted as the model learns, to minimize the error and make accurate predictions. A common single-variable linear regression model (y = mx + b) that uses load to predict a hub price may be represented by two parameters, m and b.
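As a minimal illustration, fitting load against a hub price with ordinary linear regression recovers exactly those two parameters. The numbers are synthetic and scikit-learn is assumed only for convenience.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic example: hourly load (MW) vs. hub price ($/MWh).
load = np.array([[40_000], [45_000], [50_000], [55_000], [60_000]])
price = np.array([22.0, 26.5, 31.0, 35.5, 40.0])

model = LinearRegression().fit(load, price)
print("m (slope):", model.coef_[0])        # price change per MW of load
print("b (intercept):", model.intercept_)  # extrapolated price at zero load
```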


The balance between dataset size, parameter count, and accuracy is typically determined in a process known as neural architecture search and hyperparameter optimization, or more colloquially as ‘model training’. Large-parameter architectures typically overfit on small datasets, meaning they may perform well in training but disappoint in production. Larger datasets are only useful if the trade-offs are measured over time in training and parameter counts scale appropriately with them.
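A hedged sketch of that trade-off check is below: a held-out validation split and a small grid of model sizes, with synthetic data standing in for the real grid and weather inputs and scikit-learn assumed as the tooling. Training error alone would favour the biggest model; validation error is what exposes overfitting.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for (features, prices); real inputs would be grid + weather series.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 20))
y = X[:, 0] * 3.0 + X[:, 1] ** 2 + rng.normal(scale=0.5, size=2_000)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# Compare small, medium, and large architectures on the same data.
for hidden in [(8,), (64, 64), (256, 256, 256)]:
    model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2_000, random_state=0)
    model.fit(X_train, y_train)
    train_mae = mean_absolute_error(y_train, model.predict(X_train))
    val_mae = mean_absolute_error(y_val, model.predict(X_val))
    print(hidden, f"train MAE={train_mae:.2f}", f"val MAE={val_mae:.2f}")
```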


It often makes sense to start a machine learning journey with a few nodes, a small dataset, and a low parameter count on a laptop. Software and infrastructure constraints will quickly limit how much larger the datasets can grow, but the accuracy improvements from scaling up can make the investment worthwhile.


Conclusion


Scaling laws apply just as powerfully in power markets forecasting as they do in other domains of machine learning. As models grow in complexity and datasets grow in scope, forecast accuracy improves—provided both are scaled intelligently. Although starting small is pragmatic, the greatest returns come from capturing the full complexity of the grid: from the topology of transmission to the probabilistic behavior of generation and demand. Investing in the infrastructure to support larger datasets and more sophisticated models isn’t just a technical choice—it’s a competitive advantage for anyone exposed to nodal price risk.


Enertel AI provides short-term energy price forecasts and bidding benchmarks for utilities, independent power producers (IPPs), and asset developers. You can view our data catalog, book a demo, or reach out to learn more.

 
 
 
