The Challenge

A mid-market renewable energy operator managing a portfolio of 14 solar farms and 8 wind installations across three states was struggling with demand forecasting. Their legacy approach — a combination of ARIMA models and manual adjustments by grid operators — consistently missed the mark. Average forecast error ran at 18% on a day-ahead basis, leading to chronic overproduction during low-demand periods and energy shortfalls during peaks.

The financial impact was severe. Overproduction meant selling surplus energy on spot markets at steep discounts, sometimes at negative prices during grid saturation. Underproduction forced the operator to purchase supplemental power from fossil fuel generators at premium rates to meet contractual obligations. Between curtailment losses, spot market penalties, and supplemental procurement costs, inaccurate forecasting was draining roughly $19M annually.

The core problem was that the legacy statistical models treated demand as a univariate time series. They captured daily and seasonal periodicity reasonably well but couldn't account for the complex interactions between weather patterns, economic activity, special events, and grid-level dynamics. A cold snap driving heating demand, a major sporting event shifting evening load curves, a heat wave triggering industrial cooling — these were all blind spots. The models also operated at a portfolio level, missing the regional variation between grid zones that drove materially different demand patterns.

Our Approach

Weeks 1-2: Data Integration & Feature Design

The first challenge was assembling a coherent dataset from fragmented sources. Demand data lived in the operator's SCADA systems. Weather data came from NOAA APIs and commercial providers. Economic indicators were scattered across Census Bureau and Bureau of Labor Statistics feeds. Event calendars — sports, concerts, school schedules, holidays — had to be compiled from multiple public sources and normalized.

We built an automated data pipeline using Apache Airflow that ingested, cleaned, and aligned data from 11 distinct sources on an hourly cadence. Feature engineering centered on the demand drivers identified earlier: weather variables, calendar and event indicators, economic activity measures, and zone-level grid signals.
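The Airflow DAG itself is operator-specific, but the core step it orchestrates, joining feeds of differing cadence onto one shared hourly index and forward-filling short gaps, can be sketched in plain Python. The function and sample feeds below are illustrative, not the production pipeline:

```python
from datetime import datetime, timedelta

def align_hourly(sources, start, hours):
    """Align several {timestamp: value} feeds onto one hourly index.

    Gaps in any feed are forward-filled from the last observed value
    (None until a first observation arrives).
    """
    index = [start + timedelta(hours=h) for h in range(hours)]
    aligned = {name: [] for name in sources}
    last = {name: None for name in sources}
    for ts in index:
        for name, feed in sources.items():
            if ts in feed:
                last[name] = feed[ts]         # fresh observation
            aligned[name].append(last[name])  # otherwise carry forward
    return index, aligned

t0 = datetime(2024, 1, 1)
demand = {t0: 410.0, t0 + timedelta(hours=2): 395.0}  # hour 1 missing
temp = {t0: 3.5, t0 + timedelta(hours=1): 3.1, t0 + timedelta(hours=2): 2.8}
idx, cols = align_hourly({"demand": demand, "temp": temp}, t0, 3)
# cols["demand"] -> [410.0, 410.0, 395.0]; the hour-1 gap is forward-filled
```

In production the same idea extends to staleness limits and interpolation per source, but the join-then-fill shape is the backbone of any multi-feed hourly dataset.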

Weeks 3-5: Model Development & Ensemble Architecture

We built a two-layer ensemble model. The first layer combined three complementary forecasting approaches: Facebook Prophet for capturing trend and seasonality with automatic changepoint detection, LightGBM for learning complex non-linear interactions between weather, calendar, and demand features, and a temporal convolutional network (TCN) for sequence pattern recognition in the raw demand signal.

Each model was trained at the regional grid zone level — 6 zones total — rather than the portfolio level. This was critical because demand patterns in a coastal metropolitan zone behaved fundamentally differently from an inland agricultural zone. We trained on 3 years of historical data, with the most recent 6 months held out for validation.
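The split itself is strictly chronological, since shuffling would leak future information into training. A minimal sketch of the per-zone holdout logic (a hypothetical helper, not the project code):

```python
def time_split(rows, holdout_len):
    """Chronological split: the most recent holdout_len rows become the
    validation set; all earlier rows train. Never shuffled."""
    rows = sorted(rows, key=lambda r: r["ts"])
    return rows[:-holdout_len], rows[-holdout_len:]

# One split per grid zone; in the project this was ~2.5 years of training
# data with the final 6 months held out. Toy data: 36 "months", 6 held out.
zones = {"coastal_metro": [{"ts": t, "mw": 400 + t} for t in range(36)]}
splits = {z: time_split(rows, 6) for z, rows in zones.items()}
train, val = splits["coastal_metro"]
# train covers ts 0..29, val covers ts 30..35
```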

The second layer was a meta-learner (a ridge regression) that combined the three base model outputs, dynamically weighting them based on recent performance. During stable weather periods, Prophet tended to dominate; during weather transitions and anomalous events, LightGBM contributed more; during rapid demand ramps, the TCN added value. The meta-learner learned these regimes automatically.
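As an illustration of the second layer, a closed-form ridge fit over the base-model outputs can be written in a few lines. The solver below uses the normal equations (XᵀX + λI)w = Xᵀy with Gaussian elimination; it shows the idea only, since the production meta-learner additionally reweighted by recent performance:

```python
def ridge_weights(preds, actual, lam=1.0):
    """Fit w = (X^T X + lam*I)^-1 X^T y for a small number of columns.

    preds: one row per hour, one column per base model
    (e.g. [prophet, lightgbm, tcn]); actual: observed demand.
    """
    k = len(preds[0])
    # Build the normal equations A w = b, A = X^T X + lam*I, b = X^T y
    A = [[lam * (i == j) for j in range(k)] for i in range(k)]
    b = [0.0] * k
    for row, y in zip(preds, actual):
        for i in range(k):
            b[i] += row[i] * y
            for j in range(k):
                A[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    # Back-substitution
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return w

# Toy example: the truth is an equal blend of two base models
preds = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 8.0]]
actual = [2.0, 3.0, 4.0, 6.0]
w = ridge_weights(preds, actual, lam=1e-6)  # w is close to [0.5, 0.5]
```

Refitting the weights on a sliding recent window is one simple way to get the regime-dependent behavior described above: whichever base model has been accurate lately earns a larger coefficient.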

Weeks 6-7: Backtesting & Forecast Extension

We ran an extensive walk-forward backtest over the 6-month validation period, simulating production conditions — the model only saw data that would have been available at forecast time. We measured MAPE (mean absolute percentage error), RMSE (root mean squared error), and a custom cost metric that weighted overproduction and underproduction errors differently based on their actual financial impact.
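The three metrics are straightforward to state precisely; the asymmetry in the cost metric is the important part. The per-MWh rates below are placeholders, not the operator's actual figures:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, skipping zero-demand hours."""
    terms = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(terms) / len(terms)

def rmse(actual, forecast):
    """Root mean squared error."""
    n = len(actual)
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n) ** 0.5

def forecast_cost(actual, forecast, over_rate=1.0, under_rate=2.5):
    """Asymmetric cost: overproduction sells surplus at a discount, while
    underproduction buys supplemental power at a premium, so a missed MWh
    on the low side costs more (rates here are illustrative)."""
    cost = 0.0
    for a, f in zip(actual, forecast):
        if f > a:
            cost += (f - a) * over_rate   # surplus dumped on spot market
        else:
            cost += (a - f) * under_rate  # premium supplemental purchase
    return cost

# mape([100, 100], [110, 90]) -> 10.0
# forecast_cost([100, 100], [110, 90]) -> 10*1.0 + 10*2.5 == 35.0
```

Ranking candidate models by `forecast_cost` rather than raw MAPE is what keeps the backtest aligned with the dollars at stake, since two models with identical MAPE can have very different cost profiles.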

The ensemble achieved a MAPE of 13.0% on 24-hour-ahead forecasts (down from 18% with the legacy models). But the bigger win was extending the forecast horizon. We built a 72-hour-ahead forecast capability that still maintained 14.8% MAPE — better than the old system's 24-hour accuracy. This gave the operations team two additional days of visibility for scheduling, trading, and grid coordination.

Week 8: Deployment, Automation & Handoff

We deployed the forecasting system with automated weekly retraining triggered by Airflow. Each Monday, the model retrained on the latest data, ran a performance comparison against the prior model version, and promoted the new model only if it outperformed on a rolling 30-day window. This prevented model degradation from data distribution shifts.
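The promotion gate reduces to a single comparison over the rolling window. A sketch, assuming hourly absolute-percentage-error series for both the candidate and the incumbent model (names and structure are hypothetical):

```python
def should_promote(candidate_ape, incumbent_ape, window_hours=720):
    """Promote the retrained model only if its MAPE over the trailing
    window (720 hours ~= 30 days) beats the incumbent's on the same hours.

    Both inputs are per-hour absolute percentage errors, most recent last.
    """
    cand = candidate_ape[-window_hours:]
    incb = incumbent_ape[-window_hours:]
    return sum(cand) / len(cand) < sum(incb) / len(incb)

# should_promote([10.0, 8.0], [12.0, 11.0], window_hours=2) -> True
```

A strict less-than keeps the incumbent on ties, which biases the system toward stability: a retrain is rolled out only when it demonstrably helps.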

We built a Grafana dashboard showing forecast vs. actual demand by zone, model confidence intervals, feature importance breakdowns, and financial impact estimates. Grid operators could see at a glance where the forecast was confident and where uncertainty was high, adjusting their operational posture accordingly.

Tech Stack

Python · Prophet · LightGBM · Apache Airflow · Snowflake · AWS · Grafana · PostgreSQL

Results

Over the first two quarters of production deployment:

"Our old forecasting approach was basically educated guesswork. Arkyon gave us a system that actually understands the variables driving demand — weather, events, economics, grid dynamics — and integrates them at a level of granularity we never had. The 72-hour forecast horizon alone changed how we operate."

A.R. — Director of Grid Operations, Renewable Energy Operator

What Made This Work