The Challenge
A mid-market renewable energy operator managing a portfolio of 14 solar farms and 8 wind installations across three states was struggling with demand forecasting. Their legacy approach — a combination of ARIMA models and manual adjustments by grid operators — consistently missed the mark. Average forecast error ran at 18% on a day-ahead basis, leading to chronic overproduction during low-demand periods and energy shortfalls during peaks.
The financial impact was severe. Overproduction meant selling surplus energy on spot markets at steep discounts, sometimes at negative prices during grid saturation. Underproduction forced the operator to purchase supplemental power from fossil fuel generators at premium rates to meet contractual obligations. Between curtailment losses, spot market penalties, and supplemental procurement costs, inaccurate forecasting was draining roughly $19M annually.
The core problem was that the legacy statistical models treated demand as a univariate time series. They captured daily and seasonal periodicity reasonably well but couldn't account for the complex interactions between weather patterns, economic activity, special events, and grid-level dynamics. A cold snap driving heating demand, a major sporting event shifting evening load curves, a heat wave triggering industrial cooling — these were all blind spots. The models also operated at a portfolio level, missing the regional variation between grid zones that drove materially different demand patterns.
Our Approach
Weeks 1-2: Data Integration & Feature Design
The first challenge was assembling a coherent dataset from fragmented sources. Demand data lived in the operator's SCADA systems. Weather data came from NOAA APIs and commercial providers. Economic indicators were scattered across Census Bureau and Bureau of Labor Statistics feeds. Event calendars — sports, concerts, school schedules, holidays — had to be compiled from multiple public sources and normalized.
We built an automated data pipeline using Apache Airflow that ingested, cleaned, and aligned data from 11 distinct sources on an hourly cadence. Key feature engineering decisions included:
- Weather encoding that captured not just current conditions but forecast uncertainty — a 50% chance of extreme heat drives different preparation than a 95% chance
- Calendar features that encoded day type (workday, weekend, holiday), time of year, and proximity to major events, using embeddings rather than one-hot encoding to capture similarity between event types
- Lagged demand features at multiple granularities — same hour yesterday, same hour last week, same hour last year — to capture autocorrelation at different time scales
- Grid topology features that modeled interconnection constraints between zones, so the model understood when excess generation in one region could or couldn't offset deficit in another
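As a rough sketch of the lagged-demand and calendar features described above, the following assumes an hourly demand series with a `load_mw` column; the column name, lag choices, and cyclical encoding are illustrative, not the client's exact pipeline:

```python
import numpy as np
import pandas as pd

def build_features(demand: pd.DataFrame) -> pd.DataFrame:
    """Derive lagged-demand and calendar features from hourly demand.

    `demand` is assumed to have an hourly DatetimeIndex and a 'load_mw'
    column (illustrative names). Rows without a full set of lags are dropped.
    """
    df = demand.copy()

    # Lagged demand at multiple granularities: same hour yesterday,
    # same hour last week, same hour last year (8760 hours).
    for name, hours in {"lag_1d": 24, "lag_1w": 168, "lag_1y": 8760}.items():
        df[name] = df["load_mw"].shift(hours)

    # Calendar features: day type plus a cyclical time-of-year encoding,
    # so December 31 and January 1 land near each other in feature space.
    df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)
    doy = df.index.dayofyear
    df["doy_sin"] = np.sin(2 * np.pi * doy / 365.25)
    df["doy_cos"] = np.cos(2 * np.pi * doy / 365.25)

    return df.dropna()
```

The same pattern extends naturally to the weather, event-proximity, and grid-topology features, which need external data sources beyond the demand series itself.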
Weeks 3-5: Model Development & Ensemble Architecture
We built a two-layer ensemble model. The first layer combined three complementary forecasting approaches: Facebook Prophet for capturing trend and seasonality with automatic changepoint detection, LightGBM for learning complex non-linear interactions between weather, calendar, and demand features, and a temporal convolutional network (TCN) for sequence pattern recognition in the raw demand signal.
Each model was trained at the regional grid zone level — 6 zones total — rather than the portfolio level. This was critical because demand patterns in a coastal metropolitan zone behaved fundamentally differently from an inland agricultural zone. We trained on 3 years of historical data, with the most recent 6 months held out for validation.
The second layer was a meta-learner (a ridge regression) that combined the three base model outputs, dynamically weighting them based on recent performance. During stable weather periods, Prophet tended to dominate; during weather transitions and anomalous events, LightGBM contributed more; during rapid demand ramps, the TCN added value. The meta-learner learned these regimes automatically.
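The stacking step can be sketched as follows. Synthetic forecasts stand in for the Prophet, LightGBM, and TCN outputs (in production these come from the trained first layer), and a scikit-learn `Ridge` plays the meta-learner; the noise levels and alpha are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Stand-in base forecasts for the same target hours. Columns mimic the
# three first-layer models with different error characteristics.
rng = np.random.default_rng(0)
actual = 1000 + 200 * np.sin(np.linspace(0, 20, 500))  # hourly demand, MW
base_preds = np.column_stack([
    actual + rng.normal(0, 30, 500),  # "Prophet"-like forecast
    actual + rng.normal(0, 20, 500),  # "LightGBM"-like forecast
    actual + rng.normal(0, 40, 500),  # "TCN"-like forecast
])

# Second-layer meta-learner: regress actual demand on the stacked
# base forecasts. Refitting on a rolling recent window is what makes
# the weighting adapt to the current regime.
meta = Ridge(alpha=1.0)
meta.fit(base_preds, actual)
blended = meta.predict(base_preds)

def mae(pred):
    return float(np.mean(np.abs(pred - actual)))
```

Because the base models' errors are only partially correlated, the blend tends to beat any single base model, and the learned coefficients shift toward whichever model has been most accurate on the fitting window.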
Weeks 6-7: Backtesting & Forecast Extension
We ran an extensive walk-forward backtest over the 6-month validation period, simulating production conditions — the model only saw data that would have been available at forecast time. We measured MAPE (mean absolute percentage error), RMSE, and a custom cost metric that weighted overproduction and underproduction errors differently based on their actual financial impact.
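The custom cost metric can be sketched as an asymmetric penalty: over-forecast hours (surplus generation sold at spot-market discounts) and under-forecast hours (supplemental power bought at premium rates) are weighted differently. The per-MWh weights below are placeholders, not the client's actual figures:

```python
import numpy as np

def forecast_cost(forecast: np.ndarray, actual: np.ndarray,
                  over_cost: float = 1.0, under_cost: float = 2.5) -> float:
    """Mean per-hour cost of forecast error, in arbitrary $/MWh units.

    Over-forecasting demand leads to overproduction (sold at a discount);
    under-forecasting forces premium supplemental purchases, so it is
    penalized more heavily here. Weights are illustrative.
    """
    err = forecast - actual
    over = np.clip(err, 0, None)    # forecast above actual demand
    under = np.clip(-err, 0, None)  # forecast below actual demand
    return float(np.mean(over_cost * over + under_cost * under))
```

Optimizing and model-selecting against this metric, rather than symmetric MAPE or RMSE alone, biases the system toward the errors that are cheapest to make.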
The ensemble achieved a MAPE of 13.0% on 24-hour-ahead forecasts (down from 18% with the legacy models). But the bigger win was extending the forecast horizon. We built a 72-hour-ahead forecast capability that still maintained 14.8% MAPE — better than the old system's 24-hour accuracy. This gave the operations team two additional days of visibility for scheduling, trading, and grid coordination.
Week 8: Deployment, Automation & Handoff
We deployed the forecasting system with automated weekly retraining triggered by Airflow. Each Monday, the model retrained on the latest data, ran a performance comparison against the prior model version, and promoted the new model only if it outperformed on a rolling 30-day window. This prevented model degradation from data distribution shifts.
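The promotion gate at the end of each retraining run reduces to a simple comparison on the shared evaluation window. A minimal sketch, where the optional safety margin is an assumption not stated in the case study:

```python
import numpy as np

def rolling_mape(forecast: np.ndarray, actual: np.ndarray) -> float:
    """MAPE (%) over the evaluation window, e.g. the rolling 30 days."""
    return float(np.mean(np.abs((forecast - actual) / actual)) * 100)

def should_promote(candidate_fc: np.ndarray, incumbent_fc: np.ndarray,
                   actual: np.ndarray, margin_pp: float = 0.0) -> bool:
    """Promote the freshly retrained model only if it beats the serving
    model on the same rolling window; `margin_pp` adds an optional
    safety margin in percentage points (illustrative knob)."""
    return rolling_mape(candidate_fc, actual) + margin_pp < rolling_mape(incumbent_fc, actual)
```

The key design choice is that both models are scored on the identical held-out window, so a retrain on noisy or anomalous data simply fails the gate and the incumbent keeps serving.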
We built a Grafana dashboard showing forecast vs. actual demand by zone, model confidence intervals, feature importance breakdowns, and financial impact estimates. Grid operators could see at a glance where the forecast was confident and where uncertainty was high, adjusting their operational posture accordingly.
Tech Stack
- Apache Airflow — data ingestion and automated retraining orchestration
- Facebook Prophet, LightGBM, and a temporal convolutional network — base forecasting models
- Ridge regression — meta-learner combining the base model outputs
- Grafana — monitoring dashboard for forecast vs. actual, confidence, and financial impact
Results
Over the first two quarters of production deployment:
- 28% improvement in forecast accuracy — day-ahead MAPE dropped from 18% to 13%, with zone-level granularity revealing even stronger gains in high-variability regions
- $12M in optimized energy distribution — reduced spot market losses from overproduction, lower supplemental procurement costs during underproduction, and better positioning for day-ahead energy trading
- 15% reduction in energy waste — more accurate forecasts meant generation schedules aligned more closely with actual demand, reducing curtailment across solar and wind assets
- Forecast horizon extended from 24 to 72 hours — giving operations teams three days of actionable visibility instead of one, enabling proactive grid coordination and trading strategy
- Automated retraining maintained accuracy without manual intervention, with the model adapting to seasonal transitions and shifting demand patterns automatically
"Our old forecasting approach was basically educated guesswork. Arkyon gave us a system that actually understands the variables driving demand — weather, events, economics, grid dynamics — and integrates them at a level of granularity we never had. The 72-hour forecast horizon alone changed how we operate."
A.R. — Director of Grid Operations, Renewable Energy Operator
What Made This Work
- Multivariate approach — incorporating weather, events, economic indicators, and grid topology as first-class features let the model capture demand drivers that univariate methods completely miss
- Zone-level granularity — training separate models per grid zone rather than at the portfolio level captured regional demand patterns that aggregate models averaged away
- Ensemble with dynamic weighting — the meta-learner automatically adapted to which base model performed best under different conditions, delivering robust accuracy across weather regimes
- Automated retraining with guardrails — weekly retraining kept the model current, while the performance gate prevented degradation from noisy or anomalous training data