Most RFPs for load forecasting platforms focus heavily on benchmark MAPE and model architecture descriptions. Neither of these reliably predicts operational value. The questions that do predict value are more granular and harder for vendors to answer with marketing copy — which is exactly why they tend to be omitted from standard procurement processes.
Section 1: Data Integration and SCADA Architecture
The first section of any forecasting vendor evaluation should focus on how the system accesses and processes operational data. Forecast accuracy is bounded by data quality — a technically superior model trained on poor data will underperform a simpler model trained on validated data. Key questions:
- What data access architecture does the system use? Does it require direct RTU or SCADA historian access, or is it designed to operate from an IT-side data copy received through a one-way data diode? Systems requiring direct SCADA access complicate NERC CIP compliance and typically require longer procurement and security review cycles.
- What data quality validation does the system perform on incoming intervals? Ask specifically about: stuck register detection, step change detection, timestamp misalignment handling, and out-of-range value filtering. Vendors who respond with "we clean the data" without describing specific validation logic haven't implemented systematic validation.
- What protocols does it support for data ingestion? DNP3, Modbus, IEC 61968 CIM, REST, and SFTP bulk upload are the common paths. A system that supports only REST and SFTP cannot provide true real-time updates from RTU telemetry — it's operating on batch data, regardless of what the marketing materials say about "real-time forecasting."
- What is the minimum polling latency from SCADA to forecast update? Ask for the specific latency from the SCADA measurement timestamp to the updated forecast being available via API. Anything above 5 minutes is not genuinely short-interval forecasting.
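The validation checks named above can be sketched as a simple interval screen. This is an illustrative Python sketch, not any vendor's actual pipeline; the thresholds (stuck-run length, step limit, MW range, expected interval spacing) are assumptions you would tune per feeder and metering class.

```python
from datetime import datetime, timedelta

# Illustrative thresholds -- tune these per feeder; they are not standards.
STUCK_RUN = 6            # identical consecutive readings before flagging a stuck register
STEP_LIMIT_MW = 50.0     # largest credible interval-to-interval change
MW_RANGE = (0.0, 500.0)  # plausible loading range for the monitored element
EXPECTED_STEP = timedelta(minutes=5)

def validate_intervals(points):
    """points: list of (timestamp, mw) tuples in time order.
    Returns a list of (index, issue) flags covering the four checks:
    out-of-range, timestamp misalignment, step change, stuck register."""
    flags = []
    run = 1
    for i, (ts, mw) in enumerate(points):
        if not (MW_RANGE[0] <= mw <= MW_RANGE[1]):
            flags.append((i, "out_of_range"))
        if i > 0:
            prev_ts, prev_mw = points[i - 1]
            if ts - prev_ts != EXPECTED_STEP:
                flags.append((i, "timestamp_misaligned"))
            if abs(mw - prev_mw) > STEP_LIMIT_MW:
                flags.append((i, "step_change"))
            # Count the run of identical readings; flag once when it hits the limit.
            run = run + 1 if mw == prev_mw else 1
            if run == STUCK_RUN:
                flags.append((i, "stuck_register"))
    return flags
```

A vendor who has implemented systematic validation should be able to describe logic at roughly this level of specificity, including how flagged intervals are imputed or excluded from training.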
Section 2: Forecast Output Specifications
Forecast accuracy claims in vendor presentations are frequently based on favorable benchmark conditions. The technical specification of forecast outputs reveals whether those conditions resemble your operating environment:
- Point forecast or probabilistic? Point forecasts (single MW value per interval) are less operationally useful than probabilistic forecasts with calibrated confidence intervals. For demand-response dispatch decisions, the 80th and 95th percentile bands are more relevant than the median. Ask whether confidence intervals are calibrated on out-of-sample data or derived from model assumptions.
- What is the forecast horizon and resolution? A system that publishes 72-hour-ahead forecasts at hourly resolution is a day-ahead planning tool. A system with 15-minute resolution and updates every polling cycle is a real-time dispatch tool. Be specific about which you need.
- How does the model handle forecast horizon degradation? All forecasting models become less accurate as the horizon extends. Vendors should provide accuracy-by-horizon tables showing MAPE at 15 minutes, 1 hour, 4 hours, 12 hours, and 24 hours ahead. A vendor who reports only 24-hour MAPE without horizon-specific breakdown is obscuring where the model performs poorly.
- What is the model recalibration cadence? Load profiles change over time (new industrial loads, building efficiency upgrades, EV adoption). A model trained once and not updated will drift. Ask specifically how frequently the model is retrained, what triggers a retrain, and who initiates it.
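Two of the artifacts requested above are easy to compute yourself from pilot output, which is a useful check on vendor-supplied numbers. A minimal sketch, assuming you hold (horizon, actual, forecast) triples and (actual, lower, upper) interval triples; the function names are hypothetical.

```python
from collections import defaultdict

def mape_by_horizon(records):
    """records: iterable of (horizon_label, actual_mw, forecast_mw) triples.
    Returns {horizon_label: MAPE %} -- the per-horizon accuracy breakdown
    a vendor should provide rather than a single 24-hour number."""
    errs = defaultdict(list)
    for horizon, actual, forecast in records:
        if actual != 0:  # skip zero-load intervals to avoid division by zero
            errs[horizon].append(abs(actual - forecast) / abs(actual))
    return {h: 100.0 * sum(v) / len(v) for h, v in errs.items()}

def interval_coverage(records):
    """records: iterable of (actual_mw, lower, upper) triples.
    Empirical coverage of the band; for a calibrated 80% interval this
    should land near 0.80 on out-of-sample data, not just by construction."""
    hits = [lower <= actual <= upper for actual, lower, upper in records]
    return sum(hits) / len(hits)
```

If a vendor's 80% band covers 95% of out-of-sample actuals, the intervals are too wide to be useful for dispatch decisions; if it covers 60%, they are overconfident.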
Section 3: Demand-Response Integration
If you operate a demand-response program, the integration between the forecasting system and the DR dispatch engine is a critical evaluation area that is frequently given insufficient attention:
- Does the system trigger DR dispatch from forecast exceedance or from actual threshold crossing? Forecast-triggered dispatch provides the 15–30 minute lead time that HVAC assets require to respond before the load event materializes. Actual-threshold dispatch means you're responding to a problem that already exists.
- Does the dispatch engine support priority-based asset sequencing? Simultaneous dispatch to all enrolled assets produces rebound effects and obscures individual asset performance. Priority-based sequencing with configurable weights for reliability score, recovery time, and cost is the operational standard. Ask whether this is configurable or hard-coded.
- Does it support real-time telemetry feedback during curtailment events? The system should be able to ingest interval-level metering data from enrolled assets during an event and determine whether backup dispatch is needed when assets fail to respond, not after the event in settlement review.
- How does it handle multi-event operating days? DR dispatch logic must track asset recovery state across the full operating day, not just within individual events. Ask for a description of the asset state model and how recovery windows are tracked between dispatch signals.
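Priority-based sequencing with configurable weights can be sketched as follows. The asset fields, weight values, and scoring formula are illustrative assumptions, not any vendor's schema; a real dispatch engine would also handle rebound effects, telemetry feedback, and backup dispatch.

```python
from dataclasses import dataclass

# Hypothetical asset model for illustration only.
@dataclass
class Asset:
    name: str
    curtailable_mw: float
    reliability: float       # 0..1 historical response rate
    recovery_min: float      # minutes required between dispatches
    cost_per_mwh: float
    in_recovery: bool = False  # tracked across the full operating day

def dispatch_plan(assets, target_mw, w_rel=0.5, w_rec=0.3, w_cost=0.2):
    """Rank eligible assets by a configurable weighted score (higher
    reliability is better; longer recovery and higher cost are worse),
    then select in order until the curtailment target is covered."""
    def score(a):
        return (w_rel * a.reliability
                - w_rec * a.recovery_min / 60.0
                - w_cost * a.cost_per_mwh / 100.0)
    eligible = [a for a in assets if not a.in_recovery]
    selected, covered = [], 0.0
    for a in sorted(eligible, key=score, reverse=True):
        if covered >= target_mw:
            break
        selected.append(a.name)
        covered += a.curtailable_mw
    return selected, covered
```

The point of asking whether this is configurable is the weight parameters: a utility prioritizing settlement cost and one prioritizing event reliability should be able to express that difference without a vendor code change.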
Section 4: NERC Compliance Reporting
For utilities operating as balancing authorities, compliance reporting capability is not a secondary feature — it's an operational requirement:
- Does the system generate BAL-001-3 CPS1 and CPS2 calculations automatically? These calculations require 1-minute ACE data, frequency measurements, and frequency bias settings. A system that computes them directly from ingested SCADA telemetry is more reliable than one that requires manual data entry.
- What export formats does the compliance reporting support? Most Regional Entities accept specific report formats. Verify that the system exports in a format compatible with your specific RE's submission requirements, not just a generic CSV.
- How far back does the interval log extend? NERC compliance data retention requirements span 3 years. Verify that the system's interval logs are retained for the full required period and that historical data is accessible for audit purposes, not just recent data.
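For reference, the CPS1 calculation the system should automate looks roughly like this. This is a simplified sketch of the published formula, assuming clock-minute averages of ACE and frequency deviation are already computed; a production implementation must follow the standard's exact averaging, data-handling, and reporting rules, and the example ε1 value is illustrative.

```python
def cps1_percent(minute_data, bias_mw_per_0p1hz, eps1_hz):
    """Simplified CPS1 sketch per the BAL-001 formula:
        CF_minute = (ACE / (-10 * B)) * dF
        CPS1      = (2 - mean(CF_minute) / eps1**2) * 100
    minute_data: iterable of (ace_mw, delta_freq_hz) clock-minute averages.
    B is the frequency bias setting (MW per 0.1 Hz, negative by convention);
    eps1 is the interconnection's targeted frequency bound in Hz."""
    cfs = [(ace / (-10.0 * bias_mw_per_0p1hz)) * df for ace, df in minute_data]
    cf1 = (sum(cfs) / len(cfs)) / (eps1_hz ** 2)
    return (2.0 - cf1) * 100.0
```

The practical point: every term here traces to 1-minute SCADA-derived data. If the vendor's compliance module requires operators to key any of these inputs in by hand, the automation claim is hollow.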
Section 5: Security and Infrastructure
For systems that interface with operational technology environments, security architecture is a vendor qualification criterion before technical capability becomes relevant:
- What is the system's NERC CIP classification approach? A cloud-connected platform that processes SCADA telemetry needs a defined answer to where the CIP Electronic Security Perimeter boundary sits. "We're not subject to CIP" is not an acceptable answer if the system receives real-time telemetry from within the ESP.
- What encryption standards apply to data in transit and at rest? TLS 1.2 or higher for data in transit and AES-256 for data at rest are the current baseline requirements for any system handling utility operational data.
- What is the system's availability SLA? A forecasting system with scheduled maintenance windows during peak demand periods is not operationally acceptable. The SLA should specify availability requirements during high-demand operating conditions explicitly, not just annual uptime percentages.
Section 6: Pilot Program and Evaluation Methodology
Reputable forecasting vendors should be willing to run a structured pilot using your historical data before a purchase commitment. The pilot should include:
- a back-test of the model against 12 months of held-out historical data, with accuracy statistics broken down by season, hour-of-day, and day-type;
- an analysis of how the model's accuracy compares to your current forecasting baseline; and
- a settlement cost analysis estimating the reduction in imbalance charges achievable with the demonstrated accuracy improvement.
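The seasonal and day-type breakdown is straightforward to compute from the held-out data yourself, which makes it a useful cross-check on the vendor's pilot report. A minimal sketch; the grouping keys are illustrative, and hour-of-day grouping follows the same pattern.

```python
from datetime import datetime
from collections import defaultdict

def backtest_breakdown(records):
    """records: iterable of (timestamp, actual_mw, forecast_mw) from the
    held-out year. Returns MAPE % grouped by (season, weekday/weekend)."""
    SEASON = {12: "winter", 1: "winter", 2: "winter",
              3: "spring", 4: "spring", 5: "spring",
              6: "summer", 7: "summer", 8: "summer",
              9: "fall", 10: "fall", 11: "fall"}
    groups = defaultdict(list)
    for ts, actual, forecast in records:
        if actual == 0:  # skip zero-load intervals
            continue
        key = (SEASON[ts.month], "weekend" if ts.weekday() >= 5 else "weekday")
        groups[key].append(abs(actual - forecast) / abs(actual))
    return {k: 100.0 * sum(v) / len(v) for k, v in groups.items()}
```

A model that looks strong on annual MAPE but degrades sharply on summer weekday afternoons is failing exactly when imbalance charges are highest, and this kind of breakdown is what surfaces that.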
Vendors unwilling to provide a structured pilot with defined success metrics are signaling that their system does not perform well on actual utility data from systems similar to yours. Marketing benchmarks derived from academic datasets or dissimilar utility systems are not a substitute for a pilot on your own interval data.
The checklist above reflects the questions that distinguish systems that reduce balancing costs in practice from those that produce impressive slide presentations. As we discuss in our analysis of ISO/RTO imbalance settlement costs, the operational value of a forecasting system is ultimately measured in your settlement statement, not in benchmark tables.