
Project I

Prediction Model for Natural Gas Price

Background: Disney needed an accurate natural gas price forecast model to support financial planning and net income calculations that depend on utility costs.

Achievement:

  • A 12-month linear regression forecast model built on six key drivers, with a 10.26% error rate (MAPE) and $2.5M in potential savings under the 2024 market scenario

  • Three time series models covering 1- to 12-month prediction horizons as benchmarks

  • Additional linear regression models with 3- and 6-month prediction horizons to support a flexible hedging strategy

SO WHAT:

  •  Cost Reduction

Under the 2024 price movement scenario, the model-based procurement approach significantly outperformed the existing method, generating $2.5M in savings compared to futures-based procurement.

  •  Short-term Tactical Decisions

The model enables precise budget planning and optimal procurement timing by providing accurate near-term price forecasts, allowing Disney to capitalize on favorable market conditions and adjust business operations ahead of anticipated price peaks.

  •  Long-term Strategic Planning

Six key price drivers support robust risk management frameworks and inform long-term contract structuring, enabling Disney to build sustainable hedging strategies against market volatility.

Data Exploration

Identify and retrieve data from sources such as EIA and CME, conduct data quality assessments and exploratory analysis, and then apply feature engineering and transformations to enhance correlation with target variables, resulting in a modeling-ready dataset.


1. Data Collection (Representative Example)

  • Fundamental Data:  Heating degree days (HDD) and net natural gas imports, sourced from the Energy Information Administration (EIA)

  • Financial Data:  West Texas Intermediate (WTI) crude oil prices and natural gas futures, sourced from Yahoo Finance and Chicago Mercantile Exchange (CME)

  • Economic Data:  Gross Domestic Product (GDP) and interest rates, sourced from Federal Reserve Economic Data (FRED)

  • Statistical Data:  12-month lagged standard deviation of Henry Hub prices 

2. Data Preparation

  • Data Cleansing: Handled missing values through mean imputation and standardized data formats across sources

  • Initial Data Transformation: Lagged predictor variables to match prediction horizons (e.g., natural gas production at lag 12 as a predictor of 12-month forward prices)
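
As a rough illustration of the two preparation steps above, the sketch below fills missing values with the series mean and pairs each feature value with the target observed `horizon` months later. The function names (`impute_mean`, `align_lagged`) and the sample numbers are hypothetical and not taken from the project; a horizon of 2 is used only to keep the example small.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def align_lagged(feature, target, horizon):
    """Pair the feature at month t with the target at month t + horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return list(zip(feature[:-horizon], target[horizon:]))

# Hypothetical monthly values, for illustration only
production = [100.0, None, 104.0, 106.0, None, 110.0]
price = [2.5, 2.7, 2.6, 3.0, 3.2, 3.1]

production_filled = impute_mean(production)
pairs = align_lagged(production_filled, price, horizon=2)
```

For a 12-month model, the same pairing with `horizon=12` would line up each month's feature value with the price one year ahead.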

3. Exploratory Analysis

  • Relationship Visualization:  Visualized correlations between features and the target variable

  • Feature Engineering:  Screened features with correlation analysis and variance inflation factors (VIF) to narrow down the pool of features, and applied year-over-year transformations to better capture long-term price trends.
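
To make the VIF screening and year-over-year transformation concrete, here is a minimal sketch assuming NumPy. It relies on the standard identity that the VIF of each feature equals the corresponding diagonal entry of the inverse correlation matrix. The helper names (`vif`, `year_over_year`) and the synthetic data are assumptions for illustration; the project's actual features and thresholds are not stated here.

```python
import numpy as np

def vif(X):
    """VIF per column: diagonal of the inverse feature correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def year_over_year(series, periods=12):
    """Fractional change versus the value `periods` steps earlier."""
    s = np.asarray(series, dtype=float)
    return (s[periods:] - s[:-periods]) / s[:-periods]

# Synthetic features: c is nearly a copy of a, b is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.1 * rng.normal(size=200)
scores = vif(np.column_stack([a, b, c]))

# Two-step "year" used only to keep the example short
yoy = year_over_year([100.0, 200.0, 110.0, 220.0], periods=2)
```

In this toy setup the collinear pair (`a`, `c`) receives large VIF values while `b` stays close to 1. A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic multicollinearity.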

Correlation analysis


Modeling

The modeling approach first established time series baselines (MA, ETS, SES) across 1-12 month horizons, then developed linear regression models prioritizing 12-month forecasts alongside 3- and 6-month models. VIF and stepwise regression identified the optimal feature set while avoiding multicollinearity.


1. Model Selection & Focus

  •  Transparency

The model must be understandable and applicable to business decisions and process improvements

  •  Model Viability

The model should be reliable, easy to update, and maintainable in the long term

  •  Alignment with Hedging Strategy

Models with varying forecast horizons support decision-making across different timeframes

Time series models were established as baselines, while a linear regression model was developed to improve prediction accuracy and interpretability.

2. Model Building

Baseline Model

  • Model Choice: Moving Average (MA), Exponential Smoothing (ETS), and Simple Exponential Smoothing (SES)

  • Forecast Horizon Setup: Models were trained and evaluated for 1 to 12 months.
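
For readers unfamiliar with these baselines, the sketch below shows textbook versions of a moving average forecast and simple exponential smoothing. The window, the smoothing factor `alpha`, and the sample prices are illustrative assumptions, not the project's settings.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def ses_forecast(history, alpha=0.5):
    """Simple exponential smoothing: the final smoothed level is the forecast."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

# Hypothetical prices, for illustration only
prices = [2.0, 4.0, 3.0, 5.0]
ma_next = moving_average_forecast(prices, window=3)
ses_next = ses_forecast(prices, alpha=0.5)
```

Both methods, in this basic form, produce the same flat value for every future month, so a longer horizon simply means the same forecast is compared against prices further away.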

Primary Model

  • Model Choice: Linear regression focused on the 12-month horizon, with parallel development for 3- and 6-month horizons

  • Feature Selection: VIF and stepwise regression to eliminate multicollinearity and optimize feature sets, with the 12-month prediction model identifying six key drivers.
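
One possible form of stepwise regression is forward selection, sketched below with NumPy. The selection criterion here (AIC) is an assumption, since the criterion used in the project is not stated; the function names and the synthetic data are likewise hypothetical.

```python
import numpy as np

def fit_rss(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k = number of coefficients."""
    return n * np.log(rss / n) + 2 * k

def forward_stepwise(X, y):
    """Add one feature at a time while AIC keeps improving."""
    n, p = X.shape
    selected = []
    best = aic(fit_rss(X[:, []], y), n, 1)  # intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {
            j: aic(fit_rss(X[:, selected + [j]], y), n, len(selected) + 2)
            for j in candidates
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the criterion
        selected.append(j_best)
        best = scores[j_best]
    return selected

# Synthetic data: only columns 0 and 2 drive the target
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=200)
selected = forward_stepwise(X, y)
```

On this toy data the two true drivers are picked first. In practice, the VIF screen would typically run before selection so that near-duplicate features do not compete for the same slot.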

Model Evaluation

Model evaluation revealed linear regression substantially outperformed baseline time series models across most metrics, while the performance gap narrowed as forecast horizons decreased.

  •  Time Series Model

SES demonstrated the best performance among baseline models, with the lowest errors across all metrics, showing superior adaptability to price trends.

All baseline models improved as forecast horizons shortened, with SES MAPE decreasing from 72.47% (12-month) to 21.89% (1-month).

  •  Linear Regression Model

The 12-month horizon model achieved 10.26% MAPE, significantly outperforming the best baseline (SES: 72.47%) through optimized feature engineering.

The 3-month and 6-month models achieved 9.41% and 11.41% MAPE, respectively, showing no consistent trend as the forecast horizon shortened.
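
For reference, MAPE (mean absolute percentage error) averages the absolute forecast error as a share of the actual value. The sketch below uses the standard definition; the sample numbers are made up and unrelated to the results reported above.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

# Hypothetical prices and forecasts, for illustration only
actual = [2.0, 4.0, 5.0]
forecast = [2.2, 3.6, 5.0]
score = mape(actual, forecast)
```

Here two forecasts miss by 10% and one is exact, so the score is about 6.67%.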


© 2025 by Shangyue Song. 
