Project I
Prediction Model for Natural Gas Price

Background: Disney required an accurate natural gas price forecast model to support financial planning and net income calculations based on utility costs.
Achievement:
- A 12-month linear regression forecast model with a 10.26% error rate (MAPE), built on 6 key drivers, with $2.5M in potential savings under the 2024 market scenario
- Three time series models covering 1- to 12-month prediction horizons as benchmarks
- Additional linear regression models with 3- and 6-month prediction horizons to support a flexible hedging strategy
SO WHAT:
- Cost Reduction: Under the 2024 price movement scenario, the model-based procurement approach outperformed the existing method, generating $2.5M in savings compared with futures-based procurement.
- Short-term Tactical Decisions: Accurate near-term price forecasts support budget planning and procurement timing, allowing Disney to capitalize on favorable market conditions and adjust business operations ahead of anticipated price peaks.
- Long-term Strategic Planning: Six key price drivers support risk management frameworks and inform long-term contract structuring, enabling Disney to build sustainable hedging strategies against market volatility.
Data Exploration
Identify and retrieve data from sources such as EIA and CME, conduct data quality assessments and exploratory analysis, and then apply feature engineering and transformations to enhance correlation with target variables, resulting in a modeling-ready dataset.

1. Data Collection (Representative Example)
- Fundamental Data: Heating degree days (HDD) and net natural gas imports, sourced from the Energy Information Administration (EIA)
- Financial Data: West Texas Intermediate (WTI) crude oil prices and natural gas futures, sourced from Yahoo Finance and Chicago Mercantile Exchange (CME)
- Economic Data: Gross Domestic Product (GDP) and interest rates, sourced from Federal Reserve Economic Data (FRED)
- Statistical Data: 12-month lagged standard deviation of Henry Hub prices
2. Data Preparation
- Data Cleansing: Handled missing values through mean imputation and standardized data formats across sources
- Initial Data Transformation: Lagged variables to match prediction horizons (e.g., natural gas production at lag 12 as a predictor of 12-month-forward prices)
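The two preparation steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual pipeline: the function names `mean_impute` and `align_lagged` and the sample numbers are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def mean_impute(values):
    """Replace NaN entries with the mean of the observed entries."""
    arr = np.asarray(values, dtype=float)
    fill = np.nanmean(arr)
    return np.where(np.isnan(arr), fill, arr)

def align_lagged(feature, target, horizon):
    """Pair feature[t] with target[t + horizon].

    Returns (x, y) where x[i] was observed `horizon` periods before y[i].
    """
    feature = np.asarray(feature, dtype=float)
    target = np.asarray(target, dtype=float)
    if len(feature) != len(target):
        raise ValueError("feature and target must have the same length")
    if not 0 < horizon < len(feature):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return feature[:-horizon], target[horizon:]

# Hypothetical monthly values, for illustration only.
production = [100.0, np.nan, 104.0, 106.0, 108.0, 110.0]
price = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5]

clean = mean_impute(production)                # the gap becomes 105.6
x, y = align_lagged(clean, price, horizon=3)   # 3-period lag for brevity
```

For the 12-month model, the same call with `horizon=12` would pair each month's feature value with the price twelve months later.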
3. Exploratory Analysis
- Relationship Visualization: Visualized correlations between features and the target variable
- Feature Engineering: Checked multicollinearity using the variance inflation factor (VIF) to narrow down the pool of features, and applied year-over-year transformations to better capture long-term price trends.

Correlation analysis
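As an illustration of the feature engineering step, the sketch below computes a year-over-year change and a VIF per feature. VIF for a column is 1 / (1 - R^2), where R^2 comes from regressing that column on the remaining columns. The helper names and the synthetic data are assumptions made for this example, and NumPy is assumed to be available.

```python
import numpy as np

def year_over_year(series, periods=12):
    """Fractional change versus the value `periods` steps earlier."""
    s = np.asarray(series, dtype=float)
    return s[periods:] / s[:-periods] - 1.0

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        # 1 / (1 - R^2) simplifies to ss_tot / ss_res.
        factors[j] = np.inf if ss_res == 0.0 else ss_tot / ss_res
    return factors

# Synthetic data: column c is nearly a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
factors = vif(np.column_stack([a, b, c]))

# A series that rises 10% after one year.
yoy = year_over_year([100.0] * 12 + [110.0] * 12)
```

In this synthetic case the near-duplicate columns receive very large factors while the independent column stays close to 1, which is the pattern used to prune redundant features.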
Modeling
The modeling approach first established time series baselines (MA, ETS, SES) across 1- to 12-month horizons, then developed linear regression models, prioritizing the 12-month forecast alongside 3- and 6-month models. VIF and stepwise regression were used to select the feature set while avoiding multicollinearity.

1. Model Selection & Focus
- Transparency: The model must be understandable and applicable to business decisions and process improvements.
- Model Viability: The model should be reliable, easy to update, and maintainable in the long term.
- Alignment with Hedging Strategy: Models with varying forecast horizons support decision-making across different timeframes.
Time series models were established as baselines, while a linear regression model was developed to improve prediction accuracy and interpretability.
2. Model Building
Baseline Model
- Model Choice: Moving Average (MA), Exponential Smoothing (ETS), and Simple Exponential Smoothing (SES)
- Forecast Horizon Setup: Models were trained and evaluated for horizons of 1 to 12 months.
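Two of the baselines are simple enough to sketch directly. The code below is an illustrative version of MA and SES, both of which produce a flat forecast for every horizon, plus a rolling-origin helper that collects forecast/actual pairs for a chosen horizon. The window, smoothing factor, evaluation scheme, and sample prices are assumptions for this example, not the settings used in the project; ETS is omitted because it needs a fitted trend and seasonal specification.

```python
def moving_average_forecast(history, window):
    """Forecast every future step as the mean of the last `window` values."""
    if not 0 < window <= len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window

def ses_forecast(history, alpha):
    """Simple exponential smoothing; the final level is the flat forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def rolling_origin_pairs(series, horizon, forecaster, min_train):
    """Collect (forecast, actual) pairs for a fixed horizon.

    At each origin, only data up to that point is passed to `forecaster`.
    """
    pairs = []
    for end in range(min_train, len(series) - horizon + 1):
        forecast = forecaster(series[:end])
        actual = series[end + horizon - 1]
        pairs.append((forecast, actual))
    return pairs

# Hypothetical monthly prices, for illustration only.
prices = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0]

ma_value = moving_average_forecast(prices, window=3)
ses_value = ses_forecast(prices, alpha=0.5)
pairs = rolling_origin_pairs(
    prices, horizon=2,
    forecaster=lambda h: moving_average_forecast(h, window=3),
    min_train=3,
)
```

Repeating the rolling-origin call for horizons 1 through 12 would give one set of pairs per horizon, which can then be scored with any error metric.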
Primary Model
- Model Choice: Linear regression focused on the 12-month horizon, with parallel development for 3- and 6-month horizons
- Feature Selection: VIF and stepwise regression to eliminate multicollinearity and optimize feature sets, with the 12-month prediction model identifying six key drivers.
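The write-up does not state the direction or stopping criterion of the stepwise procedure, so the sketch below assumes one common variant: forward selection that adds the feature giving the largest gain in adjusted R^2 and stops when the gain falls below a threshold. The function names, the `min_gain` threshold, and the synthetic data are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of an ordinary least squares fit with an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))

def forward_stepwise(X, y, names, min_gain=1e-3):
    """Greedily add features while adjusted R^2 improves by at least min_gain."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = 0.0  # adjusted R^2 of the intercept-only model
    while remaining:
        scores = {j: adjusted_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return [names[j] for j in selected]

# Synthetic data: only x0 and x2 actually drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=300)

chosen = forward_stepwise(X, y, names=["x0", "x1", "x2", "x3"])
```

In practice a VIF screen would typically run before this step, so that the stepwise search only considers features that are not close to collinear.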
Model Evaluation
Model evaluation revealed linear regression substantially outperformed baseline time series models across most metrics, while the performance gap narrowed as forecast horizons decreased.

- Time Series Model: SES demonstrated the best performance among baseline models, with the lowest errors across all metrics, showing superior adaptability to price trends. All baseline models improved as forecast horizons shortened, with SES MAPE decreasing from 72.47% (12-month) to 21.89% (1-month).
- Linear Regression Model: The 12-month horizon model achieved 10.26% MAPE, significantly outperforming the best baseline (SES: 72.47%) through optimized feature engineering. The 3-month and 6-month models achieved 9.41% and 11.41% MAPE, respectively, with no consistent trend as the forecast horizon decreased.
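For reference, MAPE (mean absolute percentage error), the metric quoted above, can be computed as follows. This is a minimal sketch with made-up numbers; the function name and inputs are illustrative and do not reproduce the reported results.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical prices and forecasts, for illustration only.
actual_prices = [2.0, 4.0, 5.0]
forecast_prices = [2.2, 3.6, 5.0]
score = mape(actual_prices, forecast_prices)  # errors of 10%, 10%, and 0%
```

Because each error is divided by the actual price, MAPE weights misses more heavily when prices are low, which is worth keeping in mind when comparing horizons that cover different price regimes.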
