Forecasting ETT with Neural Network and XGBoost

A comparative study on multivariate time-series forecasting using the Electricity Transformer Temperature (ETT) dataset.
Implemented both ensemble-based regression (XGBoost) and deep learning models (Neural Networks) to predict transformer oil temperature (OT) based on 11 correlated sensor readings.
By redefining the target variable as Δy (the change in temperature) instead of the absolute value, the model captured temporal dynamics more effectively and placed in the top 1% among 200 participants.


Background

The ETT dataset records two years of transformer operational data (2016–2018), including features such as time indices and multiple sensor readings.
It was modified for this project to include 11 features and artificial missing values to simulate real-world data imperfections.
The goal was to build models capable of learning multivariate dependencies and temporal patterns for accurate oil temperature prediction.

Problem Setup

Approach Overview

The project consists of two parallel pipelines:

  1. Ensemble Model (XGBoost) — regression using boosted decision trees
  2. Neural Network (PyTorch) — regression using a fully-connected deep network

Each model was trained and compared based on Mean Squared Error (MSE), feature impact, and temporal generalization.


0. Feature Engineering

Core idea: redefine the target as Δy = yₜ − yₜ₋₁ instead of absolute y, and enrich the dataset with temporal, statistical, and cyclic features to improve generalization. The individual steps are listed below; a code sketch follows the list.

  1. Missing value handling: forward-fill, then fill remaining NaNs with the train mean (applied to both train and test). Purpose: maintain time-series continuity.
  2. Lag features (train): create y_lag_1 and y_lag_2; drop the first 2 rows, which become NaN. Purpose: inject temporal dependency.
  3. Lag features (test): apply the same shift logic; fill the first rows with the train mean. Purpose: stabilize the initial test sequence.
  4. Difference feature: y_diff = y_lag_1 − y_lag_2. Purpose: capture short-term variation.
  5. Residual target: y_residual = y − y_lag_1, renamed to y. Purpose: learn Δy instead of absolute y.
  6. Outlier handling: replace out-of-range values (feat_5 > 100, feat_7 ∉ [1, 8]) with NaN, then forward-fill. Purpose: remove extreme noise.
  7. Moving average (MA): apply a 5-step rolling mean to feat_1~8, stored as *_ma features. Purpose: smooth short-term fluctuations.
  8. Log transform: log-scale feat_1~8 using a train-minimum offset. Purpose: normalize skewed distributions.
  9. Cyclic encoding: convert day_of_week and hour into sin/cos pairs. Purpose: encode daily/weekly periodicity.
  10. Boundary handling: check whether the test set directly follows the train set; if so, copy the last train y values into the first test lags, otherwise predict the initial lags with a meta XGBoost model. Purpose: ensure a seamless train-to-test transition.
  11. Post-processing (y features): apply MA(5) to y_lag_1, y_diff, and y_residual. Purpose: reduce high-frequency noise.
  12. Feature selection: choose active features from the raw/log/MA/lag/diff/cyclic groups; set y_residual → y as the final target. Purpose: avoid overfitting and finalize the training set.
  13. Inverse transform: inverse_y(pred) = pred + y_lag_1 (handled automatically for train/test). Purpose: convert residual predictions back to actual y.
  14. Validation sync: remove the first 2 train rows to align with the lag-augmented dataset. Purpose: maintain consistency during evaluation.
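
A minimal pandas sketch of steps 1-9, assuming a date column and a y target column as in the ETT CSVs; the feat_1 … feat_8 names and the make_features helper are illustrative rather than the repository's exact API:

import numpy as np
import pandas as pd

def make_features(df: pd.DataFrame, train_mean: pd.Series, train_min: pd.Series) -> pd.DataFrame:
    """Illustrative re-creation of the core feature-engineering steps."""
    df = df.copy()

    # Step 1: forward-fill, then fall back to the train means for leading NaNs
    df = df.ffill().fillna(train_mean)

    # Step 6: mark out-of-range sensor readings as NaN, then forward-fill again
    df.loc[df["feat_5"] > 100, "feat_5"] = np.nan
    df.loc[~df["feat_7"].between(1, 8), "feat_7"] = np.nan
    df = df.ffill()

    # Steps 2 and 4: lagged targets and their short-term difference
    df["y_lag_1"] = df["y"].shift(1)
    df["y_lag_2"] = df["y"].shift(2)
    df["y_diff"] = df["y_lag_1"] - df["y_lag_2"]

    # Step 5: residual target -- learn the change Δy rather than the level
    df["y_residual"] = df["y"] - df["y_lag_1"]

    # Steps 7-8: 5-step moving averages and log-scaling with a train-min offset
    for col in [f"feat_{i}" for i in range(1, 9)]:
        df[f"{col}_ma"] = df[col].rolling(5, min_periods=1).mean()
        df[f"{col}_log"] = np.log1p(df[col] - train_min[col])

    # Step 9: cyclic encoding of hour and day-of-week
    ts = pd.to_datetime(df["date"])
    df["hour_sin"] = np.sin(2 * np.pi * ts.dt.hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * ts.dt.hour / 24)
    df["dow_sin"] = np.sin(2 * np.pi * ts.dt.dayofweek / 7)
    df["dow_cos"] = np.cos(2 * np.pi * ts.dt.dayofweek / 7)

    return df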

Final Training Target

Selected Feature Groups

Overall, this feature pipeline separates trend (baseline) from change (Δy), reduces nonstationarity, and enhances generalization through temporal smoothing and cyclic signal encoding.
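
Because the final target is the residual, predictions are mapped back to absolute oil temperature at inference time, and the first test lags must be seeded at the train/test boundary (steps 10 and 13 above). A small sketch, where train_raw stands for the unprocessed train frame and OT for the original oil-temperature column:

import numpy as np
import pandas as pd

def inverse_y(pred: np.ndarray, y_lag_1: np.ndarray) -> np.ndarray:
    # Step 13: residual prediction + previous level = absolute oil temperature
    return pred + y_lag_1

def seed_test_lags(test: pd.DataFrame, train_raw: pd.DataFrame) -> pd.DataFrame:
    # Step 10: when the test split directly follows the train split in time,
    # seed the first test lags with the last absolute train targets
    # (otherwise a meta XGBoost model predicts them, as described above)
    test = test.copy()
    test.loc[test.index[0], "y_lag_1"] = train_raw["OT"].iloc[-1]
    test.loc[test.index[0], "y_lag_2"] = train_raw["OT"].iloc[-2]
    return test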

1. Ensemble Model (XGBoost)

Trained an ensemble regression model using XGBoost with carefully tuned hyperparameters.
Feature engineering included time decomposition, rolling-window statistics, and residual-based target transformation.

best_model_config = {
    "experiment_name": "final_model",
    "objective": "reg:squarederror",
    "n_estimators": 50,
    "learning_rate": 0.015,
    "random_state": 2025,
    "max_depth": 6,
    "max_leaves": 0,
    "min_child_weight": 4.0,
    "gamma": 0.75,
    "subsample": 0.7,
    "colsample_bytree": 0.7,
    "colsample_bynode": 0.7,
    "grow_policy": "depthwise",
    "tree_method": "hist",
    "reg_alpha": 0.1,
    "reg_lambda": 2.0,
    "scale_pos_weight": 1.0,
    "n_jobs": -1,
    "device": "cpu",
    "max_bin": 256,
    "enable_categorical": False,
    "max_cat_to_onehot": None,
    "multi_strategy": "one_output_per_tree",
}
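
For reference, a minimal sketch of how a config like this can be passed to xgboost.XGBRegressor (XGBoost 2.x); the experiment_name key is bookkeeping and is dropped, and X_train, y_train, X_val, y_val, y_val_abs are illustrative names for the engineered features, residual targets, and absolute validation targets:

from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

def train_final_model(best_model_config, X_train, y_train, X_val, y_val, y_val_abs):
    """Fit XGBRegressor from the config dict and score on absolute temperature."""
    # experiment_name is bookkeeping; every other key maps to an XGBRegressor kwarg
    params = {k: v for k, v in best_model_config.items() if k != "experiment_name"}
    model = XGBRegressor(**params)
    model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

    # Predictions are residuals (Δy); add y_lag_1 back before computing MSE
    val_pred = model.predict(X_val) + X_val["y_lag_1"].to_numpy()
    return model, mean_squared_error(y_val_abs, val_pred)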

Result: Achieved an MSE of 0.1298, ranking within the top 1% among 200 participants. Properly tuned depth and subsampling enhanced stability and reduced overfitting across time windows.

2. Neural Network Model
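
The deep-learning counterpart is a fully-connected regression network trained on the same engineered features and Δy target (see Approach Overview). A minimal PyTorch sketch; the layer sizes, dropout, and training-loop details are illustrative, not the exact configuration used:

import torch
import torch.nn as nn

class MLPRegressor(nn.Module):
    """Fully-connected network that predicts the residual target Δy."""

    def __init__(self, in_features: int, hidden: int = 128, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def train_nn(model, loader, epochs=50, lr=1e-3, device="cpu"):
    """Plain MSE training loop on the residual target."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    return model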


Repository & Code

All related data, code, and implementation details are available on GitHub: 🔗 GitHub Repository