Forecasting ETT with Neural Network and XGBoost
A comparative study on multivariate time-series forecasting using the Electricity Transformer Temperature (ETT) dataset.
Implemented both an ensemble regressor (XGBoost) and a deep learning model (a fully-connected neural network) to predict transformer oil temperature (OT) from 11 correlated input features.
By redefining the target variable as Δy (the change in temperature) rather than its absolute value, the models captured temporal dynamics more effectively and placed in the top 1% of 200 participants.
Background
The ETT dataset records two years of transformer operational data (2016–2018), including features such as time indices and multiple sensor readings.
It was modified for this project to include 11 features and artificial missing values to simulate real-world data imperfections.
The goal was to build models capable of learning multivariate dependencies and temporal patterns for accurate oil temperature prediction.
Problem Setup
- Task: Multivariate time-series regression
- Input: 11 features (including time indices and 8 measured variables)
- Target: Oil Temperature (OT) inside the transformer
- Challenges: Missing values, high dimensionality, and nonlinear temporal dependencies
Approach Overview
The project consists of two parallel pipelines:
- Ensemble Model (XGBoost) — regression using boosted decision trees
- Neural Network (PyTorch) — regression using a fully-connected deep network
The two models were trained independently and compared on Mean Squared Error (MSE), feature impact, and temporal generalization.
0. Feature Engineering
Core idea: redefine the target as Δy = yₜ − yₜ₋₁ instead of the absolute y, and enrich the dataset with temporal, statistical, and cyclic features to improve generalization. Minimal sketches of representative steps follow the table.
| Step | Operation | Purpose |
|---|---|---|
| 1 | Missing value handling: forward-fill, then fill remaining NaNs with train mean (train/test) | Maintain time-series continuity |
| 2 | Lag features (Train): create y_lag_1, y_lag_2; remove first 2 NaN rows | Inject temporal dependency |
| 3 | Lag features (Test): same shift logic, fill first rows using train mean | Stabilize initial test sequence |
| 4 | Difference feature: y_diff = y_lag_1 − y_lag_2 | Capture short-term variation |
| 5 | Residual target: y_residual = y − y_lag_1 → renamed as y | Learn Δy instead of absolute y |
| 6 | Outlier handling: replace out-of-range values (feat_5 > 100, feat_7 ∉ [1, 8]) with NaN → forward-fill | Remove extreme noise |
| 7 | Moving average (MA): apply 5-step mean to feat_1~8 → *_ma | Smooth short-term fluctuations |
| 8 | Log transform: log-scale feat_1~8 using train-min offset | Normalize skewed distributions |
| 9 | Cyclic encoding: convert day_of_week, hour → sin/cos pairs | Encode daily/weekly periodicity |
| 10 | Boundary handling: if test directly follows train, copy the last train y values; otherwise, use a meta XGBoost model to predict the initial lags | Ensure seamless test transition |
| 11 | Post-processing (y features): apply MA(5) to y_lag_1, y_diff, y_residual | Reduce high-frequency noise |
| 12 | Feature selection: choose active features from raw/log/MA/lag/diff/cyclic; set y_residual → y as final target | Avoid overfitting, finalize training set |
| 13 | Inverse transform: inverse_y(pred) = pred + y_lag_1 (auto-handled for train/test) | Convert residuals back to actual y |
| 14 | Validation sync: remove first 2 train rows to align with augmented dataset | Maintain consistency during evaluation |
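The exact implementation lives in the repository; as a rough illustration of steps 2, 4, and 7, the lag, difference, and moving-average features might be built as in the pandas sketch below. Column names y and feat_1~8 follow the table; min_periods=1 is an assumption to avoid NaNs at the start of each window.

```python
import pandas as pd

def add_lag_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of steps 2, 4, and 7: lags, short-term difference, MA smoothing."""
    df = df.copy()
    # Step 2: lagged targets inject temporal dependency.
    df["y_lag_1"] = df["y"].shift(1)
    df["y_lag_2"] = df["y"].shift(2)
    # Step 4: short-term variation between the two most recent observations.
    df["y_diff"] = df["y_lag_1"] - df["y_lag_2"]
    # Step 7: 5-step moving average smooths high-frequency sensor noise.
    for col in [f"feat_{i}" for i in range(1, 9)]:
        df[f"{col}_ma"] = df[col].rolling(window=5, min_periods=1).mean()
    return df

# Train side: the first two rows have NaN lags and are dropped (step 2).
# df_train = add_lag_features(df_train).iloc[2:]
```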
Final Training Target
- Train: y ← y_residual (Δy)
- Test: predictions restored as ŷ = inverse_y(ŷ_residual) = ŷ_residual + y_lag_1 (see the sketch below)
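A minimal sketch of this target redefinition and its inverse, using the column names above (the repository's actual inverse_y handles the train/test bookkeeping automatically, per step 13):

```python
import pandas as pd

def make_residual_target(df: pd.DataFrame) -> pd.DataFrame:
    """Step 5: learn the change Δy = y_t − y_{t−1} instead of the absolute level."""
    df = df.copy()
    df["y_residual"] = df["y"] - df["y_lag_1"]
    return df

def inverse_y(pred_residual, y_lag_1):
    """Step 13: restore absolute predictions, ŷ = ŷ_residual + y_lag_1."""
    return pred_residual + y_lag_1
```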
Selected Feature Groups
- Raw: feat_1,2,3,4,6,7,8
- Log-scaled: feat_1~4,6~8_log
- Moving averages: feat_1~4,6~8_ma
- Lag/diff: y_lag_1, y_diff, y_lag_1_ma, y_diff_ma
- Cyclic encodings: day_sin, day_cos, hour_sin, hour_cos
Overall, this feature pipeline separates trend (baseline) from change (Δy), reduces nonstationarity, and enhances generalization through temporal smoothing and cyclic signal encoding.
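For the cyclic encoding in step 9, a standard sin/cos construction looks like the sketch below; the timestamp column name date is an assumption, while day_sin, day_cos, hour_sin, and hour_cos match the feature list above:

```python
import numpy as np
import pandas as pd

def add_cyclic_features(df: pd.DataFrame, ts_col: str = "date") -> pd.DataFrame:
    """Step 9: map day-of-week and hour onto the unit circle so boundary
    values (e.g. hour 23 and hour 0) remain close in feature space."""
    df = df.copy()
    ts = pd.to_datetime(df[ts_col])
    df["day_sin"] = np.sin(2 * np.pi * ts.dt.dayofweek / 7)
    df["day_cos"] = np.cos(2 * np.pi * ts.dt.dayofweek / 7)
    df["hour_sin"] = np.sin(2 * np.pi * ts.dt.hour / 24)
    df["hour_cos"] = np.cos(2 * np.pi * ts.dt.hour / 24)
    return df
```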
1. Ensemble Model (XGBoost)
Trained an ensemble regression model using XGBoost with carefully tuned hyperparameters.
Feature engineering included time decomposition, rolling-window statistics, and residual-based target transformation.
```python
best_model_config = {
    "experiment_name": "final_model",
    "objective": "reg:squarederror",
    "n_estimators": 50,
    "learning_rate": 0.015,
    "random_state": 2025,
    "max_depth": 6,
    "max_leaves": 0,
    "min_child_weight": 4.0,
    "gamma": 0.75,
    "subsample": 0.7,
    "colsample_bytree": 0.7,
    "colsample_bynode": 0.7,
    "grow_policy": "depthwise",
    "tree_method": "hist",
    "reg_alpha": 0.1,
    "reg_lambda": 2.0,
    "scale_pos_weight": 1.0,
    "n_jobs": -1,
    "device": "cpu",
    "max_bin": 256,
    "enable_categorical": False,
    "max_cat_to_onehot": None,
    "multi_strategy": "one_output_per_tree",
}
```
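As a hedged usage sketch (not the repository's exact training script): the dict unpacks into xgboost.XGBRegressor once the bookkeeping key experiment_name is dropped. X_train, y_train, and X_test are assumed to come from the feature pipeline above, and the device/multi_strategy keys require XGBoost ≥ 2.0.

```python
import xgboost as xgb

# Drop the non-XGBoost bookkeeping key before unpacking.
params = {k: v for k, v in best_model_config.items() if k != "experiment_name"}
model = xgb.XGBRegressor(**params)

model.fit(X_train, y_train)  # y_train holds the residual target Δy
# Restore absolute oil-temperature predictions (y_lag_1 is a selected feature).
pred = inverse_y(model.predict(X_test), X_test["y_lag_1"])
```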
Result: Achieved an MSE of 0.1298, ranking within the top 1% among 200 participants. Properly tuned depth and subsampling enhanced stability and reduced overfitting across time windows.
2. Neural Network Model
- Implemented in PyTorch
- Architecture (see the sketch after this list):
  - Input: 29-dimensional feature vector
  - Hidden: three fully-connected layers (256 → 128 → 64 neurons) with ReLU activation and Dropout(0.2)
  - Output: 1 (predicted oil temperature)
- Training details:
  - Optimizer: Adam
  - Loss: MSELoss
  - Scheduler: ReduceLROnPlateau
- Applied hyperparameter tuning to hidden size, learning rate, and batch size.
- Result: Achieved a Test MSE of 0.132, ranking within the top 1% among 200 participants.
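A minimal PyTorch sketch consistent with the description above; the layer sizes, loss, optimizer, and scheduler come from the list, while the learning rate, scheduler settings, and class name OTRegressor are assumptions:

```python
import torch
import torch.nn as nn

class OTRegressor(nn.Module):
    """Fully-connected net: 29 → 256 → 128 → 64 → 1, ReLU + Dropout(0.2)."""
    def __init__(self, in_dim: int = 29, p_drop: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

model = OTRegressor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr is an assumption
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)
# Each epoch: forward pass, criterion(pred, target), loss.backward(), optimizer.step(),
# then scheduler.step(val_mse) so the learning rate drops when validation MSE plateaus.
```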
Repository & Code
All related data, code, and implementation details are available on GitHub: 🔗 GitHub Repository