ITS-Mina: A Harris Hawks Optimization-Based All-MLP Framework with Iterative Refinement and External Attention for Multivariate Time Series Forecasting

Pourya Zamanvaziri

Amirhossein Sadr

Aida Pakniyat

Dara Rahmati

Published on 4/30/2026

Cross-asset

Machine learning

Deep learning

ITS-Mina is a novel all-MLP framework for multivariate time series forecasting that addresses key limitations of existing MLP-based models. The framework integrates three innovations: (1) iterative refinement via a shared-parameter residual mixer stack, which deepens effective computation without increasing parameter count by reapplying the same mixer block multiple times; (2) an external attention module that replaces self-attention with learnable memory units, capturing cross-sample global dependencies with linear complexity and acting as an implicit regularizer; and (3) Harris Hawks Optimization (HHO) for automatic dropout rate tuning, enabling adaptive regularization tailored to each dataset.
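
The weight-tying idea in (1) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the block layout (a linear time-mixing step followed by a two-layer channel MLP), the ReLU activations, the hidden width H, and the names `init_mixer_block`, `mixer_block`, `iterative_refinement`, and `count_params` are all assumptions made for illustration. The sizes L=96 and C=7 are arbitrary, while N=3 and M=2 follow the setting reported under Results.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mixer_block(seq_len, n_channels, hidden):
    """Weights for one residual mixer block (hypothetical layout)."""
    scale = 0.1
    return {
        "time_w": rng.normal(0.0, scale, (seq_len, seq_len)),
        "chan_w1": rng.normal(0.0, scale, (n_channels, hidden)),
        "chan_w2": rng.normal(0.0, scale, (hidden, n_channels)),
    }

def mixer_block(x, p):
    """Apply one block to x of shape (seq_len, n_channels)."""
    # Time mixing: linear map along the time axis, ReLU, residual connection.
    x = x + np.maximum(p["time_w"] @ x, 0.0)
    # Channel mixing: two-layer MLP along the channel axis, residual connection.
    x = x + np.maximum(x @ p["chan_w1"], 0.0) @ p["chan_w2"]
    return x

def iterative_refinement(x, stack, n_rounds):
    """Reapply the same depth-M stack n_rounds times (tied weights)."""
    for _ in range(n_rounds):
        for p in stack:
            x = mixer_block(x, p)
    return x

def count_params(stack):
    """Number of scalar weights in the stack; independent of n_rounds."""
    return sum(w.size for p in stack for w in p.values())

L, C, H, M, N = 96, 7, 16, 2, 3  # L, C, H are arbitrary illustration sizes
stack = [init_mixer_block(L, C, H) for _ in range(M)]
x = rng.normal(size=(L, C))
y = iterative_refinement(x, stack, n_rounds=N)
print(y.shape, count_params(stack))
```

Because the same M blocks are reused in every round, `count_params(stack)` depends on M but not on N, while the number of block applications grows to N*M.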

The architecture processes input through instance normalization, iterative refinement (N rounds of a depth-M mixer stack), external attention, and temporal projection with denormalization. Extensive experiments on six benchmark datasets (Traffic, Electricity, ETTh1, ETTh2, ETTm1, ETTm2) show that ITS-Mina achieves state-of-the-art or highly competitive performance compared to eleven baselines, including Transformer-based and MLP-based models. The results demonstrate the effectiveness of iterative refinement, external attention, and HHO-based optimization in improving forecasting accuracy while maintaining computational efficiency.
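
The first and last stages of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: statistics are taken per channel over the time axis of a single window, the readout is a single linear map along time, and the function names, the `eps` constant, and the sizes L=96, T=24, C=7 are hypothetical. The middle stages (iterative refinement and external attention) are replaced by a placeholder.

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """Normalize each channel of one window over time. x: (seq_len, n_channels)."""
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True) + eps
    return (x - mean) / std, mean, std

def temporal_projection(h, w, b):
    """Linear readout along time: (seq_len, C) -> (horizon, C)."""
    return w @ h + b

def denormalize(y, mean, std):
    """Invert instance_normalize using the statistics of the input window."""
    return y * std + mean

rng = np.random.default_rng(0)
L, T, C = 96, 24, 7  # lookback, horizon, channels (arbitrary sizes)
x = rng.normal(loc=50.0, scale=10.0, size=(L, C))

x_norm, mean, std = instance_normalize(x)
h = x_norm  # placeholder for iterative refinement + external attention
w = rng.normal(0.0, 0.1, (T, L))  # untrained readout weights
b = np.zeros((T, 1))
forecast = denormalize(temporal_projection(h, w, b), mean, std)
print(forecast.shape)
```

Storing `mean` and `std` from the input window is what lets the forecast be mapped back to the original scale of each series.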

Highlights

1. Proposes ITS-Mina, an all-MLP framework for multivariate time series forecasting with three key innovations: iterative refinement via shared-parameter mixer loops, external attention for efficient global context, and HHO-based dropout optimization.
2. Achieves state-of-the-art or highly competitive performance on six benchmark datasets (Traffic, Electricity, ETTh1, ETTh2, ETTm1, ETTm2) against eleven baselines across multiple forecasting horizons.
3. Demonstrates that iterative refinement with weight tying deepens effective computation without increasing parameter count, improving representation quality.
4. Shows that external attention captures cross-sample global dependencies with linear complexity and acts as an implicit regularizer.
5. Introduces HHO for automatic dropout rate tuning, providing adaptive regularization tailored to each dataset.

Methods

1. Iterative refinement via shared-parameter residual mixer stack: applies the same depth-M mixer stack N times with tied weights, deepening computation without multiplying parameters.
2. External attention module: uses two learnable memory matrices with S slots each to capture global inter-sample correlations at O(LCS) cost, which is linear in the sequence length L, in place of self-attention's O(L^2).
3. Harris Hawks Optimization (HHO) for dropout rate tuning: formulates dropout selection as a continuous optimization problem and uses HHO's exploration-exploitation balance to search for the best-performing rates.
4. Instance-wise normalization and temporal projection: normalizes the input before mixing, then applies a linear readout followed by inverse normalization to produce the forecast.
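
The external attention item above can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's module: the plain softmax over slots is chosen here for simplicity because the summary does not describe the normalization used, and the names `external_attention`, `mem_k`, `mem_v`, and the sizes L=96 and C=7 are assumptions. S=64 follows the slot count reported under Results.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def external_attention(x, mem_k, mem_v):
    """x: (L, C). mem_k, mem_v: (S, C) memories shared across all samples."""
    attn = softmax(x @ mem_k.T, axis=-1)  # (L, S): about L*C*S multiply-adds
    return attn @ mem_v                   # (L, C): about L*S*C multiply-adds

rng = np.random.default_rng(0)
L, C, S = 96, 7, 64
mem_k = rng.normal(0.0, 0.1, (S, C))  # would be learned during training
mem_v = rng.normal(0.0, 0.1, (S, C))
x = rng.normal(size=(L, C))
out = external_attention(x, mem_k, mem_v)
print(out.shape)
```

Each time step attends only to the S memory slots and never to other time steps, so the cost grows linearly with L. Since the memories are parameters rather than functions of the input, they are shared by every sample in the dataset.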

Results

1. ITS-Mina achieves state-of-the-art MSE/MAE on the majority of dataset-horizon combinations across six benchmarks.
2. Outperforms Transformer-based models (Informer, Autoformer, FEDformer) and MLP-based models (DLinear, TSMixer) on most settings.
3. Iterative refinement with N=3 rounds and M=2 mixer blocks yields best performance on average.
4. External attention with S=64 slots provides effective global context while maintaining linear complexity.
5. HHO-based dropout tuning finds optimal rates around 0.1-0.3, improving generalization over fixed rates.
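
The HHO-based dropout search can be sketched as a one-dimensional optimization. The version below is deliberately simplified and should not be read as the authors' procedure: it keeps HHO's exploration step and its soft and hard besiege steps but omits the Levy-flight "rapid dive" variants, and it tunes a single scalar rate. The objective `toy_validation_loss` is a hypothetical stand-in for training the model and measuring validation loss; its minimum at 0.2 is an arbitrary choice, not a reported result. The search bounds, population size, and iteration count are also assumptions.

```python
import random

def toy_validation_loss(dropout):
    """Hypothetical stand-in for 'train with this rate, return validation loss'."""
    return (dropout - 0.2) ** 2 + 0.05

def hho_minimize(objective, lb, ub, n_hawks=8, n_iters=40, seed=0):
    """Simplified scalar HHO (no Levy-flight dives). Returns (best_x, best_f)."""
    rnd = random.Random(seed)

    def clip(v):
        return min(max(v, lb), ub)

    hawks = [rnd.uniform(lb, ub) for _ in range(n_hawks)]
    best_x = min(hawks, key=objective)  # the "rabbit": best position so far
    best_f = objective(best_x)
    for t in range(n_iters):
        mean_x = sum(hawks) / n_hawks
        for i, x in enumerate(hawks):
            # Escaping energy: its magnitude shrinks linearly over iterations.
            energy = 2.0 * rnd.uniform(-1.0, 1.0) * (1.0 - t / n_iters)
            if abs(energy) >= 1.0:
                # Exploration: move relative to a random hawk or the flock mean.
                if rnd.random() >= 0.5:
                    x_rand = rnd.choice(hawks)
                    new_x = x_rand - rnd.random() * abs(x_rand - 2.0 * rnd.random() * x)
                else:
                    new_x = (best_x - mean_x) - rnd.random() * (lb + rnd.random() * (ub - lb))
            elif abs(energy) >= 0.5:
                # Soft besiege: encircle the rabbit with a random jump strength.
                jump = 2.0 * (1.0 - rnd.random())
                new_x = (best_x - x) - energy * abs(jump * best_x - x)
            else:
                # Hard besiege: contract toward the rabbit.
                new_x = best_x - energy * abs(best_x - x)
            hawks[i] = clip(new_x)
        # Keep the best position seen so far.
        for x in hawks:
            f = objective(x)
            if f < best_f:
                best_x, best_f = x, f
    return best_x, best_f

best_rate, best_loss = hho_minimize(toy_validation_loss, lb=0.0, ub=0.5)
print(best_rate, best_loss)
```

In a real setting each call to the objective would involve a training run, so the population size and iteration count directly set the tuning budget.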
