DeepUnifiedMom - Multi-Horizon Momentum Through Deep Learning

You run a 12-month momentum screen. You rank stocks, allocate capital, and move on. Three months later, fast mean-reversion wipes half your winners before the long-horizon signal had time to pay. The problem is not momentum itself. The problem is that a single lookback window forces you to choose one trend speed and ignore every other timeframe where momentum is working right now.

DeepUnifiedMom, a framework published by Joel Ong and Dorien Herremans at the Singapore University of Technology and Design (arXiv:2406.08742, June 2024), attacks this limitation directly. It uses multi-task deep learning with a multi-gate mixture of experts to build a single portfolio that dynamically allocates capital across fast, medium, and slow multi-horizon momentum signals. No manual lookback selection. No static equal-weight blending. One model, end-to-end.

I find this approach interesting because it formalizes something I have noticed anecdotally for years: the best momentum traders already adjust their holding period based on how a market is behaving. DeepUnifiedMom makes that adaptation systematic, testable, and repeatable across asset classes.

Why Fixed-Horizon Momentum Falls Short

Standard time-series momentum (TSMOM) takes a single lookback window and applies it uniformly across every asset. Moskowitz, Ooi, and Pedersen (2012) used 12 months. The signal is simple: buy if the past-12-month excess return is positive, sell if negative, and scale by volatility.
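As a rough illustration of the classical rule, here is a minimal Python sketch. The function name and inputs are hypothetical, and the 40% per-asset volatility target is an assumption based on my reading of the original TSMOM setup; treat it as a tunable parameter rather than a fixed part of the method.

```python
def tsmom_position(monthly_excess_returns, annual_vol, target_vol=0.40):
    """Sign of the trailing 12-month excess return, scaled to a volatility target.

    monthly_excess_returns: monthly log excess returns, most recent last.
    annual_vol: ex-ante annualized volatility estimate for the asset.
    target_vol: assumed per-asset volatility target (illustrative default).
    """
    if len(monthly_excess_returns) < 12:
        raise ValueError("need at least 12 monthly returns")
    if annual_vol <= 0:
        raise ValueError("annual_vol must be positive")
    trailing = sum(monthly_excess_returns[-12:])  # log returns add over time
    direction = 1.0 if trailing > 0 else -1.0 if trailing < 0 else 0.0
    return direction * target_vol / annual_vol


# A steady uptrend in an asset with 20% volatility gives a 2x long position.
position = tsmom_position([0.01] * 12, annual_vol=0.20)
```

The whole rule hangs on one number, the 12-month window, which is exactly the constraint the rest of this article is about.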

The limitation shows up in two ways. First, different assets trend at different speeds. Crude oil might trend on a 3-month cycle while the S&P 500 trends on 6 to 12 months. Forcing one window onto both misallocates. Second, the same asset changes speed over time. A stock trending slowly for six months can accelerate into a parabolic move. A 12-month signal is too slow to capture that acceleration, and a 1-month signal would have been whipsawed during the earlier quiet phase.

Practitioners know this. They run multiple TSMOM portfolios at different speeds and blend them with fixed weights. That helps, but it is still static. Equal-weighting a 1-month and a 12-month signal assumes both contribute equally at all times. They do not. In a grinding uptrend, the slow signal dominates. In a fast reversal, only the short lookback captures the shift quickly enough.

What most traders get wrong here: they treat lookback selection as a setup decision made once, not as an ongoing allocation problem. The backtested results of any single TSMOM variant look acceptable in isolation. But the opportunity cost of ignoring the other speeds is invisible unless you run all of them simultaneously and measure the gap.

Multi-Horizon Momentum Architecture

DeepUnifiedMom solves the speed-selection problem by training three task-specific networks in parallel. Each one predicts a forward-looking TSMOM signal at a different horizon: 1 month (fast), 3 months (medium), and 6 months (slow). Each network generates a complete momentum portfolio at its timeframe.

The architecture rests on two components from the deep learning literature:

Multi-task learning (MTL) means all three networks share a set of LSTM expert layers. These shared layers learn features that are useful across all horizons. An uptrend shows certain patterns regardless of whether you measure it over 20 or 120 trading days. Sharing these patterns accelerates training and reduces overfitting because each task regularizes the others.

Multi-gate mixture of experts (MMoE) means each task-specific network has its own gating mechanism. The gate receives the same input features and outputs a set of weights that determine how much each shared LSTM expert contributes to that particular task. The fast-momentum task might lean heavily on an expert that tracks short-term volatility patterns. The slow-momentum task might weight an expert that captures longer structural shifts. The gates learn this specialization during training.
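To make the gating idea concrete, here is a toy forward pass in NumPy. It is a sketch, not the authors' code: each expert is a single tanh layer standing in for an LSTM, the weights are random rather than trained, and all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_experts, n_hidden, n_tasks = 8, 4, 16, 3

# Assumption: a single tanh layer per expert stands in for an LSTM expert.
expert_w = rng.normal(scale=0.1, size=(n_experts, n_features, n_hidden))
# One gate per task (fast, medium, slow); each gate scores every shared expert.
gate_w = rng.normal(scale=0.1, size=(n_tasks, n_features, n_experts))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mmoe_forward(x):
    """x: (batch, n_features) -> per-task expert mixtures and gate weights."""
    experts = np.tanh(np.einsum("bf,efh->ebh", x, expert_w))  # (experts, batch, hidden)
    gates = softmax(np.einsum("bf,tfe->tbe", x, gate_w))      # (tasks, batch, experts)
    mixed = np.einsum("tbe,ebh->tbh", gates, experts)         # gate-weighted expert mix
    return mixed, gates


x = rng.normal(size=(5, n_features))
mixed, gates = mmoe_forward(x)  # mixed: (3, 5, 16); each gate row sums to 1
```

Every task sees the same experts but mixes them with its own softmax weights. That is the entire difference between a multi-gate and a single-gate mixture of experts.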

On top of these three task portfolios sits the Capital Allocation Network (CAN). It takes the outputs of the fast, medium, and slow portfolios and determines how much capital to assign to each one at every rebalance. The CAN uses the Sharpe ratio as its training objective, specifically a version with a soft-capping mechanism that prevents overfitting to extreme values in noisy batches.
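The exact form of the soft cap is not spelled out here, so the sketch below makes one up for illustration: it squashes the batch Sharpe ratio with a scaled tanh so that an implausibly high value in a noisy batch cannot dominate the loss. The cap level, the tanh form, and the function name are all assumptions, not the paper's definition.

```python
import numpy as np


def soft_capped_sharpe_loss(portfolio_returns, cap=3.0, periods_per_year=252, eps=1e-8):
    """Negative annualized Sharpe ratio, softly bounded to the range (-cap, cap).

    Assumption: cap * tanh(sharpe / cap) is an illustrative soft cap,
    not necessarily the mechanism used by the authors.
    """
    r = np.asarray(portfolio_returns, dtype=float)
    sharpe = np.sqrt(periods_per_year) * r.mean() / (r.std(ddof=1) + eps)
    capped = cap * np.tanh(sharpe / cap)  # near-linear for small values, flat near the cap
    return -capped  # minimizing the loss maximizes the capped Sharpe ratio


# A batch with almost no variance has an enormous raw Sharpe; the loss stays bounded.
loss = soft_capped_sharpe_loss([0.0100, 0.0101, 0.0099, 0.0100])
```

In a real training loop the same expression would be written in an autodiff framework so that gradients flow back through the allocation weights.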

The entire system trains end-to-end. That is the structural advantage over running three separate TSMOM strategies and blending them after the fact. The CAN sees how all three interact and optimizes the combination, not just the individual parts.

What the Backtest Shows

Ong and Herremans backtested DeepUnifiedMom across 49 futures contracts spanning equity indexes, fixed income, foreign exchange, and commodities. The test period runs from January 2000 to December 2023 using expanding-window cross-validation. Transaction costs were set at 3 basis points per trade.
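For readers unfamiliar with expanding-window validation, the sketch below shows the general pattern: each fold trains on everything up to a cutoff and tests on the block that follows, then the cutoff moves forward. The initial training length and test block size are hypothetical parameters; the paper's actual fold boundaries are not reproduced here.

```python
def expanding_window_splits(n_obs, initial_train, test_size):
    """Yield (train_range, test_range) pairs; the training window grows each fold."""
    train_end = initial_train
    while train_end + test_size <= n_obs:
        yield range(0, train_end), range(train_end, train_end + test_size)
        train_end += test_size  # the block just tested joins the next training set


# Ten observations, four used for the first fit, tested two at a time.
for train, test in expanding_window_splits(n_obs=10, initial_train=4, test_size=2):
    print(f"train 0..{train.stop - 1}, test {test.start}..{test.stop - 1}")
```

No test observation is ever seen during the training that precedes it, which is what makes the reported numbers out-of-sample.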

The headline numbers for DeepUnifiedMom(CAN): annualized return 1.92%, annualized volatility 0.82%, Sharpe ratio 2.33, Sortino ratio 3.81, maximum drawdown -1.02%.

Compare that to the best classical benchmark, TSMOM(1,12), which equal-weights all lookbacks from 1 to 12 months: Sharpe 1.07, Sortino 1.58, maximum drawdown -2.01%. The unified model more than doubles the Sharpe ratio while cutting peak drawdown in half.

The comparison that matters most for practitioners: DeepUnifiedMom(CAN) versus DeepUnifiedMom(EQWT), which simply equal-weights the three horizon portfolios after training. Equal weighting achieves a Sharpe of 2.31 and a maximum drawdown of -0.99%. The CAN adds only a marginal Sharpe improvement (2.33 vs 2.31), and its drawdown is slightly deeper (-1.02% vs -0.99%). The real gain comes from the multi-task LSTM experts, not from the final allocation layer.

What this means: even a simplified version of the idea, where you keep the jointly trained horizon portfolios and simply equal-weight them, captures most of the benefit. You do not need the CAN layer to gain from multi-horizon thinking.

Where the Model Assigns Weight

I expected the CAN to shift dramatically between fast and slow allocations based on market regime. It does not. The weight profile published in the paper shows remarkable consistency across the 24-year period. The slow portfolio generally receives the largest allocation, with fast and medium sharing the remainder.

This surprised me. It means the primary source of alpha is not regime-timing between horizons but the improved signal quality that comes from multi-task shared learning. The LSTM experts, forced to learn features useful for all three speeds simultaneously, produce better individual signals than they would in isolation. The unification happens at the feature level, not just the allocation level.

The practical implication: if you are implementing a multi-horizon approach without deep learning, you can mirror the allocation side by running independent momentum signals at 1M, 3M, and 6M lookbacks and allocating roughly 40% slow, 30% fast, 30% medium. You will not match the full model, because hand-built signals miss the shared-feature learning, and the paper does not measure how large that gap is.

Where people get this wrong: they assume the value of a multi-task model lies entirely in dynamic reallocation. It does not. The value lies in what the shared layers learn. Separate models at each horizon cannot share information about common trend features.

Practical Implementation Without the Full Model

I run standard momentum screens using a single lookback window. After studying this paper, I started tracking what happens when I overlay a second horizon check.

The simplest version works like this. Compute the volatility-scaled return at three horizons: 1 month (21 trading days), 3 months (63 days), and 6 months (126 days). For each asset, the TSMOM signal at horizon s is:

$$\mathrm{TSMOM}_s = \frac{r_{t-s,t}}{\sigma_s}$$

Where $r_{t-s,t}$ is the log return over s days and $\sigma_s$ is the realized volatility over the same window. A positive signal means the trend is up relative to its own noise level.
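A minimal NumPy sketch of this calculation follows. One detail is my own assumption: I scale the daily standard deviation by the square root of s so that the numerator and denominator are on the same horizon. The function names and the synthetic price series are illustrative only.

```python
import numpy as np

HORIZONS = {"fast": 21, "medium": 63, "slow": 126}  # trading days, as in the text


def tsmom_signal(prices, s):
    """Log return over the last s days divided by realized vol over the same window."""
    prices = np.asarray(prices, dtype=float)
    if len(prices) < s + 1:
        raise ValueError(f"need at least {s + 1} prices for horizon {s}")
    daily = np.diff(np.log(prices[-(s + 1):]))  # the last s daily log returns
    horizon_return = daily.sum()                # r_{t-s,t}
    # Assumption: sigma_s is the daily standard deviation scaled to s days.
    sigma_s = daily.std(ddof=1) * np.sqrt(s)
    return float(horizon_return / sigma_s) if sigma_s > 0 else 0.0


def multi_horizon_signals(prices):
    return {name: tsmom_signal(prices, s) for name, s in HORIZONS.items()}


# Synthetic price path: steady drift plus a deterministic wobble.
steps = 0.001 + 0.005 * np.sin(np.arange(200))
prices = 100 * np.exp(np.cumsum(steps))
signals = multi_horizon_signals(prices)
```

Because the path drifts upward at every horizon, all three signals come out positive. On real data they will often disagree, and that disagreement is the information you are after.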

I weight the three signals using a simple regime heuristic. When trailing 21-day realized volatility for the broad market is below its 6-month median, I weight the fast signal at 50% and split the remainder between medium and slow. When vol is above its median, I flip: 50% to the slow signal and split the rest. This is a crude, hand-coded stand-in for the horizon weighting that the CAN learns, although the learned weights move far less than this rule does.
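The heuristic is simple enough to write down in a few lines of Python. The function names are hypothetical, and treating volatility exactly at the median as the high-vol regime is an arbitrary tie-break of mine.

```python
import statistics


def regime_weights(market_vol_history):
    """Horizon weights from 21-day realized vol versus its trailing median.

    market_vol_history: trailing values of 21-day realized market volatility
    covering roughly six months, most recent last.
    """
    current = market_vol_history[-1]
    median = statistics.median(market_vol_history)
    if current < median:  # calm regime: lean on the fast signal
        return {"fast": 0.50, "medium": 0.25, "slow": 0.25}
    # Elevated regime (ties included, by assumption): lean on the slow signal.
    return {"fast": 0.25, "medium": 0.25, "slow": 0.50}


def blended_signal(signals, weights):
    """Weighted sum of the per-horizon signals."""
    return sum(weights[name] * signals[name] for name in weights)


weights = regime_weights([0.20, 0.30, 0.10])  # latest vol is below the median
score = blended_signal({"fast": 1.0, "medium": 0.0, "slow": -1.0}, weights)
```

In the calm regime the fast signal outvotes the slow one, so this example produces a small positive blended score even though the two disagree.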

The paper does not report results for this simplified approach directly, but the logic follows from the backtest data: combining momentum signals across different horizons consistently outperforms any single fixed lookback, and the improvement is robust to the exact weighting scheme.

What DeepUnifiedMom Does Not Solve

Three limitations worth flagging.

First, the backtest uses 49 futures contracts. These are among the most liquid instruments on earth. Applying the same framework to individual equities introduces execution constraints, capacity limits, and microstructure noise that futures do not have. The Sharpe numbers will compress in less liquid markets.

Second, the annualized returns are modest in absolute terms (1.92% for the best variant). This is a volatility-scaled portfolio designed for low risk. The Sharpe ratio is excellent, but if you need absolute return, you would need to lever the strategy or combine it with other signals. The authors explicitly scale all positions by realized vol, which compresses returns by design.
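The arithmetic behind the leverage point is worth seeing once. The sketch below scales the reported return and volatility to a hypothetical 10% volatility target. It ignores financing costs, margin constraints, and any change in trading costs at size, so the levered return is an upper bound rather than a forecast.

```python
def lever_to_target_vol(ann_return, ann_vol, target_vol):
    """Leverage needed to hit target_vol and the resulting gross return.

    Assumption: returns scale linearly with leverage and financing is free.
    """
    leverage = target_vol / ann_vol
    return leverage, ann_return * leverage


# Reported figures for DeepUnifiedMom(CAN): 1.92% return, 0.82% volatility.
leverage, levered_return = lever_to_target_vol(0.0192, 0.0082, target_vol=0.10)
sharpe = 0.0192 / 0.0082  # unchanged by leverage under these assumptions
print(f"leverage {leverage:.1f}x, levered return {levered_return:.1%}, Sharpe {sharpe:.2f}")
```

Roughly 12x leverage is needed to reach a 10% volatility target. That is feasible in futures but leaves little room for the Sharpe ratio to degrade out of sample.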

Third, the model requires daily retraining on an expanding window. Each training iteration takes roughly one hour on a single NVIDIA RTX-class GPU in the original experiments. For a retail trader, this is not trivial infrastructure. For an institutional desk, it is routine. The walk-forward methodology is sound, but the computational overhead is real.

The common misconception: people see a 2.33 Sharpe ratio and assume it translates directly to their equity portfolio. It does not. This is a diversified multi-asset futures strategy with daily volatility targeting. The principle (unifying horizons) transfers. The exact numbers will not.

Connection to Other Momentum Research

DeepUnifiedMom sits in a growing family of research that treats momentum as a signal to be refined rather than a standalone factor. The LLM-conditioned momentum approach filters momentum entries using news sentiment. The X-Trend regime detection method adapts trend-following without retraining by using few-shot learning to classify market states.

Where DeepUnifiedMom differs: it does not add an external signal. It stays purely within price momentum but operates on the temporal dimension. The innovation is structural (how you combine lookback windows) rather than informational (what additional data you bring in).

For traders who already run multiple momentum timeframes manually, the paper provides empirical confirmation that this practice works. It also shows where the ceiling might be: even the best dynamic allocation (CAN) only marginally beats equal-weighting, which suggests that the act of running multiple horizons matters more than the precision of how you weight them.

When Multi-Horizon Momentum Helps Most

The cases where fixed-horizon momentum fails most visibly are momentum crashes. Daniel and Moskowitz (2016) documented these: sharp reversals in which past winners (the long leg) lag badly while past losers (the short leg) rally hard. These crashes cluster in the market rebounds that follow a bear market, when volatility is high.

A multi-horizon approach reduces crash exposure because the fast signal flips before the slow signal does. If you are blending horizons, your net exposure shrinks naturally as the short-term trend turns while the long-term trend still technically points the other way. You do not need a separate crash indicator. The disagreement between horizons is itself a warning signal.

I pay attention to this divergence in my own trading. When my 1-month rate of change turns negative while the 6-month signal is still positive, I reduce position size by at least a third. This is not from the DeepUnifiedMom model directly, but the principle is identical: short-term disagreement with long-term trend means uncertainty is rising, and smaller positions are appropriate.
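Written as code, my rule looks like the sketch below. The function name and the default one-third cut are simply my own habit expressed in Python, not anything taken from the paper.

```python
def divergence_adjusted_size(base_size, roc_1m, signal_6m, cut=1 / 3):
    """Shrink a long position when the 1-month trend turns against the 6-month trend.

    roc_1m: 1-month rate of change; signal_6m: 6-month momentum signal.
    cut: fraction of the position to remove on divergence (assumed default).
    """
    if roc_1m < 0 < signal_6m:  # short-term down, long-term still up
        return base_size * (1 - cut)
    return base_size


size = divergence_adjusted_size(base_size=1.0, roc_1m=-0.02, signal_6m=0.8)
```

The rule is deliberately one-sided: it only reduces risk on longs. A mirror-image condition for shorts would be a natural extension if you trade both directions.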

The mistake most momentum traders make during these transitions: they treat the long-horizon signal as the “real” trend and dismiss the short-term reversal as noise. Sometimes it is noise. But when it is not, the loss can erase months of gains. Using multiple horizons forces you to respect both signals instead of choosing one and hoping.

Building Multi-Horizon Momentum Into Your Process

You do not need deep learning to capture much of the insight from this research. The core finding is that a portfolio combining 1M, 3M, and 6M momentum horizons beat every single-lookback benchmark in the backtest, and that the result held up even with simple equal-weighting of the horizon portfolios.

Start with three steps. First, compute volatility-normalized momentum at three horizons for every asset in your universe. Second, require at least two of three horizons to agree on direction before taking a position. This filters out the assets where short-term and long-term trends conflict, which are exactly the positions most likely to reverse. Third, size the position proportionally to how many horizons agree: full size when all three align, half size when only two agree.
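The second and third steps reduce to a small voting rule. Here is one way to sketch it in Python. The function name is hypothetical, and treating a signal of exactly zero as an abstention is my own assumption.

```python
def agreement_position(signals):
    """Signed position size from three horizon signals.

    Full size when all three agree, half size when exactly two agree,
    flat otherwise. A zero signal counts as an abstention (assumption).
    """
    if len(signals) != 3:
        raise ValueError("expected exactly three horizon signals")
    ups = sum(1 for v in signals.values() if v > 0)
    downs = sum(1 for v in signals.values() if v < 0)
    if ups == 3:
        return 1.0
    if downs == 3:
        return -1.0
    if ups == 2:
        return 0.5
    if downs == 2:
        return -0.5
    return 0.0


full = agreement_position({"fast": 0.4, "medium": 1.1, "slow": 0.9})   # all agree
half = agreement_position({"fast": -0.3, "medium": 0.6, "slow": 1.2})  # two of three
```

The output is a fraction of whatever base size your volatility scaling produces, so the rule layers on top of existing risk sizing rather than replacing it.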

The source code for the full DeepUnifiedMom model is publicly available (github.com/joelowj/unified_mom_mmoe). If you have the infrastructure to train and deploy it, the paper demonstrates its edge. If you do not, the simplified version still captures the structural advantage of multi-horizon thinking over single-window momentum.

Educational content only. Not investment advice. Trading involves risk. You are responsible for your decisions.
