The regime engine, in one note

Most of the design work on the long sleeve sits in one place: the regime classifier. Everything else — the sleeve bands, the bond floor, the drawdown throttle, the reserve pool — is downstream of “what regime are we in”. This note is the classifier, end to end.

The first decision was the number of states. We landed on three: bull, chop, bear. Two is too few; a quiet, range-bound tape behaves genuinely differently from a bear, and conflating the two costs about 200bps a year on the sleeve. Five is too many: the extra states (mild bull, late-stage X, recovery, etc.) overfit to the calibration window, and the smoothed posterior bounces between them in ways that drive turnover without driving return. Three is the smallest number where each state earns its keep. We’ve re-tested four and five every six months for two years; three has won every audit so far.

A Hidden Markov Model is only as good as what it observes. Ours observes six features, computed daily on a rolling 60-day window, and we’ll walk through each one, because choosing them is where most of the research time went.

The first is 20-day realised volatility: the annualised standard deviation of daily log returns of the composite risk basket (60/40 BTC/SPY blend, weighted to current sleeve allocation). Bear regimes have characteristic volatility jumps; chop has an elevated baseline; bull has the lowest baseline of the three and the rarest jumps. The shape of the distribution differs, not just the level.
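A minimal sketch of the volatility feature, assuming the composite basket is already available as a daily close series (building it from the BTC/SPY blend is left upstream). The name `realised_vol` and the 252-periods-per-year annualisation are assumptions; BTC trades every calendar day, so the factor actually used may differ.

```python
import numpy as np


def realised_vol(prices, window: int = 20, periods_per_year: int = 252) -> float:
    """Annualised realised volatility from the last `window` daily log returns.

    Assumption: `prices` is a daily close series of the composite basket, and
    252 periods per year is the annualisation convention.
    """
    prices = np.asarray(prices, dtype=float)
    if prices.size < window + 1:
        raise ValueError("need at least window + 1 prices")
    log_returns = np.diff(np.log(prices))[-window:]
    # Sample standard deviation (ddof=1) of daily log returns, scaled to a year.
    return float(np.std(log_returns, ddof=1) * np.sqrt(periods_per_year))
```

Using the sample standard deviation (`ddof=1`) is a choice; with 20 observations it is about 2.6% larger than the `ddof=0` version.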

The second is trend strength, measured with the Mann-Kendall statistic on the 90-day log price curve. It is rank-based, so it is robust to outliers; it doesn’t manufacture a trend when the series isn’t monotonic; and it handles regime transitions cleanly. We tried an EWMA trend and an OLS slope first; both were worse out of sample.
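The Mann-Kendall statistic can be sketched as follows. This is the textbook S statistic and its normal-approximation Z score, without the tie correction to the variance. The names `mann_kendall` and `trend_strength` are hypothetical, and using Z (rather than S or Kendall's tau) as the feature value is an assumption.

```python
import math

import numpy as np


def mann_kendall(series) -> tuple[float, float]:
    """Return (S, Z) for the Mann-Kendall monotonic-trend test.

    S = sum over i < j of sign(x_j - x_i). Z is the normal approximation
    with a continuity correction. The tie correction to Var(S) is omitted.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 observations")
    # signs[i, j] = sign(x_j - x_i); the strict upper triangle keeps pairs with i < j.
    signs = np.sign(x[None, :] - x[:, None])
    s = float(np.triu(signs, k=1).sum())
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1.0) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1.0) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def trend_strength(prices, window: int = 90) -> float:
    """Hypothetical feature: Mann-Kendall Z on the last `window` log prices."""
    log_prices = np.log(np.asarray(prices, dtype=float)[-window:])
    return mann_kendall(log_prices)[1]
```

Because S depends only on the signs of pairwise differences, taking logs first leaves it unchanged, and a single outlier can flip at most n - 1 of the n(n - 1)/2 pairs, which is where the robustness comes from.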

The third is breadth: how many of the four sleeve-eligible assets (BTC, SPY, GLD, SHY) are above their 60-day moving average. 3-of-4 is a reasonable bull confirmation; 0-of-4 is a strong bear confirmation; 2-of-4 with the right composition (defensive-heavy) reads as chop.
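A sketch of the breadth feature, assuming daily closes keyed by ticker. The count `len(above)` is the natural model input; `breadth_reading` only restates the rules of thumb in code. Treating "defensive-heavy" as "exactly GLD and SHY above their averages" is an assumption, as are both function names.

```python
import numpy as np

DEFENSIVE = {"GLD", "SHY"}  # assumption: what "defensive" means here


def assets_above_sma(closes: dict, window: int = 60) -> set:
    """Names whose latest close is above their trailing simple moving average.

    The average includes today's close.
    """
    above = set()
    for name, series in closes.items():
        s = np.asarray(series, dtype=float)
        if s.size < window:
            raise ValueError(f"{name}: need at least {window} closes")
        if s[-1] > s[-window:].mean():
            above.add(name)
    return above


def breadth_reading(above: set) -> str:
    """Hypothetical mapping of the rules of thumb to labels."""
    if len(above) >= 3:
        return "bull confirmation"
    if len(above) == 0:
        return "strong bear confirmation"
    if above == DEFENSIVE:
        return "chop"
    return "no clear read"
```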

Fourth is the SHY-to-IEF term-structure slope, the differential between short- and medium-duration Treasury yields. It inverts late in the cycle and steepens early in the cycle. The sleeve doesn’t trade it, but it’s a clean regime-conditioning feature, and the HMM loves it because it carries macro information that price-only features miss.

Fifth is the risk-on/risk-off spread: SPY return minus GLD return over a 30-day window. It is a surprisingly clean discriminator at regime transitions, because SPY and GLD diverge most sharply when the regime is genuinely changing; they’re the two cleanest proxies for “risk appetite increasing” and “risk appetite collapsing”.
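As a sketch, assuming aligned daily closes and simple (not log) returns over the trailing 30 trading days; `risk_on_off_spread` is a hypothetical name, and the return convention is an assumption the note doesn't settle.

```python
import numpy as np


def risk_on_off_spread(spy, gld, window: int = 30) -> float:
    """Trailing `window`-day SPY simple return minus GLD simple return."""
    spy = np.asarray(spy, dtype=float)
    gld = np.asarray(gld, dtype=float)
    if min(spy.size, gld.size) < window + 1:
        raise ValueError("need window + 1 aligned closes for each asset")
    spy_ret = spy[-1] / spy[-1 - window] - 1.0
    gld_ret = gld[-1] / gld[-1 - window] - 1.0
    # Positive: equities outrunning gold (risk appetite rising); negative: the reverse.
    return float(spy_ret - gld_ret)
```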

Sixth is drawdown depth: distance from the rolling 252-day equity peak. Conditioning on it means a 10% drawdown in a bull is treated differently from a 10% drawdown in a bear; once depth is an observed feature, the model sorts out that interaction with the regime itself, with no hand-built rule needed.
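A sketch of the drawdown feature, assuming a daily equity series and the sign convention that depth is zero or negative; `drawdown_depth` is a hypothetical name. Because the peak is taken over the trailing 252 days rather than all history, an old peak eventually rolls out of the window.

```python
import numpy as np


def drawdown_depth(equity, window: int = 252) -> float:
    """Distance of the latest value from its trailing `window`-day peak.

    Returns a value <= 0: -0.10 means 10% below the peak. The peak window
    includes today, so a fresh high gives exactly 0.
    """
    e = np.asarray(equity, dtype=float)
    peak = e[-window:].max()
    return float(e[-1] / peak - 1.0)
```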

We tried adding seven more features, among them CDS spreads, VIX term structure, perp funding rates and breadth on extended baskets. None improved out-of-sample classification accuracy by more than 0.4pp, and some made it worse by adding noise. Six is what survived the cull, and we re-run the cull yearly.

The model itself is straightforward Baum-Welch on a five-year rolling window, re-fit monthly. The transition matrix is regularised toward a diagonally dominant prior: we don’t want the model to learn that regimes flip every other week, because they don’t, and a model that thinks they do generates turnover that eats the edge. Posterior probabilities pass through a 14-day exponentially weighted smoother before they’re used downstream; raw posteriors are too jittery and would make the sleeve flip-flop on noise.
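The two stabilising pieces can be sketched as follows. The note doesn't say how the prior enters Baum-Welch; a common approach, assumed here, is a Dirichlet prior on each row of the transition matrix, which turns the M-step into expected transition counts plus pseudo-counts. `stickiness` (the self-transition pseudo-count) is a hypothetical parameter, and reading "14-day" as an EWMA span (alpha = 2 / (span + 1)) rather than a half-life is also an assumption.

```python
import numpy as np


def map_transition_matrix(expected_counts, stickiness: float = 50.0) -> np.ndarray:
    """MAP M-step for an HMM transition matrix under a sticky Dirichlet prior.

    expected_counts[i, j] = sum over t of P(z_t = i, z_{t+1} = j | data), from
    the E-step. The prior on row i is Dirichlet with concentration 1 + stickiness
    on the diagonal and 1 elsewhere, so the mode adds `stickiness` pseudo-counts
    of self-transition to each row before normalising.
    """
    counts = np.asarray(expected_counts, dtype=float)
    k = counts.shape[0]
    regularised = counts + stickiness * np.eye(k)
    return regularised / regularised.sum(axis=1, keepdims=True)


def ew_smooth(posteriors, span: int = 14) -> np.ndarray:
    """Causal EWMA over a (T, K) array of daily regime posteriors.

    Each output row is a convex combination of input rows, so it is still a
    probability vector.
    """
    p = np.asarray(posteriors, dtype=float)
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(p)
    out[0] = p[0]
    for t in range(1, len(p)):
        out[t] = alpha * p[t] + (1.0 - alpha) * out[t - 1]
    return out
```

A larger `stickiness` pulls each diagonal entry a_ii toward 1 and so lengthens the implied expected regime duration, 1 / (1 - a_ii) days.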

The cleanest design lesson from the last three years has nothing to do with the model and everything to do with how the model is wired into the allocation. The regime should not make the bet; the regime should set the ceiling on the bet.

Each sleeve has a band: BTC 5-35%, SPY 10-45%, GLD 5-15%, SHY ≥20%. The regime determines where in the band the sleeve sits, but the band itself is structural and never moves. In a strong bull, BTC sits near 35%; in chop, near 15-20%; in a bear, near 5%. The bond floor is non-negotiable in any regime. These are regime-conditional exposure ceilings, not regime-conditional positioning, and the distinction matters precisely when the regime is wrong, which it is some non-trivial fraction of the time. When the classifier mislabels chop as bull, the sleeve pushes BTC to the top of its band, but not above it, because the band is a structural cap, not a model output. The damage is bounded.
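To make the "structural cap, not a model output" point concrete, here is a sketch that forces any proposed allocation back inside the bands. The band values come from the note; the clip-then-shave procedure, the name `clamp_to_bands`, and defaulting a missing sleeve to its lower bound are assumptions.

```python
# Bands from the note; SHY has a floor but no cap.
BANDS = {"BTC": (0.05, 0.35), "SPY": (0.10, 0.45), "GLD": (0.05, 0.15)}
SHY_FLOOR = 0.20


def clamp_to_bands(proposed: dict) -> dict:
    """Return an allocation that respects every band, whatever was proposed.

    1. Clip each risky sleeve into its band (a missing sleeve gets its lower bound).
    2. If the risky sleeves then leave less than SHY_FLOOR for SHY, shave the
       excess from them in proportion to how far each sits above its lower
       bound, so none is pushed below its band.
    3. SHY takes the remainder.
    """
    w = {k: min(max(proposed.get(k, lo), lo), hi) for k, (lo, hi) in BANDS.items()}
    excess = sum(w.values()) - (1.0 - SHY_FLOOR)
    if excess > 0:
        slack = {k: w[k] - BANDS[k][0] for k in w}
        # With these bands, total slack exceeds the excess by 0.60, so no
        # sleeve loses more than its own slack.
        total_slack = sum(slack.values())
        for k in w:
            w[k] -= excess * slack[k] / total_slack
    w["SHY"] = 1.0 - sum(w.values())
    return w
```

For example, a mislabelled-bull proposal of `{"BTC": 0.50, "SPY": 0.45, "GLD": 0.05}` comes back with BTC at about 0.327 and SHY at its 0.20 floor: the model asked for 50% BTC and the structure refused.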

A regime-conditional positioning model (bear means short, bull means leveraged long) would not have that bound. We considered it and rejected it. The 90% confidence intervals on regime classification at any given moment are wide enough that you don’t want the regime setting the sign of the position; you want it modulating exposure inside an already-bounded structure.

Regime transitions are the riskiest moments. The model’s confidence drops, posteriors become roughly equal across two states, and the sleeve should do less, not more, until the new regime stabilises. We enforce this with a soft de-risking rule: if the largest posterior probability is below 0.65 for more than five consecutive days, every sleeve allocation moves halfway toward its bear-regime weight, regardless of what the most-likely regime is. This costs a small amount of expected return when the model resolves cleanly into bull or chop, and saves a lot of pain when it resolves into bear. The asymmetry of the payoff makes the trade obvious.
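A sketch of the de-risking rule, assuming the input is a daily series of the largest (smoothed) posterior, most recent last, and that target weights are already band-compliant. `soft_derisk` and its parameter names are hypothetical; the 0.65 threshold, the "more than five consecutive days" count and the halfway move are from the note.

```python
from typing import Mapping, Sequence


def soft_derisk(
    weights: Mapping[str, float],
    bear_weights: Mapping[str, float],
    max_posteriors: Sequence[float],
    threshold: float = 0.65,
    days: int = 5,
) -> dict:
    """Apply the soft de-risking rule to today's target weights.

    max_posteriors holds the largest regime posterior for each day, oldest
    first. If the trailing run of days below `threshold` is longer than
    `days`, every sleeve moves halfway toward its bear-regime weight.
    """
    run = 0
    for p in reversed(max_posteriors):
        if p >= threshold:
            break
        run += 1
    if run > days:
        return {k: w + 0.5 * (bear_weights[k] - w) for k, w in weights.items()}
    return dict(weights)
```

Because the result is the midpoint of two allocations, it sums to 1 and stays inside every band whenever both inputs do, so the rule cannot break the structural caps.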

The honest weakness of the current setup is the calibration window. Five years is too short — it doesn’t include enough complete cycles. We’re moving toward a 12-year window with a hierarchical prior that lets recent data weigh more without dominating. That change is the only structural one we expect to make in the next year. Everything else has held up.

— inite team