transition_01_research_thesis

From Institutional Research Problem to Cross-Sectional ML Investigation

Core research question

Can a shared cross-sectional ML framework extract persistent relative-return structure across heterogeneous institutional ETF regimes while remaining interpretable, validation-aware, and executable under realistic trading frictions?

Core thesis

The investigation is not fundamentally attempting to demonstrate extraordinary returns. It is attempting to determine whether quantitative research itself can remain interpretable, chronologically disciplined, diagnostically transparent, and operationally coherent under non-stationary market structure.

Research state

Cross-sectional prediction is investigated under intentionally heterogeneous macro structure. The ETF universe spans equities, international exposure, commodities, rates, and sector concentration because relative-ranking systems become difficult to interpret when macro sensitivity collapses into a single dominant factor. The compact universe intentionally introduces tension between interpretability and cross-sectional richness. Although the nine-ETF panel limits breadth relative to institutional equity universes, it preserves enough heterogeneity to produce meaningful ranking structure while keeping chronology, attribution, and instability propagation interpretable at the asset level.

Validation doctrine

Chronology is treated as a governing constraint rather than a retrospective validation appendix. All later signal interpretation therefore inherits the train-test separation structure established in walk_forward_timeline. Predictive structure is evaluated only after chronology integrity has already constrained the investigation.

Expected pressure

The central tension emerges immediately. Predictive structure may appear statistically meaningful while remaining too unstable to survive executable portfolio translation. Temporary ranking separation, compressed dispersion, coefficient reversals, and transaction-cost amplification all become possible failure states long before realized performance deteriorates visibly.

Model posture

Ridge regression is selected deliberately. The objective is not architectural sophistication. The objective is coefficient observability, feature attribution continuity, and regime-conditioned interpretability. The model therefore functions less as a black-box optimization engine and more as an observable research surface through which instability can propagate visibly.
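
The coefficient observability described above follows from ridge regression having a closed-form solution. The following is a minimal sketch in pure Python, assuming standardized features and no intercept. The function name `ridge_fit`, the penalty `alpha`, and the toy data are illustrative assumptions; the source does not state its solver or regularization strength.

```python
def ridge_fit(X, y, alpha):
    """Solve (X^T X + alpha * I) beta = X^T y by Gaussian elimination."""
    n_features = len(X[0])
    # Build the regularized normal equations as an augmented matrix [A | b].
    aug = []
    for i in range(n_features):
        row = [sum(r[i] * r[j] for r in X) for j in range(n_features)]
        row[i] += alpha  # ridge penalty sits on the diagonal
        row.append(sum(r[i] * t for r, t in zip(X, y)))
        aug.append(row)
    # Forward elimination with partial pivoting.
    for col in range(n_features):
        pivot = max(range(col, n_features), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_features):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_features + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    beta = [0.0] * n_features
    for i in reversed(range(n_features)):
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, n_features))
        beta[i] = (aug[i][n_features] - tail) / aug[i][i]
    return beta

# Toy data generated exactly by y = 2*x1 - 1*x2 (illustrative, not from the study).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y = [2.0, -1.0, 1.0, 3.0]
beta_ols = ridge_fit(X, y, alpha=0.0)    # no penalty: recovers the generating coefficients
beta_ridge = ridge_fit(X, y, alpha=3.0)  # same signs, magnitudes shrunk toward zero
```

Because every coefficient is an explicit solution of the same linear system, refitting on each walk-forward window yields a directly comparable coefficient vector, which is what makes sign and magnitude drift inspectable.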

Evidence continuity

Institutional credibility emerges from continuity between hypothesis construction, feature engineering, ranking behaviour, validation chronology, execution realism, and eventual degradation. Intermediate artefacts remain preserved because instability itself constitutes research evidence. Coefficient reversals, failed validation splits, and deteriorating ranking geometry are therefore retained explicitly rather than compressed into aggregate summaries.

Research skepticism

Asset-return prediction is approached as a structurally low signal-to-noise problem in which temporary predictive persistence can emerge from unstable macro relationships. Instability, degradation, and reversal are therefore treated as expected states of the investigation rather than exceptional outcomes. This framing becomes essential later when turnover acceleration and validation deterioration emerge coherently from upstream predictive compression.

Transition exit state

The methodological system itself consequently becomes part of the research object. Validation chronology, feature lineage, ranking persistence, instability propagation, and diagnostic continuity remain continuously observable because the investigation is fundamentally concerned with how quantitative reasoning evolves under evidence pressure.

transition_02_data_and_feature_engineering

From Raw Market Structure to Temporally Aligned Hypothesis Construction

Core thesis

Feature engineering functions as formalized market-theory construction. Each feature family encodes assumptions that must later survive unseen chronological regimes.

active evidence state: substrate

02A Feature Substrate Construction

Feature systems encode market hypotheses before prediction begins.

Raw market observations become a temporally aligned feature substrate through pooled panel construction and leakage-safe normalization. Momentum persistence, volatility compression, mean-reversion pressure, and beta sensitivity function as competing explanatory systems.

Feature IC heatmap showing cross-sectional information coefficient by feature and walk-forward split.

feature_family_taxonomy: Hypothesis encoding

Family	Features	Hypothesis
Trend	Momentum, trend strength, breakout, risk-adjusted momentum	Persistence
Volatility	Realized volatility, vol compression	Risk-regime texture
Mean-Reversion	Z-score, drawdown distance	Reversal pressure
Market Structure	Rolling market beta	Macro sensitivity

02B Chronology Integrity & Comparability

Comparability is maintained as chronology infrastructure.

Rolling z-score normalization preserves temporal comparability without importing future distributional information. Warmup erosion and forward label loss remain visible instead of being hidden as implementation detail.
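
A past-only rolling z-score can be sketched as follows. This is a minimal illustration, assuming the normalization parameters come from a window that ends strictly before the observation being scored; the function name `rolling_zscore_past_only` and the window length are hypothetical, and the source does not say whether its window includes the current observation.

```python
from statistics import mean, pstdev

def rolling_zscore_past_only(values, window):
    """Z-score each observation against the `window` observations strictly before it."""
    out = []
    for t, x in enumerate(values):
        if t < window:
            out.append(None)  # warmup erosion: not enough history yet
            continue
        history = values[t - window:t]  # excludes values[t] and everything after it
        sigma = pstdev(history)
        out.append((x - mean(history)) / sigma if sigma > 0 else None)
    return out

series = [1.0, 2.0, 3.0, 4.0, 10.0, 4.0]
z = rolling_zscore_past_only(series, window=3)
# The first three entries are None: those rows are lost to warmup.
```

Because each score depends only on earlier values, changing a later observation leaves all earlier scores untouched, which is the property the leakage-safe claim relies on.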

chronology_constraints: Alignment discipline

Panel	9 assets, 3020 aligned trading days
Warmup erosion	252 rows removed per asset
Forward horizon	21 trading days
Effective range	2014-01-02 to 2024-11-29
Normalization	rolling, past-only parameters
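
The row accounting implied by these constraints can be checked with simple arithmetic. The sketch below uses only the figures in the table; the variable names are illustrative.

```python
n_assets = 9
aligned_days = 3020
warmup_rows = 252

rows_per_asset = aligned_days - warmup_rows  # rows left per asset after warmup erosion
pooled_rows = n_assets * rows_per_asset      # pooled panel rows across all assets
```

The pooled total of 24,912 equals the model-fitting observation count reported in the signal diagnostics. That suggests, although the source does not state it, that the 21-day forward label loss is accounted for separately from this count.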

02C Regime-Conditioned Behaviour

Feature usefulness becomes conditional evidence, not stable legitimacy.

Several feature structures strengthen during trend-dominated environments while degrading or reversing during compressed-dispersion regimes. The feature system therefore exits this transition as a structured hypothesis space, not an established predictive claim.

Feature family IC by walk-forward split.

Feature correlation heatmap exposing dependence structure.

Feature regime z-score surface showing activated and stressed feature states over time.

transition_03_cross_sectional_signal_construction

From Hypothesis Space to Cross-Sectional Conviction Geometry

Core thesis

Predictive legitimacy emerges through persistent ranking geometry, monotonic ordering, and observable cross-sectional separation rather than isolated prediction accuracy.

active signal state: scores

03A Cross-Sectional Score Emergence

The feature substrate begins producing relative ordering structure.

Predictive structure is evaluated through ranking behaviour rather than directional forecasting accuracy alone. Positive cross-sectional IC indicates that the system is repeatedly identifying relative winners and losers, but this remains a pre-execution signal diagnostic.

Daily cross-sectional IC surface showing ranking signal consistency and inversion regimes.

Raw model prediction distribution showing score formation around the ranking target.

signal_emergence: Ranking signal diagnostics

Positive-IC ranking periods	63.1%
Mean monthly cross-sectional IC	0.1450
Positive cross-sectional IC months	65.6%
Model-fitting observations	24,912
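
For readers unfamiliar with the diagnostic, a cross-sectional IC for a single date can be sketched as a rank correlation between that date's scores and the subsequently realized returns. The sketch assumes a Spearman (rank) definition, which the source does not confirm; all names and values are illustrative.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def cross_sectional_ic(scores, forward_returns):
    """Spearman rank correlation between one date's scores and realized forward returns."""
    return pearson(ranks(scores), ranks(forward_returns))

# One hypothetical date with five assets (values are illustrative only).
scores = [0.03, 0.01, -0.02, 0.05, 0.00]
fwd = [0.020, 0.015, -0.010, 0.010, 0.000]
ic = cross_sectional_ic(scores, fwd)
```

Averaging this quantity over dates, or counting the share of dates on which it is positive, gives summary statistics of the kind listed in the table above.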

03B Ranking Geometry

Signal legitimacy depends on separation, persistence, and realized spread.

Ranking geometry becomes the central diagnostic surface because it combines score dispersion, top-bottom discrimination, realized economic separation, and rank persistence through time.

Ranking geometry diagnostic showing score dispersion, ranking separation, realized spread, and rank persistence.

ranking_geometry_summary: Separation and persistence

Mean score IQR	0.0346
Min score IQR	0.0120
Mean top-bottom score spread	0.0715
Mean realized spread	0.891% pre-cost
Positive realized spread	69%
Mean monthly rank autocorrelation	0.785
Positive rank persistence	100%
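
Two of the geometry measures above, score IQR and top-bottom score spread, can be sketched for a single cross-section as follows. The quantile method, the group size `k`, and the scores are assumptions for illustration; the source does not specify how its groups are formed.

```python
from statistics import mean, quantiles

def score_iqr(scores):
    """Interquartile range of one date's scores (inclusive quantile method)."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 - q1

def top_bottom_spread(scores, k):
    """Mean of the k highest scores minus the mean of the k lowest."""
    ordered = sorted(scores)
    return mean(ordered[-k:]) - mean(ordered[:k])

# Nine hypothetical scores, one per asset in a nine-asset universe.
scores = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.04, 0.06]
iqr = score_iqr(scores)
spread = top_bottom_spread(scores, k=3)
```

When the IQR shrinks toward zero, small score perturbations are enough to reorder the cross-section, which is why compressed dispersion is tracked alongside the spread.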

03C Conditional Predictive Persistence

Prediction magnitude carries information, but conviction remains unstable.

Monotonic ordering confirms that prediction strength is not arbitrary score variation. At the same time, compressed dispersion and unstable consistency leave predictive legitimacy and predictive fragility inside the same ranking system.

Prediction strength diagnostic showing monotonic realized return ordering by score group.

Rolling directional accuracy and IC consistency surface exposing rank persistence through time.

prediction_group_ordering: Pre-cost discrimination

Top group	1.301%
Mid group	1.050%
Bottom group	0.231%
Top-bottom spread	1.070% pre-cost
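
The monotonic ordering and the reported spread can be verified directly from the group figures in the table. The variable names are illustrative.

```python
# Mean realized return by score group, in percent, pre-cost (from the table above).
group_returns = {"top": 1.301, "mid": 1.050, "bottom": 0.231}

ordered = [group_returns[g] for g in ("bottom", "mid", "top")]
is_monotonic = all(lo < hi for lo, hi in zip(ordered, ordered[1:]))
top_bottom_spread = round(group_returns["top"] - group_returns["bottom"], 3)
top_mid_gap = round(group_returns["top"] - group_returns["mid"], 3)
mid_bottom_gap = round(group_returns["mid"] - group_returns["bottom"], 3)
```

The ordering is monotonic and the spread matches the reported 1.070%. Most of that spread comes from the gap between the mid and bottom groups (0.819) rather than between the top and mid groups (0.251), so on these figures the discrimination is asymmetric.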

transition exit state

Ranking structure exists; monotonicity remains observable.

Dispersion compresses; instability emerges through time.

transition_04_predictive_ml_behaviour

From Ranking Legitimacy to Instability Propagation

Core thesis

Positive predictive structure does not imply stable feature semantics or persistent ranking conviction. Predictive cognition remains conditionally stable and temporally fragile.

active predictive state: persistence

04A Predictive Persistence

Positive predictive structure does not behave uniformly through time.

The IC regime surface exposes alternating persistence, deterioration, recovery, and temporary inversion. The model can identify relative structure while still failing to maintain stable predictive behaviour across changing regimes.

Rolling IC regime diagnostic exposing predictive persistence, deterioration, recovery, and inversion.

conditional_predictive_states: Persistence is not binary

Strengthening	specific macro structures
Weakening	compressed-dispersion environments
Inversion	several stressed transitions

04B Coefficient Drift

Regularization constrains magnitude, but not semantic instability.

Repeated sign reversals indicate regime-specific learning rather than universal predictive laws. Coefficient behaviour becomes an interpretability surface where feature meaning changes under evidence pressure.

Coefficient sign heatmap showing feature sign reversals across walk-forward splits.

Coefficient stability diagnostic showing sign consistency and coefficient dispersion.

coefficient_reversal_surface: Least stable semantics

20D Momentum	57% sign consistency
60D Momentum	57% sign consistency
21D Realized Volatility	43% sign consistency
20D Z-Score	43% sign consistency
63D Breakout Strength	43% sign consistency
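
One plausible definition of sign consistency is sketched below: the share of splits whose coefficient sign matches the sign of the mean coefficient. This definition is an assumption, chosen because it permits values below 50% as reported above; the source does not give its formula, and the coefficients are hypothetical.

```python
def sign_consistency(coefs):
    """Share of splits whose coefficient sign matches the sign of the mean coefficient."""
    reference = sum(coefs) / len(coefs)
    matches = sum(1 for c in coefs if c * reference > 0)
    return matches / len(coefs)

# Hypothetical coefficients for one feature across seven walk-forward splits.
coefs = [0.5, -0.1, -0.1, -0.1, -0.1, 0.2, 0.3]
consistency = sign_consistency(coefs)  # three of the seven splits share the mean's sign
```

With seven splits, the reported 57% and 43% would correspond to 4 of 7 and 3 of 7 splits respectively, if the statistic is computed per split.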

04C Structural Fragility

Predictive cognition becomes conditional, concentrated, and temporally fragile.

Feature and family contribution surfaces show explanatory dominance shifting through time. Predictive optimism gives way to instability propagation: structure exists, but explanatory stability deteriorates.

Feature contribution heatmap showing realized predictive influence through time.

Feature family contribution timeline showing family dominance and contribution share transitions.

contribution_instability: Explanatory concentration

Dominant family	Trend, 96% of periods
Family leadership transitions	8
Mean contribution concentration	0.437 HHI
Most volatile feature	63D Breakout Strength
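
The concentration figure above is a Herfindahl-Hirschman index (HHI). The sketch below computes it from absolute contribution shares. Whether the source computes it over the four families or over individual features is not stated, so the four-family input is an assumption and the values are hypothetical.

```python
def hhi(contributions):
    """Sum of squared absolute contribution shares; ranges from 1/n (even) to 1 (single source)."""
    total = sum(abs(c) for c in contributions)
    shares = [abs(c) / total for c in contributions]
    return sum(s * s for s in shares)

# Hypothetical contributions: Trend, Volatility, Mean-Reversion, Market Structure.
family_contrib = [0.6, 0.2, 0.1, 0.1]
concentration = hhi(family_contrib)
even_split = hhi([1.0, 1.0, 1.0, 1.0])  # lower bound for four sources
```

For four sources the index is bounded below by 0.25, so a value well above that indicates that one source dominates the explanation.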

transition exit state

Predictive structure exists, but explanatory stability deteriorates through time.

The investigation turns from predictive optimism toward instability propagation.

transition_05_strategy_formation_and_executional_translation

From Predictive Structure to Executable Portfolio Behaviour

Core thesis

Execution friction amplifies predictive instability. Compressed ranking conviction propagates into allocation churn, turnover acceleration, and realized degradation.

active execution state: translation

05A Ranking Translation

Predictions become executable only after they are forced into allocation structure.

Cross-sectional scores do not directly generate returns. They become evolving relative-conviction surfaces that must survive top-k selection, weight formation, rebalance timing, and look-ahead-safe execution.

lookahead_safe_execution_spec: Prediction enters the book only after time advances

signal timestamp	close of day t
entry timing	open t+1
applied weights	weights.shift(1)
portfolio rule	top-k relative conviction
cost model	5 bps one-way
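
The timing rule in this specification can be sketched without any library: form target weights from close-of-day scores, then lag them by one period before applying them. Equal weighting within the top-k set, the value of `k`, the scores, and the ticker `EQ` are assumptions for illustration; `shift_one` mirrors the intent of `weights.shift(1)` on a plain list.

```python
def top_k_weights(scores, k):
    """Equal-weight the k highest-scoring assets; all others get zero."""
    chosen = sorted(scores, key=scores.get, reverse=True)[:k]
    return {asset: (1.0 / k if asset in chosen else 0.0) for asset in scores}

def shift_one(weight_history):
    """Lag target weights by one period so day-t signals are only held from t+1."""
    flat = {asset: 0.0 for asset in weight_history[0]}
    return [flat] + weight_history[:-1]

# Hypothetical close-of-day scores for three consecutive dates.
daily_scores = [
    {"EQ": 0.03, "TLT": 0.01, "GLD": -0.02},
    {"EQ": 0.00, "TLT": 0.02, "GLD": 0.01},
    {"EQ": 0.02, "TLT": -0.01, "GLD": 0.03},
]
targets = [top_k_weights(s, k=2) for s in daily_scores]
applied = shift_one(targets)  # applied[t] uses only information available at t-1
```

The first applied row is flat because no prior signal exists, and the last target is never applied within the sample. Both are small, visible costs of keeping execution look-ahead safe.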

Portfolio allocation history showing evolving ETF weights, concentration regimes, and allocation transitions.

05B Portfolio Dynamics

Allocation behaviour exposes whether ranking conviction is stable enough to hold.

Concentrated exposure can strengthen realized spread, but rapid colour transitions in the allocation history chart reveal ranking churn. When predictive separation compresses, nearly indistinguishable candidates rotate through the portfolio.

Portfolio turnover diagnostic showing daily absolute weight changes and clustered rebalance pressure.

portfolio_structure_readout: Ranking instability becomes allocation instability

concentration	narrow ETF subsets dominate during locked regimes
ranking churn	rapid allocation transitions during compressed scores
defensive migration	TLT / GLD exposure increases in risk-off periods
translation risk	small score perturbations reorder weights

05C Execution Friction

Implementation friction amplifies instability already present inside the predictive system.

Turnover spikes are visible manifestations of collapsing conviction geometry. Transaction costs compound exactly when predictive legitimacy is weakening, converting model instability into executable degradation.

Equity and drawdown surface showing downstream realized drift, drawdown persistence, and recovery pressure.

Rolling Sharpe diagnostic showing sustained periods of execution-adjusted underperformance and recovery fragility.

execution_friction_chain: Implementation realism is part of validation

instability chain	compressed ranking -> unstable allocation -> turnover
turnover diagnostic	daily absolute weight-change surface
friction mechanism	cost applied to every unit of rebalance
downstream effect	drawdown persistence and weakened rolling Sharpe
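
The friction mechanism in this chain can be sketched as a cost proportional to traded weight. The sketch assumes turnover is the full sum of absolute weight changes (some conventions halve it) and uses the 5 bps one-way cost stated in the execution spec; the weights, the gross return, and the ticker `EQ` are hypothetical.

```python
COST_PER_UNIT = 0.0005  # 5 bps one-way, as stated in the execution spec

def turnover(prev_weights, new_weights):
    """Sum of absolute weight changes across assets for one rebalance."""
    return sum(abs(new_weights[a] - prev_weights[a]) for a in new_weights)

def net_return(gross_return, prev_weights, new_weights):
    """Gross period return minus the cost charged on every unit of rebalance."""
    return gross_return - COST_PER_UNIT * turnover(prev_weights, new_weights)

prev = {"TLT": 0.5, "GLD": 0.5, "EQ": 0.0}
new = {"TLT": 0.0, "GLD": 0.5, "EQ": 0.5}
traded = turnover(prev, new)         # sell 0.5 of TLT and buy 0.5 of EQ
net = net_return(0.0010, prev, new)  # a 10 bps gross day loses half of it to cost
```

A single full rotation of one position costs 5 bps under these assumptions, so repeated rotations between near-identical candidates erode return without adding information.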

transition exit state

Predictive structure does not translate cleanly into executable portfolio stability.

Implementation friction amplifies instability already present inside the predictive system.

transition_06_walk_forward_validation_chronology

From Executable Behaviour to Chronological Validation Credibility

Core thesis

Validation functions as robustness interrogation under unseen temporal structure, not as retrospective performance confirmation.

active validation state: chronology

06A Chronology Integrity

Validation begins by making time impossible to hide.

Walk-forward structure treats chronology as an infrastructural constraint. Each split trains on prior information before exposure to an unseen future regime, preserving the temporal boundary that retrospective summaries often erase.

Walk-forward train and test timeline showing chronological validation windows and out-of-sample split structure.

chronology_integrity_spec: Forward-only validation boundary

validation type	rolling
train window	48 months
test window	12 months
step size	12 months
temporal rule	test window follows train window with no overlap
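
The window arithmetic in this specification can be sketched as a generator over month offsets. The 132-month history is a hypothetical input chosen for illustration; the source does not state its exact split boundaries, and the function name is an assumption.

```python
def rolling_splits(n_months, train=48, test=12, step=12):
    """Yield (train_start, train_end, test_start, test_end) month offsets, end-exclusive."""
    start = 0
    while start + train + test <= n_months:
        yield (start, start + train, start + train, start + train + test)
        start += step

# Hypothetical 132-month history; only complete test windows are produced.
splits = list(rolling_splits(n_months=132))
```

Each test window starts exactly where its train window ends, so no observation appears on both sides of a split. A production version would also need to keep the 21-day forward label from reaching across that boundary, which this sketch does not model.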

06B Validation Dispersion

Forward validation exposes survival as uneven, not universal.

Split-level evidence preserves both positive survival and material deterioration. Dispersion becomes a research finding: validation credibility depends on retaining regimes that weaken the aggregate story.

Per-split out-of-sample Sharpe distribution showing validation dispersion and negative regimes.

Train versus test Sharpe comparison showing in-sample and out-of-sample divergence across validation splits.

validation_dispersion_readout: Non-random structure remains regime-sensitive

validation windows	7 chronological splits
positive OOS splits	5 of 7
mean OOS Sharpe	0.64
failure mode	high split variance and regime-conditioned deterioration
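
For reference, a per-split Sharpe ratio and its cross-split summary can be sketched as below. The sketch assumes daily returns, a zero risk-free rate, and 252 periods per year, none of which the source states; all numbers are toy values and are not the study's results.

```python
from statistics import mean, stdev

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation of daily returns, scaled by sqrt(periods)."""
    return mean(daily_returns) / stdev(daily_returns) * periods_per_year ** 0.5

# Toy daily returns for one hypothetical split.
toy_returns = [0.010, -0.005, 0.004, 0.002]
toy_sharpe = annualized_sharpe(toy_returns)

# Toy per-split values: report the dispersion alongside the mean, not the mean alone.
toy_split_sharpes = [1.0, -0.5, 0.5]
summary = {
    "mean": mean(toy_split_sharpes),
    "stdev": stdev(toy_split_sharpes),
    "positive": sum(s > 0 for s in toy_split_sharpes),
}
```

Keeping the per-split list and its standard deviation next to the mean is what prevents a failed window from disappearing into an aggregate.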

06C Institutional Credibility

Credibility comes from preserving failure continuity.

The validation layer refuses to compress failed windows into a single retrospective number. Stitched OOS behaviour and split trajectories retain drawdown persistence, recovery asymmetry, and unstable generalization as visible evidence.

Stitched walk-forward out-of-sample segments showing genuine unseen-period behaviour in chronological order.

Split equity curves showing divergent validation trajectories and retained failed regimes.

failure_continuity_record: Deterioration remains part of the evidence

2018 regime	negative OOS Sharpe and prolonged drawdown persistence
generalization risk	relationships learned in one environment can be invalidated in later ones
research posture	failed regimes remain preserved rather than averaged away
credibility signal	chronology realism matters more than retrospective smoothness

transition exit state

Predictive structure survives in some regimes, but validation dispersion remains structurally significant.

Institutional credibility emerges from preserving instability and chronology realism, not from hiding deterioration.

transition_07_structural_interpretability

Failure is Structurally Interpretable Rather Than Random

Core thesis

The failures documented in T06 were already structurally implied by the instability conditions established in T04 and the amplification mechanics established in T05. Each failure event possesses a coherent structural address. The investigation closes knowing not only where deterioration occurred but why it was interpretable.

active synthesis surface: configuration

01	structural fragility specification	fragility_dimension × indicator_state × dependency_type × downstream_consequence
02	causal address matrix	failure_taxonomy → failure_event × T04_conditions × T05_mechanics × T06_outcome
03	evidence lineage & interpretability closure	interpretive_claim × source_evidence × observed_rate × structural_prediction

07A Structural Configuration

A structurally fragile configuration is a specific multi-dimensional state, not a general description of instability.

T04 and T05 established that instability exists and that execution amplifies it. T07 names the specific configuration — the simultaneous state of multiple measurable dimensions — whose combination makes deterioration architecturally foreseeable. Each dimension carries a structural dependency role that explains why its contribution to failure is architecturally necessary rather than contingent.

structural_fragility_specification: Configuration dimensions, their instability indicators, and their structural dependency roles

fragility_dimension	indicator_state	dependency_type	downstream_consequence
coefficient semantics	sign consistency < 50%	necessary_precursor	ranking separation degrades; score IQR compresses
feature contribution	single-family dominance > 90%	amplification_context	allocation concentration rises; diversification collapses
predictive regime	IC deterioration active	regime_trigger	score separation narrows; ranking becomes fragile
ranking discrimination	score IQR compressed	amplification_trigger	churn threshold lowered; allocation becomes indiscriminate
execution exposure	top-k allocation subject to churn	cost_exposure_multiplier	friction compresses returns; noise rebalancing penalised

07B Causal Address Matrix

Each observed failure possesses a coherent structural address across the three independently established evidence layers.

Failure first requires classification before it can be structurally addressed. The taxonomy names the structural categories; the causal address matrix demonstrates that each observed failure event maps to a specific combination of T04 conditions and T05 mechanics rather than appearing at random. Stable windows confirm the inverse: all four windows in the stable configuration passed, so the same structural conditions that flag failure also identify stability.

failure_taxonomy: Structural categories of failure (classification key for the address matrix)

failure_type	T04_conditions	T05_mechanics	T06_outcome
regime-transition failure	coefficient semantic inversion	ranking separation compressed → allocation churn	negative OOS Sharpe, prolonged drawdown
dispersion-collapse failure	IC compressed across universe	uniform score compression → low-discrimination allocation	flat split outcomes, positive but near-zero
predictive deterioration	coefficient drift in training window	high-conviction allocation to learned-regime assets	strong train Sharpe, weak test Sharpe divergence

causal_address_matrix: Each failure event mapped to its structural address across T04, T05, and T06

failure_event	T04_conditions	T05_mechanics	T06_outcome
split 4 (2018 regime)	vol / z-score sign inverted	turnover elevated, cost drag amplified	negative OOS Sharpe, prolonged drawdown
split 6 deterioration	contribution concentration active	allocation churn near regime boundaries	below-median OOS Sharpe, unstable recovery
near-failure windows	IC temporary inversion	ranking compression active	positive but compressed OOS Sharpe
stable windows	sign consistency maintained	turnover within normal range	positive OOS Sharpe, clean recovery

07C Evidence Lineage & Interpretability Closure

Failure clusters in structurally fragile configurations. The distribution is inconsistent with random occurrence.

The evidence lineage makes explicit that T07's interpretive acts are grounded in established findings — each component attributed to its source transition. The coherence test closes the argument: if failure were random, it would be proportionally distributed. The observed clustering is consistent with structural prediction rather than statistical coincidence.

evidence_lineage_system: Research provenance (each interpretive claim attributed to its source evidence)

#	interpretive_claim	source_evidence	source_transition
01	structural fragility configuration named	coefficient sign analysis	T04
02	regime-specific learning confirmed	IC regime behaviour	T04
03	execution amplification quantified	turnover and drawdown surfaces	T05
04	failure geometry documented	split Sharpe and equity outcomes	T06
05	failure taxonomy derived	synthesis of T04 + T05 + T06	T07
06	causal address matrix constructed	synthesis of T04 + T05 + T06	T07
07	interpretability claim established	synthesis of T04 + T05 + T06	T07

interpretability_confirmation: Coherence test (failure rate by structural configuration versus random expectation)

configuration	base_rate	observed_rate	structural_prediction
fragile configuration active	28.6% (2 of 7 base rate)	2 of 3 fragile windows failed	consistent with structural prediction
stable configuration active	71.4% (5 of 7 base rate)	4 of 4 stable windows passed	consistent with structural prediction
failure type matched taxonomy	N/A	failure type matches taxonomy classification	regime-transition and deterioration types confirmed
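
The comparison with random expectation can be made explicit with a small counting argument. The sketch below assumes that, under the random alternative, the two failed windows would be scattered uniformly over the seven windows; the counts come from the coherence test above, and the framing as a hypergeometric check is an added assumption rather than the source's stated method.

```python
from math import comb

windows, failures, fragile = 7, 2, 3  # counts from the coherence test above

# Probability that both failed windows fall inside the three fragile windows
# if the two failures were placed uniformly at random among the seven windows.
p_both_in_fragile = comb(fragile, failures) / comb(windows, failures)
```

Under that assumption, the observed clustering would arise by chance in roughly one case in seven. With only seven windows, the pattern is therefore best read as supporting the structural account rather than as a formal rejection of chance.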

transition exit state

The failures documented in T06 were not isolated outcomes. They were already structurally implied by the instability conditions documented in T04 and the amplification mechanics documented in T05.

Failure possesses a structural address. The investigation demonstrates not only where deterioration occurred, but why deterioration was interpretable rather than random.

transition_08_institutional_conclusion

Visible Institutional Research Cognition

Final assessment

T01–T07 demonstrate not a strategy but a systematic research capacity. The investigation is complete. The cognition is visible.

surface_01

Research Cognition Architecture

Phase I: Foundation (T01 + T02)

establishes the research substrate

hypothesis system + temporal universe + feature families

Phase II: Diagnostics (T03 + T04)

characterises signal and its fragility

signal geometry + IC consistency + instability map

Phase III: Translation (T05 + T06)

tests the signal in execution reality

execution friction + amplification mechanics + validation record

Phase IV: Interpretation (T07 + T08)

derives understanding from outcomes

structural interpretability + visible research cognition

T01	Phase I: Foundation
01	research_thesis	Research Hypothesis Construction
Establishes the investigation's epistemic contract: prediction is regime-conditional, not universal; failure is pre-declared as admissible evidence, not hidden.
02	data_infrastructure	Temporal Substrate & Feature Engineering
Establishes the evidentiary substrate: a 9-asset ETF universe with strict temporal alignment and 13 features encoding four competing market hypotheses.
T03	Phase II: Diagnostics
03	signal_construction	Cross-Sectional Signal Geometry
Establishes signal legitimacy: 65.6% positive IC months and mean IC 0.1450 confirm ranking structure before any execution or validation claim is made.
04	ml_behaviour	Predictive Diagnostics & Instability
Establishes the instability map: coefficient sign reversals, regime-specific learning, and feature concentration document the structural conditions that will become interpretable in T07.
T05	Phase III: Translation
05	portfolio_translation	Execution Architecture & Friction
Establishes the amplification mechanics: the 5 bps cost model and turnover surface quantify how execution reality transforms signal instability into return compression.
06	walk_forward_validation	Chronological Validation Discipline
Establishes the validation record: 7 chronological splits with mean OOS Sharpe 0.64; failed regimes retained as visible evidence rather than removed from account.
T07	Phase IV: Interpretation
07	failure_interpretation	Structural Interpretability
Establishes the interpretive closure: each failure event in T06 maps to specific T04 conditions and T05 mechanics; the distribution is inconsistent with random occurrence.
08	institutional_conclusion	Visible Research Cognition System
The complete investigation made visible: not a strategy record, but a documented capacity for systematic quantitative research across all four investigation phases.

surface_02

Research Capability Matrix

research_capability_matrix: Evidence-backed capability summary (what was visibly exercised and where)

capability	evidence	source
research framing	hypothesis construction; failure pre-declared as expected evidence	T01
data engineering	9-asset ETF universe; temporal alignment; normalisation discipline	T02
feature engineering	13 features across 4 hypothesis families; regime-conditional IC	T02
signal construction	cross-sectional ranking; 65.6% positive IC months; mean IC 0.1450	T03
ml diagnostics	coefficient evolution; regime-specific learning; instability typed	T04
portfolio construction	top-k allocation; 5 bps one-way cost model; turnover quantified	T05
validation discipline	walk-forward; 7 chronological splits; mean OOS Sharpe 0.64	T06
failure interpretation	failure taxonomy; causal address matrix; interpretability confirmed	T07
research infrastructure	artefact persistence; provenance system; experiment versioning	Zeto
publication engineering	canonical research dossier; narrative pacing; evidence hierarchy	T01–T08

surface_03

Investigation Infrastructure Record

investigation_infrastructure: Zeto platform subsystems and what each contributed to this investigation

subsystem	function	contribution	scope
experiment_engine	configuration-driven hypothesis testing with full parameter provenance	6 hypothesis configurations tested; every parameter logged	T01, T02
artefact_persistence	filesystem-first output registry with versioning and lineage tracking	IC surfaces, coefficient tables, split equity curves all traceable	T03–T06
validation_framework	walk-forward chronological engine with strict temporal boundary enforcement	7 chronological splits; boundaries enforced to prevent look-ahead contamination	T06
diagnostics_layer	IC surfaces, coefficient analysis, portfolio construction, turnover metrics	full evidence stack from signal construction to validation outcome	T03–T06
publication_layer	report drafting, frontend rendering, evidence hierarchy assembly	research dossier assembled and rendered as canonical showcase	T01–T08
orchestration_core	LLM-assisted workflow navigation, schema generation, experiment configuration	AI augments research execution; human oversight remains foundational	Zeto
provenance_system	experiment versioning, input hashing, output lineage, audit trail	every diagnostic surface traceable to its generating experiment	all transitions

investigation exit state

Not merely a strategy builder. A quantitative research system designer.

T01–T08 constitute a complete, visible record of systematic quantitative research — from hypothesis construction to infrastructure design to validation discipline to institutional communication.