transition_01_research_thesis

From Institutional Research Problem to Cross-Sectional ML Investigation

Core research question

Can a shared cross-sectional ML framework extract persistent relative-return structure across heterogeneous institutional ETF regimes while remaining interpretable, validation-aware, and executable under realistic trading frictions?

Core thesis

The investigation is not fundamentally attempting to demonstrate extraordinary returns. It is attempting to determine whether quantitative research itself can remain interpretable, chronologically disciplined, diagnostically transparent, and operationally coherent under non-stationary market structure.

Research state

Cross-sectional prediction is investigated under intentionally heterogeneous macro structure. The ETF universe spans equities, international exposure, commodities, rates, and sector concentration because relative-ranking systems become difficult to interpret when macro sensitivity collapses into a single dominant factor. The compact universe intentionally introduces tension between interpretability and cross-sectional richness. Although the nine-ETF panel limits breadth relative to institutional equity universes, it preserves enough heterogeneity to produce meaningful ranking structure while keeping chronology, attribution, and instability propagation interpretable at the asset level.

Validation doctrine

Chronology is treated as a governing constraint rather than a retrospective validation appendix. All later signal interpretation therefore inherits the train-test separation structure established in walk_forward_timeline. Predictive structure is evaluated only after chronology integrity has already constrained the investigation.

Expected pressure

The central tension emerges immediately. Predictive structure may appear statistically meaningful while remaining too unstable to survive executable portfolio translation. Temporary ranking separation, compressed dispersion, coefficient reversals, and transaction-cost amplification all become possible failure states long before realized performance deteriorates visibly.

Model posture

Ridge regression is selected deliberately. The objective is not architectural sophistication. The objective is coefficient observability, feature attribution continuity, and regime-conditioned interpretability. The model therefore functions less as a black-box optimization engine and more as an observable research surface through which instability can propagate visibly.
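A minimal sketch of why a ridge coefficient stays directly observable, assuming a single standardized feature and no intercept (a simplification; the source does not specify the exact model form). The function name and the toy numbers are hypothetical. The penalty `alpha` shrinks the coefficient's magnitude but leaves its sign to the data, which is what keeps reversals visible.

```python
def ridge_slope(x, y, alpha):
    """Closed-form ridge slope for one feature, no intercept:
    beta = sum(x*y) / (sum(x*x) + alpha)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + alpha)


# Hypothetical standardized feature values and forward returns in two regimes.
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
y_trend = [-0.02, -0.01, 0.0, 0.01, 0.02]     # feature and return move together
y_reversal = [0.02, 0.01, 0.0, -0.01, -0.02]  # same feature, relationship flipped

for alpha in (0.0, 1.0, 10.0):
    # Larger alpha pulls both slopes toward zero; the sign difference remains.
    print(alpha, ridge_slope(x, y_trend, alpha), ridge_slope(x, y_reversal, alpha))
```

In this toy case the slope is positive in the first regime and negative in the second at every penalty level, so regularization alone does not hide a change in feature meaning.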

Evidence continuity

Institutional credibility emerges from continuity between hypothesis construction, feature engineering, ranking behaviour, validation chronology, execution realism, and eventual degradation. Intermediate artefacts remain preserved because instability itself constitutes research evidence. Coefficient reversals, failed validation splits, and deteriorating ranking geometry are therefore retained explicitly rather than compressed into aggregate summaries.

Research skepticism

Asset-return prediction is approached as a structurally low signal-to-noise problem in which temporary predictive persistence can emerge from unstable macro relationships. Instability, degradation, and reversal are therefore treated as expected states of the investigation rather than exceptional outcomes. This framing becomes essential later when turnover acceleration and validation deterioration emerge coherently from upstream predictive compression.

Transition exit state

The methodological system itself consequently becomes part of the research object. Validation chronology, feature lineage, ranking persistence, instability propagation, and diagnostic continuity remain continuously observable because the investigation is fundamentally concerned with how quantitative reasoning evolves under evidence pressure.

transition_02_data_and_feature_engineering

From Raw Market Structure to Temporally Aligned Hypothesis Construction

Core thesis

Feature engineering functions as formalized market-theory construction. Each feature family encodes assumptions that must later survive unseen chronological regimes.

active evidence state: substrate

02A Feature Substrate Construction

Feature systems encode market hypotheses before prediction begins.

Raw market observations become a temporally aligned feature substrate through pooled panel construction and leakage-safe normalization. Momentum persistence, volatility compression, mean-reversion pressure, and beta sensitivity function as competing explanatory systems.
primary substrate: feature_ic_heatmap
Feature IC heatmap showing cross-sectional information coefficient by feature and walk-forward split.
feature_family_taxonomy: Hypothesis encoding
Trend | Momentum, trend strength, breakout, risk-adjusted momentum | Persistence
Volatility | Realized volatility, vol compression | Risk-regime texture
Mean-Reversion | Z-score, drawdown distance | Reversal pressure
Market Structure | Rolling market beta | Macro sensitivity

02B Chronology Integrity & Comparability

Comparability is maintained as chronology infrastructure.

Rolling z-score normalization preserves temporal comparability without importing future distributional information. Warmup erosion and forward label loss remain visible instead of being hidden as implementation detail.
chronology_constraints: Alignment discipline
Panel | 9 assets, 3020 aligned trading days
Warmup erosion | 252 rows removed per asset
Forward horizon | 21 trading days
Effective range | 2014-01-02 to 2024-11-29
Normalization | rolling, past-only parameters
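A minimal sketch of past-only rolling normalization, assuming the window covers the observations strictly before each date (whether the source's window also includes the current observation is not stated). Function names are hypothetical. Warmup rows are returned as `None` rather than dropped, so the erosion stays visible.

```python
from statistics import mean, pstdev


def rolling_zscore_past_only(values, window):
    """z-score of values[t] against the `window` observations strictly before t.

    Returns None while fewer than `window` past observations exist (warmup)
    and when the past window has zero dispersion.
    """
    out = []
    for t, v in enumerate(values):
        if t < window:
            out.append(None)
            continue
        hist = values[t - window:t]  # past only: values[t] itself is excluded
        sd = pstdev(hist)
        out.append((v - mean(hist)) / sd if sd > 0 else None)
    return out


def rows_after_warmup(n_assets, n_days, warmup):
    """Panel rows left once `warmup` rows are removed per asset."""
    return n_assets * (n_days - warmup)


print(rolling_zscore_past_only([1.0, 2.0, 3.0, 4.0], window=2))
print(rows_after_warmup(9, 3020, 252))
```

With the figures in the table, 9 × (3020 − 252) = 24,912 rows remain after warmup, which equals the model-fitting observation count reported in 03A.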

02C Regime-Conditioned Behaviour

Feature usefulness becomes conditional evidence, not stable legitimacy.

Several feature structures strengthen during trend-dominated environments while degrading or reversing during compressed-dispersion regimes. The feature system therefore exits this transition as a structured hypothesis space, not an established predictive claim.
family structure: feature_family_ic
Feature family IC by walk-forward split.
dependence structure: feature_correlation_heatmap
Feature correlation heatmap exposing dependence structure.
regime activation: ml_feature_regimes
Feature regime z-score surface showing activated and stressed feature states over time.

transition_03_cross_sectional_signal_construction

From Hypothesis Space to Cross-Sectional Conviction Geometry

Core thesis

Predictive legitimacy emerges through persistent ranking geometry, monotonic ordering, and observable cross-sectional separation rather than isolated prediction accuracy.

active signal state: scores

03A Cross-Sectional Score Emergence

The feature substrate begins producing relative ordering structure.

Predictive structure is evaluated through ranking behaviour rather than directional forecasting accuracy alone. Positive cross-sectional IC indicates that the system is repeatedly identifying relative winners and losers, but this remains a pre-execution signal diagnostic.
ranking signal: cross_sectional_ic
Daily cross-sectional IC surface showing ranking signal consistency and inversion regimes.
score formation: ml_prediction_distribution
Raw model prediction distribution showing score formation around the ranking target.
signal_emergence: Ranking signal diagnostics
Positive-IC ranking periods | 63.1%
Mean monthly cross-sectional IC | 0.1450
Positive cross-sectional IC months | 65.6%
Model-fitting observations | 24,912
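A minimal sketch of a cross-sectional information coefficient for one date, assuming the IC is a Spearman rank correlation between scores and subsequent returns (a common convention; the source does not name the correlation type). All function names and sample values are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def cross_sectional_ic(scores, forward_returns):
    """Rank correlation between model scores and realized forward returns."""
    return pearson(ranks(scores), ranks(forward_returns))


def positive_share(ics):
    """Fraction of periods whose IC is strictly positive."""
    return sum(1 for ic in ics if ic > 0) / len(ics)


# Hypothetical scores and forward returns for four assets on one date.
print(cross_sectional_ic([0.03, 0.01, -0.02, 0.00], [0.02, -0.01, 0.01, 0.00]))
```

Applying `positive_share` to a series of per-period ICs gives the kind of "positive-IC periods" percentage shown in the table.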

03B Ranking Geometry

Signal legitimacy depends on separation, persistence, and realized spread.

Ranking geometry becomes the central diagnostic surface because it combines score dispersion, top-bottom discrimination, realized economic separation, and rank persistence through time.
primary signal geometry: ranking_geometry
Ranking geometry diagnostic showing score dispersion, ranking separation, realized spread, and rank persistence.
ranking_geometry_summary: Separation and persistence
Mean score IQR | 0.0346
Min score IQR | 0.0120
Mean top-bottom score spread | 0.0715
Mean realized spread | 0.891% pre-cost
Positive realized spread | 69%
Mean monthly rank autocorrelation | 0.785
Positive rank persistence | 100%
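A minimal sketch of three of the geometry measures named above for a single cross-section: score IQR, top-bottom score spread, and rank autocorrelation between consecutive periods. The percentile interpolation, the group size `k`, and the no-ties Spearman formula are assumptions; the source does not define them. Function names are hypothetical.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an ascending list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def score_iqr(scores):
    """Interquartile range of the cross-sectional scores."""
    s = sorted(scores)
    return percentile(s, 0.75) - percentile(s, 0.25)


def top_bottom_spread(scores, k):
    """Mean of the k highest scores minus mean of the k lowest."""
    s = sorted(scores)
    return sum(s[-k:]) / k - sum(s[:k]) / k


def simple_ranks(values):
    """1-based ranks, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def rank_autocorrelation(prev_scores, curr_scores):
    """Spearman correlation between two periods' rankings (no-ties formula)."""
    n = len(prev_scores)
    d2 = sum((a - b) ** 2
             for a, b in zip(simple_ranks(prev_scores), simple_ranks(curr_scores)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical scores for a nine-asset cross-section.
scores = [float(i) for i in range(9)]
print(score_iqr(scores), top_bottom_spread(scores, 3), rank_autocorrelation(scores, scores))
```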

03C Conditional Predictive Persistence

Prediction magnitude carries information, but conviction remains unstable.

Monotonic ordering confirms that prediction strength is not arbitrary score variation. At the same time, compressed dispersion and unstable consistency leave predictive legitimacy and predictive fragility inside the same ranking system.
outcome monotonicity: prediction_strength
Prediction strength diagnostic showing monotonic realized return ordering by score group.
rank persistence: ml_rolling_da
Rolling directional accuracy and IC consistency surface exposing rank persistence through time.
prediction_group_ordering: Pre-cost discrimination
Top group | 1.301%
Mid group | 1.050%
Bottom group | 0.231%
Top-bottom spread | 1.070% pre-cost
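A short check of the ordering reported in the table, followed by a hypothetical grouping helper. The group returns are taken from the table; the helper assumes equal-sized groups sorted by score, which the source does not specify.

```python
# Realized group returns in percent, pre-cost, as reported in the table.
group_returns = {"top": 1.301, "mid": 1.050, "bottom": 0.231}


def is_strictly_decreasing(values):
    """True when each value is larger than the one after it."""
    return all(a > b for a, b in zip(values, values[1:]))


ordered = [group_returns[g] for g in ("top", "mid", "bottom")]
spread = round(ordered[0] - ordered[-1], 3)
print(is_strictly_decreasing(ordered), spread)


def group_means(scores, realized, n_groups=3):
    """Sort assets by score (highest first), cut into equal groups, and
    average realized returns per group. Leftover assets are ignored when
    the count is not divisible by n_groups."""
    paired = sorted(zip(scores, realized), key=lambda p: p[0], reverse=True)
    size = len(paired) // n_groups
    return [sum(r for _, r in paired[g * size:(g + 1) * size]) / size
            for g in range(n_groups)]
```

The top-minus-bottom difference of the reported group returns reproduces the 1.070% spread in the table.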
transition exit state

Ranking structure exists; monotonicity remains observable.

Dispersion compresses; instability emerges through time.

transition_04_predictive_ml_behaviour

From Ranking Legitimacy to Instability Propagation

Core thesis

Positive predictive structure does not imply stable feature semantics or persistent ranking conviction. Predictive cognition remains conditionally stable and temporally fragile.

active predictive state: persistence

04A Predictive Persistence

Positive predictive structure does not behave uniformly through time.

The IC regime surface exposes alternating persistence, deterioration, recovery, and temporary inversion. The model can identify relative structure while still failing to maintain stable predictive behaviour across changing regimes.
primary predictive regime: ml_ic_regime
Rolling IC regime diagnostic exposing predictive persistence, deterioration, recovery, and inversion.
conditional_predictive_states: Persistence is not binary
Strengthening | specific macro structures
Weakening | compressed-dispersion environments
Inversion | several stressed transitions

04B Coefficient Drift

Regularization constrains magnitude, but not semantic instability.

Repeated sign reversals indicate regime-specific learning rather than universal predictive laws. Coefficient behaviour becomes an interpretability surface where feature meaning changes under evidence pressure.
primary coefficient instability: ml_coefficient_sign_heatmap
Coefficient sign heatmap showing feature sign reversals across walk-forward splits.
coefficient dispersion: ml_coefficient_stability
Coefficient stability diagnostic showing sign consistency and coefficient dispersion.
coefficient_reversal_surface: Least stable semantics
20D Momentum | 57% sign consistency
60D Momentum | 57% sign consistency
21D Realized Volatility | 43% sign consistency
20D Z-Score | 43% sign consistency
63D Breakout Strength | 43% sign consistency
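The source does not define sign consistency. If it is measured over the seven walk-forward splits, a value of 43% (3 of 7) cannot come from a majority-sign definition, which never falls below 4 of 7. One definition that can, sketched here as an assumption, is the share of splits whose coefficient sign agrees with the sign of the mean coefficient. The function name and coefficient values are hypothetical.

```python
def sign_consistency(coefs):
    """Share of splits whose coefficient sign matches the sign of the mean
    coefficient across splits (an assumed definition, not the source's)."""
    mean_c = sum(coefs) / len(coefs)
    ref = 1.0 if mean_c >= 0 else -1.0
    return sum(1 for c in coefs if c * ref > 0) / len(coefs)


# Hypothetical per-split coefficients for one feature across seven splits.
# Three large positive values dominate the mean; four small negatives disagree.
dominated = [0.30, 0.25, 0.20, -0.05, -0.04, -0.03, -0.02]
print(round(sign_consistency(dominated), 2))

# Four of seven splits agree with the mean sign.
balanced = [0.10, 0.10, 0.10, 0.10, -0.05, -0.05, -0.05]
print(round(sign_consistency(balanced), 2))
```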

04C Structural Fragility

Predictive cognition becomes conditional, concentrated, and temporally fragile.

Feature and family contribution surfaces show explanatory dominance shifting through time. Predictive optimism gives way to instability propagation: structure exists, but explanatory stability deteriorates.
feature contribution: feature_contribution_heatmap
Feature contribution heatmap showing realized predictive influence through time.
family contribution: family_contribution_timeline
Feature family contribution timeline showing family dominance and contribution share transitions.
contribution_instability: Explanatory concentration
Dominant family | Trend, 96% of periods
Family leadership transitions | 8
Mean contribution concentration | 0.437 HHI
Most volatile feature | 63D Breakout Strength
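A minimal sketch of a Herfindahl-Hirschman index (HHI) over contribution shares, assuming shares are absolute contributions normalized to sum to one. Whether the reported 0.437 is computed over families or individual features is not stated in the source; the sample values below are hypothetical.

```python
def hhi(contributions):
    """Sum of squared shares, where shares are |contribution| / total.

    Ranges from 1/n (equal shares across n items) to 1.0 (one item holds all).
    """
    total = sum(abs(c) for c in contributions)
    shares = [abs(c) / total for c in contributions]
    return sum(s * s for s in shares)


print(hhi([1, 1, 1, 1]))          # four equal families
print(hhi([0.6, 0.2, 0.1, 0.1]))  # one family dominating
```

With four families, the index is bounded below by 0.25, so a value above that floor indicates contribution concentrated in fewer families.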
transition exit state

Predictive structure exists, but explanatory stability deteriorates through time.

The investigation turns from predictive optimism toward instability propagation.

transition_05_strategy_formation_and_executional_translation

From Predictive Structure to Executable Portfolio Behaviour

Core thesis

Execution friction amplifies predictive instability. Compressed ranking conviction propagates into allocation churn, turnover acceleration, and realized degradation.

active execution state: translation

05A Ranking Translation

Predictions become executable only after they are forced into allocation structure.

Cross-sectional scores do not directly generate returns. They become evolving relative-conviction surfaces that must survive top-k selection, weight formation, rebalance timing, and look-ahead-safe execution.
lookahead_safe_execution_spec: Prediction enters the book only after time advances
signal timestamp | close of day t
entry timing | open t+1
applied weights | weights.shift(1)
portfolio rule | top-k relative conviction
cost model | 5 bps one-way
portfolio weight dynamics: allocation_history
Portfolio allocation history showing evolving ETF weights, concentration regimes, and allocation transitions.
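A minimal sketch of the look-ahead-safe rule in the spec: weights formed from day-t scores are only held from day t+1. It mirrors `weights.shift(1)` with plain lists. Equal weighting within the top k is an assumption (the source says only "weight formation"), and the asset labels and scores are hypothetical.

```python
def top_k_weights(scores, k):
    """Equal weight on the k highest-scoring assets, zero on the rest."""
    chosen = set(sorted(scores, key=scores.get, reverse=True)[:k])
    return {asset: (1.0 / k if asset in chosen else 0.0) for asset in scores}


def lag_one_period(weights_by_day):
    """List analogue of weights.shift(1): day t holds the weights formed on
    day t-1; the first day holds nothing because no prior signal exists."""
    flat = {asset: 0.0 for asset in weights_by_day[0]}
    return [flat] + weights_by_day[:-1]


# Hypothetical close-of-day scores for three assets on two consecutive days.
daily_scores = [
    {"A": 0.03, "B": 0.01, "C": -0.02},
    {"A": -0.01, "B": 0.02, "C": 0.04},
]
targets = [top_k_weights(s, k=2) for s in daily_scores]
held = lag_one_period(targets)
print(held)
```

The final day's target weights are dropped by the lag, just as the last row of a shifted frame never reaches the book within the sample.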

05B Portfolio Dynamics

Allocation behaviour exposes whether ranking conviction is stable enough to hold.

Concentrated exposure can strengthen realized spread, but rapid colour transitions in the allocation history chart reveal ranking churn. When predictive separation compresses, nearly indistinguishable candidates rotate through the portfolio.
turnover analysis: portfolio_turnover
Portfolio turnover diagnostic showing daily absolute weight changes and clustered rebalance pressure.
portfolio_structure_readout: Ranking instability becomes allocation instability
concentration | narrow ETF subsets dominate during locked regimes
ranking churn | rapid allocation transitions during compressed scores
defensive migration | TLT / GLD exposure increases in risk-off periods
translation risk | small score perturbations reorder weights

05C Execution Friction

Implementation friction amplifies instability already present inside the predictive system.

Turnover spikes are visible manifestations of collapsing conviction geometry. Transaction costs compound exactly when predictive legitimacy is weakening, converting model instability into executable degradation.
downstream consequence layer: equity_and_drawdown
Equity and drawdown surface showing downstream realized drift, drawdown persistence, and recovery pressure.
rolling recovery pressure: rolling_sharpe
Rolling Sharpe diagnostic showing sustained periods of execution-adjusted underperformance and recovery fragility.
execution_friction_chain: Implementation realism is part of validation
instability chain | compressed ranking -> unstable allocation -> turnover
turnover diagnostic | daily absolute weight-change surface
friction mechanism | cost applied to every unit of rebalance
downstream effect | drawdown persistence and weakened rolling Sharpe
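A minimal sketch of the friction mechanism in the chain above: turnover as the sum of absolute weight changes, charged at the 5 bps one-way cost from the execution spec. Function names, asset labels, and the gross return are hypothetical.

```python
COST_PER_UNIT = 5 / 10_000  # 5 bps one-way, from the execution spec


def turnover(prev_weights, new_weights):
    """Sum of absolute weight changes across all assets in either book."""
    assets = set(prev_weights) | set(new_weights)
    return sum(abs(new_weights.get(a, 0.0) - prev_weights.get(a, 0.0))
               for a in assets)


def net_return(gross_return, prev_weights, new_weights, cost=COST_PER_UNIT):
    """Gross period return minus cost charged on every unit of rebalance."""
    return gross_return - cost * turnover(prev_weights, new_weights)


# Hypothetical rebalance: one of two equal positions is swapped for another asset.
before = {"A": 0.5, "B": 0.5}
after = {"A": 0.5, "C": 0.5}
print(turnover(before, after))
print(net_return(0.001, before, after))
```

Because cost scales with turnover, a swap between two nearly indistinguishable candidates pays the same friction as a high-conviction change.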
transition exit state

Predictive structure does not translate cleanly into executable portfolio stability.

Implementation friction amplifies instability already present inside the predictive system.

transition_06_walk_forward_validation_chronology

From Executable Behaviour to Chronological Validation Credibility

Core thesis

Validation functions as robustness interrogation under unseen temporal structure, not as retrospective performance confirmation.

active validation state: chronology

06A Chronology Integrity

Validation begins by making time impossible to hide.

Walk-forward structure treats chronology as an infrastructural constraint. Each split trains on prior information before exposure to an unseen future regime, preserving the temporal boundary that retrospective summaries often erase.
primary chronology anchor: walk_forward_timeline
Walk-forward train and test timeline showing chronological validation windows and out-of-sample split structure.
chronology_integrity_spec: Forward-only validation boundary
validation type | rolling
train window | 48 months
test window | 12 months
step size | 12 months
temporal rule | test window follows train window with no overlap
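A minimal sketch of a rolling walk-forward split generator using the window sizes in the spec. Windows are expressed as month offsets with exclusive ends. The total of 132 months is an assumed length chosen so the sketch yields seven splits; the source does not state the month count, and the function name is hypothetical.

```python
def walk_forward_splits(n_months, train=48, test=12, step=12):
    """Return (train_start, train_end, test_start, test_end) month offsets.

    Ends are exclusive. Each test window starts where its train window ends,
    so no test month is ever part of its own training data.
    """
    splits = []
    start = 0
    while start + train + test <= n_months:
        train_end = start + train
        splits.append((start, train_end, train_end, train_end + test))
        start += step
    return splits


splits = walk_forward_splits(132)  # 132 months is an assumption
for s in splits:
    print(s)
```

Because the step equals the test length, consecutive test windows tile the out-of-sample period without gaps or overlap, which is what allows the segments to be stitched chronologically.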

06B Validation Dispersion

Forward validation exposes survival as uneven, not universal.

Split-level evidence preserves both positive survival and material deterioration. Dispersion becomes a research finding: validation credibility depends on retaining regimes that weaken the aggregate story.
primary validation dispersion: split_sharpes
Per-split out-of-sample Sharpe distribution showing validation dispersion and negative regimes.
train / test divergence: train_vs_test_sharpe
Train versus test Sharpe comparison showing in-sample and out-of-sample divergence across validation splits.
validation_dispersion_readout: Non-random structure remains regime-sensitive
validation windows | 7 chronological splits
positive OOS splits | 5 of 7
mean OOS Sharpe | 0.64
failure mode | high split variance and regime-conditioned deterioration
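A minimal sketch of a split-level readout, assuming an annualized Sharpe computed from daily returns with a zero risk-free rate and 252 periods per year (common conventions, not confirmed by the source). The function names and all sample values are hypothetical and are not the investigation's per-split results.

```python
from statistics import mean, stdev


def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled by sqrt(periods per year).
    Returns NaN when returns have no dispersion."""
    sd = stdev(daily_returns)
    if sd == 0:
        return float("nan")
    return mean(daily_returns) / sd * periods_per_year ** 0.5


def summarize(split_sharpes):
    """Keep the dispersion visible: count, positives, mean, min, and max."""
    return {
        "n": len(split_sharpes),
        "positive": sum(1 for s in split_sharpes if s > 0),
        "mean": mean(split_sharpes),
        "min": min(split_sharpes),
        "max": max(split_sharpes),
    }


# Hypothetical per-split Sharpe ratios, for illustration only.
example = [1.0, 0.5, -0.5, 1.5]
print(summarize(example))
```

Reporting the minimum and the positive count alongside the mean keeps a negative split from disappearing into the aggregate.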

06C Institutional Credibility

Credibility comes from preserving failure continuity.

The validation layer refuses to compress failed windows into a single retrospective number. Stitched OOS behaviour and split trajectories retain drawdown persistence, recovery asymmetry, and unstable generalization as visible evidence.
stitched out-of-sample path: walk_forward_stitched
Stitched walk-forward out-of-sample segments showing genuine unseen-period behaviour in chronological order.
failure continuity surface: split_equity_curves
Split equity curves showing divergent validation trajectories and retained failed regimes.
failure_continuity_record: Deterioration remains part of the evidence
2018 regime | negative OOS Sharpe and prolonged drawdown persistence
generalization risk | relationships learned in one environment are invalidated in later ones
research posture | failed regimes remain preserved rather than averaged away
credibility signal | chronology realism matters more than retrospective smoothness
transition exit state

Predictive structure survives in some regimes, but validation dispersion remains structurally significant.

Institutional credibility emerges from preserving instability and chronology realism, not from hiding deterioration.

transition_07_structural_interpretability

Failure is Structurally Interpretable Rather Than Random

Core thesis

The failures documented in T06 were already structurally implied by the instability conditions established in T04 and the amplification mechanics established in T05. Each failure event possesses a coherent structural address. The investigation closes knowing not only where deterioration occurred but why it was interpretable.

active synthesis surface: configuration
01 | structural fragility specification | fragility_dimension × indicator_state × dependency_type × downstream_consequence
02 | causal address matrix | failure_taxonomy → failure_event × T04_conditions × T05_mechanics × T06_outcome
03 | evidence lineage & interpretability closure | interpretive_claim × source_evidence × observed_rate × structural_prediction

07A Structural Configuration

A structurally fragile configuration is a specific multi-dimensional state, not a general description of instability.

T04 and T05 established that instability exists and that execution amplifies it. T07 names the specific configuration — the simultaneous state of multiple measurable dimensions — whose combination makes deterioration architecturally foreseeable. Each dimension carries a structural dependency role that explains why its contribution to failure is architecturally necessary rather than contingent.
structural_fragility_specification: Configuration dimensions, their instability indicators, and their structural dependency roles
fragility_dimension | indicator_state | dependency_type | downstream_consequence
coefficient semantics | sign consistency < 50% | necessary_precursor | ranking separation degrades; score IQR compresses
feature contribution | single-family dominance > 90% | amplification_context | allocation concentration rises; diversification collapses
predictive regime | IC deterioration active | regime_trigger | score separation narrows; ranking becomes fragile
ranking discrimination | score IQR compressed | amplification_trigger | churn threshold lowered; allocation becomes indiscriminate
execution exposure | top-k allocation subject to churn | cost_exposure_multiplier | friction compresses returns; noise rebalancing penalised

07B Causal Address Matrix

Each observed failure possesses a coherent structural address across the three independently established evidence layers.

Failure first requires classification before it can be structurally addressed. The taxonomy names the structural categories; the causal address matrix demonstrates that each observed failure event maps to a specific combination of T04 conditions and T05 mechanics rather than appearing at random. Stable windows confirm the inverse: structural conditions predict stability as reliably as they predict failure.
failure_taxonomy: Structural categories of failure; classification key for the address matrix
regime-transition failure | coefficient semantic inversion | ranking separation compressed → allocation churn | negative OOS Sharpe, prolonged drawdown
dispersion-collapse failure | IC compressed across universe | uniform score compression → low-discrimination allocation | flat split outcomes, positive but near-zero
predictive deterioration | coefficient drift in training window | high-conviction allocation to learned-regime assets | strong train Sharpe, weak test Sharpe divergence
causal_address_matrix: Each failure event mapped to its structural address across T04, T05, and T06
failure_event | T04_conditions | T05_mechanics | T06_outcome
split 4 (2018 regime) | vol / z-score sign inverted | turnover elevated, cost drag amplified | negative OOS Sharpe, prolonged drawdown
split 6 deterioration | contribution concentration active | allocation churn near regime boundaries | below-median OOS Sharpe, unstable recovery
near-failure windows | IC temporary inversion | ranking compression active | positive but compressed OOS Sharpe
stable windows | sign consistency maintained | turnover within normal range | positive OOS Sharpe, clean recovery

07C Evidence Lineage & Interpretability Closure

Failure clusters in structurally fragile configurations. The distribution is inconsistent with random occurrence.

The evidence lineage makes explicit that T07's interpretive acts are grounded in established findings, with each component attributed to its source transition. The coherence test closes the argument: if failure were random, it would fall on fragile and stable windows in proportion to their counts. The observed clustering is consistent with structural prediction rather than statistical coincidence.
evidence_lineage_system: Research provenance; each interpretive claim attributed to its source evidence
01 | structural fragility configuration named | coefficient sign analysis | T04
02 | regime-specific learning confirmed | IC regime behavior | T04
03 | execution amplification quantified | turnover and drawdown surfaces | T05
04 | failure geometry documented | split Sharpe and equity outcomes | T06
05 | failure taxonomy derived | synthesis of T04 + T05 + T06 | T07
06 | causal address matrix constructed | synthesis of T04 + T05 + T06 | T07
07 | interpretability claim established | synthesis of T04 + T05 + T06 | T07
interpretability_confirmation: Coherence test; failure rate by structural configuration versus random expectation
fragile configuration active | 28.6% (2 of 7 base rate) | 2 of 3 fragile windows failed | consistent with structural prediction
stable configuration active | 71.4% (5 of 7 base rate) | 4 of 4 stable windows passed | consistent with structural prediction
failure type matched taxonomy | N/A | failure type matches taxonomy classification | regime-transition and deterioration types confirmed
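One hedged way to make "versus random expectation" concrete is a hypergeometric calculation on the counts in the table: 7 windows, 3 of them fragile, and 2 failures, both in fragile windows. The sketch assumes failures would otherwise be placed uniformly at random across windows; the function name is hypothetical and the source does not state that this test was used.

```python
from math import comb


def prob_at_least(k_obs, n_total, n_fragile, n_fail):
    """P(at least k_obs of the n_fail failures land in fragile windows)
    when failures are placed uniformly at random (hypergeometric tail)."""
    total = comb(n_total, n_fail)
    hits = sum(comb(n_fragile, k) * comb(n_total - n_fragile, n_fail - k)
               for k in range(k_obs, min(n_fragile, n_fail) + 1))
    return hits / total


base_failure_rate = 2 / 7     # 28.6%, as in the table
fragile_failure_rate = 2 / 3  # 2 of 3 fragile windows failed
p_random = prob_at_least(k_obs=2, n_total=7, n_fragile=3, n_fail=2)
print(round(base_failure_rate, 3), round(fragile_failure_rate, 3), round(p_random, 3))
```

For these counts the tail probability is 3/21, about 0.143, which is the chance of seeing both failures in fragile windows under purely random placement across seven splits.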
transition exit state

The failures documented in T06 were not isolated outcomes. They were already structurally implied by the instability conditions documented in T04 and the amplification mechanics documented in T05.

Failure possesses a structural address. The investigation demonstrates not only where deterioration occurred, but why deterioration was interpretable rather than random.

transition_08_institutional_conclusion

Visible Institutional Research Cognition

Final assessment

T01–T07 demonstrate not a strategy but a systematic research capacity. The investigation is complete. The cognition is visible.

surface_01 — primary hero

Research Cognition Architecture

Phase I | Foundation | T01 + T02

establishes the research substrate

hypothesis system + temporal universe + feature families

Phase II | Diagnostics | T03 + T04

characterises signal and its fragility

signal geometry + IC consistency + instability map

Phase III | Translation | T05 + T06

tests the signal in execution reality

execution friction + amplification mechanics + validation record

Phase IV | Interpretation | T07 + T08

derives understanding from outcomes

structural interpretability + visible research cognition

  Phase I: Foundation

  1. 01 research_thesis: Research Hypothesis Construction

    Establishes the investigation's epistemic contract: prediction is regime-conditional, not universal; failure is pre-declared as admissible evidence, not hidden.

  2. 02 data_infrastructure: Temporal Substrate & Feature Engineering

    Establishes the evidentiary substrate: a 9-asset ETF universe with strict temporal alignment and 13 features encoding four competing market hypotheses.

  Phase II: Diagnostics

  3. 03 signal_construction: Cross-Sectional Signal Geometry

    Establishes signal legitimacy: 65.6% positive IC months and mean IC 0.1450 confirm ranking structure before any execution or validation claim is made.

  4. 04 ml_behaviour: Predictive Diagnostics & Instability

    Establishes the instability map: coefficient sign reversals, regime-specific learning, and feature concentration document the structural conditions that will become interpretable in T07.

  Phase III: Translation

  5. 05 portfolio_translation: Execution Architecture & Friction

    Establishes the amplification mechanics: the 5 bps cost model and turnover surface quantify how execution reality transforms signal instability into return compression.

  6. 06 walk_forward_validation: Chronological Validation Discipline

    Establishes the validation record: 7 chronological splits with mean OOS Sharpe 0.64; failed regimes retained as visible evidence rather than removed from account.

  Phase IV: Interpretation

  7. 07 failure_interpretation: Structural Interpretability

    Establishes the interpretive closure: each failure event in T06 maps to specific T04 conditions and T05 mechanics; the distribution is inconsistent with random occurrence.

  8. 08 institutional_conclusion: Visible Research Cognition System

    The complete investigation made visible: not a strategy record, but a documented capacity for systematic quantitative research across all four investigation phases.

surface_02

Research Capability Matrix

research_capability_matrix: Evidence-backed capability summary; what was visibly exercised and where
research framing | hypothesis construction; failure pre-declared as expected evidence | T01
data engineering | 9-asset ETF universe; temporal alignment; normalisation discipline | T02
feature engineering | 13 features across 4 hypothesis families; regime-conditional IC | T02
signal construction | cross-sectional ranking; 65.6% positive IC months; mean IC 0.1450 | T03
ml diagnostics | coefficient evolution; regime-specific learning; instability typed | T04
portfolio construction | top-k allocation; 5 bps one-way cost model; turnover quantified | T05
validation discipline | walk-forward; 7 chronological splits; mean OOS Sharpe 0.64 | T06
failure interpretation | failure taxonomy; causal address matrix; interpretability confirmed | T07
research infrastructure | artefact persistence; provenance system; experiment versioning | Zeto
publication engineering | canonical research dossier; narrative pacing; evidence hierarchy | T01–T08

surface_03

Investigation Infrastructure Record

investigation_infrastructure: Zeto platform subsystems; what each contributed to this investigation
experiment_engine | configuration-driven hypothesis testing with full parameter provenance | 6 hypothesis configurations tested; every parameter logged | T01, T02
artefact_persistence | filesystem-first output registry with versioning and lineage tracking | IC surfaces, coefficient tables, split equity curves all traceable | T03–T06
validation_framework | walk-forward chronological engine with strict temporal boundary enforcement | 7 chronological splits; no look-ahead contamination possible | T06
diagnostics_layer | IC surfaces, coefficient analysis, portfolio construction, turnover metrics | full evidence stack from signal construction to validation outcome | T03–T06
publication_layer | report drafting, frontend rendering, evidence hierarchy assembly | research dossier assembled and rendered as canonical showcase | T01–T08
orchestration_core | LLM-assisted workflow navigation, schema generation, experiment configuration | AI augments research execution; human oversight remains foundational | Zeto
provenance_system | experiment versioning, input hashing, output lineage, audit trail | every diagnostic surface traceable to its generating experiment | all transitions
investigation exit state

Not merely a strategy builder. A quantitative research system designer.

T01–T08 constitute a complete, visible record of systematic quantitative research — from hypothesis construction to infrastructure design to validation discipline to institutional communication.