Workstation (Sim)

Theory and internal notes for the research stack. Use these to understand the pipeline at a glance, then dive deeper. The Sim environment mirrors the live pipeline but runs under tighter control: reproducible data cuts, fixed seeds, and explicit assumptions. It’s where we explain choices, track calibration, and sanity-check that results hold up outside a specific window. When something looks off in Live, this is the bench where we take it apart.

Sections: Pipeline, explained (stages, data flow, scheduling); Mathematics (losses, regimes, risk model); Code & experiments (design, ablations, benchmarks).
Most recent model: EQX-M1, documented in the current experimental model card.

Pipeline, at a glance

Ingest — prices, options flow, news/sentiment, light fundamentals. We assemble a clean daily tape: split-adjusted prices, basic fundamentals for context, option chains for implied-vol and skew, and headline sentiment. Time is aligned and gaps are handled conservatively; anything ambiguous is dropped rather than “fixed.”
Features — cross-sectional & temporal; normalised with leakage guards. We build signals that compare each stock to its peers and to itself through time. Normalisation uses only information available at the time (no peeking). Any feature that quietly reuses future data is flagged and removed.
Models — calibrated families (Core, Sentinel, Experimental). Instead of one big hero model, I keep families with different roles: durable signals, risk sentries, and new ideas. Models are trained on rolling windows and kept simple enough to explain; if I can’t describe why a view is changing, I treat the change as noise.
Calibration — reliability via expected calibration error (ECE); treat uncertainty explicitly. A strong score still needs to be honest. We check that predicted confidence lines up with reality (reliability curves and ECE bins). If a slice drifts out of spec, that view is down-weighted or paused until it proves itself again.
Selection — per-ticker signals with confidence and cutoffs. Raw model output is converted to simple choices (tilt up, down, or stand aside) with a confidence score. We respect cutoffs that avoid low-signal noise and cap how much any one name can influence the book.
Portfolio — gross/net bounds, per-symbol/sector caps, turnover control. Signals become a position set under hard constraints. We monitor Greeks and concentration, budget turnover to keep costs tame, and route execution through a realistic cost model so simulated gains already net out slippage rather than vanishing once traded.
Evaluation — walk-forward CV, regime slicing, ops health. Results are scored the way you would actually trade them: roll the window forward, never see tomorrow’s data, and record the run. We then slice by regime (volatility/credit/liquidity) and track operational health so a good backtest doesn’t hide a fragile process.
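The leakage guard in the Features stage can be illustrated with a minimal sketch, assuming a feature is a plain list of daily values for one ticker. Each day is z-scored against the mean and standard deviation of strictly earlier days, so a score never includes the day itself or anything after it; a cross-sectional percentile rank then compares names on a single date. The function names expanding_zscore and cross_sectional_rank and the min_history warm-up are hypothetical, not the pipeline's actual code.

```python
import statistics
from typing import Dict, List, Optional


def expanding_zscore(values: List[float], min_history: int = 20) -> List[Optional[float]]:
    """Z-score each value using only strictly earlier values (no peeking).

    Returns None during warm-up or when the past has zero spread, in the
    spirit of dropping ambiguous points rather than "fixing" them.
    """
    out: List[Optional[float]] = []
    for t, x in enumerate(values):
        history = values[:t]  # days 0..t-1 only; day t and later are excluded
        if len(history) < max(min_history, 2):  # stdev needs at least 2 points
            out.append(None)
            continue
        mu = statistics.fmean(history)
        sd = statistics.stdev(history)
        out.append((x - mu) / sd if sd > 0 else None)
    return out


def cross_sectional_rank(snapshot: Dict[str, float]) -> Dict[str, float]:
    """Percentile rank of each ticker against its peers on one date (0 = lowest, 1 = highest).

    Ties keep insertion order here; averaging tied ranks is a common alternative.
    """
    ordered = sorted(snapshot, key=lambda t: snapshot[t])
    if len(ordered) == 1:
        return {ordered[0]: 0.5}
    return {t: i / (len(ordered) - 1) for i, t in enumerate(ordered)}
```

For example, expanding_zscore([1.0, 2.0, 3.0, 4.0], min_history=3) returns [None, None, None, 2.0]: the last value is scored against the mean (2.0) and sample standard deviation (1.0) of the first three days, and appending later data never changes earlier scores. The quadratic loop is for clarity; a running mean and variance would do the same in one pass.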

Model families

Core — Stable, interpretable signals designed for daily deployment.
Sentinel — Risk-aware overlays (hedges/halts) that watch for regime breaks.
Experimental — New ideas under live shadowing before promotion.

Calibration & uncertainty

We don’t just predict; we measure how often we’re right. Reliability curves and ECE bins tell us whether confidence tracks realised outcomes. When bins widen or drift, the system down-weights that slice or pauses it entirely. You’ll see the same story reflected in the product: model cards include a calibration snapshot, the homepage shows when confidence is thin, and the Compare page lets you check stability across different windows. Market predictions always carry error; however, as long as the market remains imperfect, there is edge to exploit.
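Expected calibration error can be computed as below. This is a minimal sketch assuming binary forecasts (a predicted probability that a name moves up, scored against a 0/1 outcome) and equal-width bins; the slice_weight thresholds are hypothetical and only illustrate how a drifting slice could be down-weighted or paused.

```python
from typing import List, Sequence


def expected_calibration_error(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Binary ECE: bin-weighted average of |mean predicted probability - realised frequency|."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, p in enumerate(probs):
        # Equal-width bins on [0, 1]; clamp so p == 1.0 lands in the top bin.
        b = min(max(int(p * n_bins), 0), n_bins - 1)
        bins[b].append(i)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(probs[i] for i in members) / len(members)
        freq = sum(outcomes[i] for i in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_p - freq)
    return ece


def slice_weight(ece: float, tolerance: float = 0.05) -> float:
    """Hypothetical rule: full weight in spec, half weight when drifting, paused beyond 2x tolerance."""
    if ece <= tolerance:
        return 1.0
    if ece <= 2 * tolerance:
        return 0.5
    return 0.0
```

In a toy case where the model predicts 0.9 for two up-moves and 0.1 for two down-moves, each bin is off by 0.1 and the ECE comes out at about 0.1, which this rule would treat as drifting (half weight) under the assumed 0.05 tolerance.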

Portfolio & risk

Decisions flow into a constraint graph: gross/net limits, per-symbol and per-sector caps, and turnover budgets. We watch portfolio-level Greeks and exposures, simulate fees and slippage, and prefer de-risking early over squeezing the last basis point. The homepage “Portfolio Greeks” tile mirrors the same controls you see here. If liquidity narrows or concentration creeps up, the system tilts smaller or steps aside before the problem grows.
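The Selection and Portfolio stages can be approximated by a chain of shrink-only steps. This is a minimal sketch, not the constraint graph itself and not an optimiser: it assumes scores are signed confidences in [-1, 1], ignores Greeks, fees and slippage, and every limit value as well as the names build_targets and apply_turnover_budget are hypothetical placeholders.

```python
from typing import Dict, Mapping


def build_targets(
    scores: Mapping[str, float],
    sectors: Mapping[str, str],
    cutoff: float = 0.2,
    gross_limit: float = 1.0,
    net_limit: float = 0.2,
    symbol_cap: float = 0.1,
    sector_cap: float = 0.3,
) -> Dict[str, float]:
    """Turn signed confidences into weights under simple gross/net/symbol/sector bounds."""
    # Selection: stand aside on names whose |confidence| is below the cutoff.
    live = {t: s for t, s in scores.items() if abs(s) >= cutoff}
    total = sum(abs(s) for s in live.values())
    weights = {t: 0.0 for t in scores}
    if total == 0:
        return weights
    # Size proportionally to confidence so gross exposure starts exactly at the limit.
    w = {t: gross_limit * s / total for t, s in live.items()}
    # Per-symbol cap: clip each position to [-symbol_cap, +symbol_cap].
    w = {t: max(-symbol_cap, min(symbol_cap, x)) for t, x in w.items()}
    # Per-sector cap: shrink a sector pro rata if its gross exposure is too large.
    sector_gross: Dict[str, float] = {}
    for t, x in w.items():
        sector_gross[sectors[t]] = sector_gross.get(sectors[t], 0.0) + abs(x)
    for t in w:
        g = sector_gross[sectors[t]]
        if g > sector_cap:
            w[t] *= sector_cap / g
    # Net bound: scale the whole book down if |net| exceeds the limit.
    net = sum(w.values())
    if abs(net) > net_limit:
        w = {t: x * net_limit / abs(net) for t, x in w.items()}
    weights.update(w)
    return weights


def apply_turnover_budget(
    current: Mapping[str, float], target: Mapping[str, float], budget: float
) -> Dict[str, float]:
    """Move only part of the way from current toward target so total turnover <= budget."""
    names = set(current) | set(target)
    turnover = sum(abs(target.get(t, 0.0) - current.get(t, 0.0)) for t in names)
    alpha = 1.0 if turnover <= budget else budget / turnover
    return {
        t: current.get(t, 0.0) + alpha * (target.get(t, 0.0) - current.get(t, 0.0))
        for t in names
    }
```

The ordering is the design choice: after sizing, every step only clips or scales weights down, so a bound enforced earlier cannot be broken by a later step. The turnover step returns a blend of the current and target books; since all the bounds here are convex, that blend respects them whenever the current book already does.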

Evaluation & regimes

Every result is produced with walk-forward validation, so the clock only ever moves one way. We then score by regime (combinations of volatility, credit stress, and liquidity) and report consistency, drawdowns, and hit rate alongside return. Model cards summarise this as “by-regime performance,” while the Compare page lets you line up models and baselines over the same windows. A “pause-o-meter” indicates when the safest move is to trade less or not at all.
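Walk-forward evaluation and regime slicing can be sketched as below, assuming daily data indexed 0 to n-1 and regime labels already assigned per day (how the volatility, credit and liquidity labels are built is left out). The fixed-length rolling train window, the window sizes, and the names walk_forward_splits and hit_rate_by_regime are illustrative assumptions; an expanding train window is an equally common choice. Hit rate here means the predicted and realised moves share the same non-zero sign.

```python
from collections import defaultdict
from typing import Dict, Iterator, Sequence, Tuple


def walk_forward_splits(
    n_days: int, train: int, test: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_days, test_days) index ranges; each test block starts after its train block ends."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train + test <= n_days:
        yield range(start, start + train), range(start + train, start + train + test)
        start += step


def hit_rate_by_regime(
    predictions: Sequence[float], realised: Sequence[float], regimes: Sequence[str]
) -> Dict[str, float]:
    """Share of days where predicted and realised moves have the same non-zero sign, per regime label."""
    if not (len(predictions) == len(realised) == len(regimes)):
        raise ValueError("inputs must have equal length")
    hits: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for p, r, g in zip(predictions, realised, regimes):
        counts[g] += 1
        hits[g] += int(p * r > 0)
    return {g: hits[g] / counts[g] for g in counts}
```

With 10 days, a 4-day train window, a 2-day test window and a step of 2, the generator yields three folds: train days 0 to 3 with test days 4 to 5, train 2 to 5 with test 6 to 7, and train 4 to 7 with test 8 to 9. No fold ever tests on a day earlier than its training data.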

Roadmap

Stabilise regime labeling; cross-validate thresholds.
Broaden universe coverage; stress-test liquidity filters.
Tighten calibration; monitor ECE drift.
Streamline feature families; document wins and trade-offs.
Reconcile live vs. sim; extend paper-trading harness.

Notes

Research summaries and next steps post here. The Live Pipeline page surfaces daily snapshots; this area stays theory-focused.