Tracker PnL bug fix — losses no longer inflated by cron-tick gaps
The hourly tracker recorded actualPnlPct from the mark-to-market price at the cron tick rather than from the stop-loss price at which the position would have closed. When price gapped past the stop between hourly ticks, the recorded loss ballooned far beyond the position's actual risk. Worst cases found: a CRV short stored as -334.78% (not a loss the position's stop could have produced); a PYTH short with a 0.5% stop stored as -22.87%; a PENDLE long with a 4% stop stored as -29.75%. Now, on stopped_out / hit_t2 / hit_t1, the recorded PnL is computed from the level that was hit (stop or target price versus entry), not from wherever price drifted by the next tick. The same fix applies to target hits, which prevents over-recording wins when price gaps through a target. The peak/adverse invariants are now enforced as well (peak ≥ actual ≥ adverse on closed positions).
Before
avg loss -8.71% (skewed by CRV -334%) · max single recorded loss -334.78%
After
avg loss -2.67% · max single -7.41% · 58 historical rows backfixed
Metric
Track Record avg-loss + worst-case outlier
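The level-based recording described in this entry can be sketched as follows. This is a minimal illustration, not the tracker's actual code: the `ClosedPosition` shape, the function names, and the example prices are all hypothetical, and widening the excursions is only one possible way to enforce the peak/adverse invariant.

```typescript
type Side = "long" | "short";
type CloseStatus = "stopped_out" | "hit_t1" | "hit_t2";

// Hypothetical record shape; the real schema is not shown in the changelog.
interface ClosedPosition {
  side: Side;
  entry: number;
  stop: number;
  t1: number;
  t2: number;
  status: CloseStatus;
}

// Signed PnL in percent for a move from entry to exitPrice (no leverage, no fees).
function pnlPct(side: Side, entry: number, exitPrice: number): number {
  const move = ((exitPrice - entry) / entry) * 100;
  return side === "long" ? move : -move;
}

// Price of the level that closed the position.
function closingLevel(p: ClosedPosition): number {
  if (p.status === "stopped_out") return p.stop;
  return p.status === "hit_t1" ? p.t1 : p.t2;
}

// Recorded PnL uses the level price, never the mark price seen at the cron tick.
function recordedPnlPct(p: ClosedPosition): number {
  return pnlPct(p.side, p.entry, closingLevel(p));
}

// One way to enforce peak >= actual >= adverse: widen the excursions to contain actual.
function enforceInvariants(actual: number, peak: number, adverse: number) {
  return {
    actual,
    peak: Math.max(peak, actual),
    adverse: Math.min(adverse, actual),
  };
}

// A long with a 4% stop where price gapped through the stop between ticks.
const gapped: ClosedPosition = {
  side: "long", entry: 100, stop: 96, t1: 106, t2: 112, status: "stopped_out",
};
const markAtTick = 70.25; // hypothetical mark price at the next hourly tick
const markToMarketPct = pnlPct(gapped.side, gapped.entry, markAtTick); // old behaviour
const levelPct = recordedPnlPct(gapped); // new behaviour: bounded by the stop distance
```

With these made-up prices the old mark-to-market calculation yields a loss of roughly 29.75%, while the level-based calculation yields roughly 4%, which is the risk the stop actually defined.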
See corrected track record →
Agent Gateway: public API + OpenBB Workspace + MCP server
Read-only programmatic access to the desk: REST at /api/agent/v1/*, OpenBB Workspace JSON at /api/openbb/* with widgets.json auto-discovery manifest, and @egolds/mcp for Claude Desktop / Cursor / any MCP-aware client. Token format eg_live_<48hex>, sha256-hashed at rest, per-key scopes (theses.read, track-record.read, cross-dex.read, oracle.read, news.read, markets.read, *) and rate limit (default 60/min). Every request logged for forensics (90d TTL). Trading scopes intentionally omitted — the desk surface is read-only by design; live execution is not part of the product.
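To make the key model above concrete, here is a small sketch of token-format validation, scope checking, and a per-key rate limit. It is an illustration under assumptions, not the gateway's implementation: the lowercase-hex requirement, the fixed-window limiter, and every identifier below are hypothetical. Only the scope names, the `eg_live_` prefix, the 48-hex length, and the 60/min default come from the entry.

```typescript
// Scope names copied from the changelog; "*" grants everything.
type Scope =
  | "theses.read" | "track-record.read" | "cross-dex.read"
  | "oracle.read" | "news.read" | "markets.read" | "*";

// Assumed shape: "eg_live_" followed by exactly 48 lowercase hex characters.
const TOKEN_PATTERN = /^eg_live_[0-9a-f]{48}$/;

function isWellFormedToken(token: string): boolean {
  return TOKEN_PATTERN.test(token);
}

// True when the key holds the wildcard or the exact scope the route needs.
function hasScope(granted: readonly Scope[], required: Scope): boolean {
  return granted.some((s) => s === "*" || s === required);
}

// Hypothetical fixed-window limiter: at most `limit` requests per 60 s per key.
class MinuteRateLimiter {
  private limit: number;
  private windows: { [keyId: string]: { start: number; count: number } } = {};

  constructor(limit: number = 60) {
    this.limit = limit;
  }

  allow(keyId: string, nowMs: number): boolean {
    const w = this.windows[keyId];
    if (!w || nowMs - w.start >= 60000) {
      // First request, or the previous window expired: start a new window.
      this.windows[keyId] = { start: nowMs, count: 1 };
      return true;
    }
    w.count += 1;
    return w.count <= this.limit;
  }
}
```

A request handler would typically run these three checks in order (format, scope, rate limit) before touching any data, and reject on the first failure.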
Create your API key →
Backtest replay tool published
Public deterministic replay over the last 500 closed theses. Anyone can adjust R:R thresholds + ATR floor and see how the desk's gate would re-classify history. Same logic the production Risk Manager applies — just exposed without the LLM step.
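A deterministic replay of this kind can be pictured as a pure function over closed theses. The sketch below is an assumption-laden illustration: the `ClosedThesis` fields, the `GateParams` names, and the reading of "ATR floor" as a minimum target distance in ATR multiples are all hypothetical rather than taken from the tool.

```typescript
// Hypothetical shape of a closed thesis as the replay would see it.
interface ClosedThesis {
  id: string;
  entry: number;
  stop: number;
  target: number;
  atr: number; // ATR(14) in price units at thesis time
}

// The two knobs a user can adjust (names are illustrative).
interface GateParams {
  minRR: number;            // minimum reward:risk ratio
  atrFloorMultiple: number; // target must sit at least this many ATRs from entry
}

interface ReplayRow {
  id: string;
  rr: number;
  approved: boolean;
  reason: string;
}

function riskReward(t: ClosedThesis): number {
  const risk = Math.abs(t.entry - t.stop);
  const reward = Math.abs(t.target - t.entry);
  return risk > 0 ? reward / risk : 0;
}

// Pure function of its inputs: the same theses and params always give the same rows.
function replay(theses: ClosedThesis[], params: GateParams): ReplayRow[] {
  return theses.map((t) => {
    const rr = riskReward(t);
    const targetDistance = Math.abs(t.target - t.entry);
    if (rr < params.minRR) {
      return { id: t.id, rr, approved: false, reason: "rr_below_min" };
    }
    if (targetDistance < params.atrFloorMultiple * t.atr) {
      return { id: t.id, rr, approved: false, reason: "target_inside_atr_floor" };
    }
    return { id: t.id, rr, approved: true, reason: "ok" };
  });
}
```

Because no LLM call or clock is involved, re-running `replay` with different `minRR` or `atrFloorMultiple` values shows exactly which historical theses would flip between approved and rejected.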
Open replay tool →
Conviction calibration chart on track record
Realized win rate per conviction band (40-49…80-89) plotted against the 45° ideal-calibration diagonal. Bubbles below the line = overconfident; on or above = well-calibrated. Computed live from closed theses.
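The per-band calculation behind such a chart might look like the sketch below. It assumes a conviction of N maps to an ideal win rate of N% and uses each band's midpoint as its position on the diagonal. Both choices, along with the `Outcome` and `BandPoint` shapes, are illustrative assumptions rather than the chart's actual code.

```typescript
// Hypothetical minimal view of a closed thesis.
interface Outcome {
  conviction: number; // 40..89
  won: boolean;
}

interface BandPoint {
  band: string;          // e.g. "50-59"
  n: number;             // closed theses in the band
  winRate: number;       // realized, 0..1
  ideal: number;         // band midpoint read as a probability, 0..1
  overconfident: boolean; // realized win rate falls below the diagonal
}

function calibrationPoints(outcomes: Outcome[]): BandPoint[] {
  const points: BandPoint[] = [];
  for (let lo = 40; lo <= 80; lo += 10) {
    const inBand = outcomes.filter((o) => o.conviction >= lo && o.conviction < lo + 10);
    if (inBand.length === 0) continue; // no bubble for an empty band
    const wins = inBand.filter((o) => o.won).length;
    const winRate = wins / inBand.length;
    const ideal = (lo + 4.5) / 100;
    points.push({
      band: lo + "-" + (lo + 9),
      n: inBand.length,
      winRate,
      ideal,
      overconfident: winRate < ideal,
    });
  }
  return points;
}
```

Each returned point corresponds to one bubble: `n` could drive the bubble size, `ideal` the x-position, and `winRate` the y-position.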
See chart →
R:R rubric: regime-aware gate replaces flat 1.5
The Risk Manager replaced the flat 1.5:1 minimum with a regime-aware gate: 1.2 trending (ADX≥25), 1.5 chop (ADX 15-25), 1.8 high-vol (ADX<15 or ATR>6%). Bull/Bear analysts gained an explicit "R:R discipline" block: target distance must be ≥ 2.5× ATR(14); if the nearest technical level is closer, skip to the next major structural objective. This ended the 5-day approval drought.
Before
0 approvals over 5 days · last 3 attempts rejected R:R 1.0-1.2
After
100% approval rate over next 24h · 7 theses · 3W/1L closed = 75% WR (replay predicted 73.9%)
Metric
Approval rate + closed-thesis WR
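The regime thresholds in this entry translate into a small lookup. The sketch below is one reading of the rule, not the production prompt logic: it assumes the high-vol condition takes precedence when ATR exceeds 6%, treats ADX of exactly 25 as trending, and uses hypothetical function names.

```typescript
// Minimum reward:risk ratio for the current regime.
// adx: ADX reading; atrPct: ATR as a percentage of price.
function minRiskReward(adx: number, atrPct: number): number {
  // Assumed precedence: the high-vol rule wins over the ADX-only bands.
  if (atrPct > 6 || adx < 15) return 1.8;
  if (adx >= 25) return 1.2; // trending
  return 1.5;                // chop: 15 <= ADX < 25
}

// "R:R discipline": the target must sit at least 2.5 x ATR(14) from entry.
function targetFarEnough(entry: number, target: number, atr14: number): boolean {
  return Math.abs(target - entry) >= 2.5 * atr14;
}

// Combined gate for a single thesis.
function passesGate(
  entry: number, stop: number, target: number,
  adx: number, atr14: number,
): boolean {
  const risk = Math.abs(entry - stop);
  if (risk === 0) return false; // no defined risk, cannot compute R:R
  const rr = Math.abs(target - entry) / risk;
  const atrPct = (atr14 / entry) * 100;
  return rr >= minRiskReward(adx, atrPct) && targetFarEnough(entry, target, atr14);
}
```

Under this reading, a thesis with R:R 1.25 is rejected by the old flat 1.5 rule but approved in a trending regime, which is consistent with the rejected attempts at R:R 1.0-1.2 listed in the Before row.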
Replay this change →
Sprint 5a.5 phase 4 cutover: orchestrator owns thesis-gen
The legacy apps/web frontend is retired. The thesis generation cron now runs in-process inside the dedicated orchestrator container instead of making an HTTP hop from data-engine to apps/web. Single source of truth, simpler debug surface, fewer moving parts.
Before
data-engine → HTTP → apps/web cron · 2 services involved · double-write risk
After
orchestrator in-process cron · single service · ENABLE_AGENT_SCHEDULER=false on data-engine
Metric
Architecture clarity
Debate-rebuttal defensive guards
MIMO LLM occasionally returns valid JSON that fails strict zod array validation, leaving fields undefined. The rebuttal prompt builder now guards every field access (Array.isArray, typeof string, finite-number checks) so partial outputs degrade gracefully into the next prompt block instead of throwing.
Before
1 rebuttal failure per cron cycle on average · pipeline caught + continued
After
0 rebuttal failures observed · adversarial rebuttal complete on every run
Metric
Rebuttal success rate
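The guarding pattern named in this entry (Array.isArray, typeof string, finite-number checks) can be sketched as below. The field names `summary`, `keyPoints`, and `conviction`, the helper names, and the placeholder string are hypothetical; the actual analyst output schema is not shown in the changelog.

```typescript
function asNonEmptyString(v: unknown): string | undefined {
  return typeof v === "string" && v.trim().length > 0 ? v.trim() : undefined;
}

// Keeps only usable strings; anything that is not an array becomes [].
function asStringArray(v: unknown): string[] {
  if (!Array.isArray(v)) return [];
  const out: string[] = [];
  for (const item of v) {
    const s = asNonEmptyString(item);
    if (s !== undefined) out.push(s);
  }
  return out;
}

function asFiniteNumber(v: unknown): number | undefined {
  return typeof v === "number" && isFinite(v) ? v : undefined;
}

// Builds a prompt block from whatever parts of the analyst output are usable.
// Missing or malformed fields are skipped instead of throwing.
function buildRebuttalBlock(raw: unknown): string {
  const o = (typeof raw === "object" && raw !== null ? raw : {}) as { [k: string]: unknown };
  const lines: string[] = [];

  const summary = asNonEmptyString(o["summary"]);
  if (summary !== undefined) lines.push("Summary: " + summary);

  const points = asStringArray(o["keyPoints"]);
  if (points.length > 0) {
    lines.push("Key points:\n" + points.map((p) => "- " + p).join("\n"));
  }

  const conviction = asFiniteNumber(o["conviction"]);
  if (conviction !== undefined) lines.push("Conviction: " + conviction);

  return lines.length > 0 ? lines.join("\n") : "(no usable analyst output)";
}
```

The design choice here is to degrade per field: a missing array removes one section of the prompt block while the rest of the rebuttal still goes ahead.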
Conviction calibration prompt — anti-park-at-55
Thesis-engine prompt rewritten with explicit conviction-distribution rules. Forces 40-85 spread instead of clustering at 55. Each conviction value must be justified by counted signals: 40-49 weak, 50-59 moderate (minority), 60-69 strong, 70-79 very strong, 80-85 conviction trade.
Before
Modal conviction = 55 · WR 56% · EV +0.63%/trade · poor signal compression
After
Distribution spread 40-85 · per-band WR visible on /track-record calibration chart
Metric
Conviction distribution + per-band WR
See calibration →
Market-candles rewritten DEX-only
The legacy Kraken+Bybit fetcher caused the May 12 thesis-gen stall: Bybit doesn't list DEX-only LSTs/wrapped tokens (BMX, SUPERFORM, WEETH, WBNB, …), so every miss put the symbol on a 600s cooldown. Rewritten against Hyperliquid candleSnapshot at the 1h interval. No CEX dependency, no per-symbol cooldown state machine, no Bybit affiliate code injection.
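For orientation, a candleSnapshot request and response normalization might look like the sketch below. The request and candle field names reflect my reading of Hyperliquid's public info-endpoint documentation and should be verified against it before use. The helper names, the hour alignment, and the choice to drop unparseable candles are hypothetical and not taken from the desk's fetcher.

```typescript
// Assumed request body for the info endpoint (verify against the official docs).
interface CandleSnapshotRequest {
  type: "candleSnapshot";
  req: { coin: string; interval: "1h"; startTime: number; endTime: number };
}

const HOUR_MS = 60 * 60 * 1000;

function buildCandleRequest(coin: string, hoursBack: number, nowMs: number): CandleSnapshotRequest {
  // Round down to the last full hour so the window covers whole 1h candles.
  const endTime = Math.floor(nowMs / HOUR_MS) * HOUR_MS;
  return {
    type: "candleSnapshot",
    req: { coin, interval: "1h", startTime: endTime - hoursBack * HOUR_MS, endTime },
  };
}

// Raw candles are assumed to carry prices and volume as decimal strings.
interface RawCandle { t: number; o: string; h: string; l: string; c: string; v: string; }

interface Candle {
  openTimeMs: number;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Converts string fields to numbers and drops candles that fail to parse.
function parseCandles(raw: RawCandle[]): Candle[] {
  const out: Candle[] = [];
  for (const r of raw) {
    const c: Candle = {
      openTimeMs: r.t,
      open: Number(r.o),
      high: Number(r.h),
      low: Number(r.l),
      close: Number(r.c),
      volume: Number(r.v),
    };
    const ok = isFinite(c.open) && isFinite(c.high) && isFinite(c.low)
      && isFinite(c.close) && isFinite(c.volume);
    if (ok) out.push(c);
  }
  return out;
}
```

The body from `buildCandleRequest` would be sent as a JSON POST to the exchange's info endpoint. With a single venue, a missing symbol is just an empty result for that request, so no per-symbol cooldown bookkeeping is needed.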
Reflection-agent weekly post-mortem live
Every 7 days the reflection-agent reads the last 50 closed theses, computes win-rate / EV / conviction distribution, asks MIMO to surface winning + losing patterns, and writes an addendum to agent_prompt_overrides. Active addendums get appended to every analyst's system prompt next run.
Before
Manual prompt tuning · changes ad-hoc · no historical context
After
Automatic weekly self-reflection · prompts adjust based on outcomes
Metric
Self-improvement loop closed
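The statistics step of the weekly post-mortem could be computed along these lines before anything is handed to the LLM. This is a sketch under assumptions: a win is taken to mean positive realized PnL, EV is the mean realized PnL per trade, and the `ClosedThesis` fields and function name are hypothetical.

```typescript
// Hypothetical minimal view of a closed thesis.
interface ClosedThesis {
  conviction: number;
  pnlPct: number;     // realized PnL in percent
  closedAtMs: number; // close timestamp, epoch milliseconds
}

interface ReflectionStats {
  n: number;
  winRate: number; // 0..1
  evPct: number;   // mean realized PnL per trade, in percent
  convictionCounts: { [band: string]: number };
}

// Summarizes the most recent `windowSize` closed theses (50 in the changelog).
function summarizeRecent(theses: ClosedThesis[], windowSize: number = 50): ReflectionStats {
  const recent = theses
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.closedAtMs - a.closedAtMs)
    .slice(0, windowSize);

  let wins = 0;
  let pnlSum = 0;
  const convictionCounts: { [band: string]: number } = {};
  for (const t of recent) {
    if (t.pnlPct > 0) wins += 1;
    pnlSum += t.pnlPct;
    const lo = Math.floor(t.conviction / 10) * 10;
    const band = lo + "-" + (lo + 9);
    convictionCounts[band] = (convictionCounts[band] || 0) + 1;
  }

  const n = recent.length;
  return {
    n,
    winRate: n > 0 ? wins / n : 0,
    evPct: n > 0 ? pnlSum / n : 0,
    convictionCounts,
  };
}
```

The resulting object is small enough to embed directly in the reflection prompt, giving the model fixed numbers to reason about rather than asking it to recompute them.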
Strategy lock: 4 DEX verticals (HL + edgeX + Aster + Polymarket)
Q2 2026 market-share data: Hyperliquid ~50%, edgeX ~26%, Aster ~21% — together 97% of perp DEX flow. Dropped dYdX/Drift/GMX (each <3%, defer). CEX coverage kept only as cross-venue reference.
Risk Manager rubric initial
First version of the Risk Manager prompt: validate stop loss at technical invalidation, position sizing 2-5% portfolio, ATR-based volatility adjustment, ADX<15 weak-trend reject, stochastic extreme warning. Flat R:R 1.5:1 minimum (later replaced 2026-05-19).