Desk methodology changelog

How the desk evolves

Public log of every methodology change shipped to the EGOLDS AI desk — prompt rubrics, gate thresholds, scoring formulas, agent architecture. Each entry includes the before/after impact measured against closed theses where possible. The purpose is transparency: closed AI desks cannot show you their evolution. We can.

For per-thesis bull/bear traces see track record. For the gate replay tool see backtest replay.

2026-05-22 · Infra

Tracker PnL bug fix — losses no longer inflated by cron-tick gaps

The hourly tracker recorded actualPnlPct from the mark-to-market price at the cron tick, not from the stop-loss price at which the position would have been closed. If price gapped past the stop between hourly ticks, the recorded loss ballooned far past the position's defined risk. Worst cases discovered: a CRV short stored as -334.78% (on an unleveraged short, that would require price to more than quadruple); a PYTH short with a 0.5% stop stored as -22.87%; a PENDLE long with a 4% stop stored as -29.75%. Now, on stopped_out / hit_t2 / hit_t1, the recorded PnL is computed from the level itself, not from wherever price drifts afterward. The symmetric fix for target hits prevents gap-up over-recording on wins. Peak/adverse invariants are enforced too (peak ≥ actual ≥ adverse on closed positions).
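As a rough illustration of the fix described above, the sketch below records PnL from the level that closed the position rather than from the mark price at the cron tick. The `Position` fields, the status-to-level mapping, and the example prices are assumptions made for illustration; only the status names and the invariant come from the entry itself.

```python
from dataclasses import dataclass


@dataclass
class Position:
    side: str     # "long" or "short"
    entry: float
    stop: float
    t1: float
    t2: float


# Hypothetical mapping from close status to the level that defines the exit.
LEVEL_FOR_STATUS = {"stopped_out": "stop", "hit_t1": "t1", "hit_t2": "t2"}


def pnl_pct(side: str, entry: float, exit_price: float) -> float:
    """Unleveraged percentage PnL for a position closed at exit_price."""
    move = (exit_price - entry) / entry * 100.0
    return move if side == "long" else -move


def recorded_pnl_pct(pos: Position, status: str, mark_price: float) -> float:
    """Use the level's price for level-driven closes; fall back to the mark otherwise."""
    level = LEVEL_FOR_STATUS.get(status)
    exit_price = getattr(pos, level) if level else mark_price
    return pnl_pct(pos.side, pos.entry, exit_price)


def invariants_hold(peak: float, actual: float, adverse: float) -> bool:
    """peak >= actual >= adverse must hold on a closed position."""
    return peak >= actual >= adverse


# Illustrative long with a 4% stop; the entry price of 100 is made up.
long_pos = Position(side="long", entry=100.0, stop=96.0, t1=106.0, t2=112.0)
print(pnl_pct("long", 100.0, 70.25))                      # mark-based loss at the tick
print(recorded_pnl_pct(long_pos, "stopped_out", 70.25))   # level-based loss
```

With these made-up prices, the mark-based figure reproduces the kind of inflated loss the entry describes, while the level-based figure stays at the stop distance.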

Before

avg loss -8.71% (skewed by CRV -334%) · max single recorded loss -334.78%

After

avg loss -2.67% · max single -7.41% · 58 historical rows backfixed

Metric

Track Record avg-loss + worst-case outlier

See corrected track record →

2026-05-22 · Infra

Agent Gateway — public API + OpenBB Workspace + MCP server

Read-only programmatic access to the desk: REST at /api/agent/v1/*, OpenBB Workspace JSON at /api/openbb/* with widgets.json auto-discovery manifest, and @egolds/mcp for Claude Desktop / Cursor / any MCP-aware client. Token format eg_live_<48hex>, sha256-hashed at rest, per-key scopes (theses.read, track-record.read, cross-dex.read, oracle.read, news.read, markets.read, *) and rate limit (default 60/min). Every request logged for forensics (90d TTL). Trading scopes intentionally omitted — the desk surface is read-only by design; live execution is not part of the product.
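The token handling described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the gateway's actual code: the helper names are hypothetical, and the only details taken from the entry are the `eg_live_<48hex>` format, sha256 hashing at rest, and the `*` wildcard scope.

```python
import hashlib
import hmac
import re
import secrets

# Format from the entry: "eg_live_" followed by 48 hex characters.
TOKEN_RE = re.compile(r"eg_live_[0-9a-f]{48}")


def new_token() -> str:
    """24 random bytes render as 48 lowercase hex characters."""
    return "eg_live_" + secrets.token_hex(24)


def hash_token(token: str) -> str:
    """Only this digest would be stored; the plaintext token is shown once."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def verify(token: str, stored_hash: str) -> bool:
    """Reject malformed tokens, then compare digests in constant time."""
    if TOKEN_RE.fullmatch(token) is None:
        return False
    return hmac.compare_digest(hash_token(token), stored_hash)


def scope_allows(granted: set, required: str) -> bool:
    """A key passes if it holds the wildcard scope or the exact scope."""
    return "*" in granted or required in granted


token = new_token()
stored = hash_token(token)
print(verify(token, stored))                                     # True
print(scope_allows({"theses.read", "news.read"}, "oracle.read")) # False
```

Storing only the digest means a leaked key table does not expose usable tokens, which fits a read-only surface where keys are the sole credential.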

Create your API key →

2026-05-19 · Eval

Backtest replay tool published

Public deterministic replay over the last 500 closed theses. Anyone can adjust R:R thresholds + ATR floor and see how the desk's gate would re-classify history. Same logic the production Risk Manager applies — just exposed without the LLM step.

Open replay tool →

2026-05-19 · Eval

Conviction calibration chart on track record

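Realized win rate per conviction band (40-49…80-89) plotted against the 45° ideal-calibration diagonal. Bubbles below the line are overconfident (realized win rate falls short of stated conviction); bubbles on the line are well calibrated; bubbles above it are, if anything, underconfident. Computed live from closed theses.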
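One way to compute the points behind such a chart is sketched below. The input shape (pairs of conviction and win flag) and the use of the band midpoint as the diagonal reference are assumptions for illustration; the entry does not say how the desk positions each bubble.

```python
from collections import defaultdict


def calibration_by_band(closed):
    """closed: iterable of (conviction, won) pairs from closed theses.

    Returns rows of (band label, sample size, realized win rate %, gap),
    where gap = realized win rate minus the band midpoint. A negative gap
    sits below the diagonal (overconfident); a positive gap sits above it.
    """
    tally = defaultdict(lambda: [0, 0])  # band start -> [wins, total]
    for conviction, won in closed:
        band = int(conviction) // 10 * 10
        tally[band][0] += 1 if won else 0
        tally[band][1] += 1

    rows = []
    for band in sorted(tally):
        wins, total = tally[band]
        realized = 100.0 * wins / total
        midpoint = band + 4.5  # e.g. 44.5 for the 40-49 band
        rows.append((f"{band}-{band + 9}", total, realized, realized - midpoint))
    return rows


# Made-up sample data, not desk results.
sample = [(45, True), (47, False), (62, True), (65, True), (68, False), (83, True)]
for label, n, realized, gap in calibration_by_band(sample):
    print(f"{label}: n={n} realized={realized:.1f}% gap={gap:+.1f}")
```

Reporting the sample size next to each band matters here, since a band with one or two closed theses can sit far from the diagonal by chance alone.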

See chart →

2026-05-19 · Prompt

R:R rubric: regime-aware gate replaces flat 1.5

Risk Manager replaced the flat 1.5:1 minimum with a regime-aware gate: 1.2 trending (ADX≥25), 1.5 chop (ADX 15-25), 1.8 high-vol (ADX<15 or ATR>6%). Bull/Bear analysts gained an explicit "R:R discipline" block: target distance must be ≥ 2.5× ATR(14); if the nearest technical level is closer, skip to the next major structural objective. This change broke the 5-day approval drought.
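The thresholds above can be read as a small deterministic function. The sketch below is one interpretation, not the production Risk Manager: it assumes the high-vol test is checked first (so ATR>6% overrides a strong ADX), that "ATR>6%" means ATR as a percentage of entry price, and that candidate target levels are already on the profitable side of entry. All function names are hypothetical.

```python
def min_rr(adx: float, atr_pct: float) -> float:
    """Minimum reward:risk for the current regime."""
    if adx < 15 or atr_pct > 6.0:   # high-vol bucket, checked first by assumption
        return 1.8
    if adx >= 25:                   # trending
        return 1.2
    return 1.5                      # chop: ADX between 15 and 25


def pick_target(entry: float, levels, atr: float):
    """Nearest level at least 2.5 x ATR from entry, or None if all are too close."""
    far_enough = [lvl for lvl in levels if abs(lvl - entry) >= 2.5 * atr]
    if not far_enough:
        return None
    return min(far_enough, key=lambda lvl: abs(lvl - entry))


def passes_gate(entry: float, stop: float, target: float, adx: float, atr: float) -> bool:
    """True when reward:risk meets the regime's minimum."""
    risk = abs(entry - stop)
    if risk == 0:
        return False
    reward = abs(target - entry)
    atr_pct = atr / entry * 100.0
    return reward / risk >= min_rr(adx, atr_pct)


# Same trade, two regimes: R:R of 4/3 clears 1.2 but not 1.5.
print(passes_gate(entry=100, stop=97, target=104, adx=30, atr=1.5))  # True
print(passes_gate(entry=100, stop=97, target=104, adx=20, atr=1.5))  # False
print(pick_target(100, [102, 104, 110], atr=2.0))                    # 110
```

The two `passes_gate` calls show why the change could end an approval drought: a setup rejected under the 1.5 threshold is approved once the regime is classified as trending.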

Before

0 approvals over 5 days · last 3 attempts rejected R:R 1.0-1.2

After

100% approval rate over next 24h · 7 theses · 3W/1L closed = 75% WR (replay predicted 73.9%)

Metric

Approval rate + closed-thesis WR

Replay this change →

2026-05-18 · Infra

Sprint 5a.5 phase 4 cutover — orchestrator owns thesis-gen

The legacy apps/web frontend is retired. The thesis-generation cron now runs in-process inside the dedicated orchestrator container instead of making an HTTP hop from data-engine to apps/web. Single source of truth, simpler debug surface, fewer moving parts.

Before

data-engine → HTTP → apps/web cron · 2 services involved · double-write risk

After

orchestrator in-process cron · single service · ENABLE_AGENT_SCHEDULER=false on data-engine

Metric

Architecture clarity

2026-05-18 · Prompt

Debate-rebuttal defensive guards

MIMO LLM occasionally returns valid JSON that fails strict zod array validation, leaving fields undefined. The rebuttal prompt builder now guards every field access (Array.isArray, typeof string, finite-number checks) so partial outputs degrade gracefully into the next prompt block instead of throwing.
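The guard pattern described above is sketched here in Python as an analogue of the Array.isArray / typeof / finite-number checks. The field names (`thesis`, `key_points`, `conviction`) and the output layout are hypothetical; the point is only that each field is checked before use and silently dropped when malformed.

```python
import math


def str_list(value):
    """Keep only non-empty strings if value is a list; otherwise return []."""
    if not isinstance(value, list):
        return []
    return [v for v in value if isinstance(v, str) and v.strip()]


def finite_number(value):
    """Return value as a float if it is a finite number, else None."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    try:
        number = float(value)
    except OverflowError:  # integers too large to represent as a float
        return None
    return number if math.isfinite(number) else None


def rebuttal_block(analysis) -> str:
    """Build a prompt block from a possibly partial analyst output."""
    data = analysis if isinstance(analysis, dict) else {}
    lines = []

    thesis = data.get("thesis")
    if isinstance(thesis, str) and thesis.strip():
        lines.append(f"Thesis: {thesis.strip()}")

    for point in str_list(data.get("key_points")):
        lines.append(f"- {point}")

    conviction = finite_number(data.get("conviction"))
    if conviction is not None:
        lines.append(f"Conviction: {conviction:g}")

    return "\n".join(lines)


# A partial output degrades to whatever fields survived the checks.
print(rebuttal_block({"thesis": "Funding flipped", "key_points": None,
                      "conviction": float("nan")}))
```

Because every accessor returns an empty or `None` value instead of raising, a missing field shortens the next prompt block rather than aborting the rebuttal step.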

Before

1 rebuttal failure per cron cycle on average · pipeline caught + continued

After

0 rebuttal failures observed · adversarial rebuttal complete on every run

Metric

Rebuttal success rate

2026-05-17 · Prompt

Conviction calibration prompt — anti-park-at-55

Thesis-engine prompt rewritten with explicit conviction-distribution rules. Forces 40-85 spread instead of clustering at 55. Each conviction value must be justified by counted signals: 40-49 weak, 50-59 moderate (minority), 60-69 strong, 70-79 very strong, 80-85 conviction trade.

Before

Modal conviction = 55 · WR 56% · EV +0.63%/trade · poor signal compression

After

Distribution spread 40-85 · per-band WR visible on /track-record calibration chart

Metric

Conviction distribution + per-band WR

See calibration →

2026-05-17 · Infra

Market-candles rewritten DEX-only

The legacy Kraken+Bybit fetcher caused the May 12 thesis-gen stall: Bybit doesn't list DEX-only LSTs/wrapped tokens (BMX, SUPERFORM, WEETH, WBNB, …), so every miss put the symbol on a 600 s cooldown. The fetcher was rewritten against Hyperliquid candleSnapshot at the 1h interval. No CEX dependency, no per-symbol cooldown state machine, no Bybit affiliate code injection.
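For readers who want to try the same data source, a request might look like the sketch below. The endpoint URL and payload shape reflect one reading of Hyperliquid's public info API and should be checked against its documentation before use; the function names are hypothetical and this is not the desk's fetcher.

```python
import json
import time
import urllib.request

# Assumed public endpoint; verify against Hyperliquid's API docs.
INFO_URL = "https://api.hyperliquid.xyz/info"


def candle_request(coin: str, hours: int, now_ms=None) -> dict:
    """Build a candleSnapshot request body for the last `hours` of 1h candles."""
    end = int(time.time() * 1000) if now_ms is None else now_ms
    start = end - hours * 3600 * 1000
    return {
        "type": "candleSnapshot",
        "req": {"coin": coin, "interval": "1h", "startTime": start, "endTime": end},
    }


def fetch_candles(coin: str, hours: int = 24):
    """POST the request and return the decoded JSON response (network call)."""
    body = json.dumps(candle_request(coin, hours)).encode("utf-8")
    request = urllib.request.Request(
        INFO_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


# Building the payload needs no network access.
print(candle_request("BTC", 24, now_ms=1_700_000_000_000))
```

A single venue with one request shape is what removes the need for the per-symbol cooldown state: a symbol either exists on the venue or is not tracked at all.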

2026-05-16 · Infra

Reflection-agent weekly post-mortem live

Every 7 days the reflection-agent reads the last 50 closed theses, computes win rate / EV / conviction distribution, asks MIMO to surface winning and losing patterns, and writes an addendum to agent_prompt_overrides. Active addendums are appended to every analyst's system prompt on the next run.
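The statistics step of that loop might look like the sketch below. The record fields (`pnl_pct`, `conviction`), the definition of a win as positive PnL, and EV as mean PnL per trade are all assumptions for illustration; the LLM call and the write to agent_prompt_overrides are left out.

```python
from collections import Counter


def summarize(closed, window: int = 50):
    """Summarize the most recent `window` closed theses (oldest first in `closed`)."""
    recent = closed[-window:]
    if not recent:
        return None

    pnls = [t["pnl_pct"] for t in recent]
    wins = sum(1 for p in pnls if p > 0)
    bands = Counter(int(t["conviction"]) // 10 * 10 for t in recent)

    return {
        "n": len(recent),
        "win_rate": wins / len(recent),
        "ev_pct": sum(pnls) / len(pnls),   # mean PnL per trade, in percent
        "conviction_bands": dict(bands),   # band start -> count
    }


# Made-up records, not desk results.
history = [
    {"pnl_pct": 3.0, "conviction": 62},
    {"pnl_pct": -2.0, "conviction": 55},
    {"pnl_pct": 5.0, "conviction": 68},
    {"pnl_pct": -1.0, "conviction": 44},
]
print(summarize(history))
```

Keeping this step purely numeric means the summary handed to the model is reproducible from the published outcome data, independent of whatever patterns the model then proposes.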

Before

Manual prompt tuning · changes ad-hoc · no historical context

After

Automatic weekly self-reflection · prompts adjust based on outcomes

Metric

Self-improvement loop closed

2026-05-15 · Infra

Strategy lock: 4 DEX verticals (HL + edgeX + Aster + Polymarket)

Q2 2026 market-share data: Hyperliquid ~50%, edgeX ~26%, Aster ~21%, together ~97% of perp DEX flow. Dropped dYdX/Drift/GMX (each <3%, deferred). CEX coverage kept only as a cross-venue reference.

2026-04-27 · Prompt

Risk Manager rubric initial

First version of the Risk Manager prompt: validate that the stop loss sits at technical invalidation, size positions at 2-5% of portfolio, adjust for volatility using ATR, reject weak trends (ADX<15), and warn on stochastic extremes. Flat R:R 1.5:1 minimum (replaced on 2026-05-19 by the regime-aware gate).

Methodology disclosure

EGOLDS publishes prompt rubrics, gate thresholds, scoring formulas and architectural decisions because the moat is the public outcome record + audit trail, not the secret sauce. Anyone reading this page can build a similar desk; nobody can build a similar track record without the same outcome data we have already published.