EGOLDSv4
Desk methodology changelog

How the desk evolves

Public log of every methodology change shipped to the EGOLDS AI desk — prompt rubrics, gate thresholds, scoring formulas, agent architecture. Each entry includes the before/after impact measured against closed theses where possible. The purpose is transparency: closed AI desks cannot show you their evolution. We can.

For per-thesis bull/bear traces see track record. For the gate replay tool see backtest replay.

2026-05-22 · Infra

Tracker PnL bug fix — losses no longer inflated by cron-tick gaps

The hourly tracker was recording actualPnlPct from the mark-to-market price at the cron tick, not from the stop-loss price at which the position would have been closed. If price gapped past the stop between hourly ticks, the recorded loss ballooned far past the actual risk. Worst cases discovered: a CRV short stored as -334.78% (implausible for an unleveraged position with a stop); a PYTH short with a 0.5% stop stored as -22.87%; a PENDLE long with a 4% stop stored as -29.75%. Now, on stopped_out / hit_t2 / hit_t1, the recorded PnL is computed from the level itself, not from wherever the price drifts afterward. The symmetric fix for target hits prevents gap-up over-recording on wins. The peak/adverse invariant is also enforced on closed positions: peak ≥ actual ≥ adverse.
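The fix described above can be sketched as follows. This is a minimal illustration, not the tracker's actual code: the `Position` fields, the function names, and the example prices are all hypothetical, and only the status names (stopped_out, hit_t1, hit_t2) and the invariant come from the entry itself.

```python
from dataclasses import dataclass


@dataclass
class Position:
    side: str     # "long" or "short"
    entry: float
    stop: float
    t1: float     # first target (hypothetical field name)
    t2: float     # second target (hypothetical field name)


def pnl_pct(side, entry, exit_price):
    """Unleveraged percentage PnL for a long or short position."""
    move = (exit_price - entry) / entry * 100.0
    return move if side == "long" else -move


def recorded_pnl_pct(pos, status, mark_price):
    """Use the stop/target level on a level hit; fall back to the mark otherwise."""
    level = {"stopped_out": pos.stop, "hit_t1": pos.t1, "hit_t2": pos.t2}.get(status)
    exit_price = mark_price if level is None else level
    return pnl_pct(pos.side, pos.entry, exit_price)


def invariants_hold(peak, actual, adverse):
    """Closed positions must satisfy peak >= actual >= adverse."""
    return peak >= actual >= adverse


# Illustrative short with a 0.5% stop; price gaps to 122.87 before the next tick.
short = Position(side="short", entry=100.0, stop=100.5, t1=98.0, t2=96.0)
level_based = recorded_pnl_pct(short, "stopped_out", mark_price=122.87)
mark_based = pnl_pct("short", short.entry, 122.87)
```

With these made-up prices the level-based loss is about -0.5%, while the mark-based figure is about -22.87%, which is the kind of inflation the entry describes.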

Before: avg loss -8.71% (skewed by CRV -334%) · max single recorded loss -334.78%
After: avg loss -2.67% · max single -7.41% · 58 historical rows backfixed
Metric: Track Record avg-loss + worst-case outlier
See corrected track record
2026-05-22 · Infra

Agent Gateway — public API + OpenBB Workspace + MCP server

Read-only programmatic access to the desk: REST at /api/agent/v1/*, OpenBB Workspace JSON at /api/openbb/* with a widgets.json auto-discovery manifest, and @egolds/mcp for Claude Desktop / Cursor / any MCP-aware client. Tokens use the format eg_live_<48hex> and are sha256-hashed at rest. Each key carries its own scopes (theses.read, track-record.read, cross-dex.read, oracle.read, news.read, markets.read, *) and rate limit (default 60/min). Every request is logged for forensics (90d TTL). Trading scopes are intentionally omitted: the desk surface is read-only by design, and live execution is not part of the product.
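To make the key handling concrete, here is a rough sketch of the pieces named above: the eg_live_<48hex> format, sha256 hashing at rest, per-key scopes with a `*` wildcard, and a 60/min limit. The function names, the fixed-window limiter, and the way scopes are stored are assumptions for illustration; the gateway's real implementation is not published here.

```python
import hashlib
import re
import secrets

TOKEN_RE = re.compile(r"^eg_live_[0-9a-f]{48}$")


def new_token():
    """24 random bytes rendered as 48 hex characters, with the documented prefix."""
    return "eg_live_" + secrets.token_hex(24)


def hash_token(token):
    """Only this digest would be stored; the raw token is shown to the user once."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def scope_allows(granted, required):
    """A key passes if it holds the exact scope or the '*' wildcard."""
    return "*" in granted or required in granted


class FixedWindowLimiter:
    """Assumed limiter style: at most `limit` requests per `window_s` seconds."""

    def __init__(self, limit=60, window_s=60.0):
        self.limit = limit
        self.window_s = window_s
        self.window_start = None
        self.count = 0

    def allow(self, now):
        # Start a fresh window when none exists or the current one has expired.
        if self.window_start is None or now - self.window_start >= self.window_s:
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

The clock is passed in as `now` so the limiter can be exercised deterministically; a production limiter would more likely use a shared store than in-process state.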

Create your API key
2026-05-19 · Eval

Backtest replay tool published

Public deterministic replay over the last 500 closed theses. Anyone can adjust R:R thresholds + ATR floor and see how the desk's gate would re-classify history. Same logic the production Risk Manager applies — just exposed without the LLM step.
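A deterministic replay of this kind can be approximated in a few lines. The sketch below is an assumption-heavy illustration: the record fields (`rr`, `target_atr`, `won`) are hypothetical, and "ATR floor" is interpreted here as a minimum target distance in ATR multiples, which the entry does not spell out.

```python
def replay(theses, min_rr=1.5, atr_floor=2.5):
    """Re-classify closed theses under adjustable thresholds, with no LLM step."""
    approved = [
        t for t in theses
        if t["rr"] >= min_rr and t["target_atr"] >= atr_floor
    ]
    wins = sum(1 for t in approved if t["won"])
    return {
        "approved": len(approved),
        "rejected": len(theses) - len(approved),
        # Win rate is undefined when nothing passes the gate.
        "win_rate_pct": 100.0 * wins / len(approved) if approved else None,
    }


# Made-up history: reward-to-risk, target distance in ATR multiples, outcome.
history = [
    {"rr": 1.1, "target_atr": 2.0, "won": False},
    {"rr": 1.3, "target_atr": 2.6, "won": True},
    {"rr": 1.6, "target_atr": 3.0, "won": True},
    {"rr": 2.0, "target_atr": 2.8, "won": False},
]

strict = replay(history, min_rr=1.5)
loose = replay(history, min_rr=1.2)
```

Because the function is pure, the same inputs always yield the same classification, which is what makes a replay comparable across threshold settings.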

Open replay tool
2026-05-19 · Eval

Conviction calibration chart on track record

Realized win rate per conviction band (40-49…80-89) plotted against the 45° ideal-calibration diagonal. Bubbles below the line are overconfident, bubbles on the line are well calibrated, and bubbles above it are underconfident. Computed live from closed theses.
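The per-band numbers behind such a chart could be computed roughly as below. This is a sketch under assumptions: the input shape (pairs of conviction and outcome) is hypothetical, and using the band midpoint as the point on the diagonal is one reasonable convention, not necessarily the one the chart uses.

```python
from collections import defaultdict


def calibration_by_band(closed):
    """closed: iterable of (conviction, won) pairs from closed theses."""
    stats = defaultdict(lambda: [0, 0])  # band lower bound -> [wins, total]
    for conviction, won in closed:
        lo = (conviction // 10) * 10
        stats[lo][0] += int(won)
        stats[lo][1] += 1

    rows = []
    for lo in sorted(stats):
        wins, total = stats[lo]
        realized = 100.0 * wins / total
        expected = lo + 4.5  # assumed: band midpoint, e.g. 44.5 for 40-49
        rows.append({
            "band": f"{lo}-{lo + 9}",
            "n": total,
            "realized_pct": realized,
            "expected_pct": expected,
            # Negative gap = below the diagonal = overconfident.
            "gap": realized - expected,
        })
    return rows


# Made-up sample of closed theses.
sample = [(45, True), (47, False), (62, True), (65, True), (68, False)]
rows = calibration_by_band(sample)
```

The `n` field matters as much as the gap: with only a handful of theses in a band, a large gap says little about calibration.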

See chart
2026-05-19 · Prompt

R:R rubric: regime-aware gate replaces flat 1.5

The Risk Manager replaced the flat 1.5:1 minimum with a regime-aware gate: 1.2 in trending markets (ADX≥25), 1.5 in chop (ADX 15-25), 1.8 in high-vol conditions (ADX<15 or ATR>6%). Bull/Bear analysts gained an explicit "R:R discipline" block: target distance must be ≥ 2.5× ATR(14); if the nearest technical level is closer, skip to the next major structural objective. This ended the 5-day approval drought.
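The thresholds above translate into a small decision function. The sketch below is illustrative only: the function names are hypothetical, ATR is taken as an absolute price distance and converted to a percentage of entry, and the high-vol check is assumed to take precedence when ATR>6% coincides with ADX≥25, since the entry does not state an ordering.

```python
def min_rr(adx, atr_pct):
    """Minimum reward-to-risk by regime, using the thresholds from the entry."""
    if adx < 15 or atr_pct > 6.0:   # high-vol (assumed to win over "trending")
        return 1.8
    if adx >= 25:                   # trending
        return 1.2
    return 1.5                      # chop: ADX in [15, 25)


def reward_to_risk(entry, stop, target):
    """Distance to target divided by distance to stop."""
    return abs(target - entry) / abs(entry - stop)


def target_far_enough(entry, target, atr, multiple=2.5):
    """Analyst discipline rule: target distance must be at least 2.5x ATR(14)."""
    return abs(target - entry) >= multiple * atr


def passes_gate(entry, stop, target, adx, atr):
    atr_pct = atr / entry * 100.0
    return reward_to_risk(entry, stop, target) >= min_rr(adx, atr_pct)
```

For example, a long at 100 with a stop at 98 and a target at 103 has R:R 1.5; it passes in a trending regime (minimum 1.2) and in chop (minimum 1.5) but fails in a high-vol regime (minimum 1.8).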

Before: 0 approvals over 5 days · last 3 attempts rejected R:R 1.0-1.2
After: 100% approval rate over next 24h · 7 theses · 3W/1L closed = 75% WR (replay predicted 73.9%)
Metric: Approval rate + closed-thesis WR
Replay this change
2026-05-18 · Infra

Sprint 5a.5 phase 4 cutover — orchestrator owns thesis-gen

Apps/web legacy frontend retired. Thesis generation cron runs in-process inside the dedicated orchestrator container instead of an HTTP hop from data-engine to apps/web. Single source of truth, simpler debug surface, fewer moving parts.

Before: data-engine → HTTP → apps/web cron · 2 services involved · double-write risk
After: orchestrator in-process cron · single service · ENABLE_AGENT_SCHEDULER=false on data-engine
Metric: Architecture clarity
2026-05-18 · Prompt

Debate-rebuttal defensive guards

MIMO LLM occasionally returns valid JSON that fails strict zod array validation, leaving fields undefined. The rebuttal prompt builder now guards every field access (Array.isArray, typeof string, finite-number checks) so partial outputs degrade gracefully into the next prompt block instead of throwing.
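The guard pattern described above is shown here as a Python analogue of the Array.isArray / typeof / finite-number checks; the production builder runs against zod-validated JavaScript objects, so this is a translation of the idea rather than the actual code. The field names (`thesis`, `key_points`, `conviction`) and the fallback string are hypothetical.

```python
import math


def safe_str(value, default=""):
    """Return the value only if it is a string."""
    return value if isinstance(value, str) else default


def safe_list(value):
    """Return non-empty string items only if the value is really a list."""
    if not isinstance(value, list):
        return []
    return [v for v in value if isinstance(v, str) and v.strip()]


def safe_num(value):
    """Return a finite float, or None for missing, NaN, infinite, or non-numeric."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value) if math.isfinite(value) else None


def build_rebuttal_block(output):
    """Build the next prompt block from whatever parts of the output are usable."""
    data = output if isinstance(output, dict) else {}
    lines = []
    thesis = safe_str(data.get("thesis"))
    if thesis:
        lines.append(f"Opponent thesis: {thesis}")
    for point in safe_list(data.get("key_points")):
        lines.append(f"- {point}")
    conviction = safe_num(data.get("conviction"))
    if conviction is not None:
        lines.append(f"Opponent conviction: {conviction:.0f}")
    return "\n".join(lines) or "Opponent output unavailable."
```

A partial output such as a thesis with missing key points and a NaN conviction still produces a usable one-line block instead of raising.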

Before: 1 rebuttal failure per cron cycle on average · pipeline caught + continued
After: 0 rebuttal failures observed · adversarial rebuttal complete on every run
Metric: Rebuttal success rate
2026-05-17 · Prompt

Conviction calibration prompt — anti-park-at-55

Thesis-engine prompt rewritten with explicit conviction-distribution rules. Forces 40-85 spread instead of clustering at 55. Each conviction value must be justified by counted signals: 40-49 weak, 50-59 moderate (minority), 60-69 strong, 70-79 very strong, 80-85 conviction trade.
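As a rough illustration of the band rules, the helper below maps a conviction value to its label and measures how much of a batch sits on the single most common value, which is one simple way to detect parking at 55. Both function names are hypothetical, and the real rules live in the prompt rather than in code like this.

```python
from collections import Counter

# (low, high, label) as listed in the entry; values outside 40-85 are out of range.
BANDS = [
    (40, 49, "weak"),
    (50, 59, "moderate"),
    (60, 69, "strong"),
    (70, 79, "very strong"),
    (80, 85, "conviction trade"),
]


def conviction_label(conviction):
    """Return the band label, or None if the value is outside 40-85."""
    for low, high, label in BANDS:
        if low <= conviction <= high:
            return label
    return None


def modal_share(convictions):
    """Fraction of theses sitting on the single most common conviction value."""
    if not convictions:
        return 0.0
    _, count = Counter(convictions).most_common(1)[0]
    return count / len(convictions)
```

A batch like [55, 55, 55, 60] has a modal share of 0.75, a sign of clustering, while an evenly spread batch scores much lower.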

Before: Modal conviction = 55 · WR 56% · EV +0.63%/trade · poor signal compression
After: Distribution spread 40-85 · per-band WR visible on /track-record calibration chart
Metric: Conviction distribution + per-band WR
See calibration
2026-05-17 · Infra

Market-candles rewritten DEX-only

The legacy Kraken+Bybit fetcher caused the May 12 thesis-gen stall: Bybit doesn't list DEX-only LSTs/wrapped tokens (BMX, SUPERFORM, WEETH, WBNB, …), so every miss put the symbol on a 600s cooldown. Rewritten against Hyperliquid candleSnapshot at the 1h interval. No CEX dependency, no per-symbol cooldown state machine, no Bybit affiliate code injection.
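For readers who want to reproduce the data source, a candleSnapshot request might look like the sketch below. The endpoint URL, the request body shape, and the response field names (`t`, `o`, `h`, `l`, `c`, `v`, with prices as strings) are stated from memory of Hyperliquid's public info API and should be treated as assumptions to verify against its documentation; the function names are hypothetical and this is not the desk's fetcher.

```python
import json
import urllib.request

HL_INFO_URL = "https://api.hyperliquid.xyz/info"  # assumed public info endpoint


def candle_request(coin, start_ms, end_ms, interval="1h"):
    """Build the assumed candleSnapshot request body (times in epoch milliseconds)."""
    return {
        "type": "candleSnapshot",
        "req": {
            "coin": coin,
            "interval": interval,
            "startTime": start_ms,
            "endTime": end_ms,
        },
    }


def fetch_candles(coin, start_ms, end_ms, timeout=10.0):
    """POST the request and return the decoded JSON (performs a network call)."""
    body = json.dumps(candle_request(coin, start_ms, end_ms)).encode("utf-8")
    req = urllib.request.Request(
        HL_INFO_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def parse_candle(row):
    """Convert one assumed response row into numeric OHLCV fields."""
    return {
        "t": int(row["t"]),
        "open": float(row["o"]),
        "high": float(row["h"]),
        "low": float(row["l"]),
        "close": float(row["c"]),
        "volume": float(row["v"]),
    }
```

Keeping request building and parsing separate from the network call makes the fetcher easy to test without hitting the venue.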

2026-05-16 · Infra

Reflection-agent weekly post-mortem live

Every 7 days the reflection-agent reads the last 50 closed theses, computes win-rate / EV / conviction distribution, asks MIMO to surface winning + losing patterns, and writes an addendum to agent_prompt_overrides. Active addendums get appended to every analyst's system prompt next run.
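The statistics step of that loop could look roughly like this. It is a sketch under assumptions: the record fields (`conviction`, `pnl_pct`) are hypothetical, a win is taken to mean positive PnL, and EV is taken to mean the mean PnL per trade; the LLM call and the write to agent_prompt_overrides are left out.

```python
from collections import Counter


def summarize(closed, window=50):
    """Summarize the most recent `window` closed theses for the reflection prompt."""
    recent = closed[-window:]
    if not recent:
        return None
    n = len(recent)
    wins = sum(1 for t in recent if t["pnl_pct"] > 0)       # assumed win definition
    ev = sum(t["pnl_pct"] for t in recent) / n              # mean PnL per trade
    bands = Counter(f"{(t['conviction'] // 10) * 10}s" for t in recent)
    return {
        "n": n,
        "win_rate_pct": 100.0 * wins / n,
        "ev_pct": ev,
        "conviction_bands": dict(bands),
    }


# Made-up closed theses, oldest first.
closed_theses = [
    {"conviction": 55, "pnl_pct": 2.0},
    {"conviction": 62, "pnl_pct": -1.0},
    {"conviction": 71, "pnl_pct": 3.0},
    {"conviction": 58, "pnl_pct": -2.0},
]
summary = summarize(closed_theses)
```

The resulting dictionary is the kind of compact context that could be handed to the model when asking it to surface winning and losing patterns.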

Before: Manual prompt tuning · changes ad-hoc · no historical context
After: Automatic weekly self-reflection · prompts adjust based on outcomes
Metric: Self-improvement loop closed
2026-05-15 · Infra

Strategy lock: 4 DEX verticals (HL + edgeX + Aster + Polymarket)

Q2 2026 market-share data: Hyperliquid ~50%, edgeX ~26%, Aster ~21%, together ~97% of perp DEX flow. Dropped dYdX/Drift/GMX (each <3%, deferred). CEX coverage kept only as cross-venue reference.

2026-04-27 · Prompt

Risk Manager rubric initial

First version of the Risk Manager prompt: validate stop loss at technical invalidation, position sizing 2-5% portfolio, ATR-based volatility adjustment, ADX<15 weak-trend reject, stochastic extreme warning. Flat R:R 1.5:1 minimum (later replaced 2026-05-19).

Methodology disclosure

EGOLDS publishes prompt rubrics, gate thresholds, scoring formulas and architectural decisions because the moat is the public outcome record + audit trail, not the secret sauce. Anyone reading this page can build a similar desk; nobody can build a similar track record without the same outcome data we have already published.