Index  ·  Data  ·  Evidence

Research infrastructure.

The indexed workspace used to study markets, separate candidates, and keep the evidence traceable.

How the research is organised: the index, the market data, the evaluation notebooks, and the records kept when a candidate reaches live capital. Strategy logic stays private. The surrounding evidence does not.

00_INDEX catalogue · 4,983 files scanned · Python · Notebooks · OANDA data
Research index Copied, classified, and made searchable. 00_INDEX · Inventory · Families

The working research root is organised around 00_INDEX: a classified research library, not a loose archive. It holds family summaries, copy manifests, dependency maps, and a dated trade-data inventory. When a number appears in a note, it traces to a file family, a source path, and a reason.

The latest inventory records ~5,000 non-cache files across 15 project families, including 287 Python source files, 67 notebooks, 96 parquet market-data files, 63 JSON configs, and 68 charts. The main families are NAS/SPX intraday VWAP, IDM+ORB hybrid, FX/XAU currency-strength, cross-asset OU/stat-arb, macro sentiment, forecast-to-fill, and 10 academic paper implementations. Failed branches stay in the index. They are evidence too.
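
A minimal sketch of how an inventory of this kind could be counted, assuming the first directory level names the project family. The cache markers, the extension-to-category mapping, and the function name are illustrative assumptions; the actual 00_INDEX classification rules are not published here.

```python
from collections import Counter
from pathlib import Path

# Hypothetical cache markers and extension mapping (assumptions, not the
# real 00_INDEX rules).
CACHE_DIRS = {"__pycache__", ".ipynb_checkpoints", ".git"}
CATEGORIES = {
    ".py": "python_source",
    ".ipynb": "notebook",
    ".parquet": "market_data",
    ".json": "json_config",
    ".png": "chart",
}


def build_inventory(root: Path) -> Counter:
    """Count non-cache files under root, grouped by (family, category)."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip anything that lives inside a cache directory.
        if any(part in CACHE_DIRS for part in rel.parts):
            continue
        # Assumption: the first directory level is the project family.
        family = rel.parts[0] if len(rel.parts) > 1 else "(root)"
        category = CATEGORIES.get(path.suffix.lower(), "other")
        counts[(family, category)] += 1
    return counts
```

Summing the counter's values gives the non-cache file total; grouping on the first key gives a per-family summary.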

Backtesting & evaluation Walk-forward, out-of-sample, stressed-cost. NAS · OU · FX/XAU

Each research family has its own burden of proof. NAS candidates go through rolling walk-forward folds. Cross-asset OU candidates are split into formation, validation, and final out-of-sample slices. FX/XAU research is separated from broad FX transfer tests — the useful result came from XAU-heavy strength, not a clean multi-pair FX edge.
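
The two splitting schemes described above can be sketched as follows. This is an illustration under stated assumptions, not the private evaluation code: the function names, window sizes, and the choice to roll forward by one test window are all hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_folds(
    n_obs: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward by one test window."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def three_way_split(
    n_obs: int, n_formation: int, n_validation: int
) -> Tuple[range, range, range]:
    """Split indices chronologically into formation, validation, and final slices."""
    if n_formation + n_validation >= n_obs:
        raise ValueError("no observations left for the final out-of-sample slice")
    a = n_formation
    b = n_formation + n_validation
    return range(0, a), range(a, b), range(b, n_obs)
```

In both schemes every test index lies strictly after every training index, so a candidate is never scored on data it was fitted to.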

Selected published results: the NAS V3 risk pass records six stressed-cost folds, with the best risk-adjusted candidate at 1.216 average Sharpe and 47.32% summed test return. The top-5 OU sleeve records a final out-of-sample Sharpe of 2.19 with max drawdown near -0.93%. The best stressed FX/XAU strength candidate records 7.63% total return and -4.60% max drawdown.
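
For readers unfamiliar with the two headline metrics, here is a sketch of one common way to compute them. The annualisation factor of 252 periods, the zero risk-free rate, and the compounding convention are assumptions; the published figures may use different conventions.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualised Sharpe ratio of per-period returns, risk-free rate taken as zero."""
    sd = stdev(returns)
    if sd == 0:
        raise ValueError("returns have zero variance")
    return mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough fall of the compounded equity curve (zero or negative)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst
```

An average Sharpe across folds would then be the mean of `sharpe_ratio` applied to each fold's test returns.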

Negative results stay visible. Cross-asset VWAP mean reversion failed under stressed costs. Raw FX session breakout failed. The macro-sentiment replication rejected itself when AUC stayed near chance. The infrastructure keeps refusals beside the candidates that passed.
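
A stressed-cost check can be as simple as re-scoring the same trades with inflated costs. The sketch below assumes a flat round-trip cost per trade and hypothetical stress multipliers of 1x, 2x, and 3x; the actual cost model and pass criteria are not published.

```python
from typing import List, Sequence


def net_returns(
    gross_returns: Sequence[float],
    cost_per_trade: float,
    stress_multiplier: float = 1.0,
) -> List[float]:
    """Subtract a round-trip cost, scaled by a stress multiplier, from each trade."""
    cost = cost_per_trade * stress_multiplier
    return [g - cost for g in gross_returns]


def survives_stress(
    gross_returns: Sequence[float],
    cost_per_trade: float,
    multipliers: Sequence[float] = (1.0, 2.0, 3.0),  # hypothetical stress levels
) -> bool:
    """True only if the summed net return stays positive at every cost level."""
    return all(
        sum(net_returns(gross_returns, cost_per_trade, m)) > 0 for m in multipliers
    )
```

A branch that is profitable at quoted costs but fails at the higher multipliers would be recorded as a failure under this rule.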

Capital testing Capital, small size, daily reconciliation. Execution · Reporting · Limits

Live tests run at deliberately small size. The purpose is not scale — it is to expose research to real fills, real spreads, broker statements, overnight handling, and the behaviour of the operator under live conditions.

Position sizing is rule-based. Each candidate has a pre-declared sizing approach, a per-position cap, and a portfolio-level cap. No single test, and no combination of tests, risks an outcome that would compromise the research programme.
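
The interaction between a per-position cap and a portfolio-level cap can be illustrated in a few lines. The function name and the expression of risk as a fraction of equity are assumptions for illustration; the specific sizing parameters stay private.

```python
def capped_size(
    requested_risk: float,
    open_risk: float,
    per_position_cap: float,
    portfolio_cap: float,
) -> float:
    """Return the risk (as a fraction of equity) a new position may take."""
    if min(requested_risk, open_risk, per_position_cap, portfolio_cap) < 0:
        raise ValueError("risk inputs must be non-negative")
    # Room left under the portfolio cap after risk already open.
    headroom = max(0.0, portfolio_cap - open_risk)
    return min(requested_risk, per_position_cap, headroom)
```

Taking the minimum of the three limits means the tightest rule always wins, and a full portfolio returns zero rather than a reduced position.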

Reconciliation is built around records. Raw OANDA transaction exports, dashboard runtime state, and signal logs are all indexed. Every fill is matched to the journal or broker statement. Drawdown alarms pause new entries until the cause is documented.
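
A sketch of the matching step and the alarm rule, assuming broker fills and journal entries share an order identifier. The `Fill` fields, the price tolerance, and both function names are hypothetical and do not reflect the actual export schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Fill:
    order_id: str   # hypothetical key shared by broker export and journal
    instrument: str
    units: int
    price: float


def reconcile(
    broker: List[Fill], journal: List[Fill], price_tol: float = 1e-6
) -> Dict[str, List[str]]:
    """Classify every order id as matched, mismatched, broker-only, or journal-only."""
    journal_by_id = {f.order_id: f for f in journal}
    report: Dict[str, List[str]] = {
        "matched": [], "mismatched": [], "broker_only": [], "journal_only": []
    }
    for fill in broker:
        entry = journal_by_id.pop(fill.order_id, None)
        if entry is None:
            report["broker_only"].append(fill.order_id)
        elif (
            (entry.instrument, entry.units) == (fill.instrument, fill.units)
            and abs(entry.price - fill.price) <= price_tol
        ):
            report["matched"].append(fill.order_id)
        else:
            report["mismatched"].append(fill.order_id)
    # Whatever is left in the journal never appeared in the broker export.
    report["journal_only"] = sorted(journal_by_id)
    return report


def entries_allowed(drawdown: float, alarm_level: float, cause_documented: bool) -> bool:
    """New entries pause once drawdown breaches the alarm, until a cause is logged."""
    breached = drawdown <= alarm_level  # both expressed as negative fractions
    return (not breached) or cause_documented
```

A clean day under this sketch is one where every list except `matched` is empty.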

Tools & stack Named from the index. Python · Parquet · Notebooks
Index layer
00_INDEX — family summaries, copy manifests, dependency maps, file-reference samples, manual notes, and a dated trade-data inventory. 15 numbered project families, ~5,000 non-cache files.
Language
Python throughout. pandas, NumPy, and scikit-learn as the base layer. Optuna for walk-forward hyperparameter tuning. All research, backtesting, risk checks, and reporting run on the same stack.
Market data
OANDA REST API candle caches (96 parquet files), raw transaction exports, CSV research outputs, and dated snapshots. Data separated by purpose: market caches, strategy outputs, and live execution artifacts.
Research families
NAS/SPX intraday VWAP, IDM+ORB hybrid, FX/XAU currency-strength, cross-asset OU/stat-arb, macro sentiment (FinBERT+GDELT), forecast-to-fill, GA/MSSR, and 10 academic paper implementations. Each family lives in a numbered directory with its own backtest outputs.
Backtesting
Walk-forward folds, held-out OU slices, stressed-cost tests, final out-of-sample checks. Strategy-level CSVs and portfolio summaries. Successful branches are kept beside the branches that failed.
Execution
Live bots deployed on PythonAnywhere. OANDA REST API for order execution. Two segregated accounts: one for multibot strategies, one for VWAP. Dashboard monitors runtime state, drawdown alarms, and signal logs.
Broker records
Reconciled through broker statements, transaction exports, dashboard state, and runtime logs. Every fill matched back to the journal or broker statement.
Version control
Git for code, dated snapshots for data, named output files for results. Figures that cannot be traced to a notebook, CSV, or inventory entry are not published.
ML & AI infrastructure Models as components, not convictions. Signal · Regime · Autonomy

Machine learning enters the stack at three levels. At the signal level, statistical models evaluate regime context and generate position signals. Every output is auditable against its input features. At the monitoring level, automated drawdown checks, correlation tracking, and regime detection feed into position sizing logic without human override.

At the research level, LLM-assisted synthesis produces daily market condition summaries published as reading material, not trading directives. Separately, an autonomous loop prototype (internally called LAM) uses an LLM reasoning layer to observe market state, form hypotheses, and execute paper trades through the same OANDA API client used by the live stack. LAM runs in paper mode only and exists to test whether an LLM-driven decision process can generate auditable trade rationale at speed.

Models that fail are documented alongside models that work. The macro-sentiment replication (FinBERT + GDELT) built a full feature pipeline across 3,800+ cached files, then rejected itself when predictive accuracy stayed near chance. That negative result sits in the same index as the OU-arbitrage sleeve that passed out-of-sample checks.
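
To make "near chance" concrete: an AUC of 0.5 means the model ranks a random positive above a random negative no more often than a coin flip. The sketch below computes AUC directly from that definition and applies a rejection gate. The `min_edge` threshold of 0.05 is a hypothetical value, not the criterion used in the replication.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Pairwise comparison: O(n_pos * n_neg), fine for a sketch.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def passes_auc_gate(auc: float, min_edge: float = 0.05) -> bool:
    """Reject a model whose AUC is within min_edge of the 0.5 chance level."""
    return auc - 0.5 >= min_edge
```

The pairwise loop is quadratic in sample size; a rank-based implementation would be preferable on large held-out sets.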

Signal models
Regime-conditional position signals. HMM regime detection, Ornstein-Uhlenbeck mean-reversion z-scores, and walk-forward Optuna-tuned classifiers.
LLM synthesis
Daily market condition review via Anthropic API. Structured output, published as context not as instruction.
Autonomous agent
LAM (Large Action Model) — observer, reasoner, executor architecture. Paper trading only. Tests whether LLM-driven decisions produce auditable rationale.
Failed models
FinBERT+GDELT macro sentiment (AUC near 0.5), raw FX session breakout, cross-asset VWAP mean reversion under stressed costs. Documented and preserved in the index.
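
As an illustration of the Ornstein-Uhlenbeck z-score named under Signal models, the sketch below fits a discretised OU process as an AR(1) regression on a spread and scores the latest value against the fitted equilibrium. This is a textbook construction under stated assumptions, not the private signal logic; the function name and the ordinary-least-squares fit are illustrative choices.

```python
import math
from statistics import mean
from typing import Sequence


def ou_zscore(spread: Sequence[float]) -> float:
    """Z-score of the latest spread value under a discretised OU (AR(1)) fit."""
    if len(spread) < 4:
        raise ValueError("need at least four observations")
    x, y = spread[:-1], spread[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("spread is constant")
    # OLS fit of x[t+1] = a + b * x[t] + noise.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    if not 0.0 < b < 1.0:
        raise ValueError("no mean reversion detected (AR(1) slope outside (0, 1))")
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    resid_var = sum(r * r for r in resid) / (len(resid) - 2)
    if resid_var == 0:
        raise ValueError("fit has zero residual variance")
    mu = a / (1.0 - b)                               # long-run equilibrium level
    sigma_eq = math.sqrt(resid_var / (1.0 - b * b))  # stationary standard deviation
    return (spread[-1] - mu) / sigma_eq
```

A large positive or negative value indicates the spread sits far from its fitted equilibrium; how that maps to a position is strategy logic and is out of scope here.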
Deployment Where the code runs. PythonAnywhere · OANDA · MT5

Live strategies run on PythonAnywhere, executing through the OANDA REST API. Two segregated OANDA sub-accounts separate multibot strategies from VWAP strategies. Each account has independent position limits, drawdown alarms, and reconciliation records. The dashboard monitors both accounts in a single view.

A separate MT5-based execution path is under development for prop-firm evaluation (FTMO). This branch uses the same research infrastructure and evaluation standards but targets a different broker and execution environment. It is not yet live.

Open material What is public, what is private, and why. Boundary
In public view
  • This site — method, infrastructure, and operating discipline.
  • The News page — working papers, note excerpts, and selected research outcomes.
  • Stack components, data families, and the principles behind sizing and reporting.
  • Selected capital-test outcomes, where they can be reconstructed from a versioned record.
Kept private
  • Strategy logic — the rules a candidate uses to enter, exit, and stay out.
  • Specific sizing parameters and per-position limits.
  • Individual live signals, their timing, and raw account records.
  • Credentials, execution switches, and anything whose disclosure would weaken the test rather than inform the reader.

The boundary above exists so that the public record can be honest about method without exposing signals, account records, or execution machinery.

Correspondence Discuss the methodology. Replies within three working days

Questions on data, evaluation, or live-test reporting are welcome.

bilal@monolithresearch.uk