433 strategies enter. ~34 survive. Here is every filter they must pass — and why each one exists.
The Funnel
10 gates. Each one eliminates strategies that lack a genuine edge. The pipeline is live — 5 phases complete, 5 remaining.
Deep Dive
No shortcuts, no hand-waving. Here is exactly what each phase does, why it matters, what the gate criteria are, and what gets eliminated.
Strategy mining from every credible source I can find. 46 research rounds so far, spanning academic papers, trading books, AI-assisted research, and unconventional domains. Sources include Carver, Chan, Clenow, Kaufman, Elder, AQR, Quantpedia, MQL5 articles, GitHub repositories, and even physics-based models drawn from thermodynamics, chaos theory, signal processing, and quantum mechanics.
Each idea gets implemented as an Expert Advisor in MQL5. The goal is not to find "the one perfect strategy" — it is to cast the widest possible net across diverse edge types and let the pipeline do the filtering.
The quality of the final portfolio depends entirely on the diversity and intellectual rigor of the research. I do not just test one idea — I systematically mine every credible source available. A portfolio built from a single edge type is fragile. A portfolio built from 81+ distinct edge types has structural diversification that no amount of parameter tuning can replicate.
Every EA is tested across 23 symbols — forex majors and minors, gold, silver, oil, and indices — using the development period only (2017–2022) with real tick data (MT5 Model 4: Every Tick Based on Real Ticks). The out-of-sample period (2023–2025) is deliberately withheld at this stage to preserve it for walk-forward validation in Phase 4. Two MetaTrader terminals running 24/7, each processing backtests in parallel. This is not sampling — it is exhaustive testing across every instrument the strategy could theoretically trade.
A strategy that only works on one symbol is not robust — it probably found noise, not signal. Testing across 23 diverse instruments forces the strategy to prove it captures a real market phenomenon, not a statistical accident. If a trend-following strategy only profits on EURUSD but fails on every other pair, it is not a trend-following strategy — it memorized EURUSD price history.
~88% of all strategies. Most EAs fail here because they either trade too rarely (insufficient data for conclusions), have razor-thin edges that disappear across instruments, or show acceptable returns only with unacceptable drawdowns. This is the most brutal filter in the pipeline — and intentionally so.
For each surviving EA, I test 100–200+ different parameter configurations. If the strategy only works with one exact combination of numbers, it is overfitted — it memorized the training data rather than learning a genuine pattern. Parameter sweeps reveal whether the edge lives in the concept or in the specific numbers the optimizer chose.
Curve-fitting is the #1 killer in quantitative trading. A robust strategy should work across a neighborhood of parameter values, not just the single combination the optimizer found. This is the fundamental difference between a strategy that discovered a real market edge and one that memorized historical noise.
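As a rough illustration of what a sweep gate can measure, here is a minimal Python sketch, not the pipeline's actual code, that compares the optimizer's best profit factor against the average of its parameter neighborhood. The function name, the example values, and the interpretation thresholds are all hypothetical:

```python
def plateau_score(sweep_results, best_params, neighborhood):
    """Ratio of the neighborhood's mean profit factor to the optimizer's best.

    sweep_results: dict mapping a parameter setting to its backtest PF.
    A ratio near 1.0 suggests a stable plateau; a ratio well below 1.0
    means the best result towers over its neighbors, the signature of
    curve-fitting.
    """
    best_pf = sweep_results[best_params]
    neighbor_pfs = [sweep_results[p] for p in neighborhood if p in sweep_results]
    return sum(neighbor_pfs) / len(neighbor_pfs) / best_pf

# Hypothetical sweep over a single lookback parameter.
sweep = {10: 1.5, 12: 1.6, 14: 1.55, 16: 0.9}
print(plateau_score(sweep, best_params=12, neighborhood=[10, 14]))
```

The same idea extends to multi-dimensional parameter grids: the neighborhood is simply every setting within one step of the optimum along each axis.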
Two-part validation. First, Rolling Walk-Forward: the strategy is tested across 4+ overlapping anchored folds — for example DEV 2017-2020 / HO 2021-2022, then DEV 2017-2021 / HO 2022-2023, and so on. The strategy must perform on ALL out-of-sample windows, not just one lucky split. This eliminates strategies that only worked in one specific market period. Second, Commission Test: the strategy is tested for the first time across the full history (2017–2025) with real ECN commissions ($7/lot round-trip) included. This is deliberately the first phase to use the complete dataset — ensuring that walk-forward validation was performed on genuinely unseen data.
In-sample performance means nothing. The only question that matters is: does this strategy make money on data it has never seen? A single train/test split can get lucky. That is why I use rolling walk-forward with multiple out-of-sample windows — the strategy must perform consistently across different time periods, not just one holdout set. If it cannot profit on unseen data across multiple windows, it is worthless — no matter how impressive the backtest looks.
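The anchored fold scheme from the example above (DEV 2017-2020 / HO 2021-2022, then DEV 2017-2021 / HO 2022-2023, and so on) can be sketched in a few lines of Python. This is illustrative scaffolding, not the pipeline's actual tooling; the function name and year arithmetic are assumptions:

```python
def anchored_folds(start, end, min_dev, ho):
    """Generate anchored walk-forward folds over whole years.

    The development window always starts at `start` and grows by one
    year per fold; each holdout window covers the `ho` years that
    immediately follow it. Returns (dev_years, holdout_years) pairs
    as inclusive `range` objects.
    """
    folds = []
    dev_end = start + min_dev - 1          # last year of the first DEV window
    while dev_end + ho <= end:
        dev = range(start, dev_end + 1)
        hold = range(dev_end + 1, dev_end + ho + 1)
        folds.append((dev, hold))
        dev_end += 1                        # anchor fixed, DEV grows by a year
    return folds

for dev, hold in anchored_folds(2017, 2025, min_dev=4, ho=2):
    print(f"DEV {dev.start}-{dev.stop - 1} / HO {hold.start}-{hold.stop - 1}")
```

With a 2017 anchor, a 4-year minimum development window, and 2-year holdouts, this yields exactly the four overlapping folds the text describes.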
Trading is not free. Many strategies look profitable until you add real costs. A strategy with PF 2.1 can drop to PF 1.3 after commissions — a 38% decline. If the edge does not exceed the cost of trading, there is no edge. Commission testing eliminates strategies whose apparent profits are really just paying for the privilege of trading.
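The erosion is easy to demonstrate with a toy example. The trade P&L figures below are hypothetical; the $7/lot round-trip commission is the figure quoted above:

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (absolute value)."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return gains / losses if losses else float("inf")

def apply_commission(trade_pnls, lots_per_trade, commission_per_lot=7.0):
    """Subtract a round-trip commission from every trade's P&L."""
    return [p - lots * commission_per_lot
            for p, lots in zip(trade_pnls, lots_per_trade)]

# Hypothetical gross trades: PF = 180 / 90 = 2.0 before costs.
trades = [100, -50, 80, -40]
net = apply_commission(trades, lots_per_trade=[1, 1, 1, 1])
print(profit_factor(trades), profit_factor(net))
```

Four trades at one lot each shave the profit factor from 2.0 to roughly 1.6; a strategy trading smaller average wins would be hit far harder.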
Apply realistic adverse conditions to every surviving strategy: 2 pips additional slippage on every trade, 2x spread widening (simulating volatile market conditions), and $14/lot commission (double the normal ECN rate). This simulates what real trading looks like during news events, low-liquidity sessions, and unfavorable broker conditions.
Live trading conditions are systematically worse than backtests. Spreads widen during news releases. Slippage is a fact of life, especially for retail traders. By testing with doubled costs and widened spreads, I am stress-testing the edge under conditions that are realistic for prop firm and retail trading. A strategy that barely survives normal conditions will collapse under stress.
The high pass rate at this stage is not a sign of a weak filter — it is evidence that phases 1–4 already eliminated the fragile strategies. The survivors at this point have edges wide enough to absorb realistic cost increases.
Extreme adversity simulation. 5 pips slippage (2.5x the MEDIUM level), 3x spread widening, $20/lot commission (nearly 3x normal), and — critically — randomly reject 10% of trades. That last condition simulates broker rejections, requotes, and connectivity issues that every live trader encounters but no standard backtest models.
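A stripped-down sketch of the cost-injection idea, including the random 10% rejection, might look like the following. The pip value, the trade lists, and the function shape are assumptions for illustration; the 3x spread widening is omitted because it requires per-trade spread data:

```python
import random

def harsh_stress(trade_pnls, lots, pip_value=10.0, seed=42,
                 slippage_pips=5.0, commission_per_lot=20.0, reject_rate=0.10):
    """Apply HARSH-level costs and randomly drop `reject_rate` of trades,
    simulating broker rejections, requotes, and connectivity failures.

    Returns the surviving trades' P&L after slippage and commission.
    """
    rng = random.Random(seed)               # seeded for reproducibility
    survivors = []
    for pnl, lot in zip(trade_pnls, lots):
        if rng.random() < reject_rate:
            continue                        # trade rejected: never filled
        cost = (slippage_pips * pip_value + commission_per_lot) * lot
        survivors.append(pnl - cost)
    return survivors

print(harsh_stress([100, -50, 80, -40, 60], lots=[1] * 5))
```

Running the same strategy through several rejection seeds, as Phase 7 does, shows whether the equity curve depends on which particular trades happened to fill.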
This phase was added mid-project because I was not confident that MEDIUM stress was severe enough. In real trading, the worst conditions always exceed your expectations. HARSH ensures the portfolio can absorb scenarios that are deliberately worse than reality.
Run every surviving strategy with 5 different random seeds (42, 17, 99, 7, 2026). Random seeds affect order execution simulation, tick generation, and timing within the backtesting engine. If a strategy's results change dramatically depending on the random seed, its "edge" is an artifact of one specific simulation path — not a real market phenomenon.
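A simple way to summarize seed sensitivity is the coefficient of variation of a headline metric across seeds. This sketch is illustrative; the 15% threshold is an assumption for the example, not a published gate:

```python
import statistics

def seed_stability(pf_by_seed, max_cv=0.15):
    """Coefficient of variation of profit factor across random seeds.

    Returns (cv, stable): a low cv means the result barely depends on
    the simulation path; a high cv means the 'edge' may be an artifact
    of one lucky sequence of simulated fills.
    """
    values = list(pf_by_seed.values())
    cv = statistics.stdev(values) / statistics.mean(values)
    return cv, cv <= max_cv

# Hypothetical profit factors from the five seeds named above.
cv, stable = seed_stability({42: 1.8, 17: 1.75, 99: 1.85, 7: 1.8, 2026: 1.8})
print(cv, stable)
```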
Most retail strategy developers never run this test. They optimize on one simulation path, see a great result, and assume it is real. Multi-seed testing is one of the clearest ways to separate genuine edges from simulation artifacts.
Ten rigorous sub-phases of statistical testing, designed to answer one question: is this edge real, or did it appear by chance because I tested 433 strategies?
Strategies must be genuinely independent. Results: mean pairwise |r| = 0.037, zero pairs above 0.50, and 92% of pairs below 0.20. If two strategies are highly correlated, deploying both creates hidden concentration risk.
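The quantities reported above, mean pairwise |r| and the share of low-correlation pairs, can be computed with nothing more than the standard library. This is a sketch under the assumption that each strategy is represented by an aligned series of periodic returns:

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_summary(returns_by_strategy):
    """Mean absolute pairwise correlation, plus the share of pairs
    below 0.20, mirroring the independence gate described above."""
    pairs = itertools.combinations(returns_by_strategy.values(), 2)
    abs_r = [abs(pearson(a, b)) for a, b in pairs]
    share_low = sum(1 for r in abs_r if r < 0.20) / len(abs_r)
    return sum(abs_r) / len(abs_r), share_low
```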
Adjusts for multiple-testing bias. With 433 strategies tested, some will look good purely by chance; that is the multiple comparisons problem. The Deflated Sharpe Ratio (DSR) corrects each strategy's Sharpe estimate for the number of trials behind it. Monte Carlo simulation (10,000 iterations) checks robustness. The Benjamini-Hochberg procedure then controls the false discovery rate (FDR) across the full family of tests.
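The Benjamini-Hochberg step-up procedure itself is short enough to sketch. This is the textbook algorithm, not the pipeline's implementation:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected under the Benjamini-Hochberg
    step-up procedure, which controls the false discovery rate at `alpha`.

    Sort p-values ascending; find the largest rank k such that
    p(k) <= k * alpha / m; reject everything up to rank k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank               # largest rank passing its threshold
    return sorted(order[:cutoff])

# Only the first p-value survives: 0.03 fails its rank-2 threshold of 0.025.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Note the step-up structure: a p-value can be rejected even if it fails its own threshold, as long as some larger p-value below it in rank passes.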
Do strategies correlate more during extreme market conditions? This checks for synchronized losses — the worst-case scenario for any portfolio. Result: correlations actually decrease under stress (-0.011), meaning the portfolio diversifies better precisely when it matters most.
No strategy should depend on a single month or season. All 12 months must be profitable across the portfolio. A strategy that only works in January is a calendar anomaly trade — and needs to be identified as such, not deployed as if it were a year-round edge.
Test nearby parameter values to confirm the strategy sits on a stable plateau, not a sharp peak. A plateau means the edge is robust to small changes in market microstructure. A peak means the strategy will likely fail the moment conditions shift slightly.
Remove the top 5% most profitable trades and recalculate the equity curve. If the strategy collapses without its best trades, it depends on lucky outliers and will likely fail live. Based on 3x World Cup Champion Kevin Davey's research across 2,000+ strategies: those passing this test show 25–30% better real-time performance.
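The mechanics are simple, which is part of why the test is so effective. A sketch, with a hypothetical trade list; chronological order is preserved so the trimmed equity curve can be rebuilt directly:

```python
def strip_top_trades(trade_pnls, fraction=0.05):
    """Drop the `fraction` most profitable trades (at least one),
    preserving chronological order. Rebuild the equity curve from the
    result: if it collapses, the strategy depends on lucky outliers."""
    n_remove = max(1, int(round(len(trade_pnls) * fraction)))
    top = set(sorted(range(len(trade_pnls)),
                     key=lambda i: trade_pnls[i], reverse=True)[:n_remove])
    return [p for i, p in enumerate(trade_pnls) if i not in top]

# One monster winner hides an otherwise mediocre trade distribution.
print(strip_top_trades([10, 200, -5, 30, -20]))
```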
Combinatorially Symmetric Cross-Validation (CSCV), from Bailey, Borwein, López de Prado, and Zhu (2014). It splits the backtest data into all possible train/test combinations and measures how often the best in-sample configuration is also good out-of-sample. A Probability of Backtest Overfitting (PBO) above 0.40 means the entire selection process, not just one strategy, is likely overfit.
Rolling 12-month Profit Factor over the full 9-year backtest period. A strategy with overall PF=1.8 that declined from 2.5 in 2017 to 1.1 in 2025 has a dying edge — deploying it means betting on a trend that is already fading. Walk-forward tells you if a strategy works on unseen data; edge decay tells you if it is getting weaker.
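Computing the rolling series is straightforward given a list of monthly P&L figures. This sketch assumes one net P&L number per month; the decay judgment itself (fitting a trend to the series) is left out:

```python
def rolling_profit_factor(monthly_pnl, window=12):
    """Profit factor over each trailing `window`-month span.

    A persistent downtrend in this series, e.g. 2.5 early in the
    history sliding toward 1.1 at the end, indicates a decaying edge
    even when the full-history PF still looks healthy.
    """
    out = []
    for end in range(window, len(monthly_pnl) + 1):
        chunk = monthly_pnl[end - window:end]
        gains = sum(p for p in chunk if p > 0)
        losses = -sum(p for p in chunk if p < 0)
        out.append(gains / losses if losses else float("inf"))
    return out

# A toy 24-month history with a perfectly steady edge: flat PF of 2.0.
print(rolling_profit_factor([10, -5] * 12))
```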
A Wald-Wolfowitz runs test on the win/loss sequence detects whether profitable trades are randomly distributed or clustered in time. Clustering means the strategy only works in specific market regimes. Combined with a profit concentration ratio: if the top 20% of months account for more than 70% of total profit, the strategy is fragile.
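For reference, here is the z-statistic form of the Wald-Wolfowitz runs test on a 0/1 win/loss sequence. This is the standard textbook formula, not the pipeline's code, and it assumes the sequence contains at least one win and one loss:

```python
import math

def runs_test_z(outcomes):
    """Wald-Wolfowitz runs test on a list of 1s (wins) and 0s (losses).

    Returns the z-statistic: a large negative z means wins and losses
    cluster into long streaks; a large positive z means they alternate
    more than chance would allow. |z| > ~1.96 is significant at 5%.
    """
    n1 = sum(outcomes)                      # wins
    n2 = len(outcomes) - n1                 # losses
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1                # expected runs under randomness
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mu) / math.sqrt(var)

print(runs_test_z([1, 1, 1, 1, 0, 0, 0, 0]))   # heavily clustered: z < 0
print(runs_test_z([1, 0, 1, 0, 1, 0, 1, 0]))   # perfectly alternating: z > 0
```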
Markets alternate between trending, ranging, and volatile regimes. A strategy that only profits in one regime is a hidden risk. Each EA is tested across 3 volatility regimes (low, normal, high) using ATR-based classification. A strategy must be profitable in all three regimes to PASS. The results also inform portfolio construction — mixing regime-dominant EAs creates genuine diversification beyond simple correlation.
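A minimal sketch of ATR-based regime classification, assuming OHLC bar data is available as plain lists. The quantile thresholds (33rd/67th percentiles of the ATR's own history) are an illustrative choice, not necessarily the ones used in the pipeline:

```python
def atr(highs, lows, closes, period=14):
    """Average True Range over the last `period` bars (simple average)."""
    trs = []
    for i in range(1, len(closes)):
        tr = max(highs[i] - lows[i],
                 abs(highs[i] - closes[i - 1]),
                 abs(lows[i] - closes[i - 1]))
        trs.append(tr)
    return sum(trs[-period:]) / min(period, len(trs))

def classify_regime(current_atr, atr_history, low_q=0.33, high_q=0.67):
    """Label volatility 'low', 'normal', or 'high' by where the current
    ATR falls within its own history (simple quantile thresholds)."""
    ranked = sorted(atr_history)
    lo = ranked[int(low_q * (len(ranked) - 1))]
    hi = ranked[int(high_q * (len(ranked) - 1))]
    if current_atr <= lo:
        return "low"
    if current_atr >= hi:
        return "high"
    return "normal"
```

Each trade is then tagged with the regime in force at entry, and the per-regime P&L breakdown decides the PASS/FAIL.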
Select and weight strategies for deployment under strict diversification rules. This is the only phase where human judgment plays a role — the data from phases 1–8 provides the inputs, but the final portfolio composition requires understanding of correlation dynamics, market regime exposure, and practical deployment constraints.
Three-stage deployment protocol. First: demo account forward testing for a minimum of 2 weeks, comparing live fills against backtest expectations. If the forward test metrics match within acceptable variance, stage two: deploy to a funded prop firm account on a dedicated VPS. Stage three: continuous monitoring through Myfxbook with automated alerts for performance deviation.
Philosophy
The pipeline is not a marketing gimmick. It is the core engineering process that determines whether capital gets risked or not. Here is what makes it different from what you will find elsewhere.
Most commercial EAs are sold with none of this: no walk-forward validation, no commission testing, no stress testing, no statistical correction for multiple-testing bias. The vendors optimize on 5 years of data, show you the best curve, and call it a product. The 10-phase pipeline exists specifically to make that kind of self-deception impossible.
Every filter exists for one reason: to prevent a strategy from losing real money. It is cheaper to kill a strategy in a backtest than to kill it in a live account. The pipeline is designed so that by the time a strategy touches real capital, it has already survived conditions worse than what the live market will throw at it.
Of 433 strategies built, approximately 34 survive to genuine status. That is an 8% survival rate. This is not a sign of failure — it is a sign that the filters are working. The strategies that make it through have been validated at a level that most institutional quant funds would recognize as rigorous.
Not just the winners. The Strategy Archive includes every strategy I have built — including the 382+ that failed. You can see exactly why each one was rejected. I publish the failures because they are the evidence that the pipeline works.
Current Metrics
Next Steps
Browse the complete Strategy Archive or subscribe to get notified when new strategies pass the pipeline.
Every strategy I have built — with full backtest data, sweep results, walk-forward charts, and commission reports. Including every failure.
Browse the Archive

Get notified when new strategies survive the pipeline, new research rounds launch, or deployment milestones are reached.