The 10-Phase Pipeline.

433 strategies enter. ~34 survive. Here is every filter they must pass — and why each one exists.

433 enter. Most won't survive.

10 gates. Each one eliminates strategies that lack a genuine edge. The pipeline is live — 5 phases complete, 5 remaining.

433
strategies built
200
stress PASS
5
gates remaining
?
will survive

Every Phase, Explained.

No shortcuts, no hand-waving. Here is exactly what each phase does, why it matters, what the gate criteria are, and what gets eliminated.

Phase 01
Research & Build
Automated
What It Does

Strategy mining from every credible source I can find. 46 research rounds so far, spanning academic papers, trading books, AI-assisted research, and unconventional domains. Sources include Carver, Chan, Clenow, Kaufman, Elder, AQR, Quantpedia, MQL5 articles, GitHub repositories, and even physics-based models drawn from thermodynamics, chaos theory, signal processing, and quantum mechanics.

Each idea gets implemented as an Expert Advisor in MQL5. The goal is not to find "the one perfect strategy" — it is to cast the widest possible net across diverse edge types and let the pipeline do the filtering.

Why It Matters

The quality of the final portfolio depends entirely on the diversity and intellectual rigor of the research. I do not just test one idea — I systematically mine every credible source available. A portfolio built from a single edge type is fragile. A portfolio built from 81+ distinct edge types has structural diversification that no amount of parameter tuning can replicate.

Gate Criteria
EA compiles without errors in MQL5
Strategy logic is distinct from existing EAs in the pipeline
Result
433
EAs Compiled
81+
Edge Types
46
Research Rounds
Edge Types Explored
Calendar Anomalies · Session Breakouts · Mean Reversion · Overnight Drift · Volatility Regime · Microstructure · Trend Following · Gold & Commodity · Fix Window Exploitation · Physics-Based Models
Phase 02
Baseline Screening
Automated
What It Does

Every EA is tested across 23 symbols — forex majors and minors, gold, silver, oil, and indices — using the development period only (2017–2022) with real tick data (MT5 Model 4: Every Tick Based on Real Ticks). The out-of-sample period (2023–2025) is deliberately withheld at this stage to preserve it for walk-forward validation in Phase 4. Two MetaTrader terminals run 24/7, each processing backtests in parallel. This is not sampling — it is exhaustive testing across every instrument the strategy could theoretically trade.

Why It Matters

A strategy that only works on one symbol is not robust — it probably found noise, not signal. Testing across 23 diverse instruments forces the strategy to prove it captures a real market phenomenon, not a statistical accident. If a trend-following strategy only profits on EURUSD but fails on every other pair, it is not a trend-following strategy — it memorized EURUSD price history.

Gate Criteria
Profit Factor > 1.30
Total Trades > 200 (statistical significance)
Max Drawdown < 12%
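The three criteria above reduce to a few comparisons. A minimal sketch — the function name and the sample numbers are mine, not from the pipeline code:

```python
def passes_baseline_gate(profit_factor: float, total_trades: int, max_drawdown_pct: float) -> bool:
    """Phase 2 thresholds: PF > 1.30, trades > 200, max DD < 12%."""
    return (profit_factor > 1.30
            and total_trades > 200
            and max_drawdown_pct < 12.0)

print(passes_baseline_gate(1.45, 350, 9.5))   # clears all three gates
print(passes_baseline_gate(1.45, 150, 9.5))   # too few trades for significance
```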
What Gets Eliminated

~82% of all strategies. Most EAs fail here because they either trade too rarely (insufficient data for conclusions), have razor-thin edges that disappear across instruments, or show acceptable returns only with unacceptable drawdowns. This is the most brutal filter in the pipeline — and intentionally so.

Result
80+
Strategies Survive
30,000+
Backtests Generated
~18%
Pass Rate
Phase 03
Robustness Testing (Parameter Sweeps)
Automated
What It Does

For each surviving EA, I test 100–200+ different parameter configurations. If the strategy only works with one exact combination of numbers, it is overfitted — it memorized the training data rather than learning a genuine pattern. Parameter sweeps reveal whether the edge lives in the concept or in the specific numbers the optimizer chose.

Why It Matters

Curve-fitting is the #1 killer in quantitative trading. A robust strategy should work across a neighborhood of parameter values, not just the single combination the optimizer found. This is the fundamental difference between a strategy that discovered a real market edge and one that memorized historical noise.

"SM_419 was tested with 168 different parameter combinations. 156 of them were profitable. That is 92.9%. When 93% of all reasonable parameter choices make money, the edge is in the concept, not the numbers."
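The pass-rate arithmetic behind that quote is just a count over configurations. A sketch with placeholder PF values (only the 156-of-168 split matters, the individual numbers are invented):

```python
def sweep_pass_rate(pf_by_config: list[float], threshold: float = 1.0) -> float:
    """Fraction of parameter configurations whose Profit Factor exceeds the threshold."""
    profitable = sum(1 for pf in pf_by_config if pf > threshold)
    return profitable / len(pf_by_config)

# The SM_419 figures quoted above: 156 profitable configurations out of 168.
pfs = [1.2] * 156 + [0.9] * 12   # placeholder PF values; only the counts matter here
print(f"{sweep_pass_rate(pfs):.1%}")  # → 92.9%
```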
Gate Criteria
High percentage of configurations must be profitable
Top performers reach 80–100% PASS rates
Star Performers
100%
SM_387 & SM_414
92.9%
SM_419 ProGo
91.7%
SM_406 Blastoff
Phase 04
Rolling Walk-Forward + Commission Test
Automated
What It Does

Two-part validation. First, Rolling Walk-Forward: the strategy is tested across 4+ overlapping anchored folds — for example DEV 2017–2020 / HO 2021–2022, then DEV 2017–2021 / HO 2022–2023, and so on. The strategy must perform on ALL out-of-sample windows, not just one lucky split. This eliminates strategies that only worked in one specific market period. Second, Commission Test: the strategy is tested for the first time across the full history (2017–2025) with real ECN commissions ($7/lot round-trip) included. This is deliberately the first phase to use the complete dataset — ensuring that walk-forward validation was performed on genuinely unseen data.
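The fold layout described above — an anchored DEV start with a holdout window that slides forward each year — can be generated mechanically. A sketch; the function name and parameters are my own, not from the pipeline:

```python
def anchored_folds(start_year, end_year, first_dev_end, ho_len=2, step=1):
    """Anchored walk-forward folds: DEV always begins at start_year, while the
    holdout (HO) window slides forward one year per fold and may overlap."""
    folds = []
    dev_end = first_dev_end
    while dev_end + ho_len <= end_year:
        folds.append(((start_year, dev_end), (dev_end + 1, dev_end + ho_len)))
        dev_end += step
    return folds

for dev, ho in anchored_folds(2017, 2025, first_dev_end=2020):
    print(f"DEV {dev[0]}-{dev[1]} / HO {ho[0]}-{ho[1]}")
```

With the document's dates this yields four folds, starting with DEV 2017–2020 / HO 2021–2022 and ending with DEV 2017–2023 / HO 2024–2025.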

Why Walk-Forward Matters

In-sample performance means nothing. The only question that matters is: does this strategy make money on data it has never seen? A single train/test split can get lucky. That is why I use rolling walk-forward with multiple out-of-sample windows — the strategy must perform consistently across different time periods, not just one holdout set. If it cannot profit on unseen data across multiple windows, it is worthless — no matter how impressive the backtest looks.

Why Commission Testing Matters

Trading is not free. Many strategies look profitable until you add real costs. A strategy with PF 2.1 can drop to PF 1.3 after commissions — a 38% decline. If the edge does not exceed the cost of trading, there is no edge. Commission testing eliminates strategies whose apparent profits are really just paying for the privilege of trading.
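The effect is easy to reproduce: subtracting a fixed round-trip commission from every trade shrinks the winners and deepens the losers, so the Profit Factor falls. A sketch with hypothetical per-trade P&L figures (the trade list and helper names are invented for illustration):

```python
def profit_factor(trade_pnl: list[float]) -> float:
    """Gross profit divided by gross loss."""
    gains = sum(t for t in trade_pnl if t > 0)
    losses = -sum(t for t in trade_pnl if t < 0)
    return gains / losses

def after_commission(trade_pnl: list[float], lots: float = 1.0, cost_per_lot: float = 7.0):
    """Subtract a round-trip commission from every trade."""
    return [t - lots * cost_per_lot for t in trade_pnl]

trades = [50.0, -30.0, 40.0, -25.0, 60.0]   # hypothetical per-trade P&L in dollars
print(round(profit_factor(trades), 2))                    # frictionless PF
print(round(profit_factor(after_commission(trades)), 2))  # PF after $7/lot costs
```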

Gate Criteria
Rolling WF: PASS on all windows (4+ anchored folds)
Commission-adjusted Profit Factor > 1.0
Result
39
EAs Rolling WF Tested
39
All Windows PASS
42
Commission PASS
Phase 05
Stress Test MEDIUM
Automated
What It Does

Apply realistic adverse conditions to every surviving strategy: 2 pips additional slippage on every trade, 2x spread widening (simulating volatile market conditions), and $14/lot commission (double the normal ECN rate). This simulates what real trading looks like during news events, low-liquidity sessions, and unfavorable broker conditions.

Why It Matters

Live trading conditions are systematically worse than backtests. Spreads widen during news releases. Slippage is a fact of life, especially for retail traders. By testing with doubled costs and widened spreads, I am stress-testing the edge under conditions that are realistic for prop firm and retail trading. A strategy that barely survives normal conditions will collapse under stress.

Gate Criteria
2 pips additional slippage
2x spread widening
$14/lot commission (2x normal)
Must remain profitable under all stress conditions
Result
76
PASS
93%
Pass Rate (76/82)

The high pass rate at this stage is not a sign of a weak filter — it is evidence that phases 1–4 already eliminated the fragile strategies. The survivors at this point have edges wide enough to absorb realistic cost increases.

Phase 06
Stress Test HARSH
New Automated
What It Does

Extreme adversity simulation. 5 pips slippage (2.5x the MEDIUM level), 3x spread widening, $20/lot commission (nearly 3x normal), and — critically — randomly reject 10% of trades. That last condition simulates broker rejections, requotes, and connectivity issues that every live trader encounters but no standard backtest models.
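The 10% trade-rejection condition can be sketched as a seeded random filter over the trade list, so the same stress run is reproducible. Function name and the exact rejection mechanics are my assumptions, not the pipeline's implementation:

```python
import random

def reject_trades(trade_pnl: list[float], reject_frac: float = 0.10, seed: int = 42):
    """Drop a fixed fraction of trades at random, simulating broker rejections,
    requotes, and connectivity failures. Seeded for reproducible stress runs."""
    rng = random.Random(seed)
    n_reject = round(len(trade_pnl) * reject_frac)
    rejected = set(rng.sample(range(len(trade_pnl)), n_reject))
    return [t for i, t in enumerate(trade_pnl) if i not in rejected]

survivors = reject_trades([1.0] * 100)
print(len(survivors))  # 90 trades remain after 10% rejection
```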

Why It Matters
"I want strategies that can survive conditions worse than anything I will encounter in real trading. If a strategy dies under HARSH conditions, it had a razor-thin edge that any unexpected event could destroy. The strategies that pass HARSH have edges wide enough to survive the real world."

This phase was added mid-project because I was not confident that MEDIUM stress was severe enough. In real trading, the worst conditions always exceed your expectations. HARSH ensures the portfolio can absorb scenarios that are deliberately worse than reality.

Gate Criteria
5 pips slippage
3x spread widening
$20/lot commission
10% random trade rejection
Profit Factor > 1.0
Max Drawdown < 15%
Phase 07
Multi-Seed Overfitting Detection
New Automated
What It Does

Run every surviving strategy with 5 different random seeds (42, 17, 99, 7, 2026). Random seeds affect order execution simulation, tick generation, and timing within the backtesting engine. If a strategy's results change dramatically depending on the random seed, its "edge" is an artifact of one specific simulation path — not a real market phenomenon.

Why It Matters
"This is perhaps the most subtle test in the entire pipeline. A genuinely robust strategy should produce similar results regardless of which random seed the simulator uses. If changing the seed turns a winner into a loser, the strategy's 'edge' was an artifact of the specific simulation path — not a real market phenomenon."

Most retail strategy developers never run this test. They optimize on one simulation path, see a great result, and assume it is real. Multi-seed testing is one of the clearest ways to separate genuine edges from simulation artifacts.

Gate Criteria
PF variance across 5 seeds < 20%
No seed produces PF < 1.0
Seeds tested: 42, 17, 99, 7, 2026
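One plausible reading of this gate — treating "variance < 20%" as the max-to-min spread of Profit Factors relative to their mean — can be sketched as follows. The sample PF numbers are invented; the thresholds come from the gate criteria above:

```python
def multi_seed_gate(pf_by_seed: dict[int, float], max_rel_spread: float = 0.20) -> bool:
    """One reading of the Phase 7 gate: the spread of Profit Factors across
    seeds, relative to their mean, stays under 20%, and no seed falls below 1.0."""
    pfs = list(pf_by_seed.values())
    rel_spread = (max(pfs) - min(pfs)) / (sum(pfs) / len(pfs))
    return rel_spread < max_rel_spread and min(pfs) >= 1.0

stable = {42: 1.62, 17: 1.55, 99: 1.70, 7: 1.58, 2026: 1.66}
fragile = {42: 1.62, 17: 0.95, 99: 1.70, 7: 1.58, 2026: 1.66}
print(multi_seed_gate(stable), multi_seed_gate(fragile))  # True False
```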
Phase 08
Statistical Validation
Automated
What It Does

Ten rigorous sub-phases of statistical testing, designed to answer one question: is this edge real, or did it appear by chance because I tested 433 strategies?

Sub-Phases
8.1 — Correlation Analysis

Strategies must be genuinely independent. Results: mean pairwise |r| = 0.037, zero pairs above 0.50, and 92% of pairs below 0.20. If two strategies are highly correlated, deploying both creates hidden concentration risk.

8.2 — Deflated Sharpe Ratio + Monte Carlo + FDR

Adjusts for multiple testing bias. With 433 strategies tested, some will look good purely by chance — that is the multiple comparisons problem. The Deflated Sharpe Ratio (DSR) accounts for this. Monte Carlo simulation (10,000 iterations) validates robustness. The Benjamini-Hochberg procedure controls the false discovery rate — the expected share of false positives among the strategies declared significant.
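The Benjamini-Hochberg step is mechanical: sort the p-values, find the largest rank k with p(k) ≤ (k/m)·α, and accept the k smallest. A self-contained sketch (the sample p-values are invented):

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """BH step-up procedure: find the largest rank k with p_(k) <= k/m * alpha
    and accept the k smallest p-values as discoveries."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    passed = [False] * m
    for rank, i in enumerate(order, start=1):
        passed[i] = rank <= k_max
    return passed

print(benjamini_hochberg([0.001, 0.020, 0.030, 0.400, 0.800]))
# → [True, True, True, False, False]
```

Note that 0.030 survives even though a plain Bonferroni cutoff (0.05/5 = 0.01) would reject it — that is exactly the extra power BH buys in exchange for controlling the false discovery rate instead of the family-wise error rate.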

8.3 — Tail Dependence

Do strategies correlate more during extreme market conditions? This checks for synchronized losses — the worst-case scenario for any portfolio. Result: correlations actually decrease under stress (-0.011), meaning the portfolio diversifies better precisely when it matters most.

8.4 — Seasonal Analysis

No strategy should depend on a single month or season. All 12 months must be profitable across the portfolio. A strategy that only works in January is a calendar anomaly trade — and needs to be identified as such, not deployed as if it were a year-round edge.

8.5 — Neighborhood Stability

Test nearby parameter values to confirm the strategy sits on a stable plateau, not a sharp peak. A plateau means the edge is robust to small changes in market microstructure. A peak means the strategy will likely fail the moment conditions shift slightly.

8.6 — Chopping Block (Davey)

Remove the top 5% most profitable trades and recalculate the equity curve. If the strategy collapses without its best trades, it depends on lucky outliers and will likely fail live. Based on 3x World Cup Champion Kevin Davey's research across 2,000+ strategies: those passing this test show 25–30% better real-time performance.
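A minimal sketch of the chopping-block recalculation — drop the top 5% of trades by P&L and recompute PF. The trade list is invented to show an outlier-dependent strategy failing the test; helper names are mine:

```python
def profit_factor(trade_pnl):
    gains = sum(t for t in trade_pnl if t > 0)
    losses = -sum(t for t in trade_pnl if t < 0)
    return gains / losses if losses else float("inf")

def chopping_block_pf(trade_pnl, top_frac=0.05):
    """Discard the best 5% of trades and recompute PF on the remainder."""
    n_drop = max(1, int(len(trade_pnl) * top_frac))
    remaining = sorted(trade_pnl)[:-n_drop]   # strip the largest winners
    return profit_factor(remaining)

# An outlier-dependent strategy: one huge winner props up the whole curve.
trades = [500.0] + [10.0] * 11 + [-15.0] * 8
print(round(profit_factor(trades), 2))      # looks excellent with the outlier
print(round(chopping_block_pf(trades), 2))  # collapses below 1.0 without it
```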

8.7 — Probability of Backtest Overfitting

Combinatorially Symmetric Cross-Validation (CSCV) from Bailey, Borwein, López de Prado & Zhu (2014). Splits backtest data into all possible train/test combinations and measures how often the best in-sample configuration underperforms out-of-sample. A PBO > 0.40 means the entire selection process — not just one strategy — is likely overfit.
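A stripped-down sketch of the CSCV idea — not the full procedure from the paper (no logit distribution, equal-sized blocks assumed, config names and data invented). It illustrates the core question: does the in-sample winner keep winning out-of-sample?

```python
from itertools import combinations
from statistics import mean

def pbo(block_returns: dict[str, list[float]], n_blocks: int) -> float:
    """Stripped-down CSCV sketch. block_returns maps each configuration to its
    mean return in each of n_blocks equal time blocks. For every half/half
    split, pick the best in-sample config and ask whether it lands in the
    bottom half of the out-of-sample ranking; PBO is the fraction of splits
    where it does."""
    configs = list(block_returns)
    splits = list(combinations(range(n_blocks), n_blocks // 2))
    overfit = 0
    for in_sample in splits:
        oos = [b for b in range(n_blocks) if b not in in_sample]
        is_mean = {c: mean(block_returns[c][b] for b in in_sample) for c in configs}
        oos_mean = {c: mean(block_returns[c][b] for b in oos) for c in configs}
        best = max(configs, key=is_mean.get)
        oos_rank = sorted(oos_mean.values()).index(oos_mean[best]) + 1
        if oos_rank / len(configs) <= 0.5:   # IS winner is OOS median or worse
            overfit += 1
    return overfit / len(splits)

# A config that dominates every block is never "overfit" under this sketch.
solid = {"a": [3, 3, 3, 3], "b": [1, 2, 1, 2], "c": [0, 1, 0, 1]}
print(pbo(solid, n_blocks=4))  # 0.0
```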

8.8 — Edge Decay Analysis

Rolling 12-month Profit Factor over the full 9-year backtest period. A strategy with overall PF=1.8 that declined from 2.5 in 2017 to 1.1 in 2025 has a dying edge — deploying it means betting on a trend that is already fading. Walk-forward tells you if a strategy works on unseen data; edge decay tells you if it is getting weaker.
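The decay measurement itself is a rolling window plus a first-to-last comparison. A sketch using the 2.5 → 1.1 example from the text (intermediate values invented; a 56% decline fails a 40% gate):

```python
def rolling_pf(monthly_pnl: list[float], window: int = 12) -> list[float]:
    """Profit Factor over a sliding 12-month window of monthly P&L."""
    series = []
    for i in range(len(monthly_pnl) - window + 1):
        chunk = monthly_pnl[i:i + window]
        gains = sum(x for x in chunk if x > 0)
        losses = -sum(x for x in chunk if x < 0)
        series.append(gains / losses if losses else float("inf"))
    return series

def decay_pct(pf_series: list[float]) -> float:
    """Decline from the first to the last rolling reading, as a percentage."""
    return (pf_series[0] - pf_series[-1]) / pf_series[0] * 100

# The text's example: a PF sliding from 2.5 to 1.1 has decayed 56%.
print(round(decay_pct([2.5, 2.1, 1.8, 1.4, 1.1]), 1))  # 56.0
```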

8.9 — Trade Consistency (Runs Test)

A Wald-Wolfowitz runs test on the win/loss sequence detects whether profitable trades are randomly distributed or clustered in time. Clustering means the strategy only works in specific market regimes. Combined with a profit concentration ratio: if the top 20% of months account for more than 70% of total profit, the strategy is fragile.
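The runs test compares the observed number of win/loss runs against what a random ordering would produce. A self-contained sketch using the standard normal approximation (function name mine; the extreme example is invented):

```python
import math

def runs_test_p(wins: list[bool]) -> float:
    """Two-sided Wald-Wolfowitz runs test on a win/loss sequence, using the
    normal approximation. A small p-value means wins are clustered (or
    alternate) far more than chance would produce."""
    n1 = sum(wins)
    n2 = len(wins) - n1
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(wins, wins[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1                # expected number of runs
    var = (mu - 1) * (mu - 2) / (n - 1)     # its variance
    z = (runs - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

clustered = [True] * 10 + [False] * 10      # all wins packed into one regime
print(runs_test_p(clustered) < 0.05)        # True: significant clustering
```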

8.10 — Regime Analysis

Markets alternate between trending, ranging, and volatile regimes. A strategy that only profits in one regime is a hidden risk. Each EA is tested across 3 volatility regimes (low, normal, high) using ATR-based classification. A strategy must be profitable in all three regimes to PASS. The results also inform portfolio construction — mixing regime-dominant EAs creates genuine diversification beyond simple correlation.
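A sketch of the per-regime breakdown, assuming each trade is tagged with the ATR reading at entry (the ATR computation itself is omitted; data and names are invented). Terciles of the ATR distribution define the low/normal/high buckets:

```python
def classify_regimes(atr_readings: list[float]) -> list[str]:
    """Tercile-based volatility buckets: bottom third of ATR readings is
    'low', middle third 'normal', top third 'high'."""
    ranked = sorted(atr_readings)
    lo_cut = ranked[len(ranked) // 3]
    hi_cut = ranked[2 * len(ranked) // 3]
    return ["low" if a < lo_cut else "high" if a >= hi_cut else "normal"
            for a in atr_readings]

def pf_by_regime(trade_pnl: list[float], regimes: list[str]) -> dict[str, float]:
    """Profit Factor computed separately inside each volatility regime."""
    out = {}
    for regime in ("low", "normal", "high"):
        pnl = [p for p, r in zip(trade_pnl, regimes) if r == regime]
        gains = sum(t for t in pnl if t > 0)
        losses = -sum(t for t in pnl if t < 0)
        out[regime] = gains / losses if losses else float("inf")
    return out

# ATR value at each trade's entry (hypothetical), plus that trade's P&L.
atr = [0.8, 0.7, 0.9, 1.1, 1.0, 1.2, 2.0, 2.2, 1.9]
pnl = [5.0, -2.0, 3.0, 4.0, -1.0, 2.0, -3.0, 6.0, 1.0]
print(pf_by_regime(pnl, classify_regimes(atr)))
```

A strategy passes only if all three values stay above 1.0.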

Why It Matters
"Most strategy developers stop after backtesting. But a backtest is just the beginning. When you test 433 strategies, some will look good purely by chance — that is the multiple testing problem. Statistical validation separates genuine alpha from statistical noise. This is the difference between hobby backtesting and real quantitative finance."
Gate Criteria
DSR p < 0.05 for Tier 1 Core classification
Benjamini-Hochberg FDR PASS for Tier 2 Watchlist
Chopping Block: PF > 1.0 after removing top 5% trades
PBO < 0.40 (not overfit)
Edge Decay: PF decline < 40% over backtest period
Runs Test: no significant win clustering (p > 0.05)
Regime Analysis: profitable in all 3 volatility regimes
Result
Pending
Tier 1 Core TBD
Tier 2 Watchlist TBD
Rejected TBD
Phase 09
Portfolio Construction
Manual
What It Does

Select and weight strategies for deployment under strict diversification rules. This is the only phase where human judgment plays a role — the data from phases 1–8 provides the inputs, but the final portfolio composition requires understanding of correlation dynamics, market regime exposure, and practical deployment constraints.

Why It Matters
"Even genuine strategies can blow up if you deploy too many correlated ones. A portfolio of 10 breakout strategies on GBPUSD is not diversification — it is concentration with extra steps. Portfolio construction enforces true diversification across edge types, instruments, and timeframes."
Construction Rules
Family-Cap: 3 — Max 3 strategies from same edge type
Symbol-Cap: 2 — Max 2 strategies per instrument
Risk: 0.15% per trade — Position sizing based on account equity
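Although the final composition is a manual call, the cap rules themselves can be enforced mechanically — one possible sketch is a greedy pass over score-ranked candidates (EA names, families, and scores here are hypothetical, not from the archive):

```python
from collections import Counter

def select_portfolio(candidates, family_cap=3, symbol_cap=2, target=15):
    """Greedy selection under the diversification caps: walk candidates in
    descending score order, skipping any that would breach a cap.
    Each candidate is (name, edge_family, symbol, score)."""
    family_n, symbol_n = Counter(), Counter()
    chosen = []
    for name, family, symbol, _score in sorted(candidates, key=lambda c: -c[3]):
        if family_n[family] >= family_cap or symbol_n[symbol] >= symbol_cap:
            continue
        chosen.append(name)
        family_n[family] += 1
        symbol_n[symbol] += 1
        if len(chosen) == target:
            break
    return chosen

# Hypothetical survivors: three GBPUSD breakouts would concentrate risk,
# so the symbol cap forces the third one out.
pool = [("SM_A", "breakout", "GBPUSD", 2.0),
        ("SM_B", "breakout", "GBPUSD", 1.9),
        ("SM_C", "breakout", "GBPUSD", 1.8),
        ("SM_D", "breakout", "EURUSD", 1.7),
        ("SM_E", "mean_reversion", "XAUUSD", 1.6)]
print(select_portfolio(pool))  # SM_C is excluded by the GBPUSD symbol cap
```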
Target
10–15
Target EAs (V1)
Phase 10
Deployment
Manual
What It Does

Three-stage deployment protocol. Stage one: demo account forward testing for a minimum of 2 weeks, comparing live fills against backtest expectations. If the forward-test metrics match within acceptable variance, stage two follows: deployment to a funded prop firm account on a dedicated VPS. Stage three: continuous monitoring through Myfxbook, with automated alerts on performance deviation.

Why It Matters
"The final reality check. No amount of backtesting can fully replicate live trading. Forward testing on demo catches execution issues, broker-specific quirks, and data feed differences before real capital is at risk. The strategies that make it to deployment have passed 9 layers of validation — but the market always gets the last word."
Deployment Stages
Stage 1: Demo account — 2 weeks minimum forward test
Stage 2: Funded account — Deploy to prop firm via VPS
Stage 3: Monitoring — Continuous tracking via Myfxbook

Why This Matters.

The pipeline is not a marketing gimmick. It is the core engineering process that determines whether capital gets risked or not. Here is what makes it different from what you will find elsewhere.

Most EA vendors show you one cherry-picked backtest.

No walk-forward validation. No commission testing. No stress testing. No statistical validation for multiple testing bias. They optimize on 5 years of data, show you the best curve, and call it a product. The 10-phase pipeline exists specifically to make that kind of self-deception impossible.

The pipeline eliminates strategies before they reach capital, not after.

Every filter exists for one reason: to prevent a strategy from losing real money. It is cheaper to kill a strategy in a backtest than to kill it in a live account. The pipeline is designed so that by the time a strategy touches real capital, it has already survived conditions worse than what the live market will throw at it.

92% rejection rate is a feature, not a bug.

Of 433 strategies built, approximately 34 survive with a genuine, validated edge. That is an 8% survival rate. This is not a sign of failure — it is a sign that the filters are working. The strategies that make it through have been validated at a level that most institutional quant funds would recognize as rigorous.

Full transparency: every result is published.

Not just the winners. The Strategy Archive includes every strategy I have built — including the 382+ that failed. You can see exactly why each one was rejected. I publish the failures because they are the evidence that the pipeline works.

Pipeline at a Glance.

433
EAs Built
30,000+
Backtests Run
45
EAs Sweep-Tested
200
Stress PASS
5/10
Phases Complete
0
Deployed (yet)

See the Evidence.

Browse the complete Strategy Archive or subscribe to get notified when new strategies pass the pipeline.

Strategy Archive

Every strategy I have built — with full backtest data, sweep results, walk-forward charts, and commission reports. Including every failure.

Browse the Archive

Stay Updated

Get notified when new strategies survive the pipeline, new research rounds launch, or deployment milestones are reached.