433 strategies enter. ~34 survive. Here is every filter they must pass — and why each one exists.
The Funnel
10 gates. Each one eliminates strategies that lack a genuine edge. The pipeline is live — 5 phases complete, 5 remaining.
Deep Dive
No shortcuts, no hand-waving. Here is exactly what each phase does, why it matters, what the gate criteria are, and what gets eliminated.
Strategy mining from every credible source I can find. 46 research rounds so far, spanning academic papers, trading books, AI-assisted research, and unconventional domains. Sources include Carver, Chan, Clenow, Kaufman, Elder, AQR, Quantpedia, MQL5 articles, GitHub repositories, and even physics-based models drawn from thermodynamics, chaos theory, signal processing, and quantum mechanics.
Each idea gets implemented as an Expert Advisor in MQL5. The goal is not to find "the one perfect strategy" — it is to cast the widest possible net across diverse edge types and let the pipeline do the filtering.
The quality of the final portfolio depends entirely on the diversity and intellectual rigor of the research. I do not just test one idea — I systematically mine every credible source available. A portfolio built from a single edge type is fragile. A portfolio built from 81+ distinct edge types has structural diversification that no amount of parameter tuning can replicate.
Every EA is tested across 23 symbols — forex majors and minors, gold, silver, oil, and indices — using the development period only (2017–2022) with real tick data (MT5 Model 4: Every Tick Based on Real Ticks). The out-of-sample period (2023–2025) is deliberately withheld at this stage to preserve it for walk-forward validation in Phase 4. Two MetaTrader terminals running 24/7, each processing backtests in parallel. This is not sampling — it is exhaustive testing across every instrument the strategy could theoretically trade.
A strategy that only works on one symbol is not robust — it probably found noise, not signal. Testing across 23 diverse instruments forces the strategy to prove it captures a real market phenomenon, not a statistical accident. If a trend-following strategy only profits on EURUSD but fails on every other pair, it is not a trend-following strategy — it memorized EURUSD price history.
~88% of all strategies. Most EAs fail here because they either trade too rarely (insufficient data for conclusions), have razor-thin edges that disappear across instruments, or show acceptable returns only with unacceptable drawdowns. This is the most brutal filter in the pipeline — and intentionally so.
For each surviving EA, I test 100–200+ different parameter configurations. If the strategy only works with one exact combination of numbers, it is overfitted — it memorized the training data rather than learning a genuine pattern. Parameter sweeps reveal whether the edge lives in the concept or in the specific numbers the optimizer chose.
Curve-fitting is the #1 killer in quantitative trading. A robust strategy should work across a neighborhood of parameter values, not just the single combination the optimizer found. This is the fundamental difference between a strategy that discovered a real market edge and one that memorized historical noise.
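As a rough illustration of what a sweep gate can measure, here is a minimal Python sketch, not the pipeline's actual code, that compares the optimizer's best profit factor against the average of its parameter neighborhood. The function name, the example values, and the interpretation thresholds are all hypothetical:

```python
def plateau_score(sweep_results, best_params, neighborhood):
    """Ratio of the neighborhood's mean profit factor to the optimizer's best.

    sweep_results: dict mapping a parameter setting to its backtest PF.
    A ratio near 1.0 suggests a stable plateau; a ratio well below 1.0
    means the best result towers over its neighbors, the signature of
    curve-fitting.
    """
    best_pf = sweep_results[best_params]
    neighbor_pfs = [sweep_results[p] for p in neighborhood if p in sweep_results]
    return sum(neighbor_pfs) / len(neighbor_pfs) / best_pf

# Hypothetical sweep over a single lookback parameter.
sweep = {10: 1.5, 12: 1.6, 14: 1.55, 16: 0.9}
print(plateau_score(sweep, best_params=12, neighborhood=[10, 14]))
```

The same idea extends to multi-dimensional parameter grids: the neighborhood is simply every setting within one step of the optimum along each axis.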
Two-part validation. First, Rolling Walk-Forward: the strategy is tested across 4+ overlapping anchored folds — for example DEV 2017-2020 / HO 2021-2022, then DEV 2017-2021 / HO 2022-2023, and so on. The strategy must perform on ALL out-of-sample windows, not just one lucky split. This eliminates strategies that only worked in one specific market period. Second, Commission Test: the strategy is tested for the first time across the full history (2017–2025) with real ECN commissions ($7/lot round-trip) included. This is deliberately the first phase to use the complete dataset — ensuring that walk-forward validation was performed on genuinely unseen data.
In-sample performance means nothing. The only question that matters is: does this strategy make money on data it has never seen? A single train/test split can get lucky. That is why I use rolling walk-forward with multiple out-of-sample windows — the strategy must perform consistently across different time periods, not just one holdout set. If it cannot profit on unseen data across multiple windows, it is worthless — no matter how impressive the backtest looks.
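The anchored fold scheme from the example above (DEV 2017-2020 / HO 2021-2022, then DEV 2017-2021 / HO 2022-2023, and so on) can be sketched in a few lines of Python. This is illustrative scaffolding, not the pipeline's actual tooling; the function name and year arithmetic are assumptions:

```python
def anchored_folds(start, end, min_dev, ho):
    """Generate anchored walk-forward folds over whole years.

    The development window always starts at `start` and grows by one
    year per fold; each holdout window covers the `ho` years that
    immediately follow it. Returns (dev_years, holdout_years) pairs
    as inclusive `range` objects.
    """
    folds = []
    dev_end = start + min_dev - 1          # last year of the first DEV window
    while dev_end + ho <= end:
        dev = range(start, dev_end + 1)
        hold = range(dev_end + 1, dev_end + ho + 1)
        folds.append((dev, hold))
        dev_end += 1                        # anchor fixed, DEV grows by a year
    return folds

for dev, hold in anchored_folds(2017, 2025, min_dev=4, ho=2):
    print(f"DEV {dev.start}-{dev.stop - 1} / HO {hold.start}-{hold.stop - 1}")
```

With a 2017 anchor, a 4-year minimum development window, and 2-year holdouts, this yields exactly the four overlapping folds the text describes.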
Trading is not free. Many strategies look profitable until you add real costs. A strategy with PF 2.1 can drop to PF 1.3 after commissions — a 38% decline. If the edge does not exceed the cost of trading, there is no edge. Commission testing eliminates strategies whose apparent profits are really just paying for the privilege of trading.
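The erosion is easy to demonstrate with a toy example. The trade P&L figures below are hypothetical; the $7/lot round-trip commission is the figure quoted above:

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (absolute value)."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return gains / losses if losses else float("inf")

def apply_commission(trade_pnls, lots_per_trade, commission_per_lot=7.0):
    """Subtract a round-trip commission from every trade's P&L."""
    return [p - lots * commission_per_lot
            for p, lots in zip(trade_pnls, lots_per_trade)]

# Hypothetical gross trades: PF = 180 / 90 = 2.0 before costs.
trades = [100, -50, 80, -40]
net = apply_commission(trades, lots_per_trade=[1, 1, 1, 1])
print(profit_factor(trades), profit_factor(net))
```

Four trades at one lot each shave the profit factor from 2.0 to roughly 1.6; a strategy trading smaller average wins would be hit far harder.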
Apply realistic adverse conditions to every surviving strategy: 2 pips additional slippage on every trade, 2x spread widening (simulating volatile market conditions), and $14/lot commission (double the normal ECN rate). This simulates what real trading looks like during news events, low-liquidity sessions, and unfavorable broker conditions.
Live trading conditions are systematically worse than backtests. Spreads widen during news releases. Slippage is a fact of life, especially for retail traders. By testing with doubled costs and widened spreads, I am stress-testing the edge under conditions that are realistic for prop firm and retail trading. A strategy that barely survives normal conditions will collapse under stress.
The high pass rate at this stage is not a sign of a weak filter — it is evidence that phases 1–4 already eliminated the fragile strategies. The survivors at this point have edges wide enough to absorb realistic cost increases.
Extreme adversity simulation. 5 pips slippage (2.5x the MEDIUM level), 3x spread widening, $20/lot commission (nearly 3x normal), and — critically — randomly reject 10% of trades. That last condition simulates broker rejections, requotes, and connectivity issues that every live trader encounters but no standard backtest models.
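A stripped-down sketch of the cost-injection idea, including the random 10% rejection, might look like the following. The pip value, the trade lists, and the function shape are assumptions for illustration; the 3x spread widening is omitted because it requires per-trade spread data:

```python
import random

def harsh_stress(trade_pnls, lots, pip_value=10.0, seed=42,
                 slippage_pips=5.0, commission_per_lot=20.0, reject_rate=0.10):
    """Apply HARSH-level costs and randomly drop `reject_rate` of trades,
    simulating broker rejections, requotes, and connectivity failures.

    Returns the surviving trades' P&L after slippage and commission.
    """
    rng = random.Random(seed)               # seeded for reproducibility
    survivors = []
    for pnl, lot in zip(trade_pnls, lots):
        if rng.random() < reject_rate:
            continue                        # trade rejected: never filled
        cost = (slippage_pips * pip_value + commission_per_lot) * lot
        survivors.append(pnl - cost)
    return survivors

print(harsh_stress([100, -50, 80, -40, 60], lots=[1] * 5))
```

Running the same strategy through several rejection seeds, as Phase 7 does, shows whether the equity curve depends on which particular trades happened to fill.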
This phase was added mid-project because I was not confident that MEDIUM stress was severe enough. In real trading, the worst conditions always exceed your expectations. HARSH ensures the portfolio can absorb scenarios that are deliberately worse than reality.
Run every surviving strategy with 5 different random seeds (42, 17, 99, 7, 2026). Random seeds affect order execution simulation, tick generation, and timing within the backtesting engine. If a strategy's results change dramatically depending on the random seed, its "edge" is an artifact of one specific simulation path — not a real market phenomenon.
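A simple way to summarize seed sensitivity is the coefficient of variation of a headline metric across seeds. This sketch is illustrative; the 15% threshold is an assumption for the example, not a published gate:

```python
import statistics

def seed_stability(pf_by_seed, max_cv=0.15):
    """Coefficient of variation of profit factor across random seeds.

    Returns (cv, stable): a low cv means the result barely depends on
    the simulation path; a high cv means the 'edge' may be an artifact
    of one lucky sequence of simulated fills.
    """
    values = list(pf_by_seed.values())
    cv = statistics.stdev(values) / statistics.mean(values)
    return cv, cv <= max_cv

# Hypothetical profit factors from the five seeds named above.
cv, stable = seed_stability({42: 1.8, 17: 1.75, 99: 1.85, 7: 1.8, 2026: 1.8})
print(cv, stable)
```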
Most retail strategy developers never run this test. They optimize on one simulation path, see a great result, and assume it is real. Multi-seed testing is one of the clearest ways to separate genuine edges from simulation artifacts.
Ten rigorous sub-phases of statistical testing, designed to answer one question: is this edge real, or did it appear by chance because I tested 433 strategies?
Strategies must be genuinely independent. Results: mean pairwise |r| = 0.037, zero pairs above 0.50, and 92% of pairs below 0.20. If two strategies are highly correlated, deploying both creates hidden concentration risk.
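The quantities reported above, mean pairwise |r| and the share of low-correlation pairs, can be computed with nothing more than the standard library. This is a sketch under the assumption that each strategy is represented by an aligned series of periodic returns:

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_summary(returns_by_strategy):
    """Mean absolute pairwise correlation, plus the share of pairs
    below 0.20, mirroring the independence gate described above."""
    pairs = itertools.combinations(returns_by_strategy.values(), 2)
    abs_r = [abs(pearson(a, b)) for a, b in pairs]
    share_low = sum(1 for r in abs_r if r < 0.20) / len(abs_r)
    return sum(abs_r) / len(abs_r), share_low
```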
Adjusts for multiple-testing bias. With 433 strategies tested, some will look good purely by chance; that is the multiple comparisons problem. The Deflated Sharpe Ratio (DSR) corrects each strategy's Sharpe estimate for the number of trials behind it. Monte Carlo simulation (10,000 iterations) checks robustness. The Benjamini-Hochberg procedure then controls the false discovery rate (FDR) across the full family of tests.
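The Benjamini-Hochberg step-up procedure itself is short enough to sketch. This is the textbook algorithm, not the pipeline's implementation:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected under the Benjamini-Hochberg
    step-up procedure, which controls the false discovery rate at `alpha`.

    Sort p-values ascending; find the largest rank k such that
    p(k) <= k * alpha / m; reject everything up to rank k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff = rank               # largest rank passing its threshold
    return sorted(order[:cutoff])

# Only the first p-value survives: 0.03 fails its rank-2 threshold of 0.025.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Note the step-up structure: a p-value can be rejected even if it fails its own threshold, as long as some larger p-value below it in rank passes.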
Do strategies correlate more during extreme market conditions? This checks for synchronized losses — the worst-case scenario for any portfolio. Result: correlations actually decrease under stress (-0.011), meaning the portfolio diversifies better precisely when it matters most.
No strategy should depend on a single month or season. All 12 months must be profitable across the portfolio. A strategy that only works in January is a calendar anomaly trade — and needs to be identified as such, not deployed as if it were a year-round edge.
Test nearby parameter values to confirm the strategy sits on a stable plateau, not a sharp peak. A plateau means the edge is robust to small changes in market microstructure. A peak means the strategy will likely fail the moment conditions shift slightly.
Remove the top 5% most profitable trades and recalculate the equity curve. If the strategy collapses without its best trades, it depends on lucky outliers and will likely fail live. Based on 3x World Cup Champion Kevin Davey's research across 2,000+ strategies: those passing this test show 25–30% better real-time performance.
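The mechanics are simple, which is part of why the test is so effective. A sketch, with a hypothetical trade list; chronological order is preserved so the trimmed equity curve can be rebuilt directly:

```python
def strip_top_trades(trade_pnls, fraction=0.05):
    """Drop the `fraction` most profitable trades (at least one),
    preserving chronological order. Rebuild the equity curve from the
    result: if it collapses, the strategy depends on lucky outliers."""
    n_remove = max(1, int(round(len(trade_pnls) * fraction)))
    top = set(sorted(range(len(trade_pnls)),
                     key=lambda i: trade_pnls[i], reverse=True)[:n_remove])
    return [p for i, p in enumerate(trade_pnls) if i not in top]

# One monster winner hides an otherwise mediocre trade distribution.
print(strip_top_trades([10, 200, -5, 30, -20]))
```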
Combinatorially Symmetric Cross-Validation (CSCV), from Bailey, Borwein, López de Prado, and Zhu (2014). It splits the backtest data into all possible train/test combinations and measures how often the best in-sample configuration is also good out-of-sample. A Probability of Backtest Overfitting (PBO) above 0.40 means the entire selection process, not just one strategy, is likely overfit.
Rolling 12-month Profit Factor over the full 9-year backtest period. A strategy with overall PF=1.8 that declined from 2.5 in 2017 to 1.1 in 2025 has a dying edge — deploying it means betting on a trend that is already fading. Walk-forward tells you if a strategy works on unseen data; edge decay tells you if it is getting weaker.
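Computing the rolling series is straightforward given a list of monthly P&L figures. This sketch assumes one net P&L number per month; the decay judgment itself (fitting a trend to the series) is left out:

```python
def rolling_profit_factor(monthly_pnl, window=12):
    """Profit factor over each trailing `window`-month span.

    A persistent downtrend in this series, e.g. 2.5 early in the
    history sliding toward 1.1 at the end, indicates a decaying edge
    even when the full-history PF still looks healthy.
    """
    out = []
    for end in range(window, len(monthly_pnl) + 1):
        chunk = monthly_pnl[end - window:end]
        gains = sum(p for p in chunk if p > 0)
        losses = -sum(p for p in chunk if p < 0)
        out.append(gains / losses if losses else float("inf"))
    return out

# A toy 24-month history with a perfectly steady edge: flat PF of 2.0.
print(rolling_profit_factor([10, -5] * 12))
```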
A Wald-Wolfowitz runs test on the win/loss sequence detects whether profitable trades are randomly distributed or clustered in time. Clustering means the strategy only works in specific market regimes. Combined with a profit concentration ratio: if the top 20% of months account for more than 70% of total profit, the strategy is fragile.
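For reference, here is the z-statistic form of the Wald-Wolfowitz runs test on a 0/1 win/loss sequence. This is the standard textbook formula, not the pipeline's code, and it assumes the sequence contains at least one win and one loss:

```python
import math

def runs_test_z(outcomes):
    """Wald-Wolfowitz runs test on a list of 1s (wins) and 0s (losses).

    Returns the z-statistic: a large negative z means wins and losses
    cluster into long streaks; a large positive z means they alternate
    more than chance would allow. |z| > ~1.96 is significant at 5%.
    """
    n1 = sum(outcomes)                      # wins
    n2 = len(outcomes) - n1                 # losses
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    mu = 2 * n1 * n2 / n + 1                # expected runs under randomness
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mu) / math.sqrt(var)

print(runs_test_z([1, 1, 1, 1, 0, 0, 0, 0]))   # heavily clustered: z < 0
print(runs_test_z([1, 0, 1, 0, 1, 0, 1, 0]))   # perfectly alternating: z > 0
```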
Markets alternate between trending, ranging, and volatile regimes. A strategy that only profits in one regime is a hidden risk. Each EA is tested across 3 volatility regimes (low, normal, high) using ATR-based classification. A strategy must be profitable in all three regimes to PASS. The results also inform portfolio construction — mixing regime-dominant EAs creates genuine diversification beyond simple correlation.
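A minimal sketch of ATR-based regime classification, assuming OHLC bar data is available as plain lists. The quantile thresholds (33rd/67th percentiles of the ATR's own history) are an illustrative choice, not necessarily the ones used in the pipeline:

```python
def atr(highs, lows, closes, period=14):
    """Average True Range over the last `period` bars (simple average)."""
    trs = []
    for i in range(1, len(closes)):
        tr = max(highs[i] - lows[i],
                 abs(highs[i] - closes[i - 1]),
                 abs(lows[i] - closes[i - 1]))
        trs.append(tr)
    return sum(trs[-period:]) / min(period, len(trs))

def classify_regime(current_atr, atr_history, low_q=0.33, high_q=0.67):
    """Label volatility 'low', 'normal', or 'high' by where the current
    ATR falls within its own history (simple quantile thresholds)."""
    ranked = sorted(atr_history)
    lo = ranked[int(low_q * (len(ranked) - 1))]
    hi = ranked[int(high_q * (len(ranked) - 1))]
    if current_atr <= lo:
        return "low"
    if current_atr >= hi:
        return "high"
    return "normal"
```

Each trade is then tagged with the regime in force at entry, and the per-regime P&L breakdown decides the PASS/FAIL.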
Select and weight strategies for deployment under strict diversification rules. This is the only phase where human judgment plays a role — the data from phases 1–8 provides the inputs, but the final portfolio composition requires understanding of correlation dynamics, market regime exposure, and practical deployment constraints.
Three-stage deployment protocol. First: demo account forward testing for a minimum of 2 weeks, comparing live fills against backtest expectations. If the forward test metrics match within acceptable variance, stage two: deploy to a funded prop firm account on a dedicated VPS. Stage three: continuous monitoring through Myfxbook with automated alerts for performance deviation.
Philosophy
The pipeline is not a marketing gimmick. It is the core engineering process that determines whether capital gets risked or not. Here is what makes it different from what you will find elsewhere.
Most commercial EAs are sold with none of this: no walk-forward validation, no commission testing, no stress testing, no statistical correction for multiple-testing bias. The vendors optimize on 5 years of data, show you the best curve, and call it a product. The 10-phase pipeline exists specifically to make that kind of self-deception impossible.
Every filter exists for one reason: to prevent a strategy from losing real money. It is cheaper to kill a strategy in a backtest than to kill it in a live account. The pipeline is designed so that by the time a strategy touches real capital, it has already survived conditions worse than what the live market will throw at it.
Of 433 strategies built, approximately 34 survive to genuine status. That is an 8% survival rate. This is not a sign of failure — it is a sign that the filters are working. The strategies that make it through have been validated at a level that most institutional quant funds would recognize as rigorous.
Not just the winners. The Strategy Archive includes every strategy I have built — including the 382+ that failed. You can see exactly why each one was rejected. I publish the failures because they are the evidence that the pipeline works.
Current Metrics
Next Steps
Browse the complete Strategy Archive or subscribe to get notified when new strategies pass the pipeline.
Every strategy I have built — with full backtest data, sweep results, walk-forward charts, and commission reports. Including every failure.
Browse the Archive

Get notified when new strategies survive the pipeline, new research rounds launch, or deployment milestones are reached.