How do I evaluate a trading system?
In one paragraph
Evaluating a trading system requires examining win rate, average reward-to-risk ratio, expectancy, maximum drawdown, and the statistical robustness of backtest results across different market conditions.
What this actually means
Evaluating a trading system goes well beyond asking whether it made money in recent backtests. A system that produced strong returns over one market regime may fail in another, and separating genuine edge from historical coincidence requires a structured assessment process.
The core metrics that matter most are: win rate (percentage of trades that close profitably), average reward-to-risk ratio (average winner size divided by average loser size), expectancy (the expected profit or loss per dollar risked, calculated from win rate and reward/risk), and maximum drawdown (the largest peak-to-trough decline in account equity). A system with a low win rate can be highly profitable if the average winner is large relative to the average loser. Expectancy ties these together: if the average loss is treated as one unit of risk, expectancy per unit risked equals the win rate times the reward-to-risk ratio, minus the loss rate.
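As a rough illustration of how win rate and reward-to-risk combine into expectancy, the sketch below assumes every losing trade costs exactly one unit of risk and every winner pays the average reward-to-risk multiple. The function name and the example figures are hypothetical, chosen only to show that a low win rate can still carry positive expectancy.

```python
def expectancy_per_unit_risk(win_rate: float, reward_to_risk: float) -> float:
    """Expected profit per unit risked.

    Assumes each loss costs one unit and each win pays
    `reward_to_risk` units (a simplification of real trade results).
    """
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    loss_rate = 1.0 - win_rate
    return win_rate * reward_to_risk - loss_rate


# Hypothetical systems: few large winners vs. many small winners.
low_win_rate = expectancy_per_unit_risk(0.35, 3.0)   # about +0.40 per unit risked
high_win_rate = expectancy_per_unit_risk(0.60, 0.5)  # about -0.10 per unit risked
```

In this example the system that wins only 35% of the time has positive expectancy, while the one that wins 60% of the time loses money on average.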
Maximum drawdown is particularly important because it determines whether a trader can psychologically and financially survive the system's worst periods. A system with a 40% maximum historical drawdown may look attractive on paper but is likely to be abandoned under the emotional pressure of real-money losses of that magnitude.
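One way to measure maximum drawdown from an equity curve is sketched below, along with the gain needed to climb back to the prior peak. It assumes the curve is a non-empty sequence of positive account values; the function names and the sample curve are made up for illustration.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak.

    Assumes a non-empty sequence of positive equity values.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


def gain_to_recover(drawdown: float) -> float:
    """Fractional gain needed to return to the prior peak after a drawdown."""
    return drawdown / (1.0 - drawdown)


# Hypothetical equity curve: the worst decline runs from 120 down to 72.
equity_curve = [100.0, 120.0, 90.0, 110.0, 72.0, 130.0]
dd = max_drawdown(equity_curve)   # 0.40
recovery = gain_to_recover(dd)    # roughly 0.67
```

The asymmetry is the financial side of the survival question: a 40% decline needs a gain of roughly 67% just to get back to the previous high.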
Robustness testing distinguishes a true edge from curve-fitting. Traders evaluate robustness by testing the system across multiple time periods (including out-of-sample data not used during development), different instruments, and by slightly varying key parameters to confirm that performance doesn't collapse when inputs shift by small amounts. A system that only works with very specific parameter combinations is likely overfit to historical noise.
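The parameter-variation check can be sketched as follows. This is a minimal outline, not a full robustness suite: `toy_backtest` is a stand-in for a real backtest, and the parameter names `lookback` and `stop_multiple` are hypothetical. The helper shifts each parameter by plus and minus 10% and reports the worst neighbouring score relative to the baseline.

```python
from typing import Callable, Dict


def parameter_sensitivity(
    backtest: Callable[[Dict[str, float]], float],
    base_params: Dict[str, float],
    shift: float = 0.10,
) -> Dict[str, float]:
    """For each parameter, return worst shifted score / baseline score."""
    base_score = backtest(base_params)
    results = {}
    for name, value in base_params.items():
        scores = []
        for factor in (1.0 - shift, 1.0 + shift):
            trial = dict(base_params)      # copy so the baseline stays intact
            trial[name] = value * factor   # real integer parameters would need rounding
            scores.append(backtest(trial))
        results[name] = min(scores) / base_score if base_score else float("nan")
    return results


def toy_backtest(params: Dict[str, float]) -> float:
    """Stand-in score: spikes sharply at lookback == 20, gentle in stop_multiple."""
    lookback_term = 1.0 / (1.0 + (params["lookback"] - 20.0) ** 2)
    stop_term = 1.0 - 0.05 * abs(params["stop_multiple"] - 2.0)
    return lookback_term * stop_term


sensitivity = parameter_sensitivity(
    toy_backtest, {"lookback": 20.0, "stop_multiple": 2.0}
)
```

Here the score retains about 99% of its value when `stop_multiple` moves, but falls to about 20% when `lookback` moves, which is the kind of collapse that suggests a parameter has been fit to historical noise.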
Sample size also matters. A backtest covering 20 trades reveals little. A system needs hundreds of trades across varied conditions before the statistics become meaningful.
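To make the sample-size point concrete, the sketch below computes an approximate 95% Wilson score interval for the win rate. It assumes trades are independent, which real trades often are not, so treat the result as an optimistic lower bound on the uncertainty. The trade counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, trades: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval for the true win rate.

    Uses the Wilson score interval; z = 1.96 corresponds to roughly 95%.
    Assumes independent trades.
    """
    if trades <= 0 or not 0 <= wins <= trades:
        raise ValueError("need trades > 0 and 0 <= wins <= trades")
    p = wins / trades
    denom = 1.0 + z * z / trades
    center = (p + z * z / (2.0 * trades)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trades + z * z / (4.0 * trades ** 2)) / denom
    return center - half, center + half


# Same observed 60% win rate, very different certainty.
small_low, small_high = wilson_interval(12, 20)    # roughly 0.39 to 0.78
large_low, large_high = wilson_interval(240, 400)  # roughly 0.55 to 0.65
```

With 20 trades, an observed 60% win rate is statistically compatible with a true win rate below 40%. With 400 trades the plausible range narrows to about ten percentage points.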
Finally, transaction costs (commissions, slippage, and spread) must be incorporated realistically. Many systems that appear profitable in testing show significantly reduced or negative expectancy once real-world execution costs are applied.
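The effect of costs on a thin edge can be shown with a few lines of arithmetic. The trade results and the flat per-trade cost below are invented for illustration; a real evaluation would model commissions, slippage, and spread separately for each instrument.

```python
from typing import Sequence


def mean_pnl(trades: Sequence[float], cost_per_trade: float = 0.0) -> float:
    """Average profit per trade after subtracting a flat round-trip cost."""
    return sum(pnl - cost_per_trade for pnl in trades) / len(trades)


# Hypothetical gross results per trade, in account currency.
gross_trades = [150.0, -100.0, 90.0, -100.0, 130.0, -100.0, 60.0, -100.0]

gross_mean = mean_pnl(gross_trades)                     # 3.75 per trade before costs
net_mean = mean_pnl(gross_trades, cost_per_trade=5.0)   # -1.25 per trade after costs
```

A system averaging 3.75 per trade before costs turns into one losing 1.25 per trade once a cost of 5.0 per round trip is charged.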
