Development and Evaluation


Once the initial hypothesis has been validated, the next step is to refine the prototype and develop a true working strategy. This process will vary dramatically depending on the type of system being developed. Execution edge and algorithmic strategies, for instance, will require a profound knowledge of electronic financial markets and considerable investment in the technology required to exploit the perceived edge. Moreover, these systems usually provide very thin per-trade profit margins, so understanding trade costs and the impact of potential execution errors is essential.

Longer-term strategies using statistical edges are somewhat more forgiving when it comes to trade costs, as these typically represent a small fraction of the system’s per-trade profit expectancy. But all strategies, irrespective of their timeframe, will require a comprehensive risk-management framework to ensure that all trading costs, fixed and variable, as well as potential equity drawdowns are clearly understood and accounted for in the system design.

Below we look at the various steps involved in converting a prototype into a trading strategy.

Variables and Filters
The validated prototype should already contain the basic logic describing the trading edge. However, additional variables and filters can be tested to see if they further improve the performance of the system. These could include long-term market condition (bullish/bearish), short-term market condition (overbought/oversold), trading volume, price action, plus a multitude of other variables and indicators. The general rule is that the simpler the system, the better, as a multiplication of variables will only serve to reduce the system’s trade count and will often lend itself to curve-fitting. Simple systems with “blunt edges” tend to be more resilient over time, and are considerably easier to trade live.

Commissions represent the variable cost of trading, and these must be accounted for at this stage of the development process. All too often an otherwise sound strategy that appears consistently profitable on paper will be undone by commission costs. This is particularly true of short-term trading systems (scalping, day-trading, etc.). The good news is that fierce competition among discount brokers since the late 1990s has resulted in relatively low fees. Moreover, most brokers offer volume discounts, further lowering per-trade costs.

Slippage is the difference between the last printed ask/bid price and the price at which the order is actually filled. Slippage is arguably the one cost with the greatest potential impact on a strategy’s profitability. Low volume instruments are the most likely to suffer from slippage, but even those with high volumes and small bid/ask spreads can “slip” considerably in fast-moving markets, particularly when prices are heading south. Most development platforms offer a slippage variable that can be used when backtesting a strategy. Using a conservative (i.e. large) value will help gauge the robustness of the system during turbulent market conditions.
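To make the effect of commissions and slippage concrete, the sketch below nets both out of a single long round-trip trade and then repeats the calculation with progressively larger slippage allowances. The function name, the flat per-share cost model, and all dollar figures are illustrative assumptions, not values from any particular broker or development platform.

```python
def net_trade_pnl(entry_price, exit_price, shares,
                  commission_per_share=0.005, slippage_per_share=0.01):
    """Net P&L of a long round trip, charging costs on both fills.

    Assumed cost model (hypothetical): a flat per-share commission and a
    flat per-share slippage allowance, each applied at entry and at exit.
    """
    gross = (exit_price - entry_price) * shares
    costs = 2 * shares * (commission_per_share + slippage_per_share)
    return gross - costs


# Stress test: the same 10-cent scalp on 1,000 shares under
# increasingly conservative slippage assumptions.
for slip in (0.00, 0.01, 0.03, 0.05):
    pnl = net_trade_pnl(20.00, 20.10, 1000, slippage_per_share=slip)
    print(f"slippage {slip:.2f}/share -> net P&L {pnl:8.2f}")
```

With these assumed numbers the trade stays profitable at one cent of slippage per share but turns into a loss at five cents, which is the kind of sensitivity a conservative backtest setting is meant to expose.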

Risk management is a set of rules aimed at protecting the trader’s capital. These may include stop losses, position size parameters, maximum drawdown rules, limits on the number of trades per day, and so on. Risk management should also describe how the new strategy will interact with other production strategies (capital allocation, directional conflicts, etc.), particularly if the strategy is intended to be traded automatically without human intervention.
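As one illustration of a position size parameter, the sketch below uses fixed-fractional sizing, where each trade risks a fixed fraction of account equity between the entry price and the stop. This is only one of many possible sizing rules; the function name and the 1% risk figure are assumptions made for the example.

```python
def position_size(equity, risk_fraction, entry_price, stop_price):
    """Whole number of shares such that a stop-out loses about
    `risk_fraction` of equity (ignoring slippage and commissions)."""
    risk_per_share = abs(entry_price - stop_price)
    if risk_per_share == 0:
        raise ValueError("entry and stop prices must differ")
    return int((equity * risk_fraction) // risk_per_share)


# Risking 1% of a 100,000 account with a 2-point stop distance.
shares = position_size(100_000, 0.01, entry_price=50.0, stop_price=48.0)
print(shares)
```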

With all the parameters in place – system logic, variables & filters, commission costs, allowance for slippage – the strategy can then be optimized in search of “optimal edge manifestation”. Optimizations are always performed on the in-sample data (not the out-of-sample data). The objective here is to search for the best set of parameters without compromising the integrity of the system. A fair understanding of statistics, a lot of experience plus some degree of common sense are important to avoid curve-fitting the results.

Out-of-sample test
This should always be done at the very end of the development process, after the strategy has been optimized. At this stage the final strategy is applied to the out-of-sample data – that is, the data that was not used in developing and optimizing the strategy. Only the results of an out-of-sample test can confirm the system’s validity. The performance statistics of the in-sample and out-of-sample tests are then compared. Similar results would suggest that the trading edge was properly captured during the development process and persisted in the out-of-sample test. Poor out-of-sample results would suggest that the trading edge did not withstand the test of time and/or that the system was over-optimized.
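The workflow described above (optimize on the in-sample data, then run the frozen parameters once on the out-of-sample data) can be sketched as follows. The toy momentum rule, the 70/30 chronological split, and all function names are hypothetical stand-ins chosen to keep the example short; they are not the strategy or the split used in this text.

```python
def chronological_split(series, in_sample_fraction=0.7):
    """Split a time-ordered series into in-sample and out-of-sample parts."""
    cut = round(len(series) * in_sample_fraction)
    return series[:cut], series[cut:]


def momentum_pnl(prices, lookback):
    """Toy rule: hold one unit long over the next bar whenever the current
    price is above the price `lookback` bars ago. Returns total P&L."""
    pnl = 0.0
    for i in range(lookback, len(prices) - 1):
        if prices[i] > prices[i - lookback]:
            pnl += prices[i + 1] - prices[i]
    return pnl


def optimize_then_validate(prices, lookbacks, in_sample_fraction=0.7):
    """Pick the best lookback on in-sample data only, then evaluate that
    single frozen choice on the untouched out-of-sample data."""
    in_sample, out_sample = chronological_split(prices, in_sample_fraction)
    best = max(lookbacks, key=lambda n: momentum_pnl(in_sample, n))
    return best, momentum_pnl(in_sample, best), momentum_pnl(out_sample, best)


prices = [100 + i for i in range(20)]  # synthetic, steadily rising series
best, in_pnl, out_pnl = optimize_then_validate(prices, lookbacks=[2, 3])
print(best, in_pnl, out_pnl)
```

The important design point is that the out-of-sample slice is never seen by the optimizer: it is touched exactly once, after the parameter has been fixed.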

Peer review
Traders and system developers tend to be very secretive and rarely share their strategies, particularly the profitable ones. Still, it is very easy to get caught up in one’s own theories to the point of not seeing the forest for the trees. It is therefore helpful for strategy development to be done in teams so that others may provide input at all stages of the process.


Below we look at the key measures to focus on when evaluating the performance results of a trading strategy.

Trade Count
The higher the trade count, the greater the statistical significance of the results. Ideally a study should contain a large number of instances. However, certain systems (particularly seasonal ones) only trigger rarely and should therefore not be discounted purely based on low trade count.
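One rough way to see why trade count matters is to put a confidence interval around the observed win rate. The sketch below uses the normal approximation to the binomial, which is itself an assumption (it treats trades as independent and is crude for small samples); the function name and the sample figures are illustrative.

```python
import math


def win_rate_interval(wins, trades, z=1.96):
    """Approximate 95% confidence interval for the true win rate,
    using the normal approximation to the binomial distribution."""
    p = wins / trades
    half_width = z * math.sqrt(p * (1.0 - p) / trades)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# The same 60% observed win rate, measured over 30 and over 3,000 trades.
for wins, trades in ((18, 30), (1800, 3000)):
    low, high = win_rate_interval(wins, trades)
    print(f"{trades:5d} trades: win rate between {low:.3f} and {high:.3f}")
```

Under these assumptions the 30-trade interval is wide enough to include a 50% win rate, while the 3,000-trade interval is not.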

Percent Profitable (Win Rate)
Many traders prefer a high win rate, as it means that the maximum number of consecutive losers (and the associated max drawdown) is likely to be contained. But a low win rate coupled with a high ratio of average win to average loss can still yield a profitable trading system, so win rate must be viewed as only one of many performance measures.
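The interaction between win rate and the size of wins and losses can be illustrated with a simple per-trade expectancy calculation. This is a minimal sketch; the function name and the sample figures are assumptions chosen for illustration.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected profit per trade. `avg_loss` is given as a positive amount."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# A low win rate with large winners versus a high win rate with large losers.
print(expectancy(0.35, avg_win=300.0, avg_loss=100.0))
print(expectancy(0.80, avg_win=50.0, avg_loss=250.0))
```

In this example the 35% system has a positive expectancy while the 80% system has a negative one, which is why win rate cannot be read in isolation.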

Profit Factor (PF)
A system’s profit factor is defined as gross profits / gross losses. A number higher than 1 (in a long system) indicates a winning system, a number below 1 a losing system. The higher the PF, the better. Consistency of PF figures during the in-sample, out-of-sample and walk-forward periods is essential as this tends to confirm the viability of the system over time.
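The definition above translates directly into code. The sketch below computes PF from a list of per-trade results and compares it across test periods; the period labels and trade values are made-up sample data, and returning infinity when there are no losing trades is a convention assumed here.

```python
def profit_factor(pnls):
    """Gross profits divided by gross losses for a list of trade P&Ls."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # assumed convention: no losing trades
    return gross_profit / gross_loss


# Hypothetical per-trade results for two test periods.
periods = {
    "in-sample": [100, -50, 200, -100],
    "out-of-sample": [80, -60, 90, -40],
}
for name, pnls in periods.items():
    print(f"{name}: PF = {profit_factor(pnls):.2f}")
```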

Average Trade Net Profit (ATNP)
This is the average amount of money won or lost for all completed trades. Typically systems with short holding periods will generate low ATNP numbers while those with longer holding periods will generate higher ATNP numbers. This often overlooked piece of data is key in assessing the profitability of the system when real-life commission costs and slippage are added to the analysis.
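A short sketch shows how a thin ATNP can disappear once a per-trade cost allowance is applied. The function names, the trade results, and the cost figure are all assumed for illustration.

```python
def average_trade_net_profit(pnls):
    """Mean P&L over all completed trades."""
    return sum(pnls) / len(pnls)


def atnp_after_costs(pnls, cost_per_trade):
    """ATNP after subtracting an assumed round-trip cost (commission plus
    slippage) from every trade."""
    return average_trade_net_profit(pnls) - cost_per_trade


# A short-holding-period system with a small average trade.
pnls = [12, -8, 15, -5, 6]
print(average_trade_net_profit(pnls))
print(atnp_after_costs(pnls, cost_per_trade=5))
```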

Ratio of Average Win to Average Loss
Generally the higher this number the better, but profitable systems with very high win rates may sometimes yield low average win/loss ratios.
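The trade-off between this ratio and win rate can be made explicit: a system with ratio R of average win to average loss breaks even (before costs) at a win rate of 1 / (1 + R). The sketch below evaluates that relation for a few ratios; the function name is an assumption.

```python
def breakeven_win_rate(win_loss_ratio):
    """Win rate at which expectancy is zero, before trading costs.

    Follows from p * W = (1 - p) * L with R = W / L, so p = 1 / (1 + R).
    """
    return 1.0 / (1.0 + win_loss_ratio)


for ratio in (0.5, 1.0, 3.0):
    print(f"ratio {ratio:.1f} -> breakeven win rate {breakeven_win_rate(ratio):.1%}")
```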

Max Consecutive Winning & Losing Trades
In the absence of an equity curve, this information offers a view into the consistency of the system over time.
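Counting the longest winning and losing streaks takes a single pass over the trade list. In this sketch, treating a break-even trade as ending both streaks is an assumption; platforms may handle that case differently.

```python
def max_streaks(pnls):
    """Return (max consecutive winners, max consecutive losers)."""
    max_wins = max_losses = wins = losses = 0
    for p in pnls:
        if p > 0:
            wins, losses = wins + 1, 0
        elif p < 0:
            wins, losses = 0, losses + 1
        else:
            wins = losses = 0  # assumed: a break-even trade ends both streaks
        max_wins = max(max_wins, wins)
        max_losses = max(max_losses, losses)
    return max_wins, max_losses


print(max_streaks([1, 2, -1, -1, -1, 3, 0, 4]))
```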

Maximum Drawdown (max DD)
A number of drawdown measures (peak-to-valley, trade-close to trade-close, etc.) show the system’s realized and unrealized losses during the study period. All investors seek to minimize drawdown numbers, but some systems (particularly those that step into extreme oversold scenarios) tend to suffer from relatively high DD numbers. For these, adequate position sizing, rather than the use of stop losses, is arguably the best way to manage risk.
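A minimal sketch of a peak-to-valley drawdown calculation on closed-trade results is shown below. Because it only sees the equity at each trade close, it does not capture unrealized losses inside a trade; the function name and sample trades are assumptions.

```python
def max_drawdown(pnls, starting_equity=0.0):
    """Largest peak-to-valley decline of the closed-trade equity curve."""
    equity = peak = starting_equity
    worst = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)           # highest equity seen so far
        worst = max(worst, peak - equity)  # deepest decline from that peak
    return worst


print(max_drawdown([100, -30, -50, 40, 200, -120]))
```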

Sharpe Ratio & RINA Index
There exist a number of industry-standard system evaluation measures, including Sharpe and RINA, but only quantitative analysts who fully understand the way these statistics are generated should make use of them.

Intuitiveness of Study
The best studies are those where the forces driving the bias, be they bullish or bearish, are clearly understood. This provides confidence in the system and motivates the trader to apply it systematically despite the occasional and inevitable string of losing trades. Systems generated using “brute-force” methods (i.e. not in support of an initial hypothesis) will often disappoint when put into production.

Integrity of the Development Process
There should be no shortcuts in the development process of a trading system. Over-optimization and curve-fitting are pitfalls that can only be avoided by following a standardized development process that uses in-sample, out-of-sample and walk-forward analysis.

Equity Curve
Most seasoned developers use the equity curve to make their final assessment of the overall health of a trading system. Eyeballing the curve, looking for a consistent slope over time and a lack of major outliers, is for many the best way to gauge the overall viability of a system.
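Eyeballing remains a judgment call, but one rough numeric companion is to fit a straight line to the equity curve and look at its slope and R-squared: a steady curve fits a line well, while a curve dominated by one outlier trade does not. This is a sketch of one possible proxy, not a standard measure from this text; the function names and sample data are assumptions.

```python
def equity_curve(pnls, starting_equity=0.0):
    """Cumulative closed-trade equity, starting from `starting_equity`."""
    curve = [starting_equity]
    for p in pnls:
        curve.append(curve[-1] + p)
    return curve


def linear_fit(curve):
    """Least-squares slope and R-squared of equity against trade number.
    Requires at least two points."""
    n = len(curve)
    mean_x = (n - 1) / 2
    mean_y = sum(curve) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(curve))
    syy = sum((y - mean_y) ** 2 for y in curve)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


steady = equity_curve([10] * 10)        # same profit on every trade
lumpy = equity_curve([0] * 9 + [100])   # all profit from a single outlier
print(linear_fit(steady))
print(linear_fit(lumpy))
```

Both sample curves end at the same total profit, yet the second fits a straight line far less well, which mirrors what the eye picks up as an outlier-driven result.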