Zacks

Professional Services


 


Besides look-ahead and survivorship bias, what are the most serious bias issues one should be aware of while backtesting historical financial data?
1. Data Mining

Classic data mining involves trying multiple models or variables until something seems to "fit" the data. Stepwise regression, pattern recognition, and neural networks are often used in data mining expeditions. However, simply running many analyses with different variables or combinations of variables has the same effect. Models developed by data mining frequently fit past data very well but have poor predictive quality outside the time span used in deriving and fitting the model.

A single test with a t-statistic that is significant at the 5% level means that, if there were no underlying relationship, a result at least this strong would still arise by chance about 1 time in 20.

The flip side of this is that if you run 100 tests of variables with no underlying relationship, about five can be expected to have t-statistics that appear significant at the 5% level.
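The multiple-testing effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "returns" and the candidate "signals" are independent random noise, the function names are hypothetical, and the critical value of 1.97 is an approximate two-sided 5% cutoff for roughly 250 observations.

```python
import math
import random


def t_stat_of_correlation(x, y):
    """t-statistic for the null hypothesis that corr(x, y) is zero."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def count_spurious_signals(n_tests=100, n_obs=250, t_crit=1.97, seed=0):
    """Count pure-noise variables that look significant at about the 5% level.

    Assumption: t_crit=1.97 approximates the two-sided 5% critical value
    for n_obs - 2 degrees of freedom when n_obs is around 250.
    """
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_obs)]  # simulated returns
    hits = 0
    for _ in range(n_tests):
        # A candidate "signal" that has no relationship to the returns.
        noise = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(t_stat_of_correlation(noise, returns)) > t_crit:
            hits += 1
    return hits


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent null tests passes."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    print("spurious 'significant' variables:", count_spurious_signals())
    print("P(at least one in 100 tests):",
          round(prob_at_least_one_false_positive(100), 3))  # 0.994
```

With 100 independent tests, the chance that at least one looks significant purely by accident is about 99%, so finding a "significant" variable after an extensive search is weak evidence on its own.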

2. Transaction Costs

If a strategy had actually been used in a substantive way, some stocks' prices would have been different from the prices that were recorded. How much buying or selling pressure would have eliminated the inefficiency is unknowable, but it might have been quite small in small- or mid-cap stocks. Investors typically underestimate price impact, which accounts for a large part of transaction costs. A good summary of the issues appeared in the Wall Street Journal on June 9, 1997: "Trading Costs Rising Along With the Market."
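As a rough illustration of how much trading costs can matter, the sketch below subtracts an assumed cost from a backtested gross return. Everything here is hypothetical: the function names, the 12% gross return, the 200% annual turnover, and the 1% one-way cost (commission, spread, and price impact combined) are illustrative inputs, not figures from the article, and a flat per-trade cost is a simplification of real price impact.

```python
def net_annual_return(gross_return, annual_turnover, one_way_cost):
    """Backtested return after a flat, assumed trading cost.

    annual_turnover is the fraction of the portfolio replaced per year;
    each replacement involves one sale and one purchase, hence the factor 2.
    """
    return gross_return - 2 * annual_turnover * one_way_cost


def breakeven_one_way_cost(gross_return, annual_turnover):
    """One-way cost at which the backtested gross return is fully consumed."""
    return gross_return / (2 * annual_turnover)


if __name__ == "__main__":
    # Hypothetical inputs: 12% gross return, 200% turnover, 1% one-way cost.
    net = net_annual_return(0.12, 2.0, 0.01)
    print(f"net return: {net:.2%}")
    print(f"breakeven one-way cost: {breakeven_one_way_cost(0.12, 2.0):.2%}")
```

Under these assumed inputs, a third of the backtested return goes to trading costs, and underestimating the one-way cost by a couple of percentage points would erase the strategy's edge entirely.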

 

You can E-mail your questions to: comments@zacks.com

 

 

Copyright © 2001 Zacks Investment Research, Inc.