Zacks


What are the most common conventions for setting up an effective backtesting database?

In a typical backtest project, up to 80% of the time is spent designing and building the backtest database. The backtest runs themselves are simple if the up-front work is done well.

While everyone has different variables that are important to them and different buy and sell criteria, most backtest databases share similar conventions. The following discussion is by no means exhaustive, but it touches on a range of the more common ones.

Work within a custom database. Zacks ZBT_PRI.DBS contains many of the data items in which a user is interested and is a good universe of stocks with which to begin. However, most often, there are custom items that the user will want to create. We discourage the practice of writing custom items to the end of a production database. The custom database can contain the items and time series specific to the user's individual needs.

Start with a large, inclusive universe. For a backtest to be run without survivor bias, the database must include research companies. Of the 6800+ companies available in ZBT_PRI, roughly 1/3 are research companies. Often a user will have initial screening criteria, for example market capitalization or S&P 500 membership. The universe needs to be large enough to allow the selection of a representative sample of the companies that would have passed the screening criteria at any point in time. The universe must also include the benchmark that is to be used.
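To make the survivor bias point concrete, here is a minimal sketch of a point-in-time universe filter. It assumes that "research companies" are companies that are no longer actively traded, and the record layout (`listed`, `delisted`) and the tickers are hypothetical, not fields of ZBT_PRI.

```python
from datetime import date

# Hypothetical records; "BBB" stands in for a company that later stopped trading.
companies = [
    {"ticker": "AAA", "listed": date(1990, 1, 31), "delisted": None},
    {"ticker": "BBB", "listed": date(1992, 6, 30), "delisted": date(1997, 3, 31)},
    {"ticker": "CCC", "listed": date(1998, 1, 31), "delisted": None},
]


def universe_as_of(companies, as_of):
    """Return tickers that were trading on `as_of`, including ones that died later."""
    return [
        c["ticker"]
        for c in companies
        if c["listed"] <= as_of
        and (c["delisted"] is None or c["delisted"] >= as_of)
    ]


# A 1995 screen must still see BBB, even though it is gone by 1999.
universe_1995 = universe_as_of(companies, date(1995, 12, 31))
universe_1999 = universe_as_of(companies, date(1999, 12, 31))
```

Dropping BBB from the database would leave the 1995 universe with only the survivors, which is exactly the bias the paragraph above warns about.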

Include key data items. Include CUSIPs. CUSIPs can be extremely useful if data is being brought into the custom database from an outside source. Because tickers may be reused, linkage can become a problem that CUSIPs can help to overcome.
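The linkage problem can be sketched as follows. The identifiers, company names, and the `score` field are made-up placeholders for illustration, not real CUSIPs or Zacks data items.

```python
# Two different companies have used the ticker "XYZ" at different times,
# so the ticker alone cannot tell the outside rows apart.
internal = [
    {"cusip": "111111111", "ticker": "XYZ", "name": "Old XYZ Corp"},
    {"cusip": "222222222", "ticker": "XYZ", "name": "New XYZ Inc"},
]
external = [
    {"cusip": "111111111", "score": 0.4},
    {"cusip": "222222222", "score": 0.9},
]


def link_by_cusip(internal, external):
    """Attach each external score to the internal row with the same CUSIP."""
    by_cusip = {row["cusip"]: row for row in external}
    linked = []
    for row in internal:
        match = by_cusip.get(row["cusip"])
        if match is not None:
            linked.append({**row, "score": match["score"]})
    return linked


linked = link_by_cusip(internal, external)
```

Joining on ticker here would give both companies the same outside value, or whichever row happened to come last.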

Holding period returns, prices, shares outstanding and dividends are critical. Obviously, holding period returns (HPRs) are necessary in a backtest database in order to measure performance. ZBT_PRI stores monthly HPRs. The frequency of each item needs to match the frequency of the test: weekly HPRs for a weekly backtest, monthly market cap for a monthly test, and so on.
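As a rough illustration, the sketch below computes a holding period return from prices and dividends and compounds monthly returns into a longer period. It assumes the standard textbook definition of HPR; the exact calculation behind the HPR item in ZBT_PRI is not described here, and the function names are hypothetical.

```python
def holding_period_return(start_price, end_price, dividends=0.0):
    """Total return over one period: price change plus dividends, over start price."""
    if start_price <= 0:
        raise ValueError("start_price must be positive")
    return (end_price - start_price + dividends) / start_price


def compound(returns):
    """Chain per-period returns (e.g. three monthly HPRs) into one longer-period return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


# One month: bought at 50.00, ended at 52.00, received 0.50 in dividends.
monthly_hpr = holding_period_return(50.0, 52.0, 0.5)

# Compounding is multiplicative: +10% followed by -10% is a small loss, not zero.
two_month_hpr = compound([0.10, -0.10])
```

Compounding is how monthly HPRs can be rolled up to match a quarterly test; going the other way, from monthly data to a weekly test, is not possible without higher-frequency source data.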

Consider an appropriate time series and frequency of data items. While seemingly simple, these areas cause many headaches. We'll address the issue of time series first. Enough data must be included in the database to cover the length of the test. Obvious, right? Remember, however, that if a user is calculating a 5-year moving average of anything and the test is a 10-year backtest, the user needs AT LEAST 15 years of underlying data, which will be transformed to provide 10 years of 5-year moving average data.

There is also the consideration of lagging data in order to avoid look-ahead bias. It is appropriate to lag many kinds of data by 1 quarter, 1 or 2 months, and so on. Earnings for 12/31/99, while stored in the 12/31/99 time slot, are not truly known on that date. More realistically, they may not have been reported until late January or February 2000. The data needs to be lagged in these cases to provide accurate backtesting results. Take that one step further: if underlying data needs to be calculated and lagged, it too needs more periods available than just what it takes to cover the test period.
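The arithmetic above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes all lengths are expressed in the same unit (months here) and that a lag is applied as a simple shift of the series by whole periods.

```python
def required_periods(test_periods, window_periods=0, lag_periods=0):
    """History needed so a lagged, windowed item covers the whole test."""
    return test_periods + window_periods + lag_periods


def lag_series(values, periods):
    """Shift a series so slot t holds the value originally stored at slot t - periods."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    values = list(values)
    kept = values[: max(len(values) - periods, 0)]
    # Slots with no known value yet are marked None rather than filled.
    return [None] * (len(values) - len(kept)) + kept


# 10-year monthly test with a 5-year moving average: 15 years of monthly data.
months_needed = required_periods(120, window_periods=60)

# Add a 2-month reporting lag and the requirement grows again.
months_needed_lagged = required_periods(120, window_periods=60, lag_periods=2)

# Quarterly earnings lagged one quarter: the first slot has nothing known yet.
lagged_eps = lag_series([1.00, 1.10, 1.25, 1.30], 1)
```

Leaving the leading slots as `None` makes the shortfall visible: a test that starts in those slots has no legitimately known data to work with.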

Two final points for your consideration. One, there are many right ways of doing things. Two, it is never too late to teach an old backtester new tricks. Much of what we do is by trial and error, and nothing beats hands-on experience as a teacher. You are welcome and encouraged to share tricks that you've learned over time.

 

You can E-mail your questions to: comments@zacks.com

 

 



Copyright © 2001
Zacks Investment Research, Inc.