What
are the most common conventions for setting up an
effective backtesting database?
In a typical
backtest project, up to 80% of the time is spent in
designing and building the backtest database to be
used. The actual backtest runs are simple endeavors
if the up-front work is done well.
While everyone
has different variables that are important to them
and different buy and sell criteria, similar
conventions run through almost all backtest
databases. The following discussion certainly does
not exhaust the list, but touches on a range of the
more common conventions.
Work within a
custom database. Zacks ZBT_PRI.DBS contains many of
the data items in which a user is interested and is
a good universe of stocks with which to begin.
However, most often, there are custom items that
the user will want to create. We discourage the
practice of writing custom items to the end of a
production database. The custom database can
contain the items and time series specific to the
user's individual needs.
Start with a
large, inclusive universe. For a backtest to be run
without survivor bias, the database must include
research companies. In ZBT_PRI, of the 6800+
available companies, roughly 1/3 of the universe is
made up of research companies. Often a user will
have initial screening criteria; for example,
market capitalization or S&P500 membership. The
universe needs to be large enough to allow the
selection of a representative sample of those
companies that would have passed the screening
criteria at any point in time. The universe must
also include the benchmark that is to be
used.
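As a rough illustration of why the universe has to include
companies that later dropped out, the sketch below applies a
market-cap screen at two dates. The tickers, dates and values
are made up, and the code assumes that a "research company" is
one that is no longer active; none of this reflects the actual
layout of ZBT_PRI.

```python
# Hypothetical market caps ($ millions) by month-end.
# None means the company was not trading at that date.
market_cap = {
    "AAA": {"1999-12": 1200, "2000-12": 1500},
    "BBB": {"1999-12": 900, "2000-12": None},  # assumed research company
    "CCC": {"1999-12": 300, "2000-12": 800},
}

def passes_screen(date, min_cap):
    """Return the tickers whose market cap met the minimum on `date`."""
    return sorted(
        ticker
        for ticker, caps in market_cap.items()
        if caps.get(date) is not None and caps[date] >= min_cap
    )

print(passes_screen("1999-12", 500))  # ['AAA', 'BBB']
print(passes_screen("2000-12", 500))  # ['AAA', 'CCC']
```

If BBB were left out of the universe because it is inactive
today, the 1999 screen would silently lose a company that
really did pass at that time.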
Include key data
items. Include CUSIPs. CUSIPs can be extremely
useful if data is being brought into the custom
database from an outside source. Because tickers
may be reused, linking records by ticker alone can
go wrong; CUSIPs help to overcome that problem.
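The linkage problem can be sketched in a few lines. The records,
field names and CUSIP values below are placeholders invented for
the example, and link_by_cusip is a hypothetical helper, not a
Zacks function.

```python
# Two different companies that used the same ticker at different times.
custom_db = [
    {"cusip": "111111111", "ticker": "ABC", "name": "Old ABC Corp"},
    {"cusip": "222222222", "ticker": "ABC", "name": "New ABC Inc"},
]

# Data from an outside source, intended for the newer company only.
outside_data = [
    {"cusip": "222222222", "ticker": "ABC", "custom_score": 7.5},
]

def link_by_cusip(db_rows, outside_rows, item):
    """Attach `item` from outside_rows to db_rows, matching on CUSIP only."""
    lookup = {row["cusip"]: row[item] for row in outside_rows}
    return [dict(row, **{item: lookup.get(row["cusip"])}) for row in db_rows]

linked = link_by_cusip(custom_db, outside_data, "custom_score")
for row in linked:
    print(row["name"], row["custom_score"])
```

Matching on ticker instead would have attached the score to both
companies, since both carry the ticker ABC.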
Holding period
returns, prices, shares outstanding and dividends
are critical. Obviously, holding period returns
(HPRs) are necessary in a backtest database in
order to measure performance. ZBT_PRI stores monthly
HPRs. The frequency of each item needs to match the
test: weekly HPRs for a weekly backtest, monthly
market cap for a monthly test, and so on.
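When returns are stored at a higher frequency than the test
requires, they can be compounded down to the test frequency. The
sketch below assumes HPRs are held as decimals (0.02 for 2%);
if they are stored as percentages, divide by 100 first. The
function name is illustrative only.

```python
def compound_hpr(returns):
    """Compound a sequence of periodic returns into one holding period return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0

# Three hypothetical monthly HPRs combined into one quarterly HPR.
monthly = [0.02, -0.01, 0.03]
quarterly = compound_hpr(monthly)
print(round(quarterly, 6))  # 0.040094
```

Note that this only works from higher frequency to lower; monthly
HPRs cannot be split back into weekly HPRs, so a weekly backtest
needs weekly data in the database.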
Consider an
appropriate time series and frequency of data
items. While seemingly simple, lots of headaches
occur around these areas. We'll address the issue
of time series first. Enough data must be included
in the database to cover the length of the test.
Obvious, right? Remember, however, that if a user
is calculating a 5-year moving average of anything
and the test is a 10-year backtest, the user needs
AT LEAST 15 years of underlying data that will be
transformed to actually provide 10 years of 5-year
moving average data. There is also the
consideration of lagging data in order to avoid
look-ahead bias. It is appropriate to lag many
kinds of data by 1 quarter, 1 or 2 months, and so
on. Earnings for 12/31/99, while stored in the
12/31/99 time slot, are not truly known on that
date. More realistically, they may not have been
reported until late January or February 2000. The
data needs to be lagged in these cases to provide
accurate backtesting results. Take that one step
further. If underlying data needs to be calculated
and lagged, it too needs to have more periods
available than just what it takes to cover the test
period.
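The arithmetic above can be written down as a small sketch. Both
helpers are hypothetical and only illustrate the idea: the first
adds the lookback window and any lag to the test length, and the
second shifts a series so that each time slot only holds data
that would already have been known.

```python
def required_years(test_years, lookback_years, lag_months=0):
    """Minimum years of raw data: test period + lookback window + lag."""
    return test_years + lookback_years + lag_months / 12

def lag_series(values, periods):
    """Shift a series so slot t holds the value from slot t - periods."""
    values = list(values)
    periods = max(0, min(periods, len(values)))
    return [None] * periods + values[:len(values) - periods]

print(required_years(10, 5))                # 15.0
print(required_years(10, 5, lag_months=3))  # 15.25

# Hypothetical quarterly earnings, lagged by one quarter.
quarterly_eps = [0.50, 0.55, 0.60, 0.65]
print(lag_series(quarterly_eps, 1))         # [None, 0.5, 0.55, 0.6]
```

The leading None in the lagged series is the reason extra periods
are needed: the first slot has nothing to draw on unless the
database reaches back one more quarter.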
There are two
items for your consideration: One, there are many
right ways of doing things. Two, it is never too
late to teach an old backtester new tricks. Much of
what we do is by trial and error. Nothing beats
hands-on experience as a teacher. You are welcome
and encouraged to share tricks that you've learned
over time.
You can E-mail your questions to:
comments@zacks.com