Composing Synthetic Prices For Extended Historical ETF Data

Backtesting of strategies is best done with at least 10+ years of historical price data, preferably even more. However, ETF's with such prolonged historical data are rare. The obvious escape would be to replace ETF's with look-a-likes from mutual fund companies like Vanguard or Fidelity. Can't we have both ways at the same time?


Actually we can, thanks to Michael Kapler's excellent Systematic Investor Toolbox for R: SIT. Some time ago Michael added a special function to his SIT: getSymbols.extra. Using "R", the language for statistical computing and graphics (get it here), the code composes synthetic ETF's with extended historical data retrieved from suitable mutual funds on full auto while taking care of a seamless fit along the way. With a minor addition to Michael's code, in one and the same run for each ETF the newly created synthetic prices can be saved to CSV-file. Next, with its Import Wizard, these CSV-files can be imported into AmiBroker's database quite easy. Alternatively, pulling the CSV-data into Excel is another achievable option.

CSV-data after import in Excel

The chart below shows the extended price history of $VWO. First the data of EEM is added and calibrated to fit and extend VWO. Next the same is done with the addition of VEIEX' data. Finally the data of FEMKX is added, thus prolongating VWO's history from 2005 back into 1990 adding almost 15 years of backtestable data.

$VWO: VWO extended with EEM extended with VEIEX extended with FEMKX

How many synthetic ETF's are necessary is for anyone to decide, but to demonstrate the batch process for 8 ETF's the following composition is used:

Quartz output by R

Using Michael's getSymbols.extra code as foundation, only one extra line had to be added to save the extended price series for each ticker to a separate CSV-file. For the sole purpose of distinction between the native ETF and its synthetic extended couterpart, a $-sign is added as prefix to indicate the synthetic nature of the ticker ($ETF.csv).

After importing the CSV-files into AmiBroker's database, the program allows to use the synthetic data just like any other price data imported by AmiQuote, but of course AmiQuote is not capable of keeping these synthetic tickers up-to-date. So for updates, say quarterly, R has to be re-engaged. Remember to create separate tls-files for the newly composed synthetic tickers too.


Only to demonstrate a backtest over 20 years of data (1994 - 2013) with the following 5 synthetic tickers:
$MDY, $EFA, $VWO, $IEF and $TLT.
Rotation into the best performing asset over the past 65 days with a 15 day price smoothing for momentum calculation.




For an example universe consisting of 40 synthetic assets, a collection of ETF's with suitable mutual funds is shown below. On my platform R manages to download, assemble and save this universe, involving nearly 100 underlying tickers with way more than 1,000 years of combined price data, in under 2 (!) minutes. The same is true for the import into AmiBroker. So all work can be done within no more than 5 minutes. The composition of the above N40 universe is largely based on Appendix B from a paper by Wouter Keller and Hugo van Putten: Tactical MPT and Momentum: the Modern Asset Allocation (MAA).

Finally the complete code, in this case for the example universe with 8 synthetic tickers, is provided below for copy/paste.
NB! When modifying the ticker list be sure to end each ticker line with a comma except for the final ticker line.

Please share your thoughts in the comment section. And do feel free to add new ticker combinations.

Updates:
LEXMX added to the composition for DBC; FEMKX added to the composition for VWO with updated R generated Quartz chart.
Import into Excel example added.