Introduction to Bootstrapping with R

Bootstrapping is a process where your original data is re-sampled to produce a distribution of a single statistic such as the mean. The Wikipedia article on bootstrapping has more information.

Since time-series data often have some amount of dependency from one period to the next, we re-sample the original data in blocks. Assuming your data set has 100 observations and the block size is 6, we start at a random observation and sample the next 6 consecutive observations. We then start at another random observation and sample another 6 consecutive observations, repeating this process until we have 100 observations. That set of 100 observations is called a replicate. (When n/block size is not an integer, we get a shorter block at the end.) The statistic (in this case the mean) is calculated for each replicate, and the statistics from many replicates form a distribution from which we can draw inferences about the data.

Since the function tsboot only returns the bootstrapped values of the statistic, you will need to recover the actual data in each replicate if you want others to be able to reproduce your results. The resampled data is not stored in the output, but the indices of the original observations that make up each replicate can be obtained with the boot.array command.

Before you use your newly generated data, you will want to check whether the settings you used, such as the number of replicates and the block size, are adequate to capture the statistic of interest. A simple plot of the tsboot output will do the trick. For a more in-depth analysis of the proper block size, check out the b.star function in the np package and the research by Politis and White (2004).

When you tire of calculating basic statistics, tsboot will accept a custom function in place of a built-in one such as "mean" to calculate the statistic. The December 2002 edition of R News has a good overview of the boot package.
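To make the block re-sampling idea concrete, here is a minimal sketch in base R of how one replicate could be assembled by hand. The helper name block_indices is hypothetical and made up for illustration. This is not how tsboot works internally: tsboot has its own handling of blocks that run past the end of the series, which this sketch avoids by only choosing starting points where a full block fits.

```r
# Hypothetical helper (not part of the boot package): build the indices for
# one replicate out of randomly placed blocks of l consecutive observations.
block_indices <- function(n, l) {
  n_blocks <- ceiling(n / l)
  # Random starting points, restricted so every block fits inside the series
  starts <- sample.int(n - l + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + l - 1)))
  # Trim to n observations, so the last block is shorter when n / l
  # is not an integer
  idx[seq_len(n)]
}

set.seed(1)
x <- rnorm(100)
idx <- block_indices(length(x), l = 6)

# The statistic for this single replicate
replicate_mean <- mean(x[idx])
```

Repeating the last two steps many times and collecting the means would give a rough picture of the bootstrap distribution, which is what tsboot automates.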

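library(boot)
# Simple example: bootstrap the mean of 100 random numbers
x <- rnorm(100)
# R = number of replicates, l = block length, sim = "fixed" for fixed-length blocks
# (R = 10 keeps the example small; use many more replicates in practice)
y <- tsboot(x, mean, R = 10, l = 20, sim = "fixed")
# Histogram and normal Q-Q plot of the bootstrapped statistic
# (index selects which element of the statistic to plot)
plot(y, index = 1)
# Matrix of original-data indices: one row per replicate, one column per position
y.array <- boot.array(y)
# Write the index matrix and the original data to CSV
write.csv(y.array, "/yourdirectory/yourfile1.csv")
write.csv(x, "/yourdirectory/yourfile2.csv")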
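As a follow-up, the sketch below shows one way a custom statistic and the index matrix might be combined. The function name mean_and_sd and the settings R = 200 and l = 20 are assumptions chosen for illustration, not recommendations. It rebuilds each replicate's data from the indices returned by boot.array and checks that recomputing the mean by hand matches what tsboot reported.

```r
library(boot)

# Hypothetical custom statistic: returns two values per replicate
mean_and_sd <- function(series) {
  v <- as.numeric(series)
  c(mean(v), sd(v))
}

set.seed(42)
x <- rnorm(100)
y <- tsboot(x, mean_and_sd, R = 200, l = 20, sim = "fixed")

# One row per replicate; entries are positions in the original data
idx <- boot.array(y)

# Rebuild the resampled data: row r holds the values used in replicate r
replicates <- matrix(x[idx], nrow = nrow(idx))

# Recompute the first statistic (the mean) from the rebuilt data
recomputed_mean <- rowMeans(replicates)
matches <- isTRUE(all.equal(recomputed_mean, as.numeric(y$t[, 1])))
```

With a statistic that returns more than one value, plot(y, index = 2) would show the distribution of the second element, here the standard deviation.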

Author: Probable Pattern

Former Marine and Curious Critter
