Blog

Pastor and Stambaugh Liquidity

While there a many measures of liquidity, the metric in Pastor and Stambaugh (2003) is the standard used in empirical asset pricing. The first thing we need is daily data. Academics can use the CRSP data set but the idea can be replicated with free data. You will need Date, unique stock identifier (I use CUSIP but ticker works just as well), Volume, Return (make sure to include dividends), and the market return (Here I use the CRSP value weighted index but any broad value weighted index should work). Assuming you’ve already formatted the date properly and the data so everything is numeric and not factor or string, we start with a panel I’ve named ‘crsp’. I also recommend setting up a parallel backend to make use of all your CPU cores. Here I use the parallel package but if you have multiple machines be sure to check out doRedis for a relatively simple way to make the poor man’s computer cluster.

require(data.table); require(doParallel);
require(parallel); require(foreach);
registerDoParallel(detectCores()-1) # Keep 1 CPU core for the OS
crsp=fread(paste(path,'Data_Sets/WRDS_CRSP_LiquidityCalc_1993-2013.csv',sep=''),header=TRUE)
setnames(crsp,c('PERMNO','Date','Ticker','CUSIP','Price','Volume','Return','Shares','MktReturn'))
crsp=crsp[Price>5]; crsp[,MktCap:=1000*Shares*Price];
crsp=crsp[MktCap>1000000]
crsp[,Date:=as.Date(as.character(Date),format='%Y%m%d')];
crsp[,Month:=as.yearmon(Date)]; crsp[,Return:=as.numeric(Return)]
#### Create Metrics ##############################################
crsp[,Ret.e:=Return-MktReturn];
crsp[,SVolume:=Volume*(Ret.e/abs(Ret.e))]
crsp[,L.Ret.e:=c(NA,head(Ret.e,-1)),by='CUSIP']
crsp[,L.SVolume:=c(NA,head(SVolume,-1)),by='CUSIP']
crsp=crsp[,list(Date,Month,CUSIP,Ret.e,L.Ret.e,L.SVolume)]
#### Eliminate months with insufficient observations #############
crsp[,Obs:=length(na.omit(L.Ret.e)),by=c("CUSIP","Month")]
crsp=crsp[Obs>10]
#### Run model on daily data for each month ######################
model=Ret.e~L.Ret.e+L.SVolume
gammas=foreach(cusip=unique(crsp[,CUSIP]),.combine='rbind') %dopar% {
 stock=crsp[CUSIP==cusip];
 stock[,Gamma:=coef(lm(model,data=.SD))[3],by=c('Month')]
 stock=unique(stock[,list(Month,CUSIP,Gamma)]); return(stock)
}
save(gammas,file=paste(path,'Panels/Liquidity_Panel.RData',sep=''))

Ocean Gravity

Drift diving is where the diver finds an area with a strong current and plans to ride the current over the course of the dive rather than fight against it. This video uses a combination of drift diving and free diving to give some perspective on the size and power of the ocean relative to a human.

The Econometric Cycle

Use OLS
Quick results but assumptions are probably violated

Use Nonparametric Techniques
Results are not general enough because my sample isn’t big enough or lacks certain characteristics

Everything is a State Space Model!
Too many parameters → I don’t own a quantum computer.

Use GMM
Think more; use less parameters → Thinking is hard.

Use OLS
OLS is pretty robust

VPIN in R

This code is an implementation of Volume Synchronized Probability of Informed Trading by Easley, Lopez de Prado, and O’Hara (2012) published in the Review of Financial Studies. Easley et al. argue that the CDF of VPIN indicates order flow toxicity. This is their explanation of the flash crash in 2010. You can see slides to my R/Finance 2014 presentation on the topic.

This version of the code is not particularly fast and they are plenty of opportunities for a better programmer than me to tune it up for speed.

#### VPIN calculation #########################################################
#install.packages('fasttime',repos='http://www.rforge.net/')
require(data.table); require(fasttime); require(plyr)
# Assuming TAQ data is arranged in 1 year stock csv files
stock=fread('/TAQ_data.csv'); stock=stock[,1:3,with=FALSE]
setnames(stock,colnames(stock),c('DateTime','Price','Volume'));
stock[,DateTime:=paste(paste(substr(DateTime,1,4),substr(DateTime,5,6),
    substr(DateTime,7,8),sep='-'),substr(DateTime,10,17))]
setkey(stock,DateTime);
stock[,DateTime:=fastPOSIXct(DateTime,tz='GMT')]
stock=as.xts(stock)
# Now we have an xts data frame called 'stock' with a DateTime index and... 
# two columns: Price and Volume
# Vbucket=Number of volume buckets in an average volume day (Vbucket=50)
VPIN=function(stock,Vbucket) {
  stock$dP1=diff(stock[,'Price'],lag=1,diff=1,na.pad=TRUE)
  ends=endpoints(stock,'minutes')
  timeDF=period.apply(stock[,'dP1'],INDEX=ends,FUN=sum)
  timeDF$Volume=period.apply(stock[,'Volume'],INDEX=ends,FUN=sum)
  Vbar=mean(period.apply(timeDF[,'Volume'],INDEX=endpoints(timeDF,'days'),
    FUN=sum))/Vbucket
  timeDF$Vfrac=timeDF[,'Volume']/Vbar
  timeDF$CumVfrac=cumsum(timeDF[,'Vfrac'])
  timeDF$Next=(timeDF[,'CumVfrac']-floor(timeDF[,'CumVfrac']))/timeDF[,'Vfrac']
  timeDF[timeDF[,'Next']<1,'Next']=0
  timeDF$Previous=lag(timeDF[,'dP1'])*lag(timeDF[,'Next'])
  timeDF$dP2=(1-timeDF[,'Next'])*timeDF[,'dP1'] + timeDF[,'Previous']
  timeDF$Vtick=floor(timeDF[,'CumVfrac'])
  timeDF[,'Vtick']=timeDF[,'Vtick']-diff(timeDF[,'Vtick']); timeDF[1,'Vtick']=0
  timeDF=as.data.frame(timeDF); timeDF[,'DateTime']=row.names(timeDF)
  timeDF=ddply(as.data.frame(timeDF),.(Vtick),last)
  timeDF=as.xts(timeDF[,c('Volume','dP2','Vtick')],
    order.by=fastPOSIXct(timeDF$DateTime,tz='GMT'))
  timeDF[1,'dP2']=0
  timeDF$sigma=rollapply(timeDF[,'dP2'],Vbucket,sd,fill=NA)
  timeDF$sigma=na.fill(timeDF$sigma,"extend")
  timeDF$Vbuy=Vbar*pnorm(timeDF[,'dP2']/timeDF[,'sigma'])
  timeDF$Vsell=Vbar-timeDF[,'Vbuy']
  timeDF$OI=abs(timeDF[,'Vsell']-timeDF[,'Vbuy'])
  timeDF$VPIN=rollapply(timeDF[,'OI'],Vbucket,sum)/(Vbar*Vbucket)
  timeDF=timeDF[,c('VPIN')]; return(timeDF)
}
out=VPIN(stock,50)
###############################################################################

Here is what the original file looks like:

1993-01-04 09:35:25,10.375,5300,40,0,,N
1993-01-04 09:36:49,10.375,25000,40,0,,N
1993-01-04 09:53:06,10.375,100,40,0,,N
1993-01-04 10:04:13,10.375,200,40,0,,N
1993-01-04 10:04:20,10.375,100,40,0,,N
1993-01-04 10:24:42,10.375,1000,40,0,,N
1993-01-04 10:25:19,10.375,600,40,0,,N
1993-01-04 11:31:04,10.5,10000,40,0,,N
1993-01-04 12:13:09,10.5,200,0,0,,M
1993-01-04 12:13:38,10.5,200,0,0,,M

Stacey Kent

It baffles me why we don’t have more of this style of jazz which is simplistic enough to be musically accessible to everyone but substantial in chord structure and style. This is not the same as smooth jazz or elevator music. Raconte Moi is beautiful but Ces Petits Riens is my favorite.

Emacs and ESS on OS X for Beginners

When I first started learning R, I used R Commander as my GUI since it worked with all my operating systems and was relatively easy to setup and use. After a few months, I came to appreciate the simplicity of the RGui that is standard on R installation. Now, I am ready for something more advanced and have decided to try ESS (Emacs Speaks Statistics). Since I do not have any prior experience with Emacs and have not invested any time creating configuration settings and learning keyboard shortcuts, my main goal was to get a very Mac friendly Emacs version that just worked. OS X does come with Emacs installed but I didn’t want to go through the hassle of setting up ESS with the existing EMacs. I decided to use Aquamacs because it looks and behaves the most like a modern OS X application and has ESS enabled when installed. Emacs is available through the terminal but setting up and using ESS without menus does not provide an easier working environment than the RGUI or R Commander.

Emacs uses what are called meta commands since it originated before toolbars. To execute a meta command, type <Ctrl> u, <alt> x and the actual command. On Aquamacs, the right option key is set up to be the meta key so you can simply press <option> x before the actual command. You can also use the <esc> key as the meta key. To start R, press <option> x R <return> or <esc> x R <return>. You can also open an existing R script and click the R icon to start R. To customize ESS, select Customize Aquamacs -> Customize Group… from the Options menu and type ess at the prompt. You can also click the settings icon only after you have opened an R script. I like to have ESS start from the the same directory every time so I add my directory to the “ESS Directory” option and turn off the “ESS Ask for ESS Directory” option. Make sure to “Save for future sessions” before closing the buffer. I had already deleted R32 so I know that Emacs will default to R64 but I’m not sure how to work with both versions.

Update: I have now switched to RStudio since it is free, cross platform compatible, and simple to setup. I highly recommend it to any R user.

Two-Way Sorting

One of the popular ways to motivate a two dimensional model is to double sort your data based on quantiles. This makes empirical asset pricing models exceedingly easy to understand. If I had already defined the breakpoints of the sort, I could just use a function like “plyr” to apply a function like “mean” to the data subsets. For my application, I want to create five portfolios based on one characteristic and then form five portfolio within each of the first five based on another characteristic. To generalize from a 5×5 sort, I include the parameter q to allow for quarters or deciles. The x and y parameters are the column names of the characteristics you want to sort by. As you can see from the last line, I am sorting by CDS premium and liquidity premium. I use latex for my publication needs so I output the dataframe to xtable. I also use the awesome foreach package to speed things up a tiny bit but if you don’t use a parallel backend with foreach, then just replace “%dopar%” with “%do%”. Happy sorting!

require(foreach); require(xtable)
doublesort=function(data,x,y,q) {
 breaks1=quantile(data[,x],probs=seq(0,1,1/q))
 tiles=foreach(i=1:q,.combine=cbind) %dopar% {
 Q=data[data[,x]>=breaks1[i] & data[,x]<breaks1[i+1],y]
 breaks2=quantile(Q,probs=seq(0,1,1/q))
 foreach(j=1:q,.combine=rbind) %dopar%
  mean(Q[Q>=breaks2[j] & Q<breaks2[j+1]])
 }
 colnames(tiles)=foreach(i=1:q) %do% paste(x,i)
 row.names(tiles)=foreach(i=1:q) %do% paste(y,i)
 tiles
}
xtable(doublesort(data,"CDS","lp",5),digits=4)

Dashboards

Precious Metals Dashboard
This R script produces charts of Gold, Platinum, Palladium, and Silver.

# Charts Gold prices in USD from Oanda
require(Quantmod)
#fetch data
getSymbols(c("XAU/USD","XPT/USD","XPD/USD","XAG/USD"),src="oanda")
#setup the plot area, each TA is another graph
layout(matrix(1:4, nrow=2))
#charts to be included
chartSeries(XAUUSD,name="Gold (.oz) in USD", layout=NULL)
chartSeries(XPTUSD,name="Platinum (.oz) in USD", layout=NULL)
chartSeries(XPDUSD,name="Palladium (.oz) in USD", layout=NULL)
chartSeries(XAGUSD,name="Silver (.oz) in USD", layout=NULL)

Interest Rate Dashboard
Another R script that charts the last 36 months of the following interest rates: 1 year U.S. Treasury, 10 year TIPS, 30 year fixed mortgage, Prime, Moody’s Aaa Corporate, and Moody’s Baa Corporate.

library(quantmod)
getSymbols("DGS1",src="FRED") #1 Year US Treasury Daily Constant Maturity rate
getSymbols("DPRIME",src="FRED") #Daily Prime Loan Rate
getSymbols("DAAA",src="FRED") #Daily Moody's Aaa Corporate Bond Yield
getSymbols("DBAA",src="FRED") #Daily Moody's Baa Corporate Bond Yield
getSymbols("MORTG",src="FRED") #30 Year Conventional Mortgage Rate
getSymbols("FII10",src="FRED") #10 Year TIPS Constant Maturity
# Chart Data
layout(matrix(1:6, nrow=3))
chartSeries(last(DGS1,'36 months'),name="US Treasury 1 Year Constant Maturity",layout=NULL)
chartSeries(last(FII10,'36 months'),name="10 Year TIPS Constant Maturity",layout=NULL)
chartSeries(last(MORTG,'36 months'),name="30 Year Fixed Mortgage Rate",layout=NULL)
chartSeries(last(DPRIME,'36 months'),name="Prime Loan Rate",layout=NULL)
chartSeries(last(DAAA,'36 months'),name="Moody's Aaa Corporate Yield",layout=NULL)
chartSeries(last(DBAA,'36 months'),name="Moody's Baa Corporate Yield",layout=NULL)

Rolling Correlations
Chart rolling correlations with a 3 month window.

library(quantmod)
getSymbols(c("SPY","^DJI"),src="yahoo")
data=data.frame(SPY[,6],DJI[,6])
data=as.xts(data,order.by=as.Date(row.names(data),"%Y-%m-%d"))
c1=rollapply(data,65,cor,by.column=F)
Correlation=c1[,2]; chartSeries(Correlation)

First Database in R

The number of simulations I am now running in R consumes all the memory on my workstation causing it to freeze. While I would like to execute the simulations in smaller batches, I need to keep a record of the intermediate data generated which is currently being stored in arrays. One possible solution would be to generate many CSV files to store the intermediate data and read from those but reassembling the data for audit purposes would not be practical. After some excellent suggestions on Stack Overflow, I narrowed my options down to SQLite and MySQL databases with a preference for SQLite due to simplicity. Not being a command line purist, I thought the next step was to find a GUI for setting up the database. While SQLite Database Browser seemed to be the only FOSS GUI available, it requires the Qt library to run and I prefer not to install additional components on my Mac when possible. MySQL does have a FOSS GUI called MySQL Workbench and Thomas Henlich created a plugin for exporting to SQLite but it turns out that SQLite is easy enough to use without a GUI even if you have no prior database experience. The easiest way to get a handle on it is to use a known data set and learn how to send and retrieve it from SQLite using R code. I highly recommend trying to run the database from disk rather than memory since it provides a closer approximation to the problem you are likely to be solving.

Here’s a short example of reading a data set that is currently stored in a csv format, creating a SQLite database, writing the data into a table, and retrieving the information. Make sure that the RSQLite package is installed and active.

# Read data
dataset=read.csv("/yourdirectory/yourfile.csv",skip=1,header=TRUE)
# Create DB
testdb=dbConnect(SQLite(), dbname = "/yourdirectory/testdb.dat")
# Write the data set into a table called table1
dbWriteTable(testdb,"table1",dataset)
# List the tables available in the database
dbListTables(testdb)
# Remove the original data to free up memory
rm(dataset)
# Retrieve the entire table
dbReadTable(testdb,"table1")
# Retrieve the single column called series from the table
buf<-dbSendQuery(testdb,"select series from table1")
testing<-fetch(buf,n=-1); dbClearResult(buf)
dbDisconnect(testdb); rm(buf,testdb)

SQL queries generate a result that is stored in a buffer. Fetch gets data from the buffer and directs it to the variable within R. Before doing another SQL query on the same data, you need to clear the buffer using dbClearResult. This process gets to be a pain when making many queries but the SQLiteDF package has an excellent method for treating SQLite database tables as data frames. I also recommend reading about the TSSQLite package if you are using time series data and work with zoo objects.