Be Wary of the Data


A new academic paper by Albert Menkveld titled “Shades of Darkness: A Pecking Order of Trading Venues” caught our attention. The paper analyzes how and when orders are routed across trading venues based on cost and immediacy. Here are a few main points from the paper:

– The fragmentation of trading, between exchanges and dark venues as well as across dark venues, is a double-edged sword. It creates a conflict between the efficient interaction among investors and investors’ demand for a diverse set of trading mechanisms.

– We hypothesize that investors “sort” dark and lit venues by the associated costs (price impact, information leakage) and immediacy, in the form of a “pecking order.”

– The top of the pecking order consists of venues with the lowest cost and lowest immediacy, and the bottom of the pecking order consists of venues with the highest cost and highest immediacy. The pecking order hypothesis predicts that, on average, investors prefer searching for liquidity in low-cost, low-immediacy venues but will move towards higher-cost, higher-immediacy venues if their trading needs become more urgent.

– The pecking order hypothesis predicts that, following an upward shock to the VIX or an earnings surprise, trading volumes should migrate from low-cost, low-immediacy venues to high-cost, high-immediacy venues; therefore, the change in a venue’s volume share should become progressively larger the further out it is in the pecking order.

– The data support the pecking order hypothesis. Following an upward shock to the VIX, dark pools that cross orders at the midpoint lose a substantial market share, dark pools that allow some price flexibility lose a moderate market share, and lit venues gain market share. The same pattern is observed following an earnings-surprise shock.

While the Menkveld paper is interesting and raises some good points, it does have one major weakness: it lacks good data.

One of the first things we do when reading a new academic paper is check the bibliography and the source of the data it relies on. Here is how Menkveld describes his data sources:

“Our data sample covers 117 stocks in October 2010 (21 business days). In addition to TAQ trading volumes and National Best Bid and Offer, we use two main proprietary data sources: (1) transactions in five types of dark venues, (2) HFT activity in the NASDAQ main market.” 

Menkveld is referencing data from what is known as the NASDAQ HFT data set. Why do academics continue to rely solely on the outdated NASDAQ HFT data set when they try to measure HFT activity? Because they don’t have access to any other data that identifies the type of trader who executed each trade. Academic papers that rely on only a subset of data (produced four years ago by a for-profit stock exchange) must be questioned.

Not only is the NASDAQ HFT data set old, it is also incomplete, since it identifies only known proprietary HFT firms. Menkveld noted that “we refer to a trade as an HFT trade if at least one side of the trade is an HFT firm.” However, NASDAQ cannot tell when an HFT is sending orders through a broker’s pipes, or when the proprietary trading division of a large broker-dealer is trading in a high-frequency capacity. Brogaard and Hendershott noted this problem in their paper “High Frequency Trading and Price Discovery”:

“One limitation of the data is that NASDAQ cannot identify all HFT. Possible excluded HFT firms are those that also act as brokers for customers and engage in proprietary lower-frequency trading strategies, for example, Goldman Sachs, Morgan Stanley, and other large and integrated firms. HFTs who route their orders through these large integrated firms cannot be clearly identified, so they are also excluded. The 26 HFT firms in the NASDAQ data are best thought of as independent proprietary trading firms.”
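To see why this matters, the labeling rule Menkveld describes can be sketched in a few lines. This is a minimal, hypothetical illustration: the `Trade` structure and the firm identifiers (`HFT_A`, `BROKER_X`, `BANK_Z`, and so on) are placeholders we made up, not NASDAQ’s actual data format or firm list. It simply shows that a rule keyed on the identity the exchange sees will never flag an HFT routing through a broker, or a bank desk trading at high frequency.

```python
from dataclasses import dataclass

# Hypothetical set of firms the exchange can identify as HFT.
# These are placeholders, not the actual 26 firms in the data set.
KNOWN_HFT_IDS = {"HFT_A", "HFT_B"}


@dataclass
class Trade:
    buyer_id: str   # identity the exchange sees on the buy side
    seller_id: str  # identity the exchange sees on the sell side


def is_hft_trade(trade: Trade) -> bool:
    """Flag a trade as HFT if at least one visible side is a known HFT firm."""
    return trade.buyer_id in KNOWN_HFT_IDS or trade.seller_id in KNOWN_HFT_IDS


trades = [
    Trade("HFT_A", "BROKER_X"),   # HFT trading under its own ID: flagged
    Trade("BROKER_X", "FUND_Y"),  # HFT routed via BROKER_X's pipes: exchange sees only the broker
    Trade("BANK_Z", "FUND_Y"),    # integrated bank's prop desk trading at high frequency: not flagged
]

flags = [is_hft_trade(t) for t in trades]
print(flags)  # [True, False, False]
```

In this toy case, two of the three trades involve high-frequency activity, yet only one is counted, which is exactly the blind spot Brogaard and Hendershott describe.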

While we had high hopes for the Menkveld paper, it relies on the 2010 NASDAQ HFT data set, so we must set it aside because of questionable data. We have no doubt that the academics who produce these research reports mean well and are trying to help regulators make informed decisions, but without access to current and complete data, the research they produce must be challenged.