Good Data, Bad Data: Part I



Today and tomorrow, our notes will touch on the topic of regulator data.

Today we address the challenges the SEC is facing with the data they are collecting about our markets, even as they use the MIDAS system from Tradeworx (Manoj Narang).

Late last week, the SEC released some new information on their market structure website, including part two of their review of the literature on high frequency trading (HFT). Before they get to the review of the academic literature, the SEC notes how difficult it is to isolate the effects of HFT because their data is limited:

“In the absence of trading account data, the use of general proxies for HFT that can be calculated with publicly available, market-wide data may capture a great deal of algorithmic and computer-assisted trading that should not be classified as HFT. …In addition, other types of computer-assisted trading tools are common in today’s markets that may generate market activity that is difficult to distinguish from HFT, at least in the absence of datasets that can tie market activity to particular trading accounts.”

“A summary of the economic literature must begin by highlighting a formidable challenge facing any researcher of HFT – obtaining useful data that can identify HFT activity. Publicly available data on orders and trades does not reveal the identity of buyers and sellers. As a result, it is impossible to identify orders and trades as originating from an HFT account when relying solely on publicly available information.”

We read this to mean that the “general proxies for HFT” they are using lump in too many types of trading (perhaps institutional algorithmic trading, as well as ETF and index arbitrage?), and so even they are wary of drawing conclusions from data derived from this system. Specifically, the SEC notes that it is very difficult to get data on orders and trades down to the user level.

The SEC then goes on to summarize 31 research papers that had access to non-public information but, yet again, they warn about the limited data available even in these non-public datasets:

“The staff believes it is important to highlight the limits on the scope of the economic literature to date in examining HFT. Due to the formidable data challenges facing researchers, the papers included in this literature review examine only a relatively small amount of HFT activity. The HFT Datasets generally have been limited to particular products or markets, and the data time periods now are relatively outdated, particularly given the pace of change in trading technology and practices. Accordingly, while the recent economic literature has made great progress in beginning to fill in the picture of HFT, much of the picture remains unfinished…Because the HFT Datasets generally have been limited to particular markets or products, they provide little opportunity to assess HFT strategies that simultaneously seek to capture price differentials across different products and markets.”

The SEC effectively admits in their report that their analysis of HFT suffers from a lack of refined data. The absence of datasets that tie market activity to end users, the inability to separate algorithmic trading from high frequency trading, outdated and narrow exchange datasets, and the inability to identify particular HFT strategies are all reasons the SEC cites for the lack of better analysis.

The SEC is awaiting data that may become available to them in the future. We can only assume this means data that will be generated by the Consolidated Audit Trail (CAT). Unfortunately, the CAT is still years away from being operational and will not include end-user identification codes. And from what we hear, bidders on the CAT have been dropping out quickly because of the long time frame and poor potential for profit.

How on earth are we supposed to feel confident that our regulators are keeping our markets safe when they don’t even have the proper data to analyze?

Tomorrow we will touch on a bit of good news coming out of the CFTC with regard to their Visiting Academic Research Program, which they shut down last year under pressure from the CME.