When something smells foul in markets, you better believe that HFT already knows about it.
High-frequency trading (HFT) algorithms are mainly known for their speed and for the liquidity they add to markets. While that reputation is deserved, very little is known about the intelligence built into these algorithms that allows them to survive for extended periods of time without going bankrupt. As we’ll see, the story is much deeper than just buying at the bid and selling at the ask.
But before understanding the nuances of these algorithms, let’s take a look at how they get their essential nutrients (data):
Data — Not Your Typical yFinance
The bulk of HFT activity is related to market making, so unlike a quant fund that may use historical price data, HFT mainly uses real-time messaging data. The main provider of this data is CME’s Market Data Platform (MDP) 3.0.
At this initial stage, the data is in its rawest form. Just take a look:
This is a visual representation of what CME returns for a single customer trade. The FIX message describes how the order was processed by the exchange (we briefly touched on this here), and the Trade Summary adds details like the aggressiveness of the trade (an aggressive buy is a buy at or above the lowest ask).
More established firms have written their own logic to convert all of these codes into interpretable formats, but this would be extremely tedious for new HFT shops.
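To give a feel for what that conversion logic involves, here is a minimal sketch of a decoder for the classic FIX tag=value text format. The tag numbers are standard FIX fields, but the sample message is made up for illustration and is not real CME output. The production MDP 3.0 feed is binary-encoded, so treat this as a simplification rather than what a firm actually runs.

```python
SOH = "\x01"  # standard FIX field delimiter

# A handful of standard FIX tags; real feeds carry many more.
TAG_NAMES = {"35": "MsgType", "55": "Symbol", "54": "Side", "38": "OrderQty", "44": "Price"}
SIDE_CODES = {"1": "Buy", "2": "Sell"}


def parse_fix(raw: str) -> dict:
    """Split a tag=value FIX string into a dict keyed by readable field names."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        # Unknown tags are kept under their numeric tag.
        fields[TAG_NAMES.get(tag, tag)] = value
    if "Side" in fields:
        fields["Side"] = SIDE_CODES.get(fields["Side"], fields["Side"])
    return fields


# Hypothetical message, for illustration only.
sample = SOH.join(["35=D", "55=ESZ4", "54=1", "38=100", "44=5000.25"]) + SOH
order = parse_fix(sample)
print(order)
```

Even this toy version shows why the work is tedious: every numeric tag and every coded value needs its own lookup before a human (or a strategy) can read the message.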
Luckily, numerous data vendors offer a cleaner interface to the exchange. One example is DataBento:
Now that you have an idea of how these systems get their data in the first place, let’s take a deeper look:
Regime Sniffing — Formulaic Alphas
Any HFT operation needs to stay constantly aware of the current market regime and adapt quickly when it changes. To better understand this, let’s see an example:
Imagine a naive HFT algorithm that just tries to set a tight bid-ask spread. On a relatively illiquid stock, the current Bid/Ask for 100 shares is $5.00x$6.00. The holder of the stock doesn’t want to sell for $5, and the potential buyer doesn’t want to buy it for $6.
So, the algorithm would post a new bid at $5.50 — now, the seller has a better choice of selling for $5.50 instead of $5 and takes it, leaving the algorithm with 100 shares at a cost of $5.50 each. With this new inventory, it then posts a new ask of $5.51 — at this point, the buyer has a better choice of buying at $5.51 instead of $6, and takes it. The algorithm generates a $1 profit.
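The arithmetic of that round trip can be sketched in a few lines. The function name is hypothetical, and the sketch assumes no fees or rebates, which a real desk would have to include.

```python
def round_trip_pnl(buy_price: float, sell_price: float, shares: int) -> float:
    """Profit from buying and then selling the same shares (fees ignored)."""
    return round((sell_price - buy_price) * shares, 2)


best_bid, best_ask = 5.00, 6.00  # the existing market
our_bid, our_ask = 5.50, 5.51    # the algorithm's tighter quotes

# Both quotes sit inside the existing spread, so each side gets a better deal.
assert best_bid < our_bid < our_ask < best_ask

pnl = round_trip_pnl(our_bid, our_ask, 100)
print(pnl)  # 1.0
```

One cent per share is a tiny edge, which is why this only works at scale and only when the market is behaving normally.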
This might be fine for a few trades, but what if today is the day of a major Fed decision? The stock in question belongs to a company that holds an abnormally high amount of debt, so when news breaks of substantially higher interest rates across the board, the current holders rush to sell.
Our naive algorithm only takes in the current prices, so it sees the bid-ask spread change from $5.00x$6.00 to $3.50x$5.00. Obviously, this represents an even larger profit opportunity, right?
Well, no. What’s taking place is a regime change. The balance of orders has changed drastically, and if the naive algorithm traded as usual, it would be stuck in a terrible loop. First, it would post a bid above the current best bid and get filled by a seller eager to get out. With this new position at a cost of, say, $4.50, it would then turn around and try selling it at $4.51, except that order would never fill because this is a sellers’ regime with no competitive buyers.
What’s more likely to happen is that a new sell order would come in below the algorithm’s ask, forcing it to post an even lower ask and lock in a loss. It does this because it tries not to hold the stock for any longer than a few seconds. Repeating this in a tight loop can cause the system’s capital to evaporate quickly.
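Here is a toy simulation of that loop. Everything in it is an assumption made for illustration: the quote path is invented, the algorithm is assumed to buy at the midpoint, and it is assumed to exit by hitting the next best bid once its own ask goes unfilled. Prices are in cents to keep the arithmetic exact.

```python
# Hypothetical post-news quotes in cents: (best_bid, best_ask), falling over time.
QUOTES = [(350, 500), (320, 440), (290, 400), (260, 370)]
SHARES = 100


def naive_loop_pnl(quotes, shares):
    """Total P&L in cents for a naive quoter in a one-sided, falling market."""
    total = 0
    for (bid, ask), (next_bid, _next_ask) in zip(quotes, quotes[1:]):
        buy_price = (bid + ask) // 2  # its improved bid gets hit by a seller
        exit_price = next_bid         # its ask never fills, so it dumps at the new bid
        total += (exit_price - buy_price) * shares
    return total


total_cents = naive_loop_pnl(QUOTES, SHARES)
print(f"P&L after {len(QUOTES) - 1} round trips: ${total_cents / 100:.2f}")
```

Each pass through the loop loses far more than the one-cent edge the algorithm was built to collect, which is the whole problem with ignoring the regime.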
So, how does a real algorithm respond to these regime changes? There are hundreds of methods, most proprietary and few public, but a common one is through formulaic alphas.
As the name suggests, formulaic alphas are formulas that generate alpha. These formulas act as “signals” for regime shifts, price movements, trade suggestions, and whatever else the formula is written to model. This subject was popularized in the paper 101 Formulaic Alphas.
To understand how this might work in a real HFT algorithm, let’s walk through an example.
There is an order book with two sides.
Bids:
500 shares @ 19.98
700 shares @ 19.95
1000 shares @ 19.90
Bid Depth = 1000 + 700 + 500 = 2,200
Asks:
100 shares @ 20.01
300 shares @ 20.05
500 shares @ 20.10
Ask Depth = 100 + 300 + 500 = 900
Imbalance = Bid Depth (2,200) - Ask Depth (900) = 1,300
This simple calculation first tells us that based on the current order book state, there is an imbalance of 1,300 which means that there are more willing buyers than sellers. When the imbalance is negative, it means that there are more sellers than buyers.
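The same calculation in code, as a minimal sketch. The list-of-tuples book layout and the function names are assumptions chosen for readability, not how a production order book is stored.

```python
# Each level is (price, shares), using the book from the example above.
bids = [(19.98, 500), (19.95, 700), (19.90, 1000)]
asks = [(20.01, 100), (20.05, 300), (20.10, 500)]


def depth(levels):
    """Total shares resting across the given price levels."""
    return sum(size for _price, size in levels)


def imbalance(bid_levels, ask_levels):
    """Positive means more resting buy interest, negative means more sell interest."""
    return depth(bid_levels) - depth(ask_levels)


print(depth(bids), depth(asks), imbalance(bids, asks))  # 2200 900 1300
```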
This imbalance value can then be plugged into a simple formula that gives a signal when the imbalance is 2 or more standard deviations above or below the average imbalance:
signal = |imbalance - mean_imbalance| >= 2 * std_imbalance
The theory behind this is that the signal can represent whether a stock is currently oversold or overbought, which would then inform where the algorithm would post its bid/ask quotes.
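A minimal sketch of such a signal, assuming the algorithm keeps a running history of imbalance readings for the session. The history values, the function name, and the +1/-1/0 encoding are all hypothetical choices; a real system would more likely use a rolling window and far more data.

```python
from statistics import mean, stdev


def imbalance_signal(history, current, threshold=2.0):
    """Return +1 (unusual buy pressure), -1 (unusual sell pressure), or 0 (normal)."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 readings
    if sigma == 0:
        return 0
    z = (current - mu) / sigma
    if z >= threshold:
        return 1
    if z <= -threshold:
        return -1
    return 0


# Hypothetical imbalance readings from earlier in the session.
history = [150, 300, 450, 300, 150, 450, 300]

print(imbalance_signal(history, 350))    # within the normal range
print(imbalance_signal(history, 1300))   # far more buyers than usual
print(imbalance_signal(history, -5000))  # far more sellers than usual
```

Splitting the signal by direction, rather than returning a single true/false, is a design choice: the quoting logic needs to know which side of the book is under pressure, not just that something is off.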
For example, say that in the current session the average imbalance is 300 shares (more buyers than sellers) with a standard deviation of 150 shares, and then the imbalance suddenly shifts to -5,000. Here’s what would happen:
This new regime of significantly more sellers than buyers means that liquidity is needed on the bid. The algorithm would therefore focus on bidding at the lowest possible prices first. It assumes the imbalance is mean-reverting: if it fills the sell orders at lower and lower prices, then once the imbalance normalizes it will be able to sell at higher prices for a profit.
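Plugging the session numbers from this example into a quick sketch shows just how extreme the shift is. The mapping from z-score to quoting side is an assumption based on the reasoning above, and the symmetric "ask" branch is extrapolated rather than something the example spells out.

```python
def z_score(imbalance, mean_imbalance, std_imbalance):
    """How many standard deviations the imbalance sits from its session average."""
    return (imbalance - mean_imbalance) / std_imbalance


def side_to_quote(z, threshold=2.0):
    """Pick where liquidity is needed; the mapping is an illustrative assumption."""
    if z <= -threshold:
        return "bid"   # sellers dominate, so provide liquidity on the bid
    if z >= threshold:
        return "ask"   # buyers dominate, so provide liquidity on the ask
    return "both"      # normal regime, quote both sides


z = z_score(-5000, 300, 150)
print(round(z, 2), side_to_quote(z))  # -35.33 bid
```

A reading 35 standard deviations from the mean is nowhere near the 2-standard-deviation trigger, so there is no ambiguity that the regime has changed.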
This type of imbalance formula is actually widely attempted at major firms; you can read more about it here: The Short-Term Predictability of Returns in Order Book Markets. The catch is that because so many firms attempt it, the edge is extremely difficult to capture. Since many firms connect to the same exchange (CME), it becomes a race to see who can capture and restore the imbalance the fastest:
As demonstrated, the numbers are in the sub-10-microsecond range, so yeah, saying the competition is steep is an understatement.
Happy trading! 😄