High Frequency Trading Is More Powerful Than Ever
Advances in machine learning have left the world stunned, and nowhere is that more true than in HFT.
Pretend for a moment that you’re a high frequency trader. Your job is to receive orders, assess your holdings, and fill the orders based on those holdings. You need the order flow to make money, so you have to keep up on speed, or else a competitor will take the order flow and put you out of business.
You keep pace and stay alive for years, but eventually you hit a speed ceiling, because science only moves so fast, and you have to innovate. So you think to yourself: “What if I knew what the order was before I even fully received it?” If you did, you’d run circles around the competition by submitting orders before even the fastest machine could finish reading the incoming message.
Well, now you can.
Before you understand how these systems work, you first have to understand how FIX messages work.
Background
FIX (Financial Information eXchange) messages are a standardized format used by financial institutions to communicate information about trades, orders, and other market data. In high-frequency trading (HFT), FIX messages play a critical role in the fast and accurate execution of trades.
A FIX message consists of a series of fields, each with its own tag number, value, and data type. The tags are used to identify each field, and the values contain the actual data being communicated.
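For example, here’s a simplified version of a FIX message (on the wire, fields are separated by the non-printing SOH character, 0x01; it is shown here as “|” for readability):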
8=FIX.4.2|9=123|35=D|49=XYZ|56=ABC|34=12|52=20220409-10:00:00|11=12345|55=AAPL|54=1|38=100|40=1|44=150|10=123|
Let’s break this down:
8=FIX.4.2
- version of the FIX protocol being used. In this case, it's version 4.2.
9=123
- length of the message body in bytes (counted from the end of this field up to the checksum field)
35=D
- message type. In this case, it's a “New Order Single” message (type D).
49=XYZ
- sender of the message
56=ABC
- receiver of the message
34=12
- unique message sequence number
52=20220409-10:00:00
- timestamp of the message
11=12345
- unique order ID
55=AAPL
- symbol being traded
54=1
- whether the order is a buy (1) or sell (2)
38=100
- order quantity
40=1
- type of order (1 for market order, 2 for limit order, etc.)
44=150
- limit price (if applicable)
10=123
- checksum of the message
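To make the breakdown concrete, here is a minimal Python sketch that splits the example message into named fields. The function names `parse_fix` and `fix_checksum` are made up for this illustration, the tag table only covers the tags used above, and the “|” delimiter is the readable stand-in for SOH. The checksum helper follows the standard FIX rule (sum of the bytes before the checksum field, modulo 256, written as three digits); the 9=123 and 10=123 values in the simplified example are placeholders, so they will not match a real calculation.

```python
# Standard FIX names for the tags that appear in the example message.
FIX_TAG_NAMES = {
    "8": "BeginString", "9": "BodyLength", "35": "MsgType",
    "49": "SenderCompID", "56": "TargetCompID", "34": "MsgSeqNum",
    "52": "SendingTime", "11": "ClOrdID", "55": "Symbol", "54": "Side",
    "38": "OrderQty", "40": "OrdType", "44": "Price", "10": "CheckSum",
}


def parse_fix(message: str, delimiter: str = "|") -> dict:
    """Split a delimited FIX string into a {field name: value} dict."""
    fields = {}
    for token in message.split(delimiter):
        if not token:  # skip the empty piece after the trailing delimiter
            continue
        tag, _, value = token.partition("=")
        # Fall back to the raw tag number for tags missing from the table.
        fields[FIX_TAG_NAMES.get(tag, tag)] = value
    return fields


def fix_checksum(raw: bytes) -> str:
    """Sum of the bytes preceding the '10=' field, modulo 256, as 3 digits."""
    return f"{sum(raw) % 256:03d}"


order = parse_fix(
    "8=FIX.4.2|9=123|35=D|49=XYZ|56=ABC|34=12|52=20220409-10:00:00|"
    "11=12345|55=AAPL|54=1|38=100|40=1|44=150|10=123|"
)
print(order["MsgType"], order["Symbol"], order["Side"], order["OrderQty"])
# prints: D AAPL 1 100
```

A production parser would work on raw bytes with the real SOH delimiter and would validate BodyLength and CheckSum before trusting any field.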
Predicting The Future (a.k.a. Pre-Parsing)
One approach that has been proposed is to use machine learning algorithms to predict the contents of FIX messages before they are fully processed. This is known as “pre-parsing” or “early parsing.” The idea is that by predicting the contents of a FIX message, a firm can submit an order to the market faster than its competitors, giving it a significant advantage.
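To show what acting on a partial message can look like, here is a small sketch of a streaming tokenizer. It is not the machine learning approach described here, just a deterministic illustration: fields are handed over as soon as each one is complete, so a decision based on an early field (the message type, say) can be made before the rest of the message has arrived. The names `stream_fields` and `early_msg_type` and the three-chunk arrival pattern are assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional, Tuple


def stream_fields(chunks: Iterable[str], delimiter: str = "|") -> Iterator[Tuple[str, str]]:
    """Yield (tag, value) pairs as soon as each field is complete."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every field that is now fully terminated by a delimiter.
        while delimiter in buffer:
            token, buffer = buffer.split(delimiter, 1)
            if token:
                tag, _, value = token.partition("=")
                yield tag, value


def early_msg_type(chunks: Iterable[str]) -> Optional[str]:
    """Return MsgType (tag 35) without reading more of the stream than needed."""
    for tag, value in stream_fields(chunks):
        if tag == "35":
            return value
    return None


# Hypothetical network reads: the message arrives in three pieces,
# and the first piece ends in the middle of the MsgType field.
pieces = iter([
    "8=FIX.4.2|9=123|35=",
    "D|49=XYZ|56=ABC|34=12|",
    "52=20220409-10:00:00|10=123|",
])
msg_type = early_msg_type(pieces)
unread = list(pieces)  # chunks that were never needed
print(msg_type, len(unread))
# prints: D 1
```

The learned approach goes one step further: instead of waiting for a field to be complete, it guesses the remaining content from what has been seen so far.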
As an example, let’s take a look at the paper “Deep Learning for Early Parsing of Financial FIX Messages”. In the paper, the authors propose a convolutional neural network (CNN)-based model for pre-parsing FIX messages.
Here’s an example of how the algorithm works:
Consider the following FIX message:
8=FIX.4.2|9=100|35=0|34=1|49=SENDER|52=20161207-12:32:48|56=RECEIVER|10=212|
The algorithm would process this message using the following steps:
- Tokenization: The algorithm would first tokenize the raw FIX message into individual fields using the “|” delimiter. This would result in the following tokenized message:
["8=FIX.4.2", "9=100", "35=0", "34=1", "49=SENDER", "52=20161207-12:32:48", "56=RECEIVER", "10=212"]
- Feature extraction: The algorithm would then use the CNN to extract features from each token in the message. For example, the token “8=FIX.4.2” might be represented by a vector of numerical features such as [0.2, 0.1, 0.3, 0.4, 0.5].
- Labeling: The algorithm would then use a layer of the CNN to label each token in the message with the appropriate FIX field name. For example, the token “8=FIX.4.2” would be labeled with the “BeginString” field name. Similarly, the token “9=100” would be labeled with the “BodyLength” field name.
- Output: Finally, the algorithm would output the structured FIX message, which would consist of a list of labeled fields. For example, the output for the above FIX message might be:
{"BeginString": "FIX.4.2", "BodyLength": "100", "MsgType": "0", "MsgSeqNum": "1", "SenderCompID": "SENDER", "SendingTime": "20161207-12:32:48", "TargetCompID": "RECEIVER", "CheckSum": "212"}
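The four steps above can be sketched end to end in a few lines of Python. Treat this as a rough illustration of the data flow only: the learned parts are replaced with deterministic stand-ins. `toy_features` is a hand-made substitute for the CNN's feature vectors, and `label` uses a plain lookup table where the model would predict a field name. All function names are invented for this sketch and do not come from the paper.

```python
# Standard FIX names for the tags in the example message.
FIELD_NAMES = {
    "8": "BeginString", "9": "BodyLength", "35": "MsgType", "34": "MsgSeqNum",
    "49": "SenderCompID", "52": "SendingTime", "56": "TargetCompID", "10": "CheckSum",
}


def tokenize(message, delimiter="|"):
    """Step 1: split the raw message into 'tag=value' tokens."""
    return [token for token in message.split(delimiter) if token]


def toy_features(token):
    """Step 2 stand-in: three hand-made numbers instead of learned CNN features."""
    digits = sum(ch.isdigit() for ch in token)
    return [len(token), digits / len(token), token.find("=")]


def label(token):
    """Step 3 stand-in: a lookup table instead of a predicted label."""
    tag, _, value = token.partition("=")
    return FIELD_NAMES.get(tag, f"Unknown({tag})"), value


def pre_parse(message):
    """Step 4: return the structured message plus the per-token features."""
    tokens = tokenize(message)
    features = [toy_features(t) for t in tokens]  # a trained model would consume these
    structured = dict(label(t) for t in tokens)
    return structured, features


raw = "8=FIX.4.2|9=100|35=0|34=1|49=SENDER|52=20161207-12:32:48|56=RECEIVER|10=212|"
structured, features = pre_parse(raw)
print(structured)
```

The printed dictionary has the same shape as the structured output shown above. In a learned system, the interesting work happens in the two stand-in functions.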
The network is able to model the dependencies between adjacent labels (for instance, BodyLength always comes right after BeginString), which improves the accuracy of the labeling process.
We used just one FIX message for this, but in practice, there can be several hundred thousand messages per hour!
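To put that volume in perspective, here is a quick back-of-the-envelope calculation of the average gap between messages at a steady rate. The three rates are illustrative assumptions standing in for “several hundred thousand,” and real traffic arrives in bursts, so the gap at peak times is far smaller than the average.

```python
def per_message_budget_ms(messages_per_hour: float) -> float:
    """Average time between messages, in milliseconds, at a steady rate."""
    return 3_600_000 / messages_per_hour  # 3,600,000 ms in one hour


for rate in (100_000, 300_000, 900_000):  # assumed example rates
    print(f"{rate:>7} msgs/hour -> {per_message_budget_ms(rate):.1f} ms per message")
# prints:
#  100000 msgs/hour -> 36.0 ms per message
#  300000 msgs/hour -> 12.0 ms per message
#  900000 msgs/hour -> 4.0 ms per message
```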
So, by using the CNN to predict the full contents of those incoming messages and return that structured output right away, the system shaves vital time off its latency and shores up its competitive advantage.
Now, that is pretty cool.
Happy trading! :)