FI-2010 Dataset
The FI-2010 dataset is a public Limit Order Book dataset for research in high-frequency trading and stock price prediction. It contains LOB data from five Finnish companies: Kesko Oyj, Outokumpu Oyj, Sampo, Rautaruukki, and Wärtsilä Oyj, collected from the NASDAQ Nordic stock market over ten trading days in June 2010. The dataset includes approximately 4 million limit order messages, sampled every 10 events, resulting in 394,337 LOB observations with 10 levels.
Due to the market's inherent fluctuations and shocks, the dataset employs a labelling strategy that defines the trend from the average mid-price over a future horizon $k$:

$$a_+(k, t) = \frac{1}{k} \sum_{i=1}^{k} m(t + i)$$

where $m(t)$ denotes the mid-price at event $t$.
Using a threshold $\theta = 0.002$, the trend labels are assigned according to the following definitions (a short code sketch of this rule follows the list):
- Upward (U): If $a_+(k, t) > m(t)(1 + \theta)$
- Downward (D): If $a_+(k, t) < m(t)(1 - \theta)$
- Stationary (S): If $a_+(k, t) \in [\,m(t)(1 - \theta),\ m(t)(1 + \theta)\,]$
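The dataset already ships with these labels precomputed for every horizon, so the following is only a minimal illustrative sketch of how the rule above could be applied to a raw mid-price series; the function name and the simple loop are our own choices, not part of the official pipeline.

```python
import numpy as np

def trend_labels(mid_prices: np.ndarray, k: int, theta: float = 0.002) -> np.ndarray:
    """Assign trend labels (0 = Down, 1 = Stationary, 2 = Up) for horizon k."""
    n = len(mid_prices) - k
    labels = np.ones(n, dtype=int)                       # default: Stationary
    for t in range(n):
        a_plus = mid_prices[t + 1 : t + k + 1].mean()    # average mid-price over the next k events
        if a_plus > mid_prices[t] * (1 + theta):
            labels[t] = 2                                # Upward
        elif a_plus < mid_prices[t] * (1 - theta):
            labels[t] = 0                                # Downward
    return labels
```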
Class Imbalance: The dataset exhibits class imbalance across the different horizons $k$. Participants should be aware of this imbalance and may consider techniques to address it in their models.
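One common mitigation, sketched below under the assumption of a standard PyTorch training loop with integer labels in {0, 1, 2}, is to weight the cross-entropy loss inversely to class frequency; the label array here is a placeholder, not real FI-2010 data.

```python
import numpy as np
import torch
import torch.nn as nn

# Placeholder labels for one horizon; in practice these come from the dataset.
train_labels = np.array([0, 1, 1, 2, 1, 1, 0, 2])
counts = np.bincount(train_labels, minlength=3)

# Weight each class inversely to its frequency so rare classes contribute more to the loss.
weights = torch.tensor(counts.sum() / (3 * counts), dtype=torch.float32)
criterion = nn.CrossEntropyLoss(weight=weights)
```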
Reference Benchmark
The problem setup and dataset are thoroughly discussed in the reference paper, which provides a comprehensive survey of state-of-the-art deep learning models applied to stock price trend prediction using LOB data. It also introduces various classical models that serve as benchmarks for this task. The accompanying code repository contains implementations of these classical models, providing a valuable reference point for participants. You are encouraged to consult the paper to gain insights into the problem and to compare your quantum models against established benchmarks.
Solution
Participants are required to develop a quantum neural network model using PyTorch that predicts the stock price trend for each market observation.
The model should output, for each sample, a prediction for each of the 5 horizons $k \in \{1, 2, 3, 5, 10\}$. The predictions should be integers corresponding to the classes:
- 0: Downward (D)
- 1: Stationary (S)
- 2: Upward (U)
The output should be a NumPy array of shape $(N, 5)$, where $N$ is the number of samples.
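To make these requirements concrete, below is a minimal, non-authoritative sketch of one possible hybrid architecture. It uses PennyLane's TorchLayer to embed a small variational circuit inside a PyTorch module; PennyLane itself, the 40-feature input width, the 4-qubit ansatz, and the layer sizes are all illustrative assumptions rather than requirements of the challenge.

```python
import torch
import torch.nn as nn
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    # Encode compressed LOB features as rotation angles, then apply trainable entangling layers.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.BasicEntanglerLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (3, n_qubits)}  # 3 entangling layers of trainable rotations

class HybridQNN(nn.Module):
    """Classical compression -> quantum layer -> one 3-class head per horizon."""
    def __init__(self, n_features: int = 40):
        super().__init__()
        self.pre = nn.Linear(n_features, n_qubits)
        self.qlayer = qml.qnn.TorchLayer(circuit, weight_shapes)
        self.post = nn.Linear(n_qubits, 5 * 3)

    def forward(self, x):
        x = torch.tanh(self.pre(x))          # squash features into a rotation-angle range
        x = self.qlayer(x)                   # (batch, n_qubits) expectation values
        return self.post(x).view(-1, 5, 3)   # logits: (batch, 5 horizons, 3 classes)
```

Taking `argmax` over the last axis of the logits then yields the required integer array of shape $(N, 5)$.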
Toy Example
Let's consider a toy example with a small batch of market observations.
Assume we have a batch of 3 samples. The model should output predictions for each of the 5 horizons.
Example output:
```python
import numpy as np

# Sample predictions for 3 samples and 5 horizons
output = np.array([
    [2, 1, 0, 1, 2],  # Predictions for sample 1
    [0, 2, 1, 0, 1],  # Predictions for sample 2
    [1, 0, 2, 2, 0],  # Predictions for sample 3
])
```
Template
You can download a solution template by running the following command:
```bash
aqora template quantum-signals-lob
cd quantum-signals-lob
aqora install
```
You can then fill in `submission/solution.ipynb` with your solution. The notebook should include the following steps (a rough sketch of the final steps follows the list):
- Loading the FI-2010 dataset using the provided `FIDataset` class.
- Defining and loading your quantum neural network model.
- Generating predictions on the test dataset.
- Saving the predictions in the required format.
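A rough, non-authoritative sketch of the last two steps is given below. The tensor of test features is randomly generated purely for illustration (in the notebook it would come from the provided `FIDataset` class, whose exact interface we do not assume here), `HybridQNN` refers to the model sketched earlier, and the output file name is illustrative rather than the format required by the template.

```python
import numpy as np
import torch

# Illustration only: random features stand in for the real test split.
X_test = torch.randn(100, 40)              # 100 samples, 40 LOB features (assumed width)

model = HybridQNN()                        # your trained quantum model (sketched above)
model.eval()
with torch.no_grad():
    logits = model(X_test)                 # (100, 5, 3) logits per horizon and class
output = logits.argmax(dim=-1).numpy()     # (100, 5) integer predictions in {0, 1, 2}

np.save("predictions.npy", output)         # illustrative file name, not the required format
```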
You can then test your solution locally by running `aqora test`.
And finally, you can upload your solution by running `aqora upload`.
Scoring
Each submitted model will be evaluated across 5 different time horizons $k \in \{1, 2, 3, 5, 10\}$, each corresponding to a prediction of the future stock price trend. The model performance will be compared to a benchmark model that has achieved the following scores for each horizon:
| Horizon $k$ | Benchmark Score (%) |
|---|---|
| 1 | 88.7 |
| 2 | 80.6 |
| 3 | 80.1 |
| 5 | 88.2 |
| 10 | 91.6 |
Metric: F1-Score
For each horizon, the F1-score is calculated based on the predictions for that horizon. The F1-score is a standard evaluation metric that balances precision and recall, especially useful in classification tasks like this one, where there may be class imbalance (e.g., between upward, downward, and stationary trends).
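For illustration, the snippet below computes an F1-score for a single horizon with scikit-learn; the tiny label arrays are made up, and the averaging mode (`"weighted"` here) is an assumption, since the challenge does not state which variant the scorer uses.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up ground truth and predictions for one horizon (0 = Down, 1 = Stationary, 2 = Up).
y_true = np.array([0, 1, 2, 1, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 2, 1, 0, 1, 1])

# Averaging mode is an assumption; macro or per-class F1 are equally plausible choices.
score = f1_score(y_true, y_pred, average="weighted")
print(f"F1-score: {score:.3f}")
```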
Ranking Mechanism
To rank participant models, we compute how close their model scores are to the benchmark model across all 5 horizons. This is done using a percentage difference method:
- Compute the F1-score for each horizon $k$ for your model.
- Compare each score to the benchmark using the following formula for each horizon:

$$\text{Percentage Difference} = \frac{\text{Your Model's Score} - \text{Benchmark Score}}{\text{Benchmark Score}} \times 100$$

- Average the percentage differences across all 5 horizons to get an overall score. A positive percentage indicates that your model performs better than the benchmark, while a negative percentage indicates underperformance.
Example
Let’s say your model produces the following F1-scores across the 5 horizons:
| Horizon $k$ | Your Model's Score (%) | Benchmark Score (%) | Percentage Difference (%) |
|---|---|---|---|
| 1 | 87.5 | 88.7 | -1.35 |
| 2 | 81.0 | 80.6 | +0.50 |
| 3 | 78.9 | 80.1 | -1.50 |
| 5 | 89.0 | 88.2 | +0.91 |
| 10 | 90.2 | 91.6 | -1.53 |
The overall average percentage difference is:

$$\frac{-1.35 + 0.50 - 1.50 + 0.91 - 1.53}{5} \approx -0.59\%$$

This means that, on average, your model performs about 0.6% worse than the benchmark across the 5 horizons.
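The same computation, written out with NumPy so the worked example can be checked (the scores are the hypothetical ones from the table above):

```python
import numpy as np

# Hypothetical per-horizon F1-scores from the worked example above.
model_scores = np.array([87.5, 81.0, 78.9, 89.0, 90.2])
benchmark = np.array([88.7, 80.6, 80.1, 88.2, 91.6])

pct_diff = (model_scores - benchmark) / benchmark * 100
print(np.round(pct_diff, 2))        # per horizon: -1.35, 0.50, -1.50, 0.91, -1.53
print(round(pct_diff.mean(), 2))    # overall: about -0.6
```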
Final Ranking
Models are ranked based on their average percentage difference from the benchmark model. The closer a model’s average percentage difference is to zero (or a positive value), the better it ranks. If two models have similar percentage differences, tie-breaking may be done based on performance on individual horizons, with preference given to models that perform better on more difficult horizons (e.g., higher horizons).