Authors: Michele Grossi, Enrico Bothmann, Zenny Wettersten
Overview
Although there are many stages to High-Energy Physics (HEP) predictions, the fundamental physics is described in terms of probabilities (cross sections) calculated from perturbative Quantum Field Theories (QFTs). However, since HEP processes are stochastic, it is insufficient to evaluate the total cross section. Rather, to compare experiment with theory statistically, we need to sample the underlying distribution, known as the differential cross section dσ (which is determined by the scattering amplitude, or matrix element, of the process).
In this challenge, your goal is to build a quantum (or hybrid classical-quantum) algorithm to evaluate this differential cross section at a given phase space point. For the implementation, you are given as input:
- (Parametrised) phase space points containing the kinematic information of the particles in a given event.
Your task is to match this to the output:
- Leading Order (LO) differential cross section at this phase space point.
As this is an interpolation/regression problem, a standard choice of performance metric would be the mean squared error (MSE). To capture the shape of the distribution more accurately, you might instead choose the mean absolute percentage error (MAPE). Note, however, that the differential cross section is small across large swaths of the phase space, where percentage errors can become very large.
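To make the trade-off between the two metrics concrete, here is a minimal sketch in plain Python. The numbers are invented toy values, not challenge data, and the `eps` guard against division by zero is an illustrative assumption rather than part of any official metric definition.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred, eps=1e-12):
    """Mean absolute percentage error, in percent.

    eps is a hypothetical guard against division by zero.
    """
    total = sum(abs(t - p) / max(abs(t), eps) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Toy values: two near-zero tail points and one resonance-like peak.
y_true = [1.0e-6, 2.0e-6, 1.0]
y_pred = [2.0e-6, 1.0e-6, 0.9]

print(f"MSE  = {mse(y_true, y_pred):.3e}")    # driven almost entirely by the peak
print(f"MAPE = {mape(y_true, y_pred):.1f} %")  # driven mostly by the small tails
```

In this toy case the MSE barely notices the tail points, while the MAPE is dominated by them, which is worth keeping in mind when choosing a training loss.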
Background
Monte Carlo Event Generation in HEP
Although there are analytic formulae for the differential cross sections (dσ) in HEP, several issues arise when trying to connect these to experimental measurements: there is (generally) no closed-form solution for the integral of dσ; the formulae scale badly with both the number of particles and the perturbative order of the calculation; and the stochastic nature of QFT means we need to compare experiment statistically with samples following the underlying phase space distribution of dσ.
Since we cannot evaluate the integrated cross section directly, we need to use numerical methods, and since we do not know the phase space distribution of dσ a priori, we also need to determine it. To solve these problems, we use Monte Carlo methods in so-called event generators: software that generates phase space points for given interactions at the rate theory predicts we should observe them. In simple terms, event generators work in three steps:
- Calculate the total cross section using Monte Carlo integration.
- Using the calculated cross section, estimate the underlying distribution across phase space.
- Sample this distribution at the rate theory predicts interactions would occur.
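The three steps above can be sketched on a one-dimensional toy problem. This is only an illustration under stated assumptions: the function `f` is a made-up Breit-Wigner-like peak standing in for dσ, the mass, width, and range values are illustrative, and real event generators use far more sophisticated sampling than uniform points with accept-reject unweighting.

```python
import random

random.seed(0)

M, GAMMA = 91.2, 2.5   # toy resonance mass and width (illustrative values)
LO, HI = 60.0, 120.0   # range of the toy kinematic variable m


def f(m):
    """Toy stand-in for the differential cross section: an unnormalised peak."""
    return 1.0 / ((m * m - M * M) ** 2 + (M * GAMMA) ** 2)


# Step 1: Monte Carlo integration with uniformly distributed points.
n = 100_000
points = [random.uniform(LO, HI) for _ in range(n)]
weights = [f(m) for m in points]
sigma_mc = (HI - LO) * sum(weights) / n

# Step 2: the weights give an estimate of the distribution over phase space;
# here we only keep the largest weight seen, as an estimate of the maximum.
w_max = max(weights)

# Step 3: accept-reject "unweighting", so that accepted events follow f(m).
events = [m for m, w in zip(points, weights) if random.random() < w / w_max]

print(f"integral estimate: {sigma_mc:.4e}")
print(f"unweighting efficiency: {len(events) / n:.3f}")
```

The low unweighting efficiency in this toy setup hints at why sampling a sharply peaked dσ efficiently is hard.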
Because of its complexity, event generation remains an active area of research to this day.
Phase Space Distributions
The level of individual particles, the parton level, is described in terms of the 4-momenta of all particles that are part of the interaction. Two degrees of freedom disappear from conservation of energy and momentum, meaning for a process with n partons we need to integrate dσ over a (4n−2)-dimensional space. However, due to e.g. Lorentz invariance, for analysis we typically project this higher-dimensional space onto key kinematic variables (such as the transverse momentum pT, the pseudorapidity η, or the azimuthal angle ϕ).
Most important for this challenge is that we can parametrise the phase space in lower dimensions. Not only can we relate phase space points to a lower-dimensional parameter set; we can also absorb aspects of the distribution into this parameter set, such that the parameters do not exhibit the 'spiky' nature of dσ across phase space, where we see singular peaks around process resonances and near-zero values far from them.
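As an illustration of how a parametrisation can absorb a peak, the sketch below uses a textbook change of variables for a Breit-Wigner resonance, s = M² + MΓ·tan(ρ). This is not the parametrisation used for the challenge data; the toy function `bw`, the numerical values, and the variable names are all assumptions chosen for illustration.

```python
import math
import random

random.seed(1)

M, GAMMA = 91.2, 2.5                  # toy resonance parameters (illustrative)
S_MIN, S_MAX = 60.0 ** 2, 120.0 ** 2  # range of the toy invariant mass squared


def bw(s):
    """Toy Breit-Wigner peak in s, standing in for a 'spiky' integrand."""
    return 1.0 / ((s - M * M) ** 2 + (M * GAMMA) ** 2)


def spread(ws):
    """Ratio of largest to smallest weight: 1 means perfectly flat."""
    return max(ws) / min(ws)


n = 20_000

# Sampling uniformly in s: the weights inherit the full peak structure.
w_flat = [(S_MAX - S_MIN) * bw(random.uniform(S_MIN, S_MAX)) for _ in range(n)]

# Sampling uniformly in rho, with s = M^2 + M*GAMMA*tan(rho).
rho_min = math.atan((S_MIN - M * M) / (M * GAMMA))
rho_max = math.atan((S_MAX - M * M) / (M * GAMMA))
w_mapped = []
for _ in range(n):
    rho = random.uniform(rho_min, rho_max)
    s = M * M + M * GAMMA * math.tan(rho)
    jacobian = M * GAMMA / math.cos(rho) ** 2  # ds/drho
    w_mapped.append((rho_max - rho_min) * bw(s) * jacobian)

print(f"spread, uniform in s:   {spread(w_flat):.1f}")
print(f"spread, uniform in rho: {spread(w_mapped):.6f}")
```

In the mapped variable the Jacobian cancels the peak exactly for this toy integrand, so every point carries the same weight. For realistic processes the cancellation is only partial, which is why the parametrised dσ is flatter but still has to be learned.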
Challenge
Participants are tasked with building a quantum (or hybrid classical-quantum) algorithm for differential cross section regression, mapping a (parametrised) phase space point to the scalar value of dσ at that point. Four sets of samples corresponding to the processes
q q̄ → Z⁰ + n jets, n = 0, …, 3,
and the evaluated LO differential cross section dσ at those phase space points are provided. Note that dσ varies between the four processes, and that the analytic formulae quickly grow in complexity with n.
The samples have been generated with the Pepper event generator, and the phase space points are given by the internal parametrisation used by the Chili phase space sampler. Although these variables are related to kinematic observables, they are not identical; you may think of them as a flatter mapping of the kinematics.
Dataset
The samples are provided in the HDF5 format, each file comprising two datasets:
- randomNumbers, the parametrised phase space points (differing in dimensionality with n).
- events, the resulting event at that point as evaluated using Pepper. The only relevant entry here is the final element of each event, which is the leading order dσ evaluated at that phase space point.
Another four sample sets, excluding the events datasets, are provided for performance evaluation of your algorithms.
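A minimal sketch of reading such a file with h5py might look as follows. The dataset names randomNumbers and events come from the description above; everything else is an assumption: the file name is hypothetical, the array shapes are invented, and the sketch assumes events is stored as a two-dimensional array with one row per event. To stay runnable on its own, it first writes a small synthetic stand-in file.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a synthetic stand-in file (hypothetical name and shapes).
path = os.path.join(tempfile.mkdtemp(), "toy_sample.hdf5")
rng = np.random.default_rng(0)
with h5py.File(path, "w") as f:
    f.create_dataset("randomNumbers", data=rng.random((100, 5)))
    f.create_dataset("events", data=rng.random((100, 12)))

# Read the inputs and the regression target.
with h5py.File(path, "r") as f:
    x = f["randomNumbers"][:]   # parametrised phase space points
    y = f["events"][:, -1]      # final element of each event: LO dσ

print(x.shape, y.shape)
```

For the unlabelled evaluation files, only the randomNumbers read would apply, since the events dataset is excluded there.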
Problem Statement
Your task is to set up estimators for dσ based on the sample sets, one estimator for each set. How to go about this is left up to your own interpretation; the dimensionality of the problem is constant for each value of n, and different methods may perform differently for each dimensionality.
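One possible shape for such an estimator is a small parametrised quantum circuit whose measured expectation value is rescaled classically. The sketch below simulates a two-qubit circuit with a plain Python statevector so that it runs without any quantum framework; in practice you would likely build the same structure in a framework such as Qiskit or PennyLane. The circuit layout, the function names, and the `scale`/`offset` post-processing are all illustrative assumptions, and training (for example, minimising the MSE over the labelled samples) is omitted.

```python
import math

N_QUBITS = 2  # one qubit per input feature in this toy example


def apply_ry(state, angle, qubit):
    """Apply an RY(angle) rotation to one qubit of a real statevector."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    mask = 1 << (N_QUBITS - 1 - qubit)  # qubit 0 is the most significant bit
    new = state[:]
    for i in range(len(state)):
        if not i & mask:
            j = i | mask
            new[i] = c * state[i] - s * state[j]
            new[j] = s * state[i] + c * state[j]
    return new


def apply_cnot_01(state):
    """CNOT with control qubit 0 and target qubit 1: swaps |10> and |11>."""
    new = state[:]
    new[2], new[3] = state[3], state[2]
    return new


def expval_z(state, qubit):
    """Expectation value of Pauli-Z on one qubit."""
    mask = 1 << (N_QUBITS - 1 - qubit)
    return sum((-1.0 if i & mask else 1.0) * a * a for i, a in enumerate(state))


def predict(x, thetas, scale=1.0, offset=0.0):
    """Hybrid estimator: circuit expectation value plus a classical affine map."""
    state = [1.0, 0.0, 0.0, 0.0]       # start in |00>
    for q in range(N_QUBITS):          # angle encoding of the input point
        state = apply_ry(state, x[q], q)
    state = apply_cnot_01(state)       # entangling gate
    for q in range(N_QUBITS):          # trainable rotations
        state = apply_ry(state, thetas[q], q)
    return scale * expval_z(state, 1) + offset


print(predict([0.3, 0.7], thetas=[0.1, -0.2], scale=0.5, offset=0.5))
```

Because the expectation value is bounded between −1 and 1, some classical rescaling (or a transformation of the target) is needed to reach the actual range of dσ. How the number of qubits and layers must grow with n is exactly the kind of scaling question raised in this challenge.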
When developing and testing your algorithm, you should keep the following in mind:
- Accuracy: Does your algorithm predict dσ well?
- Robustness: Does your algorithm perform well across different regions of the phase space?
Additionally, the analytic (classical) solution grows factorially in complexity with respect to the number of partons. Can you tell whether the same seems to hold true for your solution?
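One simple way to probe robustness is to compute the error separately in different regions rather than as a single global number. The sketch below bins events along one input coordinate and reports the MSE per bin. The choice of coordinate, the number of bins, and the toy values are assumptions for illustration only.

```python
def binned_mse(coord, y_true, y_pred, n_bins=4):
    """MSE per equal-width bin along one coordinate (None for empty bins)."""
    lo, hi = min(coord), max(coord)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values coincide
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for c, t, p in zip(coord, y_true, y_pred):
        b = min(int((c - lo) / width), n_bins - 1)  # clamp the upper edge
        sums[b] += (t - p) ** 2
        counts[b] += 1
    return [s / k if k else None for s, k in zip(sums, counts)]


# Toy values: the predictions are exact in the lower half, off in the upper tail.
coord = [0.0, 0.1, 0.5, 0.6, 0.9, 1.0]
y_true = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
y_pred = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]

print(binned_mse(coord, y_true, y_pred, n_bins=2))
```

A model with a good global MSE can still fail badly in one region, and a per-bin view like this makes that visible.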
Evaluation
Alongside the labelled samples you are to build your algorithm on, each process also has an unlabelled sample containing parametrised phase space points but not the evaluated dσ.
Run your algorithm on these phase space points and store the results according to the submission format below. The performance of your algorithm(s) will be evaluated with the mean squared error.
Submission format
For each subfolder dataset, you must submit a file (e.g., submission.csv) with one row per event in the test set, containing:
- EventID – The integer position of the event in the original hdf5 file.
- dσ – Predicted differential cross section for that event.
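A minimal sketch of writing such a file with Python's standard csv module is shown below. The column names follow the list above; the zero-based event numbering, the header row, and the helper name `write_submission` are assumptions, so check them against the official submission instructions.

```python
import csv
import os
import tempfile


def write_submission(path, predictions):
    """Write one row per event: its integer position and the predicted dσ."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["EventID", "dσ"])
        # Assumption: EventID is the zero-based position in the HDF5 file.
        for event_id, value in enumerate(predictions):
            writer.writerow([event_id, repr(float(value))])


# Toy predictions written to a temporary location.
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission(out_path, [1.5e-7, 0.0, 3.2e-3])
```

Writing values with `repr(float(...))` keeps full floating-point precision, which matters when many predictions are very close to zero.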
Notes for Participants
- The challenge has been structured to balance difficulty and feasibility, making it accessible to participants with diverse expertise.
- The problem is rooted in real-world applications of Monte Carlo simulations and event generators used in high energy physics research.
- Finalists will be asked to provide their solution (quantum algorithm and details) for a comprehensive evaluation.
We look forward to seeing innovative quantum solutions to this complex and exciting problem!
FAQ
- Can I use classical ML components alongside quantum circuits?
Yes, a hybrid approach is encouraged. The essential requirement is at least one quantum element in your workflow.
- Is domain knowledge of HEP required?
Not strictly; we’ve provided background so you can focus on the quantum modelling. The main idea is to treat it as a high-dimensional regression task. Additional questions can be asked in the discussion section.
- How do we install HEP or quantum frameworks locally?
• For quantum frameworks: See Qiskit Docs, PennyLane Installation, etc.
• No specialised HEP software is strictly required unless you want to explore advanced analyses.