Multidimensional Regression on LHC collision jets

Model the behavior of particle jets from LHC collisions using quantum algorithms, bridging theoretical predictions and observed data.

Hosted by

CERN

Authors: Michele Grossi, Yacine Haddad

Overview

High-Energy Physics (HEP) experiments at the Large Hadron Collider (LHC) involve protons colliding at nearly the speed of light, producing various particles in the final state. When quarks and gluons (collectively known as partons) are produced in high-energy collisions, they undergo a complex process governed by Quantum Chromodynamics (QCD), resulting in jets: collimated sprays of particles that appear in detectors. In this hackathon challenge, participants will build quantum (or hybrid classical-quantum) algorithms to model how parton-level information translates into final-state jets. Specifically, you will be given:

Parton-level data (the simplified theoretical representation of the collision before QCD showering and hadronisation).
Final-state jet data (the observed jets in the detector).

Your task is to predict:

The number of jets in each event.
The transverse momenta ($p_T$) of the leading and subleading jets.

We will compare your predicted distributions (multiplicity and $p_T$) to the actual final-state jets using the KL divergence.

Background for Quantum Scientists (Without HEP Knowledge)

When two protons collide at the LHC, their constituent partons (quarks and gluons) interact. Sometimes these collisions produce intermediate particles such as the Z boson, which can decay into a quark-antiquark pair. The quarks then radiate additional gluons, and gluons can in turn split into further gluons or quark-antiquark pairs, creating parton showers. After hadronisation (the process by which partons form bound states such as pions, kaons and protons), these sprays of particles appear in the detector as jets.

Partons: Fundamental constituents (quarks and gluons) inside protons.
Jets: Collimated clusters of hadrons observed in the detector. Each jet is typically represented by its kinematic properties, such as transverse momentum ($p_T$) and direction ($\eta$, $\phi$).
Leading Jet: The jet with the highest $p_T$ (transverse momentum) in an event.
Sub-leading Jet: The jet with the second-highest $p_T$.

Because of QCD effects, the number of final-state jets can differ from the number of initial partons. For instance, you might expect 2 quarks (and hence 2 jets) in a simple $q\bar{q} \to Z \to q\bar{q}$ process, but additional radiation can yield 3, 4 or more jets in the final state.

Objective

Learn a mapping:

$$\text{Partons} \longrightarrow \text{Jets}$$

and predict:

Number of jets ($n_\text{jets}$) in each collision event.
Leading and sub-leading jet $p_T$.

These predictions should be as close as possible to the ground-truth distributions.

The data

You will receive an HDF5 file containing two main collections:

Parton-level data (partons): arrays with the four-vector components $p_x$, $p_y$, $p_z$, $E$, the particle ID and the charge of each of the 2 partons. Each index corresponds to one collision event. The particle ID follows the Monte Carlo numbering scheme of the Particle Data Group (PDG).
Jet-level data (jets): the number of jets naturally varies from event to event (some events have 2 jets, others 3, 4, etc.). The jet collection is therefore stored as a 3D array of shape [num_events, num_max_jets, 4], where num_max_jets is the maximum number of jets per event (5 in the current dataset), and events with fewer jets are padded with zeros. For example, if an event has only 2 jets, the remaining 3 entries along the jet dimension are filled with zeros to keep the array dimensions consistent.

To handle four-vector (Lorentz vector) calculations easily, we use Vector, a library that provides a unified API for 2D, 3D and 4D vectors. This is particularly useful in HEP, where momentum four-vectors are everywhere. Lorentz four-vectors ($E$, $p_x$, $p_y$, $p_z$) are used to compute derived quantities such as:

$p_T = \sqrt{p_x^2 + p_y^2}$ (transverse momentum),
$\eta$ (pseudo-rapidity),
$\phi$ (azimuthal angle),
$\Delta R$ (distance in $\eta$-$\phi$ space).

Why is this useful? Instead of writing out the equations for each kinematic quantity by hand, Vector provides built-in properties such as .pt, .eta and .phi, and methods such as .deltaR(...) (separation in $\eta$-$\phi$) and .deltaRapidityPhi(...) (separation in rapidity-$\phi$) for pairwise angular separation.
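For readers who want to see the arithmetic behind these quantities, here is a minimal standard-library sketch that also strips the zero-padding rows and picks out the leading and sub-leading jets. It assumes each jet row is ordered (px, py, pz, E) like the parton data; the actual column order of the jets array is not stated here, so check it against the file. The helper names (pt, phi, eta, delta_r, real_jets) and the example event are hypothetical; Vector's .pt, .eta, .phi and .deltaR(...) are expected to give the same values.

```python
import math

# Assumed column order of each jet row: (px, py, pz, E). Check the file.
PX, PY, PZ, E = 0, 1, 2, 3

def pt(p):
    """Transverse momentum sqrt(px^2 + py^2)."""
    return math.hypot(p[PX], p[PY])

def phi(p):
    """Azimuthal angle in (-pi, pi]."""
    return math.atan2(p[PY], p[PX])

def eta(p):
    """Pseudo-rapidity -ln(tan(theta/2)), written as asinh(pz / pT); needs pT > 0."""
    return math.asinh(p[PZ] / pt(p))

def delta_r(a, b):
    """sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi)."""
    d_phi = (phi(a) - phi(b) + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta(a) - eta(b), d_phi)

def real_jets(event):
    """Drop the all-zero padding rows from one [num_max_jets, 4] event."""
    return [row for row in event if any(x != 0.0 for x in row)]

# A hypothetical padded event (num_max_jets = 5) with three real, massless jets.
event = [
    [30.0, 40.0, 0.0, 50.0],
    [-60.0, -80.0, 0.0, 100.0],
    [5.0, -12.0, 0.0, 13.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
jets = sorted(real_jets(event), key=pt, reverse=True)  # highest pT first
n_jets = len(jets)                       # 3: more jets than the 2 partons
leading_pt, subleading_pt = pt(jets[0]), pt(jets[1])
sep = delta_r(jets[0], jets[1])          # back-to-back in phi, same eta
```

Detecting padding by looking for an all-zero row relies on real jets never having all four components exactly zero, which holds for any physical jet with non-zero energy.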

Evaluation

Your model’s performance will be evaluated using the Kullback-Leibler (KL) divergence between:

Jet Multiplicity Distribution: The distribution of $n_\text{jets}$ in the data vs. your predictions.
Leading & Subleading Jet $p_T$ Distributions: Histograms of the predicted vs. true $p_T$ of the leading and subleading jets across all events.

We will compute a final score based on a combination of these divergences (the exact weighting is to be announced). Lower is better. The KL divergence is defined as:

$$D_\text{KL}(P \parallel Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

where $i$ runs over the bins of the two normalised distributions $P$ and $Q$.
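A minimal sketch of a histogram-based KL divergence on jet multiplicity, for checking a model locally. The binning (0 to 5 jets), the small eps added to every bin so that $\log(P/Q)$ stays finite when a bin is empty, the natural logarithm, and taking $P$ as truth and $Q$ as prediction are all assumptions; the organisers' exact implementation is not given here.

```python
import math
from collections import Counter

def histogram(values, bins, eps=1e-9):
    """Normalised histogram of integer-valued `values` over `bins`.

    Values outside `bins` are ignored; `eps` keeps every bin non-zero.
    """
    counts = Counter(values)
    raw = [counts.get(b, 0) + eps for b in bins]
    total = sum(raw)
    return [c / total for c in raw]

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)), natural log."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

bins = range(0, 6)                  # 0..5 jets (num_max_jets = 5)
true_njets = [2, 2, 3, 2, 4, 3]     # hypothetical ground truth
pred_njets = [2, 3, 3, 2, 2, 3]     # hypothetical predictions
P = histogram(true_njets, bins)
Q = histogram(pred_njets, bins)
score = kl_divergence(P, Q)         # 0 only if the two histograms match
```

For the $p_T$ distributions, the same functions apply once the continuous values are mapped to integer bin indices, e.g. int(pt // width) for a bin width of your choice; the official binning is not stated in the challenge text.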

Submission format

You must submit a file (e.g., submission.csv) with one row per event in the test set, containing:

EventID – The identifier for the event.
$\hat{n}_\text{jets}$ – Predicted number of jets for that event.
Leading Jet $p_T$ – The predicted transverse momentum of the highest-$p_T$ jet.
Subleading Jet $p_T$ – The predicted transverse momentum of the second-highest-$p_T$ jet.

Example rows:

EventID,n_jets_pred,leading_pt_pred,subleading_pt_pred
0,2,45.8,36.2
1,3,78.1,52.4

Important:

If you predict fewer than 2 jets, fill the missing column(s) (subleading_pt_pred, and also leading_pt_pred for 0 jets) with a placeholder value (e.g., 0 or -1).
If you predict more than 2 jets, put only the $p_T$ values of the leading and sub-leading jets in the respective columns.
Make sure your rows follow the EventID order of the test events.
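A minimal sketch of writing the submission file with Python's standard csv module while applying the rules above: keep only the two highest predicted $p_T$ values, use a placeholder for missing jets, and emit rows in the order of the test EventIDs. The placeholder -1.0, the function names and the input layout (a dict mapping each EventID to a predicted jet count and a list of predicted jet $p_T$ values) are assumptions for illustration.

```python
import csv
import io

PLACEHOLDER = -1.0  # default for missing jets (the rules allow e.g. 0 or -1)
HEADER = ["EventID", "n_jets_pred", "leading_pt_pred", "subleading_pt_pred"]

def submission_row(event_id, n_jets, jet_pts):
    """One CSV row: keep the n_jets highest pT values, report the top two."""
    pts = sorted(jet_pts, reverse=True)[:n_jets]
    leading = pts[0] if len(pts) >= 1 else PLACEHOLDER
    subleading = pts[1] if len(pts) >= 2 else PLACEHOLDER
    return [event_id, n_jets, leading, subleading]

def write_submission(test_event_ids, predictions, stream):
    """Write rows in the same order as `test_event_ids`.

    `predictions` maps EventID -> (n_jets, list of predicted jet pT).
    """
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    for event_id in test_event_ids:
        n_jets, jet_pts = predictions[event_id]
        writer.writerow(submission_row(event_id, n_jets, jet_pts))

predictions = {
    0: (2, [45.8, 36.2]),
    1: (3, [52.4, 78.1, 20.0]),
    2: (1, [31.0]),
    3: (0, []),
}
buffer = io.StringIO()
write_submission([0, 1, 2, 3], predictions, buffer)
print(buffer.getvalue())
```

To write the real file, pass open("submission.csv", "w", newline="") as the stream; the csv module documentation recommends newline="" so that line endings are not doubled.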
Finalists will be asked to provide their solution (quantum algorithm and details) for a comprehensive evaluation.

FAQ

Can I use classical ML components alongside quantum circuits? Yes, a hybrid approach is encouraged. The essential requirement is at least one quantum element in your workflow.
How do I handle events with zero or one jet in my model? Predict $\hat{n}_\text{jets} = 0$ or 1 accordingly. For the leading and sub-leading $p_T$, set the missing values to a placeholder (e.g., 0 or -1) in your submission file.
What if my model predicts more jets than physically exist? That will be accounted for in the KL divergence on the multiplicity distribution. Only leading and sub-leading jets matter for the $p_T$ portion of the score.
Is domain knowledge of HEP required? Not strictly; we’ve provided background so you can focus on the quantum modelling. The main idea is to treat it as a high-dimensional regression task.
How do we install HEP or quantum frameworks locally? For quantum frameworks, see Qiskit Docs, PennyLane Installation, etc. No specialised HEP software is strictly required unless you want to explore advanced analyses.