This subset of HamLib aggregates Hamiltonians encoding binary optimization problems (e.g., Max-3SAT / SATLIB families and related generators). It targets benchmarking of quantum (QAOA, VQE-style) and classical solvers, enabling reproducible comparisons across structured and random instances and sizes.
Typical uses:
- Compare solver quality and runtime across instance families and sizes
- Study scaling and instance hardness
- Extract features (graph/term statistics) for meta-learning or portfolio schedulers
Quick start
Polars (streaming from Aqora)

```python
import polars as pl
from aqora_cli.pyarrow import dataset

df = pl.scan_pyarrow_dataset(
    dataset("bernalde/hamlib-binary-optimization", "v1.0.0")
).collect()
```
Pandas (Aqora URI)

```python
import pandas as pd

df = pd.read_parquet("aqora://bernalde/hamlib-binary-optimization/v1.0.0")
```

Replace `v1.0.0` with the version you intend to use.
Schema
One row = one Hamiltonian instance.
| Column | Type | Description |
|---|---|---|
| `hamlib_id` | string | Stable 16-character ID derived from domain/problem/collection/file/entry. |
| `domain` | string | Dataset domain; for this subset typically `binaryoptimization`. |
| `problem` | string | Problem family (e.g., `max3sat`). |
| `collection` | string | Source bucket (e.g., `satlib`, `random_instances`). |
| `instance_name` | string | Original archive-derived instance label. |
| `operator_format` | string | Encoding of the Hamiltonian payload (e.g., `qiskit_sparse_pauli`, `raw_text`, `graph`, `clauses`). |
| `payload` | struct | Nested representation of the instance; see Payload. |
| `n_qubits` | int32 | Number of qubits/variables inferred from the operator or its attributes. |
| `n_terms` | int32 | Number of operator terms, when available (for Max-3SAT this can include three-body terms, not only linear and pairwise ones). |
| `one_norm` | float64 | Sum of absolute coefficients, when available. |
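The size columns can be cross-checked against a sparse-Pauli payload. The sketch below recomputes `n_qubits`, `n_terms` and `one_norm` from `paulis`/`coeffs`; it assumes every label has one character per qubit and that `one_norm` includes the identity term if one is present, neither of which is confirmed by this card. `summarize_sparse_pauli` is a hypothetical helper, not part of any library.

```python
def summarize_sparse_pauli(paulis, coeffs):
    """Recompute n_qubits, n_terms and one_norm from a sparse-Pauli payload.

    Assumptions (not confirmed by the dataset card): each label has one
    character per qubit, and one_norm includes the identity term if present.
    """
    if len(paulis) != len(coeffs):
        raise ValueError("paulis and coeffs must have the same length")
    widths = {len(label) for label in paulis}
    if len(widths) > 1:
        raise ValueError(f"inconsistent label widths: {sorted(widths)}")
    n_qubits = widths.pop() if widths else 0
    return {
        "n_qubits": n_qubits,
        "n_terms": len(paulis),
        "one_norm": float(sum(abs(c) for c in coeffs)),
    }


stats = summarize_sparse_pauli(["ZZI", "IZZ", "ZIZ"], [0.5, -0.25, 1.0])
print(stats)  # {'n_qubits': 3, 'n_terms': 3, 'one_norm': 1.75}
```

If a recomputed value disagrees with the stored column, the difference is most likely a convention (for example, identity handling) rather than data corruption.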
Payload
`payload` is a nested Arrow/Parquet struct with optional subfields. Only the subfields relevant to a row's `operator_format` are populated.
```
payload: {
  paulis: list<string>        # e.g., ["IZX...","ZII..."] for sparse-Pauli form
  coeffs: list<float64>       # coefficient per Pauli label
  terms: list<struct{
    term: list<struct{qubit:int32, pauli:string}>
    coeff: float64
  }>                          # explicit operator terms (alternative representation)
  text: string                # raw text operator when a structured parse isn't available
  graph_edges: list<struct{u:int32, v:int32, w:float64}>  # optional graph context
  clauses: list<list<int32>>  # optional CNF-like clauses for SAT/MaxSAT contexts
}
```
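The `terms` subfield stores the same information as `paulis`/`coeffs` in a per-qubit form. A minimal conversion sketch, assuming Qiskit's little-endian label convention (qubit 0 is the rightmost character) and that qubits not listed in a term act as identity; the helper name `terms_to_sparse_pauli` is hypothetical.

```python
def terms_to_sparse_pauli(terms, n_qubits):
    """Convert the explicit `terms` payload into (paulis, coeffs) lists.

    Assumes Qiskit's little-endian label convention: qubit 0 is the
    rightmost character of each label. Unlisted qubits default to 'I'.
    """
    paulis, coeffs = [], []
    for entry in terms:
        label = ["I"] * n_qubits
        for factor in entry["term"]:
            q, p = factor["qubit"], factor["pauli"].upper()
            if p not in {"I", "X", "Y", "Z"}:
                raise ValueError(f"unknown Pauli {p!r}")
            if not 0 <= q < n_qubits:
                raise ValueError(f"qubit {q} out of range for {n_qubits} qubits")
            label[n_qubits - 1 - q] = p  # little-endian: qubit 0 is last
        paulis.append("".join(label))
        coeffs.append(float(entry["coeff"]))
    return paulis, coeffs


terms = [
    {"term": [{"qubit": 0, "pauli": "Z"}, {"qubit": 2, "pauli": "Z"}], "coeff": 0.5},
    {"term": [{"qubit": 1, "pauli": "Z"}], "coeff": -1.0},
]
print(terms_to_sparse_pauli(terms, 3))  # (['ZIZ', 'IZI'], [0.5, -1.0])
```

If the dataset turns out to use big-endian labels, only the index expression `n_qubits - 1 - q` needs to change to `q`.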
Mapping by format:
- `qiskit_sparse_pauli` → `paulis`, `coeffs`
- `raw_text` → `text`
- `graph` → `graph_edges`
- `clauses` → `clauses`
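The format-to-field mapping can be encoded once and used to pull only the populated subfields from a row. This sketch assumes rows arrive as plain dicts with `payload` as a nested dict (typical when a Parquet struct column is loaded through pandas); `PAYLOAD_FIELDS` and `extract_payload` are hypothetical names. The `terms` subfield is not listed because the mapping does not assign it to any format.

```python
# Which payload subfields each operator_format populates (from the mapping above).
PAYLOAD_FIELDS = {
    "qiskit_sparse_pauli": ("paulis", "coeffs"),
    "raw_text": ("text",),
    "graph": ("graph_edges",),
    "clauses": ("clauses",),
}


def extract_payload(row):
    """Return only the payload subfields relevant to row['operator_format']."""
    fmt = row["operator_format"]
    try:
        fields = PAYLOAD_FIELDS[fmt]
    except KeyError:
        raise ValueError(f"unsupported operator_format: {fmt!r}") from None
    payload = row["payload"]
    missing = [f for f in fields if payload.get(f) is None]
    if missing:
        raise ValueError(f"{fmt} row is missing payload fields {missing}")
    return {f: payload[f] for f in fields}


row = {
    "operator_format": "clauses",
    "payload": {"paulis": None, "coeffs": None, "clauses": [[1, -2, 3]]},
}
print(extract_payload(row))  # {'clauses': [[1, -2, 3]]}
```

Failing loudly on an unknown format or an empty subfield keeps a benchmark run from silently skipping instances.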
Examples
Filter and inspect (Polars)
```python
import polars as pl
from aqora_cli.pyarrow import dataset

df = pl.scan_pyarrow_dataset(dataset("bernalde/hamlib-binary-optimization", "v1.0.0"))

small_max3sat = (
    df.filter(
        (pl.col("problem") == "max3sat") &
        (pl.col("n_qubits") < 100)
    )
    .select(["hamlib_id", "collection", "instance_name", "n_qubits", "n_terms", "operator_format"])
    .collect()
)
print(small_max3sat.head(10))
```
Work with sparse-Pauli form (Pandas)
```python
import pandas as pd

df = pd.read_parquet("aqora://bernalde/hamlib-binary-optimization/v1.0.0")

# First row stored in sparse-Pauli form; the payload struct arrives as a dict.
row = df[df["operator_format"] == "qiskit_sparse_pauli"].iloc[0]
paulis = row["payload"]["paulis"]
coeffs = row["payload"]["coeffs"]
print("n_terms:", len(paulis), "sample:", list(zip(paulis[:3], coeffs[:3])))
```
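For binary optimization instances the sparse-Pauli operator is usually diagonal (only `I` and `Z`), so a candidate bitstring can be scored directly. A minimal sketch, assuming Qiskit's little-endian label order and the standard mapping Z|0⟩ = +|0⟩, Z|1⟩ = −|1⟩; `diagonal_energy` is a hypothetical helper and rejects non-diagonal labels rather than handling them.

```python
def diagonal_energy(paulis, coeffs, bits):
    """Energy <x|H|x> of a Z/I-only sparse-Pauli Hamiltonian on bitstring x.

    bits[q] is the value of qubit q. Labels are assumed little-endian
    (qubit 0 = rightmost character). Z on bit 0 gives +1, on bit 1 gives -1.
    """
    energy = 0.0
    for label, coeff in zip(paulis, coeffs):
        sign = 1
        for q, p in enumerate(reversed(label)):
            if p == "Z":
                sign *= 1 - 2 * bits[q]
            elif p != "I":
                raise ValueError(f"non-diagonal label {label!r}")
        energy += coeff * sign
    return energy


print(diagonal_energy(["ZZ", "IZ", "II"], [1.0, 0.5, -0.25], [1, 0]))  # -1.75
```

Scoring solver outputs this way gives the same objective for quantum and classical solvers, which keeps comparisons on equal footing.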
Raw-text fallback
```python
# Reuses `df` from the Pandas example above.
row = df[df["operator_format"] == "raw_text"].iloc[0]
print(row["payload"]["text"][:400])  # first 400 characters
```
Graph / clauses context (if present)
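```python
# Reuses `df`; skip a format if no row uses it (iloc[0] would raise IndexError).
graph_rows = df[df["operator_format"] == "graph"]
if not graph_rows.empty:
    print("first 5 edges:", graph_rows.iloc[0]["payload"]["graph_edges"][:5])
clause_rows = df[df["operator_format"] == "clauses"]
if not clause_rows.empty:
    print("first clause:", clause_rows.iloc[0]["payload"]["clauses"][0])
```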
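Clause and edge lists make it possible to score an assignment without building the Hamiltonian. The card does not state the literal or vertex conventions, so this sketch assumes DIMACS-style signed, 1-indexed literals (the CNF format SATLIB distributes) and 0-indexed graph vertices with a default weight of 1.0 when `w` is null. `count_satisfied` and `cut_weight` are hypothetical helpers.

```python
def count_satisfied(clauses, assignment):
    """Count clauses with at least one true literal under a 0/1 assignment.

    Assumed convention: DIMACS-style signed, 1-indexed literals, so +v
    means x_v = 1 and -v means x_v = 0; assignment[v - 1] holds x_v.
    """
    def literal_true(lit):
        value = assignment[abs(lit) - 1]
        return value == 1 if lit > 0 else value == 0

    return sum(any(literal_true(lit) for lit in clause) for clause in clauses)


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints get different 0/1 labels.

    Assumed convention: 0-indexed vertices; a missing or null weight counts as 1.0.
    """
    total = 0.0
    for e in edges:
        if assignment[e["u"]] != assignment[e["v"]]:
            w = e.get("w")
            total += 1.0 if w is None else w
    return total


clauses = [[1, -2, 3], [-1, 2, -3], [1, 2, 3]]
edges = [{"u": 0, "v": 1, "w": 2.0}, {"u": 1, "v": 2, "w": 0.5}, {"u": 0, "v": 2, "w": 1.0}]
print(count_satisfied(clauses, [0, 0, 0]))  # 2
print(cut_weight(edges, [0, 1, 0]))  # 2.5
```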
Notes and best practices
- Use `operator_format` to choose the correct parsing path for each row.
- Prefer streaming (`scan_pyarrow_dataset`) to filter and project before materializing.
- For fair comparisons, stratify by `problem`, `collection`, and size (`n_qubits`, `n_terms`). Report `one_norm` when normalizing objectives.
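The stratification advice can be sketched in plain Python: group solver results by problem, collection and a size bin, and average each objective divided by `one_norm`. The bin edges, the `objective` key and the helper names are illustrative assumptions, not part of the dataset; rows with a missing or zero `one_norm` are skipped.

```python
from collections import defaultdict


def size_bin(n_qubits, edges=(16, 32, 64, 128)):
    """Label a size stratum; the bin edges are illustrative, not prescribed."""
    for upper in edges:
        if n_qubits <= upper:
            return f"<= {upper}"
    return f"> {edges[-1]}"


def stratified_mean_normalized(results):
    """Average objective / one_norm per (problem, collection, size bin).

    `results` holds dicts with problem, collection, n_qubits, one_norm and a
    hypothetical solver `objective`. Rows without a usable one_norm are skipped.
    """
    groups = defaultdict(list)
    for r in results:
        if not r.get("one_norm"):
            continue
        key = (r["problem"], r["collection"], size_bin(r["n_qubits"]))
        groups[key].append(r["objective"] / r["one_norm"])
    return {k: sum(v) / len(v) for k, v in groups.items()}


results = [
    {"problem": "max3sat", "collection": "satlib", "n_qubits": 20, "one_norm": 4.0, "objective": -2.0},
    {"problem": "max3sat", "collection": "satlib", "n_qubits": 30, "one_norm": 8.0, "objective": -2.0},
    {"problem": "max3sat", "collection": "random_instances", "n_qubits": 100, "one_norm": 10.0, "objective": -5.0},
]
print(stratified_mean_normalized(results))
# {('max3sat', 'satlib', '<= 32'): -0.375, ('max3sat', 'random_instances', '<= 128'): -0.5}
```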
How to cite
Please include both citations: the original paper and the Aqora dataset entry you actually used (with version).
1) Original HamLib paper
```bibtex
@article{Sawaya_2024,
  title={HamLib: A library of Hamiltonians for benchmarking quantum algorithms and hardware},
  volume={8},
  ISSN={2521-327X},
  url={http://dx.doi.org/10.22331/q-2024-12-11-1559},
  DOI={10.22331/q-2024-12-11-1559},
  journal={Quantum},
  publisher={Verein zur F{\"o}rderung des Open Access Publizierens in den Quantenwissenschaften},
  author={Sawaya, Nicolas PD and Marti-Dafcik, Daniel and Ho, Yang and Tabor, Daniel P and Neira, David E Bernal and Magann, Alicia B and Premaratne, Shavindra and Dubey, Pradeep and Matsuura, Anne and Bishop, Nathan and Jong, Wibe A de and Benjamin, Simon and Parekh, Ojas and Tubman, Norm and Klymko, Katherine and Camps, Daan},
  year={2024},
  month=dec,
  pages={1559}
}
```
2) Aqora dataset entry (pin your version)
```bibtex
@misc{aqora_hamlib_binaryoptimization,
  title = {HamLib — Binary Optimization Benchmark Suite (Aqora mirror)},
  author = {Sawaya, Nicolas PD and Marti-Dafcik, Daniel and Ho, Yang and Tabor, Daniel P and Neira, David E Bernal and Magann, Alicia B and Premaratne, Shavindra and Dubey, Pradeep and Matsuura, Anne and Bishop, Nathan and Jong, Wibe A de and Benjamin, Simon and Parekh, Ojas and Tubman, Norm and Klymko, Katherine and Camps, Daan},
  howpublished = {\url{https://aqora.io/datasets/bernalde/hamlib-binary-optimization}},
  note = {Aqora Datasets Hub. Please cite the pinned version you used, e.g., v1.1.1},
  year = {2025},
  publisher = {Aqora}
}
```