HamLib: Binary Optimization Benchmark Suite

This subset of HamLib aggregates Hamiltonians encoding binary optimization problems (e.g., Max-3SAT / SATLIB families and related generators). It targets benchmarking of quantum (QAOA, VQE-style) and classical solvers, enabling reproducible comparisons across structured and random instances at a range of sizes. Typical uses:
  • Compare solver quality and runtime across instance families and sizes
  • Study scaling and instance hardness
  • Extract features (graph/term statistics) for meta-learning or portfolio schedulers

Quick start

Polars (streaming from Aqora)

import polars as pl
from aqora_cli.pyarrow import dataset
df = pl.scan_pyarrow_dataset(
    dataset("bernalde/hamlib-binary-optimization", "v1.0.0")
).collect()

Pandas (Aqora URI)

import pandas as pd
df = pd.read_parquet("aqora://bernalde/hamlib-binary-optimization/v1.0.0")

Replace the version with the one you intend to use.

Schema

One row = one Hamiltonian instance.
Column           Type     Description
hamlib_id        string   Stable 16-char ID derived from domain/problem/collection/file/entry.
domain           string   Dataset domain; for this subset typically binaryoptimization.
problem          string   Problem family (e.g., max3sat).
collection       string   Source bucket (e.g., satlib, random_instances).
instance_name    string   Original archive-derived instance label.
operator_format  string   Encoding of the Hamiltonian payload (e.g., qiskit_sparse_pauli, raw_text, graph, clauses).
payload          struct   Nested representation of the instance; see Payload.
n_qubits         int32    Number of qubits/variables inferred from the operator/attrs.
n_terms          int32    Number of operator terms (linear + pairwise) when available.
one_norm         float64  Sum of absolute coefficients when available.
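
The n_terms and one_norm columns are derived from the payload where possible, so for rows in sparse-Pauli form they can be cross-checked directly. A minimal sketch, assuming the Pandas quick start above and a row that populates coeffs:

import pandas as pd
df = pd.read_parquet("aqora://bernalde/hamlib-binary-optimization/v1.0.0")
row = df[df["operator_format"] == "qiskit_sparse_pauli"].iloc[0]
coeffs = row["payload"]["coeffs"]
# Cross-check the derived columns against the payload (fields may be null for other formats)
print("n_terms column:", row["n_terms"], "| payload terms:", len(coeffs))
print("one_norm column:", row["one_norm"], "| sum |coeff|:", sum(abs(c) for c in coeffs))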

Payload

payload is a nested Arrow/Parquet struct with optional subfields. Only the relevant subfields are populated per row based on operator_format.
payload: {
  paulis:       list<string>              # e.g., ["IZX...","ZII..."] for sparse-Pauli form
  coeffs:       list<float64>             # coefficient per pauli label
  terms:        list<struct{
                   term:  list<struct{qubit:int32, pauli:string}>
                   coeff: float64
                 }>                        # explicit operator terms (alternative representation)
  text:         string                     # raw text operator when structured parse isn't available
  graph_edges:  list<struct{u:int32, v:int32, w:float64}>   # optional graph context
  clauses:      list<list<int32>>          # optional CNF-like clauses for SAT/MaxSAT contexts
}
Mapping by format:
  • qiskit_sparse_pauli → paulis, coeffs
  • raw_text → text
  • graph → graph_edges
  • clauses → clauses
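
For rows in qiskit_sparse_pauli form, the paulis and coeffs lists can be handed back to Qiskit directly. A minimal sketch, assuming Qiskit is installed and that the label strings follow the convention of the original export:

from qiskit.quantum_info import SparsePauliOp

def to_sparse_pauli_op(payload):
    # Rebuild a Qiskit operator from the paulis/coeffs subfields of a row's payload
    return SparsePauliOp(list(payload["paulis"]), coeffs=list(payload["coeffs"]))

# With `row` as in the Pandas example below:
# op = to_sparse_pauli_op(row["payload"])
# print(op.num_qubits, op.size)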

Examples

Filter and inspect (Polars)

import polars as pl
from aqora_cli.pyarrow import dataset
df = pl.scan_pyarrow_dataset(dataset("bernalde/hamlib-binary-optimization", "v1.0.0"))
small_max3sat = (
    df.filter(
        (pl.col("problem") == "max3sat") &
        (pl.col("n_qubits") < 100)
    )
    .select(["hamlib_id","collection","instance_name","n_qubits","n_terms","operator_format"])
    .collect()
)
print(small_max3sat.head(10))

Work with sparse-Pauli form (Pandas)

import pandas as pd
df = pd.read_parquet("aqora://bernalde/hamlib-binary-optimization/v1.0.0")
row = df[df["operator_format"]=="qiskit_sparse_pauli"].iloc[0]
paulis = row["payload"]["paulis"]
coeffs = row["payload"]["coeffs"]
print("n_terms:", len(paulis), "sample:", list(zip(paulis[:3], coeffs[:3])))

Raw-text fallback

row = df[df["operator_format"]=="raw_text"].iloc[0]
print(row["payload"]["text"][:400])

Graph / clauses context (if present)

g = df[df["operator_format"]=="graph"].iloc[0]["payload"]["graph_edges"]
print("first 5 edges:", g[:5])
c = df[df["operator_format"]=="clauses"].iloc[0]["payload"]["clauses"]
print("first clause:", c[0])

Notes and best practices

  • Use operator_format to choose the correct parsing path for each row.
  • Prefer streaming (scan_pyarrow_dataset) to filter and project before materializing.
  • For fair comparisons, stratify by problem, collection, and size (n_qubits, n_terms); a grouping sketch follows below. Report one_norm when normalizing objectives.
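
One way to stratify before benchmarking is to bucket instances by size within each problem/collection pair. A minimal Polars sketch (the 50-qubit bucket width is an arbitrary choice):

import polars as pl
from aqora_cli.pyarrow import dataset
lf = pl.scan_pyarrow_dataset(dataset("bernalde/hamlib-binary-optimization", "v1.0.0"))
strata = (
    lf.with_columns((pl.col("n_qubits") // 50 * 50).alias("qubit_bin"))  # 50-qubit size buckets
      .group_by(["problem", "collection", "qubit_bin"])
      .agg(pl.len().alias("n_instances"))
      .sort(["problem", "collection", "qubit_bin"])
      .collect()
)
print(strata)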

How to cite

Please include both citations: the original paper and the Aqora dataset entry you actually used (with version).

1) Original HamLib paper

@article{Sawaya_2024,
  title     = {HamLib: A library of Hamiltonians for benchmarking quantum algorithms and hardware},
  volume    = {8},
  ISSN      = {2521-327X},
  url       = {http://dx.doi.org/10.22331/q-2024-12-11-1559},
  DOI       = {10.22331/q-2024-12-11-1559},
  journal   = {Quantum},
  publisher = {Verein zur F{\"o}rderung des Open Access Publizierens in den Quantenwissenschaften},
  author    = {Sawaya, Nicolas PD and Marti-Dafcik, Daniel and Ho, Yang and Tabor, Daniel P and Neira, David E Bernal and Magann, Alicia B and Premaratne, Shavindra and Dubey, Pradeep and Matsuura, Anne and Bishop, Nathan and Jong, Wibe A de and Benjamin, Simon and Parekh, Ojas and Tubman, Norm and Klymko, Katherine and Camps, Daan},
  year      = {2024},
  month     = dec,
  pages     = {1559}
}

2) Aqora dataset entry (pin your version)

@misc{aqora_hamlib_binaryoptimization,
  title        = {HamLib — Binary Optimization Benchmark Suite (Aqora mirror)},
  author       = {Sawaya, Nicolas PD and Marti-Dafcik, Daniel and Ho, Yang and Tabor, Daniel P and Neira, David E Bernal and Magann, Alicia B and Premaratne, Shavindra and Dubey, Pradeep and Matsuura, Anne and Bishop, Nathan and Jong, Wibe A de and Benjamin, Simon and Parekh, Ojas and Tubman, Norm and Klymko, Katherine and Camps, Daan},
  howpublished = {\url{https://aqora.io/datasets/bernalde/hamlib-binary-optimization}},
  note         = {Aqora Datasets Hub. Please cite the pinned version you used, e.g., v1.1.1},
  year         = {2025},
  publisher    = {Aqora}
}