This corpus mirrors all circuits released as MNISQ across three image domains (MNIST-784, Fashion-MNIST, Kuzushiji-MNIST), three fidelity thresholds (f80, f90, f95), both encodings (DenseMatrix/Qulacs and Base/portable gates), and all official splits.
The full dataset is 28.12 GB, so plan for the corresponding download time, disk space, and memory before loading it from your codebase.
Coverage & counts
Per (domain × fidelity) combination:
- Splits included: train_orig (60k) / base_train_orig (60k) / train (480k) / test (10k) / base_test (10k)
- Sum per combo: 60k + 60k + 480k + 10k + 10k = 620k
- Combos: 3 domains × 3 fidelity tiers = 9
- Grand total: 620k × 9 = 5,580,000 circuits
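The tally above can be reproduced with a few lines of Python. This is a minimal sketch: the constant names (`SPLIT_SIZES`, `DOMAINS`, `FIDELITY_BUCKETS`) are illustrative and not part of any Aqora or MNISQ API. It also computes the total without the two base_* splits, which comes to 4,950,000 and appears to correspond to the ~4.95M figure quoted for the original release in the Provenance section.

```python
# Split sizes per (domain x fidelity) combination, as listed above.
SPLIT_SIZES = {
    "train_orig": 60_000,
    "base_train_orig": 60_000,
    "train": 480_000,
    "test": 10_000,
    "base_test": 10_000,
}
DOMAINS = ("mnist_784", "Fashion-MNIST", "Kuzushiji-MNIST")
FIDELITY_BUCKETS = ("f80", "f90", "f95")

per_combo = sum(SPLIT_SIZES.values())
n_combos = len(DOMAINS) * len(FIDELITY_BUCKETS)
grand_total = per_combo * n_combos

# Same tally, leaving out the two base_* splits.
non_base_total = n_combos * sum(
    size for name, size in SPLIT_SIZES.items() if not name.startswith("base_")
)

print(f"per combo: {per_combo:,}")
print(f"combos: {n_combos}")
print(f"grand total: {grand_total:,}")
print(f"without base_* splits: {non_base_total:,}")
```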
This Aqora entry is a faithful consolidation for convenience and reproducibility. It does not introduce new circuits; it aggregates all official splits (including the base_* splits) and encodings in one place.
Why two encodings?
- DenseMatrix (Qulacs-optimized): highest-fidelity simulation via dense operators in Qulacs.
- Base (portable gates): standard gate set for cross-tool compatibility (e.g., Qiskit, Cirq, PennyLane backends).
Use DenseMatrix for performance studies on Qulacs; use Base for simulator/toolchain comparisons.
Data schema
All records ship with the following columns:
- dataset: "mnist_784" | "Fashion-MNIST" | "Kuzushiji-MNIST"
- split: train_orig | base_train_orig | train | test | base_test
- is_base: true for the two base_* splits, else false
- fidelity_bucket: "f80" | "f90" | "f95"
- fidelity_min: 0.80 | 0.90 | 0.95
- fidelity_value: exact fidelity for the example
- filename: internal path (e.g., qasm/1234)
- n_qubits: circuit width
- has_dense_operator: true for DenseMatrix variants, false for Base
- label: class label (0–9)
- qasm: OpenQASM text of the circuit (some entries may be OpenQASM 3)
- state_gz: optional gzipped amplitudes (one line per complex pair: real imag)
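The relationships between these columns can be checked per record. The sketch below is illustrative only: `check_record` is a hypothetical helper, the example record uses made-up values, and the rule that fidelity_value is never below fidelity_min is an assumption inferred from the column names rather than something the dataset documents.

```python
FIDELITY_MIN = {"f80": 0.80, "f90": 0.90, "f95": 0.95}
BASE_SPLITS = {"base_train_orig", "base_test"}
ALL_SPLITS = BASE_SPLITS | {"train_orig", "train", "test"}


def check_record(rec: dict) -> list:
    """Return a list of problems found; an empty list means the record looks consistent."""
    problems = []
    if rec["split"] not in ALL_SPLITS:
        problems.append(f"unknown split: {rec['split']!r}")
    if rec["is_base"] != (rec["split"] in BASE_SPLITS):
        problems.append("is_base does not match split")
    expected_min = FIDELITY_MIN.get(rec["fidelity_bucket"])
    if expected_min is None:
        problems.append(f"unknown fidelity_bucket: {rec['fidelity_bucket']!r}")
    elif abs(rec["fidelity_min"] - expected_min) > 1e-9:
        problems.append("fidelity_min does not match fidelity_bucket")
    elif rec["fidelity_value"] < expected_min:
        # Assumption: fidelity_min is a lower bound on fidelity_value.
        problems.append("fidelity_value is below fidelity_min")
    if rec["label"] not in range(10):
        problems.append(f"label out of range: {rec['label']!r}")
    return problems


# Hypothetical record with made-up values, shaped like the schema above.
example = {
    "dataset": "mnist_784",
    "split": "base_test",
    "is_base": True,
    "fidelity_bucket": "f90",
    "fidelity_min": 0.90,
    "fidelity_value": 0.9123,
    "label": 7,
}
print(check_record(example))
```

Running a check like this on a small sample after loading is a cheap way to catch a mismatched namespace or version before a long training run.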
Quickstart (Python)
Namespace + version used below: leopla/mnisq, v1.0.0. Replace with the namespace/version you actually use.
state_gz exists for DenseMatrix entries and is absent for Base entries.
1) Load a slice and grab qasm + state_gz
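# --- Polars (lazy) ---
import polars as pl
from aqora_cli.pyarrow import dataset
lf = pl.scan_pyarrow_dataset(dataset("leopla/mnisq", "v1.0.0"))
# Project and limit before .collect() so the full 28.12 GB is not materialized
df = lf.select(["qasm", "state_gz"]).head(1).collect()
qasm_str = df["qasm"][0]
sgz = df["state_gz"][0] # None for Base entries
# --- pandas (eager) ---
import pandas as pd
# Eager: reads every row; restricting columns keeps memory use down
df = pd.read_parquet("aqora://leopla/mnisq/v1.0.0", columns=["qasm", "state_gz"])
qasm_str = df.loc[df.index[0], "qasm"]
sgz = df.loc[df.index[0], "state_gz"] # NaN/None for Base entries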
2) Decompress state_gz → NumPy complex vector (if present)
import gzip, numpy as np
state = None
if sgz is not None and not (isinstance(sgz, float) and np.isnan(sgz)):
    if isinstance(sgz, memoryview):
        sgz = sgz.tobytes()
    if isinstance(sgz, bytearray):
        sgz = bytes(sgz)
    text = gzip.decompress(sgz).decode("utf-8")
    pairs = [tuple(map(float, line.split())) for line in text.splitlines() if line.strip()]
    state = np.array([complex(r, i) for r, i in pairs], dtype=np.complex128)
print("Loaded state amplitudes:", None if state is None else state.shape)
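To exercise the decoding logic without downloading anything, the text format described for state_gz (one "real imag" pair per line, gzipped) can be round-tripped on a synthetic payload. This is a standard-library sketch; `encode_state` and `decode_state` are hypothetical helper names, and the two-qubit state is a toy input rather than dataset content.

```python
import gzip
import math


def encode_state(amplitudes):
    """Serialize complex amplitudes as gzipped text, one 'real imag' pair per line."""
    lines = (f"{complex(a).real!r} {complex(a).imag!r}" for a in amplitudes)
    return gzip.compress("\n".join(lines).encode("utf-8"))


def decode_state(blob):
    """Parse gzipped 'real imag' lines back into a list of complex numbers."""
    # bytes() also accepts memoryview and bytearray inputs.
    text = gzip.decompress(bytes(blob)).decode("utf-8")
    return [
        complex(*map(float, line.split()))
        for line in text.splitlines()
        if line.strip()
    ]


# Toy two-qubit state with amplitude on |00> and |11>.
amp = 1 / math.sqrt(2)
toy_state = [complex(amp, 0.0), 0j, 0j, complex(0.0, amp)]

blob = encode_state(toy_state)
restored = decode_state(blob)

norm_sq = sum(abs(a) ** 2 for a in restored)
n_qubits = int(math.log2(len(restored)))
print("amplitudes:", len(restored), "qubits:", n_qubits, "norm^2:", norm_sq)
```

The same two checks (length is a power of two, squared norm close to 1) are reasonable sanity tests for a state decoded from a real row.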
3) Use the circuit in your favorite framework
Qiskit: build from QASM, inspect, and (optionally) wrap the state
from qiskit import QuantumCircuit
try:
    # OpenQASM 2
    qc = QuantumCircuit.from_qasm_str(qasm_str)
except Exception:
    # OpenQASM 3
    from qiskit.qasm3 import loads as qasm3_loads
    qc = qasm3_loads(qasm_str)
print(qc) # or qc.draw("mpl")
# If you have a state vector (DenseMatrix rows):
if state is not None:
    from qiskit.quantum_info import Statevector
    sv = Statevector(state)
    print("Qiskit Statevector dim:", sv.dim)
PennyLane: create a QNode from QASM (if supported) or from amplitudes
import pennylane as qml
import numpy as np
# Option A: parse QASM directly (requires a PL version with QASM loading).
# qml.from_qasm returns a quantum function, not a QNode, and takes no device argument.
qnode = None
try:
    loaded = qml.from_qasm(qasm_str)
    dev = qml.device("default.qubit")
    @qml.qnode(dev)
    def qnode():
        loaded() # replay the loaded gates
        return qml.state()
    print("Loaded PennyLane circuit from QASM")
except Exception:
    pass
# Option B: initialize from provided state (works whenever `state` is available)
if qnode is None and state is not None:
    n_qubits = int(np.log2(state.size))
    dev = qml.device("default.qubit", wires=n_qubits)
    @qml.qnode(dev)
    def qnode():
        qml.StatePrep(state, wires=range(n_qubits))
        return qml.state()
    psi = qnode()
    print("Initialized PennyLane from amplitudes; |ψ|^2 sum:", float(np.sum(np.abs(psi)**2)))
Cirq: build from QASM and (optionally) simulate with provided state
import cirq
from cirq.contrib.qasm_import import circuit_from_qasm
cc = circuit_from_qasm(qasm_str)
print(cc)
# If you have a state vector:
if state is not None:
    sim = cirq.Simulator()
    n_qubits = len(cc.all_qubits())
    assert state.size == 2**n_qubits, "State size must match circuit qubit count."
    result = sim.simulate(cc, initial_state=state)
    print("Final state norm:", float(np.linalg.norm(result.final_state_vector)))
Tips
• For smaller transfers with Polars, push filters/projections before .collect().
• If a framework's default loader can't parse OpenQASM 3, try its QASM-3 loader (shown for Qiskit) or convert to QASM-2.
• state_gz encodes a pure state; its norm should be ~1 (up to float error).
Visualize an encoded image (28×28, PennyLane-style)
Below are two minimal pathways mirroring the PennyLane dataset docs: (A) use the provided amplitudes (fastest), or (B) simulate the circuit and then visualize.
A) From state_gz (fastest)
import numpy as np
import matplotlib.pyplot as plt
# `state` from step 2 above (complex vector)
assert state is not None, "This example needs a DenseMatrix row with state_gz."
image_array = np.reshape(np.abs(state[:784]), (28, 28))
plt.imshow(image_array)
plt.axis("off")
plt.show()
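The `state[:784]` slice above makes sense if the 784 pixel intensities sit in the first 784 amplitudes of a longer state vector. The sketch below illustrates that idea in plain Python under stated assumptions: a 10-qubit register (1024 amplitudes, the smallest power of two that fits 784 values), zero padding for the unused amplitudes, and row-major pixel order. `amplitude_encode` and `decode_image` are hypothetical helpers, not the MNISQ authors' pipeline, and the dataset's circuits only approximate such a target state up to the listed fidelity.

```python
import math

N_PIXELS = 28 * 28    # 784 pixel intensities per image
N_QUBITS = 10         # assumption: smallest n with 2**n >= 784
DIM = 2 ** N_QUBITS   # 1024 amplitudes


def amplitude_encode(pixels):
    """Map 784 non-negative intensities to a unit-norm vector of length 1024."""
    if len(pixels) != N_PIXELS:
        raise ValueError(f"expected {N_PIXELS} pixels, got {len(pixels)}")
    norm = math.sqrt(sum(p * p for p in pixels))
    if norm == 0:
        raise ValueError("cannot encode an all-zero image")
    padded = list(pixels) + [0.0] * (DIM - N_PIXELS)
    return [p / norm for p in padded]


def decode_image(state):
    """Recover a 28x28 grid of magnitudes from the first 784 amplitudes."""
    mags = [abs(a) for a in state[:N_PIXELS]]
    return [mags[row * 28:(row + 1) * 28] for row in range(28)]


# Toy image with two lit pixels (intensities chosen so the norm is 5).
pixels = [0.0] * N_PIXELS
pixels[0] = 4.0
pixels[14 * 28 + 14] = 3.0

toy_amplitudes = amplitude_encode(pixels)
image = decode_image(toy_amplitudes)
print(len(toy_amplitudes), image[0][0], image[14][14])
```

Because normalization rescales every pixel by the same factor, the recovered image keeps relative intensities but not the original brightness scale.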
B) From a PennyLane circuit (simulate, then visualize)
import numpy as np
import matplotlib.pyplot as plt
import pennylane as qml
# Try to build a PL circuit directly from QASM (requires PL with qasm support)
try:
    # qml.from_qasm returns a quantum function, not a QNode, and takes no device argument
    loaded = qml.from_qasm(qasm_str)
    dev = qml.device("default.qubit")
    @qml.qnode(dev)
    def circuit():
        loaded() # execute loaded circuit
        return qml.state()
    psi = circuit()
    image_array = np.reshape(np.abs(psi[:784]), (28, 28))
    plt.imshow(image_array)
    plt.axis("off")
    plt.show()
except Exception as e:
    print(f"PennyLane QASM loading failed ({e}); use the state_gz path or Qiskit/Cirq to simulate.")
Reproducibility & versions
- Pin a version in your code and paper (e.g., v1.0.0) for exact reproducibility.
- Use deterministic seeds across Python/NumPy/your QC library to align shuffles, inits, and simulator draws.
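One way to apply the seeding advice is a single helper called at the start of every script. The sketch below is a minimal example: `seed_everything` is a hypothetical name, it covers only Python's `random` module and NumPy's legacy global RNG (if NumPy is installed), and any simulator-specific seed in your QC library still has to be set through that library's own options.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG and, if available, NumPy's legacy global RNG."""
    random.seed(seed)
    # Only affects hash randomization in subprocesses started after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
    except ImportError:
        return
    np.random.seed(seed)


# Re-seeding with the same value reproduces the same draws.
seed_everything(1234)
first = [random.random() for _ in range(3)]
seed_everything(1234)
second = [random.random() for _ in range(3)]
print(first == second)
```

Recording the seed next to the pinned dataset version in your experiment logs keeps both halves of the reproducibility recipe together.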
Benchmarks (as reported in the MNISQ paper)
- Quantum kernels: up to ~97% accuracy.
- Classical sequence models (e.g., S4 on tokenized QASM): ~77%.
License & attribution
- The original MNISQ dataset is released under CC BY-SA 4.0. Please comply with ShareAlike and attribution terms.
- This Aqora entry is a repackaged mirror for convenience; credit the original authors and include the Aqora dataset URL + pinned version you used.
How to cite
Please include both citations: the original paper and the Aqora dataset entry you actually used (with version).
1) Original MNISQ paper
@misc{placidi2023mnisq,
title = {MNISQ: A Large-Scale Quantum Circuit Dataset for Machine Learning on/for Quantum Computers in the NISQ era},
author = {Placidi, Leonardo and Hataya, Ryuichiro and Mori, Toshio and Aoyama, Koki and Morisaki, Hayata and Mitarai, Kosuke and Fujii, Keisuke},
year = {2023},
eprint = {2306.16627},
archivePrefix = {arXiv},
primaryClass = {quant-ph},
doi = {10.48550/arXiv.2306.16627},
url = {https://arxiv.org/abs/2306.16627}
}
2) Aqora dataset entry (pin your version)
@misc{aqora_mnisq_full,
title = {MNISQ: Comprehensive Quantum Circuit Corpus (Aqora mirror)},
howpublished = {\url{https://aqora.io/datasets/leopla/mnisq}},
note = {Aqora Datasets Hub. Please cite the pinned version you used, e.g., v1.0.0},
year = {2025},
publisher = {Aqora}
}
If your venue supports @dataset, you can switch the entry type accordingly.
Provenance
MNISQ was introduced by Placidi et al. (2023) with ~4.95M examples across nine sub-datasets. This Aqora page aggregates all official splits and both encodings in one place to simplify discovery and reproducible benchmarking.