leopla / MNISQ 784


About this dataset version

MNISQ-784: Digits (MNIST-784) Quantum Circuit Subset

Total size: 1,860,000 circuits
This subset mirrors all MNIST-784 circuits released as MNISQ across three fidelity thresholds (f80, f90, f95), both encodings (DenseMatrix/Qulacs and Base/portable gates), and all official splits.
This is a convenience mirror for reproducibility: it introduces no new circuits and only consolidates the MNIST-784 domain of MNISQ in one place.

Coverage & counts

Per fidelity tier (f80/f90/f95):
  • Splits included: train_orig (60k) / base_train_orig (60k) / train (480k) / test (10k) / base_test (10k)
  • Sum per fidelity: 60k + 60k + 480k + 10k + 10k = 620k
  • Fidelity tiers: 3 → 3 × 620k = 1,860,000 circuits
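The totals above can be checked in a few lines of Python. The split sizes are copied from the list; the variable names are illustrative only. The last figure is derived, assuming (per the schema below) that only the two base_* splits use the Base encoding.

```python
# Per-tier split sizes, copied from the coverage list above.
SPLIT_SIZES = {
    "train_orig": 60_000,
    "base_train_orig": 60_000,
    "train": 480_000,
    "test": 10_000,
    "base_test": 10_000,
}
FIDELITY_TIERS = ("f80", "f90", "f95")

per_tier = sum(SPLIT_SIZES.values())          # circuits in one fidelity tier
total = per_tier * len(FIDELITY_TIERS)        # circuits across all three tiers
# DenseMatrix rows per tier = everything except the two base_* splits.
dense_per_tier = per_tier - SPLIT_SIZES["base_train_orig"] - SPLIT_SIZES["base_test"]

print(per_tier, total, dense_per_tier)  # 620000 1860000 550000
```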

Why two encodings?

  • DenseMatrix (Qulacs-optimized): circuits expressed with dense-matrix gate operators, which Qulacs simulates directly.
  • Base (portable gates): standard gate set for cross-tool compatibility (Qiskit, Cirq, PennyLane).

Use DenseMatrix for Qulacs performance studies; use Base for simulator/toolchain comparisons.
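Picking one encoding and fidelity tier comes down to filtering on the is_base and fidelity_bucket columns described in the schema below. The following is a minimal pandas sketch: select_encoding is a hypothetical helper, and the toy frame contains made-up rows, not real dataset records.

```python
import pandas as pd

def select_encoding(df: pd.DataFrame, encoding: str, bucket: str) -> pd.DataFrame:
    """Return the rows of one encoding ("DenseMatrix" or "Base") in one fidelity bucket."""
    if encoding not in ("DenseMatrix", "Base"):
        raise ValueError(f"unknown encoding: {encoding!r}")
    want_base = encoding == "Base"  # Base rows are the ones flagged is_base
    mask = (df["is_base"] == want_base) & (df["fidelity_bucket"] == bucket)
    return df[mask]

# Tiny synthetic frame using the documented column names (values are made up).
toy = pd.DataFrame({
    "split": ["test", "base_test", "train", "base_test"],
    "is_base": [False, True, False, True],
    "fidelity_bucket": ["f95", "f95", "f90", "f80"],
})
print(select_encoding(toy, "Base", "f95")["split"].tolist())  # ['base_test']
```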

Data schema

All records ship with the following columns:
  • dataset: always "mnist_784"
  • split: train_orig | base_train_orig | train | test | base_test
  • is_base: true for the two base_* splits, else false
  • fidelity_bucket: "f80" | "f90" | "f95"
  • fidelity_min: 0.80 | 0.90 | 0.95
  • fidelity_value: exact fidelity for the example
  • filename: internal path (e.g., qasm/1234)
  • n_qubits: circuit width
  • has_dense_operator: true for DenseMatrix variants, false for Base
  • label: class label (0–9)
  • qasm: OpenQASM text of the circuit (some entries may be OpenQASM 3)
  • state_gz: optional gzip-compressed amplitudes, one line per amplitude with its real and imaginary parts separated by whitespace
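The schema implies some cross-column invariants. For example, is_base follows from the split name, and has_dense_operator is its negation. A small checker can catch filtering or join mistakes. This is a sketch under stated assumptions: check_record is a hypothetical helper, the rule that fidelity_value is at least fidelity_min is inferred from the bucket names, and the example record is made up.

```python
FIDELITY_MIN = {"f80": 0.80, "f90": 0.90, "f95": 0.95}
SPLITS = {"train_orig", "base_train_orig", "train", "test", "base_test"}

def check_record(rec: dict) -> list:
    """Return a list of schema violations for one record (empty if none were found)."""
    problems = []
    if rec["dataset"] != "mnist_784":
        problems.append("dataset is not 'mnist_784'")
    if rec["split"] not in SPLITS:
        problems.append(f"unknown split {rec['split']!r}")
    if rec["is_base"] != rec["split"].startswith("base_"):
        problems.append("is_base disagrees with split name")
    # DenseMatrix and Base are complementary encodings.
    if rec["has_dense_operator"] == rec["is_base"]:
        problems.append("has_dense_operator should be the negation of is_base")
    expected_min = FIDELITY_MIN.get(rec["fidelity_bucket"])
    if expected_min is None or abs(rec["fidelity_min"] - expected_min) > 1e-9:
        problems.append("fidelity_min does not match fidelity_bucket")
    elif rec["fidelity_value"] < expected_min:  # assumption: bucket = lower bound
        problems.append("fidelity_value is below the bucket threshold")
    if rec["label"] not in range(10):
        problems.append("label outside 0-9")
    return problems

# Made-up record for illustration only.
rec = {"dataset": "mnist_784", "split": "base_test", "is_base": True,
       "fidelity_bucket": "f90", "fidelity_min": 0.90, "fidelity_value": 0.93,
       "filename": "qasm/1234", "n_qubits": 10, "has_dense_operator": False,
       "label": 7, "qasm": "OPENQASM 2.0;", "state_gz": None}
print(check_record(rec))  # []
```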

Quickstart (Python)

Namespace and version used below: leopla/mnisq-784, v1.0.0.
state_gz is populated for DenseMatrix entries and null for Base entries.

1) Load a slice and grab qasm + state_gz

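# --- Polars (lazy): filter before collect() so only a slice is materialized ---
import polars as pl
from aqora_cli.pyarrow import dataset
lf = pl.scan_pyarrow_dataset(dataset("leopla/mnisq-784", "v1.0.0"))
df = lf.filter(
    (pl.col("split") == "test") & (pl.col("fidelity_bucket") == "f95")
).head(5).collect()
qasm_str = df["qasm"][0]
sgz = df["state_gz"][0]  # None for Base entries
# --- pandas (eager): push the slice filter down into the Parquet read ---
import pandas as pd
df = pd.read_parquet(
    "aqora://leopla/mnisq-784/v1.0.0",
    filters=[("split", "==", "test"), ("fidelity_bucket", "==", "f95")],
)
qasm_str = df["qasm"].iloc[0]
sgz = df["state_gz"].iloc[0]  # None (or NaN) for Base entries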

2) Decompress state_gz → NumPy complex vector (if present)

import gzip
import numpy as np

state = None
# Skip missing values: None (Polars/pandas) or NaN (pandas object columns).
if sgz is not None and not (isinstance(sgz, float) and np.isnan(sgz)):
    # Some readers return memoryview/bytearray; normalize to plain bytes.
    if isinstance(sgz, (memoryview, bytearray)):
        sgz = bytes(sgz)
    text = gzip.decompress(sgz).decode("utf-8")
    # Each non-empty line holds "real imag" for one amplitude.
    pairs = [tuple(map(float, line.split())) for line in text.splitlines() if line.strip()]
    state = np.array([complex(r, i) for r, i in pairs], dtype=np.complex128)
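The same text format can be parsed in one vectorized call. Writing the inverse makes the format concrete and lets you round-trip test your decoder. Both helpers are hypothetical, not part of any Aqora or MNISQ API. The encoder only assumes the "real imag per line" layout described in the schema.

```python
import gzip
import io
import numpy as np

def encode_state_gz(state: np.ndarray) -> bytes:
    """Hypothetical inverse of the decoder: write 'real imag' per line, then gzip."""
    lines = "\n".join(f"{a.real:.17g} {a.imag:.17g}" for a in state)
    return gzip.compress(lines.encode("utf-8"))

def decode_state_gz(blob: bytes) -> np.ndarray:
    """Vectorized decode: parse all 'real imag' lines at once with np.loadtxt."""
    text = gzip.decompress(blob).decode("utf-8")
    pairs = np.loadtxt(io.StringIO(text), ndmin=2)  # shape (n_amplitudes, 2)
    return pairs[:, 0] + 1j * pairs[:, 1]

state = np.array([0.6, 0.0, 0.0, 0.8j])  # toy normalized 2-qubit state
decoded = decode_state_gz(encode_state_gz(state))
print(np.allclose(decoded, state), np.isclose(np.linalg.norm(decoded), 1.0))  # True True
```

A quick norm check like the one above is a cheap sanity test on real entries too, since a valid state vector should have unit norm.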

3) Use the circuit in your favorite framework

Qiskit (OpenQASM 2/3), PennyLane (from QASM where supported, otherwise StatePrep on the decoded amplitudes), and Cirq (OpenQASM 2) follow the same patterns as in mnisq.
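Because some entries may be OpenQASM 3, it helps to pick the parser from the program header. The sketch below is an assumption on our side, not the dataset's own loader. It dispatches to Qiskit's qiskit.qasm2.loads or qiskit.qasm3.loads (the latter needs the separate qiskit-qasm3-import package) and falls back to OpenQASM 2 when no header is found. Qiskit is imported lazily so the version check runs without it.

```python
import re

# Matches headers such as "OPENQASM 2.0;" or "OPENQASM 3;" at the start of a line.
_HEADER = re.compile(r"^\s*OPENQASM\s+(\d+)(?:\.\d+)?\s*;", re.MULTILINE)

def qasm_major_version(qasm_str: str):
    """Return the major OpenQASM version from the header, or None if there is none."""
    m = _HEADER.search(qasm_str)
    return int(m.group(1)) if m else None

def to_qiskit(qasm_str: str):
    """Parse with the matching Qiskit loader; defaults to OpenQASM 2 without a header."""
    if qasm_major_version(qasm_str) == 3:
        import qiskit.qasm3  # requires the qiskit-qasm3-import package
        return qiskit.qasm3.loads(qasm_str)
    import qiskit.qasm2
    return qiskit.qasm2.loads(qasm_str)
```

For circuits without measurements, the resulting QuantumCircuit can then be passed to qiskit.quantum_info.Statevector to compare against the decoded state_gz amplitudes.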

Visualize an encoded image (28×28)

Use either the provided amplitudes (state_gz) or simulate from QASM, as shown in mnisq.
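A minimal sketch of the amplitude route follows. It assumes the circuit amplitude-encodes the flattened 28×28 image in row-major order over the first 784 basis states, padded to 1024 amplitudes on 10 qubits. Verify that pixel ordering against mnisq before relying on it, because qubit-ordering conventions differ between simulators. amplitudes_to_image is a hypothetical helper.

```python
import numpy as np

def amplitudes_to_image(state: np.ndarray, side: int = 28) -> np.ndarray:
    """Reshape the first side*side amplitude magnitudes into a (side, side) image.

    Assumes pixel i maps to basis state i (row-major); entries past side*side
    are treated as padding and ignored.
    """
    n_pixels = side * side
    if state.size < n_pixels:
        raise ValueError(f"need at least {n_pixels} amplitudes, got {state.size}")
    pixels = np.abs(state[:n_pixels]).reshape(side, side)
    peak = pixels.max()
    return pixels / peak if peak > 0 else pixels  # scale to [0, 1] for display
```

With matplotlib installed, `plt.imshow(amplitudes_to_image(state), cmap="gray")` renders the result.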

Reproducibility & versions

  • Pin a version in your code and paper (e.g., v1.0.0) for exact reproducibility.
  • Use deterministic seeds across Python/NumPy/your QC library to align shuffles, inits, and simulator draws.
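One way to apply the seeding advice is to route all shuffles through one seeded generator. In the sketch below, SEED and make_rng are illustrative names. Your quantum library's own seed, such as a simulator seed option, still has to be set separately.

```python
import random
import numpy as np

SEED = 1234  # pin this alongside the dataset version (e.g. v1.0.0)

def make_rng(seed: int = SEED) -> np.random.Generator:
    """Seed Python's global RNG and return a dedicated NumPy Generator."""
    random.seed(seed)
    return np.random.default_rng(seed)

# Deterministic shuffle of row indices, e.g. for a train/validation split.
order_a = make_rng().permutation(10)
order_b = make_rng().permutation(10)
print(bool((order_a == order_b).all()))  # True: same seed, same order
```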

Benchmarks (as reported in the MNISQ paper)

  • Quantum kernels: up to ~97% accuracy.
  • Classical sequence models (e.g., S4 on tokenized QASM): ~77%.

License & attribution

  • The original MNISQ dataset is released under CC BY-SA 4.0. Please comply with ShareAlike and attribution terms.
  • This Aqora entry is a repackaged mirror for convenience; credit the original authors and include the Aqora dataset URL + pinned version you used.

How to cite

Please include both citations: the original paper and the Aqora dataset entry you actually used (with version).

1) Original MNISQ paper

@misc{placidi2023mnisq,
  title         = {MNISQ: A Large-Scale Quantum Circuit Dataset for Machine Learning on/for Quantum Computers in the NISQ era},
  author        = {Placidi, Leonardo and Hataya, Ryuichiro and Mori, Toshio and Aoyama, Koki and Morisaki, Hayata and Mitarai, Kosuke and Fujii, Keisuke},
  year          = {2023},
  eprint        = {2306.16627},
  archivePrefix = {arXiv},
  primaryClass  = {quant-ph},
  doi           = {10.48550/arXiv.2306.16627},
  url           = {https://arxiv.org/abs/2306.16627}
}

2) Aqora dataset entry (pin your version)

@misc{aqora_mnisq_784,
  title        = {MNISQ-784: Digits (MNIST-784) Quantum Circuit Subset (Aqora mirror)},
  howpublished = {\url{https://aqora.io/datasets/leopla/mnisq-784}},
  note         = {Aqora Datasets Hub. Please cite the pinned version you used, e.g., v1.0.0},
  year         = {2025},
  publisher    = {Aqora}
}
If your venue supports @dataset, you can switch the entry type accordingly.

Provenance

MNISQ was introduced by Placidi et al. (2023). This page scopes the MNIST-784 domain with all official splits and both encodings for convenience and reproducible benchmarking.