Overview
This dataset contains comprehensive quantum chemistry calculations and Variational Quantum Eigensolver (VQE) results for the molecular nitrogen (N₂) dissociation curve using the Contextual Subspace (CS) approach on a superconducting quantum computer.
Dataset Description
The dataset (n2_cs_vqe_complete.parquet) contains 160 rows (geometries) and 46 columns (features) in a unified structure. Each row represents a molecular geometry along the N₂ dissociation curve from 0.8 to 2.0 Å.
The dataset combines:
- Classical quantum chemistry calculations for all 160 geometries (complete dissociation curve)
- Experimental quantum computing results from IBM superconducting quantum processors at 10 selected geometries
All data for a given geometry is contained in a single row, eliminating the need for separate datasets or complex joins.
Key Features
Data Organization
- 160 rows: One row per molecular geometry (bond length)
- has_experimental_data: Boolean flag indicating whether experimental quantum computing data is available for this geometry (10 geometries have True, 150 have False)
Molecular Geometry
bond_length_angstrom: N-N bond length in Ångströms (ranging from 0.8 to 2.0 Å, 160 evenly-spaced points)
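With 160 evenly spaced points between 0.8 and 2.0 Å, adjacent geometries are (2.0 - 0.8)/159 ≈ 0.00755 Å apart. The minimal sketch below rebuilds that grid with NumPy, assuming both endpoints are included (as `np.linspace` does); the dataset description implies this but does not state it outright.

```python
import numpy as np

# Assumption: the 160 bond lengths include both endpoints, as np.linspace does.
N_POINTS = 160
R_MIN, R_MAX = 0.8, 2.0  # Å, from the dataset description

grid = np.linspace(R_MIN, R_MAX, N_POINTS)
spacing = (R_MAX - R_MIN) / (N_POINTS - 1)  # 1.2 / 159 Å

print(f"spacing = {spacing:.5f} Å")
print(np.allclose(np.diff(grid), spacing))  # every step equals the spacing
```

To check the assumption against the real data, `grid` can be compared with `df['bond_length_angstrom'].to_numpy()` using `np.allclose`.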
Classical Quantum Chemistry Energy Calculations
Available for all 160 geometries:
energy_hf: Hartree-Fock energy
energy_mp2: MP2 (Møller-Plesset 2nd order) energy
energy_cisd: Configuration Interaction Singles Doubles energy
energy_ccsd: Coupled Cluster Singles Doubles energy
energy_ccsd_t: CCSD with perturbative triples energy
energy_fci: Full Configuration Interaction energy (exact within basis set)
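A common way to compare these methods is the fraction of the correlation energy they recover, taking Hartree-Fock as the uncorrelated reference and FCI as exact within the basis: (E_method - E_HF) / (E_FCI - E_HF). The sketch below applies that formula to made-up energies; `correlation_fraction` is a hypothetical helper, not part of the dataset.

```python
def correlation_fraction(e_method: float, e_hf: float, e_fci: float) -> float:
    """Fraction of the FCI correlation energy recovered by a method (hypothetical helper).

    All energies in Hartree: (E_method - E_HF) / (E_FCI - E_HF).
    """
    e_corr_fci = e_fci - e_hf
    if e_corr_fci == 0.0:
        raise ValueError("FCI and HF energies coincide; the fraction is undefined")
    return (e_method - e_hf) / e_corr_fci

# Made-up energies in Hartree, not values from the dataset.
frac = correlation_fraction(e_method=-109.17, e_hf=-108.90, e_fci=-109.20)
print(f"{frac:.3f}")  # 0.900
```

In polars the same quantity can be computed per geometry as an expression, e.g. `(pl.col('energy_ccsd') - pl.col('energy_hf')) / (pl.col('energy_fci') - pl.col('energy_hf'))`. Methods such as MP2 and CCSD(T) are not variational, so at stretched geometries this fraction may fall outside [0, 1].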
Complete Active Space (CAS) Methods
Available for all 160 geometries:
energy_casci_4_2, energy_casci_5_4, energy_casci_6_6, energy_casci_7_8: CASCI energies
energy_casscf_4_2, energy_casscf_5_4, energy_casscf_6_6, energy_casscf_7_8: CASSCF energies
- Format notation: the numeric suffix is (n_orbitals, n_electrons), e.g. casci_4_2 = 4 orbitals, 2 electrons
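Because the active-space size is encoded in the column name, a small parser can recover it, for example to sort or label curves programmatically. This sketch assumes the naming pattern shown above (`energy_` or `entropy_`, then `casci` or `casscf`, then orbitals and electrons); `parse_cas_column` is a hypothetical helper.

```python
import re

# Pattern for CAS column names such as "energy_casscf_7_8" or "entropy_casci_4_2".
CAS_COLUMN = re.compile(r"^(energy|entropy)_(casci|casscf)_(\d+)_(\d+)$")

def parse_cas_column(name: str) -> tuple[str, str, int, int]:
    """Split a CAS column name into (quantity, method, n_orbitals, n_electrons).

    Follows the notation above: first number = orbitals, second = electrons.
    """
    match = CAS_COLUMN.match(name)
    if match is None:
        raise ValueError(f"not a CAS column: {name!r}")
    quantity, method, n_orb, n_elec = match.groups()
    return quantity, method, int(n_orb), int(n_elec)

print(parse_cas_column("energy_casci_7_8"))    # ('energy', 'casci', 7, 8)
print(parse_cas_column("entropy_casscf_4_2"))  # ('entropy', 'casscf', 4, 2)
```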
Quantum Computing Results
energy_cs_dd_5q: Contextual Subspace with double-d approximation (5 qubits) - noiseless simulation (all 160 geometries)
energy_nc: Non-contextual energy - noiseless simulation (all 160 geometries)
energy_cs_vqe_mean: Experimental CS-VQE energy from real quantum hardware (mean) - available for 10 geometries only
energy_cs_vqe_std: Standard deviation of CS-VQE experimental results - available for 10 geometries only
The 10 experimental geometries are at bond lengths: 0.80, 0.94, 1.06, 1.20, 1.34, 1.46, 1.60, 1.74, 1.86, and 2.00 Å (approximately evenly spaced across the dissociation curve).
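The listed bond lengths are rounded to 0.01 Å. Assuming the 160-point grid includes both endpoints, each listed value lies within about 0.0034 Å of a grid point, so selecting experimental rows by exact equality on `bond_length_angstrom` would likely miss rows; filtering on `has_experimental_data` avoids the issue. The sketch below finds the nearest grid index for each listed geometry under that assumption.

```python
import numpy as np

# Assumption: 160 evenly spaced bond lengths including both endpoints.
grid = np.linspace(0.8, 2.0, 160)

# Experimental bond lengths as listed in the dataset description (rounded to 0.01 Å).
exp_bond_lengths = np.array([0.80, 0.94, 1.06, 1.20, 1.34, 1.46, 1.60, 1.74, 1.86, 2.00])

# Nearest grid index for each listed geometry, and how far the listed value is from it.
nearest = np.abs(grid[None, :] - exp_bond_lengths[:, None]).argmin(axis=1)
deviation = np.abs(grid[nearest] - exp_bond_lengths)

print(nearest.tolist())  # [0, 19, 34, 53, 72, 87, 106, 125, 140, 159]
print(f"max deviation: {deviation.max():.4f} Å")  # 0.0034 Å
```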
Entanglement Entropies
Von Neumann entanglement entropy (all 160 geometries):
entropy_mp2, entropy_cisd, entropy_ccsd, entropy_fci
entropy_casci_4_2, entropy_casci_5_4, entropy_casci_6_6, entropy_casci_7_8
entropy_casscf_4_2, entropy_casscf_5_4, entropy_casscf_6_6, entropy_casscf_7_8
entropy_cs_dd_5q: Entropy for CS-DD method
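For reference, the von Neumann entropy of a density matrix ρ is S(ρ) = -Tr(ρ log ρ) = -Σᵢ λᵢ log λᵢ over its eigenvalues λᵢ. The dataset description does not state which reduced density matrix the entropy columns are computed from, nor the logarithm base, so the sketch below only illustrates the definition; `von_neumann_entropy` is a hypothetical helper that takes the base as a parameter.

```python
import numpy as np

def von_neumann_entropy(rho: np.ndarray, base: float = np.e) -> float:
    """S(rho) = -Tr(rho log rho), evaluated from the eigenvalues of rho (hypothetical helper)."""
    eigenvalues = np.linalg.eigvalsh(rho)           # rho is Hermitian
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # convention: 0 log 0 = 0
    return float(-np.sum(eigenvalues * np.log(eigenvalues)) / np.log(base))

# A pure state has zero entropy; a maximally mixed qubit has 1 bit of entropy.
pure = np.array([[1.0, 0.0], [0.0, 0.0]])
mixed = np.eye(2) / 2
print(von_neumann_entropy(pure), von_neumann_entropy(mixed, base=2))
```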
Molecular Orbital Energies
Available for all 160 geometries:
mo_energy_0 through mo_energy_9: Molecular orbital energies for the first 10 orbitals
Diagnostics
Available for all 160 geometries:
t1_diagnostic: T1 diagnostic (indicates multi-reference character)
d1_diagnostic: D1 diagnostic (another multi-reference indicator)
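Both diagnostics are usually defined from the CCSD single-excitation amplitudes t1 (occupied × virtual): T1 is the Frobenius norm divided by the square root of the number of correlated electrons (Lee and Taylor), and D1 is the matrix 2-norm, i.e. the largest singular value (Janssen and Nielsen). Rules of thumb often quoted for closed-shell molecules flag multi-reference character above about 0.02 for T1 and 0.05 for D1. Conventions differ between programs (spin-orbital vs spatial amplitudes, frozen core), and the dataset description does not say which was used, so the sketch below is illustrative only, with made-up amplitudes.

```python
import numpy as np

def t1_diagnostic(t1: np.ndarray, n_correlated_electrons: int) -> float:
    """T1: Frobenius norm of the singles amplitudes divided by sqrt(N)."""
    return float(np.linalg.norm(t1) / np.sqrt(n_correlated_electrons))

def d1_diagnostic(t1: np.ndarray) -> float:
    """D1: largest singular value (matrix 2-norm) of the singles amplitudes."""
    return float(np.linalg.norm(t1, ord=2))

# Made-up singles amplitudes t1[i, a] (occupied x virtual).
t1 = np.array([[0.03, 0.00], [0.00, 0.04]])
print(round(t1_diagnostic(t1, n_correlated_electrons=2), 4))  # 0.0354
print(round(d1_diagnostic(t1), 4))                            # 0.04
```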
Scientific Context
The Contextual Subspace Variational Quantum Eigensolver (CS-VQE) is a hybrid quantum-classical algorithm for computing molecular electronic structure on near-term quantum computers. It reduces the number of qubits required by treating a noncontextual part of the problem classically and applying VQE only within a smaller contextual subspace. This dataset covers the complete dissociation curve of N₂, one of the most challenging molecules for quantum chemistry because of its strong multi-reference character at stretched geometries.
The dataset combines:
- Complete classical quantum chemistry benchmark data for the full dissociation curve (160 geometries from 0.8 to 2.0 Å)
- Experimental quantum computing results from IBM superconducting quantum processors at 10 carefully selected geometries
Key Scientific Contributions
- Hardware-Aware Quantum Algorithms: Implementation of CS-VQE on actual superconducting quantum hardware (IBM Guadalupe, Hanoi, Auckland, Washington)
- Dissociation Curve Calculation: Potential energy curve from a compressed bond (0.8 Å) through equilibrium to a strongly stretched bond (2.0 Å)
- Benchmark Data: Comprehensive comparison between quantum computing results and classical quantum chemistry methods
- Error Mitigation: Demonstrates the effectiveness of:
  - Zero-noise extrapolation (ZNE)
  - Measurement error mitigation (M3)
  - Contextual subspace qubit reduction, which limits exposure to hardware noise
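Zero-noise extrapolation runs the same circuit at several amplified noise levels (scale factors λ ≥ 1) and extrapolates the measured energy back to λ = 0. The dataset description does not specify the scale factors or extrapolation model used in the study, so this sketch assumes a polynomial fit with `np.polyfit`; with as many points as fit parameters it reduces to Richardson extrapolation. `zero_noise_extrapolate` is a hypothetical helper and the energies are made up.

```python
import numpy as np

def zero_noise_extrapolate(scale_factors, energies, degree=1):
    """Fit E(lambda) as a polynomial in the noise scale factor and return E(0).

    With degree = len(scale_factors) - 1 this is Richardson extrapolation.
    """
    coeffs = np.polyfit(scale_factors, energies, deg=degree)
    return float(np.polyval(coeffs, 0.0))

# Hypothetical energies (Hartree) measured at noise scale factors 1, 3 and 5.
scales = [1.0, 3.0, 5.0]
energies = [-1.00, -0.80, -0.60]
print(round(zero_noise_extrapolate(scales, energies), 6))  # -1.1
```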
Data Quality Notes
- Overall completeness: 95.9% of cells are non-null
- Null values (4.1% of cells) occur only in the experimental quantum computing columns energy_cs_vqe_mean and energy_cs_vqe_std, which are populated for 10 geometries only (93.8% null)
- All other columns (classical energies, noiseless CS-DD and non-contextual simulations, entropies, MO energies, diagnostics): 100% complete for all 160 geometries
- Unified structure: No need for joins or merges - all data for a geometry is in one row
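The completeness figures follow directly from the table dimensions: two columns are null in 150 of 160 rows, out of 160 × 46 cells in total. The arithmetic below reproduces the stated percentages from the numbers given in the dataset description.

```python
N_ROWS, N_COLS = 160, 46  # from the dataset description
N_EXPERIMENTAL = 10       # geometries with hardware data
NULLABLE_COLUMNS = 2      # energy_cs_vqe_mean, energy_cs_vqe_std

null_cells = NULLABLE_COLUMNS * (N_ROWS - N_EXPERIMENTAL)  # 300
null_fraction = null_cells / (N_ROWS * N_COLS)             # 300 / 7360
column_null_fraction = (N_ROWS - N_EXPERIMENTAL) / N_ROWS  # 150 / 160

print(f"null cells: {null_fraction:.1%}")                # 4.1%
print(f"completeness: {1 - null_fraction:.1%}")          # 95.9%
print(f"per CS-VQE column: {column_null_fraction:.1%}")  # 93.8%
```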
The dataset is stored in Apache Parquet format, which provides:
- Efficient columnar storage
- Built-in compression
- Schema preservation
- Fast read/write operations
- Cross-platform compatibility
Usage
Python Example
import polars as pl
from aqora_cli.pyarrow import dataset
import matplotlib.pyplot as plt
# Load the unified dataset from Aqora
df = pl.scan_pyarrow_dataset(dataset("aqora/n2-cs-vqe", "v1.0.0")).collect()
print(f"Dataset shape: {df.shape}")
print(f"Geometries with experimental data: {df['has_experimental_data'].sum()}")
# Example 1: Plot the complete dissociation curve
plt.figure(figsize=(12, 6))
# Plot classical reference methods and the noiseless CS-DD simulation (all 160 points)
plt.plot(df['bond_length_angstrom'].to_numpy(), df['energy_fci'].to_numpy(),
         label='FCI (Exact)', linewidth=2, color='black', zorder=3)
plt.plot(df['bond_length_angstrom'].to_numpy(), df['energy_ccsd_t'].to_numpy(),
         label='CCSD(T)', linestyle='--', alpha=0.7)
plt.plot(df['bond_length_angstrom'].to_numpy(), df['energy_cs_dd_5q'].to_numpy(),
         label='CS-DD (5q, noiseless)', linestyle='-.', color='blue')
# Add experimental quantum hardware results (10 points only)
df_exp = df.filter(pl.col('has_experimental_data'))
plt.errorbar(df_exp['bond_length_angstrom'].to_numpy(),
             df_exp['energy_cs_vqe_mean'].to_numpy(),
             yerr=df_exp['energy_cs_vqe_std'].to_numpy(),
             fmt='o', label='CS-VQE (Quantum Hardware)',
             capsize=5, markersize=8, color='red', zorder=4)
plt.xlabel('Bond Length (Å)', fontsize=12)
plt.ylabel('Energy (Hartree)', fontsize=12)
plt.title('N₂ Dissociation Curve: Classical vs Quantum Computing', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Example 2: Error analysis for experimental geometries
HARTREE_TO_EV = 27.2114  # 1 Hartree in eV
fig, ax = plt.subplots(figsize=(10, 6))
# Calculate errors relative to FCI, converted from Hartree to eV
df_exp = df_exp.with_columns([
    ((pl.col('energy_cs_dd_5q') - pl.col('energy_fci')) * HARTREE_TO_EV).alias('error_cs_dd'),
    ((pl.col('energy_cs_vqe_mean') - pl.col('energy_fci')) * HARTREE_TO_EV).alias('error_cs_vqe'),
])
ax.plot(df_exp['bond_length_angstrom'].to_numpy(),
        df_exp['error_cs_dd'].abs().to_numpy(),
        'o-', label='CS-DD (noiseless)', markersize=8)
ax.errorbar(df_exp['bond_length_angstrom'].to_numpy(),
            df_exp['error_cs_vqe'].abs().to_numpy(),
            yerr=df_exp['energy_cs_vqe_std'].to_numpy() * HARTREE_TO_EV,
            fmt='o-', label='CS-VQE (hardware)', capsize=4, markersize=8)
# Chemical accuracy threshold: 1.6 mHa, a rounded value of 1 kcal/mol
ax.axhline(y=0.0016 * HARTREE_TO_EV, color='gray', linestyle='--',
           label='Chemical accuracy (1 kcal/mol)', alpha=0.7)
ax.set_xlabel('Bond Length (Å)', fontsize=12)
ax.set_ylabel('Absolute Error vs FCI (eV)', fontsize=12)
ax.set_title('Quantum Computing Accuracy Across Dissociation Curve', fontsize=14)
ax.legend(fontsize=10)
ax.set_yscale('log')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Example 3: Analyze entanglement entropy
plt.figure(figsize=(10, 6))
plt.plot(df['bond_length_angstrom'].to_numpy(),
         df['entropy_fci'].to_numpy(),
         label='FCI', linewidth=2)
plt.plot(df['bond_length_angstrom'].to_numpy(),
         df['entropy_ccsd'].to_numpy(),
         label='CCSD', linestyle='--')
plt.plot(df['bond_length_angstrom'].to_numpy(),
         df['entropy_cs_dd_5q'].to_numpy(),
         label='CS-DD', linestyle='-.', color='blue')
# Highlight experimental geometries (df_exp holds the 10 experimental rows)
plt.scatter(df_exp['bond_length_angstrom'].to_numpy(),
            df_exp['entropy_fci'].to_numpy(),
            color='red', s=100, zorder=5, label='Experimental geometries')
plt.xlabel('Bond Length (Å)', fontsize=12)
plt.ylabel('von Neumann Entropy', fontsize=12)
plt.title('Entanglement Entropy Along Dissociation Curve', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Visualizations
Dissociation Curve

The plot shows the complete N₂ dissociation curve from 0.8 to 2.0 Å. Classical quantum chemistry methods (FCI, CCSD(T)) provide the benchmark, while experimental CS-VQE results from IBM quantum hardware are shown at 10 selected geometries with error bars.
Error Analysis

Comparison of absolute errors relative to FCI across the dissociation curve. The CS-DD method (noiseless simulation) and CS-VQE (experimental quantum hardware) are evaluated. The gray dashed line indicates chemical accuracy (1 kcal/mol ≈ 0.043 eV).
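The threshold line in the error-analysis code, 0.0016 Hartree, is a rounded form of 1 kcal/mol. Using 1 Hartree = 627.5095 kcal/mol = 27.2114 eV:

```latex
1\,\mathrm{kcal\,mol^{-1}} = \frac{1}{627.5095}\,E_h \approx 1.594\times10^{-3}\,E_h,
\qquad
1.594\times10^{-3}\,E_h \times 27.2114\,\frac{\mathrm{eV}}{E_h} \approx 0.0434\,\mathrm{eV}
```

The rounded threshold gives 0.0016 × 27.2114 ≈ 0.0435 eV, indistinguishable on the plot.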
Entanglement Entropy

Von Neumann entanglement entropy along the dissociation curve for different methods. The entropy increases at stretched geometries, indicating growing multi-reference character. Red markers highlight the 10 geometries where experimental quantum computing data was collected.
Citation
If you use this dataset in your research, please also cite the original publication:
@article{weaving2024contextual,
title={Contextual Subspace Variational Quantum Eigensolver Calculation of the Dissociation Curve of Molecular Nitrogen on a Superconducting Quantum Computer},
author={Weaving, Tim and others},
journal={npj Quantum Information},
year={2024},
publisher={Nature Publishing Group},
doi={10.1038/s41534-024-00952-4}
}
Acknowledgments
This dataset was generated as part of research on quantum computing applications to quantum chemistry, demonstrating the capabilities and challenges of near-term quantum devices for molecular simulations.