N2-CS-VQE Dataset

Overview

This dataset contains comprehensive quantum chemistry calculations and Variational Quantum Eigensolver (VQE) results for the molecular nitrogen (N₂) dissociation curve using the Contextual Subspace (CS) approach on a superconducting quantum computer.

Source

Dataset Description

The dataset (n2_cs_vqe_complete.parquet) contains 160 rows (geometries) and 46 columns (features) in a unified structure. Each row represents a molecular geometry along the N₂ dissociation curve from 0.8 to 2.0 Å.
The dataset combines:
  • Classical quantum chemistry calculations for all 160 geometries (complete dissociation curve)
  • Experimental quantum computing results from IBM superconducting quantum processors at 10 selected geometries
All data for a given geometry is contained in a single row, eliminating the need for separate datasets or complex joins.

Key Features

Data Organization

  • 160 rows: One row per molecular geometry (bond length)
  • has_experimental_data: Boolean flag indicating whether experimental quantum computing data is available for this geometry (10 geometries have True, 150 have False)

Molecular Geometry

  • bond_length_angstrom: N-N bond length in Ångströms (ranging from 0.8 to 2.0 Å, 160 evenly-spaced points)

Classical Quantum Chemistry Energy Calculations

Available for all 160 geometries:
  • energy_hf: Hartree-Fock energy
  • energy_mp2: MP2 (Møller-Plesset 2nd order) energy
  • energy_cisd: Configuration Interaction Singles Doubles energy
  • energy_ccsd: Coupled Cluster Singles Doubles energy
  • energy_ccsd_t: CCSD with perturbative triples energy
  • energy_fci: Full Configuration Interaction energy (exact within basis set)
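Since energy_fci is exact within the basis set, a common way to compare the approximate methods is the fraction of the FCI correlation energy each one recovers relative to energy_hf. The following is a minimal sketch of that arithmetic; the numeric values are illustrative placeholders and are not taken from the dataset, and the helper names are hypothetical.

```python
def correlation_energy(e_method, e_hf):
    """Correlation energy of a method: its energy minus the Hartree-Fock energy."""
    return e_method - e_hf


def fraction_recovered(e_method, e_hf, e_fci):
    """Fraction of the FCI correlation energy recovered by a method."""
    e_corr_fci = correlation_energy(e_fci, e_hf)
    if e_corr_fci == 0:
        raise ValueError("FCI and HF energies coincide; fraction is undefined")
    return correlation_energy(e_method, e_hf) / e_corr_fci


# Placeholder energies in Hartree (NOT real dataset values).
row = {"energy_hf": -1.00, "energy_ccsd": -1.09, "energy_fci": -1.10}
frac = fraction_recovered(row["energy_ccsd"], row["energy_hf"], row["energy_fci"])
print(f"CCSD recovers {100 * frac:.1f}% of the FCI correlation energy")
```

Non-variational methods such as MP2 and CCSD(T) can yield a fraction above 1, so values outside [0, 1] are not necessarily an error.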

Complete Active Space (CAS) Methods

Available for all 160 geometries:
  • energy_casci_4_2, energy_casci_5_4, energy_casci_6_6, energy_casci_7_8: CASCI energies
  • energy_casscf_4_2, energy_casscf_5_4, energy_casscf_6_6, energy_casscf_7_8: CASSCF energies
  • Format notation: (n_orbitals, n_electrons) e.g., (4,2) = 4 orbitals, 2 electrons
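The (n_orbitals, n_electrons) suffix can be recovered from a column name programmatically. A small sketch, assuming the column names follow exactly the pattern listed above; the function name parse_cas_column is hypothetical.

```python
import re

# Matches names such as "energy_casci_4_2" or "entropy_casscf_7_8".
CAS_PATTERN = re.compile(r"^(energy|entropy)_(casci|casscf)_(\d+)_(\d+)$")


def parse_cas_column(name):
    """Split a CAS column name into quantity, method and active-space size."""
    match = CAS_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a CAS column: {name!r}")
    quantity, method, n_orbitals, n_electrons = match.groups()
    return {
        "quantity": quantity,
        "method": method.upper(),
        "n_orbitals": int(n_orbitals),    # first number in the suffix
        "n_electrons": int(n_electrons),  # second number in the suffix
    }


info = parse_cas_column("energy_casscf_6_6")
print(info)
```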

Quantum Computing Results

  • energy_cs_dd_5q: Contextual Subspace with double-d approximation (5 qubits) - noiseless simulation (all 160 geometries)
  • energy_nc: Non-contextual energy - noiseless simulation (all 160 geometries)
  • energy_cs_vqe_mean: Experimental CS-VQE energy from real quantum hardware (mean) - available for 10 geometries only
  • energy_cs_vqe_std: Standard deviation of CS-VQE experimental results - available for 10 geometries only
The 10 experimental geometries are at bond lengths: 0.80, 0.94, 1.06, 1.20, 1.34, 1.46, 1.60, 1.74, 1.86, and 2.00 Å (approximately evenly spaced across the dissociation curve).
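The experimental bond lengths are quoted to two decimals, so matching them against bond_length_angstrom by exact floating-point equality may fail. The sketch below finds the nearest grid point instead. It assumes the 160 points are uniformly spaced and include both endpoints, which is not confirmed here; in practice the has_experimental_data flag is the safer selector.

```python
# Assumption: 160 uniformly spaced points from 0.8 to 2.0 Å, endpoints included.
N_POINTS = 160
R_MIN, R_MAX = 0.8, 2.0
step = (R_MAX - R_MIN) / (N_POINTS - 1)
grid = [R_MIN + i * step for i in range(N_POINTS)]

# Experimental bond lengths as listed above (two-decimal values).
experimental = [0.80, 0.94, 1.06, 1.20, 1.34, 1.46, 1.60, 1.74, 1.86, 2.00]


def nearest_index(values, target):
    """Return the index of the entry in `values` closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


matches = [(r, nearest_index(grid, r)) for r in experimental]
for r, idx in matches:
    print(f"{r:.2f} Å -> grid index {idx}, grid value {grid[idx]:.4f} Å")
```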

Entanglement Entropies

Von Neumann entanglement entropy (all 160 geometries):
  • entropy_mp2, entropy_cisd, entropy_ccsd, entropy_fci
  • entropy_casci_4_2, entropy_casci_5_4, entropy_casci_6_6, entropy_casci_7_8
  • entropy_casscf_4_2, entropy_casscf_5_4, entropy_casscf_6_6, entropy_casscf_7_8
  • entropy_cs_dd_5q: Entropy for CS-DD method
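For reference, the von Neumann entropy of a density matrix with eigenvalues p_i is S = -Σ p_i ln p_i. The sketch below evaluates that formula from a list of eigenvalues. Which reduced density matrix and which logarithm base were used for the entropy columns is not stated here, so the natural logarithm is an assumption.

```python
import math


def von_neumann_entropy(eigenvalues, tol=1e-12):
    """Compute S = -sum(p * ln p) over the eigenvalues of a density matrix.

    Eigenvalues at or below `tol` are skipped, since p * ln p -> 0 as p -> 0.
    """
    if abs(sum(eigenvalues) - 1.0) > 1e-8:
        raise ValueError("eigenvalues of a density matrix must sum to 1")
    return -sum(p * math.log(p) for p in eigenvalues if p > tol)


pure_state = von_neumann_entropy([1.0, 0.0])       # no entanglement
maximally_mixed = von_neumann_entropy([0.5, 0.5])  # one "bit" of entanglement, in nats
print(pure_state, maximally_mixed)
```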

Molecular Orbital Information

Available for all 160 geometries:
  • mo_energy_0 through mo_energy_9: Molecular orbital energies for the first 10 orbitals

Diagnostics

Available for all 160 geometries:
  • t1_diagnostic: T1 diagnostic (indicates multi-reference character)
  • d1_diagnostic: D1 diagnostic (another multi-reference indicator)
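The two diagnostics can be turned into a simple flag per geometry. The sketch below uses rule-of-thumb thresholds often quoted in the coupled-cluster literature (T1 > 0.02, D1 > 0.05); these thresholds are an assumption and are not part of the dataset, and the input values are illustrative placeholders.

```python
# Rule-of-thumb thresholds (assumption, not defined by the dataset).
T1_THRESHOLD = 0.02
D1_THRESHOLD = 0.05


def flag_multireference(t1, d1, t1_max=T1_THRESHOLD, d1_max=D1_THRESHOLD):
    """Return True if either diagnostic exceeds its threshold."""
    return t1 > t1_max or d1 > d1_max


# Placeholder diagnostic values (NOT real dataset values).
near_equilibrium = flag_multireference(t1=0.015, d1=0.040)
stretched = flag_multireference(t1=0.030, d1=0.090)
print(near_equilibrium, stretched)
```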

Scientific Context

The Contextual Subspace Variational Quantum Eigensolver (CS-VQE) is a hybrid quantum-classical algorithm for computing the electronic structure of molecules on near-term quantum computers. This dataset covers the complete dissociation curve of N₂, a demanding benchmark for quantum chemistry because of its strong multi-reference character at stretched geometries.
The dataset combines:
  1. Complete classical quantum chemistry benchmark data for the full dissociation curve (160 geometries from 0.8 to 2.0 Å)
  2. Experimental quantum computing results from IBM superconducting quantum processors at 10 carefully selected geometries

Key Scientific Contributions

  1. Hardware-Aware Quantum Algorithms: Implementation of CS-VQE on actual superconducting quantum hardware (IBM Guadalupe, Hanoi, Auckland, Washington)
  2. Dissociation Curve Calculation: Complete potential energy curve from compressed (0.8 Å) through equilibrium to stretched (2.0 Å) geometries
  3. Benchmark Data: Comprehensive comparison between quantum computing results and classical quantum chemistry methods
  4. Error Mitigation: Demonstrates the effectiveness of:
    • Zero-noise extrapolation (ZNE)
    • Measurement error mitigation (M3)
    • Contextual subspace methods for reducing quantum errors
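As an illustration of the idea behind zero-noise extrapolation, energies measured at several amplified noise levels are fitted and the fit is evaluated at zero noise. The sketch below uses a plain linear least-squares fit; this is a generic textbook variant with synthetic inputs, not necessarily the extrapolation procedure used in the original experiment.

```python
def zero_noise_extrapolate(scale_factors, energies):
    """Fit energy = a + b * scale by least squares and return a (the zero-noise limit)."""
    n = len(scale_factors)
    if n != len(energies) or n < 2:
        raise ValueError("need at least two (scale, energy) pairs of equal length")
    mean_x = sum(scale_factors) / n
    mean_y = sum(energies) / n
    sxx = sum((x - mean_x) ** 2 for x in scale_factors)
    if sxx == 0:
        raise ValueError("scale factors must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scale_factors, energies))
    slope = sxy / sxx
    return mean_y - slope * mean_x


# Synthetic example: energies drift upward linearly with the noise scale factor.
scales = [1.0, 2.0, 3.0]
noisy = [-1.0 + 0.05 * s for s in scales]
estimate = zero_noise_extrapolate(scales, noisy)
print(f"Extrapolated zero-noise energy: {estimate:.6f}")
```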

Data Quality Notes

  • Overall completeness: 95.9% of cells are non-null
  • Null values (4.1% of cells) occur only in experimental quantum computing columns:
    • energy_cs_vqe_mean and energy_cs_vqe_std: Available for 10 geometries only (93.8% null)
  • All classical quantum chemistry data (energies, entropies, MO energies, diagnostics): 100% complete for all 160 geometries
  • Unified structure: No need for joins or merges - all data for a geometry is in one row
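The completeness figures follow directly from the dataset dimensions stated above (160 rows, 46 columns, two columns populated for 10 rows only). A short check of that arithmetic:

```python
# Dimensions as stated in the dataset description.
n_rows, n_cols = 160, 46
n_experimental_rows = 10
sparse_columns = ["energy_cs_vqe_mean", "energy_cs_vqe_std"]

total_cells = n_rows * n_cols
null_cells = (n_rows - n_experimental_rows) * len(sparse_columns)

null_fraction = null_cells / total_cells
completeness = 1.0 - null_fraction
column_null_fraction = (n_rows - n_experimental_rows) / n_rows

print(f"Total cells:           {total_cells}")
print(f"Null cells:            {null_cells}")
print(f"Overall completeness:  {100 * completeness:.1f}%")
print(f"Nulls per sparse col.: {100 * column_null_fraction:.2f}%")
```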

File Format

The dataset is stored in Apache Parquet format, which provides:
  • Efficient columnar storage
  • Built-in compression
  • Schema preservation
  • Fast read/write operations
  • Cross-platform compatibility

Usage

Python Example

import polars as pl
from aqora_cli.pyarrow import dataset
import matplotlib.pyplot as plt
# Load the unified dataset from Aqora
df = pl.scan_pyarrow_dataset(dataset("aqora/n2-cs-vqe", "v1.0.0")).collect()
print(f"Dataset shape: {df.shape}")
print(f"Geometries with experimental data: {df['has_experimental_data'].sum()}")
# Example 1: Plot the complete dissociation curve
plt.figure(figsize=(12, 6))
# Plot classical methods (all 160 points)
plt.plot(df['bond_length_angstrom'].to_numpy(), df['energy_fci'].to_numpy(),
         label='FCI (Exact)', linewidth=2, color='black', zorder=3)
plt.plot(df['bond_length_angstrom'].to_numpy(), df['energy_ccsd_t'].to_numpy(),
         label='CCSD(T)', linestyle='--', alpha=0.7)
plt.plot(df['bond_length_angstrom'].to_numpy(), df['energy_cs_dd_5q'].to_numpy(),
         label='CS-DD (5q, noiseless)', linestyle='-.', color='blue')
# Add experimental quantum results (10 points only)
df_exp = df.filter(pl.col('has_experimental_data'))
plt.errorbar(df_exp['bond_length_angstrom'].to_numpy(),
             df_exp['energy_cs_vqe_mean'].to_numpy(),
             yerr=df_exp['energy_cs_vqe_std'].to_numpy(),
             fmt='o', label='CS-VQE (Quantum Hardware)',
             capsize=5, markersize=8, color='red', zorder=4)
plt.xlabel('Bond Length (Å)', fontsize=12)
plt.ylabel('Energy (Hartree)', fontsize=12)
plt.title('N₂ Dissociation Curve: Classical vs Quantum Computing', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Example 2: Error analysis for experimental geometries
fig, ax = plt.subplots(figsize=(10, 6))
# Calculate errors relative to FCI (convert to eV)
df_exp = df_exp.with_columns([
    ((pl.col('energy_cs_dd_5q') - pl.col('energy_fci')) * 27.2114).alias('error_cs_dd'),
    ((pl.col('energy_cs_vqe_mean') - pl.col('energy_fci')) * 27.2114).alias('error_cs_vqe')
])
ax.plot(df_exp['bond_length_angstrom'].to_numpy(), 
        df_exp['error_cs_dd'].abs().to_numpy(), 
        'o-', label='CS-DD (noiseless)', markersize=8)
ax.errorbar(df_exp['bond_length_angstrom'].to_numpy(), 
            df_exp['error_cs_vqe'].abs().to_numpy(),
            yerr=df_exp['energy_cs_vqe_std'].to_numpy() * 27.2114,
            fmt='o-', label='CS-VQE (hardware)', capsize=4, markersize=8)
ax.axhline(y=0.0016 * 27.2114, color='gray', linestyle='--', 
           label='Chemical accuracy (1 kcal/mol)', alpha=0.7)
ax.set_xlabel('Bond Length (Å)', fontsize=12)
ax.set_ylabel('Absolute Error vs FCI (eV)', fontsize=12)
ax.set_title('Quantum Computing Accuracy Across Dissociation Curve', fontsize=14)
ax.legend(fontsize=10)
ax.set_yscale('log')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Example 3: Analyze entanglement entropy
plt.figure(figsize=(10, 6))
plt.plot(df['bond_length_angstrom'].to_numpy(), 
         df['entropy_fci'].to_numpy(), 
         label='FCI', linewidth=2)
plt.plot(df['bond_length_angstrom'].to_numpy(), 
         df['entropy_ccsd'].to_numpy(), 
         label='CCSD', linestyle='--')
plt.plot(df['bond_length_angstrom'].to_numpy(), 
         df['entropy_cs_dd_5q'].to_numpy(), 
         label='CS-DD', linestyle='-.', color='blue')
# Highlight experimental geometries
exp_entropy = df_exp['entropy_fci'].to_numpy()
plt.scatter(df_exp['bond_length_angstrom'].to_numpy(),
            exp_entropy,
            color='red', s=100, zorder=5, label='Experimental geometries')
plt.xlabel('Bond Length (Å)', fontsize=12)
plt.ylabel('von Neumann Entropy', fontsize=12)
plt.title('Entanglement Entropy Along Dissociation Curve', fontsize=14)
plt.legend(fontsize=10)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Visualizations

Dissociation Curve

test_plot1.png
The plot shows the complete N₂ dissociation curve from 0.8 to 2.0 Å. Classical quantum chemistry methods (FCI, CCSD(T)) provide the benchmark, while experimental CS-VQE results from IBM quantum hardware are shown at 10 selected geometries with error bars.

Error Analysis

test_plot2.png
Comparison of absolute errors relative to FCI across the dissociation curve. The CS-DD method (noiseless simulation) and CS-VQE (experimental quantum hardware) are evaluated. The gray dashed line indicates chemical accuracy (1 kcal/mol ≈ 0.043 eV).
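The chemical-accuracy threshold can be reproduced with a short unit conversion. This sketch uses the same Hartree-to-eV factor as the usage example and the standard factor of 627.509 kcal/mol per Hartree; the helper names are hypothetical. Note that the usage example rounds the threshold to 0.0016 Hartree.

```python
HARTREE_TO_EV = 27.2114            # same factor as in the usage example
HARTREE_TO_KCAL_PER_MOL = 627.509  # standard conversion factor


def hartree_to_ev(energy_hartree):
    """Convert an energy from Hartree to electronvolts."""
    return energy_hartree * HARTREE_TO_EV


# Chemical accuracy: 1 kcal/mol expressed in Hartree and in eV.
chem_acc_hartree = 1.0 / HARTREE_TO_KCAL_PER_MOL
chem_acc_ev = hartree_to_ev(chem_acc_hartree)


def within_chemical_accuracy(energy, reference):
    """True if |energy - reference| (both in Hartree) is at most 1 kcal/mol."""
    return abs(energy - reference) <= chem_acc_hartree


print(f"1 kcal/mol = {chem_acc_hartree:.6f} Hartree = {chem_acc_ev:.4f} eV")
```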

Entanglement Entropy

test_plot3.png
Von Neumann entanglement entropy along the dissociation curve for different methods. The entropy increases at stretched geometries, indicating growing multi-reference character. Red markers highlight the 10 geometries where experimental quantum computing data was collected.

Citation

If you use this dataset in your research, please cite the original publication:
@article{weaving2024contextual,
  title={Contextual Subspace Variational Quantum Eigensolver Calculation of the Dissociation Curve of Molecular Nitrogen on a Superconducting Quantum Computer},
  author={Weaving, Tim and others},
  journal={npj Quantum Information},
  year={2024},
  publisher={Nature Publishing Group},
  doi={10.1038/s41534-024-00952-4}
}

Acknowledgments

This dataset was generated as part of research on quantum computing applications to quantum chemistry, demonstrating the capabilities and challenges of near-term quantum devices for molecular simulations.