📋 Overview
This dataset contains comprehensive quantum error correction (QEC) simulation data for magic state cultivation using lattice surgery protocols. The data is derived from the research paper "Efficient Magic State Cultivation with Lattice Surgery" and provides detailed performance metrics, error rates, and operational costs for various quantum error correction schemes.
Magic state cultivation is a critical component of fault-tolerant quantum computing, enabling universal quantum computation through the preparation of high-fidelity magic states. This dataset captures the performance characteristics of novel lattice surgery techniques applied to the Steane code merged with rotated surface codes.
🎯 Key Features
- Single Unified File: One Parquet file with complete context in every row
- 25 Simulation Runs: Comprehensive results across 5 different experiment types
- Multiple Error Correction Codes: Steane [[7,1,3]] code and rotated surface codes (distance 3, 5, 7)
- Detailed Performance Metrics: Logical error rates, acceptance rates, qubit-round costs
- Protocol Stage Analysis: Success rates for each stage (injection, stabilization, surgery, epilogue)
- Complementary Gap Measurements: Novel decoder performance metrics for post-selection
- ML-Ready Format: Denormalized structure ideal for analysis, visualization, and machine learning
📊 Dataset Structure
The dataset is a single denormalized Parquet file where each row represents one complete simulation run with all parameters, results, and code specifications.
Column Groups
🔑 Identification (4 columns)
Unique identifiers for each simulation run:
| Column | Type | Description |
|---|---|---|
| experiment_id | string | Unique identifier for the experiment configuration |
| run_id | string | Unique identifier for this specific run |
| experiment_type | string | Type of simulation (see experiment types below) |
| run_number | int | Run number within the experiment (0-4) |
⚙️ Simulation Parameters (13 columns)
Configuration settings for the simulation:
| Column | Type | Description |
|---|---|---|
| error_probability | float | Physical error rate per gate operation (0.0005-0.001) |
| num_shots | int | Number of Monte Carlo samples (50K-1M) |
| surface_distance | int | Code distance for surface code patches (3, 5, or 7) |
| initial_value | string | Initial logical state (Plus, Zero, or null) |
| syndrome_extraction_pattern | string | Pattern for syndrome measurements (XZZ, ZXZ, etc.) |
| perfect_initialization | bool | Whether perfect state initialization is used |
| with_heuristic_post_selection | bool | Enable heuristic post-selection |
| with_heuristic_gap_calculation | bool | Enable heuristic gap calculation |
| full_post_selection | bool | Apply full post-selection protocol |
| num_stabilization_rounds_after_surgery | int | Stabilizer rounds after surgery |
| num_epilogue_syndrome_extraction_rounds | int | Final syndrome extraction rounds |
| gap_threshold | float | Threshold for complementary gap acceptance |
| experiment_description | string | Human-readable description |
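The `gap_threshold` parameter drives post-selection on the complementary gap. The following is a minimal sketch of that trade-off, assuming a shot is accepted when its gap is at least the threshold (the exact comparison is not stated here, so treat it as an assumption). The per-shot `(gap, is_wrong)` records and the `post_select` helper are hypothetical; the dataset stores per-run aggregates only.

```python
from typing import List, Tuple

# Hypothetical per-shot records: (complementary gap, decoded result was wrong).
# These values are invented for illustration and do not come from the dataset.
SHOTS: List[Tuple[float, bool]] = [
    (0.5, True), (1.0, False), (2.0, True), (3.5, False),
    (4.0, False), (6.0, False), (7.5, False), (9.0, False),
]


def post_select(shots, gap_threshold):
    """Return (acceptance_rate, logical_error_rate) for one threshold.

    Assumption: a shot is accepted when gap >= gap_threshold.
    """
    accepted = [wrong for gap, wrong in shots if gap >= gap_threshold]
    acceptance_rate = len(accepted) / len(shots)
    # Guard against dividing by zero when every shot is discarded.
    logical_error_rate = sum(accepted) / len(accepted) if accepted else float("nan")
    return acceptance_rate, logical_error_rate


for threshold in (0.0, 1.0, 3.0):
    acc, err = post_select(SHOTS, threshold)
    print(f"threshold={threshold:.1f}  acceptance={acc:.3f}  error_rate={err:.3f}")
```

Raising the threshold discards more shots and, in this toy data, lowers the error rate among the accepted ones, which is the trade-off the `acceptance_rate` and `logical_error_rate` columns record per run.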
📈 Simulation Results (9 columns)
Outcome metrics from the simulation:
| Column | Type | Description |
|---|---|---|
| num_valid_samples | int | Number of samples that passed checks |
| num_wrong_samples | int | Number of samples with detected errors |
| num_discarded_samples | int | Number of samples rejected by post-selection |
| logical_error_rate | float | Probability of logical error (wrong/total_accepted) |
| acceptance_rate | float | Fraction of samples accepted after post-selection |
| qubitrounds_cost | float | Resource cost in qubit-rounds |
| gap_value | float | Complementary gap measurement value |
| complementary_gap_detector_id | int | Detector ID for gap measurement |
| simulation_timestamp | datetime | Timestamp of simulation run |
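The rate columns can be cross-checked against the count columns. The sketch below assumes that accepted shots are `num_valid_samples + num_wrong_samples` and that all shots are accepted plus discarded; this relationship is an assumption, not something the column descriptions confirm. It also attaches a Wilson score interval to the error rate, since low logical error rates rest on few observed failures. The counts are made up.

```python
import math


def rates_from_counts(num_valid, num_wrong, num_discarded):
    """Derive (acceptance_rate, logical_error_rate) from raw counts.

    Assumption: accepted = num_valid + num_wrong, total = accepted + discarded.
    """
    accepted = num_valid + num_wrong
    total = accepted + num_discarded
    return accepted / total, num_wrong / accepted


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k failures in n trials (z=1.96 is about 95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Made-up counts for a single run; not taken from the dataset.
acc, err = rates_from_counts(num_valid=79_960, num_wrong=40, num_discarded=20_000)
low, high = wilson_interval(40, 80_000)
print(f"acceptance={acc:.3f}  logical_error_rate={err:.2e}  95% CI=({low:.2e}, {high:.2e})")
```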
🔬 Quantum Code Specifications (8 columns)
Details of the quantum error correction code used:
| Column | Type | Description |
|---|---|---|
| code_type | string | Type of quantum code (Steane, Surface_Rotated_d3/5/7, Merged) |
| num_data_qubits | int | Number of data qubits (7-49) |
| num_syndrome_qubits | int | Number of syndrome measurement qubits |
| code_distance | int | Code distance (3, 5, or 7) |
| num_logical_qubits | int | Number of encoded logical qubits (1 or 2) |
| num_x_stabilizers | int | Number of X-type stabilizer generators |
| num_z_stabilizers | int | Number of Z-type stabilizer generators |
| code_description | string | Human-readable code description |
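The stabilizer-count columns can be sanity-checked against textbook code parameters: an [[n,k,d]] stabilizer code has n - k independent generators, a rotated surface code of odd distance d has d² data qubits with (d² - 1)/2 generators of each type, and the Steane [[7,1,3]] code has three X-type and three Z-type generators. The helper names below are illustrative, and the merged code is deliberately left out because its counts are not given here.

```python
def rotated_surface_code_params(distance):
    """Textbook parameters of one rotated surface code patch (odd distance)."""
    if distance < 3 or distance % 2 == 0:
        raise ValueError("expected an odd distance >= 3")
    n = distance * distance
    return {
        "num_data_qubits": n,
        "num_logical_qubits": 1,
        "code_distance": distance,
        "num_x_stabilizers": (n - 1) // 2,
        "num_z_stabilizers": (n - 1) // 2,
    }


# Steane [[7,1,3]] code: three X-type and three Z-type generators.
STEANE = {
    "num_data_qubits": 7,
    "num_logical_qubits": 1,
    "code_distance": 3,
    "num_x_stabilizers": 3,
    "num_z_stabilizers": 3,
}


def generators_consistent(params):
    """Check that the generator count equals n - k."""
    total = params["num_x_stabilizers"] + params["num_z_stabilizers"]
    return total == params["num_data_qubits"] - params["num_logical_qubits"]


for d in (3, 5, 7):
    params = rotated_surface_code_params(d)
    print(d, params["num_data_qubits"], params["num_x_stabilizers"], generators_consistent(params))
print("Steane", generators_consistent(STEANE))
```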
Protocol Stage Metrics (8 columns)
Aggregated success metrics for each protocol stage:
| Column | Type | Description |
|---|---|---|
| injection_success_rate | float | Success rate for magic state injection (0.85-0.92) |
| stabilize1_success_rate | float | Success rate for first stabilization (0.92-0.97) |
| surgery_success_rate | float | Success rate for lattice surgery (0.88-0.95) |
| stabilize2_success_rate | float | Success rate for second stabilization (0.93-0.98) |
| epilogue_success_rate | float | Success rate for epilogue stage (0.94-0.99) |
| injection_qubits_used | int | Qubit count for injection stage |
| surgery_qubits_used | int | Qubit count for surgery stage |
| total_protocol_cost | float | Total resource cost across all stages |
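One way to read the stage columns together is as a chain: if each rate is conditional on the previous stages having succeeded (an assumption, not something stated for this dataset), the end-to-end success probability is their product, and the expected number of attempts per accepted state is its reciprocal. The rates below are illustrative values picked inside the ranges quoted in the table, not values read from the dataset.

```python
import math

# Illustrative stage success rates chosen inside the quoted ranges.
stage_rates = {
    "injection_success_rate": 0.90,
    "stabilize1_success_rate": 0.95,
    "surgery_success_rate": 0.90,
    "stabilize2_success_rate": 0.95,
    "epilogue_success_rate": 0.95,
}


def overall_success(rates):
    """Product of per-stage rates.

    Assumption: each rate is conditional on all earlier stages succeeding.
    """
    return math.prod(rates)


p_success = overall_success(stage_rates.values())
expected_attempts = 1 / p_success  # mean of a geometric distribution

print(f"end-to-end success: {p_success:.4f}")
print(f"expected attempts per accepted state: {expected_attempts:.3f}")
```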
🔧 Stim Circuit Definitions (2 columns)
Complete quantum circuit specifications in Stim format for full reproducibility:
| Column | Type | Description |
|---|---|---|
| stim_circuit_definition | string | Complete quantum circuit in Stim format - includes qubit coordinates, gate operations (RX, CX, M, MX, MPP), noise models (DEPOLARIZE1/2, X_ERROR, Z_ERROR), timing (TICK), error detection (DETECTOR), and observables. Ranges from 7,535 to 183,929 characters depending on circuit complexity. |
| stim_circuit_length | int | Character count of circuit definition (for reference) |
Stim Circuit Format:
- QUBIT_COORDS: Physical qubit layout on 2D grid
- Gates: RX (X-basis reset), R (Z-basis reset), CX (CNOT), MX/M (measurements), S/S_DAG (phase gates), MPP (Pauli product measurements)
- Noise: DEPOLARIZE1/2, X_ERROR, Z_ERROR with error probability parameters
- Timing: TICK markers separate time steps
- Detectors: DETECTOR annotations declare sets of measurement results whose combined parity is deterministic in the absence of noise
- Observables: OBSERVABLE_INCLUDE adds measurement results to a logical observable
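To get a feel for what a `stim_circuit_definition` string contains, the instruction names can be tallied with plain string handling. The sketch below runs on a tiny hand-written circuit, not one from the dataset, and the `count_instructions` helper is hypothetical. It assumes one instruction per line and counts the body of any `REPEAT` block once rather than expanding it.

```python
from collections import Counter

# Toy two-qubit circuit written for illustration; not taken from the dataset.
TOY_CIRCUIT = """
QUBIT_COORDS(0, 0) 0
QUBIT_COORDS(1, 0) 1
R 0 1
X_ERROR(0.001) 0 1
TICK
CX 0 1
DEPOLARIZE2(0.001) 0 1
TICK
M 0 1
DETECTOR rec[-1] rec[-2]
OBSERVABLE_INCLUDE(0) rec[-1]
"""


def count_instructions(circuit_text):
    """Count instruction names, ignoring comments, blank lines and closing braces."""
    counts = Counter()
    for raw_line in circuit_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line == "}":
            continue
        # The name is the first token, without any parenthesised arguments.
        name = line.split()[0].split("(", 1)[0]
        counts[name] += 1
    return counts


counts = count_instructions(TOY_CIRCUIT)
for name, n in counts.most_common():
    print(f"{name:20s} {n}")
```

With Stim installed, passing the same string to `stim.Circuit(...)` parses it properly and gives access to properties such as `num_qubits` and `num_detectors`; the string-based tally is only a dependency-free first look.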
🧪 Experiment Types
The dataset includes 5 distinct experiment configurations:
1. Lattice Surgery with Complementary Gap (lattice_surgery_complementary_gap)
Standard lattice surgery protocol with complementary gap threshold for post-selection.
- Code: Merged Steane + Surface (16 data qubits)
- Distance: 3
- Shots: 1,000,000
- Focus: High-statistics study of gap-based post-selection
2. Lattice Surgery Error Detection (lattice_surgery_error_detection)
Error detection scenario with perfect initialization and full post-selection.
- Code: Surface code distance 5 (25 data qubits)
- Distance: 5
- Shots: 100,000
- Focus: Detection capabilities with larger code
3. Surface Code Complementary Gap (surface_complementary_gap)
Surface code patch with complementary gap measurement.
- Code: Surface code distance 3 (9 data qubits)
- Distance: 3
- Shots: 500,000
- Focus: Surface code performance with gap metric
4. Inject and Cultivate (inject_cultivate)
Magic state injection and cultivation protocol using Steane code.
- Code: Steane [[7,1,3]] (7 data qubits)
- Distance: 3
- Shots: 50,000
- Focus: Magic state injection and cultivation fundamentals
5. Surface Code Expansion (surface_code_expansion)
Surface code distance expansion from d=3 to d=7.
- Code: Surface code distance 7 (49 data qubits)
- Distance: 7
- Shots: 100,000
- Focus: Scaling behavior with increased distance
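The five configurations above can be turned into a small lookup for validating rows. This sketch assumes the listed distance, shot count, and data-qubit count appear in the `code_distance`, `num_shots`, and `num_data_qubits` columns for every run of a given type; that mapping is an assumption. The `mismatches` helper and the sample rows are hypothetical.

```python
# Expected values per experiment type, copied from the list above.
EXPECTED = {
    "lattice_surgery_complementary_gap": {"code_distance": 3, "num_shots": 1_000_000, "num_data_qubits": 16},
    "lattice_surgery_error_detection": {"code_distance": 5, "num_shots": 100_000, "num_data_qubits": 25},
    "surface_complementary_gap": {"code_distance": 3, "num_shots": 500_000, "num_data_qubits": 9},
    "inject_cultivate": {"code_distance": 3, "num_shots": 50_000, "num_data_qubits": 7},
    "surface_code_expansion": {"code_distance": 7, "num_shots": 100_000, "num_data_qubits": 49},
}


def mismatches(row):
    """Return the column names whose values differ from the expected configuration."""
    expected = EXPECTED[row["experiment_type"]]
    return [column for column, value in expected.items() if row.get(column) != value]


# Made-up rows for illustration; real rows would come from the loaded dataset.
good_row = {"experiment_type": "inject_cultivate", "code_distance": 3,
            "num_shots": 50_000, "num_data_qubits": 7}
bad_row = {"experiment_type": "surface_code_expansion", "code_distance": 7,
           "num_shots": 50_000, "num_data_qubits": 49}

print(mismatches(good_row))
print(mismatches(bad_row))
```

With the dataset loaded into a pandas DataFrame, the same check can be applied to each record from `df.to_dict("records")`.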
💡 Use Cases
Quantum Computing Research
- Error Rate Analysis: Compare logical error rates across different codes and distances
- Resource Optimization: Study tradeoffs between fidelity and qubit-round costs
- Post-Selection Strategies: Evaluate effectiveness of complementary gap thresholds
- Scaling Studies: Analyze how performance scales with code distance
Education & Teaching
- QEC Fundamentals: Understand relationships between physical and logical error rates
- Code Comparisons: Compare Steane vs. surface codes in practical scenarios
- Protocol Visualization: Examine stage-by-stage success rates in magic state cultivation
Machine Learning Applications
- Predictive Modeling: Train models to predict logical error rates from parameters
- Anomaly Detection: Identify unusual simulation runs or parameter configurations
- Feature Importance: Determine which parameters most impact success metrics
- Optimization: Use ML to suggest optimal experiment configurations
📥 Loading the Dataset
Python (Pandas)
import pandas as pd
# Load the dataset
df = pd.read_parquet("aqora://aqora/magic-state-cultivation-with-lattice-surgery/v0.0.0")
# Display basic info
print(f"Total runs: {len(df)}")
print(f"Columns: {len(df.columns)}")
print(f"Experiment types: {df['experiment_type'].nunique()}")
# View first few rows
print(df.head())
# Get summary statistics
print(df.describe())
📊 Analysis Examples
Example 1: Error Rate Distribution by Experiment Type
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
df = pd.read_parquet("aqora://aqora/magic-state-cultivation-with-lattice-surgery/v0.0.0")
# Create visualization
plt.figure(figsize=(12, 6))
sns.boxplot(data=df, x='experiment_type', y='logical_error_rate')
plt.xticks(rotation=45, ha='right')
plt.title('Logical Error Rate Distribution by Experiment Type')
plt.ylabel('Logical Error Rate')
plt.xlabel('Experiment Type')
plt.tight_layout()
plt.show()
# Print statistics
stats = df.groupby('experiment_type')['logical_error_rate'].describe()
print(stats)
Example 2: Cost vs Fidelity Analysis
import pandas as pd
import matplotlib.pyplot as plt
# Load data
df = pd.read_parquet("aqora://aqora/magic-state-cultivation-with-lattice-surgery/v0.0.0")
# Calculate fidelity (1 - error_rate)
df['fidelity'] = 1 - df['logical_error_rate']
# Scatter plot
plt.figure(figsize=(10, 6))
for exp_type in df['experiment_type'].unique():
    subset = df[df['experiment_type'] == exp_type]
    plt.scatter(subset['qubitrounds_cost'], subset['fidelity'],
                label=exp_type, alpha=0.7, s=100)
plt.xlabel('Qubit-Rounds Cost')
plt.ylabel('Logical Fidelity (1 - error rate)')
plt.title('Fidelity vs Resource Cost Tradeoff')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Example 3: Code Distance Impact
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load data
df = pd.read_parquet("aqora://aqora/magic-state-cultivation-with-lattice-surgery/v0.0.0")
# Group by code distance
distance_analysis = df.groupby('code_distance').agg({
    'logical_error_rate': ['mean', 'std'],
    'num_data_qubits': 'first',
    'acceptance_rate': 'mean'
}).round(6)
print("Impact of Code Distance:")
print(distance_analysis)
# Visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# Error rate vs distance
df.groupby('code_distance')['logical_error_rate'].mean().plot(
    kind='bar', ax=ax1, color='coral')
ax1.set_title('Average Logical Error Rate by Code Distance')
ax1.set_ylabel('Logical Error Rate')
ax1.set_xlabel('Code Distance')
# Qubit count vs distance
df.groupby('code_distance')['num_data_qubits'].first().plot(
    kind='bar', ax=ax2, color='skyblue')
ax2.set_title('Data Qubits Required by Code Distance')
ax2.set_ylabel('Number of Data Qubits')
ax2.set_xlabel('Code Distance')
plt.tight_layout()
plt.show()
Example 4: Protocol Stage Analysis
import pandas as pd
import matplotlib.pyplot as plt
# Load data
df = pd.read_parquet("aqora://aqora/magic-state-cultivation-with-lattice-surgery/v0.0.0")
# Extract stage success rates
stages = ['injection_success_rate', 'stabilize1_success_rate',
          'surgery_success_rate', 'stabilize2_success_rate',
          'epilogue_success_rate']
stage_names = ['Injection', 'Stabilize 1', 'Surgery', 'Stabilize 2', 'Epilogue']
# Calculate means
stage_means = [df[stage].mean() for stage in stages]
# Create bar plot
plt.figure(figsize=(10, 6))
bars = plt.bar(stage_names, stage_means, color='steelblue', alpha=0.7)
plt.axhline(y=0.95, color='red', linestyle='--', label='95% threshold')
plt.ylabel('Success Rate')
plt.title('Average Success Rate by Protocol Stage')
plt.ylim(0.85, 1.0)
plt.legend()
plt.grid(True, alpha=0.3, axis='y')
# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2., height,
             f'{height:.3f}',
             ha='center', va='bottom')
plt.tight_layout()
plt.show()
print("\nStage-by-Stage Performance:")
for name, rate in zip(stage_names, stage_means):
    print(f"{name:15s}: {rate:.4f}")
🔬 Research Context
This dataset implements simulations based on:
Yutaka Hirano. "Efficient Magic State Cultivation with Lattice Surgery." arXiv preprint arXiv:2510.24615 (2025).
Key Innovations:
- Complementary gap post-selection for improved magic state fidelity
- Lattice surgery protocols for efficient multi-qubit operations
- Integration of Steane and surface codes for optimized resource usage
Simulation Framework:
- Primary Tool: Stim (v1.15 or later)
- Decoder: PyMatching (minimum-weight perfect matching)
- Language: Python 3.8+