aqora / Quantum-Machine-9

QM9 — Quantum Chemistry Structures and Properties

Short Description

Parquet conversion of the QM9 benchmark (“Quantum chemistry structures and properties of 134 kilo molecules,” Ramakrishnan et al., Scientific Data, 2014). It contains 133,885 small organic molecules composed of C, H, O, N, and F, each with up to nine heavy (non-hydrogen) atoms. Every field from the original extended XYZ files is preserved, giving fast programmatic access for downstream quantum-chemistry and machine-learning research.

Background

The original QM9 study computed equilibrium geometries and quantum-chemical properties at the B3LYP / 6-31G(2df,p) level. This Aqora release keeps the same scientific content and stores it in a columnar Parquet layout. Workflows therefore no longer need to parse 133,885 separate text files, which together contain millions of lines. No post-processing, filtering, or recomputation was applied.
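For context, the following sketch shows roughly what parsing one extended XYZ record involves. It is a minimal illustration, not the code used for this conversion. It assumes the record layout described in the original QM9 documentation:

- an atom count;
- a property line starting with `gdb <index>`;
- one line per atom with element, x, y, z, and Mulliken charge;
- a frequency line, a SMILES line, and an InChI line.

The helpers `parse_qm9_xyz` and `to_float` are hypothetical. Mapping the first InChI on the last line to the `inchi` column is also an assumption. The sample record uses made-up placeholder numbers.

```python
# Minimal sketch: parse one QM9 extended-XYZ record into a dict shaped like a
# row of this dataset. Hypothetical helpers; not the Aqora conversion code.

# Order of the 15 values after "gdb <index>" on the QM9 property line.
# A, B, C are rotational constants (GHz); the other 12 are the scalars listed
# in the "QM9 Scalar Properties" table.
PROPERTY_NAMES = ["A", "B", "C", "mu", "alpha", "homo", "lumo", "gap",
                  "r2", "zpve", "U0", "U", "H", "G", "Cv"]


def to_float(token: str) -> float:
    # Some QM9 files write exponents Mathematica-style, e.g. "1.5*^-6".
    return float(token.replace("*^", "e"))


def parse_qm9_xyz(text: str) -> dict:
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])

    header = lines[1].split()
    molecule_id = f"{header[0]} {header[1]}"  # e.g. "gdb 1"
    qm9 = dict(zip(PROPERTY_NAMES, (to_float(t) for t in header[2:17])))

    # One line per atom: element, x, y, z (angstrom), Mulliken charge (e).
    geometry = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z, charge = line.split()
        geometry.append({"element": element, "x": to_float(x), "y": to_float(y),
                         "z": to_float(z), "charge": to_float(charge)})

    # After the atoms: frequencies, then a SMILES line, then an InChI line.
    frequencies = [to_float(t) for t in lines[2 + n_atoms].split()]
    inchi = lines[4 + n_atoms].split()[0]  # assumption: first of the two InChIs

    return {
        "molecule_id": molecule_id,
        "n_atoms": n_atoms,
        "elements": [atom["element"] for atom in geometry],
        "geometry": geometry,
        "frequencies_cm1": frequencies,
        "inchi": inchi,
        "qm9": qm9,
    }


# Placeholder record with made-up numbers (not real QM9 data), tab-separated
# like the source files.
SAMPLE = "\n".join([
    "2",
    "gdb 999\t1.0\t2.0\t3.0\t0.5\t10.0\t-0.3\t0.1\t0.4\t20.0\t0.01\t-40.0\t-39.9\t-39.8\t-40.1\t6.0",
    "C\t0.0\t0.0\t0.0\t-1.5*^-1",
    "H\t0.0\t0.0\t1.09\t1.5*^-1",
    "1000.0",
    "C\tC",
    "InChI=1S/placeholder\tInChI=1S/placeholder",
])

row = parse_qm9_xyz(SAMPLE)
print(row["molecule_id"], row["elements"], row["qm9"]["gap"])
```

Repeating this over every source file, with the record read from disk each time, is the step the columnar layout removes.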

Provenance

  • Source: QM9 extended XYZ files (GDB-17 subset totaling 133,885 molecules)
  • Elements: C, H, O, N, F
  • Size limit: ≤9 heavy atoms
  • Geometry: Cartesian coordinates with Mulliken partial charges
  • Properties: Electronic, energetic, thermodynamic, and vibrational descriptors from the original publication
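The element and size limits above can be checked directly against a row's `elements` list. This is a minimal sketch. The helper names are hypothetical, and the element lists in the usage lines are illustrative rather than taken from the dataset.

```python
from typing import Sequence

# Provenance limits stated for QM9: elements C, H, O, N, F; at most 9 heavy atoms.
ALLOWED_ELEMENTS = {"C", "H", "O", "N", "F"}
MAX_HEAVY_ATOMS = 9


def heavy_atom_count(elements: Sequence[str]) -> int:
    # Heavy atoms are all non-hydrogen atoms.
    return sum(1 for symbol in elements if symbol != "H")


def within_qm9_limits(elements: Sequence[str]) -> bool:
    # True only if every element is allowed and the heavy-atom count fits.
    return (set(elements) <= ALLOWED_ELEMENTS
            and heavy_atom_count(elements) <= MAX_HEAVY_ATOMS)


ethanol = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]  # illustrative order
print(heavy_atom_count(ethanol), within_qm9_limits(ethanol))  # 3 True
print(within_qm9_limits(["C"] * 10 + ["H"] * 22))  # 10 heavy atoms -> False
print(within_qm9_limits(["C", "Cl", "H", "H", "H"]))  # chlorine not allowed -> False
```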

Contents

  • Rows: 133,885, one per molecule, with per-atom data stored in nested list columns
  • Level of theory: B3LYP / 6-31G(2df,p) (Density Functional Theory)
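Because each row nests its atoms, per-molecule geometric quantities can be computed directly from the `geometry` list. The sketch below assumes each entry is a mapping with `x`, `y`, and `z` keys in ångström, as the schema describes. `distance_matrix` is a hypothetical helper, and the two-atom geometry uses placeholder coordinates.

```python
import math
from typing import List


def distance_matrix(geometry: List[dict]) -> List[List[float]]:
    # Cartesian coordinates are in angstrom, so the distances are too.
    coords = [(atom["x"], atom["y"], atom["z"]) for atom in geometry]
    return [[math.dist(p, q) for q in coords] for p in coords]


# Placeholder geometry (not real QM9 data) shaped like one `geometry` value.
geometry = [
    {"element": "C", "x": 0.0, "y": 0.0, "z": 0.0, "charge": -0.15},
    {"element": "H", "x": 0.0, "y": 0.0, "z": 1.09, "charge": 0.15},
]
dm = distance_matrix(geometry)
print(round(dm[0][1], 2))  # 1.09
```

`math.dist` requires Python 3.8 or later. For 133,885 molecules, a vectorized NumPy version would be faster, but the logic is the same.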

Schema

| Column | Type | Description |
|---|---|---|
| molecule_id | String | Molecule identifier (e.g., gdb 1). |
| n_atoms | Int16 | Total number of atoms, including hydrogens. |
| elements | List[String] | Element symbols in source order. |
| geometry | List[Struct] | Per-atom element, x, y, z, and charge; coordinates in Angstrom, Mulliken charges in e. |
| frequencies_cm1 | List[Float32] | Harmonic vibrational frequencies (cm^-1). |
| element_pair_1 | String | First QM9 metadata element field. |
| element_pair_2 | String | Second QM9 metadata element field. |
| inchi | String | Canonical InChI string. |
| qm9 | Struct | Scalar quantum-chemical properties (see below). |
| theory | Struct | Fields method, basis, and level; fixed to B3LYP / 6-31G(2df,p) / DFT. |
| formula | String | Molecular formula in Hill notation. |
| source | String | Constant: QM9 / GDB-17. |
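The `formula` column can be cross-checked against `elements`. The following is a minimal sketch of the Hill system, in which carbon comes first, then hydrogen, then the other elements alphabetically; without carbon, all elements are alphabetical. It assumes the dataset follows the usual convention of omitting a count of 1. `hill_formula` is a hypothetical helper, not part of the dataset tooling.

```python
from collections import Counter
from typing import Sequence


def hill_formula(elements: Sequence[str]) -> str:
    """Build a Hill-system formula from a list of element symbols."""
    counts = Counter(elements)
    if "C" in counts:
        # Carbon first, then hydrogen, then everything else alphabetically.
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: all elements (hydrogen included) alphabetically.
        order = sorted(counts)
    # A count of 1 is omitted by convention.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


print(hill_formula(["C", "H", "H", "H", "H"]))  # CH4
print(hill_formula(["C", "C", "O", "H", "H", "H", "H", "H", "H"]))  # C2H6O
print(hill_formula(["O", "H", "H"]))  # H2O
print(hill_formula(["N", "H", "H", "H"]))  # H3N
```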

QM9 Scalar Properties

| Property | Units | Description |
|---|---|---|
| mu | Debye | Dipole moment. |
| alpha | Bohr^3 | Isotropic polarizability. |
| homo | Hartree | HOMO energy. |
| lumo | Hartree | LUMO energy. |
| gap | Hartree | HOMO-LUMO gap. |
| r2 | Bohr^2 | Electronic spatial extent. |
| zpve | Hartree | Zero-point vibrational energy. |
| U0 | Hartree | Internal energy at 0 K. |
| U | Hartree | Internal energy at 298.15 K. |
| H | Hartree | Enthalpy at 298.15 K. |
| G | Hartree | Gibbs free energy at 298.15 K. |
| Cv | cal mol^-1 K^-1 | Heat capacity at 298.15 K. |

Loading Examples

Polars

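```python
import polars as pl
from aqora_cli.pyarrow import dataset

# Open version v0.0.0 as a PyArrow dataset, scan it lazily with Polars,
# and materialize it into an in-memory DataFrame.
df = pl.scan_pyarrow_dataset(dataset("aqora/quantum-machine-9", "v0.0.0")).collect()
print(df.height)                # number of rows (one per molecule)
print(df.select("qm9").head())  # first rows of the scalar-property struct
```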

Pandas

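```python
import pandas as pd

# Read version v0.0.0 directly from its aqora:// URL. This assumes the
# aqora:// protocol is available to pandas in your environment.
df = pd.read_parquet("aqora://aqora/quantum-machine-9/v0.0.0")
print(df.head())
```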
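The `qm9` and `theory` columns are structs, so rows materialized as plain Python dicts carry them as nested dicts. Examples are Polars' `DataFrame.to_dicts()` or pandas' `df.to_dict("records")`, which typically map structs to dicts. The minimal sketch below expands such struct fields one level into prefixed keys. `flatten_structs` and its `.` separator are hypothetical choices, and the row uses placeholder values.

```python
def flatten_structs(row: dict, sep: str = ".") -> dict:
    """Expand dict-valued (struct) fields one level into prefixed keys."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            # Lists such as `elements` or `geometry` are kept as-is.
            flat[key] = value
    return flat


# Hypothetical row with placeholder values, shaped like the schema.
row = {
    "molecule_id": "gdb 999",
    "elements": ["C", "H"],
    "qm9": {"mu": 0.5, "gap": 0.4},
    "theory": {"method": "B3LYP", "basis": "6-31G(2df,p)", "level": "DFT"},
}
print(flatten_structs(row))
```

Within Polars itself, `DataFrame.unnest("qm9")` expands the struct fields into top-level columns without leaving the DataFrame.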

Notes

  • All original QM9 values are preserved; the Parquet conversion only changes storage format.
  • Derived or feature-engineered variants, if any, will be published as separate datasets.

Citation

  1. L. Ruddigkeit, R. van Deursen, L. C. Blum, J.-L. Reymond, “Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17,” J. Chem. Inf. Model. 52, 2864–2875 (2012).
  2. R. Ramakrishnan, P. O. Dral, M. Rupp, O. A. von Lilienfeld, “Quantum chemistry structures and properties of 134 kilo molecules,” Scientific Data 1, 140022 (2014).