QM9 — Quantum Chemistry Structures and Properties
Short Description
Parquet conversion of the QM9 benchmark (“Quantum chemistry structures and properties of 134 kilo molecules,” Ramakrishnan et al., Scientific Data, 2014). It provides 133,885 molecules composed of C, H, O, N, and F with up to nine heavy (non-hydrogen) atoms, and it preserves every field from the extended XYZ sources for fast programmatic access and downstream quantum-chemistry or ML research.
Background
The original QM9 study computed equilibrium geometries and quantum-chemical properties at the B3LYP / 6-31G(2df,p) level. This Aqora release keeps the same scientific content and reorganizes it into a columnar layout so workflows no longer need to parse millions of text lines. No post-processing, filtering, or recomputation was applied.
Provenance
- Source: QM9 extended XYZ files (GDB-17 subset totaling 133,885 molecules)
- Elements: C, H, O, N, F
- Size limit: ≤9 heavy atoms
- Geometry: Cartesian coordinates with Mulliken partial charges
- Properties: Electronic, energetic, thermodynamic, and vibrational descriptors from the original publication
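The element and size limits above can be expressed as a small check. This is a minimal sketch in plain Python, not part of the dataset tooling; the function names and the hand-written element lists are hypothetical and only illustrate the two constraints (element set C, H, O, N, F and at most nine heavy atoms).

```python
# Illustrative check of the composition constraints listed in Provenance.
ALLOWED_ELEMENTS = {"C", "H", "O", "N", "F"}
MAX_HEAVY_ATOMS = 9


def heavy_atom_count(elements):
    """Count non-hydrogen atoms in a list of element symbols."""
    return sum(1 for symbol in elements if symbol != "H")


def satisfies_qm9_constraints(elements):
    """True if only C/H/O/N/F appear and there are at most nine heavy atoms."""
    return (
        set(elements) <= ALLOWED_ELEMENTS
        and heavy_atom_count(elements) <= MAX_HEAVY_ATOMS
    )


# Hand-written element lists (assumptions for illustration, not dataset rows).
methane = ["C", "H", "H", "H", "H"]
chloromethane = ["C", "Cl", "H", "H", "H"]

print(satisfies_qm9_constraints(methane))        # True
print(satisfies_qm9_constraints(chloromethane))  # False: Cl is outside the element set
```

The same function could be applied to the `elements` column of a loaded row as a sanity check after loading.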
Contents
- Rows: 133,885, one per molecule, with per-atom data nested in list columns
- Level of theory: B3LYP / 6-31G(2df,p) (Density Functional Theory)
Schema
| Column | Type | Description |
|---|---|---|
| molecule_id | String | Molecule identifier (e.g., gdb 1). |
| n_atoms | Int16 | Atom count. |
| elements | List[String] | Element symbols in source order. |
| geometry | List[Struct] | Per-atom element, x, y, z, charge with coordinates in Angstrom and charges in e. |
| frequencies_cm1 | List[Float32] | Harmonic vibrational frequencies (cm^-1). |
| element_pair_1 | String | First QM9 metadata element field. |
| element_pair_2 | String | Second QM9 metadata element field. |
| inchi | String | Canonical InChI string. |
| qm9 | Struct | Scalar quantum-chemical properties (see below). |
| theory | Struct | method, basis, level; fixed to B3LYP / 6-31G(2df,p) / DFT. |
| formula | String | Hill-system molecular formula. |
| source | String | Constant QM9 / GDB-17. |
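To illustrate how the `elements`, `n_atoms`, and `formula` columns relate, the sketch below derives a Hill-system formula from a list of element symbols. It is a hedged example in plain Python: `hill_formula` and the sample row are hypothetical, and the exact string formatting used in the `formula` column (for instance, whether a count of 1 is written out) is an assumption that should be checked against the data.

```python
from collections import Counter


def hill_formula(elements):
    """Build a Hill-system formula: C first, H second, then the other
    elements alphabetically; without carbon, all elements alphabetically."""
    counts = Counter(elements)
    if "C" in counts:
        others = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + others
    else:
        order = sorted(counts)
    # Assumption: counts of 1 are left implicit (e.g. "CH4", not "C1H4").
    return "".join(
        symbol + (str(counts[symbol]) if counts[symbol] > 1 else "")
        for symbol in order
    )


# Hypothetical row fragment shaped like the schema (not copied from the dataset).
row = {"n_atoms": 5, "elements": ["C", "H", "H", "H", "H"]}

# The atom count should agree with the length of the per-atom list.
assert row["n_atoms"] == len(row["elements"])

print(hill_formula(row["elements"]))       # CH4
print(hill_formula(["O", "H", "H"]))       # H2O
print(hill_formula(["N", "H", "H", "H"]))  # H3N
```

Comparing the derived string with the stored `formula` value is one way to confirm that nested per-atom data survived a round trip.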
QM9 Scalar Properties
| Property | Units | Description |
|---|---|---|
| mu | Debye | Dipole moment. |
| alpha | Bohr^3 | Isotropic polarizability. |
| homo | Hartree | HOMO energy. |
| lumo | Hartree | LUMO energy. |
| gap | Hartree | HOMO-LUMO gap. |
| r2 | Bohr^2 | Electronic spatial extent. |
| zpve | Hartree | Zero-point vibrational energy. |
| U0 | Hartree | Internal energy at 0 K. |
| U | Hartree | Internal energy at 298.15 K. |
| H | Hartree | Enthalpy at 298.15 K. |
| G | Hartree | Gibbs free energy at 298.15 K. |
| Cv | cal mol^-1 K^-1 | Heat capacity at 298.15 K. |
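Because the energies are stored in Hartree and the heat capacity in calories, downstream work often converts them to eV, kcal/mol, or joules. The following is a minimal sketch of such conversions in plain Python. The helper name `convert_energy_units` and the sample values are hypothetical (the numbers are made up for illustration, not taken from the dataset), and the conversion factors are rounded.

```python
# Rounded conversion factors.
HARTREE_TO_EV = 27.211386            # 1 Hartree in electronvolts
HARTREE_TO_KCAL_PER_MOL = 627.5095   # 1 Hartree in kcal/mol
CAL_TO_JOULE = 4.184                 # thermochemical calorie, exact by definition

# Keys of the qm9 struct that the table lists in Hartree.
HARTREE_FIELDS = ("homo", "lumo", "gap", "zpve", "U0", "U", "H", "G")


def convert_energy_units(props):
    """Return a new dict with eV and kcal/mol versions of the Hartree fields,
    plus Cv in J mol^-1 K^-1 when present. The input dict is not modified."""
    converted = dict(props)
    for key in HARTREE_FIELDS:
        if key in props:
            converted[key + "_eV"] = props[key] * HARTREE_TO_EV
            converted[key + "_kcal_mol"] = props[key] * HARTREE_TO_KCAL_PER_MOL
    if "Cv" in props:
        converted["Cv_J_mol_K"] = props["Cv"] * CAL_TO_JOULE
    return converted


# Made-up values for illustration only (not a real QM9 entry).
sample = {"homo": -0.25, "lumo": 0.0, "gap": 0.25, "Cv": 10.0}
result = convert_energy_units(sample)

print(round(result["gap_eV"], 4))      # 6.8028
print(round(result["Cv_J_mol_K"], 2))  # 41.84
```

Keeping the original Hartree values alongside the converted ones avoids accumulating rounding error if several conversions are chained.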
Loading Examples
Polars
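import polars as pl
from aqora_cli.pyarrow import dataset
# Scan the dataset lazily through PyArrow, then materialize it as a DataFrame.
df = pl.scan_pyarrow_dataset(dataset("aqora/quantum-machine-9", "v0.0.0")).collect()
print(df.height)  # number of rows, one per molecule
print(df.select("qm9").head())  # first rows of the scalar-property struct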
Pandas
import pandas as pd
# Read the Parquet data through the aqora:// URL.
df = pd.read_parquet("aqora://aqora/quantum-machine-9/v0.0.0")
print(df.head())  # first five rows
Notes
- All original QM9 values are preserved; the Parquet conversion only changes storage format.
- Derived datasets or feature-engineered variants may appear separately.
Citation
- L. Ruddigkeit, R. van Deursen, L. C. Blum, J.-L. Reymond, “Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17,” J. Chem. Inf. Model. 52, 2864–2875 (2012).
- R. Ramakrishnan, P. O. Dral, M. Rupp, O. A. von Lilienfeld, “Quantum chemistry structures and properties of 134 kilo molecules,” Scientific Data 1, 140022 (2014).