DeepMD Setup

Overview

This module provides functionality for setting up the training of DeepMD ML models, freezing and compressing them, and evaluating their accuracy. Inside each iter_00000xx directory, a 01.train directory is created, containing one training_x subdirectory per model (num_models in total) in which the training is performed. Training is driven by the user-defined input_file; see the MLIP Setup section.

Tip

Users are encouraged to train several models for better accuracy.

After training, each directory should contain a frozen model file: frozen_model.pb for the TensorFlow backend or frozen_model.pth for the PyTorch backend. These models are then used to perform Query-by-Committee to find candidates for labelling.

Note

Query by committee (QbC): Identifies configurations worth labelling by measuring the disagreement among an ensemble of models. This lets the model learn only what it needs to, without wasting resources on redundant data. See also Deepmd-kit model deviation for more details.
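As a rough illustration of the committee disagreement described above, the sketch below computes the maximum per-atom force deviation across an ensemble, following the usual DeePMD-kit model-deviation definition, and selects frames whose deviation falls inside a window. The function names and the threshold values are illustrative assumptions and are not part of SPARC.

```python
import numpy as np


def max_force_deviation(forces):
    """Maximum per-atom force deviation for each frame.

    forces: array of shape (n_models, n_frames, n_atoms, 3).
    """
    forces = np.asarray(forces, dtype=float)
    mean_force = forces.mean(axis=0)                     # (n_frames, n_atoms, 3)
    sq_norm = ((forces - mean_force) ** 2).sum(axis=-1)  # (n_models, n_frames, n_atoms)
    per_atom_std = np.sqrt(sq_norm.mean(axis=0))         # (n_frames, n_atoms)
    return per_atom_std.max(axis=-1)                     # (n_frames,)


def select_candidates(deviation, lower=0.05, upper=0.30):
    """Indices of frames with lower <= deviation < upper.

    The default thresholds (in eV/Å) are hypothetical placeholders.
    """
    deviation = np.asarray(deviation)
    return np.flatnonzero((deviation >= lower) & (deviation < upper))
```

Frames below the lower bound are already well described by the ensemble, while frames above the upper bound are often unphysical, so only the window in between is typically sent for labelling.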

If candidates are found, a dft_candidates subdirectory is created inside the 02.dft directory, with a separate POSCAR for each candidate.

>>> tree dft_candidates
    ├── 0001
    │   └── POSCAR
    ├── 0002
    │   └── POSCAR
    ├── 0003
    │   └── POSCAR
    ├── 0004
    │   └── POSCAR
    └── 0005
        └── POSCAR
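
To hand these candidates to a DFT code, it can help to enumerate them programmatically. The sketch below only walks the layout shown above using the standard library; the helper name list_candidate_poscars is hypothetical and not part of SPARC.

```python
from pathlib import Path


def list_candidate_poscars(candidates_dir):
    """Return the POSCAR path of every candidate subdirectory, sorted by name."""
    root = Path(candidates_dir)
    # Subdirectories without a POSCAR file are skipped.
    return sorted(p for p in root.glob("*/POSCAR") if p.is_file())
```

For example, list_candidate_poscars("02.dft/dft_candidates") would return one path per numbered candidate directory.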

Features:

Set up DeepPotential calculators for ASE atoms objects
Train DeepMD models with various configurations
Freeze and compress trained models
Evaluate trained models

Usage Example

Here is an example of how to use setup_DeepPotential to assign a DeepPotential calculator to an ASE atoms object:

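from ase.build import molecule
from sparc.src.deepmd import setup_DeepPotential

# Build a water molecule with a realistic geometry; Atoms("H2O") would place all atoms at the origin.
atoms = molecule("H2O")
# model_path is the directory that contains the frozen model file.
dp_system, dp_calc = setup_DeepPotential(atoms, "path/to/model")
print(dp_system.get_potential_energy())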

Supported Backends

SPARC automatically detects the installed DeepMD-kit version and selects the appropriate backend at runtime:

Backend	Version	Notes
DeepMD v2 (TensorFlow)	deepmd-kit 2.x	Trains standard DeePMD models; frozen model saved as `frozen_model.pb`
DeepMD v3 (TensorFlow)	deepmd-kit 3.x	Trains DeePMD models with TF backend; frozen model saved as `frozen_model.pb`
DeepMD v3 (PyTorch)	deepmd-kit 3.x	Trains DeePMD models with PyTorch backend; frozen model saved as `frozen_model.pth`; supports fine-tuning from universal models
GNN (MACE / NequIP)	deepmd-kit 3.x + deepmd-gnn	GNN potentials trained through the DeepMD-kit interface; frozen model saved as `frozen_model.pth`

No changes to input.yaml or input.json are required when switching versions; SPARC selects the backend automatically based on what is installed.

Note

Fine-tuning from pre-trained universal models (DPA-3) and GNN backends (MACE, NequIP) require DeepMD-kit v3 with the PyTorch backend. See Fine-Tuning Universal Models and Graph Neural Network Potentials (GNN) for details.

Module Contents

DeepMD module for SPARC package with DeePMD-kit v2/v3 support.

This module contains functions for:

1. Setting up DeepPotential calculators for ASE atoms objects
2. Training DeepMD models with TensorFlow or PyTorch backends
3. Model freezing and compression
4. Support for DeePMD-GNN (MACE, NequIP models)

Supports DeePMD-kit v2 and v3 with automatic backend detection.

sparc.src.deepmd.deepmd_training(active_learning, datadir, atom_types, training_dir, num_models, input_file='input.json', compress_models=False)[source]

Train DeepMD models for molecular potential energy surface representation.

Supports both DeePMD-kit v2 (TensorFlow) and v3 (PyTorch/TensorFlow). Backend is automatically detected from the environment.

Parameters:

active_learning (bool) – Whether this training is part of an active learning cycle
datadir (str) – Path to directory containing training and validation data
atom_types (List[str]) – List of atomic species in the system
training_dir (str) – Path to the directory where models will be trained
num_models (int) – Number of models to train (minimum: 2)
input_file (str) – Path to DeepMD input JSON file (default: ‘input.json’)
compress_models (bool) – Whether to compress trained models (default: False)

Returns:

Name of the frozen model file

Return type:

str

Raises:

ValueError – If num_models < 2
FileNotFoundError – If input file not found
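
The documented error conditions can be checked up front, before any training directory is created. The following is a minimal sketch of such a check, not the SPARC implementation; check_training_args is a hypothetical helper that only mirrors the exception types listed above.

```python
from pathlib import Path


def check_training_args(num_models, input_file="input.json"):
    """Raise the exception types that deepmd_training documents."""
    if num_models < 2:
        # A committee needs at least two models to measure disagreement.
        raise ValueError(f"num_models must be >= 2, got {num_models}")
    if not Path(input_file).is_file():
        raise FileNotFoundError(f"DeepMD input file not found: {input_file}")
```

Calling a check like this before deepmd_training lets a workflow fail fast instead of partway through an active learning iteration.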

sparc.src.deepmd.evaluate_model_accuracy(model_path, test_data_path, version, backend)[source]

Evaluate the accuracy of a trained DeepMD model against reference data.

Parameters:

model_path (str) – Path to the DeepMD frozen model
test_data_path (str) – Path to test data in DeepMD npy format
version (int) – DeePMD-kit major version (2 or 3)
backend (str) – Backend (‘pytorch’ or ‘tensorflow’)
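
The docstring does not state which metrics are reported. A common choice for interatomic potentials is the root-mean-square error (RMSE) of energies per atom and of force components; the sketch below computes both with NumPy under that assumption. The function names and array layout are illustrative only.

```python
import numpy as np


def rmse(predicted, reference):
    """Root-mean-square error over all elements of two equally shaped arrays."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def energy_rmse_per_atom(e_pred, e_ref, n_atoms):
    """RMSE of total energies, normalised by the number of atoms per frame."""
    e_pred = np.asarray(e_pred, dtype=float) / n_atoms
    e_ref = np.asarray(e_ref, dtype=float) / n_atoms
    return rmse(e_pred, e_ref)
```

Forces can be passed to rmse directly as arrays of shape (n_frames, n_atoms, 3), which gives the error per force component.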

sparc.src.deepmd.get_backend()[source]

Detect which backend is functional in DeePMD-kit v3.

Both deepmd.pt and deepmd.tf modules may exist in the environment, but only one will actually work. We test which one can be imported. Defaults to TensorFlow if detection fails.

Returns:

‘pytorch’ or ‘tensorflow’

Return type:

str
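
The detection strategy described above (try each backend module and fall back to TensorFlow) can be sketched as follows. This is an illustrative reimplementation rather than SPARC's code: the order in which the backends are tried and the optional importer argument are assumptions, the latter added so the logic can be exercised without DeepMD-kit installed.

```python
import importlib


def detect_backend(importer=importlib.import_module):
    """Return 'pytorch' or 'tensorflow' depending on which module imports cleanly."""
    candidates = (("deepmd.pt", "pytorch"), ("deepmd.tf", "tensorflow"))
    for module_name, backend in candidates:
        try:
            importer(module_name)
        except Exception:
            # The module may exist but still fail on import,
            # e.g. when torch or tensorflow itself is missing.
            continue
        return backend
    # Documented fallback when neither backend can be imported.
    return "tensorflow"
```

Catching a broad Exception is deliberate here, since a broken backend can fail with errors other than ImportError.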

sparc.src.deepmd.get_version()[source]

Detect installed DeePMD-kit version and backend.

For v3, backend is determined by testing which one is functional. Defaults to TensorFlow if detection fails. Results are cached.

Returns:

(major_version, backend) e.g., (3, ‘pytorch’) or (2, ‘tensorflow’)

Return type:

tuple
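
One way to obtain and cache the major version is sketched below, using only the standard library. It assumes the package is installed under the distribution name deepmd-kit and covers only the version half of the returned tuple; how SPARC actually reads the version is not stated here, and both function names are hypothetical.

```python
from functools import lru_cache
from importlib import metadata


def parse_major(version_string):
    """Extract the major version from a string such as '3.0.0b2'."""
    return int(version_string.split(".", 1)[0])


@lru_cache(maxsize=1)
def detect_major_version():
    """Major version of the installed deepmd-kit distribution.

    The result is cached after the first call. Raises
    importlib.metadata.PackageNotFoundError if deepmd-kit is not installed.
    """
    return parse_major(metadata.version("deepmd-kit"))
```

Caching avoids repeating the metadata lookup every time a training or evaluation routine asks for the version.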

sparc.src.deepmd.setup_DeepPotential(atoms, model_path, model_name=None)[source]

Set up a DeepPotential calculator for an ASE atoms object.

Parameters:

atoms (ase.Atoms) – The atomic structure to assign the DeepPotential model to
model_path (str) – Path to the directory containing DeepPotential model
model_name (Optional[str]) – Name of the DeepPotential model file. If None, auto-detects based on version

Returns:

(dp_system, dp_calc) - ASE atoms object with calculator and the calculator object

Return type:

tuple

Raises:

FileNotFoundError – If model file is not found
Exception – If model setup or testing fails

sparc.src.deepmd.update_json(data, datadir, atom_types)[source]

Update the DeepMD input JSON configuration with random seeds and proper paths.

Parameters:

data (dict) – The loaded JSON configuration data
datadir (str) – Path to the directory containing training data
atom_types (List[str]) – List of atomic species in the system

Returns:

Updated JSON configuration

Return type:

dict
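
To make the idea concrete, here is a minimal sketch of such an update on a standard DeepMD-kit configuration. It is not the SPARC implementation: the function name randomize_config, the optional rng argument, and the assumed data layout (training and validation subdirectories under datadir) are all hypothetical.

```python
import copy
import random
from pathlib import Path


def randomize_config(data, datadir, atom_types, rng=None):
    """Return a copy of a DeepMD-kit config with fresh seeds, type map, and data paths."""
    if rng is None:
        rng = random.Random()
    cfg = copy.deepcopy(data)  # leave the caller's dict untouched
    model = cfg.setdefault("model", {})
    model["type_map"] = list(atom_types)
    # Standard DeePMD models carry seeds in these sub-sections;
    # other model types may not define them, so they are updated only if present.
    for section in ("descriptor", "fitting_net"):
        if section in model:
            model[section]["seed"] = rng.randrange(2**31)
    training = cfg.setdefault("training", {})
    training["seed"] = rng.randrange(2**31)
    # Assumed (hypothetical) layout: <datadir>/training and <datadir>/validation.
    training.setdefault("training_data", {})["systems"] = [str(Path(datadir) / "training")]
    training.setdefault("validation_data", {})["systems"] = [str(Path(datadir) / "validation")]
    return cfg
```

Giving every model in the ensemble different seeds is what makes the trained models disagree in poorly sampled regions, which Query-by-Committee relies on.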

Graph Neural Network Potentials (GNN)

DeepMD-kit v3 (PyTorch backend) supports training graph neural network potentials via the deepmd-gnn plugin. MACE and NequIP models are trained through the same DeepMD-kit interface as standard DeePMD models — the only difference is the model.type field in input.json.

Because SPARC drives training by passing input.json directly to DeepMD-kit, MACE and NequIP are fully supported within the SPARC active learning workflow. Set mlip_setup.input_file to a MACE or NequIP input.json and the rest of the pipeline (training, freezing, ML-MD, and Query-by-Committee model deviation) works without any other changes.

Note

GNN backends (MACE, NequIP) require DeepMD-kit v3 with the PyTorch backend and the deepmd-gnn plugin. The recommended installation is via conda-forge:

conda install deepmd-gnn -c conda-forge

Alternatively, install from source (requires Python 3.9+, a C++ compiler supporting C++14/C++17, DeepMD-kit v3.0.0b2+, and PyTorch):

pip install deepmd-gnn

The TensorFlow backend (DeepMD-kit v2.x or v3.x) does not support GNN models.

MACE

MACE is an equivariant GNN potential. Set model.type to "mace" in input.json:

{
  "model": {
    "type": "mace",
    "type_map": ["O", "H"],
    "r_max": 6.0,
    "sel": "auto",
    "hidden_irreps": "64x0e"
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 5000,
    "start_lr": 0.001,
    "stop_lr": 3.51e-8
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0
  },
  "training": {
    "training_data": {
      "systems": ["data/data_0/", "data/data_1/", "data/data_2/"],
      "batch_size": "auto"
    },
    "validation_data": {
      "systems": ["data/data_3"],
      "batch_size": 1,
      "numb_btch": 3
    },
    "numb_steps": 1000000,
    "seed": 10,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 1000
  }
}

NequIP

NequIP is another equivariant GNN framework supported by deepmd-gnn. Set model.type to "nequip":

{
  "model": {
    "type": "nequip",
    "type_map": ["O", "H"],
    "r_max": 6.0,
    "sel": "auto",
    "l_max": 1
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 5000,
    "start_lr": 0.001,
    "stop_lr": 3.51e-8
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0
  },
  "training": {
    "training_data": {
      "systems": ["data/data_0/", "data/data_1/", "data/data_2/"],
      "batch_size": "auto"
    },
    "validation_data": {
      "systems": ["data/data_3"],
      "batch_size": 1,
      "numb_btch": 3
    },
    "numb_steps": 1000000,
    "seed": 10,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 1000
  }
}

See the deepmd-gnn examples for full working input files.
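
Since the model family is selected purely by the model.type field, a quick pre-flight check can tell whether a given input file needs the deepmd-gnn plugin. The sketch below is a minimal illustration; the helper names and the GNN_MODEL_TYPES constant are hypothetical and not part of SPARC or DeepMD-kit.

```python
import json

# Model types described in this section as provided by the deepmd-gnn plugin.
GNN_MODEL_TYPES = {"mace", "nequip"}


def load_config(path):
    """Read a DeepMD-kit input.json file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def needs_gnn_plugin(config):
    """True if the config selects a model type that requires deepmd-gnn."""
    return config.get("model", {}).get("type") in GNN_MODEL_TYPES
```

For example, needs_gnn_plugin(load_config("input.json")) returns True for both input files shown above and False for a standard DeePMD model without a matching model.type.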

References

For more details on DeepMD-Kit, visit: https://github.com/deepmodeling/deepmd-kit