DeepMD Setup
Overview
This module provides functionality for setting up the training of DeepMD ML models,
freezing and compressing them, and evaluating their accuracy.
Inside each iter_00000xx directory, a 01.train directory is created. It contains one
training_x subdirectory per model, as set by num_models, and each model is trained in its own subdirectory.
Training is driven by the user-defined input_file; see the MLIP Setup section.
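The layout described above can be sketched with pathlib. This is only an illustration: the helper name expected_training_dirs, the six-digit zero padding of the iteration index, and the 1-based numbering of the training subdirectories are assumptions, not confirmed SPARC behaviour.

```python
from pathlib import Path


def expected_training_dirs(root, iteration, num_models):
    """Return the training directories assumed for one active-learning iteration."""
    # Assumption: iteration folders are zero-padded to six digits (iter_000000, ...).
    train_root = Path(root) / f"iter_{iteration:06d}" / "01.train"
    # Assumption: model subdirectories are numbered starting from 1.
    return [train_root / f"training_{i}" for i in range(1, num_models + 1)]


for path in expected_training_dirs(".", iteration=0, num_models=4):
    print(path.as_posix())
```

A helper like this is mainly useful for checking, after a run, that every expected model directory exists.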
Tip
Users are encouraged to train several models for better accuracy.
After training, each directory should contain a frozen_model.pb file.
These models are then used to perform Query-by-Committee to find the candidates for labelling.
Note
Query by committee (QbC): Identifies informative configurations by measuring the disagreement among an ensemble of models. This lets the model learn only what it needs, without wasting resources on redundant data. See also Deepmd-kit model deviation for more details.
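As a rough illustration of the disagreement measure, the sketch below computes the maximum per-atom force deviation across an ensemble, following the definition used for model deviation in DeePMD-kit. The function names and the lower/upper thresholds are hypothetical and are not taken from SPARC.

```python
import numpy as np


def max_force_deviation(forces):
    """Maximum over atoms of the ensemble standard deviation of the force vector.

    forces has shape (n_models, n_atoms, 3): one force prediction per model.
    """
    forces = np.asarray(forces, dtype=float)
    mean_forces = forces.mean(axis=0)                        # (n_atoms, 3)
    sq_norm = ((forces - mean_forces) ** 2).sum(axis=2)      # (n_models, n_atoms)
    per_atom_dev = np.sqrt(sq_norm.mean(axis=0))             # (n_atoms,)
    return float(per_atom_dev.max())


def is_candidate(deviation, lower, upper):
    """Hypothetical selection rule: keep frames whose deviation lies in [lower, upper)."""
    return lower <= deviation < upper


# Two models that disagree on the single atom's force by 2 units along x.
forces = [[[1.0, 0.0, 0.0]], [[-1.0, 0.0, 0.0]]]
print(max_force_deviation(forces))
```

Frames below the lower threshold are already well described by the models, while frames above the upper threshold are often unphysical, which is why a window rather than a single cutoff is a common choice.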
If candidates are found, a dft_candidates subdirectory is created inside the 02.dft directory,
with a separate POSCAR for each candidate.
>>> tree dft_candidates
├── 0001
│ └── POSCAR
├── 0002
│ └── POSCAR
├── 0003
│ └── POSCAR
├── 0004
│ └── POSCAR
└── 0005
  └── POSCAR
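A directory tree like the one above can be inspected with a few lines of pathlib. The helper name list_candidates is hypothetical; the sketch only assumes the dft_candidates/<index>/POSCAR layout shown here.

```python
from pathlib import Path


def list_candidates(dft_dir):
    """Return the names of candidate folders under dft_candidates that hold a POSCAR."""
    root = Path(dft_dir) / "dft_candidates"
    if not root.is_dir():
        return []  # no candidates were selected for this iteration
    return sorted(p.name for p in root.iterdir() if (p / "POSCAR").is_file())


print(list_candidates("02.dft"))
```

An empty list means either that no candidates were found or that the directory has not been created yet.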
- Features:
Setup DeepPotential calculators for ASE atoms objects
Train DeepMD models with various configurations
Freeze and compress trained models
Evaluate trained models
Usage Example
Here is an example of how to use setup_DeepPotential to assign a DeepPotential calculator to an ASE atoms object:
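from ase.build import molecule
from sparc.src.deepmd import setup_DeepPotential

# Build a water molecule with a sensible geometry; Atoms("H2O") alone would
# place all three atoms at the origin.
atoms = molecule("H2O")
# "path/to/model" is a placeholder for the location of a trained model.
dp_system, dp_calc = setup_DeepPotential(atoms, "path/to/model")
print(dp_system.get_potential_energy())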
Supported Backends
SPARC automatically detects the installed DeepMD-kit version and selects the appropriate backend at runtime:
| Backend | Version | Notes |
|---|---|---|
| DeepMD v2 (TensorFlow) | deepmd-kit 2.x | Trains standard DeePMD models; frozen model saved as |
| DeepMD v3 (TensorFlow) | deepmd-kit 3.x | Trains DeePMD models with TF backend; frozen model saved as |
| DeepMD v3 (PyTorch) | deepmd-kit 3.x | Trains DeePMD models with PyTorch backend; frozen model saved as |
| GNN (MACE / NequIP) | deepmd-kit 3.x + deepmd-gnn | GNN potentials trained through the DeepMD-kit interface; frozen model saved as |
No changes to input.yaml or input.json are required when switching
versions — SPARC falls back automatically based on what is installed.
Note
Fine-tuning from pre-trained universal models (DPA-3) and GNN backends (MACE, NequIP) require DeepMD-kit v3 with the PyTorch backend. See Fine-Tuning Universal Models and Graph Neural Network Potentials (GNN) for details.
Module Contents
DeepMD module for SPARC package with DeePMD-kit v2/v3 support.
This module contains functions for:
1. Setting up DeepPotential calculators for ASE atoms objects
2. Training DeepMD models with TensorFlow or PyTorch backends
3. Model freezing and compression
4. Support for DeePMD-GNN (MACE, NequIP models)
Supports DeePMD-kit v2 and v3 with automatic backend detection.
- sparc.src.deepmd.deepmd_training(active_learning, datadir, atom_types, training_dir, num_models, input_file='input.json', compress_models=False)[source]
Train DeepMD models for molecular potential energy surface representation.
Supports both DeePMD-kit v2 (TensorFlow) and v3 (PyTorch/TensorFlow). Backend is automatically detected from the environment.
- Parameters:
active_learning (bool) – Whether this training is part of an active learning cycle
datadir (str) – Path to directory containing training and validation data
atom_types (List[str]) – List of atomic species in the system
training_dir (str) – Path to the directory where models will be trained
num_models (int) – Number of models to train (minimum: 2)
input_file (str) – Path to DeepMD input JSON file (default: ‘input.json’)
compress_models (bool) – Whether to compress trained models (default: False)
- Returns:
Name of the frozen model file
- Return type:
- Raises:
ValueError – If num_models < 2
FileNotFoundError – If input file not found
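The two documented error conditions can be mirrored in a small pre-flight check. The sketch below is not the SPARC implementation; check_training_inputs is a hypothetical helper that only reproduces the rules stated above (at least two models, and an existing input file).

```python
from pathlib import Path


def check_training_inputs(num_models, input_file="input.json"):
    """Raise the same exception types that deepmd_training is documented to raise."""
    if num_models < 2:
        # A committee needs at least two members to measure disagreement.
        raise ValueError(f"num_models must be at least 2, got {num_models}")
    if not Path(input_file).is_file():
        raise FileNotFoundError(f"DeepMD input file not found: {input_file}")


try:
    check_training_inputs(num_models=1)
except ValueError as err:
    print(err)
```

Running such a check before submitting a job avoids discovering a missing input file only after the scheduler has started the run.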
- sparc.src.deepmd.evaluate_model_accuracy(model_path, test_data_path, version, backend)[source]
Evaluate the accuracy of a trained DeepMD model against reference data.
- sparc.src.deepmd.get_backend()[source]
Detect which backend is functional in DeePMD-kit v3.
Both deepmd.pt and deepmd.tf modules may exist in the environment, but only one will actually work. We test which one can be imported. Defaults to TensorFlow if detection fails.
- Returns:
‘pytorch’ or ‘tensorflow’
- Return type:
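The import-based detection described above can be sketched with importlib. This is an illustration under assumptions: detect_backend is a hypothetical name, and the order in which the real function tries deepmd.pt and deepmd.tf is not stated in this page.

```python
import importlib


def detect_backend(
    candidates=(("pytorch", "deepmd.pt"), ("tensorflow", "deepmd.tf")),
    default="tensorflow",
):
    """Return the name of the first backend whose module imports cleanly."""
    for name, module in candidates:
        try:
            importlib.import_module(module)
        except Exception:
            # A backend package can exist on disk yet fail at import time
            # (for example when its framework is missing), so try the next one.
            continue
        return name
    # Nothing could be imported: fall back to the documented default.
    return default
```

Calling detect_backend() in an environment without DeePMD-kit returns the default, which matches the documented fallback to TensorFlow.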
- sparc.src.deepmd.get_version()[source]
Detect installed DeePMD-kit version and backend.
For v3, backend is determined by testing which one is functional. Defaults to TensorFlow if detection fails. Results are cached.
- Returns:
(major_version, backend) e.g., (3, ‘pytorch’) or (2, ‘tensorflow’)
- Return type:
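One way to obtain a cached major version is to combine importlib.metadata with functools.lru_cache. The sketch below is an assumption about how such detection could look, not the SPARC source; major_version and installed_major are hypothetical helpers, and "deepmd-kit" is assumed to be the installed distribution name.

```python
from functools import lru_cache
from importlib import metadata


def major_version(version_string):
    """Extract the leading integer from a version string such as '3.0.0b2'."""
    return int(version_string.split(".", 1)[0])


@lru_cache(maxsize=None)
def installed_major(dist_name="deepmd-kit"):
    """Return the installed major version, or None if the package is absent.

    The result is cached, so the package metadata is read at most once per name.
    """
    try:
        return major_version(metadata.version(dist_name))
    except metadata.PackageNotFoundError:
        return None


print(installed_major())
```

Caching matters here because the version is queried repeatedly during an active learning run and cannot change while the process is alive.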
- sparc.src.deepmd.setup_DeepPotential(atoms, model_path, model_name=None)[source]
Setup a DeepPotential calculator for an ASE atoms object.
- Parameters:
- Returns:
(dp_system, dp_calc) - ASE atoms object with calculator and the calculator object
- Return type:
- Raises:
FileNotFoundError – If model file is not found
Exception – If model setup or testing fails
Graph Neural Network Potentials (GNN)
DeepMD-kit v3 (PyTorch backend) supports training graph neural network
potentials via the
deepmd-gnn plugin.
MACE and NequIP models are trained through the same DeepMD-kit interface as
standard DeePMD models — the only difference is the model.type field in
input.json.
Because SPARC drives training by passing input.json directly to
DeepMD-kit, MACE and NequIP are fully supported within the SPARC active
learning workflow. Set mlip_setup.input_file to a MACE or NequIP
input.json and the rest of the pipeline (training, freezing, ML-MD,
and Query-by-Committee model deviation) works without any other changes.
Note
GNN backends (MACE, NequIP) require DeepMD-kit v3 with the PyTorch backend and the deepmd-gnn plugin. The recommended installation is via conda-forge:
conda install deepmd-gnn -c conda-forge
Alternatively, install from source (requires Python 3.9+, a C++ compiler supporting C++14/C++17, DeepMD-kit v3.0.0b2+, and PyTorch):
pip install deepmd-gnn
The TensorFlow backend (DeepMD-kit v2.x) does not support GNN models.
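The requirement in the note above can be checked before training starts. The following sketch assumes a (major_version, backend) pair like the one returned by get_version; the helper names is_gnn_input and check_backend are hypothetical.

```python
import json

# model.type values named in this section as GNN models.
GNN_MODEL_TYPES = {"mace", "nequip"}


def is_gnn_input(config):
    """Return True if the parsed input.json requests a MACE or NequIP model."""
    return config.get("model", {}).get("type") in GNN_MODEL_TYPES


def check_backend(config, major_version, backend):
    """Raise if a GNN input is paired with anything other than v3 + PyTorch."""
    if is_gnn_input(config) and (major_version < 3 or backend != "pytorch"):
        raise RuntimeError(
            "GNN models require DeepMD-kit v3 with the PyTorch backend "
            f"(found v{major_version}, {backend})"
        )


config = json.loads('{"model": {"type": "mace"}}')
check_backend(config, major_version=3, backend="pytorch")  # passes silently
```

Inputs without a GNN model.type pass through unchanged, so the same check can be applied to every input file.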
MACE
MACE is an equivariant GNN potential.
Set model.type to "mace" in input.json:
{
  "model": {
    "type": "mace",
    "type_map": ["O", "H"],
    "r_max": 6.0,
    "sel": "auto",
    "hidden_irreps": "64x0e"
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 5000,
    "start_lr": 0.001,
    "stop_lr": 3.51e-8
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0
  },
  "training": {
    "training_data": {
      "systems": ["data/data_0/", "data/data_1/", "data/data_2/"],
      "batch_size": "auto"
    },
    "validation_data": {
      "systems": ["data/data_3"],
      "batch_size": 1,
      "numb_btch": 3
    },
    "numb_steps": 1000000,
    "seed": 10,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 1000
  }
}
NequIP
NequIP is another equivariant GNN
framework supported by deepmd-gnn. Set model.type to "nequip":
{
  "model": {
    "type": "nequip",
    "type_map": ["O", "H"],
    "r_max": 6.0,
    "sel": "auto",
    "l_max": 1
  },
  "learning_rate": {
    "type": "exp",
    "decay_steps": 5000,
    "start_lr": 0.001,
    "stop_lr": 3.51e-8
  },
  "loss": {
    "type": "ener",
    "start_pref_e": 0.02,
    "limit_pref_e": 1,
    "start_pref_f": 1000,
    "limit_pref_f": 1,
    "start_pref_v": 0,
    "limit_pref_v": 0
  },
  "training": {
    "training_data": {
      "systems": ["data/data_0/", "data/data_1/", "data/data_2/"],
      "batch_size": "auto"
    },
    "validation_data": {
      "systems": ["data/data_3"],
      "batch_size": 1,
      "numb_btch": 3
    },
    "numb_steps": 1000000,
    "seed": 10,
    "disp_file": "lcurve.out",
    "disp_freq": 100,
    "save_freq": 1000
  }
}
See the deepmd-gnn examples for full working input files.
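To see how the learning_rate and loss settings in the inputs above interact, the sketch below evaluates the schedule at a given step. It assumes the staircase exponential decay and the learning-rate-weighted prefactor interpolation described in the DeePMD-kit documentation; the function names are hypothetical, and the defaults are simply the values from the example inputs.

```python
import math


def learning_rate(step, start_lr=0.001, stop_lr=3.51e-8,
                  decay_steps=5000, numb_steps=1000000):
    """Learning rate at a training step under an exponential staircase decay."""
    # Decay factor chosen so that the rate reaches stop_lr at numb_steps.
    decay_rate = math.exp(math.log(stop_lr / start_lr) * decay_steps / numb_steps)
    # Staircase: the rate is updated once every decay_steps steps.
    return start_lr * decay_rate ** (step // decay_steps)


def prefactor(step, start_pref, limit_pref, start_lr=0.001, **schedule):
    """Loss prefactor that moves from start_pref to limit_pref as the rate decays."""
    ratio = learning_rate(step, start_lr=start_lr, **schedule) / start_lr
    return limit_pref + (start_pref - limit_pref) * ratio


# Force prefactor (start_pref_f=1000, limit_pref_f=1) at the start and end of training.
print(prefactor(0, 1000, 1), prefactor(1000000, 1000, 1))
```

Under these assumptions the force term dominates the loss early in training and the energy and force terms are weighted almost equally by the final step.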
References
For more details on DeepMD-Kit, visit: https://github.com/deepmodeling/deepmd-kit