Quick Start Guide
Welcome to the Quick Start Guide for SPARC. This guide walks you through the basic setup, configuration, and execution steps to run your first simulation.
Set Environment Variables
export VASP_PP_PATH=/path/to/vasp/potcar_files # POTCAR files path (VASP only)
If PLUMED was installed from source (skip if you used conda-forge):
export PLUMED_KERNEL="$CONDA_PREFIX/lib/libplumedKernel.so"
export PYTHONPATH="$CONDA_PREFIX/lib/plumed/python:$PYTHONPATH"
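Before launching a run, it can help to confirm that the variables above are set and point at real paths. The following is a minimal sketch, not part of SPARC: the function name `check_environment` is hypothetical, and the checks only mirror the two cases described above (POTCAR path for VASP, kernel library for a from-source PLUMED build).

```python
import os
from pathlib import Path


def check_environment(env=None):
    """Return a list of human-readable problems found in the environment."""
    env = os.environ if env is None else env
    problems = []

    # VASP_PP_PATH is only needed when VASP is the DFT engine.
    pp_path = env.get("VASP_PP_PATH")
    if not pp_path:
        problems.append("VASP_PP_PATH is not set (required for VASP only)")
    elif not Path(pp_path).is_dir():
        problems.append(f"VASP_PP_PATH is not a directory: {pp_path}")

    # PLUMED_KERNEL is only needed for a from-source PLUMED build,
    # so an unset value is not reported as a problem.
    kernel = env.get("PLUMED_KERNEL")
    if kernel and not Path(kernel).is_file():
        problems.append(f"PLUMED_KERNEL does not point to a file: {kernel}")

    return problems
```

Calling `check_environment()` with no arguments inspects the current shell environment; an empty list means nothing suspicious was found.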
Basic Usage
SPARC requires a YAML input file that defines the structure, DFT
calculator, MD settings, and active learning parameters.
Example Input File
The example below shows a complete active learning workflow using VASP:
general:
  structure_file: "POSCAR"

dft_calculator:
  engine: "VASP"
  template_file: "INCAR"
  exe_command: "mpirun -np 4 vasp_std"

aimd_setup:
  ensemble: "NVT"
  temperature: 300.0
  timestep_fs: 1.0
  steps: 100
  thermostat:
    type: "Nose"
    tdamp: 2.0

mlip_setup:
  training: true
  data_dir: "Dataset"
  input_file: "input.json"
  num_models: 4
  MdSimulation: true
  ensemble: "NVT"
  temperature: 300.0
  timestep_fs: 1.0
  md_steps: 2000
  train_ratio: 0.8

active_learning: true
iteration: 10
model_dev:
  f_min_dev: 0.05
  f_max_dev: 0.30
See the Input File documentation for a full description of all available options.
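A few of the values in the example constrain each other, and a quick consistency check before a long run can catch typos. The sketch below works on a plain Python dict that mirrors part of the YAML example; it is an illustration only, not SPARC's own validation. The function name `validate_settings`, the assumed nesting of `train_ratio` under `mlip_setup`, and the specific rules are assumptions.

```python
def validate_settings(cfg):
    """Collect consistency problems in a dict mirroring the YAML example."""
    errors = []

    aimd = cfg.get("aimd_setup", {})
    if aimd.get("ensemble") not in {"NVE", "NVT", "NPT"}:
        errors.append("aimd_setup.ensemble must be NVE, NVT, or NPT")
    if aimd.get("timestep_fs", 0) <= 0:
        errors.append("aimd_setup.timestep_fs must be positive")

    mlip = cfg.get("mlip_setup", {})
    # A committee needs at least two models to disagree with each other.
    if mlip.get("num_models", 0) < 2:
        errors.append("mlip_setup.num_models should be at least 2 for a committee")
    if not 0.0 < mlip.get("train_ratio", 0.0) < 1.0:
        errors.append("mlip_setup.train_ratio must lie strictly between 0 and 1")

    dev = cfg.get("model_dev", {})
    if not 0.0 <= dev.get("f_min_dev", -1.0) < dev.get("f_max_dev", -1.0):
        errors.append("model_dev needs 0 <= f_min_dev < f_max_dev")

    return errors


example = {
    "aimd_setup": {"ensemble": "NVT", "temperature": 300.0, "timestep_fs": 1.0, "steps": 100},
    "mlip_setup": {"training": True, "num_models": 4, "train_ratio": 0.8},
    "model_dev": {"f_min_dev": 0.05, "f_max_dev": 0.30},
}
print(validate_settings(example))  # prints []
```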
Running a Simulation
sparc -i input.yaml
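When the run is started from a script rather than a shell, the same command can be wrapped with a pre-flight check. This is a hedged sketch: `missing_inputs` and `run_sparc` are hypothetical helper names, and the list of expected files is taken from the VASP example in this guide, so other engines will need a different list.

```python
import subprocess
from pathlib import Path


def missing_inputs(workdir, names=("POSCAR", "INCAR", "input.json", "input.yaml")):
    """Return the expected input files that are absent from workdir."""
    root = Path(workdir)
    return [name for name in names if not (root / name).is_file()]


def run_sparc(workdir=".", input_file="input.yaml"):
    """Run `sparc -i <input_file>` in workdir after a pre-flight file check."""
    missing = missing_inputs(workdir)
    if missing:
        raise FileNotFoundError(f"missing input files: {', '.join(missing)}")
    # check=True raises CalledProcessError if sparc exits with a non-zero code.
    return subprocess.run(["sparc", "-i", input_file], cwd=workdir, check=True)
```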
Directory Structure
After the first iteration, the following layout is created:
Project Root/
├── POSCAR
├── INCAR
├── input.json
├── input.yaml
├── Dataset/
│   ├── training_data/
│   └── validation_data/
├── iter_000000/
│   ├── 00.dft/        # DFT / AIMD labelling
│   ├── 01.train/      # MLIP training
│   │   ├── training_1/
│   │   ├── training_2/
│   │   └── ...
│   └── 02.dpmd/       # ML-MD + model deviation
├── iter_000001/
│   └── ...
00.dft/: DFT calculations used to label selected structures
01.train/: ML model training; one training_N/ folder per model
02.dpmd/: ML-MD simulation and Query-by-Committee model deviation
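The per-iteration layout described above follows a regular naming pattern, which makes it easy to locate results from a script. The sketch below rebuilds those paths with `pathlib`; the helper names are hypothetical, and the six-digit zero padding and 1-based `training_N` numbering are read off the tree shown in this guide rather than taken from SPARC's source.

```python
from pathlib import Path


def iteration_dirs(root, iteration, num_models):
    """Return the directories of one iteration, following the documented layout."""
    it = Path(root) / f"iter_{iteration:06d}"
    train = it / "01.train"
    return (
        [it / "00.dft", train]
        + [train / f"training_{k}" for k in range(1, num_models + 1)]
        + [it / "02.dpmd"]
    )


def create_iteration(root, iteration, num_models):
    """Create the directories on disk (idempotent) and return them."""
    dirs = iteration_dirs(root, iteration, num_models)
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs
```

For example, `iteration_dirs("Project Root", 1, 4)` lists the folders under `iter_000001/` for a four-model committee without touching the disk.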
Sample Output (Sparc.log)
================================================================================
BEGIN CALCULATION - 2025-04-08 22:30:32
================================================================================
 ######  ########     ###    ########   ######
##    ## ##     ##   ## ##   ##     ## ##    ##
##       ##     ##  ##   ##  ##     ## ##
 ######  ########  ##     ## ########  ##
      ## ##        ######### ##   ##   ##
##    ## ##        ##     ## ##    ##  ##    ##
 ######  ##        ##     ## ##     ##  ######
--v0.2.0
================================================================================
Creating Directories for Iteration: 000000
================================================================================
├── iter_000000
│ ├── 00.dft
│ ├── 01.train
│ └── 02.dpmd
================================================================================
Starting AIMD Simulation [Nose-Hoover]
================================================================================
Step      Epot (eV)      Ekin (eV)      Temp (K)
--------------------------------------------------------------------------------
   0       -36.0932         0.3102        300.00
   1       -36.1182         0.4385        424.04
   2       -36.1058         0.4062        392.84
================================================================================
MLIP Training — 4 models
================================================================================
RUNNING TRAINING IN FOLDER (iter_000000/01.train/training_1)
...
frozen_model_1.pth saved
================================================================================
Starting ML-MD Simulation
================================================================================
Step      Epot (eV)      Ekin (eV)      Temp (K)
--------------------------------------------------------------------------------
   0       -29.8049         0.1939        300.00
   5       -29.7611         0.1458        225.61
  10       -29.7915         0.1711        264.75
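The step tables in the log are whitespace-separated, so they can be pulled out with a small parser for plotting or quick statistics. This is a sketch that assumes the four-column row format shown in the sample output (step, potential energy, kinetic energy, temperature); `parse_md_rows` is a hypothetical helper, and the AIMD and ML-MD tables are not distinguished.

```python
import re

# One data row: integer step followed by three floating-point columns.
ROW = re.compile(r"^\s*(\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*$")


def parse_md_rows(lines):
    """Yield (step, epot_eV, ekin_eV, temp_K) for each table row in the log."""
    for line in lines:
        match = ROW.match(line)
        if match:
            step, epot, ekin, temp = match.groups()
            yield int(step), float(epot), float(ekin), float(temp)


sample = """\
Step Epot (eV) Ekin (eV) Temp (K)
0 -36.0932 0.3102 300.00
1 -36.1182 0.4385 424.04
2 -36.1058 0.4062 392.84
"""
rows = list(parse_md_rows(sample.splitlines()))
mean_temp = sum(r[3] for r in rows) / len(rows)
print(len(rows), round(mean_temp, 2))  # prints: 3 372.29
```

To process a real run, pass an open file instead of `sample.splitlines()`, for example `parse_md_rows(open("Sparc.log"))`.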
Core Components
1. MD Simulation
NVE, NVT (Nose-Hoover / Langevin), and NPT (Berendsen) ensembles
Supports both ab initio (VASP, CP2K, ORCA, QE, xTB, Gaussian) and ML-MD
Checkpoint/restart capabilities
PLUMED integration for enhanced sampling (Metadynamics, Umbrella Sampling)
2. MLIP Training
Automated DeepMD-kit training pipeline
Ensemble model generation for uncertainty quantification
Fine-tuning of universal potentials (DPA-3, MACE-MP) from a pre-trained checkpoint
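Ensemble generation usually means training the same architecture several times with different random seeds, one model per `training_N/` folder. The sketch below illustrates that idea only; it is not SPARC's training pipeline. The helper names are hypothetical, and the seed locations (`model.descriptor.seed`, `model.fitting_net.seed`, `training.seed`) are an assumption based on common DeePMD-kit `input.json` files and may differ for other model types.

```python
import copy
import json
import random
from pathlib import Path


def ensemble_configs(base, num_models, master_seed=0):
    """Return num_models deep copies of base, each with its own random seeds."""
    rng = random.Random(master_seed)
    configs = []
    for _ in range(num_models):
        cfg = copy.deepcopy(base)  # never mutate the caller's template
        # Assumed key layout, following common DeePMD-kit input.json files.
        model = cfg.setdefault("model", {})
        model.setdefault("descriptor", {})["seed"] = rng.randrange(2**31)
        model.setdefault("fitting_net", {})["seed"] = rng.randrange(2**31)
        cfg.setdefault("training", {})["seed"] = rng.randrange(2**31)
        configs.append(cfg)
    return configs


def write_ensemble(base, train_dir, num_models):
    """Write one input.json per training_N/ folder under train_dir."""
    for k, cfg in enumerate(ensemble_configs(base, num_models), start=1):
        folder = Path(train_dir) / f"training_{k}"
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "input.json").write_text(json.dumps(cfg, indent=2))
```

Using a fixed `master_seed` keeps the whole ensemble reproducible while still giving each member different initial weights.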
3. Active Learning
Query-by-Committee (QbC) for candidate selection based on force deviation
RMSD-based duplicate filtering for diverse training data
Automated DFT labelling and model retraining
fparam support for universal models (e.g., DPA-3)
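Query-by-Committee selection can be illustrated with a short, dependency-free sketch. It assumes the commonly used definition of force deviation (the largest per-atom standard deviation of the force vector across the committee) and assumes that `f_min_dev` and `f_max_dev` act as lower and upper selection bounds; SPARC's exact formula, units, and category names are not specified in this guide, so the labels below are hypothetical.

```python
import math


def max_force_deviation(forces):
    """forces[m][i] is the (fx, fy, fz) force on atom i predicted by model m.

    Returns the largest per-atom standard deviation of the force vector
    across the committee, in the same units as the input forces.
    """
    n_models = len(forces)
    n_atoms = len(forces[0])
    worst = 0.0
    for i in range(n_atoms):
        # Committee-mean force on atom i, component by component.
        mean = [sum(forces[m][i][c] for m in range(n_models)) / n_models for c in range(3)]
        # Mean squared distance of each model's force from that mean.
        var = sum(
            sum((forces[m][i][c] - mean[c]) ** 2 for c in range(3))
            for m in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(var))
    return worst


def classify_frame(forces, f_min_dev=0.05, f_max_dev=0.30):
    """Label a frame 'accurate', 'candidate', or 'failed' from its deviation."""
    dev = max_force_deviation(forces)
    if dev < f_min_dev:
        return "accurate"   # models agree; no new label needed
    if dev < f_max_dev:
        return "candidate"  # informative disagreement; send to DFT labelling
    return "failed"         # disagreement too large; likely unphysical
```

Frames labelled `candidate` are the ones that would be passed on to DFT labelling; the upper bound keeps badly extrapolated, possibly unphysical structures out of the training set.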