Active Learning
SPARC uses a Query-by-Committee (QbC) strategy to select which structures
to send for DFT labelling. It trains an ensemble of num_models models
independently; configurations on which the models disagree, within the
thresholds described below, are the most informative for improving the potential.
Force deviation thresholds
The key tuning parameters are in the model_dev block:
model_dev:
  f_min_dev: 0.1  # lower bound, in eV/Å
  f_max_dev: 0.8  # upper bound, in eV/Å
The outcome for each frame depends on its maximum atomic force deviation across the model committee:
| Deviation range | Outcome |
|---|---|
| Below f_min_dev | Model is confident; frame discarded (already well described) |
| Between bounds | Selected as candidate for DFT labelling |
| Above f_max_dev | Model has no knowledge; frame discarded (likely unphysical) |
Typical starting values are 0.05–0.1 eV/Å for f_min_dev and 0.3–0.8 eV/Å
for f_max_dev. Tighten the range in later iterations as the model matures.
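The selection rule above can be sketched in a few lines of Python. This is an illustration only, not SPARC's implementation. It assumes that the per-atom deviation is the root-mean-square spread of the committee's force vectors about their mean (a common convention in committee-based workflows such as DP-GEN), and that a frame exactly on either bound counts as a candidate. The function names and the labels "accurate", "candidate", and "failed" are hypothetical.

```python
import math


def max_force_deviation(forces):
    """Return the largest per-atom force deviation across a model committee.

    forces[k][i] is the (fx, fy, fz) force on atom i predicted by model k,
    in eV/Å. The deviation formula is an assumption, not taken from SPARC.
    """
    n_models = len(forces)
    n_atoms = len(forces[0])
    worst = 0.0
    for i in range(n_atoms):
        # Committee-mean force vector on atom i.
        mean = [
            sum(forces[k][i][c] for k in range(n_models)) / n_models
            for c in range(3)
        ]
        # Mean squared distance of each model's prediction from that mean.
        msd = sum(
            sum((forces[k][i][c] - mean[c]) ** 2 for c in range(3))
            for k in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(msd))
    return worst


def classify_frame(dev, f_min_dev=0.1, f_max_dev=0.8):
    """Map a frame's maximum force deviation to one of three outcomes."""
    if dev < f_min_dev:
        return "accurate"   # already well described, discarded
    if dev <= f_max_dev:    # inclusive bounds are an assumption
        return "candidate"  # sent for DFT labelling
    return "failed"         # likely unphysical, discarded


# Two models, one atom: the predictions differ by 0.4 eV/Å along x.
frame_forces = [[(0.0, 0.0, 0.0)], [(0.4, 0.0, 0.0)]]
label = classify_frame(max_force_deviation(frame_forces))
```

Taking the maximum over atoms, rather than an average, means that a single poorly described atom is enough to flag the whole frame.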
Number of models
mlip_setup.num_models controls committee size. A minimum of 2 is required;
4 is recommended for reliable uncertainty estimates. More models increase
training cost but improve candidate selection quality.
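A pre-flight check on the committee size might look like the sketch below. SPARC may perform its own validation; this helper is hypothetical and assumes the configuration has already been loaded into a nested dictionary that mirrors the mlip_setup.num_models key.

```python
import warnings

MIN_MODELS = 2          # required minimum stated above
RECOMMENDED_MODELS = 4  # recommended committee size stated above


def check_committee_size(config):
    """Validate mlip_setup.num_models in an already-parsed config dict."""
    num_models = config["mlip_setup"]["num_models"]
    if num_models < MIN_MODELS:
        raise ValueError(
            f"num_models={num_models}: at least {MIN_MODELS} models are "
            "needed to measure disagreement"
        )
    if num_models < RECOMMENDED_MODELS:
        # Allowed, but the uncertainty estimate will be noisier.
        warnings.warn(
            f"num_models={num_models}: {RECOMMENDED_MODELS} is recommended "
            "for reliable uncertainty estimates"
        )
    return num_models


committee_size = check_committee_size({"mlip_setup": {"num_models": 4}})
```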
Restarting an interrupted run
If the workflow is killed mid-run, set:
active_learning: true
learning_restart: true
latest_model: "iter_000005/01.train/training_1/frozen_model_1.pb"
SPARC will skip already-completed iterations and resume from the checkpoint.
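To find a value for latest_model, a small helper can scan the run directory for the highest-numbered iteration that contains a frozen model. This is a sketch under assumptions: the directory pattern is inferred from the example path above, the helper name is hypothetical, and it does not check whether training in that iteration actually finished.

```python
import re
from pathlib import Path


def latest_frozen_model(run_dir):
    """Return a frozen model path from the newest iteration, or None.

    Assumes the layout iter_<N>/01.train/training_<k>/frozen_model_<k>.pb,
    inferred from the example path; adjust the pattern if yours differs.
    """
    run_dir = Path(run_dir)
    by_iter = {}
    pattern = "iter_*/01.train/training_*/frozen_model_*.pb"
    for path in run_dir.glob(pattern):
        # parts[-4] is the iter_<N> directory for paths matching the pattern.
        match = re.fullmatch(r"iter_(\d+)", path.parts[-4])
        if match:
            by_iter.setdefault(int(match.group(1)), []).append(path)
    if not by_iter:
        return None
    # Within the newest iteration, pick the first model in sorted order.
    newest = sorted(by_iter[max(by_iter)])
    return newest[0].relative_to(run_dir).as_posix()
```

Calling latest_frozen_model(".") from the run directory returns a relative path in the same form as the latest_model example, or None if no frozen model is found.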
Restart exploration
To start the next ML-MD run from a specific saved frame (e.g., a candidate structure) rather than from the initial structure, use:
mlip_setup:
  restart_exploration: true
  restart_frame: "candidates"  # use last saved candidate frame
This is useful for directed exploration of transition-state regions.