Fine-Tuning vs. Training From Scratch

SPARC supports two MLIP training strategies. Choose based on how much DFT data you have and whether a suitable pre-trained foundation model exists.

Training from scratch (default)

SPARC trains num_models DeePMD models from random initialisation on the accumulated dataset, so all weights are learned solely from your DFT data.

Use this when:

  • No relevant pre-trained model exists for your chemical system

  • You have a large DFT dataset (typically > 500 frames)

  • You need full control over the model architecture

Configuration: leave finetune.enabled: false (the default) and configure mlip_setup as usual.

Fine-tuning a DeePMD universal model

Set finetune.enabled: true to initialise from a pre-trained DeePMD foundation model (DPA-1, DPA-2, or DPA-3). This requires DeePMD-kit v3 with the PyTorch backend.
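Because fine-tuning depends on DeePMD-kit v3 with the PyTorch backend, it can help to check the environment before enabling it. The following is a minimal pre-flight sketch, not part of SPARC. It assumes DeePMD-kit was installed under the distribution name deepmd-kit and treats an importable torch module as a stand-in for "PyTorch backend available". The function names are hypothetical.

```python
import importlib.util
import re
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading major version, e.g. 3 for '3.0.2' or '3.1.0a0'."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group())


def finetune_supported(deepmd_version: str, has_torch: bool) -> bool:
    """Fine-tuning needs DeePMD-kit v3 or later plus the PyTorch backend."""
    return major_version(deepmd_version) >= 3 and has_torch


def check_environment() -> bool:
    """Hypothetical pre-flight check of the current Python environment."""
    try:
        # Assumption: DeePMD-kit is registered under the distribution name "deepmd-kit".
        dp_version = version("deepmd-kit")
    except PackageNotFoundError:
        return False
    # Assumption: an importable torch module stands in for "PyTorch backend available".
    has_torch = importlib.util.find_spec("torch") is not None
    return finetune_supported(dp_version, has_torch)


if __name__ == "__main__":
    print("fine-tuning supported:", check_environment())
```

A TensorFlow-only v2.x install fails this check, which matches the advice to train from scratch in that case.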

Fine-tuning typically converges with far fewer DFT calculations than training from scratch — often 50–200 frames instead of 500+.

When to use fine-tuning

Situation                                   Recommendation
Small dataset (< 300 frames)                Fine-tune; it converges with fewer frames
Inorganic materials                         Fine-tune; the DPA-3 Omat24 branch is a good starting point
Organic / reactive systems                  Fine-tune from the DPA-3 Organic_Reactions branch
Novel system, no related pre-trained model  Train from scratch
Using TF/TF2 DeePMD (v2.x)                  Train from scratch (fine-tuning requires v3+)
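The decision table can be read as a short rule chain. The sketch below encodes it as a hypothetical helper; it is not a SPARC function, and the category labels ("inorganic", "organic") and the rule ordering are illustrative assumptions. The version check comes first because it is a hard requirement, whereas the dataset-size threshold only changes how strongly fine-tuning is preferred.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical system categories mapped to the DPA-3 branches named in the table.
DPA3_BRANCHES = {
    "inorganic": "Omat24",
    "organic": "Organic_Reactions",  # organic / reactive systems
}


@dataclass(frozen=True)
class Recommendation:
    strategy: str            # "finetune" or "scratch"
    branch: Optional[str]    # DPA-3 branch to start from, if fine-tuning
    reason: str


def recommend(n_frames: int, system: str, deepmd_major: int) -> Recommendation:
    """Apply the decision table to one system (illustrative, not a SPARC API)."""
    # Fine-tuning requires DeePMD-kit v3+; TF/TF2 v2.x users train from scratch.
    if deepmd_major < 3:
        return Recommendation("scratch", None, "fine-tuning requires DeePMD-kit v3+")
    branch = DPA3_BRANCHES.get(system)
    # No related pre-trained model: train from scratch.
    if branch is None:
        return Recommendation("scratch", None, "no related pre-trained model")
    # A relevant branch exists: fine-tune, most clearly worthwhile for small datasets.
    if n_frames < 300:
        reason = "small dataset: fine-tuning converges faster"
    else:
        reason = f"pre-trained DPA-3 {branch} branch is a good starting point"
    return Recommendation("finetune", branch, reason)


if __name__ == "__main__":
    print(recommend(120, "inorganic", 3))
    print(recommend(120, "inorganic", 2))
```

For example, an inorganic system with 120 frames on DeePMD-kit v3 is routed to the Omat24 branch, while the same system on v2.x falls back to training from scratch.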

For full configuration options, see Fine-Tuning Universal Models.