.. _workflow_overview:

Workflow Overview
=================

SPARC implements an active learning loop for machine learning interatomic
potential (MLIP) development. Each iteration follows three stages that map
directly to sub-directories created inside ``iter_xxxxxx/``:

.. code-block:: text

   iter_000000/
   ├── 00.dft/      ← Stage 1: DFT / AIMD labelling
   ├── 01.train/    ← Stage 2: MLIP training
   └── 02.dpmd/     ← Stage 3: ML-MD + Query-by-Committee

.. image:: ../../_static/images/sparc_flowchart.png
   :alt: SPARC workflow diagram
   :width: 700px
   :align: center

Stage 1 — DFT / AIMD (``00.dft``)
-----------------------------------

Ab initio molecular dynamics (AIMD) or single-point DFT calculations are run
to label new candidate structures. This stage is controlled by the
``aimd_setup`` and ``dft_calculator`` sections of ``input.yaml``.

Set ``aimd_setup.steps: 0`` to skip AIMD entirely and jump straight to
training. In this case, place pre-existing trajectory data as ``AseMD.traj``
(ASE trajectory format) inside the ``00.dft/`` directory of the current
iteration before running.

Stage 2 — MLIP Training (``01.train``)
----------------------------------------

``num_models`` independent MLIP models are trained on the accumulated dataset.
Each model is placed in its own ``training_x/`` sub-directory. Controlled by
``mlip_setup.training`` and ``mlip_setup.input_file``.

For fine-tuning from a pre-trained foundation model instead of training from
scratch, see :doc:`fine_tuning`.

Stage 3 — ML-MD + Query-by-Committee (``02.dpmd``)
----------------------------------------------------

ML-driven molecular dynamics explores configuration space using the trained
committee of models. The force deviation across models (``model_dev_*.out``)
identifies uncertain structures as candidates for DFT relabelling in the next
iteration. Controlled by the ``mlip_setup.MdSimulation`` block.

How sections in ``input.yaml`` map to stages
---------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - ``input.yaml`` section
     - Controls
   * - ``general``
     - Input structure file(s)
   * - ``dft_calculator``
     - DFT engine and template for Stage 1
   * - ``aimd_setup``
     - AIMD run in Stage 1
   * - ``mlip_setup``
     - Training (Stage 2) and ML-MD (Stage 3)
   * - ``finetune``
     - Optional fine-tuning instead of from-scratch training (Stage 2)
   * - ``active_learning``
     - Loop control: iterations, deviation thresholds
   * - ``distance_metrics``
     - Optional geometry sanity checks during ML-MD
   * - ``output``
     - Custom output filenames

Loop termination
----------------

The loop runs for ``active_learning.iteration`` cycles. It also stops early if
no candidate structures are found in a given cycle (the model has converged
for the sampled region of phase space).

To resume an interrupted run, set ``learning_restart: true`` and supply
``latest_model`` pointing to the last frozen model checkpoint.
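
The termination rule can be sketched in a few lines of Python. This is an
illustration only, not SPARC's actual implementation: ``run_active_learning``
and ``find_candidates`` are hypothetical names, and ``find_candidates`` stands
in for one full cycle (Stages 1 to 3) that returns the structures flagged for
DFT relabelling.

.. code-block:: python

   from typing import Callable, List, Tuple


   def run_active_learning(
       max_iterations: int,
       find_candidates: Callable[[int], List[str]],
   ) -> Tuple[int, str]:
       """Run at most ``max_iterations`` cycles; stop early on an empty cycle.

       Returns the number of cycles executed and the reason the loop stopped.
       """
       for iteration in range(max_iterations):
           # Hypothetical stand-in for one cycle: DFT labelling, training, ML-MD.
           candidates = find_candidates(iteration)
           if not candidates:
               # No uncertain structures were found in this cycle, so the
               # model is treated as converged for the sampled region.
               return iteration + 1, "no candidates"
       return max_iterations, "iteration limit"


   # Toy data: candidates are found in cycles 0 and 1, none from cycle 2 on.
   toy_candidates = {0: ["s1", "s2"], 1: ["s3"]}
   cycles, reason = run_active_learning(5, lambda i: toy_candidates.get(i, []))
   print(cycles, reason)

In this toy run the loop ends after the third cycle, well before the limit of
five, because that cycle yields no candidates.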