.. _active_learning_guide:

Active Learning
===============

SPARC uses a **Query-by-Committee (QbC)** strategy to select which structures
to send for DFT labelling. An ensemble of ``num_models`` independently trained
models is used; configurations where the models *disagree* are the most
informative for improving the potential.

Force deviation thresholds
--------------------------

The key tuning parameters are in the ``model_dev`` block:

.. code-block:: yaml

   model_dev:
     f_min_dev: 0.1    # eV/Å, lower bound
     f_max_dev: 0.8    # eV/Å, upper bound

The outcome for each frame depends on its maximum atomic force deviation
across the model committee:

.. list-table::
   :header-rows: 1
   :widths: 35 65

   * - Deviation range
     - Outcome
   * - Below ``f_min_dev``
     - Model is confident; frame discarded (already well described)
   * - Between bounds
     - **Selected as candidate** for DFT labelling
   * - Above ``f_max_dev``
     - Model has no knowledge; frame discarded (likely unphysical)

Typical starting values: ``f_min_dev: 0.05–0.1`` eV/Å,
``f_max_dev: 0.3–0.8`` eV/Å. Tighten the range in later iterations as the
model matures.

Number of models
----------------

``mlip_setup.num_models`` controls committee size. A minimum of 2 is required;
4 is recommended for reliable uncertainty estimates. More models increase
training cost but improve candidate selection quality.

Restarting an interrupted run
-----------------------------

If the workflow is killed mid-run, set:

.. code-block:: yaml

   active_learning: true
   learning_restart: true
   latest_model: "iter_000005/01.train/training_1/frozen_model_1.pb"

SPARC will skip already-completed iterations and resume from the checkpoint.

Restart exploration
-------------------

To start the next ML-MD from a specific saved frame (e.g., a candidate
structure) rather than the initial structure, use:

.. code-block:: yaml

   mlip_setup:
     restart_exploration: true
     restart_frame: "candidates"   # use last saved candidate frame

This is useful for directed exploration of transition-state regions.
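
As an illustration of the ``f_min_dev`` / ``f_max_dev`` selection rule from the force deviation thresholds section, the following is a minimal sketch and not SPARC's actual implementation. It assumes the per-atom deviation is the root-mean-square spread of the committee's force vectors around their mean (a common convention for model deviation), that a frame's score is the maximum over atoms, and that values exactly on a bound count as candidates. The function names and the ``"accurate"`` / ``"candidate"`` / ``"failed"`` labels are hypothetical.

```python
import math


def max_force_deviation(forces):
    """Return the largest per-atom force deviation across a model committee.

    forces[k][i] is the (fx, fy, fz) force on atom i predicted by model k.
    Assumed definition: RMS distance of each model's force from the
    committee mean, evaluated per atom, then maximised over atoms.
    """
    n_models = len(forces)
    n_atoms = len(forces[0])
    worst = 0.0
    for i in range(n_atoms):
        # Committee-mean force vector on atom i.
        mean = [
            sum(forces[k][i][c] for k in range(n_models)) / n_models
            for c in range(3)
        ]
        # Mean squared distance of each model's force from that mean.
        var = sum(
            sum((forces[k][i][c] - mean[c]) ** 2 for c in range(3))
            for k in range(n_models)
        ) / n_models
        worst = max(worst, math.sqrt(var))
    return worst


def classify_frame(dev, f_min_dev=0.1, f_max_dev=0.8):
    """Map a frame's maximum force deviation (eV/Å) to an outcome label."""
    if dev < f_min_dev:
        return "accurate"   # committee agrees; frame discarded
    if dev > f_max_dev:
        return "failed"     # committee disagrees wildly; frame discarded
    return "candidate"      # informative; send for DFT labelling


# Two models, two atoms: the models disagree only on the first atom.
committee = [
    [(1.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
    [(1.4, 0.0, 0.0), (0.0, 0.5, 0.0)],
]
label = classify_frame(max_force_deviation(committee))
```

Narrowing the window in later iterations amounts to passing a smaller ``f_max_dev`` (and possibly a smaller ``f_min_dev``) to ``classify_frame``.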