Temperature dependent force constants calculation using pypolymlp and symfc#

This is an experimental feature, and its usage may change occasionally.

Note

This feature is supported through the phonopy-load command but not the phonopy command.

With the --pypolymlp option, phonopy interfaces with pypolymlp, a polynomial machine learning potential (MLP) code, to train and evaluate MLPs. This feature aims to reduce the computational cost of anharmonic force constant calculations by using MLPs as an intermediate layer that efficiently represents atomic interactions. An example is provided in example/KCl-SSCHA.

The training process involves using a dataset consisting of supercell displacements, forces, and energies. The trained MLPs are then employed to compute forces for supercells with specific displacements.

For further details on combining phonopy calculations with pypolymlp, refer to A. Togo and A. Seko, J. Chem. Phys. 160, 211001 (2024) [doi] [arxiv].

In the following sections, the polynomial MLPs are used to perform stochastic self-consistent harmonic approximation (SSCHA) calculations, which yield temperature dependent force constants. For details on SSCHA, refer to L. Monacelli et al., J. Phys.: Condens. Matter 33, 363001 (2021) and A. van Roekeghem et al., Comput. Phys. Commun. 263, 107945 (2021). Technically, the computational procedure introduced here is equivalent to the approach of the latter paper.

Citation of pypolymlp#

“Tutorial: Systematic development of polynomial machine learning potentials for elemental and alloy systems”, A. Seko, J. Appl. Phys. 133, 011101 (2023) [doi].

@article{pypolymlp,
  author = {Seko, Atsuto},
  title = "{Tutorial: Systematic development of polynomial machine learning potentials for elemental and alloy systems}",
  journal = {J. Appl. Phys.},
  volume = {133},
  number = {1},
  pages = {011101},
  year = {2023},
  month = {01},
}

Citation of symfc#

“Projector-based efficient estimation of force constants”, A. Seko and A. Togo, Phys. Rev. B, 110, 214302 (2024) [doi] [arxiv].

@article{PhysRevB.110.214302,
  title = {Projector-based efficient estimation of force constants},
  author = {Seko, Atsuto and Togo, Atsushi},
  journal = {Phys. Rev. B},
  volume = {110},
  issue = {21},
  pages = {214302},
  numpages = {18},
  year = {2024},
  month = {Dec},
}

Requirements#

  • pypolymlp >= 0.4.6

    For Linux (x86-64), a compiled package of pypolymlp can be installed via conda-forge (recommended). Otherwise, pypolymlp can be installed from source.

  • symfc >= 1.1.7

How to calculate#

Workflow#

  1. Generate random displacements in supercells. Use the --rd option.

  2. Calculate the corresponding forces and energies in the supercells. The VASP interface is recommended because the --sp option is supported for it.

  3. Prepare a dataset composed of displacements, forces, and energies in supercells. The dataset must be stored in a phonopy-yaml-like file, e.g., phonopy_params.yaml. Use the -f and --sp options together.

  4. Develop MLPs. By default, 90% and 10% of the dataset are used for training and testing, respectively. At this step, phonopy.pmlp is saved.

  5. Generate displacements in supercells, either systematic or random.

  6. Evaluate MLPs for forces of the supercells generated in step 5.

  7. Calculate force constants from the displacement-force dataset obtained in steps 5 and 6.

  8. Calculate temperature dependent force constants.

Steps 4-7 are executed by running phonopy-load with the --pypolymlp option.

Steps 1-3: Dataset preparation#

To train MLPs with pypolymlp, phonopy requires the following supercell data:

  • Displacements

  • Forces

  • Total energies

These data must be stored in a phonopy.yaml-like file.

The supercells with displacements are generated by

% phonopy --pa auto --rd 1000 -c POSCAR-unitcell --dim 2 2 2 --amin 0.03 --amax 1.5
        _
  _ __ | |__   ___  _ __   ___   _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ | '_ \| | | |
 | |_) | | | | (_) | | | | (_) || |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___(_) .__/ \__, |
 |_|                            |_|    |___/
                                      2.31.1

Compiled with OpenMP support (max 10 threads).
Python version 3.12.6
Spglib version 2.5.0

Crystal structure was read from "POSCAR-unitcell".
Unit of length: angstrom
Displacements creation mode
  Number of supercells with random displacements: 1000
  Min displacement distance: 0.03
  Max displacement distance: 1.5
Settings:
  Supercell: [2 2 2]
  Primitive matrix (Auto):
    [0.  0.5 0.5]
    [0.5 0.  0.5]
    [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Number of symmetry operations in supercell: 1536
Use -v option to watch primitive cell, unit cell, and supercell structures.

"phonopy_disp.yaml" and supercells have been created.

Summary of calculation was written in "phonopy_disp.yaml".
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

Forces and energies are then calculated for the generated supercells. Here, the VASP code is assumed. Once the calculations are complete, the data (forces and energies) can be extracted using the following command:

% phonopy --sp -f vasprun_xmls/vasprun-{001..120}.xml

This command extracts the necessary data and stores it in the phonopy_params.yaml file. For more details, refer to the description of the --sp option. Currently, supercell energy extraction from calculator outputs is supported only with the VASP interface.
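The brace expansion {001..120} in the command above is a shell feature (bash, zsh). Where it is unavailable, the same file list can be built explicitly. The following is a minimal sketch, assuming the zero-padded naming vasprun_xmls/vasprun-NNN.xml used above; the helper name vasprun_paths is hypothetical and not part of phonopy.

```python
import shlex


def vasprun_paths(first, last, directory="vasprun_xmls", width=3):
    """Return zero-padded vasprun file paths, mimicking {001..120}."""
    return [
        f"{directory}/vasprun-{i:0{width}d}.xml" for i in range(first, last + 1)
    ]


# Same arguments as: phonopy --sp -f vasprun_xmls/vasprun-{001..120}.xml
command = ["phonopy", "--sp", "-f", *vasprun_paths(1, 120)]

# Show only the beginning of the (long) command line.
print(shlex.join(command[:5]), "...")
```

The resulting list can be passed to subprocess.run(command, check=True) to run phonopy from a script.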

Steps 4-7: Force constants calculation (systematic displacements in step 5)#

After developing MLPs, displacements are generated systematically considering crystal symmetry.

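Once a dataset file such as phonopy_params.yaml is available (the example below uses phonopy_mlpsscha_params_KCl-120.yaml.xz), phonopy-load is executed with the --pypolymlp option: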

% phonopy-load phonopy_mlpsscha_params_KCl-120.yaml.xz --pypolymlp --mlp-params="ntrain=100, ntest=20"
        _
  _ __ | |__   ___  _ __   ___   _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ | '_ \| | | |
 | |_) | | | | (_) | | | | (_) || |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___(_) .__/ \__, |
 |_|                            |_|    |___/
                                      2.31.2

Compiled with OpenMP support (max 10 threads).
Running in phonopy.load mode.
Python version 3.12.3
Spglib version 2.5.0

Crystal structure was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Unit of length: angstrom
Settings:
  Supercell: [2 2 2]
  Primitive matrix:
    [0.  0.5 0.5]
    [0.5 0.  0.5]
    [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Number of symmetry operations in supercell: 1536
Use -v option to watch primitive cell, unit cell, and supercell structures.

NAC parameters were read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Displacement-force dataset was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
----------------------------- pypolymlp start ------------------------------
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Parameters:
  cutoff: 8.0
  model_type: 3
  max_p: 2
  gtinv_order: 3
  gtinv_maxl: (8, 8)
  gaussian_params1: (1.0, 1.0, 1)
  gaussian_params2: (0.0, 7.0, 10)
  ntrain: 100
  ntest: 20
Developing MLPs by pypolymlp...
Regression: cholesky decomposition ...
- alpha: 0.001
- alpha: 0.01
- alpha: 0.1
- alpha: 1.0
- alpha: 10.0
Clear training X.T @ X
Calculate X.T @ X for test data
Clear test X.T @ X
Regression: model selection ...
- alpha = 1.000e-03 : rmse (train, test) = 0.02432 0.23669
- alpha = 1.000e-02 : rmse (train, test) = 0.03613 0.16766
- alpha = 1.000e-01 : rmse (train, test) = 0.07193 0.22140
- alpha = 1.000e+00 : rmse (train, test) = 0.11563 0.26042
- alpha = 1.000e+01 : rmse (train, test) = 0.19375 0.31767
MLPs were written into "phonopy.pmlp"
------------------------------ pypolymlp end -------------------------------
Generate displacements
  Displacement distance: 0.001
Evaluate forces in 4 supercells by pypolymlp
-------------------------------- Symfc start -------------------------------
Symfc is a force constants calculator. See the following paper:
A. Seko and A. Togo, arXiv:2403.03588.
Symfc is developed at https://github.com/symfc/symfc.
Computing [2] order force constants.
Increase log-level to watch detailed symfc log.
--------------------------------- Symfc end --------------------------------
Max drift of force constants: -0.000000 (yy) -0.000000 (yy)

----------------------------------------------------------------------------
 One of the following run modes may be specified for phonon calculations.
 - Mesh sampling (MESH, --mesh)
 - Q-points (QPOINTS, --qpoints)
 - Band structure (BAND, --band)
 - Animation (ANIME, --anime)
 - Modulation (MODULATION, --modulation)
 - Characters of Irreps (IRREPS, --irreps)
 - Create displacements (CREATE_DISPLACEMENTS, -d)
----------------------------------------------------------------------------
Dataset generated using MMLPs was written in "phonopy_mlp_eval_dataset.yaml".

Summary of calculation was written in "phonopy.yaml".
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

Information about the development of MLPs using pypolymlp is provided between the pypolymlp start and pypolymlp end sections. The polynomial MLPs are saved in the phonopy.pmlp file. This file is automatically searched in subsequent phonopy executions with the --pypolymlp option and reused.
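The "model selection" lines in the log above list the training and test RMSE for each alpha value tried in the regression. When the log has been saved to a file, these lines can be compared with a short script. The sketch below assumes the line format shown in the log above and assumes that the alpha with the smallest test RMSE is the one of interest; it only reads the log and does not call pypolymlp.

```python
import re

# Lines copied from the model-selection part of the log above.
LOG = """\
- alpha = 1.000e-03 : rmse (train, test) = 0.02432 0.23669
- alpha = 1.000e-02 : rmse (train, test) = 0.03613 0.16766
- alpha = 1.000e-01 : rmse (train, test) = 0.07193 0.22140
- alpha = 1.000e+00 : rmse (train, test) = 0.11563 0.26042
- alpha = 1.000e+01 : rmse (train, test) = 0.19375 0.31767
"""

PATTERN = re.compile(r"- alpha = (\S+) : rmse \(train, test\) = (\S+) (\S+)")


def parse_model_selection(text):
    """Return (alpha, rmse_train, rmse_test) tuples found in the log text."""
    return [tuple(float(v) for v in m.groups()) for m in PATTERN.finditer(text)]


rows = parse_model_selection(LOG)

# Assumption: the preferred alpha is the one with the smallest test RMSE.
best = min(rows, key=lambda row: row[2])
print(f"alpha={best[0]:g}  train={best[1]:.5f}  test={best[2]:.5f}")
```

For the log above, this picks alpha = 0.01, whose test RMSE (0.16766) is the smallest of the five. A test RMSE much larger than the training RMSE may indicate that more training data would help.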

After the MLPs are developed, systematic displacements, such as those involving the displacement of one or two atoms in supercells, are generated with a displacement distance of 0.001 Angstrom. The forces for these supercells are then evaluated using pypolymlp. Both the generated displacements and the corresponding forces are stored in the phonopy_mlp_eval_dataset.yaml file. The calculated force constants may be referred to as the harmonic force constants.

Step 8: Temperature dependent force constants calculation#

After the previous step, the phonopy.pmlp file exists in the current directory and is read automatically in the next calculation with the --pypolymlp option. Provided the developed MLPs predict forces well at relatively large displacements, temperature dependent force constants can be calculated with the --sscha NUMBER_OF_ITERATIONS option.

% phonopy-load phonopy_mlpsscha_params_KCl-120.yaml.xz --pypolymlp --sscha 10 --rd-temperature 300 --rd 1000
        _
  _ __ | |__   ___  _ __   ___   _ __  _   _
 | '_ \| '_ \ / _ \| '_ \ / _ \ | '_ \| | | |
 | |_) | | | | (_) | | | | (_) || |_) | |_| |
 | .__/|_| |_|\___/|_| |_|\___(_) .__/ \__, |
 |_|                            |_|    |___/
                                      2.31.2

Compiled with OpenMP support (max 10 threads).
Running in phonopy.load mode.
Python version 3.12.3
Spglib version 2.5.0

Crystal structure was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Unit of length: angstrom
Displacements creation mode
  Number of supercells with random displacements: 1000
  Temperatuere to generate random displacements: 300.0
Settings:
  Supercell: [2 2 2]
  Primitive matrix:
    [0.  0.5 0.5]
    [0.5 0.  0.5]
    [0.5 0.5 0. ]
Spacegroup: Fm-3m (225)
Number of symmetry operations in supercell: 1536
Use -v option to watch primitive cell, unit cell, and supercell structures.

NAC parameters were read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
Displacement-force dataset was read from "phonopy_mlpsscha_params_KCl-120.yaml.xz".
----------------------------- pypolymlp start ------------------------------
Pypolymlp is a generator of polynomial machine learning potentials.
Please cite the paper: A. Seko, J. Appl. Phys. 133, 011101 (2023).
Pypolymlp is developed at https://github.com/sekocha/pypolymlp.
Load MLPs from "phonopy.pmlp".
------------------------------ pypolymlp end -------------------------------
------------------------------- SSCHA start --------------------------------
[ SSCHA initialization (rd=0.001, n_supercells=1000) ]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -102.786 meV
Initial force constants are written into "phonopy_sscha_fc_0.yaml.xz".

[ SSCHA iteration 1 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.010, 0.084] ****
  [0.084, 0.159] ******************
  [0.159, 0.234] ***************************
  [0.234, 0.309] *************************
  [0.309, 0.383] ****************
  [0.383, 0.458] *******
  [0.458, 0.533] **
  [0.533, 0.608] *
  [0.608, 0.683]
  [0.683, 0.757]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.205 meV
SSCHA force constants are written into "phonopy_sscha_fc_1.yaml.xz".

[ SSCHA iteration 2 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.007, 0.094] *****
  [0.094, 0.181] ***********************
  [0.181, 0.267] ********************************
  [0.267, 0.354] ************************
  [0.354, 0.440] ***********
  [0.440, 0.527] ***
  [0.527, 0.613] *
  [0.613, 0.700]
  [0.700, 0.787]
  [0.787, 0.873]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.221 meV
SSCHA force constants are written into "phonopy_sscha_fc_2.yaml.xz".

...

[ SSCHA iteration 10 / 10 ]
Generate 1000 supercells with displacements at 300.0 K
  [0.006, 0.089] *****
  [0.089, 0.171] *********************
  [0.171, 0.253] *******************************
  [0.253, 0.336] *************************
  [0.336, 0.418] *************
  [0.418, 0.500] ****
  [0.500, 0.583] *
  [0.583, 0.665]
  [0.665, 0.747]
  [0.747, 0.830]
Evaluate MLP to obtain forces using pypolymlp
Calculate force constants using symfc
SSCHA free energy: -98.122 meV
SSCHA force constants are written into "phonopy_sscha_fc_10.yaml.xz".

-------------------------------- SSCHA end ---------------------------------

----------------------------------------------------------------------------
 One of the following run modes may be specified for phonon calculations.
 - Mesh sampling (MESH, --mesh)
 - Q-points (QPOINTS, --qpoints)
 - Band structure (BAND, --band)
 - Animation (ANIME, --anime)
 - Modulation (MODULATION, --modulation)
 - Characters of Irreps (IRREPS, --irreps)
 - Create displacements (CREATE_DISPLACEMENTS, -d)
----------------------------------------------------------------------------

Summary of calculation was written in "phonopy.yaml".
                 _
   ___ _ __   __| |
  / _ \ '_ \ / _` |
 |  __/ | | | (_| |
  \___|_| |_|\__,_|

The force constants from each iteration are stored in files named phonopy_sscha_fc_NUM.yaml.xz, where NUM is the iteration step (0 for the initial force constants). With a sufficient number of SSCHA iterations and a sufficiently large set of supercells with random displacements at the given temperature, the SSCHA force constants can be determined reliably. Their convergence can be monitored through the SSCHA free energy. It can also be assessed by plotting the phonon band structures computed from the SSCHA force constants at each iteration step. For example:

% for i in {0..10}; do phonopy-load phonopy_sscha_fc_$i.yaml.xz --band auto --band-points 101; mv band.yaml band-$i.yaml; done
% phonopy-bandplot band-{0..10}.yaml --legend
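The free-energy check mentioned above can also be scripted once the phonopy-load output has been saved to a file (for example with tee). The following is a minimal sketch, assuming the "SSCHA free energy: ... meV" line format shown in the log above; the function names are hypothetical and only a shortened excerpt of the log is embedded here.

```python
import re

# Shortened excerpt of the SSCHA log shown above.
LOG = """\
[ SSCHA initialization (rd=0.001, n_supercells=1000) ]
SSCHA free energy: -102.786 meV
[ SSCHA iteration 1 / 10 ]
SSCHA free energy: -98.205 meV
[ SSCHA iteration 2 / 10 ]
SSCHA free energy: -98.221 meV
"""


def free_energies(text):
    """Return SSCHA free energies (meV) in the order they appear in the log."""
    return [float(v) for v in re.findall(r"SSCHA free energy: (\S+) meV", text)]


def successive_changes(values):
    """Return the change of each value relative to the previous one."""
    return [b - a for a, b in zip(values, values[1:])]


energies = free_energies(LOG)
for step, delta in enumerate(successive_changes(energies), start=1):
    print(f"iteration {step}: change = {delta:+.3f} meV")
```

Because the displacements are sampled randomly, the changes are not expected to vanish exactly; what matters is that they settle to small fluctuations, as between iterations 1 and 2 in the log above.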

Parameters for developing MLPs#

A few parameters can be specified using the --mlp-params option for the development of MLPs. The parameters are provided as a string, e.g.,

% phonopy-load phonopy_params.yaml --pypolymlp --mlp-params="ntrain=80, ntest=20"

Parameters are separated by commas. A brief explanation of the available parameters can be found in the docstring of PypolymlpParams, which can be displayed by

from phonopy.interface.pypolymlp import PypolymlpParams

help(PypolymlpParams)

ntrain and ntest are implemented in phonopy, while the remaining parameters are directly passed to pypolymlp. Optimizing pypolymlp parameters can be difficult, both in terms of achieving accuracy and managing the computational resources required. The current default parameters are likely suitable for systems up to ternary compounds. For binary systems, the calculations can generally be run on standard laptop computers, but for ternary systems, around 40 GB of memory or more may be necessary.

For parameter adjustments, it is recommended to consult the pypolymlp documentation and review the relevant research papers.

ntrain and ntest#

These two parameters provide a straightforward dataset split: the first ntrain supercells in the dataset are used for training, while the last ntest supercells are reserved for testing.
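The split rule can be illustrated with plain index arithmetic. The sketch below is not phonopy's implementation; the function names are hypothetical and the indices are 0-based. It also shows how to check that the training and test sets do not share supercells, which happens when ntrain + ntest exceeds the dataset size.

```python
def split_indices(n_supercells, ntrain, ntest):
    """Return (train, test) index lists: first ntrain and last ntest supercells."""
    train = list(range(ntrain))
    test = list(range(n_supercells - ntest, n_supercells))
    return train, test


def has_overlap(train, test):
    """True if any supercell would be used for both training and testing."""
    return bool(set(train) & set(test))


# 120 supercells, as in the KCl example: 100 for training, 20 for testing.
train, test = split_indices(120, ntrain=100, ntest=20)
print(train[0], train[-1], test[0], test[-1], has_overlap(train, test))
```

With 120 supercells the two sets are disjoint (training indices 0 to 99, test indices 100 to 119). With only 100 supercells and the same parameters, the last 20 supercells would appear in both sets.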

Convergence with respect to dataset size#

In general, increasing the amount of data improves the accuracy of representing force constants. Therefore, it is recommended to check the convergence of the target property with respect to the number of supercells in the training dataset. Lattice thermal conductivity may be a convenient property to monitor when assessing convergence.

For example, after preparing an initial dataset of 120 supercells (up to 100 for training and the last 20 for testing), calculations can be performed by varying the size of the training dataset while keeping the test dataset unchanged as follows:

% phonopy-load --pypolymlp --mlp-params="ntrain=20, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-20
% phonopy-load --pypolymlp --mlp-params="ntrain=40, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-40
% phonopy-load --pypolymlp --mlp-params="ntrain=60, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-60
% phonopy-load --pypolymlp --mlp-params="ntrain=80, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-80
% phonopy-load --pypolymlp --mlp-params="ntrain=100, ntest=20" --br --mesh 40 phonopy_params.yaml | tee log-100

The computed phonon band structures are plotted against the size of the training dataset to observe the frequency convergence. If the frequencies have not converged, an additional set of supercell data (e.g., forces and energies for the next 100 supercells) should be computed and added to the dataset. With this procedure in mind, it may be convenient to generate a sufficiently large number of supercells with random displacements in advance, e.g., 1000 supercells, before starting the temperature dependent force constants calculation with pypolymlp.
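A convergence check of this kind can be expressed as a simple rule on one monitored scalar, such as a phonon frequency at a chosen q-point. The sketch below is only one possible criterion (relative change between consecutive training-set sizes); the function name, the tolerance, and the numbers are hypothetical placeholders, not computed results.

```python
def first_converged(sizes, values, rel_tol=0.01):
    """Return the first dataset size whose value differs from the previous
    one by at most rel_tol (relative change), or None if there is none."""
    for prev, size, value in zip(values, sizes[1:], values[1:]):
        if abs(value - prev) <= rel_tol * abs(prev):
            return size
    return None


# Placeholder numbers for illustration only; they are not computed results.
ntrain_list = [20, 40, 60, 80, 100]
monitored = [4.10, 4.60, 4.75, 4.78, 4.79]
print(first_converged(ntrain_list, monitored))
```

A single relative tolerance is a crude test. In practice it is safer to compare the full band structures as described above and to confirm that the property stays stable over several consecutive dataset sizes.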