3. Creating an RPMDrate input file

The format of RPMDrate input files is based on Python syntax. In fact, the RPMDrate input file is a valid Python source code, and this is used to facilitate reading of the file.

Each section is made up of one or more function calls, where parameters are specified as text strings, numbers, or objects. Note that text strings must be wrapped in either single or double quotes.

3.1. Define the potential energy surface

The RPMDrate code must be provided with an external function that computes the Born-Oppenheimer potential energy with gradients for a given geometry defined in the atomic Cartesian coordinates. In principle, this can be any Python callable object of the form

V, dVdq = get_potential(q)

where the lone parameter q is the \(3 \times N_\mathrm{atoms} \times N_\mathrm{beads}\) array of Cartesian coordinates of \(N_\mathrm{beads}\) beads of \(N_\mathrm{atoms}\) atoms, and the outputs are the corresponding potentials and forces. The input file requires the name to be get_potential. Input and output parameters should be in atomic units.

In practice, evaluating the potential energy and gradients is computationally expensive, and so the global potential energy function is likely to be prepared as an analytic function evaluated by subroutines written in another language (such as Fortran) and must therefore be exposed to Python. The process of doing this is explained in the next chapter Exposing an external potential to Python.

If your implementation of the potential function requires any initialization, you can also provide a Python function initialize_potential for this purpose. This function is assumed not to accept any parameters, and any return value is ignored. If the initialize_potential identifier is not found, the code will assume that no initialization of the potential is required.

A recommended practice is to define the potential function (and its initialization subroutine, if applicable) in a separate module PES placed in the same directory as the input file. If this is done correctly, then the input file need only import them:

from PES import get_potential, initialize_potential

If no initialization subroutine is required, then the syntax is

from PES import get_potential

3.2. Define the bimolecular reactants

The next step is to provide information about the reactants. This is done by calling the reactants() function with the following arguments, all of which are required:

  • atoms - A list of the atom symbols for all atoms in the reactant pair. RPMDrate recognizes a variety of atom symbols; the full list is available in the rpmd.element module.
  • reactant1Atoms - A list of the indices of each atom in the first reactant molecule. The indices correspond to those of the atoms list, except that the first index is one, not zero.
  • reactant2Atoms - A list of the indices of each atom in the second reactant molecule. The indices correspond to those of the atoms list, except that the first index is one, not zero.
  • Rinf - The center-of-mass distance which should be chosen to be sufficiently large as to make interaction between the reactant molecules negligible. This is specified using a 2-tuple of the form (value,units), where value is a number and units is a string with the corresponding units (of length). RPMDrate can handle a variety of input units for length, including m, angstrom, and bohr (atomic units).

The following example illustrates the structure of the reactants() function for the bimolecular reaction \(\mathrm{CH_4 + H} \rightarrow \mathrm{CH_3 + H_2}\):

reactants(
    atoms = ['H', 'C', 'H', 'H', 'H', 'H'],
    reactant1Atoms = [1,2,3,4,5],
    reactant2Atoms = [6],
    Rinf = (15,"angstrom"),
)

3.3. Define the transition state dividing surface

Next step is to define parameters of the transition state dividing surface. The easiest way to do this is to specify a transition state geometry and lists of the bonds that are forming and breaking. RPMDrate will use this information to extract the relevant transition state bond lengths to use in the transition state dividing surface. Note that the final RPMD rate coefficient is independent of the choice of the initial transition state configuration. However, it is computationally advantageous to provide a reasonable configuration so as to minimize the recrossings of the transition state dividing surface.

  • The geometry is specified as a 2-tuple of the form (value,units), where value is a \(N_\mathrm{atoms} \times 3\) matrix (list of lists) of coordinates and units is a string with the corresponding units (of length). The coordinates must be given in the same order as the list of atom symbols atoms was given in the reactants() block.
  • The formingBonds and breakingBonds are lists of 2-tuples containing the indices of the atoms involved in each forming or breaking bond. As before, the indices correspond to the list of atom symbols atoms was given in the reactants() block, and the indices start from one instead of zero.

For the \(\mathrm{CH_4 + H} \rightarrow \mathrm{CH_3 + H_2}\) reaction, the transition state can be defined as follows:

transitionState(
    geometry = (
        [[-4.68413503,   -0.43825460,   -0.07250839],
         [-5.04748906,    0.58950601,   -0.07250840],
         [-4.68411607,    1.10337961,    0.81755453],
         [-4.58404767,    1.24489401,   -1.20768359],
         [-6.13758906,    0.58951941,   -0.07250839],
         [-4.29480419,    1.61876150,   -1.94140095]],
        "angstrom",
    ),
    formingBonds = [(4,6)],
    breakingBonds = [(2,4)],
)

In our example, the H radical can abstract any of the H atoms from methane. These equivalent transition states can be added with one or more equivalentTransitionState() blocks. In these blocks you need only specify the forming and breaking bonds in an order corresponding to that of the main transitionState() block. RPMDrate will automatically copy the relevant transition state distances from the main transition state to use in each of the equivalent transition states.

In our example we must add three equivalent transition states to obtain the correct reaction-path degeneracy of four:

equivalentTransitionState(
    formingBonds=[(1,6)],
    breakingBonds=[(2,1)],
)
equivalentTransitionState(
    formingBonds=[(3,6)],
    breakingBonds=[(2,3)],
)
equivalentTransitionState(
    formingBonds=[(5,6)],
    breakingBonds=[(2,5)],
)

Some systems will have several combinations of breaking and forming bonds. In this case, you need to specify all these combinations in the formingBonds and breakingBonds lists. For instance, in the case of 3 bonds it has the following form:

formingBonds=[(A1,A2),(B1,B2),(C1,C2)],
breakingBonds=[(A3,A1),(B3,B1),(C3,C1)]

3.4. Define the thermostat

RPMDrate provides two options for temperature control of time-independent part of molecular dynamics simulations: Andersen thermostatting scheme and colored noise, generalized Langevin equation (GLE) thermostats.

3.4.1. Andersen thermostat

The Andersen thermostat is the simplest stochastic thermostat which does correctly sample the NVT ensemble. In this method, atoms at each integration step are subject to a small probability to experience collision with the heat bath, which is simulated by resampling the momenta from a Boltzmann distribution at the desired temperature. The following sets up an Andersen thermostat with an automatically-determined sampling time:

thermostat('Andersen')

The user also has an option to fix the time between collisions using the samplingTime parameter. This parameter is given as a 2-tuple of the form (value,units), where value is a number and units is a string with the corresponding units (of time). RPMDrate can handle a variety of input units for time, including s, ms, us, ns, ps, and fs. An example of fixing the sampling time is given below:

thermostat('Andersen', samplingTime=(100,'fs'))

3.4.2. GLE thermostats

The more elaborate colored-noise, generalized Langevin equation (GLE) thermostats can be used to enhance canonical sampling and to reduce the number of ring polymer beads required to achieve convergence in path integral dynamics. These thermostats require more effort to set up but also can provide much faster convergence for complex polyatomic systems. To use these thermostats, user must provide the drift matrix \(A\) in a separate text file. For example, the following sets up a GLE thermostat with the drift matrix read from the file gle_A.txt in the same directory as the input file:

thermostat('GLE', A=('gle_A.txt','s^-1')).

If non-FDT (fluctuation-dissipation theorem) dynamics is to be performed, a similar diffusion matrix \(C\) is required, containing the covariance matrix. In this case the input would look similar to

thermostat('GLE', A=('gle_A.txt','s^-1'), C=('gle_C_1000.txt','K')).

Here A and C are 2-tuples of the form (file,units), where file is the name of the file containing the corresponding matrix and units are the corresponding units of inverse time (matrix \(A\)) and energy (matrix \(C\)). RPMDrate can handle a variety of input units for inverse time, including s^-1, ps^-1, and fs^-1. The diffusion matrix \(C\) should be given in units of K (Kelvin).

The files containing the \(A\) and \(C\) matrices should be structured such that each line contains one row of the matrix, with each element in that row separated by whitespace. The hash character # can be used to indicate comment lines.

An easy way to generate the \(A\) and \(C\) matrices is to use the online form at the GLE4MD web page. On this page, select Input and then fill out the rest of the form with the values for your system. Be sure to check that the output units for the two matrices are the same as those indicated in the RPMDrate input file.

3.5. Generate initial umbrella configurations

At this point the reactive system is successfully set up. In the remainder of the RPMDrate input file we specify the parameters of the various steps involved in computing the ring polymer rate coefficient.

The first step is to generate the initial (classical, i.e. with number of beads set to 1) configurations required for umbrella sampling in each biasing window. This can accomplished by a brief biased sampling in each window using only one bead, using the result from the previous window as the initial configuration for the next window (starting from the initial transition state configuration). The resulting configurations can be reused in subsequent calculations (e.g., for different number of beads).

The generation of the umbrella configurations is invoked using a generateUmbrellaConfigurations() block, which accepts the following parameters:

  • dt - The evolution time step, as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).
  • evolutionTime - The total length of each trajectory, as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).
  • xi_list - A list, tuple, or array containing the centers of windows centers placed along the reaction coordinate.
  • kforce - The value of the umbrella force constant which defines the strength of the bias in the window, in atomic units.

An example of a generateUmbrellaConfigurations() block for windows evenly spaced in the interval between -0.05 and 1.05 along the reaction coordinate with a step of 0.01 is given below:

generateUmbrellaConfigurations(
    dt = (0.0001,"ps"),
    evolutionTime = (5,"ps"),
    xi_list = numpy.arange(-0.05, 1.05, 0.01),
    kforce = 0.1 * T,
)

3.6. Define the umbrella integration parameters

The next step is to define the umbrella integration parameters. As the RPMDrate input file is a Python script, this is very easily done with a small bit of Python code. The idea is to create a list of the windows, each one a Window() object with the following parameters:

  • xi - The position of the center of the window.
  • kforce - The value of the umbrella force constant which defines the strength of the bias in the window, in atomic units.
  • trajectories - The number of biased sampling trajectories to run for the window. These trajectores can be run in parallel (see Running RPMDrate).
  • equilibrationTime - The equilibration period for each trajectory before sampling is initiated. This value is specified as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).
  • evolutionTime - The total length of each trajectory, as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).

An example of code that generates a set of umbrella integration windows is given below:

windows = []
for xi in numpy.arange(-0.05, 1.05, 0.01):
    window = Window(xi=xi, kforce=0.1*T, trajectories=100, equilibrationTime=(20,"ps"),
    evolutionTime=(100,"ps"))
    windows.append(window)

Note that you do not necessarily have to space out the windows evenly, or to use the same force constant or sampling times in each window. For example, the following code uses a larger reaction coordinate step size and weaker bias at lower values of xi:

windows = []
for xi in numpy.arange(-0.05, 0.25, 0.05):
    window = Window(xi=xi, kforce=0.03*T, trajectories=100, equilibrationTime=(20,"ps"),
    evolutionTime=(100,"ps"))
    windows.append(window)
for xi in numpy.arange(-0.25, 0.55, 0.03):
    window = Window(xi=xi, kforce=0.05*T, trajectories=100, equilibrationTime=(20,"ps"),
    evolutionTime=(100,"ps"))
    windows.append(window)
for xi in numpy.arange( 0.55, 1.15, 0.01):
    window = Window(xi=xi, kforce=0.1*T, trajectories=100, equilibrationTime=(20,"ps"),
    evolutionTime=(100,"ps"))
    windows.append(window)

3.7. Biased sampling

Once we have defined the biasing windows, we can conduct the biased sampling via a conductUmbrellaSampling() block, which accepts the following parameters:

  • dt - The evolution time step, as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).
  • windows - The list of umbrella integration windows, as generated in the previous section.

An example of a conductUmbrellaSampling() block is given below:

conductUmbrellaSampling(
    dt = (0.0001,"ps"),
    windows = windows,
)

For each window, the mean \(\overline{\xi }\) and variance \(\sigma_\xi^2\) of the reaction coordinate are accumulated and stored after running each trajectory.

3.8. Calculate the potential of mean force

When biased sampling is complete in all windows, the potential of mean force (free energy) along the reaction coordinate is calculated using umbrella integration. This is done using a computePotentialOfMeanForce() block with the following parameters all of which are required:

  • windows - The list of biasing windows used to conduct the biased sampling.
  • xi_min - The value of the reaction coordinate to use as a lower bound for the umbrella integration range.
  • xi_max - The value of the reaction coordinate to use as an upper bound for the umbrella integration range.
  • bins - The number of bins for the umbrella integration algorithm to use. Umbrella integration is performed using the trapezoidal rule between the bins.

An example of a computePotentialOfMeanForce() block is given below:

computePotentialOfMeanForce(windows=windows, xi_min=-0.02, xi_max=1.02, bins=5000)

The potential of mean force can also be calculated for the current state. The code will automatically find all previous files with the results of the biased sampling for given temperature and number of beads. In this case, conductUmbrellaSampling block and windows=windows should be removed:

computePotentialOfMeanForce(xi_min=-0.02, xi_max=1.02, bins=5000)

3.9. Calculate the transmission coefficient

The final step before calculating the ring polymer rate coefficient is the transmission coefficient (recrossing factor). Having obtained the optimal value of the reaction coordinate from the potential of mean force profile, it is evaluated by performing a constrained RPMD simulation in the presence of a thermostat (parent trajectory) to generate a series of configurations at the transition state dividing surface. For each of these constrained configurations, a series of recrossing trajectories are evolved forward in time without a thermostat or a dividing surface constraint (child trajectories), and the fraction of these trajectories resulting in products is determined.

The computation of the recrossing factor is invoked using a computeRecrossingFactor() block, which accepts the following parameters:

  • dt - The evolution time step, as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).
  • equilibrationTime - The equilibration period in the parent (constrained trajectory). This value is specified as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).
  • childTrajectories - The total number of child trajectories to run.
  • childSamplingTime - The length of the parent trajectory between each set of child trajectories, as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).
  • childrenPerSampling - The number of child trajectores to sample after each evolution of the parent trajectory.
  • childEvolutionTime - The length of each child trajectory, as a 2-tuple of the form (value,units), where value is a number and units are the corresponding units (of time).

An example of a computeRecrossingFactor() block is given below:

computeRecrossingFactor(
    dt = (0.0001,"ps"),
    equilibrationTime = (20,"ps"),
    childTrajectories = 100000,
    childSamplingTime = (2,"ps"),
    childrenPerSampling = 100,
    childEvolutionTime = (0.05,"ps"),
)

By default, the computeRecrossingFactor() block calculates the recrossing factor at the position of the reaction coordinate which corresponds to the maximum along the potential of mean force profile. This choice of the reaction coordinate can be overridden by specifying an additional parameter xi_current with the value of the reaction coordinate you wish to use for the recrossing factor calculation. An example of this is given below:

computeRecrossingFactor(
    dt = (0.0001,"ps"),
    equilibrationTime = (20,"ps"),
    childTrajectories = 100000,
    childSamplingTime = (2,"ps"),
    childrenPerSampling = 100,
    childEvolutionTime = (0.05,"ps"),
    xi_current = 1.016,
)

3.10. Compute the ring polymer rate coefficient

The final portion of the calculation is the determination of the bimolecular rate coefficient using the Bennett-Chandler method. This is done using a computeRateCoefficient() block, as shown below:

computeRateCoefficient()

The computeRateCoefficient() block does not require any parameters. It uses the position of the reaction coordinate from the most recently computed recrossing factor to compute the value of the centroid density quantum transition state theory (QTST) rate coefficient. If you have only done the potential of mean force calculation, you can still use this block to generate an estimate of the rate coefficient; in this case, a value of unity will be assumed for the recrossing factor and the centroid density QTST rate coefficient will be calculated at the top of the barrier.