`library(powerHaDeX)`

The package `powerHaDeX`

is a tool providing a solution for the need for simulation and analysis of the data coming from HDX-MS experiments. This article is dedicated to the detailed description of the simulation of HDX-MS data.

The simulation was created in such a way that it imitates the real experiment as much as it is possible. To obtain the deuterium uptake at a single time point, one has to compute a mass spectrum first. Thus, the simulation consists of two main parts. The first one concerns the biological section of calculations, that is simulation of theoretical spectra, and the second concerns calculating deuteration curves and multiplying the data for the need for power simulation.

- Simulate theoretical spectrum
- Add noise to the theoretical spectrum
- Making replicated theoretical spectra
- Calculating deuterium uptake curves from replicated spectra
- Making replicated deuterium uptake curves (power estimation)

The diagram of the simulation algorithm can be seen in the figure below:

The parameters sequence, charge, state, and time point define, simplistically, a unique mass spectrum. Thus, as shown in the above figure, having two different biological states and \(T\) different exposure times we obtain \((2 \times T)\) unique spectra. From now on such a set will identify one **replication**. This part of the simulation is described in detail in the section **Simulate theoretical spectrum**. The next step of the simulation is reproducing the number of replications. First, we add noise to the obtained spectra and transform each of them into deuterium uptake. The deuterium uptake at different time points forms the deuterium uptake curve. The set of replicated deuterium uptake curves for different states is the target of the simulation. The procedure is described in **Add noise to the theoretical spectrum**.

The algorithm that is used to simulate theoretical spectra is based on Zhong-Yuan Kan’s implementation in MATLAB. The original version of codes is located in the repository HX-MS-Simulations (as at 29.06.2020). In the `powerHaDeX`

package can be found the Kan’s algorithm re-implemented in R (using Rcpp) and the accelerated implementation (that uses Markov chains’ properties). Moreover, the package `powerHaDeX`

allows the user to simulate spectra for more than one exposure time for both (Rcpp and Markov) approaches.

The simulation code is in `simulate_theoretical_spectra.R`

file. It uses the following internal functions:

- get_approx_isotopic_distribution (
`isotopic_distribution.R`

), - get_exchange_rates (
`exchange_rates.R`

), - get_exchange_probabilities (
`simulate_HD_exchange.R`

), - get_HD_matrices_using_markov / get_HD_matrices (
`simulate_HD_exchange.R`

),

The simulation is run for a given peptide. The parameters are:

`sequence`

- sequence of the peptide, single string. Necessary to run the simulation.`charge`

- a vector of charges of the peptide ion. If NULL, one value is sampled from vector 2:6. Default value: NULL.`times`

- vector of time points of the measurement. Default value: vector of 60 and 600 [seconds]`protection_factor`

- protection factor - vector of values or one value for all amides. Default value: 1 (indicates that the exchange rate is equal to the intrinsic exchange rate)`pH`

- pH. Default value: 7.5`temperature`

- temperature. Default value: 15 [Celcius]`n_molecules`

- number of peptide molecules. Default value: 100`time_step_const`

- time step constant. Default value: 1. Value that indicates the length of the time step of the simulation. The bigger the time step, the fewer time points are simulated (the fewer iterations in case of Zhong-Yuan Kan’s approach).`if_corr`

- logical, pH correction indicator. If FALSE, then the value of pH is equal to pD. If there is correction, the pD = pH + 0.4.(Connelly et al. 1993)`min_probability`

- minimal accepted probability. Default value: \(10^{-4}\), threshold for accepting the values of the isotopic probabilities (intensity).`use_markov`

- usage of Markov chains indicator. Default value: TRUE, as it fastens the calculation

The graphical illustration of the algorithm can be seen in the figure below. As said, the procedure that we describe below is based on Kan’s implementation. However, the section of the simulation called **Calculate matrices of simulated empirical distribution** appears in this thesis in two versions. The second version is the standard approach of Zhong-Yuan Kan, additionally implemented in Rcpp (called when use_markov = FALSE), and the first version is the algorithm accelerated for the needs for this thesis.

Initially parameter `sequence`

is split into a character vector. From now, `sequence`

will indicate the vector of amino acids and n will denote its length. If a single value of `protection_factor`

is provided, the vector of that value of the length of the `sequence`

is created. If no value of `charge`

is provided, it is assigned randomly from the range from 2 to 6.

The function `get_approx_isotopic_distribution`

is responsible for calculation of approximate isotopic distribution. The function takes two arguments: `sequence`

and `min_probability`

.

Additional file `sysdata.RDA`

contains the maximal possible occurrence of the isotopes \(C_{13}\), \(N_{15}\), \(O_{18}\), \(S_{34}\) (carbon, nitrogen, oxygen, and sulfur, respectively) in the respective amino acids, and their masses. Based on that, the maximal possible number of molecules of the isotopes in the sequence is calculated. Peptide mass is the sum of the masses of amino acids and \(H_2O\) mass - as it includes the N terminal group (H) and C terminal group (OH).

Next, the distributions of mentioned isotopes are calculated under the assumption that the occurrence of \(i^{th}\) considered isotope has a binomial distribution \(B(n_i, p_i)\) with parameters \(n_i\) (maximal possible occurrence in the sequence) and \(p_i\) (natural richness - possibility of occurrence in the universe). For the oxygen molecules, we have to take into account that oxygen occurs in a diatomic molecule. Calculation of the sulfur distribution takes into account its rare occurrence.

The final isotopic distribution is computed as a convolution of obtained distributions with probabilities greater than `min_probability`

. It is a vector of probabilities of possible monoisotopic masses. The number of exchangeable amides is computed as the length of the sequence, reduced by the number of prolines located on the third of further position.

The function `get_approx_isotopic_distribution`

returns the mass of the peptide (`peptide_mass`

), final distribution (`isotopic_distribution`

) of the isotopes, number of significant probabilities minus one (`max_ND`

) and number of exchangeable amino acids (`n_exchangeable`

).

Calculation of the exchange rates is one of the crucial parts of the simulation because the exchange rates (kinetic constants) affect the exchange probabilities. The calculations are done via `get_exchange_rates`

function. The adjunctive calculations take place in the internal functions from `exchange_rates`

file. Namely:

`get_F_const`

uses two parameters:`temp_kelvin`

(initial`temperature`

parameter in celcius converted into temperature in kelvin) and`gas_constant`

which is \(1.9858775\). The value \(Q_7\) is a factor adjusted to the measured temperature and is calculated based on the equation below. \[Q_7 = \dfrac{1}{gas\_constant}\left( \dfrac{1}{temp\_kelvin} - \dfrac{1}{293}\right)\] The output of`get_F_const`

is a list of \[Ft_X = \exp\lbrace-Q_7 \cdot Ea_X\rbrace\] where \(X \in \lbrace A, B, W\rbrace\) (A - Acid, B - Base, W - Water) identifies the condition and \(Ea_A, Ea_B, Ea_W\) are tabular values for energies of activation.`get_poly_const`

calculates constants depending on provided`mol_type`

and type of exchange -`exchange`

(\(HD\) or \(DH\), default to \(HD\)). If the`mol_type`

is \(poly\) then the constant for condition \(X\) is calculated as follows \[K_{Xpoly} = \dfrac{10^{K_{X_{exp}}}}{60}\] where \(K_{X_{exp}}\) are tabular constants depending on the type of exchange. If`mol_type`

is \(oligo\), the constants are scaled accordingly: \[K_{Xoligo} = K_{Xpoly} \cdot c_X\] where \(c_X\) are constants \(2.34\), \(1.35\) and \(1.585\) for acid, base and water, respectively. The function returns a list of \(Ka\), \(Kb\) and \(Kw\) corresponding to the chosen`mol_type`

.`get_pkc`

calculates supplementary constants for aspartic acid (Asp), glutamic acid (Glu) and histidine (His). Values for mentioned amino acids are pH and temperature dependent, in contrary to the rest amino acids with fixed values. This function needs`temp_kelvin`

value,`gas_constant`

and the type of`exchange`

(default: \(HD\)).Depending on provided

`exchange`

direction tabular values of exponents \(E_{const}\) are assigned. For Asp, Glu and His the \(pKc\) constants are calculated based on the following formula: \[pKc = -\log_{10}\left(10^{E_{const}} \cdot \exp\Bigg\lbrace \dfrac{-E_a}{gas\_constant}\left( \dfrac{1}{temp\_kelvin} - \dfrac{1}{278}\right)\Bigg\rbrace\right)\] where \(E_a\) are energies off activation for given amino acid and the chosen`exchange`

direction. The function returns a list of \(asp\), \(glu\) and \(his\) (\(pKc\) values corresponding to amino acids).`get_exchange_constants`

uses the parameters`pH`

,`pkc_consts`

calculated by the function`get_pkc`

and`k_consts`

. The output of`get_exchange_constants`

is a matrix named`constants`

of tabular and calculated constants (specifically for \(Asp\), \(Glu\), \(His\), \(C-Term\) and \(NHMe\)) based on the equation: \[const = \log_{10}\left( \dfrac{10^{c_1 - pH} + 10^{c_2 - pKc}}{10^{-pKc} + 10^{-pH}}\right)\] where \(c_1\) and \(c_2\) are constants for protonated or deprotonated amide and \(pKc\) are constants obtained by the function`get_pkc`

for respective acids.

The function `get_exchange_rates`

requires the parameters `sequence`

, `exchange`

(default: \(HD\)), `pH`

(default: \(9\)), `temperature`

(celcius, default: \(15\)), `mol_type`

(default: \(poly)\) and correction factor `if_corr`

(\(1\) or 0, default: 0). The correction of \(pH\) is taken into account for calculation of \(pD\): \[pD = pH + 0.4\cdot if\_corr.\]

Next, the provided temperature is converted into K and the internal functions `get_F_const`

, `get_poly_const`

and `get_pkc`

are evaluated.

Using the obtained matrix of constants and provided `sequence`

\(F_a\) and \(F_b\) are calculated for each amino acid in the sequence, concerning the previous and next amino acid. For the amino acids in the middle of the sequence, the following formula is used: \[F_x = 10^{ previous_x + current_x}\] where \(x\) is either \(a\) or \(b\), and \(previous_x\) is the acid/base factor for a previous amino acid in the sequence, and \(current_x\) for the amino acid it is calculated for. If the amino acid is next to the C- or N-term, the term-effect is taken into account.

Finally, the exchange rate \(k_c\) for the amino acid is the sum of catalysis constants for acid, base and water (Conelly et al, 1993). Namely: \[k_c = k_{acid} + k_{base} + k_{water}\] where \[k_{acid} = F_a \cdot K_a + D \cdot F_{ta},\] \[k_{base} = F_b \cdot K_b + OD \cdot F_{tb},\] \[k_{water} = F_b \cdot K_w \cdot F_{tw}. \] where \(D\) and \(OD\) indicates deuterium and deuterium oxide concentration, \(F_a\) and \(F_b\) are values calculated specifically for given amino acid, as described before, \(K_a\) and \(K_b\) are values computed by `get_poly_constants`

function, based on the mole type, \(F_{ta}\), \(F_{tb}\) and \(F_{tw}\) are values computed by `get_F_const`

function.

The obtained exchange rates are stored in vector `kcHD`

or `kcDH`

according to the exchange direction. They are used to calculate the exchange probabilities thus both `kcHD`

and `kcDH`

are necessary as we take the possibility of back-exchange into account.

To obtain time points (not to be confused with time points of the measurement) of possible exchanges, two parameters are necessary: `time_step_constant`

and exchange rates computed by `get_exchange_rates`

function (described before).

To calculate the size of a single time step, the maximal possible exchange rate \(k_{max}\) is needed:

\[k_{max} = \max\lbrace\max\lbrace k_{cDH} \rbrace, \max\lbrace k_{cHD}\rbrace\rbrace\] where \(k_{cHD}\) and \(k_{cDH}\) are the vectors of exchange rates for each amino acid from the sequence in the appropriate direction. The size of a time step is a quotient of `time_step_constant`

and maximal exchange rate \(k_{max}\)

\[\Delta t = \dfrac{time\_step\_constant}{k_{max}}.\]

The time points of possible exchanges are arithmetic sequences from 0 to chosen time points of measurement `times`

by \(\Delta t\). Let us note here for the first time that in the case when the obtained \(\Delta t\) is very small, we encounter a problem with allocating a large vector of time sequence in R. The length of this sequence is the number of time points of possible exchanges. The vector of the number of steps between provided times of measurement is constructed for the Markov chain approach.

Exchange probabilities are calculated via `get_exchange_probabilities`

function. The essential parameters are `protection_factor`

, exchange rates \(k_{cHD}\) and \(k_{cDH}\) and the size of a single time step \(\Delta t\). The process is defined as a series of steps from the time sequence, and each step depends on the state in the previous one. Therefore, the probabilities of changing the state are conditional probabilities - probabilities of particular state in \((k+1)^{th}\) step given particular state in \(k^{th}\) step. For simplification the following notation is used:

\[P(X_{k+1} = H ~|~ X_{k} = H) = P\left( H \rightarrow H \right),\] \[P(X_{k+1} = D ~|~ X_{k} = H) = P\left( H \rightarrow D \right),\] \[P(X_{k+1} = H ~|~ X_{k} = D) = P\left( D \rightarrow H \right),\] \[P(X_{k+1} = D ~|~ X_{k} = D) = P\left( D \rightarrow D \right),\]

where \(X_k\) is the random variable describing the state of an isotope of a hydrogen (\(H\) or \(D\)) at the moment \(k\) for any \(k\) from the considered time sequence. The probabilities for the \(i^{th}\) amino acid, \(i = 1, \ldots, n\), are calculated by equations below.

\[P_i\left( H \rightarrow D \right) = 1 - \text{exp}\left(\dfrac{-kcHD_i \cdot \Delta t}{Pf}\right)\],

\[P_i\left( D \rightarrow H \right) = 1 - \text{exp}\left(\dfrac{-kcDH_i \cdot \Delta t}{Pf}\right)\] where \(Pf\) denotes the protection factor. Moreover, let us note that under the assumptions mentioned before, the following equations are satisfied: \[P_i\left( H \rightarrow H \right) = 1 - P_i\left( H \rightarrow D \right),\] \[P_i\left( D \rightarrow D \right) = 1 - P_i\left( D \rightarrow H \right).\]

The last two equations describe the probabilities of staying in the same state.

The output of `get_exchange_probabilities`

function is a list of four vectors: vector `HD`

for probabilities \(P_i\left( H \rightarrow D \right)\), vector `DH`

for probabilities \(P_i\left( D \rightarrow H \right)\), vector `HH`

for probabilities \(P_i\left( H \rightarrow H \right)\) and vector `DD`

for probabilities \(P_i\left( D \rightarrow D \right)\).

Matrices of simulated exchange can be obtained in two ways. The parameter `markov`

indicates whether Markov chain or standard approach is used. Due to the evaluation time, the recommended method is the Markov chain approach. The following items \(1\) and \(2\) are the descriptions of the improved and the standard version, respectively. Each of them is an interchangeable part of the simulation.

The improved version of the simulation uses

`get_HD_matrices_using_markov`

function. It requires`sequence`

, transition probabilities (described in the previous section), the vector of numbers of steps between given time points, and the number of peptide molecules`n_molecules`

. The improvement is based on the observation that the considered process is a Markov chain with transition probabilities \(P(H \rightarrow D)\) and \(P(D \rightarrow H)\), and states \(H\) and \(D\) (in the code denoted by \(0's\) for hydrogens and \(1's\) for deuters). Based on the notation provided before, the transition matrix for the \(i^{th}\) amino acid, \(i = 1, \ldots, n\), is created as follows: \[\mathbb{P}_i = \begin{pmatrix} P_i\left( H \rightarrow H \right) & P_i\left( H \rightarrow D \right) \\ P_i\left( D \rightarrow H \right) & P_i\left( D \rightarrow D \right) \end{pmatrix} = \begin{pmatrix} 1 - P_i\left( H \rightarrow D \right) & P_i\left( H \rightarrow D \right) \\ 1 - P_i\left( D \rightarrow H \right) & P_i\left( D \rightarrow H \right) \end{pmatrix}.\] The initial state of the process is \(\mu_i^T(0) = \begin{pmatrix} 1& 0\end{pmatrix}\) for all \(i\), as it starts with hydrogens. Since \(\mathbb{P}_i^k\) is equal to the \(k\)-step transition probability matrix, the probability distribution of the Markov chain at a time \(k\) can be found as described in equation below. \[\mu_i^T(k) = \mu_i^T(0) \mathbb{P}_i^k = \begin{pmatrix} P_i(X_k = H)\\ P_i(X_k = D)\end{pmatrix} ^ T.\] Using obtained distribution, states \(H\) or \(D\) are sampled for \(m\) peptide molecules (`n_molecules`

) for each of \(i = 1, \ldots, n\) amino acids and stored in a \(m \times n\) dimensional matrix for each of the time points of the measurement given by`times`

.The standard version of simulation of the matrices uses

`get_HD_matrices`

function. The parameters are`sequence`

, transition probabilities, time steps sequence (from 0 to the largest value of`times`

by \(\Delta t\)) and the time points of the measurement`times`

. Since this version uses the vector of time steps, there occurs an error in case when \(\Delta t\) is small (i.e. for long sequences or small`time_step_constant`

).The provided time sequence is split into intervals between time points of the measurement (

`times`

) to make the simulation faster in case of more than one time point of the measurement. In such a situation, an exchange to the next time point is obtained as an exchange to the previous time point and its continuation. The simulation starts with the creation of a matrix of dimension \(m \times n\), where \(m\) denotes the number of peptide molecules (`n_molecules`

) and \(n\) denotes the number of amino acids. Each entry of this matrix corresponds to a single exchange site. Within the matrix, 0 denotes hydrogen and \(1\) denotes deuterium. The matrix is initialized with 0s or \(1\)s, depending on the direction of the exchange.The matrix is initialized with zeroes, as a considered process it starts with hydrogens. At each time point in the time sequence:

- change \(1\) to 0 with probability \(P(H \rightarrow D)\) in each entry of the matrix from the previous iteration,
- change \(1\) to 0 with probability \(P(D \rightarrow H)\) in each entry of the matrix from the previous iteration.

Such a matrix is obtained for each of the time points of the measurement given by

`times`

.

Next, no matter which version of the simulation was chosen, the columns of the matrices respective to prolines or the first two amides are set to zeros (implying hydrogens). The exchange of the first two amino acids is not measurable due to the impact of back-exchange (Connelly et al. 1993) and proline does not have exchangeable hydrogen. Matrices are stored in a list of matrices (`HD_matrices`

) - each matrix for the respective time point of the measurement `times`

.

Isotopic probabilities are calculated via `get_iso_probs_deut`

. The function uses the parameters described before: `HD_matrices`

, `n_exchangeable`

, `isotopic_distribution`

, `max_ND`

, `peptide_mass`

and the essential parameters: `charge`

, `pH`

and `times`

.

The following calculations are performed for each time point of the measurement from `times`

.

Firstly, an observed distribution of ions is computed using the internal function

`get_observed_iso_dist`

. It takes parameters: HD matrix (element of the list of`HD_matrices`

),`isotopic_distribution`

and`n_exchangeable`

.The exchangeable-hydrogen distribution describing the increase of the mass is obtained from the exchange matrix from

`HD_matrices`

and the number of exchangeable hydrogens`n_exchangeable`

. First, the numbers of hydrogens exchanged in each molecule are calculated as sums of rows of the exchange matrix. Next, a vector of the counts is built and stored in a vector of length`n_exchangeable`

plus one (for the lack of exchange). To obtain fractions counts are averaged.The isotopic probabilities for the deuterated peptide are computed as the convolution of obtained distribution and the isotopic distribution for the undeuterated peptide (

`isotopic_distribution`

) as it is a sum of those variables (Claesen and Burzykowski 2017, Deconvolution-Based Approach). Namely \[M_{\Delta} = M_{mol} - M_{mon}\] where \(M_{mol}\) is the random variable describing molecular mass, \(M_{mon}\) is the random variable describing monoisotopic mass and \(M_{\Delta}\) is the random variable describing the increase in mass.

The function `get_observed_iso_dist`

returns a vector of observed isotopic distribution (`observed_dist`

) and the observed peaks for mass spectrum (observed isotopic probabilities).

The \(m/z\) values for the deuterated peptide are calculated using the `peptide_mass`

, `charge`

and constants - deuteron mass (\(1.00628\)) and proton mass (\(1.007276\)). Starting from the \(m/z\) value for the monoisotopic peak, the difference between the mass of deuteron and proton divided by the charge of the peptide ion is added.

The output of the function `get_iso_probs_deut`

is a data frame with the variables: Exposure (time point of measurement consistent with given HD matrix), Mz - \(m/z\) values, Intensity - isotopic probabilities and PH - pH.

To the calculated results is added a minimal exchange control - for time point 0. The \(m/z\) values are obtained as a ratio of the `peptide_mass`

magnified by proton mass and the peptide charge. The distribution of undeuterated peptide from the previous section is the intensities vector.

The output of the function `simulate_theoretical_spectra`

is a data table of:

`Exposure`

- time point of a measurement,`Mz`

- mass-to-charge ratio,`Intensity`

- isotopic probabilities larger than`min_probability`

(the smaller ones are zeroes)

and the variables provided by user

`Sequence`

,`PF`

,`Charge`

,`PH`

.

As in the table below

```
<- simulate_theoretical_spectra("LVRKDLQN", protection_factor = 10, charge = 2, times = 60)
spectrum1 head(spectrum1)
#> Exposure PH Intensity Mz Charge Sequence PF
#> 1: 0 7.5 0.5801405135 493.2931 2 LVRKDLQN 10
#> 2: 0 7.5 0.3031692943 493.2931 2 LVRKDLQN 10
#> 3: 0 7.5 0.0919265899 493.2931 2 LVRKDLQN 10
#> 4: 0 7.5 0.0204531418 493.2931 2 LVRKDLQN 10
#> 5: 0 7.5 0.0036689790 493.2931 2 LVRKDLQN 10
#> 6: 0 7.5 0.0005576524 493.2931 2 LVRKDLQN 10
```

The data obtained by the procedure described above is called the theoretical spectrum. Let us note that a single spectrum is observed for a particular set of parameters such as `Exposure`

, `Sequence`

, `PF`

, `Charge`

, `PH`

. Thus, using the option of providing a vector of time points and/or charges, one obtains several spectra.

Such a spectrum is the first step of the simulation shown in the first diagram when simulated for two biological states and different time points.

To imitate the data from the HDX experiments one may use the function named `get_noisy_deuteration_curves`

from the file `noisy_spectra.R`

that creates spectra with technical replicates and noise. The function takes the following parameters:

`theoretical_spectra`

- a data table of theoretical spectra created by the function`simulate_theoretical_spectra`

and grouped by`sequence`

,`pH`

and`PF`

. See**Simulate theoretical spectrum**section.`compare_pairs`

- logical. If FALSE, all groups (defined by the protection factor) will be considered jointly. If TRUE (default), each protection factor will be considered together with the protection factor given by the`reference`

parameter.`reference`

- protection factor that will be used for comparison to other protection factors. The function accepts either NA (for comparing all protection factors jointly), a number (for comparing with reference value of protection factor) or`"all"`

(for pairwise comparisons of all the possible combinations). Default NA.`n_replicates`

- number of technical replicates to create.`n_experiments`

- number of replicates of curves for power calculation.`mass_deviations`

- mass deviation in parts per million. Either a single number (then the error at each time point will be the same) or a vector of the same length as the number of unique time points in the experiment. The error will be sampled from normal distribution with standard deviation equal to \[\dfrac{mass\_deviations\cdot undeuterated\_mass}{10^6}\]`intensity_deviations`

- optional, standard deviations of random noise that will be added to intensities. Either a single number (then the error at each time point will be the same) or a vector of the same length as the number of unique time points in the experiment. The error will be simulated with these standard deviations. Default \(NULL\).`per_run_deviations`

- optional, standard deviations of random noise that will be added to deuteration curves. Either a single number (then the error at each time point will be the same) or a vector of the same length as the number of unique time points in the experiment. The error will be sampled from a normal distribution with these standard deviations. Default \(NULL\).`relative`

- logical, if TRUE (default), each deuteration curve will start at 0 (relative mass will be returned). Default TRUE.

The function `get_noisy_deuteration_curves`

uses the following internal functions: - `get_undeuterated_mass`

- `get_spectra_list`

- `add_noise_to_spectra`

- `get_deuteration_curves_from_spectra`

- `add_noise_to_curves`

- `fix_columns_names_types`

The procedure of adding noise is the continuation of the simulation algorithm which can be seen in the first diagram. The detailed description is shown below.

Firstly, the undeuterated mass is obtained via `get_undeuterated_mass`

function. The necessary parameter is the table of theoretical spectra created by the function `simulate_theoretical_spectra`

. The following formula is used for the calculation: \[undeuterated\_mass = charge \cdot (Mz - p_m)\] where `Mz`

is mass-to-charge ratio for the peaks from the provided theoretical spectrum and \(p_m\) is the mass of proton. The output of `get_undeuterated_mass`

is the calculated mass value for the first peak (the smallest one) as it is usually the peak corresponding to the monoisotopic mass.

The parameters `theoretical_spectra`

(produced by simulate theoretical spectra function), `compare_pairs`

(as described before) and `reference`

(protection factor, as described before) are passed to the `get_spectra_list`

function. If the parameter `compare_pairs`

is FALSE then all the provided protection factors will be considered jointly. If `compare_pairs`

is TRUE, then the parameter `reference`

is necessary (a single number or `"all"`

). Then the data is split via the supplementary function `get_paired_spectra`

into data tables of spectra with paired biological states (the reference protection factor and the protection factor of interest if provided, or all the possible pairs if `reference`

equals `"all"`

). The function `get_spectra_list`

returns a list of data tables containing spectra - for paired states or all states.

Noisy spectra are obtained from the function `add_noise_to_spectra`

. It takes the parameters `spectra`

(the output of `get_spectra_list`

), `n_replicates`

, `n_experiments`

, `undeuterated_mass`

(calculated by `get_undeuterated_mass`

), `mass_deviations`

and `intensity_deviations`

(those parameters are described before).

Firstly, an internal function `make_experimental_design`

is used to prepare technical replicates from the provided spectra. Next, for each replicate the noise is added by the function `make_noisy_spectra`

. It uses the supplementary function that adds the noise to a single replicate - `add_noise_to_one_spectrum`

. Standard deviations are calculated by the formula:

\[sd = \dfrac{mass\_deviations \cdot undeuterated\_mass}{10^6}\]

There are two functions that add calculated deviations to a spectrum:

`add_noise_to_one_timepoint`

- the noise is sampled from a normal distribution with mean 0 and standard deviation equal to \(sd\) and added to`Mz`

values for the time points (based on supplied parameters for time points).`add_noise_to_intensities`

- if the`intensity_deviations`

were provided, then noise is sampled from a normal distribution with mean 0 and standard deviation equal to those deviations and added to`Intensity`

.

Having noisy spectra, the noisy deuterium uptake curves for power estimation can be obtained. Firstly, deuteration curves are calculated from spectra. It is done by the function `get_deuteration_curves_from_spectra`

. The function uses the `spectra`

returned by the function `add_noise_to_spectra`

. From each replicated spectrum one deuteration curve is obtained via `get_deuteration_curve_single_spectrum`

function. The centroid mass value from spectrum is calculated as a weighted mean from peaks based on the formula

\[m = \dfrac{1}{N}\sum_{k= 1}^N Intensity_k \cdot Charge \cdot (Mz_k -p_m).\]

Next, the function `add_noise_to_curves`

uses the supplementary function named `add_noise_to_single_curve`

in order to make noisy deuteration curves (noise added to noisy uptake curves for power calculation purposes). It takes the parameters `curves`

obtained from `get_deuteration_curves_from_spectra`

and the parameters provided at the beginning of the simulation: `per_run_deviations`

and `relative`

. The noise is added by `add_noise`

function. There, noise is sampled from normal distribution with mean 0 and standard deviation equal to `per_run_deviations`

and added to the `Mass`

values unless they are not zeroes (there is no noise at the time 0).

When `relative`

equals TRUE, the relative mass is returned in deuteration curves (as a form of uptake). It is calculated via the function `get_relative_mass`

.

The function `get_noisy_deuteration_curves`

returns a list (for paired states when `compare_pairs`

is TRUE) of lists (repetitions of experiment for power calculations) of data tables of the variables:

`sequence`

- provided amino acid sequence`Rep`

- technical replication`State`

- provided protection factor (the theoretical - in practice unknown - state of the protein)`Exposure`

`Mass`

- mass or deuterium uptake when`relative`

is TRUE.`charge`

`Experimental_state`

- the biological state (from the viewpoint of the experimenter) when`compare_pairs`

is TRUE

as shown below

```
<- simulate_theoretical_spectra("LVRKDLQN", protection_factor = 1000, charge = 2, times = 60)
spectrum2 <- rbind(spectrum1, spectrum2)
paired_spectra
get_noisy_deuteration_curves(paired_spectra,
n_replicates = 4,
n_experiments = 1,
reference = "all")[[1]][[1]]
#> Sequence Rep State Exposure Mass Charge Experimental_state
#> 1: LVRKDLQN 1 10 0 0.0000000 2 A
#> 2: LVRKDLQN 1 10 60 1.3993987 2 A
#> 3: LVRKDLQN 1 1000 0 0.0000000 2 B
#> 4: LVRKDLQN 1 1000 60 0.8536504 2 B
#> 5: LVRKDLQN 2 10 0 0.0000000 2 A
#> 6: LVRKDLQN 2 10 60 1.3448108 2 A
#> 7: LVRKDLQN 2 1000 0 0.0000000 2 B
#> 8: LVRKDLQN 2 1000 60 0.9516104 2 B
#> 9: LVRKDLQN 3 10 0 0.0000000 2 A
#> 10: LVRKDLQN 3 10 60 1.4010245 2 A
#> 11: LVRKDLQN 3 1000 0 0.0000000 2 B
#> 12: LVRKDLQN 3 1000 60 0.9390501 2 B
#> 13: LVRKDLQN 4 10 0 0.0000000 2 A
#> 14: LVRKDLQN 4 10 60 1.4790697 2 A
#> 15: LVRKDLQN 4 1000 0 0.0000000 2 B
#> 16: LVRKDLQN 4 1000 60 1.0767921 2 B
```