Skip to content

2.3 Error propagation

We developed a method for estimating error propagation through PCR amplification. Specifically, we track all types of nucleotide-level errors introduced during library preparation, along with their error types. This information is recorded in a table that provides a per-sequence breakdown of error status. We had a specific analysis here using the method.

The table contains four columns: the first three correspond to substitution, deletion, and insertion errors for mut, del, and ins, respectively. The fourth column indicates whether the sequence acquired an error during PCR amplification. At the time of initial library preparation, this fourth column is set to False for all entries. With each round of PCR amplification, the table is updated accordingly. Each row represents a single sequence.

Before sampling, the table looks like:

      mut    del    ins  pcr_err_mark
0   False  False  False         False
1   False  False  False         False
2   False  False  False         False
3   False  False  False         False
4   False  False  False         False
5   False  False  False         False
6   False  False  False         False
7   False   True  False         False
8   False  False  False         False
9   False  False  False         False
10  False  False  False         False
11  False  False  False         False
12  False  False  False         False
13  False  False  False         False
14  False  False  False         False
15  False   True  False         False
16  False  False  False         False
17  False  False  False         False
18  False  False  False         False
19  False  False  False         False
20  False   True  False         False
21  False  False  False         False
22  False  False  False         False
23  False  False  False         False
24  False   True  False         False
25  False  False  False         False
26  False  False  False         False
27  False  False  False         False
28  False  False  False         False
29  False  False  False         False
30  False  False  False         False
31  False  False  False         False
32  False  False  False         False
33  False  False  False         False
34  False  False  False         False
35  False  False   True         False
36  False  False  False         False
37  False  False   True         False
38  False  False  False         False
39  False   True  False         False
40  False  False  False         False
41  False  False  False         False
42  False  False  False         False
43  False  False   True         False
44  False  False  False         False
45  False  False  False         False
46  False  False   True         False
47  False  False  False         False
48  False  False  False         False
49  False  False  False         False

Before sampling, the table looks like:

         mut    del    ins  pcr_err_mark
0       True  False  False         False
1      False  False  False         False
2      False  False  False         False
3      False  False  False         False
4      False   True  False         False
...      ...    ...    ...           ...
23809  False  False  False          True
23810  False  False  False         False
23811  False   True  False         False
23812  False  False   True         False
23813  False  False  False         False

[23814 rows x 4 columns]

After sampling (500 reads as sequencing depth), the table looks like:

       mut    del    ins  pcr_err_mark
0    False  False  False         False
1    False  False  False         False
2    False  False  False         False
3    False   True  False         False
4    False  False  False         False
..     ...    ...    ...           ...
495  False   True  False         False
496  False  False  False         False
497  False  False  False         False
498  False   True  False         False
499  False  False  False         False

[500 rows x 4 columns]

Error Rate Estimation: Bead Synthesis and PCR Amplification

To quantify error propagation in sequencing, we define the following two key metrics based on per-read error flags. We first define a few symbols.


  • Let \( N \) be the total number of reads.
  • For each read \( i \) (where \( i = 1, 2, \dots, N \))

  • \( d_i = 1 \) if a deletion error occurred during bead synthesis, otherwise \( d_i = 0 \).

  • \( p_i = 1 \) if an error occurred during PCR amplification, otherwise \( p_i = 0 \).

We calculate the proportion of reads with errors from either bead synthesis or PCR amplification.

\[ u_{\text{bead ∪ PCR}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{f}(d_i = 1 \ \text{or} \ p_i = 1) \]

where \( \mathbb{f}(\cdot) \) is a function that is equal to 1 if the condition is true, and 0 otherwise.


We calculate the proportion of reads that have a deletion error during bead synthesis, regardless of PCR status.

\[ u_{\text{bead}} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(d_i = 1) \]

These equations were directly computed.

err_union = df['del'] | df['pcr_err_mark']

p_bead_or_pcr = err_union[err_union].shape[0] / err_union.shape[0]

p_bead = df[df['del']].shape[0] / df.shape[0]