4. Adjacency
Within connected component, subcomponents can be further compartmentalised. The number of subcomponents with its central node (the highest count) one edge away from the rest of nodes is treated as the deduplicated molecule count, leading to the Adjacency
method in UMI-tools1.

The Adjacency
method draws on the information about the count of UMIs. In the example graph, the counts of the 7 unique UMIs are given in node_val_sorted
.
1 2 3 4 5 6 7 8 9 |
|
It also requires the connected components as input because those are what the Adjacency
method classifies further into connected subcomponents.
Python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
|
console
30/07/2024 15:44:54 logger: ======>connected_components: ['A', 'B', 'C', 'D', 'E', 'F', 'G']
30/07/2024 15:44:54 logger: =========>subcomponents: {'A': ['B']}
30/07/2024 15:44:54 logger: =========>subcomponents: {'A': ['B', 'C']}
30/07/2024 15:44:54 logger: =========>subcomponents: {'A': ['B', 'C', 'D']}
30/07/2024 15:44:54 logger: ======>The ccurent ele popping out: A {'C', 'B', 'A', 'D'}
30/07/2024 15:44:54 logger: =========>subcomponents: {'A': ['B', 'C', 'D'], 'D': ['E']}
30/07/2024 15:44:54 logger: =========>subcomponents: {'A': ['B', 'C', 'D'], 'D': ['E', 'F']}
30/07/2024 15:44:54 logger: ======>The ccurent ele popping out: D {'F', 'E', 'B', 'C', 'A', 'D'}
30/07/2024 15:44:54 logger: =========>subcomponents: {'A': ['B', 'C', 'D'], 'D': ['A', 'E', 'F'], 'E': ['G']}
30/07/2024 15:44:54 logger: ======>The ccurent ele popping out: E {'G', 'F', 'E', 'B', 'C', 'A', 'D'}
deduplicated count:
3
deduplicated clusters:
{'cc_0': {'node_A': ['A', 'B', 'C'], 'node_D': ['D', 'F'], 'node_E': ['E', 'G']}}
deduplicated clusters decomposed:
{0: ['A', 'B', 'C'], 1: ['D', 'F'], 2: ['E', 'G']}
Deduplicated UMI count
There are 3 connected subcomponents (node_A
, node_D
, and node_E
) in connected component cc_0
and therefore, the deduplicated count of UMIs from 7 unique UMIs is 3 at the single locus.
We implemented the method based on the example 6-node UMI graph from the UMI-tools paper.
Abstract
The method aims to accurately resolve complex networks by analyzing node counts. It begins by removing the most abundant node and all nodes connected to it. If not all nodes in the network are accounted for, the next most abundant node and its neighbors are removed. This process is repeated until all nodes in the network are addressed. The total number of steps required to resolve the networks at a given locus corresponds to the estimated number of unique molecules.
Python
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 |
|
-
Smith T, Heger A, Sudbery I. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy. Genome Res. 2017 Mar;27(3):491-499. doi: 10.1101/gr.209601.116. Epub 2017 Jan 18. PMID: 28100584; PMCID: PMC5340976. ↩