
Unique Molecular Identifiers (UMIs), also known as Molecular Barcodes or Random Barcodes, are short random nucleotide sequences used to label each DNA or RNA molecule in a sample for high-throughput sequencing. These identifiers serve as molecular tags that make it possible to distinguish true variants present in the original sample from errors introduced during library preparation or sequencing.
How do UMIs work?
UMIs are typically introduced in the first steps of library preparation, before any PCR amplification. After UMI tagging, the molecules are amplified by PCR, which produces multiple copies of each molecule. During this process, each UMI remains attached to its molecule, so every copy carries the same tag as the original. The amplified molecules are then sequenced using high-throughput sequencing technologies. In the final step, bioinformatic tools process the sequencing data and cluster the reads by UMI. This makes it possible to distinguish PCR duplicates (copies of the same original molecule) from reads that come from different original molecules.
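
The clustering step described above can be illustrated with a minimal Python sketch. It assumes that each read has already been aligned and that its UMI has been extracted. The record layout, the function name `group_by_umi`, and the example values are hypothetical and are not taken from any specific tool. Reads that share both a mapping position and a UMI are treated as copies of one original molecule.

```python
from collections import defaultdict

# Hypothetical, made-up read records: mapping position plus the UMI
# that was extracted from each read.
reads = [
    {"chrom": "chr7", "pos": 55191822, "umi": "ACGTTGCA", "seq": "TTGGCCAG"},
    {"chrom": "chr7", "pos": 55191822, "umi": "ACGTTGCA", "seq": "TTGGCCAG"},
    {"chrom": "chr7", "pos": 55191822, "umi": "ACGTTGCA", "seq": "TTGGCCAG"},
    {"chrom": "chr7", "pos": 55191822, "umi": "GGATCCTA", "seq": "TTGGCCAG"},
]


def group_by_umi(reads):
    """Group reads into families keyed by (chromosome, position, UMI).

    Assumption: an exact UMI match at the same position means the reads
    are PCR copies of one original molecule.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["pos"], read["umi"])
        families[key].append(read["seq"])
    return dict(families)


families = group_by_umi(reads)
for key, seqs in families.items():
    print(key, "reads in family:", len(seqs))
print("unique molecules:", len(families))
```

In this toy input, four reads collapse into two families, so the data support two original molecules rather than four. Exact matching is a simplification: a sequencing error inside the UMI itself would split one family into two, which is why real pipelines usually allow for small mismatches between UMIs.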

The use of UMIs is especially recommended for analyses aimed at detecting low-frequency variants in DNA samples and for deep sequencing of RNA-seq libraries. Although next-generation sequencing (NGS) platforms such as Illumina offer low error rates, the errors that remain still limit the ability to confidently identify low-frequency variants. UMIs mitigate errors and quantitative biases introduced at various stages of the NGS workflow, including library preparation, PCR amplification, and sequencing itself. By uniquely labeling each molecule of interest, UMIs enable the identification and correction of errors, increasing the accuracy and reliability of downstream analysis.
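
One common way to use UMI families for error correction is to build a consensus sequence from all reads that share a UMI. The sketch below is a simplified illustration, not the method of any particular pipeline. The function name `consensus`, the 60% agreement threshold, and the example sequences are assumptions chosen for the example. It takes a per-position majority vote and writes `N` where the reads disagree too much.

```python
from collections import Counter


def consensus(seqs, min_fraction=0.6):
    """Per-position majority vote across reads from one UMI family.

    Assumptions: all reads have equal length and are already aligned to
    each other. min_fraction is an illustrative threshold; positions
    where no base reaches it are reported as 'N'.
    """
    if not seqs:
        raise ValueError("consensus() needs at least one read")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all reads in a family must have the same length")

    bases = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        base, n = counts.most_common(1)[0]
        bases.append(base if n / len(seqs) >= min_fraction else "N")
    return "".join(bases)


# Three copies of one molecule; the third read carries an error at the
# third position, which the vote removes.
family = ["ACGT", "ACGT", "ACTT"]
print(consensus(family))
```

An error that arises in a single PCR copy or a single sequencing read is outvoted by the other members of its family, whereas a true variant is present in every copy of the molecule and survives the vote.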
What are the main advantages of using UMIs?
- Error detection: UMIs facilitate detection of errors introduced during library preparation and sequencing.
- Quantification: UMIs can be used for accurate quantification of gene expression levels (RNA-seq).
- Low-frequency (1% and lower) somatic variant detection: UMIs enable distinguishing unique molecules from duplicates generated by PCR amplification, allowing confident detection and analysis of low-frequency genetic alterations.
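
The effect of PCR duplicates on quantification can be shown with a small worked example. The sketch below compares a variant allele fraction computed from raw reads with one computed from unique molecules. The function name `allele_fractions`, the UMIs, and the read counts are invented for illustration. Each UMI is assigned the allele seen in most of its reads.

```python
from collections import Counter, defaultdict


def allele_fractions(observations, variant_allele):
    """Return (read-level, molecule-level) fractions of the variant allele.

    observations: list of (umi, allele) pairs at a single genomic position.
    Assumption: one UMI corresponds to one original molecule.
    """
    raw = sum(1 for _, a in observations if a == variant_allele) / len(observations)

    # Count alleles per UMI, then assign each molecule its majority allele.
    per_umi = defaultdict(Counter)
    for umi, allele in observations:
        per_umi[umi][allele] += 1
    molecule_alleles = [c.most_common(1)[0][0] for c in per_umi.values()]
    dedup = molecule_alleles.count(variant_allele) / len(molecule_alleles)
    return raw, dedup


# Made-up data: one variant molecule (allele T) was amplified far more
# efficiently than three reference molecules (allele C).
observations = (
    [("AAAC", "T")] * 6
    + [("GGTA", "C")] * 2
    + [("CCTG", "C")]
    + [("TTGA", "C")]
)
raw, dedup = allele_fractions(observations, "T")
print(f"read-level fraction: {raw:.2f}")
print(f"molecule-level fraction: {dedup:.2f}")
```

Here the variant appears in 6 of 10 reads but in only 1 of 4 original molecules, so counting reads would overstate its frequency. The same reasoning applies to RNA-seq, where counting UMIs instead of reads gives an expression estimate that is less distorted by uneven amplification.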
UMIs in Intelliseq’s workflows
We have recently developed a customized workflow to detect cancer-associated variants from a blood sample. Within the workflow, cell-free DNA released by malignant cells is screened for the presence of SNVs, INDELs, and CNVs. Since the variants of interest are of somatic origin, they are present at low frequencies in the blood sample. The introduction of UMI tags has allowed us to develop a workflow that reliably identifies clinically relevant variants at frequencies as low as 0.5-1%. Identification of such mutations is critical for selecting the most appropriate anticancer therapy in accordance with NCCN and FDA guidelines.