A reference genome is a representation of a specific organism's genetic material. It serves as a standard to compare and analyse genomic data obtained from different individuals of the same species. The Genome Reference Consortium (GRC) [1] has been responsible for the development of several assemblies of the human reference genome, with the latest version being the GRCh38 assembly [2].
GRCh38 was first released in December 2013, and it is regularly updated to incorporate new data and improve the accuracy of the assembly [2]. Compared to the previous version (GRCh37), GRCh38 has several significant improvements. The most important one is the greatly expanded set of alternative loci. These cover highly variable regions of the human genome that cannot be well represented by a single path, so each such region is represented in GRCh38 by a small number of alternative sequences. The GRCh38 assembly is also more complete than its predecessor: it includes more contiguous sequence and fewer gaps, and known assembly artefacts were corrected. For example, the representation of challenging regions, such as repetitive regions, centromeres, and telomeres, was improved.
Why is it important to use the GRCh38 reference?
GRCh38 is the most up-to-date human reference assembly released by the GRC, and it therefore provides the most accurate and comprehensive GRC reference for identifying clinically relevant variants.
The first step in Next-Generation Sequencing (NGS) data analysis is aligning the reads from the sample to the reference genome, which gives each read its coordinates: the chromosome and the position. Thus, the more complete the reference, the more reliable the alignment. The improved accuracy and completeness of GRCh38 reduce the number of false positive and false negative results caused by misalignment.
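To make "coordinates" concrete, here is a minimal sketch of how the chromosome and position can be read from one alignment record in the SAM text format, where the third column is the reference sequence name and the fourth is the 1-based leftmost mapping position. The record itself, including the read name and the position, is invented for illustration, and the helper names `AlignedRead` and `parse_sam_line` are assumptions, not part of any tool mentioned in this article.

```python
from typing import NamedTuple, Optional


class AlignedRead(NamedTuple):
    read_name: str
    chromosome: str
    position: int  # 1-based leftmost mapping position on the reference
    mapq: int      # mapping quality reported by the aligner


def parse_sam_line(line: str) -> Optional[AlignedRead]:
    """Extract the coordinates from one SAM alignment line.

    Returns None for reads the aligner could not place (FLAG bit 0x4).
    """
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # read is unmapped, so it has no coordinates
        return None
    return AlignedRead(
        read_name=fields[0],
        chromosome=fields[2],
        position=int(fields[3]),
        mapq=int(fields[4]),
    )


# Hypothetical records: the names and coordinates are made up.
mapped = "read_001\t0\tchr1\t1000000\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF"
unmapped = "read_002\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF"

print(parse_sam_line(mapped))
print(parse_sam_line(unmapped))
```

The same read aligned against a different assembly can receive a different position, or none at all, which is why the assembly used for alignment always has to be recorded alongside the coordinates.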
Once variants have been identified, their interpretation requires searching for relevant information in databases. These searches rely on the variants' coordinates, which are defined relative to the reference genome. However, the chromosomal coordinates of the same variant may differ between versions of the reference genome, such as GRCh37 and GRCh38. Although there are programs that can translate coordinates from one version to another, the conversion can introduce further errors, for example when a region has no unambiguous counterpart in the target assembly. Thus, it is best to work directly with a standard reference such as the GRCh38 assembly. GRCh38 is now widely adopted, which helps ensure consistency with databases, other researchers and clinical laboratories.
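The following toy sketch illustrates why coordinate translation is not always possible. It assumes a simplified model in which two assemblies are related by a list of aligned blocks; real tools such as UCSC liftOver work from chain files with a richer structure, and this is not their API. All block boundaries below are hypothetical numbers, not real GRCh37 or GRCh38 coordinates, and `lift_position` is an illustrative helper.

```python
from typing import List, Optional, Tuple

# (old_start, old_end, new_start): 1-based, inclusive range in the old
# assembly and the position where that range begins in the new assembly.
Block = Tuple[int, int, int]

# Hypothetical blocks for a single chromosome.
TOY_BLOCKS: List[Block] = [
    (1, 1000, 1),        # unchanged between assemblies
    (1001, 2000, 1501),  # shifted by +500: sequence was added upstream
    # old positions 2001-2500 have no counterpart in the new assembly
    (2501, 3000, 2501),
]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Translate an old-assembly position to the new assembly.

    Returns None when the position falls outside every aligned block,
    i.e. when it cannot be translated.
    """
    for old_start, old_end, new_start in blocks:
        if old_start <= pos <= old_end:
            return new_start + (pos - old_start)
    return None


for old_pos in (500, 1200, 2200, 2600):
    print(old_pos, "->", lift_position(old_pos, TOY_BLOCKS))
```

In this model a variant at old position 2200 is simply lost during conversion, while a variant at 1200 silently changes its coordinate. Both effects have to be tracked when data produced on different assemblies are combined.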
In summary, using the GRCh38 reference is critical for obtaining accurate and reliable results in NGS data analysis. It provides the most comprehensive and up-to-date information, reducing errors and ensuring consistency across different research groups and clinical settings.
GRCh38 at iFlow
The workflows available on the iFlow platform use the GRCh38 reference genome by default. We also offer a liftover procedure so that data generated with older reference genomes can also be used. Don't hesitate to contact us if you are interested in using iFlow in your project. Sign up for a free demo with one of our experts to learn more about our solutions.
Ref.