
A reference genome is an assembled DNA sequence that represents the genetic material of a species. It serves as a standard against which genomic data obtained from different individuals of the same species are compared and analysed. The Genome Reference Consortium (GRC) [1] has developed several assemblies of the human reference genome, the latest being GRCh38 [2].
GRCh38 was first released in December 2013 and is updated through patch releases that incorporate new data and improve the accuracy of the assembly [2]. Compared with the previous version (GRCh37), GRCh38 brings several significant improvements. The most important one is a much larger set of alternate loci. These are highly variable regions of the human genome that cannot be well represented by a single sequence path, so GRCh38 represents each of them with several alternative sequences. GRCh38 is also more complete than its predecessor: it contains longer contiguous sequences and fewer gaps, and artefacts present in earlier assemblies were corrected. For example, the assembly of challenging regions, such as repetitive regions, centromeres and telomeres, was improved.
Why is it important to use the GRCh38 reference?
GRCh38 is the most up-to-date human reference assembly released by the GRC; it therefore provides the most accurate and comprehensive reference for identifying clinically relevant variants.
The first step in Next-Generation Sequencing (NGS) data analysis is aligning the sequencing reads from the sample to the reference genome, which assigns each read its coordinates, i.e. the chromosome and the position on it. The more complete the reference, the more reliable the alignment: reads originating from regions that are missing or misassembled in the reference may be left unmapped or placed in the wrong location. The improved accuracy and completeness of GRCh38 reduce the number of false positive and false negative variant calls caused by such misalignments.
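To make the idea of "coordinates from alignment" concrete, here is a deliberately simplified Python sketch. It only looks for exact matches of a read in a tiny, made-up reference; the sequences and the function name map_read are hypothetical, and real aligners such as BWA-MEM use a genome index and tolerate mismatches and gaps. Even so, it shows two ways the reference affects alignment: a read from a repeated sequence has more than one candidate position, and a read from a region absent from the reference has none.

```python
from __future__ import annotations


def map_read(read: str, reference: dict[str, str]) -> list[tuple[str, int]]:
    """Return every (chromosome, 1-based position) where `read` matches exactly.

    Toy exact-match search only; real aligners are index-based and error-tolerant.
    """
    hits = []
    for chrom, seq in reference.items():
        start = seq.find(read)
        while start != -1:
            hits.append((chrom, start + 1))  # convert 0-based index to 1-based position
            start = seq.find(read, start + 1)  # keep searching, allowing overlapping matches
    return hits


# Hypothetical miniature "reference": the sequence ACGATCG occurs on both chromosomes.
toy_reference = {
    "chr1": "ACGTTGCAAGGCTTACGATCG",
    "chr2": "TTGACGATCGGATCCATTGCA",
}

print(map_read("GCAAGG", toy_reference))   # unique hit: [('chr1', 6)]
print(map_read("ACGATCG", toy_reference))  # ambiguous: [('chr1', 15), ('chr2', 4)]
print(map_read("GGGGGG", toy_reference))   # not in the reference: []
```

A read that matches several places cannot be placed confidently, and a read whose true origin is missing from the reference either stays unmapped or, with error-tolerant aligners, may be forced onto a similar but wrong location.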
Once variants have been identified, interpreting them requires searching databases for relevant information. These searches rely on the variants' coordinates, which come from the reference genome. However, the same variant can have different chromosomal coordinates in different versions of the reference genome, such as GRCh37 and GRCh38. Although there are programs that translate coordinates from one version to another, they can introduce further errors, for example when a region is missing, split or rearranged in the target assembly. It is therefore best to work with the current standard reference, GRCh38, from the start. GRCh38 is also the most widely used human reference genome, which ensures consistency with databases, other researchers and clinical laboratories.
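Why coordinate conversion can fail is easiest to see in a simplified model of a liftover chain file. The Python sketch below assumes a hypothetical list of ungapped blocks that pair intervals in a source assembly with intervals in a target assembly. The coordinates are invented for illustration and are not real GRCh37/GRCh38 mappings, and Block and lift_position are illustrative names, not part of any tool. Real chain files, such as those used by UCSC liftOver, also encode strand and gaps on both sides.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """An ungapped aligned block between two assemblies (hypothetical values)."""
    chrom: str          # chromosome in the source assembly
    start: int          # 1-based first position in the source assembly
    end: int            # 1-based last position in the source assembly (inclusive)
    target_chrom: str   # chromosome in the target assembly
    target_start: int   # 1-based target position corresponding to `start`


def lift_position(chrom: str, pos: int, blocks: list[Block]) -> Optional[tuple[str, int]]:
    """Translate one position, or return None if it lies outside every block."""
    for b in blocks:
        if b.chrom == chrom and b.start <= pos <= b.end:
            return b.target_chrom, b.target_start + (pos - b.start)
    return None  # no counterpart in the target assembly: the position cannot be lifted


# Made-up blocks: source positions 1-1000 shift by +500, positions 1001-1200 have
# no counterpart, and positions 1201-3000 shift by +300 in the target assembly.
blocks = [
    Block("chr7", 1, 1000, "chr7", 501),
    Block("chr7", 1201, 3000, "chr7", 1501),
]

print(lift_position("chr7", 250, blocks))   # ('chr7', 750)
print(lift_position("chr7", 1100, blocks))  # None
print(lift_position("chr7", 2000, blocks))  # ('chr7', 2300)
```

A position that falls between blocks, like 1100 here, has no counterpart in the target assembly and cannot be converted; this is one way translated coordinates end up missing or wrong.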
In summary, using the GRCh38 reference is critical for obtaining accurate and reliable results in NGS data analysis. It provides the most comprehensive and up-to-date information, reducing errors and ensuring consistency across different research groups and clinical settings.
GRCh38 at IntelliseqFlow
All workflows available on the IntelliseqFlow platform use the GRCh38 reference genome. You can test the platform by registering for a 30-day trial that lets you run trial analyses for free [click here to register and test our platform for free]. To get started, we recommend watching the introductory video that explains how to perform an analysis [click here to watch the introductory video]. If you need more information or have specific questions about the platform, you can request a demo with one of our highly knowledgeable bioinformatics experts [click here to request a demo]. Don't hesitate to reach out if you need any assistance getting started with the IntelliseqFlow platform.
References