In the course of an infection and over an epidemic, pathogens naturally accumulate random mutations in their genomes. This is an inevitable consequence of error-prone genome replication. By reconstructing a phylogeny from these mutations, we can learn about important epidemiological phenomena such as spatial spread, introduction timings and epidemic growth rate.
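As a rough illustration of how accumulated mutations feed into a phylogeny, the sketch below counts pairwise nucleotide differences between aligned genomes, which is a common starting point for distance-based tree building. The sample names and sequences are invented toy data, and `snp_distance` and `distance_matrix` are hypothetical helper names, not part of any pipeline described here.

```python
from itertools import combinations

# Characters that carry no information about a difference (assumption:
# gaps and N are skipped rather than counted as mutations).
UNINFORMATIVE = set("N-")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where two genomes differ, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in UNINFORMATIVE and b not in UNINFORMATIVE
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise distances for every unordered pair of samples."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }


# Toy alignment (made-up sequences, for illustration only).
toy_alignment = {
    "sample_A": "ACGTACGTAC",
    "sample_B": "ACGTACGTAT",
    "sample_C": "ACTTACNTAT",
}
distances = distance_matrix(toy_alignment)
for pair, dist in distances.items():
    print(pair, dist)
```

Samples separated by fewer differences are likely to be more closely related. A real analysis would go further and fit a tree with an explicit substitution model and sampling dates rather than rely on raw counts.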
More information on our ongoing SARS-CoV-2 sequencing efforts in partnership with CoVSeQ can be found here:
CoVSeQ is a partnership between the Institut National de Santé Publique du Québec (INSPQ) and the McGill Genome Center to sequence the viral genomes of Quebec patients with COVID-19. The viral samples are taken from a Quebec viral biobank, termed the CoVBanQ, which is hosted at the Laboratoire de Santé Publique du Québec (LSPQ).
Murall et al. Genome Med. 2021 Oct 28;13(1):169
Long-read nanopore sequencing makes it straightforward to identify inversion breakpoints. Shown in the image below is the characterization of white pupae mutations in insect pest species. Mass releases of sterilized male insects have helped suppress insect pest populations since the 1950s. A key phenotype, white pupae, has been used for decades to selectively remove fertile insect females before releases of sterilized males, yet the gene responsible remained unknown. Using nanopore sequencing, the wild-type EgII fly assembly was plotted against the mutant D53 fly assembly.
The D53 assembly breaks its homology with the EgII assembly at position 17.4 Mbp of scaffold 5 and continues further at position 53.84 Mbp, which identifies candidate breakpoints. Nanopore reads were aligned against scaffold 5 of the EgII Ccap 3.2 reference genome. Nanopore reads align perfectly to the EgII reference without breaks.
Ward et al. Nat Commun. 2021; 12: 491.
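The idea of spotting a homology break between two assemblies can be sketched in a few lines. The example below assumes that whole-genome alignment has already been reduced to a list of collinear blocks. The `Block` type, the `candidate_breakpoints` function, the `max_gap` threshold and all block coordinates other than the 17.4 Mbp and 53.84 Mbp positions quoted above are illustrative assumptions, not the method used in the cited study.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One aligned block between a reference and a query assembly (hypothetical)."""
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int


def candidate_breakpoints(blocks: list[Block], max_gap: int = 100_000) -> list[tuple[int, int]]:
    """Return (ref position left, ref position rejoined) for each large jump.

    Blocks are walked in query order; a jump in reference coordinates larger
    than max_gap between consecutive blocks is reported as a candidate.
    """
    ordered = sorted(blocks, key=lambda b: b.query_start)
    breaks = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if abs(nxt.ref_start - prev.ref_end) > max_gap:
            breaks.append((prev.ref_end, nxt.ref_start))
    return breaks


# Toy blocks: only the 17.4 Mbp and 53.84 Mbp positions come from the text;
# every other coordinate is invented for illustration.
toy_blocks = [
    Block(0, 10_000_000, 0, 10_000_000),
    Block(10_000_050, 17_400_000, 10_000_000, 17_399_950),
    Block(53_840_000, 60_000_000, 17_399_950, 23_559_950),
]
breaks = candidate_breakpoints(toy_blocks)
print(breaks)
```

The small 50 bp gap between the first two blocks is ignored, while the jump from 17.4 Mbp to 53.84 Mbp is reported. Candidates found this way would still need confirmation from individual reads that span the junction.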
Structural variants (SVs) are known to play an important role in many cancers, including childhood brain tumours such as juvenile pilocytic astrocytomas (JPAs). A 2.4 Mb tandem duplication of 7q34 occurs in 60-70% of JPAs and results in a fusion between KIAA1549 and BRAF. Other JPAs have been found to have BRAF fusions with a host of other fusion partners. These SVs are difficult to call using standard Illumina reads because the read length is much shorter than the variants of interest. One potential solution is the use of long-read technologies such as Nanopore. The KIAA1549-BRAF fusion is supported by 16 reads.
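To make "supported by N reads" concrete, here is a minimal sketch of how supporting reads could be counted once each long read has been annotated with the genes its aligned segments overlap. The input format, the read names and the `fusion_supporting_reads` function are assumptions made for illustration. In practice the segments would typically come from split (supplementary) alignments in a BAM file, a step that is not shown here.

```python
def fusion_supporting_reads(
    read_genes: dict[str, list[str]], gene_a: str, gene_b: str
) -> list[str]:
    """Return the reads whose aligned segments hit both fusion partners."""
    return sorted(
        read_id
        for read_id, genes in read_genes.items()
        if gene_a in genes and gene_b in genes
    )


# Toy annotation: read ID -> genes overlapped by that read's aligned segments.
# These reads are invented and do not reproduce the data behind the text.
toy_reads = {
    "read_1": ["KIAA1549", "BRAF"],
    "read_2": ["BRAF"],
    "read_3": ["KIAA1549", "BRAF"],
    "read_4": ["KIAA1549"],
}
supporting = fusion_supporting_reads(toy_reads, "KIAA1549", "BRAF")
print(len(supporting), supporting)
```

A single long read can cover both sides of the junction, which is why each supporting read is direct evidence for the fusion. Short reads rarely span enough sequence on both sides to be assigned this way.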
Repetitive regions, which are common in bacterial genomes and plasmids, make de novo assembly with short reads nearly impossible. Long-read nanopore technology helps span these repetitive regions and build a scaffold, which greatly helps with chromosome assembly and plasmid detection. However, hybrid assembly is currently the superior method, because it combines long-read scaffolding with the high accuracy of short reads. With this method, we assembled the Salmonella chromosome and identified 4 plasmids with up to 100% identity to known plasmids.
DIRECT RNA SEQUENCING
RNA sequencing using next-generation sequencing (NGS) technologies is currently the standard approach for gene expression profiling, particularly for large-scale, high-throughput studies. Single-molecule, long-read RNA-Seq technologies have enabled new approaches to study the transcriptome and its function. The shift toward long-read sequencing technologies for transcriptome characterization is driven by current increases in throughput and decreases in cost, which make them attractive for de novo transcriptome assembly, isoform expression quantification, and in-depth RNA species analysis.
These types of analyses were challenging with standard short-read sequencing approaches because of the complex nature of the transcriptome, which consists of transcripts of variable length and multiple alternatively spliced isoforms for most genes, and because of the high sequence similarity of highly abundant RNA species such as rRNAs. These new sequencing tools have opened new ways of understanding gene functions at the tissue, network, and pathway levels, as well as their detailed functional characterization. Analysis of the epi-transcriptome, including RNA methylation and other modifications and their effects on biological systems, is now enabled through direct RNA sequencing instead of classical indirect approaches.
However, many difficulties and challenges remain. These include methodologies to generate full-length RNA or cDNA libraries from all species of RNA, not only poly-A-containing transcripts, and the identification of allele-specific transcripts, which is limited by the current error rates of single-molecule technologies. Bioinformatics analysis of long-read data for accurate identification of 5′ and 3′ UTRs is also still in development.