This is the digitized version of the talk I gave at this year's SEA-Phages Conference.
You can check out our poster at yeesus.com/tangoPoster and our GitHub repo at github.com/bsiranosian/tango.
Here's the full title:
Tetranucleotide Usage In Mycobacteriophage Genomes - Alignment-Free Methods to Cluster Mycobacteriophage and Infer Evolutionary Relationships
And our abstract:
Many tools used for the analysis of mycobacteriophage genomes depend on sequence alignment or knowledge of gene content. Although effective, these methods are computationally expensive and may require significant manual input (e.g., gene annotation). We evaluated tetranucleotide usage in mycobacteriophages as an alternative to alignment-based methods for genome analysis. First, we computed tetranucleotide usage deviation (TUD), the ratio of the observed count of each 4-mer in a genome to its expected count under a null model. TUD showed strong intra-subcluster similarity and was distinct between subclusters. Hierarchical clustering dendrograms and neighbor-joining phylogenetic trees were constructed from pairwise Euclidean distances between all 663 genomes in the mycobacteriophage database. In almost every case, phage were placed in a monophyletic clade with members of the same subcluster. We found that TUD effectively captures relationships between subclusters of the same cluster, in contrast to previous findings suggesting that tetranucleotide usage does not carry a strong phylogenetic signal. We also evaluated the possibility of assigning clusters to unknown phage based on TUD. Using a simple k-means classifier, more than 98% of cluster assignments were recovered correctly. In addition, we looked for evidence of horizontal gene transfer using the tetranucleotide difference index (TDI), a measure of how far tetranucleotide usage in a sliding window deviates from the genomic mean. Our TDI plots showed a strong spike at the end of cluster L mycobacteriophage genomes, which could indicate horizontal gene transfer in that region. Genome analysis using TUD and TDI shows promise for evaluating host-parasite coevolution and gene exchange within the mycobacteriophage population.
These methods are computationally inexpensive and independent of gene annotation, making them well suited to further research aimed at clustering phage and determining evolutionary relationships.
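The abstract defines TUD and TDI only in words, so here is a minimal Python sketch of how they could be computed. It is not taken from the tango repository, and several details are assumptions: the null model is zero-order (the expected count of a 4-mer is the number of 4-mers scanned times the product of its four base frequencies), the TDI is approximated as the Euclidean distance between a window's TUD vector and the genome-wide TUD vector, standardized to z-scores across windows, and the 5 kb window and 1 kb step are hypothetical defaults.

```python
import math
from itertools import product

BASES = "ACGT"
# All 256 tetranucleotides in a fixed lexicographic order, so vectors are comparable.
KMERS = ["".join(p) for p in product(BASES, repeat=4)]


def tud(seq):
    """Tetranucleotide usage deviation: observed / expected count for each 4-mer.

    Assumes a zero-order null model: expected count = (number of 4-mers scanned)
    times the product of the 4-mer's base frequencies.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip 4-mers containing N or other ambiguity codes
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous 4-mers")
    n_acgt = sum(seq.count(b) for b in BASES)
    freq = {b: seq.count(b) / n_acgt for b in BASES}
    vec = []
    for kmer in KMERS:
        expected = total * math.prod(freq[b] for b in kmer)
        # A 4-mer built from an absent base has expected count 0; report 0.0.
        vec.append(counts[kmer] / expected if expected > 0 else 0.0)
    return vec


def distance_matrix(tud_vectors):
    """Pairwise Euclidean distances between TUD vectors (input for clustering or NJ)."""
    return [[math.dist(u, v) for v in tud_vectors] for u in tud_vectors]


def tdi_profile(seq, window=5000, step=1000):
    """Hypothetical TDI-style profile: Euclidean distance between each window's TUD
    and the genome-wide TUD, standardized to z-scores across all windows."""
    genome = tud(seq)
    raw = [math.dist(tud(seq[s:s + window]), genome)
           for s in range(0, len(seq) - window + 1, step)]
    if not raw:  # sequence shorter than one window
        return []
    mean = sum(raw) / len(raw)
    sd = math.sqrt(sum((x - mean) ** 2 for x in raw) / len(raw)) or 1.0
    return [(x - mean) / sd for x in raw]
```

For example, with `genomes` as a hypothetical list of sequence strings, `distance_matrix([tud(g) for g in genomes])` gives the kind of pairwise Euclidean distance matrix that hierarchical clustering or neighbor joining would take as input, and windows where `tdi_profile` returns large positive z-scores would be candidate regions of atypical composition, such as horizontally transferred DNA.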
Here are our citations:
Betley, J. N., Frith, M. C., Graber, J. H., Choo, S. & Deshler, J. O. A ubiquitous and conserved signal for RNA localization in chordates. Curr. Biol. 12, 1756–1761 (2002).
Hall, M. et al. The WEKA Data Mining Software: An Update. SIGKDD Explor. Newsl. 11, 10–18 (2009).
Hatfull, G. F. et al. Comparative Genomic Analysis of 60 Mycobacteriophage Genomes: Genome Clustering, Gene Acquisition, and Gene Size. Journal of Molecular Biology 397, 119–143 (2010).
Sandberg, R. et al. Capturing Whole-Genome Characteristics in Short Sequences Using a Naïve Bayesian Classifier. Genome Res. 11, 1404–1409 (2001).
Pride, D. T., Wassenaar, T. M., Ghose, C. & Blaser, M. J. Evidence of host-virus co-evolution in tetranucleotide usage patterns of bacteriophages and eukaryotic viruses. BMC Genomics 7, 8 (2006).
Thanks for watching!