1. Data Science, Big Data and Statistics – can we all live together?

    58:37

    from Chalmers Internal

    From the Chalmers Initiative Seminar on Big Data, April 2014. Speaker: Terry Speed, Walter & Eliza Hall Institute of Medical Research in Melbourne, and emeritus professor of Statistics at the University of California, Berkeley. Terry Speed reports some reflections on Big Data issues, offers some suggestions for statisticians, and summarizes some theory which, in his opinion, has relevance to the analysis of data, whoever does it.

    • A New Frontier - Understanding epigenetics through mathematics

      01:00:57

      from Royal Society of New Zealand

      Professor Terry Speed explains what epigenetics is and how more and more mathematicians will be needed in this area.

      • Terry Speed - Dealing with the GC-content bias in second-generation DNA sequence data - CAMDA 2011 Vienna

        51:17

        from BokuBI

        GC-content bias describes the dependence between fragment count (read coverage) and GC content found in high-throughput sequencing assays, particularly those using the Illumina Genome Analyzer technology. For analyses that focus on measuring fragment abundance within a genome, this bias can dominate the signal of interest. There is no consensus as to the source or shape of the bias; current methods to remove it assume no knowledge of the shape or scale of the curve. In this work we analyze regularities in the GC-bias patterns and find a compact description for this curve family. It is the GC content of the full DNA fragment, not only of the sequenced read, that influences fragment counts. This GC effect is unimodal: both GC-rich fragments and AT-rich fragments are under-represented in the sequencing results. Moreover, the size of the fragment may interact with the shape and peak of the GC curve. Based on these findings, we propose a new method to calculate expected coverage. This single-bp GC correction accommodates library, strand, and fragment-length information, as well as non-uniform bin sizes. We show that it outperforms current approaches in copy-number estimation tasks. These GC-modeling considerations can inform other high-throughput sequencing analyses, such as ChIP-seq and RNA-seq, and illuminate possible causes of the GC-content bias.
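
The idea of tying expected coverage to the GC content of the whole fragment can be illustrated with a minimal sketch. This is not the method presented in the talk: it is a simplified, stratified-rate illustration under stated assumptions (a single fixed fragment length, one strand, a toy genome string), and the function names `gc_count`, `gc_rates`, and `expected_starts` are hypothetical.

```python
from collections import Counter


def gc_count(seq):
    """Return the number of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq.upper())


def gc_rates(genome, fragment_starts, frag_len):
    """Estimate a fragment rate for each GC stratum (illustrative only).

    Every possible start position is assigned the GC count of the full
    fragment-length window beginning there. The rate for a stratum is
    the number of observed fragment starts in that stratum divided by
    the number of genome positions in that stratum.
    """
    n_positions = len(genome) - frag_len + 1
    position_gc = [gc_count(genome[i:i + frag_len]) for i in range(n_positions)]
    positions_per_gc = Counter(position_gc)
    fragments_per_gc = Counter(position_gc[s] for s in fragment_starts)
    return {gc: fragments_per_gc[gc] / n for gc, n in positions_per_gc.items()}


def expected_starts(genome, rates, frag_len):
    """Expected number of fragment starts at each position, given GC rates."""
    n_positions = len(genome) - frag_len + 1
    return [rates[gc_count(genome[i:i + frag_len])] for i in range(n_positions)]


# Toy data (assumed, not from the talk): an 8-bp genome, 2-bp fragments.
genome = "AAAACCCC"
fragment_starts = [0, 3, 3]
rates = gc_rates(genome, fragment_starts, frag_len=2)
expected = expected_starts(genome, rates, frag_len=2)
```

Dividing observed counts by the expected values from such a model is one simple way to normalize coverage; a unimodal GC effect would show up as lower rates in both the low-GC and high-GC strata.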
