Python has long been used for crawling the web -- perhaps the most successful example being the early web crawlers built for the Google search engine. In recent times, open source libraries for large-scale web crawling have improved dramatically. The web has also matured: many HTML pages now offer structured metadata that well-equipped spiders can extract, beyond basics such as the text content or document title. This talk will cover Parse.ly's use of the open source Scrapy project and its own work on standardizing metadata extraction techniques for news stories.
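To make the metadata-extraction idea concrete, here is a minimal sketch using only the standard library. It is not Parse.ly's or Scrapy's code (a Scrapy spider would use its selector API instead); the class name, sample HTML, and tag choices are illustrative. It pulls `<meta>` properties, such as Open Graph tags commonly found on news pages, into a dictionary:

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect <meta property=... content=...> pairs, e.g. Open Graph tags."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # News pages often carry Open Graph ("og:*") or name-based metadata.
        key = attrs.get("property") or attrs.get("name")
        if key and "content" in attrs:
            self.meta[key] = attrs["content"]

# Hypothetical page fragment for illustration only.
html = """
<html><head>
  <title>Example story</title>
  <meta property="og:title" content="A News Story" />
  <meta property="og:type" content="article" />
  <meta name="author" content="Jane Doe" />
</head><body>...</body></html>
"""

parser = MetaTagParser()
parser.feed(html)
print(parser.meta["og:title"])  # A News Story
```

A real crawler would apply the same idea at scale, normalizing the many competing metadata vocabularies (Open Graph, schema.org, plain `name=` tags) into one schema.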
Within the past decade, the amount of DNA sequencing data generated by next-generation sequencing platforms has exploded. As a result, biology has become a field in need of algorithms and data structures that scale to analyze this data efficiently. Jason Pell will present features of khmer, a software package developed in the GED Lab at Michigan State University, for efficiently filtering and analyzing data generated by next-generation sequencing platforms. More specifically, he will present the use of the Bloom filter and counting Bloom filter data structures for assembly graph traversal and k-mer counting, respectively. The khmer software package is written primarily in C++ and wrapped in Python. It is released under the BSD license and is available at github.com/ged-lab/khmer.
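khmer's actual data structures are implemented in C++, but the core idea of counting k-mers with a counting Bloom filter can be sketched in a few lines of pure Python. This is an illustrative toy, not khmer's API; the class and function names are invented for the example, and a real implementation would use fixed-width counters and much faster hashing:

```python
import hashlib

class CountingBloomFilter:
    """Toy counting Bloom filter: each slot holds a counter instead of a bit."""
    def __init__(self, size=1000, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counts = [0] * size

    def _indexes(self, item):
        # Derive several slot indexes from salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.counts[idx] += 1

    def count(self, item):
        # The minimum over the item's slots is an upper bound on its true
        # count; collisions can inflate it but never deflate it.
        return min(self.counts[idx] for idx in self._indexes(item))

def kmers(seq, k):
    """Yield all overlapping k-length substrings of a DNA sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

cbf = CountingBloomFilter()
for kmer in kmers("ACGTACGTAC", k=4):
    cbf.add(kmer)

print(cbf.count("ACGT"))  # 2, barring hash collisions
```

The memory for the filter is fixed up front regardless of how many distinct k-mers appear, which is what makes this family of structures attractive for the data volumes next-generation sequencing produces.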
Why use GPUs from Python? This workshop will provide a brief introduction to GPU programming with Python, including run-time code generation and the use of high-level tools such as PyCUDA, PyOpenCL, and Loo.py.
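Run-time code generation, one of the workshop's themes, can be illustrated without a GPU. PyCUDA and PyOpenCL compile kernel source strings at run time; the GPU-free sketch below shows the same idea in plain Python, generating and compiling a specialized function from a source string (the `make_saxpy` helper and its constant-folded scale factor are invented for this example, not part of any of the named libraries):

```python
def make_saxpy(a):
    """Generate a saxpy-like function with the scalar `a` baked into the source.

    This mimics how GPU libraries specialize kernel source text at run time
    before compiling it, here using Python's own exec as the "compiler".
    """
    src = (
        f"def saxpy(x, y):\n"
        f"    return [{a} * xi + yi for xi, yi in zip(x, y)]\n"
    )
    namespace = {}
    exec(src, namespace)  # compile the generated source into a function
    return namespace["saxpy"]

saxpy2 = make_saxpy(2.0)
print(saxpy2([1.0, 2.0], [10.0, 20.0]))  # [12.0, 24.0]
```

With PyCUDA the generated string would instead be CUDA C compiled for the device, but the workflow -- build source text from runtime parameters, compile, call -- is the same.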