Wikipedia’s corpus makes it ideal for a range of natural language processing (NLP) tasks. This talk will cover how to extract data from Wikipedia for your own use with Python, MongoDB, and Solr; it will also cover how to use this data for familiar NLP tasks such as named entity recognition and suggesting related articles.
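One common approach to suggesting related articles (a hypothetical sketch here, not necessarily the pipeline the talk presents) is TF-IDF weighting combined with cosine similarity; a minimal pure-Python version:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each tokenized doc."""
    n = len(docs)
    # Document frequency: how many docs contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term->weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy "articles": in practice these would be Wikipedia article texts
# pulled from MongoDB or indexed in Solr.
articles = [
    "python is a programming language".split(),
    "python programming tutorials and examples".split(),
    "geography of france and its rivers".split(),
]
vecs = tf_idf_vectors(articles)

# Rank the other articles by similarity to article 0.
related = sorted(range(1, len(articles)),
                 key=lambda i: cosine(vecs[0], vecs[i]), reverse=True)
```

Here the programming-related article ranks first for article 0, while the unrelated geography article shares no terms and scores zero. At Wikipedia scale, a search engine like Solr would typically handle the term weighting and ranking instead.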
Blaze is a next-generation NumPy sponsored by Continuum Analytics. It is designed as a foundational set of abstractions on which to build out-of-core and distributed algorithms. Blaze generalizes many of the ideas found in popular PyData projects such as NumPy, Pandas, and Theano into one generalized data structure. Together with a powerful array-oriented virtual machine and runtime, Blaze will be capable of performing efficient linear algebra and indexing operations on top of a wide variety of data backends.
Working with data at large scales requires parallel computing to access large amounts of RAM and CPU cycles. Users need a quick and easy way to leverage these resources without becoming experts in parallel computing. IPython's parallel computing support addresses this need by providing a high-level parallel API that covers a wide range of use cases with excellent performance. This API enables Python functions, along with their arguments, to be scheduled and called on parallel computing resources using a number of different scheduling algorithms. Programs written using IPython Parallel scale across multicore CPUs, clusters, and supercomputers with no modification, and can be run, shared, and monitored in a web browser using the IPython Notebook. In this talk I will cover the basics of this API and give examples of how it can be used to parallelize your own code.
Shapely is a Python library for performing geometric calculations. It is most commonly used to process and analyze geographic data, like geo-tagged media or shapefiles. In this talk, we'll take publicly available geo-tagged data, visualize it, and perform spatial analysis to find trends.
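As a taste of the kind of analysis the abstract describes, here is a minimal sketch using a few made-up geo-tagged coordinates (the data and the 0.02-degree "hotspot" radius are illustrative assumptions, not the talk's dataset):

```python
from shapely.geometry import MultiPoint, Point

# Hypothetical geo-tagged observations as (lon, lat) pairs.
checkins = [(-73.99, 40.73), (-73.98, 40.74), (-74.00, 40.74), (-73.97, 40.71)]

points = MultiPoint(checkins)

# The centroid gives a crude "center of activity" for the data set.
center = points.centroid

# The convex hull outlines the area the observations span.
hull = points.convex_hull

# A toy hotspot query: count points within ~0.02 degrees of the centroid.
hotspot = center.buffer(0.02)
inside = sum(hotspot.contains(Point(p)) for p in checkins)
```

Real analyses would project coordinates into a planar CRS before measuring distances in meaningful units, but the same primitives (centroid, buffer, containment) carry over directly.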