Python's Natural Language Toolkit (NLTK) is one of the most widely used and actively developed natural language processing libraries in the open-source community. This workshop will introduce the audience to NLTK: what problems it aims to solve, how its approach differs from that of other natural language libraries, and how it can be used for large-scale text analysis tasks. Concrete examples will be taken from Parse.ly's work on news article analysis, covering areas such as entity extraction, keyword collocations, and corpus-wide analysis.
This talk was presented at PyData NYC 2012: nyc2012.pydata.org/. If you are interested in this topic, be sure to check out PyData Silicon Valley in March of 2013: sv2013.pydata.org/