There is so much text in our lives that we are practically drowning in it. Fortunately, there are innovative tools and techniques for managing unstructured information that can throw the smart developer a much-needed lifeline. In this talk, based on the outline of the book of the same name, I'll provide an introduction to a variety of Java-based open source tools that aid in the development of search and NLP applications.

Book Abstract: Taming Text is a practical, example-driven guide to working with text in real applications. This book introduces you to useful techniques like full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. You'll explore real use cases as you systematically absorb the foundations upon which they are built. Written in a clear and concise style, this book avoids jargon, explaining the subject in terms you can understand without a background in statistics or natural language processing. Examples are in Java, but the concepts can be applied in any language.

Drew Farris is a software developer and technology consultant at Booz Allen Hamilton, where he focuses on large-scale analytics, distributed computing and machine learning. Previously, he worked at TextWise, where he implemented a wide variety of text exploration, management and retrieval applications combining natural language processing, classification and visualization techniques. He has contributed to a number of open source projects including Apache Mahout, Lucene and Solr, and holds a master's degree in Information Resource Management from Syracuse University's iSchool and a B.F.A. in Computer Graphics.

