This talk is a Cook's tour of computational linguistics projects I've worked on over the last year, demonstrating Information Extraction, Lexical Analysis (the linguistics kind rather than the compiler kind), Statistical Parsing with an Ensemble, and Computer Program Corpus Construction (which includes parsing some English from Javadocs). In each case I'll give a very brief description of the motivating problem and then dive into code that deals with some part of it. We'll see how Groovy makes the most of Java for text processing, simple web browser UIs, and cluster computing.
While the particular top-level problems may be unfamiliar to folks accustomed to working on business applications, much of the effort and the code needed to make working programs will be familiar indeed. This talk should be of interest both to those curious about what goes on in NLP and to those who would simply like to get some of their work done faster by using more powerful tools.
Technologies we'll see at work (all of which are Open Source Software):
GATE (General Architecture for Text Engineering)
MALLET (MAchine Learning for LanguagE Toolkit)
ERG (English Resource Grammar)
Speaker: Jim White
Jim White is a computational linguist with over 30 years of experience building computer systems. Prior to focusing on Natural Language Processing (NLP) he worked at the software, firmware, hardware, and system architecture levels in development tools, embedded and portable devices, networking, and graphics. He is an Open Source Software advocate and Groovy committer, and has created the innovative OSS projects Groovy for OpenOffice and IFCX Wings. He is currently working on a thesis for the Master of Science in Computational Linguistics (CLMS) at the University of Washington and will be the instructor for the program's Computational Linguistics Fundamentals course this year.