Presented by Uwe Schindler | SD DataSolutions GmbH
From day one, Apache Lucene has exposed the two fundamental concepts of reading and writing an index directly through IndexReader and IndexWriter. However, the API did not reflect reality. From the IndexWriter perspective this was desirable, but on the reading side it caused several problems in the past: in reality a Lucene index is not a single index but a collection of segments, even though it is logically treated as one. This talk will introduce the new API classes AtomicReader and CompositeReader, added in Lucene 4.0 as very general interfaces, and DirectoryReader, which most people know as the segment-based “Lucene index on disk”. The talk will also cover further changes and improvements to the search API, such as reader contexts that allow converting local (per-segment) document IDs to global ones when searching with IndexSearcher. Lucene has made all IndexReaders read-only, so it is no longer possible to modify indexes using those classes. Finally, Uwe Schindler will show migration paths from custom norm values to the various new ranking models added to Lucene, including using Similarity with Lucene 4.0’s DocValues as a replacement for norms.
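To make the local-to-global document ID mapping more concrete, here is a minimal Java sketch of the idea. It does not use Lucene at all: the class DocIdMapping and its methods docBases, toGlobal and leafOf are hypothetical names chosen for illustration. It assumes that a composite index is an ordered sequence of segments (leaves) and that each leaf's docBase is the sum of the maxDoc values of all leaves before it; Lucene 4.0's per-leaf reader contexts expose a comparable docBase value.

```java
import java.util.Arrays;

/**
 * Hypothetical model (not Lucene's API) of mapping per-segment ("local")
 * document IDs to index-wide ("global") IDs in a composite index.
 */
final class DocIdMapping {

    /** Each leaf's docBase is the sum of maxDoc over all preceding leaves. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    /** Local ID inside leaf {@code leaf} becomes a global ID by adding the leaf's docBase. */
    static int toGlobal(int[] bases, int leaf, int localDoc) {
        return bases[leaf] + localDoc;
    }

    /**
     * Finds the leaf containing a global ID (assumed to be in range) with a
     * binary search over the sorted docBase array.
     */
    static int leafOf(int[] bases, int globalDoc) {
        int i = Arrays.binarySearch(bases, globalDoc);
        if (i >= 0) {
            // Exact hit: empty leaves (maxDoc == 0) share a docBase with the
            // next leaf, so skip forward to the last leaf with this base.
            while (i + 1 < bases.length && bases[i + 1] == globalDoc) {
                i++;
            }
            return i;
        }
        // Miss: binarySearch returned -(insertionPoint) - 1; the owning leaf
        // is the one just before the insertion point.
        return -i - 2;
    }

    public static void main(String[] args) {
        int[] maxDocs = {5, 7, 3};          // three hypothetical segments
        int[] bases = docBases(maxDocs);    // docBases 0, 5, 12
        int global = toGlobal(bases, 1, 3); // 5 + 3 = 8
        int leaf = leafOf(bases, global);   // back to leaf 1
        System.out.println(Arrays.toString(bases) + " " + global + " " + leaf); // prints [0, 5, 12] 8 1
    }
}
```

Keeping only a sorted array of docBase values makes the forward mapping a single addition and the reverse mapping a binary search, O(log n) in the number of segments; the forward scan on an exact hit ensures that an empty segment is never reported as the owner of a document.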
Over the last decade, Apache Lucene has become the de facto standard in open source search technology. Thousands of applications, from Twitter-scale web services to computers playing Jeopardy, rely on Lucene, a rock-solid, scalable and fast information-retrieval library written entirely in Java. Maintaining and improving such a popular software library poses tough challenges in testing, API design, data structures, concurrency and optimization. This talk presents the most demanding technical challenges the Lucene development team has solved in the past. It covers a number of areas of software development, including concurrency and parallelism, testing infrastructure, data structures, algorithms, API design with respect to garbage collection, memory efficiency, and efficient resource utilization. No background in Apache Lucene or information retrieval is required. Knowledge of the Java programming language will certainly be helpful, although the problems and techniques presented in this talk are not Java-specific.
In this talk, Lucene/Solr committer Mark Miller will discuss some of the new features and advancements that users can look forward to in Solr 4. Topics will include performance optimizations, further support for near-realtime search, SolrCloud, DirectSolrSpellChecker, and more.