Presented by Paddy Mullen, Independent Contractor
This talk walks through using the wikipedia_Solr and wikipedia_elasticsearch repositories to quickly get up to speed with search at scale. When choosing a search solution, a common question is "Can this architecture handle my volume of data?" Answering that question without first integrating with your existing document store saves a lot of time. If your document corpus is similar to Wikipedia's, wikipedia_Solr and wikipedia_elasticsearch make useful comparison points.
Wikipedia is a great source for a tutorial like this because of its familiarity and free availability. The uncompressed Wikipedia data dump I used was 33 GB and contained 12M documents. The documents can be further split into paragraphs and links to test search over a large number of small items. For extra scale, prior revisions can be included, bringing the corpus size into the terabytes.
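To make the paragraph-splitting idea concrete, here is a minimal Python sketch that streams a Wikipedia XML dump and yields paragraph-sized documents without loading the whole 33 GB file into memory. The dump filename and the MediaWiki export schema version are assumptions; the loaders in the actual repositories should be preferred for real indexing.

    # Minimal sketch: stream a Wikipedia dump, split articles into paragraphs.
    import bz2
    import xml.etree.ElementTree as ET

    NS = "{http://www.mediawiki.org/xml/export-0.10/}"   # assumed schema version
    DUMP_PATH = "enwiki-latest-pages-articles.xml.bz2"   # hypothetical filename

    def iter_paragraphs(dump_path):
        """Yield (title, paragraph_index, paragraph_text) tuples."""
        with bz2.open(dump_path, "rb") as f:
            # iterparse yields elements as their closing tags are read,
            # so we never hold more than one page in memory at a time.
            for _, elem in ET.iterparse(f):
                if elem.tag == NS + "page":
                    title = elem.findtext(NS + "title") or ""
                    text = elem.findtext(f"{NS}revision/{NS}text") or ""
                    # Paragraphs in wikitext are separated by blank lines.
                    paras = (p for p in text.split("\n\n") if p.strip())
                    for i, para in enumerate(paras):
                        yield title, i, para
                    elem.clear()  # release the parsed page to keep memory flat

    if __name__ == "__main__":
        for title, i, para in iter_paragraphs(DUMP_PATH):
            print(title, i, len(para))
            break

Each (title, paragraph_index) pair can serve as a document ID when bulk-loading into Solr or Elasticsearch, which is what turns 12M articles into a much larger corpus of small searchable items.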