CPython Embedded in Solr - Search Solution for Python Lovers With the Speed of Native Java
Presented by Roman Chyla, CERN
SPIRES is the biggest bibliographic database for High Energy Physics, arXiv is the biggest repository of full-text papers in High Energy Physics, and INSPIRE is the biggest digital library that merges the two. Citation-related queries require us to work with result sets of more than 1 million records, and our partners in astrophysics work with sets of 6 million. However, INSPIRE is written in Python, while Solr runs on Java. So how do we move result sets of several million records between the two systems fast? How do we take advantage of our special NLP processing pipeline written in Python? How do we join them? We do not use Jython. We do not use pipes. We do not embed Solr inside INSPIRE. We embed INSPIRE into Solr! The talk shows the benefits and challenges of this surprisingly elegant solution.