An emerging class of distributed database management systems (DBMS), known as NewSQL, provides the same scalable performance as NoSQL systems while maintaining the consistency guarantees of a traditional, single-node DBMS. These NewSQL systems achieve high throughput rates for data-intensive applications by storing their databases in a cluster of main memory partitions. This partitioning enables them to eschew much of the legacy, disk-oriented architecture that slows down traditional systems, such as heavy-weight concurrency control algorithms, thereby allowing for the efficient execution of single-node transactions. But many applications cannot be partitioned such that all of their transactions execute in this manner; these multi-node transactions require expensive coordination that inhibits performance. Thus, without intelligent methods to overcome these impediments, a NewSQL DBMS will scale no better than a traditional DBMS.
In this talk, Andy presents research, inspired by his adventures at greyhound racing tracks, on integrating machine learning techniques to improve the performance of fast database systems. In particular, he discusses his work on the H-Store parallel, main memory transaction processing system. He first describes the Houdini framework, which uses Markov models to predict transactions’ behaviors so that a DBMS can selectively enable runtime optimizations. He then presents Hermes, a method for the deterministic execution of speculative transactions whenever a DBMS stalls because of distributed transactions. Together, these projects enable H-Store to support transactional workloads that are beyond what single-node systems can handle.