Dataflow is a technique for parallel computing that emerged from research in the 1970s. It's based on a graph-based execution model in which data flows along the arcs of a graph and is processed at the nodes. The idea was decades ahead of its time: hardware was expensive, and mainstream applications had no need for massively parallel, low-latency computing architectures. With the emergence of Big Data volumes, real-time low-latency requirements, commodity hardware and low-cost storage, however, dataflow has found its place and time, and it is driving the architectures behind today's real-time Big Data solutions.
SQLstream adopted the principles of dataflow as the basis of the architecture for SQLstream s-Server. Our adapters turn any data source into a live stream of data tuples that are combined, aggregated and analyzed by the s-Server platform. SQLstream has added one essential feature to dataflow: the use of SQL as the dataflow management language. SQL has long been the language of choice for relational database management systems, and in that context it has been getting bad press in light of new Big Data storage structures and NoSQL query models. However, SQL is powerful and declarative, so applications can be built easily, quickly and cheaply, and it is a natural paradigm for processing streaming dataflows. The benefit is extremely low latency combined with the ability to process massive volumes of live data scaled out across many servers: exactly the requirements of real-time Big Data. In our view, it is the only architecture capable of processing real-time Big Data streams.
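To make the paradigm concrete, the sketch below shows what a continuous query over a stream might look like. The stream and column names are hypothetical, and the syntax follows the streaming SQL conventions (SELECT STREAM, the implicit ROWTIME column, windowed aggregation) used in SQLstream s-Server's documentation; exact dialect details may differ.

```sql
-- A hypothetical source stream of sensor readings.
-- Every stream row carries an implicit ROWTIME timestamp column.
CREATE OR REPLACE STREAM sensor_readings (
    sensor_id INTEGER,
    reading   DOUBLE
);

-- Continuous query: a rolling one-minute average per sensor.
-- The query never terminates; each arriving tuple flows through the
-- dataflow graph (scan -> window -> aggregate) and emits a result row.
SELECT STREAM
    ROWTIME,
    sensor_id,
    AVG(reading) OVER (
        PARTITION BY sensor_id
        RANGE INTERVAL '1' MINUTE PRECEDING
    ) AS avg_reading_1min
FROM sensor_readings;
```

Because the query is declarative, it says nothing about how the work is scheduled: the planner is free, in principle, to parallelize or distribute the underlying dataflow graph across servers without any change to the SQL.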