A Technical Article Series for Data Architects

This multi-part article series is intended for data architects and anyone else interested in learning how to design modern real-time data analytics solutions.
It explores key principles and implications of event streaming and streaming analytics, and concludes that the biggest opportunity to derive meaningful value from data – and gain continuous intelligence about the state of things – lies in the ability to analyze, learn and predict from real-time events in concert with contextual, static and dynamic data.
This article series places continuous intelligence in an architectural context, with reference to established technologies and use cases in place today.
Part 2: Event Streaming Applications

Brokers don’t execute applications – they simply act as a buffer between the real world and applications.
The broker keeps an index per application and topic, so if an application restarts it can resume processing from the next event in each queue, provided the application is stateful.
If not, it must start from the head of each queue again.
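The per-application, per-topic index can be pictured with a small in-memory sketch. `ToyBroker`, `poll` and `rewind` are hypothetical names chosen for illustration and are not the API of any real broker; the sketch only assumes that each topic is an append-only list and that the index is the position of the next unread event.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)   # topic -> events in arrival order
        self.index = defaultdict(int)   # (app, topic) -> next position to read

    def publish(self, topic, event):
        self.logs[topic].append(event)

    def poll(self, app, topic):
        """Return the next unread event for this app, or None if caught up."""
        pos = self.index[(app, topic)]
        log = self.logs[topic]
        if pos >= len(log):
            return None
        self.index[(app, topic)] = pos + 1
        return log[pos]

    def rewind(self, app, topic):
        """Send the app back to the head of the queue."""
        self.index[(app, topic)] = 0


broker = ToyBroker()
for i in range(5):
    broker.publish("sensors", f"event-{i}")

# The app handles three events, then restarts.
first_three = [broker.poll("app", "sensors") for _ in range(3)]

# Stateful restart: the broker's index lets it resume at the next event.
resumed = broker.poll("app", "sensors")

# Stateless restart: it must go back to the head and replay everything.
broker.rewind("app", "sensors")
replayed = [broker.poll("app", "sensors") for _ in range(5)]
```

In this sketch the stateful restart picks up at `event-3`, while the stateless restart re-reads all five events before it is back where it started.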
But replaying events for a long period of (real-world) time is wasteful and leaves the application even further behind the real world, which never stops!

Every application that delivers useful insights needs a model of the system generating events.
The stream may be used to deliver data to a machine learning pipeline, or to modify a relational or graph database. Ultimately, though, reasoning about the meaning of each event, and about the state changes it communicates both to an entity and to related entities in the context of the overall system, requires a stateful model, which is typically saved in a database.
The unfortunate consequence is that now applications need to deal with two sources of delay – topic queues at the broker and a database at the application layer.
Neither is predictable in its delay characteristics, making it impossible to bound response times.
If the application state is organized in a database, event streaming fits nicely into a microservices architecture.
Applications can scale by adding stateless microservice instances to consume events.
Each instance independently asks the broker for the next event – avoiding the need for a load balancer, and if an instance fails it can simply restart and carry on.
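A minimal sketch of this pull model, using only the Python standard library: a `queue.Queue` stands in for one topic queue, and each thread stands in for a stateless microservice instance. The function name `run_instances` and the instance labels are hypothetical. A real consumer would block waiting for new events rather than exit when the queue is empty.

```python
import queue
import threading


def run_instances(events, n_instances):
    """Let n_instances workers pull events from one shared topic queue."""
    topic = queue.Queue()
    for event in events:
        topic.put(event)

    handled = []                 # (instance name, event) pairs
    lock = threading.Lock()      # protects the shared result list

    def instance(name):
        while True:
            try:
                # Each instance asks for the next event on its own;
                # nothing in front of the workers assigns work to them.
                event = topic.get_nowait()
            except queue.Empty:
                return           # queue drained: this sketch simply stops
            with lock:
                handled.append((name, event))

    workers = [
        threading.Thread(target=instance, args=(f"instance-{i}",))
        for i in range(n_instances)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return handled


handled = run_instances(range(20), n_instances=4)
```

Every event is handled exactly once, but which instance handles which event is not determined in advance. That is what makes the instances interchangeable, and also why none of them can hold the application’s state.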
But applications are always stateful, and pushing the hard problems of distribution, consistency, availability, load balancing and resilience onto the database layer impacts performance and ultimately cannot deliver continuous intelligence. A database is, after all, simply a repository that reflects an application’s current model of the system; it doesn’t compute on that model.
Context is problematic too: Processing a single event can cause scores of database accesses as its knock-on effects change the states of many additional entities.
Each access adds to the end-to-end processing latency for each event.
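A back-of-the-envelope sketch makes the cost concrete. The figures below (5 ms of queue wait, 40 database accesses per event, 1 ms per round trip) are assumptions chosen only for illustration, and the model assumes the accesses happen one after another.

```python
def event_latency_ms(queue_wait_ms: float, db_accesses: int, db_rtt_ms: float) -> float:
    """Queue wait plus one round trip per sequential database access."""
    return queue_wait_ms + db_accesses * db_rtt_ms


def max_events_per_second(db_accesses: int, db_rtt_ms: float) -> float:
    """Upper bound for one instance that handles events one at a time."""
    return 1000.0 / (db_accesses * db_rtt_ms)


# Assumed figures, not measurements.
latency = event_latency_ms(queue_wait_ms=5.0, db_accesses=40, db_rtt_ms=1.0)
ceiling = max_events_per_second(db_accesses=40, db_rtt_ms=1.0)
print(latency)  # 45.0
print(ceiling)  # 25.0
```

Under these assumptions a single instance tops out at 25 events per second, and the database, not the event logic, accounts for 40 of the 45 milliseconds.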
Brokers Buffer Data

Historically, databases attempted to represent the current state of the system on disk.
But the demand for data-driven computation of insights or responses means that the idea of a database as an organized repository for state is no longer sufficient.
The application logic – analysis, learning and prediction – needs to be carried out in real time, continuously, as data flows into the application.
Time is of the essence: If the app is too slow it will lose track and deliver useless insights, and “store-then-analyze” architectures fall into this trap.
Unfortunately, topic queues managed by the broker introduce a serious problem: Applications can only process events in a topic in arrival order.
But they cannot control the number of events that are admitted to the queue for a given topic in a time window, so if they process events too slowly, they will fall behind or must drop events, losing fidelity.
Yet there is no way for the broker to communicate the arrival rate or depth of the queue to the application to enable it to appropriately scale resources.
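The effect of a burst on an application with fixed capacity can be sketched with a simple tick-by-tick model. The function name, the arrival counts and the capacity of 150 events per tick are all assumptions for illustration.

```python
def backlog_over_time(arrivals_per_tick, capacity_per_tick):
    """Queue depth left at the end of each tick for a fixed-capacity consumer."""
    depth = 0
    history = []
    for arrived in arrivals_per_tick:
        depth += arrived                          # the broker admits everything
        depth -= min(depth, capacity_per_tick)    # the app drains what it can
        history.append(depth)
    return history


# Assumed load: a two-tick burst of 300 events against a capacity of 150.
history = backlog_over_time([100, 100, 300, 300, 100, 100], capacity_per_tick=150)
print(history)  # [0, 0, 150, 300, 250, 200]
```

In this model the burst lasts two ticks, but the application is still 200 events behind two ticks after it ends. Every event that arrives in the meantime waits behind that backlog.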
How many topics does your app need? If you get it wrong, it’s difficult to fix later.
For any choice that results in more than one data source per topic, the app may end up having to read many unimportant or out-of-date events from the queue to find the information that matters.
Because of this, there’s no way to bound application response times.
If the answer is effectively “one source per topic,” the challenge becomes the broker’s limit on the number of topic queues.
Even a million topics is too few for large applications.
An application might need to consume and discard many events that are not of interest before finding one that is critical.
Since there is no way to know what might be in the queue, there’s no way to reason about the response time of the application.
Will it always find the critical event in time?

Brokers are unopinionated about the meaning of events – with no understanding of urgency, importance or irrelevance (they are just events in the queue).
So, an app may find itself processing useless or out-of-date events just to get to a critical insight – wasting time and making it impossible to deliver a real-time response.
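A small sketch of this head-of-line effect, under assumed numbers: a hypothetical topic shared by 50 sensors holds 999 routine readings ahead of one alarm, and each event is assumed to cost 2 ms to process. The event fields and the function name are illustrative only.

```python
def reads_until_critical(topic_events, is_critical):
    """Count events consumed, in arrival order, up to and including
    the first critical one. Returns None if there is none."""
    for count, event in enumerate(topic_events, start=1):
        if is_critical(event):
            return count
    return None


# Hypothetical shared topic: 999 routine readings, then one alarm.
shared_topic = [
    {"source": f"sensor-{i % 50}", "kind": "reading"} for i in range(999)
]
shared_topic.append({"source": "sensor-7", "kind": "alarm"})

reads = reads_until_critical(shared_topic, lambda e: e["kind"] == "alarm")
delay_ms = reads * 2.0  # assumed 2 ms of processing per event
print(reads, delay_ms)  # 1000 2000.0
```

The alarm is reached only after 1000 reads, two seconds late under these assumptions. Because the application cannot see how many routine events sit ahead of the next alarm, it cannot put a bound on that delay.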
Finally, brokers don’t reason across events. A series of events that, if correlated, could trigger an application-layer response is not interpreted as such.
Application-layer insights that rely on logical or mathematical relationships between data sources and their states are not the domain of the broker.
It just buffers data.
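Correlation of this kind has to live in the application, and it needs state. As a minimal sketch, the hypothetical `BurstDetector` below remembers recent event times per source and fires when a source produces a threshold number of events within a time window. The rule itself (three events in ten seconds) is an assumption for illustration, and timestamps are assumed to arrive in non-decreasing order for each source.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Hypothetical application-layer rule: fire when one source produces
    `threshold` events within `window_s` seconds."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self.recent = defaultdict(deque)   # source -> recent timestamps

    def observe(self, source, timestamp):
        """Record one event and report whether the rule now fires."""
        times = self.recent[source]
        times.append(timestamp)
        # Forget events that have fallen out of the window.
        while timestamp - times[0] > self.window_s:
            times.popleft()
        return len(times) >= self.threshold


detector = BurstDetector(threshold=3, window_s=10)
results = [
    detector.observe("pump-a", 0),
    detector.observe("pump-a", 4),
    detector.observe("pump-b", 5),    # a different source, counted separately
    detector.observe("pump-a", 9),    # third pump-a event within 10 s
    detector.observe("pump-a", 30),   # earlier events have expired
]
print(results)  # [False, False, False, True, False]
```

No single event here carries the insight. It exists only in the per-source state that the application keeps between events, which is exactly what a broker does not hold.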
As mentioned above, applications need to be built around a stateful model, which inevitably requires a database of some sort.
This makes application response times dependent on the database round-trip time (RTT), and poses another more serious challenge: If the data-driven part of the app is tasked with merely updating the database, what triggers and runs analysis on the resulting state changes? This lack of an end-to-end architecture for analysis, learning and prediction is a real roadblock to delivery of continuous intelligence.
About the Author

Simon Crosby is CTO at Swim.
Swim offers the first open core, enterprise-grade platform for continuous intelligence at scale, providing businesses with complete situational awareness and operational decision support at every moment.
Simon co-founded Bromium (now HP SureClick) in 2010 and currently serves as a strategic advisor.
Previously, he was the CTO of the Data Center and Cloud Division at Citrix Systems; founder, CTO, and vice president of strategy and corporate development at XenSource; and a principal engineer at Intel, as well as a faculty member at Cambridge University, where he led the research on network performance and control and multimedia operating systems.
Simon is an equity partner at DCVC, serves on the board of Cambridge in America, and is an investor in and advisor to numerous startups.
He is the author of 35 research papers and patents on a number of data center and networking topics, including security, network and server virtualization, and resource optimization and performance.
He holds a PhD in computer science from the University of Cambridge, an MSc from the University of Stellenbosch, South Africa, and a BSc (with honors) in computer science and mathematics from the University of Cape Town, South Africa.