The ravages of concept drift in stream learning applications and how to deal with it

By Jesús López, Data Scientist and Researcher in Machine Learning and Artificial Intelligence

The Big Data paradigm has gained momentum over the last decade because of its promise to deliver valuable insights to many real-world applications.

With the advent of this emerging paradigm comes not only an increase in the volume of available data, but also the notion of its arrival velocity: these real-world applications generate data in real time at rates faster than traditional systems can handle.

This situation leads us to assume that we have to deal with potentially infinite and ever-growing datasets that may arrive continuously (stream learning), in batches of instances or instance by instance, in contrast to traditional systems where there is free access to all historical data.

These traditional processing systems assume that data are at rest and can be accessed simultaneously.

Models based on this traditional processing do not continuously integrate new information into already constructed models but, instead, regularly reconstruct new models from scratch.

In contrast, the incremental learning [1] carried out by stream learning methods is well suited to this setting: it continuously incorporates new information into its models and traditionally aims at minimal processing time and memory.

Stream learning also presents many new challenges [2] and poses stringent conditions: only a single instance (or a small batch of instances) is provided to the learning algorithm at every time instant, processing time is very limited, memory is finite, and trained models must be available at any point of the stream.
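To make these conditions concrete, the following minimal sketch runs a test-then-train (prequential) loop in Python over a synthetic two-class stream, using scikit-learn's SGDClassifier and its partial_fit method as the incremental learner; the stream, its labelling rule and the number of instances are made up purely for illustration.

```python
# Minimal test-then-train (prequential) loop: each instance is first used to
# test the current model and only then to update it, one instance at a time.
# The synthetic stream below is a toy example chosen for illustration only.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()          # any learner exposing partial_fit would do
classes = np.array([0, 1])
hits, tested = 0, 0

for t in range(5000):
    # A single instance arrives: a 2-d feature vector and its label.
    x = rng.normal(size=(1, 2))
    y = np.array([int(x[0, 0] + x[0, 1] > 0)])
    if t > 0:                                     # the model exists after the first update
        hits += int(model.predict(x)[0] == y[0])  # test first ...
        tested += 1
    model.partial_fit(x, y, classes=classes)      # ... then train, and discard the instance

print("prequential accuracy:", hits / tested)
```

Note that no instance is ever stored: once it has been used to test and then to update the model, it is discarded, which is what keeps memory usage bounded.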

In addition, these streams of data may evolve over time and may occasionally be affected by a change in their data distribution (concept drift), which can be abrupt, gradual, etc., as in Figure 1, forcing models to learn and adapt under non-stationary conditions.
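To make the difference between drift types concrete, the short snippet below generates a one-dimensional synthetic stream whose mean changes either abruptly or gradually; the stream length, change point, transition window and distribution parameters are arbitrary values chosen only for this example.

```python
# Synthetic one-dimensional streams whose distribution changes in two ways.
import numpy as np

rng = np.random.default_rng(42)
T = 2000          # stream length
drift_at = 1000   # position of the change

# Abrupt drift: the mean jumps from 0 to 3 at a single time step.
abrupt = np.where(np.arange(T) < drift_at,
                  rng.normal(0.0, 1.0, T),
                  rng.normal(3.0, 1.0, T))

# Gradual drift: instances are drawn from the new concept with a probability
# that grows linearly over a transition window of 400 time steps.
p_new = np.clip((np.arange(T) - drift_at) / 400, 0, 1)
use_new = rng.random(T) < p_new
gradual = np.where(use_new, rng.normal(3.0, 1.0, T), rng.normal(0.0, 1.0, T))
```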

We can find many examples of real-world stream learning applications [3], such as mobile phones, industrial process controls, intelligent user interfaces, intrusion detection, spam detection, fraud detection, loan recommendation, monitoring and traffic management, among others.

In this context, the Internet of Things (IoT) has become one of the main applications of stream learning, since it continuously produces huge quantities of data in real time.

The IoT is defined as sensors and actuators connected by networks to computing systems, which monitor and manage the health and actions of connected objects or machines in real time.

Therefore, stream data analysis is becoming a standard way to extract useful knowledge from what is happening at each moment, allowing people and organizations to react quickly when problems emerge or new trends appear, helping them improve their performance.

These predictive models need to adapt to these changes (drifts) as fast as possible while maintaining good performance scores (i.e., accuracy), achieving the maximum performance while using minimal time and memory.

Otherwise, predictive models trained over these data will become obsolete and will not adapt suitably to the new distribution.

 In order to understand the impact of concept drift on these evolving data streams, let’s have a look at two examples.

Figure 2 clearly shows the first case.

The stream learning scenario starts at t=0, and during the first 100 instances, both models are pre-trained.

From then on, they start to learn incrementally by using one instance at a time.

During the stable phase (first concept, in the absence of drift), both models perform equally.

But after the drift occurs at t=1000 (abrupt drift), a new concept emerges, and the model without detection and adaptation mechanisms (solid line) starts to worsen its predictive performance (i.e., prequential accuracy).

However, the model with detection and adaptation mechanisms (dotted line) quickly forgets the old concept and learns the new one (adaptation), providing a competitive predictive performance.
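A sketch of this kind of experiment is given below. It assumes scikit-multiflow (version 0.5 or later) and its SEAGenerator, HoeffdingTreeClassifier and ADWIN classes; the pre-training size and the drift point mirror the description above, while the generator functions and stream length are arbitrary choices, and class or method names may differ in other versions or libraries.

```python
# Sketch: the same Hoeffding tree with and without a drift detection/adaptation
# mechanism, on a stream with an abrupt drift injected at t = 1000.
from skmultiflow.data import SEAGenerator
from skmultiflow.trees import HoeffdingTreeClassifier
from skmultiflow.drift_detection import ADWIN

stream = SEAGenerator(classification_function=0, random_state=1)

static_model = HoeffdingTreeClassifier()    # no detection/adaptation (solid line)
adaptive_model = HoeffdingTreeClassifier()  # reset when ADWIN signals a drift (dotted line)
detector = ADWIN()

# Pre-train both models on the first 100 instances.
X, y = stream.next_sample(100)
static_model.partial_fit(X, y, classes=[0, 1])
adaptive_model.partial_fit(X, y, classes=[0, 1])

hits_static = hits_adaptive = 0
n = 2000
for t in range(n):
    if t == 1000:
        # Abrupt drift: switch the generator to a different concept.
        stream = SEAGenerator(classification_function=2, random_state=1)
    X, y = stream.next_sample()
    pred_s = static_model.predict(X)[0]
    pred_a = adaptive_model.predict(X)[0]
    hits_static += int(pred_s == y[0])         # test first ...
    hits_adaptive += int(pred_a == y[0])
    static_model.partial_fit(X, y)             # ... then train
    adaptive_model.partial_fit(X, y)
    detector.add_element(int(pred_a != y[0]))  # feed the adaptive model's error
    if detector.detected_change():
        # Forget the old concept and start a new tree from the current instance.
        adaptive_model = HoeffdingTreeClassifier()
        adaptive_model.partial_fit(X, y, classes=[0, 1])

print("accuracy without adaptation:", hits_static / n)
print("accuracy with adaptation   :", hits_adaptive / n)
```

Resetting the tree when ADWIN fires is the simplest possible adaptation mechanism; in practice, the warning signal discussed later in the article is often used to start collecting data for the replacement model before the drift is confirmed.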

Finally, Figure 3 shows a binary classification problem in which one of the models (upper) does not adapt to the new distribution after the drift occurs, and thus is not able to correctly classify the incoming instances.

Nevertheless, the other one (lower) adapts to the new situation and then correctly classifies the incoming instances.

As previously mentioned, the adaptation mechanism is sometimes carried out in a passive manner, but models frequently need a drift detector to know the best moment to trigger their adaptation mechanism (active manner).

Therefore, drift detection and adaptation mechanisms are key ingredients in stream learning environments under evolving conditions.

   As previously mentioned, in stream learning we cannot explicitly store all past data to detect or quantify the change, so concept drift detection and adaptation become big challenges for real-time algorithms [4].

Two change management strategies are usually distinguished to deal with concept drift [5]: passive (the model is updated continuously every time new data instances are received) and active (the model is updated only when a drift is detected).

Both can be successful in practice; however, the choice of one strategy over the other is typically application-specific.

In general, a passive strategy has been shown to be quite effective in prediction settings with gradual drifts and recurring concepts, while an active strategy works quite well in settings where the drift is abrupt.

Moreover, a passive strategy is generally better suited to batch learning, whereas an active strategy has been shown to work well in online settings as well.
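The contrast between the two strategies can be summarised in a few lines of Python; here model, detector, recent_window and retrain are hypothetical placeholders standing for any incremental learner, drift detector, instance buffer and training routine, not a specific library's API.

```python
# Illustrative sketch only; all objects below are hypothetical placeholders.
def passive_update(model, x, y):
    # Passive strategy: fold every incoming instance into the model,
    # whether or not a drift has occurred.
    model.partial_fit([x], [y])

def active_update(model, detector, recent_window, x, y, retrain):
    # Active strategy: monitor the prediction errors and touch the model
    # only when the detector flags a drift.
    detector.add_element(int(model.predict([x])[0] != y))
    recent_window.append((x, y))        # buffer of recent instances
    if detector.detected_change():
        model = retrain(recent_window)  # rebuild the model on the new concept
    return model
```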

Drift detection mechanisms quantitatively characterize concept drift events by identifying change points or small-sized periods of time (windows) during which these changes may occur.

Drift detectors may return not only signals about drift occurrence, but also warning signals, which are usually conceived as the moment when a change is suspected and a new training set representing the new concept should start being collected.

Drift detection is not a trivial task because, on the one hand, sufficiently fast drift detection should be ensured to quickly replace the outdated model and to reduce the restoration time (the transient phase before returning to a stable period).

On the other hand, it is not convenient to have too many false alarms (signalling a drift when there is no real drift in the stream), because the successive application of drift handling techniques could be counterproductive.

Figure 4 shows an example of a drift detection mechanism.
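As a concrete illustration of the warning and drift signals, the sketch below feeds a synthetic stream of prediction errors, whose error rate jumps at t=1000, to scikit-multiflow's DDM detector; the error rates and change point are invented for the example, and other detectors or library versions may expose a slightly different interface.

```python
# Sketch of warning/drift signalling on a synthetic error stream whose error
# rate jumps from 10% to 50% at t = 1000 (mimicking an abrupt drift).
import numpy as np
from skmultiflow.drift_detection import DDM

rng = np.random.default_rng(7)
errors = np.concatenate([rng.random(1000) < 0.1,
                         rng.random(1000) < 0.5]).astype(int)

detector = DDM()
warning_buffer = []            # instances collected from the warning signal onwards
for t, e in enumerate(errors):
    detector.add_element(e)    # 1 = misclassified, 0 = correctly classified
    if detector.detected_warning_zone():
        warning_buffer.append(t)    # start gathering data for the new concept
    if detector.detected_change():
        start = warning_buffer[0] if warning_buffer else t
        print(f"drift signalled at t={t}; warning zone started around t={start}")
        break
```

In a full pipeline, the instances gathered during the warning zone would be used to train the model that replaces the outdated one once the drift is confirmed.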

The references provided here contain software implementations of streaming algorithms that can work in stationary and non-stationary scenarios.

We do not claim this list to be exhaustive, but it provides several opportunities for novices to get started and for established researchers to expand their contributions.

REFERENCES

[1] Losing, V., Hammer, B., & Wersing, H. (2018). Incremental on-line learning: A review and comparison of state of the art algorithms. Neurocomputing, 275, 1261–1274.

[2] Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2018). Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering.

[3] Žliobaitė, I., Pechenizkiy, M., & Gama, J. (2016). An overview of concept drift applications. In Big Data Analysis: New Algorithms for a New Society (pp. 91–114). Springer.

[4] Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2018). Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering.

[5] Ditzler, G., Roveri, M., Alippi, C., & Polikar, R. (2015). Learning in nonstationary environments: A survey. IEEE Computational Intelligence Magazine, 10, 12–25.

Bio: Dr. Jesús López (@txuslopez) is currently based in Bilbao (Spain), working at TECNALIA as a Data Scientist and Researcher in Machine Learning and Artificial Intelligence. His research topics are real-time data mining (stream learning), concept drift, continual learning, anomaly detection, spiking neural networks, and cellular automata for data mining. Not forgetting the leisure side, he also loves to go outdoors to surf all over the globe.

