The Hidden Risk of AI and Big Data

Luckily, the model is not significant.

The variables that are not significant are eliminated step by step and the model re-estimated.

This procedure is repeated until a significant model is found.

After a few steps a significant model is found with an adjusted R-squared of 0.4 and 7 variables at a significance level of at least 99%.

Again, we are regressing random noise: there is absolutely no relationship in it, yet we still find a significant model with 7 significant parameters.
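To make the procedure concrete, here is a minimal sketch of backward elimination on pure noise, using only the Python standard library. The sample sizes, the seed, the rule of dropping the single weakest variable per step, and the cutoff |t| >= 2.6 (a rough stand-in for the 99% level) are all assumptions chosen for illustration, not the setup behind the numbers quoted above.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def t_statistics(X, y, cols):
    """OLS t-statistics for the chosen columns of X (column 0 is the intercept)."""
    n, k = len(X), len(cols)
    sub = [[row[c] for c in cols] for row in X]
    xtx = [[sum(r[i] * r[j] for r in sub) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(sub, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(sub, y))
    sigma2 = rss / (n - k)  # residual variance estimate
    return [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(k)]


def backward_eliminate(X, y, t_crit=2.6):
    """Drop the weakest predictor until every remaining |t| reaches t_crit."""
    cols = list(range(len(X[0])))  # the intercept (column 0) is always kept
    while len(cols) > 1:
        t = t_statistics(X, y, cols)
        weakest = min(range(1, len(cols)), key=lambda i: abs(t[i]))
        if abs(t[weakest]) >= t_crit:
            break  # everything left passes the (assumed) significance cutoff
        del cols[weakest]
    return cols[1:]  # indices of the surviving noise predictors


random.seed(0)  # assumed seed, only for repeatability
n_obs, n_noise = 100, 30  # assumed sizes for illustration
X = [[1.0] + [random.gauss(0, 1) for _ in range(n_noise)] for _ in range(n_obs)]
y = [random.gauss(0, 1) for _ in range(n_obs)]  # the target is noise too

survivors = backward_eliminate(X, y)
print(f"{len(survivors)} of {n_noise} pure-noise predictors look 'significant'")
```

Any predictor that survives here is "significant" only because the selection procedure kept searching until something passed the test. Try changing the seed or the number of noise columns to see how often that happens.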

This is what happens if we just feed data to statistical algorithms and let them go find patterns.

Recent research has provided proof that as data sets grow larger they have to contain arbitrary correlations.

These correlations appear simply due to the size of the data, which indicates that many of the correlations will be spurious.
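A quick simulation illustrates the effect. The sketch below generates columns of independent random noise and counts how many pairs of columns are "strongly" correlated. The number of observations, the threshold of 0.5, and the seed are assumptions picked for illustration.

```python
import math
import random
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def count_strong_pairs(n_vars, n_obs=30, threshold=0.5, seed=1):
    """Count pairs of independent noise columns with |r| above the threshold."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(abs(pearson(a, b)) > threshold for a, b in combinations(cols, 2))


for n_vars in (10, 50, 200):
    print(n_vars, "variables ->", count_strong_pairs(n_vars), "strong pairs")
```

The number of pairs grows roughly with the square of the number of variables, so even a small chance of a strong correlation per pair adds up to many spurious "findings" once the data set is wide enough.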

Unfortunately, too much information tends to behave like very little information.

This is a major concern in applications where you work with high-dimensional data.

As an example, let’s say you gather sensor data from thousands of sensors on an industrial plant, and then mine these data for patterns to optimize performance.

In such cases, you could easily be fooled into acting upon phantom correlations rather than real indicators of operational performance.

This could potentially be very bad news, both financially and in terms of safe operation of the plant.

As data scientists, we might often claim that the best solution to improving our AI model is to “add more data.”

However, just “adding more data” will not magically improve the performance of your model.

What we should focus on instead is to “add more information.”

The distinction between “adding data” and “adding information” is crucial: adding more data does not equal adding more information (at least not useful and correct information).

On the contrary, by blindly adding more and more data, we risk adding data that contains misinformation, which can in turn degrade the performance of our models.

With abundant access to data, as well as the computing power to process it, this becomes increasingly important to consider.

So, should the above challenges stop you from adopting data-driven decision making? No, far from it.

Data-driven decision making is here to stay.

It will become increasingly valuable as we gain more knowledge of how best to harness all available data and information to drive performance, whether that means clicks on your website or optimal operation of an industrial plant.

However, it is important to be aware that it requires more than just hardware and lots of data to succeed.

Big data and computing power are important ingredients, but they are not the full solution.

Instead, you should understand the underlying mechanisms that connect the data.

Data will not speak for itself; we give numbers their meaning.

The Volume, Variety or Velocity of data cannot change that.

Calude, C. & Longo, G. Found Sci (2017) 22: 595: The Deluge of Spurious Correlations

blogs. com: Does big data equal big problems?

Giuseppe Longo: Mathematical Use and Abuse of Big Data

NY Times: Eight (No, Nine!) problems with big data

The Wire: The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

John Poppelars: Do numbers really speak for themselves

Original. Reposted with permission.

Bio: Vegard Flovik is Lead Data Scientist at Axbit where he solves real-world problems for various industry sectors using machine learning and advanced analytics approaches.

With a Ph.D. in Physics from the Norwegian University of Science and Technology, Vegard has researched and published in the areas of complex systems, bio-inspired computing, neuroscience, and machine learning.
