Security Vulnerabilities of Neural Networks

Or one of each?

In terms of a classification algorithm which has decision boundaries, here is an illustration of how the network is corrupted by the introduction of strategic noise.

Illustration of an adversarial attack in the feature space.

Source: Panda et al. 2019

There are two main types of attacks that are possible: white box attacks and black box attacks.

Grey box attacks, in which the attacker has only partial knowledge of the system, are well known in the cybersecurity field but are not discussed in this article.

A white box attack occurs when someone has access to the underlying network.

As a result, they will know the architecture of the network.

This is analogous to a white box penetration test of a company’s IT network; these are routinely done in the corporate world to test the defensive capabilities of a company’s IT infrastructure.

Once a hacker understands how your IT network is structured, it makes it much easier to sabotage.

This is a situation where the phrase ‘knowledge is power’ is especially true, as knowing the structure of the network helps you select the most damaging attacks to perform and unveils weaknesses specific to that structure.

White box attack.

Architecture is known and individual neurons can be manipulated.

A black box attack occurs when the attacker knows nothing about the underlying network.

In the sense of neural networks, the architecture can be considered as a black box.

Whilst it is more difficult to perform attacks on a black box, a black box is still not impervious.

The underlying procedure for black box attacks was first described by Papernot et al.


Presuming that we are able to test as many samples as we like on the network, we can develop an inferred network by passing a bunch of training samples into the network and obtaining the output.

We can then use these labeled training samples as our training data and train a new model to obtain the same output as the original model.

Once we have our new network, we can develop adversarial examples for our inferred network and then use these to perform adversarial attacks on the original model.

This method does not depend on knowing the architecture of the network, although knowing it would make the attack easier to perform.
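The substitute-model procedure described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the "black box" is a hidden linear classifier that I made up (`w_secret`, `query`), the substitute is a plain logistic regression trained with gradient descent, and the adversarial examples are crafted with a sign-of-gradient step. None of these names come from the Papernot et al. paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "black box": the attacker may call query(), but never sees w_secret.
# (Hypothetical target model, invented for this sketch.)
w_secret = np.array([1.5, -2.0])

def query(X):
    return (X @ w_secret > 0).astype(float)

# Step 1: pass attacker-chosen samples through the black box to get labels.
X_q = rng.normal(size=(500, 2))
y_q = query(X_q)

# Step 2: train a substitute model (logistic regression, gradient descent).
w_sub = np.zeros(2)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X_q @ w_sub)))
    w_sub -= 0.5 * X_q.T @ (p - y_q) / len(X_q)

# Step 3: craft adversarial examples using only the substitute's gradients.
X_test = rng.normal(size=(200, 2))
y_test = query(X_test)
p_test = 1.0 / (1.0 + np.exp(-(X_test @ w_sub)))
grad = (p_test - y_test)[:, None] * w_sub      # d(loss)/d(input) for the substitute
X_adv = X_test + 0.5 * np.sign(grad)           # perturbation budget 0.5 is an assumption

# Step 4: check whether the examples transfer to the original black box.
agreement = np.mean(query(X_test) == (X_test @ w_sub > 0))
flipped = np.mean(query(X_adv) != y_test)
print(f"substitute agrees with black box on {agreement:.0%} of inputs")
print(f"black box changes its answer on {flipped:.0%} of adversarial inputs")
```

The point of the sketch is that steps 1 to 3 never touch `w_secret`; the attacker only needs query access, which is why black box models are not safe by default.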

Physical Attacks

As you might have already realized, these attacks are all software attacks, but it is actually possible to physically attack the network.

I do not know of any implementations of this that have actually occurred, but several research studies have looked at using ‘adversarial stickers’ to try and fool the network.

Below is an example.

Physical attacks on neural networks using adversarial stickers.

Clearly, this presents a potential problem for the mass adoption of self-driving cars.

Nobody would want their car to ignore a stop sign and continue driving into another car, or a building, or a person.

Do not be too alarmed though; there are ways to protect networks against all of these types of attacks, which I will get into later.

Evasion and Poison Attacks

All of the attacks we have discussed up to now have been evasion attacks, i.e. they have involved ‘fooling’ a system.

A good example would be fooling a spam detector that guards email accounts so that you are able to get your spam emails into someone’s inbox.

Spam detectors often use some form of machine learning model (like a naive Bayes classifier) that can be used for word filtering.

If an email contains too many ‘buzzwords’ that are typically associated with spam email (given your email history as the training data), then it will be classified as spam.

However, if I know these words I can deliberately change them to make it less likely that the detector will consider my email as spam, and I will be able to fool the system.
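To make the spam example concrete, here is a minimal sketch of a word-count naive Bayes filter and an evasion attempt against it. The training messages, the equal-prior assumption, and the helper names (`train`, `log_likelihood`, `is_spam`) are all made up for illustration; a real spam filter is considerably more involved.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented for this sketch.
spam = ["win free money now", "free prize claim now", "win cash prize free"]
ham = ["meeting agenda for monday", "lunch on monday",
       "project report attached for review"]

def train(docs):
    """Count how often each word appears in a set of messages."""
    counts = Counter(word for doc in docs for word in doc.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam)
ham_counts, ham_total = train(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(words, counts, total):
    # Laplace (add-one) smoothing so unseen words do not give log(0).
    return sum(math.log((counts[w] + 1) / (total + len(vocab))) for w in words)

def is_spam(message):
    words = message.split()
    # Equal class priors are assumed, so only the likelihoods are compared.
    return (log_likelihood(words, spam_counts, spam_total)
            > log_likelihood(words, ham_counts, ham_total))

original = "win free money now"
evasive = "w1n fr3e m0ney on monday"   # obfuscated buzzwords plus harmless words

print(is_spam(original))  # True
print(is_spam(evasive))   # False
```

The obfuscated words are unknown to the filter, so they carry no evidence either way, and the two harmless words tip the balance towards a legitimate email.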

Another good example is in computer security, where machine learning algorithms are often implemented in intrusion detection systems (IDSs) or intrusion prevention systems (IPSs).

When a network packet reaches my computer that has the characteristic signature of a piece of malware, my algorithm kicks in and stops the packet before it can do anything malicious.

However, a hacker can use obfuscated code to ‘confuse’ the network so that it does not flag up a problem.

As a final example, some researchers at MIT developed a 3D printed turtle whose texture was able to fool Google’s image classifier and make it classify the turtle as a rifle.

This last one is a bit concerning given that Google’s algorithm is currently used in many industries for commercial purposes.

Poisoning attacks involve compromising the learning process of an algorithm.

This is a slightly more subtle and insidious procedure than evasion attacks, but only works on models that participate in online learning, i.e. they learn on the job and retrain themselves as new experiences (data) become available to them.

This may not sound like too big of a problem until we consider some examples of poisoning attacks.

To go back to our IDS example, these systems are constantly updated using online learning since new viruses are always being developed.

If one wishes to prevent a zero-day attack, it is necessary to give these systems the capability of online learning.

An attacker could poison the training data by injecting carefully designed samples to eventually compromise the whole learning process.

Once this happens, your IDS becomes essentially useless, and you are at much greater risk from potential viruses and will likely not even realize it.

Poisoning may thus be regarded as adversarial contamination of the training data.

The same could be said of our spam detector example.
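A poisoning attack on an online learner can be illustrated with a toy detector. Everything below is an assumption made for the sketch: the two-dimensional ‘traffic features’, the `OnlineLogisticDetector` class, the learning rate, and the number of poisoned samples. It is not how any particular IDS works.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(label, n):
    """Toy 2-D features: benign (0) clusters near (-2, -2), malicious (1) near (2, 2)."""
    center = -2.0 if label == 0 else 2.0
    return center + rng.normal(size=(n, 2))

class OnlineLogisticDetector:
    """Logistic regression that is updated one sample at a time (online learning)."""

    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)

    def update(self, x, label):
        p = 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))
        self.w += self.lr * (label - p) * x
        self.b += self.lr * (label - p)

detector = OnlineLogisticDetector(dim=2)

# Normal operation: the detector learns from correctly labeled traffic.
for i in range(200):
    label = i % 2
    detector.update(sample(label, 1)[0], label)

X_malicious = sample(1, 200)
detected_before = detector.predict(X_malicious).mean()

# Poisoning: the attacker feeds malicious-looking samples labeled as benign.
for x in sample(1, 300):
    detector.update(x, 0)

detected_after = detector.predict(X_malicious).mean()
print(f"malicious traffic detected before poisoning: {detected_before:.0%}")
print(f"malicious traffic detected after poisoning:  {detected_after:.0%}")
```

Because the model keeps trusting whatever labeled data it is fed, a steady stream of mislabeled samples is enough to teach it that the malicious region is benign.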

This section has given us a broad overview of the kinds of problems we might expect.

In the next section, we will look more closely at how to deal with white and black box attacks, the most common types of adversarial attacks, and the defenses one can use in their neural networks to ameliorate these security concerns.

Specific Attack Types

Ian Goodfellow (the creator of the generative adversarial network, and the one who coined the term) published one of the first papers looking at potential security vulnerabilities in neural networks.

He decided to call this ‘adversarial machine learning’, which is relatively easy to confuse with the generative adversarial network.

Just to make it clear, they are not the same.

Ian described the first kind of attack, the Fast Gradient Sign Method (FGSM).

This manipulates the sharp decision boundaries used by the classifier by the introduction of strategic noise, as we have been discussing up to now.
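In Goodfellow et al. [3], the adversarial input is built as x_adv = x + ε · sign(∇x J(θ, x, y)), where J is the training loss and ε controls the size of the perturbation. Below is a minimal sketch of that one step for a logistic-regression model, where the input gradient can be written by hand. The weights, the input, and ε = 0.3 are made-up values for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One FGSM step: x + eps * sign(dJ/dx) for a logistic-regression model."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w          # gradient of the cross-entropy loss w.r.t. the input
    return x + eps * np.sign(grad_x)

# Hypothetical toy model and input.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.4, 0.1, 0.2])
y = 1.0                           # true label

x_adv = fgsm(x, y, w, b, eps=0.3)

print(sigmoid(x @ w + b) > 0.5)      # True: the clean input is classified as class 1
print(sigmoid(x_adv @ w + b) > 0.5)  # False: the perturbed input is misclassified
```

Every feature moves by exactly ε in whichever direction increases the loss, which is why the perturbation can stay small per feature and still flip the prediction.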

When your neural network suddenly thinks everything is an ostrich… Source: Szegedy et al. (2013)

Post-Goodfellow 2015 attacks

There are a number of new attack vectors that have been developed over the past couple of years, the main ones being:

JSMA (Jacobian-based Saliency Map Attack) [Papernot et al., 2016]
C&W [Carlini and Wagner, 2016]
Step-LL [Kurakin et al., 2017]
I-FGSM [Tramer et al., 2018]

To prevent this article from stretching on for too long, I will not go into the specific details of each algorithm.

However, feel free to comment if you would like me to cover these in more detail in a future article.

Network Defence

There are a number of methods that have been developed to defend neural networks from the various types of attack vectors that we have discussed.

Adversarial Training

One of the most effective ways of defending against adversarial attacks is adversarial training.

That is, you actively generate adversarial examples, assign them their correct labels, and add them to the training set.

You can then train the new network on this updated training set and it will help to make your network more robust to adversarial examples.

Adversarial training in action.
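Here is a rough sketch of adversarial training on a toy problem, to show the mechanics rather than a production recipe. The dataset is invented: one noisy feature that survives the perturbation and one clean but tiny feature that the attacker can flip. The attack budget `EPS`, the FGSM-style attack, and the logistic-regression model are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
EPS = 1.0  # attack budget per feature (assumed for this sketch)

def make_data(n):
    y = rng.integers(0, 2, size=n).astype(float)
    s = 2 * y - 1
    x1 = 2.0 * s + 2.0 * rng.normal(size=n)   # noisy feature, survives the attack
    x2 = 0.5 * s + 0.1 * rng.normal(size=n)   # clean but tiny feature, flipped by EPS
    return np.column_stack([x1, x2]), y

def predict_proba(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def fgsm(X, y, w, b, eps=EPS):
    grad = (predict_proba(X, w, b) - y)[:, None] * w   # d(loss)/d(input)
    return X + eps * np.sign(grad)

def train(X, y, adversarial, steps=2000, lr=0.5):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        if adversarial:
            # Craft adversarial copies against the current model, keep the
            # true labels, and add them to the training set.
            Xb = np.vstack([X, fgsm(X, y, w, b)])
            yb = np.concatenate([y, y])
        else:
            Xb, yb = X, y
        err = predict_proba(Xb, w, b) - yb
        w -= lr * Xb.T @ err / len(yb)
        b -= lr * err.mean()
    return w, b

def accuracy(X, y, w, b):
    return np.mean((predict_proba(X, w, b) > 0.5) == y)

X_train, y_train = make_data(1000)
X_test, y_test = make_data(1000)

w_std, b_std = train(X_train, y_train, adversarial=False)
w_adv, b_adv = train(X_train, y_train, adversarial=True)

clean_std = accuracy(X_test, y_test, w_std, b_std)
robust_std = accuracy(fgsm(X_test, y_test, w_std, b_std), y_test, w_std, b_std)
robust_adv = accuracy(fgsm(X_test, y_test, w_adv, b_adv), y_test, w_adv, b_adv)
print(f"standard model:    clean {clean_std:.0%}, under attack {robust_std:.0%}")
print(f"adversarial model: under attack {robust_adv:.0%}")
```

In this toy setting the standard model leans heavily on the fragile feature, so it collapses under attack, while the adversarially trained model learns to rely on the feature the attacker cannot flip. The usual price is some loss of accuracy on clean data.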

Smooth decision boundaries

Regularization is always the answer.

In this case, it is the derivatives with respect to the input data that we are regularizing.

This acts to smoothen the decision boundaries between classes and makes it less easy to manipulate network classification using strategic noise injection.

(Left) Hard decision boundary, (right) smoothened decision boundary.
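One common way to write down this kind of regularizer is to add the squared norm of the input gradient to the loss. The notation below is my own assumption (J for the original loss, θ for the network parameters, λ for the regularization strength) and is not taken from a specific paper in the references.

```latex
\tilde{J}(\theta; x, y) = J(\theta; x, y) + \lambda \, \bigl\lVert \nabla_{x} J(\theta; x, y) \bigr\rVert_2^2
```

A large input gradient means that a tiny change in x produces a large change in the loss, which is exactly what strategic noise exploits, so penalizing it with λ > 0 pushes the network towards flatter, smoother behaviour around the training points.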

Mixup

Mixup is a simple procedure that seems a little bit odd at first: it involves mixing two training examples by some factor λ, which is between zero and one, and assigning non-integer classification values to the resulting training samples.

This acts to augment the training set and reduces the overconfident classification tendencies of networks.

It essentially diffuses and smoothens the boundaries between classes and reduces the reliance of classification on a small number of neuron activation potentials.
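A minimal sketch of the mixing step looks like this. In the mixup paper [2], λ is drawn from a Beta(α, α) distribution; the function name `mixup_batch`, the toy data, and the choice α = 0.2 are assumptions made for this example.

```python
import numpy as np

def mixup_batch(X, Y, alpha=0.2, rng=None):
    """Mix each example with a randomly chosen partner from the same batch.

    X: (n, d) inputs; Y: (n, k) one-hot labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # mixing factor between 0 and 1
    perm = rng.permutation(len(X))          # partner index for every example
    X_mix = lam * X + (1 - lam) * X[perm]
    Y_mix = lam * Y + (1 - lam) * Y[perm]   # soft, non-integer labels
    return X_mix, Y_mix, lam

# Hypothetical toy batch: four 2-D points, two classes.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
Y = np.eye(2)[[0, 1, 0, 1]]

X_mix, Y_mix, lam = mixup_batch(X, Y, alpha=0.2, rng=np.random.default_rng(0))
print(lam)
print(Y_mix)
```

Each mixed label still sums to one, so it can be used directly with a cross-entropy loss, and the network is trained to behave linearly in between training examples.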

CleverHans

For those of you unfamiliar with the tale of Clever Hans, I recommend you give it a Google.

The essential story is of a horse who supposedly was able to do basic arithmetic by stamping his feet a given number of times.

However, it was later discovered that the horse was actually cheating and responding to the verbal and visual clues of the surrounding crowd.

In the same spirit, CleverHans is a Python library that has been developed to benchmark machine learning systems’ vulnerability to adversarial examples.

It is hosted on GitHub as tensorflow/cleverhans, an adversarial example library for constructing attacks, building defenses, and benchmarking both.

If you are developing a neural network and want to see how robust it is, test it out with CleverHans and you will get an idea of its level of vulnerability.

This is analogous to using Burp Suite to test for code injection vulnerabilities.

Penetration Testing

As with any form of cybersecurity, you can always pay someone to hack you and see how much damage they do.

Naturally, you can make them sign a document which specifies limits on what they are allowed to do.

This gives you an idea of your level of vulnerability to an actual cyber attack.

I have no knowledge of whether penetration testing firms offer these kinds of services at the current time, but it would not surprise me.

Final Comments

I hope you enjoyed this article and now have a better understanding of this interesting new subfield of cybersecurity, which involves compromising machine learning models through various attack vectors.

This field is still very new and there is a plethora of research papers to be found, most of which are available on arXiv and are therefore free for the public to view.

I recommend you give a few of them a read — the articles in the references are a good place to start.

References

[1] Panda, P., Chakraborty, I., and Roy, K. Discretization based Solutions for Secure Machine Learning against Adversarial Attacks.

[2] Zhang, H., Cisse, M., Dauphin, Y., and Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization.

[3] Goodfellow, I., Shlens, J., and Szegedy, C. Explaining and Harnessing Adversarial Examples.

[4] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z., and Swami, A. Practical Black-Box Attacks against Machine Learning.

[5] Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z., and Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the 1st IEEE European Symposium on Security and Privacy, pp. 372–387, 2016.

[6] Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial Machine Learning at Scale.

[7] Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., and McDaniel, P. Ensemble Adversarial Training: Attacks and Defenses.
