Adversarial Robustness

How IBM Wants to Defend Neural Networks Against Attacks by Other Neural Networks

Jesus Rodriguez

Generative adversarial networks (GANs) are one of the most active areas of research in the deep learning ecosystem.

Conceptually, GANs are a form of unsupervised learning in which two neural networks build knowledge by competing against each other in a zero-sum game.

While GANs are a great mechanism for knowledge acquisition, they can also be used to generate attacks against deep neural networks.

In a well-known example, a GAN-based attacker can introduce imperceptible changes into images to trick a classification model.

The topic of evaluating the robustness of models against adversarial attacks has been a top priority for AI powerhouses such as OpenAI and Google.

A bit under the radar, IBM has been doing a lot of work to advance the research and implementation of adversarial attacks and defenses in deep neural networks.

Just last week, IBM AI researchers published two different research papers in the area of GAN protection.

Today, I would like to explore some of IBM’s recent work on protecting neural networks against adversarial attacks and discuss its relevance to modern deep learning implementations.

White-Box vs. Black-Box Attacks

Adversarial attacks against deep neural networks can be classified into two main groups, white-box and black-box, based on the attacker’s knowledge of the model’s training policy.

White-box adversarial attacks describe scenarios in which the attacker has access to the underlying training policy network of the target model.

Research in this area has found that even introducing small perturbations in the training policy can drastically affect the performance of the model.

Black-box adversarial attacks describe scenarios in which the attacker does not have complete access to the policy network.

In the AI research literature, black-box attacks are classified into two main groups:

1) The adversary has access to the training environment and knowledge of the training algorithm and hyperparameters.

It knows the neural network architecture of the target policy network, but not its random initialization.

This setting is referred to as transferability across policies.

2) The adversary additionally has no knowledge of the training algorithm or hyperparameters.

This setting is referred to as transferability across algorithms.

A simpler way to think about white-box and black-box adversarial attacks is whether the attacker is targeting a model during training time or after it is deployed.

Despite that simple distinction, the techniques used to defend against white-box and black-box attacks are fundamentally different.

Recently, IBM has been exploring both attack models, from both a research and an implementation standpoint.

Let’s take a look at some of IBM’s recent efforts in adversarial attacks.

Adversarial Robustness Toolbox

The Adversarial Robustness Toolbox (ART) is one of the most complete resources for evaluating the robustness of deep neural networks against adversarial attacks.

Open sourced by IBM a few months ago, ART incorporates techniques to defend against adversarial attacks in deep neural networks written in TensorFlow, Keras, PyTorch and MXNet, although support for more deep learning frameworks should be added shortly.

ART operates by examining and clustering the neural activations produced by a training dataset and trying to discriminate legitimate examples from those likely manipulated by an adversarial attack.
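As a rough illustration of that clustering idea (not ART’s actual implementation), one could collect penultimate-layer activations for the training set and look for an anomalously small cluster; the array `acts` below is a random stand-in for those activations:

```python
import numpy as np
from sklearn.cluster import KMeans

# stand-in for penultimate-layer activations of the training set,
# shape (n_samples, n_features); in practice these would come from
# a forward pass of the trained model
acts = np.random.rand(1000, 128)

labels = KMeans(n_clusters=2, n_init=10).fit_predict(acts)
sizes = np.bincount(labels)

# a disproportionately small cluster is a candidate set of manipulated
# (e.g. poisoned) training examples worth inspecting
suspect_idx = np.where(labels == np.argmin(sizes))[0]
```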

The current version of ART focuses on two types of adversarial attacks: evasion and poisoning.

For each type of adversarial attack, ART includes defense methods that can be incorporated into deep learning models.

Developers can start using ART via its Python SDK, which doesn’t require any major modifications to the architecture of the deep neural network.
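A minimal sketch of that workflow is shown below; it assumes an already trained Keras `model` and a batch of test images `x_test` scaled to [0, 1], and the exact module paths and argument names vary across ART releases:

```python
import numpy as np
from art.estimators.classification import KerasClassifier  # path differs in older ART releases
from art.attacks.evasion import FastGradientMethod

# `model` and `x_test` are assumed to exist: a trained Keras classifier
# and a batch of input images with pixel values in [0, 1]
classifier = KerasClassifier(model=model, clip_values=(0.0, 1.0))

# craft evasion examples with FGSM and compare clean vs. adversarial predictions
attack = FastGradientMethod(classifier, eps=0.1)
x_adv = attack.generate(x=x_test)

clean_preds = np.argmax(classifier.predict(x_test), axis=1)
adv_preds = np.argmax(classifier.predict(x_adv), axis=1)
flipped = np.mean(clean_preds != adv_preds)  # fraction of predictions changed by the attack
```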

AutoZOOM

A large percentage of the adversarial attacks against deep neural networks are produced in a white-box setting in which the attacker has access to the network’s training policy.

Black-box attacks are both more challenging to implement and more difficult to defend against.

For instance, a black-box attack against an image classification model typically needs to execute a large number of queries against the model in order to find the correct adversarial images.

Many times, those queries cause a performance degradation in the model that has nothing to do with the attack itself but with the network’s poor design for high-volume queries.

For instance, a black-box adversarial attack of this kind can take more than a million queries to find a suitable adversarial image.

That level of computational resources is rarely available to attackers.

IBM’s Autoencoder-based Zeroth Order Optimization Method (AutoZOOM) is a technique for creating more efficient black-box attacks.

Initially published in a research paper, AutoZOOM also includes an open source implementation that can be used by developers across several deep learning frameworks.

The goal of AutoZOOM is to accelerate the efficiency of queries targeting adversarial examples, and it accomplishes that using two main building blocks:

i. An adaptive random gradient estimation strategy to balance query counts and distortion.

ii. An autoencoder that is either trained offline with unlabeled data, or a bilinear resizing operation for acceleration.

To achieve i, AutoZOOM features an optimized and query-efficient gradient estimator, which has an adaptive scheme that uses few queries to find the first successful adversarial perturbation and then uses more queries to fine-tune the distortion and make the adversarial example more realistic.
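The sketch below illustrates the general idea behind such a zeroth-order, random-direction gradient estimate; it is not IBM’s implementation, and the names `loss_fn`, `num_queries` and `beta` are illustrative:

```python
import numpy as np

def random_gradient_estimate(loss_fn, x, num_queries=1, beta=1e-3):
    """Estimate the gradient of a scalar attack loss using only model queries
    (no backpropagation). More queries lower the variance of the estimate."""
    d = x.size
    grad = np.zeros_like(x)
    f0 = loss_fn(x)
    for _ in range(num_queries):
        u = np.random.randn(*x.shape)
        u /= np.linalg.norm(u)                        # random unit direction
        grad += (loss_fn(x + beta * u) - f0) / beta * u
    return (d / num_queries) * grad
```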

To achieve ii, AutoZOOM implements a technique called “dimension reduction” to reduce the complexity of finding adversarial examples.

The dimension reduction can be realized by an offline trained autoencoder to capture data characteristics or a simple bilinear image resizer which does not require any training.
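A toy sketch of the bilinear-resizer variant, assuming TensorFlow is available (the 32x32 latent size and 299x299 target resolution are arbitrary choices, not values from the paper):

```python
import tensorflow as tf

# search for the adversarial perturbation in a small space, then upscale it
# bilinearly to the classifier's input resolution (no training required,
# unlike the autoencoder variant)
x_orig = tf.random.uniform([1, 299, 299, 3])          # stand-in input image
small_delta = tf.Variable(tf.zeros([1, 32, 32, 3]))   # reduced search space
full_delta = tf.image.resize(small_delta, [299, 299], method="bilinear")
x_adv = tf.clip_by_value(x_orig + full_delta, 0.0, 1.0)
```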

The initial tests of AutoZOOM showed that the method is able to generate black-box adversarial examples with far fewer queries than traditional methods.

CNN-Cert

As you can probably tell from the examples, convolutional neural networks (CNNs) used for image classification are among the top targets of adversarial attacks.

However, many of the current defense techniques are not optimized for CNN architectures.

Created by AI researchers from the Massachusetts Institute of Technology (MIT) and IBM, CNN-Cert is a framework for certifying the robustness of CNNs against adversarial attacks.

The key innovation in CNN-Cert is deriving explicit network output bounds by considering the input/output relations of each building block.

The activation layers can use general activation functions other than ReLU.
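As a much simpler cousin of that idea (plain interval bound propagation rather than CNN-Cert’s layer-wise linear bounds), the sketch below bounds the output of a dense+ReLU block when each input may move by ±eps; all names and values are illustrative:

```python
import numpy as np

def dense_relu_bounds(W, b, lower, upper):
    """Elementwise output bounds of ReLU(W @ x + b) for inputs in [lower, upper]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    pre_lo = W_pos @ lower + W_neg @ upper + b   # smallest possible pre-activation
    pre_hi = W_pos @ upper + W_neg @ lower + b   # largest possible pre-activation
    return np.maximum(pre_lo, 0), np.maximum(pre_hi, 0)  # ReLU is monotone

x = np.random.rand(10)
eps = 0.01
W, b = np.random.randn(5, 10), np.zeros(5)
lo, hi = dense_relu_bounds(W, b, x - eps, x + eps)  # certified output range under the perturbation
```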

The approach was demonstrated to be about 11 to 17 times more efficient than traditional adversarial robustness certification methods.

CNN-Cert is able to handle various architectures including convolutional layers, max-pooling layers, batch normalization layers and residual blocks, as well as general activation functions such as ReLU, tanh, sigmoid and arctan.

As you can see, IBM seems to be really committed to advancing the conversation about adversarial attacks in deep neural networks.

Efforts like ART, AutoZOOM and CNN-Cert are among the most creative recent efforts in adversarial techniques.

Hopefully, we will see some of these implementations included in mainstream deep learning frameworks soon.
