At the CHES 2019 conference, eShard will be on stage to present new Side-Channel analysis methods based on Deep Learning. The talk will focus on the paper “Non-Profiled Deep Learning-based Side-Channel attacks with Sensitivity Analysis”, published in the second issue of TCHES 2019.
In this post we present the digest of the paper ahead of the talk.
Courtesy of Besir Kurtulmus - Algorithmia
Side-Channel attacks are usually divided into two categories: Non-Profiled and Profiled attacks.
Profiled attacks are based on data classification and are usually performed using Machine Learning techniques such as Support Vector Machines or Templates. In recent years, the trend in Machine Learning has shifted towards Deep Learning, which today demonstrates great performance in many Machine Learning applications, for instance data classification.
Quite naturally, following this trend, the Side-Channel community has started to study the potential applications of Deep Learning to Side-Channel analysis. The most natural application is to use it to perform Profiled Side-Channel attacks. Recent publications have shown the benefits of using Deep Learning for such attacks, and that it usually outperforms previous attack methods. One advantage of Deep Learning for Side-Channel analysis is that the neural network architecture can be adapted to the attack context. One example studied in the literature is that Convolutional Neural Networks are particularly efficient at targeting de-synchronized traces.
However, so far research efforts have mainly focused on applications of Deep Learning to Profiled attacks, where a profiling device is available. The starting point of this research was to study how Deep Learning and neural networks could be used in non-profiled scenarios, where no profiling device is available.
Most non-profiled attacks follow a similar strategy. The attacker starts by collecting side-channel traces corresponding to known plaintext or ciphertext values. The attacker then guesses parts of the secret key in order to predict sensitive intermediate values of the targeted cryptographic algorithm manipulated by the device. Finally, the attacker uses a statistical distinguisher to compare the predictions for each key guess with the side-channel traces, in order to discriminate the correct key hypothesis from the other candidates.
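This generic strategy can be sketched in a few lines of numpy, here using Pearson correlation (classic CPA) as the statistical distinguisher on simulated traces. The 4-bit S-box (PRESENT's, standing in for a real cipher), the Hamming-weight leakage model, and all sizes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4-bit S-box (the PRESENT S-box, standing in for a real cipher's S-box)
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])
HW = np.array([bin(v).count("1") for v in range(16)])  # Hamming weight table

TRUE_KEY, N, T, LEAK_AT = 0xB, 1000, 20, 5
pt = rng.integers(0, 16, N)                      # known plaintext nibbles
traces = rng.normal(0.0, 0.5, (N, T))            # Gaussian noise everywhere
traces[:, LEAK_AT] += HW[SBOX[pt ^ TRUE_KEY]]    # HW leakage of the S-box output

def cpa(traces, predictions):
    """Pearson correlation between the predictions and every trace sample."""
    x = predictions - predictions.mean()
    y = traces - traces.mean(axis=0)
    return (x @ y) / (np.sqrt((x ** 2).sum()) * np.sqrt((y ** 2).sum(axis=0)))

# For each key guess, predict the intermediate values and compare them with
# the traces; the correct guess yields the strongest correlation peak.
scores = [np.abs(cpa(traces, HW[SBOX[pt ^ g]])).max() for g in range(16)]
recovered = int(np.argmax(scores))
```

With the leakage simulated this strongly, the correct nibble stands out clearly; on real traces the same loop simply needs many more traces.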
The first idea presented in the paper is to use Deep Learning as a non-profiled distinguisher. To do so, one first chooses a neural network architecture. Then, for each key guess, a Deep Learning training is performed with the traces as data to classify and the predictions as the corresponding labels. Finally, the efficiency of the trainings is compared in order to discriminate the correct key value. The attack, called Differential Deep Learning Analysis (DDLA), is summarized in the diagram below.
For the correct key guess, the intermediate value predictions match the set of traces, while for the other key hypotheses the predictions are incorrect. Therefore, one can expect the training performed with the correct key value to be more successful than the trainings for the other candidates. To compare the efficiency of the trainings, the first method proposed in the paper is to observe classic Deep Learning metrics such as the loss and accuracy of the trainings. In successful cases, the loss and accuracy of the training performed with the correct key are better than the metrics of the other candidates, revealing the correct key value.
Example of DDLA with 3 key guesses. The good guess clearly leads to better accuracy and loss.
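The per-guess training loop can be sketched in plain numpy. Here a linear softmax classifier stands in for the neural network, the labels are Hamming weights of the predicted S-box output, and the toy 4-bit S-box, trace model and sizes are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])  # toy 4-bit S-box
HW = np.array([bin(v).count("1") for v in range(16)])

TRUE_KEY, N, T, LEAK_AT = 0xB, 2000, 20, 5
pt = rng.integers(0, 16, N)
traces = rng.normal(0.0, 0.3, (N, T))
traces[:, LEAK_AT] += HW[SBOX[pt ^ TRUE_KEY]]            # simulated HW leakage
X = (traces - traces.mean(axis=0)) / traces.std(axis=0)  # standardize samples

def train_and_score(X, labels, n_classes=5, epochs=300, lr=1.0):
    """Train a softmax classifier by gradient descent; return its final loss."""
    n, t = X.shape
    W, b = np.zeros((t, n_classes)), np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                                  # cross-entropy gradient
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return -np.log(P[np.arange(n), labels] + 1e-12).mean()

# One training per key guess: only the correct guess produces labels that
# are consistent with the traces, hence a noticeably lower final loss.
losses = [train_and_score(X, HW[SBOX[pt ^ g]]) for g in range(16)]
recovered = int(np.argmin(losses))
```

A real DDLA would use an MLP or CNN and more traces, but the structure is the same: one training per guess, then a comparison of the training metrics.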
With this method, it is therefore possible to use Deep Learning to perform non-profiled attacks when no profiling device is available. The results presented in the paper show that the known advantages of Deep Learning for Side-Channel are still applicable with this approach. For instance, one can successfully use Deep Learning and neural networks to perform high-order attacks and use CNNs to attack de-synchronized traces even in a non-profiled context.
Locating the leakages of sensitive values in the traces is an important task in side-channel analysis. Information about the leakage locations can help in understanding the implementation and can be exploited, for instance, to perform high-order attacks.
The second idea presented in the paper is to use techniques based on Sensitivity Analysis to reveal information about the leakage locations in the traces. The sensitivity analysis of a neural network is the study of how sensitive the network output is with respect to some of the network parameters. One application in image classification is, for instance, to compute so-called saliency maps, which indicate which pixels of an image contribute the most to its classification. There are different techniques to perform sensitivity analysis; the most common approach is to study the derivatives of the network output with respect to the selected parameters.
Example of sensitivity analysis in image classification.
In the paper, we propose to study the sensitivity of the neural network with respect to the trace samples during the training, in order to locate the leakage areas in the traces. The paper presents results with several methods. The first approach is to observe the gradients of the first-layer weights to reveal the leakage locations. The second approach is to directly observe the partial derivatives with respect to the input samples. The first approach is applicable to Multi-Layer Perceptron networks, while the second is a generic method applicable to any type of neural network.
Example of Sensitivity Analysis with 3 guesses. For the good guess it reveals the location of the leakage (t=30).
The weights directly connected to the leakage samples have larger derivatives on average, as they have a bigger impact on the loss minimization during the training. Therefore, by observing the derivatives with respect to the first-layer weights or with respect to the input samples, it is possible to locate the leakage areas in the traces.
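Both sensitivity maps can be sketched on the same kind of toy model: a softmax classifier trained with the correct key, for which the weight gradients and the input gradients can be written in closed form. The simulated traces, toy S-box and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])  # toy 4-bit S-box
HW = np.array([bin(v).count("1") for v in range(16)])

TRUE_KEY, N, T, LEAK_AT = 0xB, 2000, 20, 5
pt = rng.integers(0, 16, N)
traces = rng.normal(0.0, 0.3, (N, T))
traces[:, LEAK_AT] += HW[SBOX[pt ^ TRUE_KEY]]            # leakage at sample 5
X = (traces - traces.mean(axis=0)) / traces.std(axis=0)

labels = HW[SBOX[pt ^ TRUE_KEY]]                         # correct-key labels
Y = np.eye(5)[labels]
W, b = np.zeros((T, 5)), np.zeros(5)
weight_sens = np.zeros(T)
for _ in range(300):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    G = (P - Y) / N
    gW = X.T @ G                       # gradient w.r.t. first-layer weights
    weight_sens += np.abs(gW).sum(axis=1)   # method 1: accumulate |dL/dW|
    W -= 1.0 * gW
    b -= 1.0 * G.sum(axis=0)

# Method 2: derivative of the loss w.r.t. each input sample of each trace,
# averaged over the traces (for this model, dL/dx = (P - Y) @ W.T).
logits = X @ W + b
logits -= logits.max(axis=1, keepdims=True)
P = np.exp(logits)
P /= P.sum(axis=1, keepdims=True)
input_sens = np.abs((P - Y) @ W.T).mean(axis=0)
```

Both maps peak at the leaking sample: the weights tied to it do the useful work during loss minimization, so their gradients, and the gradients flowing back to that input sample, dominate.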
It is important to note that even though the technique is used in a non-profiled context in the paper, the exact same method can also be applied during classic Profiled Deep Learning attacks to locate leakages.
The paper presents results of successful attacks performed against masked AES implementations using Differential Deep Learning Analysis. The main interest of using DDLA to target masked implementations is that the attacker does not need to adapt the attack process to the implementation.
When using methods like CPA to target masked implementations, the attacker must adapt the attack strategy to the implementation. For example, to perform a high-order CPA, the attacker must consider how many masks are involved and apply specific pre-processing methods to combine the leakages of the different shares together.
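The kind of pre-processing a second-order CPA requires can be sketched on simulated first-order masked leakage: the two shares' samples are recombined with a centered product before correlating. The masking model, toy S-box, noise level and trace count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])  # toy 4-bit S-box
HW = np.array([bin(v).count("1") for v in range(16)])

TRUE_KEY, N = 0xB, 5000
pt = rng.integers(0, 16, N)
mask = rng.integers(0, 16, N)                  # fresh Boolean mask per trace
# One leaking sample per share: the mask and the masked S-box output
share1 = HW[mask] + rng.normal(0.0, 0.2, N)
share2 = HW[SBOX[pt ^ TRUE_KEY] ^ mask] + rng.normal(0.0, 0.2, N)

# Second-order pre-processing: the centered product recombines the shares
# into a quantity that depends on the unmasked intermediate value.
combined = (share1 - share1.mean()) * (share2 - share2.mean())

def corr(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

scores = [abs(corr(combined, HW[SBOX[pt ^ g]])) for g in range(16)]
recovered = int(np.argmax(scores))
```

Note that this only works because the attacker knew there were exactly two shares and which samples to combine; with more masks, or unknown leakage locations, the pre-processing must be redesigned.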
With DDLA, the attacker does not need to adapt the attack strategy to the implementation, nor to apply any pre-processing. The exact same attack process can be applied to both non-protected and masked implementations: the neural network automatically adapts itself to the context during the training.
Another advantage is that sensitivity analysis also works with masked implementations and can highlight the locations of the mask leakages in the traces.
Against masked implementations, Sensitivity Analysis can also reveal the location of the mask leakages (here t=10).
Since one does not need to adapt the attack strategy to the implementation, and since sensitivity analysis can reveal mask leakages, DDLA is an interesting attack alternative, especially in black-box scenarios where the details of the implementation are not known. The results of the attack can reveal information about the secret key as well as about the implementation, such as whether the implementation is masked, the number of masks and their locations in the traces.
The paper introduces two new side-channel analysis methods:
- Differential Deep Learning Analysis (DDLA), which uses Deep Learning as a distinguisher to perform non-profiled attacks;
- Sensitivity Analysis, which reveals the leakage locations in the traces.
Both techniques were successfully applied on multiple datasets, including the ASCAD dataset and datasets collected with the ChipWhisperer. The experiments show that the methods work in practice and that they provide an interesting alternative for performing non-profiled attacks, especially for high-order attacks and against de-synchronized traces.
The main drawback of DDLA is the complexity of the attack. Deep Learning attacks are costly in general, and in this case a Deep Learning training must be performed for each key guess, which further increases the complexity. The paper presents a series of interesting results, but further work will be needed to study the practicability of the methods in more challenging scenarios.
This year again, eShard is glad to sponsor the CHES conference. We will be in Atlanta from the 25th to the 28th of August to present our latest content on Deep Learning and Side-Channel analysis, as well as the latest features of the esDynamic platform. If you attend CHES 2019, come meet our team!