Chip Security Testing

Binary Security Analysis

Resources

Chip Security

Optimized White-Box Tracing for Efficient Side-Channel Attacks

15 min read

Edit by Guillaume Vinet • Aug 29, 2024

Analyzing the strength of a White-Boxed Cryptography (WBC) Algorithm turned to a point when Bos et al. showed that the secret key can be fully recovered by tracing the corresponding binary at CHES 2016.

Examining the data traces resulting from the binary analysis turns out to be very powerful as it may lead to the secret with a minimum software reverse engineering effort and mathematical skills. For this reason, tracing is now part of the minimum toolkit that WBC analysts should exploit as part of the state-of-the-art.

Tracing has also shown some practical difficulties. The most significant issue concerns the data size as a single trace may reach dozens of gigabytes. As a result, the processing is tedious and takes time, storage and computing resources that an analyst could spend in getting in-depth analysis.

For this reason, eShard has upgraded its analysis tool, esReverse, to optimize the data tracing and therefore overcome the data size issue. This post shows how the data traces can be tremendously shrunk, a deeper insight is available in our WhibOx 2019 presentation: “Optimize Binary Tracing: Example with an ECDSA implementation”. The aim is to give more ability to the analyst for further testing any WBC implementation and covering larger space of attack techniques.

esReverse SVA Simulated Vulnerability Analysis module overview.png

Developers should take into account these new tool capabilities when designing their protections in order to get the right confidence level about their resistance to attacks. Similarly, security analysts should integrate these new techniques when evaluating a WBC solution.

What is White-Box Cryptography?

White-Box Cryptography enables to perform cryptographic operations without relying on a hardware component, but only on its software implementation. All algorithms can be implemented in White-Box Cryptography:

either symmetric such as AES, (T)DES, SM4
or asymmetric such as RSA, ECDSA or [SM2](SM2 and SM4 are chinese algorithms, cnnic.com website is sometimes not accessible).

Even with a full access to the White-Box, the key extraction must be difficult.

The goal is to keep the secret key safe when being executed in an untrusted environment, where it is reasonable to assume that an attacker has a total access to the software binary. Doing so, the attacker:

is able to execute the cryptography operation by choosing the plaintext to get the ciphertext,
is able to instrument the binary with a dynamic binary instrumentation (DBI) tool to monitor internal states or even tamper with the execution to get faulty ciphertexts.

A DBI tool enabling to trace a White-Box will be named a tracer, while a tool injecting faults will be named a faulter.

How to get access to a White-Box binary?

The WBC is used on mobile applications or devices to safeguard media content with Digital Right Management (DRM) or to protect payment transactions.

Embedded in an application, the White-Box binary must be extracted first from the whole mobile application binary. In the case of a mobile application on an Android platform, the binary is the Android Package (APK) file that anyone downloads from the store.

White-Box Extraction.

How to attack a White-Box?

When the WBC binary has been extracted, different techniques can be exploited to retrieve the secret key. Further reverse engineering can be applied to deconstruct step by step the algorithms and being able to recode it in another language.

This technique is very powerful, and can lead to recovering the full implementation and therefore expose the intellectual property. However, this extensive reverse engineering has several drawbacks:

Depending on the quality of the obfuscation and the runtime protections, several weeks to several months could be required to perform the attack.
Both in-depth reverse engineering and cryptographic skills are required.

A second way, called Differential Computational Analysis (DCA), was proposed at CHES 2016 by Bos et al. The binary is executed on a specific framework to trace internal states, for instance memory accesses or register values. Once traces are available, DCA makes use of Side Channel Attacks exploiting the information available from a set of data traces after many executions of the cryptographic operation using the same key.

With this kind of attack, the elapsed time can be highly reduced, from several weeks of reverse to several hours of DCA. It can be very efficient to retrieve the secret key. However, the White-Box algorithm remains mostly undiscovered.

As mentioned above, the trace sizes can quickly represent the biggest issue. The reason is coming either from the obfuscation layer used to protect the binary, or from the difficulty to focus the analysis on the potential leakage area. As a result, time is spent to generate the traces. Even if the data size can be decreased with post-processing operations, the whole process remains time-consuming. Tracing a few hundred of traces (combined with their post-processing) can quickly reach days of processing.

Requirements for an efficient tracing

From our track records in White-Box analyzes, we believe that analyzes can be dramatically improved providing that you have the right tool for selective tracing. Gaining time and computing resources allows to broaden the range of attacks and to go deeper in the analysis.

Selective tracing involves monitoring only the valuable information. This can be achieved by optimizing the way tracing is made and by giving the opportunity to the analyst to make educated choices. In the case of a full tracing, the analyst makes no choice as no information is lost. While in the case of selective tracing, this is up to him to decide which information must be kept.

As a result, it is important for the analyst to ask himself the following questions when considering an efficient tracing:

1. What to trace?

Depending on the frameworks, different data can be harvested during an execution. The Side Channel Marvels Tracers (TracerGrind and TracerPIN) collect the data involved in memory accesses in read or write. This is relevant when a WBC implementation makes use of look-up tables. If not protected, the address or the value of a data read from a look-up table may contain leakage information.

A broader way to proceed is to collect register values for each executed instruction and their address. This allows a comprehensive coverage of all values manipulated during a processing. Getting this data, it becomes possible to run dedicated analyzes against a register or another. For instance, the program counter register (corresponding to the memory address of the executed instruction) may provide a good understanding of the processing when a data register will provide exploitable leakages.

The more data collected, the bigger the traces are. This is up to the analyst to define whether being selective is valuable. Assuming the WBC implementation is unknown, it is preferable to collect the maximum of data and avoid missing valuable information.

To tackle this issue, eShard developed an advanced way to trace register by selectively recording the registers according to the instruction. Doing so, register values not involved in the instruction are ignored. As a result, data traces can be dramatically reduced.

In the image above, the Wyseur DES White-Box Challenge was traced with two different modes: first, all the registers are acquired, and then only the register accesses (more especially the written registers) are traced. In this second mode, the trace is nearly 10 times smaller. Despite this huge difference, the visual analysis of these two modes shows that the same characteristic patterns can be seen. Combined with the fact that the secret key was recovered by exploiting these traces, it demonstrates that the register access mode is suitable to reduce the trace size while keeping valuable information.

2. Where to trace?

When looking at the program counter over the processing, it may be valuable to reduce the analysis to a specific area of interest, such as a function or a sequence of operations. This requires collecting data only for a given range of program counter.

In the image above, we trace a complete execution of the Wyseur DES White-Box Challenge. There are nearly 3.500.000 points, and the graphical view of the trace does not give directly valuable information: we must zoom in to find valuable patterns.

In the image above, we simply restrict the tracing on the PC ranges of the loadable segments corresponding to the executed code. This information can be recovered from the binary header. And this time, when we display the trace, we obtain directly the DES round patterns.

In the same vein, it may be necessary to filter out a given value, address or even kind of operation (read or write), from the trace when collecting the memory access. This requires having a good idea of the information of interest.

The best optimization is achieved when targeting specific instructions and monitoring the corresponding registers. Obviously, this has to be made as an educated choice depending on the algorithms or assumptions about the implementation.

3. When do you want to trace?

Choosing the instants of the trace execution is also necessary. This is useful when targeting a round of a symmetric algorithm or the scalar multiplication of an ECDSA signature. A simple way is to select an area of interest by setting a range of instructions. This range can be defined by observing the program counter during the execution.

To reach the next step and to be able to perform fine tracing, eShard developed a trigger feature based on pattern detector to start or to stop the tracing. This is particularly useful when the number of instructions of the algorithm is not constant from one execution to the other.

The pattern detector can defeat desynchronization. In the image above, several desynchronized traces were acquired. If the attack used to recover the key demands aligned traces to be successful, it is likely to fail. This issue can be overcome in several ways that are both costly either in terms of time or resources:

more traces must be acquired,
a post-processing operation must be done to correct the misalignment.

However, in some cases, a fixed pattern could be detected. If a trigger is used to start the trace acquisition when it appears, the misalignment is automatically removed.

Another aspect of the pattern detector consists in combining it with PC range filtering to acquire very accurate area. In the image below, we use the PC range to indicate where we want to make the acquisition (red area). However, this is not optimal if an attack targeting only the first and the last rounds is executed. Indeed, the middle rounds are still acquired: we fill disk space with useless information. The trigger feature was specifically designed to overcome this kind of issue, and enables acquiring only the first and the last round in only execution.

Nowadays, the different open source frameworks available do not implement all these features. As a result, the analyst may feel limited due to the lack of advanced tracing features. This is particularly critical when the analyst needs to check the resistance of the code against a large range of attacks.

eShard developed esReverse software platform. It provides a user-friendly and efficient framework to apply all these techniques for advanced tracing and the complementary analysis, exploiting the latest developments in the side-channel area.

The framework includes the capability to fault a binary as well and explore potential fault attack vulnerabilities.

This module has been designed to validate a binary code against Side-Channel or Fault Injection attacks before loading it on a hardware device.

Use case: using esReverse to optimally trace the binary execution of an ECDSA implementation

To demonstrate esReverse capabilities, it was applied to analyze an ECDSA implementation based on an x86–64 architecture. The nature of this algorithm leads to have long execution involving a high number of instructions being executed. As a result, the data traces are expected to be large and long to trace.

With the right tool, it is possible to bring the data traces to a reasonable size. This is particularly necessary for such algorithm, as a black-box analysis would imply to try different attacks and therefore involve tracing the algorithm different times.

We attack the big-integer multiplication targeting the secret scalar. We remind the reader the ECDSA algorithm performs the following operations:

Our attack objective is to recover the secret key value d manipulated at step 7 of the algorithm. We target the mathematical operation .

Knowing several values , , which are part of the returned signatures, we can for each guess done on least significant byte(s) of estimate some intermediate values for these multiplications .

On the target binary the same operations might be done if no proper countermeasure is implemented. In that case, it is certain the same intermediate values are computed and present in some registers.

Then, performing correlation analysis between the traces obtained with the tracer and the estimated values for each guess , we expect to obtain the best correlation score when the guess is equal to the correct part of . It gives the attacker the knowledge of the target part of . Then, using divide and conquer method we iterate the attack from least significant to most significant part of to recover the secret.

It is worth to notice that contrary to physical Side-Channel attacks, no leakage model is required here. Indeed, we are correlating computed data in registers with estimated values. It might happen when we are right the correlation score is exactly 1.

The ECDSA execution is composed of 33 millions of instructions.

Different tracing methods were used to generate 10 traces, with the aim to compare the time needed to generate them and their sizes.

Full register, we acquired all registers: rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15, and pc.
Focus on register access. More especially, we traced only the registers updated by the instructions. Also, we used only one pattern detector to focus on the area of interest and get aligned traces.
Focus on memory access. We traced the value and the address of the manipulated data. We used the same pattern detector described previously.
Focus on mult instruction and the changing registers. Attacking multiplication is sensible for an ECDSA algorithm as classical implementations make use of this instruction. The same pattern detector described previously is applied.

Hereafter, you have the trace size and generation time for each mode. The cost of collecting is very high when targeting all registers, which can result in long computation time. Targeting the register access optimizes highly the speed of execution and the trace size without losing information. On the other hand, partial tracing such as memory access may not provide the information while the algorithm leaks.

Mult instruction mode is 20 times faster than the full register mode.

Filtering to the mult instruction leads to get trace size to 510.869 smaller than the full register mode.

Focusing on mult instruction gives very small traces and very good generation timing. Despite their size, they still contain leakage points since we were able to perform successfully a DCA and recover the key.

Example of scores obtained for a key word.

Focusing on mult instruction still enables to have good leakage points.

The key is composed of 24 bytes. We guess it by packet of four: first we guess the first byte, then the two first bytes, and so on. For each guess based on the four bytes, we retrieve the correct key value.

As said previously, for the DCA, the leakage model is not based on the Hamming Weight, but on the value. It enables to obtain correlation scores equals to one.

Conclusion

We showed that tracing the binary can be very efficient when analyzing the security of a White-Boxed cryptography binary execution. Demonstrated on an ECDSA implementation, the same can be applied to multiple algorithms. A deeper insight is available in our WhibOx 2019 presentation: “Optimize Binary Tracing: Example with an ECDSA implementation”. Without specific knowledge of the implementation, the secret can be recovered by applying more or less sophisticated attacks, directly inspired from the Side-Channel expertise.

Tracing the binary execution should be now part of any security analysis of White-Box Cryptography. To be properly done, it requires to instrument the binary with different options and optimizations. Indeed, an analysis must be able to capture all kinds of information produced by the algorithm under execution, and to focus on some specific part:

monitor memory or register access,
select information based on the PC value or the instruction,
trigger the area of interest with a specific pattern.

Selecting the information is critical to avoid too large data traces, that result in excessive computation time, memory space and complexity. With a proper tool, it can save days of effort when performing an analysis.

Acknowledgements

Thanks to Benoit Feix and Hugues Thiebeauld for proofreading this article.