Pincering SKINNY by Exploiting Slow Diffusion: Enhancing Differential Power Analysis with Cluster Graph Inference
This repository hosts the codebase used for the paper "Pincering SKINNY by Exploiting Slow Diffusion: Enhancing Differential Power Analysis with Cluster Graph Inference", scheduled to appear in TCHES 2023, issue 4. It contains implementations of SKINNY integrated with the ChipWhisperer code (with an example of how to use them to collect traces) and the implementation of the CGI-DPA attack on real traces and in the Hamming weight model. Furthermore, the dataset of traces we used for the paper will be available in the artifact of the paper (currently under submission).
WARNING: This repository was developed in an academic context, and no part of this code should be used in any production system. In particular, the implementations of cryptosystems in this tool are not safe for any real-world usage.
Acknowledgement: Our implementations of SKINNY lean heavily on the FELICS project (our LUT implementation is theirs, integrated with the ChipWhisperer code) and on the Skinny-C implementation by Rhys Weatherley (for the S-Box circuit).
This repository is licensed under multiple licenses.
The code derived from the ChipWhisperer project is licensed under GPLv3, as denoted in their files. The rest of the code is licensed under the MIT license.
- GPL-3.0 (https://www.gnu.org/licenses/gpl-3.0.html)
- MIT license (LICENSE or http://opensource.org/licenses/MIT)
Under the folder ChipWhisperer are two implementations of SKINNY-128-384 (with a 384-bit tweakey state and 56 rounds) integrated with the ChipWhisperer code. One implementation (LUT) uses lookup tables for the S-Box computation while the other uses a circuit. For more detail on each implementation, see Section 5 of our paper. These implementations target the ChipWhisperer Lite or the ChipWhisperer Kit using the STM32F3 target board.
This folder is not self-contained: each implementation needs to be integrated into the ChipWhisperer project. Specifically, the folders simpleserial-SKINNY and simpleserial-SKINNY-LUT should be placed in the folder chipwhisperer/hardware/victims/firmware/ and used similarly to the other examples (such as simpleserial-AES, for which the ChipWhisperer project provides example Jupyter notebooks).
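The integration step can be scripted. The following is a minimal sketch, assuming the two firmware folders sit directly inside this repository's ChipWhisperer folder and that `chipwhisperer_root` points at a local checkout of the ChipWhisperer project. The function name `install_firmware` and both path arguments are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path

# Folder names taken from the text above.
FIRMWARE_DIRS = ("simpleserial-SKINNY", "simpleserial-SKINNY-LUT")


def install_firmware(repo_chipwhisperer_dir, chipwhisperer_root):
    """Copy both SKINNY firmware folders into a ChipWhisperer checkout.

    repo_chipwhisperer_dir: path to this repository's ChipWhisperer folder (assumed layout).
    chipwhisperer_root: path to the local chipwhisperer checkout (assumption).
    Returns the list of destination paths.
    """
    dest_root = Path(chipwhisperer_root) / "hardware" / "victims" / "firmware"
    copied = []
    for name in FIRMWARE_DIRS:
        src = Path(repo_chipwhisperer_dir) / name
        dst = dest_root / name
        # dirs_exist_ok (Python 3.8+) lets the copy be re-run after local edits.
        shutil.copytree(src, dst, dirs_exist_ok=True)
        copied.append(dst)
    return copied
```

After copying, the firmware is built and used like the other simpleserial examples.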
Compared to the simpleserial-AES example, our SKINNY implementation has several extra flags that must be used to record traces correctly. We therefore provide a self-contained Python script that shows how to flash the ChipWhisperer with the firmware (which can be compiled using the provided bash script), set the tweakey state, and record traces. Technical details:
- The implementation precomputes the round-tweakeys (RTK) upon receiving TK3. Our traces assumed fixed public TK1 and TK2, and TK3 as the key.
- We need to record the first and last few rounds for our CGI-DPA attacks. However, the ChipWhisperer only records 5000 timestamps, which is insufficient to cover the full encryption. We therefore added a flag e that can be set to record the last rounds. When this flag is set, the implementation executes the first 47 rounds of encryption before setting the trigger to record the trace.
- We discovered that if profiling traces are recorded with a rotating key (where the RTK is recomputed between each trace) and attack traces with a fixed key (with no recomputation of the RTK), the resulting datasets are misaligned. Our solution was to force a recomputation of the RTK for the attack dataset. We are still unsure why the traces are misaligned, since the RTK computation happens outside of the triggers and the ack signals cleanly separate it from the encryption (most likely a pipeline issue).
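A misalignment between two datasets can be checked before running an attack. The sketch below is a generic diagnostic, not the procedure we used: it estimates the sample shift between two traces (for instance the mean profiling trace and the mean attack trace) by maximizing their mean-centered correlation over a window of lags. The function name `estimate_shift` is hypothetical, and it assumes both traces have the same length and that `max_shift` is smaller than that length.

```python
import numpy as np


def estimate_shift(reference, trace, max_shift):
    """Return the lag s (in samples) that best aligns trace[i + s] with reference[i]."""
    ref = np.asarray(reference, dtype=np.float64)
    trc = np.asarray(trace, dtype=np.float64)
    ref = ref - ref.mean()
    trc = trc - trc.mean()
    n = len(ref)
    best_shift, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        # Keep only the samples where both traces overlap at this lag.
        if s >= 0:
            a, b = ref[:n - s], trc[s:]
        else:
            a, b = ref[-s:], trc[:n + s]
        score = np.dot(a, b) / len(a)  # mean product over the overlap
        if score > best_score:
            best_shift, best_score = s, score
    return best_shift
```

A nonzero result on the two mean traces suggests the datasets are shifted relative to each other.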
Under the folder hamming_weight are three Python scripts that we used to perform the Hamming Weight simulation experiments (for more information see Section 5 of the paper). All scripts rely on the numpy and scipy packages.
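As an illustration of what a Hamming weight simulation looks like, here is a minimal sketch that is not taken from the scripts: each leakage sample is the Hamming weight of an 8-bit intermediate value plus Gaussian noise. It assumes that the noise standard deviation plays the role of the SIGMA argument described below; the function name `simulate_hw_leakage` is hypothetical.

```python
import numpy as np

# Hamming weight of every 8-bit value.
HW = np.array([bin(v).count("1") for v in range(256)], dtype=np.float64)


def simulate_hw_leakage(values, sigma, rng):
    """Return HW(value) + N(0, sigma^2) for each 8-bit value in `values`.

    values: array of uint8 intermediate values (e.g. S-Box outputs).
    sigma:  noise standard deviation (assumed meaning of SIGMA).
    rng:    a numpy.random.Generator.
    """
    values = np.asarray(values, dtype=np.uint8)
    noise = rng.normal(0.0, sigma, size=values.shape)
    return HW[values] + noise
```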
The 16sbox script exploits exactly one S-Box per key byte; we used a simplification and only computed scores for one S-Box of the first round and one S-Box of the last round (the ones for K1 and K9). We then multiply the resulting success rates to obtain the success rate for the full key. It outputs a final success rate in log2 scale (a success rate of x translates to
The 32sbox script exploits all S-Boxes that depend on a single key byte. Each key byte is still handled separately, with the success rate of each key byte being multiplied at the end, but this time we compute the success rate for all of the 32 S-Boxes. It also outputs a log2 success rate.
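Multiplying per-byte success rates is most conveniently done in log2 scale, where the product becomes a sum. The following is a small sketch under that reading; the function name `full_key_log2_success` is hypothetical and the scripts may organize this computation differently.

```python
import math


def full_key_log2_success(per_byte_rates):
    """Return log2 of the product of independent per-byte success rates."""
    if any(r <= 0.0 for r in per_byte_rates):
        # A byte that is never recovered makes the full-key success rate zero.
        return float("-inf")
    return sum(math.log2(r) for r in per_byte_rates)
```

For example, 16 key bytes each recovered with probability 0.5 give a full-key success rate of 2^-16.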
The 44sbox script uses all 44 S-Boxes and CGI-DPA, as explained in Section 4 of the paper. The cluster graph is hardcoded in the form of edges and nodes with a predefined order of the messages transmitted. It outputs a .csv file with rows shaped (key_id;rank;ranks;...;rank;) where the rank is the position of the true key at a given trace.
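The output rows can be loaded with the standard csv module. This is a minimal sketch, assuming there is no header line, that ranks are written as integers, and that the trailing semicolon produces one empty field to discard; the function name `read_rank_rows` is hypothetical.

```python
import csv


def read_rank_rows(fileobj):
    """Parse rows shaped key_id;rank;...;rank; into (key_id, [ranks]) tuples."""
    rows = []
    for fields in csv.reader(fileobj, delimiter=";"):
        # Drop empty fields, such as the one created by the trailing ';'.
        fields = [f.strip() for f in fields if f.strip()]
        if not fields:
            continue
        key_id, *ranks = fields
        rows.append((key_id, [int(r) for r in ranks]))
    return rows
```

It can be called as `read_rank_rows(open(path, newline=""))`, where `path` is the .csv file written by the script.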
All three scripts take the following mandatory command-line arguments, in this order:
- SIGMA (float)
- N_TRACES (int, number of traces per experiment)
- N_EXPERIMENT (int, number of experiments to perform before averaging)
Additionally, 44sbox takes two optional arguments:
- OUTPUT_PATH (string, the output path for the .csv file; a default one is used otherwise)
- SEED (int, the seed to use for the PRNG; if none is provided, 4 bytes are taken from /dev/urandom)
An example command line is python3 44sbox_skinny.py 4.0 100 10.
As a disclaimer, the code is not exactly "well written". There is a lot of code duplication that could be improved, but we decided to minimize the changes from the version we used for the submission.
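For readers who want to wrap or reimplement this interface, the sketch below mirrors the described arguments with argparse. It is not the parsing code of the scripts: it assumes the two optional arguments are positional, the default output name `results.csv` is a placeholder, and `os.urandom(4)` stands in for reading 4 bytes from /dev/urandom. The names `build_parser` and `resolve_seed` are hypothetical.

```python
import argparse
import os


def build_parser():
    """Build a parser mirroring SIGMA N_TRACES N_EXPERIMENT [OUTPUT_PATH] [SEED]."""
    parser = argparse.ArgumentParser(description="Hamming weight simulation (sketch)")
    parser.add_argument("sigma", type=float)
    parser.add_argument("n_traces", type=int, help="number of traces per experiment")
    parser.add_argument("n_experiment", type=int, help="number of experiments to average")
    # Optional positionals; the default file name is a placeholder.
    parser.add_argument("output_path", nargs="?", default="results.csv")
    parser.add_argument("seed", nargs="?", type=int, default=None)
    return parser


def resolve_seed(seed):
    """Return the given seed, or 4 random bytes from the OS as an integer."""
    if seed is None:
        return int.from_bytes(os.urandom(4), "little")
    return seed
```

Logging the resolved seed alongside the results makes an experiment reproducible later.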
Under the folder real_traces are Python scripts that we used to perform the experiments on real traces recorded using the ChipWhisperer implementations mentioned above. All scripts rely on the numpy and scipy packages, and additionally use the packages threadpoolctl and Skinny. Finally, those scripts use the datasets of traces we collected for the paper. Those traces are available for download in the TCHES artifacts (link pending submission). More information about the trace collections and the technical details of the attacks is available in Section 5 of the paper.
The cpa script uses a correlation-based distinguisher. It outputs three .csv files that contain experimental results (key_id;ranks, like the 44sbox script for the Hamming weight experiments) for 16 S-Boxes, 32 S-Boxes and 44 S-Boxes. Two additional command-line arguments are available. --lut signals to use the LUT dataset (by default the circuit one is used). --offset followed by an int
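To illustrate what a correlation-based distinguisher computes, here is a generic CPA sketch on a single key byte. It is not the cpa script: it targets a toy intermediate sbox[plaintext XOR key] rather than the SKINNY S-Boxes attacked in the paper, it takes any 8-bit permutation as `sbox`, and the names `cpa_scores` and `true_key_rank` are hypothetical. The rank it returns is 0-based, which is an assumption about convention.

```python
import numpy as np

# Hamming weight of every 8-bit value.
HW = np.array([bin(v).count("1") for v in range(256)], dtype=np.float64)


def cpa_scores(traces, plaintexts, sbox):
    """Return, for each of the 256 key guesses, max |Pearson correlation| over samples.

    traces:     (n_traces, n_samples) float array.
    plaintexts: (n_traces,) uint8 array.
    sbox:       (256,) uint8 array, a stand-in S-Box (assumption).
    """
    guesses = np.arange(256, dtype=np.uint8)
    # preds[k, i] = HW(sbox[plaintexts[i] ^ k])
    preds = HW[sbox[plaintexts[None, :] ^ guesses[:, None]]]
    preds_c = preds - preds.mean(axis=1, keepdims=True)
    traces_c = traces - traces.mean(axis=0, keepdims=True)
    num = preds_c @ traces_c  # (256, n_samples) covariances up to a constant
    den = np.outer(np.linalg.norm(preds_c, axis=1), np.linalg.norm(traces_c, axis=0))
    return np.abs(num / den).max(axis=1)


def true_key_rank(scores, true_key):
    """Number of key guesses scoring strictly higher than the true key (0 = best)."""
    return int(np.sum(scores > scores[true_key]))
```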
The 16_32sbox_profiled script uses profiling: it first builds templates from the profiling dataset, then runs the attack. Additional arguments are --lut (same as cpa) and --single, which signals to use only a single S-Box per key byte (16 S-Boxes total), the default being 32 S-Boxes. It also outputs .csv files using the same format.
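The profile-then-attack flow can be sketched with univariate Gaussian templates. This is a simplified illustration and not the script's method: it assumes a single point of interest, one template per Hamming weight class, and a toy intermediate sbox[plaintext XOR key] with a stand-in S-Box. The names `build_templates` and `template_scores` are hypothetical.

```python
import numpy as np

# Hamming weight (class label 0..8) of every 8-bit value.
HW = np.array([bin(v).count("1") for v in range(256)])


def build_templates(leakage, values):
    """Estimate one Gaussian (mean, std) per Hamming weight class.

    leakage: (n,) float array, one sample per profiling trace.
    values:  (n,) uint8 array, the known intermediate value of each trace.
    """
    classes = HW[values]
    means, stds = np.zeros(9), np.ones(9)
    for c in range(9):
        sel = leakage[classes == c]
        if sel.size > 1:  # classes without enough samples keep the default template
            means[c], stds[c] = sel.mean(), sel.std(ddof=1)
    return means, stds


def template_scores(leakage, plaintexts, sbox, means, stds):
    """Return the Gaussian log-likelihood (up to a constant) of each key guess."""
    guesses = np.arange(256, dtype=np.uint8)
    classes = HW[sbox[plaintexts[None, :] ^ guesses[:, None]]]  # (256, n)
    mu, sd = means[classes], stds[classes]
    loglik = -0.5 * ((leakage[None, :] - mu) / sd) ** 2 - np.log(sd)
    return loglik.sum(axis=1)
```

The key guess with the highest score is the most likely one; sorting the scores gives the rank of the true key.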
Finally, the 44sbox_profiled script also uses profiling, with all 44 S-Boxes. It takes the same optional --lut argument and --offset followed by an int, and uses the same .csv output format. By default, the script performs 50 experiments of 50 traces each for the LUT dataset and 25 experiments of 100 traces each for the circuit dataset. Those values are hardcoded and can be changed at lines 825-830 of the script.