SHA-3 Algorithms¶
Overview¶
SHA-3 (Secure Hash Algorithm 3) is a set of cryptographic hash functions defined in FIPS 202: SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions.
The SHA-3 family consists of six hash functions with digests (hash values) that are 128, 224, 256, 384 or 512 bits: SHA3-224, SHA3-256, SHA3-384, SHA3-512, SHAKE128, SHAKE256.
Currently, this library supports all of the algorithms mentioned above.
- SHA3-224
- SHA3-256
- SHA3-384
- SHA3-512
- SHAKE-128
- SHAKE-256
Implementation on FPGA¶
The internal structure of SHA-3 algorithms can be shown as the figures below:
As we can see from the figures, hash calculation in both SHA-3 and SHAKE is much different from SHA-1 and SHA-2. Since the internal state array is updated iteratively (by the input message) and used in the next permutation, it cannot be partitioned into block generation part and digest part.
Both the digest parts of SHA-3 and SHAKE pad or split the input message into fixed sized blocks (1600-bit for each), and XOR it to the state array of the last iteration.
The message word size is 64-bit for both SHA-3 and SHAKE, and each block has a different number of message words according to the specific suffix of the algorithm which is selected. The number can be defined as:
Loop-carried dependency is enforced by the algorithm, and thus the digest part cannot reach II=1.
Performance¶
SHA3-224¶
A single instance of SHA3-224 function processes input message at the rate of
144 byte / 1105 cycles
at 303.58MHz.
The hardware resource utilizations of SHA3-224 is listed in tab1SHA3224
below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 36974 | 45821 | 7819 | 606 | 3.294 |
SHA3-256¶
A single instance of SHA3-256 function processes input message at the rate of
136 byte / 1104 cycles
at 306.65MHz.
The hardware resource utilizations of SHA3-256 is listed in tab1SHA3256
below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 36787 | 44975 | 7203 | 606 | 3.261 |
SHA3-384¶
A single instance of SHA3-384 function processes input message at the rate of
104 byte / 1100 cycles
at 310.75MHz.
The hardware resource utilizations of SHA3-384 is listed in tab1SHA3384
below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 33782 | 40511 | 7264 | 611 | 3.218 |
SHA3-512¶
A single instance of SHA3-512 function processes input message at the rate of
72 byte / 1096 cycles
at 316.25MHz.
The hardware resource utilizations of SHA3-512 is listed in tab1SHA3512
below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 32994 | 39794 | 7264 | 611 | 3.162 |
SHAKE-128¶
A single instance of SHAKE-128 function processes input message at the rate of
168 byte / 1108 cycles
at 306.37MHz.
The hardware resource utilizations of SHAKE-128 is listed in tab1SHAKE128
below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 37577 | 47898 | 8229 | 610 | 3.264 |
SHAKE-256¶
A single instance of SHAKE-256 function processes input message at the rate of
136 byte / 1104 cycles
at 302.02MHz.
The hardware resource utilizations of SHAKE-256 is listed in tab1SHAKE256
below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 36789 | 44909 | 7889 | 606 | 3.311 |
Clustering¶
To boost the throughput of SHA-3 primitives, multiple instance can be organized into a cluster, and offer message level parallelism.