SHA-2 Algorithms
Overview
SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions defined in RFC 6234: US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF).
The SHA-2 family consists of six hash functions with digests (hash values) that are 224, 256, 384 or 512 bits: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256.
This library supports all of the algorithms mentioned above.
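As a quick software cross-check of the digest sizes listed above, the following sketch uses Python's standard hashlib module. It is only a reference for comparing outputs and is unrelated to the FPGA implementation. The helper name digest_bits is hypothetical, and the two SHA-512/t variants are only available when the underlying OpenSSL build provides them, so the helper returns None in that case.

```python
import hashlib

# Expected digest sizes in bits, keyed by hashlib algorithm name.
SHA2_DIGEST_BITS = {
    "sha224": 224,
    "sha256": 256,
    "sha384": 384,
    "sha512": 512,
    "sha512_224": 224,  # availability depends on the OpenSSL build
    "sha512_256": 256,  # availability depends on the OpenSSL build
}


def digest_bits(name, data=b"abc"):
    """Return the digest length in bits, or None if the algorithm is unavailable."""
    if name not in hashlib.algorithms_available:
        return None
    return 8 * len(hashlib.new(name, data).digest())


if __name__ == "__main__":
    for name in SHA2_DIGEST_BITS:
        print(name, digest_bits(name))
```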
Implementation on FPGA
The internal structure of the SHA-2 algorithms is shown in the figure below:
As we can see from the figure, the SHA-2 hash calculation can be partitioned into two main parts.
- The pre-processing part pads the input message and splits it into fixed-size blocks, and tells the downstream parts how many blocks the message contains. The message word size is 32 bits for SHA-224/SHA-256 and 64 bits for the other four algorithms, and each block consists of 16 message words.
- The digest part iteratively computes the hash values. The algorithm imposes a loop-carried dependency, so this part cannot reach an initiation interval (II) of 1.
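The pre-processing step described above can be modelled in software as follows. This is a minimal sketch of standard SHA-2 padding (a 0x80 byte, zero fill, then the message length in bits as a big-endian field two words wide), not the library's HLS code. The function name sha2_pad and its word_bits parameter are assumptions made for this example.

```python
def sha2_pad(message, word_bits=32):
    """Pad a message and split it into blocks of 16 message words.

    word_bits is 32 for SHA-224/SHA-256 and 64 for the SHA-512 based variants.
    Returns the list of blocks; its length is the block count that the
    pre-processing part would pass downstream.
    """
    if word_bits not in (32, 64):
        raise ValueError("word_bits must be 32 or 64")
    block_bytes = 16 * word_bits // 8   # 64 or 128 bytes per block
    length_bytes = 2 * word_bits // 8   # 8- or 16-byte length field
    padded = message + b"\x80"
    # Zero fill so that the length field ends exactly on a block boundary.
    padded += b"\x00" * (-(len(padded) + length_bytes) % block_bytes)
    padded += (8 * len(message)).to_bytes(length_bytes, "big")
    return [padded[i:i + block_bytes] for i in range(0, len(padded), block_bytes)]


if __name__ == "__main__":
    print(len(sha2_pad(b"a" * 55)), len(sha2_pad(b"a" * 56)))
```

With 32-bit words, a 55-byte message still fits in one block, while a 56-byte message needs two because the length field no longer fits.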
As these two parts can work independently, they are implemented as parallel dataflow processes connected by streams (FIFOs).
The dup_strm module duplicates the block-count stream, and the generateMsgSchedule module generates the message word stream in sequence.
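To illustrate what generating the message word stream in sequence involves, here is a software model of the SHA-256 message schedule written as a generator with a 16-word sliding window. It follows the schedule recurrence from the SHA-2 specification and is not the source of the generateMsgSchedule module. The names message_schedule and rotr32 are hypothetical.

```python
import struct
from collections import deque

MASK32 = 0xFFFFFFFF


def rotr32(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def message_schedule(block):
    """Yield the 64 schedule words W[0..63] of one 64-byte block, in order."""
    window = deque(maxlen=16)  # holds W[t-16] .. W[t-1]
    for word in struct.unpack(">16I", block):
        window.append(word)
        yield word
    for _ in range(48):
        w16, w15, w7, w2 = window[0], window[1], window[9], window[14]
        s0 = rotr32(w15, 7) ^ rotr32(w15, 18) ^ (w15 >> 3)
        s1 = rotr32(w2, 17) ^ rotr32(w2, 19) ^ (w2 >> 10)
        new_word = (w16 + s0 + w7 + s1) & MASK32
        window.append(new_word)  # the oldest word drops out of the window
        yield new_word


if __name__ == "__main__":
    # Single padded block for the message "abc" (24 bits long).
    block = b"abc" + b"\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
    print([hex(w) for w in message_schedule(block)][:18])
```

Only the last 16 words are ever needed, which is why a small shift-register style buffer is enough to feed the digest part one word at a time.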
Performance
SHA-224 and SHA-256
SHA-224 is simply SHA-256 truncated to 224 bits with different initialization values, so the two share the same internal structure, as illustrated in the figure above.
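To make this relationship concrete, the following pure-Python sketch runs a single compression routine with two sets of initial values and truncates the result for SHA-224. It is a software illustration only, not the library's HLS code, and every name in it is made up for this example. The constants are derived from roots of primes as described in FIPS 180-4 rather than typed in, and the results can be compared against hashlib.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF


def _primes(n):
    """Return the first n primes by trial division."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(n):
    """Integer cube root: the largest r with r**3 <= n."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = _primes(64)
# Round constants: first 32 fractional bits of the cube roots of the primes.
K = [_icbrt(p << 96) & MASK32 for p in PRIMES]
# SHA-256: first 32 fractional bits of sqrt of primes 1..8.
IV_SHA256 = [math.isqrt(p << 64) & MASK32 for p in PRIMES[:8]]
# SHA-224: second 32 fractional bits of sqrt of primes 9..16.
IV_SHA224 = [math.isqrt(p << 128) & MASK32 for p in PRIMES[8:16]]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def _compress(state, block):
    """Process one 64-byte block and return the updated 8-word state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK32)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        # Each round consumes a..h from the previous round: this is the
        # loop-carried dependency of the digest part.
        t1 = (h + (_rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK32
        t2 = ((_rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK32
        a, b, c, d, e, f, g, h = ((t1 + t2) & MASK32, a, b, c,
                                  (d + t1) & MASK32, e, f, g)
    return [(s + v) & MASK32 for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha2_32(message, iv, digest_bytes):
    """Hash with 32-bit words; iv and digest_bytes select SHA-256 or SHA-224."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    state = list(iv)
    for i in range(0, len(padded), 64):
        state = _compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)[:digest_bytes]


if __name__ == "__main__":
    msg = b"abc"
    print(sha2_32(msg, IV_SHA256, 32) == hashlib.sha256(msg).digest())
    print(sha2_32(msg, IV_SHA224, 28) == hashlib.sha224(msg).digest())
```

The only differences between the two calls are the initial values and the number of output bytes kept, which is why one datapath can serve both algorithms.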
A single instance of the SHA-224/SHA-256 function processes the input message at a rate of
512 bits / 68 cycles
at 330.25 MHz/314.36 MHz, respectively.
The hardware resource utilization of SHA-224 is listed in the table below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 7806 | 4976 | 1121 | 0 | 3.028 |
The hardware resource utilization of SHA-256 is listed in the table below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 7806 | 4973 | 1176 | 0 | 3.181 |
SHA-384, SHA-512, SHA-512/224, and SHA-512/256
SHA-384 and SHA-512/t are simply SHA-512 truncated with different initialization values, so they share the same internal structure, as illustrated in the figure above.
A single instance of SHA-384, SHA-512, SHA-512/224, or SHA-512/256 processes the input message at a rate of
1024 bits / 84 cycles
at 313.28 MHz/323.31 MHz/310.26 MHz/313.57 MHz, respectively.
The hardware resource utilization of SHA-384 is listed in the table below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 15494 | 8317 | 2045 | 0 | 3.192 |
The hardware resource utilization of SHA-512 is listed in the table below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 15497 | 8318 | 2015 | 0 | 3.093 |
The hardware resource utilization of SHA-512/224 is listed in the table below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 15498 | 8318 | 2101 | 0 | 3.223 |
The hardware resource utilization of SHA-512/256 is listed in the table below:
BRAM | DSP | FF | LUT | CLB | SRL | clock period(ns) |
0 | 0 | 15497 | 8322 | 2029 | 0 | 3.189 |
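The per-block rates and clock frequencies quoted in this section can be converted to a peak throughput figure with simple arithmetic. The sketch below does that conversion; the function name throughput_mbps and the RATES table layout are assumptions for illustration, and the figures are steady-state estimates that ignore padding overhead and any stalls on the input stream.

```python
def throughput_mbps(block_bits, cycles_per_block, clock_mhz):
    """Peak throughput in Mbit/s: bits per cycle times cycles per microsecond."""
    return block_bits / cycles_per_block * clock_mhz


# (block size in bits, cycles per block, clock in MHz), as quoted in the text.
RATES = {
    "SHA-224": (512, 68, 330.25),
    "SHA-256": (512, 68, 314.36),
    "SHA-384": (1024, 84, 313.28),
    "SHA-512": (1024, 84, 323.31),
    "SHA-512/224": (1024, 84, 310.26),
    "SHA-512/256": (1024, 84, 313.57),
}

if __name__ == "__main__":
    for name, params in RATES.items():
        print(f"{name}: {throughput_mbps(*params) / 1000:.2f} Gbit/s")
```

Under these assumptions a single SHA-224 instance peaks at roughly 2.49 Gbit/s and a single SHA-512 instance at roughly 3.94 Gbit/s.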
Clustering
To boost the throughput of the SHA-2 primitives, multiple instances can be organized into a cluster to provide message-level parallelism.
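Message-level parallelism means that independent messages are hashed by different instances at the same time, while each message is still processed sequentially. The following sketch is a software analogy using a thread pool and hashlib, and it says nothing about how the hardware cluster is wired. The function name hash_cluster and its parameters are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_cluster(messages, instances=4, algorithm="sha256"):
    """Hash independent messages on several workers; results keep input order."""
    def hash_one(message):
        return hashlib.new(algorithm, message).digest()

    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(hash_one, messages))


if __name__ == "__main__":
    digests = hash_cluster([b"first", b"second", b"third"], instances=2)
    print([d.hex()[:16] for d in digests])
```

In the ideal case, aggregate throughput scales with the number of instances, provided the input side can supply enough messages to keep all of them busy.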