# SHA-2 Algorithms

## Overview

SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions defined in RFC 6234: US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF).

The SHA-2 family consists of six hash functions with digests (hash values) that are 224, 256, 384 or 512 bits: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256.

This library supports all of the algorithms mentioned above.

## Implementation on FPGA

The internal structure of the SHA-2 algorithms is shown in the figure below:

As we can see from the figure, the SHA-2 hash calculation can be partitioned into two main parts.

- The pre-processing part pads and splits the input message into fixed-size blocks, and informs the downstream parts how many blocks the message contains. The message word size is 32 bits for SHA-224/SHA-256 and 64 bits for the other four algorithms, and each block consists of 16 message words.
- The digest part iteratively computes the hash values. The algorithm imposes a loop-carried dependency, so this part cannot reach an initiation interval (II) of 1.
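
The padding and block-count step described above can be sketched in software as follows. This is a minimal Python illustration, not the library's HLS code; the function name `pad_and_split` and its `word_bits` parameter are hypothetical. It assumes the standard SHA-2 padding rule: a single 1 bit, then zero bits, then the message bit length in a field two words wide.

```python
def pad_and_split(message: bytes, word_bits: int = 32) -> list:
    """Pad a message per the SHA-2 rule and split it into blocks of 16 words."""
    word_bytes = word_bits // 8
    block_bytes = 16 * word_bytes   # 64 bytes for SHA-224/256, 128 bytes otherwise
    length_bytes = 2 * word_bytes   # 64-bit or 128-bit message length field
    bit_len = 8 * len(message)

    padded = message + b"\x80"      # a single 1 bit followed by seven 0 bits
    # Zero-fill so that the length field ends exactly on a block boundary.
    padded += b"\x00" * (-(len(padded) + length_bytes) % block_bytes)
    padded += bit_len.to_bytes(length_bytes, "big")

    return [padded[i:i + block_bytes] for i in range(0, len(padded), block_bytes)]


blocks = pad_and_split(b"abc")      # 32-bit words: one 64-byte block
num_blocks = len(blocks)            # the count passed to the downstream parts
```

The length of the returned list corresponds to the block count that the pre-processing part reports downstream. A 56-byte message, for instance, needs two 64-byte blocks because the length field no longer fits in the first one.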

As these two parts can work independently, they are implemented as parallel dataflow processes connected by streams (FIFOs).

The `dup_strm` module duplicates the block-count stream, and the `generateMsgSchedule` module generates the message word stream in sequence.
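
To illustrate what generating the message word stream in sequence means, the sketch below produces the 64 schedule words of one SHA-256 block one at a time, keeping only a 16-word window. It is a Python illustration of the standard SHA-256 message schedule, not the code of the `generateMsgSchedule` module; the names `rotr32` and `msg_schedule_stream` are hypothetical, and only the 32-bit variant is shown.

```python
import struct

MASK32 = 0xFFFFFFFF


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def msg_schedule_stream(block: bytes):
    """Yield W_0 .. W_63 for one 64-byte block, in order, one word at a time."""
    window = list(struct.unpack(">16I", block))  # circular buffer of the last 16 words
    for t in range(64):
        if t >= 16:
            w15 = window[(t - 15) % 16]
            w2 = window[(t - 2) % 16]
            s0 = rotr32(w15, 7) ^ rotr32(w15, 18) ^ (w15 >> 3)
            s1 = rotr32(w2, 17) ^ rotr32(w2, 19) ^ (w2 >> 10)
            # window[t % 16] still holds W_{t-16}; overwrite it with W_t.
            window[t % 16] = (window[t % 16] + s0 + window[(t - 7) % 16] + s1) & MASK32
        yield window[t % 16]


# Single padded block for the 3-byte message "abc".
abc_block = b"abc\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
schedule = list(msg_schedule_stream(abc_block))
```

Because each word depends only on the previous 16, a streaming producer of this kind can feed the digest part without buffering the whole schedule.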

## Performance

### SHA-224 and SHA-256

SHA-224 is simply a truncated SHA-256 with different initialization values, so the two share the same internal structure, as illustrated in the figure above.
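
The relationship between the two algorithms can be checked with a small software model. The sketch below is an illustrative Python implementation of the standard SHA-256 compression function with the initial values passed in as a parameter; it is not the library's HLS code, and all names (`sha2_32`, `IV_SHA224`, and so on) are hypothetical. To avoid hard-coding tables, it derives the round constants and initial values from the fractional parts of cube and square roots of primes, which is how the standard constants are commonly described.

```python
import math
import struct

MASK = 0xFFFFFFFF


def _primes(n):
    """Return the first n primes by trial division."""
    found, candidate = [], 2
    while len(found) < n:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def _icbrt(n):
    """Floor of the cube root of n, by binary search on integers."""
    lo, hi = 0, 1 << 40
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = _primes(64)
# First 32 fractional bits of the cube roots of the first 64 primes.
K = [_icbrt(p << 96) & MASK for p in PRIMES]
# SHA-256: first 32 fractional bits of the square roots of primes 1..8.
IV_SHA256 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# SHA-224: second 32 fractional bits of the square roots of primes 9..16.
IV_SHA224 = [math.isqrt(p << 128) & MASK for p in PRIMES[8:16]]


def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def _compress(state, block):
    """One SHA-256 compression step over a 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):  # each round depends on the previous one
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha2_32(message, iv, out_bytes):
    """Shared 32-bit SHA-2 core: only the IV and the output length differ."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    state = list(iv)
    for off in range(0, len(padded), 64):
        state = _compress(state, padded[off:off + 64])
    return struct.pack(">8I", *state)[:out_bytes]


digest256 = sha2_32(b"abc", IV_SHA256, 32)  # full 256-bit state
digest224 = sha2_32(b"abc", IV_SHA224, 28)  # same core, truncated to 224 bits
```

The only differences between the two calls are the initial values and the number of output bytes, which is consistent with the near-identical resource figures reported for the two functions.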

A single instance of the SHA-224/SHA-256 function processes the input message at a rate of `512 bit / 68 cycles` at 330.25 MHz/314.36 MHz respectively.
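
As a rough worked example, the rate and clock frequencies above can be converted into a throughput figure. The helper below is a hypothetical Python sketch; it assumes steady-state operation with one block accepted every 68 cycles and ignores padding overhead and pipeline start-up. The frequencies are paired with the algorithms according to the clock periods in the tables below (3.028 ns for SHA-224, 3.181 ns for SHA-256).

```python
def throughput_gbps(block_bits: int, cycles_per_block: int, clock_mhz: float) -> float:
    """Steady-state throughput in Gbit/s for one block every `cycles_per_block` cycles."""
    return block_bits / cycles_per_block * clock_mhz / 1000.0


sha224_gbps = throughput_gbps(512, 68, 330.25)
sha256_gbps = throughput_gbps(512, 68, 314.36)
print(f"SHA-224: {sha224_gbps:.2f} Gbit/s, SHA-256: {sha256_gbps:.2f} Gbit/s")
```

Under these assumptions a single instance sustains roughly 2.49 Gbit/s for SHA-224 and 2.37 Gbit/s for SHA-256. The same helper applies to any other block size and cycle count.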

The hardware resource utilization of SHA-224 is listed in the table below:

| BRAM | DSP | FF | LUT | CLB | SRL | Clock period (ns) |
|------|-----|------|------|------|-----|-------------------|
| 0 | 0 | 7806 | 4976 | 1121 | 0 | 3.028 |

The hardware resource utilization of SHA-256 is listed in the table below:

| BRAM | DSP | FF | LUT | CLB | SRL | Clock period (ns) |
|------|-----|------|------|------|-----|-------------------|
| 0 | 0 | 7806 | 4973 | 1176 | 0 | 3.181 |

### SHA-384, SHA-512, SHA-512/224, and SHA-512/256

SHA-384 and SHA-512/t are simply truncated SHA-512 with different initialization values, so they share the same internal structure, as illustrated in the figure above.

A single instance of SHA-384, SHA-512, SHA-512/224, or SHA-512/256 processes the input message at a rate of `1024 bit / 84 cycles` at 313.28 MHz, 323.31 MHz, 310.26 MHz, and 313.57 MHz respectively.

The hardware resource utilization of SHA-384 is listed in the table below:

| BRAM | DSP | FF | LUT | CLB | SRL | Clock period (ns) |
|------|-----|-------|------|------|-----|-------------------|
| 0 | 0 | 15494 | 8317 | 2045 | 0 | 3.192 |

The hardware resource utilization of SHA-512 is listed in the table below:

| BRAM | DSP | FF | LUT | CLB | SRL | Clock period (ns) |
|------|-----|-------|------|------|-----|-------------------|
| 0 | 0 | 15497 | 8318 | 2015 | 0 | 3.093 |

The hardware resource utilization of SHA-512/224 is listed in the table below:

| BRAM | DSP | FF | LUT | CLB | SRL | Clock period (ns) |
|------|-----|-------|------|------|-----|-------------------|
| 0 | 0 | 15498 | 8318 | 2101 | 0 | 3.223 |

The hardware resource utilization of SHA-512/256 is listed in the table below:

| BRAM | DSP | FF | LUT | CLB | SRL | Clock period (ns) |
|------|-----|-------|------|------|-----|-------------------|
| 0 | 0 | 15497 | 8322 | 2029 | 0 | 3.189 |

### Clustering

To boost the throughput of the SHA-2 primitives, multiple instances can be organized into a cluster to offer message-level parallelism.
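
Message-level parallelism means that independent messages are hashed by different instances at the same time, while each individual message is still processed sequentially. The sketch below is only a software analogy in Python, using the standard `hashlib` and `concurrent.futures` modules; the function name `sha256_cluster` and the `num_instances` parameter are hypothetical and do not describe the library's cluster interface.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_cluster(messages, num_instances=4):
    """Hash independent messages in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=num_instances) as pool:
        # Each worker stands in for one hash instance handling whole messages.
        return list(pool.map(lambda m: hashlib.sha256(m).digest(), messages))


messages = [b"message-%d" % i for i in range(8)]
digests = sha256_cluster(messages)
```

In the ideal case, aggregate throughput scales with the number of instances, provided there are enough independent messages to keep all of them busy.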