Dataflow compiler for QNN inference on FPGAs
02 Oct 2019 - Yaman Umuroglu
We’re happy to announce some exciting developments in the FINN project: we’re rebuilding our solution stack from the ground up to be more modular, more usable and more open-source!
Over the past few years, the team at Xilinx Research Labs Ireland has done quite a bit of research on Quantized Neural Networks (QNNs). Starting with Binarized Neural Networks (BNNs) on FPGAs back in 2016, we’ve since looked at many aspects of quantized deep learning, ranging from better quantization methods and mixing quantization with pruning, to accuracy-throughput tradeoffs and recurrent topologies.
Although some demonstrators of our work have been open source for some time, we want to take things a step further. We love QNNs and the high-performance, high-efficiency dataflow accelerators we can build for them on Xilinx FPGAs, and we want you and the FPGA/ML community to be able to do the same. The (co-)design process for making this happen is actually quite involved: it starts from customizing a neural network in a machine learning framework, goes through multiple design steps involving many optimizations, HLS code generation and Vivado synthesis, and ends with an FPGA bitstream that you can deploy as part of some application. Many of those steps still require manual effort, but a modular, flexible solution stack to support you through this process helps greatly. This is why we are rebuilding our FINN solution stack from the ground up to make it more modular, and we hope to build a community around it that shares our excitement about QNNs for FPGAs.
The first step towards making this happen is to define what layers exist in the solution stack. In many ways, this stack is inspired by the tried-and-tested frontend/backend architecture found in compiler frameworks like LLVM. It breaks the complex co-design problem into parts, with each layer focusing on a different sub-problem and consuming the artifacts produced by the previous one. The diagram on the left illustrates this briefly, and over the next few months we hope to take a first few QNNs through all the layers of this stack to produce cool FPGA dataflow accelerators. In fact, some of these components are already available today for you to explore!
Let’s have a look at the main parts:
### Brevitas

Brevitas is a PyTorch library for quantization-aware training. It provides torch.nn building blocks to explore different forms of weight, activation and accumulator quantization schemes. You can also learn the bitwidths for different layers with backpropagation! See the Brevitas page for more information.
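To give a flavour of what this looks like in code, here is a minimal sketch of a small quantized model; it is not taken from the Brevitas documentation, and it assumes the brevitas.nn QuantConv2d, QuantReLU and QuantLinear layers with their weight_bit_width / bit_width keyword arguments.

```python
# Hypothetical sketch of a small QNN built from Brevitas building blocks.
# Assumes brevitas.nn provides QuantConv2d / QuantReLU / QuantLinear with
# weight_bit_width / bit_width keyword arguments (check the Brevitas docs
# for the exact API of the version you install).
import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantLinear, QuantReLU

class TinyQNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            QuantConv2d(1, 16, kernel_size=3, weight_bit_width=2),  # 2-bit weights
            nn.BatchNorm2d(16),
            QuantReLU(bit_width=4),                                 # 4-bit activations
            QuantConv2d(16, 32, kernel_size=3, weight_bit_width=2),
            nn.BatchNorm2d(32),
            QuantReLU(bit_width=4),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = QuantLinear(32, num_classes, bias=True, weight_bit_width=4)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))
```

Apart from the quantized layers, this is an ordinary PyTorch module, so it slots into existing training code with no changes.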
### Getting started
More will be available in the coming weeks and months, but if you want to get your hands dirty there’s already plenty to start with! If you haven’t done so already, we recommend starting with BNN-PYNQ to see what dataflow QNN accelerators look and feel like. You can also start experimenting with Brevitas to train some QNNs, or put together a streaming pipeline with the FINN HLS library. We have also created a Gitter channel to make it easier to get in touch with the community, and hope to see many of you there! :)
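If it helps to see the full loop, here is an equally hypothetical continuation of the TinyQNN sketch above, running one optimization step on a random batch just to show that a Brevitas model is driven by a completely standard PyTorch training loop:

```python
# Hypothetical continuation of the TinyQNN sketch above: Brevitas models
# train like any other PyTorch module.
import torch
import torch.nn.functional as F

model = TinyQNN(num_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 1, 28, 28)   # dummy batch of 1-channel 28x28 images
labels = torch.randint(0, 10, (8,))  # dummy labels

optimizer.zero_grad()
loss = F.cross_entropy(model(images), labels)
loss.backward()   # gradients flow through the quantized layers during training
optimizer.step()
```

In a real experiment you would replace the dummy batch with a DataLoader over your dataset and wrap the step in the usual epoch loop.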