Design of Efficient DNN Accelerator Architectures
Deep Neural Networks (DNNs) are the fundamental processing units behind modern Artificial Intelligence (AI). Accordingly, it is reasonable to expect a future of smart devices that can monitor, decide, and act. However, DNNs are computation- and power-hungry, which makes deploying them on edge devices challenging. This dissertation focuses on designing architectures that perform DNN inference efficiently. Its contributions fall into four areas: (1) early detection of ineffectual computations inside the computation engine; (2) improving the utilization of Processing Elements (PEs) inside the computation engine; (3) skipping identical effectual computations through binary Multiply-and-Accumulate (MAC) operations; and (4) the design of approximate DNN accelerators.

In most DNNs, an activation function follows each convolutional or fully connected layer, and several popular activation functions set all negative inputs to zero. This dissertation first studies the characteristics of the activation layers used to add non-linearity to DNNs, and then proposes a novel architecture in which the activation function is merged with the preceding computational layer. Specifically, the proposed architecture coordinates early sign detection of output features. Compared to the original design, the method achieves a 2.19× speedup and a 1.94× reduction in energy consumption; the number of multiply-accumulate (MAC) operations is reduced by 10.64% on average, and the number of load operations by 3.86% on average. These improvements are achieved while maintaining classification accuracy on two popular benchmark networks.

One of the main challenges DNN accelerator developers face is keeping all PEs busy with effectual computations while running DNNs.
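The early-sign-detection idea described above can be illustrated with a simplified software sketch. Since a ReLU-style activation clamps negative outputs to zero, the accumulation for an output feature can stop as soon as the partial sum can no longer become non-negative. The function name, the bound used (positive weight mass times the maximum activation), and the assumption that activations are non-negative (outputs of a previous ReLU) are illustrative choices, not the dissertation's exact hardware mechanism:

```python
def relu_dot_early_exit(weights, acts):
    """Dot product followed by ReLU, terminating early once the result
    is guaranteed to be negative. A software sketch only; the
    dissertation coordinates this detection inside the PE array."""
    a_max = max(acts, default=0.0)  # acts assumed non-negative (post-ReLU)
    # pos_suffix[i]: sum of positive weights from index i onward,
    # giving an optimistic bound on the remaining contribution.
    pos_suffix = [0.0] * (len(weights) + 1)
    for i in range(len(weights) - 1, -1, -1):
        pos_suffix[i] = pos_suffix[i + 1] + max(weights[i], 0.0)
    acc = 0.0
    macs_done = 0
    for i, (w, a) in enumerate(zip(weights, acts)):
        # Even in the best case the sum stays negative: ReLU will
        # output zero, so the remaining MACs are ineffectual.
        if acc + pos_suffix[i] * a_max < 0:
            return 0.0, macs_done
        acc += w * a
        macs_done += 1
    return max(acc, 0.0), macs_done
```

For example, with weights `[-5, -4, 1]` and activations `[1, 1, 1]`, the loop stops after a single MAC because the remaining positive contribution (at most 1) cannot offset the running sum of -5; the two remaining multiplications are skipped.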
This dissertation introduces a Twin-PE for spatial DNN accelerators that increases PE utilization and the performance of the whole computation engine. In more detail, the proposed architecture, which incurs a negligible area overhead, shares scratchpads between PEs to exploit the slack time created by computation-pruning techniques. Compared to the reference design, the proposed method achieves a 1.24× speedup and a 1.18× energy-efficiency improvement per inference.

Decomposing MAC operations down to the bit level opens the opportunity to skip both bit-wise and word-wise sparsity. However, there is still room to prune effectual computations without reducing DNN accuracy. This dissertation proposes a novel real-time architecture that decomposes multiplications down to the bit level and prunes identical computations while running benchmark networks. The proposed design achieves an average per-layer speedup of 1.4× and an energy efficiency of 1.21× per inference while maintaining the accuracy of the benchmark networks.

Applying approximate-computing techniques reduces the cost of the underlying circuits, so that DNN inference can be performed more efficiently. However, applying approximation to DNNs differs from applying it to other workloads. This dissertation proposes a step-wise approach to implementing a re-configurable Booth multiplier suited to DNN inference. In addition, it evaluates the tolerance of different DNN layers to approximation and explores the effect of various degrees of approximation on inference accuracy. The proposed design achieves 1.19× area efficiency and 1.28× energy efficiency compared to the exact design while running benchmark DNNs.
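To make the Booth-multiplier contribution concrete, the sketch below implements radix-4 Booth recoding in software, with a `skip_pp` knob that drops the least-significant partial products as a truncation-style approximation. This is a generic illustration of how a re-configurable degree of approximation can trade accuracy for cost; the function name, the `skip_pp` parameter, and the specific truncation scheme are assumptions, not the dissertation's actual step-wise design:

```python
def booth4_multiply(x, y, n=8, skip_pp=0):
    """Radix-4 Booth multiplication of x by an n-bit signed y.
    skip_pp least-significant partial products are dropped, modelling a
    configurable truncation-style approximation (illustrative only)."""
    # Append the implicit 0 below the LSB, then scan overlapping
    # 3-bit groups (b[i+1], b[i], b[i-1]) two positions at a time.
    y_ext = (y & ((1 << n) - 1)) << 1
    product = 0
    pps = 0
    for i in range(0, n, 2):
        group = (y_ext >> i) & 0b111
        # Booth digit in {-2, -1, 0, 1, 2}: -2*b[i+1] + b[i] + b[i-1]
        digit = ((group >> 1) & 1) + (group & 1) - 2 * ((group >> 2) & 1)
        pps += 1
        if pps <= skip_pp:
            continue  # approximation: discard this low-order partial product
        product += (digit * x) << i  # digit weight is 4^(i/2) = 2^i
    return product
```

With `skip_pp=0` the recoding is exact (e.g. `booth4_multiply(7, -3)` returns `-21`); with `skip_pp=1` the lowest partial product is discarded, introducing a small error on low-order bits while saving one partial-product row, which is the kind of cost/accuracy trade-off the dissertation evaluates per layer.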
Degree: Doctor of Philosophy (Ph.D.)
Department: Electrical and Computer Engineering
Committee: Berscheid, Brian; Eager, Derek; Wahid, Khan; Chen, Li
Keywords: Deep Neural Network, Accelerator, Skipping Sparsity, Pruning Effectual Computations, Identical Bit Values, Dataflow, Convolutional Layers, Activation Function, Processing Element, Hardware Implementation, Soft Multipliers, Partial Re-configuration, Inference, Accuracy Loss, High-level Concept, Critical Information, Look-Up Tables, Booth Multipliers, Approximate Multipliers