A Fault-Tolerant Design on Convolution Neural Networks by Applying Reconfigurable Processing Element Arrays

Jin, Chen

dc.contributor.advisor	Chen, Li
dc.contributor.committeeMember	Ko, Seok-Bum
dc.contributor.committeeMember	Zhang, Chris
dc.creator	Jin, Chen
dc.date.accessioned	2024-02-09T21:58:37Z
dc.date.available	2024-02-09T21:58:37Z
dc.date.copyright	2023
dc.date.created	2023-04
dc.date.issued	2024-02-09
dc.date.submitted	April 2023
dc.date.updated	2024-02-09T21:58:37Z
dc.description.abstract	Convolutional neural networks (CNNs) implemented on field programmable gate arrays (FPGAs) have attracted significant interest for their performance and flexibility, particularly for inference after the CNN model has been trained on another platform. These advantages stem primarily from the ability to customize the programmable logic (PL) section of the FPGA. Moreover, recent research indicates that parallel designs with multiple processing element (PE) groups are becoming increasingly popular for implementing complex CNNs, because they achieve higher performance than single-PE or flat implementations. However, increasing the number of PEs in a design can raise the Single Event Upset (SEU) rate of designs operating in radiation environments, because the configuration memories of SRAM-based FPGAs are vulnerable to SEUs. Although memory refreshing can eliminate these errors, the CNN may still produce incorrect results before the SEUs are corrected. Triple Modular Redundancy (TMR) techniques are therefore commonly employed to ensure correct operation. Nevertheless, TMR incurs at least 200% resource overhead, which can make it unsuitable for many complex neural networks with high resource requirements. Without requiring additional hardware resources in the FPGA, the Dynamic Partial Reconfiguration (DPR) methods offered by FPGA vendors can repair SEUs in specific regions of the configuration memory through partial refreshing. DPR reconfigures a portion of the FPGA while the rest of the device continues to operate normally. DPR can also be applied to TMR-protected CNN designs to reduce refreshing time.
However, doing so does not alleviate the area overhead associated with TMR. In this thesis, a CNN was designed and implemented in an FPGA with multiple parallel PE array groups serving as computing engines, each working independently. Before computation started, each PE array was self-tested to verify its functionality. If any faults were detected, DPR was used to correct the errors in the configuration memory of the affected PE array. The experiments first evaluated a single PE group without any hardening, serving as the control group, using both error injection and laser experiments. More PE groups were then added to determine whether the system could tolerate more SEUs or laser pulses before an error occurred. For non-critical errors, where the CNN reports an incorrect percentage for a given output number, adding DPR improved the cross-section by a factor of 13.8. For critical errors, where the CNN predicts the input number incorrectly, adding DPR improved the cross-section by a factor of 25. Additionally, the overall accuracy of the CNN remained consistently above 99% even after a large number of laser pulses or fault injections, indicating the robustness and reliability of the model. The key novelty of this study is the use of DPR to improve the fault tolerance of the entire CNN, exploiting the parallel processing capability of the PE arrays so that data processing continues without the faulty PE arrays. This approach significantly reduces area overhead compared to TMR methods, and the experimental results demonstrated the effectiveness of the proposed method.
dc.format.mimetype	application/pdf
dc.identifier.uri	https://hdl.handle.net/10388/15495
dc.language.iso	en
dc.subject	Fault-tolerant, Convolution Neural Network, FPGA, Dynamic Partial Reconfiguration
dc.title	A Fault-Tolerant Design on Convolution Neural Networks by Applying Reconfigurable Processing Element Arrays
dc.type	Thesis
dc.type.material	text
thesis.degree.department	Electrical and Computer Engineering
thesis.degree.discipline	Electrical Engineering
thesis.degree.grantor	University of Saskatchewan
thesis.degree.level	Masters
thesis.degree.name	Master of Science (M.Sc.)
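
The abstract describes the protection scheme only at a high level: each PE array group is self-tested before computation, faulty groups are repaired by DPR, and processing continues on the healthy groups. The following is a minimal Python software model of that control flow under stated assumptions, not the thesis's hardware implementation. Everything in it is hypothetical and chosen for illustration: the names `PEGroup`, `self_test`, `partial_reconfigure` and `run_round`, the single test vector with a known-good result, the round-robin work assignment, and the modeling of an SEU as one flipped result bit.

```python
"""Hypothetical software model of the scheme summarized in the abstract.

Assumptions (not from the thesis): a configuration-memory SEU is modeled as a
flag that flips the lowest bit of a PE group's output; the self-test is a
single multiply-accumulate test vector with a known-good (golden) result;
DPR is modeled as clearing that flag; work is assigned round-robin.
"""
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Tile = Tuple[Sequence[int], Sequence[int]]


@dataclass
class PEGroup:
    """One independent PE array group (hypothetical model)."""
    gid: int
    upset: bool = False  # True while an SEU corrupts this group's configuration

    def mac(self, a: Sequence[int], b: Sequence[int]) -> int:
        """Multiply-accumulate, the core operation of a convolution PE."""
        result = sum(x * y for x, y in zip(a, b))
        # Assumed fault model: an upset flips the lowest bit of the result.
        return result ^ 1 if self.upset else result


# Hypothetical built-in test vector and its golden result (1*5+2*6+3*7+4*8).
TEST_A, TEST_B = [1, 2, 3, 4], [5, 6, 7, 8]
GOLDEN = 70


def self_test(group: PEGroup) -> bool:
    """Return True if the group reproduces the golden result."""
    return group.mac(TEST_A, TEST_B) == GOLDEN


def partial_reconfigure(group: PEGroup) -> None:
    """Stand-in for DPR: rewriting only this group's region of the
    configuration memory clears the upset; other groups are untouched."""
    group.upset = False


def run_round(groups: List[PEGroup], tiles: List[Tile]) -> Tuple[List[int], List[int]]:
    """Self-test all groups, compute on the healthy ones, then repair the rest.

    Returns the per-tile results and the ids of the groups that were repaired.
    """
    healthy: List[PEGroup] = []
    faulty: List[PEGroup] = []
    for g in groups:  # 1. Self-test every group before computation starts.
        (healthy if self_test(g) else faulty).append(g)
    if not healthy:
        raise RuntimeError("no healthy PE group available")
    # 2. Process every tile on healthy groups only (round-robin).
    results = [healthy[i % len(healthy)].mac(a, b) for i, (a, b) in enumerate(tiles)]
    # 3. Repair faulty groups. In hardware, DPR would run while the healthy
    #    groups keep computing; this sequential model only shows the ordering.
    for g in faulty:
        partial_reconfigure(g)
    return results, [g.gid for g in faulty]
```

For example, with groups `PEGroup(0)`, `PEGroup(1, upset=True)` and `PEGroup(2)`, `run_round` excludes group 1 from computation, returns correct results from groups 0 and 2, and reports `[1]` as repaired; a second round then uses all three groups. In this model a faulty group only reduces the number of groups available for one round, rather than requiring triplicated hardware as TMR does.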

Files

Original bundle

Name:: JIN-THESIS-2023.pdf
Size:: 10.14 MB
Format:: Adobe Portable Document Format

Collections

Graduate Theses and Dissertations