Source-code Summarization of Java Methods Using Control-Flow Graphs

Beyene, Michael

dc.contributor.advisor	Dutchyn, Christopher
dc.contributor.advisor	Schneider, Kevin
dc.contributor.committeeMember	McCalla, Gord
dc.contributor.committeeMember	Mondal, Debajyoti
dc.contributor.committeeMember	Chen, Li
dc.creator	Beyene, Michael
dc.date.accessioned	2021-10-20T21:36:45Z
dc.date.available	2021-10-20T21:36:45Z
dc.date.created	2021-09
dc.date.issued	2021-10-20
dc.date.submitted	September 2021
dc.date.updated	2021-10-20T21:36:45Z
dc.description.abstract	Source-code summarization aims to generate natural-language summaries for software artifacts (e.g., methods and classes). Researchers have explored source-code summarization as a research area in software engineering, applying text-retrieval-based, heuristic-based, and data-driven techniques. In data-driven techniques, researchers have used a sequence of source-code tokens and other representations of source code (e.g., application programming interface (API) sequences and abstract syntax trees (ASTs)) as input to source-code summarization models. According to the current published literature, researchers have not explored feeding source-code summarization models a sequence extracted from a control-flow graph, which captures the contextual relationship between program instructions based on control flow. In this work, we employ control-flow graph representations to increase the prediction accuracy of a bi-directional long short-term memory (LSTM) source-code summarization model in describing the functionality of Java methods. We use an attention-based bi-directional LSTM sequence-to-sequence model that takes linearized control-flow graph sequences alongside a sequence of source-code tokens. We compared our model, with and without a linearized control-flow graph, against the current state of the art. We created a source-code summarization dataset to train and evaluate our approach and conducted expert and automatic evaluations. In the expert evaluation, the participants rated the summaries generated by each model on how correctly they describe the functionality of a Java method. Our models outperformed the state of the art in terms of mean average rating. The expert evaluation also showed that the model benefits from the structural information.
In the automatic evaluation, we found that using control-flow graphs does not increase the prediction accuracy of a bi-directional LSTM model, measured by BLEU score, compared to a bi-directional LSTM model that does not use control-flow graphs. However, we found that our source-code summarization approach, which uses a control-flow graph as an additional representation, performs better than encoding the AST in graph neural networks. Overall, we improved on the state of the art for method summarization with our models, which take a sequence of method tokens with and without a control-flow graph.
dc.format.mimetype	application/pdf
dc.identifier.uri	https://hdl.handle.net/10388/13656
dc.subject	Source-code Summarization
dc.title	Source-code Summarization of Java Methods Using Control-Flow Graphs
dc.type	Thesis
dc.type.material	text
thesis.degree.department	Computer Science
thesis.degree.discipline	Computer Science
thesis.degree.grantor	University of Saskatchewan
thesis.degree.level	Masters
thesis.degree.name	Master of Science (M.Sc.)

Files

Original bundle

Name: BEYENE-THESIS-2021.pdf
Size: 1.38 MB
Format: Adobe Portable Document Format

Collections

Graduate Theses and Dissertations