Learning APIs through Mining Code Snippet Examples

dc.contributor.advisor: Roy, Chanchal K.
dc.contributor.committeeMember: Keil, Mark
dc.contributor.committeeMember: Khan, Shahedul
dc.contributor.committeeMember: Codabux, Zadia
dc.contributor.committeeMember: Lee, Roy
dc.creator: Saifullah, C M Khaled 1993-
dc.creator.orcid: 0000-0002-8822-2091
dc.date.accessioned: 2020-03-03T21:15:00Z
dc.date.available: 2020-03-03T21:15:00Z
dc.date.created: 2020-01
dc.date.issued: 2020-02-04
dc.date.submitted: January 2020
dc.date.updated: 2020-03-03T21:15:01Z
dc.description.abstract: Developers extensively use and reuse Application Programming Interfaces (APIs) to reduce development time and effort. To do so, they need to learn and remember APIs in order to use them effectively in their codebases. However, APIs are difficult to learn: they are large in number, they are often poorly documented, and the documentation contains a lot of text to remember. To support developers in learning and using these APIs, this thesis presents three studies that (1) enhance the code completion features of modern integrated development environments (IDEs), (2) make online forum code snippets compilable, and (3) annotate the code elements of a dynamically typed programming language (e.g., JavaScript) with their types. To this end, we first explore the method name, argument, and code completion techniques in the literature and find that none of them is suitable for completing a full method call sequence, which consists of a method name and a list of arguments. We therefore propose DAMCA, a Bi-LSTM based encoder-decoder model with an attention mechanism and beam search, which takes the lexical, syntactic, and semantic contexts of a method call and returns a list of method call sequences as completion suggestions. Evaluation results show that the proposed technique outperforms the state-of-the-art method name, argument, code completion, and program synthesis techniques for method call sequence completion. Next, we explore the techniques proposed for resolving the Fully Qualified Names (FQNs) of API elements in online forum code snippets. We find that these techniques restrict themselves to locally specific code tokens. We incorporate globally related tokens alongside the local tokens and use likelihood, context similarity, and name similarity to resolve each API element. Experimental results show that the proposed technique outperforms the state-of-the-art techniques while training faster.
Finally, in our third study, we evaluate techniques developed for statically typed programming languages (i.e., Java) on dynamically typed programming languages (i.e., JavaScript). The evaluation results show that these techniques perform very poorly on JavaScript. We then investigate the causes and build a technique that uses Word2Vec and context similarity as global models and previous outputs on the same project as a local model. This combination of models outperforms the technique developed for Java. We also compare the proposed technique with state-of-the-art deep learning based techniques developed for JavaScript. The experimental results suggest that the proposed technique trains faster than the deep learning based techniques without sacrificing accuracy. We believe that the findings of this research and the proposed techniques have the potential to help developers learn different aspects of APIs, and thus to ease software development and improve developer productivity.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://hdl.handle.net/10388/12688
dc.subject: API
dc.subject: Context-Sensitive
dc.subject: Code Completion
dc.subject: Neural Encoder Decoder
dc.subject: FQN
dc.subject: Word2Vec, Type System, Deep Learning
dc.title: Learning APIs through Mining Code Snippet Examples
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.Sc.)

Files

Original bundle
Name:
SAIFULLAH-THESIS-2020.pdf
Size:
3.5 MB
Format:
Adobe Portable Document Format