Learning APIs through Mining Code Snippet Examples

Date
2020-02-04
Author
Saifullah, C M Khaled 1993-
ORCID
0000-0002-8822-2091
Type
Thesis
Degree Level
Masters
Abstract
Developers extensively use and reuse Application Programming Interfaces (APIs) to reduce development time and effort. To do so, they need to learn and remember APIs in order to use them effectively in their codebases. However, APIs are difficult to learn: they are numerous, they are not properly documented, and their documentation contains a lot of text to remember. To support developers in learning and using APIs, this thesis presents three studies that (1) enhance the code completion features of modern integrated development environments (IDEs), (2) make code snippets from online forums compilable, and (3) annotate the code elements of a dynamically typed programming language (e.g., JavaScript) with their types.
In this direction, we first survey the method name, argument, and code completion techniques in the literature and find that none of them is suitable for completing a full method call sequence, which consists of a method name and a list of arguments. We therefore propose DAMCA, a Bi-LSTM based encoder-decoder model with an attention mechanism and beam search, which takes the lexical, syntactic, and semantic contexts of a method call and returns a list of method call sequences as completion suggestions. Evaluation results show that the proposed technique outperforms state-of-the-art method name, argument, code completion, and program synthesis techniques for method call sequence completion. Next, we examine the techniques proposed for resolving the Fully Qualified Names (FQNs) of API elements in online forum code snippets. We find that these techniques restrict themselves to locally specific code tokens only. We incorporate globally related tokens alongside the local tokens and use likelihood, context similarity, and name similarity to resolve each API element. Experimental results show that the proposed technique outperforms the state-of-the-art techniques and trains faster. Finally, in our third study, we apply techniques developed for statically typed programming languages (i.e., Java) to dynamically typed programming languages (i.e., JavaScript). The evaluation results show that these techniques perform very poorly for JavaScript. We then investigate the causes and build a technique that uses Word2Vec and context similarity as global models and previous outputs on the same project as a local model. The combination of models outperforms the technique developed for Java. We then compare the proposed technique with state-of-the-art deep learning based techniques developed for JavaScript. The experimental results suggest that the proposed technique trains faster than the deep learning based techniques without sacrificing accuracy.
We believe that the findings of this research and the proposed techniques have the potential to help developers learn different aspects of APIs, thereby easing software development and improving developer productivity.
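As a rough illustration of the second study's idea, the sketch below ranks candidate FQNs for a simple name by combining likelihood, context similarity, and name similarity. It is not the thesis's implementation: the function names (`rank_fqns`, `cosine`, `name_similarity`), the equal weights, the cosine and string-matching measures, and the toy corpus statistics are all assumptions made for this example. The snippet tokens stand in for whatever mix of local and globally related tokens is available.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-token counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def name_similarity(simple_name: str, fqn: str) -> float:
    """String similarity between the snippet's name and the FQN's last segment."""
    last_segment = fqn.rsplit(".", 1)[-1]
    return SequenceMatcher(None, simple_name.lower(), last_segment.lower()).ratio()


def rank_fqns(simple_name, snippet_tokens, candidates, weights=(1.0, 1.0, 1.0)):
    """Rank candidate FQNs by a weighted sum of the three scores.

    `candidates` maps each FQN to (corpus_count, context_tokens). Both are
    toy stand-ins for statistics mined from a real code corpus, and the
    equal default weights are an assumption, not a tuned setting.
    """
    total = sum(count for count, _ in candidates.values()) or 1
    snippet = Counter(snippet_tokens)
    w_like, w_ctx, w_name = weights
    scored = []
    for fqn, (count, context_tokens) in candidates.items():
        score = (
            w_like * count / total                             # likelihood
            + w_ctx * cosine(snippet, Counter(context_tokens))  # context similarity
            + w_name * name_similarity(simple_name, fqn)        # name similarity
        )
        scored.append((score, fqn))
    return [fqn for _, fqn in sorted(scored, reverse=True)]


# Hypothetical statistics for two classes that share the simple name "List".
candidates = {
    "java.util.List": (70, ["ArrayList", "add", "size", "Iterator"]),
    "java.awt.List": (30, ["Frame", "addItem", "Panel", "setVisible"]),
}
ranking = rank_fqns("List", ["ArrayList", "add", "size", "for"], candidates)
print(ranking)
```

In this toy setting both candidates match the simple name equally well, so the ranking is decided by the likelihood and context terms, which favour `java.util.List` because the snippet mentions `ArrayList`, `add`, and `size`.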
Degree
Master of Science (M.Sc.)
Department
Computer Science
Program
Computer Science
Supervisor
Roy, Chanchal K.
Committee
Keil, Mark; Khan, Shahedul; Codabux, Zadia; Lee, Roy
Copyright Date
January 2020
Subject
API
Context-Sensitive
Code Completion
Neural Encoder Decoder
FQN
Word2Vec, Type System, Deep Learning