Towards Semantic Clone Detection, Benchmarking, and Evaluation

dc.contributor.advisor: Roy, Chanchal K
dc.contributor.committeeMember: McQuillan, Ian
dc.contributor.committeeMember: Keil, Mark
dc.contributor.committeeMember: McCalla, Gord
dc.contributor.committeeMember: Khan, Shahedul A
dc.creator: Al-omari, Farouq Ahmad
dc.date.accessioned: 2021-06-07T18:51:03Z
dc.date.available: 2021-06-07T18:51:03Z
dc.date.created: 2021-04
dc.date.issued: 2021-06-07
dc.date.submitted: April 2021
dc.date.updated: 2021-06-07T18:51:03Z
dc.description.abstract: Developers copy and paste their code to speed up the development process. Sometimes, they copy code from other systems or look up code online to solve a complex problem. Developers reuse copied code with or without modifications. The resulting similar or identical code fragments are called code clones. Sometimes clones are written unintentionally when a developer implements the same or similar functionality. Even when the resulting code fragments are not textually similar, if they implement the same functionality they are still considered clones and are classified as semantic clones. Semantic clones are defined as code fragments that perform the exact same computation and are implemented using different syntax. Software cloning research indicates that code clones exist in all software systems; on average, 5% to 20% of software code is cloned. Due to the potential impact of clones, whether positive or negative, it is essential to locate, track, and manage clones in the source code. Considerable research has been conducted on all types of code clones, including clone detection, analysis, management, and evaluation. Despite the great interest in code clones, there has been considerably less work conducted on semantic clones. As described in this thesis, I advance the state of the art in semantic clone research in several ways. First, I conducted an empirical study to investigate the status of code cloning in and across open-source game systems and the effectiveness of different normalization, filtering, and transformation techniques for detecting semantic clones. Second, I developed an approach to detect clones across .NET programming languages using an intermediate language. Third, I developed a technique using an intermediate language and an ontology to detect semantic clones. Fourth, I mined Stack Overflow answers to build a semantic code clone benchmark that represents real semantic code clones in four programming languages: C, C#, Java, and Python. Fifth, I defined a comprehensive taxonomy that identifies semantic clone types. Finally, I implemented an injection framework that uses the benchmark to compare and evaluate semantic code clone detectors by automatically measuring recall.
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/10388/13413
dc.subject: Semantic clones
dc.subject: Clone detection
dc.subject: Clone detection benchmark
dc.subject: Stack Overflow
dc.subject: Clone detection evaluation
dc.title: Towards Semantic Clone Detection, Benchmarking, and Evaluation
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Saskatchewan
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy (Ph.D.)

Files

Original bundle
Name: AL-OMARI-DISSERTATION-2021.pdf
Size: 4.37 MB
Format: Adobe Portable Document Format

License bundle
Name: LICENSE.txt
Size: 2.27 KB
Format: Plain Text