On the Stability of Software Clones: A Genealogy-Based Empirical Study
Clones are a matter of great concern to the software engineering community because of their dual but contradictory impact on software maintenance. While there is strong empirical evidence of the harmful impact of clones on maintenance, a number of studies have also identified positive sides of code cloning during maintenance. Recently, to help determine if clones are beneficial or not during software maintenance, software researchers have been conducting studies that measure source code stability (the likelihood that code will be modified) of cloned code compared to non-cloned code. If the presence of clones in program artifacts (files, classes, methods, variables) causes the artifacts to be more frequently changed (i.e., cloned code is more unstable than non-cloned code), clones are considered harmful. Unfortunately, existing stability studies have resulted in contradictory results and even now there is no concrete answer to the research question "Is cloned or non-cloned code more stable during software maintenance?" The possible reasons behind the contradictory results of the existing studies are that they were conducted on different sets of subject systems with different experimental setups involving different clone detection tools investigating different stability metrics. Also, there are four major types of clones (Type 1: exact; Type 2: syntactically similar; Type 3: with some added, deleted or modified lines; and, Type 4: semantically similar) and none of these studies compared the instability of different types of clones. Focusing on these issues we perform an empirical study implementing seven methodologies that calculate eight stability-related metrics on the same experimental setup to compare the instability of cloned and non-cloned code in the maintenance phase. We investigated the instability of three major types of clones (Type 1, Type 2, and Type 3) from different dimensions. We excluded Type 4 clones from our investigation, because the existing clone detection tools cannot detect Type 4 clones well. According to our in-depth investigation on hundreds of revisions of 16 subject systems covering four different programming languages (Java, C, C#, and Python) using two clone detection tools (NiCad and CCFinder) we found that clones generally exhibit higher instability in the maintenance phase compared to non-cloned code. Specifically, Type 1 and Type 3 clones are more unstable as well as more harmful compared to Type 2 clones. However, although clones are generally more unstable sometimes they exhibit higher stability than non-cloned code. We further investigated the effect of clones on another important aspect of stability: method co-changeability (the degree methods change together). Intuitively, higher method co-changeability is an indication of higher instability of software systems. We found that clones do not have any negative effect on method co-changeability; rather, cloning can be a possible way of minimizing method co-changeability when clones are likely to evolve independently. Thus, clones have both positive and negative effects on software stability. Our empirical studies demonstrate how we can effectively use the positive sides of clones by minimizing their negative impacts.
Clones, Stability, Method Genealogy, Change Dispersion
Master of Science (M.Sc.)