The Histogram diff algorithm uses a unique line of code as a benchmark to match the sequences of changed lines between the two files. This reduces the occurrence of conflicts (i.e., a line of unchanged code identified as changed code, so that in the diff list this code is written twice, as both deleted and inserted code).

For example, if we extract the differences between the two versions of the same file in Figure 1 using the Histogram in the git diff command, we obtain the output depicted in Figure 3. The unique line of code in line 10 of Figure 3 is not detected as changed code because it serves as the benchmark for matching, whereas Myers identifies this line as changed code. This influences the sequence of the other changed code. An additional if-condition block is written between lines 4 and 9, where it should be placed. This block is clearly understood as new code inserted before the assignment statement (the code in line 10, which is used as one of the unique lines to match). It is also evident that the code between lines 12 and 16 was replaced by one line of code in line 17, while the closing curly brace in line 20 was omitted from the files, and three new lines of code (lines 23, 24 and 25) were added at the end of the code in Figure 3.

3 Systematic Mapping: How Previous Studies Used Git Diff?

The selection of appropriate literature is essential to guarantee high-quality papers and to grasp the state-of-the-art issues in the software engineering field (Kavitha, 2009). We specifically targeted papers published in high-ranking journals and conference proceedings in the software engineering area. To maximize the probability of finding highly relevant, good-quality articles, we used three specific digital resources: ACM Digital Library, IEEE Xplore, and SpringerLink.
The differences in added and deleted lines, which are distinguished by their number and position, range from 0.8% to 6.2% and from 1.4% to 7.6%, respectively. The divergent diff outputs also affected the number of files identified in bug introduction identification. The percentage of files that have different deleted lines of code ranges from 2.4% to 6.6%. Regarding the results of the patch analysis, we found that, in code changes, [...]. However, both diff algorithms are equally good at generating the list of non-code changes.

The Histogram strategy works similarly to the Patience by building a histogram of the occurrences of every line in the first version of a file. Every element in the second version is then matched against the first sequence in order, to find whether the element exists and to count its occurrences. If the elements exist and their occurrences are fewer than in the first sequence, they are expected to be a potential LCS. Once the screening is finished for the second sequence, the LCS with the lowest occurrence is marked as the separator. The two sections resulting from the partition (i.e., section 1 is the area before the LCS, while section 2 is the region after the LCS) are then processed recursively using the same procedure as at the beginning of the algorithm. This means that the Histogram performs similarly to the Patience if a unique common element exists in both files; otherwise, it selects the element that has the fewest occurrences. In comparison with the other two diff algorithms (i.e., the Myers and the Patience), the Histogram has nevertheless been declared much quicker. In contrast with the Myers, the Histogram algorithm provides diff results that are easier for software archive miners to understand, as the Histogram more clearly separates the changed code lines.
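The Histogram strategy described here can be illustrated with a small Python sketch. This is a deliberately simplified reading of the description, not Git's or JGit's actual implementation: it assumes the separator is a single line (real implementations extend the match to a whole region), it breaks ties by the earliest position in the first file, and the function name `histogram_diff` is hypothetical.

```python
from collections import Counter


def histogram_diff(a, b):
    """Simplified Histogram-style diff of two lists of lines.

    Returns (tag, line) pairs, where tag is ' ' (unchanged),
    '-' (deleted from a) or '+' (inserted in b).
    Assumption: a single lowest-occurrence common line is the separator.
    """
    if not a:
        return [("+", line) for line in b]
    if not b:
        return [("-", line) for line in a]

    # Histogram of line occurrences in the first version.
    counts_a = Counter(a)
    lines_in_b = set(b)

    # Candidate separators: lines of the first version that also occur in b.
    common = [line for line in counts_a if line in lines_in_b]
    if not common:
        # Nothing to anchor on: everything is deleted, then inserted.
        return [("-", line) for line in a] + [("+", line) for line in b]

    # Pick the candidate with the fewest occurrences (a unique line if one
    # exists); ties go to the line that appears first in a.
    pivot = min(common, key=lambda line: (counts_a[line], a.index(line)))
    i, j = a.index(pivot), b.index(pivot)

    # Recurse on the section before and the section after the separator.
    return (
        histogram_diff(a[:i], b[:j])
        + [(" ", pivot)]
        + histogram_diff(a[i + 1:], b[j + 1:])
    )


if __name__ == "__main__":
    old = ["void f() {", "  x = 1;", "}"]
    new = ["void f() {", "  if (y) {", "    log();", "  }", "  x = 1;", "}"]
    for tag, line in histogram_diff(old, new):
        print(tag + line)
```

In this toy input the assignment line is unique in both versions, so it acts as the benchmark and the new `if` block is reported as a clean insertion before it, which mirrors the behaviour attributed to the Histogram in the text.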
In our empirical analyses, we conduct three comparisons based on the most popular usages of git diff found in our mapping study: collecting metrics, identifying bug introduction, and getting patches. We investigate the disagreement between two diff algorithms, Myers and Histogram, and manually measure their quality in generating the diff lists.

Based on previous related studies, we investigate the code changes from the files in 14 OSS projects that employ Continuous Integration (CI) for metrics collection and 10 Apache projects for bug introduction identification, to quantify the differences in the diff outputs produced by both diff algorithms. We analyze the quality of patches derived from Myers and Histogram by manually comparing their two diffs for 377 changes, a statistically representative sample of the 21,590 changes identified in the above two comparisons. Our findings show that using different diff algorithms in the git diff command produces unequal diff lists. This influences the number of files that have dissimilar added and deleted lines of code in each CI-Java project.
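As a rough illustration of how added and deleted line counts could be collected under each algorithm, the Python sketch below builds a `git diff` invocation with the `--diff-algorithm` and `--numstat` options and parses the result. The helper names (`diff_command`, `parse_numstat`, `churn`), the revisions, and the file path are hypothetical, and this is not the tooling used in the study.

```python
import subprocess

# Values accepted by git's --diff-algorithm option.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(algorithm, old_rev, new_rev, path):
    """Build a `git diff --numstat` command for one file and one algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return [
        "git", "diff",
        f"--diff-algorithm={algorithm}",
        "--numstat",
        old_rev, new_rev, "--", path,
    ]


def parse_numstat(output):
    """Parse --numstat output into {path: (added, deleted)}."""
    stats = {}
    for line in output.splitlines():
        added, deleted, path = line.split("\t", 2)
        if added == "-":  # binary files are reported as "-\t-\tpath"
            continue
        stats[path] = (int(added), int(deleted))
    return stats


def churn(algorithm, old_rev, new_rev, path, repo="."):
    """Run git in `repo` and return the parsed added/deleted line counts."""
    result = subprocess.run(
        diff_command(algorithm, old_rev, new_rev, path),
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout)
```

Calling `churn("myers", ...)` and `churn("histogram", ...)` on the same pair of revisions and comparing the two dictionaries would show whether the algorithms disagree on the number of added or deleted lines for a file. Differences in the position of changed lines would need the full patch text rather than `--numstat`.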