Research Statement

  • Introduction
  • The research program focuses on developing methods for lightweight static program analysis. The objective is to create new analysis methods that scale to very large software systems (multiple millions of lines of code, MLOC). There is an inherent trade-off: this scalability comes at the cost of reduced accuracy. Much of my work deals with quantifying this loss of accuracy and understanding how it impacts the conclusions that can be drawn from the results.

    My dissertation work dealt with the construction of a lightweight forward static slicing approach. I developed a tool, srcSlice [1], [2], [3], and compared its results to a commercial slicing tool (CodeSurfer). srcSlice produced very comparable results. When compared to full PDG (Program Dependence Graph) approaches, srcSlice achieved up to a four-order-of-magnitude speedup on large systems. The tool is built on top of the srcML [4], [5] infrastructure developed at Kent State University by Professor Jonathan I. Maletic. Both tools are available for download from the srcML website. Instructions for installing srcML and srcSlice, along with documentation, are also available at this site.

    A very fast and scalable, yet slightly less accurate, slicing approach is extremely useful for a number of reasons. Developers gain a low-cost, practical means to estimate the impact of a change within minutes rather than hours or days. This is very important for planning the implementation of new features and understanding how a change is related to other parts of the system. It also provides an inexpensive test of whether a full, more expensive analysis of the system is warranted. A fast slicing approach opens up new avenues of research in metrics and in mining software histories based on slicing. That is, slicing can now be conducted on very large systems, and on entire version histories, in practical time frames. A number of experiments and empirical investigations previously too costly to undertake may now be performed.

  • A Lightweight Forward Static Slicing Approach
  • Program slicing is a widely used method for understanding and detecting the impact of changes to software. The idea is fairly simple: given a variable and its location in a program, find the other parts of the program that are affected by that variable. Unfortunately, no open-source slicing tools are available that scale to large systems. While some free versions of commercial tools are available for academic use, they are limited in the size of the code base that can be analyzed. For example, the free version of CodeSurfer will not slice programs over 200 KLOC.
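
    The idea can be illustrated with a small sketch (my own illustrative Python example, not part of srcSlice): statements of a toy program are numbered, dependence edges record which statements are affected by which, and a forward slice is simply the set of statements reachable from the slicing criterion.

```python
from collections import deque

# Toy forward-slice sketch. Each edge (s, t) in `deps` means statement t
# is data- or control-dependent on statement s, for the toy program:
#   1: x = input()
#   2: y = x + 1
#   3: z = 5
#   4: print(y)
deps = {1: [2], 2: [4], 3: []}

def forward_slice(start):
    """Return every statement reachable from `start` via dependences."""
    seen, work = {start}, deque([start])
    while work:
        s = work.popleft()
        for t in deps.get(s, []):
            if t not in seen:
                seen.add(t)
                work.append(t)
    return sorted(seen)

print(forward_slice(1))  # statements affected by x at line 1 -> [1, 2, 4]
```

    Slicing on statement 3 returns only [3], showing that z does not affect the rest of this toy program.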

    The concept of program slicing was originally identified by Weiser [6], [7] as a debugging aid. He defined the slice as an executable program that preserved the behavior of the original program. Weiser’s algorithm traces data and control dependences by solving data-flow equations to determine the directly and indirectly relevant variables and statements. Since that time, a number of different slicing techniques and tools have been proposed and implemented. These techniques are broadly distinguished according to the type of slices produced: static versus dynamic, closure versus executable, inter-procedural versus intra-procedural, forward versus backward, conditioned, amorphous, union, and quasi-static slicing.

    Program slicing is typically based on the notion of a Program Dependence Graph (PDG) or one of its variants, e.g., a System Dependence Graph (SDG). Unfortunately, building the PDG/SDG is quite costly in terms of computational time and space. As such, slicing approaches generally do not scale well, and while there are some (costly) workarounds, generating slices for a very large system can take days of computing time. Additionally, many tools are strictly limited to an upper bound on the size of the program they can slice.

    The tool developed by the author, srcSlice, addresses this limitation by eliminating the time and effort needed to build the entire PDG. In short, it combines a text-based approach, similar to Cordy’s [8], with a lightweight static analysis infrastructure that computes dependence information only as needed (i.e., on the fly) while computing the slice for each variable in the program. The slicing process is performed using the srcML format for source code. The srcML format provides direct access to abstract syntactic information to support static analysis.
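
    The single-pass, per-variable flavor of this analysis can be sketched as follows (an illustrative simplification, not the srcSlice implementation): rather than building a PDG, one pass over the statements records, for each variable, where it is defined and which later statements use it.

```python
import re

# One-pass, per-variable slice-profile sketch (no PDG is built):
# for each simple `lhs = rhs` statement, record where each variable
# is defined and where it is used.
code = [
    "x = 1",
    "y = x + 2",
    "z = y * x",
]

profile = {}  # name -> {"def": [line numbers], "use": [line numbers]}
for lineno, stmt in enumerate(code, start=1):
    lhs, rhs = (s.strip() for s in stmt.split("=", 1))
    profile.setdefault(lhs, {"def": [], "use": []})["def"].append(lineno)
    for name in re.findall(r"[a-zA-Z_]\w*", rhs):
        profile.setdefault(name, {"def": [], "use": []})["use"].append(lineno)

print(profile["x"])  # {'def': [1], 'use': [2, 3]}
```

    The real tool of course handles full C/C++ syntax via srcML rather than regular expressions; the point of the sketch is that dependence information accumulates in a single pass, as needed.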

    This work is detailed in my paper at the IEEE International Working Conference on Reverse Engineering (WCRE 2012) [1]. This paper was selected and invited to a special issue of the best papers of WCRE’12 in the Journal of Software: Evolution and Process (JSEP) [2]. One of the reviewers stated that this work is an important new direction in the field of program slicing, that the paper deserves very serious attention, and that it is likely to have a far-reaching and important impact on the program slicing research agenda and its applications.

    A new implementation of the srcSlice tool is released as a SAX parser. The particular SAX implementation is a C++ wrapper around libxml2’s SAX interface called srcSAX; it was made specifically to support building tools that use srcML. Since SAX parsers store no data about previously seen tags, they are very memory- and time-efficient. To take full advantage of this feature of SAX, we have worked to store as little data as possible and to minimize repetition. As a result, srcSlice is very fast. On the largest system we present here (see Table I), the Linux kernel, srcSlice clocks in at about 274K identifiers per minute. On Blender, srcSlice ran at 240K identifiers per minute. The Linux kernel has about 7 times the number of identifiers as Blender, so the relationship between the number of variables and code size is not quite 1:1. Even so, the difference in per-identifier execution speed between Linux and Blender is negligible. Even accounting for a higher number of identifiers per line in Blender, srcSlice scales extremely well, and we continue to look for ways to increase its speed.
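
    The streaming style that srcSAX enables can be sketched in a few lines of Python (using the standard-library xml.sax module rather than srcSAX itself; the srcML element names here are an illustrative assumption): no tree is kept in memory, only the running state the tool needs.

```python
import xml.sax
from io import StringIO

# Minimal SAX sketch in the spirit of srcSAX: stream over srcML-style
# XML, storing no tree -- only a running identifier count.
class NameCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, tag, attrs):
        if tag == "name":  # srcML wraps identifiers in <name> elements
            self.count += 1

# Illustrative srcML-like fragment for `int x;`
doc = "<unit><decl><type><name>int</name></type> <name>x</name></decl></unit>"
handler = NameCounter()
xml.sax.parse(StringIO(doc), handler)
print(handler.count)  # 2
```

    Because the handler sees each tag exactly once and forgets it immediately, memory use stays flat regardless of input size, which is what makes the approach viable on systems the size of the Linux kernel.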

    This work has been published in the International Conference on Software Engineering (ICSE 2016) [3].

    TABLE I. RESULTS OF SRCSLICE APPLIED TO MULTIPLE SYSTEMS WITH THE SIZE OF THE SYSTEM, THE NUMBER OF VARIABLES FOUND, AND THE EXECUTION TIME FOR THE SRCSLICE TOOL.
    System               Size (LOC)     Variables     Execution Time
    Linux Kernel-4.06    ~13,000,000    ~1,918,000    7 min
    Blender-2.68         ~1,300,000     ~265,000      70 sec
    Inkscape-0.91        ~410,000       ~74,000       18 sec

  • Slicing Profile Changes Over History
  • This area of research focuses on leveraging the slicing approach to examine how the slicing profile of a system changes over its history. This could not previously be done due to the time needed to slice large systems (i.e., days of computing time). We start by examining how the Linux kernel has changed in the context of slicing: we generated the slice for every variable across every version of the Linux kernel (~1,000 versions, totaling ~4.5 billion LOC). This allows for understanding the impact of changes from one version to the next. Additionally, slice-based software metrics over the version history that relate to maintenance effort (e.g., sliceSize, hashSize, sCoverage) are being introduced. These metrics are extracted directly from the source code, with no other metadata needed. This work has the potential for a number of interesting and publishable results.
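
    As a hedged sketch of what such metrics might look like (the exact definitions of sliceSize and sCoverage in the published work may differ; these are illustrative assumptions), one can compute per-variable slice sizes and the fraction of the program covered by any slice:

```python
# Illustrative slice-based metrics (assumed definitions, not the
# published ones):
#   sliceSize(v) -> number of statements in v's slice
#   sCoverage    -> fraction of statements covered by at least one slice
slices = {"x": {1, 2, 4}, "y": {2, 4}, "z": {3}}
total_statements = 5

slice_size = {v: len(s) for v, s in slices.items()}
covered = set().union(*slices.values())
s_coverage = len(covered) / total_statements

print(slice_size["x"], s_coverage)  # 3 0.8
```

    Tracking such numbers version by version is what makes trends in the slicing profile visible over a system’s history.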

  • A Slice-Based Estimation Approach for Maintenance Effort
  • This work addresses the following program-maintenance concerns: building an estimation approach for maintenance effort using slice-based metrics, identifying the type of maintenance being performed during the lifetime of the system, and characterizing the system’s evolution using Lehman’s laws of software evolution. More specifically, slice-based software measures, and a corresponding process that can represent maintenance effort in open-source systems, are being developed. This work analyzed almost all versions of the Linux kernel and constructed, validated, and compared three indirect maintenance-effort models.

    This work is detailed in my paper at the IEEE International Conference on Software Maintenance and Evolution (ICSME 2014) [9]. Additionally, this work was extended to further open-source software projects in my journal article in the International Journal of Advanced Computational Engineering and Networking (IJACEN 2015) [10].

  • A Slicing-Based Approach to Measure Semantic Change between Software Versions
  • srcSlice is used as the basis for an approach to automatically capture the semantic differences between versions of a system. Whereas previous approaches to semantic differencing have tended to have too much of a syntactic character, this work uses dependence analysis to find (and measure) not merely the changed elements, but also the elements of the code that are affected by the changes. As such, the proposed approach can be thought of as a combination of impact analysis and change analysis, combining the two to understand the impact of changes as a quantity of semantic difference. Each version is represented as a collection of program slices at different levels of granularity: variable, method, and file. The changes between versions are represented as the set of changed slices.
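
    A minimal sketch of this idea (my own illustration, not the published algorithm): represent each version as a map from variable to its slice, take the set of variables whose slices differ, and normalize by the total number of variables to get a quantity of semantic change.

```python
# Each version is a map: variable -> its slice (a set of statements).
v1 = {"x": {1, 2, 4}, "y": {2, 4}, "z": {3}}
v2 = {"x": {1, 2, 4, 5}, "y": {2, 4}, "w": {6}}

# A slice is "changed" if it differs between versions (including
# slices that appear in only one version).
all_vars = v1.keys() | v2.keys()
changed = {name for name in all_vars
           if v1.get(name, set()) != v2.get(name, set())}
change_measure = len(changed) / len(all_vars)

print(sorted(changed), change_measure)  # ['w', 'x', 'z'] 0.75
```

    Here y’s slice is untouched, so it contributes nothing to the measure even though the file containing it may have changed textually; that is the sense in which the comparison is semantic rather than syntactic.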

    This work was submitted as a full paper to the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE 2015) [11]. The reviewers recognized the novelty of the proposed approach to semantic differencing; however, the paper was rejected due to the lack of a strong evaluation and the absence of a threats-to-validity discussion. A revised version is now ready, and I will resubmit it to the ASE conference.

  • Visualizing Program Slices
  • This area of research seeks to understand the results of slicing very large-scale software systems, that is, understanding how a slice-based change is related to other parts of the system. The ultimate goal is to provide a suitable interface for program slicing that allows slicing at the variable, function, or file level and provides fast visual feedback on slice structure. The investigation includes a method that supports visual mining of system trends and artifact hotspots in a software repository. The method detects hotspots as system artifacts with high activity over flexible time intervals, indicated by different measures that augment the visualization. A new visualization approach is being developed at the Department of Computer Science & Software Engineering, Miami University. This project has been applied to the Linux kernel, and other systems are currently under investigation.

    This work has been published in the IEEE Working Conference on Software Visualization (VISSOFT’16) [15].

    References

    Published By: H. Alomari
    Last update: Sun Feb 21 01:15:43 2016 EST