NSF: 0811518: Blended Static/Dynamic Analyses for Performance Understanding and Improved Security of Framework-intensive Systems

Web applications are an important software paradigm, widely used by both the commercial and research communities. These applications are built on top of numerous integrated layers of libraries and frameworks. Performance problems in these framework-intensive systems are often difficult to understand, exhibiting characteristics intrinsically different from those of previous systems. For example, a typical performance problem is not a single frequently executed method, but rather involves problematic activity across many methods spanning disparate frameworks (e.g., Apache's Tomcat, Microsoft's .NET CLR platform, Java EE platforms such as Apache's Geronimo, JBoss, or IBM's WebSphere). To the developer, the application resembles an iceberg: the familiar code is only a small portion of the entire implementation, yet the entire system must be analyzed to understand performance and security problems.

Framework-intensive Web applications are a challenge to existing analysis techniques. Purely static analyses, accomplished through examination of code without execution, suffer from insufficient scalability and/or insufficient precision for answering behavioral questions about these systems. Purely dynamic analyses, accomplished through judiciously placed instrumentation in source code or bytecode, or by probing the JVM runtime system, introduce too much execution overhead, especially for production systems, or are too limited in the information gathered. Further, existing dynamic performance analyses focus on control flow, but the main purpose of these applications is to manipulate data; understanding object usage is crucial. The main idea in this proposal is to address these weaknesses by blending static and dynamic analyses in new ways that, in combination, avoid these problems and support tools for framework-intensive applications.

The specific goals of this research proposal are:

Framework-intensive applications largely have been ignored by software engineering researchers because of their complexity and scale. This has resulted in a gap between the tools and techniques needed to deal with these applications, and those being developed by the research community. Designing analyses and developing tools to address performance and security issues for these applications will begin to bridge this gap. The PI has the advantage of her unique depth in program analysis, plus an already established research relationship with IBM researchers. These colleagues can provide access to real-world data for testing these ideas and appreciation of the difficulties of software development with inadequate tools.

Intellectual Merit

This research offers a unique opportunity to advance the state of the art in program analysis to handle an important, but currently unexplored, complex software paradigm (i.e., framework-based applications) and to strongly influence current software practice. Successful application of blended analysis to framework-intensive systems will demonstrate that analysis is scalable to software orders of magnitude larger than is currently possible, such as Web applications. The intellectual challenge is to develop analyses of practical cost and of sufficient precision to scale up to industrial-strength framework-intensive software. Collection of framework-intensive benchmarks for a shared repository, a task best suited for an academic-industrial research collaboration, will encourage other researchers to address problems in Web applications.

Broader Impacts

Blended analyses for performance understanding and security enhancement will benefit Web application developers and the Open Source community through the prototypes built. Making the research infrastructure available to others will lower the barriers to further investigations into framework-intensive applications. The PI co-ordinates an experimental pedagogy program, RESCS (http://rescs.rutgers.edu), aimed at students entering CS from underrepresented groups. Research opportunities will be made available to the best RESCS students.

IBM Open Collaboration Award: Software Quality

Currently two research projects are funded under the IBM OCR project: Blended Program Analysis and Change Impact Analysis. For access to the cited publications, visit the PROLANGS publications page.

Blended Program Analysis (in collaboration with Gary Sevitsky)

A new analysis paradigm, blended program analysis, enables practical, effective analysis of large framework-intensive Java applications for performance diagnosis. Blended analysis combines a dynamic representation of program calling structure with a static analysis applied to a region of that calling structure with observed performance problems.

The initial instantiation of the paradigm addresses the issue of performance bottlenecks stemming from overuse of temporary objects, common in these applications. A blended escape analysis, which approximates object effective lifetimes, has been designed and implemented. Experiments demonstrating its utility in explaining the usage of newly created objects in a program region have yielded promising results (B. Dufour, B.G. Ryder, and G. Sevitsky, "Blended Analysis for Performance Understanding of Framework-based Applications", in the Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2007). A case study on the Trade benchmark shows how blended escape analysis helped to locate the single call path responsible for a performance problem involving objects created at 9 distinct sites and as far away as 6 levels of call, in a region which calls 223 distinct methods with a maximum call depth of 20.
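The region statistics quoted in the case study (distinct methods called, maximum call depth) can be read off a dynamic calling structure. The following is a minimal sketch, not the published implementation: the `CallNode` tree, the function names, and the method names in the example are all hypothetical, and the static analysis itself is omitted. It shows only how a region of the dynamic call tree limits the set of methods a static analysis would need to examine.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CallNode:
    """One invocation in a hypothetical dynamic call tree."""
    method: str
    children: List["CallNode"] = field(default_factory=list)


def region_methods(root: CallNode) -> Set[str]:
    """Distinct methods invoked in the region rooted at `root` (root included)."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        seen.add(node.method)
        stack.extend(node.children)
    return seen


def max_call_depth(root: CallNode) -> int:
    """Longest call chain below `root`, counted in calls (a leaf has depth 0)."""
    if not root.children:
        return 0
    return 1 + max(max_call_depth(child) for child in root.children)


# Illustrative tree; the method names are made up for this example.
root = CallNode("Servlet.service", [
    CallNode("Trade.buy", [
        CallNode("Quote.lookup"),
        CallNode("Order.create", [CallNode("BigDecimal.<init>")]),
    ]),
    CallNode("Quote.lookup"),
])

# Suppose profiling flagged the Trade.buy subtree as the problematic region.
region = root.children[0]
methods_to_analyze = region_methods(region)  # static analysis would be limited to these
depth = max_call_depth(region)
```

Under these assumptions, a static analysis such as escape analysis would be run only over `methods_to_analyze` rather than over every method reachable in the application and its frameworks.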

Future work includes experiments on a wider set of benchmarks and blended analysis algorithm refinement using more dynamic information. Also to be investigated are other instantiations of the blended analysis paradigm including blended value-flow analysis, and possible applications to problems in security and debugging.

Change Impact Analysis (in collaboration with Dr. Frank Tip)

Software systems evolve over time in order to adapt to changes in the environment and to add desired functionality. Graceful software evolution requires that only expected changes in functionality occur when working code is changed; this is desirable, but difficult to achieve. Software tools are needed to automate the evolution of large, complex software systems made up of heterogeneous components, by reporting 'change impact' information to programmers. Change impact analysis allows programmers to examine the effects of edits they make to code; tool support for change impact analysis has a clear potential to boost programmer productivity and enable safe code enhancement.

An initial framework for change impact analysis in an object-oriented system (see Barbara G. Ryder and Frank Tip, "Change Impact Analysis for Object-oriented Programs", Proc. of 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, pp 46-53, June 2001) decomposes program edits into sets of interdependent, method-level changes (e.g., add an empty method, change the body of a method). These atomic changes may affect a subset of the regression tests for the system. Analysis can determine which tests are affected and which changes affect each of these tests. Since regression tests often test independent system functionalities, knowing the tests affected is tantamount to knowing which functionalities may have been altered.
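The two questions above (which tests are affected, and which changes affect a given test) can be illustrated with a deliberately simplified sketch. It assumes each test is summarized by the set of methods in its call graph and each atomic change names one method plus the changes it depends on; the data layout, function names, and example values are hypothetical. The sketch also uses a single call graph per test and ignores changes to dynamic dispatch, so it should not be read as the published analysis.

```python
def affected_tests(test_call_graphs, changes):
    """Tests whose call graph contains at least one changed method."""
    changed_methods = {c["method"] for c in changes.values()}
    return {test for test, methods in test_call_graphs.items()
            if methods & changed_methods}


def affecting_changes(test_methods, changes):
    """Changes touching a method the test reaches, plus their prerequisites."""
    stack = [cid for cid, c in changes.items() if c["method"] in test_methods]
    result = set()
    while stack:
        cid = stack.pop()
        if cid not in result:
            result.add(cid)
            stack.extend(changes[cid]["requires"])  # follow interdependences
    return result


# Hypothetical atomic changes: C2 (a body change) depends on C1 (a new method).
changes = {
    "C1": {"kind": "add empty method", "method": "Account.audit", "requires": []},
    "C2": {"kind": "change body", "method": "Account.deposit", "requires": ["C1"]},
    "C3": {"kind": "change body", "method": "Report.print", "requires": []},
}

# Hypothetical per-test call graphs, reduced to sets of method names.
test_call_graphs = {
    "testDeposit": {"Account.deposit", "Account.balance"},
    "testReport": {"Report.print"},
    "testBalance": {"Account.balance"},
}

affected = affected_tests(test_call_graphs, changes)
deposit_changes = affecting_changes(test_call_graphs["testDeposit"], changes)
```

Here `testBalance` is unaffected because none of the methods it reaches were edited, while `testDeposit` is affected by `C2` and, through the dependence, by `C1`.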

Prior to OCR support, change impact analyses had been developed for Java 1.4 programs, including an initial prototype change impact tool, Chianti, which was used to explore a year's worth of changes in the Daikon system built by Dr. Michael Ernst of MIT (X. Ren, F. Shah, F. Tip, B.G. Ryder, and O. Chesley, "Chianti: A Tool for Practical Change Impact Analysis of Java Programs", in Proceedings of the ACM SIGPLAN Conference on Object Oriented Programming, Systems and Applications (OOPSLA), pp 432-448, October 2004). Another tool, Crisp, allowed exploration of intermediate versions of an edited program that lie between the original and edited versions. A programmer can select changes affecting some test and ask Crisp to build the intermediate program version, run that test on it, and see whether its behavior is the same as on the full edited program. Systematic exploration of this kind allows a programmer to find failure-inducing changes (X. Ren, O. Chesley, and B.G. Ryder, "CRISP, A Debugging Tool for Java Programs", IEEE Transactions on Software Engineering, Volume 32, Number 9, September 2006, pp 1-16; O. Chesley, X. Ren, and B.G. Ryder, "Crisp: A Debugging Tool for Java Programs", in the Proceedings of the 21st International Conference on Software Maintenance (ICSM), Budapest, Hungary, September 2005).

With OCR support, this prior work was extended to enable Crisp to be run in 'automatic' mode to find the failure-inducing changes among the affecting changes associated with a failing test (O. Chesley, X. Ren, and B.G. Ryder, "Crisp - A Fault Localization Tool for Java Programs", a selected demo with published abstract in the Proceedings of the 29th International Conference on Software Engineering, May 2007). This automatic mode has been found to work often, except in cases where import statements may have to be changed.
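The search strategy that Crisp uses in automatic mode is not described here. As one plausible illustration only, the sketch below adds affecting changes to the original program one at a time, together with their prerequisites, and reports the step at which a previously passing test starts to fail. All names are hypothetical, and `test_passes` stands in for building an intermediate program version and running the test on it.

```python
def with_prerequisites(change, requires):
    """The change plus everything it transitively depends on."""
    result, stack = set(), [change]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(requires.get(current, []))
    return result


def first_failing_step(ordered_changes, requires, test_passes):
    """Apply changes in order; return the newly added changes that break the test.

    Returns an empty set if the test passes with every change applied.
    """
    applied = set()
    for change in ordered_changes:
        candidate = applied | with_prerequisites(change, requires)
        if test_passes(candidate):
            applied = candidate          # keep the change and continue
        else:
            return candidate - applied   # these additions flipped the outcome
    return set()


# Hypothetical scenario: C2 depends on C1, and the test fails once C2 is applied.
requires = {"C2": ["C1"]}
culprit = first_failing_step(
    ["C1", "C2", "C3"],
    requires,
    test_passes=lambda applied: "C2" not in applied,
)
```

The order of `ordered_changes` determines how many intermediate versions must be built before the culprit is found, so a good ordering shortens the search.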

In addition, a new ranking heuristic was developed to order the exploration of the affecting changes of a failing test. Promising results showed that, in 67% of the representative tests, the heuristic ranked the single failure-inducing change as #1 or #2 (X. Ren and B.G. Ryder, "Heuristic Ranking of Java Program Edits for Fault Localization", in the Proceedings of the International Symposium on Software Testing and Analysis (ISSTA), July 2007).

Chianti was released as an executable from the PROLANGS website in July 2007. This release contains both the change impact analysis and the Crisp tool, but is referred to as Chianti. In the near future, the new fault-localization heuristic will be added to the released version of Chianti.

A new research project is investigating how to use change impact analysis to facilitate early release of partial edits in a collaborative software environment. Another possible future exploration is how to use change impact information to aid in test case generation for new changes not covered by any existing tests.