Ph.D. Defense: Pavel Senin, “Software Trajectory Analysis: An empirically based method for automated software process discovery”

Software Trajectory Analysis: An empirically based method for automated software process discovery
Pavel Senin
Friday, January 16, 2014, 12:00pm
POST 302

Abstract: Recurrent behaviors are considered to be the basic building blocks of any human-driven, goal-oriented process, reflecting the development of efficient ways of dealing with common tasks based on past performance. Thus, the ability to discover recurrent behaviors is essential for a bottom-up, systematic study, modeling, and improvement of human-driven processes. In the context of software development, whose ultimate goal is the delivery of software, the ability to recognize recurrent behaviors enables the understanding, formal description, and effective guidance of evolving software processes. While a number of approaches to recurrent behavior discovery and to software process modeling and improvement have been proposed previously, they are typically built upon on-line, intrusive techniques such as observation and interviewing, and are therefore expensive, prone to bias, and unwelcome to software developers.

In this exploratory study, I have developed and tested the idea of software process discovery via off-line analysis of software process artifacts. To this end, I have prototyped and evaluated the Software Trajectory Analysis framework, which is built upon two components: the definition of the “software trajectory” data type, that is, a temporally ordered sequence of software artifact measurements, and a novel technique for temporal data classification that enables the discovery and ranking of characteristic patterns in software trajectories. By analogy with a trajectory in physics, which describes the path of a projectile in metric space, a software trajectory describes the progression of the software process and product in a space of chosen software metrics, and its recurrent structural patterns are related to recurrent behaviors.
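As an illustration only, the "temporally ordered sequence of software artifact measurements" could be sketched in Python as follows. This is not the dissertation's implementation: the names `Measurement`, `build_trajectory`, and `metric_values`, and the sample metric (lines of code changed per commit), are hypothetical choices made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class Measurement:
    """One observation of a chosen software metric (hypothetical type)."""
    timestamp: datetime
    value: float


def build_trajectory(measurements: Iterable[Measurement]) -> List[Measurement]:
    """Return the measurements ordered by time, oldest first."""
    return sorted(measurements, key=lambda m: m.timestamp)


def metric_values(trajectory: List[Measurement]) -> List[float]:
    """Project a trajectory onto its metric values, i.e. a plain time series."""
    return [m.value for m in trajectory]


# Assumed sample data: lines changed in three commits, recorded out of order.
raw = [
    Measurement(datetime(2013, 5, 3), 120.0),
    Measurement(datetime(2013, 5, 1), 40.0),
    Measurement(datetime(2013, 5, 2), 75.0),
]
trajectory = build_trajectory(raw)
series = metric_values(trajectory)
```

Under these assumptions, `series` is an ordinary time series, which is the kind of input a temporal data classification technique would then consume.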

The claim of this dissertation is that (1) it is possible to discover recurrent behaviors off-line via systematic study of software artifacts, and (2) the Software Trajectory Analysis framework provides an effective off-line approach to the discovery of recurrent behaviors characteristic of a software process. In addition to an extensive experimental evaluation of the proposed algorithm for time series characteristic pattern discovery, three empirical case studies were carried out to evaluate the claim: two using software artifacts from public software repositories and one using a public dump of a Q&A web site. The results suggest that Software Trajectory Analysis is capable of discovering recurrent behaviors characteristic of a software process off-line, though their sensible interpretation is sometimes difficult.

Bio: Pavel Senin is a PhD candidate at the ICS department of UH Manoa, working at the Collaborative Software Development Laboratory (CSDL) under the supervision of Prof. Philip M. Johnson. He is originally from Krasnyi Luch, a city in Eastern Ukraine. Before studying at UH, he received an MS in Applied Mathematics from SFedU, Russia. During his time at UH Manoa he worked at ASGPB, where he assembled the Transgenic Papaya Genome and annotated a representative of the Verrucomicrobia phylum; he also received practical training from JGI at LANL, where he participated in the pioneering single-cell genome sequencing research project. At CSDL, Pavel was involved in the Hackystat project, where he worked on the problem of software process discovery. He developed a novel technique for time series classification and proposed a framework for the discovery and ranking of recurrent behaviors characteristic of a software process. When not mining software repositories, Pavel tinkers with Arduino sensors, a project which led to the development of a novel approach for the discovery of spatio-temporal anomalies.