
Nyyti Saarimäki: New research method reveals cause-effect relationships in empirical software engineering

Tampere University
Location: Korkeakoulunkatu 3, Tampere
Hervanta campus, Sähkötalo, auditorium S3 and remote connection (link to be added)
Date: 14.11.2023 11.00–15.00
Language: English
Entrance fee: Free of charge
In her doctoral dissertation, MSc (Tech) Nyyti Saarimäki adapts the cohort study methodology used in epidemiology to empirical software engineering research. The method makes it possible to obtain a higher level of evidence from studies that use retrospective observational data. For example, it can be used to find out how the use of a particular tool affects the quality of the software.

The level of evidence provided by empirical research depends on many factors, and one of the most significant is the research method used, which can be thought of as the data collection approach. Controlled experiments are the most valued way to get information about cause-and-effect relationships, because in them researchers can expose some of the subjects of the study and compare them with unexposed subjects. However, controlled experiments cannot be used if the investigated exposure is harmful or the data are retrospective.

Empirical software engineering research, and especially Mining Software Repositories (MSR) research, often uses data naturally accumulated during software development, such as the source code in the version control system and data stored in various tools. However, such data are not suitable for conducting controlled experiments, which is a problem especially in MSR studies. Moreover, the field lacks a research method that could investigate cause-and-effect relationships from such data.

“MSR research often uses a large amount of retrospective automatically collected observational data. The data are reliable and comprehensive, and for example the status of the software's source code is known at every moment of time since its creation. However, with current research methods, the full potential of the data remains unutilized,” Nyyti Saarimäki says.

Guidelines for implementing cohort studies in MSR research

Saarimäki’s research identified a method that suits the special features of the data used in the MSR field and is able to obtain high-level evidence. In her dissertation, she proposes as a solution the cohort study methodology commonly used in epidemiology. In cohort studies, researchers observe over time groups of subjects that are either naturally exposed or non-exposed and compare the frequency of the outcome between the groups.
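As a rough illustration of the comparison at the heart of a cohort study, the sketch below computes the risk ratio between an exposed and a non-exposed group. It is not taken from the dissertation: the `Cohort` class, the `risk_ratio` function and all counts are hypothetical, and a real study would also need to account for confounding factors and statistical uncertainty.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """Counts for one group of subjects, e.g. software projects (hypothetical)."""
    subjects: int      # number of subjects followed in the group
    with_outcome: int  # subjects in which the outcome was observed

    def risk(self) -> float:
        """Share of the group's subjects in which the outcome occurred."""
        if self.subjects <= 0:
            raise ValueError("a cohort needs at least one subject")
        return self.with_outcome / self.subjects


def risk_ratio(exposed: Cohort, unexposed: Cohort) -> float:
    """Risk in the exposed group divided by risk in the non-exposed group."""
    baseline = unexposed.risk()
    if baseline == 0:
        raise ValueError("risk ratio is undefined when the non-exposed risk is zero")
    return exposed.risk() / baseline


# Made-up numbers: projects that adopted a tool (exposed) versus projects
# that did not (non-exposed); the outcome could be a quality problem.
exposed = Cohort(subjects=200, with_outcome=30)
unexposed = Cohort(subjects=200, with_outcome=60)

rr = risk_ratio(exposed, unexposed)
print(f"risk ratio = {rr:.2f}")
```

A ratio below 1 would suggest that the outcome is less frequent in the exposed group, a ratio above 1 that it is more frequent. On its own the ratio does not establish a cause-and-effect relationship.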

“In epidemiology, cohort studies have been used to study the connection between tobacco and lung cancer, as well as the health effects of the corona virus. In empirical software engineering research, they could be used, for example, to find out whether adopting a tool affects the quality of the software,” says Saarimäki.

In her dissertation, Saarimäki presents a preliminary version of guidelines for implementing cohort studies in MSR research. The guidelines are relevant to the research community in the field, as researchers can apply them in their own work. The study is also useful for MSR research in general, as it deals extensively with the use of observational data in research, regardless of the research method itself.

Public defence on Tuesday 14 November

The doctoral dissertation of Master of Science (Technology) Nyyti Saarimäki in the field of software engineering, titled Applying Retrospective Cohort Study Methodology in Mining Software Repositories Studies, will be publicly examined at the Faculty of Information Technology and Communication Sciences at Tampere University at 13.00 on Tuesday 14.11.2023 at Hervanta campus, in auditorium S3 of the Sähkötalo building (Korkeakoulunkatu 3, Tampere). The Opponent will be Professor Marcela Genero from the University of Castilla-La Mancha, Spain. The Custos will be Associate Professor Davide Taibi from the Faculty of Information Technology and Communication Sciences.

The doctoral dissertation is available online.

The public defence can be followed via remote connection.

Photo: Ansse Saarimäki