The UCI Department of Informatics welcomed Darko Marinov, a professor of computer science at the University of Illinois at Urbana-Champaign, for a presentation on flaky tests in software development on April 15. During the presentation, Marinov discussed the delays software developers face when flaky tests interrupt the process of testing code for bugs, and how his research attempts to resolve them.
UCI’s Department of Informatics currently hosts an Informatics (INF) Seminar Series that invites guest speakers to share their research pertaining to information and computer science.
According to Marinov, flaky tests are tests that “give false alarms when detecting software bugs.” Bugs are errors that can arise at various steps of software development. Briefly, this process includes writing code, fetching changes to the code, building and testing to ensure nothing “breaks.” This last step of checking for potential errors, bugs and breaks is done through regression testing: running functional and non-functional tests to ensure that software still performs correctly after a change. These tests should produce a deterministic pass or fail. If a single test fails, it is the job of the software engineer to “debug” the code.
Marinov’s presentation demonstrated an example of a failed test (red box).
Large companies like Google or Microsoft use various tests as a precautionary measure against software bugs. According to Marinov’s presentation, more than “75 million tests are run at Google every single day.” Flaky tests cause major issues for companies because they non-deterministically pass or fail on the same code — the same unchanged code can pass on one run and fail on the next, making a failure look like a real bug when it is not, and vice versa.
Among the complications that occur as a result of flaky tests, Marinov noted three key issues: “misleading developers about changes, wasting developers’ time and reducing the credibility of tests.”
Flaky tests have caused recurring problems for software at all levels. “Big companies such as Apple, Huawei, Twitter and Mozilla are investing in research,” Marinov said, to combat flaky tests and their false alarms.
To demonstrate the regularity with which flaky tests occur, Marinov referenced a statistic from Google, stating that “[We have] around 4.2 million tests that run on our continuous integration system. Of these, around 63,000 have a flaky run over the course of a week. While this represents less than 2% of our tests, it still causes significant drag on our engineers.”
To tackle these issues, Marinov explained how his research involves fixing order-dependent (OD) tests. Simply put, an order-dependent test is a flaky test whose outcome depends on the order in which the tests in a suite are run — it passes or fails depending on which tests ran before it.
“First, we must determine what kind of OD test will be used. Then, we need to find the polluter. Finally, once we identify the issue, we need to find the cleaner,” Marinov said.
Flaky tests are a key challenge in regression testing. iFixFlakies, a framework developed by a group of researchers including Marinov, attempts to address this challenge by automatically fixing flaky tests and proactively detecting polluters before they lead to test failures. Creating more reliable tests does not eliminate the possibility of software bugs, but it lessens the frequency with which flaky failures occur.
Marinov’s research group is currently researching how to detect flaky tests faster, perform deeper analysis of OD tests and more.
Natalie Ringdahl is a STEM Intern for the spring 2022 quarter. She can be reached at email@example.com.