DBGBench

From Practitioners for Researchers

Automated Repair Techniques

Qualitative evaluation without user study

Using DBGBench, we can evaluate the correctness of auto-generated patches. We provide the patches generated by the participants, elicit the general fix strategies, and classify the patches as correct or incorrect. For each incorrect patch, we provide a rationale explaining why we classify it as incorrect. For some incorrect patches, we even provide test cases that demonstrate their incorrectness.
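
To illustrate, here is a minimal sketch of how the provided test cases could be used to check a tool-generated patch automatically. The file layout, the `patch`/`make` build flow, and the test script interface are assumptions for illustration, not part of DBGBench itself:

```python
# Minimal sketch (hypothetical file layout): apply a candidate patch and run
# one DBGBench-style regression test against the patched build. The paths,
# the make-based build, and the test script invocation are assumptions.
import subprocess
import sys

def apply_patch(repo_dir: str, patch_file: str) -> bool:
    """Apply a candidate patch; returns False if it does not apply cleanly."""
    result = subprocess.run(
        ["patch", "-p1", "-d", repo_dir, "-i", patch_file],
        capture_output=True,
    )
    return result.returncode == 0

def run_test(repo_dir: str, test_script: str) -> bool:
    """Build the patched program and run one regression test case."""
    build = subprocess.run(["make", "-C", repo_dir], capture_output=True)
    if build.returncode != 0:
        return False
    test = subprocess.run([test_script, repo_dir], capture_output=True)
    return test.returncode == 0

if __name__ == "__main__":
    repo, patch, test = sys.argv[1:4]
    if not apply_patch(repo, patch):
        print("patch does not apply")
    elif run_test(repo, test):
        print("patch passes the test case")
    else:
        print("patch is exposed as incorrect by the test case")
```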

Qualitative evaluation with user study

Using DBGBench, you can significantly reduce the time and effort required for user studies (e.g., the manual review of auto-generated patches). Reviewers can leverage the bug report, the bug diagnosis, the simplified and extended regression test cases, the fault locations, and the developer-provided patches to judge patch correctness.
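
As an illustration of the artifacts involved, the sketch below gathers them into a single review bundle per bug. The directory and file names are hypothetical; DBGBench ships these artifacts, but not necessarily under these paths:

```python
# Minimal sketch (hypothetical directory layout): collect the DBGBench
# artifacts a reviewer would consult for one bug. All paths below are
# assumptions made for illustration.
from pathlib import Path

ARTIFACTS = {
    "report":    "bug-report.txt",       # original bug report
    "diagnosis": "diagnosis.md",         # developer-written bug diagnosis
    "locations": "fault-locations.txt",  # fault locations named by developers
    "tests":     "regression-tests",     # simplified and extended test cases
    "patches":   "developer-patches",    # patches written by participants
}

def review_bundle(bug_dir: str) -> dict:
    """Map each artifact to its path, or None if missing for this bug."""
    base = Path(bug_dir)
    return {name: (base / rel if (base / rel).exists() else None)
            for name, rel in ARTIFACTS.items()}

if __name__ == "__main__":
    # "bugs/find.66c536bb" is a made-up bug directory name.
    for name, path in review_bundle("bugs/find.66c536bb").items():
        print(f"{name:10} -> {path}")
```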

Empirical evaluation

DBGBench is a first milestone towards a realistic, practice-grounded evaluation of software engineering tools. DBGBench can thus serve as a necessary reality check for in-depth studies. For now, we strongly suggest also using other benchmarks, such as CoREBench or Defects4J, for the empirical evaluation. Going forward, we hope that more researchers will produce similarly realistic benchmarks that take the practitioner into account. To this end, we also publish our battle-tested, formal experiment procedure, effective strategies to mitigate common pitfalls, and all of our material.