How to use the ESEC/FSE 2017 artifact

The DBGBENCH artifact provides materials drawn from a large-scale debugging study with human participants. A comprehensive description of the shared dataset and infrastructure is provided on the project website. The artifact can be used for several purposes that may well go beyond the scope of our ESEC/FSE 2017 paper. In the following, we outline two usage scenarios to help convey the scope of our work. Scroll down for step-by-step instructions. Find out how to generate results and figures in our paper: here.

Extending the DBGBENCH dataset

To carry out a similar study and extend the DBGBENCH dataset, we provide all questionnaires and the necessary computational infrastructure. Specifically, the following steps should be carried out to replicate the study environment:

Install the Docker virtual infrastructure. This infrastructure includes all the necessary source code as well as tests that manifest the investigated bugs. It can be shared with the study participants.
Download tutorial materials that include slides and videos to provide details about the study to participants.
Download the example questionnaire for the study participants. For each bug under investigation, this questionnaire needs to be filled out by a participant after she finishes debugging. The questionnaire can be set up as an online form (e.g., Google Forms) to make it easily accessible to the participants.

Using the DBGBENCH dataset

The DBGBENCH dataset contains data collected from 12 professional developers debugging 27 real-world bugs. The dataset is provided in a comprehensive format on the project website.

One of the primary uses of the DBGBENCH dataset is to evaluate and compare the effectiveness of automated debugging tools, in particular automated fault localization and automated repair tools. In the following, we outline a few examples:

We provide scripts to check the plausibility of patches provided by participants. Please follow these instructions to apply participant-provided patches and check their plausibility.
We include fault locations as provided by the participants. These fault locations were manually cross-checked to validate their correctness. See the plaintext version of such a fault location for the bug find.24e2271e. These fault locations can be used to validate an automated fault localization tool. For example, for statistical fault localization tools, participant-provided fault locations can be compared with the most suspicious statements reported by the tool.
We provide participant-provided patches for each bug. For example, see all the patches and their categorizations (plausible and/or correct) provided for the bug find.24e2271e. These patches can be used to compare the readability, structure and correctness of patches generated by automated repair tools.
We include bug diagnoses (in natural language) provided by the study participants. See the plaintext version of such a diagnosis for the bug find.24e2271e. These diagnoses can be leveraged to generate natural-language explanations of common bug types. Such explanations can in turn be used to design and evaluate sophisticated debugging tools that highlight suspicious locations along with a possible explanation of the bug.
We provide data on the average time and the average number of correct fixes for each bug on the project website. This data can be used to evaluate automated debugging tools in the future. In particular, we hypothesize that a necessary (but not sufficient) condition to validate automated fault localization and repair tools is to outperform the study participants.
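
The comparison between participant-provided fault locations and a tool's most suspicious statements can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the file names ranked.txt and truth.txt and the one-"file:line"-per-line format are hypothetical and are not the format in which DBGBENCH ships its fault locations, so the data would need to be converted first.

```shell
# Sketch only. Assumed (hypothetical) inputs:
#   $1: tool output, one "file:line" per line, most suspicious first
#   $2: participant-provided fault locations, one "file:line" per line

# Print the 1-based rank of the first tool-reported statement that is also
# a participant-provided fault location. Prints nothing if there is none.
first_hit_rank() {
  # -F: fixed strings, -x: whole-line match, -f: read patterns from file,
  # -n: prefix each match with its line number in the ranked list.
  grep -n -x -F -f "$2" "$1" | head -n 1 | cut -d: -f1
}

# Succeed if a participant-provided location appears among the top $3
# statements reported by the tool.
hit_in_top_k() {
  rank=$(first_hit_rank "$1" "$2")
  [ -n "$rank" ] && [ "$rank" -le "$3" ]
}

# Example (hypothetical file names):
#   first_hit_rank ranked.txt truth.txt
#   hit_in_top_k ranked.txt truth.txt 5 && echo "hit within top 5"
```

A lower rank means the tool points to a location that participants also identified earlier in its report. Other metrics could be substituted without changing the overall shape of the comparison.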

Step-by-Step instructions

If time is of the essence, we suggest exploring this artifact by following the error in our motivating example (find.66c536bb, Fig. 1).

Benchmark Summary
Collected Data for find.66c536bb
- Error-introducing commit
- Simplified Bug Report to quickly reproduce the error and understand its symptoms
- Original Bug Report as reported by a user, incl. discussion with the developer.
- Bug Diagnosis as participant description of how the error comes about.
- Fault Locations as pointed out by about 50% of participants or more.
- Original Patch as provided by the developer. Notice that this patch occurred about a month after the error was introduced.
- Patches as provided by our participants, including
  - fix strategies,
  - our classification as correct or incorrect, and
  - our rationale for our classification;
  - For many incorrect patches, we produced tests that fail on these patches, e.g., for find.091557f6
Install and Run Docker Virtual Infrastructure
- Use infrastructure to check patch plausibility
- Apply the patch of participant MzBiYjQ4ZG for find.66c536bb and execute the test case. Once you have installed and started the infrastructure, you can copy and paste the following into your terminal.

# Checkout patches
cd ~/Desktop
git config --global http.sslVerify false
git clone https://github.com/dbgbench/dbgbench.github.io.git dbgbench
  
# Order of errors is scrambled for each participant.
# Identify version containing find.66c536bb
ls find*/find.66c536bb

# Suppose find1 contains find.66c536bb
cd ~/Desktop/find1/find

# Apply patch from participant MzBiYjQ4ZG
patch -l -p1 -f < ~/Desktop/dbgbench/patches/find.66c536bb/MzBiYjQ4ZG.patch

# Build fixed version
make

# Execute test case (should *not* print FAIL)
../test/test.sh "$PWD" || echo FAIL

# Revert patch, build, and execute test (should print FAIL)
patch -R -l -p1 -f < ~/Desktop/dbgbench/patches/find.66c536bb/MzBiYjQ4ZG.patch
make
../test/test.sh "$PWD" || echo FAIL
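
The single-patch walkthrough above could be generalized to every participant patch for one bug. The following is a minimal sketch, not part of the artifact's own scripts: the function name check_patches and the BUILD_CMD override are hypothetical, and it assumes that all patches for a bug sit in one directory as *.patch files and that the build and test commands are the ones used above.

```shell
# Sketch only: apply each participant patch, build, run the test, revert.
#   $1: source directory, absolute path (e.g. ~/Desktop/find1/find)
#   $2: patch directory, absolute path
#       (e.g. ~/Desktop/dbgbench/patches/find.66c536bb)
#   $3: test script, absolute or relative to $1 (e.g. ../test/test.sh)
# BUILD_CMD is a hypothetical override for the build step (default: make).
check_patches() {
  src_dir=$1
  patch_dir=$2
  test_cmd=$3
  build_cmd=${BUILD_CMD:-make}
  for p in "$patch_dir"/*.patch; do
    [ -e "$p" ] || continue          # no patches found
    id=$(basename "$p" .patch)
    verdict=$(
      cd "$src_dir" || exit 1
      # A patch counts as plausible here if it applies, builds,
      # and the test script exits successfully.
      if patch -l -p1 -f < "$p" >/dev/null 2>&1 \
          && $build_cmd >/dev/null 2>&1 \
          && "$test_cmd" "$PWD" >/dev/null 2>&1; then
        echo plausible
      else
        echo implausible
      fi
      # Revert so that the next patch starts from the buggy version.
      patch -R -l -p1 -f < "$p" >/dev/null 2>&1
    )
    echo "$id $verdict"
  done
}

# Example:
#   check_patches ~/Desktop/find1/find \
#     ~/Desktop/dbgbench/patches/find.66c536bb ../test/test.sh
```

The sketch prints one line per patch, consisting of the participant identifier and a verdict. After it finishes, the sources are reverted but the binaries still come from the last patched build, so run make once more before further experiments. A patch that applies only partially may also leave reject files behind, which is worth checking for.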

Final note

The entire raw data of DBGBENCH can be downloaded as a self-contained CSV file.
The entire cleaned data of DBGBENCH is available as a ZIP file and as a GitHub repository. Pull requests welcome!
A summary of the data can be downloaded as PDF file.
You can re-generate all figures in the paper here.
Apart from the aforementioned use cases, researchers and professionals may take advantage of this dataset in ways that best match their research interests.
Find out how DBGBench can be used for the qualitative evaluation of automated repair techniques.