The AERI Stacktraces dataset


The Automated Error Reporting Initiative (AERI) system collects information about exceptions raised in the Eclipse IDE. It is installed by default in the IDE and has helped hundreds of projects better support their users and resolve bugs.

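This dataset is a dump of all records collected over a couple of years, with useful information about the exceptions and the environment in which they occurred. It is composed of problem records and incident records, each of which embeds stacktraces; their formats are described below.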

This dataset is published under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) licence.



More information about the AERI system can be found on the Code Trails website.

Privacy concerns

The result contains no email address, user id or machine id. Rather than removing sensitive fields from the records (we could not be sure of removing everything required), we decided to copy only the relevant fields from the source file into the output.

End users have an option to keep their own class names private. We currently have no simple way to tell which stacktraces in the database extract should be kept private, so we decided to play it safe and hide every class name whose package does not start with a known prefix [1]. All private class names have been replaced with the HIDDEN keyword.

[1] "ch.qos.*", "com.cforcoding.*", "*", "com.gradleware.tooling.*", "com.mountainminds.eclemma.*", "com.naef.*", "com.sun.*", "java.*", "javafx.*", "javax.*", "org.apache.*", "org.eclipse.*", "org.fordiac.*", "org.gradle.*", "org.jacoco.*", "org.osgi.*", "org.slf4j.*", "sun.*"
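The hiding rule described above can be sketched in a few lines of Python. This is only an illustration, not the script used to build the dataset: the function names are hypothetical, only a subset of the prefixes from [1] is included, and it assumes the class name of a stacktrace line is stored in the "cN" field.

```python
# Subset of the known prefixes from footnote [1], with the trailing "*" dropped.
KNOWN_PREFIXES = (
    "ch.qos.", "com.sun.", "java.", "javafx.", "javax.",
    "org.apache.", "org.eclipse.", "org.osgi.", "org.slf4j.", "sun.",
)


def mask_class_name(class_name, prefixes=KNOWN_PREFIXES):
    """Return the class name unchanged if it starts with a known prefix, else "HIDDEN"."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return class_name if class_name.startswith(prefixes) else "HIDDEN"


def mask_frames(frames):
    """Return copies of the stacktrace lines with private class names masked.

    Assumes each line is a dict whose "cN" key holds the class name.
    """
    return [{**frame, "cN": mask_class_name(frame.get("cN", ""))} for frame in frames]
```

For instance, mask_class_name("org.eclipse.core.runtime.Status") leaves the name untouched, while a class from any other package, such as the made-up "com.example.Foo", comes back as HIDDEN.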

Format: problems

{
  "summary": "",
  "osgiArch": "",
  "osgiOs": "",
  "osgiOsVersion": "",
  "osgiWs": "",
  "eclipseBuildId": "",
  "eclipseProduct": "",
  "javaRuntimeVersion": "",
  "numberOfIncidents": 0,
  "numberOfReporters": 74,
  "stacktraces": [
    [ "stacktrace for incident" ],
    [ "stacktrace for cause" ],
    [ "stacktrace for exception" ]
  ]
}
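
As a rough illustration of how a problem record might be consumed, the Python sketch below parses one record and reports a few of its fields. The sample record is made up for the example (its values do not come from the dataset), the helper name is hypothetical, and it assumes that each entry of "stacktraces" is itself an array of stacktrace lines.

```python
import json

# Hypothetical problem record; field names follow the format above,
# all values are invented for illustration.
SAMPLE_PROBLEM = """
{
  "summary": "example summary",
  "osgiOs": "Linux",
  "numberOfIncidents": 3,
  "numberOfReporters": 2,
  "stacktraces": [
    [ {"cN": "HIDDEN", "mN": "run", "fN": "", "lN": 12} ],
    [ {"cN": "java.lang.Thread", "mN": "run", "fN": "Thread.java", "lN": 745},
      {"cN": "HIDDEN", "mN": "start", "fN": "", "lN": 8} ]
  ]
}
"""


def summarise_problem(problem):
    """Collect a few headline figures from a parsed problem record."""
    traces = problem.get("stacktraces", [])
    return {
        "summary": problem.get("summary", ""),
        "incidents": problem.get("numberOfIncidents", 0),
        "reporters": problem.get("numberOfReporters", 0),
        "stacktraces": len(traces),
        # Total number of stacktrace lines across all stacktraces.
        "frames": sum(len(trace) for trace in traces),
    }


result = summarise_problem(json.loads(SAMPLE_PROBLEM))
```

Using .get() with defaults keeps the helper tolerant of records in which a field is missing.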

Format: incidents

    [ "stacktrace" ]
  "summary": "Failed to retrieve default libraries for jre1.8.0_111"

Format: Stacktraces

The structure used in MongoDB for stacktraces has been kept as is. Each stacktrace is an array of objects, one per line of the stacktrace, and each object holds all the information relevant to that line, as shown below:

    "cN": "",
    "mN": "parseHTTPHeader",
    "fN": "",
    "lN": 786,
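
To make the structure easier to read, the Python sketch below renders such an array in the familiar Java stacktrace layout. It assumes that "cN", "mN", "fN" and "lN" stand for class name, method name, file name and line number respectively; the text above does not spell this out, so treat the mapping and the function names as assumptions.

```python
def format_frame(frame):
    """Render one stacktrace line as "at Class.method(File:line)".

    Assumes cN = class name, mN = method name, fN = file name, lN = line number.
    """
    class_name = frame.get("cN") or "HIDDEN"
    method_name = frame.get("mN") or "<unknown>"
    file_name = frame.get("fN") or ""
    line_number = frame.get("lN")

    if file_name and isinstance(line_number, int) and line_number > 0:
        location = f"{file_name}:{line_number}"
    elif file_name:
        location = file_name
    else:
        # No file name recorded, so the line number alone is not shown.
        location = "Unknown Source"
    return f"at {class_name}.{method_name}({location})"


def format_stacktrace(frames):
    """Join all lines of one stacktrace, one line of text per frame."""
    return "\n".join(format_frame(frame) for frame in frames)
```

A frame with an empty class name is printed as HIDDEN here purely for readability; in the dataset itself hidden class names already carry that keyword.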