Software Artifact & Technical Guides
- ACM Artifact Review and Badging - The current badging guidelines for judging scientific software artifacts, by the Association for Computing Machinery (ACM).
- Artifact Evaluation: Tips for Authors - Ten experience-based tips, with justification and examples, for creating software artifacts; by Rohan Padhye.
- BenchExec - A framework for reliable benchmarking of non-interactive tools, with built-in resource control and a table generator for visualizing results.
- Benchmarking Crimes - A synopsis of the many ways an experimental design or its analysis can go wrong, by Gernot Heiser.
- Can you trust your experimental results? - A general framework for validating experimental designs; a technical report that grew out of the Evaluate 2011 workshop.
- EAPLS Artifact Badges - The artifact badging scheme of the European Association for Programming Languages and Systems (EAPLS).
- Empirical Evaluation Guidelines - A checklist for evaluating the soundness of an experimental setup, developed by the ACM Special Interest Group on Programming Languages (SIGPLAN).
- Empirical Standards for Software Engineering Research - The official evidence standards for conducting and reporting studies in software engineering; developed by the ACM Special Interest Group on Software Engineering (SIGSOFT).
- Guide for Accelerating Computational Reproducibility in the Social Sciences - A structured guidebook for assessing and improving computational reproducibility.
- Guidelines for Proof Artifacts - Proof artifacts are a special category of scientific software and thus have their own presentation standards; the guidelines are maintained by Marianna Rapoport.
- Handbook for Reproduction and Replication Studies - A practical guide to carrying out a reproduction or replication study.
- Reliable benchmarking: requirements and solutions - The paper that motivates the requirements for reliable benchmarking and presents BenchExec as a solution.
- STABILIZER: Statistically Sound Performance Evaluation - A tool that repeatedly randomizes memory layout at runtime, eliminating layout bias so that measured differences reflect true effect size, i.e., the magnitude of a performance change; by Charlie Curtsinger and Emery Berger.
- Scientific Benchmarking of Parallel Computing Systems - Twelve best-practice rules for reporting empirical performance results, by Torsten Hoefler and Roberto Belli; a minimal sketch of one such practice, interval-based reporting, follows this list.
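Several of the resources above (Benchmarking Crimes, STABILIZER, and the Hoefler-Belli rules) converge on the same advice: report distributions and confidence intervals rather than single numbers. As a minimal, stdlib-only Python sketch of that practice, the snippet below compares two sets of runtimes with a percentile bootstrap; the sample data, the choice of the median, and the 95% level are illustrative assumptions, not prescriptions taken from any of the listed works.

```python
import random
import statistics

def bootstrap_ci(samples, stat=statistics.median, n_resamples=10_000, level=0.95):
    """Percentile-bootstrap confidence interval for a summary statistic."""
    estimates = sorted(
        stat(random.choices(samples, k=len(samples)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo = estimates[int((1 - level) / 2 * n_resamples)]
    hi = estimates[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi

# Hypothetical runtimes (seconds) for a baseline and a modified configuration.
baseline = [1.02, 0.98, 1.05, 1.01, 0.99, 1.03, 1.00, 1.04]
modified = [0.93, 0.97, 0.95, 0.99, 0.94, 0.96, 0.98, 0.95]

# Report each group's median together with its interval, not a bare mean.
for name, runs in (("baseline", baseline), ("modified", modified)):
    lo, hi = bootstrap_ci(runs)
    print(f"{name}: median {statistics.median(runs):.3f}s, 95% CI [{lo:.3f}, {hi:.3f}]")

# Bootstrap the speedup (ratio of medians) as well, so the comparison itself
# carries an uncertainty estimate instead of a single point value.
ratios = sorted(
    statistics.median(random.choices(baseline, k=len(baseline)))
    / statistics.median(random.choices(modified, k=len(modified)))
    for _ in range(10_000)
)
point = statistics.median(baseline) / statistics.median(modified)
print(f"speedup: {point:.3f}x, 95% CI [{ratios[250]:.3f}, {ratios[9749]:.3f}]")
```

The bootstrap is used here only because it requires no distributional assumptions; the works listed above discuss when parametric alternatives are appropriate and, more importantly, how to design the experiment so the numbers are trustworthy in the first place.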