Repeatable and Reproducible Evaluation
Fraida Fund
NYU Polytechnic School of Engineering
“In industry, we ignore the evaluation in academic papers. It is often wrong and always
irrelevant.”
- Head of a major industrial lab, 2011
Source of quote: Vitek, Jan, and Tomas Kalibera. "R3: Repeatability, reproducibility and rigor." ACM SIGPLAN Notices 47, no. 4a (2012): 30-36. http://janvitek.github.io/pubs/r3.pdf
Common problems in evaluation
● Unclear goals
● Meaningless measurements
● No baseline (or wrong baseline)
● Not representative
● Implicit assumptions
● Weak statistics
● Ineffective or misleading graphics
● Proprietary code and data
● Results are not reproducible
Repetition
The ability to re-run the exact same experiment with the same method on the same or similar system and obtain the same or very similar result.
Reproducibility
Independent confirmation of qualitative results by a third party, using the description of experiment design in the report/paper.
Six degrees of reproducibility
5: The results can be easily reproduced by an independent researcher with at most 15 minutes of user effort, requiring only standard, freely available tools (C compiler, etc.).
4: The results can be easily reproduced by an independent researcher with at most 15 minutes of user effort, requiring some proprietary source packages (MATLAB, etc.).
3: The results can be reproduced by an independent researcher, requiring considerable effort.
2: The results could be reproduced by an independent researcher, requiring extreme effort.
1: The results cannot seem to be reproduced by an independent researcher.
0: The results cannot be reproduced by an independent researcher.
Source: P. Vandewalle, J. Kovacevic, and M. Vetterli. "Reproducible research in signal processing - what, why, and how." IEEE Signal Processing Magazine, 26(3):37–47, May 2009. http://infoscience.epfl.ch/record/136640/files/VandewalleKV09.pdf
Experiment design
❏ Is there a clear mapping between your experiment goal and experiment design?
❏ Does your experiment achieve your goal with the minimum amount of work possible?
❏ Is it clear what the “result” of your evaluation is?
❏ Are there as few manual steps in your experiment as possible? (See the sketch after this checklist.)
❏ Are the tools used in your experiment open and widely available?
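One way to keep manual steps out of an experiment is to drive every trial from a single script that saves raw output to disk. The sketch below is a hypothetical example along those lines, not part of any lab: it assumes an iperf server is already running on a reserved testbed node, and the server address, trial count, and output directory are all placeholders.

```python
#!/usr/bin/env python3
"""Hypothetical end-to-end experiment runner: no manual steps.

Assumes iperf is installed and a server is already listening on SERVER.
SERVER, TRIALS, and RAW_DIR are placeholders for illustration.
"""
import datetime
import pathlib
import subprocess

SERVER = "192.0.2.10"   # placeholder testbed node address
TRIALS = 10             # repeat trials to support statistics, not a single run
RAW_DIR = pathlib.Path("data/raw")

RAW_DIR.mkdir(parents=True, exist_ok=True)

for trial in range(TRIALS):
    stamp = datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
    out = RAW_DIR / f"iperf_trial{trial}_{stamp}.txt"
    # One trial = one command; raw output is saved verbatim,
    # never hand-copied from a terminal.
    result = subprocess.run(
        ["iperf", "-c", SERVER, "-t", "30"],
        capture_output=True, text=True, check=True,
    )
    out.write_text(result.stdout)
    print(f"trial {trial} -> {out}")
```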
Data analysis and visualization
❏ Did you separate raw and processed data?
❏ Do you have a data analysis and visualization script? (No manual calculations or interactive image generation! See the sketch after this checklist.)
❏ Did you share the raw and processed data and script used to generate any images in your report?
❏ Are you using version control?
❏ Do you follow good statistics and data integrity practices?
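A scripted analysis might look like the sketch below: raw measurements are read from disk, processed data is written to a separate directory, and the figure is rendered to a file rather than interactively. The CSV layout, the column name, and all file paths are assumptions for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical analysis script: raw data in, processed data and figure out.

Assumes a raw CSV with a 'throughput_mbps' column; all paths are placeholders.
"""
import pathlib

import matplotlib
matplotlib.use("Agg")            # render figures to files, never interactively
import matplotlib.pyplot as plt
import pandas as pd

pathlib.Path("data/processed").mkdir(parents=True, exist_ok=True)
pathlib.Path("figures").mkdir(parents=True, exist_ok=True)

# Raw data is read-only input; processed data goes to its own directory.
raw = pd.read_csv("data/raw/throughput.csv")
summary = raw["throughput_mbps"].describe()
summary.to_csv("data/processed/throughput_summary.csv")

# Re-running this script regenerates the exact same figure every time.
fig, ax = plt.subplots()
ax.boxplot(raw["throughput_mbps"])
ax.set_ylabel("Throughput (Mbps)")
fig.savefig("figures/throughput.pdf")
```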
Documentation
❏ Is it clear where to begin? (e.g., can someone picking up the project see where to start running it?)
❏ Are there instructions for setting up the experiment and executing it?
❏ Do you explain non-obvious steps in the instructions?
❏ Have you noted the exact version of every external application used in the process? (See the sketch after this checklist.)
❏ Are you using version control?
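Tool versions can be recorded mechanically at run time instead of from memory. The sketch below shows one possible approach; the TOOLS list is a placeholder and should name whatever commands your experiment actually invokes.

```python
#!/usr/bin/env python3
"""Hypothetical version logger for an experiment's external tools.

The TOOLS list is a placeholder; replace it with the commands your
experiment actually uses.
"""
import subprocess

TOOLS = [
    ["git", "--version"],
    ["python3", "--version"],
    ["iperf", "-v"],     # iperf prints its version to stderr
]

with open("versions.txt", "w") as f:
    for cmd in TOOLS:
        # Capture both streams: some tools report their version on stderr.
        proc = subprocess.run(cmd, capture_output=True, text=True)
        f.write(f"$ {' '.join(cmd)}\n{proc.stdout}{proc.stderr}\n")
```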
Final lab exercises
Routing (repeatable and reproducible):
● Dijkstra’s algorithm (see the sketch after this list)
● OSPF
Software defined networks:
● Just to give you another tool to use in potential projects
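For reference, here is a minimal sketch of Dijkstra's algorithm; the adjacency-dict graph representation and the example topology are assumptions of this sketch, not a required format for the lab.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a graph given as
    {node: {neighbor: cost, ...}, ...} with non-negative link costs."""
    dist = {source: 0}
    pq = [(0, source)]                       # min-heap of (distance, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                         # stale queue entry, skip it
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd                 # found a shorter path to v
                heapq.heappush(pq, (nd, v))
    return dist

# Example: OSPF-style link costs on a small three-node topology
g = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2}, "C": {"A": 4, "B": 2}}
print(dijkstra(g, "A"))   # {'A': 0, 'B': 1, 'C': 3}
```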
Projects
● Form groups of 3 or 4
● Project will run on GENI
○ Lab exercises give you some software tools to use: iperf, netem, tinyhttpd, OSPF setup, SDN, others
○ May use these or other software
● Must use good experiment design practices
● Must use good practices for communicating quantitative results
● Must use good practices for creating reproducible experiments
Projects
The labs are meant to help you, so you can use them as a jumping-off point for projects
Topics can include:
● Data center networks
● Congestion and flow control
● Routing and resiliency
● SDN
● Other topics related to HSN
Projects
Start thinking about your project
● Work in groups of 3-4
● Must have reasonable division of labor (every student takes responsibility for a part of the project)
● Must apply lessons from the lab lectures
● Will give you specific instructions for proposal before spring break.
● Project proposals due @ midterm.
Lab coverage on midterm
Lab topics are included on the midterm:
● Using networking testbeds
● Experiment design
● Communicating results
● Reproducible experiments
Will give some example problems for you to work on.
Getting help
● Office hours on lab website
● Asking for help on the Internet
○ For things like Git Bash or R usage, there’s lots of information online
○ GENI Users Group: https://groups.google.com/forum/#!forum/geni-users
○ If you ask a question online, cite it in your report
References
1. Raj Jain. "The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling." Wiley-Interscience, New York, NY, April 1991. ISBN: 0471503361.
2. Moraila, G., Shankaran, A., Shi, Z., & Warren, A. M. “Measuring Reproducibility in Computer Systems Research.” Tech Report (2014). http://reproducibility.cs.arizona.edu/tr.pdf
3. Vitek, Jan, and Tomas Kalibera. "R3: Repeatability, reproducibility and rigor." ACM SIGPLAN Notices 47, no. 4a (2012): 30-36. http://janvitek.github.io/pubs/r3.pdf
4. P. Vandewalle, J. Kovacevic, and M. Vetterli. "Reproducible research in signal processing - what, why, and how." IEEE Signal Processing Magazine, 26(3):37–47, May 2009. http://infoscience.epfl.ch/record/136640/files/VandewalleKV09.pdf
5. Edwards, Sarah, Xuan Liu, and Niky Riga. "Creating Repeatable Computer Science and Networking Experiments on Shared, Public Testbeds." ACM SIGOPS Operating Systems Review 49, no. 1 (2015): 90-99. http://mescal.imag.fr/membres/arnaud.legrand/research/readings/acm_sigops_si_rsea/p90-edwards.pdf and http://groups.geni.net/geni/wiki/PaperOSRMethodology
6. Leek, Jeff. "The Elements of Data Analytic Style." 2015.
7. Handigol, Nikhil, Brandon Heller, Vimalkumar Jeyakumar, Bob Lantz, and Nick McKeown. "Reproducible network experiments using container-based emulation." In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, pp. 253-264. ACM, 2012. http://tiny-tera.stanford.edu/~nickm/papers/p253.pdf and https://reproducingnetworkresearch.wordpress.com/