TY - JOUR
T1 - Crowdsourcing hypothesis tests: Making transparent how design choices shape research results
AU - Pozzi, Maura
PY - 2020
Y1 - 2020
N2 - To what extent are the results of research investigations influenced by subjective decisions that
scientists make as they design studies? Fifteen research teams independently designed studies to
answer five original research questions related to moral judgments, negotiations, and implicit
cognition. Participants from two separate, large samples (total N > 15,000) were then randomly
assigned to complete one version of each study. Effect sizes varied dramatically across different
sets of materials designed to test the same hypothesis: materials from different teams rendered
significant effects in opposite directions for four out of five hypotheses, with the narrowest range
in estimates being d = -0.37 to 0.26. Meta-analysis indicated a lack of overall support for two
original hypotheses, mixed support for one hypothesis, and significant support for two
hypotheses. Overall, none of the variability in effect sizes was attributable to the skill of the
research team in designing materials, while some variability was attributable to the hypothesis
being tested. In a forecasting survey, predictions of other scientists were strongly correlated with
study results, and average predictions were similar to observed outcomes. Crowdsourced testing
of research hypotheses helps reveal the true consistency of empirical support for a scientific
claim.
KW - research robustness
KW - scientific transparency
UR - http://hdl.handle.net/10807/146268
DO - 10.1037/bul0000220
M3 - Article
SN - 0033-2909
VL - 146
SP - 451
EP - 479
JO - Psychological Bulletin
JF - Psychological Bulletin
ER -