

Thank you for your interest in the NASA Earth & Space Air Prize (the “Competition”). The Competition is sponsored by NASA, with platform support provided by RAMPIT.

To ensure fairness, once a valid application has been submitted for assessment, a minimum of five reviewers will grade each submission. Those reviewers will offer both scores and comments against each of four distinct traits. Each trait will be scored on a 0-5 point scale, in increments of 0.1; examples of possible scores for a trait are 1.4, 3.7, etc. Those scores will combine to produce a total normalized score.

The most straightforward way to ensure that everyone is treated by the same set of standards would be to have the same reviewers score every application; unfortunately, due to the number of applications that we may receive, that is not possible. Since the same reviewers will not score every application, the question of fairness needs to be explained carefully. One reviewer scoring an application may take a more critical view, giving any assigned team scores only between 1.0 and 2.0, as an example; meanwhile, another reviewer may be more generous and want to score every submission between 4.0 and 5.0.

For illustrative purposes, let’s look at the scores from two hypothetical reviewers: the first is far more generous, as a scorer, than the second, who gives much lower scores. If your application was rated by the first reviewer, it would earn a much higher total score than if it was assigned to the second reviewer. We ensure that no matter which reviewers are assigned to your team, each application will be treated fairly.
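To make that gap concrete, here is a minimal sketch in Python; the per-trait scores are invented for illustration and are not actual Competition data.

```python
# Hypothetical per-trait scores (0-5 scale, 0.1 increments) that a generous
# and a strict reviewer might give the *same* application on its four traits.
generous_reviewer = [4.3, 4.7, 4.1, 4.9]
strict_reviewer = [1.2, 1.8, 1.4, 1.6]

def total(scores):
    """Raw total across the four traits."""
    return sum(scores)

def mean(scores):
    """Average per-trait score."""
    return sum(scores) / len(scores)

print(f"generous: total {total(generous_reviewer):.1f}, mean {mean(generous_reviewer):.1f}")
print(f"strict:   total {total(strict_reviewer):.1f}, mean {mean(strict_reviewer):.1f}")
# generous: total 18.0, mean 4.5
# strict:   total 6.0, mean 1.5
```

Even though the submission is identical, the raw totals are roughly twelve points apart purely because of which reviewer it drew; the normalization described next is designed to remove exactly this kind of gap.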

To do this, we utilize a mathematical technique relying on two measures of a distribution: the mean and the standard deviation. The mean takes all the scores assigned by a reviewer, adds them up, and divides the sum by the number of scores assigned, giving an average score. The standard deviation measures the “spread” of a reviewer’s scores. As an example, imagine that two reviewers both give the same mean (average) score, but one gives many zeros and fives, while the other gives more ones and fours. It wouldn’t be fair if we didn’t consider this difference. Formally, for a reviewer who has assigned the scores x1, x2, …, xn with mean μ, we denote the standard deviation like this:

σ = √( ((x1 − μ)² + (x2 − μ)² + … + (xn − μ)²) / n )

To ensure that the judging process is fair, we rescale all the scores to match the judging population. In order to do this, we measure the mean and the standard deviation of all scores across all judges. Then, we change each judge’s mean score and standard deviation to match, rescaling every score accordingly (one standard way of doing this is sketched below). Basically, we are finding the difference between the distribution of a single reviewer’s scores and the distribution of all reviewers’ scores combined, then adjusting each score so that no one is treated unfairly according to which reviewers they are assigned. If we apply this rescaling process to the same two reviewers in the example above, we can see the outcome of the final resolved and normalized scores. They appear more similar, because they are now aligned with the typical distribution across the total population of reviewers.
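One standard way to carry out the rescaling just described is a linear adjustment of each score, x′ = μ_all + (x − μ_reviewer) × (σ_all / σ_reviewer), which shifts and stretches a reviewer’s scores so that their mean and spread match those of the whole judging population. The Python sketch below illustrates that idea with made-up score sets for the two hypothetical reviewers; it is not the Competition’s exact implementation.

```python
from statistics import mean, pstdev  # pstdev = population standard deviation

# Hypothetical raw scores: each reviewer's full set of assigned scores.
reviewer_scores = {
    "generous": [4.3, 4.7, 4.1, 4.9, 4.5, 4.2],
    "strict":   [1.2, 1.8, 1.4, 1.6, 1.3, 1.9],
}

# Mean and spread of all scores across all judges (the judging population).
all_scores = [s for scores in reviewer_scores.values() for s in scores]
pop_mean, pop_std = mean(all_scores), pstdev(all_scores)

def rescale(score, reviewer_mean, reviewer_std):
    """Shift and stretch a score so the reviewer's mean and spread
    match those of the whole judging population."""
    return pop_mean + (score - reviewer_mean) * (pop_std / reviewer_std)

for name, scores in reviewer_scores.items():
    r_mean, r_std = mean(scores), pstdev(scores)
    normalized = [round(rescale(s, r_mean, r_std), 2) for s in scores]
    print(name, normalized)
```

After the adjustment, both reviewers’ normalized scores are centred on the overall mean (about 3.0 for these made-up numbers) with the overall spread, so a team is no longer rewarded or penalized simply for drawing a generous or a strict reviewer.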
