How NIH proposal percentiles are calculated and some instructions to reviewers

Sep 12 2016


I am ad-hoc'ing on a study section, again. This time there was a presentation (by phone conference) on scoring, aimed at "normalizing" scores. A lot of good stuff was discussed, and I'm going to post parts of it over the next while. I tried to paste in the slides, but they were blurry, so I'm just including the formula (exactly) and paraphrasing the text.

WHY, oh WHY am I going on about scoring? The people who read this column and sit on study sections know everything I'm going to say here. This column is not for them (though they can read & laugh). It is for the folks submitting proposals. One of the best lessons I ever got was: the more you understand about how NIH works, the more likely you are to get funded. One must write a good, strong, kick-ass proposal; that is content. But the stuff that falls into the basket of "grantsman/grantsperson-ship" may also make a difference. If funding is at 9%, and truly the difference between 9% and even 12% is trivial, then everything one can do to move oneself toward 9% is a good thing. Do not get lost in grants-ship, but understanding the process, and understanding what reviewers are told and what reviewers do, is important. Thus...

To start with, calculating percentiles, which I actually thought was done differently than this.

Here is the formula:

Percentile = 100 × (rank − 0.5) / (total # of R01s in 3 rounds)


  • Rank in order the final impact scores from all R01 applications in the current and prior two rounds of this study section
  • Find rank of proposal in question
  • Calculate per formula

Note: ONLY R01s. Excluded are R21s, R03s, and other non-R01 mechanisms in that study section.
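A minimal sketch of those steps in Python. The scores and counts here are made up for illustration; the only real piece is the formula itself:

```python
def nih_percentile(rank, total_r01s):
    """Percentile per the formula: 100 * (rank - 0.5) / total.

    rank: 1-based position of this proposal's final impact score among
    all R01 applications from the current and prior two rounds of the
    study section (best score = rank 1). Non-R01 mechanisms (R21, R03,
    etc.) are excluded before ranking.
    """
    return 100 * (rank - 0.5) / total_r01s

# Hypothetical example: the study section scored 120 R01s over three
# rounds, and our proposal's impact score ranks 11th best among them.
print(round(nih_percentile(11, 120), 2))  # 8.75
```

Note that a rank-1 proposal does not get a 0th percentile: the −0.5 offset centers each application within its slice of the distribution, so the best of 120 lands at about 0.4%.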

I was mistaken in that (I think???) in the olden dayes, they used to exclude the top and bottom scores. This is/was an acceptable statistical process for eliminating outliers. I may be mistaken on this one. Now, at study section, there is a range of scores (from the 2-4 assigned reviewers). Anyone who wants to vote outside that range needs to indicate this to the SRO (Scientific Review Officer, the person running the study section), and sometimes such voters are asked to write something justifying why.

Part of the reason for talking to all the reviewers before study section, and for this set of slides, lies with the problem of "score compression". This is where people tend to use 2 or 3 (on the 1-9 scale) as the average, and give lots of 1's. The instruction is (quote):

5 is a good medium-impact application and the entire scale (1-9) should be used

So, we are told to follow the guidance below to ensure scoring practices that promote clear ranking of applications:

Consider the full scoring range (1-9) for each application. Scores need to distinguish between applications.

This, my friends, is the heart of the problem. Few reviewers (I almost said "none", but of course many of my near and dear friends would say that it's not them) wish to give what are perceived as really low scores (6 or worse). Here is NIH's best advice on how to get the range:

Remember that a good, medium-impact application is a 5. Start with the assumption that the overall impact score of every application is a 5 before you read it, then adjust based on impact and relative strengths and weaknesses. Don’t give equal weight to each criterion.

It's not so much that the scores can't be scaled (of course they can be, and are scaled). It's that bunching the scores in the upper range means the difference between a 1 and a 2 is not going to separate out applications. They really want you to start scoring at 5.

For low impact applications, use the 7-9 range, not the 4-6 range. The 4-6 range is appropriate for good applications with medium impact, and the 1-3 range is appropriate only for those applications with truly high impact.

This is just another way of saying: use the range. Reviewers are also supposed to "balance the range", but NIH doesn't want that balancing done this way:

Don’t use R21s, R03s, R15s, or other non-R01 mechanisms to balance the distribution of R01 scores. They have no impact on R01 percentiles. The full range of scores should be considered for each mechanism.

Again, use the full range for everything in the same mechanism.

What does this mean to the lowly first-time scientist studying bunny hopping? Even if your score is bad (7+) and your proposal (your proposal, not you!) was triaged (not discussed), do not despair. NIH wants some scores down there. And what do you do next? Read the reviews. Read the reviews. Read the reviews.

3 responses so far

  • xykademiqz says:

    This is super interesting and clear, and I don't even ever apply to NIH!

  • Ola says:

    What this means in the real world, is that cycle-to-cycle variation on a single study section can have a huge impact on percentiles. If a particular cycle has a lot of ad-hocs, and they all decide to score high or low, then the grants from that cycle will do better or worse in percentile terms than those from the preceding two cycles. Consistency from one cycle to the next is something that SROs (at least the ones I've interacted with) try very hard to maintain.

    Also FYI - my Firefox browser has been kicking up a security certificate exception for Scientopia sites the past couple of days. Both here and on DM's blog. Someone at the mother ship ought to fix it.
