Rigor & Reproducibility: Scientific Rigor (3/N)

May 25, 2016

As I said earlier (link to my intro post):

NIH has changed proposal rules again. Here is a link to what they have to say about the new parameters: Rigor & Reproducibility.

In 2/N I wrote about "Scientific Premise", or at least what I could discern about it. It's probably worth at least glancing at that post before diving deep into this one. The next concept with which we must (and we must) deal is Scientific Rigor. We all like rigor. We all strive for rigor. I've never met anyone who says (seriously) "my science isn't particularly rigorous, but I'm okay with that".

Scientific rigor is defined, by NIH, as:

the strict application of the scientific method to ensure robust and unbiased experimental design, methodology, analysis, interpretation and reporting of results. This includes full transparency in reporting experimental details so that others may reproduce and extend the findings.

Reading through the various bits of verbiage posted by various NIH-niks, it seems that being a rigorous scientist is mostly what we'd all like to be: a good scientist by first principles. This smacks a bit of circularity and it's not much help to me (or you) in terms of actually writing the damn proposal. I do think NIH has tried, in many places, to be as explicit as possible in defining these terms, with the caveat that different subdisciplines will have different standards or emphases. In fact they say so explicitly in a couple of places. So...

One word that comes up in the discussions of rigor is transparency. My perception is that transparency means an explicit description of what you are going to do experimentally or analytically. In the examples NIH published (you need to scroll down, or search on "examples" on this page), one major part of rigor seems to be a relatively well-written paragraph in the Research Design section. These examples hit on the points and use the words NIH thinks indicate transparency. I've deconstructed much of one example here:

Male and female mice will be randomly allocated to experimental groups at age 3 months. [my highlights]

Are samples randomly chosen and assigned to groups? If you have animals (like mice), the choice may be automatically random, and the assignment to groups is what matters. If you are doing a clinical or epidemiologic study, then how you find & choose individuals becomes more important. For a study that involves development, or a particular disease/physiologic entity with a time course, the when of assigning to groups is important. However, the exact mechanism of randomization is not (a coin flip, a table, or a program: it matters not). This points out one of the difficult things about interpreting these instructions: what needs to be included and what does not. The next sentence in their example gives the why of this age (what happens).

At this age the accumulation of CUG repeat RNA, sequestration of MBNL1, splicing defects, and myotonia [the criterion that is relevant to the project succinctly stated] are fully developed. [my highlights]

Then come 1-2 sentences on what is done, including the explicit dosage of the treatment being given. Short and simple, but containing enough detail that someone else could do it:

The compound will be administered at 3 doses (25%, 50%, and 100% of the MTD) for 4 weeks, compared to vehicle-treated controls. IP administration will be used unless biodistribution studies indicate a clear preference for the IV route.

This is followed by another sentence that probably has been in your proposal, but is one of the cornerstones of rigor: how did you determine the sample size?

A group size of n = 10 (5 males, 5 females) will provide 90% power to detect a 22% reduction of the CUG repeat RNA in quadriceps muscle by qRT-PCR (ANOVA, α set at 0.05). [my highlights]

Group size, power stats and effect size, together with statistical method, all in one compact short sentence. You do not need more than one sentence. You do not need more than what is in this sentence, even if you have spent three months learning how to do what is in this sentence. But this single sentence indicates that the person writing actually understands what is necessary for a power calculation. In my experience, effect size is overlooked in the brouhaha concerning p-statistics and p-hacking and whatnot. But that's another post. This sentence also contains the criterion for determining what the effect is ("CUG repeat..."). That is an alpha (α), btw, in the phrase about how power was calculated. This is a marvelous sentence, and I do not even come close to doing this kind of science, yet as a reviewer I can appreciate what this sentence signals about the proposer. It would be very easy to take this sentence structure and put your own science into it:

A group size of n = 16 (8 males, 8 females) will provide 90% power to detect a 10% reduction in the efficiency of hindfoot flexion measured through frame-by-frame analysis of high-speed cinematography (ANOVA, α set at 0.05). [my highlights]
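For what it's worth, the arithmetic behind such a sentence is easy to sketch yourself. Below is a minimal Python version using the normal approximation to a two-group comparison of means; note that the NIH example specifies ANOVA, and the standard deviation here (15 points) is an invented number, so treat this purely as an illustration of how effect size, α, and power combine into n:

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Subjects per group to detect a mean difference `delta` given
    standard deviation `sd`, using the normal approximation to a
    two-sample comparison (small-n designs need a t correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)          # quantile for desired power
    return math.ceil(2 * (sd / delta) ** 2 * (z_a + z_b) ** 2)

# Hypothetical numbers: detect a 22-point reduction with an assumed SD of 15
print(n_per_group(delta=22, sd=15))  # → 10 per group
```

Halve the detectable effect and the required n roughly quadruples (n scales as 1/δ²), which is why stating the effect size matters as much as stating α.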

I do not think this is plagiarism. A time-honored way to learn how to write is to take other people's excellent sentences and rewrite them with new content. Another time-honored way to avoid plagiarism is to take notes whilst reading, and then write a para from your notes. Further, there are a number of grant-writing workshops that have their own explicit sentence-level templates. When reviewing, I can almost always tell which workshop the PI attended. I do not mind the use of templates when reviewing grants. In fact, it usually makes it easier to read the grant and understand the content and meaning. Remember, you do not want to make the reviewer work. Your goal is to enlist them as an ally in promoting your proposal to the study section.

Back to the example, and more on methodology, including blinding of scientists (and, within a large group of workers, which ones are blinded):

The treatment assignment will be blinded to investigators who participate in drug administration and endpoint analyses.

This is another sentence which you can use as a template for your proposal. And this example ends with a statement about goodness-of-lab:

This laboratory has previous experience with randomized allocation and blinded analysis using this mouse model [refs]. Their results showed good reproducibility when replicated by investigators in the pharmaceutical industry [ref].

This is more along the lines of documenting what a lab/PI can do. I am not sure both of these sentences are necessary, and certainly a well-published PI could put references at the end of these statements documenting this. But pointing out that one can do the work necessary for the project is an important part of any proposal.
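The randomized-allocation and blinding practices the example cites can also be sketched in code. Everything here (animal IDs, group names, the seed) is invented for illustration; the point is that randomizing within sex keeps the groups balanced, and coded labels keep the analysts blinded:

```python
import random

# Hypothetical study: 8 male + 8 female mice, four dose groups (invented names)
animals = [f"M{i:02d}" for i in range(1, 9)] + [f"F{i:02d}" for i in range(1, 9)]
groups = ["vehicle", "25% MTD", "50% MTD", "100% MTD"]

rng = random.Random(20160525)  # a fixed seed makes the allocation auditable
assignment = {}
for sex in ("M", "F"):
    block = [a for a in animals if a.startswith(sex)]
    rng.shuffle(block)                      # randomize within sex...
    for i, animal in enumerate(block):
        assignment[animal] = groups[i % len(groups)]  # ...so groups stay balanced

# Blinding: analysts see only coded labels; a third party holds the key
codes = {g: f"group-{chr(65 + i)}" for i, g in enumerate(groups)}
blinded = {animal: codes[assignment[animal]] for animal in animals}
```

Whether you use a coin, a table, or a dozen lines of Python truly does not matter to the reviewer; what matters is that the proposal says the allocation was random and the analysis blinded.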

There are other examples (again, scroll down to examples #2 & #3) that are also worth reading, sentence by sentence. Read each sentence in their examples and ask: what is the purpose of this sentence? What is being conveyed about doing the research, as well as the specifics of the project being proposed? From example #3:

 Random Forest [a machine learning approach described in the sentence before] uses a bootstrap method to assess test error, ideal in our situation of small sample size (n=18). For diversity and load measures, significance between groups will be assessed using non-parametric Wilcoxon rank-sum tests.   

This acknowledges the small sample size and proposes a valid methodology. It might have been improved with a citation to work where the program and method had been successfully published, even if not in the same situation.
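For the curious, the rank-sum test itself is simple enough to sketch in pure Python. This version uses the large-sample normal approximation with no tie correction; for a real n = 18 study you would want the exact test from a proper statistics package (scipy.stats.mannwhitneyu, for example), so this is illustration only:

```python
import math
from statistics import NormalDist

def midranks(values):
    """Ranks of the pooled values, averaging ranks across ties (midranks)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                             # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for the tie run
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation (no tie
    correction). Returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # Mann-Whitney U from rank sum
    mean = n1 * n2 / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u1 - mean) / math.sqrt(var)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))
```

Two clearly separated groups give a small p-value; identical groups give p ≈ 1, which is all a non-parametric test of group difference promises.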

In his post on Rigor, Mike Lauer (who writes in the NIH extramural blog, Nexus) makes a couple of points worth repeating about what rigor is, comparing grant applications to papers (here is an update about Rigor from Mike):

In published papers, full transparency in reporting experimental details is crucial for others to assess, reproduce, and extend the findings.

He points out that signaling [my choice of word] rigor in a proposal would include both experimental aspects (what you are going to do) and analysis aspects (both how you decide what to do, and also what you are going to do with the data you get). These things include sample size considerations, determination of what constitutes a scientific control in experiments, replications, and avoiding bias. Bias and Robust are two more words that show up frequently. Luckily NIH tries to define everything:

What is meant by "robust" and "unbiased"?

Robust results are obtained with solid, well-controlled experiments capable of being reproduced under well-controlled conditions using reported experimental details. Applicants should consider methods to reduce bias, such as having multiple individuals recording assessments, defining terminology in advance, using independent, blinded assessors, etc. "Robust" and "unbiased" results are goals, not absolute standards to be met, and may vary across scientific disciplines.

Note room for variation in the application of these ideas. The reviewer instructions (download the pdf here) contain these bon mots:

Scientific Rigor: The strict application of the scientific method to ensure robust and unbiased experimental design, methodology, analysis, interpretation and reporting of results.

There was some back & forth in the blog comments about "strict application" and innovative/exploratory work. This is indeed a whole separate post, but what exactly a PI believes, nay, knows to be "exciting, exploratory" and my favorite "paradigm changing" (something NIH allegedly wants), is often perceived by reviewers as "lacking rigor" or "sloppy reasoning". Framing your exciting new stuff in the context of the scientific method, and explaining why the results are paradigm changing is a more successful route to take in proposal writing, with respect to what study sections perceive.

The reviewer instructions continue with the following points:

Whereas scientific premise pertains to supporting data, scientific rigor pertains to the proposed research (statistical procedures, data analysis, precision, subject inclusions and exclusion criteria, etc.).


Scientific premise <--> supporting data (what is known)

Scientific rigor <--> proposed research (what you are going to do)

So, first off, I am always suspicious of any writing that starts with "whereas". We are not lawyers or politicians.  But the upshot of this is: the premise (see 2/N) is in the significance, background and the justification/support. The rigor is in the design and approach. And the important caveat:

Different research fields may have different standards or best practices for scientific rigor.

I suppose this goes without saying, but it's nice to see it stated like this. I remember when power stats were first being demanded in IACUC reviews of sample size justification. It was difficult to persuade people that there were a variety of legitimate sources a PI could use to estimate either the variation or the effect size parameters.

Rigor will be assessed in peer review as part of the Approach criterion for research grant applications and as part of the Research Plan criterion for mentored career development award applications.

There is a lovely table in the reviewer guide as to where this information goes; the gist is:

Scientific Premise: pertains to Supporting Data and the review criterion is Significance

Scientific Rigor: pertains to Proposed Research and the review criterion is Approach (or Research Plan, for career development awards)

At some point in the future, I will talk about the remaining two parts (four in total) of the guidelines: (3) Consideration of Relevant Biological Variables (which includes sex as a biological variable, SABV); and (4) Plan for Resource Authentication. In some ways these are easier to understand, if harder to implement.



