Resource for Evaluation of Research Articles

This resource is provided for medical editors who evaluate and edit manuscripts, to give quick access to relevant resources and information. It is a work in progress, and feedback is welcome. Please contact Margaret Winker, MD, with any suggestions.


Evaluation by Section



Title

  • Generally should be nondeclarative and not a question, should begin with the main concept if possible, and should avoid causal language (eg, "effect of") unless the study is an RCT
  • Study design in subtitle (ie, after colon)


Abstract

Because the structure of abstracts varies by journal, a general approach is provided here.
The background should provide the context of why the study question is important. The study question should be clearly stated.
The methods should include study design, population and setting, number of participants, years of study, length of follow-up, and main outcome measures. For RCTs, specify who, if anyone, was blinded to the intervention/control, define the intervention and control conditions, specify the number in each group, state whether the analysis was based on intention to treat, and provide the number lost to follow-up. For systematic reviews/meta-analyses, see PRISMA for abstracts and include years of search, data sources, number of studies included, types of study designs included, eligibility criteria, and synthesis/appraisal methods. Surveys should include the response rate.
Main results should be quantified (with 95% CIs if possible), the important covariates adjusted for should be listed, the actual results and/or absolute risk(s) should be provided, and the results should match those presented in the main paper. Major limitation(s) should be provided as appropriate.
Conclusions should interpret the study based on the results presented, emphasizing what is new and potential implications.
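As an illustration of reporting absolute as well as relative results, the following sketch derives the absolute risk in each group, the risk difference with a 95% CI, the relative risk, and the number needed to treat from raw counts. The function name, the example counts, and the choice of a simple Wald interval are assumptions made for illustration only; other interval methods may be preferable for small samples.

```python
import math

def summarize_two_groups(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Absolute and relative effect measures from raw counts (illustrative helper)."""
    risk_tx = events_tx / n_tx      # absolute risk, intervention group
    risk_ctl = events_ctl / n_ctl   # absolute risk, control group
    diff = risk_tx - risk_ctl       # absolute risk difference
    # Wald standard error of the risk difference (an assumption of this sketch)
    se = math.sqrt(risk_tx * (1 - risk_tx) / n_tx
                   + risk_ctl * (1 - risk_ctl) / n_ctl)
    return {
        "risk_tx": risk_tx,
        "risk_ctl": risk_ctl,
        "risk_difference": diff,
        "risk_difference_ci": (diff - z * se, diff + z * se),
        "relative_risk": risk_tx / risk_ctl if risk_ctl > 0 else float("nan"),
        "nnt": 1 / abs(diff) if diff != 0 else float("inf"),
    }

# Hypothetical counts: 30/200 events with the intervention, 50/200 with control
summary = summarize_two_groups(30, 200, 50, 200)
for name, value in summary.items():
    print(name, value)
```

A relative risk of 0.6 in this example corresponds to an absolute difference of 10 percentage points; the same relative risk with a rarer outcome would give a much smaller absolute difference, which is why both should appear in the abstract.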


Introduction

  • Indicate how the literature was searched to determine whether the hypothesis had been addressed previously and, if it had been, why the current study was performed.
  • Do not provide the results or conclusion.
  • End with clear description of study question or hypothesis.

Methods and Results

  • Define study design, population and setting, number of participants, years of study, length of follow up, and outcome measures.
  • Numbers of patients, samples, etc, should be reported and accounted for. Check that numbers add up. "Lost to follow-up" should be defined. Reasons for exclusion should be defined.
  • Raw data/actual numbers for outcomes, not just percentages or ORs, should be presented. Numerators and denominators should be presented for percentages, at least in tables.  
  • Comparison groups should be specified for effect measures (eg, OR, HR).
    • For adjusted analyses, all factors adjusted for should be provided. Unadjusted or minimally adjusted analyses should be presented in addition to fully adjusted analyses.
    • 95% CIs should be provided.
    • For common events, ORs may need to be adjusted to approximate relative risk
  • Ethics
    • Human studies: Specify institutional review board (IRB) approval and informed consent if required. Specify whether informed consent was written or oral (with rare exceptions, informed consent should be written). If authors do not have access to an IRB, the research needs to have been conducted in accordance with the Declaration of Helsinki.
    • Animal studies: Report handling following ARRIVE guidelines.
  • Statistics
    • Statistical resource: SAMPL resources [add link to SAMPL when posted]
    • Multiple hypothesis testing should be adjusted for and the method of adjustment provided (if the analysis is hypothesis generating, state this).
  • Specific claims should be justified by the results. 
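The point above about ORs for common events can be illustrated with a short calculation. The sketch below uses the conversion commonly attributed to Zhang and Yu, RR = OR / (1 - P0 + P0 x OR), where P0 is the risk of the outcome in the reference group. The function name and example values are hypothetical, and the formula is only an approximation; it is known to be unreliable when applied to adjusted ORs, so it is best treated as a rough check on how far an OR may overstate the relative risk.

```python
def odds_ratio_to_relative_risk(odds_ratio, baseline_risk):
    """Approximate the relative risk implied by an OR.

    baseline_risk is the outcome risk in the reference (unexposed) group.
    """
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    if not 0 <= baseline_risk <= 1:
        raise ValueError("baseline_risk must be between 0 and 1")
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)

# Same OR, different baseline risks (hypothetical values)
rare = odds_ratio_to_relative_risk(2.0, 0.01)    # rare outcome
common = odds_ratio_to_relative_risk(2.0, 0.40)  # common outcome
print(f"OR 2.0, baseline risk 1%:  RR is about {rare:.2f}")
print(f"OR 2.0, baseline risk 40%: RR is about {common:.2f}")
```

When the outcome is rare the OR and relative risk nearly coincide; when the outcome is common the OR is noticeably further from 1 than the relative risk.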


Discussion

A short, clear summary of the article's findings should be provided, including what the study adds to existing research, limitations of the study, implications, and next steps.


References

Website URLs should be checked for access, and the date on which access was verified should be provided. References should not include personal communications or papers listed as in process or submitted. If a paper is listed as in press, the citation should be updated where possible.


Figures

  • Figures should not be reproduced from other sources without permission
  • All figures must have titles (and, almost always, legends)
  • Graphs should not be 3-D unless the data are
  • Pie charts generally should not be used for research results
  • If measurements are discrete, display as discrete points rather than a continuous line
  • 95% CIs should be provided whenever appropriate (rather than SE)
  • If data are paired, they should be displayed as such
  • For graphs, axes should begin at zero; if they do not, a break should be shown in the axis
  • Odds ratios should be displayed on a logarithmic scale
  • Survival curves should include number at risk below x axis
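One reason for the recommendation above to display odds ratios on a logarithmic scale is that their confidence intervals are usually computed on the log scale and are symmetric there, not on the linear scale. The sketch below, using hypothetical 2x2 counts and the standard Woolf (logit) interval, shows this; the function name and counts are assumptions for illustration.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """OR and Woolf 95% CI from a 2x2 table.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper

odds_ratio, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR {odds_ratio:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
# Distances from the point estimate on each scale
print("linear:", odds_ratio - lower, upper - odds_ratio)
print("log:   ", math.log(odds_ratio / lower), math.log(upper / odds_ratio))
```

On a log axis, an OR of 0.5 and an OR of 2.0 also sit at equal distances from 1, which matches their equal strength of association in opposite directions.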


Appropriate language when describing humans
  • Use nondehumanizing terms for patients: "person with diabetes" rather than "diabetic"; intellectual and developmental disabilities should be used in place of mental retardation or other similar terms
  • "Subject" should not be used in reference to humans; it should be changed to participant, patient, individual, or person.
  • Reporting of race/ethnicity: Define the categories for race/ethnicity and who defined them. Explain why race/ethnicity was considered important and what it is believed to represent (eg, are SES or genetic differences being attributed to race/ethnicity?)
  • "Compliance" should not be used with regard to adhering to recommended or prescribed treatment; "adherence" should be used instead.

General style

  • Temper claims of primacy of results by stating, "to the best of our knowledge" or something similar
  • Drug names: use nonproprietary (aka "generic") name (with rare exceptions) rather than brand name
  • Personal communications should be cited in text and the person named should agree in writing to be named
  • All abbreviations should be defined

Statistical editing: (see SAMPL statistical reporting)

  • If the term "significant" is used it should be clear whether statistical or clinical significance is intended (if clinical significance is intended, verify that it is)
  • The term "trend" should be used only when a test for trend has been conducted; it should not be used when the P value is close to, but does not reach, significance.
  • A P value should not be reported as equal to zero.
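A P value reported as zero is usually a rounding artifact of statistical software. The sketch below shows one way such values might be reformatted as an inequality. The reporting floor of 0.001, the three decimal places, and the output style are assumptions of this example; the journal's own style guide should take precedence.

```python
def format_p_value(p, floor=0.001, digits=3):
    """Format a P value so it is never shown as 0 or 1 (illustrative helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P value must lie between 0 and 1")
    if p < floor:
        # Software output such as 0.000 becomes an inequality
        return f"P < {floor:.{digits}f}"
    ceiling = 1 - floor
    if p > ceiling:
        return f"P > {ceiling:.{digits}f}"
    return f"P = {p:.{digits}f}"

for raw in (0.0, 0.00004, 0.0314, 0.25, 0.9999):
    print(raw, "->", format_p_value(raw))
```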

Specific Study Designs

Randomized Controlled Trials
  • See checklists and explanatory documents under CONSORT. The editor should ensure that all components of CONSORT are present, including how randomization, allocation concealment, and any blinding of intervention or outcomes were performed; how dropouts or loss to follow-up were defined (resource on handling dropouts: Matsuyama Y. A comparison of the results of intent-to-treat, per-protocol, and g-estimation in the presence of non-random treatment changes in a time-to-event non-inferiority trial. Stat Med 2009;29:2107-2116.); and that a power statement is provided
  • The trial should be registered in one of the registries listed on the WHO site
  • Registration should have been completed before patients were randomized (if not, ask authors for explanation)
  • Outcome measures and sample size in manuscript should match those in trial registry and protocol
    • If other outcomes are listed in the trial registry or protocol, the authors should indicate where those results are (already published, in press, under review, or not submitted); if they have not been submitted, this may indicate selective reporting
    • If the submitted study outcomes are not listed in the trial registry or protocol, the authors should indicate whether the outcomes were prespecified or post hoc (post hoc are hypothesis generating only)
  • The main analysis should be intention to treat (ITT): all individuals randomized are included in the analysis in the groups to which they were originally assigned. For any dropouts, specify whether data were imputed and, if so, by what method. Any departure from ITT should be justified and referred to as modified ITT. Main comparisons generally should be of the outcome in the intervention vs the control group, not before-after (ie, between-group rather than within-group effects).
  • Noninferiority/equivalence trials should state equivalence bounds and require a different approach to missing data. If a test of superiority is negative, statements regarding equivalence should not be made (the statement should be that one is not superior to the other).
  • Departures from protocol should be explained and discussed in the text
  • Safety data should be presented including numbers of specific events (whether or not adverse events are thought to be related to treatment)
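The distinction above between a negative superiority test and a claim of equivalence or noninferiority can be made concrete by looking at where the confidence interval falls relative to a prespecified margin. The sketch below is a simplified illustration: it assumes the CI is for the difference (new treatment minus control) on an outcome for which higher values are better, and the function name, wording of the conclusions, and example numbers are all hypothetical.

```python
def interpret_difference_ci(ci_lower, ci_upper, margin=None):
    """Classify a CI for (new - control); higher outcome values are better.

    margin is the prespecified, positive noninferiority/equivalence bound,
    or None if the trial was designed only as a superiority trial.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if margin is not None and margin <= 0:
        raise ValueError("margin must be positive")
    if ci_lower > 0:
        return "superior"
    if margin is None:
        # A CI that includes 0 without a prespecified margin supports no
        # statement about equivalence
        return "superiority not shown; equivalence cannot be claimed"
    if ci_lower > -margin:
        if ci_upper < margin:
            return "equivalent within margin"
        return "noninferior"
    return "noninferiority not shown"

print(interpret_difference_ci(-0.02, 0.06))               # superiority trial
print(interpret_difference_ci(-0.02, 0.06, margin=0.05))  # noninferiority trial
print(interpret_difference_ci(-0.08, 0.02, margin=0.05))
```

The same interval (-0.02 to 0.06) supports a noninferiority statement only when a margin was specified in advance; without one, the only supportable statement is that superiority was not shown.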
Systematic Reviews/Meta-analyses
  • See checklists and explanatory documents under PRISMA. The paper should state whether the systematic review was registered and, if so, provide the registry number; it should also state whether there was a protocol and, if so, provide the protocol document as a supplementary file.
  • All dates of search should be provided and the search should be timely, ideally including multiple databases and languages, with evaluation of studies for inclusion and data abstraction performed in duplicate. Study quality/risk of bias and publication bias should be evaluated.
  • Safety data should be presented wherever relevant
  • Network meta-analysis resources: Caldwell, D.M., A.E. Ades, and J.P. Higgins, Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ, 2005.331(7521): p. 897-900.; Lu, G. and A.E. Ades, Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med, 2004. 23(20): p. 3105-24.; Salanti, G., et al., Evaluation of networks of randomized trials. Stat Methods Med Res, 2008. 17(3): p. 279-301.
Cohort, Cross-sectional, and Case-control Studies
  • See checklists and explanatory documents under STROBE.
Diagnostic tests
  • See checklists and explanatory documents under STARD.
  • Population should be representative of population of interest (ie, for a screening test, the screening population rather than individuals confirmed to have the disease and confirmed not to have the disease, and for diagnostic tests all individuals who would undergo the test, not excluding those who have an intermediate test result)
  • Derivation results should be validated in an independent population. The number of patients reclassified for whom reclassification changes management is important, if relevant  (see Statistical Evaluation of Prognostic versus Diagnostic Models: Beyond the ROC Curve)
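One concrete reason the study population matters is that predictive values depend on disease prevalence. The sketch below applies Bayes' theorem to a test with fixed sensitivity and specificity at two prevalences: one typical of a screening setting and one typical of a study that enrolls equal numbers of confirmed cases and confirmed noncases. The function name and all numeric values are hypothetical, and the sketch does not capture spectrum effects on sensitivity and specificity themselves.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    for value in (sensitivity, specificity, prevalence):
        if not 0 <= value <= 1:
            raise ValueError("inputs must be proportions between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos) if (true_pos + false_pos) > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if (true_neg + false_neg) > 0 else float("nan")
    return ppv, npv

# Same test (90% sensitivity, 90% specificity), two hypothetical populations
ppv_screen, npv_screen = predictive_values(0.90, 0.90, 0.01)
ppv_enriched, npv_enriched = predictive_values(0.90, 0.90, 0.50)
print(f"Screening population (1% prevalence): PPV {ppv_screen:.3f}")
print(f"Cases vs noncases (50% prevalence):   PPV {ppv_enriched:.3f}")
```

A positive predictive value estimated in an enriched sample would greatly overstate how the test performs in the population that would actually be screened.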
Qualitative Research
  • See checklists and explanatory documents under EQUATOR COREQ or RATS criteria (COREQ has more background on qualitative research that is useful for overall evaluation).
Surveys
  • See here for reporting guidelines for surveys.
  • The population from which the sample was drawn should be defined and generalizability addressed
  • The response rate should be defined using standard measures (eg, see AAPOR) and the authors should define efforts made to increase the response rate (repeated survey waves, financial incentives). Unless the response rate is very high, the authors should provide measures comparing the nonrespondents to the respondents to assess potential for bias if possible; if serial waves of surveying were performed, characteristics of each wave of respondents should be provided to show whether the characteristics changed over time
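As an example of a standard response rate measure, the sketch below computes AAPOR Response Rate 1 (RR1), the most conservative of the AAPOR rates: complete interviews divided by all eligible and potentially eligible cases. The function name, parameter names, and case counts are hypothetical; authors should state which AAPOR rate they used, since the other rates treat partial interviews and cases of unknown eligibility differently.

```python
def aapor_rr1(complete, partial, refusal, non_contact, other,
              unknown_household, unknown_other):
    """AAPOR RR1 = I / ((I + P) + (R + NC + O) + (UH + UO))."""
    denominator = (complete + partial
                   + refusal + non_contact + other
                   + unknown_household + unknown_other)
    if denominator == 0:
        raise ValueError("no cases in the sample")
    return complete / denominator

# Hypothetical final dispositions for a sample of 1000
rate = aapor_rr1(complete=600, partial=50, refusal=150, non_contact=100,
                 other=20, unknown_household=50, unknown_other=30)
print(f"RR1 = {rate:.1%}")
```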
Cost-effectiveness analyses: see here.
Modeling studies
The following list is derived from Geoffrey P Garnett, Simon Cousens, Timothy B Hallett, Richard Steketee, Neff Walker. Mathematical models in the evaluation of health programmes. (2011) Lancet. DOI:10.1016/S0140-6736(10)61505-X
Items that should accompany modeling analyses:
  • Diagrams that show model structure: Show how disease natural history is represented, the process and determinants of disease acquisition, and how the putative intervention could affect the system.
  • Complete list of model parameters: Include clear and precise descriptions of the meaning of each parameter, together with the values or ranges for each, with justification or the primary source cited, and important caveats about the use of these values noted. Where a parameter value comes from another modeling analysis, this caveat should be noted. Parameter values that are fit in this model (not independently measured) should be clearly marked.
  • Assessment of model predictions with data: Illustrate agreement between the model (as used in the analyses) and data or observational information. State clearly how the model was fitted to the data, including the goodness-of-fit measure, the numerical algorithm used, which parameters were varied, constraints imposed on parameter values, and starting conditions.
  • Presentation of results: Key modeling results should be presented with a scientifically based estimate of uncertainty. Uncertainty analyses should be accompanied by a statement about the sources of uncertainty quantified and not quantified; these sources can include parameters, data, and model structure. Sensitivity analyses enable readers to identify which parameter values are most important in the model. Uncertainty estimates seek to derive a range of credible results on the basis of an exploration of the range of reasonable parameter values. The choice of method should be presented and justified.
  • Discussion of model structure: Include the scientific rationale for the choice of model structure and identify points where this choice could influence the conclusions drawn. Also describe the strength of the scientific basis underlying the key model assumptions.
Types of modeling:
  • Linear (eg, chronic diseases; ignores indirect effects) vs non-linear (includes dynamic behaviors)
  • Deterministic (every replication is the same for a given set of parameters) vs stochastic (includes chance, eg, Monte Carlo; running the same parameters will give different results)
  • Population-based vs individual-based
  • Age (cohort) vs calendar time (changing pattern of events over time)
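The difference between deterministic and stochastic models can be seen in a toy example. The sketch below is not taken from Garnett et al; it assumes a deliberately simple model in which each susceptible person has a constant risk of becoming a case at each time step. The function names, parameter values, and use of seeded random number generators are all assumptions for illustration.

```python
import random

def deterministic_run(susceptible, risk_per_step, steps):
    """Expected-value model: identical output for identical parameters."""
    remaining = float(susceptible)
    cases = 0.0
    for _ in range(steps):
        new_cases = remaining * risk_per_step
        remaining -= new_cases
        cases += new_cases
    return cases

def stochastic_run(susceptible, risk_per_step, steps, rng):
    """Monte Carlo model: each person becomes a case by chance."""
    remaining = susceptible
    cases = 0
    for _ in range(steps):
        new_cases = sum(1 for _person in range(remaining)
                        if rng.random() < risk_per_step)
        remaining -= new_cases
        cases += new_cases
    return cases

print("deterministic:", deterministic_run(1000, 0.05, 10))
# Same parameters, different random streams, different results
for seed in (1, 2, 3):
    print("stochastic, seed", seed, ":",
          stochastic_run(1000, 0.05, 10, random.Random(seed)))
```

Because stochastic runs vary, results from such models are normally summarized over many replications, and a fixed seed is needed to reproduce any single run.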
Qualitative assessment:
  • Construct validity: Does the model’s structure and its explanation accord with the biological system? Have the important causal processes been included?
  • Can a convincing explanation of the results be provided?
  • Irrelevant detail should not be included (eg, gender for infectious diseases not related to reproduction).
  • Actual = model with the intervention(s) the model is designed to test; counterfactual = model without the intervention
Version 1.1, May 7, 2014
Margaret Winker MD 
