Research, Peer Review, and the Clinical Trial
By Erin Brodwin / 12 minute read
Medical research is to health-care reporting what yeast is to bread. Almost every story relies on medical research, and if the research is missing, the story falls flat.
The bulk of medical research is the product of peer review — the process of subjecting a given piece of research to review by experts in the same field. Not unlike the role that health-care editors play, peer reviewers serve to fact-check and critique medical studies, to vet their claims, and to identify any key limitations or conflicts of interest. The result is a study published in a medical or scientific journal.
The peer-review process is the best protection we have against faulty, self-serving, and promotional studies. But it is not flawless. Understanding the limitations of peer review will enable you to make important choices about how a given study is framed and presented, as well as what kind of descriptive language reporters should use and what data they should include.
Key Terms
- Absolute risk: The probability of a given outcome.
- Clinical trial: Research studies performed in people to evaluate a medical or behavioral intervention.
- Conflict of interest: A situation in which a researcher’s professional judgment may be clouded by conflicting personal or financial interests.
- Disclosure: A portion of a peer-reviewed study where researchers declare any potential conflicts of interest.
- Efficacy: How well a given treatment works to produce an intended outcome.
- Effect size: The magnitude of the difference between groups given a treatment.
- Endpoints: The primary outcomes a clinical trial is designed to evaluate.
- Peer review: The process of subjecting an author’s research to the scrutiny of experts in the same field who were not directly involved in the research.
- Peer-reviewed journal: A journal that publishes research that has been peer-reviewed.
- Preprint: A research paper that is made available before being subject to peer review and is typically published later in a scientific or medical journal.
- Relative risk: The likelihood of an event’s occurring in a group of people compared with another group of people who may have different behaviors, physical conditions, or environments.
- Stigma: Discrimination against a person or set of behaviors based on perceived characteristics, behaviors, or assumptions about those characteristics or behaviors.
- Outside expert: A skilled researcher, clinician, or other expert who can evaluate a given research study, presentation, or other finding because that expert is not directly involved in the work.
For starters, not all medical journals are created equal; some are considered predatory, in that they prey on academics’ need to publish for career advancement. Such journals are known to publish low-quality content with minimal or no review. Here’s a short list of some predatory journals (a full list can be found at predatoryjournals.com):
- American International Journal of Contemporary Scientific Research
- American Journal of Advanced Drug Delivery
- Annals of Clinical Case Reports
- Clinics in Surgery
- European Journal of Biomedical and Pharmaceutical Sciences
By contrast, other journals have rigorous processes and are respected for their high academic standards and credibility. Such journals include:
- Annals of Internal Medicine
- The Journal of the American Medical Association
- The New England Journal of Medicine
- The Lancet
- Nature Medicine
One might draw the conclusion that every study published in one of the high-caliber journals listed above is going to be a high-caliber study. Such an assumption would be wrong. Studies exist along a continuum from weak to strong based on a panoply of factors, including the number of participants, their demographic makeup, and the study’s design. In general, the larger, better controlled, and longer the study, the stronger it is. Here are some indicators of high-quality research:
- Involves large, diverse groups of people
- Takes place over long periods of time
- Involves a control group that did not receive the treatment
- Follows people forward over time rather than looking back retrospectively
- Randomly assigns participants to either the control group or the test group
Not all research follows this gold standard. In fact, there are different kinds of studies that adhere more or less to these best practices. They include these categories, from weakest to strongest:
Case reports: A collection of stories, often compiled by doctors, about individual patients that describes their medical histories. When stories of multiple patients with similar symptoms or histories are gathered, the case report is called a case series. Case reports cannot demonstrate or prove causality, but rather merely describe a phenomenon or observation.
Case-control studies: Case-control studies start with a given outcome, look back in time, and compare two groups to see what may have contributed to the outcome. For example, researchers who observed high rates of asthma in a given community might compare the high-asthma population with a low-asthma population. They might survey the two groups to get a sense of earlier behaviors or environmental factors that could have influenced the groups’ divergent health outcomes.
Cohort studies: Also called prospective studies, cohort studies involve following multiple groups (or cohorts) over time and comparing their outcomes. In contrast to case-control studies, which look backward in time, cohort studies look forward. For example, researchers might start with two populations, both with low rates of asthma, and then survey them at regular intervals over years or decades to see whether the groups go on to show any significant differences in rates of disease.
Randomized controlled trials (RCTs): RCTs are widely considered to be the gold standard of medical research. They involve randomly assigning one group of patients to receive a treatment and another group to receive a placebo, and then following and comparing the groups over time. The randomness is critical to ensuring that the treatment is the only variable influencing the different outcomes the two groups experience. When possible, the strongest RCTs also involve what is known as “blinding,” in which either the participants, the researchers, or both groups do not know which participants were given a treatment or a placebo. This strengthens the research by ensuring that neither the researchers nor the participants are influenced by their perceptions or assumptions about a given treatment.
Systematic reviews and meta-analyses: The strongest of all of the studies we’ll review here, systematic reviews and meta-analyses weigh the contributions of multiple studies to assess a given treatment or other outcome. For example, researchers looking to evaluate how well a digital diabetes program worked to help patients lose weight might review a handful of previous studies that sought to evaluate individual programs. By looking at the studies together, researchers can use statistical analyses to get a sense of how well the treatment worked.
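To make the random assignment described for RCTs more concrete, here is a minimal Python sketch. It is an illustration only: the function name `assign_groups`, the participant IDs, and the even two-way split are assumptions made for this example, and real trials typically use more elaborate randomization schemes.

```python
import random


def assign_groups(participant_ids, seed=None):
    """Randomly split participants into a treatment group and a placebo group.

    Hypothetical helper for illustration; not taken from any trial protocol.
    """
    rng = random.Random(seed)          # seeded only so the sketch is repeatable
    shuffled = list(participant_ids)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)              # chance alone decides the order
    midpoint = len(shuffled) // 2
    return {
        "treatment": shuffled[:midpoint],
        "placebo": shuffled[midpoint:],
    }


# Twenty made-up participant IDs, P001 through P020.
participants = [f"P{number:03d}" for number in range(1, 21)]
groups = assign_groups(participants, seed=42)
print(len(groups["treatment"]), len(groups["placebo"]))
```

Because chance decides who lands in which group, differences between the groups other than the treatment should tend to even out as the number of participants grows. In a blinded trial, the resulting assignment list would be kept from the participants, the researchers, or both.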
Sometimes even the most-admired newsrooms can mishandle reporting on studies, as when, in July 2020, during the Covid-19 pandemic, The New York Times took a small, weak study and misrepresented it as a large, well-controlled study.
In the article, reporters attempted to rate the evidence behind some of the most talked-about treatments for Covid-19. The first version of the story, which was almost immediately altered on the basis of expert feedback, placed 20 treatments into the following six categories, from best to worst:
- Strong evidence
- Promising evidence
- Mixed evidence
- Not promising
- Ineffective or harmful
- Pseudoscience
Because of the way it presented the findings, the story received wide criticism from clinicians and other experts, many of whom argued on social media that the Times had overstepped its bounds and assumed the role of medical expert with a story that appeared to recommend or flag unproven, experimental treatments. Especially concerning was the problematic way it initially presented one early-stage experimental treatment involving blood plasma taken from recovered Covid-19 patients.
In the first version of the story, this treatment was labeled “promising evidence.” The authors cited positive results from early “trials” as the evidence for their claims. But the word “trials” in the story did not refer to clinical trials; rather, it referred to a small case-control study — the second-weakest type of study — and involved just 39 people. That kind of study is far too preliminary to label as “promising,” especially in the context of different kinds of studies of other treatments. You can think of it as labeling both a toddler and a high-school student “promising” in terms of their scholarly potential. You simply don’t know enough, and the two are not comparable.
To the Times’s credit, an earlier article, published in May, about convalescent plasma headlined “Uncertain Results in Study of Convalescent Serum for Covid-19” was clear about the limitations of the study, carefully describing caveats and noting that the only available evidence was from a small, early-stage study.
“Analyses like these are fraught with difficulties,” the earlier story read. “The only way to know for sure if the treatment works is to randomly assign patients to receive [the treatment] or a placebo.”
But the July story did not devote adequate attention to these key limitations. In an updated version of the story, posted on July 17, the authors changed the treatment labels entirely. For convalescent plasma, the dark-green “promising evidence” label was replaced with an orange one that read “tentative or mixed evidence.”
What could an editor have done? At minimum, an editor should have suggested a more appropriate label for convalescent-plasma treatment, since one small case-control study does not constitute “promising evidence” for a treatment that remains highly experimental, not to mention possibly expensive or inaccessible.
An editor could have also chosen to strike the word “trials” in the description of the treatment, since it could lead readers to wrongly believe that convalescent plasma had been studied in a well-designed study, such as a randomized controlled trial.
An editor needs to be careful not only with the examples, facts, or figures contained within a story, but also with the overall impression or takeaway it provides.
The Research Process
Let’s take a look at a related research process that produces most of the approved treatments we have today.
Clinical trials are privately or publicly funded research studies that involve testing an experimental treatment in volunteers. These trials generally happen in a series of four steps, known as “phases.” Each phase has a distinct purpose and is designed to help researchers answer specific questions. As with all medical research, editors should be mindful of the limitations of clinical trials. One important limitation that influences later-stage clinical trials is that they tend to be made up predominantly of white males. This can either directly or indirectly disadvantage members of underrepresented groups, particularly women, Black people, and other people of color.
General Questions for Reporters
- What level of research or clinical trial is it?
- Has it been peer-reviewed?
- What are the limitations of the study?
- Who are the study subjects, and what is their demographic makeup?
- What do outside experts think of it?
- Do the study authors have conflicts of interest?
Take the example of clinical trials on multiple myeloma, a kind of cancer that causes a buildup of cancer cells in bone marrow: Black Americans make up 20 percent of multiple-myeloma patients and are twice as likely as white patients to be diagnosed with the disease. Yet since 2003, Black participants have accounted for less than 5 percent of patients in multiple-myeloma trials.
How does this underrepresentation affect potential treatments? Let’s take a look at another example, involving asthma. Most asthma research has focused on people of white European descent, even though asthma has a higher prevalence among Black people and other people of color. Several studies also suggest that different ethnic groups — white Europeans included — have varying genetic mutations that affect how they respond to treatments. The problem is that the research focuses overwhelmingly on only those mutations that affect white Europeans and their descendants. A 2016 study in the journal Immunogenetics, for example, concluded that of all the recognized genetic mutations tied to asthma, just 5 percent apply to Black people. Were more late-stage clinical trials of asthma done to include Black people, researchers might come to learn of a different set of genetic mutations that are more strongly tied to asthma in Black people. Eventually, those data could lead to the creation of better-designed and more-efficacious treatments.
Besides trial demographics, editors should also pay attention to effect sizes, which allow reporters to describe the magnitude of a new treatment’s impact. For example, if a reporter says a new drug halves the risk of heart attack, be skeptical: That language implies that the drug has what’s known as an “effect size” of 50 percent, which is very high and could well warrant a “breakthrough” label. However, that is rarely the case; it’s more likely the reporter is conflating two types of risk — relative and absolute.
The chapter on statistics goes into greater detail on this, but you should know that absolute risk describes the real change a treatment might make in a patient’s life, while relative risk merely describes a treatment’s unspecified potential. Let’s pretend there’s a study involving two groups of 100 people. In the first group — our control — two people out of 100 have a heart attack. In the second group, the one receiving the proposed treatment, one person out of 100 has a heart attack.
It is mathematically true, then, that the number of heart attacks was halved, or decreased by 50 percent, from two to one. However, that is the reduction in relative risk. In absolute terms, the decrease was one person in 100, so the absolute risk reduction was one percentage point. In a population of 320 million people, that could still be significant (more than three million lives), but it’s definitely not half.
As a result, it’s generally better to talk about the change in absolute risk rather than relative risk.
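The arithmetic in the heart-attack example can be checked with a few lines of Python. This is a minimal sketch rather than a statistical tool: the function name `risk_summary` and its parameters are invented for illustration, and the counts are the hypothetical ones from the example (two heart attacks per 100 people in the control group, one per 100 in the treated group).

```python
from fractions import Fraction


def risk_summary(control_events, control_n, treated_events, treated_n):
    """Compare two groups using simple event counts.

    Hypothetical helper for illustration; the names are assumptions.
    """
    control_risk = Fraction(control_events, control_n)   # risk without treatment
    treated_risk = Fraction(treated_events, treated_n)   # risk with treatment
    absolute_reduction = control_risk - treated_risk     # plain difference in risk
    # Share of the baseline (control) risk that the treatment removes.
    relative_reduction = absolute_reduction / control_risk
    return {
        "control_risk": control_risk,
        "treated_risk": treated_risk,
        "absolute_reduction": absolute_reduction,
        "relative_reduction": relative_reduction,
    }


summary = risk_summary(2, 100, 1, 100)
relative_pct = float(summary["relative_reduction"]) * 100
absolute_points = float(summary["absolute_reduction"]) * 100
affected = int(320_000_000 * summary["absolute_reduction"])

print(f"Relative risk reduction: {relative_pct:.0f} percent")
print(f"Absolute risk reduction: {absolute_points:.0f} percentage point")
print(f"People affected in a population of 320 million: {affected:,}")
```

The same treatment produces both figures: 50 percent in relative terms and one percentage point in absolute terms. Which one a story leads with changes how dramatic the result sounds.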
Here’s another way of looking at it, from Health News Review, a watchdog group that was dedicated to reviewing the claims of health news stories and was run by Gary Schwitzer, an adjunct associate professor at the University of Minnesota’s School of Public Health. (The site shut down in 2018.) Imagine you get a “50-percent off” coupon that doesn’t specify what it can be used on. If the coupon can be used in, say, a jewelry store, the money you’d save could be in the hundreds or thousands of dollars. If it can be used only on snacks at a checkout counter, however, at most you’d save a few dollars. The coupon’s true value — what it can be used on — represents its absolute risk, while the 50-percent figure would represent the relative risk.
Relative risk is unhelpful in reporting because it could involve comparing two very different groups — sedentary people and active people, for example. Absolute risk, on the other hand, describes the likelihood that something will happen under specific conditions.
The Four Phases of Research Studies
It’s common for researchers to talk about these phases as if their definitions were broadly understood. While the public is unlikely to know the differences, you should, so that you can translate research-speak into something meaningful for your audiences.
Phase I: In Phase I, researchers are testing an intervention for the very first time, typically in a very small group (20 to 80 people). The primary goal of this research is merely to test the intervention’s safety and identify side effects, not to show whether the intervention works to treat a condition. In general, editors may want to discourage reporters from covering Phase I research, as most interventions in this stage will still fail before reaching the market.
Phase II: In this phase, which may, in rare cases, merit limited coverage, researchers test an intervention in a larger group (100 to 300 people) to determine the drug’s efficacy and further study its safety. I’ll draw a parallel here to the case-report level of peer-reviewed research: in Phase II, just as with case reports, results do not involve comparing the intervention with other treatments. As a result, these studies give no indication as to whether the drug is an improvement over any alternatives. What they can do, however, is show whether an intervention may be superior to no treatment. In 2010, the proportion of interventions that made it through this phase was 18 percent, although some estimates suggest that figure is increasing and rose to 31 percent in 2015.
Phase III: In Phase III, researchers give an intervention to 300 to 3,000 people. At this stage, researchers may also compare the experimental intervention with existing treatments, meaning that these studies can take the form of randomized controlled trials. It is generally at this point that regulators approve new treatments and make them available, making the Phase III trial stage the most pivotal phase of research. Still, one must not get caught up in any hype. Be sure that reporters covering Phase III trials thoroughly discuss any findings about harmful side effects, costs, or lackluster results.
Phase IV: After the drug has been released, this final phase is used to track the safety of the drug as it’s taken up by the general population.
Appropriately, most clinical-trial coverage focuses on Phase III. An example of appropriate coverage can be found in Biopharmadive’s reporting on the then-experimental depression treatment esketamine, a (now-approved) nasal spray developed by Johnson & Johnson.
For Biopharmadive’s story, the publishers used the headline “J&J’s ketamine-based antidepressant sees mixed results in Phase III.” Avoiding hype and fear-mongering, the headline provides the type of balance you want to see in medical stories.
That headline was an appropriate choice: The treatment appeared to help significantly curb the symptoms of depression in the Phase III study, which involved a large group of patients who did not respond to other treatments. However, another part of the trial, which involved testing the therapy in a more challenging group of older adults, fell short of showing a clear benefit for people receiving the treatment compared with people who did not get it.
Editors should be especially careful with headlines related to clinical trials. Sometimes, small tweaks can make a headline misleading, too sensational, or just plain wrong. For example, one might have been tempted to term the esketamine trial a “success” because some, but not all, of the elements were indeed successful. Since headlines carry outsize influence on readers’ perceptions, care must be taken not to overstate results.
Editors must also be mindful of potential conflicts of interest that may influence a study’s design, process, and outcomes.
Many clinical trials involving pharmaceutical treatments are designed and funded by the companies themselves, meaning journalists must treat the outcomes with extra caution. For example, Janssen Research & Development, the biotechnology company that designed and funded the esketamine study, is owned by Johnson & Johnson, maker of the drug that was tested. In the study’s conflict-of-interest disclosure section, which for this journal was located in a tab titled “article information,” two-thirds of the authors are listed as employees of Janssen. Although the study was well-designed, and some measures were taken to avoid undue bias, the potential for conflict of interest was inarguable and must be made clear to readers.
Often, reporters will turn to outside analysts, academics, or other researchers for perspectives on new drugs or treatments. This is smart and appropriate journalism, but in a field rife with collaborations, conflicts, and competition, beware. Analysts may be influenced when they stand to benefit from the approval of a particular treatment.
“Yes, some [analysts] may have doctorate degrees and medical degrees, and some may also have a solid understanding of the science supporting these projections,” said Randi Hernandez, oncology editor at Cancer Therapy Advisor. “Still, you can’t know their motivations” when they comment on something that could have implications for companies’ stock prices.
Although this may seem like obvious advice for editing, just make doubly sure reporters have fact-checked analysts’ statements, and encourage them to include relevant data, placing any quotes in the proper context.
Preprint Servers
Research studies and the publishing process are notoriously slow. Journals must evaluate submissions and then put them through the peer-review process and the publishing process. During that time, the researchers are usually forbidden to discuss their research, instead reserving publicity for when the study is published. In some cases, like during a public-health emergency such as the coronavirus pandemic, researchers may take to publishing their findings outside of the peer-review process. These findings are typically found in what are known as “preprint servers,” online repositories that house early studies and data associated with papers that have not yet been accepted by traditional academic journals. Rather than a thorough review process, preprints are typically checked only for plagiarism, although they may undergo other basic screening requirements.
Because preprints have not been thoroughly vetted, you should always take extra caution when reporters choose to cover them. Be sure to provide proper context for your audience.
Not surprisingly, preprint papers were the cause of some consternation during the pandemic, when researchers were working at breakneck speed to publish their findings.
“We’ve seen some crazy claims and predictions about things that might treat Covid-19,” Richard Sever, a co-founder of two of the most popular servers, bioRxiv and medRxiv, told Nature News. (Nature is one of the world’s largest publishers of peer-reviewed articles.)
One preprint study that attracted controversy was a paper written by Stanford researchers that suggested that the number of Covid-19 cases in Santa Clara County, California, was 50 to 85 times higher than the number of confirmed cases in the region. Published on the preprint server medRxiv, the findings were contested by public-health experts. A group of external reviewers from the Johns Hopkins Bloomberg School of Public Health — who came together specifically to address the flood of preprint coronavirus research — determined that the preprint lacked the evidence necessary to bolster such dramatic claims.