WAME Listserve Discussions

ORI and Self-plagiarism

September 12, 2007 to September 13, 2007

The latest issue of the US Office of Research Integrity’s (ORI) newsletter has a short article titled “ORI Retains Its Working Definition of Plagiarism Under New Regulation,” by John Dahlberg, Director, Division of Investigative Oversight, ORI. Given that there has been considerable discussion on the topic of self-plagiarism on this list, I thought that the following section of the article would be of interest:

“ORI often receives allegations of plagiarism that involve efforts by scientists to publish the same data in more than one journal article. Assuming that the duplicated figures represent the same experiment and are labeled the same in both cases (if not, possible falsification of data makes the allegation significantly more serious), this so-called ‘self-plagiarism’ does not meet the PHS research misconduct standard. However, once again, ORI notes that this behavior violates the rules of most journals and is considered inappropriate by most institutions. In these cases, ORI will notify the institution(s) from which the duplicate publications/grants originated, being careful to note that ORI had no direct interest in the matter.”

Personally, I am a little disappointed that in the eyes of ORI these cases are still not considered scientific misconduct. I suppose that if they were to be classified as such, ORI’s workload would probably triple or quadruple. Anyway, “c’est la vie.”

The entire newsletter can be viewed at http://ori.hhs.gov/documents/newsletters/vol15_no4.pdf.

Miguel Roig
______________________________
Again, I think it is very important to distinguish between publishing "the same data" and publishing the same analysis of the same data.

ADDHealth, as only one obvious example, is a huge longitudinal dataset of US adolescents that is publicly available for analysis. Hundreds of people analyze those data. The issue isn't the data, but the way the data are used. Specifically, different analyses of even the same variables can provide answers to substantially different questions that may or may not be appropriately combined into a single paper.

I would agree that presenting duplicate analyses in two papers is problematic. But it's not an issue of "data".

Nancy Darling
______________________________
I absolutely agree that not all forms of self-plagiarism are problematic and you provide great examples of such instances where the use of the same data is totally acceptable. But, as recent WAME exchanges show, the real problem cases occur when, for example, authors of duplicates fail to disclose the earlier publication, thereby misleading editors and readers about the true origin of the 'new' data. I imagine that, in some cases, the offending authors simply don't know any better. But, in cases where they should know better (I guess that the real challenge lies in establishing an author's intentions, state of mind, etc), this form of fraudulent behavior is, in my opinion, analogous to data falsification. That is why I am disappointed.

Miguel Roig
______________________________
In self-plagiarism I see three issues:

A. misleading re-presentation of old observations as if they were "independent" new observations, which can poison the well of the collective enterprise of science, a sin against humanity.

B. misleading re-presentation of old writing (double-dipping from the same personal well), resulting in an unfair advantage to the double-dippers.

C. Abuse of others: copyright violations in some cases, if a journal owns the copyright on the original text (though copyright doesn't cover the data themselves). Worse still: wasting the money and, above all, the time of those who end up reading the same thing rehashed.

This exchange suggests several rubrics of self-plagiarism.

1. Crypto self-plagiarism. Merton gives a case of a scholar who had forgotten he had published a paper and later wrote the same thing. Surely this is exceedingly rare. This is [B,C] and might be [A] if data are involved.

2. Self-plagiarism with intent to deceive. Re-publication of the same data with intent to get another or better impact, concealing the source, etc. [A, B, C]. This I see as misconduct.

3. Self-plagiarism of the sloppy kind. [A] chiefly and sometimes [C], I would think. But B and C apply generically to any kind of sloppiness. Old data are re-presented alone or folded into new data without clear advertisement that this is what has happened. Essentially self-deception and/or sloppy scholarship. The crime is statistical: a loss of independence of observations. It is akin, I think, to pooling a set of paired observations with a mixed set of unpaired observations, and running an unpaired t-test on the larger set. This sort of sloppy pooling of data happens, I'm afraid, all the time, and 'shades off into' the kind of disputes about multiple comparisons, the interpretation and presentation of p values, failure to report the test used, whether SEM or STDEV is used, or even the number of observations, and eventually into the disputes one hears between the frequentist and Bayesian statisticians. I think this is the kind that Miguel is disappointed in, but I submit that if ORI were to call this misconduct, it would be hard to justify not calling sloppy statistics misconduct. This is common in science, I suspect. If this were the law or medicine, it would be called lack of due diligence, malpractice, etc. Whether it should be viewed as such in science, including biomedical science, seems to depend upon one's philosophy of science, and that means getting into the kind of subtle debate engaged by the frequentists and the Bayesians.

4. Self-reference, in which ideas, including data, are re-used, re-analyzed, re-considered, with explicit citation of the original publication in the case of experimental observations, or even without citation (in some cases) if involving re-use of other forms of text. Not [A, B or C].
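
The t-test analogy in item 3 can be made concrete with a small sketch. The numbers below are invented purely for illustration, and the two helper functions are hypothetical names, not taken from any package. The sketch only shows that the same ten measurements give very different test statistics depending on whether the dependence between observations is respected.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for paired observations (mean within-pair difference)."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

def unpaired_t(group_a, group_b):
    """Student t statistic treating two equal-sized groups as independent."""
    n = len(group_a)
    se = math.sqrt((statistics.variance(group_a) + statistics.variance(group_b)) / n)
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se

# Hypothetical measurements on the same five subjects, before and after.
before = [10, 12, 14, 16, 18]
after = [11, 13, 16, 17, 20]

t_paired = paired_t(before, after)      # respects the pairing
t_unpaired = unpaired_t(before, after)  # wrongly assumes independence

print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

In this made-up case, ignoring the pairing hides a consistent effect. Counting re-presented observations as independent errs in the opposite direction, but the underlying fault is the same: the test's independence assumption does not match how the data arose.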

While I agree with Miguel on seeing a need to reduce [3], I would rather see this category treated, along with mild cases of patchwriting plagiarism, as "errata": professional errors that are embarrassing but fall short of misconduct. I think we would make more progress in reducing sloppy scholarship of all kinds if we de-penalized sloppiness, but spent more energy on training scientists to be more skillful in handling data and texts.

John Rodgers
______________________________
Just to clarify, although my interest in plagiarism and my comments on this list have mostly focused on authors' re-use of their own and of others' text, there is no doubt in my mind that, within the sciences, it is the plagiarism of data that is the more serious ethical breach. In my view, there is not much difference between falsified data that have made it into the published record and re-used data that are presented and published as if they were new and independent, regardless of how they have been analyzed relative to their first appearance in a publication. For example, consider a scenario where a duplicate paper that reports data derived from a phase III trial is published and remains undetected indefinitely. What happens when these undetected, duplicate data become part of a meta-analysis? In terms of their potentially negative consequences, how are these duplicate data different from their falsified or fabricated counterparts? In all 3 cases, the 'bad' data can conceivably lead others to draw false conclusions by overstating (or understating) the nature of the effect under study. The fact is that previously published data that are re-published as new (with no reference to their earlier publication), even when analyzed differently, have the potential to undermine the integrity of the literature and to lead to health policies (eg, approval of a new device or therapy) that can place people's lives at risk.
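
The meta-analysis scenario can be sketched numerically. The following is a minimal illustration, assuming a simple fixed-effect (inverse-variance) pooling and entirely invented trial results; `fixed_effect_pool` is a hypothetical helper written for this example, not a function from any library.

```python
import math

def fixed_effect_pool(studies):
    """Inverse-variance fixed-effect pooling of (effect, standard_error) pairs."""
    weights = [1.0 / se ** 2 for _, se in studies]
    total = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical trials: (effect estimate, standard error). Invented numbers.
trials = [(0.10, 0.20), (0.05, 0.20), (0.45, 0.20)]

# The literature as it should be counted: three independent trials.
honest = fixed_effect_pool(trials)

# The third trial was published twice and both copies are pooled as independent.
with_duplicate = fixed_effect_pool(trials + [trials[2]])

for label, (effect, se) in [("honest", honest), ("with duplicate", with_duplicate)]:
    print(f"{label}: effect = {effect:.4f}, SE = {se:.4f}, z = {effect / se:.2f}")
```

Under these assumed numbers, counting the duplicated trial twice both shifts the pooled estimate toward that trial and shrinks the standard error, so a result that was not significant at the conventional 5% level appears significant.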

I am aware that in some areas (eg, surgery!) various forms of self-plagiarism are more prevalent than in other areas of biomedicine. But it may be that most cases of this malpractice are subtle or much more complex than simply reporting the exact same data ('sloppy pooling of data'), as the scenario above illustrates. Perhaps their potentially negative consequences are a rarity, but aren't fabrication and falsification also rare occurrences? In sum, to characterize all forms of self-plagiarism as not rising to the level of scientific misconduct... what can I say? I am disappointed.

Miguel
