Image: Public Domain frame from Four Horsemen of the Apocalypse (1921)
More than 3 years ago, a group of researchers, publishers and representatives of academic associations issued a set of recommendations called the San Francisco Declaration on Research Assessment (DORA). The declaration aimed to stop the use of the journal impact factor as a proxy for the quality of individual papers and their authors. To date it has been signed by 825 research organizations and more than 12 thousand individuals.
In fact, DORA presented fairly obvious arguments that were well known before its publication. Its claims are simple, logical consequences of the journal impact factor's definition: the impact factor was designed neither as a way of evaluating academic papers nor their authors. Yet, 3 years later, the research communication system still does not follow the DORA recommendations, which shows that it is an environment where conformism can resist facts for a long time.
The impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years. It therefore says something about a journal's recent performance (and even in this role it has several downsides), while it says absolutely nothing about the quality of any individual paper published in that journal. It also has nothing to do with the quality of research conducted by the individual authors publishing in this or that journal. It seems that you do not need a PhD to understand that. However, a lot of PhDs chairing promotion committees or funding agencies completely ignore these simple truths.
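The definition above is easy to state as a small sketch. The function and the numbers below are purely hypothetical, chosen only to illustrate the arithmetic:

```python
def impact_factor(citations_this_year, citable_items):
    """Impact factor for year Y, per the definition above:
    citations received in year Y to papers published in Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    return citations_this_year / citable_items

# A hypothetical journal that published 200 papers over the two
# preceding years, which collected 900 citations this year:
print(round(impact_factor(900, 200), 2))  # 4.5
```

Note that nothing in this calculation refers to any single paper: two journals with identical impact factors can have completely different citation profiles.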
The origin of quantitative measures
The increased need for quantitative performance indicators for police officers, firefighters, health care workers and public servants dates back to the early 1980s. It is rooted in so-called New Public Management, a concept that brought the ideological principles of neoliberalism to the public sector. In short, New Public Management assumes that the public sector should be managed the same way private entities are, with performance-oriented funding and well-defined, 'objective' criteria of effectiveness. This is meant to trigger competition between public institutions (and between their departments, and eventually between their regular workers).
At the heart of New Public Management lies managerialism, and with it the rise of a professional group of 'public sector managers' who are supposed to be able to lead any kind of public institution, even when they are not very familiar with the everyday work of its regular staff. The second key feature of this concept is the external audit, very often run by private agencies. Neither professional managers nor private auditors can get into the nuanced details of the work of an analysed institution and its personnel, so they need a simple and easy way to judge them. Quantitative indicators, such as counts of easily defined events (arrests, interventions, served customers etc.), work best in this role.
New Public Management spread across the public sector very quickly, and it is now widespread in the academic research sector globally. A good illustration of its relative novelty in academia is the fact that the first international ranking of universities (and rankings are one of the key instruments of New Public Management) was published as recently as 2003.
Key performance indicator for fostering human knowledge?
The idea that contribution to knowledge might be measured by the number of citations a piece of research scores is quite old. However, for a very long time there was no data set that would make a large-scale analysis of this kind possible. Every study was based on its own small sample of references from published articles, manually catalogued and counted. The number of such studies was initially limited, and the citation-oriented perspective on the science communication process was far from dominant. The real turn came with the birth of the Institute for Scientific Information, founded in the 1960s by Eugene Garfield. The institute started to collect information on citations on a regular basis, creating a huge and constantly growing database – the Science Citation Index. As a consequence, citation analysis became more popular. Yet many researchers have expressed doubts about its accuracy as a way to indicate the real impact of research. These critiques usually raise the point that a paper might be cited for various reasons, including that it is incorrect, controversial or simply dealing with a temporarily fashionable topic. It was also noted quite early on that some kinds of papers, e.g. methods papers and review articles, are cited more than original research, in a way that does not reflect their importance for scientific advancement. However, by the time managers got into academic research, the Science Citation Index was already well developed and there were few alternatives to citation counts as a key performance indicator for the research sector.
And in fact, even the bare citation count of an individual paper turned out to be problematic in this role. The reason is time. The average citation window differs between disciplines, and in some cases it is very long. The first citations may come months after publication, and the citation peak may occur even later. What is more, a large number of papers are never cited at all. This causes problems with treating the citation count as a performance indicator. Crucially, young researchers, who are most often subjected to evaluation processes, usually have only fresh papers in their portfolios, so it is often impossible to use this measure to evaluate their output. It is different with the impact factor of the journal that publishes their work. That value is available immediately after the publishing decision, usually months before formal publication, and sometimes years before it becomes clear how big the actual impact of a paper was. Thus the IF became a convenient proxy for research quality for evaluators, managers, funders, governments and ranking authors. A journal's impact factor is calculated from individual papers' citation counts by the same Institute for Scientific Information. The growing importance of this measure led in 1992 to the acquisition of ISI by the Thomson Corporation, today the world's largest information company (now called Thomson Reuters).
The race for impact factors
In Poland, where I live, the Ministry of Science and Higher Education is the main and virtually only funder of academic research. For several years now the Ministry has been increasing the amounts of money available to researchers through grants and other competitive forms of funding. The Ministry's official documents contain the strange concept of a "summarized impact factor", calculated for an individual researcher by simply summing up the impact factors of the journals in which they published. The "summarized impact factor" is used in evaluation processes, when a researcher applies for grant funding or for promotion.
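The calculation described above is as blunt as it sounds. A minimal sketch, with hypothetical journal names and values, shows that nothing about the papers themselves enters the sum:

```python
# A researcher's publication list: (paper, journal, journal impact factor).
# All names and numbers are hypothetical.
publications = [
    ("paper on X", "Journal A", 9.2),
    ("paper on Y", "Journal B", 1.4),
    ("paper on Z", "Journal A", 9.2),
]

# The "summarized impact factor" is just the sum of the journals' IFs;
# the content and actual citations of each paper play no role.
summarized_if = sum(jif for _, _, jif in publications)
print(round(summarized_if, 1))  # 19.8
```

Note that publishing the same moderately important paper twice in "Journal A" contributes twice as much to the score as one groundbreaking paper in "Journal B".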
Sounds reasonable? Well, the impact factor is a numeric value, so it can be summed, multiplied or divided by anything you want, but the question is: to what end? It is pretty obvious to me that one researcher can publish a moderately important piece of research in a high impact journal, while another publishes a groundbreaking paper in a less fashionable journal. It happens, right? But now in Poland, and to various extents in other parts of the world, researchers have a strong incentive to care about the name of the journal that will publish their work.
Wider adoption of the journal impact factor as a proxy for the quality of a research paper sent publishers racing to lift their average citation counts per paper. This led to a well-known form of publisher misconduct: over-selectivity. Although the majority of research papers are never cited at all, every journal tries to minimize the number of papers it publishes that will perform poorly in citation terms. The problem, however, is that editors are not clairvoyants: they have to rely on their own vision of what will interest an academic audience, and they are not always right. They regularly reject sound papers simply because those papers look less sexy, with small chances of earning a high citation score. This made some additional sense in the print era, but not in an on-line publishing system. The most notable case of senseless rejection concerns the first paper on graphene, a discovery later awarded a Nobel Prize, which was rejected over a "lack of scientific meaning". One might see this as a minor issue, since the paper was eventually published elsewhere and graphene research is now flourishing. But higher selectivity means more work for editorial staff, more work means higher costs for publishers, and higher costs mean higher prices for consumers. The race for impact factors makes the whole publishing system more expensive and slower, and thus less efficient. The need to publish heavily cited papers may also influence editorial policies: for example, it "pays off" to publish more reviews than original research in order to lift a journal's impact factor.
Of course, the pursuit of impact factors alters the behavior of authors as well. Since they are evaluated on the basis of impact factor, they tend to publish in "high impact" journals, even when those journals' audiences are very limited. Authors choose toll access journals, whose readership is restricted to subscribers and whose papers stay pay-walled away from a wider audience, over open access ones, which sounds crazy given that free-to-read journals exist. The main reason is, again, the impact factor.
Another problem is that this whole competition is now managed by a single private company. Thomson Reuters regularly removes journals from its rankings, dumping them to the very bottom of the publishing hierarchy. These decisions are always final and cannot be disputed. I pointed out some time ago that this policy unfairly targets small journals and niche titles. Moreover, by delisting the most important journals in a niche field, it can hasten the decline of research areas that might otherwise thrive.
And finally, the Science Citation Index, now belonging to Thomson Reuters, is a closed database, which makes the re-use of its data and the verification of findings harder.
However, every concept has its prime and its fall, and recent months have shown that the massive critique arising around the extensive use of the impact factor as an evaluation criterion is taking effect.
On July 5th 2016, a group of researchers submitted a preprint to bioRxiv claiming that 65-75% of articles have fewer citations than indicated by the impact factor of their journal. This comes as no surprise to me: citation distributions are known to be highly skewed, and it is quite obvious that outliers strongly influence the mean. The practical consequence, however, is that when the IF is used to evaluate academic authors, 3/4 of them are rewarded for somebody else's successes. I think this was already known before the preprint appeared; this paper, however, became the straw that broke the camel's back.
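The skewness argument above is easy to demonstrate with a toy example (the citation counts below are invented for illustration, not taken from the preprint):

```python
# A toy skewed citation distribution: most papers collect few
# citations, one collects very many.
citations = [0, 0, 1, 1, 1, 2, 2, 3, 4, 40]

# The impact-factor-style average is pulled up by the single outlier...
mean = sum(citations) / len(citations)
print(mean)  # 5.4

# ...so most individual papers sit below the journal's "average" paper.
below_mean = sum(1 for c in citations if c < mean)
print(below_mean)  # 9 of the 10 papers fall below the mean
```

In this sketch 90% of the papers are credited, via the journal average, with an impact they never had, which is exactly the mechanism behind the preprint's 65-75% figure.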
Six days after the preprint appeared on-line, the journals of the American Society for Microbiology published an editorial announcing that they were removing impact factor information from their websites and other marketing channels. This was later followed by a Nature editorial claiming that it is "time to remodel the journal impact factor".
It seems that the number of impact factor advocates has decreased to near zero. A real time of change may be ahead of us. What will a "post-impact factor" world look like? I hope that scholars will no longer be forced to choose between journals that guarantee career advancement and open access ones, and that one of the major barriers to fully open research communication will be overcome.