April 3, 2017

5 services to help researchers find free full text instantly & a quick assessment of effectiveness

As open access takes hold, the ability to quickly find free versions of articles becomes more and more useful and important. So what are the best ways to do so?

The scenario I address this time is this: you land on a journal article behind a paywall. You don't have access via your institution and need access in the next 5 minutes. What do you do? The extensions (typically Chrome, occasionally Firefox) listed below will instantly suss out whether there is a free version (for the public) sitting on the web somewhere and tell you where to download it within seconds. (There are also the methods in "5 Alternative ways to get scholarly material that don’t involve the library", but these are a bit more manual and take time, e.g. emailing the author or requesting via #icanhazpdf.)

I'm going to stick to mostly legal methods (so no sci-hub based methods) and to extensions and plugins that work for everyone, not those that rely only on institutional access or subscriptions. So I have left out services like Johan Tilstra's lean library ezaccess button or Trey Gordner's Koios extension (though that focuses on books, not articles). Nor have I included the old, aging standby LibX or extensions that append institutional proxy strings, because they do not have functions designed for finding open articles.

The methods are

1. Lazy Scholar button
2. Google Scholar button
3. Open Access button
4. Extensions based on https://www.oadoi.org/ service
5. Others (e.g. http://doai.io/, http://dissem.in/, reference managers)

Unfortunately, due to the highly distributed nature of open access articles and the lack of consistent standards for open access aggregators to work with, no method is 100% reliable at finding the full text.

I will briefly mention each of these methods and then end with a rough assessment of how effective they are.

In brief though, even a quick cursory test suggests that extensions like Google Scholar button or Lazy Scholar button, which rely on Google Scholar to automatically identify free full text, are much more effective than other methods such as oadoi.org or the methods used by Open Access button.

Some of it is simply due to Google Scholar's prowess at crawling and indexing the thousands of institutional repositories compared to services relying on open access aggregators like CORE and BASE which may be less capable due to various technical issues.

More fundamentally though, services like Open Access button and oadoi.org generally are unable (or unwilling?) to pick up the vast amounts of free full text sitting on ordinary web pages such as university-hosted pages, and for the small sample I tested they also do not surface the large amounts of free full text on ResearchGate and Academia.edu.

This is not purely a technical issue, as the refusal of such services to surface free full text on such sites may reflect an ethical decision not to show full text whose legal status is uncertain. For example, a recent paper reports that 40% of papers deposited on ResearchGate are publisher PDFs that violate the journals' copyright agreements.

This sets up an interesting decision to make for researchers and librarians.

1. Lazy Scholar

This is by far the most complicated extension on the list, but is also probably the most capable and in terms of chronology it is one of the first, launched circa 2014.

Among other functions, it can check whether your institution provides access to the full text, show various citation metrics and comments from commenting systems like PubMed Commons, create citations, and make some recommendations.


I've long lost track of the various APIs and web scraping Lazy Scholar does, but at certain points it checks http://doai.io/ (see #5), dissem.in and possibly even oadoi.org (see #4) for free full text.

Still, I suspect the most effective avenue it uses to find free full text is to run a Google Scholar search and scrape the link to the free PDF that appears (if available) in the Google Scholar result. (Note that it also scrapes the link created by the library's link resolver, if you have access to one, for access to articles behind paywalls.)

Interestingly, it was the first extension I know of to exploit Google Scholar this way to find full text. It was even available before the official Google Scholar button (see below) was launched.

Even more surprisingly, behind this extension is a one-man team, Colby Vorland, who is not associated with libraries or similar institutions in any capacity but is merely a researcher in nutrition science who came up with this to solve his own problems.

2. Google Scholar button

This is an official Chrome extension by Google themselves, launched in 2015. In typical Google fashion it's simple yet effective. It can be seen as a stripped-down version of Lazy Scholar, with its major function devoted to searching Google Scholar for free full text and then displaying it to the user.

How does it work? It exploits the fact that Google Scholar is very effective at finding full text, whether free or via institutional access. Below is an example of a result in Google Scholar.


Highlighted in yellow is the link to the free full text; "Find it@SMU Library" provides full text via the library link resolver.

But what happens if you don't start from Google Scholar, land on a page that is asking you to pay, and are too lazy to open another tab and search for the article in Google Scholar? Use the Google Scholar button instead.

On any page, you can click on the Google Scholar button extension and it will attempt to figure out the title of the article you are looking for, run the search in Google Scholar in the background and display:

a) the free full text (if any)
b) the link resolver link (if your library has a copy of the article)
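
To make the flow above concrete, here is a minimal sketch of the "figure out the title, then search" step. It is only an illustration under my own assumptions, not how the Google Scholar button actually works: I assume the page exposes the article title in a `citation_title` meta tag (many publisher pages do), and the names `TitleFinder` and `scholar_search_url` are made up for this example. It only builds the search URL; it does not fetch or scrape any results.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TitleFinder(HTMLParser):
    """Collects the first citation_title meta tag found in a page."""

    def __init__(self):
        super().__init__()
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop once a title has been found.
        if tag != "meta" or self.title is not None:
            return
        attr = dict(attrs)
        if attr.get("name") == "citation_title":
            self.title = attr.get("content")


def scholar_search_url(page_html):
    """Return a Google Scholar search URL for the page's article title, or None."""
    finder = TitleFinder()
    finder.feed(page_html)
    if not finder.title:
        return None
    # Quote the title so the search favours an exact match.
    query = urlencode({"q": '"' + finder.title + '"'})
    return "https://scholar.google.com/scholar?" + query
```

A real extension would then have to read the results page to pick out the free full text link and the link resolver link, which is the fragile part that this sketch deliberately leaves out.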


3. Open Access button 

This extension, started by a couple of students in late 2013, has attracted a ton of attention from the press.

If you look at the recent Innovations in Scholarly Communication survey, the Open Access button seems amazingly popular.

http://dashboard101innovations.silk.co/page/Access

This popularity could be a function of the media attention it has received, or perhaps a proportion of the respondents took "open access button" to mean open access in general as opposed to the extension itself.

In any case, I admit to not paying much attention to it initially because the earliest functions were about encouraging researchers to report whenever they ran into a paywall to highlight the issues of paywalls.

However, today it goes beyond that, as OA button has pointed out in a tweet.

So unlike most of the other extensions on this list, it doesn't just try to find free full text available online when you encounter a paywall. If it fails to find free full text, you can also request it, and the system will try to detect who the author is and send them a request for it, among other functions.

I'm not going to comment much on the latter functions as that's not the main purpose of this article.

But OAbutton has clarified that their method of automatically finding free full text on the web is their own special method that is different from systems like OAdoi.org that came later. I haven't had much time to dig into their system but it probably involves looking at material indexed in large OA aggregator systems like CORE, BASE or using their own systems to index full text etc.

4. Extensions based on https://oadoi.org/ service e.g. unpaywall

During Open Access Week 2016, the oadoi.org service was launched by Impactstory. The concept was simple: you would enter the DOI of an article, and it would resolve to a free full text version if available. It even provides an API you can build services on.
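
As a rough sketch of the "enter a DOI, get sent to a free version" idea, the snippet below cleans up a DOI and appends it to a resolver base URL. The URL pattern (base address followed by the DOI) is my assumption about how such resolvers are addressed, the function names are hypothetical, and the DOIs used are made-up placeholders. Nothing here calls the service or its API.

```python
import re

# Assumed URL pattern: resolver base address followed directly by the DOI.
RESOLVER_BASE = "https://oadoi.org/"

# Loose check for the usual "10.<registrant>/<suffix>" DOI shape.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def clean_doi(raw):
    """Strip common prefixes such as 'doi:' or 'https://doi.org/' from a DOI."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError("not a DOI: " + raw)
    return doi


def resolver_url(raw_doi, base=RESOLVER_BASE):
    """Build the address a reader would visit to be redirected to a free copy."""
    return base + clean_doi(raw_doi)
```

For example, `resolver_url("doi:10.1000/xyz123")` gives `https://oadoi.org/10.1000/xyz123`. The `base` parameter is there because similar services (see #5) appear to follow the same pattern.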

It sounds really neat but the difficulty lies I think in two areas.

Firstly, you need good coverage of all the open access articles out there for this to work reliably. The main subject repositories are relatively few in number and easy to capture, with workarounds if needed, but the institutional repositories are the tricky bit due to their heterogeneity and sheer numbers. You also need to cover open access journals for completeness' sake, I suppose, but if something is published in an open access journal, paywalls aren't an issue.

Secondly, if I feed you the DOI of a paywalled article and the only versions currently available in repositories are postprints or even preprints, you need a way to recognize that these are versions of the article you want.

As of Jan 2017, Crossref has started to allow the issuing of DOIs for preprints, while I assume postprints are meant to have the same DOIs as the version of record. In both cases, there needs to be a way to indicate that they are related, and it needs to be done consistently. Currently, I doubt this is done to any great extent.

In short, connecting a user from a paywalled version to an open access version is extremely tricky.

OAdoi states they do quite a bit to find full text.

https://oadoi.org/about
This looks impressive, but as you will see later it has many gaps in coverage.

In any case, quite a few services have started to leverage OAdoi, such as the Ex Libris SFX link resolver (Ex Libris is working on compatibility with the Alma link resolver), Zotero, etc.

The one that most researchers will use is probably unpaywall.


Once you install the extension, when you come to a paywall page the extension will attempt to find a free full text article version.


A colored tab appears at the extreme right of the page, and its color indicates the result.


This extension, which is in beta as of writing, seems to be taking off.

5. Others (e.g. http://doai.io/, http://dissem.in/, Reference managers?)

Besides oadoi, there are also services like http://doai.io/ by CAPSH (Committee for the Accessibility of Publications in Sciences and Humanities), which are similar in nature in that they try to detect free full text.

How effective are these tools at finding free full text?

I haven't had the time to do a full-blown scientific test, but after testing a little it's pretty obvious that, in terms of effectiveness, there are two tiers.

Basically, the ones that rely on Google Scholar, such as Lazy Scholar and Google Scholar button, are a lot more effective at finding full text and fall into the first tier, and everything else falls into the second tier.

Here's why.

Google Scholar is typically recognized as the biggest index of scholarly material and despite some issues with indexing repositories, it's by far the best in this area.

Other attempts at discovering free full text, such as oadoi.org or Open Access button, are generally far weaker in terms of coverage. Most of them typically do one or more of the following:

a) query Crossref for open access articles
b) rely on aggregators like CORE, BASE, etc. to cover repositories, particularly institutional repositories
c) specifically crawl larger, well-known repositories like arXiv, PubMed Central, SSRN, etc.
d) use DataCite etc. to find open data sets
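
The steps above amount to checking a series of sources in turn and stopping at the first hit. Here is a minimal sketch of that cascade, with in-memory tables standing in for the real services. Everything in it is hypothetical: the source names, the DOIs and the example.org URLs are made up, and I am not claiming this is how any of the services order or query their sources.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A source takes a DOI and returns a URL to free full text, or None.
Source = Callable[[str], Optional[str]]


def make_source(index: Dict[str, str]) -> Source:
    """Wrap an in-memory DOI -> URL table as a source (a stand-in for a real API)."""
    def lookup(doi: str) -> Optional[str]:
        return index.get(doi.lower())
    return lookup


def find_free_copy(doi: str, sources: List[Tuple[str, Source]]) -> Optional[Tuple[str, str]]:
    """Try each source in order and return (source name, URL) for the first hit."""
    for name, lookup in sources:
        url = lookup(doi)
        if url:
            return name, url
    return None


# Hypothetical data: the DOIs and URLs below are made up for illustration.
SOURCES = [
    ("publisher metadata", make_source({})),
    ("aggregator", make_source({"10.1000/a1": "https://repository.example.org/a1.pdf"})),
    ("subject repository", make_source({"10.1000/b2": "https://preprints.example.org/b2"})),
]
```

The point of the sketch is that a lookup like this can only ever return what one of its listed sources already knows about, so any gap in a source's coverage becomes a gap in the result.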

It's a heroic effort, but these approaches miss a lot that Google Scholar finds (except in the case of data sets of course), for the following reasons.

Firstly, as OADOI state themselves, a major weakness of such services currently is coverage of Green OA and in particular institutional repositories where they rely heavily on open access aggregators like BASE.

As I wrote in The open access aggregators challenge — how well do they identify free full text?, unfortunately BASE and its cousins tend to have problems with coverage of institutional repositories. The main subject repositories are relatively few in number and easy to work with, but the institutional repositories are tricky due to the heterogeneity of systems and the sheer numbers.

The problem isn't necessarily that the contents aren't indexed in them (though this can be so), but that, because of the lack of standards, there is no easy way to identify whether the records point to full text.

Also, as mentioned earlier, how can you easily tell if a paper sitting in a repository is a preprint or postprint of a paywalled article? While Crossref, I believe, does allow you to indicate that some papers are different versions of the final published version, this is seldom done.

In contrast, Google Scholar has worked hard on its algorithms to identify and group different "manifestations" of the same article and hence has almost no problem with this, though obviously its algorithms are not perfect.

Secondly, a lot of the material Google Scholar finds is not in repositories per se, but can be on all sorts of sites, including public-facing university web pages. These are invisible to most other methods, as they harvest only from defined sites such as repositories and journal sites, while Google Scholar will index articles as long as they seem scholarly, and certainly if they are on edu sites.
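
To give a feel for what grouping "manifestations" involves, here is a deliberately naive sketch that groups records by a normalized title. This is not Google Scholar's algorithm (which is not public and is surely far more sophisticated); it is just the simplest stand-in I can think of, and the function names and record fields are my own.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title):
    """Reduce a title to lowercase words so small formatting differences vanish."""
    text = unicodedata.normalize("NFKD", title)
    # Drop accents and other combining marks left over after decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Turn every run of punctuation or whitespace into a single space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(text.split())


def group_versions(records):
    """Group records (dicts with a 'title' field) that share a normalized title."""
    groups = defaultdict(list)
    for record in records:
        groups[title_key(record["title"])].append(record)
    return dict(groups)
```

Even this toy version shows the weakness of title matching: a preprint whose title changed before publication, or two different papers with the same title, would be grouped wrongly. That is part of why consistent DOI relationships between versions would help.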

Lastly, my sample tests show that many services such as OA button and OAdoi.org are not picking up papers freely available on ResearchGate or Academia.edu but Google Scholar is. Given that various studies are showing that they (ResearchGate in particular) are major sources of free full text, this makes a huge difference in the effectiveness at finding full text.
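
If a service does choose to leave such sites out, the decision can be as simple as a host filter applied to candidate links. The sketch below is only my guess at the general shape of such a policy, not a description of what OA button or oadoi.org actually do; the excluded list and function names are assumptions for illustration.

```python
from urllib.parse import urlparse

# Assumed policy: hosts whose copies a cautious service might choose not to surface.
EXCLUDED_HOSTS = {"researchgate.net", "academia.edu"}


def host_of(url):
    """Return the host name of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_excluded(url, excluded=EXCLUDED_HOSTS):
    """True if the URL is on an excluded host or one of its subdomains."""
    host = host_of(url)
    return any(host == h or host.endswith("." + h) for h in excluded)


def split_by_policy(urls):
    """Separate candidate full text links into (kept, dropped) lists."""
    kept = [u for u in urls if not is_excluded(u)]
    dropped = [u for u in urls if is_excluded(u)]
    return kept, dropped
```

A tool that surfaces whatever Google Scholar finds effectively runs with an empty exclusion list, which is one way to think about the gap between the two tiers.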

Defenders of OAbutton and oadoi.org are likely to say that the problem of institutional repositories is known and they are working on it. Moreover, I speculate that with the possible rise of more centralized preprint repositories like bioRxiv, SocArXiv, etc., coverage of institutional repositories might become less important.

More critically, defenders will say it is correct and proper for their services not to pick up free full text from non-repository sites, or even ResearchGate or Academia.edu, because the legality of the "free" articles is in question. (Technically, it probably isn't very hard to address this. After all, the one-man team behind Lazy Scholar was leveraging Google Scholar even before the official Google Scholar button was launched!)

For example, we know from a recent paper that 40% of papers deposited on ResearchGate are publisher PDFs that violate the journals' copyright agreements.

Conclusion

The way I see it, the amount of free full text you can find via Google Scholar is probably the upper bound on "legal" free papers and what you can find using OADOI/OAbutton etc will be the lower bound.

That said, most researchers are pragmatic and just want access to the full text. For such users, Google Scholar button or Lazy Scholar button (which has similar functionality) will be most effective. While many researchers might shy away from outright piracy like using sci-hub, most would probably be okay with using Google Scholar button to find full text, since they use Google Scholar daily anyway.

What do you think? Should services like oadoi.org or Open Access button aim to maximise the chance of finding free full text even if not all of it is legal? Or is that missing the point of these extensions?

What should we recommend as librarians to our researchers? Should we recommend Google Scholar button/ Lazy Scholar as it is most effective even though we know some of what is picked up is not quite kosher? Or do we promote open access button/ unpaywall based on Oadoi?

Author: Aaron Tay
Twitter: <@aarontay>
Source: <https://musingsaboutlibrarianship.blogspot.com.co/>
