This post has been written by Liz Stovold, Research Data Support Manager and Information Specialist, Cochrane Airways.
What is Figshare?
Figshare provides the infrastructure for the St George’s Research Data Repository. The repository facilitates the discovery, storage, citing and sharing of research data produced at St George’s. It is possible to store and share a range of research outputs in the repository including datasets, posters, presentations, reports, figures, and data management plans. Each item that is published via the repository receives a Digital Object Identifier (DOI) which makes it easy to cite, share and promote your work.
What is a collection?
One of the features of Figshare is the ability to create a citable collection of individual related items. You can choose to publish a collection publicly, or opt to keep it private. Collections can be added to over time and republished as they are updated with new items. There are several advantages to using collections, such as the ability to group themed research outputs together in one place, and to showcase a portfolio of work.
Here at SGUL, Cochrane Airways – a research group based in the Population Health Research Institute – decided to create a collection of the posters and presentations that they have produced over a number of years. A Figshare collection enables the Group to showcase and cite their research dissemination activities and share them with funders and other stakeholders. It also provides them with one place to store these outputs instead of saving them across a variety of shared and personal drives.
What is a ‘project’?
A Figshare ‘project’ also enables researchers to group together related items, but it differs from a collection in that it allows multiple collaborators to contribute and to add notes and comments. You can choose to make your project public or keep it private. The project itself doesn’t have a DOI, but individual items within a project can have their own DOIs. A project can contain a mix of publicly available data and private data visible only to the project collaborators.
Cochrane Airways are piloting a Figshare project to store, share and publish reports and other documents that have been produced as part of their priority setting work. A project hosted on Figshare allows them to collate the output of their ongoing work, share documents within their group, and publish documents with a DOI as and when needed.
Could a collection or project in Figshare be useful for you or your team? Contact the SGUL RDM Service at researchdata@sgul.ac.uk to discuss your needs, or see SGUL Research Data Management for more general information and guidance.
This post has been written by Liz Stovold, Research Data Support Manager and Information Specialist, Cochrane Airways.
A data management plan (DMP) is an important part of a research project and many funders require a DMP as part of a research proposal. A DMP will typically cover issues such as data collection, format, storage, security, documentation, discoverability, reuse, sharing, retention and preservation. Thinking through these issues before embarking on your research will help to improve the organisation of your data throughout the lifecycle of your research and save you time in the long run.
To help you with writing your data management plan, St George’s subscribes to DMPonline – a tool provided by the Digital Curation Centre (DCC). To access DMPonline you simply need to log in with your St George’s credentials.
From the dashboard, click on ‘create plan’ and off you go!
Detailed guidance is available in the help tab, together with links to a wealth of resources on data management planning and examples of data management plans. You can also look at publicly shared DMPs from the reference tab.
Using DMPonline to write your DMP offers a number of benefits:
access to funder-specific templates
built-in guidance for each section of the plan
the ability to invite your collaborators to join the plan
the option to add comments for your collaborators
the option to request feedback on your plan from the SGUL Research Data Management Service
export of your plan in a variety of formats, including MS Word and PDF
the option to keep your plan private, share it with SGUL DMPonline users, or share it publicly
The St George’s Research Data Repository is a digital archive for discovering, storing, sharing and preserving research data produced at St George’s. Other research outputs such as posters, presentations, protocols, reports and software/code can also be shared in the repository, allowing researchers to get credit for a wider range of research outputs. Every output shared receives a DOI, making it more findable and citable.
The repository is managed by the St George’s Research Data Management Service and is powered by figshare. Figshare recently improved some of the system’s functionality. In this post we’ll outline two of these changes and what they might mean for researchers:
changes to confidential data, and
linking data with their associated publications
Changes to confidential data
The confidential data feature is now referred to as ‘permanent embargo’. This change is retrospective and all datasets that were previously published as ‘confidential’ are now ‘under permanent embargo’.
This is mostly a change in name. The function works in exactly the same way as confidential data used to. Researchers can publish a description of the data they possess. The data itself is not published. Instead, we’ll provide an email address for external users to request access to the data. This feature is useful when anonymised data cannot be made publicly available, but they can be shared under controlled access conditions.
To demonstrate how this works we can look at this dataset (shown in part below) which supports the peer-reviewed publication, “Weekend and weekday associations between the residential built environment and physical activity: findings from the ENABLE-London Study.”
Where researchers will see a change is in how they apply a permanent embargo to a dataset. When uploading a dataset for publication, you will need to go to the Embargo section of the form and select ‘Permanent’ from the dropdown menu (as shown in the image below).
Once this is selected, apply the embargo to the files only and then add a reason for the file being under embargo (as shown below).
Linking data with their associated publication
For data supporting a publication, researchers can now more prominently link the data with their associated publication. This will allow users to find the main publication related to a dataset easily, enhancing transparency and increasing the visibility of your work. This dataset shows how data and their associated publication can be linked (see image below).
This information can only be added once the article is public and has a DOI.
To do this, you will need to include the title of the published paper and the paper’s DOI in the file upload form, as shown below.
If you do not have this information when first publishing the dataset, that’s fine. Simply leave these fields blank. You can add this information later once the paper is public – even after the dataset has been published. This will not generate a new version of the dataset.
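If you manage many items and prefer to script your updates, the same link can in principle also be set through the figshare API rather than the web form. The sketch below (in Python) is illustrative only: the endpoint, the resource_title and resource_doi field names, and the token handling are assumptions based on the public figshare API, so please check the current figshare documentation, or contact us, before relying on it.

```python
# Illustrative sketch only - endpoint and field names are assumptions based on
# the public figshare API; check the current documentation before use.
import requests

FIGSHARE_API = "https://api.figshare.com/v2"
TOKEN = "YOUR_PERSONAL_TOKEN"   # a personal token generated in your figshare account
ARTICLE_ID = 123456             # hypothetical ID of the dataset record to update

payload = {
    # Title and DOI of the published paper that the dataset supports
    "resource_title": "Weekend and weekday associations between the residential "
                      "built environment and physical activity",
    "resource_doi": "10.1234/example-doi",  # placeholder DOI
}

response = requests.put(
    f"{FIGSHARE_API}/account/articles/{ARTICLE_ID}",
    json=payload,
    headers={"Authorization": f"token {TOKEN}"},
)
response.raise_for_status()
print("Linked publication details updated for item", ARTICLE_ID)
```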
Our guidance
The repository guidance on our website has been updated to reflect these changes.
Get in touch
If you have any questions about these changes, or you’d like to request a demo of the data repository for your research group, please email the SGUL RDM Service at researchdata@sgul.ac.uk. We’d be happy to help you.
This week (October 21–27, 2019) is Open Access Week, an international event celebrating and promoting openness in research.
In keeping with this year’s theme, Open for Whom? Equity in Open Knowledge, this blogpost reflects on the public benefits of open data and the current challenges and opportunities. We’re using the Library’s Twitter account (@sgullibrary) to retweet interesting articles and blogposts all this week.
Open for whom?
This week the international research community is celebrating Open Access Week by reflecting on equity in open knowledge, enabling inclusive and diverse conversations around a single question: “open for whom?” Today’s blog post focuses specifically on open research data. UK Research and Innovation (UKRI) state in their Common Principles on Data Policy that:
Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner.
But who exactly does open research data benefit? We often speak about the benefits of open data to research and innovation:
enabling transparency
promoting reproducibility
boosting opportunities for collaboration
enhancing opportunities for innovation
reducing inefficiencies in research
The public ultimately benefit from open research data but are often treated as beneficiaries and not active, engaged partners.
This year’s theme asked me to challenge an assumption that open research data are for (and used primarily by) scientific/technical specialists working “in the public interest”, rather than the public themselves. A noble endeavour, I thought. So off I set…
Who is the public?
At the very start, I faced a conundrum – who exactly is the public? The National Co-ordinating Centre for Public Engagement (NCCPE) helped ‘define the territory’. The short answer is everyone. Anyone can be a part of the range of groups that make up the public.
Source: The National Co-ordinating Centre for Public Engagement
Non-governmental organisations, social enterprises, health and well-being agencies, local authorities, strategic bodies and community, cultural and special interest groups all comprise members of the public with an interest in accessing data to inform decisions that will benefit their group.
Releasing raw data in ways that make the data easy to find, access, understand and reuse helps maximise the potential benefits of research data across the social spectrum. It should be easy to discover what research data are available and how that data can be accessed. When released, data should be in open formats so that anyone can access them, not just a select or privileged few possessing expensive, proprietary software. Data should also be shared with sufficient information about how they were created, how they should be understood and how to reuse them meaningfully and responsibly. Finally, data should always be shared under licences which tell people what they can do with them. These principles of data management and sharing, known as the FAIR principles (making data Findable, Accessible, Interoperable and Reusable), enable maximum reuse of research data.
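To make this concrete, here is a minimal, hypothetical sketch (in Python) of what sharing data in an open format with basic documentation can look like in practice. The file names, variables and licence are invented examples, not a prescription.

```python
# A minimal, illustrative example of saving data in an open format (CSV) with a
# plain-text data dictionary and licence statement. All names are hypothetical.
import pandas as pd

# A small, made-up dataset standing in for real study data
data = pd.DataFrame({
    "participant_id": ["P001", "P002", "P003"],
    "steps_per_day": [8250, 10412, 6730],
    "collection_date": ["2019-05-01", "2019-05-01", "2019-05-02"],
})

# Save in an open, software-neutral format rather than a proprietary one
data.to_csv("daily_steps.csv", index=False)

# Provide a plain-text data dictionary so others can understand and reuse the data
data_dictionary = """daily_steps.csv - data dictionary
participant_id  : anonymised participant identifier
steps_per_day   : total daily step count from a wrist-worn accelerometer
collection_date : date of collection, ISO 8601 (YYYY-MM-DD)
Licence: CC BY 4.0 - tells reusers what they can do with the data
"""
with open("README.txt", "w") as readme:
    readme.write(data_dictionary)
```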
Measured voices
It’s here that a measured voice within me started whispering… and I listened carefully.
Is this really enough? This still has the potential to get messy. Very messy. Especially if we’re talking about health and medical data derived from human beings, which can be sensitive and which we have taken responsibility for protecting.
In the fallout of various data scandals, including scandals about the data used to train artificial intelligence, organisations everywhere are scrambling to restore public trust in the way we handle and use data. Part of restoring that trust is in the transparency offered by open data. Another aspect of restoring trust is in safeguarding the data that people provide us with and using that data responsibly, in ways individuals have consented to.
This tension between openness and our professional responsibilities is recognised in the UKRI’s data policy as well:
UKRI recognises that there are legal, ethical and commercial constraints on release of research data. To ensure that the research process is not damaged by inappropriate release of data, research organisation policies and practices should ensure that these are considered at all stages in the research process.
This is a tension we are constantly negotiating given the kinds of data that we handle at St George’s.
Data ethics
A new field of applied ethics, called data ethics, gives us a useful framework for exploring and responding to legal and moral issues related to data collection, processing, sharing and reuse. The Open Data Institute has developed the Data Ethics Canvas to help organisations identify and manage ethical issues related to data. The UK Department for Digital, Culture, Media and Sport also provides a Data Ethics Framework to guide the use of data in the public sector.
Being responsible in our data sharing means that a large amount of data produced from human participants is only made available to other researchers on request. This takes me right back to where I started, though with the caveat that it might be particularly relevant for health and medical research: an assumption that open research data are for (and used primarily by) scientific/technical specialists working “in the public interest”, rather than the public themselves.
But maybe there’s a middle ground for health and medical data derived from human participants? Maybe there are possibilities for us to create meaningful and lasting partnerships with ‘the public’ to realise the public benefits of data? The UK Biobank engages very closely with their participants, but they are still participants. I wonder if there are examples out there of projects where participants are also decision-makers about their data. Or examples of projects that have formed collaborations with civil society and/or public sector groups to realise the greater benefits of data. It would be nice to see examples of initiatives like these to use as a springboard for wider conversation.
If you are interested in receiving updates from the Library on all things open access, open data and scholarly research communications, you can subscribe to the Library Blog using the Follow button or click here for further posts from us.
Libraries Week takes place from 7th to 12th October 2019. This year’s campaign is focused on celebrating the role of libraries in the digital world. Over the course of the week we’ll be introducing you to different teams within the Library and exploring how they use technology to support our community.
Today’s post features a contribution from our Research Support Team and will be highlighting:
How the Library supports our researchers with making their publications and data findable and accessible online so they can be used by others
How we work to preserve these important digital research assets for the future.
So how does research take place?
This diagram gives a bird’s-eye view of what researchers are doing at various stages of their work – how ideas are tested, what is recorded, and how results are written up and shared.
Once shared, the research can be used by others – for example, other researchers, policy makers and health professionals – to further medical knowledge and clinical practice.
How is the Library involved in the research process?
The Library is involved in supporting SGUL researchers throughout their research process, from the early stages when they apply to medical and other funders to make a case for grant funding for their research projects, right through to the long-term availability and preservation of the research that they produce.
Meet the Research Support team
Michelle Harricharan, our Research Data Support Manager, works with our research teams to help them to create, manage, share and preserve high quality digital data that is findable, accessible, interoperable and reusable (FAIR) – and in line with funder and publisher data policies.
Jennifer Smith, Research Publications Librarian, and Jenni Hughes, Research Publications Assistant, help researchers understand how they can make their research papers freely available online via our publications repository, SORA, and advise researchers on the fast-moving world of open access publishing.
We are all available for face-to-face meetings with researchers, we provide guidance on our webpages and blogposts, and we can be contacted by phone or email (see below).
The Library also procures and manages a range of software systems to help provide our services to researchers.
How do we use technology to support our users?
Making research papers freely available
The government allocates funding to universities based on the impact and reach of their research out in the wider world. As part of the next assessment by the government, known as the Research Excellence Framework (REF), any research papers SGUL wishes to use as evidence of our research impact will need to be freely available online.
Our researchers can track and record their publications in our Current Research Information System (CRIS), which uses Symplectic Elements software. The CRIS captures and records detailed information about the research publications, such as how often their research is picked up and referred to by other researchers, and allows researchers to upload their publications to be made open access in our repository. Publications information from the CRIS is also transferred into researchers’ public profiles on the SGUL website.
The CRIS links to our institutional database for publications, St George’s Online Research Archive (SORA), which is hosted and supported by Cosector. This repository uses open source software, and information about the papers in SORA is picked up by indexing services such as Google Scholar, CORE and Unpaywall. Many of our researchers’ papers are also freely available in the big medical databases PubMed Central and Europe PubMed Central.
Both systems show Altmetric scores, which visualise how many times the research has been referred to in traditionally non-scholarly places such as news media, social media, public policies and so on.
Having the research findable and accessible in so many places helps ensure there are as few barriers to reading and re-use as possible. To date we have over 3,700 papers freely available online via SORA – with downloads currently averaging 3,600 per month from all parts of the world.
Research Data Infrastructure
In 2016 the university partnered with Jisc on the Research Data Shared Service project. This allowed us to establish the foundations for a state-of-the-art digital data infrastructure at our Library.
In mid-2017 we launched our figshare-based research data repository, which is a digital archive for discovering, storing and sharing research data (and wider research outputs) produced at St George’s. Since its launch we have shared some 45 outputs from a range of SGUL research and collected hundreds more that are publicly available via PLOS. To date, our 45 public items have been viewed more than 20,000 times and downloaded almost 4,000 times, a testament to the contribution open research can make to enabling public access to high value digital research.
Together with Records Management and Archives, we are also in the process of implementing a digital preservation system, Preservica, to ensure continued access to our valuable research data assets (as well as our unique institutional records). Digital content is fragile; it can quickly become inaccessible as the hardware and software needed to open it become obsolete. By continually migrating digital files to current formats, Preservica will ensure that our digital content remains accessible and usable for the long term.
Get connected, get creative and learn new skills
The following websites are a useful starting point if you would like to know more:
Understanding Health Research – If you are trying to make sense of health research, this MRC-funded website guides you through steps to help you read scientific papers and think about the value of the evidence and conclusions presented.
Open Access Publishing – A course for those who wish to understand more about how to publish open access; it explains some of the commonly used terminology and funder expectations.
Michelle Harricharan, Research Data Support Manager
Jenni Hughes, Research Publications Assistant
Jennifer Smith, Research Publications Librarian
If you are interested in receiving updates from the Library on all things open access, open data and scholarly research communications, you can subscribe to the Library Blog using the Follow button or click here for further posts from us.
To coincide with Peer Review Week (September 16-20), this is an overview of current developments in peer review, with some thoughts on the future, and information on how Library Services can offer support to our researchers.
What is peer review and why is it important?
Peer review is the process by which scholarly work is submitted to the scrutiny of other experts in the same field. It’s thought to date back to the seventeenth century1, but has become increasingly standardised since the mid-twentieth century2. It’s now an important part of the scholarly publications process, helping to assess and improve research papers before formal publication. A report published last year by Publons3 (part of Clarivate Analytics) found that peer review was overwhelmingly valued by researchers. There are different models of peer review, ranging from blind review (where authors and reviewers may not be known to each other) through to more open models of reviewing (see below, fig 2 in the Publons report)3.
Why is there a “peer review crisis”?
Peer review is far from perfect, however. Research that contains errors or fraud isn’t always picked up, and reviewers aren’t always objective: unconscious bias can affect peer review4, and even double-blind reviewing isn’t always completely anonymous, especially in smaller fields where reviewers are more likely to be able to identify authors based on topic or writing style. Peer review also often goes unrewarded: reviewers are not usually paid for their work, and researchers may not cite this work as part of their scholarly profile when applying for jobs or promotions.
Recent research in PLoS One has also suggested that some reviewers can lazily accept low-quality manuscripts, bringing down the overall quality of research5. The very existence of the website Retraction Watch highlights that peer review does not always fulfil the functions expected of it.
How is the open research agenda changing peer review?
Open peer review refers to a variety of different models that broadly support the principles of open research. The features of these models might include:
Named, identifiable reviewers.
Reviews that are published alongside the final article.
Participation by the wider community as opposed to just a small number of invited reviewers, whether on pre-review manuscripts or on the final version.
Direct discussion between authors and reviewers.
Reviews taking place on a different platform to publication6.
The different models have in common a desire to improve the peer review process, making it more transparent, accountable and accessible7.
Recent research has found that publishing peer review reports doesn’t compromise the review process, though only 8.1% of reviewers were willing to publish their identity alongside the report8.
Peer reviewing data
Data sharing has exploded in recent years. It is becoming commonplace in the academic publication process in light of the huge volumes of data being created in research and the challenges of irreproducible research. But while data sharing is becoming routine, peer review of the data underlying publications is not yet commonplace.
Leading the way in data peer review are data journals. Data journals specialise in publishing descriptions of high value scientific datasets or analyses/meta-analyses of existing datasets. Submissions to data journals are peer-reviewed.
Other journals are quickly catching up. Peer reviewers may be asked to appraise the data underlying any publication, not just data-focused papers. Journals may have their own guidance for assessing datasets but PLOS provides some very practical criteria:
Is the data accessible?
Can you tell what you’re looking at?
Does the data you see match the data referenced in the manuscript?
What might drive developments in the future to improve peer reviewing – for researchers, and for science?
The San Francisco Declaration on Research Assessment of 2012, commonly known as DORA, and to which St George’s, University of London is a signatory, sets out a statement of intent and some guiding principles around a move away from a narrow set of metrics such as journal impact factor as a measure of assessment. Acknowledging that researchers may undertake a wide range of scholarly activities, and produce outputs other than journal articles, could lead to better recognition of and reward for peer reviewing.
In 2017, the DOI provider Crossref announced that they would now support registering peer reviews as well as other types of research outputs9. Other services such as Publons and ORCiD10,11 also offer ways for researchers to track and get credit for their reviews, where these reviews are openly available12.
Given the known problems with peer review, and the growing number of manuscript submissions, it’s no surprise that, as noted by Nature13, publishers are starting to employ artificial intelligence to try to improve those processes that can be automated – without taking away from decision-making by human editors. For example, Frontiers journals have announced the use of AI to help with quality control and reviewer identification14.
While, as the Publons report finds, “the scholarly community lacks a robust measure of review quality”, greater openness in the peer review process, and wider use of identifiers to link reviewers and their reviews, could enable more analysis of and agreement on what constitutes good peer review.
In conclusion, new technologies, publishing models and funder mandates present opportunities for the scientific community to improve the peer review process – a process which at its best allows researchers to engage in a constructive dialogue to improve research and the communication of research findings.
Michelle Harricharan, Research Data Support Manager
Jenni Hughes, Research Publications Assistant
Jennifer Smith, Research Publications Librarian
If you are interested in receiving updates from the Library on all things open access, open data and scholarly research communications, you can subscribe to the Library Blog using the Follow button or click here for further posts from us.
References
1. Tennant JP, Dugan JM, Graziotin D, et al. (2017). A multi-disciplinary perspective on emergent and future innovations in peer review [version 3; peer review: 2 approved]. F1000Research 6:1151. https://doi.org/10.12688/f1000research.12037.3
8. Bravo G, Grimaldo F, López-Iñesta E, Mehmani B, Squazzoni F (2019). The effect of publishing peer review reports on referee behavior in five scholarly journals. Nature Communications 10:322. https://doi.org/10.1038/s41467-018-08250-2
12. Tennant JP (2018). The state of the art in peer review. FEMS Microbiology Letters 365(19):fny204. https://doi.org/10.1093/femsle/fny204
The open research movement is about disseminating scientific outputs widely and openly as soon as possible. One of the ways that researchers can rapidly share their work with a wide audience is by posting a preprint to a preprint server. The practice of sharing and commenting on preprints has recently been described as ‘science in real time’1.
The preprint is the original version of your work, before peer review and before acceptance by a journal.
Why post preprints online?
Publishing your research as a preprint means that you can get your work out fast. From 2021, the Wellcome Trust2 will require that any research they fund that is relevant to a public health emergency be published as a preprint, in order to disseminate findings on such important areas as quickly as possible3,4.
Your work will be citeable and shareable as soon as it’s posted, allowing you to demonstrate the work you’re doing to funders, colleagues and potential collaborators.
Immediate feedback from your peers can help you improve your manuscript, as well as opening up potential avenues for follow up work or collaborations.
By publishing your findings as a preprint, you can publicly establish priority by date stamping your findings and making your preprint part of the scientific record.
Preprint servers (examples below) allow for disseminating hard-to-publish but important work such as negative/null findings.
In fields where posting preprints to preprint servers is commonplace, these can become a one stop shop for getting a quick overview of the newest developments in the field – a piece in Nature5 highlights how bioRxiv can be used to help researchers stay abreast of what their colleagues are working on.
Before you post your preprint, what should you consider?
If you are posting as a step prior to publishing in a journal, check whether your prospective journal has any rules around preprints – do they consider posting preprints as ‘prior publication’?
What’s the best platform for what you want to achieve? If you want feedback on your paper from a specific group before going more public, you could share it on St George’s data repository via a closed group or a private link.
Are there charges for posting? Where there are charges, these tend to be much less than open access fees in more established journals; however, you will still need to consider how these are paid.
Where can I post preprints?
bioRxiv.org is a preprint server for the biological sciences. Many journals allow you to submit work that has previously been posted as a preprint, and preprints posted to bioRxiv can also be directly transferred for submission to a variety of other peer review services (e.g. PLOS, BMC). An analysis6 earlier this year of bioRxiv preprints found that “two-thirds of preprints posted before 2017 were later published in peer-reviewed journals”.
medRxiv is a preprint server using the same software as bioRxiv, and papers on health sciences topics can be posted there.
BioMed Central have recently launched a new prepublication option, In Review, for articles under consideration in four of their journals: BMC Anesthesiology, BMC Neurology, BMC Ophthalmology and Trials.
F1000 Research, Wellcome Open Research and the new AMRC Open Research operate under a slightly different model: preprints posted to these sites are then openly peer reviewed, and the article is considered published once it has passed peer review.
All these sites screen contributions for plagiarism and appropriateness, and to ensure they meet ethical standards.
Where are preprints indexed?
bioRxiv and medRxiv preprints are indexed by Google, Google Scholar, CrossRef and other search tools. They are not indexed by Web of Science; however, they will be indexed in Europe PMC (EPMC) as follows:
“To distinguish preprints from peer reviewed articles in Europe PMC, each preprint is given a PPR ID, and is clearly labelled as a preprint, both on the abstract view and the search results… When preprints have subsequently been published as peer-reviewed articles and indexed in Europe PMC they are crosslinked to each other.”
Preprints are not indexed in PubMed until they have achieved sufficient peer review.
How do I find out about preprints?
Preprint platforms have options to set up alerts for subject categories and recent additions, and to track papers when they are revised.
Rxivist combines preprints from bioRxiv with data from Twitter to help find the papers being discussed in a particular field, to help researchers deal with the “avalanche” of research7 they may be faced with.
I’m an SGUL researcher – can I record and deposit my preprints in SGUL’s CRIS (Current Research Information System), St George’s Research Data Repository or our publications repository, SORA (St George’s Online Research Archive)?
Records for preprints can come into your CRIS profile from Crossref and EPMC. This is useful as it adds to the completeness of your publication list in CRIS.
As and when a paper from bioRxiv or medRxiv goes on to be published in a journal, we’d expect to see a record for this in CRIS too.
For the purposes of making full text available via SORA, we have historically only made publicly available those versions of an article produced after peer review (either the final accepted manuscript or, where possible, the publisher version).
For REF 2021, while preprints will be eligible for submission8, only outputs which have been ‘accepted for publication’ (such as a journal article or conference contribution with an ISSN) are within the scope of the REF 2021 open access policy. SGUL researchers should continue to follow the deposit on acceptance advice and upload the accepted version of their papers to CRIS for SORA.
The future of preprints
While there has been debate on the pros and cons of preprints in terms of whether research disseminated in this way will advance healthcare for patients9, improvements to preprint platforms (such as medRxiv’s cautionary advice to news media on their homepage) and backing by funders should mean that, as a tool for researchers to quickly share and find preliminary findings, preprints will be around for the foreseeable future.
As funder mandates and preprint practices develop in the medical and health sciences, we will keep our system capabilities for capturing and promoting researchers’ preprints under active review.
Michelle Harricharan, Research Data Support Manager
Jenni Hughes, Research Publications Assistant
Jennifer Smith, Research Publications Librarian
Look out for a Library blog post on open peer review during Peer Review Week which is taking place September 16-20 2019.
If you are interested in receiving updates from the Library on all things open access, open data and scholarly research communications, you can subscribe to the Library Blog using the Follow button or click here for further posts from us.
1. Chiarelli A, Johnson R, Pinfield S, Richens E. Practices, drivers and impediments in the use of preprints: Phase 1 report [Internet]. 2019 [cited 2019 Aug 8]. Available from: http://doi.org/10.5281/zenodo.2654832
3. Peiperl L. Preprints in medical research: Progress and principles. PLoS Med [Internet]. 2018 [cited 2019 Aug 8];15(4):e1002563. Available from: https://doi.org/10.1371/journal.pmed.1002563
4. Johansson MA, Reich NG, Meyers LA, Lipsitch M. Preprints: An underutilized mechanism to accelerate outbreak science. PLoS Med [Internet]. 2018 [cited 2019 Aug 8];15(4):e1002549. Available from: https://doi.org/10.1371/journal.pmed.1002549
6. Abdill RJ, Blekhman R. Tracking the popularity and outcomes of all bioRxiv preprints. bioRxiv [Internet]. 2019 [cited 2019 Aug 7];515643. Available from: https://doi.org/10.1101/515643
7. Abdill RJ, Blekhman R. Rxivist.org: Sorting biology preprints using social media and readership metrics. PLOS Biol [Internet]. 2019 [cited 2019 Aug 8];17(5):e3000269. Available from: https://doi.org/10.1371/journal.pbio.3000269
9. Krumholz HM, Ross JS, Otto CM. Will research preprints improve healthcare for patients? BMJ [Internet]. 2018 [cited 2019 Aug 8];362:k3628. Available from: https://doi.org/10.1136/bmj.k3628
Last November the Wellcome Trust launched the Data Re-use Prize to celebrate innovative reuse of open data in either antimicrobial resistance (AMR) or malaria. Entrants were asked to generate a new insight, tool or health application from two open data resources, the AMR ATLAS dataset or the Malaria ROAD-MAP dataset.
MRC-LID PhD student and member of the winning team for AMR, Quentin Leclerc, dropped by the SGUL RDM Service to talk about the prize and the challenging but rewarding process of reusing open data.
Quentin, congratulations on the win. Can you tell me a little bit about your team’s entry for the Data Re-Use Prize?
Sure. We developed a tool to help inform empiric therapy. Empiric therapy is basically when physicians pool multiple sources of data together to make the best informed guess about how to treat a patient. This is before they know exactly what bacteria a patient is infected with and its potential resistance to antibiotics. Say, for example, a patient has sepsis and needs to be treated right away. A physician might determine the most likely causes as E. coli and S. aureus infection and then make an informed guess about the best antibiotic to prescribe to treat both of these bacteria, bearing in mind regional estimates of each pathogen’s resistance to different antibiotics. The physician is basically thinking, “given what we know about the common causes of this condition and antibiotic resistance, which antibiotic is likely to work best?”
Our proof of concept web app integrates data from a range of open data sources to visualise antibiotic resistance rates for common infections to help physicians prescribe faster and more accurately. If developed, the tool can potentially be used to inform national guidelines on how to treat common infections in many countries, particularly in low- and middle-income countries where data aren’t always available to inform empiric therapy at the local or hospital level.
Some visualisations from the team’s AR:IA app
Sounds very exciting. As a first year PhD student, what was it like to win a prize like this?
It was really unexpected. We didn’t expect to win, we just thought, ‘we’ll publish our findings anyway so let’s see how this goes’. The other entries for the prize were very specific while our entry was pretty broad so we weren’t very confident. It was a real surprise and a great effort from everyone on the team.
Team photo (l to r): Gwen Knight, Quentin Leclerc, Nichola Naylor and Alexander Aiken. Missing: Francesc Coll.
As a PhD student, it was an interesting experience overall. This project is very different from my PhD but working on this tool helped me to get used to the various datasets out there and to look at the big picture of antimicrobial resistance and antibiotic prescribing. It was an enlightening process.
Can you tell me a little bit more about the process of reusing existing data? What was it like?
It was surprising. The thing with data is that it’s collected for a purpose. When someone comes in trying to use that data for a different purpose, they start to see what’s missing. They start to make approximations and assumptions to use the data for something it wasn’t intended for. The ATLAS dataset is very accurate and it’s very rich but it suits its original purpose. For example, we needed to group the data in increasingly complex ways. Once we started doing this, the sample sizes started to look quite small. The dataset wasn’t suited to those kinds of groupings.
When we started comparing the ATLAS dataset to other datasets, the AMR data appeared to show slightly different information. So we started to ask, who collected this data? In what contexts would this data have been collected? Might there be a sampling bias that explains this difference we’re seeing between the datasets? There was a legitimate reason for the difference we were seeing, but that’s why it’s really important to think about why you’re using a dataset and exactly what you want to achieve because the data may not suit your purpose.
Also, we integrated data from a range of sources. When you start doing this, comparing available datasets, you realise the heterogeneity of the data that’s out there; they are all in different formats, they have different naming conventions, even the bacteria aren’t named in the same way and we had to work out exactly which bacteria different datasets were referring to. There aren’t any standards across the different sources to make integrating the datasets easy.
So there were a lot of challenges to reusing data that someone else created?
Yes, we needed to keep in mind that the data was not created to answer our research question. We also found that there was a lack of information in the available literature around the common causative pathogens of several infections to help us understand and use the data correctly.
What advice would you give to researchers who want to reuse open datasets but are hesitant?
It is important to look at the dataset and really understand it. Ask yourself why it was collected, where it was collected, how it was collected. Don’t take anything for granted. Open datasets are incredible resources but you can’t blindly go in there.
Once you understand the dataset you’ll naturally get the confidence to use it and ask the right questions of it. You won’t be scared or overwhelmed by it. You’ll also save a lot of time once you start working on the data and better understand how to combine it with other datasets.
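As an aside from the RDM Service: the kind of harmonisation Quentin describes above often comes down to mapping each source’s labels onto a common vocabulary before datasets can be combined. The short Python sketch below is a hypothetical illustration of that step, not the AR:IA team’s actual code, and the names and values in it are invented.

```python
# Hypothetical illustration of harmonising organism names across open datasets
# before combining them. Names and values are invented; this is not AR:IA code.
import pandas as pd

# Each source uses its own naming convention for the same bacteria
atlas = pd.DataFrame({"organism": ["E. coli", "S. aureus"],
                      "resistant_pct": [22.0, 15.5]})
survey = pd.DataFrame({"organism": ["Escherichia coli", "Staphylococcus aureus"],
                       "resistant_pct": [25.1, 14.2]})

# A manually curated mapping to standard names, built by reading each dataset's
# documentation - in practice this is where much of the work goes
standard_names = {
    "E. coli": "Escherichia coli",
    "S. aureus": "Staphylococcus aureus",
    "Escherichia coli": "Escherichia coli",
    "Staphylococcus aureus": "Staphylococcus aureus",
}

for df in (atlas, survey):
    df["organism"] = df["organism"].map(standard_names)

# Once the names agree, the sources can be combined and compared
combined = pd.concat([atlas.assign(source="ATLAS"),
                      survey.assign(source="survey")], ignore_index=True)
print(combined)
```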
Quentin and his team’s winning entry, Antibiotic Resistance: Interdisciplinary Action (AR:IA), is openly available here. The team was led by Dr Gwen Knight at the London School of Hygiene and Tropical Medicine and included Nichola Naylor, Francesc Coll and Alexander Aiken.
If you have any questions about finding and reusing open data contact Michelle Harricharan, Research Data Support Manager.
UPDATE 03/05/2019: You can read the official SGUL news release on this prize here.
If you are a researcher at SGUL, we are here to help you share and preserve your data, and publish in a way that meets your funders’ open access mandates, as many funders have a commitment to making data and publications as openly available as possible.
SGUL has two repositories to enable researchers to share and preserve both data and publications: read on for more facts and figures about how adding your work to these ties in with our Strategic Plan to maximise the impact of our research.
Research Data
In late 2017 the Research Data Management Service announced our pilot Research Data Repository. In 2018 we published more than 20 outputs to the repository, including the official proceedings from SGUL’s Education Day (2017) and presentations from Infection and Immunity’s annual INTERTB symposium. To mark World AIDS Day this December, the Centre for Global Health released the first of six free training modules to share SGUL expertise on treating one of the biggest causes of HIV-related mortality in Africa. Our work has been viewed, downloaded and shared locally and internationally.
Contact the Research Data Management Service to talk about sharing your data, PowerPoint presentations, posters and videos on the repository.
This year also saw the introduction of new Europe-wide data protection legislation. How could we forget that? Our team worked closely with colleagues across St George’s and external organisations to support our researchers in the run-up to 25 May. Our GDPR and Health Research blog post was part of that awareness raising campaign.
In 2018 SGUL’s Information Management (IM) Team was also formed. Made up of our Information Governance Manager, Data Protection Officer, Freedom of Information Officer, Archivist, Records Manager and Research Data Manager, the IM Team looks to streamline information flows across St George’s and raise awareness of information policies and good practice. We run regular seminars on IM.
On the publications front, the number of articles now free to read via SORA (St George’s Online Research Archive) has been steadily increasing, driven by the open access mandate for the 2021 REF (for more on this, see our webpages). We now have nearly 3,000 articles publicly accessible via SORA, with more being added all the time. Downloads of the articles are also rising: up to 2,300+ downloads per month on average in 2018 (from 1,800+ downloads per month on average in 2017). As with data, the articles have a global reach, being downloaded by readers in all parts of the world.
Records are included in the open access aggregation platform CORE, which contains over 11 million full-text articles. CORE is working with trusted parties such as institutional and subject repositories and journals (other sources of articles such as Sci-Hub1 and ResearchGate2 have been subject to action by publishers due to copyright infringement). CORE also allows for text mining of the corpus.
This year we also upgraded our CRIS (Current Research Information System). Among other improvements, if you confirm your ORCiD in your CRIS profile, any publications matched in our data sources with your ORCiD will be automatically claimed for you. For more on ORCiDs and the benefits of having one, see our blogpost from earlier this year.
Contact us at sora@sgul.ac.uk if you would like guidance on keeping your CRIS publication lists & metrics up to date.
Funder initiatives
Funder mandates and publisher policies around open access to research are an area of constant evolution. This year has seen the announcement of Wellcome Trust’s plans to update their open access policy for 2020, to ensure all Wellcome-funded research articles are made freely available at the time of publication, and Plan S, which aims to require that all research articles funded by the coalition of research funding organisations behind the plan be published in open access journals or on open access platforms.
Plan S has certainly caught the attention of publishers – for example it has been welcomed with caveats by the International Association of Scientific, Technical and Medical Publishers3, and Nature recently reported it has support in China4.
SGUL researchers have benefited from negotiations by Jisc Collections5 with publishers around subscriptions and open access charges; for instance in being able to publish open access for free under the Springer Open Choice agreement.
Contact us via openaccess@sgul.ac.uk if you have any questions about how to meet your funder open access policies.
Lastly, special thanks to all of our researchers who have answered our calls to be involved with open research.
In particular, to the laboratory researchers who opened up their groups, projects and labs to us earlier this year and told us all about their data and records management practices. We have now produced a report on our findings and will be building on this work in the New Year.
And to all who have been making their papers open access, as we work towards the next REF.
We hope to see or hear from you in 2019.
Michelle Harricharan, Research Data Support Manager
Jenni Hughes, Research Publications Assistant
Jennifer Smith, Research Publications Librarian
3. STM. STM statement on Plan S: Accelerating the transition to full and immediate Open Access to scientific publications [Internet]. The Hague: International Association of Scientific, Technical and Medical Publishers; 2018 [cited 2018 Dec 13]. Available from: https://www.stm-assoc.org/2018_09_04_STM_Statement_on_PlanS.pdf
4. Schiermeier Q. China backs bold plan to tear down journal paywalls. Nature [Internet]. 2018 Dec 13 [cited 2018 Dec 14]. Available from: http://dx.doi.org/10.1038/d41586-018-07659-5
5. Earney, L. National licence negotiations advancing the open access transition – a view from the UK. Insights [Internet]. 2018 [cited 2018 Dec 14]; 31 (11). Available from: http://doi.org/10.1629/uksg.412
If you are interested in receiving updates from the Library on all things open access, open data and scholarly research communications, you can subscribe to the Library Blog using the Follow button or click here for further posts from us.
As the theme of 2018 International Open Access Week “Designing Equitable Foundations for Open Knowledge” acknowledges, “setting the default to open is an essential step toward making our system for producing and distributing knowledge more inclusive”.
Following on the heels of Wellcome Trust setting up Wellcome Open Research in 2016 – which publishes scholarly articles reporting any basic scientific, translational and clinical research that has been funded (or co-funded) by Wellcome – a group of funders have come together to launch AMRC Open Research:
This is a platform “for rapid author-led publication and open peer review of research funded by AMRC member charities” – which include Parkinson’s UK, Stroke Association, Alzheimer’s Research UK and many more.
“All articles benefit from immediate publication, transparent refereeing and the inclusion of all source data”
If you are an SGUL researcher in receipt of a grant from these funders, take a moment to look at How it Works.
If you are interested in receiving updates from the Library on all things open access, open data and scholarly research communications, you can subscribe to the Library Blog using the Follow button or click here for further posts from us.