Challenging but rewarding – Wellcome Trust Data Re-use Prize winner, Quentin Leclerc, on reusing open data

Last November the Wellcome Trust launched the Data Re-use Prize to celebrate innovative reuse of open data either in antimicrobial resistance (AMR) or malaria. Entrants were asked to generate a new insight, tool or health application from two open data resources, the AMR ATLAS dataset or the Malaria ROAD-MAP dataset.

MRC-LID PhD student and member of the winning team for AMR, Quentin Leclerc, dropped by the SGUL RDM Service to talk about the prize and the challenging but rewarding process of reusing open data.

Quentin, congratulations on the win. Can you tell me a little bit about your team’s entry for the Data Re-Use Prize?

Sure. We developed a tool to help inform empiric therapy. Empiric therapy is basically when physicians pool multiple sources of data together to make the best informed guess about how to treat a patient. This is before they know exactly what bacteria a patient is infected with and its potential resistance to antibiotics. Say, for example, a patient has sepsis and needs to be treated right away. A physician might determine the most likely causes as E. coli and S. aureus infection and then make an informed guess about the best antibiotic to prescribe to treat both of these bacteria, bearing in mind regional estimates of each pathogen’s resistance to different antibiotics. The physician is basically thinking, “given what we know about the common causes of this condition and antibiotic resistance, which antibiotic is likely to work best?”

Our proof of concept web app integrates data from a range of open data sources to visualise antibiotic resistance rates for common infections to help physicians prescribe faster and more accurately. If developed, the tool can potentially be used to inform national guidelines on how to treat common infections in many countries, particularly in low- and middle-income countries where data aren’t always available to inform empiric therapy at the local or hospital level.
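To make the physician's reasoning concrete, here is a minimal sketch of one way to weigh likely causes against resistance rates. It is not taken from the AR:IA app and may not reflect how the team's tool works; the function name, the drug names and every number are hypothetical, chosen only for illustration.

```python
# All names and numbers are hypothetical; none come from ATLAS or from AR:IA.

def expected_coverage(pathogen_share, resistance):
    """Chance that an antibiotic works when the causative pathogen is unknown.

    pathogen_share: {pathogen: fraction of cases of this infection it causes}
    resistance:     {pathogen: fraction of isolates resistant to the antibiotic}
    """
    return sum(share * (1.0 - resistance[p]) for p, share in pathogen_share.items())

# Assumed split of causes for one infection in one region.
causes = {"E. coli": 0.6, "S. aureus": 0.4}

# Assumed regional resistance rates for two made-up antibiotics.
resistance_by_drug = {
    "drug A": {"E. coli": 0.30, "S. aureus": 0.10},
    "drug B": {"E. coli": 0.10, "S. aureus": 0.50},
}

coverage = {drug: expected_coverage(causes, r) for drug, r in resistance_by_drug.items()}
best = max(coverage, key=coverage.get)

for drug, value in coverage.items():
    print(f"{drug}: expected coverage {value:.2f}")
print("highest expected coverage:", best)
```

With these made-up figures, drug B looks better against E. coli alone, but drug A comes out ahead once both likely causes are weighed together, which is the kind of trade-off an empiric therapy decision has to balance.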

Some visualisations from the team’s AR.IA app

Sounds very exciting. As a first year PhD student, what was it like to win a prize like this?

It was really unexpected. We didn’t expect to win, we just thought, ‘we’ll publish our findings anyway so let’s see how this goes’. The other entries for the prize were very specific while our entry was pretty broad so we weren’t very confident. It was a real surprise and a great effort from everyone on the team.

Team photo (l to r): Gwen Knight, Quentin Leclerc, Nichola Naylor and Alexander Aiken
Missing: Francesc Coll

As a PhD student, it was an interesting experience overall. This project is very different from my PhD but working on this tool helped me to get used to the various datasets out there and to look at the big picture of antimicrobial resistance and antibiotic prescribing. It was an enlightening process.

Can you tell me a little bit more about the process of reusing existing data? What was it like?

It was surprising. The thing with data is that it’s collected for a purpose. When someone comes in trying to use that data for a different purpose, they start to see what’s missing. They start to make approximations and assumptions to use the data for something it wasn’t intended for. The ATLAS dataset is very accurate and it’s very rich but it suits its original purpose. For example, we needed to group the data in increasingly complex ways. Once we started doing this, the sample sizes started to look quite small. The dataset wasn’t suited to those kinds of groupings.
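The shrinking sample sizes Quentin describes can be illustrated with a small sketch. The records below are invented and the field layout is an assumption, not the ATLAS schema; the point is only that each extra grouping variable splits the same rows into smaller groups.

```python
from collections import Counter

# Hypothetical isolate records: (country, year, species, sample source).
records = [
    ("UK", 2016, "E. coli", "blood"),
    ("UK", 2016, "E. coli", "urine"),
    ("UK", 2017, "E. coli", "blood"),
    ("UK", 2017, "S. aureus", "blood"),
    ("France", 2016, "E. coli", "blood"),
    ("France", 2016, "S. aureus", "blood"),
    ("France", 2017, "S. aureus", "skin"),
    ("France", 2017, "S. aureus", "blood"),
]

def smallest_group(rows, n_keys):
    """Size of the smallest group when grouping on the first n_keys fields."""
    counts = Counter(row[:n_keys] for row in rows)
    return min(counts.values())

# Group by country, then country+year, then also species, then also source.
sizes = [smallest_group(records, n) for n in range(1, 5)]
print(sizes)
```

Checking the smallest group size before trusting a resistance rate computed from it is a cheap safeguard when a dataset is pushed beyond the groupings it was collected for.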

When we started comparing the ATLAS dataset to other datasets, the AMR data appeared to show slightly different information. So we started to ask, who collected this data? In what contexts would this data have been collected? Might there be a sampling bias that explains this difference we’re seeing between the datasets? There was a legitimate reason for the difference we were seeing, but that’s why it’s really important to think about why you’re using a dataset and exactly what you want to achieve because the data may not suit your purpose.

Also, we integrated data from a range of sources. When you start doing this, comparing available datasets, you realise the heterogeneity of the data that’s out there; they are all in different formats, they have different naming conventions, even the bacteria aren’t named in the same way and we had to work out exactly which bacteria different datasets were referring to. There aren’t any standards across the different sources to make integrating the datasets easy.
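As a rough illustration of the naming problem, the sketch below maps a few spellings of the same bacteria onto one canonical name. The synonym table and the function name are hypothetical; a real project would need a much larger, curated list, and this is not the team's actual code.

```python
import re

# Hypothetical synonym table: cleaned-up spelling -> canonical species name.
CANONICAL = {
    "e coli": "Escherichia coli",
    "escherichia coli": "Escherichia coli",
    "s aureus": "Staphylococcus aureus",
    "staph aureus": "Staphylococcus aureus",
    "staphylococcus aureus": "Staphylococcus aureus",
}

def normalise(name):
    """Lower-case the name, turn punctuation and spacing into single spaces, then look it up.

    Returns None for names that are not in the table, so they can be reviewed by hand.
    """
    key = re.sub(r"[^a-z]+", " ", name.lower()).strip()
    return CANONICAL.get(key)

for raw in ["E.coli", "S. aureus", "Staph. aureus", "ESCHERICHIA COLI", "Klebsiella"]:
    print(raw, "->", normalise(raw))
```

Returning None for unknown names, instead of guessing, keeps the ambiguous cases visible so that someone with domain knowledge can decide which bacteria a dataset is referring to.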

So there were a lot of challenges to reusing data that someone else created?

Yes, we needed to keep in mind that the data was not created to answer our research question. We also found that there was a lack of information in the available literature around the common causative pathogens of several infections to help us understand and use the data correctly.

What advice would you give to researchers who want to reuse open datasets but are hesitant?

It is important to look at the dataset and really understand it. Ask yourself why it was collected, where it was collected, how it was collected. Don’t take anything for granted. Open datasets are incredible resources but you can’t blindly go in there.

Once you understand the dataset you’ll naturally get the confidence to use it and ask the right questions of it. You won’t be scared or overwhelmed by it. You’ll also save a lot of time once you start working on the data and better understand how to combine it with other datasets.

Quentin and his team’s winning entry, Antibiotic Resistance: Interdisciplinary Action (AR:IA), is openly available here. The team was led by Dr Gwen Knight at the London School of Hygiene and Tropical Medicine and included Nichola Naylor, Francesc Coll and Alexander Aiken.    

If you have any questions about finding and reusing open data contact Michelle Harricharan, Research Data Support Manager.

UPDATE 03/05/2019: You can read the official SGUL news release on this prize here.


A year’s worth of Open Research and SGUL

If you are a researcher at SGUL, we are here to help you share and preserve your data, and publish in a way that meets your funder’s open access mandates, as many funders have a commitment to making data and publications as openly available as possible.

SGUL has two repositories to enable researchers to share and preserve both data and publications: read on for more facts and figures about how adding your work to these ties in with our Strategic Plan to maximise the impact of our research.

Research Data

In late 2017 the Research Data Management Service announced our pilot Research Data Repository. In 2018 we published more than 20 outputs to the repository including the official proceedings from SGUL’s Education Day (2017), presentations from Infection and Immunity’s annual INTERTB symposium, and, to mark World AIDS Day this December, the Centre for Global Health released the first of six free training modules to share SGUL expertise on treating one of the biggest causes of HIV-related mortality in Africa. Our work has been viewed, downloaded and shared locally and internationally.

Contact the Research Data Management Service to talk about sharing your data, PowerPoint presentations, posters and videos on the repository.

This year also saw the introduction of new Europe-wide data protection legislation. How could we forget that? Our team worked closely with colleagues across St George’s and external organisations to support our researchers in the run-up to 25 May. Our GDPR and Health Research blog post was part of that awareness raising campaign.

In 2018 SGUL’s Information Management (IM) Team was also formed. Made up of our Information Governance Manager, Data Protection Officer, Freedom of Information Officer, Archivist, Records Manager and Research Data Manager, the IM Team looks to streamline information flows across St George’s and raise awareness of information policies and good practice. We run regular seminars on IM.

Contact our Records Manager for more information.

 

Open Access publications

On the publications front, the number of articles now free to read via SORA (St George’s Online Research Archive) has been steadily increasing, driven by the open access mandate for the 2021 REF (for more on this, see our webpages). We now have nearly 3,000 articles publicly accessible via SORA, with more being added all the time. Downloads of the articles are also rising: up to 2,300+ downloads per month on average in 2018 (from 1,800+ downloads per month on average in 2017). As with data, the articles have a global reach, being downloaded by readers in all parts of the world.

Records are included in the open access aggregation platform CORE, which contains over 11 million full-text articles. CORE is working with trusted parties such as institutional and subject repositories and journals (other sources of articles such as Sci-Hub [1] and ResearchGate [2] have been subject to action by publishers due to copyright infringement). CORE also allows for text mining of the corpus.

This year we also upgraded our CRIS (Current Research Information System). Among other improvements, if you confirm your ORCiD in your CRIS profile, any publications matched in our data sources with your ORCiD will be automatically claimed for you. For more on ORCiDs and the benefits of having one, see our blogpost from earlier this year.

Contact us at sora@sgul.ac.uk if you would like guidance on keeping your CRIS publication lists & metrics up to date.

 

Funder initiatives

Funder mandates and publisher policies around open access to research are an area of constant evolution. This year has seen the announcement of Wellcome Trust’s plans to update their open access policy for 2020, to ensure all Wellcome-funded research articles are made freely available at the time of publication, and Plan S, which aims to require all research articles funded by the coalition of research funding organisations behind the plan be published in open access journals, or on open access platforms.

Plan S has certainly caught the attention of publishers – for example it has been welcomed with caveats by the International Association of Scientific, Technical and Medical Publishers [3], and Nature recently reported it has support in China [4].

SGUL researchers have benefited from negotiations by Jisc Collections [5] with publishers around subscriptions and open access charges; for instance in being able to publish open access for free under the Springer Open Choice agreement.

Contact us via openaccess@sgul.ac.uk if you have any questions about how to meet your funder open access policies.

 

Lastly, special thanks to all of our researchers who have answered our calls to be involved with open research.

In particular, to the laboratory researchers who opened up their groups, projects and labs to us earlier this year and told us all about their data and records management practices. We have now produced a report on our findings and will be building on this work in the New Year.

And to all who have been making their papers open access, as we work towards the next REF.

We hope to see or hear from you in 2019.

Michelle Harricharan, Research Data Support Manager
Jenni Hughes, Research Publications Assistant
Jennifer Smith, Research Publications Librarian

 

Contacts

CRIS & Deposit on acceptance: sora@sgul.ac.uk

Open Access Publications: openaccess@sgul.ac.uk

Research Data Management: researchdata@sgul.ac.uk

 

References

1. Page B. Publishers succeed in getting Sci-Hub access blocked in Russia. The Bookseller [Internet]. 2018 Dec 11 [cited 2018 Dec 13]. Available from: https://www.thebookseller.com/news/sci-hub-blocked-russia-following-court-action-publishers-911571

2. McKenzie L. Publishers escalate legal battle against ResearchGate. Inside Higher Ed [Internet]. 2018 Oct 4 [cited 2018 Dec 13]. Available from: https://www.insidehighered.com/news/2018/10/04/publishers-accuse-researchgate-mass-copyright-infringement

3. STM. STM statement on Plan S: Accelerating the transition to full and immediate Open Access to scientific publications [Internet]. The Hague: International Association of Scientific, Technical and Medical Publishers; 2018 [cited 2018 Dec 13]. Available from: https://www.stm-assoc.org/2018_09_04_STM_Statement_on_PlanS.pdf

4. Schiermeier Q. China backs bold plan to tear down journal paywalls. Nature [Internet]. 2018 Dec 13 [cited 2018 Dec 14]. Available from: http://dx.doi.org/10.1038/d41586-018-07659-5

5. Earney L. National licence negotiations advancing the open access transition – a view from the UK. Insights [Internet]. 2018 [cited 2018 Dec 14]; 31 (11). Available from: http://doi.org/10.1629/uksg.412

 


If you are interested in receiving updates from the Library on all things open access, open data and scholarly research communications, you can subscribe to the Library Blog using the Follow button or click here for further posts from us.

Open Access Week 2018: Medical charities collaborate further to ensure results are shared.


As the theme of International Open Access Week 2018, “Designing Equitable Foundations for Open Knowledge”, acknowledges, “setting the default to open is an essential step toward making our system for producing and distributing knowledge more inclusive”.

Following on the heels of Wellcome Trust setting up Wellcome Open Research in 2016 – which publishes scholarly articles reporting any basic scientific, translational and clinical research that has been funded (or co-funded) by Wellcome – a group of funders have come together to launch AMRC Open Research:


This is a platform “for rapid author-led publication and open peer review of research funded by AMRC member charities” – which include Parkinson’s UK, Stroke Association, Alzheimer’s Research UK and many more.

All articles benefit from immediate publication, transparent refereeing and the inclusion of all source data.

If you are an SGUL researcher in receipt of a grant from these funders, take a moment to look at How it Works.

The AMRC platform levies relatively minimal charges for publication by researchers funded by the participating charities – much lower than the cost of publishing in traditional journals (see Wellcome is going to review its open access policy blog post, March 2018).

For any questions about making your publications open access, please visit our Open Access FAQs or contact us on openaccess@sgul.ac.uk

For any questions about sharing or preserving data, please visit our Research Data Management pages or contact us on researchdata@sgul.ac.uk

Jennifer Smith

Research Publications Librarian



The GDPR and health research

St George’s researchers will already be aware of the EU General Data Protection Regulation (GDPR) and the new UK Data Protection Bill, which will govern how we handle personal data after 25 May 2018. While we have learnt a lot about our obligations under the new regulations, researchers may not be clear about what these obligations mean for research. The SGUL Joint Research and Enterprise Services (JRES), Governance and Legal Assurance Services and the Research Data Management Service have come together to clear up a number of misconceptions about what the new regulations may mean for health and social care research. Read on!

It is not clear how the GDPR relates to health and social care research

The GDPR has a broad scope that extends well beyond clinical research: it relates to all personal data, including data handled by web search engines, social media and much more. Specifically, data required in research (and the way it is managed) falls within its remit. Identifiers such as names, addresses, dates of birth and electronic medical numbers all constitute personal information. However, the GDPR expands the personal data definition to include information such as location information, genetic data and IP addresses. In sum, any data that could potentially be used to directly or indirectly identify a person is considered personal data. In addition, pseudonymised data will now be considered personal data and therefore governed by the GDPR.
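A small sketch may help show why pseudonymised data still counts as personal data: as long as someone holds the key, the link back to the person can be recreated. The keyed-hash approach, the key and the identifier below are all hypothetical examples and are not a description of any SGUL procedure.

```python
import hashlib
import hmac

def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key and record, invented for this example.
key = b"hypothetical-key-held-by-the-study-team"
record = {"participant": "hypothetical-hospital-number-0001", "result": "positive"}

# The copy used for analysis carries the pseudonym instead of the identifier.
shared = {
    "participant": pseudonymise(record["participant"], key),
    "result": record["result"],
}

# Anyone holding the key can recompute the pseudonym and re-link the record.
relinked = pseudonymise("hypothetical-hospital-number-0001", key) == shared["participant"]
print("can be re-linked by the key holder:", relinked)
```

Because the mapping can be rebuilt, the pseudonymised copy remains indirectly identifying, unlike data that has been anonymised so that no such link exists.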

We will have to change all of our research processes to meet the requirements of the GDPR

As many, including the Medical Research Council, have already acknowledged, the GDPR reiterates many of the key principles of good research practice when handling personal data. Research, particularly health research, is governed by very strict guidelines and many of the mechanisms currently in place for assuring good practice can provide the safeguards needed to comply with the GDPR. For example, our ethics procedures and data management plans already address many of the requirements for privacy impact assessments and privacy by design. What we need to ensure is that all of our research is included in these processes, not just our funded research.

The GDPR will stifle research innovation

The GDPR ensures that innovation in health research can continue, but with the appropriate safeguards for data subjects. The new Data Protection Bill (which will replace the current Data Protection Act 1998) is currently going through parliament. This will direct the way the GDPR is implemented within the UK and any specific exemptions or “derogations”. It is widely accepted, but yet to be confirmed, that clinical research will have a number of related derogations to ensure that we are able to carry on normally with the business of improving and transforming health.

The research community will not be able to re-use/re-purpose data for future research

We are aware that it is not always possible to know all the ways research data could be processed when we are collecting it. The legislation also recognises this. Article 6(4) allows for further processing of personal data beyond the purposes for which it was collected, as long as those operations are considered ‘compatible’ with the original purpose under which consent was given, for example, medical research.

Further, data not originally collected for research can subsequently be used for research, as long as appropriate safeguards are met and the processing is in the public interest. This means we can continue to access health data to better understand and treat health conditions.

I am going to have to re-consent participants every few years if I want to continue to hold their personal data

Consent is not the lawful basis on which our researchers hold and process personal data. As a public authority, we will usually process personal data for health and social care research as a ‘task in the public interest’, so your participants may not need to be re-consented under the GDPR. However, under the GDPR you will need to ensure you have been lawful, fair and transparent about the personal data you have collected and how it is managed. It is important to understand what information has already been provided to your participants and whether it meets the GDPR requirements for transparency and accountability. This may require updates to your participant information sheet, or the addition of an information leaflet. The Health Research Authority (HRA) is working on consistent templates and wording to support researchers and sponsors, and has confirmed that, if an update is required, it would be a non-substantial amendment, that is, one not requiring formal ethics approval.

Even though consent is not the legal basis for processing personal data for research, the common law duty of confidentiality is not changing, so consent is still needed for people outside the care team to access and use confidential patient information for research. Therefore, consent continues to be required to meet the high ethical and research governance expectations we place on our researchers.

How can I be fair and transparent?

Being fair and transparent with research participants means respecting their rights and wishes, and ensuring their personal data is used in line with their expectations.  The GDPR requires that the information provided should be concise and easy to understand. If you want to retain information you should state the reason and allow the participant to make that judgement.

Organisations should also display corporate level privacy information about their research in locations where it will be noticed, for example links on website homepages and in waiting rooms. Linking this to your information sheets is a good way of ensuring participants are aware of our institutional role in research.

The JRES is working on updating template documents such as protocol templates and information sheets, to ensure appropriate guidance is provided and considered during the development of our research.

My funder expects me to make my data openly available at the end of my project, the GDPR will prevent me from doing this

The GDPR does not preclude data sharing; it only requires that data is shared responsibly and robustly. This has always been the case with data sharing. The GDPR only covers data that personally identifies a living person. Research that does not involve personal data is not covered under the GDPR and can be shared. The legislation also does not cover data that has been appropriately anonymised according to the ICO’s Anonymisation Code. This is what the ICO calls de-identified data for publication. There are also options to share de-identified data for limited disclosure or access. The ICO Anonymisation Code covers different forms of data publication and the Research Data Management Service is available to discuss your options.

A participant has requested to withdraw from the study but my data has already been anonymised and analysed; I have to start all over

In exceptional circumstances, research data are exempt from a participant’s right to erasure where erasure is “likely to render impossible or seriously impair the achievement of the objectives of that processing” (Article 17(3)(d)). So you can continue to use this data in some circumstances. For data that has already been thoroughly anonymised, the GDPR does not apply.

The responsibility for GDPR compliance falls solely on project teams

The responsibility for compliance is corporate, that is, the organisation is accountable to the ICO, so it is important that researchers do not make decisions about legal compliance alone.

For St George’s University initiated research, we will usually be the data controller. This means we are responsible for outlining what data needs to be collected, why and how it is to be used/managed. For studies we collaborate in (where we are not the lead) we may be the data processor. In this instance, we are being directed on the data requirements and management.

If you are in doubt about which role applies to your study, you should check. This is particularly important if a research participant asks you about their personal data rights.

 

We hope this post has helped you to get better acquainted with how the new legislation will affect our research activities. With regards to health and social care research, the GDPR maintains existing best practice and we should use this opportunity to evaluate our systems and procedures to ensure that we are indeed engaging in good practice.

Queries about the GDPR not covered here can be emailed to dataprotection@sgul.ac.uk.



Get a Unique Researcher ID for Free and Help Identify Your Research Outputs

What is ORCID?


ORCID stands for Open Researcher and Contributor ID

  • Creating an ID is free
  • The ORCID registry is maintained by a not for profit organization, funded through organizational membership and subscription fees
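An ORCID iD is written as 16 characters in four hyphen-separated blocks, and the last character is a check character computed from the first 15 digits using the ISO 7064 MOD 11-2 scheme. The sketch below shows how a mistyped iD could be caught before it is stored in a publication record; the function names are our own, and the example iD is the sample one used in ORCID's own documentation.

```python
def orcid_check_character(base_digits):
    """Check character for the first 15 digits of an ORCID iD (ISO 7064 MOD 11-2)."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """True if the iD has 16 characters and its final character matches the checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if not all(c in "0123456789" for c in compact[:15]):
        return False
    return orcid_check_character(compact[:15]) == compact[15]

print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD from ORCID's documentation
print(is_valid_orcid("0000-0002-1825-0098"))  # last character changed
```

A check like this only catches typing errors; it does not confirm that the iD is registered or that it belongs to a particular researcher.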

Why should I get one?

You can create a unique, persistent identifier which you can use to better identify yourself with your research outputs, such as publications and data sets.

  • It links you together with all your publications, whatever version of your name they are published under. That means if you change your name, or a different variation of it is used (eg middle name or initial), your publications will still be linked to your identity and will be collected in your ORCID record. And, what’s more, you can continue to use the same ID when you change organisations.
  • It’s also useful for clarifying which publications aren’t yours but have been published by someone with the same name – especially helpful if there’s someone with a similar name in the same field or the same organization as you.
  • It can link to many different types of research outputs, including datasets and software, as well as journal articles, meaning that you can easily get credit for all your published work.
  • ORCID integrates with a variety of other systems, such as funder applications and publisher manuscript systems, saving you from having to enter the same information over and over again (see the section Who can see the information? below to find out how this works). Some actually require ORCID IDs, such as the Wellcome Trust’s grant applications system (and here’s some more on why they made that choice).


ORCID and REF

The recent REF 2021: Decisions on staff and outputs says “The funding bodies consider that the benefits offered by persistent staff identifiers are significant, in terms of increased efficiency, transparency and interoperability in the research data landscape.” While not mandated for REF 2021, ORCIDs look likely to be required for future funding assessments, and HEFCE “strongly encourage” an ORCID ID to be provided for Category A submitted staff in REF 2021.

ORCID and CRIS

There will be some exciting developments with SGUL’s CRIS later this year when the CRIS is upgraded. If you have an ORCID ID, CRIS will retrieve records from data sources that have the ORCID ID in their metadata (such as Europe PubMed Central, PubMed, Web of Science). Once you have confirmed that the ORCID ID is yours, CRIS will retrieve any future records from those data sources with that ORCID ID in their metadata, and automatically add the records into your publications list.

How do I get an ORCID?

If you haven’t already got one, go to the ORCID website and click “Register now”. You can add your professional information and any other identifiers you might have to your account.

Who can see the information?

  • You control the content in your ORCID record and who can see it
  • There are three visibility settings: everyone, trusted parties, or only me. Visibility of items can be set individually. For more information see Visibility settings
  • If you are happy to have the information visible to anyone, you can set visibility to ‘everyone’.
  • This means the profile will be visible via the orcid.org website, and importantly can be searched for via the API, which means the data can be reused.
  • If you want to be able to let the data update across systems that are registered/integrated to use ORCID data, then set it to ‘trusted parties’
  • You can register your ORCID record with Research Fish, and this will enable you to add publications in your Research Fish portfolio to your ORCID record (so if it is in Research Fish, it will be included then in ORCID). Also you can use the publications search in Research Fish to fetch publications from ORCID and add them to your Research Fish portfolio.
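For readers curious about what reuse via the API can look like, here is a minimal sketch that prepares a request for the public part of a record. It assumes ORCID's public API is served from pub.orcid.org at version 3.0 and returns JSON when asked to; please check ORCID's current API documentation before relying on either. The function name is our own, and the iD is the sample one from ORCID's documentation.

```python
import urllib.request

# Assumption: base address and version of ORCID's public API.
PUBLIC_API = "https://pub.orcid.org/v3.0"

def build_record_request(orcid):
    """Prepare (but do not send) a request for the public part of an ORCID record."""
    return urllib.request.Request(
        f"{PUBLIC_API}/{orcid}/record",
        headers={"Accept": "application/json"},
    )

request = build_record_request("0000-0002-1825-0097")
print(request.full_url)

# Sending the request needs network access, so it is left commented out:
# import json
# with urllib.request.urlopen(request) as response:
#     data = json.load(response)
```

Only items whose visibility is set to ‘everyone’ would be expected to appear in such a response, which is why the visibility settings above matter for reuse.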

Useful links:

Building your ORCID record and connecting your iD

ResearcherID & ORCID Integration – how to associate ORCID with ResearcherID

EPMC: How do I link my articles to my ORCID?

 

Jennifer Hughes, Research Publications Assistant

Jennifer Smith, Research Publications Librarian

Contact: openaccess@sgul.ac.uk



St George’s announces new research data repository

The Research Data Management Service has launched a research data repository for use by St George’s researchers, including our doctoral student researchers.


Powered by figshare, the repository is the first phase of a pilot project to develop a shared research data management infrastructure for UK higher education. The pilot is headed by Jisc, and St George’s is proud to be one of just 13 higher education organisations included in the project. More information about this can be found on the project website.

The SGUL data repository is a digital archive for sharing, storing and preserving research content produced at St George’s. It was acquired to enable our researchers to better engage in Open Science and to respond to funder and publisher requirements for data sharing and preservation.

Researchers can use the repository to share research data, source code, posters, PowerPoint presentations, images, videos, electronic lab notebooks and a range of other digital research outputs. The repository can also be used to catalogue and link to items that are already in the public domain, but are difficult to discover, cite and measure for impact. Each deposit in the repository is provided with a persistent identifier, which allows items to be uniquely identified, cited and measured for impact.

All items deposited with us will be preserved for the lifetime of the repository.

Depositing to the repository is easy. All research staff and doctoral students are automatically registered for the service. Just log in to the repository using your institutional credentials and deposit your items following figshare’s normal deposit procedures. All deposits will be checked by a member of the research data management team before your research is published, giving you added peace of mind.

It is advisable to contact the Research Data Management Service if you intend to deposit your data in the repository to avoid any delay in publishing your research.

