Background
The CoDiet project focusses on evaluating the optimal technologies and tools to monitor diet in different populations, with the aim of using these to combat non-communicable disease through dietary interventions. The first work package of the EU HORIZON- and UKRI-funded project involves developing and using natural language processing (NLP) tools to speed up the literature review. It is led by Joram Posma (Section of Bioinformatics, Department of Metabolism, Digestion and Reproduction), Marek Rei (Department of Computing) and Tim Beck (School of Medicine, University of Nottingham).
With tens of thousands of scientific articles to read, the team is using NLP to find the important entities in each article, the links between them and the context of each study. The final aim is to create a dedicated knowledge graph to generate new testable hypotheses for use by the other work packages of CoDiet, including a clinical trial conducted in four countries and led by Gary Frost (Section of Nutrition, Department of Metabolism, Digestion and Reproduction). Before this can be done reliably, the machine-based annotations need to be checked for accuracy by human experts, both to find the best methods and to enable postdoc Antoine Lain (Section of Bioinformatics, Department of Metabolism, Digestion and Reproduction) to develop better and faster algorithms. This task will see 1,000 full-text scientific articles read by a minimum of two human experts, with arbitration in cases where humans disagree with each other or with the machine annotation. As part of an ongoing collaboration, the Posma and Beck research groups have been using local installations of the NCBI TeamTat application with small groups of annotators. The challenge of the CoDiet annotation task, however, is that it involves over 40 annotators and arbiters who need to be able to work collaboratively from different institutions.
Our Contribution
We deployed a production-ready version of TeamTat that the CoDiet project can use to perform all of these annotations. We chose Azure as the deployment platform, rather than an on-premises virtual machine as in other projects, so that the tool can be scaled in the future should the need arise to accommodate all the required collaborators. This deployment also enables researchers to update the tool to include any additional customisations they want.
One of these customisations, carried out by the RSE team before going to production, was to enable TeamTat to operate with the newest MySQL version, as the recommended version of MySQL was no longer maintained.
Outcomes
TeamTat was set up to run as an Azure web app and has been running continuously since early November. This deployment has already been stress-tested in a workshop with 25 concurrent users and performed to the satisfaction of the researchers.
Testimonials
Joram Posma, CoDiet work package 1 leader, said:
“For this project we are going into unknown territories to use software only ever tested with single digit numbers of annotators. Our first trial of the TeamTat application the RSE team built involved over 25 people accessing the system and annotating documents at the same time. It all worked without a single hitch on the Azure-cloud platform that was created specifically for this project.”
Tim Beck, work package 1 co-lead and Nottingham PI for CoDiet, said:
“This will create a unique dataset that will be shared with the wider academic community according to FAIR data principles, fostering further development of new methods that can be implemented within the CoDiet and other future projects.”
Antoine Lain, CoDiet postdoc leading on named-entity recognition, said:
“This platform will allow us to assess the accuracy of our pre-trained AI algorithms, evaluate where the errors are coming from and fine-tune the algorithms with much more reliable data than is currently available.”
Georgios Theodoridis, CoDiet PI for Aristotle University of Thessaloniki and one of the expert annotators who had previously used manual, labour-intensive systems for text annotation, said during the demo session:
“Very advanced, I love it.”