Active RESEARCH GRANT UKRI Gateway to Research

Illuminating the dark metabolome: de novo identification of small molecules from their mass spectra using transformer-based deep learning

£6.67M GBP

Funder Biotechnology and Biological Sciences Research Council
Recipient Organization University of Liverpool
Country United Kingdom
Start Date Nov 01, 2024
End Date Oct 30, 2027
Duration 1,093 days
Number of Grantees 4
Roles Co-Investigator; Principal Investigator
Data Source UKRI Gateway to Research
Grant ID BB/Y009258/1
Grant Description

The metabolic activities of living cells and organisms lead to the production of many thousands of different molecules. To analyse and quantify them, scientists use methods that separate them in special tubes (known as chromatography columns) and then determine their nature by giving them an electric charge and fragmenting them in the gas phase, then measuring the masses (strictly the mass-to-charge ratios) of the fragments.

These 'fragment fingerprints', known as mass spectra, may be compared with those of known molecules stored in databases, and thereby used to identify the molecules. The big problem here is that most of the mass spectra generated bear little or no relation to the comparatively few molecules (relative to all plausible molecules) that ARE in the databases.
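As a rough illustration of the database comparison described above, the following minimal sketch scores a query spectrum against a small library using cosine similarity on binned peaks. This is one common scoring choice, not necessarily the one used in this project; the function names, the bin width, and the example peak lists are all hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks that fall into the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity between two spectra after binning (0 = no overlap, 1 = same shape)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library, bin_width=1.0):
    """Return (score, name) of the best-scoring library entry, or (0.0, None) if the library is empty."""
    scored = [
        (cosine_similarity(query, peaks, bin_width), name)
        for name, peaks in library.items()
    ]
    return max(scored) if scored else (0.0, None)


# Hypothetical toy data: two library entries and one query spectrum.
library = {
    "compound_a": [(50.0, 10.0), (77.0, 100.0)],
    "compound_b": [(120.0, 5.0), (121.0, 50.0)],
}
query = [(50.0, 10.0), (77.0, 100.0)]
score, name = best_library_match(query, library)
```

A query whose fragments fall in none of the library's bins scores 0.0 against every entry, which is the situation the paragraph above describes for most observed spectra.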

What is therefore needed is a method that allows one to propose a structure from the mass spectra 'de novo', i.e. without recourse to databases of experimental mass spectra.

Although the number of experimental mass spectra is small, given a molecular structure it is possible to fragment it inside a computer to produce all (or a sensible subset) of the fragments that it COULD create. The ZINC database contains more than 10 billion molecular structures that obey chemical rules.
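To make the idea of fragmenting a structure 'inside a computer' concrete, here is a deliberately simplified sketch. It treats a molecule as a graph of atom groups, breaks one bond at a time, and records the masses of the resulting pieces. Real in silico fragmenters also handle charge, hydrogen rearrangements, multiple bond breaks and intensities, none of which is modelled here; the function names and the ethanol encoding are assumptions made purely for illustration.

```python
def connected_components(n_nodes, bonds):
    """Return the connected components (as sets of node indices) of an undirected graph."""
    adjacency = {i: set() for i in range(n_nodes)}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, parts = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            node = stack.pop()
            if node in part:
                continue
            part.add(node)
            stack.extend(adjacency[node] - part)
        seen |= part
        parts.append(part)
    return parts


def single_bond_fragment_masses(group_masses, bonds):
    """Break each bond in turn and collect the neutral masses of the pieces.

    Bonds inside a ring do not split the graph when broken alone, so they are skipped.
    """
    masses = set()
    for i in range(len(bonds)):
        remaining = bonds[:i] + bonds[i + 1:]
        parts = connected_components(len(group_masses), remaining)
        if len(parts) < 2:
            continue  # ring bond: the molecule stays in one piece
        for part in parts:
            masses.add(round(sum(group_masses[j] for j in part), 3))
    return sorted(masses)


# Hypothetical encoding of ethanol as three groups, CH3-CH2-OH, using
# monoisotopic masses (C = 12.000000, H = 1.007825, O = 15.994915).
ethanol_groups = [15.023475, 14.015650, 17.002740]
ethanol_bonds = [(0, 1), (1, 2)]
fragment_masses = single_bond_fragment_masses(ethanol_groups, ethanol_bonds)
```

Applied across a large structure collection, a procedure of this general kind yields paired (structure, simulated spectrum) examples without needing any experimental measurement.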

Modern methods of 'deep learning' or 'generative artificial intelligence (AI)' allow one to relate paired 'in silico' (computer-generated) mass spectra with the structures that 'caused' them, and in an earlier study we used just such a method, known as a 'transformer', trained with some 21 million computer-generated mass spectra, to learn the mass-spectrum-to-structure mapping. This transformer consisted of a neural network with some 400 million nodes, and could indeed generalise to predict the structures of molecules on which it had not been trained.

Although this was, for 2020 (when the work was performed), a very large network - three years earlier it would have been the largest ever published by anyone, including the likes of Google, Facebook and Amazon - it was nowhere near the kinds of network size that were even then being published (e.g. Google Switch > 1 trillion nodes - Hutson, M. (2021) The language machines. Nature. 591, 22-25).

Since it is well known (as 'scaling laws') that bigger networks can in effect learn more, the first requirement of this project is to increase the size of both the dataset used to train the network and the network itself, and to see how much this improves generalisation.

A variety of other strategies will also be tried to improve the ability of our new network to generalise to most of the biologically relevant chemical space. These include changing the representation of the structure of the small molecules given to the computer, removing nodes that do little or nothing, changing the architecture of the transformer, and 'fine tuning' the transformer by training it additionally not only with computer-generated mass spectra but with composite mass spectra obtained experimentally using a variety of instruments that we already possess.

The result will potentially be a solution to the biggest problem besetting those who study metabolism in any organism - the fact that they cannot even identify the molecules that they can observe, and which can be seen to be intimately involved in the processes of interest.

All Grantees

University of Liverpool
