Accelerating Small Compound Analysis with Vyasa and MegaMolBART

The development of a new drug takes an average of 10 years, with the discovery and preclinical stages taking three to four of those years. Considering that only 1 in 5,000 drug candidates results in a commercial product, it’s astonishing how much time and how many resources pharmaceutical companies invest for relatively low success rates.

These investments are critical to the success of new therapies, from life-saving drugs to daily remedies. However, what if pharmaceutical companies could predict drug viability more accurately, starting at the molecular level?

Transformers Enter Molecule Generation 

Some of the keys to early-stage research are reviewing scientific literature at scale and identifying relevant small molecules, or hits, that relate to a therapeutic target. Traditionally, these tasks have been manual, with researchers reading through the literature themselves and conducting real-world experiments.

Deep learning in the form of transformer-based models has revolutionized how researchers approach complex texts and scientific literature through advanced natural language processing. Transformers’ ability to understand the context around language structures through self-attention has enabled researchers to accelerate analysis and insight discovery with high accuracy.

While transformers have been applied to large-scale sets of unstructured text, the same approach can be applied to the structures of drug-like molecules, as represented in the “Simplified Molecular Input Line Entry System” (SMILES) format, to enhance our understanding of molecules. SMILES writes a molecule as a short text string; ethanol, for example, is written as CCO.
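To make the idea of treating molecules as text more concrete, here is a minimal sketch of how a SMILES string might be split into tokens before a transformer sees it. The regular expression and the function name `tokenize_smiles` are illustrative assumptions, not MegaMolBART's actual tokenizer.

```python
import re
from typing import List

# Illustrative token pattern (an assumption, not MegaMolBART's tokenizer):
# bracket atoms, two-letter halogens, two-digit ring closures,
# then any other single character (atoms, bonds, branches, ring digits).
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into a flat list of tokens."""
    return SMILES_TOKEN.findall(smiles)


# Aspirin written as SMILES.
aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize_smiles(aspirin)
print(tokens)
```

Each token then plays the role a word or subword plays in a text model, which is what lets self-attention relate distant parts of the same molecule, such as the two ends of a ring.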

MegaMolBART, a transformer model that leverages the NVIDIA BioNeMo framework, is being used to accelerate generative chemistry. Rather than training on large sets of unstructured text, it applies the same self-attention to SMILES libraries to understand and predict each individual string and the molecule it represents. As a result, MegaMolBART can identify relationships in SMILES strings, interpolate between molecules and fuel de novo molecule generation.

Combining transformer-based literature analysis with transformer-based molecule generation means research institutions and pharmaceutical companies can revolutionize how they approach early-stage drug development, gaining greater knowledge and making more informed decisions when determining leads and trial design.

The MegaMolBART model, which is a part of the NVIDIA Clara Discovery platform, understands chemistry and can be used for a variety of cheminformatics applications in drug discovery. The embeddings from its encoder can be used as features for predictive models. Alternatively, the encoder and decoder can be used together to generate novel molecules by sampling the model’s embedding space.
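The generative use described above comes down to simple vector arithmetic on embeddings. The sketch below shows only that arithmetic: interpolating between two embeddings and sampling near one. The toy three-dimensional vectors and the helper names `interpolate` and `sample_near` are assumptions for illustration; in practice the vectors would come from the model's encoder and each resulting point would be passed to its decoder, neither of which is shown here.

```python
import random
from typing import List

Vector = List[float]


def interpolate(a: Vector, b: Vector, steps: int) -> List[Vector]:
    """Return `steps` evenly spaced points from a to b, including both ends."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at a, 1.0 at b
        points.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return points


def sample_near(center: Vector, radius: float, rng: random.Random) -> Vector:
    """Perturb an embedding with Gaussian noise of standard deviation `radius`."""
    return [x + rng.gauss(0.0, radius) for x in center]


# Made-up embeddings standing in for encoder outputs of two molecules.
emb_a = [0.0, 1.0, 2.0]
emb_b = [1.0, 3.0, 0.0]

path = interpolate(emb_a, emb_b, steps=5)
neighbor = sample_near(emb_a, radius=0.1, rng=random.Random(0))
```

Decoding the intermediate points in `path` would yield candidate molecules that sit "between" the two inputs, while decoding `neighbor` would yield a candidate close to the first one.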

A Unified Tool for Drug Discovery 

Transformer-based deep learning models have defined the art of the possible for life science and pharmaceutical companies. However, what’s often missing are the tools to bring this data together and make these models approachable.

At Vyasa, we’ve developed a novel approach for life science and pharmaceutical companies to streamline drug discovery through our Layar intelligent data fabric. By applying deep learning across an organization’s data landscape, we’re able to create an AI-powered data fabric that allows our customers to integrate and analyze their data, regardless of its storage location or file structure. These advanced deep learning models, which can be swapped in and out of the Layar data fabric given its component architecture, are used via a suite of application interfaces that accelerate insight discovery across structured, unstructured and semi-structured content. This enables users to make informed decisions for life science R&D, including the analysis of drug-like compounds.

De Novo Discovery with Vyasa and MegaMolBART  

As an independent software vendor (ISV) partner in the NVIDIA Partner Network (NPN) and an NVIDIA Inception member, we’ve worked with NVIDIA to test and leverage advanced technologies including transformer-based deep learning models. Recently, we were given the opportunity to work with MegaMolBART to apply the model as an analytical module within our Layar data fabric.

Our data science team focused on two use cases as part of our work with MegaMolBART: identifying similar SMILES strings and interpolating SMILES between different molecules. As a pre-trained model, MegaMolBART could be seamlessly deployed within Vyasa, allowing the team to quickly run analytics tasks and test outcomes.
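One common way to approach the first use case, finding similar SMILES strings, is to compare molecule embeddings by cosine similarity. The following is a minimal sketch under that assumption and does not describe how Layar implements it; the embedding values and the helper name `most_similar` are made up for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query: List[float], library: Dict[str, List[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank library molecules by similarity to the query embedding."""
    scored = [(smiles, cosine_similarity(query, emb)) for smiles, emb in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical embeddings keyed by SMILES string (values are made up).
library = {
    "CCO": [0.9, 0.1, 0.0],
    "CCN": [0.8, 0.2, 0.1],
    "c1ccccc1": [0.0, 0.1, 0.9],
}

hits = most_similar([1.0, 0.0, 0.0], library, top_k=2)
for smiles, score in hits:
    print(smiles, round(score, 3))
```

A linear scan like this is fine for small libraries; larger collections would typically use an approximate nearest-neighbor index instead.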

MegaMolBART provided superior yields compared with the similar models we tested, and we immediately identified its value. Most notably, MegaMolBART provided:

  • A significant increase in relevant molecule generation, in line with NVIDIA’s reported validity score of 0.989.
  • A higher frequency of unique and novel molecules.
  • Rapid results with GPU-enabled compute.
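Validity, uniqueness and novelty are usually reported as simple ratios over a batch of generated SMILES strings. The sketch below uses one common set of definitions, which may differ from those behind the figures above. The `is_valid` check is passed in as a parameter; the `balanced_parentheses` stand-in is a toy assumption used only to keep the example self-contained, and a real pipeline would use a cheminformatics toolkit's SMILES parser instead.

```python
from typing import Callable, Dict, Iterable, Set


def generation_metrics(
    generated: Iterable[str],
    training_set: Set[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness and novelty for generated SMILES strings."""
    generated = list(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - training_set
    return {
        # Share of all generated strings that pass the validity check.
        "validity": len(valid) / len(generated) if generated else 0.0,
        # Share of valid strings that are distinct from one another.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Share of distinct valid strings not seen in the training set.
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def balanced_parentheses(smiles: str) -> bool:
    """Toy validity check (an assumption): branch parentheses must balance."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


generated = ["CCO", "CCO", "CC(=O)O", "CC(C", "CCN"]
metrics = generation_metrics(generated, {"CCO"}, balanced_parentheses)
print(metrics)
```

This version compares raw strings, so two different SMILES spellings of the same molecule count as distinct; canonicalizing the strings first would avoid that.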

The implementation of MegaMolBART as an analytical module in the Layar data fabric offers tremendous benefits for organizations looking to leverage transformer-based deep learning models to fuel de novo molecule discovery.

The Next Age of Drug Discovery 

In an age where global health is driving demand for faster innovation around new drugs and therapies, the application of transformer-based models holds significant promise in streamlining early-stage R&D. By applying these models within intuitive, low-code applications, life science and pharmaceutical companies can improve productivity while reducing investment in low-probability programs, which leads to improved innovation and ultimately a healthier world.

Learn more about drug discovery at NVIDIA GTC, a free, global AI conference running online Sept. 19-22. 

A few can’t-miss talks include: 

1. Tuesday, Sept. 20, at 8:00 a.m. PT | NVIDIA founder and CEO Jensen Huang’s keynote

2. Tuesday, Sept. 20, at 11:00 a.m. PT | NVIDIA VP of Healthcare, Kimberly Powell, special address: The Rise of AI and Digital Twins in Healthcare

3. Wednesday, Sept. 21, at 11:00 a.m. PT | NVIDIA Global Healthcare AI Startups Lead, Renee Yao, panel: Accelerate Healthcare and Life Science Innovation with Makers and Breakers

4. Wednesday, Sept. 21, at 10:00 a.m. PT | NVIDIA Clara Discovery Product Manager, Abe Stern, talk: AI-Powered Drug Discovery for Generative Chemistry and Proteins [A41196]