Jul 11, 2023

[How to Use GPT-4/BingChat for Drug Discovery and Other Research and Development: A White Paper (with Prompt Engineering)]


Background

This white paper was motivated by a July 5 article in the Hankyoreh ("Generative AI designs new drug in 46 days... enters phase 2 for the first time ever")[1], which I read as a non-expert in drug development. The article can be summarized as follows.

A drug candidate designed with artificial intelligence (AI) by the biotech company Insilico Medicine has entered Phase II clinical trials. The drug, a treatment for idiopathic pulmonary fibrosis, will be administered to 60 patients in China and the United States. Insilico developed the drug using a combination of generative AI and reinforcement learning, which reduced development cost to about one-tenth and development time to about one-third of a conventional program. The company currently has more than 30 AI drug development programs underway, three of which have entered clinical trials. This entry into Phase II is considered a major milestone for AI-driven drug development.

Building on this article, this paper explains how generative AI (ChatGPT, BingChat) can be put to good use in drug discovery. It walks step by step through applying prompt engineering techniques to derive concrete results.

 

This paper will help you understand why prompt engineering is necessary and how to apply it. It also shows how we uncovered hidden prompt directives that are very useful in drug discovery.

Here are the prompt directives (in the form of #hashtags) that we discovered while writing this paper. (*Note: The hashtag directives below should be run in BingChat rather than GPT-4 to achieve the desired results.)

  • #patent_search
  • #trend_analysis
  • #information_query
  • #information_detail
  • #scenario_write
  • #scenario_simulate
  • #content_generate
  • #expert_interview
  • #educational_content
  • #risk_assessment
  • And many others

These hashtags are useful not only for drug discovery but also for other research and development, and we encourage you to apply them to a variety of research areas. For example, among the hashtags above, "#patent_search" is a command to search for patent information.

A drug discovery researcher can enter "#patent_search: rituxim" in BingChat, or ask, "Show me the latest patents for rituxim." However, these two prompts behave differently.

The "#patent_search: rituxim" command uses a hashtag directive to search for patent information. It searches for domestic and international patents with the keyword rituxim and returns information such as the patent title, application number, filing date, inventors, and summary.

"Show me the latest patents for rituxim" is a typical question to retrieve patent information: it performs a web search with the keyword rituxim, finds patent-related sites or documents in the search results, and provides the information.
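The hashtag form is just structured text typed into the chat box, so both prompt styles above can be produced by a small helper. The sketch below is a hypothetical illustration. It does not call BingChat or any other service; it only builds the prompt strings you would paste in. The function names `directive_prompt` and `plain_patent_prompt` are made up for this example. The allowed directive names simply mirror the list above, so extend the set for the "many others."

```python
# Hypothetical helper: it only builds prompt text to paste into BingChat;
# it does not call BingChat or any other service.

# Directive names mirror the list above; extend this set for other directives.
KNOWN_DIRECTIVES = {
    "patent_search", "trend_analysis", "information_query", "information_detail",
    "scenario_write", "scenario_simulate", "content_generate",
    "expert_interview", "educational_content", "risk_assessment",
}


def directive_prompt(directive: str, query: str) -> str:
    """Return a '#directive: query' prompt, rejecting unknown directive names."""
    name = directive.lstrip("#").strip()
    if name not in KNOWN_DIRECTIVES:
        raise ValueError(f"unknown directive: {directive!r}")
    return f"#{name}: {query.strip()}"


def plain_patent_prompt(query: str) -> str:
    """Return the equivalent natural-language patent request."""
    return f"Show me the latest patents for {query.strip()}"


if __name__ == "__main__":
    print(directive_prompt("patent_search", "rituxim"))  # #patent_search: rituxim
    print(plain_patent_prompt("rituxim"))  # Show me the latest patents for rituxim
```

With both builders side by side, you can send the same query in both forms and compare BingChat's answers.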

Drug discovery is an important field that improves human health and quality of life. However, drug development is an extremely difficult, time-consuming, and costly process. Drug candidates need to be discovered, validated for efficacy and safety, tested in clinical trials, and approved before they can be brought to market. There is a lot of failure and waste along the way, and the probability of success in drug development is very low.

To address these challenges, artificial intelligence (AI) technologies are increasingly being used in drug discovery. Generative AI, in particular, is an AI technology that generates new data by learning from existing data, and can be used to design drug candidates, predict their efficacy and safety, and even simulate clinical trial results. A typical example of generative AI used in drug discovery is the Generative Adversarial Network (GAN).  

Generative Adversarial Networks (GANs) and GPT-4 are both deep learning models, a branch of artificial intelligence, but they differ in their purpose and how they work. Both models can be utilized in drug discovery, but the way and context in which they are used is different.

1. Generative Adversarial Networks (GANs): GANs are generative models in which two neural networks, a generator and a discriminator, learn by competing with each other. The generator tries to create fake data that resembles real data, and the discriminator tries to determine whether a given sample is real or was produced by the generator. Through this competition, the generator gradually learns to produce data that is hard to distinguish from real data, which is why GANs are used to generate new images, speech, and more.

GANs are often used in the molecular design phase of drug development, where they can generate new molecular structures and thereby help find new drug candidates. The generator creates new molecules that resemble real molecules, and the discriminator judges whether each generated molecule looks like a real one. In this way, GANs can be used to explore and generate new molecular structures in drug discovery.

2. GPT-4: GPT-4 is a natural language processing (NLP) model that focuses on understanding and generating text. Trained on large amounts of text data, GPT-4 can understand context, generate appropriate text for a given input, answer questions, translate text, and more. GPT-4 is based on the Transformer architecture, which is designed to process all words in an input sentence simultaneously to better understand context. NLP models like GPT-4 can support other aspects of drug discovery. For example, they can analyze large amounts of medical text, which can help researchers review findings, interpret clinical trial results, or search and summarize the medical literature.
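The idea of processing all words simultaneously refers to self-attention. Below is a minimal sketch of single-head, scaled dot-product self-attention in plain Python, the mechanism from the original Transformer paper. It is a toy under stated assumptions, not GPT-4's implementation, whose internal details have not been published. The matrices are tiny and hand-picked. The optional `causal` flag shows the masking used by GPT-style models, where each token attends only to itself and earlier tokens.

```python
import math


def softmax(xs):
    """Numerically stable softmax; -inf entries receive zero weight."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v, causal=False):
    """Single-head scaled dot-product self-attention.

    x: n x d_model token vectors; w_q, w_k, w_v: d_model x d_k projections.
    All n tokens are scored against each other in one matrix product.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(w_k[0])
    scores = matmul(q, transpose(k))  # n x n token-to-token scores
    if causal:
        # GPT-style mask: token i may not look at later tokens j > i.
        scores = [[s if j <= i else -math.inf for j, s in enumerate(row)]
                  for i, row in enumerate(scores)]
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d "word" vectors
    eye = [[1.0, 0.0], [0.0, 1.0]]                  # identity projections
    out, w = self_attention(tokens, eye, eye, eye, causal=True)
    for row in w:
        print([round(p, 3) for p in row])
```

In the printed causal weights, each row sums to 1, and the entries above the diagonal are zero because the mask blocks future tokens.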

So the main difference is that GANs are used to generate various types of data, such as images and speech, while GPT-4 is primarily used to process and generate text. They also learn differently: a GAN trains two neural networks that compete against each other, whereas GPT-4 is trained on a large amount of text data to understand context and generate text.
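The contrast above can be written as two training objectives. These are the standard textbook forms, shown for orientation only. The first is the GAN minimax game from the original 2014 formulation. Here D(x) is the discriminator's probability that x is real, and G maps random noise z to a generated sample. The second is the next-token (autoregressive) language-modeling loss that GPT-style models are pretrained with. GPT-4's full training also includes fine-tuning, such as reinforcement learning from human feedback, which this loss does not capture.

```latex
\begin{aligned}
\text{GAN:}\quad & \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big] \\
\text{GPT pretraining:}\quad & \min_\theta \; -\sum_{t=1}^{T} \log p_\theta\big(x_t \mid x_{<t}\big)
\end{aligned}
```

In the first line, G and D improve against each other. In the second, a single model improves by predicting each token x_t from the tokens before it.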

GANs and GPT-4 can be utilized in different ways at different stages of drug development. By leveraging their respective strengths, these two models can contribute to improving and accelerating the drug discovery process.

This paper describes the process of applying prompt engineering techniques to analyze how GPT-4 can be utilized in the drug discovery process.

 

In order to utilize generative AI for drug discovery, prompt engineering techniques are required. Prompt engineering is the art of providing an AI model with the right inputs (prompts) to achieve a desired outcome. Prompt engineering can help you increase the performance and efficiency of your AI model, tailor your AI model to your desired purpose, and reduce the limitations and risks of your AI model.
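As a concrete illustration of "the right inputs," one widely used prompt engineering pattern states a role, a task, supporting context, and the desired output format explicitly. The sketch below assembles such a prompt. The `PromptSpec` class and its field names are hypothetical choices for this example, not the template used in this paper, and the drug-discovery wording is only a sample.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical prompt structure: role, task, context, and output format."""
    role: str
    task: str
    context: list[str] = field(default_factory=list)
    output_format: str = ""

    def render(self) -> str:
        """Join the non-empty parts into one prompt, separated by blank lines."""
        parts = [f"You are {self.role}.", f"Task: {self.task}"]
        if self.context:
            parts.append("Context:\n" + "\n".join(f"- {c}" for c in self.context))
        if self.output_format:
            parts.append(f"Output format: {self.output_format}")
        return "\n\n".join(parts)


if __name__ == "__main__":
    spec = PromptSpec(
        role="a research assistant for a drug discovery team",
        task="Summarize how generative AI is being used to design drug candidates.",
        context=["The reader is not a drug discovery expert."],
        output_format="five bullet points, one sentence each",
    )
    print(spec.render())
```

Separating the parts this way lets you change one element, such as the output format, while keeping the rest of the prompt fixed.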

I am a non-expert in the drug discovery field. Drawing on my experience with general prompt engineering, I have been studying how generative AI (ChatGPT, BingChat) can be used in drug development.


Much of the content in this whitepaper was generated by utilizing GPT-4 and BingChat as appropriate.


For more information, download the PDF file here 


-------------------------------------------

Published book: Mastering ChatGPT-4 Prompt for Writers (Author: Charly Choi)