AI passes the US medical licensing exam

Two artificial intelligence (AI) programs — including ChatGPT — have successfully passed the United States Medical Licensing Examination (USMLE), according to two recent research papers.

The papers highlighted different ways of using large language models to take the USMLE, which consists of three exams: Step 1, Step 2 CK, and Step 3.

ChatGPT is an artificial intelligence (AI) chatbot that generates human-like text in response to prompts from users. Developed by OpenAI, it gained popularity after several social media posts showed potential uses for the tool in clinical practice, often with mixed results.

The first paper, published on medRxiv in December, examined ChatGPT's performance on the USMLE without any special training or reinforcement before the exams. According to Victor Tseng, MD, of Ansible Health in Mountain View, California, and colleagues, the results provide "new and surprising evidence" that this AI tool is up to the challenge.

Tseng and his team noted that ChatGPT performed with over 50% accuracy across all exams, exceeding 60% in most of their analyses. While the threshold for passing the USMLE varies from year to year, the authors said it is approximately 60% in most years.

“ChatGPT performed at or near the passing threshold for all three exams without any specialized training or reinforcement,” they wrote, noting that the tool demonstrated “a high level of concordance and insight in its explanations.”

They concluded, “These findings suggest that large language models may have the potential to aid in medical education, and possibly clinical decision-making.”

The second paper, published on arXiv, also in December, evaluated the performance of another large language model, Flan-PaLM, on the USMLE. The main difference between the two models was that Flan-PaLM was heavily adapted for the exam, using a set of medical question-answering datasets called MultiMedQA, explained Vivek Natarajan, an AI researcher, and colleagues.

Flan-PaLM achieved 67.6% accuracy on USMLE questions, about 17 percentage points higher than the previous best performance, which was achieved with PubMed GPT.

Large language models, Natarajan and his team concluded, “present an important opportunity to rethink the development of medical AI and make it easier, safer, and more equitable to use.”

ChatGPT, along with other AI software, has appeared as a subject—and sometimes as a co-author—of new research papers focused on testing the technology’s usefulness in medicine.

Of course, healthcare professionals have also expressed concerns about these developments, especially when ChatGPT is listed as an author on research papers. A recent article in Nature highlighted the unease of potential colleagues and co-authors with the emerging technology.

One objection to the use of AI programs in research centered on whether they are truly capable of making meaningful scientific contributions to a paper, while another held that AI tools cannot consent to co-authorship in the first place.

The editor of one journal that listed ChatGPT as an author said it was an error that would be corrected, according to the Nature article. Nevertheless, researchers have published several papers promoting these AI programs as useful tools in medical education, research, and even clinical decision-making.

Large language models could become a useful tool in medicine, Natarajan and colleagues concluded in their paper, but their chief hope was that their findings would “spark further conversations and collaborations among patients, consumers, AI researchers, clinicians, social scientists, ethicists, policymakers, and other interested parties in order to responsibly translate these early research findings to improve healthcare.”

Michael DePeau-Wilson is a reporter on MedPage Today's enterprise and investigative team. He covers psychiatry, long COVID, and infectious diseases, among other relevant U.S. clinical news.

Primary Source

medRxiv

Source Reference: Kung TH, et al "Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models" medRxiv 2022; DOI: 10.1101/2022.12.19.22283643.

Secondary Source

arXiv

Source Reference: Singhal K, et al "Large Language Models Encode Clinical Knowledge" arXiv 2022; DOI: 10.48550/arXiv.2212.13138.
