AI Health Chatbots “Risky”

A large study has warned of the risks associated with chatbots giving medical advice, due to their tendency to provide inaccurate and inconsistent information.

The study, out of Oxford University,¹ found that while large language models (LLMs) now excel at standardised tests of medical knowledge, they present risks to real users seeking help with their own medical symptoms.


In the study, participants used LLMs to identify health conditions and decide on an appropriate course of action, such as seeing a doctor, or going to the hospital, based on information provided in a series of specific medical scenarios.

The decisions made by people using LLMs were no better than those made by people who relied on traditional methods, such as online searches or their own judgement.

The study revealed a two-way communication breakdown. Participants often didn’t know what information the LLMs needed to offer accurate advice, and the responses they received frequently combined good and poor recommendations, making it difficult to identify the best course of action.

The lead medical practitioner on the study, Dr Rebecca Payne, said, “despite all the hype, AI (artificial intelligence) just isn’t ready to take on the role of the physician”.

“Patients need to be aware that asking a large language model about their symptoms can be dangerous, giving wrong diagnoses and failing to recognise when urgent help is needed,” she said.

“These findings highlight the difficulty of building AI systems that can genuinely support people in sensitive, high-stakes areas like health,” Dr Payne said.²

Senior author Associate Professor Adam Mahdi said the “disconnect between benchmark scores and real-world performance should be a wake-up call for AI developers and regulators”.

“We cannot rely on standardised tests alone to determine if these systems are safe for public use. Just as we require clinical trials for new medications, AI systems need rigorous testing with diverse, real users to understand their true capabilities in high-stakes settings like healthcare.”²

The study was published in Nature Medicine.¹

References

1. Bean AM, Payne RE, Mahdi A, et al. Reliability of LLMs as medical assistants for the general public: a randomized preregistered study. Nat Med. 2026. doi: 10.1038/s41591-025-04074-y.
2. University of Oxford. New study warns of risks in AI chatbots giving medical advice. News, 10 Feb 2026. Available at: ox.ac.uk/news/2026-02-10-new-study-warns-risks-ai-chatbots-giving-medical-advice [accessed Feb 2026].
