In conversation with… How effective are large language models at giving diabetes advice?

The prevailing topic in the technology world is AI, and how we can get the most out of it. Many people with diabetes have started to use various Large Language and Large Reasoning models for diabetes advice. There’s even a diabetes-specific AI backed by Gary Scheiner. But just how good is the advice that these systems give? And are they safe to use to ease the shortage of access to human specialists and augment their abilities?

Here at Diabettech, I thought it was time to put them to the test. To do this, I’ve developed a protocol which I’ll share here, and it will be used to put a number of these models through their paces to see what comes back and how it differs between models.

The LLMs

Firstly, I’ve selected a number of common LLMs to get advice from. These all have varying strengths and weaknesses; the key point, however, is that they are widely available.

  • ChatGPT
  • Gemini
  • Copilot
  • VisionAI (an LLM study focused specifically on diabetes)
  • Grok
  • Claude
  • Perplexity
  • DeepSeek

One key difference is that to get anything out of some of them, payment is required. Vision, I’m looking at you.

Additionally, each “AI” has a set of models and methods available to it. I’ll ask them questions using the free tier model to find out what responses are provided to the general user.

Where I have an existing subscription, we’ll also test out the deep-reasoning type model to determine whether it’s better than the basic one, and see how it stacks up overall.

I also asked some diabetologists whether they’d be interested in participating to compare the outcomes from real humans with those of the models. They have yet to confirm whether they have the time to participate.

The questions

Instead of simply looking for advice, I have created a list of questions that each model will be asked, ranging from the very simple through to data analysis.

The data-analysis tasks will range from reviewing pictures of food in relation to meals, through to providing various diabetes data sets and seeing what advice comes back.

The questions won’t be described in this article, as the experiment has not yet run and publishing the list may affect the outcome.

Given that recent studies have shown a tendency to anthropomorphise these systems, I’ll also be polite and remember my pleases and thank-yous.
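For the curious, the protocol above can be sketched as a small harness: run the same fixed question list, in order, against each model and keep a transcript per model. This is a minimal illustration only; the `make_stub` function and the placeholder questions are assumptions of mine (the real question list is deliberately withheld), and each real vendor client would be wired in behind the same one-question-in, one-answer-out signature.

```python
# Hypothetical harness for the experiment described above.
# The "models" here are stubs, NOT real vendor APIs; a real run would
# replace make_stub() with a wrapper around each provider's client,
# using its free-tier default model.
from typing import Callable, Dict, List


def make_stub(model_name: str) -> Callable[[str], str]:
    """Stand-in for a real model client: takes a question, returns a reply."""
    def ask(question: str) -> str:
        return f"[{model_name}] response to: {question}"
    return ask


# One entry per model under test (subset shown for brevity).
MODELS: Dict[str, Callable[[str], str]] = {
    name: make_stub(name)
    for name in ["ChatGPT", "Gemini", "Copilot", "Claude"]
}

# Illustrative placeholders only; the real list is unpublished.
QUESTIONS: List[str] = [
    "Please explain what an insulin-to-carb ratio is. Thank you.",
    "Please review this two-week glucose summary. Thank you.",
]


def run_experiment() -> Dict[str, List[dict]]:
    """Ask every model every question, in order, keeping one transcript
    per model so each can be written up as its own article later."""
    transcripts: Dict[str, List[dict]] = {}
    for model_name, ask in MODELS.items():
        transcripts[model_name] = [
            {"question": q, "answer": ask(q)} for q in QUESTIONS
        ]
    return transcripts
```

The same-signature design means the harness doesn’t care which vendor sits behind each entry, which keeps the comparison between models like-for-like.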

Publishing the outcomes

I’ll treat each whole interaction as a conversation and will publish the output in an article. Each model will have an “In conversation with…” article to itself.

Articles won’t be published until I’ve asked all the models all the questions, again to try to avoid contaminating the results.

Finally, there will be a summary that reviews the responses that I received and tries to offer some guidance as to which of the models provided the most sensible, safe and effective results.

It’s not abundantly clear which will perform best, or what will happen, but I expect the results to be interesting.

What do you think will happen?

3 Comments

  1. I’ve already tried a few (ChatGPT and Gemini) to analyse the 2-week PDF report generated by my Hybrid Closed Loop system, CamAPS. I asked both to propose Insulin to Carb Ratios for a day based on the two-week report. Gemini produced a series of ICRs that weren’t a million miles away from what I’m using. ChatGPT refused, saying that as the CamAPS PDF report was already a statistical analysis, an estimate of ICR couldn’t be made. I then used the CSV data for the last 2 weeks. Again the ICRs seemed sensible (different from the Gemini analysis, as the data was new) and showed a drop in insulin requirement that matched expectation.
    So to answer your question, each will have different requirements for data input. Some will be stronger than others.
    BTW, I also tried Claude, but this couldn’t even open the PDF report.

  2. I’ve asked ChatGPT to analyse my Boost data, and it was insistent that I reduce my DIA from 9 hours to 6 hours for Fiasp, even after I had explained the reasoning behind the value. I would expect some good insights into settings and management, although, as above, anything would need to be carefully considered before implementation.

  3. It will be a very interesting comparison. I use ChatGPT (paid version) and Claude professionally, and the free versions daily at home. I get great value, but am also slightly distrustful of the results. Therefore, I’d be wary of taking potentially life-threatening advice from an LLM; I believe I know enough to avoid following the advice blindly, but a novice wouldn’t.

    For comparison, recent Stanford research has found problems with using LLMs for mental health advice: https://www.msn.com/en-gb/health/other/chatgpt-and-other-ai-therapists-may-fuel-delusions-spark-psychosis-and-suicidal-thoughts-stanford-research-finds/ar-AA1GFYt0
