Now that the conversations have taken place, I can reveal the questions that I used to find out just how well the various systems responded.
As I said at the start, I’ve kept these out of sight until all the systems have received them, to try and minimise any cross contamination.
Due to the operating model of the Vision Health Care Study, I was unable to run these questions by it directly, so they’re being published anyway. When I interact with that process, we’ll use similar data, albeit not quite the same.
The questions are split into three categories.
- Factual
- Picture analysis
- Data analysis
Factual
Realistically, the first of these should be the one with the most consistent responses, as the various models use their search and aggregation models to provide answers.
These questions are:
- What is type 1 diabetes?
- I’m struggling with managing my diabetes. Do you have any tips for making it easier?
- What recommendations do you have for treating hypos?
- I have type 1 diabetes. Will taking cinnamon help with my blood sugar levels?
- I’ve been diagnosed with Background Retinopathy. What should I do?
- Are there any other treatments I can try, alongside insulin, to make living with type 1 diabetes easier?
Picture Analysis
This challenge used pictures of various foods and asked the models to provide a carb estimate and potentially an insulin dose. Whilst this started with an initial set of prompts, if a model offered further help by taking the questions further, I could respond to it.
Most of the items shown were “handmade” in some form, either by a bakery with no packaging information or at home as a meal. As a result, we’ll compare the answers to these questions side-by-side and see what differences or standout variations, if any, there are.

How many carbs in this cookie?
My insulin carb ratio is 1:18. What insulin dose should I give for it?

Can you make a better guess of carbs and insulin dose with this information?

Can you estimate the carbs in this dessert please?

Please suggest a carb amount for this meal.
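For reference, the arithmetic behind the dose question is straightforward: a ratio of 1:18 means one unit of insulin covers 18 g of carbohydrate, so the carb-covering dose is the carb estimate divided by 18. Here is a minimal sketch in Python. The 45 g figure is a made-up example, not an estimate from any of the models, and the function name is mine. It deliberately ignores correction doses and insulin on board.

```python
def carb_bolus(carbs_g: float, grams_per_unit: float) -> float:
    """Units of insulin needed to cover carbs_g at 1 unit per grams_per_unit."""
    if grams_per_unit <= 0:
        raise ValueError("carb ratio must be positive")
    return carbs_g / grams_per_unit


# Hypothetical carb estimate, for illustration only.
estimate_g = 45.0
dose_units = carb_bolus(estimate_g, 18.0)
print(f"{estimate_g:.0f} g at 1:18 -> {dose_units:.1f} U")
```

At this ratio, every 18 g of error in the carb estimate moves the dose by a full unit, which is why the carb estimates themselves are the thing to compare.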
Data Analysis
This is where we get into the nitty-gritty of testing out the models. When offered various data pertaining to glucose levels, dosing and settings, how do they react?
For this test I used a variety of images from various Nightscout reports, followed by uploading a dataset and asking the LLM to analyse it.

This is my time in range chart for the past month. What recommendations would you make?

Thanks. This is my AGP graph. Can you enhance your recommendations using it please?

Great. Thanks for that. Here are the daily traces. Can you enhance any further with this information?

I’ve attached my pump profile. Would you be able to recommend any changes given the data I’ve provided to you?
Finally, two files were provided to the models with Nightscout glucose entries and treatments data. The models were then asked:
Please analyse the Nightscout data from the last month in the following two files, highlight any areas of concern and make recommendations for changes that I could make.
This is where we truly start to see the approaches and differences between the models. It’s also where the potential for errors, hallucinations and other detrimental effects is likely to show in its most prominent form and, to tantalise a little, that’s something that very clearly happened.
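One way to sanity-check whatever a model says about a glucose file is to compute a simple figure such as time in range yourself. The sketch below is not part of the test. It assumes a JSON export of Nightscout entries in which each reading carries an `sgv` value in mg/dL, and it assumes the commonly used 70 to 180 mg/dL target range. The function name and the sample data are made up for illustration.

```python
import json

# Assumed target range in mg/dL; adjust to your own targets.
LOW_MGDL = 70
HIGH_MGDL = 180


def time_in_range(entries, low=LOW_MGDL, high=HIGH_MGDL):
    """Return the percentage of readings below, within and above the range."""
    # Keep only records that actually carry a numeric sensor glucose value.
    readings = [e["sgv"] for e in entries
                if isinstance(e.get("sgv"), (int, float))]
    if not readings:
        raise ValueError("no sgv readings found")
    total = len(readings)
    below = sum(1 for v in readings if v < low)
    above = sum(1 for v in readings if v > high)
    in_range = total - below - above
    return {
        "count": total,
        "below_pct": 100 * below / total,
        "in_range_pct": 100 * in_range / total,
        "above_pct": 100 * above / total,
    }


# Made-up sample data in the assumed shape; the last record has no reading.
sample = json.loads(
    '[{"sgv": 65}, {"sgv": 110}, {"sgv": 150}, {"sgv": 200}, {"type": "cal"}]'
)
print(time_in_range(sample))
```

If a model reports a time in range that is wildly different from a figure worked out this way, that is a strong hint it has not read the file properly.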
In summary
There is clearly a lot being asked of the LLMs and potentially a lot of room for error. It will be interesting to see how close they are to each other when responding, and perhaps more importantly, how close they are to what we might deem acceptable in the Diabetes Online Community.
Love to see the results, and how you have instructed / trained this model. My experience of a few months ago is that the Nightscout profile settings are interpreted better as text than as an image; I hope they have improved this already. I have been running my own Ollama server, which could test several open source models (llama, gemma3, etc.; see https://ollama.com/library).
Best regards,
Peter
There’s no training going on here. We’re taking out-of-the-box LLMs and seeing what they come back with.
This is to replicate what someone in the general public is most likely to do.
Great! Excited to see the results. If you want to test other models please DM me. If you can share the exact conversations and data feed we can see if future AI models will provide better answers. My assumption is they will, but I hope to find out. ^Peter
Very interesting topic. I recently did the same with ChatGPT 4.5 by sending it my profile from NS and asking it for advice for a 28 day cortisone treatment (4-0-0,3-0-0,2-0-0,1-0-0,0.5-0-0). It tuned my profile and ratios considering the rules of Dr. Teupe for cortisone treatment with T1 Diabetes.
I will use the results soon 🙂
Check it actually read any files you gave it…. Sometimes it doesn’t!