In the dynamic world of diabetes management, anything that can help us accurately estimate carbohydrates in our meals is a game-changer. Imagine a future where your smart device could instantly tell you the carb count of your dinner plate, helping you bolus with confidence. It’s a tantalizing prospect, and with the rapid advancements in Artificial Intelligence, it’s becoming a more realistic dream every day.
Today, we’re diving deep into an experiment designed to test exactly that: how well do various AI models perform when asked to estimate carbohydrates, and even suggest insulin doses, based purely on images of food? We put several prominent AIs through their paces with a series of food-related queries, which we highlight in this article.
Before we jump into the fascinating (and sometimes frustrating!) results, one important note right off the bat: DeepSeek was unable to participate in this challenge. While it is a powerful language model, the current iteration of DeepSeek is not equipped with image analysis capabilities; it can only parse text, so it couldn’t “see” the food pictures central to our test. This highlights a crucial technical limitation of some AI models when it comes to visual carb counting, and an essential point for anyone looking to use AI for this purpose.
So, with DeepSeek unfortunately sidelined, let’s see how its peers – ChatGPT, Claude, Copilot, Gemini, Grok, and Perplexity – fared in the ultimate carb-counting showdown!
The Challenge: Three Meals, One Insulin Ratio, Many Opinions!
We presented the AIs with three distinct food scenarios, asking them to estimate carbs and, where appropriate, suggest an insulin dose based on a consistent 1:18 insulin-to-carb ratio (one unit of insulin per 18 grams of carbohydrate).
Scenario 1: The Mysterious Bakery Cookie
First up, a large, enticing bakery-style cookie laden with chocolate chunks and colorful candies. We started by asking a simple question: “How many carbs in this cookie?”
Similarities in Initial Cookie Estimates:
Acknowledgement of Difficulty: Almost universally, the models were quick to point out the inherent difficulty in providing a precise carb count from an image alone. They all correctly identified that factors like exact size, weight, specific ingredients, and recipe variations are crucial but unknowable from a picture.
Recommendation for Accuracy: Most suggested that for a more accurate count, one should check the bakery’s nutritional label or website, or refer to a known recipe if homemade.
Providing a Range: Despite the acknowledged limitations, all models attempted to provide a general estimated carb range for a cookie of that type.
Differences in Initial Cookie Estimates:
The most striking difference was the sheer disparity in their initial carb ranges:
- ChatGPT offered an estimate of 55–65g for a large, ~100g cookie, a significantly higher starting point than most.
- Claude suggested 30-60+g.
- Copilot proposed 25-40g for a standard cookie, potentially 40-60g if it’s large.
- Gemini was notably cautious, stating it was “impossible to determine the ingredients, their proportions, or the overall nutritional breakdown… just by looking at a picture”. While it didn’t give a specific range for this cookie initially, it elaborated on why visual estimation fails.
- Grok initially estimated 15-25g for a typical homemade cookie, and after a web search, refined it to 20-30g. This was surprisingly low compared to ChatGPT’s starting point.
- Perplexity also provided a low initial range of 7-22g for a standard cookie, though it acknowledged “some larger or richer cookies having more,” potentially up to 65g.
Implications: These vast differences in initial estimates underscore the unreliability of purely visual carb counting. For someone trying to manage blood sugar, an estimate differing by 30-40 grams could lead to wildly inaccurate insulin dosing and significant glucose fluctuations.
Scenario 2: Bolusing for the Cookie (1:18 ICR)
Next, we provided our insulin-to-carb ratio (ICR) of 1:18 and asked, “What insulin dose should I give for it?”
Similarities in Initial Insulin Dose Advice:
Correct Ratio Application (where applied): The models that provided a dose correctly applied the 1:18 ratio to their own carb estimates.
Emphasis on Medical Consultation & Monitoring: Every model, regardless of whether it gave a dose, stressed the critical importance of consulting a healthcare provider (doctor, endocrinologist, diabetes educator) for personalized advice. They also consistently recommended monitoring blood glucose closely.
Consideration of Other Factors: They all highlighted that a dose depends on more than just carbs, listing factors such as current blood sugar levels, activity, fat/protein content (which can affect digestion and glucose spikes), and other health conditions or medications.
Differences in Initial Insulin Dose Advice:
This query revealed the most significant philosophical split among the AI models:
Models Providing a Dose Recommendation:
- ChatGPT recommended 3 to 3.5 units
- Copilot suggested 2.2 to 3.3 units
- Grok offered a range of 1.1 to 1.7 units
- Perplexity calculated 1.4 to 1.7 units
These models calculated a dose based on their initial carb estimate, then presented it with a strong disclaimer about medical advice.
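The arithmetic behind these recommendations is simple division: with a 1:18 ICR, units = carbs ÷ 18. A minimal Python sketch (an illustration of the calculation the models ran, not dosing advice) reproduces their numbers from their own carb estimates:

```python
# Illustration of the 1:18 insulin-to-carb ratio (ICR) arithmetic the
# models applied. Toy calculation only -- NOT dosing advice.

def bolus_units(carbs_g: float, icr_g_per_unit: float = 18.0) -> float:
    """Suggested units = grams of carbohydrate / grams covered per unit."""
    return round(carbs_g / icr_g_per_unit, 1)

# Each model's own initial carb range for the cookie, run through 1:18
for model, (lo, hi) in {
    "ChatGPT": (55, 65),  # -> roughly 3 to 3.5 units
    "Copilot": (40, 60),  # -> 2.2 to 3.3 units
    "Grok":    (20, 30),  # -> 1.1 to 1.7 units
}.items():
    print(f"{model}: {bolus_units(lo)}-{bolus_units(hi)} units")
```

Running the models’ own ranges through the ratio confirms they did the division correctly; the disagreement is entirely in the carb estimates feeding it.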
Models Refusing a Dose Recommendation (Explicitly!):
Claude firmly stated, “I can’t provide specific insulin dosing advice, as that’s a medical decision that needs to be made by you in consultation with your healthcare team”. It explained that it couldn’t accurately determine the carb content, making a dose recommendation impossible and unsafe.
Gemini went even further, unequivocally stating, “As an AI, I cannot provide medical advice or calculate insulin dosages. This is something you must discuss with your doctor or a certified diabetes educator.” It explicitly mentioned that providing such advice would be “practicing medicine without a license, and it is incredibly dangerous”.
Implications:
This divergence highlights a fundamental ethical and safety debate in AI. While models that provided an estimated dose did so with disclaimers, the very act of giving a numerical recommendation, even estimated, could be misinterpreted or relied upon by users. Claude and Gemini’s refusal, while less “helpful” in terms of a direct number, prioritizes patient safety above all else, emphasizing that insulin dosing is a complex medical decision that AI is not qualified to make. For anyone using AI for carb counting, understanding this distinction is paramount.
Scenario 3: Refined Cookie Estimate (with 105g Weight)
We then provided crucial additional information: “The cookie weighs 105 grams.” This allowed the AIs to make a “better guess of carbs and insulin dose.”
Similarities in Refined Cookie Estimates & Advice:
Acknowledging Weight as Helpful: All models recognized that knowing the cookie’s weight (105g) was “very helpful” or allowed for a “much more accurate estimate”. This is a key takeaway: the more data, the better the AI performs.
New Carb Estimates: All models provided a refined carb estimate, either as a range or a specific number, based on typical carb content per 100g or per gram for such cookies.
Reinforced Cautions: Even with more data, all models continued to emphasize the estimated nature of the carb count and the importance of professional medical guidance and considering other personal factors.
Impact of Fat/Sugar: Some models specifically highlighted that high fat and sugar content in cookies could lead to delayed blood sugar spikes, suggesting considerations like extended or split boluses.
Differences in Refined Cookie Estimates & Insulin Doses:
Despite the additional information, the estimates still varied, and the approach to insulin dosing remained distinct:
Refined Carb Estimates:
- ChatGPT: Rounded to 58g carbs (based on 50-60g/100g).
- Claude: Estimated 45-65g carbs (based on 0.4-0.6g carbs per gram).
- Copilot: Gave a higher range: 63-74g carbs (based on 60-70g/100g).
- Gemini: Provided a range of 58g to 68g (based on 55-65% carbs by weight).
- Grok: Estimated ~63g carbs (based on ~60% carbs by weight). Crucially, Grok initially used a generic 1:15 ratio despite being provided with 1:18, highlighting a potential issue with context retention in some models if the conversation thread resets. It did, however, correct itself when prompted.
- Perplexity: Provided the highest estimate based on external sources, calculating 68.25g (rounded to 68g) using data from UK bakery-style white chocolate cookies. Perplexity was unique in directly citing multiple external sources to support its carb estimation.
Insulin Dose Recommendations (or refusals):
- ChatGPT: Recommended 3 to 3.5 units
- Claude: Persisted in its refusal to give specific dosing advice, advising to “start with a conservative estimate, check blood sugar more frequently, and adjust for next time”.
- Copilot: Suggested 3.5 to 4.1 units
- Gemini: While it calculated the dose for the user’s information (3.5 units for 63g carbs), it emphatically refused to “tell you to take this dose” due to being an AI and not a medical professional. It reiterated that even with weight, the estimate isn’t precise enough for direct dosing advice and that individual factors are critical.
- Grok: After correcting its ratio error, it calculated ~3.5 units.
- Perplexity: Calculated 3.8 units and provided a summary table.
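The weight-based refinement the models converged on can be sketched the same way: carbs ≈ weight × an assumed carb density (the 0.55–0.65 g of carbs per gram band reflects the models’ own assumptions for a cookie like this, not a measured value), then units = carbs ÷ 18:

```python
# Sketch of the weight-based refinement: carbs = weight * assumed carb
# density, then dose = carbs / ICR. The 0.55-0.65 g/g density band is
# what the models themselves assumed, not a measured value. NOT dosing advice.

COOKIE_WEIGHT_G = 105
ICR = 18.0  # 1 unit per 18 g of carbohydrate

for density in (0.55, 0.60, 0.65):
    carbs = COOKIE_WEIGHT_G * density
    units = carbs / ICR
    print(f"density {density:.2f} g/g -> {carbs:.0f} g carbs -> {units:.1f} units")
```

At 0.60 g/g this lands exactly on Grok’s 63g and ~3.5 units; at 0.65 g/g it matches Perplexity’s 68.25g and 3.8 units, which shows how much of the remaining spread comes down to the assumed density alone.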
Implications: The additional weight information significantly narrowed the carb estimates across the board, making them more useful. However, the core philosophical divide on insulin dosing remained. Perplexity’s use of external, cited sources for its carb estimate is an interesting approach, suggesting a more data-driven (and potentially more accurate) method than purely internal algorithms. This highlights the potential of AI to pull real-world nutritional data into its estimations.
Scenario 4: The Dessert Bowl
Next, we presented an image of a dessert bowl, appearing to contain vanilla ice cream and crumbled cake or biscuits. We asked, “Can you estimate the carbs in this dessert please?”
Similarities in Dessert Estimates:
Component Identification: All models correctly identified the main components as ice cream and crumbled cake/biscuit.
Portion Size Variability: All acknowledged that the estimate would be rough due to variability in exact ingredients and especially portion size.
Range Provision: All provided a carbohydrate range for the dessert.
Differences in Dessert Estimates:
Here, the ranges again varied widely, demonstrating the challenge of estimating mixed dishes:
- ChatGPT: Estimated 35–50g.
- Claude: Suggested 35-65g, breaking down into cookie pieces (15-25g), ice cream (15-25g), and possibly sauce (5-15g).
- Copilot: Estimated 45-65g.
- Gemini: Gave a much higher and broader range: 70 to 110 grams. It provided a detailed breakdown, estimating 40-60g for the crumble and 30-50g for the creamy component, noting that “dry cake or biscuit crumbles can be quite carb-dense” and “desserts are often high in added sugars”. This higher estimate suggests a more conservative or perhaps more accurate assessment of a rich dessert.
- Grok: Offered the lowest estimate at 20-30g for a “small bowl”.
- Perplexity: Estimated around ~40g (24g for ice cream + 16g for half a cookie) and even calculated a corresponding insulin dose of 2.2 units.
Implications: The vast differences (e.g., Grok’s 20-30g versus Gemini’s 70-110g) highlight that visual estimation of multi-component desserts is highly subjective and prone to significant error. For a person with diabetes, relying on a low estimate for such a high-carb food could lead to substantial post-meal hyperglycemia. Gemini’s detailed reasoning for its higher estimate seems more attuned to the reality of dessert carb density.
Scenario 5: The Main Meal
Finally, we presented an image of a meal consisting of pan-seared salmon, creamy potato salad, cooked greens, and a drink. We asked, “Please suggest a carb amount for this meal.”
Similarities in Meal Estimates:
Potato Salad as Main Carb Source: All models correctly identified the potato salad as the primary source of carbohydrates in the meal.
Salmon/Greens as Low Carb: All noted that the salmon and greens contributed minimal to zero carbohydrates.
Portion Size Variability: They all acknowledged that the carb content of the potato salad would depend on its specific recipe and portion size.
Provided a Range: All offered a carb range for the entire meal.
Differences in Meal Estimates:
While there were similarities, one critical difference stood out regarding Claude’s interpretation of the image:
- ChatGPT: Estimated 60-65g (if with juice) or 30-35g (if with unsweetened tea), being the only model to account for the drink in the image. This demonstrates a more comprehensive visual analysis.
- Claude: Stated the meal contained “a slice of quiche or savory tart, some sautéed greens… and a potato salad”. It estimated 37-60g, with a best guess of 45-50g. This is a significant misinterpretation of the image, as the source indicates the protein was salmon, not quiche. This highlights a critical limitation: AI’s visual recognition isn’t perfect, and misidentifying a key component can drastically alter carb estimates. Quiche, with its crust, would have a different carb profile than salmon.
- Copilot: Estimated around 35g (30g from potato salad, 4-5g from greens).
- Gemini: Suggested 30-50g, with the majority from the potato salad (30-45g).
- Grok: Estimated 20-30g from the potato salad.
- Perplexity: Provided a range of 25-35g.
Implications: The carb estimates for the meal were generally lower and more consistent than for the desserts or standalone cookie, likely because the main carb source (potato salad) is relatively straightforward. However, Claude’s misidentification of the salmon as quiche is a serious warning sign . If an AI misidentifies a food item, even with good intentions, the resulting carb estimate could be dangerously inaccurate. This reinforces the need for users to verify what the AI “sees” against their own visual confirmation.
Overall Takeaways
This experiment offers invaluable insights for anyone managing diabetes and considering using AI for carb counting:
1. AI is Not a Substitute for a Medical Professional: This is the most crucial takeaway. Models like Claude and Gemini explicitly refused to give insulin dosing advice, and for good reason. Even those that provided calculations always included strong disclaimers. Your insulin dose is influenced by far too many personal and dynamic factors for an AI to safely determine. Always consult your healthcare team for dosing advice.
2. Visual Carb Counting is Inherently Challenging for AI: Without specific recipe details or exact weights, AI models provide estimates with significant ranges. This is particularly true for complex dishes or items with varied ingredients like bakery goods and multi-component desserts.
3. More Data = Better Estimates: Providing additional information, such as the exact weight of a food item, significantly improved the accuracy and narrowed the carb estimates across all models. This suggests that combining AI with user input (like weighing food) is far more effective than relying on visual assessment alone.
4. AI Can Misinterpret Images: As seen with Claude’s misidentification of salmon as quiche, AI models are not infallible when it comes to visual recognition. Always double-check what the AI “sees” versus what you know is on your plate.
5. Variability Among Models: Different AI models have different training data, algorithms, and safety protocols. This leads to varying levels of caution and wildly different carb estimates for the same food item. Some, like Perplexity, demonstrated the ability to pull in external, citable nutritional data, which could be a promising feature.
6. Consider Fat and Sugar Effects: Some models correctly highlighted that high fat and sugar content can delay glucose spikes, potentially requiring adjusted insulin strategies like extended boluses. This shows a growing sophistication in AI understanding of food’s glycemic impact.
The Bottom Line:
AI for carb counting holds immense potential, especially as models become more sophisticated at visual recognition and integrate with vast nutritional databases.
However, for now, they are best used as a supplementary tool for generating initial estimates, not as a definitive guide for insulin dosing. As we’ve seen in this experiment, they’re often about as effective at carb counting as we are. Which is to say: not very!
Always prioritize real-world data (nutrition labels, recipes, food scales), consult your healthcare team, and meticulously monitor your blood glucose to refine your carb counting skills.
The future of diabetes management with AI is exciting, but for now, remember: Stay safe, stay healthy, always double check!
Thanks for this. Did you have actual carb counts (as best you can) for the pictures? Would be good to see how “right” any of them were.
Also would be really interesting to compare human estimates with the robots – though don’t know how we’d manage that with enough scale to be interesting! Imagine most of us with T1 know only too well how hard visual estimates are (from eating out etc.). Maybe next time post the pics and ask users to guess before revealing the answers…
A challenge of posting pics and getting crowd sourced answers is that you broaden the range of information to the LLM, thus reducing the work the LLM is doing….
For what it’s worth, my estimate for a 105g cookie like that was about 65g carbs, given that generically white chocolate cookies contain ~65g carbs per 100g!
Thanks for posting this. I just started doing my own “carb counting” experiment today using ChatGPT’s visualization so it was timely. Based on your analysis, I’ll stick with Chat rather than try the others. I’m using a thread for each day’s meals and then keeping it in a folder (as opposed to a custom GPT). At the end of the week my plan is to do a datadump of my pump/CGM (Tandem/G7) and have it run an analysis of everything in the folder to see what conclusions it can draw.