The nine sensor samba: Results revealed

Following 15 days and 150 fingerpricks, they’re here. The results of the “9 sensor samba”. And what a set of results…

Well maybe that’s overplaying it a little.

Let’s just say that the outcome of this n=1 experiment wasn’t quite what I expected. One of the established players came out much worse than expected, while a newcomer did a lot better.

Let’s dig in, and take a look at the variation.

Statistics

This section is broken down into several subsections, each looking at one of the metrics we often see in marketing.

MARD versus Fingerpricks (MARDf)

As I outlined in the first article, a Contour Next meter was used to test capillary glucose, and if a result was questionable or looked like an outlier, a second test was done to confirm the reading. This happened about fifteen times over the two weeks. None of the initial fingerpricks proved to be erroneous.

The CGM readings used were the streamed readings at the time of each fingerprick, as this is what you’d be using (along with a trend arrow) when making a decision on your next action.
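For reference, the MARDf figure used throughout is simply the mean absolute relative difference between each fingerprick and the CGM reading streamed at the same moment. A minimal sketch of the calculation (illustrative Python, not the exact pipeline used for this test):

```python
def mardf(fingerpricks, cgm_readings):
    """Mean Absolute Relative Difference vs fingerpricks, as a percentage.

    fingerpricks and cgm_readings are paired lists of glucose values in
    the same units (e.g. mg/dl), one pair per comparison event.
    """
    relative_diffs = [
        abs(cgm - bg) / bg
        for bg, cgm in zip(fingerpricks, cgm_readings)
    ]
    return 100 * sum(relative_diffs) / len(relative_diffs)

# Three paired readings: blood 100/150/80, CGM 110/140/95
print(round(mardf([100, 150, 80], [110, 140, 95]), 1))  # 11.8
```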

It’s worth noting that on the Dexcom One/G6, a restart of the sensor was undertaken on day 10, after it expired.

The headline MARDf numbers are shown in the table below:

MARDf values for all sensors over the 15 day test period

It’s fairly obvious that on this occasion, the G6/One sensor was not performing all that well; indeed, even if we take the day six and post-restart values out, we still have a MARDf of 16%. Previous experience with this sensor (documented here, here and here) is that it generally doesn’t perform all that well on me without calibration. Two of the three previous studies mentioned show a MARDf in the low teens, so 16% without calibration over the lifetime shouldn’t come as a huge surprise. The outcomes from the Libre2+ and Libre3+ are slightly worse than my experience with the previous versions of both those sensors (documented here and here).

The biggest surprise from the test was the performance of two of the newcomers. The Syai Tag produced data that was on a par with the market leaders, while the Caresens Air also produced figures in line with the Dexcom G7/One+ and Libre2+.

Given that all of these results are without a single calibration taking place, they highlight the benefits of having a sensor that can be calibrated. All three of the other “newcomers” would have benefited from calibration, alongside the G6/One.

As we’d expect, the variation over time is visible in the dataset, although in some of the sensors, there’s less delineation between the “better” mid-life days and the normally less effective end zones.

The variance from fingerpricks that we show here also highlights why “soft calibration” isn’t a bad idea. Early in a sensor’s life, checking how close it is to blood is perhaps the safest way to proceed.
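To make the idea concrete, here’s a hedged sketch of what a simple offset-style calibration might look like. Real apps (xDrip+, Juggluco and the vendor apps that allow it) use more sophisticated slope-and-offset models; this only illustrates the principle:

```python
def calibration_offset(fingerprick, cgm_at_fingerprick):
    """Offset implied by one paired check, to apply to later readings."""
    return fingerprick - cgm_at_fingerprick

# Early fingerprick of 100mg/dl against a streamed CGM value of 112mg/dl
offset = calibration_offset(fingerprick=100, cgm_at_fingerprick=112)

# A later raw CGM reading of 118mg/dl would be corrected to 106mg/dl
print(118 + offset)  # 106
```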

Bias versus fingerpricks

Whilst MARD or MARDf gives us an indication of the scale of the variation from blood of the sensors in question, it doesn’t tell us anything about directionality.

This is where our Bias measurements come in.
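Bias here is the signed counterpart of MARDf: positive values mean a sensor tends to read high against blood, negative values that it reads low. A sketch, using the same illustrative relative form (not necessarily the exact formula behind the table):

```python
def bias_vs_fingerpricks(fingerpricks, cgm_readings):
    """Mean signed relative difference vs fingerpricks, as a percentage."""
    signed_diffs = [
        (cgm - bg) / bg
        for bg, cgm in zip(fingerpricks, cgm_readings)
    ]
    return 100 * sum(signed_diffs) / len(signed_diffs)

# The same pairs as the MARDf example: one low reading partly cancels
# two high ones, so the bias (~7.4%) is smaller than the MARDf (~11.8%)
print(round(bias_vs_fingerpricks([100, 150, 80], [110, 140, 95]), 1))  # 7.4
```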

Bias values for all sensors over the 15 day test period

In this dataset, we can see the general trend of each sensor’s measurements.

Overall, the pattern across sensors here isn’t too far from historic norms. As I’ve noted in previous versions of this test, Libres tend to be low biased against the Contour Next, while Dexcoms have tended to be high biased.

The only real standout data point in this set is the Glucomen iCan, which was horribly and consistently positively biased.

I’ve had some experience with the Sinocare iCan and Microtech Linx previously. The last time I tried the Linx, it read constantly low, which doesn’t seem to be the case on this occasion, highlighting some sensor-to-sensor variation. The iCan, on the other hand, didn’t last 14 days, and during that testing was similarly out of kilter to what we saw in this test.

Again, this dataset highlights the need to check against bloods, and calibrate if necessary. Given the range of biases shown, it also highlights that when used with an automated insulin delivery system, calibration of some sensors may be critical.

Performance when low

In the iCGM standard, there’s a requirement to look at sensor performance below 4.4mmol/l (80mg/dl), as the reduced glucose levels at these values make generating a valid signal harder.

I’ll talk about the 15/15 rule later on, but first, here we have the MARDf and Bias for events where glucose levels were measured below the benchmark.

MARDf for blood values below 80mg/dl
Bias for blood values below 80mg/dl

The first thing that both of these tables show is that, with the exception of the Dexcom G7/One+ and the two Libre sensors, at blood glucose levels below 4.4mmol/l the sensors struggled to be as close to the fingerpricks as they were above this level.

The second point to note is that in general, all except the Libres tended to overstate the glucose values when glucose levels were tending towards hypoglycemia.

If we look at the 15/15 analysis below 80mg/dl (4.4mmol/l), it tells a slightly different story. The proportion of results within 15mg/dl when blood was below 80mg/dl was greater than 80% for the Caresens Air and both Libres. Unsurprisingly, the iCan scored very poorly on this, while both Dexcom sensors in this test performed much more middle of the pack than I’d have anticipated.

Number of comparison results per sensor within 15mg/dl of blood when below 80mg/dl

If we look at the number of non-hypoglycaemic CGM results when bloods were below 70mg/dl (3.9mmol/l), we see a much more useful story.

Number of non-hypoglycaemic values per sensor when blood was below 70mg/dl

In this case, the iCan stands out for failing to pick up over 90% of readings, while the Libres both have the highest volume of correctly detected hypos. As we mentioned, the G6/One sensor in this test seemed to be anomalous compared to others, and this specific analysis is no different.

The biggest surprise is that the Caresens Air produced similar results to the Dexcom G7/One+.

Given the method used for comparing fingerpricks with CGM in this test, where the fingerprick was compared to the immediately available CGM reading, some of these mismatches can be put down to sensor lag. But I don’t think sensor lag is what the iCan’s dramatic failure can be attributed to.
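For concreteness, here’s a minimal sketch of the two low-range checks used above, assuming paired (fingerprick, CGM) values in mg/dl:

```python
def within_15_when_low(pairs, threshold=80, tolerance=15):
    """Fraction of low fingerpricks where the CGM was within 15mg/dl."""
    low = [(bg, cgm) for bg, cgm in pairs if bg < threshold]
    hits = sum(1 for bg, cgm in low if abs(cgm - bg) <= tolerance)
    return hits / len(low)

def missed_hypos(pairs, hypo=70):
    """Count of events where blood was hypo but the CGM read 70 or above."""
    return sum(1 for bg, cgm in pairs if bg < hypo and cgm >= hypo)

# Four hypothetical comparison events: (blood, cgm)
pairs = [(65, 82), (72, 70), (60, 74), (90, 88)]
print(within_15_when_low(pairs))  # 2 of 3 low readings within 15mg/dl
print(missed_hypos(pairs))        # 2 hypos the CGM failed to flag
```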

Time in Range comparisons

Whilst MARDf data is great, a more indicative illustration of performance comes from the Time In Range charts that each sensor produces. This is where we can really see the real-world differences between sensors. I’m starting with a properly calibrated G6/One as the reference, to try and illustrate what my real-world TIR was:

TIR From Nightscout for Calibrated Dexcom G6/One
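TIR itself is a simple proportion, which is exactly why bias shows up in it so visibly: a low-biased sensor pushes readings out of the bottom of the band, while a high-biased one hides lows. A sketch, assuming the usual 3.9-10mmol/l (70-180mg/dl) target band:

```python
def time_in_range(readings, low=70, high=180):
    """Percentage of CGM readings inside the target band (mg/dl)."""
    in_range = sum(1 for g in readings if low <= g <= high)
    return 100 * in_range / len(readings)

readings = [65, 120, 150, 190, 110, 95]
print(round(time_in_range(readings), 1))  # 66.7 (4 of 6 readings in range)
```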

Subsequently, we have the TIR charts or data from each of the sensors in use:

Libre2+
Libre3+
Dexcom G6/One
Dexcom G7/One+
Yuwell Anytime
Syai Tag
Microtech Linx
Glucomen/Sinocare iCan
Caresens Air

That’s a lot of TIR data, but it highlights how the bias demonstrated in the testing translates into the real world. If you were a Microtech Linx user, for example, your healthcare team might be questioning why you were having so many lows, which, looking at the reference dataset, wouldn’t necessarily be a fair question.

Likewise, both Libres showed significant low periods, but as I know from the bias testing and from historic use, the Libre series tends to read low for me. Whilst that’s potentially safer from an AID use perspective, it’s not necessarily so helpful for those awkward healthcare conversations.

Similarly, the iCan giving falsely high data suggests there may be safety concerns about its use by hypo-unaware users. There appears to be a high risk of missed hypos.

The other thing that the sensor statistics highlight is the “coverage” issue I had with both the Yuwell and the Libre2+. With the Yuwell’s own software, it was rarely connected when I opened the phone, and I had to rely on it reconnecting and backfilling the data. The Libre2+ used the Juggluco software and always appeared to have a connection when I checked; subsequently, however, the statistics suggest that it was often not fully connected to the phone. The Dexcom One+/G7 was on the same phone and appeared to capture all the data, however, the illusion of constant connection may be just that. Unfortunately, I don’t have connection statistics to be able to tell.
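As a rough illustration of what “coverage” means here: for a sensor that streams every five minutes, you can compare the readings actually captured against the number you’d expect over the wear period. The interval and figures below are hypothetical:

```python
def coverage_percent(readings_captured, days, interval_minutes=5):
    """Share of expected readings actually captured, as a percentage."""
    expected = days * 24 * 60 / interval_minutes
    return 100 * readings_captured / expected

# A 15-day wear at 5-minute intervals should yield 4,320 readings;
# capturing only 3,456 of them would mean 80% coverage
print(coverage_percent(3456, days=15))  # 80.0
```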

While all the statistics are useful, they don’t give a reflection of what it might be like to live with each of these sensors. Hopefully, when taken alongside the TIR data, it becomes much more obvious what the variation in the statistics means for everyday use.

Lots of data, but what have we learned from this?

With the usual caveats about n=1 data, the key lesson from this test was that in real world use, few of the systems get close to the values in their clinical trials, so once again we are left with the distinct impression that MARD is a marketing tool, but not much else.

The table below shows the values provided for these sensors at ATTD, and I’ve added the numbers realised in this experiment to give a comparison.

Manufacturer stated MARD compared to 9 Sensor Samba

Some of those numbers are very different, and raise questions about studies, inter-sensor variation and other factors.

Finally, with the addition of the TIR data alongside the statistical data, we can see where the real problems might arise in relation to these sensors.

Having said that, two of the newcomers impressed me with their proximity to fingerprick values and their bias figures, and one took that a step further with its low detection. Step forward the Caresens Air, which achieved the best results of the newcomers, with the Syai Tag taking the silver position.

On the face of it, given that they can be calibrated and are intended for non-adjunctive use (or in other words, you can bolus directly off their data), they appear from this small test to work reasonably well.

It’s good to see new manufacturers entering the fray with systems that appear to be starting to challenge the status quo. There’s also hope that some of the other CGM manufacturers can get their acts together with their next generation sensors and really start to open up the market a bit more.

Automated Insulin Delivery use

What some people may not be aware of is that Android APS supports a number of these sensors directly, and others via various app integrations. The question is, would I feel comfortable using these sensors for the purpose of Automated Insulin Delivery?

To investigate this further, in the second part of the results I’ll take a look at the Consensus Error Grids (CEGs) for each sensor, reviewing the distribution of data points to try and determine whether the sensors tested here, without calibration, would be safe to use.

Keep watching for a further deep dive…

12 Comments

  1. Thanks, this is really interesting. I have been offered a move to the NHS HCL setup with the Omnipod and told I could choose the G6 or Libre2+ as the sensor, presumably without calibration in the setup. Looks like I should go for the Libre2+ based on your results?

    • I’d never suggest making a decision based on my results alone. Given that they both cost the same to the NHS, I’d ask if you can try a couple of each and see how you get on.

      There’s an individual variation factor that also comes into play, and while this G6 wasn’t great, that hasn’t always been the case.

  2. I think we have become obsessed with MARD and just how “accurate” these sensors are. If sensors were 100% accurate it would not change things much from a patient perspective: diet, insulin sensitivity, time to action and duration of action are other factors no one even talks about anymore, as we’ve focused on the damn number. CGM is a valuable tool if you understand its limitations and the other aspects of diabetes management.

  3. I have only used Libre 2 and 3. I have found sensor-level variation when tested against BG, be it at the sensor or the different area of the body – there is an identification problem. I have also found with Accu-Chek BG tests that consecutive readings can differ, sometimes quite widely, so that I have had to do 3 or 4 tests. However, what I do find useful with the sensors is the direction of BG travel. From a closed loop CGM perspective, I find the control pretty good even though there is a lag. Having said all this, it is useful to have the numbers side by side. Just as an aside, it would be useful to have the formulas used for the statistics on the blog itself. Note: the hospital often takes the numbers as gospel truths, and at that point I take my notes out to show the comparisons of sensor and fingerprick variations – there does not seem to be a bias pattern that I can show with the data.

  4. Regarding the coverage issue, I use the One+/G7 with Juggluco and xdrip at the same time. Juggluco backfills to AAPS, but not to xdrip, so I can see coverage within xdrip. I got 98% coverage (teenager use, so not very compliant), and with the same setup and Libre 2/2+ it was around 75%. This alone made us switch to G7.

  5. I’ve done my own “two sensor tango” recently and found extraordinary differences between the Freestyle Libre 3+ and Dexcom G7: the 3+ always shows lower values than the G7, often more than 50% lower…and Contour Next BG was always closer to the G7’s values, usually within 1-2% of the G7. Contour Next fingerstick values never, ever came close to the 3+ values.
    In practical terms, this meant the 3+ often indicated hypoglycemia that was not reflected in the G7’s readings or fingerstick values. The implication of this difference for AID pumps is lots of unnecessary hypoglycemia alarms and potentially higher average glucose levels as the pump suspends insulin delivery.
    After six days of comparing the two CGM sensors, I’m very surprised the Libre 3+ was ever approved for use with AID pumps.
    A caveat: both sensors were worn on my abdomen instead of the back of my arm, but I can’t find any clinical data that suggests why this would skew the differences so much.

    • The Libres are curious beasts. In general, as far as I can tell, they tend to align with Abbott blood test strip values rather than the Contour Next, and the Abbott test strips, in my experience, tend to produce lower values from the same blood than the Contour Next.

      We assume that Contour Next is generally a better approximation for blood values for multiple reasons, most notably because the Diabetes Science and Technology blood test surveys show this.

      As my data shows, in general I find Libres to read lower than blood tests and generally lower than Dexcoms, but I also find that Dexcoms often have a positive bias compared to blood.

      None of them is perfect.

      There was a comparison done by Dexcom some time ago that compared sensors worn on the abdomen with those worn on the arms, and I seem to recall that their abdominal MARD was greater than the arm results. I assume it’s something to do with fat layers and interstitial fluid dynamics around different depths of tissues, but I’ve no real data.

  6. My experience of using Libre 2 and 2+, calibrated using xdrip4ios, is that roughly 1 in 4 were bang on, half of them were consistently higher than test strips (especially in the 10+ range), and the remaining quarter were all over the place and eventually failed after a week or so.

  7. Amazing work. Thank you. Your fingers must have been “bloody” sore after that 🙂 I notice that the Caresens Air isn’t available on the NHS, which would prevent me from using it – as well as the fact that my diabetes team is only familiar with Libres and Dexcom. I’m still on MDI, having first been told I qualified for the Omnipod 5 and then at my next review been informed that the qualifying criteria had changed with the latest Government. I’m pretty happy with my Libre 2+ but last night it threw a wobbler, resulting in me having to change the sensor following wildly different BG test results throughout the night when alarms sounded. Thanks again.
