Comparing the Libre2 and Dexcom G6 as RT-CGM using #WeAreNotWaiting software

The Libre2 has been available for some time now in both Germany and Norway, and is used by a reasonable number of loopers to drive their DIY systems. It has also been submitted for the iCGM scheme in the US. Given this, I felt it was time to review how well it works with DIY CGM software.

In summary, it was something of a surprise: the Libre2 worked a lot better than I was expecting, and as the figures above show, the MARD versus blood testing was sub-10%. However, the headline numbers don't tell the whole story, so let's dig into what went on, and what the outcomes were.

Set-up and Method

Sensor insertion

Both sensors were inserted and immediately started, putting the "insertion trauma" calibration algorithms of both to the test, rather than inserting them several hours before starting in order to give the trauma a chance to die back.

Software

To run this experiment, for Libre2, the patched Freestyle LibreLink app and xDrip+ were used as the receiving software on a single Android phone. For Dexcom’s G6, it was simply xDrip+.

As a footnote, you can run a standard version of xDrip+ alongside what's known as an "xDrip Variant", which allows two identical instances to be run side by side. With this set-up, I was able to view both sets of data simultaneously.

It's important to be clear that when using the patched LibreLink app, the data it produces is the glucose number the app's alarms use to determine whether glucose levels are high or low. This is fed to xDrip+, which is then used for display.

The other advantage of this set-up is that xDrip+ automatically uploads blood meter readings to the cloud.

Calibration

Whilst using xDrip does provide the ability to calibrate both sensors, in this case the following protocol was observed:

  • Both sensors started at the same time
  • No calibrations through the first nine days of use
  • If calibration was required after the Dexcom G6 sensor restart, it would be applied to both sensors

For the two different systems, calibration works differently, as follows:

  • Any calibration on Dexcom G6 is passed to the transmitter and uses Dexcom’s onboard calibration model;
  • For Libre2 data, any calibration applies the xDrip calibration algorithm, meaning the change is only in xDrip and doesn’t change the sensor output data.
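
xDrip+'s actual calibration model is more involved than this, but the display-side idea can be sketched as a simple one-point linear adjustment. All names here are illustrative, not taken from the xDrip+ codebase:

```python
def calibrate_display(raw_values, blood_ref, sensor_ref):
    """Display-side one-point calibration: scale raw sensor values so that
    sensor_ref maps onto the blood reference blood_ref. The sensor's own
    output (and any onboard algorithm) is left untouched."""
    factor = blood_ref / sensor_ref
    return [round(v * factor, 1) for v in raw_values]

# e.g. the sensor read 6.0 mmol/l when a fingerprick gave 5.4 mmol/l
adjusted = calibrate_display([6.0, 7.2, 8.4], blood_ref=5.4, sensor_ref=6.0)
```

The contrast with the G6 is that a Dexcom calibration is sent to the transmitter, so the transmitter's subsequent output changes; a software-side adjustment like the above only changes what the display app shows.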

The key aim here was to observe the differences between the native self-calibrating algorithms over the sensor life of the G6.

Only one calibration was undertaken, after the G6 sensor was restarted and began to diverge widely from blood glucose levels. It was applied to both sensors.

Comparative blood tests

All blood testing was undertaken using Contour Next strips in the Contour Next One Bluetooth meter, which the Diabetes Technology Society regards as the most accurate SMBG meter available, and which allowed automated upload of the data.

Data Capture

All the data was uploaded to Nightscout and also to an InfluxDB database, allowing both Nightscout's standard visualisation and, via Grafana on top of InfluxDB, a slightly different view.

Either database can then be queried to pull out the records required for analysis.

Method for comparing readings

To compare blood readings with the CGM data, for each blood test data point the most recent sensor value prior to the blood test was used. While we are regularly told that there is a 5-10 minute delay between blood and interstitial fluid glucose levels, a user is likely to compare the most recent sensor value with that of the blood. In addition, both Abbott and Dexcom apply algorithmic post-processing to try and provide a non-delayed glucose reading.
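
As a sketch of that matching step (illustrative code, not the actual analysis script), each blood test is paired with the latest sensor reading at or before it:

```python
from datetime import datetime

def pair_with_latest_sensor(blood_tests, sensor_readings):
    """For each blood test (time, value), find the most recent sensor
    reading at or before that time and return (blood, sensor) pairs.
    Blood tests with no prior sensor reading are skipped."""
    readings = sorted(sensor_readings)
    pairs = []
    for bt_time, bt_value in blood_tests:
        prior = [(t, v) for t, v in readings if t <= bt_time]
        if prior:
            pairs.append((bt_value, prior[-1][1]))
    return pairs

blood = [(datetime(2019, 9, 1, 12, 3), 5.5)]
sensor = [(datetime(2019, 9, 1, 11, 58), 5.1),
          (datetime(2019, 9, 1, 12, 2), 5.2),
          (datetime(2019, 9, 1, 12, 7), 5.6)]
pairs = pair_with_latest_sensor(blood, sensor)  # [(5.5, 5.2)]
```

Note that the 12:07 reading is ignored even though it is closer in time, matching the "most recent prior value" rule described above.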

Results

Overall Results

Over the course of the two weeks, the two sets of data can be seen below:


Whilst the granularity is hard to see in this image, it’s worth paying attention to the first 36 hours. These are detailed below:

The Libre2 data was vastly different from the Dexcom at first and took 36 hours to align. In the image shown, it looks as though the rapid drop during a spinning class broke the calibration algorithm, which then took that long to recover. After this point, the Libre2 data tracked the Dexcom G6 much more closely.

As a result of this misalignment, the first 36 hours of data have been removed from the MARD-versus-blood and bias analysis, as the algorithm was clearly malfunctioning.

Overall, there were 65 clear sets of data. When compared over the period, we see the following:

In general, you can see from this plot that at the blood testing points, the Libre2 native values generally read below the blood values while the Dexcom G6 values read above them. In this short test, the Libre2 was generally better at detecting lower values compared to the G6.

This is even more obvious when we look at the “Bias” graph.

The Bias graph shows the distinct difference between the two data sets when running in non-calibration mode for both sensors.

Overall, the calculated accuracy values for both sensors show that the Dexcom G6 had the better MARD from blood (MARDB) and standard deviation.
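
For reference, MARD against blood (MARDB) and mean bias are straightforward to compute from the matched (blood, sensor) pairs. This is a minimal sketch mirroring the standard definitions, not any vendor's exact method:

```python
def mard(pairs):
    """Mean Absolute Relative Difference (%) of sensor vs blood,
    where pairs is a list of (blood, sensor) value tuples."""
    return 100 * sum(abs(s - b) / b for b, s in pairs) / len(pairs)

def mean_bias(pairs):
    """Mean signed relative difference (%); negative means the
    sensor reads below blood on average."""
    return 100 * sum((s - b) / b for b, s in pairs) / len(pairs)

pairs = [(5.0, 4.5), (10.0, 11.0)]  # (blood, sensor), mmol/l
# mard(pairs) is 10.0; mean_bias(pairs) is 0.0, since the two
# relative errors (-10% and +10%) cancel in the signed mean
```

The example shows why both numbers matter: a sensor can have a sizeable MARD while showing no net bias at all.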

The Surveillance Error Grids for both sensors also show that, while the calculated values appear to differ significantly, the distribution of values isn't considered too high a risk, given the relatively small number of comparison points.

The measured values show no values outside of the green zones, and while the Libre2 has a slightly wider dispersion, the differences between the two are not vast.

If we review this data using a modified Bland-Altman plot, the biases of the two sets of data are very much more obvious:

MBA – Dexcom G6
MBA – Libre2

They both show a number of results outside the 15% band that is normally required as the accuracy level for blood testing meters, but it’s worth bearing in mind that these are values versus a blood test meter, not a YSI analyzer.
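
The modified Bland-Altman view here plots the percentage difference of each sensor value from blood, so the same numbers can be used to count points outside the ±15% band. Again, this is an illustrative sketch rather than the plotting code used:

```python
def percent_diffs(pairs):
    """Percentage difference of sensor from blood for (blood, sensor) pairs."""
    return [100 * (s - b) / b for b, s in pairs]

def outside_band(diffs, band=15.0):
    """Return the differences that fall outside the +/- band percent limits."""
    return [d for d in diffs if abs(d) > band]

pairs = [(5.0, 4.0), (10.0, 9.5), (10.0, 10.5)]  # (blood, sensor)
diffs = percent_diffs(pairs)   # [-20.0, -5.0, 5.0]
flagged = outside_band(diffs)  # only the -20% point exceeds the band
```

Plotting `diffs` against the blood values (rather than the mean of the two methods, as classic Bland-Altman does) gives the modified view shown above.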

The other point of note was that during the testing period, I identified significant variation in readings as a result of undertaking Bikram Yoga, as described here. While this obviously plays into the outcomes of the dataset, it’s an additional point to be wary of when using CGM systems in significant heat.

Native, non-Calibration results

During the initial nine-day period, the un-calibrated results of both systems showed a more marked gap than the overall results.

While the Dexcom's MARD versus blood and bias weren't significantly different from the overall results, the Libre2 was substantially worse. While 10.3% MARDB isn't a terrible outcome, it's generally considered not low enough to use in an automated insulin delivery system or to dose from.

If we compare this data to that from the Libre1 with a MiaoMiao attached, the uncalibrated Libre2 performed better than or as well as the calibrated Libre1 in both the previous tests.

It’s worth bearing in mind that in both of these tests, the quality of the Libre1 sensors was questioned. 

Calibrated results

Only 16 results were received after calibrating, so while the values tell a different story from the non-calibrated set-up, the small number of results raises questions about their overall validity.


It's clear that for both systems, adding the single calibration over this small number of results made a marked difference to performance, even though the two calibration mechanisms are different.

What we don’t have is long enough use of either sensor to see whether there is drift following the calibration, and if there is, how bad it is. The data points in the graphs post-calibration don’t appear to show anything significant though.

Discussion

As mentioned at the start, the first 36 hours of the Libre2 produced terrible data, and this was discounted from the analysis. The reasons for this are unclear and a second test with a Libre2 left in place for 24 hours prior to starting was planned to investigate this further. Unfortunately, the second sensor refused to start, so I was unable to confirm whether this would make a difference. 

Whatever the cause, it does suggest that the Dexcom G6 "insertion trauma calibration algorithm" (ITCA) is more effective than that of the Libre2.

What was clear from the data produced was that in this n=1 case, the Libre2 uncalibrated data showed some clear differences from that of the Dexcom G6. In general, the Libre2 data when left uncalibrated reported lower than SMBG values most of the time. Likewise the G6 reported higher than SMBG data, but not by quite as much.

More surprisingly, once the single calibration had been applied to both sensors, the MARD and Bias over the remaining life were better in the Libre2 than the Dexcom G6, although the standard deviation of the G6 suggested better clustering of values. Anecdotal feedback from users of Libre2 in Europe suggests this is a common occurrence.

It’s worth noting that the uncalibrated Libre2 sensor performed as well as or better than a calibrated Libre1 sensor with MiaoMiao attached, although how much this was down to sensor variation is unclear. I hope to investigate this further.

Overall, the data produced by the Libre2 was much closer to blood values than I was anticipating in non-calibrated use, and provided some good results when calibrated in a “safe” way. The data appears to show that careful calibration with the Libre2 in xDrip will allow for safe use, and suggests that as an iCGM, with some care, Abbott will be able to provide what AID providers need.

It was unfortunate that the second sensor failed as it meant that this comparison became about a single sensor for each brand of CGM rather than a broader comparison, and more data is needed. It’s worth noting, however, that at least with the failure of the second sensor, there was no risk of erroneous data, however frustrating a failure on start-up might be. 

Conclusions

The Libre2 data had a much better accuracy profile than I was expecting to see, and it also had a potentially safer profile for use in an AID, with a negative bias as opposed to a positive one. 

It’s worth noting that the number of data points, and the fact that this is an n=1 observation introduce a reasonable amount of uncertainty in relation to this output. 

The main question for me in relation to Libre2 data being accessed via xDrip is “Would I loop with it?”. What’s come from this test of one sensor has been pleasantly surprising, given my expectations from Libre1. It appears that Libre2 sensors with poor data issues may just die rather than continuing to produce bad data, unlike the Libre1. Having said that, the first two days of data were pretty horrible, which raises different concerns relating to usability.

At this stage, I’d cautiously consider using the Libre2 for looping, pending checking whether there were noticeable differences between sensors, as we’ve seen with Libre1, and whether the issues I saw with the first 36 hours could be mitigated.

I was pleased that the data this sensor produced was, in general, considerably better than I’ve seen from Libre1 and looks like it could present a reasonable alternative to Dexcom, if you are someone who isn’t waiting.

8 Comments

  1. Very nice work, it’s really good to see the consistency within each of the systems. Quite amusing for me to see that Dexcom and Libre behave exactly the other way around in your trial, compared to what I tried to test out (undocumented). For me Libre was reliably showing the lower values and Dexcom the higher ones (best example for n=1 experience and ‘each-sensor-may-vary’ xD).

    Really outstanding visual elaboration of your values :thumbsup:

  2. I used the Libre 2 with BluCon and the LinkBluCon app, and I calibrated only when readings were drifting more than 10% away from the fingerprick.

    • I assume that the LinkBluCon app works similarly to Tomato for MiaoMiao, in that Bluecon reads the transmitter and forwards the scan data to an app.

      In which case, you’re as reliant on the underlying algorithm in the Libre2 as using the patched app. I think with some of these technologies, as long as you’re aware of the risks associated with them you can make your own mind up as to how you use them.

      Given what I saw of this one sensor, I’d probably want to calibrate the data rather than leave it to its own thing.

  3. Really good assessment. How amazing not to have to calibrate. Libre2 seems to be a serious contender to Dexcom. Good to have something else reaching that high standard.

  4. Hi Tim,

    Interesting analysis as always! My experience with G6 has been that one or two calibrations early on make a big difference- the out-of-the-box zero point often seems a bit off (generally high for me) but settles in nicely once calibrated. That’s consistent with your post-calibration testing here.

    What software are you using for the comparison plots? (Bland-Altman and error grid) They are nice visualizations.

  5. Too many wrong numbers! The mean bias of the Dexcom G6 is approximately -5% (from -2% to -7%), not +5%. Also, the accuracy of the Libre2 in the first 36 hours is like yours in only 1 of 5-6 sensors. Unlike the Libre1, most Libre2 sensors start working from the first day with the same accuracy as on, for example, the third day. Calibration in xDrip is usually required around day 6 or 7, when the internal calibration of the sensor for some reason changes and larger deviations occur.

  6. Looking at the second (detailed) time series…did you have any other exercise-induced drops in BG analogous to the one you associated with a spin class in those first 36 hours? While it seems unlikely, it’d be interesting to learn how the 2 CGMs compare under these circumstances. The highest peak in your BG is also followed by a steep decline with a similar response latency in the Libre2 although the lag isn’t as prolonged as the first. Thanks for this!
