Welcome to the Wild, Wild West: the CGM head-to-head study

In the past at Diabettech, we’ve talked about how the design of a CGM safety and accuracy study plays a big part in the results that are achieved (Lies, damned lies and statistics) and have highlighted ways that various manufacturers have “doped” studies to produce MARD numbers that let them compete with some of the major manufacturers.

We’ve also looked at studies that were run so that, while all of the data is 100% valid, leaving certain aspects out gives a better MARD value but makes the data that comes out of the study less useful (as was the case with the Libre3).

What we’ve not talked about are the head-to-head studies that pop up from time to time. Perhaps more importantly, we need to talk about why these may need to be consumed very carefully and often, with a pinch of salt.

As a publisher of n=1 head-to-head studies, I’m acutely aware that certain activities can cause poor quality data, and I take steps to minimise them. If your primary aim is to publish a marketing piece disguised as an academic paper, then you might consider this differently.

But why are we talking about this now?

In January, the Journal of Diabetes Science and Technology published a paper by Hanson et al. comparing the Freestyle Libre3 and the Dexcom G7.

It drew some surprising results, suggesting that Libre3 accuracy was far superior to that of the Dexcom G7 based on MARD (8.9% compared to 13.6%). Additionally, in their study the MARD of the Dexcom G7 was far worse than the figure reported in Dexcom’s own safety and accuracy study (8.2% when worn on the arm).

Comparison of MARD data distribution from the JDST study

But if we delve into this, we find that there are some glaring issues, and questions need to be asked about the quality of what we’re looking at.

These issues are highlighted in responses to the JDST from both Dexcom and, unusually, the IFCC, and I cover them here, as they are all valid.

What are the concerns?

There are a number of items that stand out:

  • YSI venous testing was done on days 1, 2, 5, 7, 9 and 10
  • There was no glucose manipulation in the venous comparison section of the trial
  • Sensor lag
  • Where venous and capillary data were compared, the meter used was an Abbott blood glucose meter
  • The study wasn’t registered at clinicaltrials.gov
  • Sponsorship and data analysis were both by Abbott Diabetes Care

Apart from the obvious point at the bottom, which should give us pause for thought as to whether this is truly an objective study, let’s consider some of the other points.

Timing of venous testing

The timing of these tests is clearly designed around the Dexcom G7’s ten-day wear life. There’s nothing wrong with scheduling venous testing like this, but it creates data by omission. As most studies of glucose sensors have shown, performance tends to degrade towards the end of a sensor’s life.

Comparing a G7 in its last two days with a Libre3 in its mid-life (where sensors often appear at their most accurate) doesn’t seem reasonable. Likewise, not including any data for days 13 and 14 of the Libre3 (where it too has been shown to perform less well) eliminates the risk of that data showing up in your “MARD”.

In addition, if we look at the MARD distribution histogram in the original Dexcom accuracy study, we see it’s not so dissimilar to the one shown earlier in this article.

Dexcom MARD histogram from Accuracy and Safety study 2022

Lack of glucose manipulation

The lack of glucose manipulation is starting to become a bit of a theme in Abbott CGM studies. As noted earlier, it is another example of data by omission: it removes the greater variance that sensors always show at low glucose levels, flattering the MARD values.

This is even highlighted in the study limitations section, which underlines how unusual it is:

our assessment of accuracy in the hypoglycemic ranges was based on a very small number of paired glucose values; < 1.0% of the glucose values assessed were < 70 mg/dL. This is notably lower than the percentage of paired values in the hypoglycemic range reported in previous studies

And if you don’t collect the data, you can’t be accused of cherry picking…
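As a rough illustration of how this skews the headline number, here’s a minimal sketch (in Python, using invented readings rather than anything from the study) of how MARD is calculated and how dropping the hypoglycaemic pairs before the calculation flatters the result:

```python
# Minimal sketch: how excluding hypoglycaemic pairs can flatter MARD.
# All numbers below are invented for illustration - they are not from the study.

def mard(pairs):
    """Mean Absolute Relative Difference (%) for (sensor, reference) pairs in mg/dL."""
    return 100 * sum(abs(s - r) / r for s, r in pairs) / len(pairs)

# Hypothetical paired readings: (CGM value, YSI reference value)
pairs = [
    (110, 100), (152, 160), (205, 198), (98, 105),  # euglycaemic/hyperglycaemic pairs
    (78, 62), (55, 68),                             # hypo pairs: larger relative errors
]

no_hypo = [(s, r) for s, r in pairs if r >= 70]     # drop reference values < 70 mg/dL

print(f"MARD, all pairs:     {mard(pairs):.1f}%")   # ~11.7%
print(f"MARD, hypo excluded: {mard(no_hypo):.1f}%") # ~6.3%
```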

Sensor Lag

We also noted that during the pairing with blood glucose data, the following method was used:

YSI values were paired with sensor values by choosing the sensor value that was closest in time to the YSI blood draw, but no more than five minutes before or after the YSI blood draw.

Given the stated lag for the G7 and Libre3 in the accuracy studies, this adds another confounding factor and may explain the poor performance of the G7.

The Dexcom G7 accuracy study suggests that the mean lag between venous and CGM readings was 3 minutes 30 seconds. For the Libre3, the equivalent study suggests it had a lag of 1 minute 48 seconds.

Given the above lag data, if the closest Dexcom CGM reading was 2 minutes and 29 seconds before the blood sample (close to the worst case with five-minute sampling), the CGM value would reflect blood glucose from more than five minutes before the YSI draw (2 minutes 29 seconds plus the 3 minute 30 second lag is very nearly 6 minutes).

The Libre3, with minute-by-minute sampling, would always have a reading within 30 seconds of the draw, so at worst it would be measuring a value from around 2 minutes and 18 seconds earlier (30 seconds plus the 1 minute 48 second lag).

This would exacerbate inaccuracies when there were rapidly changing glucose levels.

Given this aspect is baked into the design, it might be considered to substantially favour the Libre3.
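To make the arithmetic above concrete, here’s a minimal sketch (assuming five-minute sampling for the G7, one-minute sampling for the Libre3, and the mean lags quoted from the respective accuracy studies) of the worst-case delay the pairing rule can introduce for each sensor:

```python
# Minimal sketch of the worst-case effective delay created by the pairing rule.
# Sampling intervals are assumed (5 min for the G7, 1 min for the Libre3); the mean
# lags are the figures quoted from the respective accuracy studies.

def worst_case_delay_s(sampling_interval_s: int, mean_lag_s: int) -> int:
    """Worst-case gap between the YSI draw and the blood glucose the paired CGM value reflects.

    The pairing rule picks the sensor value closest in time to the draw, so that value
    can be up to half a sampling interval old; the sensor's lag then sits on top of that.
    """
    return sampling_interval_s // 2 + mean_lag_s

g7 = worst_case_delay_s(sampling_interval_s=300, mean_lag_s=210)     # 3 min 30 s lag
libre3 = worst_case_delay_s(sampling_interval_s=60, mean_lag_s=108)  # 1 min 48 s lag

print(f"Dexcom G7 worst case: {g7 // 60} min {g7 % 60} s")         # 6 min 0 s
print(f"Libre3 worst case:    {libre3 // 60} min {libre3 % 60} s")  # 2 min 18 s
```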

Glucose meter selection

At first glance, the question is “is this really an issue?”, since the venous data is used for the main comparison; however, the paper also makes comparisons with capillary data.

Whilst it’s not directly relevant to this study, in the past, when users reported inaccurate Libre sensors to Abbott, the company refused to replace a sensor if the blood test being used for comparison wasn’t done on the Libre scanning device. The choice here is reminiscent of that.

While this is circumstantial, it leads to questions over whether there is a similar bias in readings from Abbott’s own sensors and SMBG sticks. It wouldn’t have been hard to use an independent brand that sat at the top of the Diabetes Technology Society’s BGMS Surveillance Program to eliminate a potential source of in-house bias.

Not registered at clinicaltrials.gov

Again, is this really an issue? It’s certainly unusual. For nearly twenty years, the International Committee of Medical Journal Editors (ICMJE) has insisted on prospective trial registration as a condition of publication; however, many trials register after the trial starts and are still published. This one, however, simply doesn’t exist there.

What’s perhaps more interesting still is that the JDST states in its author guidance that it conforms to the ICMJE requirement regarding clinical trial registration, which makes it even more unusual that outcomes from an unregistered trial would be published.

Except that this study doesn’t call itself a clinical trial. It calls itself a:

multicenter, single-arm, prospective, nonsignificant risk evaluation

So why would it need to be registered?

Abbott’s sponsorship

While these types of trial are often interesting, when one large manufacturer runs them to compare its product with another large manufacturer’s, we, as consumers, should always beware. It’s highly unlikely that what’s being presented is without bias.

The study is a little unusual in this case. It appears to be presented as an independent study, with no Abbott employees amongst the authors, and yet, when you get to the funding section, you find this:

Abbott Diabetes Care provided funding for the study. The study was designed by Abbott Diabetes Care and performed all statistical analyses and data interpretation. Christopher Parkin, MS, provided editorial support in writing and formatting the manuscript. He received consulting fees from Abbott Diabetes Care for his services. Each investigator was responsible for conducting the study at the sites according to the protocol and data collection. All authors contributed to the development of the manuscript and take responsibility for the accuracy of the data reported.

Abbott didn’t just fund the study: they designed it, performed all the statistical analyses and data interpretation, and paid Chris Parkin to ensure it was written and produced professionally. Given this, it seems a little odd that neither Chris nor anyone from Abbott is included as a co-author.

Any other concerns?

Apart from the obvious pieces here regarding the manipulation of the study to generate favourable data and the slightly unusual publication standards, the responses (and the article itself) highlight that the lack of standards in CGM data collection and publication is what allows this to happen in the first place. It is very clear that standards for publishing CGM data are required.

When you delve into the details, there are little items that stand out, like using sensors from two lots, instead of the best practice, which is to use a minimum of three.

The comparison is also mixing its metaphors a bit. It purports to contain a “standard use” component, where participants finger prick at home to compare values with standard blood glucose monitoring, but then uses only the factory calibration. In the real world, Dexcom advises calibrating once meter and CGM values differ by more than the 20/20 rule.
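For what it’s worth, a 20/20-style check is simple to express. Here’s a hedged sketch: the 70 mg/dL boundary below is an assumption based on how 20/20 agreement is commonly reported, and Dexcom’s own user guidance may draw the line slightly differently:

```python
# Hedged sketch of a 20/20-style agreement check between a CGM value and a meter value.
# The 70 mg/dL boundary is an assumption based on common %20/20 reporting conventions;
# Dexcom's user-facing guidance may phrase the rule slightly differently.

def within_20_20(cgm_mgdl: float, meter_mgdl: float, boundary_mgdl: float = 70.0) -> bool:
    """True if the CGM agrees with the meter under a 20/20-style rule:
    within 20 mg/dL when the meter reads below the boundary, within 20% otherwise."""
    if meter_mgdl < boundary_mgdl:
        return abs(cgm_mgdl - meter_mgdl) <= 20
    return abs(cgm_mgdl - meter_mgdl) <= 0.20 * meter_mgdl

# A calibration would only be considered when the check fails.
print(within_20_20(cgm_mgdl=150, meter_mgdl=130))  # True: 20 mg/dL apart, within 20% of 130
print(within_20_20(cgm_mgdl=95, meter_mgdl=60))    # False: 35 mg/dL apart at low glucose
```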

The other pertinent point is that when Dexcom submitted the G7 data for iCGM approval, they submitted over 31,000 paired readings from 619 sensors. This study has between 3,600 and 4,000 paired readings from 55 of each sensor. In size, it’s much more comparable to some of the new entrants’ approval studies than to traditional Dexcom and Abbott submissions.

Takeaways?

Aside from the obvious challenges with this study, and what appears to be an abnormally high MARD value for the Dexcom G7 even with everything else taken into account, the key things that we as users should be aware of are:

  • This is a funkily designed study that will always favour the output from the Libre3.
  • It’s funded by Abbott.

Ultimately, it’s marketing masquerading as science. Which does none of us any favours.

3 Comments

  1. I completely agree with you.

    Seems to be a bit like the “papers” published with funding by the tobacco industry or the sugar/food companies.

    As far as I am concerned any scientific (or supposedly scientific) paper that is published without competent peer review should be treated with a huge pinch of salt.

    The problem, of course, is that the person looking at the paper, or at the highlighted results from the paper, has to have some understanding of the science (on whatever topic the paper discusses) and some knowledge of experimental design.

    I would venture to suggest that the average sensor user does not possess such knowledge. Therefore I feel that companies who use such techniques are no better than tobacco companies.
