Retrospective comparison of approaches to evaluating inter-observer variability in CT tumour measurements in an academic health centre
Background: A growing number of research studies have reported inter-observer variability in the sizes of tumours measured from CT scans. It remains unclear whether conventional statistical measures correctly evaluate CT measurement consistency for optimal treatment management and decision-making...
Main Authors: | MinJae Woo, Moonseong Heo, A Michael Devane, Steven C Lowe, Ronald W Gimbel |
---|---|
Format: | Article |
Language: | English |
Published: | BMJ Publishing Group, 2020-11-01 |
Series: | BMJ Open |
Online Access: | https://bmjopen.bmj.com/content/10/11/e040096.full |
id |
doaj-6b00da514dfa4edd8504ad975b74175e |
---|---|
record_format |
Article |
spelling |
doaj-6b00da514dfa4edd8504ad975b74175e; 2021-06-25T12:35:00Z; eng; BMJ Publishing Group; BMJ Open; ISSN 2044-6055; 2020-11-01; volume 10, issue 11; doi:10.1136/bmjopen-2020-040096; Retrospective comparison of approaches to evaluating inter-observer variability in CT tumour measurements in an academic health centre; MinJae Woo (Public Health Sciences, Clemson University, Clemson, South Carolina, USA); Moonseong Heo (Public Health Sciences, Clemson University, Clemson, South Carolina, USA); A Michael Devane (Radiology, Prisma Health Upstate, Greenville, South Carolina, USA); Steven C Lowe (Radiology, Prisma Health Upstate, Greenville, South Carolina, USA); Ronald W Gimbel (Public Health Sciences, Clemson University, Clemson, South Carolina, USA); https://bmjopen.bmj.com/content/10/11/e040096.full |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
MinJae Woo, Moonseong Heo, A Michael Devane, Steven C Lowe, Ronald W Gimbel |
author_sort |
MinJae Woo |
title |
Retrospective comparison of approaches to evaluating inter-observer variability in CT tumour measurements in an academic health centre |
publisher |
BMJ Publishing Group |
series |
BMJ Open |
issn |
2044-6055 |
publishDate |
2020-11-01 |
description |
Background: A growing number of research studies have reported inter-observer variability in the sizes of tumours measured from CT scans. It remains unclear whether conventional statistical measures correctly evaluate CT measurement consistency for optimal treatment management and decision-making. We compared and evaluated the existing measures for evaluating inter-observer variability in CT measurement of cancer lesions. Methods: 13 board-certified radiologists repeatedly reviewed 10 CT image sets of lung lesions and hepatic metastases selected through a randomisation process. A total of 130 measurements under RECIST 1.1 (Response Evaluation Criteria in Solid Tumors) guidelines were collected for the demonstration. Intraclass correlation coefficient (ICC), Bland-Altman plotting and outlier counting methods were selected for the comparison. Each selected measure was used to evaluate three cases with observed, increased and decreased inter-observer variability. Results: The ICC score yielded weak detection when evaluating different levels of inter-observer variability among radiologists (increased: 0.912; observed: 0.962; decreased: 0.990). The outlier counting method using Bland-Altman plotting with 2SD yielded no detection at all, with its number of outliers unchanged regardless of the level of inter-observer variability. Outlier counting based on domain knowledge was more sensitive to different levels of inter-observer variability than the conventional measures (increased: 0.756; observed: 0.923; improved: 1.000). Visualisation of pairwise Bland-Altman bias was also sensitive to inter-observer variability, with its pattern changing rapidly in response to different levels of variability. Conclusions: Conventional measures may yield weak or no detection when evaluating different levels of inter-observer variability among radiologists. We observed that outlier counting based on domain knowledge was sensitive to inter-observer variability in CT measurement of cancer lesions. Our study demonstrated that, under certain circumstances, the use of standard statistical correlation coefficients may be misleading and result in a false sense of security about the consistency of measurement for optimal treatment management and decision-making. |
url |
https://bmjopen.bmj.com/content/10/11/e040096.full |
_version_ |
1721359794126716928 |
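
The three kinds of measure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method or data: the readings are synthetic, the ICC form (two-way random effects, absolute agreement, single rater) is an assumption because the record does not say which form was used, the 2SD rule is applied to each reading's deviation from its lesion mean as a multi-reader simplification of the pairwise Bland-Altman plot, and the 10% tolerance standing in for "domain knowledge" is hypothetical. All function names are illustrative.

```python
import random
import statistics


def icc_2_1(table):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `table` has one row per lesion and one column per reader.
    """
    n, k = len(table), len(table[0])
    grand = statistics.fmean(x for row in table for x in row)
    row_means = [statistics.fmean(row) for row in table]
    col_means = [statistics.fmean(row[j] for row in table) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def two_sd_outlier_rate(table):
    """Share of readings whose deviation from the lesion mean lies outside mean +/- 2 SD."""
    devs = [x - statistics.fmean(row) for row in table for x in row]
    centre, sd = statistics.fmean(devs), statistics.stdev(devs)
    return sum(abs(d - centre) > 2 * sd for d in devs) / len(devs)


def within_tolerance_rate(table, rel_tol):
    """Share of readings within a fixed relative tolerance of the lesion median."""
    hits = total = 0
    for row in table:
        ref = statistics.median(row)
        hits += sum(abs(x - ref) <= rel_tol * ref for x in row)
        total += len(row)
    return hits / total


def simulate(noise_scale, lesions=10, readers=13, seed=0):
    """Synthetic diameters in mm: a true size times (1 + Gaussian reader noise)."""
    rng = random.Random(seed)
    true_sizes = [rng.uniform(10, 80) for _ in range(lesions)]
    return [
        [s * (1 + rng.gauss(0, noise_scale)) for _ in range(readers)]
        for s in true_sizes
    ]


# Same seed for every case, so only the spread of the reader noise differs.
for label, scale in [("decreased", 0.02), ("observed", 0.05), ("increased", 0.10)]:
    readings = simulate(scale)
    print(
        label,
        round(icc_2_1(readings), 3),
        round(two_sd_outlier_rate(readings), 3),
        round(within_tolerance_rate(readings, rel_tol=0.10), 3),
    )
```

In this sketch the 2SD limits widen and narrow with the spread of the readings, so the share of points flagged barely moves, whereas a fixed tolerance does not rescale and therefore tracks the change in variability. This mirrors the contrast the abstract describes, under the stated assumptions only.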