Summary: | Real-time tool tracking in minimally invasive surgery (MIS) has numerous applications in computer-assisted interventions (CAIs). Visual tracking approaches are a promising solution to real-time surgical tool tracking; however, many may fail when the tracker encounters motion blur, adverse lighting, specular reflections, shadows, or occlusions. We propose an automatic real-time method for two-dimensional tool detection and tracking based on a spatial transformer network (STN) and spatio-temporal context (STC). Our method combines a convolutional neural network (CNN) containing an in-house trained STN with STC to locate the tool accurately at high speed. We compared our method experimentally with four other general visual tracking methods used in CAIs on eight existing online and in-house datasets covering in vivo abdominal, cardiac, and retinal clinical cases in which different surgical instruments were employed. The experiments demonstrate that our method performs well in both accuracy and speed. It can track a surgical tool without labels in real time in the most challenging cases, with an accuracy that equals and sometimes surpasses that of most state-of-the-art tracking algorithms. Future improvements to our method will focus on handling occlusion and multiple instruments.
|