Summary: Object tracking in challenging videos is an active research topic in machine vision. Recently, novel training-based detectors, especially those built on powerful deep learning schemes, have been proposed to detect objects in still images. However, a semantic gap remains between these object detectors and higher-level applications such as object tracking in videos. This paper presents a comparative study of prominent learning-based object detectors for object tracking, such as Aggregate Channel Features (ACF), Region-Based Convolutional Neural Network (RCNN), FastRCNN, FasterRCNN and You Only Look Once (YOLO). We use two training approaches for tracking: online and offline. The online tracker trains the detectors on a synthetic set of images generated from the object of interest in the first frame. The detectors then detect the object of interest in subsequent frames, and the detector is updated online using the objects detected in the most recent frames of the video. The offline tracker applies the detector to each frame as a still image, and a Kalman filter-based tracker then associates the detected objects across video frames. Our experiments are performed on the TLD dataset, which contains challenging tracking situations. Source code and implementation details of the trackers are published so that other researchers can reproduce the results reported in this paper and reuse and further develop the trackers. The results demonstrate that the ACF and YOLO trackers are more stable than the other trackers.
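The offline tracker pairs a per-frame still-image detector with a Kalman filter that links detections over time. The sketch below illustrates that idea under several assumptions that are not taken from the paper: a single object (as in TLD sequences), a constant-velocity motion model applied independently to the box center's x and y coordinates, greedy nearest-neighbour gating, and hypothetical noise and gate values. `Kalman1D`, `CenterTracker` and `step` are illustrative names, not the published implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""
    pos: float
    vel: float = 0.0
    # State covariance P = [[p00, p01], [p10, p11]] (initial values are hypothetical).
    p00: float = 10.0
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 10.0
    q: float = 1.0  # process noise, Q = q * I (hypothetical value)
    r: float = 4.0  # measurement noise variance (hypothetical value)

    def predict(self) -> float:
        # x <- F x with F = [[1, 1], [0, 1]] (one time step per frame).
        self.pos += self.vel
        # P <- F P F^T + Q
        p00 = self.p00 + self.p01 + self.p10 + self.p11 + self.q
        p01 = self.p01 + self.p11
        p10 = self.p10 + self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11
        return self.pos

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed.
        s = self.p00 + self.r        # innovation covariance
        k0 = self.p00 / s            # Kalman gain, position component
        k1 = self.p10 / s            # Kalman gain, velocity component
        innovation = z - self.pos
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P <- (I - K H) P
        p00 = (1 - k0) * self.p00
        p01 = (1 - k0) * self.p01
        p10 = self.p10 - k1 * self.p00
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p10, self.p11 = p00, p01, p10, p11


class CenterTracker:
    """Associates per-frame detections of one object with a Kalman prediction."""

    def __init__(self, cx: float, cy: float, gate: float = 50.0):
        self.kx = Kalman1D(cx)
        self.ky = Kalman1D(cy)
        self.gate = gate  # max pixel distance for accepting a detection (hypothetical)

    def step(self, detections):
        """detections: list of (cx, cy) box centers from the detector for this frame.

        Returns the filtered center and whether a detection was associated.
        """
        px, py = self.kx.predict(), self.ky.predict()
        best, best_d = None, self.gate
        # Greedy nearest-neighbour association inside the gate.
        for cx, cy in detections:
            d = math.hypot(cx - px, cy - py)
            if d <= best_d:
                best, best_d = (cx, cy), d
        if best is not None:
            self.kx.update(best[0])
            self.ky.update(best[1])
        # With no associated detection the track coasts on its prediction.
        return (self.kx.pos, self.ky.pos), best is not None
```

For example, starting a track with `CenterTracker(100.0, 100.0)` and feeding it detections of an object moving right by about 5 pixels per frame, together with a distant false positive, the gate rejects the distractor and the filter coasts through a frame with no detections. Gating on predicted position is one simple way to keep a detector-only pipeline from jumping between similar-looking objects.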