Summary: The task of encoding visual information into tactile information has been studied since the 1960s, yet converting the data of an image into a small set of signals that can be delivered to the user as tactile input remains an open challenge. In this study, we evaluated two methods that had not previously been applied to vision-to-touch encoding with convolutional neural networks: a bag of convolutional features (BoF) and a vector of locally aggregated descriptors (VLAD). We also present a novel method for evaluating the semantic properties of the encoded signal: building on the idea that objects with similar features should produce similar signals on the tactile interface, we define a semantic property evaluation (SPE) metric. Using this metric, we demonstrated the advantage of the BoF and VLAD methods, which obtained SPE scores of 70.7% and 64.5%, respectively, a considerable improvement over the downscaling method used by many systems such as BrainPort, which scored 56.2%.
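Since VLAD is named here only at a high level, a minimal sketch may help clarify the aggregation step it refers to. The snippet below encodes a set of hypothetical local CNN descriptors into a VLAD vector by summing their residuals to a pre-learned k-means vocabulary; the function name, array shapes, and normalization choices are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def vlad_encode(features, centroids):
    """Aggregate local descriptors into a VLAD vector (illustrative sketch).

    features:  (N, D) array of local descriptors, e.g. the spatial positions
               of a convolutional feature map flattened to D channels.
    centroids: (K, D) visual vocabulary, assumed learned beforehand (k-means).
    Returns a (K * D,) normalized VLAD descriptor.
    """
    # Assign each local descriptor to its nearest centroid
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    assignments = np.argmin(dists, axis=1)

    K, D = centroids.shape
    vlad = np.zeros((K, D))
    for k in range(K):
        members = features[assignments == k]
        if len(members):
            # Accumulate residuals between descriptors and their centroid
            vlad[k] = (members - centroids[k]).sum(axis=0)

    vlad = vlad.ravel()
    # Signed square-root (power) normalization, then global L2 normalization
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad
```

The resulting fixed-length vector is what would then be mapped to the small set of tactile signals discussed above; BoF differs in that it keeps only the per-centroid assignment counts rather than the residuals.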