Multi-Perspective Image and Video Processing for Human-Machine Interaction