Robust Multimodal Emotion Recognition from Conversation with Transformer-Based Crossmodality Fusion