Learning Aligned Cross-Modal Representations from Weakly Aligned Data
People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new cross-modal scene dataset. While convolutional neural networks c...
Main Authors: Castrejon, Lluis (Author); Pirsiavash, Hamed (Author); Aytar, Yusuf (Contributor); Vondrick, Carl Martin (Contributor); Torralba, Antonio (Contributor)
Other Authors: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science (Contributor)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers (IEEE), 2017-12-29T19:43:54Z.
Online Access: Get fulltext
Similar Items
- SoundNet: learning sound representations from unlabeled video
  by: Aytar, Yusuf, et al.
  Published: (2020)
- Anticipating Visual Representations from Unlabeled Video
  by: Vondrick, Carl, et al.
  Published: (2018)
- Learning Cross-Modal Aligned Representation With Graph Embedding
  by: Youcai Zhang, et al.
  Published: (2018-01-01)
- Generating videos with scene dynamics
  by: Vondrick, Carl, et al.
  Published: (2020)
- Learning visual biases from human imagination
  by: Vondrick, Carl Martin, et al.
  Published: (2018)