Published: Sep 1, 2020
Blind Baselines Perform Unexpectedly Well
We built three blind baselines that never use videos for training or inference. These baselines put the scores of deep models in context. Surprisingly, our blind baselines are competitive with, and even outperform, some deep models.
- Prior-Only Blind: This baseline predicts temporal locations without using videos or query sentences, relying only on where moments tend to occur in the training data (see the sketch after this list).
- Action-Aware Blind: This baseline uses only one word of a query sentence to predict the temporal locations of moments. For simplicity, we use the first verb in the query sentence (also sketched after this list).
- Blind-TAN: This is a neural network-based model that uses the full query sentence. Blind-TAN is built upon 2D-TAN[^1]. It removes the module that extracts visual features and replaces it with a learnable map that captures language biases. We trained Blind-TAN to predict temporal locations solely from query sentences (a sketch of this idea follows the code below).
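
To make the first two baselines concrete, here is a minimal sketch. The data shapes are assumptions: `train_moments` is a list of (start, end) pairs normalized to [0, 1], and `train_pairs` is a list of (query, moment) pairs. spaCy is used here for verb extraction, but any POS tagger would do.

```python
from collections import Counter, defaultdict

import spacy  # assumed dependency; any POS tagger would work

nlp = spacy.load("en_core_web_sm")


def prior_only(train_moments, num_bins=10):
    """Predict the most frequent (start, end) bin in the training set.

    Neither the video nor the query is ever consulted.
    """
    def to_bin(x):
        return min(int(x * num_bins), num_bins - 1)

    counts = Counter((to_bin(s), to_bin(e)) for s, e in train_moments)
    (s_bin, e_bin), _ = counts.most_common(1)[0]
    return s_bin / num_bins, (e_bin + 1) / num_bins


def first_verb(query):
    """Return the lemma of the first verb in the query, or None."""
    for token in nlp(query):
        if token.pos_ == "VERB":
            return token.lemma_
    return None


def action_aware(train_pairs, query):
    """Predict the mean moment of training queries sharing the first verb.

    Falls back to the prior-only prediction for unseen verbs.
    """
    by_verb = defaultdict(list)
    for q, moment in train_pairs:
        verb = first_verb(q)
        if verb is not None:
            by_verb[verb].append(moment)
    moments = by_verb.get(first_verb(query), [])
    if not moments:
        return prior_only([m for _, m in train_pairs])
    starts, ends = zip(*moments)
    return sum(starts) / len(starts), sum(ends) / len(ends)
```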
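
For Blind-TAN, the core idea can be sketched as a learnable 2D map fused with a sentence embedding, in the style of 2D-TAN's moment map. The dimensions and the Hadamard fusion below are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class BlindTAN(nn.Module):
    """Sketch of a query-only score head in the style of 2D-TAN.

    Cell (i, j) of the 2D map scores the candidate moment from clip i
    to clip j. The visual branch is replaced by `bias_map`, a learnable
    parameter trained to absorb language biases in the dataset.
    """

    def __init__(self, num_clips=16, query_dim=300, hidden_dim=128):
        super().__init__()
        # Learnable stand-in for visual features: one hidden vector
        # per (start, end) candidate cell.
        self.bias_map = nn.Parameter(
            torch.randn(hidden_dim, num_clips, num_clips)
        )
        self.query_proj = nn.Linear(query_dim, hidden_dim)
        self.score_head = nn.Conv2d(hidden_dim, 1, kernel_size=1)

    def forward(self, query_emb):
        # query_emb: (batch, query_dim), e.g. a pooled sentence embedding.
        q = self.query_proj(query_emb)[:, :, None, None]    # (B, H, 1, 1)
        fused = torch.relu(self.bias_map.unsqueeze(0) * q)  # (B, H, N, N)
        return self.score_head(fused).squeeze(1)            # (B, N, N)
```

Training such a head would minimize a localization loss over the (start, end) scores using only (query, moment) pairs, with no video input at any point.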
The score of TripNet has been updated to reflect the latest published results.
[^1]: Songyang Zhang, Houwen Peng, Jianlong Fu, and Jiebo Luo. Learning 2D temporal adjacent networks for moment localization with natural language. In The AAAI Conference on Artificial Intelligence, 2020.