In this post I will discuss A Structured Self-Attentive Sentence Embedding (Lin et al., published at ICLR in 2017), an interesting paper that introduces a sentence representation different from the conventional sentence embedding vector. To understand this post and the paper, readers should have a basic understanding of latent-space representations of words (word embeddings), recurrent architectures such as LSTMs and BiLSTMs, and non-linear functions such as softmax and tanh. I highly recommend reading the original paper after this post to learn about the proposed model in more detail.

Introduction

This paper proposes a new model for extracting a sentence embedding by using self-attention. Instead of a traditional 1-D vector, the authors propose a 2-D matrix to represent the sentence, with each row of the matrix attending to a different part of the sentence. They also propose a self-attention mechanism and a special regularization term...
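To make the idea concrete, here is a minimal PyTorch sketch of such an attention head and its penalty term, assuming BiLSTM outputs H of shape (batch, n, 2u). The class and function names are my own; the default sizes d_a=350 and r=30 follow the settings reported in the paper, but everything else is an illustrative reconstruction, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttention(nn.Module):
    """Sketch of the self-attention head from Lin et al. (2017).

    Maps BiLSTM hidden states H (batch, n, 2u) to a sentence embedding
    matrix M (batch, r, 2u) via A = softmax(W_s2 tanh(W_s1 H^T)), M = A H.
    """
    def __init__(self, hidden_dim, d_a=350, r=30):
        super().__init__()
        self.Ws1 = nn.Linear(hidden_dim, d_a, bias=False)  # W_s1: d_a x 2u
        self.Ws2 = nn.Linear(d_a, r, bias=False)           # W_s2: r x d_a

    def forward(self, H):
        # A: (batch, r, n); each of the r rows is a softmax distribution
        # over the n tokens, i.e. one "view" of the sentence.
        A = F.softmax(self.Ws2(torch.tanh(self.Ws1(H))).transpose(1, 2), dim=-1)
        M = A @ H  # (batch, r, 2u): the 2-D sentence embedding
        return M, A

def penalization(A):
    """Frobenius-norm penalty ||A A^T - I||_F^2 that pushes the r
    attention rows to focus on different parts of the sentence."""
    I = torch.eye(A.size(1), device=A.device)
    AAT = A @ A.transpose(1, 2)
    return ((AAT - I) ** 2).sum(dim=(1, 2)).mean()

# Tiny usage example with made-up dimensions:
H = torch.randn(4, 20, 600)              # batch of 4 sentences, n=20, 2u=600
attn = StructuredSelfAttention(600)
M, A = attn(H)
loss_penalty = penalization(A)            # added to the task loss during training
```

The key design choice, as the next sections discuss, is that each of the r softmax-normalized rows of A can attend to a different semantic aspect of the sentence, and the penalty term discourages the rows from collapsing onto the same tokens.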