Natural Language Processing (NLP) is one of the most dynamic applications of deep learning today. When it comes to translating a human language into one a computer can understand, there are barriers we need to overcome, context and semantics being the major ones. Semantics is understanding the meaning and reference of words in a language. Merely having vector representations of words in a word-vector space, without any correlation between them, reduces a language to a collection of words that carry no meaning when used together.
Hence, advanced deep learning approaches have been developed to represent a language in a vector space while taking semantics into account. For this purpose there are libraries such as spaCy, Gensim, and NLTK, as well as advanced transformer-based word/sentence embeddings, which provide pre-trained representations of the input words/sentences.
NLP is a technique used in many different applications today, with chatbots being one of the most popular. A chatbot is an AI-based agent that communicates with the user/client without human intervention.
There are many ways of building a chatbot, depending on the type of response expected from it. A basic chatbot has a pre-defined sequence of questions and answers: at every step the user/client is shown a question along with a fixed set of answers to choose from. Each selected answer is mapped to the next question in the series, and this continues up to the final, pre-defined acknowledgement message.
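The pre-defined flow described above can be sketched as a simple lookup structure. All node names and texts below are made up for illustration, not taken from any particular bot:

```python
# Minimal sketch of a rule-based chatbot: each node holds a question,
# a fixed set of answers, and the node each answer leads to.
flow = {
    "start": {
        "question": "What do you need help with?",
        "answers": {"billing": "billing", "support": "support"},
    },
    "billing": {
        "question": "Is this about an invoice or a refund?",
        "answers": {"invoice": "end", "refund": "end"},
    },
    "support": {
        "question": "Is the issue with login or performance?",
        "answers": {"login": "end", "performance": "end"},
    },
    "end": {"question": "Thanks! An agent will follow up.", "answers": {}},
}

def run_step(node, choice):
    """Map the user's selected answer to the next node in the flow."""
    return flow[node]["answers"].get(choice, node)

# Example: the user picks "billing" at the start node.
next_node = run_step("start", "billing")
print(flow[next_node]["question"])
```

The whole conversation is just repeated application of `run_step` until the `end` node is reached, which is why this style of bot cannot handle questions outside its script.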
More advanced bots can interpret the semantic value of an input query and respond accordingly. There is no sequential Q&A pattern here; the bot reacts to any query being asked, irrespective of the order of the queries.
The illustration below shows one way of building a story-based chatbot. It is trained on Facebook's bAbI research dataset. Each sample has 3 parts –
1-Story – a sequence of sentences describing an event/situation. E.g. 'Mary moved to the bathroom.'
2-Question/Query – a question asked about the story. E.g. 'Where is Mary at this point in time?'
3-Answer – the response to the query in the context of the story. E.g. 'bathroom'
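The three parts above map naturally onto a story/question/answer triple. The shape below is illustrative, mirroring the example in the text rather than the exact bAbI file format:

```python
# Illustrative shape of one bAbI-style training sample:
# (story tokens, question tokens, single-word answer).
sample = (
    ["Mary", "moved", "to", "the", "bathroom", "."],  # story
    ["Where", "is", "Mary", "?"],                     # question
    "bathroom",                                       # answer
)

story, question, answer = sample
```

A full training set is simply a list of such triples, one per story/question pair.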
Building a Model-
The first part involves pre-processing the input data: tokenization followed by creating word embeddings. After this, the story and question data are converted to integer sequences and padded to standardize the sequence length.
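A real pipeline would typically use a library tokenizer and padding utility (e.g. Keras' `Tokenizer` and `pad_sequences`); the stdlib-only sketch below shows the same two steps, with left-padding and index 0 reserved for the pad token:

```python
def build_vocab(texts):
    """Assign each distinct word an integer id; 0 is reserved for padding."""
    vocab = {}
    for tokens in texts:
        for w in tokens:
            if w not in vocab:
                vocab[w] = len(vocab) + 1
    return vocab

def to_sequences(texts, vocab, maxlen):
    """Map tokens to ids and left-pad every sequence to maxlen."""
    seqs = []
    for tokens in texts:
        ids = [vocab[w] for w in tokens]
        seqs.append([0] * (maxlen - len(ids)) + ids[-maxlen:])
    return seqs

stories = [["Mary", "moved", "to", "the", "bathroom"],
           ["John", "went", "to", "the", "hallway"]]
vocab = build_vocab(stories)
padded = to_sequences(stories, vocab, maxlen=6)
```

After this step every story (and, analogously, every question) is a fixed-length integer vector ready to feed into an embedding layer.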
After the data pre-processing the model is ready to be trained.
The architecture consists of 3 embedding layers: embedding layers A and C are used for the sentences of the stories, and embedding layer B is used for the questions.
The story sentences are passed through the two embedding layers A and C, and the question data is passed through embedding layer B. The next step is to take a dot product of the output embeddings of layer A (story embeddings) and layer B (question embeddings); this dot product builds a contextual correlation between the two.
The output tensor of the dot product is passed through a softmax activation layer to generate probabilities for the correlation between the question and each story sentence.
Each question is matched against each story sentence and probabilities are generated. These probability weights are then summed with the output tensor of embedding layer C, and the combined tensor is concatenated with the output of embedding layer B (the question embedding layer).
This final vector is passed through a softmax activation layer to predict the output answer.
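The match-and-respond step described above can be sketched with toy vectors. All values below are made up, and the response step follows the original End-to-End Memory Network formulation, in which the attention probabilities form a weighted sum over the C embeddings before being combined with the question:

```python
import math

def softmax(xs):
    """Softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy embeddings: two story sentences through embedding A (m_i),
# one question through embedding B (u), and the same sentences
# through embedding C (c_i). Dimension 2 is purely illustrative.
story_A = [[1.0, 0.0], [0.0, 1.0]]
question_B = [1.0, 0.0]
story_C = [[0.5, 0.5], [0.2, 0.8]]

# Match step: dot product of each story vector with the question,
# softmaxed into attention probabilities p_i.
p = softmax([dot(m, question_B) for m in story_A])

# Response step: probability-weighted combination of the C embeddings.
o = [sum(p_i * c[j] for p_i, c in zip(p, story_C)) for j in range(2)]

# The response o and the question u are then combined (here concatenated)
# before the final softmax prediction layer.
combined = o + question_B
```

Because the first story sentence aligns with the question vector, it receives the higher attention weight, which is exactly the "contextual correlation" the dot product is meant to capture.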
There are two encoders and one decoder used for this training process.
Here the output of one layer is passed on to the next adjacent layer, and this continues as seen in the figure.
End-to-End Memory Networks –