Convolutional neural networks (CNNs) are a subtype of deep neural networks and are primarily used in computer vision and image processing. Built from one or more convolutional layers, they are widely used in applications such as radiology, image classification, face recognition, art design, health risk assessment, anomaly detection, and many more.
A CNN has one or more convolutional layers, either one-dimensional or multi-dimensional. Each layer applies a convolutional filter with a given kernel size (typically a matrix of shape 3x3, 5x5, 7x7…) that slides over the input and converts it into a processed matrix, often called a feature map. The basic idea is to reduce the input tensor to a form that is easier to compute on, which lowers the computational cost and increases speed. A network that stacks convolutional layers one after another goes progressively deeper into extracting features from its inputs. It usually also includes further operations such as pooling layers, which downsample the feature maps while keeping the most prominent patterns.
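The sliding-filter and pooling steps described above can be sketched in a few lines of plain Python. This is only an illustration for a one-dimensional input: the function names, the sample signal, and the kernel values are made up for the example, and real frameworks implement the same idea far more efficiently.

```python
from typing import List


def conv1d_valid(signal: List[float], kernel: List[float]) -> List[float]:
    """Slide `kernel` over `signal` without padding.

    Like most deep learning libraries, this does not flip the kernel
    (strictly speaking it is a cross-correlation).
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def max_pool1d(values: List[float], size: int = 2) -> List[float]:
    """Keep the maximum of each non-overlapping window of `size` values."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Hypothetical input and kernel, chosen only to make the arithmetic easy to follow.
signal = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]
kernel = [-1.0, 0.0, 1.0]  # responds to how fast the signal rises locally

features = conv1d_valid(signal, kernel)
pooled = max_pool1d(features, size=2)

print(features)  # [3.0, 5.0, 7.0, 9.0, 11.0]
print(pooled)    # [5.0, 9.0]
```

The output is shorter than the input after each step, which is the size reduction the paragraph refers to: 7 values become 5 after the convolution and 2 after pooling.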
CNNs are most commonly used in computer vision applications such as image processing and allied fields. Another interesting application, however, is Natural Language Processing (NLP). In NLP, ML practitioners mostly prefer deep neural network architectures with memory units, such as RNNs, LSTMs, and GRUs, combined with dense layers. The use of CNNs in NLP is nevertheless growing and showing promising results, because NLP also has applications where feature extraction is a critical part of the analysis. One such application is sarcasm detection.
Sarcasm detection is an NLP task in which a trained model decides whether a sentence is sarcastic or not. Sarcasm is sometimes hard even for humans to recognize, so getting a computer to detect it is quite a challenging task. LSTMs, RNNs, and GRUs are used in NLP because their built-in memory element lets the model keep track of earlier words, while Transformers capture word order through attention and positional information. In both cases the primary idea is the same: to understand language, the model has to be trained to understand the sequence of words, because that sequence determines the intended meaning of the sentence.
When it comes to detecting sarcasm in a sentence, the context of the words plays a deciding role. Take "Car accused for an accident": this is a sarcastic news headline, and the features that decide the context are 'Car', 'accused' and 'accident'. Together, these three words create a contradictory context for the reader, since a car cannot be accused of an accident. This is where feature extraction plays its part, and why CNNs are well suited to this kind of data. The example below illustrates this.
Sarcasm detection for news headlines-
Dataset: The dataset is taken from Kaggle (https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection). It has 3 columns: Headline, Is_Sarcastic, and Article Link.
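As a rough sketch of how the headlines and labels might be read, the snippet below parses records stored one JSON object per line. The file layout and the lowercase key names (`headline`, `is_sarcastic`, `article_link`) are assumptions about the Kaggle download, not something the article states. The two sample records are made up for illustration, and `load_headlines` is a hypothetical helper.

```python
import json


def load_headlines(lines):
    """Parse JSON-lines records into (headline, label) pairs.

    `lines` can be any iterable of strings, for example an open file.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        pairs.append((record["headline"], int(record["is_sarcastic"])))
    return pairs


# Made-up sample records in the assumed format (not taken from the dataset).
sample = [
    '{"article_link": "https://example.com/a", "headline": "car accused for an accident", "is_sarcastic": 1}',
    '{"article_link": "https://example.com/b", "headline": "city council approves new budget", "is_sarcastic": 0}',
]

pairs = load_headlines(sample)
print(pairs[0])  # ('car accused for an accident', 1)
```

With the real file, the same function could be called on an open file handle instead of the sample list.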
Creating Sentence Embeddings-
The sentence embeddings were created using Transformer Document Embeddings ('roberta-base'). Sentence embeddings and their position in the vector space are critical here: a headline is a very short phrase, and the context of its words has to be captured precisely, so its vector representation needs to be accurate and contextually strong. For this reason, an advanced sentence embedding method was preferred over simpler text vectorization methods such as the count vectorizer or the TF-IDF vectorizer.
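One way to see why count-based vectors are a weak fit for short headlines is that they ignore word order entirely. The sketch below is a hand-rolled stand-in for a count vectorizer, not the library implementation, and the two example sentences are made up. They mean different things but receive identical count vectors, a distinction that a contextual sentence embedding is designed to preserve.

```python
from collections import Counter


def count_vector(text, vocabulary):
    """Return word counts for `text` in the order given by `vocabulary`."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical sentences with the same words in a different order.
first = "dog bites man"
second = "man bites dog"

vocabulary = sorted(set(first.split()) | set(second.split()))

print(vocabulary)                        # ['bites', 'dog', 'man']
print(count_vector(first, vocabulary))   # [1, 1, 1]
print(count_vector(second, vocabulary))  # [1, 1, 1]
```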
The Convolutional Neural Network-
As the model above shows, the neural network is built with a truncating structure. With every new convolutional layer, the number of convolutional units increases (doubling the earlier value) until a plateau is reached, after which it decreases again (halving from 1024 to 512, then dropping to 128). The layers are: 1st layer 128, 2nd layer 256, 3rd layer 512, 4th layer 1024, 5th layer 1024, 6th layer 512, 7th layer 128. The advantage of this structure is that the network first widens to extract a rich set of features and then narrows to condense them, which makes the feature extraction process more efficient.
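To get a feel for the size of such a stack, the sketch below walks through the seven filter counts listed above and computes the output length and parameter count of each 1D convolutional layer. Only the filter counts come from the article. The kernel size of 3, the 'valid' padding with stride 1, and the choice to treat the 768-dimensional 'roberta-base' embedding as a length-768 sequence with one channel are assumptions made for illustration, and `conv1d_stack_summary` is a hypothetical helper.

```python
FILTERS = [128, 256, 512, 1024, 1024, 512, 128]  # from the article
KERNEL_SIZE = 3      # assumption: not stated in the article
INPUT_LENGTH = 768   # roberta-base hidden size, treated as sequence length (assumption)
INPUT_CHANNELS = 1   # assumption


def conv1d_stack_summary(filters, kernel_size, input_length, input_channels):
    """Return (filters, output_length, parameters) for each stacked Conv1D layer."""
    rows = []
    length, channels = input_length, input_channels
    for n_filters in filters:
        # One kernel of shape (kernel_size, channels) per filter, plus one bias each.
        params = kernel_size * channels * n_filters + n_filters
        # 'valid' padding with stride 1 shortens the sequence by kernel_size - 1.
        length = length - kernel_size + 1
        rows.append((n_filters, length, params))
        channels = n_filters
    return rows


rows = conv1d_stack_summary(FILTERS, KERNEL_SIZE, INPUT_LENGTH, INPUT_CHANNELS)
for index, (n_filters, length, params) in enumerate(rows, start=1):
    print(f"layer {index}: filters={n_filters} length={length} params={params}")

total_params = sum(params for _, _, params in rows)
print("total convolutional parameters:", total_params)
```

Under these assumptions, the two 1024-filter layers account for most of the parameters, which is one reason the later narrowing layers are comparatively cheap.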
Results- Evaluated on the test dataset, the model gave a good result, with a ROC AUC score of 0.8, a precision of 0.76, and a recall of 0.85. This suggests that using the truncated CNN structure for sarcasm detection on top of transformer sentence embeddings gives strong results.
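Precision and recall are often summarized as a single F1 score, the harmonic mean of the two. The article does not report F1, so the value below is simply derived from the reported precision and recall. The helper function names are illustrative.

```python
def precision(true_positives, false_positives):
    """Fraction of predicted-sarcastic headlines that really are sarcastic."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Fraction of truly sarcastic headlines that the model found."""
    return true_positives / (true_positives + false_negatives)


def f1_score(precision_value, recall_value):
    """Harmonic mean of precision and recall."""
    return 2 * precision_value * recall_value / (precision_value + recall_value)


# Values reported in the results above.
reported_f1 = f1_score(0.76, 0.85)
print(round(reported_f1, 3))  # 0.802
```

A recall higher than the precision means the model misses relatively few sarcastic headlines, at the cost of flagging some non-sarcastic ones.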
The idea of using a CNN for sarcasm detection is inspired by the following research paper-