FACIAL EXPRESSION RECOGNITION IN A VIDEO
MAJOR PROJECT REPORT
Date: November 2018
Dr. SANDEEP PAUL
Department of Physics & Computer Science,
Dayalbagh Educational Institute, Agra
This is to certify that the work in this report entitled “FACIAL EXPRESSION RECOGNITION IN A VIDEO” submitted by Shivangi Singh to Dayalbagh Educational Institute, in fulfillment of the requirements for the award of the degree of Master of Technology, is a record of the bona fide work carried out by her under my supervision.
Prof. G.S. Tyagi
Dept. of Physics and Computer Science
Faculty of Science
Dayalbagh Educational Institute
(Deemed University)

Dr. Sandeep Paul
Dept. of Physics and Computer Science
Faculty of Science
Dayalbagh Educational Institute
(Deemed University)
I hereby declare that this project report is my own work to the best of my knowledge and belief. It contains no material previously published or written by another person. It presents the results of original work and studies carried out by me, and the contents of this report have not formed the basis of any other degree for the candidate.
Shivangi Singh Roll no: 1704281
This synopsis presents a system for recognizing emotions through the facial expressions displayed in a video. Facial emotions are responses of feeling to a particular situation or environment. Emotions are a very important part of our existence: we smile to greet, glare at someone when confused, and raise our voice when angry. Recovering facial activity from a video sequence is an important and challenging problem. Facial expressions are a key mechanism for understanding and conveying emotion, and recognizing emotions in a video can be very challenging. Computers are "emotionally challenged": they cannot interact with humans through emotions. If we want machines to interact with humans, they should communicate by changing their responses with respect to changes in the emotional state of human users across different interactions. Various machine learning approaches can be used for this.
In human-computer interaction, recognition of emotions and expressions from streaming video plays a very important role. In this project, various machine learning approaches are implemented to detect emotions such as happiness, sadness, disgust, surprise, fear and anger. Plenty of techniques have been developed to track and recognize facial activities; here, different classification techniques are applied and their results are compared to find which technique performs best.
Human-robot interaction is attracting increasing attention nowadays. To make robots more social, understanding the facial gestures and visual cues of an individual is a necessity. It allows a robot to understand human expressions, in turn enhancing its effectiveness in performing various tasks. It also serves as a measurement system for behavioural science, and socially intelligent software tools can be built on it.
In this report we propose and implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real-time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using the proposed CNN architecture.
Real-time detection of the face and interpretation of different facial expressions such as happiness, anger, sadness, fear and surprise is based on facial features and their actions.
The key elements of the face are considered for face detection and for prediction of facial expressions or emotions.
To determine the different facial expressions, the variations in each facial feature are used.
For detection and classification of the different classes of facial expressions, machine learning algorithms are trained on different sets of images.
The proposed approach uses Open Source Computer Vision (OpenCV) and machine learning with Python.
Keywords: Machine Learning, Facial Expression Recognition, Emotion Classification, Convolutional Neural Network, SVM, k-Nearest Neighbours, Logistic Regression, OpenCV.
TABLE OF CONTENTS
I would like to thank all those who have contributed to the completion of this project and helped me with valuable suggestions for improvement. I am extremely grateful to Dr. Sandeep Paul, Department of Physics and Computer Science, Dayalbagh Educational Institute, for providing the best facilities and atmosphere for creative work, and for his guidance and encouragement.
Recognition of emotions is becoming an increasingly significant component in human-machine interaction systems. Human-computer interaction is an emerging field in computer science. To make computer systems intelligent, they must interact with humans the way humans interact with each other, through physical gestures and postures, which mainly include facial expressions. Recognizing human facial emotion by computer is an interesting and challenging problem, as emotions are a key semantic component of human communication.
Emotion recognition can be performed using different modalities, such as facial expression, speech and body gesture. Recognition of facial expressions has attracted a lot of interest in the last few decades because of its complexity. Our facial expressions say a lot even when we are not speaking. According to some theories, these movements convey the emotions of an individual to observers. Facial expressions are a form of non-verbal communication and a primary means of conveying social messages between humans.
Emotion recognition is the process of extracting features that help to recognize the mood and likely behaviour of an individual. The basic emotions are:
Happiness – It is symbolized by raising of the corners of the mouth (an obvious smile) and tightening of the eyelids.
Surprise – It is symbolized by arching of the eyebrows, wide opening of the eyes exposing more white, and the jaw dropping slightly.
Sadness – It is symbolized by lowering of the corners of the mouth, descending of the eyebrows towards the inner corners and drooping of the eyelids.
Anger – It is symbolized by lowering of the eyebrows, firm pressing of the lips and bulging of the eyes.
Neutral – It is symbolized by no movement of the eyebrows, lips or eyes.
Disgust – It is symbolized by raising of the upper lip, wrinkling of the nose bridge and raising of the cheeks.
Fear – It is symbolized by raising of the upper eyelids, widening of the eyes and horizontal stretching of the lips.
The objective is to detect all faces and their corresponding emotions (happiness, sadness, disgust, surprise, fear and anger), as well as the neutral face, in a video source and in live video. Three classifiers (SVM, k-NN, logistic regression) are used to recognize the emotional states, and their results are compared to find which classification technique is best for emotion recognition.
2.1 LITERATURE REVIEW
Epilepsy is a disorder of the central nervous system. The proportion of people affected by epilepsy across all age groups worldwide is very high. The main problems with this disorder are its unknown root cause and the sudden occurrence of symptoms such as unconsciousness and uncontrollable movement.13
Much research and experimentation has been carried out in past and present years. Different kinds of feature selection and classification methods have been tried for detecting seizures in time to save patients from hazards.
In this work, the FFT and the correlation coefficient method were used for feature selection. The FFT was chosen because the magnitudes across different frequency ranges provide a better set of features14. One-second clips across all channels of the EEG signal were used for applying the FFT. Phase information is discarded, and the log10 of the frequency magnitudes in the 1 to 47 Hz range is taken. This range was chosen by trial and error during the dimensionality reduction process; it gives better results than any other range. 0 Hz is omitted, as it can be considered a bias term and also arises from noise and instrument error. The result of this step has the shape (number of channels per patient N, number of samples M). For training, all pairs are chained.
The correlation coefficient is a statistical measure of the dependency between two variables. The correlation coefficient matrix and its eigenvalues in the time and frequency domains make another effective feature15. To find the correlation coefficients and their eigenvalues, the time-domain and frequency-domain data are first normalized using methods such as binning. For normalizing the FFT output, 1 to 47 Hz frequency bins are used; each 1 Hz bin is treated as a vector, and further operations using the mean, median and standard deviation are performed on these vectors. The normalized matrix is then used to compute the N×N correlation coefficient matrix, whose eigenvalues are sorted by magnitude to form the final feature. In the frequency domain the input is a frequency range, while in the time domain it is a time series. At this step, the upper-triangular correlation matrix and the sorted eigenvalues are used as features.
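The FFT and correlation-eigenvalue features described above could be sketched in NumPy as follows. This is a sketch only: the sampling rate fs=400 and the function names are my assumptions for illustration, not taken from the cited work.

```python
import numpy as np

def fft_log_features(clip, fs=400):
    """log10 FFT magnitudes in the 1-47 Hz band, one row per EEG channel.

    clip: array of shape (n_channels, n_samples) covering one second.
    fs is a hypothetical sampling rate."""
    mags = np.abs(np.fft.rfft(clip, axis=1))          # phase discarded
    freqs = np.fft.rfftfreq(clip.shape[1], d=1.0 / fs)
    band = (freqs >= 1) & (freqs <= 47)               # 0 Hz omitted (bias/noise)
    return np.log10(mags[:, band] + 1e-12)            # epsilon avoids log(0)

def corr_eigen_features(features):
    """Eigenvalues of the channel correlation matrix, sorted by magnitude."""
    c = np.corrcoef(features)                         # N x N correlation matrix
    eig = np.linalg.eigvalsh(c)
    return np.sort(np.abs(eig))[::-1]                 # largest magnitude first
```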
But there are various challenges in classifying these emotions, which are as follows:
Posing and Frequent movements of head
Structural components presence
Orientation of the image
Subtle facial deformation
Ambiguity and uncertainty in face motion measurement
3.1 METHODOLOGY AND DATA
The data consists of 48×48 pixel grayscale images of faces. The faces have been automatically registered so that the face is more or less centered and occupies about the same amount of space in each image.
train.csv contains two columns, “emotion” and “pixels”. The “emotion” column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image. The “pixels” column contains a quoted string for each image; the contents of this string are space-separated pixel values in row-major order. test.csv contains only the “pixels” column, and the task is to predict the emotion column.
The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The test set consists of another 3,589 examples.
This dataset was prepared by Pierre-Luc Carrier and Aaron Courville, as part of an ongoing research project and was available through kaggle.
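The pixel decoding described above can be sketched as follows. This is a minimal example; the function name parse_pixels is my own, and a 48×48 grayscale image is assumed, as stated in the dataset description.

```python
import numpy as np

def parse_pixels(pixel_string, size=48):
    """Decode the space-separated "pixels" string from train.csv into a
    size x size grayscale image array (pixel values in row-major order)."""
    values = np.array(pixel_string.split(), dtype=np.uint8)
    return values.reshape(size, size)
```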
In the first step, the input is taken from a webcam or video source and the face is detected using OpenCV in Python. Features are then extracted using the CNN concept of deep learning. For the classification task, the extracted features are given to a classifier such as logistic regression or SVM, which predicts the recognized expression as output.
I have used a Haar cascade classifier from OpenCV to automate face detection.
I have used Python 3.6 with a virtual environment. The Python AI/ML libraries used are as follows:
Support Vector Machine (SVM):
SVM is one of the most important classification techniques. A support vector machine views a classification problem as a quadratic optimization problem: it finds the maximum-margin hyperplane that separates the classes.
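As a sketch of how a linear SVM could be fit with scikit-learn (the toy 2-D data below is hypothetical, standing in for extracted facial features; the real project would use the dataset described earlier):

```python
from sklearn import svm

# Hypothetical feature vectors for two expression classes;
# the class depends only on the first coordinate.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
y = [0, 0, 1, 1]

clf = svm.SVC(kernel="linear")   # solves the quadratic optimization problem
clf.fit(X, y)
```

The fitted model can then label new feature vectors with clf.predict.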
k-Nearest Neighbours (k-NN):
The k-nearest-neighbours algorithm is a supervised classification algorithm. It takes a set of labelled points and uses them to label other points: to label a new point, it looks at the k labelled points closest to that new point and takes a majority vote.
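The voting scheme just described can be sketched with scikit-learn (the 1-D toy clusters below are hypothetical stand-ins for feature vectors):

```python
from sklearn.neighbors import KNeighborsClassifier

# Two well-separated clusters of labelled points.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# Each new point is labelled by a majority vote of its 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
```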
Logistic Regression:
Logistic regression is a statistical method for analysing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable (one with only two possible values).
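A minimal scikit-learn sketch of a dichotomous-outcome fit, under the same hypothetical-data caveat as the other examples:

```python
from sklearn.linear_model import LogisticRegression

# One independent variable; the outcome is 0 or 1.
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

lr = LogisticRegression()
lr.fit(X, y)
```

For multi-class emotion labels (0-6), scikit-learn extends this automatically via a one-vs-rest or multinomial formulation.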
4.1 EXPERIMENTS AND RESULTS
Detection of emotions on a video source: