Face Recognition Using Metric Learning with Head Pose Information

Bibliographic Details
Main Authors: Hsu, Heng-Wei, 許恆瑋
Other Authors: Lee, Chen-Yi
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/69788s
id ndltd-TW-107NCTU5428058
record_format oai_dc
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === National Chiao Tung University === Institute of Electronics === 107 === Face recognition has attracted considerable interest in recent years and is widely used in everyday applications such as video surveillance, smartphone applications, and airport security. Nevertheless, recognizing faces in large profile views remains a hard problem because important features become obscured as a person’s head turns. This problem can be divided into two sub-problems. First, an accurate head pose estimation model is required to predict the angle from a face image. Second, a face recognition model that leverages the angle information is needed to discriminate between different people at different angles. In this dissertation, we aim to fill this gap. Head pose angles are commonly estimated through a two-step process in which a set of landmarks is first detected on the face and the angles are then estimated from the detected landmarks. Instead, we propose to predict angles directly from face images by training a deep convolutional neural network model. We further provide a metric-learning-based face recognition framework that leverages the angle information to improve overall performance. Our contributions fall into three parts. First, we propose a novel geometric loss for face recognition that explores the area relations within quadruplets of samples and thereby inherently considers the geometric characteristics of each sample set. Each sampled quadruplet includes three positive samples and one negative sample, which form a tetrahedron in the embedding space. The area of the triangular face formed by the positive samples is minimized to reduce intraclass variation, whereas the areas of the triangular faces that include the negative sample are maximized to increase interclass distances. With our area-based objective function, the gradient of each sample takes its neighboring samples into account and adapts to the local geometry, which leads to improved performance.
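The area-based objective described above can be illustrated with a small sketch. The abstract does not give the exact formula, so the hinge form, the averaging over the three negative faces, the `margin` parameter, and the function names below are all assumptions made for illustration, not the dissertation's definition.

```python
import math


def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c in a space of any dimension."""
    u = [bi - ai for ai, bi in zip(a, b)]
    v = [ci - ai for ai, ci in zip(a, c)]
    uu = sum(x * x for x in u)
    vv = sum(x * x for x in v)
    uv = sum(x * y for x, y in zip(u, v))
    # Lagrange identity: |u|^2 |v|^2 - (u.v)^2 is the squared parallelogram area.
    return 0.5 * math.sqrt(max(uu * vv - uv * uv, 0.0))


def area_quadruplet_loss(p1, p2, p3, n, margin=1.0):
    """Hypothetical hinge loss on the faces of the tetrahedron (p1, p2, p3, n).

    Assumption: the positive face area is pushed below the mean area of the
    three faces containing the negative sample by at least `margin`.
    """
    positive_area = triangle_area(p1, p2, p3)
    negative_areas = [
        triangle_area(p1, p2, n),
        triangle_area(p1, p3, n),
        triangle_area(p2, p3, n),
    ]
    mean_negative_area = sum(negative_areas) / len(negative_areas)
    return max(0.0, positive_area - mean_negative_area + margin)


if __name__ == "__main__":
    p1, p2, p3 = (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)
    print(area_quadruplet_loss(p1, p2, p3, (10.0, 10.0, 10.0)))  # distant negative
    print(area_quadruplet_loss(p1, p2, p3, (0.05, 0.05, 0.0)))   # negative inside the cluster
```

In this sketch a negative sample that lies far from the positive cluster enlarges the faces that contain it and lowers the loss, while a negative sample inside the cluster leaves the loss close to the margin.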
Second, we conduct an in-depth study of head pose estimation and present a multi-regression loss function, an L2 regression loss combined with an ordinal regression loss, to train a convolutional neural network (CNN) dedicated to estimating head poses from RGB images without depth information. The ordinal regression loss addresses the non-stationary property observed as facial features change with head pose angle, and helps the network learn robust features. The L2 regression loss leverages these features to provide precise angle predictions for input images. To avoid the ambiguity problem in the commonly used Euler angle representation, we further formulate the head pose estimation problem in quaternions. Our quaternion-based multi-regression loss method achieves state-of-the-art performance on several public benchmark datasets. Third, we design a complete face recognition training framework. We start with data cleaning, an automatic method for dealing with the label noise from which most recent large datasets suffer. We then design a data augmentation method that randomly alters the input image under various conditions, such as adjusting its contrast, saturation, and lighting. Sharpening, blurring, and noise are also applied to the images to simulate different camera sources. The boundary values of the parameters for each image processing method are chosen so that the resulting images remain reasonable. Experimental results demonstrate that models trained with this kind of data augmentation are robust to unseen images. When training with large datasets, the last fully connected layer for the classification loss is often large because the datasets contain a large number of identities. This makes the training process hard to converge, as the weights are randomly initialized. 
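The quaternion formulation and the combined loss described above can be sketched as follows. The abstract does not specify the rotation convention, the form of the ordinal loss, or the loss weighting, so the ZYX (yaw-pitch-roll) convention, the threshold-based binary cross-entropy ranking loss, the `weight` parameter, and all function names are assumptions for illustration only.

```python
import math


def euler_to_quaternion(yaw, pitch, roll):
    """Unit quaternion (w, x, y, z) for ZYX yaw-pitch-roll angles in radians."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )


def l2_loss(pred, target):
    """Sum of squared differences between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def ordinal_targets(value, thresholds):
    """Binary rank labels: 1.0 where the value exceeds a threshold, else 0.0."""
    return [1.0 if value > t else 0.0 for t in thresholds]


def ordinal_loss(probs, value, thresholds, eps=1e-7):
    """Assumed ordinal loss: binary cross-entropy summed over all thresholds."""
    total = 0.0
    for p, y in zip(probs, ordinal_targets(value, thresholds)):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total


def multi_regression_loss(pred_q, rank_probs, target_q, thresholds, weight=1.0):
    """L2 loss on the quaternion plus a weighted ordinal loss per component."""
    ranking = sum(
        ordinal_loss(probs, t, thresholds)
        for probs, t in zip(rank_probs, target_q)
    )
    return l2_loss(pred_q, target_q) + weight * ranking


if __name__ == "__main__":
    # At 90 degrees of pitch, different (yaw, roll) pairs describe one rotation.
    a = euler_to_quaternion(math.radians(30), math.radians(90), math.radians(50))
    b = euler_to_quaternion(math.radians(10), math.radians(90), math.radians(30))
    print(a)
    print(b)
```

The two Euler triples in the usage example differ in yaw and roll but yield the same quaternion, which is one instance of the ambiguity that motivates a quaternion target. Note that q and -q encode the same rotation, so a practical implementation would also need a sign convention; that detail is not covered here.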
Thus, we propose an iterative training and fine-tuning process that makes the training loss converge smoothly. Furthermore, to leverage the angle information for improving face recognition performance, we provide a detailed analysis of a metric-learning-based method that learns to minimize the distance between a person’s frontal and profile images. Qualitative and quantitative results demonstrate the effectiveness of our proposed training methodology. The following publications form the foundation of this thesis: • Heng-Wei Hsu, Tung-Yu Wu, Sheng Wan, Wing Hung Wong, and Chen-Yi Lee, “QuatNet: Quaternion-Based Head Pose Estimation With Multiregression Loss,” IEEE Transactions on Multimedia, Aug 2018. • Heng-Wei Hsu, Tung-Yu Wu, Wing Hung Wong, and Chen-Yi Lee, “Correlation-based Face Detection for Recognizing Faces in Videos,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3101–3105, Apr 2018. • Heng-Wei Hsu, Tung-Yu Wu, Sheng Wan, Wing Hung Wong, and Chen-Yi Lee, “Deep Metric Learning with Geometric Loss,” under review. • Sheng Wan, Tung-Yu Wu, Heng-Wei Hsu, Yi-Wei Chen, Wing H. Wong, and Chen-Yi Lee, “Model-based JPEG for Convolutional Neural Network Classifications,” under review.
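The idea of learning to minimize the distance between a person’s frontal and profile images can be sketched with a standard contrastive loss. The abstract does not state which metric learning objective is used, so the contrastive form, the `margin` parameter, and the function names here are assumptions chosen for illustration rather than the dissertation's method.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frontal_profile_contrastive_loss(frontal, profile, same_identity, margin=1.0):
    """Assumed contrastive loss on a (frontal, profile) embedding pair.

    Same identity: penalize the squared distance, pulling the pair together.
    Different identities: penalize only pairs closer than `margin`.
    """
    distance = euclidean(frontal, profile)
    if same_identity:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


if __name__ == "__main__":
    frontal = (0.2, 0.9, 0.1)
    profile = (0.3, 0.7, 0.2)
    print(frontal_profile_contrastive_loss(frontal, profile, same_identity=True))
    print(frontal_profile_contrastive_loss(frontal, profile, same_identity=False))
```

With this form, a frontal and a profile embedding of the same person incur a loss that shrinks as they move together, while embeddings of different people incur a loss only when they fall inside the margin.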
author2 Lee, Chen-Yi
author_facet Lee, Chen-Yi
Hsu, Heng-Wei
許恆瑋
author Hsu, Heng-Wei
許恆瑋
spellingShingle Hsu, Heng-Wei
許恆瑋
Face Recognition Using Metric Learning with Head Pose Information
author_sort Hsu, Heng-Wei
title Face Recognition Using Metric Learning with Head Pose Information
title_short Face Recognition Using Metric Learning with Head Pose Information
title_full Face Recognition Using Metric Learning with Head Pose Information
title_fullStr Face Recognition Using Metric Learning with Head Pose Information
title_full_unstemmed Face Recognition Using Metric Learning with Head Pose Information
title_sort face recognition using metric learning with head pose information
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/69788s
work_keys_str_mv AT hsuhengwei facerecognitionusingmetriclearningwithheadposeinformation
AT xǔhéngwěi facerecognitionusingmetriclearningwithheadposeinformation
AT hsuhengwei yīngyòngjiǎodùzīxùngǎijìnyǐdùliàngxuéxíwèijīchǔzhīrénliǎnbiànshíxìtǒng
AT xǔhéngwěi yīngyòngjiǎodùzīxùngǎijìnyǐdùliàngxuéxíwèijīchǔzhīrénliǎnbiànshíxìtǒng
_version_ 1719178697859137536
spelling ndltd-TW-107NCTU5428058 2019-05-16T01:40:47Z http://ndltd.ncl.edu.tw/handle/69788s Face Recognition Using Metric Learning with Head Pose Information 應用角度資訊改進以度量學習為基礎之人臉辨識系統 Hsu, Heng-Wei 許恆瑋 Ph.D. National Chiao Tung University Institute of Electronics 107 Lee, Chen-Yi 李鎮宜 2018 學位論文 ; thesis 116 en_US