To push state-of-the-art blink detection algorithms further, we make our data available. We publish our results on the available benchmarks, together with our evaluation code and our dataset annotations, so that the performance of blink detection algorithms can be compared. All data available here is under the GPL3 license.
Abstract. Computer users often complain about eye discomfort caused by dry eye syndrome, which is sometimes caused or accompanied by incomplete blinks. There are several algorithms for eye blink detection, but none that distinguishes complete blinks from incomplete ones. We introduce the first method which detects blink completeness. Blinks differ in speed and duration, similarly to speech, so a Recurrent Neural Network (RNN) is used as the classifier due to its suitability for sequence-based features. We show that a unidirectional RNN with time shifting achieves higher performance than a bidirectional RNN and is the more suitable choice for this kind of problem, where the feature pattern has not yet been observed for the initial frames. We report the best results (an increase of almost 8%) on the most challenging dataset, Researcher's night. We formulate a new important problem and provide an initial benchmark for further research.
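The time-shifting idea can be illustrated with a short sketch. The following is a minimal, illustrative PyTorch example, not the published architecture: the feature dimension, hidden size, number of output classes, and the shift of 5 frames are all assumptions made only for this sketch.

```python
# Minimal sketch (PyTorch) of a unidirectional RNN with time shifting:
# the label of frame t is predicted only after `shift` additional frames
# have been read, so the network gets a short look-ahead without being
# bidirectional. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class ShiftedBlinkRNN(nn.Module):
    def __init__(self, feature_dim=4, hidden_size=32, num_classes=3, shift=5):
        super().__init__()
        self.shift = shift                                 # look-ahead in frames
        self.rnn = nn.GRU(feature_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)    # e.g. no blink / incomplete / complete

    def forward(self, features):
        # features: (batch, T, feature_dim) per-frame motion features
        out, _ = self.rnn(features)                        # (batch, T, hidden)
        logits = self.head(out)
        # the output at time t + shift is the prediction for frame t,
        # so the first `shift` outputs are dropped
        return logits[:, self.shift:, :]

# usage: align targets accordingly, e.g. with shift = 5
#   loss = criterion(model(x).reshape(-1, 3), labels[:, :-5].reshape(-1))
```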
Abstract. A new eye blink detection algorithm is proposed. Motion vectors obtained by the Gunnar–Farnebäck tracker in the eye region are analyzed using a state machine for each eye. The normalized average motion vector, its standard deviation, and a time constraint are the input to the state machine. Motion vectors are normalized by the intraocular distance to achieve invariance to the eye region size. The proposed method outperforms related work on the majority of available datasets. We extend the way eye blink detection algorithms are evaluated so that results do not depend on the algorithms used for face and eye detection. We also introduce a new challenging dataset, Researcher's night, which contains more than 100 unique individuals with 1849 annotated eye blinks, making it currently the largest dataset available.
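For illustration, here is a minimal OpenCV (Python) sketch of the kind of per-frame feature the abstract describes: dense optical flow inside the eye region, averaged and normalized by the intraocular distance. The eye region box, the Farnebäck parameters, and the returned feature layout are assumptions of this sketch; the state machine itself is not reproduced.

```python
# Sketch: average motion inside an eye region, normalized by the
# intraocular distance so the feature does not depend on eye region size.
# Flow parameters and the region format are illustrative assumptions.
import cv2
import numpy as np

def eye_motion_feature(prev_gray, curr_gray, eye_box, intraocular_dist):
    x, y, w, h = eye_box                       # eye region (pixels)
    prev_roi = prev_gray[y:y + h, x:x + w]
    curr_roi = curr_gray[y:y + h, x:x + w]
    # dense Gunnar Farnebäck optical flow inside the eye region
    flow = cv2.calcOpticalFlowFarneback(prev_roi, curr_roi, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    vectors = flow.reshape(-1, 2)
    mean_xy = vectors.mean(axis=0)             # average motion vector
    std_xy = vectors.std(axis=0)               # its standard deviation
    # normalization by the intraocular distance -> scale invariance
    return mean_xy / intraocular_dist, std_xy / intraocular_dist
```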
During my PhD I read every paper on blink detection I could find. The problem with evaluation was that each paper used a different evaluation procedure and usually its own dataset annotation. I propose an evaluation procedure which eliminates the influence of the face and eye detection algorithms (their locations are included within the annotation) and penalizes both the incorrect length and position of the detected blink. For more details, see FogeltonCVIU2016. There are two important things to notice. What to do with multiple blinks (when there is no non-blink frame between blinks)? And how to merge blinks from both eyes? There are multiple ways to do it, depending on how the algorithm is used. Our evaluation procedure in FogeltonCVIU2018 differs slightly from FogeltonCVIU2016: we cannot merge left and right blinks there, because their completeness can differ. People can blink multiple times in a row. Some algorithms detect multiple blinks as one [Pan2007], others as n individual blinks [FogeltonCVIU2016]. Both parameters, blink rate and inter-blink interval (the time between individual blinks), are important. We decide in favor of the inter-blink interval, so we detect multiple blinks as one, as shown in the sketch below.
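A minimal sketch of this convention follows, assuming blinks are represented as inclusive (start_frame, end_frame) intervals and applied separately to each eye (since completeness can differ between eyes); the interval representation is an assumption of the sketch, not a prescribed format.

```python
# Sketch of the "multiple blinks count as one" convention: blink intervals
# with no non-blink frame between them are merged into a single interval.
def merge_consecutive_blinks(intervals):
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:    # touching or overlapping
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

# example: two blinks in a row with no open-eye frame in between
print(merge_consecutive_blinks([(10, 15), (16, 22), (40, 45)]))
# -> [(10, 22), (40, 45)]
```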
We make some of our evaluation procedures available to make comparison with us easier: C++ code, Python. We are glad to compare your results with ours, please contact us here
dataset | ground truth blink count | FogeltonCVIU2018 | FogeltonCVIU2016 |
---|---|---|---|
Researcher's night test set | 1447 | 0.879 | 0.8 |
Talking face | 122 | 0.971 | 0.93 |
ZJU | 510 | 0.976 | 0.938 |
Eyeblink8 | 804 | 0.913 | 0.916 |
Silesian5 25fps | 562 | 0.945 | 0.914 |
dataset | GT complete | GT incomplete | FogeltonCVIU2018 complete | FogeltonCVIU2018 incomplete |
---|---|---|---|---|
Researcher's night test set | 1043 | 433 | 0.744 | 0.466 |
Talking face | 119 | 3 | 0.939 | 0.25 |
ZJU | 488 | 22 | 0.838 | 0.203 |
Eyeblink8 | 762 | 44 | 0.893 | 0.337 |
Silesian5 25fps | 508 | 60 | 0.86 | 0.326 |
We also want to compare the performance of our algorithm to existing ones. The comparison is not strictly valid, because the annotations and evaluation procedures differ, but it still gives a feeling for the performance. In FogeltonCVIU2018, the results are discussed in greater detail.
paper | ZJU | Talking | Eyeblink8 | Silesian5 |
---|---|---|---|---|
Anas et al. 2017 | 0.937 | 1 | x | x |
Soukupova and Cech 2016 | 0.952 | 0.948 | 0.952 | 0.957 |
Radlak and Smolka 2013 | 0.992 | x | x | 0.915 |
FogeltonCVIU2018 | 0.976 | 0.971 | 0.913 | 0.945 |
This dataset contains 8 videos of 4 individuals (1 wearing glasses). The videos are recorded in a home environment. People are sitting in front of the camera and mostly act naturally, with vivid facial mimics, similarly to the Talking face dataset. There are 408 eye blinks in 70 992 annotated frames with resolution 640x480.
dataset
We introduce a new dataset collected during an event called Researcher's night 2014; it is available on demand. People were asked to read an article on a computer screen or to blink while being recorded. There is sometimes more than one person in the camera view. We collected 107 videos with 223 000 frames of different people with a cluttered background. People often act naturally: wearing glasses (around 20%), touching their face, moving their head, or even talking to somebody. Some of the blinks are unnaturally long, which can be considered voluntary (people knew they were being recorded) or extended blinks. 1849 blinks were annotated, which makes Researcher's night the biggest real-world dataset publicly available.
There are two subsets, Researcher's night 15 and Researcher's night 30, captured at 15 and 30 frames per second (fps) with resolution 640x480. Small deviations can occur: severe CPU usage can cause some frames to be delivered to the recording software late, or not at all. For example, video 10 in the test set of Researcher's night 30 has only 20 fps for the first 3 seconds. This is the reason why time-stamp information can be crucial for successful detection. On the other hand, we observed that sometimes the same frame can be delivered twice to fulfill the device driver requirement to capture the video stream at a given frame rate, or it is an error of the encoder (we used x264vfw) or codec (FFmpeg in OpenCV 2.4.6). These cases happen rarely, mostly under bad light conditions.
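For illustration, a small sketch of such a frame-rate check, assuming the per-frame acquisition timestamps are available as a list of times in seconds (the actual storage format of the timestamps is not prescribed here):

```python
# Sketch: detect frame-rate drops from per-frame acquisition timestamps.
# Assumes a list of timestamps in seconds, one entry per frame.
def fps_drops(timestamps, nominal_fps, tolerance=0.5):
    drops = []
    expected = 1.0 / nominal_fps
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        if dt > expected * (1.0 + tolerance):   # frame arrived late or was lost
            drops.append((i, dt))
    return drops

# e.g. a nominally 30 fps recording where one frame took 0.1 s to arrive
print(fps_drops([0.0, 0.033, 0.066, 0.166, 0.199], 30))
# -> [(3, 0.1)]
```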
While recording, the H264 codec was set to the baseline profile level 3, which is primarily used for lower-cost applications with limited computing resources, like video conferencing or mobile applications. The quantizer was set to 23 (range 0-51). The video quality therefore corresponds to common video quality, such as could be acquired using a mobile phone. The dataset is divided into train, validation, and test sets with ratios 1/4, 1/4, and 1/2. We believe this real-world dataset can help researchers develop more precise algorithms.
Dataset is available on demand
The ZJU dataset consists of 80 videos, each lasting a few seconds. 20 individuals are recorded in 4 clips each (frontal view with and without glasses (2 types) and an upward view), stated to be captured at 30 fps with resolution 320x240. Different numbers of ground truth eye blinks are reported for this dataset in related work, because different annotators may also count an eye opening or an eye closing as a blink; these often occur in this dataset at the beginning and at the end of a video. We count 6 double blinks as 12 individual eye blinks, which is why we report 261 eye blinks instead of the 255 reported in most of the related work. There are also very short eye blinks (for example, 2 frames long); based on this, we believe the videos are not captured at 30 fps the entire time. On the other hand, there are also very long eye blinks (for example, 20 frames long) that can be considered intentional rather than endogenous. Subjects in the dataset are still, almost without any head movement during the recording. All subjects are of Asian ethnicity.
our annotation
The Talking face dataset was not originally created to evaluate eye blinks, but to evaluate the precision of facial landmark detection, which means there is no official ground truth data for blinks. Despite its small size, it has been used in several related works to evaluate blinks. There are 5000 frames captured at 25 fps with resolution 720x576, in total 200 seconds of video. It consists of one subject sitting and talking in front of the camera. The dataset is annotated with 68 facial landmarks. We annotated 61 eye blinks. The ground truth intervals of individual eye blinks differ from related work because we decided to do a completely new annotation. We use the eye corner locations from the original facial landmark annotation.
This dataset is captured with a high-speed Basler camera at 100 fps with resolution 640x480. It is the subset of the larger Silesian dataset [1] used for eye blink evaluation. There are 5 subjects captured at close distance in a controlled environment, with 300 blinks. We annotated the dataset with a face bounding box and eye corner positions. When converting the dataset images into compressed video, we lost the last few frames of each video because of the codec used (around the last 30 frames of each video). We annotated 58 884 frames out of the original 59 031. When we converted the blink interval annotations from the original ones, we found that their precision differs: the original ground truth blink intervals are not annotated as precisely as ours, and an annotated blink interval usually starts several frames before the eye blink actually starts and ends a few frames after it ends. Another difference is what is considered a double blink. The original annotation uses this term also for micro stops of the eyelid while it moves down; sometimes one frame is part of two consecutive blinks in the original annotation, which in our opinion can lower the precision of the evaluation. Moreover, blinks that are very close to each other are not considered a double blink in the original annotation.
our annotation
To obtain our converted videos from the original videos, please contact Mr. Krystian Radlak with a signed and scanned license agreement. If he agrees, I will share our videos with you.
[1] Radlak, K., Bozek, M., Smolka, B.: Silesian Deception Database: Presentation and Analysis. In: Proceedings of the 2015 ACM on Workshop on Multimodal Deception Detection. WMDD ’15, ACM, 2015, pp. 29–35.
Because of image acquisition instability, which is caused mostly by insufficient light conditions, we also record the frame acquisition time. The recording software also contains other features required by its users: a postponed start of the recording, a beep before the start, setting the video length, or running an application before the recording starts.
Annotations of the face bounding box and eye corners are created manually by annotators, who were instructed to stay within 5 px of the correct eye corner locations. Thanks to these annotations, the evaluation of an eye blink detection algorithm is no longer influenced by the face or eye detector. We provide the eye blink interval together with the information whether the eye is fully closed or not, separately for each eye. The annotation row consists of the following information: frame ID : blink ID : NF : LE_FC : LE_NV : RE_FC : RE_NV : F_X : F_Y : F_W : F_H : LE_LX : LE_LY : LE_RX : LE_RY : RE_LX : RE_LY : RE_RX : RE_RY, where NF marks a non-frontal face, LE_FC / RE_FC mark that the left/right eye is fully closed, LE_NV / RE_NV that the left/right eye is not visible, F_X : F_Y : F_W : F_H is the face bounding box, and LE_LX : LE_LY : LE_RX : LE_RY and RE_LX : RE_LY : RE_RX : RE_RY are the left and right corner coordinates of the left and right eye.
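For illustration, a minimal Python sketch that parses one such annotation row into a dictionary; the field order follows the format above, while the example values and the handling of empty fields are assumptions of this sketch.

```python
# Sketch of a parser for the annotation row described above.
# The field order follows the listed format; the example values and the
# flag encoding are illustrative only.
FIELDS = ["frame_id", "blink_id", "NF", "LE_FC", "LE_NV", "RE_FC", "RE_NV",
          "F_X", "F_Y", "F_W", "F_H",
          "LE_LX", "LE_LY", "LE_RX", "LE_RY",
          "RE_LX", "RE_LY", "RE_RX", "RE_RY"]

def parse_annotation_row(row):
    values = [v.strip() for v in row.split(":")]
    return dict(zip(FIELDS, values))

# example row (values made up, to show the format only)
row = "120:7:0:1:0:1:0:310:205:160:160:340:250:365:252:420:251:447:249"
print(parse_annotation_row(row)["LE_FC"])
```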
A tool was created to speed up the annotation process, which would be quite time consuming if all frames were annotated manually. When a new frame is shown, Waldboost face detection is performed first, and then the eye corners are detected. In case of inaccuracy, or if no face is detected at all, the annotator can adjust them. If a frame is part of a blink, the annotator marks the given frames with the relevant labels. The start of the blink is the first frame in which the eyelids start to move, and the end of the blink is when the eyelids stop their opening movement. Additional tags, like the left/right eye being fully closed or not visible, are marked as well. A non-frontal face can also be marked. All parts of the annotation are checked or created by the annotator. Searching through the video can be done using the arrow keys or by selecting a given row in the annotation table. The tool does not allow jumping to an arbitrary video frame, only to already annotated frames; this way, errors caused by annotators are minimized. The eye corners and the face bounding box can be adjusted by switching to the adjustment mode. At higher frame rates, consecutive frames do not differ significantly, so we implemented a few shortcuts to copy the previous annotation to the current frame. This was very practical, mostly when the face or eye detection was failing.
Annotation tool (uses OpenCV 2.4.6) [no technical support, bugs can occur]
The reported performance also depends on the threshold used to classify a detection as a true positive. It matters whether a blink is annotated and detected as a single position within the video or as an interval. With position-based annotations, the distance between the positions can be thresholded to report a true positive. With interval-based annotations, different metrics can be used. Inspired by the PASCAL VOC challenge and its object detection evaluation, we use the Intersection over Union (IoU) metric, which penalizes both the incorrect length and the incorrect position of the detected interval. We define a detected blink as a true positive if its IoU with a ground truth blink interval is larger than 0.2.
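A minimal sketch of this criterion, assuming blinks are represented as inclusive frame intervals; the greedy matching of detections to ground truth intervals below is a simplification, and the published evaluation code defines the exact procedure.

```python
# Sketch of the IoU criterion for inclusive frame intervals (start, end).
def interval_iou(a, b):
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union

def count_true_positives(detections, ground_truth, threshold=0.2):
    # greedy one-to-one matching of detections to ground truth intervals
    matched, tp = set(), 0
    for det in detections:
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(ground_truth):
            iou = interval_iou(det, gt)
            if i not in matched and iou > best_iou:
                best_iou, best_gt = iou, i
        if best_iou > threshold:
            matched.add(best_gt)
            tp += 1
    return tp

print(interval_iou((10, 19), (12, 21)))   # 8 / 12 ≈ 0.67 -> true positive
```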
An application which detects blinks in videos is available on demand.