SCIENTIFIC PAPER

A novel human–machine interface based on recognition of multi-channel facial bioelectric signals

Iman Mohammad Rezazadeh · S. Mohammad Firoozabadi · Huosheng Hu · S. Mohammad Reza Hashemi Golpayegani

Received: 11 September 2010 / Accepted: 6 November 2011
© Australasian College of Physical Scientists and Engineers in Medicine 2011
Australas Phys Eng Sci Med, DOI 10.1007/s13246-011-0113-1

I. Mohammad Rezazadeh · S. M. Firoozabadi · S. M. R. Hashemi Golpayegani
School of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
e-mail: [email protected]; [email protected]

S. M. Firoozabadi
School of Medical Sciences, Tarbiat Modares University, Tehran, Iran

H. Hu
School of Computer Science & Electronic Engineering, University of Essex, Colchester, UK

S. M. R. Hashemi Golpayegani
School of Biomedical Engineering, Amir Kabir University of Technology, Tehran, Iran

Abstract This paper presents a novel human–machine interface for disabled people to interact with assistive systems for a better quality of life. It is based on multi-channel forehead bioelectric signals acquired by placing three pairs of electrodes (physical channels) on the Frontalis and Temporalis facial muscles. The acquired signals are passed through a parallel filter bank to explore three different sub-bands related to the facial electromyogram, electrooculogram and electroencephalogram. The root mean square features of the bioelectric signals, analyzed within non-overlapping 256 ms windows, were extracted. The subtractive fuzzy c-means clustering method (SFCM) was applied to segment the feature space and generate initial fuzzy-based Takagi–Sugeno rules. Then, an adaptive neuro-fuzzy inference system is exploited to tune up the premise and consequence parameters of the extracted SFCM rules. The average classifier discrimination ratio for eight different facial gestures (smiling, frowning, pulling up the left/right lip corners, and eye movement to the left/right/up/down) is between 93.04% and 96.99%, according to different combinations and fusions of logical features. Experimental results show that the proposed interface has a high degree of accuracy and robustness for discrimination of the 8 fundamental facial gestures. Some potential further capabilities of our approach in human–machine interfaces are also discussed.

Keywords Human–machine interfaces · Multi-channel facial bioelectric signals · Subtractive fuzzy c-means clustering · Adaptive neuro-fuzzy inference system (ANFIS)

Introduction

The face is a very important body part that provides an interface for the exchange of information in daily life. All facial functions, such as speech, mastication and facial expression, are accomplished by individual facial muscles. From the point of view of facial muscle kinematics, it is evident that a facial muscle is a small 3D combination of muscular slips that carry out a variety of complex orofacial functions. A specific neural feature of facial muscles is that their contractions are not only under voluntary but also emotional control [1]. Because of this, information including emotion, fatigue, feeling and general affective measures could be analyzed based on facial responses [2–9]. Thus, the facial muscles could play a dominant role as a communication medium to accomplish message transmission and information acquisition.

When acquiring facial bioelectric signals, facial muscle activities (fEMG) as well as the other associated bioelectric potentials that are generated simultaneously, such as facial EOG (fEOG) and facial EEG (fEEG), can be recorded at the same time. In considering the potential capabilities of these bioelectric signals, their extracted features can be used individually or combined in many research areas, especially in designing human–machine interfaces. It was found that even the recorded signals from facial muscle activities show noticeable pattern changes as subjects begin to tire from repeated smiling, nose wrinkling, or frowning, causing the bioelectric signals to gradually lose similarity among the patterns as time goes by [10].

Other bioelectric signals from the user's face (e.g., fEOG and fEEG) could illustrate emotional states and attention when physical gestures or mimicking are weak or difficult for the user to generate [10]. To acquire valid data and consequently extract rich information from facial bioelectric signals, some important factors such as electrode placement, recording protocol and signal conditioning should be considered. By employing the correct processing methods, one can design an accurate and robust interface for real-world applications. A good study regarding the measurement considerations of fEMG and its applications can be found in [1].

Related works

Ang et al. [2] stated that fEMG can be related to certain facial expressions, which are the most visible representation of a person's physical emotional states. Mahlke and Minge [7] used emotional states extracted from fEMG to discriminate between usable and unusable computerized contexts by placing two pairs of electrodes on the Zygomaticus Major and Corrugator Supercilii muscles to detect positive and negative emotional states, respectively. They concluded that frowning activity is significantly higher in the unusable system condition than in the usable one. Surakka studied the effects of affective interventions using fEMG in a human–computer interface (HCI) and concluded that frowning activity was attenuated significantly after positive interventions compared to the no-intervention conditions [11, 12].

The fEOG signal can also be used for eye-gaze tracking to enhance the interaction level [13]. Hori et al. [14] used two electrodes mounted on the upper part and temple of the dominant-eye side to detect 4 directions of eye movement. By using a thresholding technique, they achieved 94.16% average accuracy. McFarland, Neat and Pfurtscheller focused on the detection of the alpha rhythm sub-band of fEEG, an 8–12 Hz brainwave of sinusoidal nature occurring at the sensorimotor cortex, to detect event-related desynchronization (ERD) [15]. Junker combined certain EMG signals with EEG signals to develop the CyberLink™, a small wearable device that acquires signals using three sensors mounted on a headband [16]. This headband can amplify and decode the forehead signal into three individual frequency channels and eleven sub-bands. These channels belong to fEOG (eye motion and lateral eye movements), fEEG (alpha and beta bands) and fEMG.

Stroke patients suffer facial paralysis because of neurological damage, so restoring the capability of facial expressions (gestures) is important for these patients [17]. Work has been performed by researchers to develop assistive/therapeutic HMIs for disabled people using facial bioelectric signals as control commands. Ferreira and his colleagues designed an HMI based on bioelectric signals from facial muscles and brain activity, which correspond to eye blink and visual information, for control of a robotic wheelchair [10].

Kim et al. [18] used linear prediction coefficients (LPCs) of Temporalis muscle activities (clenching the left, right, or both molar teeth, and eye blink) to control an electrically powered wheelchair (EPW). They used a Hidden Markov Model (HMM) as a classifier, and achieved 96.5% and 97.1% discrimination ratios for handicapped and healthy groups, respectively. Tsui et al. [13] controlled an EPW using forehead bioelectric signals acquired by the CyberLink™. They used Frontalis electrical activity amplitudes as a click or double-click to define five command control states: "stop", "forward", "backward", "left" and "right". The fEOG signal was also used for speed control during EPW movement. Firoozabadi et al. [3] also used multi-channel fEMG as an interface to control a virtual robotic wheelchair. They used facial gestures (smile, frown, pulling up lip corners) and their corresponding fEMG to command a virtual wheelchair by extracting root mean square (RMS) features of the captured bioelectric signals and employing a support vector machine (SVM) classifier. They reported 89.75–100% classification accuracy. Mohammad Rezazadeh et al. [4] expanded Firoozabadi's study [3], using the same setup to control a virtual interactive tower crane in a construction area. They employed subtractive fuzzy c-means clustering (SFCM) and reported an average 92.6% classification ratio for five different facial gestures (smiling, pulling up the right/left lip corners, mouth opening and clenching the molar teeth).

Study goal

This study investigates how multimodal facial bioelectric signals could be effectively recognized in order to build a novel HMI for disabled people, so that they can access assistive systems and enjoy a better quality of life within society. The main reason to use the face as an interface in our approach is that the human face is a rich resource of information, from syntactic to semantic and pragmatic aspects, and most facial gestures are natural and voluntary and can be easily generated by many individuals whose motor impediments are mainly from the neck down. The physical state (such as gestures or fatigue) of a user can be monitored by recording and processing his or her fEMG signal specifications (such as amplitude, power and frequency spectrum). In addition, the user's fEOG and fEEG specifications (such as amplitude, entropy, phase space, etc.) can be used to indicate the user's affective states (from happiness to sadness, mental and cognitive stress, or attention to a display screen). The proposed multimodal approach is intended to improve and enhance the interface and to avoid saturating or overloading the physical states.

The rest of the paper is organized as follows. The "Materials and methods" section presents the methods adopted in this research, including electrode site selection, the experimental setup, and the data recording method. The methods used for data pre-processing, such as filter banks, data segmentation, feature selection and onset detection, are then described. In the "Classification" section, the input–output subtractive fuzzy clustering method, the adaptive neuro-fuzzy inference system (ANFIS) and majority voting are discussed, and then the "Experimental results and analysis" section is given. Finally, the "Conclusion and future work" section is provided.

Materials and methods

The general block diagram of our proposed bioelectric interface is shown in Fig. 1. It consists of the following sub-blocks, which are discussed in the coming sections: electrode site selection, configuration and placement; data pre-processing and filtering; windowing and data segmentation; feature extraction and transformation; inference system design; and majority voting.

Fig. 1 General block diagram of the proposed approach

Eight predefined facial gestures (smiling, frowning, pulling up the right and left lip corners, and moving the eyes up, down, left and right) were studied in our experiments. These gestures are primary facial movements that a healthy user, or a user who has suffered a stroke (affected from the neck down), can generate easily. In addition, they can convey users' emotions (fEMG and fEEG), directions of view (fEOG) and also levels of distraction of mental states (fEOG and fEEG) [3–6]. Each subject was asked to perform the above gestures according to the recording protocol, and the relative facial bioelectric signals were acquired simultaneously using three pairs of electrodes mounted on the volunteer's forehead (hereinafter, physical channels); the fEMG, fEOG and fEEG sub-frequency bands were then extracted from each of the physical channels by employing appropriate filter banks. Thus, for each physical channel, there are three logical channels. Then, the RMS features of the explored bioelectric signals within non-overlapped 256 ms windows were extracted. The SFCM was used to segment the feature space and generate initial fuzzy-based Takagi–Sugeno (TS) rules. Afterwards, the ANFIS exploited the extracted rules from the SFCM to tune up the premise and consequence parameters.

Electrode site selection and placement

As illustrated in Fig. 2, three pairs of pre-gelled Ag/AgCl electrodes were placed on the volunteer's facial muscles, in a configuration different from conventional placements, to harness the highest signal amplitude:

– One pair on the Frontalis muscle, above the eyebrows with a 2 cm inter-electrode distance (Channel 2).
– Two pairs placed on the left and right Temporalis muscles (Channel 1 and Channel 3).
– One ground electrode on the bony part of the left wrist.

Note that the skin of the selected electrode placement area should be gently rubbed with an alcohol-soaked cotton swab in order to minimize the impedance between the electrodes and the skin and also to eliminate motion artifacts. The lead wires should be secured to the volunteer's face with anti-allergy adhesive tape.

This configuration is a heuristic choice, because these three pairs are capable of gathering valuable information from the face, as described below:

– As the Frontalis muscle is spread out around the forehead and its fibers are not too deep and are not perpendicular to the electrodes' inter-center line (except in a very small area), Channel 2 could be responsible for collecting Frontalis activity, and it could be used to detect frowning, a major task of the Frontalis (Fig. 2). On the other hand, it could capture horizontal eye movement (EOG), due to the differences in proportional displacement and orientation of the corneoretinal dipoles with respect to the electrode configuration in Channel 2, which can then record this bioelectric potential difference [5].
– It should be noted that the magnitude of the recorded signal for vertical eye movement in Channel 2 is low, because the proportional displacement and orientation of the corneoretinal dipoles are approximately similar with respect to the electrode configuration in Channel 2. In addition, Channel 2 can be considered a good logical channel for obtaining fEEG.

– Contraction of the Temporalis muscle elevates the mandible. Channels 1 and 3 can be used for detecting Temporalis activities and the related fEMG (such as smiling, pulling the lip corners upward, teeth clenching, eye winking and eye blinking). These two pairs of electrodes can also detect vertical eye movement (EOG), because the proportional displacement and orientation of the corneoretinal dipoles are different with respect to the electrode configuration in Channels 1 and 3, which can record this bioelectric potential difference. They can also be used to capture temporal EEG. It should be noted that recording Temporalis muscle activities was preferred over the Masseter muscles, which also contract during smiling and pulling up the right/left lip corners. This is because the Masseter muscles are heavily involved during the act of speaking, which may cause false command generation. In addition, because we have proposed this HMI for disabled people, the electrode placement should be as easy as possible to set up. With our electrode configuration and placement, the electrodes can be mounted on a headband or a sports cap, but this goal could not be achieved by placing electrodes on the Masseter. In addition to this, we aim to reduce the number of electrodes as much as possible, to reduce the amount of raw data and the processing time. The Temporalis can also capture fEMG, fEEG and fEOG, but the Masseter cannot (Table 1) [5].
– In our study, experimental trials before the main experiment showed the capability of Channels 1–3 for capturing EMG, EOG and EEG signals.

Fig. 2 Illustration of the electrode configuration over the Frontalis and Temporalis facial muscles [3–6]; the configuration of Channel 3 is the same as that of Channel 1, but on the opposite side of the face

Table 1 The relationship between the recording physical channels and their assignments

Channel     EMG: facial movement                   EMG: most significant muscle   EOG: eye movement direction   EEG: mental states
Channel 1   Smiling; pulling up right lip corner   Right Temporalis               Right eye up & down           All frequency sub-bands
Channel 2   Frowning                               Frontalis                      Eyes left & right             All frequency sub-bands
Channel 3   Smiling; pulling up left lip corner    Left Temporalis                Left eye up & down            All frequency sub-bands

In our proposed bipolar electrode configuration, the conductive volume effect for each pair of electrodes is more localized and specific to the muscle fibers located underneath that electrode pair. Thus, the levels of bioelectric signals (cross-talk) sensed by the electrode pairs other than the one assigned to capture a specified gesture were reduced [4]. In our previous studies [4–6], we used the facial bioelectric signals of a user as an interface to control an assistive robot or device. The physical states (i.e., gestures) of the assistive machine could be controlled by fEMG signals, while fEEG and fEOG signals could be used simultaneously for an adaptive interface based on the user's affective state. Thus, according to the electrode placement, we are able to gather the proper bioelectric signals that mirror both the physical and affective states of the user and achieve a robust and adaptable interface. The theoretical studies and simulations that emphasize and confirm that the proposed electrode placement is a proper choice to capture the mentioned signals can be found in [5, 6].

It should be noted that our proposed electrode placement is different from the configurations recommended by bio-sensing technology manufacturers. In traditional methods, the electrodes are placed over the Masseter muscles, at the 4 corners of the eyes, and in a 10–20 configuration in order to record facial muscle activities, EOG and EEG, respectively. However, using the proposed method, one can capture facial muscle gestures, eye movement in 4 directions and mental states (which can be referred to the forehead EEG) using only three pairs of electrodes. The proposed configuration can also distinguish more physical and mental states than the traditional methods (see the "Experimental results and analysis" section).

Data acquisition setup

In this project, the Biopac system (MP100 model with ack100w software) [19] was used to acquire the bioelectric signals. The system can collect bioelectric signals accurately and store them in its internal memory or on a PC (1.73 GHz, 2 GB RAM). However, the MP100 cannot inherently be used for online applications. In-house MATLAB code was therefore written to read the acquired signals from the Biopac internal memory within a short time, providing online capability for our applications. The sampling frequency and amplifier gain were selected to be 1,000 Hz and 5,000, respectively. The low cutoff frequency of the filter was chosen to be 0.1 Hz to avoid motion artifacts, and a narrow band-stop filter (48–52 Hz) was also used to eliminate line noise.
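To make this conditioning stage concrete, the sketch below applies the stated 0.1 Hz high-pass and 48–52 Hz band-stop in Python. This is a rough illustration on our part: the authors used in-house MATLAB code, and the Butterworth filter orders here are assumptions, as the text does not give them.

```python
# Minimal sketch of the conditioning stage described above. Assumed
# parameters come from the text: fs = 1,000 Hz, 0.1 Hz high-pass,
# 48-52 Hz band-stop; the filter orders (2 and 4) are our assumptions.
import numpy as np
from scipy import signal

FS = 1000.0  # sampling frequency (Hz)

def condition_channel(raw: np.ndarray) -> np.ndarray:
    """High-pass at 0.1 Hz (motion artifacts), notch out 48-52 Hz line noise."""
    b_hp, a_hp = signal.butter(2, 0.1, btype="highpass", fs=FS)
    b_bs, a_bs = signal.butter(4, [48.0, 52.0], btype="bandstop", fs=FS)
    x = signal.filtfilt(b_hp, a_hp, raw)      # zero-phase filtering
    return signal.filtfilt(b_bs, a_bs, x)

# Example usage on synthetic data (one physical channel, 5 s long):
if __name__ == "__main__":
    t = np.arange(0, 5.0, 1.0 / FS)
    raw = np.sin(2 * np.pi * 50 * t) + 0.1 * np.random.randn(t.size)
    print(condition_channel(raw).shape)
```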

Participants

Ten volunteers participated in the study. All were physically healthy (with no history of disorders) and non-athletic, and they were grouped into three sub-groups depending on their ages:

– Sub-group #1 (C1–C2): Two children, aged 10 and 13 years.
– Sub-group #2 (A1–A7): Seven adults aged between 19 and 29 years (mean age 23.7 years). All volunteers were from the School of Biomedical Engineering, Azad University in Tehran, Iran.
– Sub-group #3 (E1): One older adult, aged 42.

These sub-groups were selected and studied for the following major purposes:

– To validate the experimental procedure.
– To use the outcomes as a proxy for potential applications for users with disabilities.
– To validate the robustness of the method:

• C1 was a 10-year-old child who lacked attention due to unfamiliarity with the test. Thus, he could be a good case for studying the effect of age and also the level of attention required to perform the experiment and achieve the desired outcomes.
• A1 was a 29-year-old adult who had some difficulty pulling up his left lip corner. He closed his left eye when asked to pull his left lip corner up. In his case, a left eye wink or left eye closing could be misclassified as pulling up the left lip corner. Thus, winking could be considered interference in our interface.
• A2 was a 22-year-old adult who had some difficulty frowning and generating strong signals because of a Botox™ (Botulinum Toxin) injection in her forehead muscles for cosmetic purposes.
• All other subjects had no significant difficulties in performing the experiment.
• Sub-groups #1 and #3 bound sub-group #2 (considered the main sub-group) with respect to their age ranges, so the effect of age, i.e., whether age is a dependent variable in the study, could also be assessed.

Data recording protocol

In each recording session, the volunteer was asked to perform the following gesture classes according to the protocol below:

– Class #1, the facial muscle gesture class, includes: smiling, frowning, and pulling up the right and left lip corners (all moderately).
– Class #2, the eye movement class, includes: moving the eyes from the center up, down, left and right (not periodically).
– Class #3, the wink class, includes: closing the left eye, the right eye, or both eyes, and frowning powerfully and moderately.


Based on the experiment requirements, the recording session took about 30 min to perform completely for each volunteer. Before each recording session, the volunteer was trained to perform the desired gestures using his or her facial muscles. Then, he or she was asked to rest and try to relax for a period of 5 min. After this period, the quiescent bioelectric signals from all physical data channels were recorded simultaneously for 1 min, while he or she was still resting. These signals were used to determine the onset threshold that distinguishes between the off and active states of the classifier.

In each trial, the volunteer was asked to perform one of the gestures for the recording period; the recording started 1 s prior to the gesture performance and ended 5 s after the beginning of the gesture. After a 10 s resting interval, he or she would repeat the gesture; this gesture-rest task was cycled 10 times. The rest period was chosen empirically to eliminate the fatigue effect during training (Fig. 3).

Fig. 3 The block diagram of the recording session protocol

Data pre-processing

This section describes the specifications of the filter banks, the choice of an appropriate data segment length, feature selection and onset detection.

Filter banks

The acquired data were passed through parallel Butterworth digital filter banks with predefined frequency characteristics to obtain the desired frequency sub-bands (a sketch of this stage follows the list):

– 0.2–3 Hz, assigned to fEOG, to detect the subject's vertical and horizontal eye movements and enhance the cognitive level of interaction.
– 7–12 Hz and 13–22 Hz, assigned to fEEG, to determine the subject's affective states and enhance the cognitive interaction for future applications.
– 30–450 Hz, assigned to fEMG, to determine the subject's gestures and enhance the physical level of interaction.
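A minimal sketch of this filter bank, assuming zero-phase Butterworth band-pass filters of order 4 (the order is not stated in the text), could look as follows; the band edges are the ones listed above.

```python
# Sketch of the parallel Butterworth filter bank described above.
# Band edges come from the text; the filter order (4) is an assumption.
# The alpha and beta bands together make up the fEEG logical channel.
import numpy as np
from scipy import signal

FS = 1000.0  # sampling frequency from the data acquisition setup (Hz)

SUB_BANDS = {
    "fEOG":       (0.2, 3.0),
    "fEEG_alpha": (7.0, 12.0),
    "fEEG_beta":  (13.0, 22.0),
    "fEMG":       (30.0, 450.0),
}

def logical_channels(physical: np.ndarray) -> dict:
    """Decompose one conditioned physical channel into its logical channels."""
    out = {}
    for name, (lo, hi) in SUB_BANDS.items():
        sos = signal.butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
        out[name] = signal.sosfiltfilt(sos, physical)  # zero-phase band-pass
    return out
```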

It should be noted that the desired sub-bands are far apart, and thus no frequency contamination or aliasing between the sub-bands has been seen. In addition, in our proposed bipolar electrode configuration, the conductive volume effect for each pair of electrodes is more localized and specific to the muscle fibers located underneath that electrode pair; thus, the levels of bioelectric signals (cross-talk) sensed by the electrode pairs other than the one assigned to capture a specified gesture were reduced [4–6], and no significant cross-talk has been observed in our study. It should also be noted that other biopotential activities (ECG or chest movement, for example) were not observed in the facial gesture signals, because of the long distance between the electrodes' field of view and the other biopotential dipoles.

Data segmentation

A segment is a time slot of acquired bioelectric-signal data considered for feature extraction. It should be noted that EMG comprises two states: a transient state and a steady state. Englehart et al. [20] have done an extensive study on hand gesture classification and shown that steady-state data are classified more accurately than transient data, and that classification suffers less degradation with shorter segment lengths. The classification rate degrades more quickly as the segment length of transient data is decreased than it does with steady-state data. Therefore, steady-state data with a shorter segment length, such as 128 ms, are more reliable if a faster system response is required. After determining the segment length and the state of the data, a third important point in data segmentation is the data windowing technique. There are two major techniques in data windowing: adjacent windowing and overlapped windowing. Farina and Merletti showed that overlapped segments increase processing time without providing a significant improvement in the accuracy of spectral features, such as autoregressive coefficients. They also showed that a segment length <125 ms leads to high variance and bias in frequency domain features [20]. Given our future real-time approach, the adjacent segment length plus the processing time of generating classified control commands should be ≤300 ms. Furthermore, a segment length should be adequately large, since the bias and variance of features increase as segment length decreases, which degrades classification performance. Therefore, a trade-off between response time and accuracy exists. However, Englehart and Hudgins highlighted that by adopting continuous segmentation on a steady-state signal, the segment length can be reduced to 128 ms, or even 32 ms, without a considerable decrease in accuracy. Because of real-time computing and high-speed microprocessors, the processing time is often <50 ms; the segment length can therefore vary between 32 and 250 ms [20].

Interest in using facial bioelectric signals as interface channels has been increasing but, to our knowledge, there has been no comprehensive study on choosing the optimal segment length when using them as interface channels. We have performed an extensive study on facial bioelectric signals and evaluated the effect of segment length on classifier performance based on our approaches. According to our study, a 256 ms non-overlapped segment length was chosen for our experiments (see the "Experimental results and analysis" section).

It should be noted that, in general, facial muscle bioelectric signals are non-stationary. However, with reference to Oskoei and Hu [20], low-level (20–30% MVC) and short-time contractions (20–40 s) can be assumed to be wide-sense stationary. Moreover, at higher levels (50–80% MVC) they can only be assumed to be locally stationary for a period of 500–1,500 ms. Therefore, the time slot of the mentioned signals can be assumed to be stationary in real-time applications, even if they have varying spectral characteristics [20]. It was also stated that a segment length of 250–500 ms is more suitable for achieving less variance and bias in the estimation, compared to other segment lengths [20].

Feature selection and onset detection

Feeding a bioelectric signal presented as a time sequence directly to a classifier is impractical, due to the large number of inputs and the randomness of the signal. Therefore, the sequence must be mapped into a smaller-dimension vector, called a feature vector. Features represent the raw bioelectric signals for classification, so the success of any pattern recognition problem depends almost entirely on the selection and extraction of features. A wide spectrum of features has been introduced in the literature for bioelectric classification, falling into one of three categories: time domain, frequency (spectral) domain, and time-scale (time–frequency) domain [20].

Mean absolute value (MAV) and RMS are two well-known time domain features. Theoretically, when a signal is modeled as a Gaussian random process, RMS provides the maximum likelihood estimate of its amplitude. In this model $\mathrm{SNR} \approx \sqrt{2N}$, where N is the number of statistical degrees of freedom. MAV provides a maximum likelihood estimate of the amplitude when a signal is modeled as a Laplacian random process. In this case $\mathrm{SNR} = \sqrt{N}$, which is 32% lower than the Gaussian-based model [20]. Thus, with this general prescription in mind, the RMS features of each bioelectric-signal channel were calculated within a non-overlapped window of 256 ms:

$$R_i = \mathrm{RMS}(\mathrm{EXG}_i) = \sqrt{\frac{\int_0^T \mathrm{EXG}_i^2 \, dt}{T}} \qquad (1)$$

where i = 1, …, K; K is the number of channels; T = 256 ms; and the X in EXG can be replaced with M, O, or E.

The onset of each action was automatically determined as the point when the bioelectric-signal RMS feature exceeded the mean RMS of the quiescent signal plus three standard deviations [21]:

$$T_i = R_i, \quad \text{if } R_i \ge \mathrm{mean}\big(\mathrm{RMS}(\mathrm{EXG}_{\mathrm{Quiescent}})\big) + 3\,\mathrm{std}\big(\mathrm{RMS}(\mathrm{EXG}_{\mathrm{Quiescent}})\big) \qquad (2)$$

Then, the above-threshold RMS features were normalized to make the sum of the RMS over the K channels equal to 1 [22]:

$$S_i = \frac{T_i - \mathrm{mean}\big(\mathrm{RMS}(\mathrm{EXG}_{\mathrm{Quiescent}})\big)}{\sum_{i=1}^{K} \Big( T_i - \mathrm{mean}\big(\mathrm{RMS}(\mathrm{EXG}_{\mathrm{Quiescent}})\big) \Big)} \qquad (3)$$

Furthermore, to obtain a more separable feature space, all extracted features can be transformed to a simpler non-linear feature space using a log transform, which spreads the concentrated data points while condensing the highly scattered points [23], as shown in Fig. 4a and b:

$$F_i = \log(S_i) \qquad (4)$$

Fig. 4 a Raw data and frequency spectrum of the three channels while pulling up the left lip corner. b Feature space for the 8 different facial gestures and the relaxation state (eyes open)
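Putting Eqs. 1–4 together, the following Python sketch is our illustration of the feature pipeline; the array names and the handling of below-threshold windows are assumptions, not the authors' code.

```python
# Sketch of the feature pipeline of Eqs. 1-4: windowed RMS, onset
# thresholding against the quiescent recording, normalization over the
# K channels, and the log transform. Shapes/names are our assumptions.
import numpy as np

FS = 1000                # sampling frequency (Hz)
WIN = int(0.256 * FS)    # 256 ms non-overlapped window

def windowed_rms(x: np.ndarray) -> np.ndarray:
    """Eq. 1: RMS of one logical channel over non-overlapping 256 ms windows."""
    n = (x.size // WIN) * WIN
    return np.sqrt(np.mean(x[:n].reshape(-1, WIN) ** 2, axis=1))

def features(channels, quiescent):
    """channels, quiescent: lists of 1-D arrays, one per logical channel (K total)."""
    R = np.array([windowed_rms(c) for c in channels])       # (K, n_windows)
    q = np.array([windowed_rms(c) for c in quiescent])      # quiescent RMS
    mu = q.mean(axis=1, keepdims=True)
    sd = q.std(axis=1, keepdims=True)
    onset = R >= mu + 3 * sd                 # Eq. 2: onset detection
    T = np.where(onset, R, mu)               # below-threshold windows held at rest level (assumption)
    S = (T - mu) / np.maximum((T - mu).sum(axis=0), 1e-12)  # Eq. 3: sum over K channels = 1
    F = np.log(np.maximum(S, 1e-12))          # Eq. 4: log transform (clipped to avoid log(0))
    return F, onset
```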

Classification

The extracted features need to be classified into distinct classes for recognition of the desired gesture. In addition to the inherent variation of bioelectric signals over time, there are external factors, such as changes in electrode position, fatigue, and sweat, which may cause changes in a signal pattern over time. A classifier should be able to cope optimally with such varying patterns, as well as prevent over-fitting. Classification should be adequately fast to meet real-time processing constraints. A suitable classifier has to be efficient in classifying novel patterns; online training can maintain the stability of classification performance over long-term operation [20].

In our study, an input–output matrix is created whose inputs are the corresponding RMS features for the specified logical data channels and whose output is the proper label for the specified gesture. Feature fusion in our approach means the concatenation of the RMS features of logical data channels with each other. Suppose that the input–output matrix consists of fEXG feature data from the ith logical channel, where X can be replaced with M (Myo), O (Oculo) or E (Encephalo), and its corresponding output label, as follows:

Input–Output = [ fEXG data_{Ch_i} | Output Label ]    (5)

Now, if we want to fuse (concatenate) fEYG feature data from the jth logical channel (Y can also be M, O or E) for the same output-label gesture into the input–output matrix, then we have:

Input–Output = [ fEXG data_{Ch_i} | fEYG data_{Ch_j} | Output Label ]    (6)

where X and Y cannot be alike for the same logical channel, to eliminate redundancy among the input–output matrix columns.
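As an illustration of Eqs. 5 and 6 (ours, with hypothetical array names), the fusion is a plain column-wise concatenation with the label appended as the last column:

```python
# Sketch of Eqs. 5-6: concatenating per-logical-channel feature columns
# and appending the gesture label column. Array names are hypothetical.
import numpy as np

def fuse_features(feature_blocks, labels):
    """feature_blocks: list of (n_windows, n_channels) arrays, one per
    signal type (e.g., fEMG, fEOG); labels: (n_windows,) gesture labels."""
    X = np.hstack(feature_blocks)           # fusion = column-wise concatenation
    return np.column_stack([X, labels])     # last column = output label

# e.g., fEMG features of 3 channels fused with fEOG features of 3 channels:
femg = np.random.rand(100, 3)               # placeholder feature matrices
feog = np.random.rand(100, 3)
y = np.random.randint(1, 9, size=100)        # labels for the 8 gestures
io_matrix = fuse_features([femg, feog], y)
print(io_matrix.shape)                       # (100, 7)
```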

We performed a comparative study to choose an appropriate classifier for our experimental data, based on the achieved results (see the "Experimental results and analysis" section). The input–output subtractive fuzzy clustering method was chosen and utilized to obtain a set of rules, and an ANFIS was used to adjust its parameters.

Subtractive clustering

The idea of fuzzy clustering is to divide the data space into fuzzy clusters, each representing one specific part of the system behavior. Fuzzy c-means is a supervised fuzzy clustering method in the sense that it must be told how many clusters, c, to look for. If the number of centers is not known beforehand, it is necessary to apply an unsupervised algorithm. SFCM is based on a measure of the density of data points in the feature space. The idea is to find regions in the feature space with a high density of data points. The point with the highest number of neighbors is selected as the center of a cluster. The data points within a pre-specified fuzzy radius are then removed (subtracted), and the algorithm looks for a new point with the highest number of neighbors. Subtractive clustering uses the data points themselves as candidates for cluster centers, instead of grid points as in mountain clustering. This means that the computation is proportional to the problem size instead of the problem dimension. The actual cluster centers are not necessarily located at one of the data points, but in most cases this is a good approximation. After projecting the clusters onto the input–output space, the antecedent parts of the fuzzy rules can be found. The consequent parts of the rules can be represented by simple mathematical functions. Using this method, one cluster corresponds to one rule of the TS model [24, 25].
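A sketch of this center-selection step is given below. It follows Chiu's classic subtractive clustering formulation, which we assume underlies the SFCM step here; the radius and stopping ratio are illustrative values, not the paper's.

```python
# Sketch of subtractive clustering as described above: compute density
# potentials, pick the densest point as a center, subtract its
# influence, repeat. r_a and the stopping ratio eps are assumptions.
import numpy as np

def subtractive_clustering(X: np.ndarray, r_a: float = 0.5, eps: float = 0.15):
    """X: (n_samples, n_features), features scaled to comparable ranges.
    Returns the selected cluster centers (candidate TS rule centers)."""
    alpha = 4.0 / r_a ** 2
    beta = 4.0 / (1.5 * r_a) ** 2            # squash radius r_b = 1.5 * r_a
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    P = np.exp(-alpha * d2).sum(axis=1)      # density potential of each point
    first = P.max()
    centers = []
    while P.max() > eps * first:             # stop when remaining density is low
        c = int(P.argmax())
        centers.append(X[c])
        P = P - P[c] * np.exp(-beta * d2[:, c])  # subtract the chosen cluster's influence
    return np.array(centers)
```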

After the recording session, the extracted features from the trials with odd index numbers were added to the training set, and those with even index numbers to the testing set. The input–output matrix was then built as described above. To perform cross-validation, a leave-one-training-trial-out, k × k-fold algorithm was applied to the training set, with k equal to 10. The k−1 folds from the training set were used to train the classifier and were applied to the SFCM to derive the initial fuzzy inference system, and the remaining fold (from a different trial) was used to validate it. To obtain a robust classifier, one should look for the most likely SFCM radius during the training process. Thus, to validate our classifier, the above procedure was applied 10 times, and the characteristics of the SFCM with the higher classification ratio and the most likely radius were chosen as the parameters for designing the initial fuzzy inference system. This procedure is shown in Fig. 5. The achieved initial system was then passed through the ANFIS to tune up its parameters. It should be noted that the testing set is completely separate from the training set, and all reported results were obtained by applying our method to the testing set.

Fig. 5 SFCM classifier training and validating protocols

Adaptive neuro-fuzzy inference system

The ANFIS is a fuzzy TS model put in the framework of adaptive systems to facilitate learning and adaptation (Fig. 6). A hybrid algorithm combining the least squares and gradient descent methods is applied to tune the premise and consequence parameters of the TS model. The least squares method (forward pass) is used to optimize the consequent parameters with the premise parameters fixed. Once the optimal consequent parameters are found, the backward pass starts immediately. The gradient descent method (backward pass) is used to optimally adjust the premise parameters corresponding to the fuzzy sets in the input domain. The output of the ANFIS is calculated by employing the consequent parameters found in the forward pass. It has been proven that this hybrid algorithm is highly efficient in training ANFIS. In addition, ANFIS has the advantage of being significantly faster and more accurate than many pure neural-network-based methods, and it can avoid the pitfall of over-fitting the training data, thereby achieving excellent generalization ability [26]. ANFIS has shown its capabilities as a powerful classifier in the gesture classification area. For example, Khezri and Jahed [27] achieved 96% accuracy in discriminating between six different hand gestures using surface EMG, by employing an ANFIS classifier over time domain and time–frequency domain features.

In our study, after deriving an initial fuzzy inference system based on subtractive fuzzy clustering, the premise and consequence parameters were optimized as described above.

Fig. 6 Simplified ANFIS structure for two inputs and two rules
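To make the forward pass concrete, the following minimal Python sketch (our illustration, not the authors' implementation) builds a first-order TS model with Gaussian membership functions and fits the consequent parameters by least squares while the premise parameters stay fixed; the rule centers and widths would come from the SFCM step.

```python
# Sketch of the ANFIS forward pass described above: fixed Gaussian
# premises -> normalized firing strengths -> least-squares fit of the
# first-order TS consequents. All names are illustrative.
import numpy as np

def firing_strengths(X, centers, sigmas):
    """X: (n, d); centers, sigmas: (R, d). Returns normalized strengths (n, R)."""
    # product of Gaussian memberships across the d inputs, per rule
    w = np.exp(-((X[:, None, :] - centers[None]) ** 2 /
                 (2 * sigmas[None] ** 2)).sum(-1))
    return w / w.sum(axis=1, keepdims=True)

def fit_consequents(X, y, centers, sigmas):
    """Forward pass: solve for TS consequent parameters [p_r | q_r] by least squares."""
    n, d = X.shape
    wbar = firing_strengths(X, centers, sigmas)              # (n, R)
    Xa = np.hstack([X, np.ones((n, 1))])                     # affine inputs (n, d+1)
    A = (wbar[:, :, None] * Xa[:, None, :]).reshape(n, -1)   # design matrix
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return theta.reshape(-1, d + 1)                          # one parameter row per rule

def predict(X, centers, sigmas, theta):
    """Model output: sum over rules of (normalized strength) * (linear consequent)."""
    wbar = firing_strengths(X, centers, sigmas)
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    return ((wbar[:, :, None] * Xa[:, None, :]) * theta[None]).sum(axis=(1, 2))
```

In the full hybrid algorithm, a gradient-descent backward pass would then adjust centers and sigmas; the sketch shows only the least-squares forward pass described in the text.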

Majority voting

We applied majority voting (MV) as a post-processing method to manage the excessive classifier output resulting from continuous segmentation. This can improve system performance and produce smooth, reliable decisions from a dense stream of class decisions. MV includes the last and next m decisions around a given point to generate a new decision. The final decision at each point is merged based on the greatest number of occurrences among the 2m + 1 decision points. The number of decisions used in MV is determined by the processing time and the acceptable delay. As mentioned, the accuracy of bioelectric control degrades rapidly with decreasing segment length. Englehart and Hudgins point out that this degradation can be prevented if majority voting is used for post-processing after classification [20]. In our method, after deriving the ANFIS classifier, the MV algorithm was applied and m was varied from 1 to 3 to achieve the best results. It should be noted that MV works like a hard averaging algorithm and smooths the classifier decisions as m is increased. However, m cannot be increased too much, due to the limitation on delay time and the real-time approach of our proposed method (see Fig. 7a, b).
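A minimal sketch of this post-processing step (ours) replaces each decision with the most frequent label among the surrounding 2m + 1 decisions:

```python
# Sketch of majority voting over a stream of class decisions: each point
# is replaced by the most frequent label among the previous m, current,
# and next m decisions (2m + 1 in total; truncated at the stream edges).
from collections import Counter

def majority_vote(decisions, m=2):
    """decisions: list of class labels; returns the smoothed stream."""
    out = []
    for i in range(len(decisions)):
        lo, hi = max(0, i - m), min(len(decisions), i + m + 1)
        out.append(Counter(decisions[lo:hi]).most_common(1)[0][0])
    return out

# Example: a spurious single-window error is voted away.
print(majority_vote([1, 1, 4, 1, 1, 2, 2, 2], m=1))  # -> [1, 1, 1, 1, 1, 2, 2, 2]
```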

Experimental results and analysis

We performed a series of experiments to evaluate the proposed method. In each experiment, the participants' bioelectric signals were captured according to the recording protocol. The accuracy A_i for the ith gesture is given by:

$$A_i = \frac{G_i}{N_i} \times 100\%, \quad i = 1, 2, 3, \ldots, C \qquad (7)$$

where G_i is the number of times that gesture i was correctly recognized by the classifier, C is the total number of different gestures, and N_i is the number of times that the ith gesture was requested from the classifier.
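For completeness, Eq. 7 and the confusion matrices reported below can be computed from true/predicted label arrays as in this short sketch (ours; labels are assumed to be 0-indexed integers):

```python
# Sketch of Eq. 7: per-gesture accuracy A_i = G_i / N_i * 100%, computed
# alongside a simple confusion matrix from true/predicted label arrays.
import numpy as np

def per_gesture_accuracy(y_true, y_pred, n_classes):
    conf = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        conf[t, p] += 1                      # rows: true class, cols: predicted
    N = conf.sum(axis=1)                     # times each gesture was requested
    G = np.diag(conf)                        # times it was correctly recognized
    return 100.0 * G / np.maximum(N, 1), conf
```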

Experiment #1: optimal segment length

All gestures within Classes #1 and #2 were requested from all sub-groups of participants (C2, A1 and A2 were excluded because of their problems in performing the experimental protocol correctly).

As discussed in the "Data segmentation" section, choosing the optimal segment length is an important factor that influences the performance of the classifier. Thus, to study this effect, the segment length was varied from 64 to 1,024 ms and, for each segment, the RMS value was calculated. The RMS features were then fused together and used to form the input–output matrix:

Input–Output = [ fEMG_{Ch1} | fEMG_{Ch2} | fEMG_{Ch3} | Output Label ]    (8)

Our classification method was then applied to the above input–output matrix to classify the 8 different facial gestures. The achieved results support Englehart's results [20] on selecting the optimal segment length for upper limb EMG. Thus, based on Fig. 8 and the above discussion, a segment length of 256 ms was chosen. In addition, one can conclude that facial and upper limb muscles follow the same fundamental feature selection rule, despite some differences in their structures, such as facial muscles carrying few muscle spindles compared to upper limb muscles. It is interesting to note that the number of spindles in the involved muscles for most of the performed gestures in our study is higher compared to other possible facial gestures [28].

Fig. 8 Experiment #1 result: the effect of segment length on classifier performance over gesture Classes #1 and #2 (error % refers to classifier discrimination ratio error) using fEMG features from all three channels

Experiment #2: classification of gesture Class #1

All gestures within Class #1 were requested from all sub-groups of participants (C2, A1 and A2 were excluded because of their problems performing the experimental protocol correctly), and the fEMG features were fed to the classifier as training and test sets. As Table 2 shows, we obtained an accuracy of 42.54–100% for Class #1 using different fusions of the forehead bioelectric signal features. Clearly, the average discrimination ratio of ANFIS is better than that of SFCM. ANFIS provides more robust results and fewer output fluctuations. Furthermore, its outputs are mostly within the defined domain (Fig. 7a, b). Training the SFCM classifier needs less time than ANFIS, so there is a trade-off between higher accuracy and robustness on the one hand and training time on the other. However, as training happens only once, it is worth spending more training time to implement the ANFIS classifier and achieve better results. It is also clear that, to classify Class #1, separating fEEG and fEOG from fEMG generates more accurate results, and the classifier needs less training time than with other feature combinations. The number-of-rules column in Table 2 underscores these results. As shown, the number of rules in the classifier fed only with fEMG features is lower than in the other classifiers; this is due to the simplicity and separability of the feature space when using fEMG features. It is also because fEEG and fEOG features are not separable for the Class #1 gestures, so they generate extra complexity that does not help classifier separability.

Fig. 7 a SFCM classifier performance and robustness (above) for gesture #4 (pulling up the left lip corner); the effect of the MV algorithm is also shown (below). b ANFIS classifier performance and robustness (above) for gesture #4 (pulling up the left lip corner); the effect of the MV algorithm is also shown (below)

Experiment #3: using fEOG features to enhance the classifier

All gestures within Classes #1 and #2 were requested from all sub-groups of participants (C2, A1 and A2 were excluded, as explained before). The fEOG can represent user attention to the performed task and can also be used to detect eye movement gestures by enhancing the feature space, as described in the previous sections. With this three-channel configuration, one can discriminate between different eye movement directions by fusing fEOG features with fEMG, because the fEOG features are the main features in the eye movement gesture class. Here, the fused features of fEMG and fEOG were fed to the classifier as training and test sets. As shown in Table 3, the average discrimination ratio for Class #1 and Class #2 found using the feature fusion of fEMG and fEOG is 96.99%, compared to 93.04% when using only fEMG features. Clearly, the trained classifier achieved by fusing fEOG with fEMG features has more discrimination capability in the eye movement class; however, there is a decrease in the smile-class discrimination ratio from 98.2% to 97.94%, due to the increased complexity of the feature space. Also, compared to the previous Ref. [14], our results show that our electrode placement and classification technique provide a better discrimination ratio for similar gestures, and our proposed method can distinguish 4 more gestures, namely the gestures in Class #1. It should be stated again that the number of electrodes and their configuration are important factors in classification results. Chin et al. [17] used a NeuroScan NuAmps 40-channel Quik-Cap to discriminate 6 facial gestures: smile, straight, wince, agape, stern and frown. They concluded that using a 34-electrode configuration instead of a 6-electrode configuration could increase the average classification accuracy from 78.5 to 86%. However, their 6-electrode configuration was placed according to the 10–20 frontal electrode standard. The results of our proposed method show higher classification accuracy for the same gestures, and our method also detects 4 more gestures (the eye movements) compared to the work of Chin et al. [17], despite using fewer or the same number of electrodes. In addition, an evaluation of our method's performance is provided by the confusion matrix in Table 4. Clearly, there is no misclassification between Class #1 and Class #2 gestures. Thus, one can conclude that gesture interactions only occurred within the same class of gestures, and also that Class #1 and Class #2 are somewhat independent. In addition, the most significant interactions occurred between pulling up the right/left lip corners and smiling, as their signal sources are the same.

Experiment #4: investigating logical channel capabilities

Table 5 indicates the discriminating capability of different combinations of the logical channels. Here, for each column, the classifier was trained using the fEMG and fEOG feature fusion of the logical channels related to that column, and was then tested. Channel 1 alone can be considered a good data source for smiling, frowning, and pulling up the right lip corner: when it was employed as the only data channel, the accuracies of these gestures were higher than those of the other gestures, which also means that Channel 1 is not a good data source for obtaining rich information content for discriminating the eye movements and the pulling-up-left-lip-corner gesture. Similar results were obtained when only Channel 2 or Channel 3 was employed as the data source. However, according to Table 5, the most accurate results are achieved by using all explored logical channels at once, fusing the data from all logical channels together. In this case, the information content for training the classifier increased, which led to higher discrimination ratios. These results support our hypothesis for the electrode placement, because the highest accuracy for each channel was achieved when the specified gestures were captured by that individual channel. In addition, in the proposed bipolar electrode configuration, the conductive volume effect for each pair of electrodes is more localized and specific to the muscle fibers located underneath the electrode pair. This combination could also prevent overloading the other individual channels.

Experiment #5: wink discrimination

Wink class gestures can be considered to be gestures that

can conflict with the smile class; for example, in Case A1,

the subject had some difficulties in pulling up his left lip

corner, as he closed his left eyes when he was asked to pull

his left lip corner up. In this case, left eye winking or left

eye closing could be misclassified as the pulling up left lip

corner gesture. Thus, the intended gestures should be

Table 2 Discrimination ratios (sub-group #1, #2 and #3; cases C2, A1 and A2 were excluded) for gesture Class #1 with different logical

bioelectric-signal configurations using an fEMG (all three channels), fEOG (all three channels) and fEEG (all three channels)

Test

index

Considered logical

channel(s) (fused data)

Number of

classifier’s

rules

Best

SFCM

radius

average

Subtractive FCM

(SFCM) discrimination

ratio (average)

ANFIS ? SFCM

discrimination

ratio

Mean of

training time

for SFCM (s)

Mean of

training time

for ANFIS (s)

1 fEMG 4 0.475 100 ± 0.0 100 ± 0.0 0.0187 0.1543

2 fEOG 232 0.02 39.05 ± 3.4 42.54 ± 1.7 0.1309 32.79

3 fEEG 13 .2575 57.2 ± 4.11 56.61 ± 3.2 0.0316 1.1821

4 fEEG ? fEOG 144 .0525 55.40 ± 3.27 60.15 ± 2.1 1.0740 111.41

5 fEMG ? fEOG 8 0.5025 99.11 ± 0.18 99.47 ± 0.08 0.6513 0.0281

6 fEMG ? fEEG 5 0.72 100 ± 0.0 100 ± 0.0 0.04181 0.0264

7 fEMG ? fEEG ? fEOG 4 0.8925 99.09 ± 0.31 98.91 ± 0.12 0.0470 0.9908

Average 59 41.64 78.55 ± 1.61 79.66 ± 1.02 0.2850 20.9402

‘‘?’’ in this table means fusion

Table 3 Discrimination ratios over sub-group #1, #2 and #3 (cases C2, A1 and A2 were excluded), for Class #1 and Class #2 using fEMG (all

three channels) and fEOG (all three channels)

Gesture No. Considered logical

channel

Gesture ANFIS ? SFCM

discrimination ratio

Mean of discrimination

ratio

1 fEMG Smile 98.30 ± 0.02 98.2 ± 0.04

2 Frown 100 ± 0.0

3 Pulling up right lip 97.40 ± 0.03

4 Pulling up left lip 97.10 ± 0.11

5 Moving eye up 92.73 ± 2.16 87.88 ± 2.2

6 Moving eye down 94.66 ±1.07

7 Moving eye right 82.24 ± 3.26

8 Moving eye left 81.91 ± 2.31

Average 93.04 ± 1.12

1 fEMG ? fEOG Smile 96.72 ± 0.06 97.94 ± 0.10

2 Frown 100 ± 0.0

3 Pulling up right lip 97.44 ± 0.14

4 Pulling up left lip 97.61 ± 0.21

5 Moving eye up 95.23 ± 0.53 96.04 ± 1.01

6 Moving eye down 94.48 ± 0.42

7 Moving eye right 96.97 ± 1.10

8 Moving eye left 97.50 ± 2.02

Average 96.99 ± 0.55

‘‘?’’ in this table means fusion

Australas Phys Eng Sci Med

123

personal copy

Page 14: SCIENTIFIC PAPER - dces.essex.ac.uk · clustering Adaptive neuro-fuzzy inference system (ANFIS) Introduction The face is a very important body part that provides an interface for

Thus, the intended gestures should be extracted and discriminated from these conflicting gestures. All gestures within Classes #1, #2 and #3 were performed by A1, and the features were fused as indicated in Table 6. As Table 6 shows, the worst results were obtained when the classifier had to discriminate between the smile and wink classes. This is due to the high similarity between the features of these two gesture classes (e.g. pulling up the left lip corner and left eye winking), because the dominant sources of these gestures are the same facial muscles. It should be noted that in similar situations the proposed method is not very helpful if we want to discriminate additional gestures (such as eye closing) as HMI commands generated from the same sources. Hence, our predefined gestures can be considered fundamental gestures for HMI commands when the facial muscles are employed as the interface.
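
To make the feature-level fusion used in these experiments concrete, the following minimal Python sketch concatenates per-window feature matrices from two logical channels into a single fused feature space. The array names and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_features(feature_sets):
    """Fuse per-channel feature matrices by concatenating their columns.

    feature_sets: list of (n_windows, n_features) arrays, e.g. RMS
    features computed per analysis window for each logical channel.
    The fused matrix stacks all logical channels side by side, so the
    classifier sees one combined feature space.
    """
    return np.concatenate(feature_sets, axis=1)

# Illustrative shapes only: 100 windows, 3 physical channels per modality.
femg = np.random.rand(100, 3)  # stand-in for fEMG RMS features
feog = np.random.rand(100, 3)  # stand-in for fEOG RMS features
fused = fuse_features([femg, feog])
print(fused.shape)  # -> (100, 6)
```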

Experiment #6: studies on the effects of difficulties that occurred while performing the tasks

The capabilities of our proposed method were tested on cases A1, A2 and C1, as they had some difficulties in performing the gestures correctly. Here, all gestures within Class #1 were performed by these subjects, and their fEMG features were used to train and test the classifier.

Table 4 Confusion matrix (%) averaged over sub-groups #1, #2 and #3 (cases C2, A1 and A2 excluded) for Class #1 and Class #2, using fEMG (all three channels) and fEOG (all three channels); columns give the true class, rows the predicted class (Pull. R/L lip = pulling up right/left lip corner)

Predicted class | Smile | Frown | Pull. R lip | Pull. L lip | Eye up | Eye down | Eye right | Eye left
Smile | 96.72 | 0 | 2.56 | 2.39 | 0 | 0 | 0 | 0
Frown | 0 | 100 | 0 | 0 | 0 | 0 | 0 | 0
Pull. R lip | 2.12 | 0 | 97.44 | 0 | 0 | 0 | 0 | 0
Pull. L lip | 1.16 | 0 | 0 | 97.61 | 0 | 0 | 0 | 0
Eye up | 0 | 0 | 0 | 0 | 95.23 | 3.66 | 0.38 | 1.06
Eye down | 0 | 0 | 0 | 0 | 3.11 | 94.48 | 0.33 | 0.31
Eye right | 0 | 0 | 0 | 0 | 1.12 | 1.59 | 96.97 | 1.13
Eye left | 0 | 0 | 0 | 0 | 0.54 | 0.27 | 2.32 | 97.5

Table 5 Capability of logical channel combinations (sub-groups #1, #2 and #3; cases C2, A1 and A2 excluded) obtained by fusing fEMG and fEOG (Ch = channel)

Gesture no. and name | Ch1 | Ch2 | Ch3 | Ch1 + Ch2 | Ch1 + Ch3 | Ch2 + Ch3 | Ch1 + Ch2 + Ch3
#1. Smile | 80.00 ± 0.7 | 0 ± 0.0 | 40.00 ± 3.6 | 78.00 ± 1.1 | 96.00 ± 0.09 | 92.00 ± 0.07 | 96.72 ± 0.06
#2. Frown | 72.00 ± 0.8 | 100 ± 0.0 | 0 ± 0.0 | 100 ± 0.0 | 70.00 ± 1.56 | 100 ± 0.0 | 100 ± 0.0
#3. Pulling up right lip corner | 80.00 ± 1.1 | 0 ± 0.0 | 0 ± 0.0 | 92.00 ± 0.4 | 100 ± 0.0 | 18.00 ± 4.1 | 97.44 ± 0.14
#4. Pulling up left lip corner | 18.00 ± 4.8 | 28.00 ± 2.9 | 82.00 ± 0.8 | 24.00 ± 4.9 | 96.00 ± 0.84 | 88.00 ± 1.2 | 97.61 ± 0.21
#5. Eye moving up | 60.00 ± 3.1 | 38.00 ± 3.3 | 32.00 ± 2.4 | 82.00 ± 1.05 | 52.00 ± 3.7 | 14.00 ± 3.1 | 95.23 ± 1.01
#6. Eye moving down | 58.00 ± 4.5 | 6.00 ± 3.7 | 38.00 ± 2.7 | 98.00 ± 0.06 | 68.00 ± 3.2 | 50.00 ± 2.4 | 94.48 ± 0.42
#7. Eye moving right | 48.78 ± 3.8 | 0 ± 0.0 | 78.05 ± 0.9 | 85.00 ± 0.74 | 90.24 ± 1.2 | 95.00 ± 0.65 | 96.97 ± 1.10
#8. Eye moving left | 48.00 ± 0.78 | 28.00 ± 3.9 | 64.00 ± 3.6 | 64.00 ± 1.8 | 74.00 ± 2.1 | 48.00 ± 3.1 | 97.50 ± 2.02
Average | 58.09 ± 2.44 | 25.00 ± 1.72 | 41.75 ± 1.75 | 77.87 ± 1.25 | 80.78 ± 1.58 | 63.12 ± 1.82 | 96.99 ± 0.55


Table 7 shows some interesting results. In Case A1, as long as there is no need to add more commands to the HMI (such as eye closing gestures), the classifier can easily detect smile-class gestures despite the subject's difficulties in generating the correct patterns.

Case A2 had some difficulty with frowning because of a Botox™ injection. As the Frontalis muscle was weakened by the injection, it could hardly produce moderate frowning patterns, which is why the classifier is less accurate at recognizing frowning activity but achieves 100% accuracy for the remaining activities. In Case C1, the results show that the discrimination ratio decreased because of the subject's lack of attention, due to his age and playful behavior during training and performance of the requested tasks. Thus, maintaining moderate attention to the task is a prerequisite for using our HMI.

Experiment #7: comparing SFCM and ANFIS output smoothness

Figure 7a and b shows the effects of the ANFIS classifier and the Majority Voting (MV) algorithm on the method's performance. After training the classifier, the sequence of gesture #3 (pulling up left lip corner) was fed to it. It is clear that using ANFIS (after obtaining the SFCM rules) makes the classifier more robust and prevents out-of-range classifier outputs. In addition, using MV restrained most of the output fluctuations, as shown in Fig. 7 (m set to 2).
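
As a rough illustration of the MV post-processing, the sketch below applies a sliding majority vote over the classifier's decision stream. The exact windowing convention for the parameter m in the paper may differ, so treat this as an assumption.

```python
from collections import Counter

def majority_vote(decisions, m=2):
    """Smooth a stream of class decisions with majority voting.

    For each time step, the output is the most frequent label among
    the current decision and the m previous ones, which suppresses
    isolated misclassifications without adding much delay.
    """
    smoothed = []
    for i in range(len(decisions)):
        window = decisions[max(0, i - m):i + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

# Example: a spurious label 5 inside a run of gesture #3 is voted out.
raw = [3, 3, 5, 3, 3, 3]
print(majority_vote(raw, m=2))  # -> [3, 3, 3, 3, 3, 3]
```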

Experiment #8: comparing the proposed classifier accuracy with other common classifiers

Four other common classifiers were tested on particular volunteers (C1, A1 and A2) to assess their performance in the most complex cases in our experiments. In Table 8, the average classifier accuracies are compared for Class #1 gestures. It is evident that SFCM + ANFIS gives the most accurate results and is a good choice in our approach.
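
For readers who want to reproduce this kind of comparison, the snippet below shows one plausible way to score an SVM baseline with cross-validation using scikit-learn. The synthetic data and parameter choices are placeholders, not the settings used in this study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data: in practice X would hold the fused RMS features
# and y the gesture labels recorded from each volunteer.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = rng.integers(0, 4, size=200)  # e.g. four smile-class gestures

# 5-fold cross-validated accuracy for an SVM baseline; the SFCM and
# SFCM + ANFIS classifiers would be scored the same way for Table 8.
scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=5)
print(f"SVM accuracy: {scores.mean():.2%} +/- {scores.std():.2%}")
```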

Experiment #9: comparing our proposed electrode site selection and configuration with other related studies

To explore the novelty of our electrode placement and configuration, the electrode placement suggested by Barreto et al. [15] and their feature extraction method were implemented for comparison with our method. Their placement was based on a mono-polar configuration over the Corrugator Supercilii and Zygomaticus Major facial muscles. They indicated that, to cancel the effect of cross-talk between facial EMG signals due to different muscle activities, one should use the spectral power in the 300-500 Hz range of the power spectral density (PSD) of the acquired signal as a feature, because this range can discriminate between the signal associated with voluntary facial movements under each electrode and the cross-talk signals.
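
A minimal sketch of this band-power feature is given below, assuming Welch PSD estimation and a 1 kHz sampling rate; both are our assumptions, as Ref. [15] does not prescribe them here.

```python
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, f_lo=300.0, f_hi=500.0):
    """Spectral power of `signal` in the [f_lo, f_hi] Hz band.

    Estimates the power spectral density with Welch's method and
    integrates it over the band, mirroring the 300-500 Hz fEMG
    feature described by Barreto et al. [15].
    """
    freqs, psd = welch(signal, fs=fs, nperseg=min(len(signal), 512))
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.trapz(psd[mask], freqs[mask])

# Illustrative call: 1 s of noise sampled at 1 kHz stands in for fEMG.
fs = 1000  # assumed sampling rate; must exceed 2 * 500 Hz
x = np.random.randn(fs)
print(band_power(x, fs))
```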

Table 6 Discriminating the wink class (Class #3) from other gesture classes for Case A1

Considered gesture classes | Feature fusion | Best radius (average) | ANFIS + SFCM discrimination ratio (average)
Smile + wink | fEMG | 0.05 | 48.06
Eye movement + wink | fEMG + fEOG | 0.0475 | 90.94
Wink | fEMG | 0.1425 | 99.68

Table 7 Average discrimination ratio (%) in the smile class for cases C1, A1 and A2, who had some difficulties in training and performing gestures

Gesture | Sub-group #1 (Case C1) | Sub-group #2 (Case A1) | Sub-group #2 (Case A2)
Smile | 92.10 | 100 | 100
Pulling up right lip corner | 94.54 | 100 | 100
Pulling up left lip corner | 100 | 100 | 100
Frown | 93.06 | 100 | 94.6
Average | 94.42 | 100 | 98.65

Table 8 Comparison of different classifier discrimination ratios for cases A1, A2 and C1 (over the smile class)

Case | SVM | FCM | SFCM | SFCM + ANFIS | Fuzzy ARTMAP
A1 | 100 | 100 | 100 | 100 | 100
A2 | 95.82 | 94.49 | 95.1 | 98.65 | 96.11
C1 | 89.75 | 91.30 | 91.8 | 94.42 | 92.77

SVM support vector machine, FCM fuzzy c-means, SFCM subtractive fuzzy c-means clustering, ANFIS adaptive neuro-fuzzy inference system, ARTMAP adaptive resonance theory MAP


For Case A1 and Case A2 (who had some difficulties in performing the experiments), the results achieved with this method for Class #1 led to discrimination ratios of 97.04% and 91.0%, respectively. Comparing these results with Table 7, it is clear that our electrode placement leads to more accurate results. It is also important that, in our proposed method, the electrodes can be mounted on a sports headband and easily placed over the user's face, and the conductive volume underneath each pair of electrodes is more localized.

Our proposed three-channel configuration is a robust interface; the experimental results are similar to those of CyberLink Brainfinger [16], which underscores its applicability. In addition, with our electrode configuration, 8 main facial gestures could be recognized and discriminated, compared to 3 recognizable facial gestures (frowning, moving the eyes left and right) with the CyberLink electrode configuration, 2 different muscle activation states in Ref. [15], and 6 gestures in Ref. [17].

Moreover, image/video-based detection systems can discern basic facial gestures with an accuracy of 64-98% [17]. Several problems are also observed during experiments with video-based gesture recognition systems, such as drift, loss of communication and slow communication rates. For subjects with insufficient muscle control, or for movements producing only small changes in facial gesture (such as clenching the molar teeth), camera-based methods can become quite ineffective because their sensitivity is low. In addition, video-based gesture recognition requires high-speed image-processing hardware, so the overall system cost is higher than that of our proposed method. It also requires fixed or predefined backgrounds and camera positions for calibration, and is suitable only for a small set of gestures [29], compared with the wide range of gestures that can be recognized by our method using only three pairs of electrodes. Hence, our results rank highly in accuracy, and the implementation cost of the proposed method is lower than that of video/image-based systems.

Thus, in this study we have achieved more capabilities and enhanced our system's performance compared to our previous works [4-6] and the other studies mentioned above, allowing discrimination of 8 facial gestures (smiling, frowning, pulling up the left/right lip corner, and eye movement to the left/right/up/down) using SFCM + ANFIS. Despite its higher computational cost, the proposed method clearly achieves a better discrimination ratio than the other classifiers and electrode configurations. Also, applying our proposed method to Cases A1, A2, C1 and E1 shows that it can be used with a high degree of robustness and accuracy over a wide range of users, even ones with difficulties in gesture generation (except E1).

It should be noted that all volunteers felt comfortable during the experiments, since no sophisticated or complex gestures had to be performed. This is an important cognitive issue when designing a human-machine interface: if this requirement is not met, the user may be burdened with extra cognitive pressure and, as a result, become exhausted and eager to stop the experiment or stop using the HMI.

Conclusion and future works

This paper presents a careful offline study of three gesture classes (8 gestures) among 10 healthy subjects, considering different combinations of logical channels and different fusions of their features, and employing SFCM and ANFIS classifiers to discriminate between them. Using ANFIS makes the learning phase of the classifier sufficiently fast and provides a very high degree of accuracy and robustness. The electrodes can be installed easily on the volunteer's face using a headband, which leads to a quick setup time and makes the system suitable for use as a human-machine interface. In addition, it has been shown that each of the logical channels has its own noticeable information content, and individual gestures (or states) can be explored within them.
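
To make the rule-generation step concrete, the following sketch implements a simplified Chiu-style subtractive clustering. The single rejection threshold eps and the 1.5 * ra squash radius are common defaults and are assumptions here; the returned centres would seed the Takagi-Sugeno rules that ANFIS then fine-tunes.

```python
import numpy as np

def subtractive_clustering(X, ra=0.5, eps=0.15):
    """Minimal sketch of subtractive clustering (Chiu-style).

    X is an (n_samples, n_features) array normalised to [0, 1]; ra is
    the cluster radius (cf. the 'best SFCM radius' column of Table 2).
    Returns the selected cluster centres, each of which would seed one
    fuzzy rule before ANFIS fine-tuning.
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    # Pairwise squared distances and initial potential of every point.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    potential = np.exp(-alpha * d2).sum(axis=1)
    first_peak, centers = potential.max(), []
    while potential.max() > eps * first_peak:
        c = potential.argmax()
        centers.append(X[c])
        # Revise potentials: points near the new centre are suppressed.
        potential -= potential[c] * np.exp(-beta * d2[c])
    return np.array(centers)

# Two well-separated blobs -> two rule centres are expected.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.03, (50, 2)),
               rng.normal(0.8, 0.03, (50, 2))])
print(subtractive_clustering(X, ra=0.5))
```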

In this study, fEMG and fEOG signals were used to discriminate between the mentioned gestures. The main use of fEMG is to detect facial movements (such as smiling, pulling up the lip corners and frowning). fEOG can enhance the feature space and helps the classifier to detect eye movements and, in future approaches, the level of attention. The main use of the fEEG frequency bands could be in future studies, by considering them as affective measures that mirror the user's emotional cues, allowing the interface to be designed, updated and reorganized accordingly.
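
As a sketch of how such logical channels can be separated in software, the code below band-pass filters one physical channel into fEOG/fEEG/fEMG sub-bands with Butterworth filters. The band edges and sampling rate are illustrative assumptions, not the paper's exact filter-bank design.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_subbands(raw, fs, bands=None):
    """Split a raw forehead recording into fEOG/fEEG/fEMG sub-bands.

    The band edges below are illustrative placeholders, not the
    paper's exact filter-bank cut-offs.
    """
    if bands is None:
        bands = {"fEOG": (0.1, 8.0), "fEEG": (8.0, 30.0), "fEMG": (30.0, 450.0)}
    out = {}
    for name, (lo, hi) in bands.items():
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out[name] = sosfiltfilt(sos, raw)  # zero-phase filtering
    return out

fs = 1000                      # assumed sampling rate
raw = np.random.randn(5 * fs)  # 5 s of synthetic data per physical channel
subbands = split_subbands(raw, fs)
print({k: v.shape for k, v in subbands.items()})
```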

Therefore, it can be concluded that the proposed method can be used as a useful interface in human-machine interaction applications, covering both the physical and the cognitive aspects of human-machine interface requirements.

Considering the tradeoff between the number of electrodes, the acquired information content, the computational cost and the discriminating power, it is also important to design an interface that is safe, ergonomic and comfortable, as well as easy to set up and easy to use. The results achieved in this work emphasize that our proposed method complies with these requirements, demonstrating its efficacy in healthy individuals and its potential use for disabled people. By exploiting the multi-modal capabilities of the presented interface, it could also be used in the rehabilitation phase for people suffering from facial muscle impairments. These people could use the system as a training environment for practicing the generation of facial gestures, helping them to reactivate their weak muscles.


In our future work, we will extend this interface into an online adaptable and co-adaptive interface for rehabilitation and assistive purposes, especially for the disabled population, who could be the main users of our interface. Furthermore, by using EEG data, the interface could increase context awareness in working sites and prevent possible dangers to the user and his/her working environment. This can be done by detecting the user's feelings and mental states and then performing the proper action (for example help, alarm or stop) to pursue and accomplish the task.

Acknowledgments We would like to thank Dr. Christian Jones from USC-Australia for sharing his expertise in the area of affective computing. The help of the eager volunteer participants is also appreciated.

References

1. Huang CN, Chen CH, Chung HY (2004) The review of applications and measurements in facial electromyography. J Med Biol Eng 25(1):15-20
2. Ang L, Belen E, Bernardo R, Boongaling E, Briones G, Coronel J (2004) Facial expression recognition through pattern analysis of facial movements utilizing electromyogram sensors. In: IEEE TENCON 2004, Chiang Mai, Thailand, pp 600-603
3. Firoozabadi M, Oskoei MA, Hu H (2008) A human-computer interface based on forehead multichannel biosignals to control a virtual wheelchair. In: ICBME08, Tehran, Iran
4. Mohammad Rezazadeh I, Wang X, Wang R, Firoozabadi M (2009) Toward affective handsfree human-machine interface approach in virtual environments-based equipment operation training. In: 9th international conference on construction applications of virtual reality (CONVR2009), Sydney, Australia
5. Mohammad Rezazadeh I, Firoozabadi M, Hu H, Hashemi Golpayegani MR (2010) Determining the surface electrodes locations to capture facial bioelectric signals. Iran J Med Phys 7:65-79
6. Mohammad Rezazadeh I, Wang X, Firoozabadi M, Hashemi Golpayegani MR (2011) Using affective human-machine interface to increase the operation performance in virtual construction crane training system: a novel approach. Autom Construct J 20:289-298
7. Mahlke S, Minge M (2006) Emotions and EMG measures of facial muscles in interactive contexts. In: Conference on human factors in engineering, Montreal, Canada
8. Picard R (1997) Affective computing. MIT Press, Cambridge
9. Vanhala T, Surakka V (2007) Facial activation control (FACE). In: Proceedings of ACII2007. Lecture notes in computer science, vol 4738, pp 278-289
10. Ferreira A, Silva RL, Celeste WC, Bastos Filho TF, Filho MS (2007) Human-machine interface based on muscular and brain signals applied to a robotic wheelchair. In: 16th Argentine Bioengineering Congress. J Phys Conf Ser 90, Argentina
11. Niemenlehto PH, Juhola M, Surakka V (2006) Detection of electromyographic signals from facial muscles with neural networks. ACM Trans Appl Percept 3(1):48-61
12. Surakka V (1998) Contagion and modulation of human emotions. University of Tampere, Tampere
13. Tsui C, Jia P, Gan JQ, Hu H, Yuan K (2007) EMG-based hands-free wheelchair control with EOG attention shift detection. In: IEEE international conference on robotics and biomimetics (ROBIO2007), Sanya, China, pp 1266-1271
14. Hori J, Sakano K, Saitoh Y (2004) Development of a communication supporting device controlled by eye movements and voluntary eye blink. In: Proceedings of the 26th annual international conference of the IEEE EMBS, San Francisco, CA, USA
15. Barreto AB, Scargle SD, Adjouadi M (2000) A practical EMG-based human-computer interface for users with motor disabilities. J Rehabil Res Dev 37(1):53-64
16. Brainfinger. http://www.brainfinger.com. Accessed 19 March 2011
17. Chin ZY, Ang KK, Guan C (2008) Multiclass voluntary facial expression classification based on filter bank common spatial pattern. In: 30th annual international conference of the IEEE EMBS, Vancouver, BC, Canada
18. Kim KH, Yoo JK, Kim HK, Son W, Lee SY (2006) A practical biosignal-based human interface applicable to the assistive systems for people with motor impairment. IEICE Trans Inform Syst E89-D(10):2644-2652
19. Biopac. http://www.biopac.com. Accessed 19 March 2011
20. Oskoei M, Hu H (2007) Myoelectric control systems—a survey. J Biomed Signal Process Control 2(4):275-294
21. Ajiboye AB, Weir R (2005) A heuristic fuzzy logic approach to EMG pattern recognition for multi-functional prosthesis control. IEEE Trans Neural Syst Rehabil Eng 13(3):280-291
22. Fukuda O, Tsuji T, Kaneko M, Otsuka A (2003) A human-assisting manipulator teleoperated by EMG signals and arm motions. IEEE Trans Robotics Autom 19(2):210-222
23. Momen K, Krishnan S, Chau T (2007) Real-time classification of forearm electromyographic signals corresponding to user-selected intentional movements for multifunction control. IEEE Trans Neural Syst Rehabil Eng 15(4):535-542
24. Moertini V (2002) Introduction to five data clustering algorithms. Integral 7(2):87-96
25. Priyona A, Ridwan M, Alias A, Atiq R, Rahmat OK, Hassan A, Ali A (2003) Generation of fuzzy rules with subtractive clustering. Universiti Teknologi Malaysia. Jurnal Teknologi 43(D):143-153
26. Polat K, Yosunkaya S, Gunes S (2008) Comparison of different classifier algorithms on the automated detection of obstructive sleep apnea syndrome. J Med Syst 32:243-250
27. Khezri M, Jahed M (2007) Real-time intelligent pattern recognition algorithm for surface EMG signals. Biomed Eng Online 6:45
28. Paradiso G, Cunic D, Gunraj C, Chen R (2005) Representation of facial muscles in human motor cortex. J Physiol 567:323-336
29. Ahsan MD, Ibrahimy M, Khalifa O (2009) EMG signal classification for human computer interaction: a review. Eur J Sci Res 33(3):480-501
