SCIENTIFIC PAPER
A novel human–machine interface based on recognition
of multi-channel facial bioelectric signals
Iman Mohammad Rezazadeh •
S. Mohammad Firoozabadi • Huosheng Hu •
S. Mohammad Reza Hashemi Golpayegani
Received: 11 September 2010 / Accepted: 6 November 2011
© Australasian College of Physical Scientists and Engineers in Medicine 2011
DOI 10.1007/s13246-011-0113-1

I. Mohammad Rezazadeh, S. M. Firoozabadi and S. M. R. Hashemi Golpayegani: School of Biomedical Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran. e-mail: [email protected]; [email protected]

S. M. Firoozabadi: School of Medical Sciences, Tarbiat Modares University, Tehran, Iran

H. Hu: School of Computer Science & Electronic Engineering, University of Essex, Colchester, UK

S. M. R. Hashemi Golpayegani: School of Biomedical Engineering, Amir Kabir University of Technology, Tehran, Iran
Abstract This paper presents a novel human–machine
interface for disabled people to interact with assistive
systems for a better quality of life. It is based on multi-
channel forehead bioelectric signals acquired by placing
three pairs of electrodes (physical channels) on the Fron-
talis and Temporalis facial muscles. The acquired signals
are passed through a parallel filter bank to explore three
different sub-bands related to facial electromyogram,
electrooculogram and electroencephalogram. The root
mean square features of the bioelectric signals analyzed
within non-overlapping 256 ms windows were extracted.
The subtractive fuzzy c-means clustering method (SFCM) was applied to segment the feature space and generate initial fuzzy-based Takagi–Sugeno rules. Then, an adaptive neuro-fuzzy inference system was exploited to tune the premise and consequent parameters of the extracted SFCM rules. The average classifier discriminating ratio
for eight different facial gestures (smiling, frowning,
pulling up left/right lips corner, eye movement to left/right/
up/down) is between 93.04% and 96.99% according to
different combinations and fusions of logical features.
Experimental results show that the proposed interface has a
high degree of accuracy and robustness for discrimination
of 8 fundamental facial gestures. Some potential and fur-
ther capabilities of our approach in human–machine
interfaces are also discussed.
Keywords Human–machine interfaces · Multi-channel facial bioelectric signals · Subtractive fuzzy c-means clustering · Adaptive neuro-fuzzy inference system (ANFIS)
Introduction
The face is a very important body part that provides an
interface for the exchange of information in daily life. All
facial functions, such as speech, mastication and facial
expression, are accomplished by individual facial muscles.
From the point of view of facial muscle kinematics, it is
evident that a facial muscle is a small 3D combination of
muscular slips that carry out a variety of complex orofacial
functions. A specific neural feature of facial muscles is that
their contractions are not only under voluntary but also
emotional control [1]. Because of this, information
including emotion, fatigue, feeling and general affective
measures could be analyzed based on facial response [2–9].
Thus, the facial muscles could play a dominant role as a
communication medium to accomplish message transmis-
sion and information acquisition.
When acquiring facial bioelectric signals, facial muscle
activities (fEMG) as well as the other associated bioelectric
potentials such as facial EOG (fEOG) and facial EEG
(fEEG) that are generated simultaneously can be recorded
at the same time. In considering the potential capabilities of
these bioelectric signals, their extracted features can be
used individually or combined in many research areas,
especially in designing human–machine interfaces. It was
found that even the recorded signals from facial muscle
activities show noticeable pattern changes as subjects
began to become tired from repeated smiling, nose wrinkling, or frowning, causing the bioelectric signals to gradually lose similarity among the patterns as time goes on [10].
Other bioelectric signals from the user’s face (e.g.,
fEOG and fEEG) could illustrate emotional states and
attention when physical gestures or mimicking is weak or
difficult to generate for the user [10]. To acquire valid data
and consequently extract rich information from facial
bioelectric signals, some important factors such as elec-
trode placement, recording protocol and signal condition-
ing should be considered. By employing the correct
processing methods, one can design an accurate and robust
interface for real-world applications. A good study
regarding the measurement considerations of fEMG and its
applications can be found in [1].
Related works
Ang et al. [2] stated that fEMG can be related to certain facial expressions, which are the most visible representation of a person's physical emotional states. Mahlke and Minge [7]
used emotional states extracted from fEMG to discriminate
between usable and unusable computerized contexts by
placing two pairs of electrodes on the Zygomaticus Major and Corrugator Supercilii muscles to detect positive and
negative emotional states, respectively. They concluded
that frowning activity is significantly higher in the unusable
system condition than in the usable one. Surakka studied
the effects of affective interventions using fEMG in a
human–computer interface (HCI) and concluded that the
frowning activity was attenuated significantly after positive
interventions compared to the no intervention conditions
[11, 12].
The fEOG signal can also be used for eye-gaze tracking
to enhance the interaction level [13]. Hori et al. [14] used
two electrodes mounted on the upper part and temple of the
dominant eye side to detect 4 directions of eye movement.
By using a thresholding technique, they achieved 94.16%
average accuracy. McFarland, Neat and Pfurtscheller
focused on the detection of the alpha rhythm sub-band of
fEEG, an 8–12 Hz brainwave of sinusoidal nature occurring over the sensorimotor cortex, to detect event-related
desynchronization (ERD) [15]. Junker combined certain
EMG signals with EEG signals to develop the CyberLink™, which is a small wearable device that acquires
signals using three sensors mounted on a headband [16].
This headband can amplify and decode the forehead signal
into three individual frequency channels and eleven sub-
bands. These channels belong to fEOG (eye motion and
lateral eye movements), fEEG (alpha and beta bands) and
fEMG.
Stroke patients suffer facial paralysis because of neu-
rological damage, so restoring the capability of facial
expressions (gestures) is important for these patients [17].
Work has been performed by researchers to develop as-
sistive/therapeutic HMIs for disabled people using facial
bioelectric signals as control commands. Ferreira and his
colleagues designed a HMI based on bioelectric-signals
from facial muscles and brain activity, which correspond to
eye blink and visual information, for control of a robotic
wheelchair [10].
Kim et al. [18] used linear prediction coefficients
(LPCs) of Temporalis muscle activities (clenching left,
right, both molar teeth and eye blink) to control an elec-
trically powered wheelchair (EPW). They used Hidden
Markov Model (HMM) as a classifier, and achieved 96.5%
and 97.1% discriminating ratios for handicapped and
healthy groups respectively. Tsui et al. [13] controlled an
EPW using forehead bioelectric-signals acquired by
CyberLink™. They used Frontalis electrical activity
amplitudes as a click or double-click to define five-com-
mand control states: ‘‘stop’’, ‘‘forward’’, ‘‘backward’’,
‘‘left’’ and ‘‘right’’. The fEOG signal was also used for
speed control during EPW movement. Firoozabadi et al.
[3] also used multi-channel fEMG as an interface to control
a virtual robotic wheelchair. They used facial gestures
(smile, frown, pulling up lip corners) and their corre-
sponding fEMG to command a virtual wheelchair by
extracting root mean square (RMS) features of captured
bioelectric signals and employing a support vector machine
(SVM) classifier. They reported 89.75–100% classification
accuracy. Mohammad Rezazadeh et al. [4] expanded
Firoozabadi’s study [3] using the same setup to control a
virtual interactive tower crane in a construction area. They
employed subtractive fuzzy c-means clustering (SFCM)
and reported an average 92.6% classification ratio for five
different facial gestures (smiling, pulling up right/left lip
corners, mouth opening and clenching molar teeth).
Study goal
This study investigates how multimodal facial bioelectric
signals could be effectively recognized in order to build a
novel HMI for disabled people so they can access assistive
systems and enjoy a better quality of life within society.
The main reason to use the face as an interface in our
approach is that the human face is a rich resource of
information from syntactic to semantic and pragmatic
aspects, and most facial gestures are natural and voluntary
and can be easily generated by many individuals whose
motor impediments are mainly from the neck down. The
physical state (such as gestures or fatigue) of a user can be
monitored by recording and processing his or her fEMG
signal specifications (such as amplitude, power and fre-
quency spectrum). In addition, the user’s fEOG and fEEG
specifications (such as amplitude, entropy, phase space,
etc.) can be used to indicate the user’s affective states
(from happiness to sadness, mental and cognitive stress, or
attention to a display screen). The proposed multimodal approach is intended to improve and enhance the interface and to avoid saturating or overloading the physical states.
The rest of the paper is organized as follows. The
‘‘Material and methods’’ section presents the methods
adopted in this research, including electrode site selection,
experimental setup, and the data recording method. The
methods used for data preprocessing such as filter banks,
data segmentation, feature selection and onset detection are
described. In the ‘‘Classification’’ section, the input–output
subtractive fuzzy clustering method, adaptive neuro-fuzzy
inference system (ANFIS) and majority voting are dis-
cussed, and then the ''Experimental results and analysis'' section is given. Finally, the ''Conclusion and future works'' section is provided.
Materials and methods
The general block diagram of our proposed bioelectric interface is shown in Fig. 1. It consists of the following sub-blocks, which are discussed in the coming sections: electrode site selection, configuration and placement; data pre-processing and filtering; windowing and data segmentation; feature extraction and transformation; inference system design; and majority voting.
Fig. 1 General block diagram of the proposed approach
Eight predefined facial gestures (smiling, frowning, pulling up the right and left lip corners, and moving the eyes up, down, left and right) were studied in our experiments.
These gestures are primary facial movements that a healthy
user or a user who has suffered a stroke (from the neck
down) can generate easily. In addition, they can convey
users’ emotions (fEMG and fEEG), directions of view
(fEOG) and also levels of distraction of mental states
(fEOG and fEEG) [3–6]. Each subject was asked to per-
form the above gestures according to the recording proto-
col and the relative facial bioelectric signals were acquired
simultaneously, using three pairs of electrodes mounted on
the volunteer’s forehead (hereinafter, physical channels)
and then fEMG, fEOG, and fEEG sub-frequency bands
were extracted from each of the physical channels by
employing appropriate filter banks. Thus, for each physical
channel, there are three logical channels. Then, the RMS
features of the explored bioelectric signals for non-over-
lapped 256 ms windows were extracted. The SFCM was
used to segment the feature space and generate initial fuzzy
based Takagi–Sugeno (TS) rules. Afterwards, the ANFIS
exploited the extracted rules from SFCM to tune up the
premises and consequence parameters.
Electrode site selection and placement
As illustrated in Fig. 2, three pairs of pre-gelled Ag/AgCl
electrodes were placed on the volunteer’s facial muscles in
a different configuration to harness the highest amplitude
of signals:
– One pair on Frontalis muscle; above the eyebrows with
a 2 cm inter-electrode distance (Channel 2).
– Two pairs placed on left and right Temporalis muscles
(Channel 1 and Channel 3).
– One ground electrode on the bony part of the left wrist.
Note that the skin of the selected electrode placement
area should be gently rubbed with an alcohol-soaked cotton
swab in order to minimize the impedance between elec-
trodes and skin and also to eliminate motion artifacts. The
lead wires should be secured to the volunteer’s face via an
anti-allergy adhesive tape.
This configuration is a heuristic choice because these
three pairs are capable of gathering valuable information
from the face as described below:
– As the Frontalis muscle is spread out around the
forehead and its fibers are not too deep and are not
perpendicular to the electrodes inter-center line (except
for in a very small area), Channel 2 could be
responsible for collecting Frontalis activity and it
could be used to detect frowning, a major task of
Frontalis (Fig. 2). On the other hand, it could capture
horizontal movement (EOG) due to the differences in
proportional displacement and orientation of Corneor-
etinal dipoles with respect to the electrode configura-
tion in Channel 2, which can then record this
bioelectric potential difference [5].
– It should be noted that the magnitude of the recorded
signal for vertical eye movement in Channel 2 is low
because the proportional displacement and orientation
of the Corneoretinal dipoles are approximately similar
with respect to the electrodes configuration in Channel
2. In addition, Channel 2 can be considered to be a
good logical channel to obtain fEEG.
– Contraction of the Temporalis muscle elevates the
mandible. Channels 1 and 3 can be used for detecting
Temporalis activities and its related fEMG (such as
smiling, pulling lip corners upward, teeth clenching,
eye winking and eye blinking). These two pairs of
electrodes can also detect vertical eye movement
(EOG) because the proportional displacement and
orientation of the Corneoretinal dipoles are different
with respect to the electrodes configuration in Channels
1 and 3, which can record this bioelectric potential
difference. They can also be used to capture temporal
EEG. It should be noted that recording Temporalis
muscle activities was preferred over Masseter muscles,
which also contract during smiling and pulling up the right/left lip corners. This is because the Masseter muscles
are heavily involved during the act of speaking, which
may cause false command generation. In addition,
because we have proposed to use this HMI for disabled
people, the electrode placement should be set up as
easily as possible. When considering our electrodes
configuration and placement, the electrodes can be
mounted on a headband or a sport cap, but this goal
could not be achieved by placing electrodes on the
Masseter. In addition to this, we aim to reduce the
number of electrodes as much as possible to reduce the
amount of raw data and processing time. Temporalis
can also capture fEMG, fEEG and fEOG but Masseter
cannot (Table 1) [5].
– In our study, experimental trials before the main experiment showed the capability of Channels 1–3 for
capturing EMG, EOG and EEG signals.
Fig. 2 Illustration of the electrodes configuration over the Frontalis and Temporalis facial muscles [3–6]; the configuration of Channel 3 is the same as Channel 1, but on the opposite side of the face

Table 1 The relationship between the recording physical channels and their assignments

Channel     EMG: facial movement (most significant contracting muscle)    EOG: eye movement direction    EEG: mental states
Channel 1   Smiling; pulling up right lip corner (right Temporalis)       Moving right eye up and down   All frequency sub-bands
Channel 2   Frowning (Frontalis)                                          Moving eyes left and right     All frequency sub-bands
Channel 3   Smiling; pulling up left lip corner (left Temporalis)         Moving left eye up and down    All frequency sub-bands
In our proposed bipolar electrodes configuration, the
conductive volume effect for each pair of electrodes is
more localized and specified to the muscle fibers that are
located underneath each electrode pair. Thus, the levels of
bioelectric signals (cross-talk) to be sensed by the electrode
pairs other than the one assigned to capture specified ges-
tures were reduced [4]. In our previous studies [4–6], we
used facial bioelectric signals of a user as an interface to
control an assistive robot or device. The physical states (i.e.,
gestures) of the assistive machine could be controlled by
fEMG signals, while fEEG and fEOG signals could be used
simultaneously for an adaptive interface based on the user’s
affective state. Thus, according to the electrode placement,
we are able to gather the proper bioelectric signals that
mirror both physical and affective states of the user and
achieve a robust and adaptable interface. The theoretical
studies and simulations that emphasize and confirm that the
proposed electrode placement is a proper choice to capture
the mentioned signals can be found in [5, 6].
It should be noted that our proposed electrode placement
is different from configurations recommended by bio-
sensing technology manufacturers. In traditional methods,
the electrodes are placed over Masseter muscles, 4 corners
of the eyes and in a 10–20 configuration in order to record
facial muscle activities, EOG and EEG, respectively.
However, using the proposed method, one can capture
facial muscle gestures, eye movement in 4 directions and
mental states (which can be referred to the forehead EEG)
using only three pairs of electrodes. Also, the proposed
configuration can provide the capability to distinguish
more physical and mental states with respect to traditional
methods (see ‘‘Experimental results and analysis’’ section).
Data acquisition setup
In this project, the Biopac system (MP100 model with ack100w software) [19] was used to acquire bio-
electric-signals. The system can collect bioelectric-signals
accurately and store them in its internal or PC memory
(1.73 GHz, 2 GB RAM). However, the MP100 cannot inherently be used for online applications. An in-house
MATLAB code has been written to read acquired signals in
a short time from the Biopac internal memory. Thus, it
provides online capability for our applications. The sam-
pling frequency and amplifier gain were selected to be
1,000 Hz and 5,000, respectively. The low cutoff fre-
quency of the filter was chosen to be 0.1 Hz to avoid
motion artifacts and a narrow band-stop filter (48–52 Hz)
was also used to eliminate line noise.
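As a minimal illustration, this conditioning stage could be reproduced offline in MATLAB along the following lines; the filter orders and variable names (fs, raw) are our own assumptions, not specifications from the original acquisition software:

% Signal conditioning sketch: 1,000 Hz sampling, 0.1 Hz high-pass against
% motion artifacts, 48-52 Hz band-stop against line noise.
fs = 1000;                                     % sampling frequency (Hz)
raw = randn(10*fs, 1);                         % placeholder for one recorded physical channel
[bh, ah] = butter(2, 0.1/(fs/2), 'high');      % high-pass, 0.1 Hz cutoff
[bs, as] = butter(2, [48 52]/(fs/2), 'stop');  % narrow band-stop (48-52 Hz)
conditioned = filtfilt(bs, as, filtfilt(bh, ah, raw));  % zero-phase filtering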
Participants
Ten physically healthy, non-athletic volunteers (with no history of disorders) participated in the study and were grouped into three sub-groups depending on their ages:
– Sub-group #1 (C1–C2): Two children aged 10 and 13 years
– Sub-group #2 (A1–A7): Seven adults aged between 19
and 29 years (mean age 23.7 years), all from the School of Biomedical Engineering, Azad University, Tehran, Iran.
– Sub-group #3 (E1): One elderly adult aged 42.
These sub-groups were selected and studied for the
following major purposes:
– To validate the experimental procedure
– To use outcomes as a proxy for the potential applica-
tions for users with disabilities.
– To validate the robustness of the method:
• C1 was a 10 year old child who had a lack of
attention due to unfamiliarity with the test. Thus, he
could be a good case to study the effect of age and
also the level of required attention to perform the
experiment and achieve the desired outcomes.
• A1 was a 29 year old adult who had some difficulty pulling up his left lip corner. He closed his left eye when asked to pull his left lip corner up. In his case, a left eye wink or left eye
closing could be misclassified as pulling up the left
lip corner. Thus winking could be considered to be
interference in our interface.
• A2 was a 22 year old adult who had some difficulty frowning and generating strong signals because of a Botox™ (Botulinum Toxin) injection in her forehead muscles for cosmetic purposes.
• All other subjects had no significant difficulties in
performing the experiment.
• Sub-groups #1 and #3 bound sub-group #2 (considered the main sub-group) with respect to their age ranges, so the effect of age, and whether age is a dependent variable in the study, could also be assessed.
Data recording protocol
In each recording session, the volunteer was asked to
perform the following gesture classes according to the
protocol below:
– Class #1—facial muscle gesture class—includes: Smil-
ing, frowning, pulling up right and left lip corners (all
moderately)
– Class #2—eye movement class—includes: Moving eye
from the center to up, down, left and right (not
periodically)
– Class #3—wink class—includes: Closing left eye, right
eye, both eyes and frowning powerfully and
moderately.
Based on the experiment requirements, the recording
session took about 30 min to be performed completely for
each volunteer. Before each recording session, the volunteer was trained to perform the desired gestures using his or her facial muscles. Then, he or she was asked to take a rest and try to relax for a period of 5 min. After this period, the quiescent bioelectric signals from all physical data channels were recorded simultaneously for 1 min, while he or
she was still resting. These signals were used to determine
the on-set threshold to distinguish between off-set and
active states of the classifier.
In each trial, the volunteer was asked to perform one of the gestures for the recording period; the recording started 1 s prior to the gesture performance and ended 5 s after the beginning of the gesture. After a 10 s resting interval, he or she would repeat the gesture again; the above gesture-rest task was cycled 10 times. The rest
period was chosen empirically to eliminate the fatigue
effect during training (Fig. 3).
Fig. 3 The block diagram for the recording session protocol
Data pre-processing
The ‘‘Data pre-processing’’ section describes the specifi-
cations of filter banks, choosing appropriate data segment
length, feature selection and onset detection.
Filter banks
The acquired data was passed through parallel Butterworth
digital filter banks with predefined frequency characteris-
tics to obtain desired frequency sub-bands:
– 0.2–3 Hz for assigning to fEOG to detect subject’s
vertical and horizontal eye movements and enhance the
cognitive level of interactions.
– 7–12 Hz and 13–22 Hz for assigning to fEEG to
determine the subject’s affective states and enhance the
cognitive interaction for future applications.
– 30–450 Hz for assigning to fEMG to determine the
subject’s gestures and enhance the physical level of
interaction.
It should be noted that the desired sub-bands are well separated in frequency, and thus no contamination or aliasing between sub-bands was observed. In addition, in our proposed bipolar electrodes
configuration, the conductive volume effect for each pair of
electrodes is more localized and specified to the muscle
fibers that are located underneath each electrode pair. Thus,
the levels of bioelectric signals (cross-talk) to be sensed by
the electrode pairs other than the one assigned to capture
specified gestures were reduced [4–6]. Thus, no significant
cross-talk has been observed in our study. It should be
noted that, other biopotential activities (ECG or chest
movement, for example) were not observed in the facial
gestures signals because of long distance between the
electrodes’ field of view and other biopotential dipoles.
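As an illustrative sketch (our own, under the assumption of fourth-order Butterworth sections; the paper does not state the filter order), the parallel filter bank can be written in MATLAB as:

% Parallel Butterworth filter bank: one conditioned physical channel is
% split into fEOG, two fEEG sub-bands and fEMG logical channels.
fs = 1000;
x = randn(10*fs, 1);                            % placeholder conditioned channel
bands = {[0.2 3], [7 12], [13 22], [30 450]};   % fEOG, fEEG (7-12 Hz), fEEG (13-22 Hz), fEMG
names = {'fEOG', 'fEEG_low', 'fEEG_high', 'fEMG'};
sub = struct();
for k = 1:numel(bands)
    [b, a] = butter(4, bands{k}/(fs/2), 'bandpass');  % assumed 4th-order sections
    sub.(names{k}) = filtfilt(b, a, x);               % zero-phase band-pass
end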
Data segmentation
A segment is a time slot for acquiring bioelectric-signals
data considered for feature extraction. It should be noted
that EMG is comprised of two states: transient state and
steady state. Englehart et al. [20] have done an extensive
study on hand gesture classification and shown that steady-
state data is classified more accurately than transient data,
and classification suffers less degradation with shorter
segment lengths. The rate of classification degrades more
quickly as the segment length of transient data is
decreased, than with steady-state data. Therefore, steady-
state data with a shorter segment length such as 128 ms is
more reliable if a faster system response is required. After
determining segment length and the state of data, a third
important point in data segmentation is the data windowing
technique. There are two major techniques in data win-
dowing: adjacent windowing and overlapped windowing.
Farina and Merletti showed that overlapped segments increase processing time without providing a significant improvement in the accuracy of spectral features, such as autoregressive coefficients. They also showed that a segment length below 125 ms leads to high variance and bias in frequency domain features [20]. Due to our future real-time approach, an adjacent segment length plus the processing time of generating classified control commands should be equal to or less than 300 ms. Furthermore, a segment length should be adequately large, since the bias and variance of features increase as segment length decreases, which degrades classification performance. Therefore, a trade-off between response time and accuracy exists. However, Englehart and Hudgins highlighted that by adopting continuous segmentation on a steady-state signal, the segment length can be reduced to 128 ms, or even 32 ms, without a considerable decrease in accuracy. Because of real-time computing and high-speed microprocessors, processing time is often below 50 ms, so the segment length can vary between 32 and 250 ms [20].
Interest in using facial bioelectric signals as interface channels has been increasing, but according to our research, there has been no comprehensive study about
choosing the optimal segment length when using them as
interface channels. We have performed an extensive study
on facial bioelectric signals and evaluated the effect of
segment length on classifier performance based on our
approaches. According to our study, the 256 ms non-
overlapped segment length was chosen for our experiments
(see ‘‘Experimental results and analysis’’ section).
It should be noted that in general, facial muscle bio-
electric signals are non-stationary. However, in reference
to Oskoei and Hu [20], low-level (20–30% MVC) and
short-time contractions (20–40 s) can be assumed to be
wide-sense stationary. Moreover, at higher levels (50–80%
of MVC) they can only be assumed to be locally stationary
for a period of 500–1,500 ms. Therefore, the time-slot of
the mentioned signals can be assumed to be stationary in
real-time applications, even if they have variant spectral
characteristics [20]. Also, it was stated that a segment length of 250–500 ms is more suitable for achieving less variance and bias in the estimation, compared to other segment lengths [20].
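A minimal sketch of the adjacent (non-overlapped) windowing used here, with our own variable names:

% Adjacent 256 ms segmentation at fs = 1000 Hz: each column of the
% resulting matrix is one 256-sample analysis window.
fs = 1000; segLen = round(0.256*fs);               % 256 ms -> 256 samples
x = randn(10*fs, 1);                               % placeholder logical-channel signal
segments = buffer(x, segLen);                      % buffer zero-pads the last window
segments = segments(:, 1:floor(numel(x)/segLen));  % keep only complete windows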
Feature selection and onset detection
Feeding a bioelectric signal presented as a time sequence
directly to a classifier is impractical, due to the large
number of inputs and randomness of the signal. Therefore,
the sequence must be mapped into a smaller dimension
vector, which is called a feature vector. Features represent
raw bioelectric signals for classification, so the success of
any pattern recognition problem depends almost entirely on
the selection and extraction of features. A wide spectrum of
features has been introduced in the literature for bioelectric
classification, which fall into one of three categories: time
domain, frequency (spectral) domain, and time-scale
(time–frequency) domain [20].
Mean absolute value (MAV) and RMS are two well-known time domain features. Theoretically, when a signal is modeled as a Gaussian random process, RMS provides the maximum likelihood estimation of its amplitude; in this model SNR ≈ √(2N), where N is the number of statistical degrees of freedom. MAV provides a maximum likelihood estimate of the amplitude when a signal is modeled as a Laplacian random process; in this case SNR = √N, which is 32% lower than for a Gaussian-based model [20]. Thus, with this general prescription in mind, the RMS features of each bioelectric-signal channel were calculated within a non-overlapped window of 256 ms:
R_i = \mathrm{RMS}(EXG_i) = \sqrt{\frac{1}{T}\int_0^T EXG_i^2 \, dt}    (1)

where i = 1, …, K; K is the number of channels; T = 256 ms; and X in EXG can be replaced with M, O, or E.
Onset for each action was automatically determined as the point where the bioelectric-signal RMS feature exceeded the mean RMS of the quiescent signal plus three standard deviations [21]:

T_i = R_i, \quad \text{for } R_i > \mathrm{mean}\big(\mathrm{RMS}(EXG_{Quiescent})\big) + 3\,\mathrm{std}\big(\mathrm{RMS}(EXG_{Quiescent})\big)    (2)
Then, the above-threshold RMS features were normalized to make the sum of the features over the K channels equal to 1 [22]:

S_i = \frac{T_i - \mathrm{mean}\big(\mathrm{RMS}(EXG_{Quiescent})\big)}{\sum_{i=1}^{K}\big(T_i - \mathrm{mean}(\mathrm{RMS}(EXG_{Quiescent}))\big)}    (3)
Furthermore, to have a more separable feature space, all extracted features can be transformed to a non-linear simple feature space using a log transform to spread the concentrated data points while condensing the highly scattered points [23], as shown in Fig. 4a and b:
F_i = \log(S_i)    (4)
Fig. 4 a Raw data and frequency spectrum of three channels for pulling up the left lip corner. b Feature space for 8 different facial gestures and the relaxation state (eyes open)
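The feature pipeline of Eqs. (1)–(4) can be sketched as follows in MATLAB; the variable names, placeholder data and the use of per-window statistics on the quiescent recording are our own illustrative assumptions:

% Eqs. (1)-(4): per-window RMS, onset thresholding against the quiescent
% recording, channel-sum normalization and log transform.
fs = 1000; segLen = round(0.256*fs); K = 3;
EXG = randn(10*fs, K);                 % placeholder band-limited logical channels
Q   = randn(60*fs, K);                 % placeholder 1-min quiescent recording

rmsPerWin = @(sig) sqrt(mean(reshape(sig(1:floor(numel(sig)/segLen)*segLen), segLen, []).^2, 1))';
R  = zeros(floor(size(EXG,1)/segLen), K);
Rq = zeros(floor(size(Q,1)/segLen), K);
for ch = 1:K
    R(:,ch)  = rmsPerWin(EXG(:,ch));   % Eq. (1): RMS per 256 ms window
    Rq(:,ch) = rmsPerWin(Q(:,ch));     % quiescent RMS statistics
end

thr = mean(Rq) + 3*std(Rq);            % Eq. (2): onset threshold per channel
active = any(R > repmat(thr, size(R,1), 1), 2);
T = R(active, :);                      % above-threshold feature vectors

num = T - repmat(mean(Rq), size(T,1), 1);
S = num ./ repmat(sum(num, 2), 1, K);  % Eq. (3): rows sum to 1 over K channels
F = log(S);                            % Eq. (4): valid where S > 0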
Classification
Extracted features need to be classified into distinctive
classes for recognition of the desired gesture. In addition to
inherent variation of bioelectric-signals over time, there are
external factors, such as changes in electrodes position,
fatigue, and sweat, which may cause changes in a signal
pattern over time. A classifier should be able to cope with
such varying patterns optimally, as well as prevent over-
fitting. Classification should be adequately fast to meet
real-time processing constraints. A suitable classifier has to
be efficient in classifying novel patterns; online training
can maintain the stability of classification performance over a
long-term operation [20].
In our study, the input–output matrix should be created
where inputs indicate the corresponding RMS features for
specified logical data channels and the output indicates a
proper label for the specified gesture. The features’ fusion
in our approach means concatenation of RMS features of
logical data channels to each other. Suppose that the input–output matrix consists of fEXG feature data from the ith logical channel—X can be replaced with M (Myo), O (Oculo) or E (Encephalo)—and its corresponding output label, as follows:

Inputs–Output = [ fEXG_data_Chi   Output_Label ]    (5)
Now, if we want to fuse (concatenate) fEYG feature data from the jth logical channel—Y can also be M, O or E—
for the same output label gesture to the input–output
matrix, then we have:
Input–Output = [ fEXG_data_Chi   fEYG_data_Chj   Output_label ]    (6)

where X and Y cannot be alike for the same logical channel, to eliminate redundancy among the input–output matrix columns.
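A short sketch of this fusion step (the feature matrices and labels are placeholders of our own):

% Eqs. (5)-(6): concatenate per-window RMS features of logical channels
% and append the gesture label as the last column.
nWin = 100;
fEMG_feat = rand(nWin, 3);                    % placeholder fEMG RMS features, Ch1-Ch3
fEOG_feat = rand(nWin, 3);                    % placeholder fEOG RMS features, Ch1-Ch3
label     = randi(8, nWin, 1);                % gesture labels 1..8
io_emg     = [fEMG_feat, label];              % Eq. (5)
io_emg_eog = [fEMG_feat, fEOG_feat, label];   % Eq. (6): feature fusion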
We have performed a comparative study on choosing an appropriate classifier for our experimental data, based on the achieved results (see ''Experimental results and analysis'' section). The input–output subtractive fuzzy clustering method was chosen and utilized to obtain a set of rules, whose parameters are then adjusted using an ANFIS.
Subtractive clustering
The idea of fuzzy clustering is to divide the data space into
fuzzy clusters, each representing one specific part of the
system behavior. Fuzzy c-means is one of the fuzzy clustering methods; it is a supervised algorithm in the sense that it must be told how many clusters, c, to look for. If the number of centers is not known beforehand, it is necessary to apply an unsupervised algorithm. SFCM is based on a
measure of the density of data points in the feature space.
The idea is to find regions in the feature space with a high
density of data points. The point with the highest number
of neighbors is selected as the center for a cluster. The data
points within a pre-specified, fuzzy radius are then
removed (subtracted), and the algorithm looks for a new
point with the highest number of neighbors. Subtractive
clustering uses data points as the candidates for cluster
centers, instead of grid points, as in mountain clustering.
This means that the computation is now proportional to the
problem size instead of the problem dimension. However,
the actual cluster centers are not necessarily located at one
of the data points, but in most cases it is a good approxi-
mation. After projecting the clusters onto the inputs-output
space, the antecedent parts of the fuzzy rules can be found.
The consequent parts of the rules can be represented by
simple mathematical functions. Using this method, one
cluster corresponds to one rule of the TS model [24, 25].
After the recording session, the extracted features from
trials with the odd index numbers and even index numbers
were added to the training set and the testing set, respec-
tively. Then the input–output matrix was made as described
above. In order to form a cross validation, the leave-one-training-trial-out, k × k-fold algorithm was applied to the training set, where k is equal to 10. The k−1 folds from the training set were used to train the classifier and were applied to SFCM to derive the initial fuzzy inference system, and the remaining fold (from a different trial) was used
to validate it. To have a robust classifier, one should look
for the most likely SFCM radius during the training pro-
cess. Thus, to validate our classifier, the above instruction
was applied 10 times and the characteristics of the SFCM
with the higher classification ratio and the most likely
radius were chosen as parameters for designing the initial
fuzzy inference system. The above description has been
shown in Fig. 5. The achieved initial system was passed
through the ANFIS to tune up its parameters. It should be
noted that the testing set is completely separate from the
training set and all the achieved results were obtained by
employing our method over the testing set.
Fig. 5 SFCM classifier training and validating protocols
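A minimal sketch of this radius search, assuming MATLAB's Fuzzy Logic Toolbox (genfis2 performs subtractive clustering and returns an initial TS fuzzy inference system); the data, fold sizes and radius grid are placeholders of our own:

% Derive an initial TS fuzzy system by subtractive clustering and keep
% the cluster radius that validates best, as in Fig. 5.
train_io = [rand(90, 6), randi(8, 90, 1)];    % placeholder fused training fold
valid_io = [rand(10, 6), randi(8, 10, 1)];    % placeholder validation fold
bestAcc = -inf;
for radius = 0.05:0.05:1.0                    % candidate SFCM radii
    fis  = genfis2(train_io(:, 1:end-1), train_io(:, end), radius);
    pred = round(evalfis(valid_io(:, 1:end-1), fis));
    acc  = mean(pred == valid_io(:, end));
    if acc > bestAcc
        bestAcc = acc; bestRadius = radius; bestFis = fis;
    end
end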
Adaptive neuro-fuzzy inference system
The ANFIS is a fuzzy TS model put in the framework of
adaptive systems to facilitate learning and adaptation
(Fig. 6). A hybrid algorithm combining the least squares
and the gradient descent methods is applied to tune the
premise and consequence parameters of the TS model. The
least squares method (forward pass) is used to optimize the
consequent parameters with the premise parameters fixed.
Once the optimal consequent parameters are found, the
backward pass starts immediately. The gradient descent
method (backward pass) is used to optimally adjust the
premise parameters corresponding to the fuzzy sets in the
inputs’ domain. The output of the ANFIS is calculated by
employing the consequent parameters found in the forward
pass. It has been proven that this hybrid algorithm is highly
efficient in training ANFIS. In addition, ANFIS has the
advantage of being significantly faster and more accurate
than many pure neural network-based methods and can
avoid the pitfall of over-fitting the training data, thereby
achieving excellent generalization ability [26]. ANFIS has
shown its capabilities as a powerful classifier in the gesture
classification area. For example, Khezri and Jahed [27]
have achieved 96% accuracy in discriminating between six
different hand gestures using surface EMG, by employing
an ANFIS classifier over time domain and time–frequency
domain features.
In our study, after deriving an initial fuzzy inference
system based on subtractive fuzzy clustering, the premise
and consequence parameters were optimized as described
above.
Fig. 6 Simplified ANFIS structure for two inputs and two rules
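Continuing the sketch above (again assuming the MATLAB Fuzzy Logic Toolbox; the epoch count and test set are our own placeholders), the SFCM-initialized system is tuned and evaluated as:

% Tune the premise and consequent parameters of the initial FIS with the
% hybrid (least squares + gradient descent) ANFIS learning rule.
nEpochs  = 50;                                % assumed number of training epochs
tunedFis = anfis(train_io, bestFis, nEpochs);
test_io  = [rand(20, 6), randi(8, 20, 1)];    % placeholder testing set
pred     = round(evalfis(test_io(:, 1:end-1), tunedFis));
accuracy = 100 * mean(pred == test_io(:, end));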
Majority voting
We applied majority voting (MV) as a post-processing
method to manage excessive classified output regarding
continuous segmentation. This can improve system per-
formance and make a smooth and reliable decision from a
dense stream of class decisions. MV includes the previous and next m decisions for a given point to generate a new decision. The final decision for each point is based on the greatest number of occurrences among the 2m + 1 decision points. The number of decisions used in MV is determined
by processing time and acceptable delay. As mentioned,
the accuracy of bioelectric control degrades rapidly with
decreasing segment length. Englehart and Hudgins point
out that this degradation would be prevented if majority
voting were used for post-processing after classification
[20]. In our method, after deriving the AFNIS classifier, the
MV algorithm was applied and m was varied from 1 to 3 to
achieve the best results. It should be noted that MV works
like a hard averaging algorithm and achieves smoothness
of classifier decisions if m is increased. In addition,
m cannot increase too much due to limitation of delay time
and because of the real-time approach of our proposed
method (see Fig. 7a, b).
Fig. 7 a SFCM classifier performance and robustness for gesture #4 (pulling up the left lip corner), above; the effect of the MV algorithm is also shown below. b ANFIS classifier performance and robustness for gesture #4, above; the effect of the MV algorithm is also shown below
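A minimal majority-voting sketch over a stream of per-window decisions (the stream itself is a placeholder):

% Majority vote over a sliding window of 2m+1 class decisions (the
% current decision plus the m previous and m next ones); m = 2 as in Fig. 7.
m = 2;
decisions = randi(8, 200, 1);                 % placeholder classifier output stream
smoothed  = decisions;
for t = (m+1):(numel(decisions)-m)
    smoothed(t) = mode(decisions(t-m:t+m));   % most frequent label wins
end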
Experimental results and analysis
We performed a series of experiments to evaluate the proposed
method. In each experiment, the participants’ bioelectric
signals were captured according to the recording protocol.
The accuracy A_i for the ith gesture was given by:

A_i = \frac{G_i}{N_i} \times 100\%, \quad i = 1, 2, 3, \ldots, C    (7)
where Gi is the number of times that gesture i was correctly
recognized by the classifier, C is the total number of dif-
ferent gestures and Ni is the number of times that the ith
gesture was requested from the classifier.
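In code form (with hypothetical label vectors of our own):

% Eq. (7): per-gesture accuracy from true labels and classifier output.
C = 8;
trueLabel = randi(C, 200, 1);                   % placeholder requested gestures
predLabel = trueLabel;                          % placeholder classifier output
A = zeros(C, 1);
for i = 1:C
    Ni = sum(trueLabel == i);                   % times gesture i was requested
    Gi = sum(trueLabel == i & predLabel == i);  % correct recognitions
    A(i) = 100 * Gi / Ni;
end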
Experiment #1: optimal segment length
All gestures within Classes #1 and #2 were requested to be performed by all the sub-groups of participants (C2, A1 and A2 were excluded because of their problems in performing the experimental protocol correctly).
As discussed in the ‘‘Data segmentation’’ section,
choosing the optimal segment length is an important factor
that influences the performance of the classifier. Thus, to
study this effect, the segment length was varied from 64 to
1,024 ms and for each segment, the RMS value was cal-
culated. Then the RMS features were fused together and
used to format the input–output matrix.
Input–Output = [ fEMG_Ch1   fEMG_Ch2   fEMG_Ch3   Output_label ]    (8)

Then, our classification method was applied to the above input–output matrix to classify the 8 different facial gestures.
The achieved results support Englehart’s results [20] that
involve selecting the optimal segment length for upper
limb EMG. Thus, based on Fig. 8 and the above discussion,
a segment length equal to 256 ms has been chosen. In
addition, one can conclude that facial and upper limb
muscles follow the same fundamental feature selection rule
despite some differences in their structures, such as facial muscles carrying fewer muscle spindles than upper limb muscles. It is interesting to note that the number of spindles in the involved muscles for most of
the performed gestures in our study is higher compared to
other possible facial gestures [28].
Fig. 8 Experiment #1 result: the effect of segment length on classifier performance over gesture Classes #1 and #2 (error % refers to classifier discrimination ratio error) using fEMG features from all three channels
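The sweep itself can be expressed compactly; here buildIoMatrix and trainAndValidate are hypothetical helpers standing in for the feature extraction and SFCM + ANFIS steps described earlier:

% Experiment #1 sketch: vary the segment length from 64 to 1,024 ms and
% record the resulting discrimination error.
fs = 1000;
segLens = [64 128 256 512 1024] * fs / 1000;   % segment lengths in samples
err = zeros(size(segLens));
for k = 1:numel(segLens)
    io     = buildIoMatrix(segLens(k));        % hypothetical: RMS features + labels
    err(k) = 100 - trainAndValidate(io);       % hypothetical: returns accuracy (%)
end
plot(segLens/fs*1000, err); xlabel('Segment length (ms)'); ylabel('Error (%)');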
Experiment #2: classification of gesture Class #1
All gestures within Class #1 were requested to be per-
formed by all the sub-groups of participants (C2, A1 and
A2 were excluded because of their problems performing
the experimental protocol correctly), and the fEMG fea-
tures were fed to the classifier as training and test sets. As
Table 2 shows, we obtained an accuracy of 42.54–100%
for Class #1 using different fusions of the forehead bio-
electric signal features. Clearly, the average discrimination
ratio of ANFIS is better than SFCM. ANFIS provides more
robust results and fewer output fluctuations. Furthermore,
its outputs are mostly in the defined domain (Fig. 7a, b).
Training the SFCM classifier needs less time than ANFIS, so there is a trade-off between higher accuracy and robustness on one side and training time on the other. However, as training happens only once, it is worthwhile to spend more training time to implement the ANFIS classifier and achieve better results. Also, it is clear that, to classify Class
#1, separating fEEG and fEOG from fEMG generates more
accurate results and the classifier needs less training time
than other feature combinations. The number of rules
column in Table 2 underscores the above results. As
shown, the number of rules in the classifier that was fed
only with fEMG features is less than other classifier rules;
this is due to simplicity and separability of the feature
space when using fEMG features. Also, this is because
fEEG and fEOG features are not separable in the Class #1
gestures and they generate extra complexity, which is not
helpful for classifier separability.
Table 2 Discrimination ratios (sub-groups #1, #2 and #3; cases C2, A1 and A2 were excluded) for gesture Class #1 with different logical bioelectric-signal configurations, using fEMG, fEOG and fEEG (all three channels each)

Test   Considered logical       Number of  Best SFCM       SFCM discrimination  ANFIS + SFCM          Mean training time  Mean training time
index  channel(s) (fused data)  rules      radius average  ratio (average)      discrimination ratio  for SFCM (s)        for ANFIS (s)
1      fEMG                     4          0.475           100 ± 0.0            100 ± 0.0             0.0187              0.1543
2      fEOG                     232        0.02            39.05 ± 3.4          42.54 ± 1.7           0.1309              32.79
3      fEEG                     13         0.2575          57.2 ± 4.11          56.61 ± 3.2           0.0316              1.1821
4      fEEG + fEOG              144        0.0525          55.40 ± 3.27         60.15 ± 2.1           1.0740              111.41
5      fEMG + fEOG              8          0.5025          99.11 ± 0.18         99.47 ± 0.08          0.6513              0.0281
6      fEMG + fEEG              5          0.72            100 ± 0.0            100 ± 0.0             0.04181             0.0264
7      fEMG + fEEG + fEOG       4          0.8925          99.09 ± 0.31         98.91 ± 0.12          0.0470              0.9908
Avg.                            59         41.64           78.55 ± 1.61         79.66 ± 1.02          0.2850              20.9402

''+'' in this table means fusion
Experiment #3: using fEOG features to enhance
the classifier
All gestures within Classes #1 and #2 were requested to be performed by all sub-groups of participants (C2, A1 and
A2 were excluded; the results were given before). The
fEOG can represent user attention to the performed task
and can also be used to detect eye movement gestures by
enhancing the features’ space, as described in the previous
sections. With this three-channel configuration, one can
discriminate between different eye movement directions by
fEOG feature fusion to fEMG, due to the fact that the
fEOG features are the main features in the eye movement
gesture class. Here, the fused features of fEMG and fEOG
were fed to the classifier as training and test sets. As shown
in Table 3, the average discrimination ratio for Class #1
and Class #2 found using feature fusion of fEMG and
fEOG is 96.99%, compared to 93.04%, which is the dis-
crimination ratio when only using fEMG features. Clearly,
the trained classifier achieved by using an fEOG fusion to
fEMG features has more discrimination capabilities in the
eye movement class; however there is a decrease of the
smile class discrimination ratio from 98.2 to 97.94%, due
to the complexity of features’ space. Also, compared to a
previous Ref. [14], our results show that our electrode
placement and classification technique provides a better
discrimination ratio for similar gestures and our proposed
method can distinguish 4 more gestures—the gestures in
Class #1. It should be stated again that the numbers of
electrodes and their configurations are important factors in
classification results. Chin et al. [17] uses NeuroScan
NuAmps 40-channel Quik-Cap to discriminate 6 facial
gestures: smile, straight, wince, agape, stern and frown.
They have concluded that use of a 34 electrode configu-
ration instead of a 6 electrode configuration could increase
the average classification accuracy between 78.5 and 86%.
However, their 6 electrode configuration was placed
according to the 10–20 frontal electrodes standard. Also,
the results of our proposed method show higher classifi-
cation accuracy for the same gestures and also detect 4
more gestures (eye movements) compared to the work of
Chin et al. [17], despite use of less or the same number of
electrodes. In addition, evaluation of our method’s per-
formance is provided by the confusion matrix in Table 4.
Clearly, there is no misclassification between Class #1 and
Class #2 gestures. Thus, one can conclude that the gesture
interactions only occurred within the same class of ges-
tures, and also, Class #1 and Class #2 are somehow inde-
pendent. In addition, the most significant interactions
occurred between pulling up right/left lip corners and smiling, as their signaling sources are the same.
Table 3 Discrimination ratios over sub-groups #1, #2 and #3 (cases C2, A1 and A2 were excluded) for Classes #1 and #2, using fEMG (all three channels) and fEOG (all three channels)

Gesture no.  Considered logical channel  Gesture                 ANFIS + SFCM discrimination ratio  Mean of discrimination ratio
1            fEMG                        Smile                   98.30 ± 0.02                       98.2 ± 0.04
2                                        Frown                   100 ± 0.0
3                                        Pulling up right lip    97.40 ± 0.03
4                                        Pulling up left lip     97.10 ± 0.11
5                                        Moving eye up           92.73 ± 2.16                       87.88 ± 2.2
6                                        Moving eye down         94.66 ± 1.07
7                                        Moving eye right        82.24 ± 3.26
8                                        Moving eye left         81.91 ± 2.31
             Average                                             93.04 ± 1.12
1            fEMG + fEOG                 Smile                   96.72 ± 0.06                       97.94 ± 0.10
2                                        Frown                   100 ± 0.0
3                                        Pulling up right lip    97.44 ± 0.14
4                                        Pulling up left lip     97.61 ± 0.21
5                                        Moving eye up           95.23 ± 0.53                       96.04 ± 1.01
6                                        Moving eye down         94.48 ± 0.42
7                                        Moving eye right        96.97 ± 1.10
8                                        Moving eye left         97.50 ± 2.02
             Average                                             96.99 ± 0.55

''+'' in this table means fusion

Table 4 Confusion matrix averaged over sub-groups #1, #2 and #3 (cases C2, A1 and A2 were excluded) for Classes #1 and #2, using fEMG (all three channels) and fEOG (all three channels); columns are true classes, rows are predicted classes

Predicted \ True   Smile  Frown  Pull right lip  Pull left lip  Eye up  Eye down  Eye right  Eye left
Smile              96.72  0      2.56            2.39           0       0         0          0
Frown              0      100    0               0              0       0         0          0
Pull right lip     2.12   0      97.44           0              0       0         0          0
Pull left lip      1.16   0      0               97.61          0       0         0          0
Moving eye up      0      0      0               0              95.23   3.66      0.38       1.06
Moving eye down    0      0      0               0              3.11    94.48     0.33       0.31
Moving eye right   0      0      0               0              1.12    1.59      96.97      1.13
Moving eye left    0      0      0               0              0.54    0.27      2.32       97.5
Experiment #4: investigating logical channel capabilities
Table 5 indicates the discriminating capability of logical
channels using different combinations of them. Here, for
each column, the classifier was trained using a fEMG and
fEOG features’ fusion of the logical channel related to that
column, and was then tested. It is shown that Channel 1 can be considered a good data source for smiling, frowning and pulling up the right lip corner when employed as the only data channel: the accuracies for these gestures are higher than for the others, which means Channel 1 alone cannot provide rich enough information content to discriminate eye movements or the pulling up left lip corner gesture. Similar results are obtained when only Channel 2 or Channel 3 is employed as the data source. However, according to Table 5, it
can be concluded that the most accurate results could be
achieved using all explored logical channels at once by
fusing data from all logical channels together. In this case
the information content for training the classifier increased
and it led to higher discrimination ratios. These results
support our hypothesis for electrode placement because the
highest accuracy from each channel was achieved when the
specified gestures were captured by that individual channel.
In addition, in the proposed bipolar electrode configuration,
the conductive volume effect for each pair of electrodes is
more localized and specified to the muscle fibers, which are
located underneath the electrode pair. This combination
could also prevent overloads on other individual channels.
Table 5 Capability of logical channel combinations (sub-groups #1, #2 and #3; cases C2, A1 and A2 were excluded) by fusing fEMG and fEOG (Ch channel)

Gesture no. and name             Ch1           Ch2           Ch3           Ch1 + Ch2     Ch1 + Ch3     Ch2 + Ch3     Ch1 + Ch2 + Ch3
#1. Smile                        80.00 ± 0.7   0 ± 0.0       40.00 ± 3.6   78.00 ± 1.1   96.00 ± 0.09  92.00 ± 0.07  96.72 ± 0.06
#2. Frown                        72.00 ± 0.8   100 ± 0.0     0 ± 0.0       100 ± 0.0     70.00 ± 1.56  100 ± 0.0     100 ± 0.0
#3. Pulling up right lip corner  80.00 ± 1.1   0 ± 0.0       0 ± 0.0       92.00 ± 0.4   100 ± 0.0     18.00 ± 4.1   97.44 ± 0.14
#4. Pulling up left lip corner   18.00 ± 4.8   28.00 ± 2.9   82.00 ± 0.8   24.00 ± 4.9   96.00 ± 0.84  88.00 ± 1.2   97.61 ± 0.21
#5. Eye moving up                60.00 ± 3.1   38.00 ± 3.3   32.00 ± 2.4   82.00 ± 1.05  52.00 ± 3.7   14.00 ± 3.1   95.23 ± 1.01
#6. Eye moving down              58.00 ± 4.5   6.00 ± 3.7    38.00 ± 2.7   98.00 ± 0.06  68.00 ± 3.2   50.00 ± 2.4   94.48 ± 0.42
#7. Eye moving right             48.78 ± 3.8   0 ± 0.0       78.05 ± 0.9   85.00 ± 0.74  90.24 ± 1.2   95.00 ± 0.65  96.97 ± 1.10
#8. Eye moving left              48.00 ± 0.78  28.00 ± 3.9   64.00 ± 3.6   64.00 ± 1.8   74.00 ± 2.1   48.00 ± 3.1   97.50 ± 2.02
Average                          58.09 ± 2.44  25.00 ± 1.72  41.75 ± 1.75  77.87 ± 1.25  80.78 ± 1.58  63.12 ± 1.82  96.99 ± 0.55
Experiment #5: wink discrimination
Wink class gestures can be considered to be gestures that
can conflict with the smile class; for example, in Case A1,
the subject had some difficulties in pulling up his left lip
corner, as he closed his left eye when he was asked to pull
his left lip corner up. In this case, left eye winking or left
eye closing could be misclassified as the pulling up left lip
corner gesture. Thus, the intended gestures should be
extracted and discriminated from the interacting gestures.
All gestures within Class #1, #2 and #3 were requested to
be performed by A1 and the features were fused as indi-
cated in Table 6. As Table 6 shows, the worst results were
achieved when the classifier was asked to discriminate
between the smile and wink classes. This is due to high
similarities among features of these two gesture classes
(e.g. pulling up the left lip corner and left eye winking)
because the dominant sources of these gestures are the
same facial muscles. It should be noted that in such situations the proposed method is not very helpful if we want to discriminate additional gestures (such as eye closing) generated by the same sources as HMI commands. Hence, our predefined gestures can be considered fundamental gestures for HMI commands when facial muscles are employed as the interface.
Table 6 Discriminating the wink class (Class #3) from other gesture classes for Case A1

Considered gesture classes  Feature fusion  Best radius average  ANFIS + SFCM discrimination ratio (average)
Smile + wink                fEMG            0.05                 48.06
Eye movement + wink         fEMG + fEOG     0.0475               90.94
Wink                        fEMG            0.1425               99.68
Experiment #6: studies on the effects of difficulties
that occurred while performing the tasks
The capabilities of our proposed method have been tested
over cases A1, A2 and C1, as they had some difficulties in
performing the gestures correctly. Here, all gestures within
Class #1 were requested to be performed by them, and their
fEMG features were used to train and test the classifier.
Table 7 shows some interesting results. In Case A1, as long as there is no need to add more commands to the HMI (like eye closing gestures), the classifier can easily detect smile class gestures despite the subject's difficulties in generating the correct patterns.
Case A2 had some difficulties with frowning because of the Botox™ injection. As the Frontalis muscle was weak due
to the injection, it could hardly produce the moderate
frowning patterns, which is why the classifier has less
accuracy for recognizing frowning activity, but has 100%
accuracy for the rest of the activities. In Case C1, the
results show that the discrimination ratio is decreased
because of the subject’s lack of attention due to his age and
playful behavior during training and performing of the
requested tasks. Thus, maintaining moderate attention
towards the task is a prerequisite of using our HMI.
Table 7 Average discrimination ratio in the smile class for cases C1, A1 and A2, who had some difficulties in training and performing gestures

Gesture                      Discrimination ratio (%)
                             Sub-group #1 (Case C1)  Sub-group #2 (Case A1)  Sub-group #2 (Case A2)
Smile                        92.10                   100                     100
Pulling up right lip corner  94.54                   100                     100
Pulling up left lip corner   100                     100                     100
Frown                        93.06                   100                     94.6
Average                      94.42                   100                     98.65
Experiment #7: comparing SFCM and ANFIS output
smoothness
Figure 7a and b shows the effects of the ANFIS classifier
and the Majority Voting (MV) algorithm on the method’s
performance. After training the classifier, the sequences of
gesture #4 (pulling up left lip corner) were fed to the classifier. It is clear that using the ANFIS (after obtaining SFCM) classifier makes the classification more robust and prevents out-of-range classifier outputs. In addition, by using MV, most of the output fluctuations were restrained, as is clear in Fig. 7 (m set to 2).
Experiment #8: comparing the proposed classifier
accuracy with the other common classifiers
Four other common classifiers were tested on particular volunteers (C1, A1 and A2) to assess their performance on the most complex cases in our experiments. In Table 8, the average classifier accuracies are compared for Class #1 gestures. It is obvious that SFCM + ANFIS gives more accurate results and is a good choice in our approach.
Table 8 Comparison of different classifier discrimination ratios for cases A1, A2 and C1 (over the smile class)

Case  SVM    FCM    SFCM  SFCM + ANFIS  Fuzzy ARTMAP
A1    100    100    100   100           100
A2    95.82  94.49  95.1  98.65         96.11
C1    89.75  91.30  91.8  94.42         92.77

SVM support vector machine, FCM fuzzy c-means, SFCM subtractive fuzzy c-means clustering, ANFIS adaptive neuro-fuzzy inference system, ARTMAP adaptive resonance theory MAP
Experiment #9: comparing our proposed electrode site
selection and configuration with other related studies
To explore the novelty of our electrode placement and configuration, the electrode placement suggested by Barreto et al. [15] and their feature extraction method were implemented for comparison with our method. Their placement was based on a mono-polar configuration over the Corrugator Supercilii facial muscle and the Zygomaticus Major
facial muscle. They indicated that to cancel the effect of
the cross-talk between facial EMG due to different muscle
activity, one should use the spectral power in the
300–500 Hz range from the power spectrum density (PSD)
of the acquired signal as a feature. This is used because this
range could discriminate the signal between associated
voluntary facial movements under each electrode and the
cross-talk signals. For Case A1 and Case A2 (who had
some difficulties in performing the experiments), the
achieved results from implementation based on this method
for Class #1 led to 97.04 and 91.0% discrimination ratios,
respectively. Comparing the results with Table 7, it is clear
that our electrode placement leads to more accurate results.
It is also important that in our proposed method the elec-
trodes can be mounted on a sport headband and can easily
be placed over the user’s face, and the conductive volume
underneath each pair of electrodes is more localized.
Our proposed three-channel configuration is a robust interface; the experimental results are similar to CyberLink Brainfingers' results [16] and underscore its applicability. In addition, with our electrode configuration, 8 main facial gestures could be recognized and discriminated, compared with 3 recognizable facial gestures (frowning, moving the eyes left and right) for CyberLink with its electrode configuration, 2 different muscle activation states in Ref. [15], and 6 gestures in Ref. [17].
Moreover, image/video-based detection systems can discern basic facial gestures with an accuracy of 64–98% [17]. However, several problems are observed during experiments with video-based gesture recognition systems, such as drifts, loss of communication, and slow communication rates. For subjects with insufficient muscle control, or for movements that produce only small changes in facial gesture (such as clenching the molar teeth), camera-based methods can become quite ineffective because of their low sensitivity. In addition, video-based gesture recognition requires high-speed image processing hardware, so the overall system cost becomes higher than that of our proposed method. It also requires fixed or predefined backgrounds and camera positions for calibration, and it is suitable only for a small set of gestures [29], compared with the wide range of gestures that our method recognizes using only three pairs of electrodes. Thus, our results rank highly in accuracy, and the implementation cost of the proposed method is lower than that of video/image-based systems.
Thus, in this study we have achieved more capabilities and enhanced our system's performance compared to our previous works [4–6] and the other studies mentioned above, allowing discrimination of 8 facial gestures (smiling, frowning, pulling up the left/right lip corner, and eye movement to the left/right/up/down) using SFCM + ANFIS. Despite its higher computational cost, the proposed method achieves a better discrimination ratio than the other classifiers and electrode configurations. Moreover, applying the proposed method to Case A1, Case A2, Case C1 and Case E1 shows that it can be used with a high degree of robustness and accuracy over a wide range of users, even ones with difficulties in gesture generation (except E1).
It should be noted that all volunteers felt comfortable during the experiments, since no sophisticated or complex gestures were required. This is an important cognitive issue when designing a human–machine interface: if this requirement is not met, the user may be burdened with extra cognitive pressure and, as a result, become exhausted and eager to stop the experiment or the use of the HMI.
Conclusion and future works
This paper presents a careful offline study of three gesture classes (8 gestures) among 10 healthy subjects, considering different combinations of logical channels and different fusions of their features and employing SFCM and ANFIS classifiers to discriminate between them. Using ANFIS makes the learning phase of the classifier fast and provides a very high degree of accuracy and robustness. The electrodes can be installed easily on the volunteer's face using a headband, which leads to a quick setup time and makes the system suitable for use as a human–machine interface. In addition, it has been shown that each of the logical channels has its own noticeable information content, and individual gestures (or states) can be explored within them.
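As an illustration of the rule-initialization step mentioned above, the sketch below implements a minimal Chiu-style subtractive clustering in Python. The radius ra, the stopping threshold eps and the assumption that the features are scaled to [0, 1] are ours; in the full method, each returned centre would seed one Takagi–Sugeno rule that ANFIS then tunes.

import numpy as np

def subtractive_clustering(X, ra=0.5, eps=0.15):
    # Potential of each point: density of neighbours within radius ra
    # (alpha and beta follow Chiu's standard choices, with the
    # suppression radius set to 1.5 * ra).
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    potential = np.exp(-alpha * d2).sum(axis=1)
    centres, first_peak = [], potential.max()
    while potential.max() > eps * first_peak:
        c = potential.argmax()
        centres.append(X[c])
        # Suppress potential near the new centre so the next centre
        # is found elsewhere in the feature space.
        potential -= potential[c] * np.exp(-beta * d2[c])
    return np.array(centres)

# Usage with placeholder scaled feature vectors (100 windows x 3 features).
X = np.random.default_rng(2).random((100, 3))
print(subtractive_clustering(X, ra=0.5).shape)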
In this study, fEMG and fEOG signals have been used to discriminate between the mentioned gestures. The main use of fEMG is for detecting facial movements (such as smiling, pulling up the lip corners and frowning). fEOG enhances the feature space and helps the classifiers detect eye movements and, in future approaches, the level of attention. The main use of the fEEG frequency bands could be in future studies, where they could be treated as affective measures that mirror the user's emotional cues, allowing the interface to be designed, updated and reorganized accordingly. Therefore, it can be concluded that the proposed method can serve as a useful interface in human–machine interaction applications, covering both the physical and cognitive aspects of human–machine interface requirements.
Considering the tradeoff between the number of electrodes, the acquired information content, the computation cost and the discriminating power, it is also important to design an interface that is safe, ergonomic, comfortable, and easy to set up and use. The results achieved in this work emphasize that our proposed method complies with these requirements, demonstrating its efficacy in groups of healthy individuals and its potential use for disabled people. By employing the multi-modal capabilities of the presented interface, it could also be used in the rehabilitation phase for people suffering from facial muscle impairments. These people could use the system as a training environment for practicing facial gesture generation, helping them to reactivate their weak muscles.
In our future work, we will extend this interface into an online adaptable and co-adaptive interface for rehabilitation and assistive purposes, especially for the disabled population, who could be the main users of our interface. Furthermore, by using EEG data, the interface could increase context awareness at work sites and prevent possible dangers to the user and his/her working environment. This can be done by detecting the user's feelings and mental states and then performing a proper action (for example, help, alarm, or stop) to pursue and accomplish the task.
Acknowledgments We would like to thank Dr. Christian Jones from USC-Australia for sharing his expertise in the area of affective computing. The help of the eager volunteer participants is also appreciated.
References
1. Huang CN, Chen CH, Chung HY (2004) The review of appli-
cations and measurements in facial electromyography. J Med
Biol Eng 25(1):15–20
2. Ang L, Belen E, Bernardo R, Boongaling E, Briones G, Coronel J
(2004) Facial expression recognition through pattern analysis of
facial movements utilizing electromyogram sensors. In: IEEE
TENCON2004, Chiang Mai, Thailand, pp 600–603
3. Firoozabadi M, Oskoei MA, Hu H (2008) A human–computer
interface based on forehead multichannel biosignals to control a
virtual wheelchair. In: ICBME08, Tehran, Iran
4. Mohammad Rezazadeh I, Wang X, Wang R, Firoozabadi M (2009) Toward affective handsfree human–machine interface approach in virtual environments-based equipment operation training. In: 9th international conference on construction applications of virtual reality (CONVR2009), Sydney, Australia
5. Mohammad Rezazadeh I, Firoozabadi M, Hu H, Hashemi Gol-
payegani MR (2010) Determining the surface electrodes locations
to capture facial bioelectric signals. Iran J Med Phys 7:65–79
6. Mohammad Rezazadeh I, Wang X, Firoozabadi M, Hashemi Golpayegani MR (2011) Using affective human–machine interface to increase the operation performance in virtual construction crane training system: a novel approach. Autom Construct J 20:289–298
7. Mahlke S, Minge M (2006) Emotions and EMG measures of
facial muscles in interactive contexts. In: Conference in human
factor in Eng., Montreal, Canada
8. Rosalind P (1997) Affective computing. MIT Press, Cambridge
9. Vanhala T, Surakka V (2007) Facial activation control (FACE).
In: Proceedings of ACII2007. Lecture notes in computer science,
vol 4738, pp 278–289
10. Ferreira A, Silva RL, Celeste WC, Bastos Filho TF, Filho MS
(2007) Human–machine interface based on muscular and brain
signals applied to a robotic wheelchair. In: 16th Argentine Bio-
eng. Cong. J. Physics, Conference Series 90, Argentina
11. Niemenlehto PH, Juhola M, Surakka V (2006) Detection of
electromyographic signal from facial muscles with neural net-
works. ACM Trans Appl Percept 3(1):48–61
12. Surakka V (1998) Contagion and modulation of human emotions.
University of Tampere, Tampere
13. Tsui C, Jia P, Gan JQ, Hu H, Yuan K (2007) EMG-based hands-
free wheelchair control with EOG attention shift detection. In:
IEEE international conference on robotics and biomimetics RO-
BIO2007, Sanya, China, pp 1266–1271
14. Hori J, Sakano K, Saitoh Y (2004) Development of communi-
cation supporting device controlled by eye movements voluntary
eye blink. In: Proc. the 26th annual international conference of
IEEE EMBS, San Francisco, CA, USA
15. Barreto AB, Scargle SD, Adjouadi M (2000) A practical EMG-
based human–computer interface for users with motor disabili-
ties. J Rehabil Res Dev 37(1):53–64
16. Brainfinger. http://www.brainfinger.com. Accessed 19 March 2011
17. Chin ZY, Ang KK, Guan C (2008) Multiclass voluntary facial
expression classification based on filter bank common spatial
pattern. In: 30th annual international conference of IEEE EMBS,
Vancouver, BC, Canada
18. Kim KH, Yoo JK, Kim HK, Son W, Lee SY (2006) A practical
biosignal-based human interface applicable to the assistive sys-
tems for people with motor impairment. IEICE Trans Inform Syst
E89-D(10):2644–2652
19. Biopac. http://www.biopac.com. Accessed 19 March 2011
20. Oskoei M, Hu H (2007) Myoelectric control systems—a survey.
J Biomed Signal Process Control 2(4):275–294
21. Ajiboye AB, Weir R (2005) A heuristic fuzzy logic approach to
EMG pattern recognition for multi-functional prosthesis control.
IEEE Trans Neural Syst Rehabil Eng 13(3):280–291
22. Fukuda O, Tsuji T, Kaneko M, Otsuka A (2003) A human-
assisting manipulator teleoperated by EMG signals and arm
motions. IEEE Trans Robotics Autom 19(2):210–222
23. Momen K, Krishnan S, Chau T (2007) Real-time classification of
forearm electromyographic signals corresponding to user-selec-
ted intentional movements for multifunction control. IEEE Trans
Neural Syst Rehabil Eng 15(4):535–542
24. Moertini V (2002) Introduction to five data clustering algorithms. Integral 7(2):87–96
25. Priyona A, Ridwan M, Alias A, Atiq R, Rahmat OK, Hassan A,
Ali A (2003) Generation of fuzzy rules with subtractive cluster-
ing. Universiti Teknologi Malaysia. Jurnal Teknologi 43(D):
143–153
26. Polat K, Yosunkaya S, Gunes S (2008) Comparison of different
classifier algorithms on the automated detection of obstructive
sleep apnea syndrome. J Med Syst 32:243–250
27. Khezri M, Jahed M (2007) Real-time intelligent pattern recognition algorithm for surface EMG signals. Biomed Eng Online 6:45
28. Paradiso G, Cunic D, Gunraj C, Chen R (2005) Representation of
facial muscles in human motor cortex. J Physiol 567:323–336
29. Ahsan MD, Ibrahimy M, Khalifa O (2009) EMG signal classifi-
cation for human computer interaction: a review. Eur J Sci Res
33(3):480–501