ORIGINAL ARTICLE
Using Ultrasound to Quantify Tongue Shape and Movement Characteristics
Natalia Zharkova, M.A., Ph.D.
Objective: Previous experimental studies have demonstrated abnormal lingual articulatory patterns characterizing cleft palate speech. Most articulatory information to date has been collected using electropalatography, which records the location and size of tongue-palate contact but not the tongue shape. The latter type of data can be provided by ultrasound. The present paper aims to describe ultrasound tongue imaging as a potential tool for quantitative analysis of tongue function in speakers with cleft palate. A description of the ultrasound technique as applied to analyzing tongue movements is given, followed by the requirements for quantitative analysis. Several measures are described, and example calculations are provided.
Measures: Two measures aim to quantify overuse of tongue dorsum in cleft palate articulations. Crucially for potential clinical applications, these measures do not require head-to-transducer stabilization because both are based on a single tongue curve. The other three measures compare sets of tongue curves, with the aim to quantify the dynamics of tongue displacement, token-to-token variability in tongue position, and the extent of separation between tongue curves for different speech sounds.
Conclusions: All measures can be used to compare tongue function in speakers with cleft palate before and after therapy, as well as to assess their performance against that in typical speakers and to help in selecting more effective treatments.
KEY WORDS: cleft palate, cleft palate speech, lingual articulation, measurement, tongue,
ultrasound
Experimental studies of cleft palate (CP) speech have
suggested that impaired development of tongue function, as
well as structural abnormalities, can limit the ability of the
tongue to make fine adjustments required for producing
fully intelligible speech (Hardcastle et al., 1989). Many CP
error patterns include lingual misarticulation (Grunwell,
1993; Gibbon, 2004). The main articulatory technique that
has been used in research and treatment is electropalatography
(EPG) (Gibbon and Lee, 2011), which records the
location and size of tongue-palate contact but not the
tongue shape. Information on tongue shapes can be
obtained with ultrasound tongue imaging relatively easily
compared with other imaging techniques. Ultrasound has
been used in speech research for the last four decades (see
Lee et al., in press, for a review). Several recent publications
have reported results of visual feedback therapy using
ultrasound (Bernhardt et al., 2003; Bernhardt et al., 2005;
Bacsfalvi et al., 2007) and of qualitative ultrasound analysis
of CP compensatory articulations (Bressmann et al., 2011).
No quantitative ultrasound measurements of CP speech in research or
therapy have been reported yet. Using ultrasound-based
measures of tongue function in addition to established
techniques in speech therapy could help to improve
diagnostic accuracy, which would allow more effective
treatments to be selected. The present paper describes
several measures based on tongue contours that could be
used to quantify abnormal tongue patterns in CP speech.
Ultrasound does not require anything to be inserted into
the speaker’s mouth. When the transducer is placed below
the chin, an image of the tongue outline is displayed on the
screen (see Fig. 1). The image in the figure is midsagittal:
the area just below the bright white line is the tongue body,
and the anterior part of the tongue is on the right. The
shadows of the hyoid bone and of the mandible are shown
as dark areas. The tongue tip may not always be visible on
ultrasound scans because it can be obscured by the air
below it or by the shadow of the mandible. When scanning
the tongue in speech, ultrasound does not normally image
any structures in the vocal tract other than the tongue, so it
is impossible to use any of these structures as a constantly
present reference for quantitative analysis.
Dr. Zharkova is Research Fellow, Clinical Audiology, Speech and
Language Research Centre, Queen Margaret University, Edinburgh,
United Kingdom.
Supported by an Economic and Social Research Council (ESRC)
research grant RES-000-22-4075. The data used in the paper were
collected by the author within her Ph.D. project (Zharkova, 2007),
supported by a Ph.D. studentship from Queen Margaret University
College (2003 to 2006) and ESRC research grants RES-000-22-2833 (2008
to 2009) and RES-000-22-4075 (2010 to 2011) to the author.
Submitted August 2011; Revised October 2011; Accepted November 2011.
Address correspondence to: Dr. Natalia Zharkova, Clinical Audiology,
Speech and Language Research Centre, Queen Margaret University,
Queen Margaret University Drive, Musselburgh EH21 6UU, East
Lothian, UK. E-mail [email protected].
DOI: 10.1597/11-196
The Cleft Palate-Craniofacial Journal 50(1) pp. 76–81 January 2013. Copyright 2013 American Cleft Palate-Craniofacial Association
To quantify the difference between two or more tongue curves,
the curves need to be within the same coordinate system. For a
given speaker, consistent positioning of the tongue in the
same coordinate space can be achieved by stabilizing the
ultrasound transducer in relation to the head over a
number of repetitions (Stone, 2005). It would be a considerable
advantage for therapy if meaningful measurements could
be taken without the need for head-to-transducer stabilization.
One way to make such measurements is to use data
from a single tongue curve, rather than from multiple
curves. The next section proposes two simple measures
relevant to CP speech, which are based on a single curve.
MEASURES BASED ON A SINGLE TONGUE CURVE
Many errors in CP speech can be explained by an
abnormal position of the tongue, largely confined to the
upper posterior area of the oral space, and overuse of the
tongue dorsum in articulations (Lawrence and Philips,
1975; see also Hardcastle et al., 1989). This behavior of the
tongue has been termed lingual assistance by Trost (1981)
because the high tongue body posture can help a speaker
with velopharyngeal insufficiency to produce consonants
that require buildup of pressure inside the oral cavity. The
raised and retracted tongue position has generally been
attributed to a habit developed before surgery. Compensa-
tory movements of the tongue are usually detected by
auditory analysis and/or transcription, both of which are
subjective methods. Cleft palate speech contains notori-
ously complex phonetic material, which leads to low
intertranscriber agreement (Howard and Heselwood,
2002). Important drawbacks of x-ray and EPG, both of
which have also been used to detect compensatory
articulations in CP speech (Gibbon and Lee, 2011), are
that the former is hazardous to health and the latter
requires patients to wear artificial palates. Ultrasound can
overcome limitations of all these methods. Bressmann et al.
(2011) carried out a qualitative ultrasound analysis of
midsagittal tongue contours in five speakers with CP
and concluded that “ultrasound provides useful diagnostic
information for the analysis of cleft-type compensatory
articulations” (p. 4). Importantly, ultrasound can also be used
with infants and toddlers (Suzuki et
al., 2006; Gick et al., 2008) when the transducer is handheld.
Two measures described below aim to directly assess the
extent of tongue dorsum involvement in articulation.
Crucially for clinical applications, neither measure
requires head-to-transducer stabilization because both are
based on a single tongue image. The scans necessary for
both measures can be obtained in the clinic, even with a
small portable ultrasound scanner. During the recording it
must be ensured that the entirety of the curve located
between the shadow of the hyoid bone and the shadow of
the mandible is present in the image for the speech sound(s)
to be analyzed. The tongue contour between the two
shadows is then traced and represented as a series of
x-y points. The calculations are performed using R (R
Development Core Team, 2011). Each measure is based on
a series of x-y points for one tongue curve. Euclidean
distances are used for calculations. Illustrations of mea-
surements are provided in Figure 2 for two consonants, /k/
and /t/ (from the syllables /ka/ and /ta/, respectively), and
the vowel /a/, all produced by the same adult male speaker
of Southern British English, without speech disorders. The
data were collected after ethical approval was
obtained following standard procedures at Queen Margaret
University. Informed consent was obtained from all adult
speakers and parents/caregivers of child speakers.
Dorsum Excursion Index (DEI). A straight line N is
traced between the ends of the curve for a given speech
sound, and the midpoint of N is located. Perpendicular lines to N
are traced from each point on the tongue curve. The
perpendicular line that crosses N at the midpoint of N is
identified (the solid line D). D represents the extent of the
dorsum excursion in relation to the tongue front and back.
The point at which D crosses N is labeled “1” on the
graphs. DEI is computed as the ratio of the length of D to the length of N. The
greater the value of DEI, the greater the tongue dorsum
excursion and, in the case of CP speech, the greater its
potential overuse. In the examples, DEI is 0.50 for /k/, 0.26
for /t/, and 0.33 for /a/. The dorsum is most raised for /k/.
The consonant /t/ requires raising of the tongue tip and blade,
whereas the dorsum during /t/ in this vowel context is
rather low for typical speakers of English, hence the lower
DEI value for /t/. It needs to be pointed out that D and
consequently DEI could potentially be slightly affected (in
most cases reduced) if some of the anterior tongue is
missing from the ultrasound image, as may be the case in
speech sounds that require the tongue tip to be high and
advanced. However, any such influence would be too subtle
to affect larger scale differences due to the tongue dorsum
position change. For example, using the data from Figure 2
and performing calculations on the portion of the tongue
curve not including 1 cm at the front, DEI is 0.46 for /k/,
0.22 for /t/, and 0.31 for /a/.
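The DEI calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's R code: the curve is assumed to be a list of (x, y) points ordered from one end of the traced contour to the other, and D is approximated by the traced point whose perpendicular meets N closest to the midpoint of N (no interpolation between points). The function name and the example points are hypothetical.

```python
import math

def dorsum_excursion_index(curve):
    """Return (DEI, length of D, length of N) for one tongue curve.

    curve: list of (x, y) points ordered along the traced contour.
    """
    (x0, y0), (x1, y1) = curve[0], curve[-1]
    n_len = math.hypot(x1 - x0, y1 - y0)            # length of line N
    ux, uy = (x1 - x0) / n_len, (y1 - y0) / n_len   # unit vector along N

    def foot_and_height(point):
        dx, dy = point[0] - x0, point[1] - y0
        t = dx * ux + dy * uy          # where the perpendicular meets N
        h = abs(dx * uy - dy * ux)     # perpendicular distance to N
        return t, h

    # Assumption: pick the traced point whose perpendicular foot is
    # nearest to the midpoint of N, and treat its height as D.
    _, d_len = min(
        (foot_and_height(p) for p in curve),
        key=lambda th: abs(th[0] - n_len / 2),
    )
    return d_len / n_len, d_len, n_len

# Hypothetical, symmetric curve used only to illustrate the call.
example_curve = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)]
dei, d_len, n_len = dorsum_excursion_index(example_curve)
```

With densely traced contours the nearest-point approximation should differ little from an interpolated D, but that has not been checked against the values reported in the text.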
FIGURE 1 An ultrasound image taken at the middle of the consonant /s/
from /sa/, produced by a boy aged 10 years 10 months, a speaker of Standard
Scottish English. The anterior part of the tongue is on the right in all
the figures.
Tongue Constraint Position Index (TCPI). When the tongue dorsum is raised to the position for /k/, in typical
speakers it is expected to be the furthest part of the tongue
from N. The longest line between the tongue curve and N,
parallel to D, is referred to as L (the dashed line). The point
at which L crosses N is labeled “2” on the graphs. The
TCPI is the proportion of (N/2) taken by the distance
between D and L (i.e., the distance between points 1 and 2
on the graphs). The greater this distance, the less likely the
dorsum is to be the active articulator. Positive and negative
TCPI values mean, respectively, that the most constrained
part of the tongue is further forward or further back in the
mouth. In the examples, TCPI is 0.03 for /k/, 0.19 for /t/,
and −0.12 for /a/.
During speech sounds not involving the dorsum as active
articulator, such as alveolar consonants, speakers with CP
who overuse the tongue dorsum are expected to have higher
DEI values and lower (more negative) TCPI values
than speakers with no speech disorders or speakers with CP who do not overuse the tongue dorsum.
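A minimal Python sketch of the TCPI calculation follows, again as an illustration rather than the author's implementation. It assumes the curve points are ordered from the posterior end (hyoid shadow) to the anterior end (mandible shadow), so that a positive result means the point furthest from N lies forward of the midpoint. The function name and example curves are hypothetical.

```python
import math

def tongue_constraint_position_index(curve):
    """Return TCPI for one tongue curve.

    curve: list of (x, y) points, assumed ordered posterior to anterior.
    """
    (x0, y0), (x1, y1) = curve[0], curve[-1]
    n_len = math.hypot(x1 - x0, y1 - y0)            # length of line N
    ux, uy = (x1 - x0) / n_len, (y1 - y0) / n_len   # unit vector along N

    best_t, best_h = 0.0, -1.0
    for x, y in curve:
        dx, dy = x - x0, y - y0
        t = dx * ux + dy * uy          # foot of the perpendicular on N
        h = abs(dx * uy - dy * ux)     # perpendicular distance to N
        if h > best_h:                 # keep the longest perpendicular (line L)
            best_t, best_h = t, h

    half = n_len / 2                   # point 1 (foot of D) sits at N/2
    return (best_t - half) / half      # signed distance between points 1 and 2

# Hypothetical curves: peak at the midpoint, forward of it, and behind it.
centered = tongue_constraint_position_index([(0, 0), (1, 1), (2, 2), (3, 1), (4, 0)])
fronted = tongue_constraint_position_index([(0, 0), (1, 1), (2, 2), (3, 3), (4, 0)])
backed = tongue_constraint_position_index([(0, 0), (1, 3), (2, 2), (3, 1), (4, 0)])
```

If a contour is traced front to back instead, the sign of the result would need to be reversed to match the convention in the text.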
MEASURES BASED ON SETS OF TONGUE CURVES
Evidence from tongue-palate contact patterns suggests
that speakers with CP may have reduced tongue shape
range and complexity (for a review of EPG studies
reporting abnormal tongue patterns, see Gibbon, 2004).
Cleft palate speech is also characterized by high within-
speaker variability (Yamashita et al., 1992; Howard, 2004;
Bressmann et al., 2011). Three measures described below
aim to quantify variation in tongue shape and position, in
space and over time. These measures are based on
comparing sets of curves, so they require head-to-transducer
stabilization. In the absence of a stabilizing device,
signal processing procedures need to be applied after
recording (Mielke et al., 2005), which would place tongue
curve data from different repetitions or from multiple
sessions of the same speaker in a single coordinate system.
Tongue Dynamics. When two consecutive sounds require
contrasting tongue positions in the midsagittal plane, a
certain amount of tongue movement is expected in typical
speakers. For example, the transition between the conso-
nants /k/ and /l/ in the word clown requires simultaneous
raising of the tongue tip and lowering of the tongue
dorsum. Speakers with CP may substitute glottal or
pharyngeal articulations for lingual articulations. This
may be one of the reasons why proportionately less
movement and/or abnormal timing could be expected
in CP speakers (Gibbon, 2004). Figures 3A and 3B show
tongue contours throughout the consonant /s/ (at equal
intervals, 10 milliseconds each), from the sentence “It’s a sea, Pam,” produced by an adult female without speech
disorders and a typically developing boy aged 10 years
11 months, respectively. Visual observation suggests that in
the adult, more movement occurs in the second half of /s/
than in the first half, whereas the rate of tongue movement
during the consonant in the child is more even. Using the
nearest-neighbor distance method (Zharkova and Hewlett,
2009), mean nearest-neighbor distances are calculated
between consecutive pairs of tongue curves in Python
(Lutz, 2008).
FIGURE 2 Single tongue curves from the middle of three speech sounds.
A: /k/. B: /t/. C: /a/. All three sounds were produced by the same speaker.
Individual data points in the tongue curves are plotted as empty circles.
Explanations for the labels are provided in the text.
In order to achieve normalization for time
across speakers, x-y values for the tongue curve data from
the same number of curves (17 in this example) are used for
each speaker. An array or a sum of these distances can be
analyzed, to provide, respectively, information on the
dynamics of tongue displacement or a number representing
the total amount of tongue travel. In this example, for
establishing whether the difference in tongue movement
rate in the two halves of the consonant between the two
speakers was statistically significant, an analysis of vari-
ance (ANOVA) was carried out, comparing the amount
of tongue displacement in 16 individual intervals with
independent variables Speaker (adult versus child) and Half
(first versus second). Table 1 presents the sum of displace-
ments, in millimeters, in the first and the last eight intervals
of /s/ for each speaker. The ANOVA showed a significant
interaction between the two independent variables ($F_{1,28} = 6.34$,
$p < .05$), suggesting that the tongue displacement
during /s/ in the adult was indeed less even than in the child.
Table 1 shows that the absolute total amount of tongue
travel during /s/ was greater in the adult than in the child.
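The displacement series and total tongue travel described above can be sketched as follows. The text does not spell out the nearest-neighbor formula, so this sketch assumes a symmetric version: for every point on each curve, the distance to the closest point on the other curve is taken, and all such distances are averaged. Function names and the example curves are hypothetical.

```python
import math

def mean_nn_distance(curve_a, curve_b):
    """Mean nearest-neighbor distance between two curves (assumed symmetric form)."""
    def one_way(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]
    distances = one_way(curve_a, curve_b) + one_way(curve_b, curve_a)
    return sum(distances) / len(distances)

def displacement_series(curves):
    """Distances between consecutive curves: n curves give n - 1 intervals."""
    return [mean_nn_distance(c0, c1) for c0, c1 in zip(curves, curves[1:])]

# Hypothetical flat "curves" shifted vertically, for illustration only.
frames = [
    [(0, 0), (1, 0), (2, 0)],
    [(0, 1), (1, 1), (2, 1)],
    [(0, 3), (1, 3), (2, 3)],
]
series = displacement_series(frames)   # dynamics of tongue displacement
total_travel = sum(series)             # total amount of tongue travel
```

Applied to 17 curves per speaker, as in the example in the text, `displacement_series` would return the 16 interval values that enter the ANOVA.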
For comparing the total distance traveled by the tongue
during /s/ across speakers, a normalization of all distance
values for vocal tract size needs to be carried out (the
procedure is described in greater detail in Zharkova et al.,
2011). The normalization procedure consists of two steps.
First, tongue length values, in millimeters, are calculated
for all speakers. The measurement is carried out on the
imaged tongue surface between the shadow of the mandible
and the shadow of the hyoid bone. In order to minimize
any possible differences across speakers in the imaged
tongue length, this measurement needs to be consistently
carried out on the tongue contours of an open vowel (/a/).
When producing such a vowel, the tongue tip is low and
therefore likely to be present in the ultrasound image; the
larynx is also relatively low, thus maximizing the imaged
tongue contour to the front of the hyoid bone shadow. The
speaker with the greatest length of imaged tongue surface is
identified, and the tongue length value for each speaker is
expressed as a proportion of this length. In the example
from Figure 3, the tongue length for the adult is 80.30 mm,
and the tongue length for the child is 61.71 mm.
The proportionate tongue length value for the adult is 1, and
for the child it is 0.77. In the second step, the total distance
values are divided by the proportionate tongue length
values, separately for each speaker. The resulting numbers
are 9.74 mm for the adult and 6.88 mm for the child.
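The two normalization steps can be reproduced with the values given in the text. The sketch below is illustrative; the function name is hypothetical, and rounding the proportionate tongue length to two decimals before dividing is an assumption made so that the result matches the reported 6.88 mm.

```python
def normalize_travel(total_travel_mm, tongue_length_mm):
    """Divide each speaker's total travel by proportionate tongue length."""
    longest = max(tongue_length_mm.values())
    normalized = {}
    for speaker, travel in total_travel_mm.items():
        # Step 1: tongue length as a proportion of the longest tongue
        # (rounded to two decimals; an assumption, see the lead-in).
        proportion = round(tongue_length_mm[speaker] / longest, 2)
        # Step 2: total distance divided by the proportionate length.
        normalized[speaker] = round(travel / proportion, 2)
    return normalized

# Values taken from the text and Table 1.
lengths = {"adult": 80.30, "child": 61.71}
travel = {"adult": 9.74, "child": 5.30}
normalized = normalize_travel(travel, lengths)
```

Here the child's proportion is 61.71 / 80.30, which rounds to 0.77, and 5.30 / 0.77 rounds to 6.88.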
Variability. Ultrasound data capture variability in
tongue position across and within speakers very accurately,
unlike perceptual analysis. The measure of token-to-token
variability illustrated below is based on distances between
tongue curves from a number of repetitions of the same
speech sound. A more detailed description of this measure
can be found in Zharkova et al. (2011). Figures 4A and 4B
display tongue curves from the consonants /s/ and /ʃ/, produced by a woman and by a boy aged 6 years 4 months, respectively.
Both speakers have no known speech disorders. The curves
for /s/ are more tightly packed together in the adult than in
the child. In order to quantify this difference, mean nearest-
neighbor distances are calculated between all /s/ curves,
separately for each speaker. They are referred to as within-
set distances (WS). The number of WS distances equals
$(M \times (M - 1))/2$, where $M$ is the number of curves in a set.
In this example, there are 10 repetitions of each consonant,
so we obtain 45 WS distances for each speaker. For the
adult, the mean WS is 0.94 mm, and for the child it is
1.74 mm.
FIGURE 3 Tracings of tongue curves in speakers of Standard Scottish
English; successive tongue curves over the consonant /s/ from /si/. A: Adult
data; the first 11 tongue contours are in solid lines, the last 10 tongue
contours are in dotted lines. B: Child data; the first nine tongue contours
are in solid lines, the last nine tongue contours are in dotted lines.
TABLE 1 Amount of Tongue Travel (in mm) During the First
and Second Halves of /s/, as Well as the Total Amount of
Tongue Travel During /s/, in Two Speakers
                      Adult   Child
First half of /s/     2.83    1.98
Second half of /s/    6.91    3.32
Total amount          9.74    5.30
Given that the tongue is substantially shorter in
the child, we would expect smaller WS values in the child
than in the adult if the two speakers had the same extent of
token-to-token variability. Thus, obtaining significantly
greater WS distances for the child allows us to conclude
that there is more token-to-token variability in the child
than in the adult, and we do not need to carry out
normalization for differences in tongue length.
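The within-set calculation can be sketched as below. As before, the symmetric nearest-neighbor formula is an assumption, and the function names and synthetic curves are hypothetical; only the pair count is taken from the text.

```python
import math
from itertools import combinations

def mean_nn_distance(curve_a, curve_b):
    """Mean nearest-neighbor distance between two curves (assumed symmetric form)."""
    def one_way(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]
    distances = one_way(curve_a, curve_b) + one_way(curve_b, curve_a)
    return sum(distances) / len(distances)

def within_set_distances(curves):
    """One distance per unordered pair of curves: M * (M - 1) / 2 values."""
    return [mean_nn_distance(a, b) for a, b in combinations(curves, 2)]

# Ten synthetic repetitions (hypothetical data), each a flat 5-point curve.
repetitions = [[(x, 0.1 * k) for x in range(5)] for k in range(10)]
ws = within_set_distances(repetitions)
mean_ws = sum(ws) / len(ws)
```

With 10 repetitions, `ws` holds the 45 values mentioned in the text; `mean_ws` corresponds to the mean WS reported for each speaker.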
Separation of Tongue Curves. The ability to separate
tongue postures for different sounds can be crucial for
making perceptually important contrasts, such as between
the consonants /s/ and /ʃ/. In English, /ʃ/ has a higher and
more fronted tongue blade position than /s/. The sets of
curves for /s/ and /ʃ/ in the child (Fig. 4B) seem to be less
separated than those in the adult (Fig. 4A). Mean nearest-
neighbor distances between each curve in one set and all
the curves in the other set (across-set distances, AS) are
calculated, separately for each speaker. The number of AS
distances equals $M \times M$, where $M$ is the number of curves
in a set. In this example, we obtain 100 mean AS distances
for each speaker. The mean AS value is 6.69 mm for the
adult, and 4.50 mm for the child. The greater the AS
distances, the more separation there is between the tongue
curves for /s/ and /ʃ/. However, greater token-to-token
variability, as in the child /s/, also increases AS distances.
To account for any such influence, AS distances between
/s/ and /ʃ/ are divided by the average of /s/ and /ʃ/ WS
distances (to simplify the explanation, only the means
were used in this example calculation). The mean values,
representing the extent of separation between /s/ and /ʃ/ curves, are 6.93 for the adult and 2.88 for the child.
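The across-set distances and the separation ratio can be sketched as follows, using means only, as in the example calculation above. The symmetric nearest-neighbor formula is an assumption, and all function names and data are hypothetical.

```python
import math
from itertools import combinations

def mean_nn_distance(curve_a, curve_b):
    """Mean nearest-neighbor distance between two curves (assumed symmetric form)."""
    def one_way(src, dst):
        return [min(math.dist(p, q) for q in dst) for p in src]
    distances = one_way(curve_a, curve_b) + one_way(curve_b, curve_a)
    return sum(distances) / len(distances)

def across_set_distances(set_a, set_b):
    """One distance per cross-set pair: M * M values for two sets of M curves."""
    return [mean_nn_distance(a, b) for a in set_a for b in set_b]

def mean_within_set(curves):
    ws = [mean_nn_distance(a, b) for a, b in combinations(curves, 2)]
    return sum(ws) / len(ws)

def separation(set_a, set_b):
    """Mean AS distance divided by the average of the two mean WS distances."""
    across = across_set_distances(set_a, set_b)
    mean_as = sum(across) / len(across)
    mean_ws = (mean_within_set(set_a) + mean_within_set(set_b)) / 2
    return mean_as / mean_ws

# Hypothetical sets of flat curves, far apart relative to their spread.
set_s = [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]]
set_sh = [[(0, 10), (1, 10), (2, 10)], [(0, 11), (1, 11), (2, 11)]]
ratio = separation(set_s, set_sh)
```

A larger ratio indicates that the two sets are far apart relative to their token-to-token variability, which is the sense in which the adult value exceeds the child value in the text.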
The measures described in the present paper could be
used to compare tongue movement patterns in people with
and without CP, to compare tongue positions before and
after intervention, or to assess whether reduction in
variability after therapy has occurred. Research studies
need to be carried out in order to provide a sufficient
amount of data from speakers without CP, as well as data
from speakers with CP who do not have speech disorders.
Several other ultrasound-based techniques for lingual
articulation analysis could potentially be applied to CP
speech. They include measures of midline tongue grooving,
anteriority in the midsagittal plane, and asymmetry in the
coronal plane (Bressmann et al., 2005), and the smoothing
spline ANOVA technique aimed at identifying parts of the
tongue where significant differences between two sets of
curves occur (Davidson, 2006; see also Mielke et al., 2010,
where the technique was adapted to assess the extent of
differences between sets of curves). The “differentiation
index,” addressing complexity in tongue shape (Gick et al.,
2008), could also be applied to CP articulations, particu-
larly where glottal or pharyngeal substitutions result in the
lack of tongue shape complexity. All these methods could
be used to complement the measures described above.
Acknowledgments. I am grateful to Fiona Gibbon and two anonymous
reviewers for extremely helpful comments and suggestions on the
manuscript.
REFERENCES
Bacsfalvi P, Bernhardt B, Gick B. Electropalatography and ultrasound in
vowel remediation for adolescents with hearing impairment. Adv
Speech Lang Pathol. 2007;9:36–45.
Bernhardt B, Gick B, Bacsfalvi P, Adler-Bock M. Ultrasound in speech
therapy with adolescents and adults. Clin Linguist Phon. 2005;
19:605–617.
Bernhardt B, Gick B, Bacsfalvi P, Ashdown J. Speech habilitation of hard
of hearing adolescents using electropalatography and ultrasound as
evaluated by trained listeners. Clin Linguist Phon. 2003;17:199–216.
Bressmann T, Radovanovic B, Kulkarni GV, Klaiman P, Fisher D. An
ultrasonographic investigation of cleft-type compensatory articulations
of voiceless velar stops [online publication ahead of print]. Clin Linguist
Phon. 2011;25:1028–1033.
Bressmann T, Thind P, Bollig CM, Uy C, Gilbert RW, Irish JC. Quantitative
three-dimensional ultrasound analysis of tongue protrusion, grooving and
symmetry: data from twelve normal speakers and a partial glossectomee.
Clin Linguist Phon. 2005;19:573–588.
Davidson L. Comparing tongue shapes from ultrasound imaging using
smoothing spline analysis of variance. J Acoust Soc Am. 2006;120:
407–415.
Gibbon F. Abnormal patterns of tongue-palate contact in the speech of
individuals with cleft palate. Clin Linguist Phon. 2004;18:285–311.
FIGURE 4 Tracings of tongue curves in speakers of Standard Scottish
English; tongue curves at midconsonant from 10 repetitions of the
consonant-/a/ syllable (/s/, solid lines; /ʃ/, dotted lines). A: Adult data. B:
Child data. The plots are based on recordings from the database of child and
adult lingual articulations (Zharkova, 2009).
Gibbon F, Lee A. Articulation—instruments for research and clinical
practice. In: Howard S, Lohmander A, eds. Cleft Palate Speech:
Assessment and Intervention. Oxford: Wiley-Blackwell; 2011:221–238.
Gick B, Bacsfalvi P, Bernhardt BM, Oh S, Stolar S, Wilson I. A motor
differentiation model for liquid substitutions: English /r/ variants
in normal and disordered acquisition. Proceedings of Meetings on
Acoustics. 2008;1,060003:1–9.
Grunwell P, ed. Analysing Cleft Palate Speech. London: Whurr
Publishers; 1993.
Hardcastle W, Morgan Barry R, Nunn M. Instrumental articulatory
phonetics in assessment and remediation: case studies with the
electropalatograph. In: Stengelhofen J, ed. Cleft Palate: The Nature
and Remediation of Communication Problems. Edinburgh: Churchill
Livingstone; 1989:136–164.
Howard SJ. Compensatory articulatory behaviours in adolescents with
cleft palate: comparing the perceptual and instrumental evidence. Clin
Linguist Phon. 2004;18:313–340.
Howard SJ, Heselwood BC. Learning and teaching phonetic transcription
for clinical purposes. Clin Linguist Phon. 2002;16:371–401.
Lawrence CW, Philips BJ. A telefluoroscopic study of lingual contacts
made by persons with palatal defects. Cleft Palate J. 1975;12:85–94.
Lee A, Zharkova N, Gibbon F. Vowel imaging. In: Ball M, Gibbon F,
eds. Handbook of Vowels and Vowel Disorders. 2nd ed. Psychology
Press; In press.
Lutz M. Learning Python. 3rd ed. Beijing: O’Reilly Media; 2008.
Mielke J, Baker A, Archangeli D. Variability and homogeneity in
American English /ɹ/ allophony and /s/ retraction. In: Fougeron C,
Kühnert B, D’Imperio M, Vallée N, eds. Laboratory Phonology 10.
New York: Walter de Gruyter; 2010:699–730.
Mielke J, Baker A, Archangeli D, Racy S. Palatron: a technique for
aligning ultrasound images of the tongue and palate. Coyote Papers.
2005;14:96–107.
R Development Core Team. R: a language and environment for statistical
computing. Vienna: R Foundation for Statistical Computing; 2011.
Available at http://www.R-project.org. Accessed August 9, 2011.
Stone M. A guide to analyzing tongue motion from ultrasound images.
Clin Linguist Phon. 2005;19:455–501.
Suzuki K, Yamazaki Y, Sezaki K, Nakakita N. The effect of preoperative
use of an orthopedic plate on articulatory function in children with
cleft lip and palate. Cleft Palate Craniofac J. 2006;43:406–414.
Trost JE. Articulatory additions to the classical description of the speech
of persons with cleft palate. Cleft Palate J. 1981;18:193–203.
Yamashita Y, Michi K-I, Imai S, Suzuki N, Yoshida H. Electropalato-
graphic investigation of abnormal lingual-palatal contact patterns in
cleft palate patients. Clin Linguist Phon. 1992;6:201–217.
Zharkova N. An Investigation of Coarticulation Resistance in Speech
Production Using Ultrasound [PhD thesis]. Queen Margaret University,
Edinburgh; 2007.
Zharkova N. An ultrasound/acoustic database of lingual articulation in
children and adults [computer database]. Colchester, Essex: UK Data
Archive, UKDAstore, September 2009; UKDA-store record number
280.
Zharkova N, Hewlett N. Measuring lingual coarticulation from midsag-
ittal tongue contours: description and example calculations using
English /t/ and /a/. Journal of Phonetics. 2009;37:248–256.
Zharkova N, Hewlett N, Hardcastle WJ. Coarticulation as an indicator of
speech motor control development in children: an ultrasound study.
Motor Control. 2011;15:118–140.