
1986_Levinson_Maximum Likelihood Estimation for Multivariate Mixture Observations of Markov Chains

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-32, NO. 2, MARCH 1986, p. 307

Maximum Likelihood Estimation for Multivariate Mixture Observations of Markov Chains

B. H. JUANG, STEPHEN E. LEVINSON, SENIOR MEMBER, IEEE, AND M. M. SONDHI

Abstract: To use probabilistic functions of a Markov chain to model certain parameterizations of the speech signal, we extend an estimation technique of Liporace to the cases of multivariate mixtures, such as Gaussian sums, and products of mixtures. We also show how these problems relate to Liporace's original framework.

Manuscript received September 10, 1984; revised July 12, 1985. This work was presented at the 1985 IEEE International Symposium on Information Theory, Brighton, England, June 24-28.
The authors are with the Acoustics Research Department, Bell Laboratories, Murray Hill, NJ 07974.
IEEE Log Number 8406633.
0018-9448/86/0300-0307$01.00 © 1986 IEEE

INTRODUCTION

In a recently published paper, Liporace [6] derived a method for estimating the parameters of a broad class of elliptically symmetric probabilistic functions of a Markov chain. The corollary to that work presented here was motivated by the desire to use this general technique to model the speech signal, for which it is known [4], [8] that, unfortunately, certain of its most useful parameterizations do not possess the prescribed symmetry. Since any continuous probability density function can be approximated arbitrarily closely by a normal mixture [9], it is reasonable to use such constructs to avoid the restrictions imposed by the requirement of elliptical symmetry. In this correspondence we adapt the method and proof of [6] to two types of mixture densities.

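NOMENCLATURE

Throughout this presentation we shall, where possible, adopt the notation used in [6]. Consider an unobservable $n$-state Markov chain with state transition matrix $A = [a_{ij}]_{n \times n}$. Associated with each state $j$ of the hidden Markov chain is a probability density function $b_j(x)$ of the observed $d$-dimensional random vector $x$. Here we shall consider densities of the form

$$b_j(x) = \sum_{k=1}^{m} c_{jk}\, \mathcal{N}(x, \mu_{jk}, U_{jk}) \tag{1}$$

where $m$ is known; $c_{jk} \ge 0$ for $1 \le j \le n$, $1 \le k \le m$; $\sum_{k=1}^{m} c_{jk} = 1$ for $1 \le j \le n$; and $\mathcal{N}(x, \mu, U)$ denotes the $d$-dimensional normal density function of mean vector $\mu$ and covariance matrix $U$.

It is convenient then to think of our hidden Markov chains as being defined over a parameter manifold $\Lambda = \{\mathcal{A}^{n} \times \mathcal{C}^{mn} \times \mathcal{R}^{d} \times \mathcal{U}^{d}\}$, where $\mathcal{A}^{n}$ is the set of all $n \times n$ row-wise stochastic matrices; $\mathcal{C}^{mn}$ is the set of all $m \times n$ row-wise stochastic matrices; $\mathcal{R}^{d}$ is the usual $d$-dimensional Euclidean space; and $\mathcal{U}^{d}$ is the set of all $d \times d$ real symmetric positive definite matrices. Then for a given sequence of observations, $O = O_1, O_2, \ldots, O_T$, of the vector $x$ and a particular choice of parameter values $\lambda \in \Lambda$, we can efficiently evaluate the likelihood function, $L_\lambda(O)$, of the hidden Markov chain by the forward-backward method of Baum [1].

The forward and backward partial likelihoods, $\alpha_t(j)$ and $\beta_t(i)$, are computed recursively from

$$\alpha_t(j) = \left[\sum_{i=1}^{n} \alpha_{t-1}(i)\, a_{ij}\right] b_j(O_t) \tag{2}$$

and

$$\beta_t(i) = \sum_{j=1}^{n} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{3}$$

respectively. The recursion is initialized by setting $\alpha_0(1) = 1$, $\alpha_0(j) = 0$ for $2 \le j \le n$ and $\beta_T(i) = 1$ for $1 \le i \le n$, whereupon we may write

$$L_\lambda(O) = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j) \tag{4}$$

for any $t$ between 1 and $T - 1$.

THE ESTIMATION ALGORITHM

The parameter estimation problem is then one of maximizing $L_\lambda(O)$ with respect to $\lambda$ for a given $O$. One way to maximize $L_\lambda$ is to use conventional methods of constrained optimization. Liporace, on the other hand, advocates a reestimation technique analogous to that of Baum et al. [1], [2]. It is essentially a mapping $\mathcal{T}: \Lambda \to \Lambda$ with the property that

$$L_{\mathcal{T}(\lambda)}(O) \ge L_\lambda(O) \tag{5}$$

with equality if and only if $\nabla_\lambda L_\lambda(O) = 0$. Thus a recursive application of $\mathcal{T}$ to some value of $\lambda$ converges to a local maximum (or possibly an inflection point) of the likelihood function. Liporace's result relaxed the original requirement of Baum et al. [2] that $b_j(x)$ be strictly log concave to the requirement that it be strictly log concave and/or elliptically symmetric. We will further extend the class of admissible pdf's to mixtures and products of mixtures of strictly log concave and/or elliptically symmetric densities.

For the present problem we will show that a suitable mapping $\mathcal{T}$ is given by the following equations:

$$\bar a_{ij} = \mathcal{T}(a_{ij}) = \frac{\sum_{t=1}^{T-1} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{t=1}^{T-1} \alpha_t(i)\, \beta_t(i)} \tag{6}$$

$$\bar c_{jk} = \mathcal{T}(c_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j,k)\, \beta_t(j)}{\sum_{t=1}^{T} \alpha_t(j)\, \beta_t(j)} \tag{7}$$

$$\bar \mu_{jk} = \mathcal{T}(\mu_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j,k)\, \beta_t(j)\, O_t}{\sum_{t=1}^{T} \rho_t(j,k)\, \beta_t(j)} \tag{8}$$

and

$$\bar U_{jk} = \mathcal{T}(U_{jk}) = \frac{\sum_{t=1}^{T} \rho_t(j,k)\, \beta_t(j)\, (O_t - \mu_{jk})(O_t - \mu_{jk})'}{\sum_{t=1}^{T} \rho_t(j,k)\, \beta_t(j)} \tag{9}$$

for $1 \le i, j \le n$ and $1 \le k \le m$.

In (7)-(9),

$$\rho_t(j,k) = \begin{cases} a_{1j}\, c_{jk} \left.\dfrac{\partial b_j}{\partial c_{jk}}\right|_{O_1}, & \text{for } t = 1 \\[2ex] \displaystyle\sum_{i=1}^{n} \alpha_{t-1}(i)\, a_{ij}\, c_{jk} \left.\dfrac{\partial b_j}{\partial c_{jk}}\right|_{O_t}, & \text{for } 1 < t \le T. \end{cases} \tag{10}$$

(For fixed $k$, $\rho_t(j,k)$ is formally identical to $\rho_t(j)$, as defined by Liporace.)

Proof of the Formulas: Equations (6) and (7) for the reestimation of $a_{ij}$ and $c_{jk}$ follow directly from a theorem of Baum and Sell [3] because the likelihood function $L_\lambda(O)$ is a polynomial with nonnegative coefficients in the variables $a_{ij}$, $c_{jk}$, $1 \le i, j \le n$, $1 \le k \le m$.

To prove (8) and (9) our strategy, following Liporace, is to define an appropriate auxiliary function $Q(\lambda, \bar\lambda)$. This function will have the property that $Q(\lambda, \bar\lambda) > Q(\lambda, \lambda)$ implies $L_{\bar\lambda}(O) > L_\lambda(O)$. Further, as a function of $\bar\lambda$ for any fixed $\lambda$, $Q(\lambda, \bar\lambda)$ will have a unique global maximum given by (6)-(9).

As a first step to derive such a function we express the likelihood function as a sum over the set, $\mathcal{S}$, of all state sequences $S = (s_1, s_2, \ldots, s_T)$:

$$L_\lambda(O) = \sum_{S \in \mathcal{S}} L_\lambda(O, S) = \sum_{S \in \mathcal{S}} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(O_t) \tag{11}$$

where $s_0 = 1$, in accordance with the initialization of (2).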


Let us partition the likelihood function further by choosing a particular sequence, $K = (k_1, k_2, \ldots, k_T)$, of mixture densities. As in the case of state sequences we denote the set of all mixture sequences as $\mathcal{K} = \{1, 2, \ldots, m\}^T$. Thus for some particular $K \in \mathcal{K}$ we can write the joint likelihood of $O$, $S$, and $K$ as

$$L_\lambda(O, S, K) = \prod_{t=1}^{T} a_{s_{t-1} s_t}\, \mathcal{N}(O_t, \mu_{s_t k_t}, U_{s_t k_t})\, c_{s_t k_t}. \tag{12}$$

We have now succeeded in partitioning the likelihood function as

$$L_\lambda(O) = \sum_{S \in \mathcal{S}} \sum_{K \in \mathcal{K}} L_\lambda(O, S, K). \tag{13}$$
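The partition (13) can be checked numerically on a small problem. The following is a minimal sketch, not the authors' procedure: it assumes scalar observations ($d = 1$), uses zero-based state indices (so the document's state 1 is index 0), and every name and parameter value in it is hypothetical. It compares the likelihood from a forward pass with the brute-force sum of the joint likelihood (12) over all pairs $(S, K)$.

```python
import itertools
import math

def normal_pdf(x, mean, var):
    """Scalar normal density N(x, mean, var); d = 1 is an assumption of this sketch."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def forward_likelihood(obs, a, c, mu, var):
    """L(O) from the forward recursion, with the chain started in state index 0."""
    n, m = len(a), len(c[0])
    alpha = [1.0] + [0.0] * (n - 1)
    for o in obs:
        # Mixture density of each state at this observation.
        b = [sum(c[j][k] * normal_pdf(o, mu[j][k], var[j][k]) for k in range(m))
             for j in range(n)]
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j] for j in range(n)]
    return sum(alpha)

def enumerated_likelihood(obs, a, c, mu, var):
    """Sum of the joint likelihood over every state sequence S and mixture sequence K."""
    n, m, T = len(a), len(c[0]), len(obs)
    total = 0.0
    for S in itertools.product(range(n), repeat=T):
        for K in itertools.product(range(m), repeat=T):
            prev, joint = 0, 1.0  # s_0 is the fixed initial state
            for t in range(T):
                j, k = S[t], K[t]
                joint *= a[prev][j] * normal_pdf(obs[t], mu[j][k], var[j][k]) * c[j][k]
                prev = j
            total += joint
    return total

# Hypothetical toy parameters: n = 2 states, m = 2 mixture terms, T = 4.
a = [[0.7, 0.3], [0.4, 0.6]]
c = [[0.5, 0.5], [0.2, 0.8]]
mu = [[-1.0, 1.0], [0.0, 3.0]]
var = [[1.0, 0.5], [2.0, 1.0]]
obs = [0.3, -0.8, 2.5, 1.1]

lhs = forward_likelihood(obs, a, c, mu, var)
rhs = enumerated_likelihood(obs, a, c, mu, var)
```

The enumeration visits $(nm)^T$ sequence pairs, so it is only usable for very small $T$; its purpose is to make the bookkeeping behind (12) and (13) concrete.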

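In view of the similarity of the representation (13) to that of $L_\lambda$ in [6], we now define the auxiliary function

$$Q(\lambda, \bar\lambda) = \sum_{S} \sum_{K} L_\lambda(O, S, K) \log L_{\bar\lambda}(O, S, K). \tag{14}$$

When the expressions for $L_\lambda$ and $L_{\bar\lambda}$ derived from (12) are substituted in (14), we get

$$Q(\lambda, \bar\lambda) = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{m} \sum_{t=1}^{T} \gamma_{ijkt} \left\{ \log \bar a_{ij} + \log \bar c_{jk} + \log \mathcal{N}(O_t, \bar\mu_{jk}, \bar U_{jk}) \right\} \tag{15}$$

where $\gamma_{ijkt} \ge 0$ is the sum of $L_\lambda(O, S, K)$ over all pairs $(S, K)$ with $s_{t-1} = i$, $s_t = j$, and $k_t = k$. The innermost summation in (15) is formally identical to that used by Liporace in his proof; therefore, the properties which he demonstrated for his auxiliary function with respect to $\mu$ and $U$ hold in our case as well, thus giving us (8) and (9). We may thus conclude that (5) is correct for $\mathcal{T}$ defined by (6)-(9). Furthermore, the parameter separation made explicit in (12)-(15) allows us to apply the same algorithm to mixtures of strictly log concave densities and/or elliptically symmetric densities as treated by Liporace in [6].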
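To make the reestimation step concrete, the following is a minimal sketch of one application of the mapping for Gaussian mixtures. It rests on several assumptions: the observations are scalar ($d = 1$), state indices are zero-based with the chain started in index 0, the variance update is centered on the newly reestimated mean, and all function names and parameter values are hypothetical. It is an illustration of the formulas, not the authors' implementation.

```python
import math

def normal_pdf(x, mean, var):
    """Scalar normal density N(x, mean, var); d = 1 is an assumption of this sketch."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def forward_backward(obs, a, c, mu, var):
    """Return alpha[0..T], beta[0..T], b[1..T], dens[1..T] (index 0 of b, dens unused)."""
    n, m, T = len(a), len(c[0]), len(obs)
    dens = [None] + [[[normal_pdf(o, mu[j][k], var[j][k]) for k in range(m)]
                      for j in range(n)] for o in obs]
    b = [None] + [[sum(c[j][k] * dens[t][j][k] for k in range(m)) for j in range(n)]
                  for t in range(1, T + 1)]            # mixture density of each state
    alpha = [[1.0] + [0.0] * (n - 1)]                  # alpha_0: start in the first state
    for t in range(1, T + 1):                          # forward recursion
        alpha.append([sum(alpha[t - 1][i] * a[i][j] for i in range(n)) * b[t][j]
                      for j in range(n)])
    beta = [None] * T + [[1.0] * n]                    # beta_T(i) = 1
    for t in range(T - 1, -1, -1):                     # backward recursion
        beta[t] = [sum(a[i][j] * b[t + 1][j] * beta[t + 1][j] for j in range(n))
                   for i in range(n)]
    return alpha, beta, b, dens

def likelihood(alpha, beta, b, a, t):
    """Likelihood evaluated at time t, valid for any t between 1 and T - 1."""
    n = len(a)
    return sum(alpha[t][i] * a[i][j] * b[t + 1][j] * beta[t + 1][j]
               for i in range(n) for j in range(n))

def reestimate(obs, a, c, mu, var):
    """One reestimation of transitions, weights, means and variances."""
    n, m, T = len(a), len(c[0]), len(obs)
    alpha, beta, b, dens = forward_backward(obs, a, c, mu, var)
    # rho_t(j, k): predicted state probability times the weighted k-th term density.
    rho = [None] + [[[sum(alpha[t - 1][i] * a[i][j] for i in range(n))
                      * c[j][k] * dens[t][j][k] for k in range(m)]
                     for j in range(n)] for t in range(1, T + 1)]
    new_a = [[sum(alpha[t][i] * a[i][j] * b[t + 1][j] * beta[t + 1][j]
                  for t in range(1, T))
              / sum(alpha[t][i] * beta[t][i] for t in range(1, T))
              for j in range(n)] for i in range(n)]
    new_c = [[0.0] * m for _ in range(n)]
    new_mu = [[0.0] * m for _ in range(n)]
    new_var = [[0.0] * m for _ in range(n)]
    for j in range(n):
        occupancy = sum(alpha[t][j] * beta[t][j] for t in range(1, T + 1))
        for k in range(m):
            w = [rho[t][j][k] * beta[t][j] for t in range(1, T + 1)]
            new_c[j][k] = sum(w) / occupancy
            new_mu[j][k] = sum(wt * o for wt, o in zip(w, obs)) / sum(w)
            # Centered on the reestimated mean (an assumption of this sketch).
            new_var[j][k] = sum(wt * (o - new_mu[j][k]) ** 2
                                for wt, o in zip(w, obs)) / sum(w)
    return new_a, new_c, new_mu, new_var

# Hypothetical toy problem: n = 2 states, m = 2 terms, T = 8 scalar observations.
a = [[0.7, 0.3], [0.4, 0.6]]
c = [[0.5, 0.5], [0.2, 0.8]]
mu = [[-1.0, 1.0], [0.0, 3.0]]
var = [[1.0, 0.5], [2.0, 1.0]]
obs = [0.3, -0.8, 2.5, 1.1, 3.2, -0.1, 0.7, 2.9]

alpha, beta, b, dens = forward_backward(obs, a, c, mu, var)
new_a, new_c, new_mu, new_var = reestimate(obs, a, c, mu, var)
```

Two properties are easy to verify on the output: the likelihood takes the same value at every $t$ between 1 and $T - 1$, and the reestimated rows of the transition matrix and of the mixture weights each sum to one, so the estimate stays on the parameter manifold. The sketch applies no scaling, so it is suitable only for short observation sequences.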

DISCUSSION

In [6] Liporace notes that by setting $a_{ij} = p_j$, $1 \le j \le n$, $\forall i$, the special case of a single mixture can be treated. It is natural then to think of using a model with $n$ clusters of $m$ states, each with a single associated Gaussian density function, as a way of treating the Gaussian mixture problem considered here.

The transformation can be accomplished in the following way. First we expand the state space of our $n$-state model as shown in Fig. 1, in which we have added states $j_0$ through $j_m$ for each state $j$ in the original Markov chain. Associated with states $j_1, j_2, \ldots, j_m$ are distinct Gaussian densities corresponding to the $m$ terms of the $j$th Gaussian mixture in our initial formulation. The transitions exiting state $j$ have probabilities equal to the corresponding mixture weights. State $j_0$ is a distinguished state that is entered with probability 1 from the other new states, exits to state $j$ with probability $a_{jj}$, and generates no observation in so doing. The transition matrix for this configuration can be written down by inspection. A large number of the entries in it will be zero or unity. As these are unaltered by (6) and (7), they need not be reestimated. Using this reconfiguration of the state diagram, Liporace's formulas can be used in case $b_j(x)$ is any mixture of elliptically symmetric densities.

Fig. 1. Equivalent $m + 2$ state configuration for each state with $m$-term Gaussian mixture.
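The expanded transition matrix described above can be written down mechanically. The sketch below is one possible layout under stated assumptions: each original state $j$ occupies a block of $m + 2$ consecutive indices (the state itself, then $j_1, \ldots, j_m$, then $j_0$), and $j_0$ is assumed to leave to every original state $i$ with probability $a_{ji}$, which generalizes the single exit named in the text. The function name and the toy values are hypothetical.

```python
def expand_transition_matrix(a, c):
    """Build the n*(m+2)-state transition matrix of the expanded chain.

    Block layout for original state j (an indexing choice of this sketch):
      base + 0        : state j itself
      base + 1 .. + m : states j_1 .. j_m, one per mixture term
      base + m + 1    : the non-emitting collector state j_0
    """
    n, m = len(a), len(c[0])
    size = n * (m + 2)
    big = [[0.0] * size for _ in range(size)]
    for j in range(n):
        base = j * (m + 2)
        collector = base + m + 1
        for k in range(m):
            big[base][base + 1 + k] = c[j][k]    # j -> j_k with the mixture weight
            big[base + 1 + k][collector] = 1.0   # j_k -> j_0 with probability 1
        for i in range(n):
            # Assumption: j_0 leaves to original state i with probability a[j][i].
            big[collector][i * (m + 2)] = a[j][i]
    return big

# Hypothetical toy model: n = 2 states, m = 3 mixture terms per state.
a = [[0.7, 0.3], [0.4, 0.6]]
c = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
big = expand_transition_matrix(a, c)
```

Every row of the result sums to one, and only the entries copied from the original transition matrix and the mixture weights differ from zero or unity, which is why the remaining entries need no reestimation.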

A variant on the Gaussian mixture theme results from using $b_j(x)$ of the form of a product of mixtures: a product over $D$ factors, each of which is an $m$-term mixture. (16)

What we have considered so far is the special case of (16) for $D = 1$.

From the structure of our derivation it is clear that for hidden Markov chains having densities of the form (16), reestimation formulas can be derived as before by solving $\nabla_{\bar\lambda} Q(\lambda, \bar\lambda) = 0$. Such solutions will yield results quite analogous to (6)-(10). Note that this case too can be represented as a reconfiguration of the state diagram.

One numerical difficulty which may be manifest in the methods described is the phenomenon noted by Nadas [7] in which one or more of the mean vectors converge to a particular observation while the corresponding covariance matrix approaches a singular matrix. Under these conditions, $L_\lambda(O) \to \infty$ but the value of $\lambda$ is meaningless. A practical, if unedifying, remedy for this difficulty is to try a different initial $\lambda$. Alternatively, one can drop the offending term from the mixture since it is only contributing at one point of $\Lambda$.

Finally, we call attention to two minor facets of these algorithms. First, for flexibility in modeling, the number of terms in each mixture may vary with state, so that $m$ in (1) could as well be $m_j$. A similar dependence on dimension results if $m$ in (16) is replaced by $m_{jr}$. In either case, the constraints on the mixture weights must be satisfied.
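Dropping a degenerate term leaves a state with fewer terms than its neighbors, so the weight constraint has to be restored by renormalizing. The following is a minimal sketch of that bookkeeping for one state, assuming scalar variances in place of covariance matrices; the function name and the threshold `var_floor` are hypothetical and not taken from the text.

```python
def drop_degenerate_terms(c_j, mu_j, var_j, var_floor=1e-6):
    """Drop the mixture terms of one state whose variance has collapsed.

    var_floor is a hypothetical threshold; the surviving weights are
    renormalized so that they again sum to one.
    """
    kept = [k for k in range(len(c_j)) if var_j[k] > var_floor]
    if not kept:
        raise ValueError("every term has collapsed; try a different initial estimate")
    total = sum(c_j[k] for k in kept)
    return ([c_j[k] / total for k in kept],
            [mu_j[k] for k in kept],
            [var_j[k] for k in kept])

# Hypothetical state with three terms, the second of which has collapsed.
c_j, mu_j, var_j = drop_degenerate_terms([0.5, 0.1, 0.4],
                                         [-1.0, 2.5, 1.0],
                                         [0.8, 1e-12, 1.2])
```

After the call the state has two terms instead of three, which is an instance of the state-dependent term count $m_j$ mentioned above.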

Second, for realistic numbers of observations, for example, $T \ge 5000$, the reestimation formulas will underflow on any existing computer. The basic scaling mechanism described in [5] can be used to alleviate the problem but must be modified to account for the fact that the $\rho_t \beta_t$ product will be missing the $t$th scale factor. To divide out the product of scale factors, the $t$th summand in both numerator and denominator of (7), (9), and (10) must be multiplied by the missing coefficient.
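The underflow is easy to reproduce. The sketch below shows only the basic idea of per-step scaling applied to the forward pass, under the assumptions of scalar observations, zero-based state indices, and hypothetical parameter values; it does not implement the modified scaling of the reestimation formulas discussed above.

```python
import math

def normal_pdf(x, mean, var):
    """Scalar normal density N(x, mean, var); d = 1 is an assumption of this sketch."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def forward_step(alpha, o, a, c, mu, var):
    """One step of the forward recursion for the observation o."""
    n, m = len(a), len(c[0])
    b = [sum(c[j][k] * normal_pdf(o, mu[j][k], var[j][k]) for k in range(m))
         for j in range(n)]
    return [sum(alpha[i] * a[i][j] for i in range(n)) * b[j] for j in range(n)]

def unscaled_likelihood(obs, a, c, mu, var):
    """Plain forward pass; the result underflows to 0.0 for long sequences."""
    alpha = [1.0] + [0.0] * (len(a) - 1)
    for o in obs:
        alpha = forward_step(alpha, o, a, c, mu, var)
    return sum(alpha)

def scaled_log_likelihood(obs, a, c, mu, var):
    """Forward pass renormalized at every step; returns log L(O)."""
    alpha = [1.0] + [0.0] * (len(a) - 1)
    log_like = 0.0
    for o in obs:
        alpha = forward_step(alpha, o, a, c, mu, var)
        scale = sum(alpha)                  # scale factor for this time step
        alpha = [x / scale for x in alpha]  # keep the forward variables near unity
        log_like += math.log(scale)         # the log-likelihood accumulates the scales
    return log_like

# Hypothetical model and a deterministic stand-in for T = 5000 observations.
a = [[0.7, 0.3], [0.4, 0.6]]
c = [[0.5, 0.5], [0.2, 0.8]]
mu = [[-1.0, 1.0], [0.0, 3.0]]
var = [[1.0, 0.5], [2.0, 1.0]]
obs = [math.sin(0.37 * t) + (2.0 if t % 7 < 3 else 0.0) for t in range(5000)]

plain = unscaled_likelihood(obs, a, c, mu, var)
log_like = scaled_log_likelihood(obs, a, c, mu, var)
```

On a short prefix of the sequence the two functions agree (the exponential of the scaled result matches the unscaled one), while on the full sequence only the scaled version returns a usable number.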

At this writing numerical experiments based on Monte Carlo simulations and classification experiments using real speech signals are being conducted. We hope to report the results of these studies upon their completion.

REFERENCES

[1] L. E. Baum, "An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process," in Inequalities, vol. III, O. Shisha, Ed. New York: Academic, 1972, pp. 1-8.
[2] L. E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Statist., vol. 41, pp. 164-171, 1970.
[3] L. E. Baum and G. R. Sell, "Growth transformations for functions on manifolds," Pac. J. Math., vol. 27, pp. 211-227, 1968.
[4] A. H. Gray and J. D. Markel, "Quantization and bit allocation in speech processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 459-473, Dec. 1976.
[5] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, "An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition," Bell Syst. Tech. J., vol. 62, pp. 1035-1074, Apr. 1983.
[6] L. R. Liporace, "Maximum likelihood estimation for multivariate observations of Markov sources," IEEE Trans. Inform. Theory, vol. IT-28, pp. 729-734, Sept. 1982.
[7] A. Nadas, "Hidden Markov chains, the forward-backward algorithm and initial statistics," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 504-506, Apr. 1983.
[8] L. R. Rabiner, J. G. Wilpon, and J. G. Ackenhusen, "On the effects of varying analysis parameters on an LPC-based isolated word recognizer," Bell Syst. Tech. J., vol. 60, pp. 893-911, 1981.
[9] H. W. Sorenson and D. L. Alspach, "Recursive Bayesian estimation using Gaussian sums," Automatica, vol. 7, pp. 465-479, 1971.
