
    SINUSOIDAL SYNTHESIS OF SPEECH USING MATLAB

    Thesis

    Submitted in partial fulfillment of the requirement of

    BITS C421T Thesis

    BY

    AKSHAY VIJAY JAIN

    2009B4A8568P

Under the supervision of

Dr. RAHUL SINGHAL

Assistant Professor, EEE Dept.

    BITS-Pilani

    AT

    BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI

    November, 2013


    ACKNOWLEDGEMENT

    I would like to thank the Almighty first of all for his blessings.

I am obliged to Prof. B. N. Jain, Vice Chancellor, Birla Institute of Technology & Science, Pilani, for providing a course structure in which students get exposure to project work.

I wish to express a deep sense of gratitude to Dr. Rahul Singhal, my supervisor for the thesis titled Sinusoidal Synthesis of Speech Using MATLAB, for providing me this wonderful opportunity to learn about the various parameters associated with speech and the synthesis of speech from a spectrogram. I would also like to thank him for his constant advice, encouragement, and support during this study.

I also wish to express my gratitude to all the other people, as well as the websites, whose content helped me carry out this research work.

Last but not least, I would like to thank my parents for their constant support and motivation.


    CERTIFICATE

This is to certify that the thesis entitled Sinusoidal Synthesis of Speech Using MATLAB, submitted by Akshay Vijay Jain, ID No. 2009B4A8568P, in partial fulfillment of the requirement of BITS C421T Thesis, embodies the work done by him under my supervision.

Signature of Supervisor

Date: 25 November 2013

Dr Rahul Singhal
Assistant Professor, EEE Department,
BITS PILANI, PILANI CAMPUS


    Thesis Abstract

This thesis report discusses the speech signal: how it is stored on a computer, how it is analyzed, and how it is synthesized. One way of analyzing a speech signal is the Short-Time Fourier Transform, which is discussed in this report along with its parameters. Based on this analysis, we extract a matrix containing the frequencies present in the signal as a function of time. After obtaining this matrix from the spectrogram generated in MATLAB, we resynthesize the speech signal by sinusoidal addition using MATLAB code.


    TABLE OF CONTENTS

1) Introduction

2) Recording of speech signal

3) Analysis of speech signal
   a) Long-term frequency analysis
   b) Window sequence
   c) Effect of the window
   d) Choice of window
   e) Parameters of the short-term frequency spectrum
   f) Time-frequency domain: spectrogram
   g) Length of the window and fundamental frequency

4) Why sinusoids?

5) Additive synthesis

6) Frequency vs. time matrix from the spectrogram in MATLAB
   1. GenerateFreqVsTime Matlab code
   2. Croplimits Matlab code
   3. Screenshots

7) Speech signal from the frequency vs. time matrix in MATLAB
   1. GenerateSoundData Matlab code
   2. TestAtLevel Matlab code

8) Results

9) Conclusion

10) Bibliography/References


1) Introduction

We all know that speech is an acoustic signal; by that we mean it is a mechanical wave, an oscillation of pressure transmitted through a solid, liquid, or gas, composed of frequencies within the hearing range. Sound is a sequence of pressure waves that propagates through compressible media such as air or water. (Sound can propagate through solids as well, but there are additional modes of propagation.) Sound that is perceptible by humans has frequencies from about 20 Hz to 20,000 Hz. In air at standard temperature and pressure, the corresponding wavelengths of sound waves range from 17 m to 17 mm. During propagation, waves can be reflected, refracted, or attenuated by the medium.

    Figure 1. Typical sound signal


2) Recording of speech

Sound recording is an electrical or mechanical inscription of sound waves, such as spoken voice, singing, instrumental music, or sound effects. The two main classes of sound recording technology are analog recording and digital recording. Acoustic analog recording is achieved by a small microphone diaphragm that detects changes in atmospheric pressure (acoustic sound waves) and records them as a graphic representation of the sound waves on a medium such as a phonograph record (in which a stylus senses grooves on the record). In magnetic tape recording, the sound waves vibrate the microphone diaphragm and are converted into a varying electric current, which is then converted to a varying magnetic field by an electromagnet, making a representation of the sound as magnetized areas on a plastic tape with a magnetic coating.

Digital recording converts the analog sound signal picked up by the microphone into a digital form by a process of digitization, allowing it to be stored and transmitted by a wider variety of media. Digital recording stores audio as a series of binary numbers representing samples of the amplitude of the audio signal at equal time intervals, at a sample rate high enough to convey all sounds capable of being heard. Digital recordings are considered higher quality than analog recordings not necessarily because they have higher fidelity (wider frequency response or dynamic range), but because the digital format can prevent much of the loss of quality found in analog recording due to noise and electromagnetic interference in playback, and due to mechanical deterioration or damage to the storage medium. A digital audio signal must be reconverted to analog form during playback before it is applied to a loudspeaker or earphones.
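As a small illustration of this digitization step (not part of the thesis code; the 440 Hz test tone and 8-bit depth below are assumptions for the example), the following MATLAB sketch samples a waveform at equal time intervals and quantizes each sample to an 8-bit number:

% Minimal sketch of digitization: sampling at equal time intervals and
% quantizing each sample to an 8-bit value (0..255).
fs   = 8000;                              % sample rate in Hz (assumed)
t    = 0:1/fs:0.02;                       % 20 ms of sampling instants
x    = 0.8 * sin(2*pi*440*t);             % stand-in for the microphone signal
bits = 8;
q    = round((x + 1) * (2^bits - 1) / 2); % 8-bit sample values
stem(t, q); xlabel('Time (s)'); ylabel('8-bit sample value');
soundsc(q, fs);                           % playback reconverts samples to sound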


3) Analysis of speech signal

The long-term frequency analysis of speech signals yields good information about the overall frequency spectrum of the signal, but no information about the temporal location of those frequencies. Since speech is a very dynamic signal with a time-varying spectrum, it is often insightful to look at frequency spectra of short sections of the speech signal.

a) Long-term frequency analysis

The frequency response of a system is defined as the discrete-time Fourier transform (DTFT) of the system's impulse response h[n]:

H(ω) = Σ_{n=−∞}^{∞} h[n] e^(−jωn)

Similarly, for a sequence x[n], its long-term frequency spectrum is defined as the DTFT of the sequence:

X(ω) = Σ_{n=−∞}^{∞} x[n] e^(−jωn)

Theoretically, we must know the sequence x[n] for all values of n (from n = −∞ to n = ∞) in order to compute its frequency spectrum. Fortunately, the terms where x[n] = 0 do not contribute to the sum, and therefore an equivalent expression for the sequence's spectrum is

X(ω) = Σ_{n=0}^{N−1} x[n] e^(−jωn)

where we have assumed that the sequence starts at n = 0 and is N samples long. This tells us that we can apply the DTFT to only the non-zero samples of x[n] and still obtain the sequence's true spectrum X(ω). But what is the correct mathematical expression to compute the spectrum over a short section of the sequence, that is, over only part of the non-zero samples of the sequence?
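Before turning to windowing, here is a minimal MATLAB sketch of this computation (the sequence x below is only a placeholder; any finite recording would do): the DTFT of the non-zero samples is evaluated at a discrete set of frequencies using fft.

% Minimal sketch: long-term spectrum of a finite-length sequence x[n]
fs   = 8000;                           % sampling rate in Hz (assumed)
x    = randn(1, 512);                  % placeholder sequence; replace with speech samples
Nfft = 1024;                           % number of frequency sample points
X    = fft(x, Nfft);                   % samples of X(w) at w = 2*pi*k/Nfft
f    = (0:Nfft-1) * fs / Nfft;         % corresponding frequencies in Hz
plot(f(1:Nfft/2), 20*log10(abs(X(1:Nfft/2)) + eps));
xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');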


b) Window sequence

It turns out that the mathematically correct way to do that is to multiply the sequence x[n] by a window sequence w[n] that is non-zero only for n = 0, …, L−1, where L, the length of the window, is smaller than the length N of the sequence x[n]:

x_w[n] = x[n] · w[n]

Then we compute the spectrum of the windowed sequence x_w[n] as usual:

X_w(ω) = Σ_{n=−∞}^{∞} x_w[n] e^(−jωn)

The following figure illustrates how a window sequence w[n] is applied to the sequence x[n]:

    Figure 2 Result of application of windowed sequence to data sequence


As the figure shows, the windowed sequence is shorter than the original sequence, so we can further truncate the DTFT of the windowed sequence:

X_w(ω) = Σ_{n=0}^{L−1} x_w[n] e^(−jωn)

Using this windowing technique, we can select a section of arbitrary length of the input sequence x[n] by choosing the length and location of the window accordingly. The only question that remains is: how does the window sequence w[n] affect the short-term frequency spectrum?
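As a short sketch of this selection step (the signal, window position, and window length below are assumptions for illustration), a section of x[n] can be windowed and its short-term spectrum computed as follows:

% Minimal sketch: short-term spectrum of one windowed section
fs    = 8000;                          % sampling rate (assumed)
x     = randn(1, 4000);                % placeholder signal; replace with speech
L     = 200;                           % window length: 25 ms at 8 kHz
start = 1001;                          % location of the analysis window
w     = hamming(L)';                   % window sequence w[n]
xw    = x(start:start+L-1) .* w;       % windowed sequence xw[n] = x[n]*w[n]
Xw    = fft(xw, 1024);                 % short-term spectrum at 1024 points
f     = (0:511) * fs / 1024;
plot(f, 20*log10(abs(Xw(1:512)) + eps));
xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)');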

c) Effect of the window

To answer that question, we need to introduce an important property of the Fourier transform. The diagram below illustrates the property graphically:

I. Implementation of an LTI system in the time domain.

II. Equivalent implementation of an LTI system in the frequency domain.


The two implementations of an LTI system are equivalent: they give the same output for the same input. Hence, convolution in the time domain equals multiplication in the frequency domain:

y[n] = x[n] * h[n]  ⇔  Y(ω) = X(ω) · H(ω)

And since the time domain and the frequency domain are each other's dual in the Fourier transform, it is also true that multiplication in the time domain equals convolution in the frequency domain:

x_w[n] = x[n] · w[n]  ⇔  X_w(ω) = (1/2π) · X(ω) ⊛ W(ω)

This shows that multiplying the sequence x[n] by the window sequence w[n] in the time domain is equivalent to convolving the spectrum of the sequence, X(ω), with the spectrum of the window, W(ω). The result of this convolution is that the spectrum of the sequence is smeared by the spectrum of the window. This is best illustrated by the example in the figure below:


Figure 3 Result of application of window sequence in time and frequency domain

d) Choice of window

Because the window determines the spectrum of the windowed sequence to a great extent, the choice of the window is important. MATLAB supports a number of common windows, each with its own strengths and weaknesses. Some common choices of windows are shown below.

    Figure 4 Rectangular window sequence


    Figure 5 Triangular and Hamming window sequence

All windows share the same general characteristics: their spectrum has a peak, called the main lobe, and ripples to the left and right of the main lobe called the side lobes. The width of the main lobe and the relative height of the side lobes differ for each window. The main lobe width determines how accurately a window can resolve different frequencies: wider is less accurate. The side lobe height determines how much spectral leakage the window has. An important thing to realize is that we can't have short-term frequency analysis without a window. Even if we don't explicitly use a window, we are implicitly using a rectangular window.
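To make the main-lobe and side-lobe differences concrete, the following sketch (assuming the Signal Processing Toolbox functions rectwin and hamming are available) compares the spectra of a rectangular window and a Hamming window of the same length:

% Minimal sketch: main lobe width and side lobe level of two windows
L    = 200;                            % window length in samples
Nfft = 4096;                           % zero-padded FFT for a smooth spectrum
WR   = 20*log10(abs(fft(rectwin(L), Nfft)) + eps);
WH   = 20*log10(abs(fft(hamming(L), Nfft)) + eps);
w    = (0:Nfft-1) / Nfft;              % normalized frequency (cycles/sample)
plot(w(1:Nfft/2), WR(1:Nfft/2) - max(WR), ...
     w(1:Nfft/2), WH(1:Nfft/2) - max(WH));
legend('Rectangular', 'Hamming');
xlabel('Normalized frequency'); ylabel('Magnitude (dB, peak-normalized)');
% Rectangular: narrower main lobe but side lobes near -13 dB;
% Hamming: wider main lobe but side lobes near -43 dB.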

e) Parameters of the short-term frequency spectrum

Besides the type of window (rectangular, Hamming, etc.), there are two other factors in MATLAB that control the short-term frequency spectrum: the window length and the number of frequency sample points. The window length controls the fundamental trade-off between the time resolution and frequency resolution of the short-term spectrum,


irrespective of the window's shape. A long window gives poor time resolution but good frequency resolution. Conversely, a short window gives good time resolution but poor frequency resolution. For example, a 250-millisecond window can, roughly speaking, resolve frequency components when they are 4 Hz or more apart (1/0.250 = 4), but it can't tell where in those 250 milliseconds those frequency components occurred. On the other hand, a 10-millisecond window can only resolve frequency components when they are 100 Hz or more apart (1/0.010 = 100), but the uncertainty in time about the location of those frequencies is only 10 milliseconds. The result of short-term spectral analysis using a long window is referred to as a narrowband spectrum (because a long window has a narrow main lobe), and the result of short-term spectral analysis using a short window is called a wideband spectrum. In short-term spectral analysis of speech, the window length is often chosen with respect to the fundamental period of the speech signal, i.e., the duration of one period of the fundamental frequency. A common choice for the window length is either less than one fundamental period, or 2-3 times the fundamental period or more.

Examples of narrowband and wideband short-term spectral analysis of speech are given in the figures below:

Figure 6 Wideband and Narrowband analysis of speech

The other factor controlling the short-term spectrum in MATLAB is the number of points at which the frequency spectrum X(ω) is evaluated. The number of points is usually equal to the length of the window. Sometimes a greater number of points is chosen to obtain a smoother-looking spectrum. Evaluating X(ω) at fewer points than the window length is possible, but very rare.
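A small sketch of this trade-off (using two assumed test tones 20 Hz apart in place of speech) shows how a 250 ms window resolves components that a 10 ms window cannot:

% Minimal sketch: window length vs. frequency resolution
fs = 8000;
t  = 0:1/fs:1;
x  = sin(2*pi*200*t) + sin(2*pi*220*t); % two tones 20 Hz apart
for L = [2000 80]                       % 250 ms window vs. 10 ms window
    seg = x(1:L) .* hamming(L)';        % windowed section
    X   = abs(fft(seg, 8192));
    f   = (0:8191) * fs / 8192;
    figure; plot(f(1:1024), X(1:1024)); % show 0-1000 Hz
    title(sprintf('Window length %.0f ms', 1000*L/fs));
    xlabel('Frequency (Hz)'); ylabel('Magnitude');
end
% The 250 ms window shows two separate peaks at 200 Hz and 220 Hz;
% the 10 ms window merges them into one broad peak.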


f) Time-frequency domain: spectrogram

An important use of short-term spectral analysis is the short-time Fourier transform, or spectrogram, of a signal. The spectrogram of a sequence is constructed by computing the short-term spectrum of a windowed version of the sequence, then shifting the window to a new location and repeating this process until the entire sequence has been analyzed. The whole process is illustrated in the figure below:

Figure 7 Demonstration of making of spectrogram

Together, these short-term spectra (bottom row) make up the spectrogram, and are typically shown in a two-dimensional plot, where the horizontal axis is time, the vertical axis is frequency, and magnitude is the color or intensity of the plot. For example:


    Figure 8 A typical spectrogram

The appearance of the spectrogram is controlled by a third parameter: window overlap. Window overlap determines how much the window is shifted between repeated computations of the short-term spectrum. Common choices for window overlap are 50% or 75% of the window length. For example, if the window length is 200 samples and the window overlap is 50%, the window is shifted by 100 samples between successive short-term spectra; with 75% overlap, the window is shifted by 50 samples. The choice of window overlap depends on the application. When a temporally smooth spectrogram is desirable, the window overlap should be 75% or more. When computation should be kept to a minimum, no overlap or 50% overlap are good choices. If computation is not an issue, you could even compute a new short-term spectrum for every sample of the sequence; in that case, window overlap = window length − 1, and the window shifts by only 1 sample between spectra. But doing so is wasteful when analyzing speech signals, because the spectrum of speech does not change at such a high rate. It is more practical to compute a new spectrum every 20-50 milliseconds, since that is the rate at which the speech spectrum changes.
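As a usage sketch (the signal below is a placeholder), the spectrogram command can be called with the window length, overlap, and number of frequency points made explicit:

% Minimal sketch: spectrogram with explicit window, overlap and FFT size
fs      = 8000;
x       = randn(1, 2*fs);              % placeholder; replace with recorded speech
winLen  = 200;                         % 25 ms window at 8 kHz
overlap = round(0.75 * winLen);        % 75% overlap -> hop of 50 samples
nfft    = 1024;
spectrogram(x, hamming(winLen), overlap, nfft, fs, 'yaxis');
title('Spectrogram with 75% window overlap');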

g) Length of the window and fundamental frequency


In a wideband spectrogram (i.e., using a window shorter than the fundamental period), the fundamental frequency of the speech signal resolves in time. That means you can't really tell what the fundamental frequency is by looking at the frequency axis, but you can see energy fluctuations at the rate of the fundamental frequency along the time axis. In a narrowband spectrogram (i.e., using a window 2-3 times the fundamental period), the fundamental frequency resolves in frequency, i.e., you can see it as an energy peak along the frequency axis. See for example the figures below:

    Figure 9. Wideband Speech Spectrogram

    Figure 10. Narrowband Speech Spectrogram
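As an illustrative sketch (assuming a fundamental of about 125 Hz and a placeholder signal in place of recorded speech), wideband and narrowband spectrograms can be produced by choosing the window length relative to the fundamental period:

% Minimal sketch: wideband vs. narrowband spectrograms relative to f0
fs = 8000;
f0 = 125;                              % assumed fundamental frequency
T0 = round(fs / f0);                   % fundamental period in samples
x  = randn(1, 2*fs);                   % placeholder; replace with voiced speech
figure;                                % wideband: window shorter than one period
spectrogram(x, hamming(round(0.5*T0)), round(0.4*T0), 1024, fs, 'yaxis');
title('Wideband spectrogram');
figure;                                % narrowband: window of about 3 periods
spectrogram(x, hamming(3*T0), round(2.5*T0), 1024, fs, 'yaxis');
title('Narrowband spectrogram');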


4) Why sinusoids?

In general, the goal of modelling a signal is to reduce redundancy and to obtain a more compact representation of the data. There are different techniques for modelling a time series, and which technique to apply depends on the signal. Sinusoids are especially suited for modelling speech with harmonic content. Most natural acoustic sounds exhibit this attribute, and the reason for this sinusoidal character lies in the way speech is produced. The human voice production system consists of two fundamental parts working together: the vocal cords (the excitation source) and the pharynx, with the mouth and nasal cavities, acting as an acoustic filter. During voiced parts of speech the vocal cords open and close at a certain frequency (the fundamental frequency, f0), modulating the airstream coming from the lungs. The harmonic overtone structure results from the structure of the pharynx, which in a simplified way can be seen as an open tube that lets all overtones develop, with the harmonics f1, …, fn being integer multiples of the fundamental f0.

5) Additive synthesis

Sine waves can be considered the building blocks of speech. In fact, it was shown in the 19th century by the mathematician Joseph Fourier that any periodic function can be expressed as a series of sinusoids of varying frequencies and amplitudes. This concept of constructing a complex sound out of sinusoidal terms is the basis for additive synthesis, sometimes called Fourier synthesis for the aforementioned reason. In addition, the concepts of additive synthesis have existed since the introduction of the organ, where different pipes of varying pitch are combined to create a sound or timbre.

A simple block diagram of the additive form may appear as follows:


    Figure 11. Block Diagram representation of Sinusoidal Synthesis

Its mathematical form, based on the Fourier series, is

y(t) = a0 + Σ_{k=1}^{N} Ak · sin(2π · k · f0 · t)

where a0 is an offset value for the whole function (typically 0), Ak is the amplitude weighting for each sine term, and k is the frequency multiplier value.

With hundreds of terms, each with its own individual frequency and amplitude weighting, we can design and specify some incredibly complex sounds, especially if we can modulate the parameters over time.
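As a minimal MATLAB sketch of this idea (the fundamental frequency and amplitude weightings below are assumed for the example, not taken from the thesis), a harmonic tone can be built by summing weighted sine terms:

% Minimal sketch: additive (Fourier) synthesis of a harmonic tone
fs = 8000;                             % sampling rate (assumed)
f0 = 125;                              % fundamental frequency in Hz (assumed)
t  = 0:1/fs:1;                         % one second of samples
A  = [1.0 0.6 0.4 0.25 0.15];          % amplitude weighting A_k for each harmonic
y  = zeros(size(t));
for k = 1:numel(A)
    y = y + A(k) * sin(2*pi*k*f0*t);   % add the k-th sine term
end
soundsc(y, fs);                        % play the synthesized tone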


6) Frequency vs. time matrix from the spectrogram in MATLAB

The frequency content present in speech at a particular instant of time can be determined approximately using the Short-Time Fourier Transform (STFT). For this thesis work we use the narrowband spectrogram produced by MATLAB. We choose the narrowband version because it gives better frequency resolution with acceptable time resolution. We also tried a wideband spectrogram, but the speech synthesized using information from the wideband spectrogram was very noisy.

First of all, we take the spectrogram of the speech signal with the help of the MATLAB command spectrogram. The spectrogram produced by this command is an RGB image on a decibel scale in which intensities above 0 dB are expressed in varying shades of red, so we separate out the red component from the RGB image. In the separated component we can easily identify the frequencies that had higher intensities in the speech, since the pixels corresponding to high-intensity frequencies appear white, the others appear black, and intermediate values appear in gray scale. The red component is then appropriately cropped and resized to 400 rows, so that every row corresponds to a 10 Hz range, and to a number of columns equal to one hundred times the duration of the speech signal in seconds, so that each column corresponds to 10 milliseconds of speech.

It has been found that when we convert the resized image into black and white, by turning gray pixels nearer to white into white and gray pixels nearer to black into black, the quality of the resynthesized speech is very close to the original speech. So we produce a black-and-white image that corresponds to the frequency vs. time graph of the speech signal.


a) The MATLAB code for performing the above task is as follows:

% function GenerateFreqVsTime()
% Record your voice for f seconds.
f = input('Enter the time in seconds for which you want to record');
recObj = audiorecorder(8000, 8, 1);          % 8 kHz, 8-bit, mono
disp('Start speaking.');
recordblocking(recObj, f);
disp('End of Recording.');
% Play back the recording.
play(recObj);
% Store data in double-precision array.
myRecording = getaudiodata(recObj);
figure(1)
plot(myRecording); title('sound');
% Plot the spectrogram.
figure(2)
spectrogram(myRecording, 1000, 923, 1024, 8E3, 'yaxis');
h = gcf;
set(gcf, 'Position', get(0, 'Screensize'));  % Maximize figure.
level = input('Please enter level between 0 and 1');
saveas(h, 'spectrogram1.jpg');
fig = imread('spectrogram1.jpg');
figGray = rgb2gray(fig);
figure(9)
imshow(figGray); title('FigGray');
figRed = fig(:,:,1);                         % red component of the RGB image
figure(3)
imshow(figRed); title('figRed');
[xmin ymin width height] = croplimits(figRed);
figure(4)
figRedCropped = imcrop(figRed, [xmin ymin width height]);
imshow(figRedCropped); title('figRed Cropped');
figure(5)
figRedCroppedResized = imresize(figRedCropped, [400 100*f]);
imshow(figRedCroppedResized); title('figRedCroppedResized');
figRedCroppedResizedCorrected = flipud(figRedCroppedResized);
figure(6)
figRedCroppedResizedBW = im2bw(figRedCroppedResized, level);
imshow(figRedCroppedResizedBW); title('figRedCroppedResizedBW');
figure(7)
figRedCroppedResizedBWCorrected = flipud(figRedCroppedResizedBW);
imshow(figRedCroppedResizedBWCorrected);

b) The Matlab code for the croplimits function used in the above code is as follows:

1) function [xmin ymin width height]=croplimits(input)
2) xmin=0;r2=0;ymin=0;c2=0;
3) [row,column]=size(input);
4) for i=30:90
5) if(input(i,column/2)~=255)
6) ymin=i+5;
7) break
8) end
9) end
10) count=0;
11) for ki=row:-1:row-120
12) if(input(ki,column/2)~=255)
13) for kj=column/2:column/2+50
14) if(input(ki,kj)~=255)
15) count=count+1;
16) else count=count-1;
17) end
18) if(count>0)
19) r2=ki;
20) break
21) end % end of if on line 18
22) end % end of for loop from line 13
23) end % end of if on line 12
24) end % end of for loop on line 11
25) count=0;
26) for j=80:180
27) if(input(row/3,j)~=255)
28) for i=row/2:row/2+40
29) if(input(i,j)~=255)
30) count=count+1;
31) else
32) count=count-1;
33) end
34) end % end of for loop on line 28
35) if(count>24)
36) xmin=j+8;break;
37) end
38) end % end of if on line 27
39) end % end of for loop on line 26
40) count=0;
41) for j=column:-1:column-120
42) if(input(row/2,j)~=255)
43) for i=row/2:row/2+100
44) if(input(i,j)~=255)
45) count=count+1;
46) else
47) count=count-1;
48) end % end of if from line 44
49) end % end of for loop from line 43
50) if(count>0)
51) c2=j;break;
52) end % end of if from line 50
53) end % end of if from line 42
54) end % end of for loop from line 41
55) height=r2-ymin+1;
56) width=c2-xmin+1;
57) end % end of function

    c) Screenshots

    i. Speech Waveform

    Figure 12 Speech Waveform


ii. Spectrogram of the above speech using Matlab

Figure 13 Spectrogram of the above speech using Matlab

    iii. Grayscale Spectrogram

    Figure 14 Grayscale Spectrogram


iv. Image of the red component of the spectrogram, since the red component represents positive magnitude

Figure 15 Red component of spectrogram

v. The same figure after being cropped by the Matlab function croplimits

Figure 16 Same figure after being cropped by the Matlab function croplimits


vi. The above figure resized with the Matlab function imresize so that each column of pixels corresponds to 10 milliseconds

Figure 17 Resized using Matlab

vii. The above figure inverted so that the first row corresponds to 10 Hz, the next row to 20 Hz, and the last (400th) row to 4 kHz

Figure 18 Same figure as previous but inverted


viii. The same figure as above with pixels having intensity less than 0.9 reduced to zero and the others raised to 1

Figure 19 Same figure as above with pixels having intensity less than 0.9 reduced to zero and the others raised to 1


7) Speech signal from the frequency vs. time matrix in MATLAB

Once we have the frequency vs. time matrix, we can generate all of the frequencies present in a column using the sin function of MATLAB, add them together, and repeat this for every column, each of which corresponds to 10 milliseconds. We then concatenate the data generated for each column, and the result is the speech signal.

The MATLAB code for performing the above series of tasks is as follows.

a) GenerateSoundData Matlab code:

function sounddata=GenerateSoundData(image)
[row column]=size(image);
image = image / 0.255;                 % scale pixel values
sounddata = zeros(1, 80*column);       % 80 samples per 10 ms column at 8 kHz
timeResolution = .01;                  % 10 milliseconds per column
samplingRate = 8000;                   % 8000 Hz
time = 1/samplingRate : 1/samplingRate : timeResolution;
for i = 1:column
    y = zeros(size(time));
    for j = 10:row-100                 % rows 10..300, i.e. 100 Hz to 3000 Hz
        y = y + sqrt(double(image(j,i))) * sin(2*pi*time*j*10);
    end
    sounddata(80*(i-1)+1 : 80*i) = y;
end
sounddata = sounddata';

In this code we generate only frequencies in the range 100 Hz to 3000 Hz, because frequencies outside this range contribute comparatively little to the perceived speech.

b) TestAtLevel Matlab code:


function sdata=TestAtLevel(spectrograph,level)
bwspectrograph = im2bw(spectrograph, level);
sdata = GenerateSoundData(bwspectrograph);
soundsc(sdata, 8000);
end

To the function TestAtLevel we pass the matrix figRedCroppedResizedCorrected obtained from the GenerateFreqVsTime script, along with the level, which specifies the threshold below which pixel values are converted to zero and above which they are converted to 1.
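For illustration, the three threshold levels compared in the Results section below could be reproduced with calls such as the following (this loop is not part of the original code; it assumes figRedCroppedResizedCorrected is in the workspace after running the GenerateFreqVsTime script):

% Listen to the resynthesized speech at the three thresholds used in the Results
for level = [0.8 0.9 0.95]
    sdata = TestAtLevel(figRedCroppedResizedCorrected, level);
    figure; plot(sdata);
    title(sprintf('Resynthesized speech, level = %.2f', level));
    pause(length(sdata)/8000 + 1);     % wait for playback to finish
end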

8) Results

The speech waveforms generated with different values of level for the conversion of the red component of the spectrogram into a black-and-white image are shown below, along with their spectrograms.


a) Level = 0.8


b) Level = 0.9


c) Level = 0.95


9) Conclusion

From the above three speech waveforms, it appears that a level of around 0.9 is the best threshold for the red component of the spectrogram generated by Matlab, in the sense that the speech generated by the GenerateSoundData function matches the original speech most closely.

The sinusoidal model, a framework for modelling speech and music signals, has been presented. Sinusoidal synthesis of speech by extracting frequency and time information from the spectrogram gives acceptable speech quality. Another strategy would be to decompose the signal into deterministic and stochastic parts and to use different models for the different portions of the speech, as proposed in [5].

10) Bibliography/References

[1] R. McAulay, T. Quatieri: Speech Analysis/Synthesis Based on a Sinusoidal Representation, in IEEE Transactions on Acoustics, Speech, and Signal Processing, August 1986

[2] J. Smith III, X. Serra: PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation

[3] K. Fitz, L. Haken: On the Use of Time-Frequency Reassignment in Additive Sound Modelling

[4] M. Lagrange, S. Marchand, M. Raspaud, J.-B. Rault: Enhanced Partial Tracking Using Linear Prediction, in Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), September 2003

[5] X. Serra: A System for Sound Analysis/Transformation/Synthesis Based on a Deterministic plus Stochastic Decomposition, Thesis, Stanford University, 1989
