Speaker Recognition System


SPEAKER RECOGNITION SYSTEM

BY
M. RADHA
M. KIRANKUMAR
T. BHASKER
CHRISTINE DCRUZE

INTERNAL GUIDE
Mr. VENU, Asst. Prof. in E.C.E Dept

Christu Jyoti Institute of Technology & Science (Electronics & Communication Engineering)

ABSTRACT

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers. This project describes how to build a simple, yet complete and representative automatic speaker recognition system. Such a speaker recognition system has potential in many security applications. For example, users have to speak a PIN (Personal Identification Number) in order to gain access to the laboratory door, or users have to speak their credit card number over the telephone line to verify their identity. By checking the voice characteristics of the input utterance, using an automatic speaker recognition system similar to the one that we will describe, the system is able to add an extra level of security.

Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. The system that we will describe is classified as a text-independent speaker identification system, since its task is to identify the person who speaks regardless of what is being said.

INTRODUCTION

Speaker recognition is a process in which a person is recognized on the basis of his or her voice. A person's voice has many prominent characteristics, such as pitch and tone, which can be used to distinguish one person from another.

Speech contains information about the identity of the speaker. A speech signal also carries the language that is spoken, the presence and type of speech pathologies, and the physical and emotional state of the speaker. Often, humans are able to extract the identity information when the speech comes from a speaker they are acquainted with.

The recording of the human voice for speaker recognition requires a human to say something. In other words, the human has to show some of his or her speaking behavior. Therefore, voice recognition fits within the category of behavioral biometrics. A speech signal is a very complex function of the speaker and his or her environment that can be captured easily with a standard microphone. In contrast to physical biometrics, the characteristics used for recognition are not fixed, not static, and not physical.

PRINCIPLES OF SPEAKER RECOGNITION

Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker provides a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker.

Speaker recognition methods can also be divided into text-independent and text-dependent methods. In a text-independent system, speaker models capture characteristics of a person's speech that show up irrespective of what is being said. In a text-dependent system, on the other hand, the recognition of the speaker's identity is based on his or her speaking one or more specific phrases, such as passwords, card numbers, or PIN codes.

All of these speaker recognition technologies (identification and verification, text-independent and text-dependent) have their own advantages and disadvantages and may require different treatments and techniques. The choice of which technology to use is application-specific. The system that we will develop is classified as a text-independent speaker identification system, since its task is to identify the person who speaks regardless of what he or she is saying.
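The difference between identification and verification can be made concrete with a small sketch. This is only an illustration under stated assumptions: the voiceprints are hypothetical 3-dimensional vectors, the comparison is a plain Euclidean distance, and the function names and threshold are invented for this example rather than taken from the project.

```python
import math

def identify(features, voiceprints):
    """Identification: pick the registered speaker whose voiceprint is closest."""
    return min(voiceprints, key=lambda name: math.dist(features, voiceprints[name]))

def verify(features, claimed_name, voiceprints, threshold):
    """Verification: accept the identity claim only if the claimed voiceprint is close enough."""
    return math.dist(features, voiceprints[claimed_name]) <= threshold

# Hypothetical voiceprints (assumption); real systems use spectral features.
voiceprints = {"kiran": [1.0, 2.0, 3.0], "sally": [4.0, 0.0, 1.0]}
utterance = [1.1, 2.1, 2.9]

who = identify(utterance, voiceprints)
claim_kiran = verify(utterance, "kiran", voiceprints, threshold=0.5)
claim_sally = verify(utterance, "sally", voiceprints, threshold=0.5)
print(who, claim_kiran, claim_sally)
```

Identification searches over all registered speakers and always returns one of them, while verification looks only at the claimed speaker and answers accept or reject.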

Speaker recognition is divided into two types, described below:
Speaker dependent
Speaker independent

Speaker dependent:

Speaker dependent systems are trained by the individual who will be using the system. These systems are capable of achieving a high command count and better than 95% accuracy for word recognition. The drawback to this approach is that the system responds accurately only to the individual who trained it.

Speaker independent:

A speaker independent system is trained to respond to a word regardless of who speaks. Therefore the system must respond to a large variety of speech patterns, inflections, and enunciations of the target word. The command word count is usually lower than for a speaker dependent system; however, high accuracy can still be maintained within processing limits. Industrial requirements more often need speaker independent voice systems, such as the AT&T system used in the telephone systems.

SPEAKER RECOGNITION:

Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech waves. This technique makes it possible to use the speaker's voice to verify their identity and control access to services such as voice dialing, banking by telephone, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.

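[Figure: a speech signal feeds three recognizers. Speech recognition outputs the words ("How are you?"), language recognition outputs the language name ("English"), and speaker recognition outputs the speaker name ("KIRAN").]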

WORKING PRINCIPLES

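[Figure: speaker verification. Input speech ("My Name is Kiran") passes through feature extraction. The features are scored against the speaker model (Kiran's voiceprint) and an impostor model (impostor voiceprints). Given the identity claim "Kiran", the decision is ACCEPT or REJECT.]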

Two distinct phases to any speaker verification system

Enrollment Phase
  Enrollment speech for each speaker (Kiran, Sally)
  Feature extraction
  Model training
  Voiceprints (models) for each speaker (Kiran, Sally)

Verification Phase
  Claimed identity: Sally
  Feature extraction
  Verification decision
  Accepted!
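The two phases can be sketched as a small class with one method per phase. This is a minimal sketch, not the project's implementation: the "model training" step is assumed to be nothing more than averaging the enrollment feature vectors, and the class name, the 2-dimensional features, and the threshold are all hypothetical.

```python
import math

class VoiceprintStore:
    """Toy speaker verifier: one mean feature vector (voiceprint) per speaker."""

    def __init__(self, threshold):
        self.threshold = threshold  # maximum distance that is still accepted
        self.models = {}

    def enroll(self, name, feature_vectors):
        """Enrollment phase: train a model from the speaker's enrollment speech."""
        count = len(feature_vectors)
        dims = len(feature_vectors[0])
        self.models[name] = [sum(v[d] for v in feature_vectors) / count
                             for d in range(dims)]

    def verify(self, claimed_name, feature_vector):
        """Verification phase: compare new speech with the claimed speaker's model."""
        return math.dist(feature_vector, self.models[claimed_name]) <= self.threshold

# Hypothetical enrollment features for the two speakers named on the slide.
store = VoiceprintStore(threshold=0.5)
store.enroll("kiran", [[1.0, 2.0], [1.2, 1.8]])
store.enroll("sally", [[4.0, 0.5], [3.8, 0.7]])

genuine_claim = store.verify("sally", [3.9, 0.5])   # speech that resembles Sally
impostor_claim = store.verify("sally", [1.0, 2.0])  # speech that resembles Kiran
print(genuine_claim, impostor_claim)
```

Enrollment happens once per speaker, while verification runs every time an identity is claimed, which is why the two are kept as separate methods.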

Strongest security
  Something you have - e.g., badge
  Something you know - e.g., password
  Something you are - e.g., voice

[Figure: Have, Know, Are]

Security and Privacy

Consequences of a pervasive network
  Devices are numerous, ubiquitous and shared
  The network shares the context and preferences of the user
  Smart spaces are aware of the location and intent of the user

Security Concerns
  Only authorized individuals need to be given access
  Authentication should be minimally intrusive
  Devices should be trustworthy

Privacy issues
  User should be aware of when he is being observed
  The user context should be protected within the network

Speech production model: source-filter interaction
  Anatomical structure (vocal tract/glottis) conveyed in speech spectrum

[Figure: glottal pulses pass through the vocal tract to produce the speech signal]

Speech is a continuous evolution of the vocal tract
  Need to extract time series of spectra
  Use a sliding window - 20 ms window, 10 ms shift
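The sliding-window analysis can be sketched in a few lines: cut the signal into 20 ms frames every 10 ms, then take the Fourier transform magnitude of each frame. The sketch below makes several assumptions that are not in the slides: an 8000 Hz sample rate, a synthetic 1000 Hz test tone, no tapering window, and a direct DFT written with the standard library (a real system would normally use an FFT routine).

```python
import cmath
import math

def frame_signal(samples, sample_rate, window_ms=20, shift_ms=10):
    """Cut the signal into overlapping frames (20 ms window, 10 ms shift by default)."""
    window = sample_rate * window_ms // 1000
    shift = sample_rate * shift_ms // 1000
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, shift)]

def magnitude_spectrum(frame):
    """Magnitude of the discrete Fourier transform for bins 0..N/2 (direct O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

# Assumed test signal: a 1000 Hz tone sampled at 8000 Hz for 0.1 s.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]

frames = frame_signal(signal, sample_rate)
spectrogram = [magnitude_spectrum(f) for f in frames]  # one spectrum per frame

# The strongest bin of the first frame should sit at the tone's frequency.
peak_bin = max(range(len(spectrogram[0])), key=lambda k: spectrogram[0][k])
peak_hz = peak_bin * sample_rate / len(frames[0])
print(len(frames), "frames of", len(frames[0]), "samples; peak near", peak_hz, "Hz")
```

Each row of `spectrogram` is the spectrum of one 20 ms frame, so the list as a whole is the time series of spectra that the recognizer works from.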

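[Figure: each window passes through a Fourier Transform and a Magnitude step]

Produces time-frequency evolution of the spectrum

Verification decision approaches have roots in signal detection theory

$$\Lambda = \log \frac{\text{Likelihood } S \text{ came from speaker HMM}}{\text{Likelihood } S \text{ did not come from speaker HMM}}$$

  $\Lambda > \theta$: accept
  $\Lambda < \theta$: reject

[Figure: input S passes through feature extraction and is scored by the speaker model (+) and the impostor model (-). The decision stage compares $\Lambda$ with $\theta$.]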
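The log-likelihood ratio test above can be sketched as follows. Note the swap: the slides name HMMs for the speaker and impostor models, whereas this sketch substitutes a single diagonal-covariance Gaussian for each, purely to keep the example short. The model parameters, feature values, and threshold are hypothetical, and treating a score exactly equal to the threshold as a reject is also an assumption.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_likelihood_ratio(frames, speaker, impostor):
    """Average over frames of log p(x | speaker) - log p(x | impostor)."""
    total = sum(log_gaussian(x, *speaker) - log_gaussian(x, *impostor)
                for x in frames)
    return total / len(frames)

def decide(frames, speaker, impostor, theta=0.0):
    """Accept when the ratio exceeds the threshold theta, otherwise reject."""
    ratio = log_likelihood_ratio(frames, speaker, impostor)
    return "accept" if ratio > theta else "reject"

# Hypothetical models given as (mean, variance) pairs.
speaker_model = ([1.0, 2.0], [1.0, 1.0])
impostor_model = ([0.0, 0.0], [1.0, 1.0])

genuine_frames = [[1.1, 1.9], [0.9, 2.2]]
impostor_frames = [[0.1, -0.2], [-0.3, 0.1]]

genuine_decision = decide(genuine_frames, speaker_model, impostor_model)
impostor_decision = decide(impostor_frames, speaker_model, impostor_model)
print(genuine_decision, impostor_decision)
```

Raising `theta` makes the system stricter (fewer false accepts, more false rejects), and lowering it does the opposite.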

There are many factors to consider in the design of an evaluation of a speaker verification system

Speech quality
  Channel and microphone characteristics
  Noise level and type
  Variability between enrollment and verification speech

Speech modality
  Fixed/prompted/user-selected phrases
  Free text

Speech duration
  Duration and number of sessions of enrollment and verification speech

Speaker population
  Size and composition

Text-dependent recognition
  Recognition system knows text spoken by person
  Examples: fixed phrase, prompted phrase
  Used for applications with strong control over user input
  Knowledge of spoken text can improve system performance

Text-independent recognition
  Recognition system does not know text spoken by person
  Examples: User selected phrase, conversational speech
  Used for applications with less control over user input
  More flexible system but also more difficult problem
  Speech recognition can provide knowledge of spoken text

Example Performance Curve

[Figure: PROBABILITY OF FALSE ACCEPT (in %) plotted against PROBABILITY OF FALSE REJECT (in %), with operating regions labeled High Security, Balance, and High Convenience. Equal Error Rate (EER) = 1 %.]

Application operating point depends on relative costs of the two error types

Wire Transfer:
  False acceptance is very costly
  Users may tolerate rejections for security

Toll Fraud:
  False rejections alienate customers
  Any fraud rejection is beneficial
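The two error types and the equal error rate can be computed from verification scores with a simple threshold sweep. This is a sketch with made-up scores (the lists below are not measurements from any system), and the rule "accept when the score is at or above the threshold" is an assumption of the example.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false accept rate, false reject rate) at one decision threshold."""
    false_reject = sum(s < threshold for s in genuine) / len(genuine)
    false_accept = sum(s >= threshold for s in impostor) / len(impostor)
    return false_accept, false_reject

def equal_error_rate(genuine, impostor):
    """Sweep observed scores as thresholds; return (threshold, rate) where the errors are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fa, fr = error_rates(genuine, impostor, t)
        gap = abs(fa - fr)
        if best is None or gap < best[0]:
            best = (gap, t, (fa + fr) / 2)
    return best[1], best[2]

# Hypothetical scores: higher means "more like the claimed speaker".
genuine_scores = [2.1, 1.8, 2.5, 0.4, 1.9]
impostor_scores = [-1.0, 0.2, 0.6, -0.5, -0.8]

eer_threshold, eer = equal_error_rate(genuine_scores, impostor_scores)
strict = error_rates(genuine_scores, impostor_scores, threshold=1.8)   # security-leaning
lenient = error_rates(genuine_scores, impostor_scores, threshold=0.2)  # convenience-leaning
print(eer_threshold, eer, strict, lenient)
```

A high threshold trades false accepts for false rejects, which suits the wire transfer case, while a low threshold does the reverse, which suits applications where rejecting genuine users is the costlier error.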

[Figure: performance curves plotting Probability of False Accept (in %) against Probability of False Reject (in %) for the four conditions below, with an arrow labeled "Increasing constraints".]

Text-dependent (Combinations)
  Clean Data
  Single microphone
  Large amount of train/test speech

Text-independent (Conversational)
  Telephone Data
  Multiple microphones
  Moderate amount of training data

Text-dependent (Digit strings)
  Telephone Data
  Multiple microphones
  Small amount of training data

Text-independent (Read sentences)
  Military radio Data
  Multiple radios & microphones
  Moderate amount of training data

Voice over Telephone

[Figure: authentication that combines knowledge and biometric checks. Knowledge is authenticated against data and voice against voiceprints, leading to accept or reject.]
  System: Please enter your account number
  Caller: 5551234
  System: Say your date of birth
  Caller: October 13, 1964
  System: You're accepted by the system
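The telephone dialogue above combines a knowledge check (the account number and date of birth) with a biometric check (the voice that speaks them). A minimal sketch of that combination follows. The `Account` class, the `authenticate` function, the voice score, and the threshold are hypothetical names and values for illustration; the voice score is assumed to come from a separate speaker verification step.

```python
from dataclasses import dataclass

@dataclass
class Account:
    number: str
    date_of_birth: str  # knowledge factor stored for the account holder

def authenticate(account, entered_number, spoken_text, voice_score, threshold):
    """Accept only if both the knowledge check and the voice check pass."""
    knowledge_ok = (entered_number == account.number
                    and spoken_text == account.date_of_birth)
    biometric_ok = voice_score > threshold  # score from a speaker verifier (assumed)
    return knowledge_ok and biometric_ok

account = Account(number="5551234", date_of_birth="October 13, 1964")

# Right answers spoken in the right voice.
accepted = authenticate(account, "5551234", "October 13, 1964",
                        voice_score=2.6, threshold=0.0)
# Right answers spoken by someone else: knowledge passes, voice fails.
rejected = authenticate(account, "5551234", "October 13, 1964",
                        voice_score=-2.7, threshold=0.0)
print(accepted, rejected)
```

Requiring both checks means that a stolen account number and date of birth are not enough on their own to be accepted.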

[Figure: speaker recognition model circuit]

APPLICATIONS

Access control
  Physical facilities
  Data and data networks

Transaction authentication
  Toll fraud prevention
  Telephone credit card purchases
  Bank wire transfers

Monitoring
  Remote time and attendance logging
  Home parole verification
  Prison telephone usage

Information retrieval
  Customer information for call centers
  Audio indexing (speech skimming device)

Forensics
  Voice sample matching

Conclusions

Speaker verification is one of the few recognition areas where machines can outperform humans

Speaker verification technology is a viable technique currently available for applications

Speaker verification can be augmented with other authentication techniques to add further security

Future Directions

Research will focus on using speaker verification techniques for more unconstrained, uncontrolled situations
  Audio search and retrieval
  Increasing robustness to channel variabilities
  Incorporating higher levels of knowledge into decisions

Speaker recognition technology will become an integral part of speech interfaces
  Personalization of services and devices
  Unobtrusive protection of transactions and information

Date post:	02-Dec-2014
Upload:	kiran-conquer