
VoIP Data
IIIT Allahabad

Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas 75275, USA
[email protected]

Support provided by Fulbright Grant and IIIT Allahabad

IIIT Allahabad

VoIP Data Outline

• VoIP overview
• CDR
• CDR Example using EMM


VoIP Overview

http://www.voipmechanic.com/what-is-voip.htm


VoIP Advantages
• Travel (the same service can be used wherever there is an Internet connection)
• Cost reduction
• Additional features: voice messages, call forwarding, logs, caller ID, …
• Integration of business tools
• Common network infrastructure


VoIP Disadvantages
• Needs a reliable broadband Internet connection
• Voice quality can suffer (e.g., from packet delay, jitter, or loss)


Telephone-VoIP Steps
• An Analog Telephone Adapter (ATA) converts the analog phone call to a digital signal.
• The digital signal is sent over the Internet as data packets.
• At the far end, it is converted back from digital to analog.


VoIP Codec
• Software on a server or ATA that converts the voice signal into digital data (and back).
• COmpressor – DECompressor
• COder – DECoder
• Steps:
  • Sample (8000, 24000, 32000 times per second)
  • Sort
  • Compress
  • Packetize
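The Sample, Compress, and Packetize steps can be sketched in a few lines. The sketch below makes several assumptions that are not on the slide:

• Compression uses the continuous μ-law curve (μ = 255). Real G.711 instead uses a segmented table and its own bit layout.
• Packets are 20 ms frames at 8000 samples per second.
• The "Sort" step is skipped.

The function names are hypothetical.

```python
import math

MU = 255  # mu-law constant, as used by G.711 mu-law

def sample(signal, rate_hz, seconds):
    """Sample: read a continuous signal (a function of time in seconds,
    values in [-1, 1]) rate_hz times per second."""
    return [signal(i / rate_hz) for i in range(int(rate_hz * seconds))]

def compress(samples):
    """Compress: squeeze each sample into one byte with the continuous
    mu-law curve, which gives quiet sounds more of the 256 levels."""
    out = bytearray()
    for x in samples:
        x = max(-1.0, min(1.0, x))                       # clip to [-1, 1]
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        out.append(round((y + 1) / 2 * 255))             # map [-1, 1] to 0..255
    return bytes(out)

def packetize(payload, rate_hz, frame_ms=20):
    """Packetize: cut the one-byte-per-sample stream into frame_ms chunks."""
    per_frame = rate_hz * frame_ms // 1000
    return [payload[i:i + per_frame] for i in range(0, len(payload), per_frame)]

def tone(t):
    """A 440 Hz test tone at half amplitude."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

packets = packetize(compress(sample(tone, 8000, 1.0)), 8000)
print(len(packets), len(packets[0]))   # 50 160
```

At 8000 samples per second and one byte per sample, each 20 ms packet carries 160 bytes of voice, and one second of speech needs 50 packets.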


Protocols
• SIP (Session Initiation Protocol)
  • Signaling to set up and tear down sessions
• SDP (Session Description Protocol)
  • Describes the call
• RTP (Real-time Transport Protocol)
  • Exchanges data/voice packets
  • Media transport to transmit packets
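To make the RTP bullet concrete, the sketch below packs and unpacks the 12-byte fixed RTP header defined in RFC 3550. It assumes no padding, header extension, or CSRC list, and it uses payload type 0 (G.711 μ-law). The function names and the SSRC value are hypothetical.

```python
import struct

RTP_VERSION = 2
PT_PCMU = 0  # static RTP payload type for G.711 mu-law audio

def pack_rtp_header(seq, timestamp, ssrc, payload_type=PT_PCMU, marker=False):
    """Pack the 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    byte0 = RTP_VERSION << 6                              # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # marker bit + 7-bit type
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data):
    """Inverse of pack_rtp_header, applied to the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}

# Consecutive 20 ms G.711 packets: sequence number +1, timestamp +160 (8 kHz clock).
ssrc = 0x1234ABCD  # hypothetical stream identifier
headers = [pack_rtp_header(seq, seq * 160, ssrc) for seq in range(3)]
```

The receiver uses the sequence number to detect lost or reordered packets. It uses the timestamp to play the audio back at the right time.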


SIP
• Setup
• Connect
• Disconnect
• Syntax similar to HTTP
• Binds an address to an IP address using SIP registration
• URL-style addresses, e.g. [email protected]
• Independent of application or data types
• Works with SDP (to describe the session) and RTP (to carry the media)
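The "syntax similar to HTTP" point can be seen by formatting a SIP request as text. The message has a start line, "Name: value" header lines, an empty line, and then a body. The example shows a REGISTER, which binds a user's address to an IP address. This is a minimal sketch, not a complete RFC 3261 message: it omits mandatory headers such as Via and Max-Forwards. All addresses, tags, and the helper name are hypothetical.

```python
def sip_request(method, request_uri, headers, body=""):
    """Format a SIP request like an HTTP/1.1 request: start line,
    'Name: value' header lines, an empty line, then the body."""
    lines = [f"{method} {request_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body

# REGISTER binds the user's SIP address to the IP address given in Contact.
msg = sip_request("REGISTER", "sip:registrar.example.com", {
    "To": "<sip:alice@example.com>",
    "From": "<sip:alice@example.com>;tag=1928301774",
    "Call-ID": "a84b4c76e66710@192.0.2.10",
    "CSeq": "1 REGISTER",
    "Contact": "<sip:alice@192.0.2.10:5060>",
})
print(msg)
```

An INVITE that sets up a call would use the same layout, with an SDP description of the call as the body.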


SIP Overview

http://www.voipmechanic.com/sip-basics.htm


VoIP Data Packet [4]
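Each voice packet carries a small payload plus IP, UDP, and RTP headers, and these headers dominate the per-call bandwidth that reference [4] discusses. The arithmetic below is a sketch under the following assumptions:

• IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes) headers, with no header compression.
• One direction of one call.
• Where noted, an 18-byte Ethernet header and trailer.

The function name is hypothetical.

```python
# Header sizes in bytes: IPv4 without options, UDP, RTP fixed header
IP_HEADER, UDP_HEADER, RTP_HEADER = 20, 8, 12

def per_call_bandwidth_bps(codec_bps, packet_ms, layer2_bytes=0):
    """Bits per second for one direction of one call."""
    payload = codec_bps * packet_ms / 8000      # bytes of voice per packet
    packets_per_s = 1000 / packet_ms
    packet = payload + IP_HEADER + UDP_HEADER + RTP_HEADER + layer2_bytes
    return packet * 8 * packets_per_s

print(per_call_bandwidth_bps(64000, 20))       # G.711: 80000.0
print(per_call_bandwidth_bps(8000, 20))        # G.729: 24000.0
print(per_call_bandwidth_bps(64000, 20, 18))   # G.711 + Ethernet: 87200.0
```

For an 8 kbps codec, the 40 bytes of headers are twice the size of the 20-byte voice payload in every packet.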


VoIP Data
• Any of this digital data could be saved and analyzed.
• Typically, only statistical/summary information about the calls is saved.
• These Call Detail Records (CDRs) are used for billing and analysis.


Call Detail Record
• Log of VoIP usage
• May be kept per account
• Typical attributes:
  • Source
  • Destination
  • Duration of call
  • Amount billed
  • Total usage time in billing period
  • Remaining time in billing period
  • Total charge in billing period
• The format of the CDR varies among VoIP providers and programs. Some programs allow the user to configure the CDR format.
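The per-period attributes (total usage, remaining time, total charge) can be derived from the per-call attributes. The sketch below assumes a record layout built from the attributes listed above and a hypothetical plan with a fixed allowance of seconds per billing period. As the slide notes, real CDR formats vary by provider. All names and values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CallDetailRecord:
    account: str
    source: str
    destination: str
    duration_s: int        # duration of call in seconds
    amount_billed: float   # charge for this call

def billing_period_summary(records, allowance_s):
    """Per-account totals for one billing period (allowance_s is an assumed plan limit)."""
    usage = defaultdict(int)
    charge = defaultdict(float)
    for r in records:
        usage[r.account] += r.duration_s
        charge[r.account] += r.amount_billed
    return {acct: {"total_usage_s": usage[acct],
                   "remaining_s": max(0, allowance_s - usage[acct]),
                   "total_charge": round(charge[acct], 2)}
            for acct in usage}

records = [
    CallDetailRecord("acct-1", "alice", "bob", 120, 0.10),
    CallDetailRecord("acct-1", "alice", "carol", 300, 0.25),
    CallDetailRecord("acct-2", "dave", "erin", 60, 0.05),
]
summary = billing_period_summary(records, allowance_s=3600)
```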


CDR Generation [3]
• Usually created by a dedicated Authentication, Authorization, and Accounting (AAA) server.
• May also be created by the logging capabilities of a gateway or router, using syslog server software.
• Normally in simple CSV format.
• Syslog normally uses UDP, so the underlying data packets are not sequenced and may be lost. (Redundant servers can help.)
• Timestamps from different routers can be synchronized using the Network Time Protocol (NTP).
• A CDR is generated for both the forward and the return leg of a call.
• http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
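Three points above combine in practice: CDRs arrive as CSV, each call should produce a record per leg, and UDP may drop records. The sketch below groups CSV rows by call and flags calls whose forward or return leg never arrived. The column names (call_id, leg, start, duration_s) and the sample rows are hypothetical; real exports use vendor-specific fields.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column layout and made-up rows
SAMPLE_CSV = """call_id,leg,start,duration_s
1,forward,2003-09-22 12:17:32,60
1,return,2003-09-22 12:17:32,60
2,forward,2003-09-22 12:20:05,45
"""

def group_legs(csv_text):
    """Group CDR rows by call id: {call_id: {leg_name: row_dict}}."""
    calls = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        calls[row["call_id"]][row["leg"]] = row
    return calls

def incomplete_calls(calls):
    """Call ids missing the forward or the return leg, e.g. because the
    UDP datagram carrying that record was lost."""
    return sorted(cid for cid, legs in calls.items()
                  if not {"forward", "return"}.issubset(legs))

print(incomplete_calls(group_legs(SAMPLE_CSV)))   # ['2']
```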


Example: CISCO CDR Data
• VoIP traffic in Cisco's Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003
• Over 1.5 million call attempts were logged
• 272,646 connected calls
• 66 attributes, including source, destination, starting time, duration, routing/switching, device, etc.
• Application: Anomaly Detection (Classification)
• Goal: Find unusual call patterns based on the type and time of call
• Technique: New data structure, new classification algorithm, new visualization technique
• Sample of raw CSV data: http://lyle.smu.edu/~mhd/iiit/start.csv


CISCO Preprocessing
• Remove all attributes from the logs except source, destination, starting time, and duration.
• Count the connected calls and discard unconnected calls.
• The total number of connected calls was 272,646.
• 5 phone classes: internal, local, national, international, unknown.
• 25 link classes (each pairing of a source class with a destination class, 5 × 5).
• Data is aggregated into 15-minute time intervals.
• The total number of time points is 5422, and the total number of attributes is 26.
• Add two attributes, namely type of day (workday or weekend) and time of day, to the processed data. This step gives a spatiotemporal cube in the model space.
• http://www.engr.smu.edu/~mhd/7331f08/CISCOEMM.xls
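The aggregation step can be sketched as follows, under stated assumptions:

• Each call has already been reduced to (start time, source class, destination class, connected flag).
• Time of day is encoded as the index of the 15-minute interval (0 to 95).

Unconnected calls are dropped, and connected calls are counted per 15-minute interval in each of the 25 link classes. Empty intervals are kept as all-zero vectors. This is not the authors' preprocessing code, and the sample calls are made up.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from itertools import product

PHONE_CLASSES = ["internal", "local", "national", "international", "unknown"]
LINK_CLASSES = list(product(PHONE_CLASSES, repeat=2))   # 25 (source, destination) pairs
INTERVAL = timedelta(minutes=15)

def interval_start(ts):
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)

def aggregate(calls):
    """calls: iterable of (start_time, source_class, destination_class, connected).
    Returns {interval_start: 25 call counts, one per link class}, empty intervals included."""
    counts = defaultdict(Counter)
    for start, src, dst, connected in calls:
        if connected:                           # unconnected calls are discarded
            counts[interval_start(start)][(src, dst)] += 1
    if not counts:
        return {}
    series, t, last = {}, min(counts), max(counts)
    while t <= last:
        series[t] = [counts[t][link] for link in LINK_CLASSES]
        t += INTERVAL
    return series

def time_attributes(t):
    """The two added attributes: type of day and time of day (interval index 0..95)."""
    day_type = "weekend" if t.weekday() >= 5 else "workday"
    return day_type, (t.hour * 60 + t.minute) // 15

calls = [  # made-up example calls
    (datetime(2003, 9, 22, 12, 17, 32), "internal", "local", True),
    (datetime(2003, 9, 22, 12, 29, 0), "internal", "local", True),
    (datetime(2003, 9, 22, 12, 40, 0), "internal", "internal", False),
    (datetime(2003, 9, 22, 12, 50, 0), "local", "international", True),
]
series = aggregate(calls)
```

Each interval becomes one vector of link-class counts. Together with the two time attributes, these vectors form the input stream for the model.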


CISCO Data Visualization


http://www.lyle.smu.edu/~mhd/7331f11/CiscoEMM.png


Spatiotemporal Stream Data

• Records may arrive at a rapid rate
• High volume (possibly infinite) of continuous data
• Concept drift: data distribution changes on the fly
• Data does not necessarily fit any distribution pattern
• Multidimensional
• Temporal
• Spatial
• Data are collected in discrete time intervals
• Data are in structured format, <a1, a2, …>
• Data hold an approximation of the Markov property


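Spatiotemporal Environment
• Events arriving in a stream
• At any time, t, we can view the state of the problem as represented by a vector of n numeric values:

$$V_t = \langle S_{1t}, S_{2t}, \ldots, S_{nt} \rangle$$

Over time, these vectors form a matrix whose columns are successive time points:

$$
\begin{array}{c|cccc}
 & V_1 & V_2 & \cdots & V_q \\
\hline
S_1 & S_{11} & S_{12} & \cdots & S_{1q} \\
S_2 & S_{21} & S_{22} & \cdots & S_{2q} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
S_n & S_{n1} & S_{n2} & \cdots & S_{nq}
\end{array}
$$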


Data Stream Modeling
• Single pass: each record is examined at most once
• Bounded storage: limited memory for storing a synopsis
• Real-time: per-record processing time must be low
• Summarization (synopsis) of data
• Use all of the data, not a sample
• Temporal and spatial
• Dynamic
• Continuous (infinite stream)
• Learn
• Forget
• Sublinear growth rate: clustering


MM
A first order Markov Chain is a finite or countably infinite sequence of events $\{E_1, E_2, \ldots\}$ over discrete time points, where $P_{ij} = P(E_j \mid E_i)$, and at any time the future behavior of the process depends solely on the current state.

A Markov Model (MM) is a graph with $m$ vertices or states, $S$, and directed arcs, $A$, such that:
• $S = \{N_1, N_2, \ldots, N_m\}$,
• $A = \{L_{ij} \mid i \in \{1, 2, \ldots, m\},\ j \in \{1, 2, \ldots, m\}\}$, and
• each arc $L_{ij} = \langle N_i, N_j \rangle$ is labeled with a transition probability $P_{ij} = P(N_j \mid N_i)$.
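The transition probabilities can be estimated from an observed sequence of states. Count each transition from one state to the next, then divide by the number of transitions leaving the source state. This is the standard maximum-likelihood estimate for a first order Markov chain. Exact fractions are used so the results read like the arc labels on the slides (1/3, 2/3, …). The state sequence is made up.

```python
from collections import Counter, defaultdict
from fractions import Fraction

def estimate_transitions(states):
    """Estimate P_ij = P(N_j | N_i) by counting transitions N_i -> N_j
    and dividing by all transitions out of N_i."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {i: {j: Fraction(n, sum(out.values())) for j, n in out.items()}
            for i, out in counts.items()}

P = estimate_transitions(["N1", "N1", "N2", "N1", "N3"])
# P["N1"] == {"N1": 1/3, "N2": 1/3, "N3": 1/3}; P["N2"] == {"N1": 1}
```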


Extensible Markov Model (EMM)
• Time-varying, discrete, first order Markov Model
• Nodes are clusters of real-world states.
• Learning continues during the application phase.
• Learning:
  • Transition probabilities between nodes
  • Node labels (centroid/medoid of cluster)
  • Nodes are added and removed as data arrives
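The following is a simplified sketch of how an EMM can grow while it learns. It is not the published EMM algorithm. Its assumptions are:

• Euclidean distance.
• A fixed distance threshold.
• Nearest-node assignment.
• Running-mean centroids as node labels.
• Transition probabilities normalized by the transitions leaving a node. The published EMM may normalize differently.
• No node removal or forgetting.

In the usage example, the six input vectors listed on the EMM Creation slide are fed in with an arbitrary threshold of 3. The result depends on these choices and is not meant to reproduce the slide's figure.

```python
import math
from collections import Counter, defaultdict
from fractions import Fraction

class EMMSketch:
    """Simplified EMM: each node is a cluster (centroid + count); arcs carry
    transition counts. Node removal and forgetting are not implemented."""

    def __init__(self, threshold):
        self.threshold = threshold          # max distance to join an existing node
        self.centroids = []                 # node labels (cluster centroids)
        self.counts = []                    # input vectors absorbed by each node
        self.arcs = defaultdict(Counter)    # arcs[i][j] = times node i was followed by j
        self.current = None                 # node of the previous input

    def _nearest(self, v):
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(v, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def add(self, v):
        """Learn from one input vector; return the index of its node."""
        i, d = self._nearest(v)
        if i is None or d > self.threshold:
            # No node is close enough: the model extends itself with a new node
            self.centroids.append([float(x) for x in v])
            self.counts.append(1)
            i = len(self.centroids) - 1
        else:
            # Absorb v into node i and move the centroid (running mean)
            self.counts[i] += 1
            n = self.counts[i]
            self.centroids[i] = [c + (x - c) / n for c, x in zip(self.centroids[i], v)]
        if self.current is not None:
            self.arcs[self.current][i] += 1  # update the transition count on the arc
        self.current = i
        return i

    def transition_probability(self, i, j):
        out = sum(self.arcs[i].values())
        return Fraction(self.arcs[i][j], out) if out else Fraction(0)

vectors = [(18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
           (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0)]
emm = EMMSketch(threshold=3.0)
assigned = [emm.add(v) for v in vectors]   # [0, 0, 0, 1, 1, 0]
```

With this threshold, the fourth vector is too far from the first node, so a second node is created. The last vector returns to the first node, which adds an arc back to it.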


EMM Creation

(Figure: an EMM built step by step from the input vectors <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, and <18,10,3,3,1,1,0>. Its panels show nodes N1, N2, and N3, with arcs labeled by transition probabilities such as 1/3, 2/3, 1/2, and 1/1.)


EMMRare
• The EMMRare algorithm indicates whether the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following occurs:
  • The frequency of the node at time t+1 is below this threshold.
  • The updated transition probability of the MC transition from the node at time t to the node at time t+1 is below the threshold.
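The two EMMRare conditions can be sketched over a stream of node labels. The sketch assumes the clustering step has already mapped each input to a node. It takes node frequency to be the node's count divided by the number of inputs seen so far, and it normalizes the transition probability by the transitions leaving the previous node. These are assumptions, and the class name is hypothetical.

```python
from collections import Counter, defaultdict

class RareEventDetector:
    """EMMRare-style test (sketch); threshold is the occurrence percentage
    as a fraction, e.g. 0.05 for 5%."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.node_counts = Counter()
        self.arcs = defaultdict(Counter)   # arcs[a][b] = transitions a -> b
        self.prev = None                   # node at time t
        self.total = 0

    def observe(self, node):
        """Update counts with the node at time t+1; return True if the event is rare."""
        self.total += 1
        self.node_counts[node] += 1
        # Condition 1: frequency of the node at time t+1 below the threshold
        rare = self.node_counts[node] / self.total < self.threshold
        if self.prev is not None:
            self.arcs[self.prev][node] += 1
            out = sum(self.arcs[self.prev].values())
            # Condition 2: updated transition probability prev -> node below the threshold
            rare = rare or self.arcs[self.prev][node] / out < self.threshold
        self.prev = node
        return rare

detector = RareEventDetector(threshold=0.2)
flags = [detector.observe(n) for n in ["A"] * 9 + ["B"]]   # only the final "B" is rare
```

Early in a stream, every node has a high relative frequency, so a detector like this needs some warm-up data before its flags are meaningful.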


Sublinear Growth Rate


Rare Event in Cisco Data


References
1. VoIP Mechanic, “What is VoIP?, a tutorial.” http://www.voipmechanic.com/what-is-voip.htm
2. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp. 629-634.
3. Cisco, “CDR Logging Configuration with Syslog Servers and Cisco IOS Gateways,” Document ID: 14068, February 24, 2006, http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
4. Cisco, “Voice Over IP – Per Call Bandwidth Consumption,” Document ID: 7934, February 2, 2008, http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml
5. “VoIPThink,” http://www.en.voipforo.com, accessed February 1, 2012.
6. Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings of the IEEE ICDM Conference, November 2004, pp. 371-374.
7. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol. 3918, 2006, Springer Berlin/Heidelberg, pp. 750-754.)
8. Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol. 1, No. 3, June 2006, pp. 43-50.
9. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol. 3918, 2006, Springer Berlin/Heidelberg, pp. 750-754.) (Extended version submitted to Journal of Computers.)

