VoIP and Telephony

Date post: 05-Apr-2018
Upload: pachito-marco-calabrese
  • 7/31/2019 VoIP and Telephony


    1 Introduction

    VoIP stands for Voice over Internet Protocol. VoIP is a term that describes a set of technologies for providing telephony over the Internet. The main differences from classic telephony are the way the voice is transported and the type of network it travels on. Classic telephony uses the PSTN (Public Switched Telephone Network), which is a circuit-switched network, while VoIP uses the Internet, which is a packet-switched network. Because the audio flow is transported differently, the architecture, the protocols, and all the mechanisms for authentication, compression, and so on must be implemented in a completely different fashion. From the user's point of view, however, the result is still a telephony service.

    2 VoIP

    Figure 1: Logical VoIP network

    The purpose of VoIP technology is to provide a telephony service over the Internet. It takes an audio stream (the voice) and sends it over the Internet Protocol using a broadband connection (one of the best-known VoIP services is Skype). A VoIP call can be established between two computers, between a computer and a fixed phone, or between a computer and a mobile phone. Some smartphones are also VoIP-enabled and, thanks to their WiFi connection, can place calls from any WiFi hotspot. VoIP is also known as Internet Telephony or Voice over Broadband. Because the calls go through the Internet, they are not charged by the phone operator, so VoIP can be a cheaper way to call. Before exploring the pros and cons, let us see how the architecture of a VoIP service is organized, how the protocols work, and how security is handled. In this document we focus on the Session Initiation Protocol (IETF RFC 3261), which is one of the most widely used open standard protocols for VoIP communication. We will also look at some concepts of the Skype protocol. There are other protocols as well, but it is difficult to cover all of them. The standard defines SIP as follows:

    SIP is an application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants. These sessions include Internet telephone calls, multimedia distribution, and multimedia conferences.

    Basically, SIP is the signalling protocol that manages a session (for example, a call between users) and is used to route calls and other multimedia sessions.

    2.1 Architecture

    The SIP network is shown in Figure 1. It consists of the following logical entities:

    User Agent (UA)

    Registrars

    Location server

    Redirect server

    Proxy server


    From a logical point of view, a UA can act as a UAC (UA client) or as a UAS (UA server); in SIP these roles hold for the duration of a single transaction, not for the whole session. SIP thus establishes communication in a client-server fashion between the UAC and the UAS: the UAC sends requests and the UAS answers with responses. The main function of the SIP servers is to provide name resolution and user location, since the caller is unlikely to know the IP address or host name of the called party, and to pass messages on to other servers using next-hop routing protocols. Let us see how SIP accomplishes these tasks and how these logical entities are involved in the process.

    2.1.1 Registration

    Figure 2: Registration procedure

    [7] Registration is the procedure that makes a user reachable: before a user can be called, the network must know where that user can be found. The logical entity involved in the registration procedure is the Registrar. Referring to Figure 2, the user Carol sends a REGISTER request to the Registrar. The Registrar then stores Carol's current address in the database of the location server, so Carol can now be reached. Now that Carol is registered, Bob wants to call her (INVITE request), so he types Carol's SIP address-of-record, sip:carol@chicago.com, on his VoIP-enabled smartphone. The domain chicago.com is resolved by DNS (in the same way as for e-mail), and the INVITE request is sent to the SIP proxy server sip.chicago.com. The proxy server receives Carol's SIP address-of-record, queries the location server (Query), and gets back (Resp) the SIP URI that identifies the particular endpoint where Carol can currently be reached. Now that the proxy server has this SIP URI, it can forward the INVITE request to Carol. The REGISTER procedure must be repeated periodically to keep the bindings stored by the Registrar alive. Each REGISTER request received by the Registrar is answered with a response to the terminal that lists all the current bindings and their expiration timers.
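    A minimal Python sketch of the registrar and location-server roles described above, assuming a single in-memory table keyed by address-of-record. `LocationService`, `register`, and `lookup` are hypothetical names, the contact address `sip:carol@192.0.2.10` is made up, and authentication and the other REGISTER processing rules of RFC 3261 are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Binding:
    contact: str       # SIP URI of one concrete endpoint (device) of the user
    expires_at: float  # absolute time (seconds) at which the binding lapses


@dataclass
class LocationService:
    """Hypothetical in-memory location service: written by a registrar, read by a proxy."""
    bindings: Dict[str, List[Binding]] = field(default_factory=dict)

    def register(self, aor: str, contact: str, expires: int,
                 now: Optional[float] = None) -> List[Tuple[str, int]]:
        """Handle a REGISTER: add or refresh one binding, then return all live bindings."""
        now = time.time() if now is None else now
        others = [b for b in self.bindings.get(aor, []) if b.contact != contact]
        self.bindings[aor] = others + [Binding(contact, now + expires)]
        return self.lookup(aor, now)

    def lookup(self, aor: str, now: Optional[float] = None) -> List[Tuple[str, int]]:
        """Proxy-side query: return (contact, seconds left) for unexpired bindings."""
        now = time.time() if now is None else now
        live = [b for b in self.bindings.get(aor, []) if b.expires_at > now]
        self.bindings[aor] = live
        return [(b.contact, int(b.expires_at - now)) for b in live]


if __name__ == "__main__":
    loc = LocationService()
    # Carol's phone registers; the contact address is a made-up example.
    print(loc.register("sip:carol@chicago.com", "sip:carol@192.0.2.10", expires=3600, now=0))
    # [('sip:carol@192.0.2.10', 3600)]
    print(loc.lookup("sip:carol@chicago.com", now=1800))  # [('sip:carol@192.0.2.10', 1800)]
    print(loc.lookup("sip:carol@chicago.com", now=3601))  # [] (binding expired)
```

    The explicit `now` argument is only there to make the expiry behaviour easy to demonstrate; a real registrar would use the current time.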


    2.1.2 Dialogs

    Figure 3: Example of an INVITE transaction

    Once users have a SIP URI and a valid binding with the Registrar, they can be reached.

    A dialog represents a peer-to-peer SIP relationship between two user agents that persists for some time.

    A dialog is a flow of messages properly routed between the UAs involved in it. A dialog is identified by a dialog ID, which is composed of:

    Call-id

    Local tag

    Remote tag

    The Call-ID identifies a unique group of messages between UAs; it is usually formed by a random string followed by @ and the host name or IP address of the UA that generated it. The local tag at one UA is identical to the remote tag at the peer UA. Dialogs are created by specific requests and by their non-failure responses (provisional responses 101-199 and final 2XX responses). The messages are exchanged in a client-server fashion, and the request that establishes a dialog is the INVITE request. The INVITE request is generated by the UA that wants to call another UA, and it starts an INVITE transaction in which the three-way handshake (INVITE/200/ACK) is performed. Let us take Figure 3 as an example.

    The INVITE request is sent by the calling UA to the called UA, each identified by its SIP URI. The INVITE message contains information used to control the call (such as where the message comes from, to whom it is addressed, and the Call-ID) and also a listing of the media types and associated encodings that the calling party is willing to use. Some of this information is used to route the INVITE message, and some of it will be useful later to set up the dialog (for example, the list of audio codecs). All the messages are sent through proxies, which forward and route them to the correct UA. If the negotiation ends with positive responses, the dialog is set up and the users are connected peer-to-peer. Once the dialog is established between two UAs, the UA that sends a request acts as UAC and the UA that responds acts as UAS (note that these may be different roles from the ones the UAs held during the transaction that established the dialog). Some of the responses sent by the UAS and received by the UAC are:

    100 Trying - This response indicates that the request has been received by the next-hop server and that some unspecified action is being taken on behalf of this call (for example, a database is being consulted). This response, like all other provisional responses, stops retransmissions of an INVITE by a UAC.

    180 Ringing - The UA receiving the INVITE is trying to alert the user. This response MAY be used to initiate local ringback.

    200 OK - The request has succeeded. The information returned with the response depends on the method used in the request.

    481 Call/Transaction Does Not Exist - This status indicates that the UAS received a request that does not match any existing dialog or transaction.

    484 Address Incomplete - The server received a request with a Request-URI that was incomplete. Additional information SHOULD be provided in the reason phrase.

    SIP communication follows a client-server (request/response) model, and its architecture and protocol are clearly inspired by other Internet protocols such as HTTP, IMAP, and POP3. Remember that SIP handles only the signalling and the exchange of messages between the UAs; it does not carry the audio flow itself. The audio flow is exchanged directly between the peers involved in the conversation.
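    The dialog ID rules described in this section can be sketched as follows. The sketch assumes the RFC 3261 convention that the UA which sent the dialog-creating request (UAC) uses its From tag as local tag, while the UA which answered (UAS) uses the To tag it added as local tag. The helper names are hypothetical, and the Call-ID and tag values are illustrative (the From tag 1928301774 is the one used later in this document).

```python
from typing import NamedTuple


class DialogId(NamedTuple):
    call_id: str
    local_tag: str
    remote_tag: str


def dialog_id(call_id: str, from_tag: str, to_tag: str, is_uac: bool) -> DialogId:
    """Dialog ID as seen by one UA of the dialog (hypothetical helper).

    The UAC uses its From tag as the local tag; the UAS uses the To tag
    it added to its response as the local tag.
    """
    if is_uac:
        return DialogId(call_id, local_tag=from_tag, remote_tag=to_tag)
    return DialogId(call_id, local_tag=to_tag, remote_tag=from_tag)


def same_dialog(a: DialogId, b: DialogId) -> bool:
    """Two UAs share a dialog when the Call-ID matches and the tags are swapped."""
    return (a.call_id == b.call_id
            and a.local_tag == b.remote_tag
            and a.remote_tag == b.local_tag)


if __name__ == "__main__":
    call_id = "a84b4c76e66710@pc33.atlanta.com"   # illustrative values
    alice = dialog_id(call_id, from_tag="1928301774", to_tag="a6c85cf", is_uac=True)
    bob = dialog_id(call_id, from_tag="1928301774", to_tag="a6c85cf", is_uac=False)
    print(alice.local_tag, bob.local_tag)   # 1928301774 a6c85cf
    print(same_dialog(alice, bob))          # True
```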

    2.1.3 SIP network and other networks

    Figure 4: Media Gateway interconnect VoIP network to TDM network

    Besides calls from one VoIP phone to another (VoIP phones are also called softphones when they are software running on a smartphone or PC), it is possible to call between a classic phone (connected to the PSTN) and a VoIP phone. This is possible thanks to gateways, which interconnect the SIP network with other kinds of networks. Basically, the gateway converts SS7 signalling (the signalling used on the TDM network) into VoIP protocols such as SIP or H.323. For example, a typical TDM-to-VoIP media gateway supports the signalling protocols shown in Figure 5.


    Figure 5: Protocols supported by a typical TDM to VoIP media gateway

    2.2 Protocols

    2.2.1 Protocol stack

    Figure 6: SIP protocol stack

    The SIP protocol stack is shown in Figure 6. SIP runs over both the UDP and TCP transport layers. SIP itself is only a signalling protocol, which means that to carry a call we need other protocols that take care of the real-time audio stream and of the codecs that compress the audio. In this section we look at SIP, SDP, RTP, and the codecs used.


    2.2.2 SIP

    [7] As we have already seen, SIP works in a client-server fashion. For registration, for the INVITE transaction, and for call control, UAs send and receive requests and responses. The SIP methods are:

    INVITE

    ACK

    OPTIONS

    CANCEL

    BYE

    REGISTER

    These methods are sent by the UAC (UA client), while the UAS sends the responses. The responses are grouped into six classes:

    1XX Provisional 100 Trying

    2XX Successful 200 OK

    3XX Redirection 302 Moved Temporarily

    4XX Client Error 404 Not found

    5XX Server Error 504 Server Time-out

    6XX Global failure 603 Decline
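    As the list shows, the class of a response is given by the first digit of its status code. A small Python sketch of that rule, together with a parser for a response status line; the function names are hypothetical.

```python
from typing import Tuple

RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def response_class(status_code: int) -> str:
    """The first digit of a SIP status code (100-699) selects its class."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1XX responses are provisional; all other classes are final responses."""
    return response_class(status_code) != "Provisional"


def parse_status_line(line: str) -> Tuple[int, str]:
    """Split a status line such as 'SIP/2.0 180 Ringing' into code and reason phrase."""
    version, code, reason = line.split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"unexpected protocol version: {version}")
    return int(code), reason


if __name__ == "__main__":
    for line in ["SIP/2.0 100 Trying", "SIP/2.0 180 Ringing", "SIP/2.0 200 OK"]:
        code, reason = parse_status_line(line)
        kind = "final" if is_final(code) else "provisional"
        print(code, reason, response_class(code), kind)
```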

    These methods and responses are used in text-encoded messages exchanged between UAs. To see how these messages are built, let us take an example similar to the previous one and analyze step by step how an INVITE session works from a protocol point of view.

    Figure 7:


    [7, 8] Alice has the SIP URI sip:alice@atlanta.com. She wants to call Bob, so she types Bob's SIP URI, sip:bob@biloxi.com, on her softphone, where biloxi.com is the domain of Bob's SIP service provider. Once Alice has typed Bob's SIP URI, she places the call. Placing the call means that her softphone sends an INVITE message over the network.

    The first line of the message (the request line) contains INVITE, which is the method used, then the SIP URI of the user we want to call, and finally the protocol with its version. The other lines are the header fields of the message. Let us go through each of them.

    Via contains the protocol name and version (SIP/2.0) and the transport protocol used (UDP in this case), followed by the address (pc33.atlanta.com) at which Alice expects to receive responses to this request, and finally the branch parameter that identifies this transaction.

    Max-Forwards limits the number of hops the request can make on its way to the destination; here the request cannot travel more than 70 hops. Each hop decrements this value by one.

    To contains the display name (Bob) and, in angle brackets, the SIP URI towards which the request was originally directed.

    From contains the display name and SIP URI of the originator of the request. It has the same format as the To field, but also has a tag parameter containing a random string (1928301774) that was added to the URI by the softphone; it is used for identification purposes.

    Call-ID contains a globally unique identifier for this call, generated by combining a random string with the softphone's host name or IP address. The combination of the To tag, the From tag, and the Call-ID completely defines a peer-to-peer SIP relationship between Alice and Bob. This field is unique for each call.

    CSeq (Command Sequence) contains an integer and a method name. The CSeq number is incremented for each new request within a dialog and is a traditional sequence number; it helps to track the command history of a dialog.

    Contact contains a SIP or SIPS URI that represents a direct route to contact Alice, usually composed of a username at a fully qualified domain name (FQDN). While an FQDN is preferred, many end systems do not have registered domain names, so IP addresses are permitted. This field might look similar to Via, but the two have different purposes: the Via header field tells other elements where to send the response, while the Contact header field tells other elements where to send future requests, so that later requests can be routed directly to Alice.

    Content-Type contains a description of the message body; in this case the body is an SDP packet (application/sdp).

    Content-Length contains the length of the message body in octets, in this case 142 bytes.

    The message body is not shown, but from the Content-Type and Content-Length headers we can see that it is an SDP packet 142 bytes long. The SIP message body contains a Session Description Protocol (SDP) packet, which carries details about the session, such as the type of media, the codec, or the sampling rate, that are not described using SIP. SDP is discussed later in this document. Referring to Figure 7, we can see that the message is forwarded to the SIP proxy server that serves Alice's domain; atlanta.com can be resolved by DNS, or its IP address can be stored in Alice's softphone. The atlanta.com proxy server receives the INVITE message and responds with a 100 (Trying), which means that the server has received the request and is processing it in order to route it to its destination. The atlanta.com proxy server obtains the IP address of the biloxi.com proxy server with a DNS query and forwards the INVITE request there. Before forwarding the request, however, the atlanta.com proxy server adds an additional Via header field value that contains its own address (and decrements Max-Forwards by one), so the message sent from atlanta.com to biloxi.com carries two Via header field values.

    The biloxi.com proxy server receives the INVITE and responds with a 100 (Trying), so the atlanta.com proxy knows that the INVITE request is being processed by biloxi.com. The biloxi.com proxy server then queries a database (the location server) for Bob's current address. After discovering Bob's address, the biloxi.com proxy server (like the atlanta.com proxy server) adds another Via header field value with its own address to the INVITE message and proxies it to Bob's SIP phone.

    Bob's SIP phone receives the INVITE and alerts Bob to the incoming call from Alice (by popping up a message on a computer display, or by ringing a smartphone) so that Bob can decide whether to answer. Bob's SIP phone indicates to Alice that it is ringing by sending a 180 (Ringing) response, which is routed back through the two proxies in the reverse direction. Each proxy uses the Via header field to determine where to send the response and removes its own address from the top. Alice's softphone thus receives the Ringing response and waits until Bob answers; Alice may be alerted by an audio ringback tone or by a message on her screen. After a few seconds Bob decides to answer the call. When he picks up the handset, his SIP phone sends a 200 (OK) response to indicate that the call has been answered. The 200 (OK) carries, in its body, an SDP packet with a media description of the type of session that Bob is willing to establish with Alice. As a result, there is a two-phase exchange of SDP messages: the first is sent from Alice to Bob and the second from Bob back to Alice. This exchange of SDP packets provides basic negotiation capabilities.
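    The Via handling described above works like a stack: each proxy pushes its own value onto a request on the way out and pops it from the response on the way back. The sketch below assumes hypothetical proxy host names (`proxy.atlanta.com`, `proxy.biloxi.com`) and leaves out the branch parameters and transaction state that real proxies keep; rejecting a request whose Max-Forwards has reached zero with 483 (Too Many Hops) follows RFC 3261.

```python
from typing import List, Tuple

# Hypothetical Via values; real ones also carry a ;branch= transaction ID.
ALICE = "SIP/2.0/UDP pc33.atlanta.com"
ATLANTA_PROXY = "SIP/2.0/UDP proxy.atlanta.com"
BILOXI_PROXY = "SIP/2.0/UDP proxy.biloxi.com"


def forward_request(vias: List[str], my_via: str, max_forwards: int) -> Tuple[List[str], int]:
    """Proxy forwarding a request: push its own Via on top and decrement Max-Forwards."""
    if max_forwards <= 0:
        raise ValueError("Max-Forwards exhausted: reject with 483 (Too Many Hops)")
    return [my_via] + vias, max_forwards - 1


def relay_response(vias: List[str], my_via: str) -> Tuple[List[str], str]:
    """Proxy relaying a response: pop its own (top) Via and send to the next one."""
    if not vias or vias[0] != my_via:
        raise ValueError("top Via does not belong to this proxy")
    remaining = vias[1:]
    return remaining, remaining[0]   # a proxy's Via is never the last one


if __name__ == "__main__":
    vias, hops_left = [ALICE], 70                      # INVITE as sent by Alice
    vias, hops_left = forward_request(vias, ATLANTA_PROXY, hops_left)
    vias, hops_left = forward_request(vias, BILOXI_PROXY, hops_left)
    print(vias, hops_left)   # Bob's phone sees three Via values and Max-Forwards 68

    # Bob's 180 (Ringing) copies the Via list and is sent to the top Via first.
    vias, next_hop = relay_response(vias, BILOXI_PROXY)
    print("biloxi proxy ->", next_hop)
    vias, next_hop = relay_response(vias, ATLANTA_PROXY)
    print("atlanta proxy ->", next_hop)
```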


    The 200 OK response carries largely the same header fields as the INVITE message. Bob's SIP phone has added a tag parameter to the To header field; this tag will be included in all future text-encoded messages exchanged during this call. The 200 OK message is routed back to Alice's softphone, which stops ringing and displays that the call has been answered. The Contact header of this message contains Bob's direct SIP URI, so from now on he can be reached directly. In fact, Alice sends an ACK directly to Bob, bypassing the two proxies (because both endpoints have now learned each other's direct SIP URI, they can exchange messages without going through the proxies). This completes the INVITE/200/ACK three-way handshake used to establish SIP sessions, and the SIP session between Alice and Bob is up. Now they can send media packets using the format they agreed on during the three-way handshake. In general, the end-to-end media packets take a different path from the SIP signalling messages, so there are now two logical point-to-point flows: one for the signalling, handled by SIP, and another for the (audio) media, handled by some other protocol. A protocol that is able to perform real-time streaming is the Real-time Transport Protocol (RTP). But before looking at RTP in depth, let us see how SDP works.

    2.2.3 SDP

    [9] The Session Description Protocol (SDP) is a format for describing streaming media initialization parameters. As we have seen, voice-over-IP calls have to convey a media stream (audio), and SDP describes how this media is transported. The stream itself is encoded by codecs (see Figure 6). The set of properties and parameters is often called a session profile. SDP supports many media types and formats. The following parameters are described in an SDP session:

    The type of media (video, audio, etc.)

    The transport protocol (RTP/UDP/IP, H.320, etc.)

    The format of the media (H.261 video, MPEG video, etc.)

    For unicast IP sessions (such as VoIP), the following parameters are also conveyed:

    The remote address for media

    The remote transport port for media

    An SDP packet has the following fields (fields marked with O are optional):


    Session description

    v protocol version

    o originator and session identifier

    s session name

    i session information O

    u URI of description O

    e email address O

    p phone number O

    c connection information not required if included in all media O

    b zero or more bandwidth information lines O

    z time zone adjustments O

    k encryption key O

    a zero or more session attribute lines O

    Time description

    t time the session is active

    r zero or more repeat times O

    Media description

    m media name and transport address

    i media title O

    c connection information optional if included at session level O

    b zero or more bandwidth information lines O

    k encryption key O

    a zero or more media attribute lines O

    The field descriptions below refer to an example SDP offer for a VoIP call and to the corresponding answer.

    Let us describe some of the most relevant fields in more detail.

    Protocol Version ("v=") - gives the version of SDP. The version in the examples is 0.

    Origin ("o=") - is a compound string made of several subfields, in the following format: o=<username> <sess-id> <sess-version> <nettype> <addrtype> <unicast-address>.

    <username> is the ID of the user who originates the session; in the last example it is bob.

    <sess-id> is the identifier of the session. It is a number that, together with the other subfields of the origin field (except the version), makes the session globally unique; a Network Time Protocol timestamp is often used to ensure uniqueness.

    <sess-version> is a version number for this session description; it can start as a copy of the sess-id, as in the examples.

    <nettype> is a text string giving the type of network; in our examples it is IN, which stands for Internet.

    <addrtype> is a text string giving the type of the address that follows. Typically it is "IP4" or "IP6", but other values can be defined as well.

    <unicast-address> is the address of the machine from which the session was created. For an address type of IP4, this is either the fully qualified domain name (FQDN) of the machine (as in our example) or the dotted-decimal representation of its IP version 4 address (format xxx.xxx.xxx.xxx). For an address type of IP6, this is either the fully qualified domain name of the machine or the compressed textual representation of its IP version 6 address.

    Session Name ("s=") - is the name of the session, usually human-readable (it could be "VoIP call"). This field cannot be empty; in our examples it contains a single space, so it is actually "s= ".

    Session Information ("i=") - provides textual information about the session; this field is optional and is not present in our examples.

    URI ("u=") - points to more information about the session; this field is optional as well.

    Connection Data ("c=") - contains the connection data, in the format c=<nettype> <addrtype> <connection-address>. The first two subfields are IN and IP4 and have the same meaning as the corresponding subfields of the origin field. The last one, <connection-address>, is an IP address in the unicast case.

    Timing ("t=") - defines the start and stop times of the session. The format of this field is t=<start-time> <stop-time>. Additional "t=" lines can be used when the session is active at several different times.

    Media Descriptions ("m=") - has four subfields, in the format m=<media> <port> <proto> <fmt> ...; let us describe them:

    <media> is the media type. The currently defined media types are "audio", "video", "text", "application", and "message". In our last example it is audio in the first m= field and video in the second one.

    <port> is the transport port to which the media stream is sent. This value must be consistent with the transport protocol specified in the <proto> subfield.

    <proto> is the transport protocol. The meaning of the transport protocol depends on the address type in the relevant "c=" field: basically, if an IP4 address is given in the c= field, a transport protocol over IPv4 must be used.

    <fmt> is a media format description. Its interpretation depends on the value of the <proto> subfield. If <proto> is "RTP/AVP" or "RTP/SAVP" (as in our examples), the <fmt> subfields contain RTP payload type numbers. Because the interpretation of <fmt> depends on <proto>, its meaning can change dramatically for other transport protocols.

    SDP Attributes ("a=") - there are many SDP attributes, but we describe only the one used in the example, a=rtpmap. This attribute maps an RTP payload type number (as used in an "m=" line) to an encoding name denoting the payload format to be used. It also provides information on the clock rate and encoding parameters, so it mainly describes a payload type referred to in the m= field. In our specific case, the a=rtpmap line associated with the m=audio field says that the payload type is 0, the codec is PCMU (G.711 µ-law pulse code modulation, used mainly in telephony [1]), and 8000 is the clock rate, i.e. the number of samples per second.

    For the video a= field, the payload type is 32, the codec is MPV (MPEG-1/MPEG-2 video), and the clock rate is 90000 Hz. A complete list of the RTP/AVP audio and video payload types is available at http://en.wikipedia.org/wiki/RTP_audio_video_profile
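    The example SDP offer and answer referred to in this section are not reproduced in this copy. The Python sketch below assumes an answer modeled on the offer/answer example in RFC 3264, which agrees with the values discussed above (username bob, IN IP4 with an FQDN, s= containing a single space, audio with payload type 0 PCMU/8000, video with payload type 32 MPV/90000); the ports and session IDs are illustrative. `parse_sdp` and `accept_payloads` are hypothetical helpers, and the parser only handles the o=, m=, and a=rtpmap lines.

```python
from typing import List, NamedTuple, Set, Tuple


class Origin(NamedTuple):
    username: str
    sess_id: str
    sess_version: str
    nettype: str
    addrtype: str
    unicast_address: str


# Illustrative SDP answer, modeled on the offer/answer example in RFC 3264.
ANSWER = "\r\n".join([
    "v=0",
    "o=bob 2808844564 2808844564 IN IP4 host.biloxi.example.com",
    "s= ",
    "c=IN IP4 host.biloxi.example.com",
    "t=0 0",
    "m=audio 49174 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
    "m=video 49170 RTP/AVP 32",
    "a=rtpmap:32 MPV/90000",
])


def parse_sdp(text: str) -> Tuple[Origin, List[dict]]:
    """Extract the origin and, for each m= section, its port, payload types and rtpmap."""
    origin = None
    media: List[dict] = []
    for line in text.splitlines():
        kind, _, value = line.partition("=")
        if kind == "o":
            origin = Origin(*value.split())
        elif kind == "m":
            name, port, proto, *fmts = value.split()
            # With RTP/AVP and RTP/SAVP the <fmt> values are RTP payload type numbers.
            media.append({"media": name, "port": int(port), "proto": proto,
                          "fmts": [int(f) for f in fmts], "rtpmap": {}})
        elif kind == "a" and value.startswith("rtpmap:") and media:
            pt, encoding = value[len("rtpmap:"):].split(" ", 1)
            enc_name, clock_rate = encoding.split("/")[:2]
            media[-1]["rtpmap"][int(pt)] = (enc_name, int(clock_rate))
    if origin is None:
        raise ValueError("missing o= line")
    return origin, media


def accept_payloads(offered: List[int], supported: Set[int]) -> List[int]:
    """Hypothetical answerer policy: keep the offered payload types it supports, in offer order."""
    return [pt for pt in offered if pt in supported]


if __name__ == "__main__":
    origin, media = parse_sdp(ANSWER)
    print(origin.username, origin.nettype, origin.addrtype)   # bob IN IP4
    for m in media:
        print(m["media"], m["port"], m["proto"], m["rtpmap"])
    # An answerer that only supports PCMU keeps payload type 0 from an offer of 0, 8, 97.
    print(accept_payloads([0, 8, 97], supported={0}))         # [0]
```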

    2.2.4 RTP

    [6] The last protocol we need is the one that carries the audio and video. The Real-time Transport Protocol (RTP) defines a standardized packet format for delivering audio and video over IP networks. The definition from the standard is:

    RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. [6]

    In short, RTP is the protocol that transmits the voice, because voice is real-time data. RTP services include payload type identification, sequence numbering, timestamping, and delivery monitoring, but RTP itself does not guarantee timely delivery, does not guarantee delivery at all, does not prevent out-of-order delivery, and provides no quality-of-service guarantees (only monitoring). RTP is supported by the RTP Control Protocol (RTCP), which monitors the quality of service and conveys information about the participants in an ongoing session. The audio data are preceded by an RTP header. The RTP header indicates which codec is used and also contains timing information and a sequence number that allow the receiver to reconstruct the original order and timing of the stream.

    Figure 8: Because of the unpredictable delay of the IP network, the order of the packets at the receiver is not the same as at the sender, but the right order can be reconstructed thanks to the sequence number.
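    A minimal sketch of how a receiver can use the 16-bit sequence number to restore the sending order, including the wrap-around from 65535 to 0, and to spot lost packets. It assumes the packets of one short burst are already buffered; a real jitter buffer would also use the timestamps and a playout deadline. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def seq_distance(a: int, b: int) -> int:
    """Signed distance from sequence number a to b, modulo 2**16 (handles wrap-around)."""
    d = (b - a) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d


def reorder(packets: Iterable[Tuple[int, bytes]]) -> List[Tuple[int, bytes]]:
    """Put received (sequence number, payload) pairs back into sending order.

    The first packet received is the reference point, so this assumes the batch
    spans far fewer than 32768 sequence numbers.
    """
    packets = list(packets)
    if not packets:
        return []
    ref = packets[0][0]
    return sorted(packets, key=lambda p: seq_distance(ref, p[0]))


def missing(ordered_seqs: List[int]) -> List[int]:
    """Sequence numbers that never arrived between consecutive in-order packets."""
    lost = []
    for a, b in zip(ordered_seqs, ordered_seqs[1:]):
        lost.extend((a + k) & 0xFFFF for k in range(1, seq_distance(a, b)))
    return lost


if __name__ == "__main__":
    # Arrival order scrambled by the network; the counter wraps from 65535 to 0.
    arrived = [(65534, b"A"), (0, b"C"), (65535, b"B"), (2, b"E")]
    ordered = reorder(arrived)
    print([seq for seq, _ in ordered])            # [65534, 65535, 0, 2]
    print(missing([seq for seq, _ in ordered]))   # [1]
```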

    Often (as in our case) the RTP header and data are carried in a UDP packet (Figure 6). Unlike SIP and SDP, which are text-encoded messages, RTP has a fixed binary header. The RTP fixed header is the following:

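     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           timestamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           synchronization source (SSRC) identifier            |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |            contributing source (CSRC) identifiers             |
    |                             ....                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+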

    Let us see the meaning of each field:

    version (V): 2 bits - This field identifies the version of RTP.

    padding (P): 1 bit - If the padding bit is set, the packet contains one or more additional padding octets at the end that are not part of the payload. The last octet of the padding contains the number of padding octets that must be ignored, including itself.

    extension (X): 1 bit - If the extension bit is set, the fixed header must be followed by exactly one header extension.

    CSRC count (CC): 4 bits - contains the number of CSRC identifiers that follow the fixed header.

    marker (M): 1 bit - marks a significant event in the packet stream; its exact meaning is defined by the profile in use. For example, it can mark frame boundaries or, for audio, the first packet of a talkspurt.

    payload type (PT): 7 bits - This field identifies the format of the RTP payload and determines its interpretation by the application; for example, PT=0 denotes PCMU audio, as in the a=rtpmap example above. RTCP packets carry a similar (8-bit) packet type field: for example, a sender report has PT=200. Each RTCP packet type has its own code and packet format (see Table 1).


    sequence number: 16 bits - The sequence number increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore packet sequence. The initial value of the sequence number should be random, for security reasons.

    timestamp: 32 bits - The timestamp reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time, to allow synchronization and jitter calculations. The timestamp is important because, without synchronization, the receiver would be unable to reconstruct the audio flow.

    SSRC: 32 bits - The SSRC field identifies the synchronization source. This identifier should be chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier. In short, each SSRC must be unique within a session.

    CSRC list: 0 to 15 items, 32 bits each - The CSRC list identifies the contributing sources for the payload contained in this packet. The number of identifiers is given by the CC field.

    Packet type   Acronym   Description
    192           FIR       Full INTRA-frame request
    193           NACK      Negative acknowledgment
    200           SR        Sender report: transmission and reception statistics from active senders (periodically transmitted)
    201           RR        Receiver report: reception statistics from participants that are not active senders (periodically transmitted)
    202           SDES      Source description items (including CNAME, the canonical name)
    203           BYE       Goodbye: indicates end of participation
    204           APP       Application-specific functions
    207           XR        RTCP extended reports

    Table 1: RTCP packet types
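    To see how compact the fixed header is, here is a sketch that packs and unpacks it with Python's `struct` module, following the bit layout of the RTP fixed header shown above. `RtpHeader`, `pack_rtp_header`, and `unpack_rtp_header` are hypothetical names, and header extensions and padding are not handled. The demo values reuse the mixing scenario of Figure 9 (SSRC 20 carrying CSRCs 10 and 11), with the other fields chosen arbitrarily.

```python
import struct
from typing import List, NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: List[int]


def pack_rtp_header(h: RtpHeader) -> bytes:
    """Serialize the 12-byte fixed header followed by the CSRC list, in network byte order."""
    if len(h.csrcs) > 15:
        raise ValueError("the 4-bit CC field allows at most 15 CSRC identifiers")
    byte0 = (h.version << 6) | (h.padding << 5) | (h.extension << 4) | len(h.csrcs)
    byte1 = (h.marker << 7) | (h.payload_type & 0x7F)
    fixed = struct.pack("!BBHII", byte0, byte1, h.sequence & 0xFFFF,
                        h.timestamp & 0xFFFFFFFF, h.ssrc & 0xFFFFFFFF)
    return fixed + b"".join(struct.pack("!I", csrc) for csrc in h.csrcs)


def unpack_rtp_header(data: bytes) -> RtpHeader:
    """Parse the fixed header and the CSRC list from the start of an RTP packet."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    csrcs = list(struct.unpack(f"!{cc}I", data[12:12 + 4 * cc]))
    return RtpHeader(version=byte0 >> 6, padding=bool(byte0 & 0x20),
                     extension=bool(byte0 & 0x10), marker=bool(byte1 & 0x80),
                     payload_type=byte1 & 0x7F, sequence=seq, timestamp=ts,
                     ssrc=ssrc, csrcs=csrcs)


if __name__ == "__main__":
    # Mixed audio stream: SSRC 20 with contributions from sources 10 and 11, PCMU (PT=0).
    hdr = RtpHeader(version=2, padding=False, extension=False, marker=False,
                    payload_type=0, sequence=1000, timestamp=160, ssrc=20, csrcs=[10, 11])
    raw = pack_rtp_header(hdr)
    print(len(raw), raw[:2].hex())   # 20 8200
    assert unpack_rtp_header(raw) == hdr
```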

    Let us take an example of audio packetization to find the packet rate and to see how the sequence number and timestamp are used. A common audio format for digitizing voice is PCM (pulse code modulation), which samples the voice at 8000 Hz, so each sample covers 125 µs of voice. Suppose each RTP packet carries 160 samples of 125 µs each, i.e. 20 ms of voice per packet, which gives a rate of 50 packets per second. The first packet is sent with a random timestamp (call it x) and a random sequence number (call it y). The second packet has its timestamp incremented by 160, i.e. x+160, and its sequence number incremented by one, i.e. y+1. Each RTP packet carries 160 samples of 8 bits each, so 1280 bits of payload. To better explain the SSRC and CSRC fields, let us take another example.

    Figure 9: Scenario with different media flows

    [10] Figure 9 shows that the SSRC identifies a unique source: for example, the microphones are labeled with different SSRC values. The CSRC list, instead, lists the different sources that were mixed into the same media stream (in this case audio). So SSRC=20 with CSRC (10, 11) tells us that the stream with SSRC=20 is a mix of the two audio sources 10 and 11. In this scenario we can also see the Translator. The Translator is an element of the RTP network that is able to convert between codecs for better use of the bandwidth. For example, the bandwidth available to a mobile device is limited, so before delivering the audio data to the mobile access network the translator re-encodes the data with a codec that requires less bandwidth (at the cost of some audio quality).
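    The PCM packetization example given earlier in this section can be checked with a few lines of Python. The sketch assumes 8-bit PCM at 8000 Hz with 160 samples per packet, as in the text; the function name `rtp_numbering` is hypothetical, and the fixed random seed only makes the demo repeatable.

```python
import random
from typing import List, Tuple

SAMPLE_RATE_HZ = 8000      # PCM sampling rate: one sample every 125 us
SAMPLES_PER_PACKET = 160   # 160 x 125 us = 20 ms of voice per RTP packet
BITS_PER_SAMPLE = 8

packet_ms = 1000 * SAMPLES_PER_PACKET / SAMPLE_RATE_HZ      # 20.0 ms
packets_per_second = SAMPLE_RATE_HZ // SAMPLES_PER_PACKET   # 50 packets/s
payload_bits = SAMPLES_PER_PACKET * BITS_PER_SAMPLE         # 1280 bits per packet
payload_rate = packets_per_second * payload_bits            # 64000 bit/s


def rtp_numbering(count: int, rng: random.Random) -> List[Tuple[int, int]]:
    """(sequence number, timestamp) of consecutive packets from one source.

    Both start at random values (y and x in the text); the sequence number then
    grows by 1 per packet and the timestamp by the 160 samples each packet carries.
    The counters wrap around at 16 and 32 bits respectively.
    """
    y = rng.randrange(1 << 16)
    x = rng.randrange(1 << 32)
    return [((y + i) & 0xFFFF, (x + i * SAMPLES_PER_PACKET) & 0xFFFFFFFF)
            for i in range(count)]


if __name__ == "__main__":
    print(packet_ms, packets_per_second, payload_bits, payload_rate)  # 20.0 50 1280 64000
    for seq, ts in rtp_numbering(3, random.Random(42)):
        print(seq, ts)
```

    The 64 kbit/s payload rate is the uncompressed PCM rate; the RTP, UDP, and IP headers add their own overhead on top of it.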

    2.3 Security

    Each layer of the protocol stack can include a security mechanism. RTP carries the audio data, and these data are confidential. To ensure that the data cannot be read by anyone on the network, RTP can be run with a secure layer (Secure RTP) that encrypts the data with AES. The lower layers can implement additional encryption and other mechanisms, such as certificate servers, logging, key distribution, and authentication and integrity services (these mechanisms are complex and are outside the scope of this document). It is important to remember that increasing security also increases complexity and processing overhead.

    3 Telephony

    3.1 Architecture

    [11] The telephone network is a circuit-switched network. In a circuit-switched network, a circuit (connection) must be established by the network before two parties can communicate. There are three phases: circuit establishment, data transfer, and circuit disconnection. The first phase finds the path, alerts the other party of the incoming call (for example, by ringing the phone), and allocates a channel (dedicated resources). The data transfer phase is the conversation itself, in which voice data are transferred over the circuit that was created. The last phase releases the resources when they are no longer used (when the phone is hung up). The main drawback of this kind of network is that an allocated resource is wasted when it is not used, because a circuit is a channel dedicated to the two users and is not shared with anyone else. Let us look at an example of a basic telephone network.

    Figure 10: A simple telephone network

    In figure 10 we can see the links used in a basic telephone network. The local loop is a dedicated line that connects central office 1 to the phone (or phones) of a subscriber. The same happens at the other end: central office 2 is connected to the other user's phone by another local loop. Every user has a local loop connected to the closest central office. Central offices are interconnected with T1 lines. T1 lines are digital links that use Time Division Multiplexing (TDM) to multiplex different calls. A T1 line is organized into frames. Each frame contains 24 time slots; each time slot is 8 bits long and carries a single voice call as uncompressed 64 kbps PCM. Before exploring the architecture, let us look at the entities of a SONET/SDH network.

    Figure 11:

    SDXC (synchronous digital cross-connect) - The element that connects two rings; it can switch between lines with different speeds and can also add and drop lower-order signals.

    ADM (add/drop multiplexer) - Used as a node of a ring; it can add and drop lower-order signals.

    MUX (multiplexer) - Multiplexes lower-order SDH signals into higher-order SDH signals. It also links PDH and SDH.

    DEMUX (demultiplexer) - Demultiplexes higher-order SDH signals into lower-order SDH signals.

    Reg (regenerator) - Regenerates the signal. It also has some supervision functionality for network administration.

    From a macroscopic point of view, a generic wide-area telephone network can be drawn as in figure 12. PDH (Plesiochronous Digital Hierarchy) networks transport large quantities of data over fiber. The SDH islands provide low-to-high and high-to-low speed conversion to interconnect the customers. SDH islands are generally made of rings interconnected with each other.

    Figure 12: PDH and SDH interconnected networks. Macroscopic view of a wide telephone network

    Figure 13: A real SDH-island network of interconnected rings

    3.2 Protocols

    SDH/SONET are digital transport technologies for telephony. PDH is the first generation of digital transport technology. SONET was proposed by Bellcore (now Telcordia) and is the second-generation digital transport technology. SONET stands for synchronous optical network: synchronous because the whole network works synchronously, and optical because the medium carrying the digital data is optical fiber. The third-generation digital transport network is G.709 (also known as the digital wrapper). This newer standard uses optical multiplexing (WDM, wavelength division multiplexing) and can carry IP packets, ATM cells, Ethernet frames and SONET/SDH traffic.

    In this section we focus on SDH/SONET frames. As we have already seen, T1 links are used to interconnect SONET/SDH equipment. Because calls are continuously multiplexed and demultiplexed across the network, a standard has been defined that specifies how to multiplex several voice calls onto a single link: the DS (digital signal) hierarchy. Because of PCM, each voice call is 64 kbps (DS0). DS1 is 24 DS0s, i.e. 24 multiplexed calls. DS1C is the concatenation of two DS1s, i.e. 48 calls.

    Information over SONET is transmitted in frames. Because the network is synchronous, these frames are transmitted continuously, one after the other. All the data carried over an optical link is converted into the electrical domain. The electrical side of SONET is called STS and the electrical side of SDH is called STM. The two are equivalent and interwork seamlessly. The table shows the data-rate relation between STS and STM: for example, STM-1 is three times STS-1, so STM-1 has the same data rate as STS-3, which is 155.520 Mbps. Once the STS-1 frame is explained, it is easy to deduce how the other STS and STM frames are structured.
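
    The rates quoted above follow from simple arithmetic. The sketch below assumes two standard values that the text does not state: a T1 frame carries one framing bit in addition to its 24 eight-bit slots, and both PCM samples and STS-1 frames repeat 8000 times per second (every 125 µs). The function names are illustrative.

```python
PCM_BITS_PER_SAMPLE = 8
SAMPLES_PER_SECOND = 8000                              # one sample (and one frame) every 125 us
DS0_BPS = PCM_BITS_PER_SAMPLE * SAMPLES_PER_SECOND     # 64 kbps: one voice call

# Voice calls per DS level, as listed in the text.
DS_CALLS = {"DS0": 1, "DS1": 24, "DS1C": 48}


def ds_payload_bps(level: str) -> int:
    """Bit rate of the multiplexed voice channels only (no framing overhead)."""
    return DS_CALLS[level] * DS0_BPS


def t1_line_bps() -> int:
    """DS1 on a T1 line: 24 eight-bit slots plus 1 framing bit per frame."""
    return (24 * PCM_BITS_PER_SAMPLE + 1) * SAMPLES_PER_SECOND


STS1_BPS = 810 * 8 * SAMPLES_PER_SECOND                # 810-byte STS-1 frame, 8000 frames/s


def sts_rate_mbps(n: int) -> float:
    """Line rate of STS-n in Mbps."""
    return n * STS1_BPS / 1e6


def stm_rate_mbps(n: int) -> float:
    """STM-n has the same rate as STS-3n."""
    return sts_rate_mbps(3 * n)


print(ds_payload_bps("DS1"), t1_line_bps())   # 1536000 1544000
print(sts_rate_mbps(1), stm_rate_mbps(1))     # 51.84 155.52
```

    The difference between 1.536 and 1.544 Mbps is the 8 kbps of framing bits; the 155.52 Mbps of STM-1 is simply three STS-1 frames' worth of bytes per 125 µs.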

    3.2.1 STS-1 frame

    Figure 14: SONET frame

    Each frame contains 810 bytes, which can be represented as a 9 x 90 matrix: 9 rows of 90 bytes each. The first three bytes of every row are dedicated to the TOH (transport overhead); the rest is the payload. So 27 of the 810 bytes, about 3.33% of the frame, are used for transport overhead, and the rest for payload. The payload is called the SPE (synchronous payload envelope), and it does not carry only raw user data: it also has its own overhead, called POH (path overhead). The TOH is made of the SOH (section overhead) and the LOH (line overhead).

    Section, line and path are entities of the SONET stack. Referring to the figure, the path links user to user and represents the circuit created by the network to set up the conversation. The line is the logical link that interconnects multiplexing/demultiplexing nodes and transports multiple STS-1 frames in a bigger frame such as STS-12. The section is the logical link between any two adjacent pieces of SONET equipment, such as a regenerator and a multiplexer. Devices that can terminate a path, a line or a section are called PTE, LTE and STE respectively (TE stands for terminating equipment). In an STS-1 frame, the SOH contains the following fields:

    A1 and A2 - Framing bytes used for alignment; they uniquely identify the beginning of an STS frame.

    J0 - Section trace, used to trace the STS-1 frame back to its originating equipment.

    B1 - Bit-interleaved parity (BIP-8), an even-parity check computed over the previous STS-1 frame.

    E1 - A 64 kbps voice channel for field engineers.

    F1 - This byte is used by the network operator.

    D1, D2, D3 - These bytes form a 192 kbps channel used for network management operations.
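
    The B1 check above can be sketched in a few lines. Even-parity BIP-8 sets each bit of the parity byte so that the corresponding bit position, counted over all the covered bytes, has an even number of 1s, which is the same as XOR-ing all the bytes together. This is a simplified sketch: real SONET equipment computes B1 over the scrambled frame, and scrambling is omitted here; the function names are hypothetical.

```python
from functools import reduce


def bip8(data: bytes) -> int:
    """Even-parity BIP-8 over a block of bytes.

    Bit i of the result makes bit position i, counted over every byte of the
    block, contain an even number of 1s once the parity byte is included.
    That is exactly the XOR of all the bytes.
    """
    return reduce(lambda acc, byte: acc ^ byte, data, 0)


def b1_errors(previous_frame: bytes, received_b1: int) -> int:
    """Count the interleaved parity bits that disagree (0 = no error detected)."""
    return bin(bip8(previous_frame) ^ received_b1).count("1")


# Simplified: no scrambling, arbitrary 200-byte "frame".
frame = bytes(range(200))
b1 = bip8(frame)
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(b1_errors(frame, b1), b1_errors(corrupted, b1))  # 0 1
```

    Because each parity bit only sees one bit position, two errors in the same bit position of different bytes cancel out; BIP-8 is a cheap error monitor rather than a guarantee of integrity.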

    The LOH contains:

    H1 and H2 - Pointer bytes that specify the offset between H1/H2 and the beginning of the SPE.

    H3 - Pointer action byte, used to compensate for slight timing differences between SONET devices.

    B2 - Carries the BIP-8 parity check computed over the line overhead and SPE of the previous frame.

    K1 and K2 - These two bytes are used for automatic protection switching (APS).

    D4 to D12 - These nine bytes form a 576 kbps channel used for network management.

    Z1 and Z2 - These two bytes have been partially defined.

    E2 - This byte is similar to the E1 byte in the section overhead.
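
    The H1/H2 pointer can be illustrated with a short sketch that decodes the pointer and locates the first SPE byte. It relies on standard SONET conventions that the text does not spell out, so treat them as assumptions: the pointer value is the low 2 bits of H1 followed by the 8 bits of H2, it ranges from 0 to 782, offset 0 is the byte right after H3 (row 4, column 4), and offsets advance byte by byte along the 87 payload columns, wrapping into the next frame after row 9. The function names are hypothetical.

```python
from __future__ import annotations

PAYLOAD_COLS = 87   # SPE bytes per row (columns 4..90)
MAX_POINTER = 782   # 9 rows x 87 bytes - 1


def pointer_value(h1: int, h2: int) -> int:
    """10-bit pointer: low 2 bits of H1 then the 8 bits of H2.
    The high bits of H1 (new-data flag and size bits) are ignored here."""
    return ((h1 & 0x03) << 8) | (h2 & 0xFF)


def spe_start(pointer: int) -> tuple[int, int, int]:
    """Locate the first SPE byte as (frames later, row, column), all 1-based rows/columns.

    Offset 0 is the byte right after H3 (row 4, column 4); offsets move along
    the payload columns and spill into rows 1-3 of the next frame.
    """
    if not 0 <= pointer <= MAX_POINTER:
        raise ValueError("STS-1 pointer must be in 0..782")
    rows_down, col_offset = divmod(pointer, PAYLOAD_COLS)
    absolute_row = 4 + rows_down                       # may exceed 9
    frames_later, row_index = divmod(absolute_row - 1, 9)
    return frames_later, row_index + 1, 4 + col_offset


print(spe_start(pointer_value(0x60, 0x00)))  # (0, 4, 4)
print(spe_start(pointer_value(0x62, 0x0A)))  # (1, 1, 4), pointer value 522
```

    This is why the SPE is said to "float": the SPE need not start at a fixed position, and the pointer tells the receiver where the J1 byte of the path overhead is.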

    The POH is embedded in the SPE and has the following fields:

    J1 - This byte is similar to J0 in the section overhead.

    B3 - Like B1 and B2, this byte carries a BIP-8 parity check, computed over the SPE.

    C2 - Signal label: defines the kind of user information carried in the SPE, such as VT-structured payloads, asynchronous DS3, ATM cells, HDLC-over-SONET, or PPP over SONET.

    G1 - Path status byte carries diagnostic signals.

    F2 - This byte is reserved for future use.

    H4 - This byte is used to identify payloads carried within the same frame.

    Z3 and Z4 - These are reserved for future use.

    Z5 - This byte is used for tandem connection monitoring. A tandem is a telephone switch used in the backbone telephone network: it switches traffic between other telephone switches and does not serve subscribers directly.
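
    The overhead fields listed above can be tallied against the 810-byte frame. This sketch assumes, as in the standard STS-1 layout but not stated explicitly in the text, that the SOH occupies rows 1-3 of the three TOH columns, the LOH rows 4-9, and the POH a single 9-byte column inside the SPE. The resulting 9 SOH, 18 LOH and 9 POH bytes match the number of fields listed above, and the 774 payload bytes match figure 14.

```python
ROWS, COLS = 9, 90   # STS-1 frame: 9 rows x 90 bytes
TOH_COLS = 3         # columns 1-3 of every row: transport overhead
SOH_ROWS = 3         # assumption: TOH rows 1-3 are section overhead, rows 4-9 line overhead
POH_COLS = 1         # assumption: one 9-byte column of path overhead inside the SPE


def sts1_byte_budget() -> dict:
    """Break the 810-byte STS-1 frame into its overhead and payload parts."""
    frame = ROWS * COLS            # 810
    toh = ROWS * TOH_COLS          # 27
    soh = SOH_ROWS * TOH_COLS      # 9
    spe = frame - toh              # 783
    poh = ROWS * POH_COLS          # 9
    return {
        "frame": frame, "toh": toh, "soh": soh, "loh": toh - soh,
        "spe": spe, "poh": poh, "payload": spe - poh,
        "toh_percent": round(100 * toh / frame, 2),
    }


print(sts1_byte_budget())
```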

    STS frames can carry three types of payload:

    Virtual tributaries

    Each VTG (virtual tributary group) is 108 bytes, i.e. 12 columns of 9 rows. An SPE can carry at most 7 VTGs (7 x 12 = 84 columns); of the 3 remaining columns, one holds the POH and the other two are reserved (fixed stuff). The different VT formats are described in the table below.

    VT      Carries   VTs per VTG   Voice channels per VT   Voice channels per VTG   Voice channels per SPE
    VT1.5   DS1       4             24                      96                       672
    VT2     E1        3             30                      90                       630
    VT3     DS1C      2             48                      96                       672
    VT6     DS2       1             96                      96                       672

    Table 2: Virtual tributaries
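
    The capacities in the VT table follow from the column layout of the SPE. This sketch assumes the standard STS-1 VT structure: each VTG is 12 columns wide, an SPE holds 7 VTGs, and a VT1.5, VT2, VT3 or VT6 occupies 3, 4, 6 or 12 columns respectively (the column widths are not given in the text). The function name is illustrative.

```python
SPE_COLS = 87        # columns in an STS-1 SPE (including the POH column)
VTG_COLS = 12        # each VTG is 12 columns x 9 rows = 108 bytes
VTGS_PER_SPE = 7

# VT type: (signal carried, columns per VT, voice channels per VT)
VT_TYPES = {
    "VT1.5": ("DS1", 3, 24),
    "VT2": ("E1", 4, 30),
    "VT3": ("DS1C", 6, 48),
    "VT6": ("DS2", 12, 96),
}


def vt_capacity(vt: str) -> dict:
    """Voice-channel capacity of one VT type, per VTG and per SPE."""
    signal, cols, channels = VT_TYPES[vt]
    per_vtg = VTG_COLS // cols
    return {
        "signal": signal,
        "vts_per_vtg": per_vtg,
        "channels_per_vtg": per_vtg * channels,
        "channels_per_spe": per_vtg * channels * VTGS_PER_SPE,
    }


# 7 VTGs fill 84 columns; the other 3 are the POH column and 2 fixed-stuff columns.
assert VTGS_PER_SPE * VTG_COLS + 3 == SPE_COLS

for vt in VT_TYPES:
    print(vt, vt_capacity(vt))
```

    The VT1.5 row gives the familiar figure of 28 DS1s, or 672 voice channels, per STS-1.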

    ATM cells

    As figure 14 shows, the SPE has 774 bytes of payload. Since an ATM cell consists of 53 bytes, an SPE can contain 774/53 ≈ 14.6 ATM cells. Because this is not an integer, a cell can straddle two successive SPEs. Because ATM cells are not transmitted continuously, idle cells are inserted to maintain a continuous bit stream. Idle cells can be identified uniquely since their header is marked with VPI=0, VCI=0, PTI=0 and CLP=0.
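
    The straddling described above can be seen by packing cells back to back into consecutive SPEs. A minimal sketch, assuming (as in the text) 774 usable bytes per SPE and a continuous sequence of cells; the function name is illustrative.

```python
from __future__ import annotations

CELL_BYTES = 53     # one ATM cell
SPE_PAYLOAD = 774   # usable SPE bytes per STS-1 frame, as in figure 14


def cell_span(cell_index: int) -> tuple[int, int]:
    """Return (first SPE, last SPE) occupied by a cell when cells are packed back to back."""
    first_byte = cell_index * CELL_BYTES
    last_byte = first_byte + CELL_BYTES - 1
    return first_byte // SPE_PAYLOAD, last_byte // SPE_PAYLOAD


print(round(SPE_PAYLOAD / CELL_BYTES, 1))   # 14.6 cells per SPE
straddling = [i for i in range(60) if cell_span(i)[0] != cell_span(i)[1]]
print(straddling)                           # [14, 29, 43, 58]
```

    Roughly one cell in every 14 or 15 crosses an SPE boundary, so the receiver must delineate cells from their headers rather than from the frame structure.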

    Packet over SONET (PoS)

    PoS is a scheme for carrying IP packets directly over SONET frames. IP packets are encapsulated in HDLC and then mapped into the SPE payload. As with ATM cells, a packet can straddle two successive SPEs, and an idle flag byte (0x7E) is sent to keep a continuous bit stream when there are no IP packets to transmit.
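
    A 0x7E flag can mark both frame boundaries and idle time only because the framing escapes any 0x7E that appears inside a packet. Below is a minimal sketch of the octet-stuffing rule of PPP in HDLC-like framing (RFC 1662), which Packet over SONET relies on. The escape byte 0x7D and the XOR with 0x20 come from that RFC, not from the text; the frame check sequence, address/control fields and SONET payload scrambling are left out, and the function names are hypothetical.

```python
FLAG = 0x7E      # frame delimiter, also sent repeatedly as idle fill
ESCAPE = 0x7D    # control-escape byte


def stuff(payload: bytes) -> bytes:
    """Wrap one packet in flags, escaping any FLAG or ESCAPE byte inside it."""
    out = bytearray([FLAG])
    for byte in payload:
        if byte in (FLAG, ESCAPE):
            out += bytes([ESCAPE, byte ^ 0x20])  # e.g. 0x7E -> 0x7D 0x5E
        else:
            out.append(byte)
    out.append(FLAG)
    return bytes(out)


def unstuff(frame: bytes) -> bytes:
    """Inverse of stuff() for a single flag-delimited frame."""
    out = bytearray()
    body = iter(frame[1:-1])
    for byte in body:
        out.append(next(body) ^ 0x20 if byte == ESCAPE else byte)
    return bytes(out)


packet = bytes([0x45, 0x7E, 0x01, 0x7D])
wire = stuff(packet)
print(wire.hex())  # 7e457d5e017d5d7e
```

    After stuffing, 0x7E never appears between the flags, so a receiver that sees a run of 0x7E bytes knows the link is idle.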

    3.3 Reliability

    Figure 15: Failure on the working ring link

    The way SONET rings are architected makes SONET infrastructures highly reliable. For this reason they are called "five nines", meaning they are working 99.999% of the time. In any infrastructure a cable can be cut or some equipment can stop working; to prevent outages caused by these accidents, SONET has several mechanisms to stay highly available. The high degree of redundancy ensures that there is always some link connecting two nodes, or some alternative path available in case of a failure. For example, the ring fiber links use different redundancy schemes: 1+1, 1:1 and 1:N. The 1+1 scheme has two fibers (usually called the working and the protection fiber) that carry the traffic simultaneously; if one breaks, the receiver keeps using the signal from the surviving fiber while engineers fix the broken one. In the 1:1 scheme there are two fibers but only one is used; if it breaks, the traffic is switched onto the protection fiber. The 1:N scheme is similar to 1:1, but a single protection fiber is shared by N working fibers. It is important to keep in mind that the two links should follow physically different routes; otherwise a single accident can cut both and the redundancy is not effective. The system is called self-healing and is managed by the automatic protection switching (APS) protocol. The service must be restored in less than 50 ms.

    Redundancy is also applied to other parts of the infrastructure. All the equipment is powered through uninterruptible power supplies (UPS), so that if there is a blackout the equipment is powered by a different source and the service continues. A UPS also provides a high-quality power source by filtering spikes, preventing small voltage drops on the electricity line, and offering other kinds of power protection. Data integrity is monitored by the BIP-8 parity checks.
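
    The "five nines" figure is easier to appreciate as allowed downtime. A small sketch, assuming an average year of 365.25 days:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60   # average year, including leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for nines in (99.9, 99.99, 99.999):
    print(f"{nines}%: {downtime_minutes_per_year(nines):.2f} min/year")
# roughly 526, 52.6 and 5.26 minutes per year
```

    Against a budget of about 5 minutes a year, a protection switch completed within 50 ms costs very little, which helps explain why fast self-healing rings can reach this target despite occasional fiber cuts.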

    3.4 Security

    Figure 16: SONET/SDH network with Encryptor

    Data over SONET/SDH is confidential, so it must be protected by some mechanism. Security at the SONET/SDH level is provided by dedicated equipment called SONET encryptors, which encrypt data that would otherwise be sent over the SONET network in plain text. Figure 16 shows where the SONET/SDH traffic is encrypted; there is still a small section of the network where the traffic travels in plain text.

    4 Differences between VoIP and Telephony

    Figure 17: Three protocols involved in a VoIP call

    In this section we compare the main differences between VoIP and telephony.

    Network category - VoIP runs on IP/UDP, which is a connectionless, packet-switched network. Classic voice calls run on a SONET/SDH network, which is circuit switched. The two kinds of network differ mainly in the way they allocate resources and transmit data. In a packet-switched network, packets are transferred independently and may follow different paths through the network; because of this, they can arrive at the destination in a different order from the one in which they were sent. VoIP therefore needs protocols to put the data back in order (RTP sequence numbers serve this purpose). SONET/SDH allocates a real circuit that links the users end to end: the circuit is a dedicated link for the duration of the call, and when the call is released the path becomes available again.

    Architecture - The equipment for an IP network is definitely more complex and involves more protocols and more management to work (but it is also more flexible). SONET/SDH is a simpler network and uses simpler protocols, based on a protocol stack embedded over the physical layer.

    Flexibility - The IP/UDP stack on which VoIP runs makes better use of network capacity. SONET/SDH is a continuous, synchronized flow carrying data: if there is no data, that capacity is wasted because idle frames are sent. VoIP can also provide different media services, and it can easily evolve as new features are implemented.

    Security - Both networks provide security. Because of its public nature, an IP network is less secure than a SONET/SDH network, but both can provide a high standard of security.

    Protocol architecture - The protocol architecture of a VoIP network is much more complex and involves many protocols and mechanisms working together. A VoIP call involves three protocols (figure 17) running over an IP/UDP or IP/TCP stack. VoIP must also cope with the absence of QoS guarantees, because of the unpredictable delay of the IP network.

    Cost - From a user's point of view, calling between two continents for free is very attractive, so VoIP definitely has some advantages. However, classic telephony has become cheap, and operators usually provide all-inclusive subscriptions with free calls within the operator's country. For example, in Italy for €20 per month you can have a broadband connection (typically 4 Mbit/s) with free calls throughout Italy. So telephony is not an expensive service either. VoIP is an interesting solution for internal communication in big companies and offices.

    Reliability - The telephone network is extremely reliable: it is almost never down. VoIP is highly dependent on the IP network and on the access network (the provider you are attached to). In 30 calls (from 3 minutes up to over 1 hour long) with a VoIP service, I personally experienced 4 extremely bad-quality calls and 7 acceptable ones; the remaining 19 were good. In other terms, more than 10% of the calls (4 out of 30) were not successful. In my entire life I have never experienced a bad call on the classic telephone network.
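
    The network-category point above notes that VoIP needs a protocol to restore packet order. RTP does this with a 16-bit sequence number in every packet (RFC 3550, reference [6]). The class below is a hypothetical, minimal sketch of a receiver that holds early packets and releases them in order, handling the wraparound from 65535 to 0. It is not a complete jitter buffer: a real one would also use RTP timestamps and a playout deadline to skip lost packets instead of waiting for them indefinitely.

```python
from __future__ import annotations


class ReorderBuffer:
    """Toy playout buffer: packets may arrive in any order but are released
    in RTP sequence-number order. Sequence numbers are 16 bits and wrap at 65536."""

    def __init__(self) -> None:
        self.highest: int | None = None   # highest extended sequence number seen
        self.next_out: int | None = None  # next extended sequence number to release
        self.pending: dict[int, bytes] = {}

    def _extend(self, seq: int) -> int:
        """Map a 16-bit sequence number onto an ever-increasing integer."""
        if self.highest is None:
            return seq
        gap = ((seq - self.highest + 0x8000) & 0xFFFF) - 0x8000  # signed 16-bit distance
        return self.highest + gap

    def push(self, seq: int, payload: bytes) -> list[bytes]:
        """Accept one packet and return every payload that can now be played in order."""
        ext = self._extend(seq)
        if self.highest is None or ext > self.highest:
            self.highest = ext
        if self.next_out is None:
            self.next_out = ext
        if ext >= self.next_out:          # packets older than next_out arrived too late
            self.pending[ext] = payload
        ready: list[bytes] = []
        while self.next_out in self.pending:
            ready.append(self.pending.pop(self.next_out))
            self.next_out += 1
        return ready


buf = ReorderBuffer()
print(buf.push(65534, b"a"))  # [b'a']
print(buf.push(0, b"c"))      # [] because 65535 has not arrived yet
print(buf.push(65535, b"b"))  # [b'b', b'c']
```

    A circuit-switched call needs none of this machinery, because the voice samples always arrive in the time slots in which they were sent.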

    5 Skype

    Skype is a VoIP service and does not use the protocol stack described in this document. The Skype protocol is closed source and there is no public documentation about how it works. Skype is relatively new software and protocol (the first public beta version was released in August 2003 [5]), and it has reached remarkable milestones such as 25,000,000 online users.

    Figure 18: Skype users online and new users per day [4]

    The key feature of Skype is the way it uses the network, because it is based on a P2P (peer-to-peer) paradigm. The Skype network system is based on the same technology [5] widely deployed and popularized by file-sharing applications such as Napster and KaZaA. The definition of P2P is:

    A true P2P system, in our opinion, is one where all nodes in a network join together dynamically to participate in traffic routing-, processing- and bandwidth-intensive tasks that would otherwise be handled by central servers.

    In a P2P system a user can be client and server at the same time: providing services (like uploading) to a set of users, acting as a server, and requesting services (like downloading) from a set of users, acting as a client. This kind of network is called a decentralized P2P network and has several advantages over classic client-server networks such as the VoIP stack discussed in this document. Traditional client-server networks do not scale linearly, because adding clients means the server's resources are shared among more users: the more users there are, the less each user gets. A P2P network can scale to very large numbers of users without increasing search time and without costly centralized resources (servers), because the network resources grow with the number of its users. More users does not mean more servers (which are expensive). Another inspiring feature of Skype's P2P network is its very fast search and all-aware paradigm.

    The Global Index technology is a multi-tiered network where supernodes communicate in such a way that every node in the network has full knowledge of all available users and resources with minimal latency.

    Decentralizing information means spreading it across many small boxes. In a small P2P network finding it again is easy enough, but in a fast-growing and extremely wide network it can take time. Latency is bad for a real-time application like VoIP, so Skype engineers developed the Global Index technology, which makes supernodes communicate in such a way that all nodes are aware of the available users and resources with extremely low latency. P2P network performance depends on routing: Skype uses a technique that keeps a set of paths open and dynamically uses the best-performing ones. Finally, Skype encrypts all calls and instant messages end-to-end, whether they travel over TCP/IP or UDP/IP.
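
    Skype's routing algorithm is not public, so the following is only a generic, hypothetical sketch of the idea described above: keep several candidate paths open (for example a direct path and paths through relay nodes), maintain a smoothed round-trip time for each, and send on the path that currently looks fastest. The class name, path labels and the smoothing factor 0.125 (borrowed from TCP's round-trip-time estimator) are assumptions for illustration.

```python
from __future__ import annotations


class PathSelector:
    """Hypothetical: track a smoothed RTT per open path and pick the fastest one."""

    def __init__(self, paths: list[str], alpha: float = 0.125) -> None:
        self.alpha = alpha                                   # weight of each new sample
        self.srtt: dict[str, float | None] = {p: None for p in paths}

    def observe(self, path: str, rtt_ms: float) -> None:
        """Fold a new round-trip measurement into the path's smoothed RTT."""
        old = self.srtt[path]
        self.srtt[path] = rtt_ms if old is None else (1 - self.alpha) * old + self.alpha * rtt_ms

    def best(self) -> str | None:
        """Path with the lowest smoothed RTT, or None if nothing is measured yet."""
        measured = {p: v for p, v in self.srtt.items() if v is not None}
        return min(measured, key=measured.get) if measured else None


selector = PathSelector(["direct", "relayA", "relayB"])
selector.observe("direct", 120.0)
selector.observe("relayA", 80.0)
print(selector.best())             # relayA
selector.observe("relayA", 480.0)  # a slow sample pushes relayA's average to 130 ms
print(selector.best())             # direct
```

    Smoothing keeps a single late packet from flipping the route, while a sustained slowdown still moves traffic to another open path.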

    References

    [1] http://en.wikipedia.org/wiki/G.711

    [2] http://whitepapers.hackerjournals.com/wp-content/uploads/2010/08/Multimedia-over-IP.pdf

    [3] http://en.wikipedia.org/wiki/Skype

    [4] http://skypejournal.com/blog/2010/11/25/skypes-25mm-dialtone-raises-questions-for-investors/

    [5] http://www.skype.com/intl/en-us/support/user-guides/p2pexplained/

    [6] RFC 3550

    [7] RFC 3261

    [8] RFC 4317

    [9] RFC 4566

    [10] Multimedia-over-IP by Dennis Baron McGraw-Hill

    [11] Connection-oriented Networks (first and second Chapter)
