Understanding Packet Voice Protocols


Over the past decade, the telecommunications industry has witnessed rapid changes in the way people and organizations communicate. Many of these changes spring from the explosive growth of the Internet and from applications based on the Internet Protocol (IP). The Internet has become a ubiquitous means of communication, and the total amount of packet-based network traffic has quickly surpassed traditional voice (circuit-switched) network traffic (DataQuest, 1998).

In the wake of these technology advancements, it has become clear to telecommunications carriers, companies, and vendors that voice traffic and services will be one of the next major applications to take full advantage of IP. This expectation is based on the impact of a new set of technologies generally referred to as voice over IP (VoIP) or IP telephony.

VoIP supplies many unique capabilities to the carriers and customers who depend on IP or other packet-based networks. The most important benefits include the following:

In 1995, the first commercial VoIP products began to hit the market. These products were targeted at companies looking to reduce telecommunications expenses by moving voice traffic to packet networks. Early adopters of VoIP networks built toll-bypass solutions to take advantage of favorable regulatory treatment of IP traffic. Without any established standards, most early implementations were based on proprietary technology.

As these packet telephony networks grew and interconnection dependencies emerged, it became clear that the industry needed standard VoIP protocols. Several groups took up the challenge, resulting in independent standards, each with its own unique characteristics. In particular, network equipment suppliers and their customers were left to sort out the similarities and differences between four different signaling and call-control protocols for VoIP: H.323, Media Gateway Control Protocol (MGCP), H.248/Megaco, and Session Initiation Protocol (SIP).

In the process of implementing workable VoIP solutions, network engineers had to determine how each of these protocols worked and which ones were best for particular networks and applications.

This paper provides some guidance and understanding of these VoIP protocols, and tries to clarify some of the confusion in the marketplace.

The Great Voice Myth

"The Great Voice Myth" states that there is only one way to build voice networks, and that there should be only one voice protocol for each function in a packet voice network. Although this vision of network nirvana has been discussed in many academic circles, the reality is that multiple VoIP protocols and architectures have already been deployed, they will exist for the foreseeable future, and many networks will continue to be built using multiple VoIP protocols.

Just as today's data networks were built over time using multiple protocols and applications, the VoIP networks of today and tomorrow will be constructed using the protocols and applications that best fit the associated technology and business requirements.

The question that companies must ask is not "Which protocol is best?" but "Which services do we want to deploy and which VoIP protocols best support those services?" The answer to the first question may reflect the bias of a particular vendor or standards body. The answer to the second question depends entirely on the unique requirements of each network implementation. Because no two businesses or networks are exactly the same, each company will answer the second question in its own way.

Making Sense of the VoIP Standards

VoIP comprises many standards and protocols, and some basic terminology is needed to understand its applications and usage. The following definitions serve as a useful starting point (the protocols are listed in alphabetical order):

Understanding Centralized and Distributed Architectures

In the past, all voice networks were built using a centralized architecture in which dumb endpoints (telephones) were controlled by centralized switches. Although this model worked well for basic telephony services, it mandated a trade-off between simplified management and endpoint and service innovation.

One of the benefits of VoIP technology is that it allows networks to be built using either a centralized or a distributed architecture. This flexibility allows companies to build networks characterized by simplified management, endpoint innovation, or both, depending on the protocol used.

In general, centralized architectures are associated with MGCP and H.248/Megaco protocols. These protocols were designed for a centralized device—called a media gateway controller or call agent—that handles switching logic and call control. The centralized device talks to media gateways, which route and transmit the audio/media portion of the calls (the actual voice information).

In centralized architectures, the network intelligence is centralized and endpoints are relatively dumb (with limited or no native features). Although most centralized VoIP architectures use MGCP or H.248/Megaco protocols, it is also possible to build SIP or H.323 networks in a centralized fashion using back-to-back user agents (B2BUAs) or gatekeeper routed call signaling (GKRCS), respectively.
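The centralized model can be illustrated with a minimal sketch. It is not MGCP or H.248/Megaco itself: the class names, method names, and prefix-based routing below are hypothetical and chosen only to show that routing logic and call state live in the call agent while the gateways simply follow instructions.

```python
class MediaGateway:
    """A deliberately 'dumb' endpoint: it only executes call-agent commands."""

    def __init__(self, name):
        self.name = name
        self.connections = {}  # call_id -> name of the far-end gateway

    def create_connection(self, call_id, far_end):
        self.connections[call_id] = far_end

    def delete_connection(self, call_id):
        self.connections.pop(call_id, None)


class CallAgent:
    """Holds all routing logic and call state for the gateways it controls."""

    def __init__(self):
        self.gateways = {}   # gateway name -> MediaGateway
        self.routes = {}     # dialed-number prefix -> gateway name
        self.calls = {}      # call_id -> (origin name, destination name)
        self._next_call_id = 1

    def add_gateway(self, gateway, prefix):
        self.gateways[gateway.name] = gateway
        self.routes[prefix] = gateway.name

    def place_call(self, origin_name, dialed_number):
        # Longest-prefix match: the routing decision lives only in the call agent.
        matches = [p for p in self.routes if dialed_number.startswith(p)]
        if not matches:
            return None
        destination_name = self.routes[max(matches, key=len)]
        call_id = self._next_call_id
        self._next_call_id += 1
        # The agent instructs both gateways; neither gateway decides anything.
        self.gateways[origin_name].create_connection(call_id, destination_name)
        self.gateways[destination_name].create_connection(call_id, origin_name)
        self.calls[call_id] = (origin_name, destination_name)
        return call_id

    def hang_up(self, call_id):
        origin_name, destination_name = self.calls.pop(call_id)
        self.gateways[origin_name].delete_connection(call_id)
        self.gateways[destination_name].delete_connection(call_id)
```

Adding a new feature in this model means changing only `CallAgent`, which is the management benefit that advocates of centralized architectures point to.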

Advocates of centralized VoIP architectures favor this model because it centralizes management, provisioning, and call control, simplifies call flows for replicating legacy voice features, and is easy for legacy voice engineers to understand. Critics claim that centralized architectures stifle innovation in endpoint features and will become a hindrance when building VoIP services that move beyond legacy voice features.

Distributed architectures are associated with H.323 and SIP protocols. These protocols allow network intelligence to be distributed between endpoints and call-control devices. Intelligence in this instance refers to call state, calling features, call routing, provisioning, billing, or any other aspect of call handling. The endpoints can be VoIP gateways, IP phones, media servers, or any device that can initiate and terminate a VoIP call. The call-control devices are called gatekeepers in an H.323 network, and proxy or redirect servers in a SIP network.
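As a rough contrast with the centralized model, the sketch below keeps call state in the endpoints and uses the call-control device only for address resolution. It is an illustration under simplifying assumptions, not an implementation of H.323 or SIP: `LocationService` stands in for a gatekeeper or a SIP redirect server, and all names and addresses are hypothetical.

```python
class LocationService:
    """Stand-in for a gatekeeper or SIP redirect server: it only resolves names."""

    def __init__(self):
        self._bindings = {}  # alias -> network address

    def register(self, alias, address):
        self._bindings[alias] = address

    def resolve(self, alias):
        return self._bindings.get(alias)


class Endpoint:
    """An intelligent endpoint that keeps its own call state."""

    def __init__(self, alias, address, directory):
        self.alias = alias
        self.address = address
        self.directory = directory
        self.active_calls = {}  # peer alias -> peer address
        directory.register(alias, address)

    def call(self, peer_alias):
        # Ask the call-control device where the peer is, then signal it directly.
        peer_address = self.directory.resolve(peer_alias)
        if peer_address is None:
            return False
        self.active_calls[peer_alias] = peer_address
        return True

    def hang_up(self, peer_alias):
        self.active_calls.pop(peer_alias, None)
```

Here the directory never learns that a call exists, so features that depend on call state would have to be implemented in the endpoints. Moving more of that logic into the call-control device is equally possible, which is the flexibility the distributed model offers.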

Advocates of distributed architectures favor this model because of its flexibility. It allows VoIP applications to be treated like any other distributed IP application, and it allows the flexibility to add intelligence to either endpoints or call-control devices, depending on the business and technology requirements of the network. Distributed architectures are usually well understood by engineers who run IP data networks. Critics of distributed architectures commonly point to the existing PSTN infrastructure as the only reference model that should be used when trying to replicate legacy voice services. They also note that distributed networks tend to be more complex.

H.323

H.323 was originally created to provide a mechanism for transporting multimedia applications over local-area networks. Although H.323 is still used by numerous vendors for videoconferencing applications, it has rapidly evolved to address the growing needs of VoIP networks. Because of its early availability and these advancements, H.323 is currently the most widely used VoIP signaling and call-control protocol, with international and domestic carriers relying on it to handle billions of minutes of use each year.

H.323 is considered an "umbrella protocol" because it defines all aspects of call transmission, from call establishment to capabilities exchange to network resource availability. H.323 defines Registration, Admission, and Status (RAS) for call routing, H.225 for call setup, and H.245 for capabilities exchange.
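The division of labor among these component protocols can be sketched as a simplified call flow. The message names are a small, illustrative subset, and the strict ordering is an assumption made for clarity; real H.323 calls can overlap or shorten these phases.

```python
from enum import Enum


class Phase(Enum):
    """Maps each phase of a call to the H.323 component protocol that handles it."""
    ADMISSION = "RAS"
    CALL_SETUP = "H.225"
    CAPABILITIES = "H.245"


# Simplified, illustrative message sequence for one successful call.
CALL_FLOW = [
    (Phase.ADMISSION, "Admission Request (ARQ)"),
    (Phase.ADMISSION, "Admission Confirm (ACF)"),
    (Phase.CALL_SETUP, "Setup"),
    (Phase.CALL_SETUP, "Alerting"),
    (Phase.CALL_SETUP, "Connect"),
    (Phase.CAPABILITIES, "TerminalCapabilitySet"),
    (Phase.CAPABILITIES, "OpenLogicalChannel"),
]


def protocols_in_order(flow):
    """Return the component protocols in the order they first appear in the flow."""
    seen = []
    for phase, _message in flow:
        if phase.value not in seen:
            seen.append(phase.value)
    return seen


def messages_for(flow, protocol):
    """Return the messages in the flow that a given component protocol carries."""
    return [message for phase, message in flow if phase.value == protocol]
```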


Figure 1: H.323 Networks


H.323 is based on the Integrated Services Digital Network (ISDN) Q.931 protocol, which allows it to easily interoperate with legacy voice networks such as the PSTN or Signaling System 7 (SS7).

As a protocol used in a distributed architecture, H.323 allows companies to build large-scale networks that are scalable, resilient, and redundant. It provides mechanisms for interconnecting with other VoIP networks, and supports network intelligence on either the endpoints or the gatekeepers.

MGCP/H.248/Megaco

MGCP and H.248/Megaco were designed to provide an architecture where call control and services could be centrally added to a VoIP network. In that sense, an architecture using these protocols closely resembles the existing PSTN architecture and services.

MGCP and H.248/Megaco define most aspects of signaling using a model called packages. These packages define commonly used functionality, such as PSTN signaling, line-side device connectivity, and features such as transfer and hold. In addition, Session Description Protocol (SDP) is used to convey capabilities exchange.


Figure 2: MGCP/H.248/Megaco Networks


In a centralized architecture, MGCP and H.248/Megaco allow companies to build large-scale networks that are scalable, resilient, and redundant. They provide mechanisms for interconnecting with other VoIP networks and for adding intelligence and features to the call agent.

SIP

SIP was designed as a multimedia protocol that could take advantage of the architecture and messages found in popular Internet applications. By using a distributed architecture—with URLs for naming and text-based messaging—SIP attempts to take advantage of the Internet model for building VoIP networks and applications. In addition to VoIP, SIP is used for videoconferencing and instant messaging.

As a protocol, SIP only defines how sessions are to be set up and torn down. It utilizes other IETF protocols to define other aspects of VoIP and multimedia sessions, such as SDP for capabilities exchange, URLs for addressing, Domain Name System (DNS) for service location, and Telephony Routing over IP (TRIP) for call routing.
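To make the text-based nature of SIP concrete, here is a minimal sketch that assembles an INVITE request carrying an SDP body. It is an illustration, not a user agent: the header set follows the later RFC 3261 revision of SIP, the function names are hypothetical, and the addresses, tags, and identifiers are placeholder values that a real implementation would generate itself.

```python
def build_sdp(ip, port, session_id=1):
    """Build a small SDP offer: one audio stream, G.711 mu-law plus DTMF events."""
    lines = [
        "v=0",
        f"o=- {session_id} {session_id} IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP 0 101",
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:101 telephone-event/8000",
        "a=fmtp:101 0-15",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller, callee, local_ip, rtp_port, call_id, branch, tag):
    """Build an INVITE as plain text; SIP supplies the envelope, SDP the media offer."""
    body = build_sdp(local_ip, rtp_port)
    user = caller.split("@")[0]
    head = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag={tag}",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    # A blank line separates the SIP headers from the SDP body.
    return "\r\n".join(head) + "\r\n\r\n" + body
```

Calling `build_invite("alice@example.com", "bob@example.com", "192.0.2.10", 49170, "abc123@192.0.2.10", "z9hG4bK776asdhds", "1928301774")` yields a message in which everything about the media (codec, port, DTMF payload type) sits in the SDP body rather than in SIP itself.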


Figure 3: SIP Networks


Although the IETF has made great progress defining extensions that allow SIP to work with legacy voice networks, the primary motivation behind the protocol is to create an environment that supports next-generation communication models that utilize the Internet and Internet applications.

As a protocol used in a distributed architecture, SIP allows companies to build large-scale networks that are scalable, resilient, and redundant. It provides mechanisms for interconnecting with other VoIP networks and for adding intelligence and new features on either the endpoints or the SIP proxy or redirect servers.

Interconnecting VoIP Protocols

VoIP networks continue to be deployed at a rapid pace, and VoIP vendors and service providers continue to add new functionality. Because vendor support for each protocol differs and companies have varying business requirements, it is very likely that VoIP networks will continue to be made up of multiple protocols.

Having various protocols gives customers the flexibility they need to connect services from multiple carriers. Using standards, even multiple standards, still simplifies deployment of multivendor endpoints and increases options for network management and provisioning.

As companies expand their networks, they are faced with choices about how to interconnect segments using differing VoIP protocols. These choices often fall into one of three categories:

The downside to this approach is that there is no standard for protocol translation, so not all VoIP protocol translators are exactly the same. Although the IETF has attempted to define a model for translating H.323 to SIP, it involves more than just building a protocol-translation box.

As shown in Table 1, although protocols are somewhat similar, they do have some differences. Vendors of protocol translators need in-depth knowledge of all the protocols being used in the VoIP network, and they must be aware of how various VoIP components utilize different aspects of the protocol.

For example, H.323 and SIP can send dual tone multifrequency (DTMF) digits in either the signaling path or the media path (via RTP). But H.323 mandates only that the H.245 signaling path be used, and SIP does not specify how DTMF should be carried. This means that SIP devices could be sending DTMF in the media path (RFC 2833), and H.323 devices could be sending DTMF in the signaling path (H.245). If the VoIP protocol translator cannot properly recognize both the signaling path and the media path, then it might not function properly.
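The media-path half of this problem can be sketched as follows. The decoder reads the four-byte telephone-event payload defined in RFC 2833 (event code, end bit, volume, duration). The function names are hypothetical, and the translation step is a simplified assumption about how a translator might hand a digit to the signaling side.

```python
import struct

# RFC 2833 event codes 0-15 map to these DTMF symbols.
EVENT_TO_DIGIT = "0123456789*#ABCD"


def decode_rfc2833(payload):
    """Decode the 4-byte telephone-event payload carried inside an RTP packet."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_DIGIT):
        raise ValueError("event %d is not a DTMF digit" % event)
    return {
        "digit": EVENT_TO_DIGIT[event],
        "end": bool(flags & 0x80),   # E bit: set on the final packet(s) of an event
        "volume": flags & 0x3F,      # low 6 bits; the R bit (0x40) is reserved
        "duration": duration,        # in RTP timestamp units
    }


def digit_for_signaling(payload):
    """Return a digit to relay in the signaling path, or None while the tone is ongoing.

    Simplification: a real translator would also suppress the retransmitted
    end packets for the same event before relaying the digit.
    """
    event = decode_rfc2833(payload)
    return event["digit"] if event["end"] else None
```

A translator built this way only works if it actually sees the RTP stream, which is why one that handles signaling alone can miss digits sent in the media path.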


Table 1: Details of VoIP Protocols

|                       | H.323 | SIP | MGCP/H.248/Megaco |
|-----------------------|-------|-----|-------------------|
| Standards body        | ITU | IETF | MGCP/Megaco: IETF; H.248: ITU |
| Architecture          | Distributed | Distributed | Centralized |
| Current version       | H.323v4 | RFC2543-bis07 | MGCP 1.0, Megaco, H.248 |
| Call control          | Gatekeeper | Proxy/Redirect Server | Call agent/media gateway controller |
| Endpoints             | Gateway, terminal | User agent | Media gateway |
| Signaling transport   | Transmission Control Protocol (TCP) or User Datagram Protocol (UDP) | TCP or UDP | MGCP: UDP; Megaco/H.248: both |
| Multimedia capable    | Yes | Yes | Yes |
| DTMF-relay transport  | H.245 (signaling) or RFC 2833 (media) | RFC 2833 (media) or INFO (signaling) | Signaling or RFC 2833 (media) |
| Fax-relay transport   | T.38 | T.38 | T.38 |
| Supplemental services | Provided by endpoints or call control | Provided by endpoints or call control | Provided by call agent |
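One way to use a comparison like Table 1 when planning an interconnection is to encode a row as data and query it. The sketch below does this for the signaling-transport row only. The dictionary and function names are hypothetical, and "both" for Megaco/H.248 is read here as TCP and UDP.

```python
# Signaling-transport row of Table 1, with MGCP and Megaco/H.248 listed separately.
SIGNALING_TRANSPORTS = {
    "H.323": {"TCP", "UDP"},
    "SIP": {"TCP", "UDP"},
    "MGCP": {"UDP"},
    "Megaco/H.248": {"TCP", "UDP"},
}


def common_transports(protocols):
    """Return the signaling transports shared by every protocol in the list."""
    sets = [SIGNALING_TRANSPORTS[name] for name in protocols]
    return set.intersection(*sets) if sets else set()
```

For example, `common_transports(["H.323", "SIP", "MGCP"])` returns `{"UDP"}`, the only transport all three list in the table.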

Conclusion—Packet Voice and VoIP Is About Services, Not Protocols

Just as companies choose various protocols for their data networks, they will choose various protocols for their VoIP requirements, depending on the business and technical requirements at hand. Although the variety in VoIP protocols has caused some confusion in the marketplace, it is precisely this protocol flexibility that makes VoIP-based voice systems so much more useful than legacy voice systems.

Companies should choose vendors based on three very important requirements:

By working with vendors that can provide this VoIP flexibility, companies can focus on building scalable and reliable networks that support the requirements of next-generation networks.

