9. Standards

9.1. Signaling Protocols

To provide multimedia services including video telephony and VoIP in wireless networks, many different protocols are used. ITU developed a series of signaling protocol standards for visual communication over various networks, including H.320 for ISDN, H.321 for B-ISDN, H.322 for LANs with guaranteed QoS, H.323 for packet networks, and H.324 for the PSTN. H.323 was the first signaling protocol for video conferencing over packet-switched networks such as the Internet and LANs. H.323 version 1 was ratified in 1996. Also in 1996, members of the IETF initiated a competing protocol named SIP. In this section, we introduce H.323, H.324, SIP, and some of their companion standards.
9.1.1. H.323 and H.245

H.323 is an umbrella recommendation developed by ITU that defines the protocols to provide audio-visual communication sessions over a packet network. The original title of H.323 was "Visual Telephone Systems and Equipment For Local Area Networks Which Provide A Non-Guaranteed Quality Of Service". ITU later changed the title to "Packet-Based Multimedia Communications Systems". H.323 is based on the ISDN Q.931 signaling protocol. The H.323 architecture includes terminals, gatekeepers, gateways, and multipoint control units (MCUs). An H.323 terminal is an endpoint (e.g., a user device or video conferencing system) on a LAN. A gatekeeper manages a zone, which is a collection of H.323 devices. It provides address translation, admission control, and bandwidth control, and handles optional features such as supplementary services. The gatekeeper is an optional component: endpoints can make calls peer-to-peer or have them routed by a gatekeeper. In practice, a gatekeeper is used in most managed services. A gateway provides interoperability between different networks, for example by converting signals between packet networks and the PSTN. MCUs take care of establishing multipoint conferences. The following is the list of companion recommendations specified in H.323.

- H.225.0 describes call signaling, audio and video stream packetization, media stream synchronization, and control message formats.
- H.245 describes the messages and procedures used for opening and closing logical channels for audio, video and data, capability exchange, control and indications.
- H.450 describes the supplementary services.
- H.235 describes security in H.323.
- H.239 describes dual stream use in videoconferencing, usually one stream for live video and the other for presentation.
- H.460.17-19 describe firewall traversal in H.323.
- H.261, H.263 and H.264 describe video encoding.
- The G.7xx series describes audio encoding.
H.245 is a control channel protocol. It conveys the information needed for multimedia communication, such as encryption, flow control, jitter management, and preference requests, as well as the opening and closing of the logical channels used to carry media streams. The H.245 control channel is logical channel 0 and is permanently open. After a connection has been set up by the call signaling procedure, the H.245 call control protocol is used to resolve the media type and establish the media flow before the call is fully established. It also manages the call after it has been established. The following is a list of key H.245 processes.

- Master-slave determination: Determines the master of the call, which is useful for avoiding conflicts during call control operations.
- Capability exchange: Each endpoint announces its capabilities for receiving and transmitting media. The receive capability of a device may differ from its transmit capability, meaning an asymmetric media call may be established.
- Open and close logical channel: This procedure opens and closes logical channels, which are multiplexed paths between the endpoints used for data transfer. For example, a video conferencing session may provide a slide presentation, for which a data channel may be opened by H.245.
- Request mode command: The receiving endpoint can request a change in the mode of the transmitted information at any point in the conference.
- Flow control command: The receiving endpoint sets an upper limit on the transmitter bit rate for any logical channel.
- Communication mode messages: These messages select a common mode of operation in a multipoint conference.
- Video fast update: The receiving endpoint can request updates for video frames in case of data loss.
- End session: This command closes all logical channels, terminates the call, and informs the gatekeeper about the termination of the call.

H.323 uses ASN.1 to specify protocol syntax.
A call between two endpoints may use two different types of media. The Packetizer web site provides comprehensive information about H.323 and its companion standards.

9.1.2. 3G-324M

ITU ratified a signaling standard for video over Circuit Switched (CS) networks, named H.324, in 1996. It is defined for visual communication over analog telephone lines using modems. H.324 evolved into H.324M, where the M indicates H.324 for mobile devices in wireless networks. 3GPP adopted it as a standard, with some modifications to the codec and error handling requirements, and created 3G-324M (3GPP TS 26.111, TS 26.110, and TR 26.911). The 3G-324M protocol is initialized after a circuit-switched data (CSD) channel is opened between two mobile phones. The time needed to establish media channels (audio and video) in the original specification was too long to be useful, so new procedures were defined in Annex K to shorten it. Even though 3GPP and 3GPP2 are considering All-IP based networks, IP is not quite ready to support real-time multimedia over wireless yet. Most videophones used in Japan and Korea are compliant with the 3G-324M standard.

3GPP conducted a study on "Enhancements to Videotelephony" and published Technical Report TR 22.903. It was proposed to translate the findings of the TR into technical specifications (TS) to enhance the existing circuit switched (CS) video-telephony service that uses BS30 (Specification of supplementary services). This study identified the enhancements needed for videotelephony services, including:

- Call setup time improvements
- Additional support of supplementary services
- Charging improvements
- Improved in-call modification
- More efficient emergency videotelephony call support
- User notification

The study concluded that these enhancements could be provided by either:
Availability of Circuit Switched Data (CSD) service is very limited among wireless carriers. The video telephony service may not take off until a new or improved protocol based on SIP is developed and a shared data channel is used for transporting the video and audio.

9.1.3. SIP (Session Initiation Protocol)

The Session Initiation Protocol (SIP), specified in IETF RFC 3261, is an application-layer signaling protocol for Internet telephone calls, multimedia distribution, and multimedia conferences. It is widely used as a signaling protocol for Voice over IP (VoIP) and competes with ITU Recommendation H.323. SIP was accepted as a 3GPP signaling protocol and is a permanent element of the IMS architecture. SIP is based on an HTTP-like request/response transaction model and uses a peer-to-peer approach. Two SIP endpoints (User Agents, UAs) can communicate with each other directly, similar to H.323 endpoints. However, a large-scale deployment uses proxies and registrars. The SIP proxy server provides functionality similar to that of a gatekeeper in an H.323 network. It is an intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other UAs. A proxy interprets and, if necessary, rewrites specific parts of a request message before forwarding it. A SIP proxy can operate in either stateless or stateful mode. A stateless proxy establishes a call for UAs and then gets out of the way. A stateful proxy stores all signaling events for the duration of the call. A registrar is a server that accepts REGISTER requests and places the information it receives in those requests into the location service for the domain it handles. Figure 9.1 illustrates a SIP call between two agents without a proxy. SIP may use SDP for selecting a media type for communication between user agents.
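To make the HTTP-like request/response model concrete, the following is a minimal sketch of how a SIP request might be assembled and how the start line of a request or response could be told apart. The function names and all addresses, tags, and identifiers are hypothetical and chosen only for illustration; a real user agent must follow RFC 3261 in full (transactions, dialogs, retransmission timers), none of which is attempted here.

```python
def build_sip_request(method, request_uri, headers, body=""):
    """Assemble a text SIP request: start line, headers, blank line, body."""
    lines = [f"{method} {request_uri} SIP/2.0"]
    all_headers = dict(headers)
    # Content-Length counts the bytes of the body, as in HTTP.
    all_headers["Content-Length"] = str(len(body.encode("utf-8")))
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_start_line(message):
    """Classify a SIP message as a request or a response from its first line."""
    first = message.split("\r\n", 1)[0]
    parts = first.split(" ", 2)
    if len(parts) != 3:
        raise ValueError("malformed SIP start line")
    if parts[0] == "SIP/2.0":
        # Response: SIP/2.0 <status code> <reason phrase>
        return {"kind": "response", "status": int(parts[1]), "reason": parts[2]}
    # Request: <method> <request URI> SIP/2.0
    return {"kind": "request", "method": parts[0], "uri": parts[1], "version": parts[2]}


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    invite = build_sip_request(
        "INVITE",
        "sip:bob@example.com",
        {
            "Via": "SIP/2.0/UDP client.example.org;branch=z9hG4bK-demo",
            "Max-Forwards": "70",
            "To": "<sip:bob@example.com>",
            "From": "<sip:alice@example.org>;tag=demo1",
            "Call-ID": "demo-call-id@client.example.org",
            "CSeq": "1 INVITE",
        },
    )
    print(invite)
    print(parse_start_line("SIP/2.0 200 OK\r\n\r\n"))
```

The sketch keeps messages as plain strings because SIP, like HTTP, is a text protocol; the session description that negotiates media would travel in the body.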
9.1.4. SDP (Session Description Protocol)

IETF initially defined the Session Description Protocol (SDP) for multicast. The protocol evolved into a more general protocol for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session initiation. RFC 4566 specifies the latest version of SDP. An SDP session description includes the following information:

- Media type (video, audio, etc.)
- The transport protocol (RTP/UDP/IP, H.320, etc.)
- Media format (H.261 video, MPEG video, etc.)

SDP is a text-based protocol often used in conjunction with RTP and SIP. The SIP messages used to create sessions carry session descriptions that allow participants to agree on a set of compatible media types. These session descriptions are commonly formatted using SDP. When used with SIP, the offer/answer model (RFC 3264) provides a framework for negotiation using SDP. The following illustrates SDP indicating capabilities:

v=0
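As a hedged illustration of how the three pieces of information listed above (media type, transport protocol, media format) appear in an SDP description, the sketch below parses a small description into session-level fields and per-media sections. The description in `EXAMPLE_SDP` is hypothetical and is not the example from the original text; the parser is a simplification that does not validate field order or handle every SDP line type.

```python
# Hypothetical session description, for illustration only.
EXAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 host.example.com
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
a=rtpmap:0 PCMU/8000
m=video 51372 RTP/AVP 31
a=rtpmap:31 H261/90000
"""


def parse_sdp(text):
    """Split an SDP description into session-level fields and media sections."""
    session = {"fields": [], "media": []}
    current = None  # media section being filled, None while at session level
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        if key == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = value.split()
            current = {
                "media": media,       # media type, e.g. audio or video
                "port": int(port),
                "proto": proto,       # transport protocol, e.g. RTP/AVP
                "formats": formats,   # media formats (RTP payload type numbers)
                "fields": [],
            }
            session["media"].append(current)
        elif current is None:
            session["fields"].append((key, value))
        else:
            current["fields"].append((key, value))
    return session


if __name__ == "__main__":
    parsed = parse_sdp(EXAMPLE_SDP)
    for section in parsed["media"]:
        print(section["media"], section["proto"], section["formats"])
```

In an offer/answer exchange, each side would produce a description of this shape, and the answerer would keep only the media formats it also supports.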
9.2. Transport Protocols

Today's wireless core network is founded on a circuit-switched SS7 architecture. With the advent of IP technologies and the tremendous growth in data traffic, the wireless industry is considering the evolution of its core networks toward an All-IP network. The All-IP concept was initially introduced within 3GPP in Rel-4 with the standardization of the MSC (Mobile Switching Center) Server. 3GPP initiated a new feasibility study on an IP-based core network, AIPN (All IP Network), in 2004 and completed it in 2006. 3GPP TS 23.228 defines the service description for the IP Multimedia Core Network Subsystem (IMS), which includes the elements necessary to support IP Multimedia (IM) services. In 3GPP2, a new work topic on the All-IP network, Packet Data Network Evolution (PDANE), was underway as of 2006. It is expected that IP will play an important role in video over wireless networks.
In this section we introduce protocols developed by the IETF for transporting video over IP networks. RFC 791, Internet Protocol, specifies the Internet Protocol, its relationship to other protocols, and the structure of the IP header. Data from an upper-layer protocol is encapsulated inside one or more headers before being carried by IP. Figure 9.2 illustrates a payload (e.g., video data) encapsulated by TCP/IP, UDP/IP, or RTP/UDP/IP headers. Figure 9.3 illustrates the IP header in IP version 4.
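The fixed 20-byte part of the IPv4 header defined in RFC 791 can be decoded with a few lines of code. The following is a minimal sketch, assuming a header without options; the function name and the sample addresses are hypothetical, and the header checksum in the sample is left at zero rather than computed.

```python
import socket
import struct

# Version/IHL, TOS, total length, identification, flags/fragment offset,
# TTL, protocol, header checksum, source address, destination address.
IPV4_FIXED_HEADER = struct.Struct("!BBHHHBBH4s4s")


def parse_ipv4_header(data):
    """Decode the fixed 20-byte portion of an IPv4 header."""
    if len(data) < IPV4_FIXED_HEADER.size:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = IPV4_FIXED_HEADER.unpack_from(data)
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,  # IHL is in 32-bit words
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": protocol,  # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


if __name__ == "__main__":
    # Hypothetical header: IPv4, 20-byte header, 48 bytes total, TTL 64, UDP.
    sample = IPV4_FIXED_HEADER.pack(
        0x45, 0, 48, 1, 0, 64, 17, 0,
        socket.inet_aton("192.0.2.1"), socket.inet_aton("192.0.2.2"),
    )
    print(parse_ipv4_header(sample))
```

The `protocol` field is what tells the receiver which of the encapsulated headers (TCP or UDP) follows the IP header.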
Figure 9.3: IP Header

9.2.1. TCP

The Transmission Control Protocol (TCP) is a connection-oriented protocol that is one of the core protocols of the Internet protocol suite, often referred to as TCP/IP. TCP was developed to provide reliable, error-free, ordered data transfer for time-insensitive data; the reliability is achieved by retransmitting lost packets. It is not adequate for transporting real-time traffic such as interactive audio and video streams, since retransmitting old data is useless once the receiver has started playing the media. However, TCP is widely used for transporting one-way video and audio streams, since most firewalls allow TCP packets. For one-way media streams, the delay caused by retransmission can be absorbed by buffering and delaying the start of playback. RFC 793 specifies TCP, which has changed much since it was published in 1981. Figure 9.4 shows the TCP header.
Figure 9.4: TCP header

9.2.2. UDP

Unlike TCP, UDP (User Datagram Protocol) is a connectionless protocol. UDP is commonly used for Domain Name System (DNS) lookups, streaming and interactive media applications, Voice over IP (VoIP), and online games. RFC 768, User Datagram Protocol, specifies UDP. Figure 9.5 illustrates the UDP header. UDP is almost a null protocol. The only services it provides over IP are a checksum and multiplexing by port number. A sender may use UDP to transmit media packets to receivers without knowing whether they receive them. It is the receiver's responsibility to deal with packet loss.
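Because the UDP header consists of only four 16-bit fields (source port, destination port, length, checksum), it is easy to sketch in code. The example below is an illustration only; the function names and port numbers are hypothetical, and the checksum is left at zero, which in IPv4 signals that the sender did not compute one.

```python
import struct

# Source port, destination port, length, checksum: 8 bytes in total.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend a UDP header to payload; Length covers header plus data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram):
    """Split a UDP datagram into its header fields and payload."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("datagram shorter than the 8-byte UDP header")
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    if length < UDP_HEADER.size or length > len(datagram):
        raise ValueError("invalid UDP length field")
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }


if __name__ == "__main__":
    # Hypothetical ports and payload, for illustration only.
    datagram = build_udp_datagram(5004, 5004, b"media-bytes")
    print(parse_udp_datagram(datagram))
```

Nothing in the header records whether a datagram was delivered, which is why loss detection is left to the receiving application or to a protocol layered on top of UDP.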
Figure 9.5: UDP header

In IPv4, the UDP checksum covers either the entire packet or nothing at all. In IPv6, the UDP checksum is mandatory and must not be disabled. In error-prone networks, a different link behavior that permits partially damaged IP packets to be forwarded would be beneficial. The error-detection mechanism of the transport layer must be able to protect vital information such as headers, but also to optionally ignore errors best dealt with by the application. IETF specified a variation of UDP in RFC 3828, The Lightweight User Datagram Protocol (UDP-Lite). UDP-Lite provides a checksum with optional partial coverage. When this option is used, a packet is divided into a sensitive part (covered by the checksum) and an insensitive part (not covered by the checksum). Errors in the insensitive part will not cause the packet to be discarded by the transport layer at the receiving end host. UDP is a special case of UDP-Lite, i.e., UDP-Lite with the checksum covering the entire packet.
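The effect of partial checksum coverage can be sketched with the standard 16-bit one's complement Internet checksum. The code below is a simplification under stated assumptions: it omits the IP pseudo-header that the real UDP-Lite checksum also includes, and the function names, ports, and payload are hypothetical. Following RFC 3828, a coverage value of 0 is treated as "entire packet", and values from 1 to 7 are rejected because the 8-byte header must always be covered.

```python
import struct


def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def partial_checksum(packet, coverage):
    """Checksum over the first `coverage` bytes only (pseudo-header omitted)."""
    if coverage == 0:
        coverage = len(packet)  # 0 means the whole packet is covered
    if coverage < 8 or coverage > len(packet):
        raise ValueError("coverage must be 0 or between 8 and the packet length")
    return internet_checksum(packet[:coverage])


if __name__ == "__main__":
    coverage = 12  # 8-byte header plus 4 sensitive payload bytes
    header = struct.pack("!HHHH", 5004, 5004, coverage, 0)
    packet = header + b"HDR!" + b"insensitive video bits"
    damaged = packet[:-1] + b"X"  # bit errors in the insensitive part only
    print(partial_checksum(packet, coverage) == partial_checksum(damaged, coverage))
    print(partial_checksum(packet, 0) == partial_checksum(damaged, 0))
```

With partial coverage the damaged packet still verifies and is passed to the application, whereas full coverage would cause it to be discarded.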
Figure 9.6: UDP-Lite Header

In regular UDP, the Length field contains the length in bytes of the UDP header and the encapsulated data. The minimum value for this field is 8. In UDP-Lite, the Checksum Coverage field indicates the number of bytes, counting from the first byte of the UDP-Lite header, that are covered by the checksum. The UDP-Lite header must always be covered by the checksum. Compared to UDP, the UDP-Lite partial checksum provides extra flexibility for applications that want to define the payload as partially insensitive to bit errors. Figure 9.6 illustrates the UDP-Lite header.

9.2.3. RTP/RTCP

The Real-time Transport Protocol (RTP) was developed to provide end-to-end delivery services for real-time data such as interactive audio and video. RTP is specified in RFC 3550, RTP: A Transport Protocol for Real-Time Applications. The header includes payload type identification (e.g., H.264, MPEG-3), sequence numbering, timestamping and delivery monitoring. RTP may be used on top of UDP to make use of UDP's multiplexing and checksum features. RTP may also be used with other transport protocols. RTP does not provide any mechanism to ensure timely delivery or quality-of-service guarantees, which are essential for real-time communications. It relies on lower layers to handle QoS. The real-time data transport is augmented by the Real-time Transport Control Protocol (RTCP), which allows monitoring of the data delivery and provides control and identification functionality. Figure 9.7 shows the structure of the RTP header.
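The payload type, sequence number, and timestamp mentioned above sit in the 12-byte fixed RTP header defined in RFC 3550. The following is a minimal sketch of building and parsing that header; the function names, SSRC value, and payload bytes are hypothetical, and header extensions and padding are not handled.

```python
import struct

# Byte 0: V(2) P(1) X(1) CC(4); byte 1: M(1) PT(7);
# then sequence number (16), timestamp (32), SSRC (32).
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Build an RTP packet: version 2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_FIXED_HEADER.pack(
        byte0, byte1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )
    return header + payload


def parse_rtp_packet(packet):
    """Decode the fixed RTP header and skip any CSRC identifiers."""
    if len(packet) < RTP_FIXED_HEADER.size:
        raise ValueError("packet shorter than the 12-byte RTP header")
    byte0, byte1, seq, timestamp, ssrc = RTP_FIXED_HEADER.unpack_from(packet)
    csrc_count = byte0 & 0x0F
    header_length = RTP_FIXED_HEADER.size + 4 * csrc_count
    if len(packet) < header_length:
        raise ValueError("packet truncated inside the CSRC list")
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_length:],  # assumes no header extension
    }


if __name__ == "__main__":
    # Payload type 31 is the static assignment for H.261 video.
    packet = build_rtp_packet(31, seq=1, timestamp=3000, ssrc=0x1234ABCD,
                              payload=b"frame-data", marker=True)
    print(parse_rtp_packet(packet))
```

A receiver would use gaps in `sequence_number` to detect loss and `timestamp` to schedule playout, since RTP itself neither retransmits nor guarantees timing.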
Figure 9.7: RTP header

A number of RFCs and Internet Drafts specify RTP payload formats for video and audio applications. Some examples are:

1) RFC 2032: H.261 Video Streams
2) RFC 2190: H.263 Video Stream
3) RFC 2429: 1998 Version of ITU-T Rec. H.263 Video (H.263+)
4) RFC 2435: JPEG-compressed Video
5) RFC 2250: MPEG1/MPEG2 Video
6) RFC 3047: ITU-T Recommendation G.722.1
7) RFC 3016: MPEG-4 Audio/Visual Streams
8) RFC 4184: AC-3 Audio

RFC 3016 specifies the payload format of MPEG-4 audio and video streams. The RTP payload formats described in RFC 3016 specify how MPEG-4 audio and video streams are to be fragmented and mapped directly onto RTP packets. These RTP payload formats enable transport of MPEG-4 audio and video streams without using the synchronization and stream management functionality of the MPEG-4 Systems specification.

9.2.4. RTSP

The Real Time Streaming Protocol (RTSP), specified in RFC 2326, is an application-level protocol for control over the delivery of data with real-time properties, such as audio and video. The protocol is intended to control multiple data delivery sessions, to provide a means for choosing delivery channels such as UDP and TCP, as shown in Figure 9.8, and to provide a means for choosing delivery mechanisms based upon RTP.
RTSP is intentionally similar in syntax and operation to HTTP/1.1 but differs from HTTP in a number of important aspects. RTSP introduces a number of new methods and has a different protocol identifier. Unlike an HTTP server, an RTSP server needs to maintain state by default in almost all cases. Both an RTSP server and an RTSP client can issue requests. Data is carried out-of-band by a different protocol. The Request-URI always contains the absolute URI.

RTSP is different from RTP/RTCP. RTSP is a control protocol for initiating and directing the delivery of streaming multimedia from media servers. It may be viewed as a remote control protocol for an Internet VCR. RTSP does not deliver data, though the RTSP connection may be used to tunnel RTP traffic. RTP is a transport protocol for the delivery of real-time data, including streaming audio and video. RTCP is a part of RTP and helps with lip synchronization and QoS management, among other things. RFC 2326 also specifies the use of RTP with RTSP.
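The HTTP-like syntax and the "remote control" role of RTSP can be illustrated by formatting the requests a client would typically send. This is a minimal sketch: the class name, URL, port numbers, and session identifier are hypothetical, and no network connection or response handling is attempted. The sketch uses the `CSeq`, `Transport`, and `Session` headers defined in RFC 2326.

```python
class RtspRequestBuilder:
    """Formats RTSP/1.0 requests and numbers them with an increasing CSeq."""

    def __init__(self, url):
        self.url = url  # absolute URI, used as the Request-URI
        self.cseq = 0

    def request(self, method, extra_headers=None):
        self.cseq += 1
        lines = [f"{method} {self.url} RTSP/1.0", f"CSeq: {self.cseq}"]
        for name, value in (extra_headers or {}).items():
            lines.append(f"{name}: {value}")
        return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical URL, ports, and session identifier, for illustration only.
    builder = RtspRequestBuilder("rtsp://media.example.com/movie")
    print(builder.request("DESCRIBE", {"Accept": "application/sdp"}))
    print(builder.request("SETUP", {"Transport": "RTP/AVP;unicast;client_port=8000-8001"}))
    print(builder.request("PLAY", {"Session": "12345678", "Range": "npt=0-"}))
    print(builder.request("TEARDOWN", {"Session": "12345678"}))
```

The `SETUP` request is where the client chooses the delivery channel, while the media itself would flow separately over RTP once `PLAY` is accepted.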


