 1. Introduction
 2. 3.5G Cellular Networks
   2.1. HSDPA
   2.2. EV-DO
 3. Apps and Services
 4. Technical Issues
 5. VoW Devices
   5.1. HSDPA Devices
   5.2. EV-DO Devices
 6. Phone Components
 7. Mobile TV
 8. Video Compression
   8.1. ITU Standards
   8.2. MPEG
 9. Standards
   9.1. Signaling
   9.2. Transport
 10. Resources
 11. Acronyms

     8. Video Compression

Figure 8.1 shows the timeline of video compression technologies.  The first-generation video compression standard for video conferencing systems, H.261, was started in the late 1980s and ratified by the ITU in 1990.  In 1988 MPEG, a working group of ISO/IEC, was founded to develop video and audio encoding standards, and MPEG-1 was published in 1993.  A year later MPEG-2/H.262 was developed jointly by the ITU and ISO/IEC JTC 1 and published as a standard of both organizations.  ITU SG 15 then initiated a new video coding method named H.263 and released three versions of it.  Meanwhile MPEG defined a newer video compression method in MPEG-4 Part 2.  ITU SG 15 also initiated a video compression method for low-bandwidth networks called H.26L, which it later developed jointly with MPEG into H.264/MPEG-4 Part 10.  In 1986, a joint ISO/CCITT committee named the Joint Photographic Experts Group (JPEG) was created to develop a compression standard for still images.  The group issued the JPEG standard for still image compression in 1992, and it was approved as an ISO standard in 1994.  JPEG later developed a new image coding system, JPEG 2000, which is based on wavelet technology rather than the DCT used by the original JPEG.  JPEG 2000 is divided into many parts.  Part 3 specifies Motion JPEG 2000, in which each video frame or interlaced field of a digital video sequence is compressed separately as a JPEG 2000 image.

8.1. ITU Video Coding Standards

8.1.1. H.261

The H.261 Recommendation (also called px64) was the first practical digital video coding standard specified by ITU Study Group 15.  It was originally designed for transmission over ISDN lines, on which data rates are multiples of 64 Kbps.  The standard supports two video formats, CIF (Common Intermediate Format) and QCIF (Quarter CIF); QCIF is mandatory and CIF is optional.  The luma resolutions of CIF and QCIF frames are 352 pixels x 288 lines and 176 x 144, respectively, and the chroma resolutions with 4:2:0 sampling are 176 x 144 and 88 x 72, respectively.  The coder operates at 29.97 frames per second.  The CIF resolution and frame rate are a compromise between the NTSC and PAL formats.  NTSC has 525 lines (480 lines of vertical resolution) and operates at 59.94 fields per second (29.97 frames per second), while PAL has 625 lines (576 lines of vertical resolution) and refreshes at 50 fields per second (25 frames per second).  The number of lines in CIF equals the vertical resolution of one PAL field (576 / 2 = 288), and the frame rate is identical to the NTSC frame rate.
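The frame geometry above can be checked with a few lines of arithmetic.  The following is a minimal sketch, not part of the Recommendation: the function names are illustrative, and the 8 bits per sample used for the raw frame size is an assumption made for the example.

```python
# Sketch: frame geometry for the two H.261 picture formats.
# All names are illustrative; 8 bits per sample is an assumption.

FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}  # luma width, height


def chroma_size_420(width, height):
    """4:2:0 sampling halves the chroma resolution in both directions."""
    return width // 2, height // 2


def macroblock_count(width, height, mb_size=16):
    """Number of 16x16 luma macroblocks that cover one frame."""
    return (width // mb_size) * (height // mb_size)


def raw_bits_per_frame(width, height, bits_per_sample=8):
    """Uncompressed frame size: one luma plane plus two chroma planes."""
    cw, ch = chroma_size_420(width, height)
    return (width * height + 2 * cw * ch) * bits_per_sample


for name, (w, h) in FORMATS.items():
    print(name, chroma_size_420(w, h), macroblock_count(w, h),
          raw_bits_per_frame(w, h))
# CIF  -> chroma (176, 144), 396 macroblocks, 1216512 bits per frame
# QCIF -> chroma (88, 72),    99 macroblocks,  304128 bits per frame
```

Under these assumptions an uncompressed CIF sequence at 29.97 frames per second needs roughly 36 Mbps, which shows why compression is required to fit multiples of 64 Kbps.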

 

 

H.261 employs a hybrid of inter-frame prediction, transform coding based on the Discrete Cosine Transform (DCT), and entropy coding.  Each frame is divided into macroblocks (MBs) of 16 x 16 pixels, and inter-frame prediction is performed on each macroblock using block-matching motion estimation and compensation.  Inter-frame prediction removes temporal redundancy, and DCT-based transform coding removes spatial redundancy.  Variable length coding is then used to remove any remaining redundancy in the compressed bit stream.  The coding algorithm is designed to handle data rates between 40 Kbps and 2 Mbps.
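To illustrate how transform coding concentrates spatial redundancy into a few coefficients, here is a minimal sketch of a separable 2-D DCT-II on an 8 x 8 block.  It is a plain floating-point textbook form, not the exact arithmetic of any codec, and the function names are illustrative.

```python
import math

N = 8  # block size used in this sketch


def dct_1d(values):
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back


# A perfectly flat block has only a DC coefficient; all AC terms vanish.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 800.0
```

For smooth image regions most of the energy ends up in the low-frequency corner of the coefficient block, so the many near-zero coefficients can be coded cheaply.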

Figure 8.2 illustrates the types of video frames used in inter-frame prediction in H.261.  Video coding begins with an intra-coded frame called an I frame, which is coded like a still image such as a JPEG image.  A predictive (P) frame is then produced by predicting the next frame from the I frame and coding the difference.  The second P frame is predicted from the first P frame, and so on.  Since each predictive frame depends on the previous frames, an error introduced by the communications medium propagates until the next intra-coded frame is produced.  The H.261 Recommendation specifies that a macroblock (MB) must be intra-coded at least once every 132 times it is transmitted.  However, the intra-frame refresh rate can be raised by the receiver in order to speed recovery when the measured loss rate is significant.

 

 

Figure 8.3 illustrates inter-frame prediction.  Motion-compensated prediction assumes that a block in the current frame can be predicted as a translation of a block in the previous frame.  The search is limited to a search window around the block position.  Block matching takes place only on the luminance component of the frames.  The color components of the blocks are included when coding the frame, but they are not used for motion estimation.
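The block-matching idea can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between luminance blocks.  This is an illustrative sketch only: the SAD criterion, the full-search strategy, the +/-7 pixel window and all names are assumptions chosen for the example, since the standard does not mandate how an encoder searches.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and one in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def best_motion_vector(cur, ref, cx, cy, size=16, search=7):
    """Full search inside a +/-search window; returns (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would fall outside the frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


def pattern(x, y):
    """Synthetic luminance pattern used only for this demonstration."""
    return (7 * x + 13 * y) % 256


# The current frame shows the reference content displaced by (2, -1).
ref = [[pattern(x, y) for x in range(48)] for y in range(48)]
cur = [[pattern(x + 2, y - 1) for x in range(48)] for y in range(48)]
print(best_motion_vector(cur, ref, 16, 16))  # (2, -1, 0)
```

Only the luminance samples take part in the search here, matching the description above; a real encoder would reuse the resulting vector when predicting the chroma blocks.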

 

 

8.1.2. H.263

ITU developed H.263, "Video coding for low bit rate communication," as an evolutionary improvement based on experience with the H.261, MPEG-1 and MPEG-2 standards.  Its first version was completed in 1995, and two enhanced versions, H.263 V2 and H.263 V3, were released later.  The coding algorithm of H.263 is similar to that of H.261, with changes to improve performance and error recovery.  The differences between the H.261 and H.263 coding algorithms are:

* In addition to QCIF and CIF in H.261, H.263 supports three additional formats, SQCIF (128 x 96), 4CIF (704 x 576), and 16CIF (1408 x 1152).   

* Half pixel precision is introduced for motion compensation. (H.261 used full pixel motion compensation with a loop filter).

* H.263 can be configured for a lower data rate or better error recovery.

* Over its revisions, 19 optional modes were added to improve performance.  The use of an optional mode is signaled by external means (e.g., Recommendation H.245).  Some of the key options are:

    * Unrestricted Motion Vectors mode (UMV-mode),

    * Syntax-based Arithmetic Coding mode (SAC-mode),

    * Advanced Prediction mode (AP-mode),

    * PB-frames mode (PB-mode): forward and backward frame prediction, similar to that of MPEG, using P-B frames.
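The half-pixel precision mentioned in the list above can be sketched with bilinear interpolation between neighboring integer-position samples.  The rounding rule and the function name below are assumptions for illustration; the normative interpolation and rounding are defined in the Recommendation itself.

```python
def sample_half_pel(frame, x2, y2):
    """Sample at position (x2 / 2, y2 / 2), given in half-pixel units.

    Integer positions return the stored sample; half positions return the
    rounded average of the two or four surrounding samples.
    """
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2          # 1 when the position is a half sample
    a = frame[y][x]
    b = frame[y][x + fx]
    c = frame[y + fy][x]
    d = frame[y + fy][x + fx]
    return (a + b + c + d + 2) // 4  # collapses to a or (a + b + 1) // 2


frame = [[10, 20],
         [30, 40]]
print(sample_half_pel(frame, 0, 0))  # 10  (integer position)
print(sample_half_pel(frame, 1, 0))  # 15  (horizontal half position)
print(sample_half_pel(frame, 0, 1))  # 20  (vertical half position)
print(sample_half_pel(frame, 1, 1))  # 25  (diagonal half position)
```

A motion vector expressed in half-pixel units can then address these interpolated samples, which lets the predictor follow motion that is finer than one pixel per frame.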

8.1.3. H.264

ITU initiated a new project called H.26L in 1998 and created a first draft in 1999, aiming to provide good video quality at bit rates substantially lower than those required by previous standards such as H.261, H.263, and MPEG-2.  In 2001, after MPEG had finished developing the MPEG-4 Part 2 video coding standard, ITU and MPEG agreed to jointly develop the next-generation video coding standard, using H.26L as the starting point.  Unlike the initial goal of H.26L, the joint group developed the standard to apply to a very wide variety of applications, networks and systems, including both low and high bit rates and low- and high-resolution video for broadcast, DVD storage, RTP/IP packet networks, and multimedia telephony systems.  ITU named the newly developed standard H.264, and MPEG named it MPEG-4 Part 10 Advanced Video Coding (AVC).  Because the course of development changed along the way, the standard acquired various names, including H.26L, H.264, AVC, MPEG-4 Part 10, MPEG-4 AVC, and ISO/IEC 14496-10.  In this section we use "H.264/AVC" to refer to ITU H.264 and MPEG-4 Part 10 together.

Figure 8.4 illustrates the block diagram of the H.264/AVC video encoder.  The underlying approach of H.264/AVC is similar to that of previous standards such as H.263 and MPEG-2.  It consists of four main stages: 1) divide each frame into blocks for block-based coding; 2) remove spatial redundancies by spatial prediction, transform, quantization and entropy coding; 3) remove temporal redundancies by motion estimation and compensation; and 4) remove any remaining redundancies by entropy coding.

 

 

H.264/AVC uses transform coding based on the Discrete Cosine Transform (DCT), as do previous standards such as H.261, H.263, MPEG-1 and MPEG-2.  H.264/AVC also includes a number of enhancements over previous coding standards:

* Blocks of different sizes and shapes for motion estimation and compensation, and quarter-pixel motion estimation,

* Spatial prediction from the edges of neighboring blocks for "intra" coding,

* Multiple frame selection and bi-directional mode selection,

* Loss-less macro block coding features,  

* New transform design features, including exact-match integer 4×4 and 8×8 spatial block transforms, adaptive encoder selection between the 4×4 and 8×8 transform block sizes, and a secondary Hadamard transform performed on "DC" coefficients,

* An in-loop deblocking filter to reduce the blocking artifacts common to DCT-based image compression techniques,

* An entropy coding design including:

    * Context-adaptive binary arithmetic coding (CABAC),

    * Context-adaptive variable-length coding (CAVLC),

    * A common, simple and highly structured variable length coding (VLC) technique, referred to as Exponential-Golomb coding, for many of the syntax elements not coded by CABAC or CAVLC.
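The Exponential-Golomb technique named in the list above is simple enough to sketch in full.  The example below shows the unsigned, order-0 form: the value plus one is written in binary and preceded by one fewer zero bits than the binary string has digits.  Representing bits as a text string and the function names are choices made for readability here, not part of the standard.

```python
def exp_golomb_encode(code_num):
    """Unsigned order-0 Exp-Golomb codeword for a non-negative integer."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]          # binary digits of code_num + 1
    return "0" * (len(binary) - 1) + binary  # zero prefix, then the digits


def exp_golomb_decode(bits):
    """Decode one codeword from the front of a bit string.

    Returns (value, number_of_bits_consumed).
    """
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    length = 2 * zeros + 1
    return int(bits[zeros:length], 2) - 1, length


for n in range(5):
    print(n, exp_golomb_encode(n))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
```

Small values get short codewords and no code table is needed, which is why the scheme suits syntax elements whose small values are the most frequent.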

When H.264/AVC was completed in May 2003, the standard was focused on entertainment-quality video based on 8 bits per sample and 4:2:0 sampling.  To address the needs of high-quality video applications, the joint group added new extensions to the Main profile, named the "Fidelity Range Extensions (FRExt)".  The extensions include:

   * The High Profile (HP): 8-bit video with 4:2:0 sampling, addressing high-end consumer applications.

   * The High 10 Profile (Hi10P): Up to 10-bit video with 4:2:0 sampling.

   * The High 4:2:2 Profile (H422P): Up to 10-bit video with 4:2:2 sampling.

   * The High 4:4:4 Profile (H444P): Up to 12-bit video with 4:4:4 sampling, additionally supporting lossless region coding.

The High 4:4:4 Profile was removed later.

8.2. MPEG

An ISO/IEC joint working group, ISO/IEC JTC1/SC29 WG11, known as the Moving Picture Experts Group (MPEG), is in charge of developing MPEG standards for the coded representation of digital audio and video.  The group was established within the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) in 1988 and published its first standard in 1993.  MPEG standards specify video and audio compression and decompression methods in MPEG-1, MPEG-2 and MPEG-4, of which MPEG-4 uses the most advanced technologies.

8.2.1. MPEG-1

MPEG-1 is the first of the MPEG standards for the compression of moving pictures and audio.  It was a popular standard for video on the Internet (.mpg files) and in disc players.  Layer 3 of MPEG-1 audio compression is known as MP3.  The MPEG-1 specification is in five parts:

Part 1 specifies a method of combining one or more data streams from the video and audio parts of the MPEG-1 standard into a single stream with timing information.  Part 2 specifies a video coding method for compressing video sequences in various formats into different bit rates.  MPEG-1 video uses hybrid coding: block-based motion-compensated prediction combined with a Discrete Cosine Transform (DCT) of the residual followed by scalar quantization.  A video format of 352 pixels by 240 lines at 1.5 Mbits per second is commonly used on storage media such as Video CD (VCD).  At present MPEG-1 is the most widely compatible format in the MPEG family; most computers and VCD/DVD players can play MPEG-1 videos.

The main difference between MPEG-1 and H.261 is in inter-frame prediction.  H.261 has two types of frames, intra frames and predictive frames.  MPEG-1 adds forward and backward frame prediction, as shown in Figure 8.5.  I and P frames are identical to those in the H.261 coding method.  A B-frame is encoded relative to the past reference frame, the future reference frame, or both.  The future reference frame is the closest following reference frame (I or P).  The encoding of B-frames is similar to that of P-frames, except that motion vectors may refer to areas in the future reference frame.  For macroblocks that use both past and future reference frames, the two 16x16 areas are averaged.
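The averaging step for bidirectionally predicted macroblocks can be sketched as follows.  The function name is illustrative and the round-half-up rule is an assumption made for the example; small 2 x 2 blocks stand in for the 16x16 areas to keep the output readable.

```python
def average_prediction(past_block, future_block):
    """Average two equally sized blocks sample by sample (round half up)."""
    return [[(p + f + 1) // 2 for p, f in zip(past_row, future_row)]
            for past_row, future_row in zip(past_block, future_block)]


past = [[10, 20],
        [30, 40]]
future = [[20, 21],
          [30, 45]]
print(average_prediction(past, future))  # [[15, 21], [30, 43]]
```

The encoder then codes only the difference between the actual macroblock and this averaged prediction, in the same way as for a P-frame.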

 

 

Part 3 specifies an audio coding method, whose Layer 3 is known as MP3.  Part 4 specifies conformance tests, and Part 5 is a technical report on a software implementation of the first three parts of the MPEG-1 standard.

8.2.2. MPEG-2

The MPEG-2 standard is currently in 9 parts.  Part 1 (Systems), Part 2 (Video coding) and Part 3 (Audio coding) have reached international standard status.

Similar to Part 1 of MPEG-1, Part 1 of MPEG-2 addresses the combining of one or more streams of video and audio, as well as other data, into single or multiple streams.  It specifies two types of streams, the Program Stream and the Transport Stream.  The Program Stream is a container format designed for use on storage media such as DVD/VCD.  The Transport Stream is designed to carry digital video and audio over communications media.

Part 2 specifies a video compression method.  It builds on the capabilities of the MPEG-1 standard and defines different profiles, and levels within each profile, which describe image formats.  Each video stream carries information indicating which profile and level capability (e.g., Main Profile@Main Level) is needed to decode the stream.  Four levels are defined: "Low", "Main", "High-1440" and "High".  Table 8.1 shows the profiles and levels defined by MPEG-2.

Table 8.1: MPEG-2 video profiles

The Main profile is the most widely used MPEG-2 profile, defined for "Main" (Standard Definition, SD) and "High" (High Definition, HD) resolution applications in storage and transmission.  MPEG-2 is forward compatible with MPEG-1, meaning that MPEG-2 decoders can decode MPEG-1 streams with typical constraints.

Part 3 specifies a backwards-compatible multichannel extension of the MPEG-1 audio standard.  A new coding scheme called "Advanced Audio Coding" (AAC), which is not backwards compatible, is specified separately in Part 7.

8.2.3. MPEG-4

MPEG-4 Version 1 was approved by MPEG in December 1998 and Version 2 was ratified in December 1999.  MPEG-4 is divided into many "parts", including Part 3 for audio, Parts 2 and 10 for video, and Parts 12, 14, and 15 for file formats.  MPEG-4 includes many of the features of MPEG-1 and MPEG-2 and adds new features from other related standards, including VRML support for 3D rendering, object-oriented composite files, externally specified Digital Rights Management (DRM) support and various types of interactivity.  MPEG-4 audiovisual scenes are composed of primitive media objects such as still images, video objects, audio objects, text and graphics.  Most of the features included in MPEG-4 are optional, and individual developers decide whether to implement them.

Figure 8.6 illustrates the MPEG-4 system layers.  There are two layers of multiplexers to exploit the different QoS levels available from the network.  The first multiplexing layer is managed according to the Delivery Multimedia Integration Framework (DMIF, Part 6) specification.  This multiplex may be embodied by the MPEG-defined FlexMux tool, which allows grouping of Elementary Streams (ESs) with a low multiplexing overhead.  Multiplexing at this layer may be used to group ESs with similar QoS requirements in order to reduce the number of network connections or the end-to-end delay.  Use of the FlexMux is optional, and this layer may be bypassed if the underlying TransMux layer provides all the required functionality.  The synchronization layer is always present.

 

 

The MPEG-4 Visual standard is specified in Part 2 and Part 10.  Part 2 is a video compression technology developed by MPEG, which is commonly referred to as MPEG-4 video.  Part 10 is also called Advanced Video Coding (AVC) and was written by the ITU-T Video Coding Experts Group together with the ISO/IEC Moving Picture Experts Group as the product of a collective partnership effort.  The final drafting work on the first version of the standard was completed in May 2003.  Refer to the H.264 section for details about MPEG-4 Part 10 Advanced Video Coding.

8.3. JPEG

In 1986, the Joint Photographic Experts Group (JPEG) was formed as a joint ISO/CCITT committee to create a standard method of compression for photographic images.  The group issued a standard in 1992, which was approved in 1994 as ISO 10918-1.  JPEG is a lossy compression method for still images, unlike the H.26x series and MPEG methods, which compress moving images.  Like the H.26x series and MPEG, JPEG uses the DCT, followed by quantization, zigzag scanning and entropy coding.  JPEG can easily achieve a 10:1 compression ratio without noticeable image degradation, and up to a 100:1 compression ratio depending on the image.  JPEG has been widely accepted as the standard compression method for still images on the Internet and in digital photography.  The common file name extension .jpg indicates an image coded with the JPEG compression method.
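The quantization and zigzag scanning steps can be sketched as below.  This is an illustration under simplifying assumptions: a single uniform quantizer step stands in for JPEG's per-coefficient quantization table, a 4 x 4 block is used to keep the example short, and all names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        coords = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            coords.reverse()    # even diagonals run from bottom-left upward
        order.extend(coords)
    return order


def quantize(block, step):
    """Divide every coefficient by one uniform step and round (assumption)."""
    return [[int(round(c / step)) for c in row] for row in block]


def zigzag_scan(block):
    """Flatten a square block into a list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


# Hypothetical DCT coefficients with energy in the low-frequency corner.
coeffs = [[160, 32, 0, 0],
          [-16, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
print(zigzag_scan(quantize(coeffs, 16))[:4])  # [10, 2, -1, 0]
```

Because the scan visits low frequencies first, the non-zero values cluster at the start of the list and the long run of trailing zeros can be coded very compactly by the entropy coder.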

The JPEG committee later initiated a wavelet-based image compression method, JPEG 2000, in order to achieve better compression ratios than the original DCT-based JPEG.  Part 3 of JPEG 2000 defines a file format called MJ2 (or MJP2) for motion sequences of JPEG 2000 images.  Motion JPEG uses intra-frame coding only, which is very similar to I-frame coding in H.263, H.264, MPEG-1 and MPEG-2.  Since each frame is coded independently, the impact of an error does not propagate to subsequent frames, unlike in inter-frame coding.  Support for associated audio is also included.  Expected applications of JPEG 2000 include:

   * Storing video clips taken using digital still cameras,

   * High-quality frame-based video recording and editing,

   * Digital cinema,

   * Medical and satellite imagery.

Motion JPEG (MJPEG) has lower compression efficiency than H.264/AVC.  However, because it does not use inter-frame coding, it is error resilient and suitable for sending video over wireless networks.  MJPEG produces much larger files than inter-frame coding, but it makes video editing easier because there are no dependencies between frames.  JPEG 2000 is not as widely adopted as the original JPEG.

8.4. Video Coding Methods in Internet Video Players

The Windows Media Player video codec was originally developed as a proprietary codec for low-bit-rate video applications.  In 2003 Microsoft submitted its video codec to the Society of Motion Picture and Television Engineers (SMPTE) for standardization.  It was approved as a standard in March 2006 as SMPTE 421M (also known as VC-1).  VC-1 is an evolution of the conventional DCT-based video codec.

RealVideo, currently at version 10, is a proprietary video format developed by RealNetworks.  The first version of RealVideo was based on the H.263 codec, and H.263 was supported until version 8.

Apple QuickTime supports MPEG-4 Part 2 (a.k.a. MPEG-4 simple profile) and MPEG-4 Part 10 (a.k.a. H.264).

© 2007 TelecomReferences. All rights reserved