Multimedia is defined by the International Organization for Standardization (ISO) as the creation, modification, composition, and/or representation of products comprising any combination of media types [1]. A 'medium' is any means through which information can be perceived, expressed, stored, or transmitted.

   'Network multimedia' is understood as:

  • Videoconferencing - video telephony, computers with video windows
  • Video streaming - multimedia delivered to the computer over the network
  • IP telephony - voice, voice mail, facsimile

    Audio codec standard

        Sound in the audible range of frequencies, 20-20 000 Hz, is too wide a band for transmission over low-capacity lines. The human voice occupies only a low band of this spectrum, roughly 300-4 000 Hz, so voice transmission is usually planned around a 4 kHz bandwidth. In 1928, Bell Labs first began coding sound into digital form, and it was found that to reconstruct a signal equal to the original after digitization, it must be sampled at twice its highest frequency: a 4 kHz voice band therefore requires 8 000 samples per second. On the Internet today we can still encounter audio in the AU format, sampled at 8 kHz. This format is rather obsolete; it uses none of the audio compression so popular recently. Despite this, it has until now been the only audio format supported in Java applications (a standard of Sun). By the same rule, a 20 kHz band requires a 40 kHz sampling rate. A CD uses 44.1 kHz, 16-bit stereo, which yields 44 100 × 16 × 2 ≈ 1.4 million bits (about 176 kB) per second. This figure is relatively large; one way to reduce it is compression.
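The sampling and bit-rate arithmetic above can be sketched in a few lines of Python (the function names are ours, purely illustrative):

```python
# Sketch of the sampling and bit-rate arithmetic from the text.

def nyquist_rate(max_freq_hz):
    """Minimum sampling rate: twice the highest signal frequency."""
    return 2 * max_freq_hz

def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

print(nyquist_rate(4_000))          # voice band: 8 000 samples per second
print(pcm_bit_rate(44_100, 16, 2))  # CD audio: 1 411 200 bps (~176 kB/s)
```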


        Quantization is the process of converting the continuous time course of a wave into discrete values.


        Compression is necessary due to the high cost of transmission. It exploits attributes of human speech, for instance pauses between words, long stretches of near-constant amplitude, and predictable changes in voice amplitude, which together allow good compression ratios.

    Pulse Code Modulation (PCM) is a popular method of coding an analogue signal, used in the public telephone network. PCM samples the analogue signal 8 000 times per second, using 7-bit or 8-bit quantization. In this way we obtain 56 kilobits per second (kbps) with 7-bit sampling, and 64 kbps with 8-bit sampling. Plain PCM supports only amplitude compression (companding); ADPCM (Adaptive Differential PCM) adds bandwidth compression, which can cut the bit rate in half.
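A minimal sketch of the telephone PCM rates described above (constant and function names are ours):

```python
# Telephone PCM: 8 000 samples per second at 7 or 8 bits per sample.

VOICE_SAMPLE_RATE = 8_000  # samples per second

def pcm_rate_bps(bits_per_sample):
    return VOICE_SAMPLE_RATE * bits_per_sample

print(pcm_rate_bps(7))       # 56 000 bps (56 kbps)
print(pcm_rate_bps(8))       # 64 000 bps (64 kbps)
print(pcm_rate_bps(8) // 2)  # ADPCM roughly halves this: 32 000 bps
```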

    Video codec standard

        When a picture appears before the human eye, it takes several milliseconds to perceive. If a sequence of pictures is shown fast enough, the eye cannot see each picture individually; every video system uses this to produce the 'impression of moving pictures'. A smooth sequence requires 24 to 30 pictures per second. Where the dynamic change between pictures is small (e.g., a speaking human), a slower rate of 15 to 20 pictures per second is also acceptable.

        A digital picture, or image, comprises pixels in a 2D field, where each pixel represents a level of intensity or color. In the case of a black & white image, one bit per pixel is enough. Gray-scale images (256 shades of gray) require 8 bits per pixel. Truecolor images demand 8 bits for each channel of the RGB color model, which means 24 bits per pixel. Because digital video contains a huge amount of data, compression is essential for transmission over a network.

        A videoconference of uncompressed video (30 frames per second) at the resolution 352x288 would require a bandwidth of about 73 Mbps (352 × 288 × 30 × 24 bits).
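The bandwidth figure can be checked with a short calculation (a sketch; the frame rate and pixel depth are the ones quoted above):

```python
# Uncompressed video bandwidth: width x height x frames/s x bits/pixel.

def video_bandwidth_bps(width, height, fps, bits_per_pixel):
    return width * height * fps * bits_per_pixel

bps = video_bandwidth_bps(352, 288, 30, 24)
print(bps)              # 72 990 720 bps
print(bps / 1_000_000)  # ~73 Mbps
```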


        In 1988 the ISO Moving Picture Experts Group (MPEG) began designing the MPEG standard for coding full-motion video, audio, and their synchronization. Its high compression ratio comes from exploiting the similarity of neighboring pixels (they are highly correlated), the redundancy between subsequent frames, and the limited ability of the human eye to detect detail in moving video sequences. In the current implementation of MPEG, we distinguish three types of images:

    I - the intra frame, containing the compressed information of a whole image
    P - the prediction frame, containing the changes relative to a previous image
    B - the bidirectional frame, containing the changes relative to both the previous and the following frame (possible because MPEG holds a group of images in memory at the same time rather than processing them strictly one after the other).

        A typical 30-frame MPEG sequence looks, for example, as follows (display order): I B B P B B P B B …
    with roughly 1/3 of the overall data being I images, 1/3 P images, and 1/3 B images. The B images are calculated from the I and P images. The displayed sequence differs from the stored one: the file stores the sequence I, P, B1, B2, P, B3, B4, …; the first I image is displayed on screen while the P image is read into memory, then the second displayed image is derived from I, P, and B1, the third from I, P, and B2, and only then is the P image itself displayed.
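The reordering between stored and displayed frames can be sketched as follows (a simplified model of our own, not real decoder code: each reference frame is held back until the B frames that depend on it have been shown):

```python
# Sketch: MPEG stores each I/P reference frame before the B frames
# that depend on it; the decoder reorders them for display.

def display_order(frames):
    """Move each reference frame (I/P) after the B frames that use it."""
    out, pending = [], None
    for f in frames:
        if f.startswith("B"):
            out.append(f)      # B frames display as they arrive
        else:
            if pending is not None:
                out.append(pending)
            pending = f        # hold the reference until its Bs pass
    if pending is not None:
        out.append(pending)
    return out

stored = ["I", "P1", "B1", "B2", "P2", "B3", "B4"]
print(display_order(stored))   # I, B1, B2, P1, B3, B4, P2
```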

    MPEG 1

        Approved in October 1992 as a standard for the storage and retrieval of moving pictures and audio on storage media, MPEG-1 codes video and audio signals with a quality comparable to a non-interlaced VHS signal (classical video), at bit rates up to 1.5 Mbps. The most popular variant uses a resolution of 352x288 (PAL standard, 25 frames per second) at 1.2 Mbps, with the audio signal coded in MPEG Layer 2 at CD quality (44.1 kHz, 16-bit stereo) at 224 kbps; this combination is used on Video CD. A compressed 120-minute film can thus be stored on two CDs.
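The two-CD claim follows from rough arithmetic (a sketch, assuming the 1.2 Mbps video stream quoted above plus MPEG Layer 2 audio at about 224 kbps):

```python
# Rough Video CD arithmetic for a 120-minute film.

VIDEO_BPS = 1_200_000  # MPEG-1 video stream (figure from the text)
AUDIO_BPS = 224_000    # MPEG Layer 2 audio
SECONDS = 120 * 60     # two hours

total_mb = (VIDEO_BPS + AUDIO_BPS) * SECONDS // 8 // 1_000_000
print(total_mb)  # ~1 281 MB: roughly two ~650 MB CDs
```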

    MPEG 2

        Approved in November 1994 as a standard for digital television (including HDTV) with up to 5 audio channels in CD quality, at bandwidths from 4 to 40 Mbps and a maximum of 80 Mbps. It places higher demands on hardware than MPEG-1; it can code an interlaced or non-interlaced signal, and it is backward compatible with MPEG-1.

        Officially, MPEG-2, the standard for digital television, is designated ISO/IEC 13818 (in 9 parts).

    MPEG 3

        This format does not exist. It was originally meant to be the successor to MPEG-2, but because MPEG-2 fulfilled all quality expectations, the format was abandoned.

    MPEG 4

        MPEG-4 is an ambitious project of the ISO Moving Picture Experts Group (MPEG), intended as a standard for interactive network multimedia, including audio and video and their synthesis (2D and 3D animation composed from polygons, splines, etc.).

        While MPEG-1 was primarily intended as an audio/video compression technology for Video CD (films on CD-ROMs), MPEG-2 was aimed at digital television and High Definition Television (HDTV). In its compression method, MPEG-4 appears as a close relative of H.263 and MPEG-1; the documentation indicates that MPEG-4 should be backward compatible, or nearly compatible, with both. MPEG-4 uses video compression algorithms such as hybrid block discrete cosine transforms and the motion-compensation coding methods already used in MPEG-1, MPEG-2, H.261, and H.263.

        The MPEG-4 format is already supported directly in the Windows 98 video player, from Media Player version 6.4. The files use the extension '.asf', most commonly at 320x240 with 15 to 30 frames per second. A 120-minute film in MPEG-4 ranges from 220 MB (at 250 kbps quality), through 440 MB (at 500 kbps), up to 880 MB (at 1 Mbps). Hardware demands are about double those of MPEG-1, for which a Pentium 150 sufficed for playback: at least a PII 300 or K6-2 300.
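The quoted file sizes follow directly from bit rate times duration (a sketch; taking 1 MB as 10^6 bytes gives results close to, though not exactly, the rounded figures above):

```python
# File size of a 120-minute film at a given video bit rate.

def film_size_mb(bit_rate_bps, minutes=120):
    return bit_rate_bps * minutes * 60 // 8 // 1_000_000

for rate in (250_000, 500_000, 1_000_000):
    print(f"{rate} bps -> {film_size_mb(rate)} MB")  # 225, 450, 900 MB
```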

    One of the compression programs is Sonic Foundry Stream Anywhere v1.0 of 27 September 1999, which can compress from the MPEG-1, AVI, and MOV formats into ASF (MPEG-4) at video bit rates of 100, 250, 512, 1024, and 3072 kbps, and at 28.8 and 56 kbps for videoconferences.

    DivX is an MPEG-4 high-resolution video codec for coding video in DVD quality at a size of about 700 MB (a 2-hour film on one CD).

    Other types


    The Joint Photographic Experts Group (JPEG) standard is the first international digital image compression standard for static images (grayscale or color). It is an ISO standard and since 1992 has represented one of the best techniques for compressing high-color images. JPEG is a lossy format with compression ratios from 8:1 up to 35:1, where the level of compression can be traded against the quality of the resulting image. One derived use is Motion JPEG, in fact a series of JPEG images; compared with MPEG, it has the advantage that each image can be edited separately.

    H.261 (Recommendation H.261)

        Adopted by the ITU in December 1990, H.261 is the landmark specification of video coding algorithms for diverse types of videoconferencing. H.261 enables a standard TV signal to be compressed and transmitted at speeds from 64 kbps to 2 Mbps. It is the main video coding algorithm for ITU videoconferences, providing the common basis for interoperability between conference systems running over various types of network. Originally, H.261 was intended for video communication over ISDN at bandwidths that are multiples of 64 kbps (p × 64 kbps, p = 1, 2, 3, …, 30). To provide interoperability between different TV systems and to reduce the transmission rate, H.261 supports only the following two image resolutions:

    1. Common Intermediate Format (CIF)
    2. Quarter CIF (QCIF),

    where only QCIF is compulsory.

        CIF has a resolution of 352x288, most frequently at 30 frames per second. It supports applications requiring high transmission speeds, for instance videoconferences where several people are in the camera's field of view.

        QCIF has a resolution of 176x144 (one quarter of the CIF size), typically at 7.5 to 15 frames per second. Compared with CIF, it requires only one quarter of the transmission channel; its transmission band is in the range of 64 to 128 kbps.

    H.263 (Recommendation H.263)

        Approved in spring 1996, H.263 was specifically designed for coding video applications at low transmission rates of 15 to 20 kbps. H.263 is intended for use with V.34 modems capable of full-duplex connections at 28.8 kbps. Whereas H.261 supports only two video formats (CIF and QCIF), H.263 supports five: besides these two, it also supports Sub-QCIF (128x96), 4CIF (4xCIF), and 16CIF (16xCIF). Using some MPEG compression techniques, H.263 produces images close to H.261 quality while requiring only half the bandwidth at rates below 64 kbps. Even at higher transmission speeds H.263 still provides better image quality, and it may well displace H.261 given its wide acceptance and universality for videoconferencing.
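The five H.263 picture formats are all simple multiples of CIF, which a short table makes explicit (a sketch; the dictionary is ours):

```python
# H.261/H.263 picture formats and their resolutions.

CIF_W, CIF_H = 352, 288

formats = {
    "SQCIF": (128, 96),
    "QCIF":  (CIF_W // 2, CIF_H // 2),  # 176 x 144
    "CIF":   (CIF_W, CIF_H),
    "4CIF":  (CIF_W * 2, CIF_H * 2),    # 4x the CIF area
    "16CIF": (CIF_W * 4, CIF_H * 4),    # 16x the CIF area
}

for name, (w, h) in formats.items():
    print(f"{name:>6}: {w} x {h}")
```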

    3D sound

        We can compare a sound card with a graphics card: while the graphics card displays data on the screen, the sound card drives a loudspeaker system for sound reproduction. The idea of 3D sound came only after the development of 3D graphics cards. Take a classical 3D action game as an example: the need arose for better orientation in space, and so the first 3D sound cards began to be designed.

        Creative's Sound Blaster marked the start of sound card development, enabling 8-bit mono sound; until then, sound in a PC was produced through the 'PC speaker'. Later, 16-bit sound cards for stereo sound entered the scene, and then 3D sound followed.

        The following standards for 3D sound were designed over time:

    A3D v2.0 - the standard for 3D sound by Aureal; it simulates sound propagation using a simplified spatial material model of the environment.

    EAX (Environmental Audio eXtension) - the standard for spatial sound, emphasizing 3D sound through the simulation of sound reflections from the environment. It is easily programmable and contains pre-calculated parameter values for the surrounding environment.

    Dolby Digital (AC-3) - a digital system coding 5 standard channels (frequency range 20 Hz to 20 kHz): two front and two rear loudspeakers plus a center loudspeaker, and one channel for a subwoofer (a loudspeaker for frequencies of 20 to 100 Hz), into one digital data stream. This system is denoted 5.1 for short.

    HRTF (Head-Related Transfer Functions) - a technology of 3D sound simulation using two loudspeakers or headphones.

        Though spatial sound can be produced on a stereo loudspeaker system, the resulting 3D sound is not comparable with that produced using 4 loudspeakers.

        When sound propagates through an environment, the Doppler effect arises: the frequency of the sound of an approaching object appears higher than that of a stationary one, and conversely, if the object is receding, its frequencies seem lower.
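The effect can be quantified with the standard Doppler formula for a moving source (not given in the text; we assume the usual f_observed = f_source · v / (v − v_source), with the source speed positive when approaching):

```python
# Doppler shift for a moving sound source.

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C

def doppler(f_source_hz, source_speed_ms):
    """Observed frequency; source_speed_ms > 0 means approaching."""
    return f_source_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - source_speed_ms)

print(round(doppler(440.0, 30.0)))   # approaching at 30 m/s: pitch rises
print(round(doppler(440.0, -30.0)))  # receding at 30 m/s: pitch falls
```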

        Modern sound card chips contain a processor, though not a classical processor like a Pentium but a DSP (Digital Signal Processor) for processing data in real time. The performance of this sort of processor is given in MIPS (millions of instructions per second). For comparison with a classical processor, the EMU 10K chip on the SB Live! card, with a declared performance of 1000 MIPS, corresponds to roughly a 100 MHz Pentium.

        At present there are other quality sound cards besides the SB Live!, for instance cards built around the Aureal Vortex 2 chip with a power of 800 MIPS. A very good and cheap alternative is the MediaForte Quad X-Treme 256, which besides output to 4 loudspeakers also supports the EAX standard.


    Bandwidth - the transmission capacity of a channel: what constant amount of data can be transferred per unit of time, given in bits per second (bps).

    Bit rate - the number of bits that can be transmitted through a channel in one second (bps).

    Codec - a hardware or software component performing the digital coding and decoding of a video signal.

    Compression - a technique enabling the reduction of the data size required for the reproduction of a digital audio or video signal.
    H.261 - low-bit-rate video codec supporting CIF and QCIF. Known as 'Px64'.
    H.262 - video codec, essentially identical to MPEG-2 video.
    H.263 - low-bit-rate video codec supporting SQCIF, QCIF, CIF, 4CIF, and 16CIF.