Transcoding

Dec 8, 2019

Introduction

Transcoding is the direct digital-to-digital conversion of one encoding to another, such as for video files, audio files (e.g. MP3, WAV), or character encodings (e.g. UTF-8, ISO/IEC 8859). It is usually done when a target device (or workflow) does not support the source format, when limited storage capacity mandates a smaller file size, or when incompatible or obsolete data must be converted to a better-supported or modern format.

Transcoding is commonly a lossy process that introduces generation loss. It can be lossless, however, if the output is either losslessly compressed or uncompressed. Transcoding into a lossy format introduces varying degrees of generation loss. Transcoding from a lossy format to a lossless or uncompressed one is technically a lossless conversion, because no further information is lost; the process is irreversible, however, and is more correctly described as destructive.
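The difference between lossless and lossy transcoding is easiest to see with character encodings. The following is a minimal Python sketch, not a definitive implementation: the helper name transcode and the sample string are made up for illustration, and only the standard bytes.decode and str.encode methods are used.

```python
def transcode(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Decode `data` from `source`, then re-encode it as `target`."""
    text = data.decode(source)            # decode to an intermediate form (a str)
    return text.encode(target, errors)    # encode into the target encoding


# Illustrative sample text stored as ISO-8859-1 (Latin-1) bytes.
latin1_bytes = "café".encode("iso-8859-1")

# Lossless: UTF-8 can represent every Latin-1 character, so converting
# there and back returns exactly the original bytes.
utf8_bytes = transcode(latin1_bytes, "iso-8859-1", "utf-8")
round_trip = transcode(utf8_bytes, "utf-8", "iso-8859-1")

# Lossy: ASCII has no "é", so it is replaced by "?" and cannot be recovered.
ascii_bytes = transcode(utf8_bytes, "utf-8", "ascii", errors="replace")
```

Here round_trip equals latin1_bytes, while ascii_bytes has permanently lost the accented character.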

Transcoding is a blanket term that covers several related processes: transcoding in the narrow sense, transrating and trans-sizing. It is a two-step process in which the original data is decoded to an intermediate uncompressed format (e.g. PCM for audio; YUV for video), which is then encoded into the target format.

  • Transcoding converts the video and audio into a format that is either more widely supported or easier to manipulate, e.g. mkv to flv.
  • Transrating reduces the bitrate of the video and audio while keeping the original media format, e.g. 4K at 13 Mbps to 4K at 6 Mbps, all in the same format.
  • Trans-sizing reduces the video frame size, e.g. from 4K to 2K, 1080p or 720p.
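As a rough illustration of the three operations, the Python sketch below builds FFmpeg command lines without running them. The function name build_ffmpeg_args, the file names, and the choice of libx264 are assumptions made for this example; -i, -c:v, -b:v and -vf scale are standard FFmpeg options, but a real job would normally set audio options and codecs explicitly.

```python
from typing import List, Optional


def build_ffmpeg_args(
    src: str,
    dst: str,
    video_codec: Optional[str] = None,    # e.g. "libx264" (depends on the FFmpeg build)
    video_bitrate: Optional[str] = None,  # e.g. "6M" for roughly 6 Mbps
    height: Optional[int] = None,         # target frame height in pixels
) -> List[str]:
    """Build (but do not run) an ffmpeg argument list."""
    args = ["ffmpeg", "-i", src]
    if video_codec is not None:
        args += ["-c:v", video_codec]
    if video_bitrate is not None:
        args += ["-b:v", video_bitrate]
    if height is not None:
        # A width of -2 lets ffmpeg keep the aspect ratio with an even width.
        args += ["-vf", f"scale=-2:{height}"]
    args.append(dst)
    return args


# Transcoding: change the format; ffmpeg picks defaults from the extension.
transcode_cmd = build_ffmpeg_args("input.mkv", "output.flv")

# Transrating: same container and (assumed) codec, lower bitrate.
transrate_cmd = build_ffmpeg_args(
    "input_4k.mp4", "output_4k.mp4", video_codec="libx264", video_bitrate="6M"
)

# Trans-sizing: shrink the frame to 1080 lines.
transsize_cmd = build_ffmpeg_args("input_4k.mp4", "output_1080p.mp4", height=1080)
```

Each list could be passed to subprocess.run on a machine where FFmpeg is installed.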

Usage

Although transcoding can be found in many areas of content adaptation, it is commonly used in the area of mobile phone content adaptation. In this case, transcoding is a must, due to the diversity of mobile devices and their capabilities. This diversity requires an intermediate state of content adaptation in order to make sure that the source content will adequately function on the target device to which it is sent.

Transcoding video from most consumer digital cameras can reduce the file size significantly while keeping the quality about the same. This is possible because most consumer cameras are real-time, power-constrained devices with neither the processing power nor the robust power supplies of desktop CPUs, so they compress video less efficiently than a transcoder that is free of those constraints.

One of the most popular technologies in which transcoding is used is the Multimedia Messaging Service (MMS), the technology used to send and receive messages with media (image, sound, text and video) between mobile phones. For example, when a camera phone takes a digital picture, it creates a high-quality image, usually of at least 640x480 pixels. When the image is sent to another phone, this high-resolution image might be transcoded to a lower-resolution image with fewer colors in order to better fit the target device’s screen size and color limitations. This size and color reduction improves the user experience on the target device, and is sometimes the only way for content to be sent between different mobile devices.
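The size and color reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 176x144 target screen and the 4-bit color depth are hypothetical values, and the helper names fit_within and reduce_color_depth are invented for this example rather than taken from any MMS implementation.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Largest size with the source aspect ratio that fits the target screen."""
    if width <= max_width and height <= max_height:
        return width, height  # already fits, never upscale
    # Integer cross-multiplication avoids floating-point rounding surprises.
    if width * max_height >= height * max_width:
        # The width is the limiting dimension.
        return max_width, max(1, height * max_width // width)
    return max(1, width * max_height // height), max_height


def reduce_color_depth(value: int, bits: int) -> int:
    """Keep only the top `bits` bits of an 8-bit color channel value."""
    shift = 8 - bits
    return (value >> shift) << shift


# A 640x480 photo sent to a hypothetical 176x144 screen.
target_size = fit_within(640, 480, 176, 144)

# One RGB pixel reduced from 8 bits to 4 bits per channel.
pixel = (200, 100, 50)
reduced_pixel = tuple(reduce_color_depth(c, 4) for c in pixel)
```

In this sketch target_size comes out as (176, 132), which preserves the 4:3 aspect ratio, and reduced_pixel is (192, 96, 48).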

YouTube makes use of FFmpeg to perform its transcoding. The benefits of transcoding become clear when it comes to streaming. For instance, if you intend to stream a live Fortnite BR game directly to your viewers, the properties of the video and sound you are recording may not match what your intended viewers can handle. Your video recording software may produce 1080p H.264 video that takes a long time to download on users' machines, because it has not been appropriately transrated and trans-sized. In addition, if your audio is in AAC format, some users may not be able to decode it. Streaming sites like Twitch and YouTube act as intermediaries that perform all the transcoding, so that viewers can retrieve the video in a format appropriate for them.

YouTube transcodes large video and audio files very quickly by using infrastructure that performs the task in a highly parallelized manner. It takes the input video you are uploading, splits it into several smaller segments, transcodes each segment, and stitches the results back together. An array of machines is optimized for this process, which is why you can watch a SpaceX livestream in 1080p with little to no interruption.
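The split, transcode and stitch pattern can be sketched with Python's standard concurrent.futures module. This is a toy model, not YouTube's actual pipeline: the "video" is a list of integers standing in for frames, transcode_segment is a placeholder for a real encoder call, and a production system would spread segments across processes or machines rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split(frames: List[int], segment_size: int) -> List[List[int]]:
    """Cut the input into consecutive segments of at most `segment_size` frames."""
    return [frames[i:i + segment_size] for i in range(0, len(frames), segment_size)]


def transcode_segment(segment: List[int]) -> List[int]:
    """Placeholder for a real encoder: halve every frame value."""
    return [frame // 2 for frame in segment]


def transcode_parallel(frames: List[int], segment_size: int = 4, workers: int = 4) -> List[int]:
    """Split the input, transcode the segments concurrently, then stitch them."""
    segments = split(frames, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so stitching is a
        # simple concatenation.
        results = list(pool.map(transcode_segment, segments))
    return [frame for segment in results for frame in segment]


video = list(range(10))
output = transcode_parallel(video)
```

Because each segment is independent, the stitched output matches what a single sequential pass over the whole input would produce.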

Drawbacks

The key drawback of transcoding in lossy formats is decreased quality. Compression artifacts are cumulative, so transcoding causes a progressive loss of quality with each successive generation, known as digital generation loss. For this reason, transcoding (in lossy formats) is generally discouraged unless unavoidable.

For image editing, users are advised to capture or save images in a raw or uncompressed format and then edit a copy of that master version, converting to a lossy format only if smaller files are needed for final distribution. As with audio, transcoding from a lossy format to another format of any type will result in a loss of quality.

For video editing and video conversion, images are normally compressed during the recording process itself, because the uncompressed files would be huge and their storage demands too cumbersome for the user. However, the amount of compression used at the recording stage is highly variable. It depends on a number of factors, including the quality of the images being recorded (e.g. analog or digital, standard or high definition) and the type of equipment available to the user, which is often a matter of budget, since the highest-quality digital video equipment and storage space may be expensive. In effect, any transcoding will involve some cumulative image loss. The most practical way to minimize loss of quality is therefore to treat the original recording as the master copy, and to transcode every subsequent version, which will often be in a different format and a smaller file size, only from that master copy.
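A toy Python model shows why transcoding from the master copy beats transcoding a copy of a copy. Here a lossy codec is approximated by rounding sample values to a multiple of a step size; this is an assumption made purely for illustration, real codecs are far more elaborate, and the names quantize and total_error are invented for the sketch.

```python
from typing import List


def quantize(samples: List[int], step: int) -> List[int]:
    """Stand-in for a lossy encode: round each sample to a multiple of `step`."""
    return [((s + step // 2) // step) * step for s in samples]


def total_error(original: List[int], processed: List[int]) -> int:
    """Sum of absolute differences between the master and a derived copy."""
    return sum(abs(a - b) for a, b in zip(original, processed))


master = list(range(20))

# One generation: encode the final format straight from the master.
direct = quantize(master, 5)

# Two generations: encode an intermediate lossy copy, then re-encode that copy.
chained = quantize(quantize(master, 3), 5)

direct_error = total_error(master, direct)
chained_error = total_error(master, chained)
```

With these sample values direct_error is 24 and chained_error is 27: both outputs use the same final step size, but the copy made through an intermediate lossy generation ends up further from the master.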