Synchronization Basics

By Robert J. Withoff

Copyright © 1999 Digital Audio Labs, Inc.  Used with permission


NOTE: This article originally appeared in programmer documentation for the Digital Audio Labs V8® series Timing Gear® synchronizer. I have tried to modify it to be a more general tutorial for timecode and synchronization, but as always I might have missed something. Neither I nor DAL can be held responsible for any problems that arise from the use of the information in this article.

Synchronization

What is synchronization?

[To happen or cause to happen, move or operate in exact time with (something else or each other). Concurrence of events with respect to time.]

Synchronization is the act of causing two or more separate events to happen together, or more precisely, causing two or more periodic events to occur in a specific relation to each other. For example, picture two gears. If the two gears are not meshed together, they are allowed to spin freely independent of each other. If those same two gears are meshed, they are no longer independent, and are synchronized with each other.

If we take a look at synchronization within the context of the audio-for-visual world, we are almost entirely talking about synchronizing audio to visual data, be it videotape, projected film, or streaming multimedia files on the internet. It is probably the single most frustrating issue in post-production, and the most overlooked.

Locking multiple machines together

When we are looking at the world of audio, video, and film production, synchronization between multiple machines becomes the primary barrier to success. Think of a multi-track recording studio that has two 16-track analog reel-to-reel machines. If there were no way to synchronize these machines, it would be almost impossible to use them together in any meaningful way. Or consider something as simple as going to see a movie: if there were no way to lock the sound to the picture, it would be a very different experience indeed.

Master/slave

The most common configuration in a synchronization scenario is a master/slave configuration, where one unit is the master that provides the clock signal that all of the other devices follow. Any other devices are considered to be slaves, and must follow the clock of the master.

A slave device can provide a clock that other devices can follow, as long as the clock is locked to the master input. For example, let’s look at a fairly common higher-end video production studio setup. This particular room has two VTRs, a DA-88, and four nameless digital audio devices that have a word clock input. In addition, the entire facility has a master video sync generator in a separate room, which provides a master video sync source (house sync) to all of the rooms in the facility.

Note that since the audio devices don’t accept the video clock directly, it must go through a video-to-word clock converter, or genlock. This device generates a word clock that is very tightly synchronized with the master video clock.

The second part of the synchronization paradigm is positional information. Even though everything is running at the same speed, we need to make sure that it is in the same place as well. For instance, picture an alarm clock: if the hour, minute, and second hands all operated properly, but the alarm you set for 7:00 kept going off at 3:25, you wouldn’t like it very much. In this case the positional information of the alarm would not be in sync.

The positional information in our production example is provided by timecode. Timecode is provided in much the same way as the clock signal; there is one master, and one or more slave devices that follow the master like obedient puppies. The timecode from the master informs all of the slaves where the master is; the slaves in turn look at their own locations, and speed up or slow down (or take some other action) in order to match themselves to the master.

Note that it is not critical that the timecode master and the clock master are the same source, but in order for the operation to be smooth, they should be closely related in some fashion. It is best if they are generated from the same master clock—in this case, VTR 1 will likely be generating its timecode referenced to the house sync source.

Note: there is a second type of topology where two or more devices are connected together in a ring, and each device slaves to its predecessor in the ring, with no particular device being the master. However, this configuration is of limited use in the A/V production industry, and will not be discussed further. Suffice it to say that any normal configuration you will run into will be a master/slave.

Multiple sources

It is certainly possible to run into a situation where there are multiple devices in a system that are all generating timecode independently, and you will need to keep track of some or all of them at the same time. In this case, it is important to remember that there can be only one timecode master. If you have timecode slaves in the system, they can follow only one device. You can certainly keep track of and display other timecodes, but you cannot locate to two different points simultaneously.

It is possible (and quite likely) that the device generating the timecode and the device generating the system clock may be operating independently of each other. When this happens, the timecode and the system clock will drift away from each other, and eventually, there will be a noticeable error. For example, a system with a sample clock running at 44.1 kHz and a timecode framerate of 30 fps will have a very clean ratio of 1470 samples per frame. If the same clock generated them both, there would be a new frame every 1470 samples on the nose.

However, let us assume that the sample clock is generated by a clock with a +10 PPM (parts per million) error with relation to the timecode generator. At this rate, the error grows to one full frame (1470 samples) every 147 million samples; at 44.1 kHz, that is one frame roughly every 55.6 minutes, or slightly more than one frame every hour. That may not sound like much, but over a long time, it adds up.
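
The drift arithmetic can be checked with a few lines of Python. This is only a sketch: the function name and its parameters are made up for illustration, and it assumes the clock error stays constant.

```python
def seconds_until_one_frame_of_drift(sample_rate_hz, format_fps, error_ppm):
    """Return the real time (in seconds) it takes for a constant clock
    error to accumulate one full frame of drift."""
    samples_per_frame = sample_rate_hz / format_fps      # 44100 / 30 = 1470
    # An error of N ppm means N samples gained (or lost) per million samples.
    samples_needed = samples_per_frame * 1_000_000 / abs(error_ppm)
    return samples_needed / sample_rate_hz


seconds = seconds_until_one_frame_of_drift(44_100, 30, 10)
print(f"one frame of drift every {seconds / 60:.1f} minutes")
print(f"{3600 / seconds:.2f} frames of drift per hour")
```

Plugging in other sample rates or error figures shows how quickly a free-running clock wanders away from the timecode.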

Timecode

Definition

Background

Back in the early days of the space program, the fine folk at NASA were experimenting with using film and video cameras during their space flights, in order to document some of the events that were happening. One of the engineers came up with the idea of putting a unique label on each frame based on the elapsed time of the particular experiment, so when they were playing back the films and tapes to analyze the mission, they would have a record of the sequence of events.

Eventually, this idea caught on with some motion picture and television producers who wanted to use it for their work, and after several different systems were tried, the organization known as SMPTE standardized it into what is now known as SMPTE timecode.

Unique label

The primary purpose of SMPTE timecode is to attach a unique label to each and every frame of a motion picture film or videotape. This allows an editor to go back and locate a specific frame of the show by locating that specific code.

It is important to recognize that this same feat could be accomplished by simply numbering the frames of a film consecutively, or by providing them with labels like "A, B, C, D…"; however, this can really lead to very large numbers with long projects ("Hey Bill, you wanna find frame number 37,658,922 for me?").

This leads us to the secondary purpose of SMPTE timecode, which is to provide an approximation of the elapsed real time of the project or segment.

SMPTE timecode is defined as a block of information that consists of hours, minutes, seconds, and frames, usually written as HH:MM:SS:FF. Because of the historical roots, the timecode is limited to representing no more than 24 hours, so the ranges go from 00:00:00:00 to 23:59:59:nn, where nn is one less than the format of the timecode, measured in frames per second. For example, most motion picture film is shot at a 24 frames-per-second (fps) rate, so in this case, the timecode format would be 24fps and the last valid timecode would be 23:59:59:23. The sequence would then go:

Frame 0 00:00:00:00

Frame 1 00:00:00:01

Frame 2 00:00:00:02

Frame 3 00:00:00:03

Frame 4 00:00:00:04

. .

. .

. .

Frame 22 00:00:00:22

Frame 23 00:00:00:23

Frame 24 00:00:01:00

Frame 25 00:00:01:01

Frame 26 00:00:01:02

That does not mean that timecode represents time of day. Rather, timecode represents elapsed time, kind of like a stopwatch. It can certainly be forced to represent time of day, and in certain circumstances that is even desirable (like when you are shooting multiple cameras at an historic event and want them all to match). In most instances, however, the timecode will only represent elapsed time based on the start of the particular reel.

To make things even more confusing, many shooters will use the hours field as a reel number; thus timecodes starting at 01:00:00:00 would be on reel 1, timecodes starting at 02:00:00:00 would be on reel 2 and so on. So it really doesn’t matter what the initial start time is, what matters is that each frame has a unique number, and that the approximate elapsed time can be obtained by the relative difference between any timecode values.
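
The relationship between a running frame count and its HH:MM:SS:FF label can be sketched in Python. The function names here are invented for illustration, and the sketch covers only non-drop formats such as 24fps.

```python
def frames_to_timecode(frame_number, fps=24):
    """Label a frame count as HH:MM:SS:FF (non-drop), wrapping at 24 hours."""
    frames_per_day = 24 * 60 * 60 * fps
    frame_number %= frames_per_day
    seconds, ff = divmod(frame_number, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def timecode_to_frames(timecode, fps=24):
    """Turn an HH:MM:SS:FF label (non-drop) back into a frame count."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


# Elapsed frames between two labels on "reel 2": the start hour cancels out.
elapsed = timecode_to_frames("02:00:10:00") - timecode_to_frames("02:00:00:00")
print(frames_to_timecode(26), elapsed)
```

Because only the difference between two labels matters, the same subtraction works whether the reel starts at 00:00:00:00 or 02:00:00:00.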

SMPTE time vs. real time

The second biggest mistake that people make when using timecode is thinking that the time in the timecode is the same thing as real time. It’s not. Boy, is it not. It’s easy to make the mistake, because under normal operation, a timecode display showing the SMPTE timecode will appear to be advancing one second for every second of real time. Don’t be fooled! The appearance that this is happening is just coincidence. Let me explain why:

Imagine if you will a film projector running a piece of film. That film has timecode burned on it in a 24fps format, so each frame has a unique code. The normal speed of the projector is 24 fps, so at normal speed, there is a match for time. But let’s assume that the projector is being used for a special effect, and it’s being run at half speed, or 12 fps. The timecode that is burned into the film is still in a 24-fps format, but since the film is running through the projector at 12 fps, it takes two seconds for each 24 frames to pass. So it takes two seconds of real time for the projector to send one second of SMPTE time.

Confusing? You bet. So try thinking of the timecode as a label, not as a time. So now with this example, it takes two seconds for the projector to send 24 unique location labels, because the projector is only sending 12 labels per second. Now let’s say that 24 of those unique location labels have some meaning—with the film, it makes some sense that those 24 labels would take up some certain length of film, so just for the sake of argument, let’s say it represents one foot. So with the projector running at 12 frames/sec, it takes two seconds for one foot of film to go by.

Now if you pick up that foot of film, you will see 24 individual image frames. Let’s imagine further that each one of those frames has a timecode label visibly stamped into each frame, so you can read the timecode just by looking at the frame. In that one foot of film, you can see that the timecode on one end is one second further advanced than the other end, even when the projector isn’t running and you’re just looking at the film by hand.

So as it turns out, the only case where one "second" of SMPTE time equals one second of real time is when the device that’s generating the SMPTE timecode is running at exactly normal speed. This is an important distinction that will come into play very soon.

Consider the dead horse beat.

 

Components

Every kind of timecode in use today has three basic components to it. They are the clock, the data and the frame sync. This is true of even more esoteric forms of timecode used in GPS systems, and can be broadened into areas dealing with digital data transmission in general, but for the purposes of this article, we’re limiting the focus to SMPTE timecode.

In mathematical terms, timecode is considered a vector, which is basically a measurement that has more than one element to it. For instance, if you are driving your car at 60 MPH, you have your speed; if you are going north, you have your direction. If you’re trying to figure out how long it will take you to get somewhere, you need to know both your speed and direction; that combined information would be considered a vector (students of physics will recognize that one).

Clock

The clock in a timecode signal is formally defined as the data clock; it is what defines the time when the timecode data is valid. It can be implicit, as in the case of VITC, where the data clock must be derived from the other sync signals present in the video transmission, or explicit as in the case of LTC.

The data clock should have some correlation in timing to the frame sync, and there must always be at least one data clock for every frame sync. In the case of LTC, the data is transmitted as a stream of bits during the entire duration of the frame, so there is a very tight correlation between the data clock and the frame sync (there are 160 data clocks per frame in LTC). In the case of MTC, the data is sent at quarter-frame intervals, but the timing correlation is a bit looser.

In the case of an as-yet-undefined parallel mechanism where the timecode would be parallel-loaded each frame, the data clock and the frame sync would have a direct 1:1 correlation.

In cases of tight correlation between the data clock and the frame sync (as in the case of LTC), it is possible to extract the data clock to provide speed information.

Data

The timecode data is a pretty critical part of the timecode. While a frame sync and data clock signal alone would be enough to provide speed information, the timecode data provides the positional information. This is basically the positional label information from the above section.

Frame sync

The frame sync element is the heartbeat of the timecode vector. It essentially marks the sample interval of the timecode, and marks the boundary for which that timecode data is valid. It is accepted convention that the timecode data that immediately follows a frame sync signal is the valid data for that frame.

Each frame that the timecode data represents should have an exact correlation to the physical frame that it represents; for example, the timecode associated with a video signal should have its frame sync signal match exactly with the video vertical sync.

Data format

The SMPTE timecode specification delineates a very specific data format for timecode. This is good in that it has been well known and well defined for centuries (okay, not centuries, but at least decades), and there are many different devices from many different manufacturers that know and understand how to work with SMPTE timecode. There are also a few out there that didn’t get it quite right.

SMPTE Block

The basic SMPTE data block consists of 64 bits of information, commonly divided into 8 bytes, or 16 nybbles:

*In VITC, this is the FIELD MARK flag.

Timecode bits

The actual timecode data is carried in the timecode bits (low nybble of each SMPTE block byte) in BCD format. The frame units are transmitted first, as this data is the most dynamic, and usually the most critical.
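
Here is a minimal Python sketch of pulling the BCD digits out of a data block. Several things in it are assumptions rather than anything stated here: that the block arrives as 8 bytes in transmission order, that each byte's low nybble holds one digit, and that the tens digits are narrower than four bits because they share their nybble with flag bits (the mask widths follow the commonly published LTC bit layout). The function name is hypothetical.

```python
# Digit order follows the text: frame units first, then frame tens,
# second units, second tens, and so on up to hour tens.
# ASSUMPTION: tens digits share their nybble with flag bits, so they are
# masked to 2 bits (frame tens, hour tens) or 3 bits (second/minute tens).
DIGIT_MASKS = (0x0F, 0x03, 0x0F, 0x07, 0x0F, 0x07, 0x0F, 0x03)


def decode_timecode(block):
    """Decode (hours, minutes, seconds, frames) from an 8-byte data block."""
    if len(block) != 8:
        raise ValueError("a SMPTE data block is 8 bytes (64 bits)")
    # Keep the low nybble of each byte, then strip any flag bits from it.
    digits = [(byte & 0x0F) & mask for byte, mask in zip(block, DIGIT_MASKS)]
    frames = digits[1] * 10 + digits[0]
    seconds = digits[3] * 10 + digits[2]
    minutes = digits[5] * 10 + digits[4]
    hours = digits[7] * 10 + digits[6]
    return hours, minutes, seconds, frames


# 01:23:45:12 with some user data in the high nybble of the first byte and
# a flag bit set next to the frame tens digit in the second byte.
example = bytes([0xA2, 0x05, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00])
print(decode_timecode(example))
```

A real reader would also pick off the flag bits and the high nybbles; this sketch ignores them to keep the BCD step visible.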

User Data

The user data is carried in the binary group section of the SMPTE block. User data is available to be written by the generator of the timecode, and does not have a specific rigid definition. However, some conventions have been established for writing data to the binary groups:

If the user wants to write byte-wide data, each byte is split into pairs as [8,7] [6,5] [4,3] [2,1] with the [8,7] pair being the MSB. If the user data gets displayed in a timecode window, this is the order in which it is displayed.

The user data bytes can be interpreted as ASCII data; [BG Flag 55 = 1] and [BG Flag 75 = 0] usually indicate this.

There are other conventions as well, but they are outside the scope of this article. Most often, the user data can be ignored during run-time, as the data rarely changes. It is nice to be able to reference it within an application or timecode reading device, as it can contain useful information such as the date of production, film reel identification codes, or the names of ‘droids from a popular science fiction movie (C3P0, R2D2). (Why, I knew him when he was only C2P9…)
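
The byte-wide convention above can be sketched in Python. The function name is hypothetical, and the sketch assumes that within each pair the higher-numbered binary group is the high nybble of the byte.

```python
def user_bytes(binary_groups):
    """Combine eight 4-bit binary groups into four bytes.

    binary_groups[0] is binary group 1 and binary_groups[7] is binary
    group 8. The result is in display order: the [8,7] pair first and
    the [2,1] pair last.
    """
    if len(binary_groups) != 8:
        raise ValueError("expected eight binary groups")
    pairs = ((8, 7), (6, 5), (4, 3), (2, 1))
    return bytes(
        ((binary_groups[hi - 1] & 0x0F) << 4) | (binary_groups[lo - 1] & 0x0F)
        for hi, lo in pairs
    )


# Binary groups 1..8 chosen so that the four bytes read as ASCII text.
groups = [0x2, 0x3, 0x4, 0x4, 0x2, 0x3, 0x2, 0x5]
print(user_bytes(groups).decode("ascii"))
```

Whether the result should be treated as ASCII at all depends on the binary group flags, so a real reader would check those before decoding.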

Flags

The flag bits in the timecode block require a little bit of explanation:

Color frame: This has to do with the way the timecode is synchronized with an NTSC video signal. If an even timecode address identifies an ‘A’ frame and an odd timecode address indicates a ‘B’ frame, this bit should be set to 1. If this makes no sense to you, don’t worry about it. Most often this bit is set to 0.

Drop frame: This bit is set to 1 when the timecode is a drop-frame format.

Phase corr: This is the biphase mark parity correction bit, and is designed to correct for phase problems when two LTC timecode words are joined together in a tape splice.

Field mark: With VITC timecode, this flag is intended to indicate whether the data is being sent in the odd (0) or even (1) field of the frame.

BG Flag 55, BG Flag 75: These two mark the binary group flags. Their use can identify the way in which the user data should be interpreted:

Unassigned: This flag is unassigned at the time of this writing, and should always be ‘0’. Should you or any member of your team decide to use this bit, the Secretary will disavow any knowledge of your existence.

Status

Okay, this is not strictly a part of the SMPTE timecode specification, but it deserves discussion.

Every timecode frame that comes in to a device has a status associated with it. That status isn’t transmitted as a part of the timecode data, but is generated by the timecode reader. The end user may never be aware of the status directly, but it’s there to act as a guide for the interpretation of the timecode data. Here are some examples:

These are not set in stone. Every timecode reader will have its own set of states that it keeps track of.

Run/Idle/Stop

Q: When timecode is used with a device that can be started and stopped (like a VTR), what happens to the timecode when the device is stopped?

A: It depends on the device.

It’s pretty clear that when the tape is playing in the VTR, the timecode output is incrementing normally. However, when the tape is paused or put into still-frame mode, the timecode output may do one of two things: either it will stop completely, or it will continue putting out a timecode signal with the positional data staying the same (not incrementing). This second behavior is called idle timecode.

In general, it is preferable for a device to send idle timecode when it is paused or stopped, because the timecode reader can then see that there is a device connected and active, and can determine that the device can send timecode coherently. If the device stops sending timecode completely, it could indicate that the device has become disconnected, or powered off, or was crushed by an elephant.
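
One way a reader might tell these cases apart is to compare each frame with the one before it. The Python below is a sketch only: the status names are invented for illustration (every reader keeps its own set), and it assumes the reader hands over one absolute frame count per frame period, or None when nothing readable arrived.

```python
def classify(previous_frame, current_frame):
    """Classify reader status from two consecutive frame counts.

    Each argument is an absolute frame count, or None when no valid
    timecode was received in that frame period.
    """
    if current_frame is None:
        return "stop"      # no signal: stopped, unplugged, or the elephant
    if previous_frame is None:
        return "unknown"   # first frame after silence, nothing to compare
    if current_frame == previous_frame:
        return "idle"      # same position repeated: idle timecode
    if current_frame == previous_frame + 1:
        return "run"       # normal forward play
    return "jump"          # anything else needs a closer look


readings = [None, 100, 101, 102, 102, 102, None]
print([classify(a, b) for a, b in zip(readings, readings[1:])])
```

A practical reader would want to see several frames in a row agree before changing state, since a single bad frame should not flip the status.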

Use

So how is timecode used in the everyday world? Wow, you don’t ask simple questions, do you? Yikes. Well, here are some real-world examples that might provide some enlightenment.

Film sync sound recording

In shooting a motion picture, it is almost always the case that the audio for the shot is recorded separately from the actual film. This is primarily because there are very few motion-picture cameras that have the capability of recording sound, particularly among the more expensive cameras.

Most professional film shoots will use either a Nagra (brand name of analog reel recorder) with timecode capability, or a timecode DAT recorder (such as the HHb PortaDAT). Very often, this timecode is displayed on a device called a timecode slate, which you’ve probably seen in a lot of behind-the-scenes TV specials on the making of movies. It’s usually a regular film clapper slate with an LED timecode display on it.

The timecode display on the slate is synchronized with the master timecode generator, which is usually on the audio recorder. The idea is that the timecode getting recorded on the audio recorder is visible on the slate, so when it’s shot with the film or video camera, there is a visual reference point that can be used to match the audio back to the picture in editing. The editor looks for the frame where the little clapper comes together and notes the timecode on the display; then he can locate the audio to that timecode and lo and behold, there’s the sound of the little clapper coming together. This works. I’ve done it. It’s cool.

There are a couple of newer motion picture cameras (notably Aaton) that have a timecode output that can be used to feed an audio recorder that is equipped with the interface. With the Aaton, the timecode is actually ‘burned’ onto the film in the camera, and some editing equipment can read the timecode. It’s sweet.

Frankly, you don’t need timecode to shoot a film; a whole lot of lower budget films are done all the time without timecode. However, it takes a lot longer to put together when you have to locate each take by hand. About three times as long.

Video editing

In most ways, video editing is a lot faster and easier than film editing. For one, modern video editing is done electronically; for another, almost every video camera records sound in sync with picture, so there is no need to synchronize externally. So, synchronization issues aside, why does videotape editing benefit from timecode?

In a word, precision. Remember that each frame of video has its own unique timecode address—this makes it possible for editing systems to exactly locate down to the single frame any edit points, and it makes those edit points reproducible again and again, even on completely different editing systems.

However, something to bear in mind is the advent of DVD and digital television. With both of these, the standard for sound is six discrete channels, but no video deck currently available has the capability of more than four. That means that somewhere along the way, there will have to be a synchronizing link between the video deck and an audio workstation, and that synchronizing link will more than likely be SMPTE timecode.

Multitrack recording w/ MIDI

Although modern synthesizer and sampler technology has made it less necessary, as recently as a couple of years ago it was fairly common to see a MIDI studio with a multitrack recorder. One track of the multitrack would get striped with timecode for at least the length of the song, then that track would be fed into a SMPTE input of a MIDI sequencer (perhaps the most common would be a PC with an MQX-32 card). This would allow the composer to set up his synth to several different instruments and record each to a different track on the multitrack.

The net effect would be that of having a whole bunch of synths playing simultaneously on the final multitrack.

Audio post-production

This section is probably more closely related to film and video editing, but I think it stands on its own as a production category.

When a film or TV show is being produced, the first step is usually to get the visual rough edit done, then start doing sound. What this usually means is that a videotape of the rough edit is handed to the sound editor, who takes it into his studio and puts it in his machine. This videotape has the timecode of the finished show burned into it, and the timecode output of the VTR feeds the sound editor’s workstation. The workstation then ‘chases’ the VTR, and the sound editor adds sound elements to the show locked to the video.

Quite often, the various elements such as dialogue replacement, sound effects, Foley, and music are added separately, sometimes by different individuals or departments. On a large budget feature, the usual procedure is that each department brings a mix stem back to the main mixdown. The mix stem usually consists of the source material needed, plus automation information. Each stem is loaded into the main mixer, which is locked to the final visual edit via timecode, and final tweaks are done to bring the final sound edit in line with the picture. Sometimes there are several hundred tracks going simultaneously.

It's a real hoot to watch a fully automated board with 256 faders work over the length of a feature film. It brings tears to my eyes.

Scoring

Scoring has a lot in common with audio post, but is completely about music. It also works a bit differently.

The final edit with sound effects is sent to the composer with timecode burned into the videotape. Historically, this has been on 3/4" videotape, although more recently other formats have become more popular.

The composer/music supervisor uses the timecode to spot (locate) music cues, which is a very subjective exercise (try it sometime; play a movie with the sound off and see where you would place music cues).

Once the cues are spotted, the composer composes.

The music is then matched back to the video in the spotted locations.

Keeping track of project duration

This one’s pretty simple. Subtract the project’s beginning timecode from its end timecode, and you have the Total Running Time (TRT) of the project. This time is measured from the first frame of the program to the last frame, but does not count header information (like countdowns and slates).

TRT is almost always stated in actual running time. This presents a problem when using 29.97 non-drop frame timecode, because the timecode will be off by 108 frames every hour; that is about 3.6 seconds, or 25.2 seconds in dog years. The lesson is that if your final project has a framerate of 29.97, you should always use drop-frame timecode to mark it.
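
The 108-frame figure can be reproduced with a short Python sketch. The function names are made up for illustration, the labels are treated as non-drop, and the framerate is taken as exactly 29.97 to match the number used above.

```python
from fractions import Fraction


def to_frames(timecode, fmt=30):
    """Count frames in a non-drop HH:MM:SS:FF label."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fmt + ff


def running_time(start, end, fmt=30, framerate=Fraction(30)):
    """Return (labelled seconds, real seconds) between two timecodes."""
    frames = to_frames(end, fmt) - to_frames(start, fmt)
    return Fraction(frames, fmt), Fraction(frames) / framerate


rate = Fraction(2997, 100)   # 29.97 fps, as quoted in the text
labelled, real = running_time("01:00:00:00", "02:00:00:00", framerate=rate)
error_seconds = real - labelled
print(float(error_seconds))          # how far the labels lag real time
print(float(error_seconds * rate))   # the same lag expressed in frames
```

Using Fraction keeps the arithmetic exact, so the hourly error comes out as a whole number of frames rather than a rounded float.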

General considerations

Most video and film productions have a countdown leader associated with them, and video productions for broadcast and duplication will often have a color-bar and slate leader. The color bars are used to align the video signal for proper duplication, and the slate provides production information. However, when timecode is attached to these productions, the "zero point" of the timecode is almost always associated with the first frame of the show (called program start), so the bars, slate, and countdown are all before the "zero point" frame.

Because the behavior of timecode wrapping at the 24:00:00:00 point has not always been handled well, most facilities use 01:00:00:00 as the "zero point". This practice is almost universal in the professional video marketplace (though there are some variations in the start point; 10:00:00:00 and 02:00:00:00 are common), and with the advent of electronic editing, is rapidly becoming the de facto standard in the film industry.

Common timecode problems

Dropout

Dropout is defined as a loss of recognizable timecode data. This could mean anything from a loss of signal (hey! Somebody tripped over that cable!) to noise or distortion overwhelming the data. In any case, the single characteristic important to dropout is that once you had the data, and now you don’t have the data anymore.

Dropout is a fact of life. Timecode is very often recorded on tape, and it’s real easy for tapes to deteriorate with age or physical damage.

Sometimes, dropout is intentional. For instance, it is quite normal for a video production or post-production house to create a VHS dub of a project, but VHS has no dedicated timecode track. While you could stripe timecode to one of the audio tracks, this can get ugly if the person watching the tape forgets to connect the audio track to a timecode reader.

What could be done is to run several seconds of timecode on the tape, and drop it out a couple seconds before the show starts. This will allow the timecode reader to jamsync with that early timecode on the tape, and continue generating timecode based on the video signal.
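
The jamsync idea can be sketched as a small Python generator. Everything here is illustrative: the function name is made up, each loop pass stands for one video frame, and the input is assumed to be an absolute frame count per frame, or None where the timecode has dropped out.

```python
def jam_sync(received):
    """Yield one frame count per video frame, freewheeling through dropout.

    `received` holds the frame count read in each frame period, or None
    when no timecode could be read.
    """
    current = None
    for reading in received:
        if reading is not None:
            current = reading    # lock to the incoming timecode
        elif current is not None:
            current += 1         # dropout: keep counting with the video
        yield current            # stays None until the first good frame


print(list(jam_sync([None, 100, 101, None, None, None])))
```

This sketch re-locks the moment timecode comes back; a real reader might instead compare the returning timecode with its own count before trusting it.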

Discontinuities

A discontinuity in timecode is when there is a "jump" of more than one frame in a timecode sequence. A normal timecode sequence increments only one frame for each new frame of data (with the exception of drop-frame, which has well-known jumps).

You can think of it as a break in the expected sequence of numbers. For example, take the series [1, 2, 3, 4, 5, 27, 28, 29, 30]. The obvious break in sequence between 5 and 27 would be a form of discontinuity in the sequence.

There are two types of discontinuities: bound and unbound.

A bound discontinuity is characterized by a fairly short disturbance in a normal sequence, where the sequence other than the discontinuous area remains undisturbed. For example:

00:01:12:00

00:01:12:01

00:01:12:02

00:01:12:03

11:01:12:04 ← discontinuity

11:01:12:05

11:01:12:06 ← end of discontinuity

00:01:12:07

00:01:12:08

00:01:12:09

In this sequence, if it weren’t for the discontinuities, the sequence would be normal. This type of discontinuity is fairly easy to recover from, as it is usually a matter of correcting a small number of frames worth of data. It usually indicates an error in the timecode rather than a break in the source material.

An unbound discontinuity is marked by a single boundary:

00:01:12:00

00:01:12:01

00:01:12:02

00:01:12:03

11:00:22:09 ← discontinuity

11:00:22:10

11:00:22:11

11:00:22:12

11:00:22:13

11:00:22:14

This type of discontinuity usually marks a break in the source material, and may indicate an unrecoverable error. If the break is small (like one or two frames), it may be possible to slide the audio back to match the timecode; if the break is large, all bets are off.

Any unbound discontinuity makes things difficult when you are trying to synchronize a slave device to timecode. However, unbound discontinuities are sometimes used to advantage to mark scene or reel changes. For instance, a standard motion-picture film reel holds approximately ten minutes worth of film; when that film is transferred to videotape, you can fit five reels easily on a one-hour tape. If each reel is assigned a new starting timecode, then you will be assured that the start of each reel will be marked by a discontinuity.

Some video camera/recorder units have small discontinuities whenever the tape is stopped between shots. This can be used to effectively mark shot locations on the tape.

As long as the discontinuity is not within a segment that you are trying to synchronize, it should be fine.
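
Telling bound from unbound discontinuities can be sketched in Python. This is one possible approach, not a standard algorithm: the function name and the max_bound_length threshold are invented for illustration, and the input is assumed to be a list of absolute frame counts (so any drop-frame jumps have already been accounted for when converting labels to counts).

```python
def find_discontinuities(frames, max_bound_length=10):
    """Return (index, kind) for each break in a list of frame counts.

    A break is "bound" if the sequence gets back to where the original
    run would have been within max_bound_length frames, else "unbound".
    """
    results = []
    i = 1
    while i < len(frames):
        if frames[i] == frames[i - 1] + 1:
            i += 1                               # normal one-frame step
            continue
        start = i                                # first frame out of sequence
        base = frames[start - 1] - (start - 1)   # offset of the original run
        kind = "unbound"
        limit = min(start + 1 + max_bound_length, len(frames))
        for j in range(start + 1, limit):
            if frames[j] - j == base:            # back on the original run
                kind = "bound"
                i = j + 1
                break
        else:
            i = start + 1                        # carry on from the new run
        results.append((start, kind))
    return results


print(find_discontinuities([1, 2, 3, 4, 5, 27, 28, 29, 30]))
print(find_discontinuities([1, 2, 3, 900, 901, 6, 7]))
```

The first call uses the series from the example above and finds a single unbound break; the second finds a short bound glitch that returns to the original sequence.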

Bad data

Bad data can find its way into any timecode tape. Usually, it comes in masquerading as a good frame, but it might travel beyond the boundaries, like trying to tell you that it’s located at 36:00:00:00, when we know that any valid timecode must be less than 24 hours.

Luckily, these bad guys rarely travel in packs, they usually come alone, and if detected they can be easily corrected or replaced.

If there is a section of timecode on a tape that is consistently bad:

30:01:12:00

30:01:12:01

30:01:12:02

30:01:12:03

30:01:12:04

30:01:12:05

30:01:12:06

30:01:12:07

30:01:12:08

30:01:12:09

then that section of timecode will likely be unusable, and may need to be re-striped. More importantly, find out what caused the bad timecode to begin with; some older timecode equipment does not properly wrap at 24:00:00:00, and may need to be reset.
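
A range check like the one implied above is easy to sketch in Python. The function name is hypothetical, and the check looks only at field ranges; it says nothing about whether a frame fits the sequence around it.

```python
def is_valid_timecode(hours, minutes, seconds, frames, fmt=30):
    """Return True if every field is within range for the given format."""
    return (
        0 <= hours < 24
        and 0 <= minutes < 60
        and 0 <= seconds < 60
        and 0 <= frames < fmt
    )


print(is_valid_timecode(36, 0, 0, 0))     # out of range: past 24 hours
print(is_valid_timecode(23, 59, 59, 29))  # last frame of the day at 30FPS
```

A frame that passes this check can still be wrong, so a reader would normally combine it with a sequence check against the neighboring frames.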

Stability

Stability of timecode takes two different forms. First, there is the stability of the timecode signal—how much and how quickly the signal changes in speed and volume.

In general, any system that deals with timecode should be able to track relatively slow changes in speed. However, if the speed changes are faster than the timecode reader can keep up with, you will get timecode errors.

The second type of stability is a bit more specialized, and has to do with changes between idle and running timecode.

When a mechanical device such as a VTR or projector is generating the timecode, there is a certain amount of inertia that needs to be overcome when starting and stopping the machine. This translates to a certain amount of time between the initial start and the time when the device is running at its stable speed.

The timecode during this unstable region is usually quite unpredictable, and depends upon the implementation of the timecode writer. It may drop out completely, it may rapidly oscillate back and forth between idle and running timecode, it may produce erroneous data, or it may change the framerate of the timecode. At any rate, it is likely to be unusable for synchronization during the unstable range.

Framerate vs. Format

The distinction between framerate and format can get confusing, because they are both measured in frames per second. The difference is that the framerate measures the actual speed of the timecode in real time, while the format indicates how many frames make up one timecode second.

As an example, let’s say that timecode is recorded onto a tape with a format of 30FPS and a framerate of 30fps. If we take this tape and play it back at half speed, the format will still be 30FPS, but the framerate is now 15fps.

The format is always specified as an integer.

To add to the confusion, most of the time, the format and framerate are lumped together: for instance, 29.97 drop-frame timecode has a format of 30FPS drop-frame, and a framerate of 29.97fps.

Drop frame

Welcome to the Dark Side.

A long time ago, in a galaxy far, far away, television was invented. Originally, television was a mechanical system, with rotating wheels that actually produced an image on a screen. In order to get these wheels to run at the proper speed, they were synchronized to the line frequency of the electric current coming into the house (60 cycles in the US, 50 cycles in Europe). As television advanced into picture tubes, those numbers stayed in place, with the U.S. displaying 60 fields per second, and Europe displaying 50 (two fields make up one frame, so we are talking about 30 frames/sec and 25 frames/sec respectively).

Then came color.

The Europeans figured out how to send color information with their system without changing the way it worked, so they were able to stay with a framerate of 25fps. But the NTSC (National Television System Committee) had to come up with a way of squeezing the extra color information into an already full space while keeping it compatible with the black & white televisions already in place.

So they compromised. In a feat of engineering that was really quite amazing, they figured out a way to fit half again as much information into the broadcast signal and keep it compatible with black & white. However, in order to do this, they had to slow the TV signal down slightly—by one-tenth of one percent (to 29.97 frames per second). Everyone was happy.

Until someone recognized that with the NTSC system, the timecode was off-kilter slightly—108 frames per hour to be exact.

So there was much gnashing of teeth and baying of hounds, and darkness fell across the land, until one day someone came up with the idea of periodically skipping a couple of frames in the frame count.

Here’s how it works: for every minute that doesn’t end in a zero, when the seconds and frames counters both hit 00, we skip ahead two frames. But if the minutes are evenly divisible by 10, then we don’t skip. For example:

01:08:59:26

01:08:59:27

01:08:59:28

01:08:59:29

01:09:00:02 ← Look! Two frames dropped!

01:09:00:03

01:09:00:04

01:09:00:05

But…

01:09:59:26

01:09:59:27

01:09:59:28

01:09:59:29

01:10:00:00 ← Look! Two frames not dropped!

01:10:00:01

01:10:00:02

01:10:00:03

So, for every hour, the timecode skips (120 - 12), or 108, frames.
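If you would rather count than take my word for it, the skipping rule can be sketched in a few lines of Python. This assumes 30FPS drop-frame timecode held as four integers; the function names are invented for this example.

```python
def next_drop_frame(hh, mm, ss, ff):
    """Advance a 30FPS drop-frame timecode by one frame."""
    ff += 1
    if ff == 30:
        ff, ss = 0, ss + 1
    if ss == 60:
        ss, mm = 0, mm + 1
        if mm % 10 != 0:
            ff = 2            # skip frame numbers 00 and 01 for this minute
    if mm == 60:
        mm, hh = 0, (hh + 1) % 24
    return hh, mm, ss, ff

def labels_in_hour(hour):
    """Count the timecode numbers actually used during one hour."""
    tc, count = (hour, 0, 0, 0), 0
    while True:
        tc = next_drop_frame(*tc)
        count += 1
        if tc[1:] == (0, 0, 0):      # reached the top of the next hour
            return count

print(next_drop_frame(1, 8, 59, 29))       # (1, 9, 0, 2)
print(next_drop_frame(1, 9, 59, 29))       # (1, 10, 0, 0)
print(30 * 60 * 60 - labels_in_hour(1))    # 108
```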

Keep in mind that the actual pictures in the video transmission aren’t dropped, just the timecode numbers. And yes, this provides a built in discontinuity almost every minute. But hey, at least the timecode numbers can be used to get the exact duration of a program, which made the networks happy. For some reason, they don’t like it when you pick them up a couple of seconds late. They seem to think it costs them money.

Timecode types

LTC

LTC stands for Longitudinal Time Code. Some folks call it Linear time code. You can laugh at them. You know the truth.

LTC was originally designed to fit on an audio-bandwidth cue channel of an old Ampex 2-inch quadruplex VTR. The bandwidth of the cue channel allowed for the recording of a 2400-bit/second digital signal, and at 30 frames per second, that comes out to 80 bits per frame.

Of the 80 bits, the first 64 are actually used to store data, while the remaining 16 are used for a unique synchronization word that identifies the end of the LTC data frame.

LTC uses a modulation technique known as bi-phase mark code, which encodes data onto a clock signal. A digital "one" is marked by a transition within the clock pulse, a "zero" by no transition.
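To make the bi-phase mark idea concrete, here is a small Python sketch. It represents each bit cell as two half-cell signal levels (0 or 1); that representation, the starting level, and the function names are assumptions for illustration, not part of any LTC hardware interface.

```python
def biphase_mark_encode(bits, level=0):
    """Encode bits as two half-cell levels per bit, bi-phase mark style."""
    out = []
    for bit in bits:
        level ^= 1            # every bit cell starts with a transition
        out.append(level)
        if bit:
            level ^= 1        # a one adds a second transition mid-cell
        out.append(level)
    return out

def biphase_mark_decode(halves):
    """A one is any cell whose two halves differ; a zero has equal halves."""
    return [int(a != b) for a, b in zip(halves[0::2], halves[1::2])]

signal = biphase_mark_encode([1, 0, 1, 1, 0])
print(signal)                        # [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
print(biphase_mark_decode(signal))   # [1, 0, 1, 1, 0]
```

Because the decoder only compares the two halves of each cell, inverting the whole signal gives the same data, which is one reason this kind of code suits tape.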

The last 16 bits in the LTC frame are a bit sequence known as the synchronizing word. This serves a dual purpose, marking the end of the frame and providing a direction indication (LTC can be read both backwards and forwards).

VITC

VITC is shorthand for Vertical Interval Time Code. It is written on video lines above the active picture area of the video signal.

A video image is written with 525 lines (625 in PAL land), divided up into two fields. The two fields are interlaced, with the odd lines being written in the first field, and the even lines being written in the second field. The two fields together make up one frame. Each field consists of 262.5 lines (312.5 PAL). The half-line comes in during the interlace scheme.

 

 

With NTSC video, there are 29.97 frames per second (59.94 fields per second). All the odd lines are drawn during the first field, and the even lines are drawn in between the odd lines in the second field. All the lines together make up the frame.

The VITC data is written on the video signal in some of the upper lines, with a ‘1’ being represented by a higher voltage (brighter spot), and a ‘0’ by a lower voltage (black spot). This area is generally not visible on a video monitor, as it is usually just off the top of the picture tube. Those first 20 lines are sometimes called the vertical interval (though that’s not technically accurate), thus vertical interval timecode.

VITC is almost always written in line pairs, with one line separating the two. For instance most Panasonic VTRs default to lines 16 & 18; Sony machines default to 12 & 14. The same data is written on both lines for the sake of redundancy, so that error correction can take place when one of the lines gets a little munched by a hungry VCR.

The manner in which the VITC data is written on the video signal gives 90 bits across one horizontal line. Since only 64 bits are used for actual data, the remaining bits are used for internal sync bits and a CRC check. If each line is written with the same data, the first line can be read in and verified, and if the CRC fails, the second line is available for backup.

Using both

VITC and LTC each have their advantages and disadvantages. VITC requires a stable video signal which may only be available if the VTR is operating at 1x play or pause speeds, while LTC can be read over a wide range of higher and lower speeds, but cannot be read at still-frame.

Some devices allow for automatic switching between modes; for instance, most professional VTRs will automatically switch internally between LTC and VITC depending on which has the most stable timecode available.

There is a caveat in dealing with LTC and VITC from the same source simultaneously, and that has to do with the way they are written.

When VITC is written on the video signal, the data is available almost immediately at the start of the picture. The VITC data that is written as a part of the particular frame represents that frame, much as a timecode burned onto a film frame would represent that frame.

If it’s done properly, the LTC signal is recorded entirely within the timing constraints of one frame of the video signal and the LTC data represents the frame that it is matched with. In other words, if you were to look at one frame of video data, the VITC for that frame would match the LTC for that frame.

BUT the LTC data takes the entire length of the frame to be read in. So where the VITC data is completely available at the beginning of the frame, the LTC data isn’t available until the end of the frame, and usually doesn’t get updated until the next frame starts. That means that there is almost always a one-frame delay between the two.
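One simple way to line the two up is to add a frame to the last completely decoded LTC value while the tape is running forward. The Python sketch below assumes non-drop timecode held as (hh, mm, ss, ff); the name anticipate_ltc and the direction parameter are hypothetical, and a real reader would also have to decide when the timecode is running at all.

```python
def anticipate_ltc(last_decoded, fmt=30, direction=+1):
    """Estimate the frame now playing from the last fully decoded LTC frame."""
    hh, mm, ss, ff = last_decoded
    frames = ((hh * 60 + mm) * 60 + ss) * fmt + ff + direction
    frames %= 24 * 60 * 60 * fmt          # wrap at 24 hours
    ff = frames % fmt
    ss = (frames // fmt) % 60
    mm = (frames // (fmt * 60)) % 60
    hh = frames // (fmt * 60 * 60)
    return hh, mm, ss, ff

print(anticipate_ltc((1, 0, 0, 29)))      # (1, 0, 1, 0)
```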

Higher-end timecode readers usually have some way of adjusting the LTC by anticipating the frame coming in. This does tend to be difficult when the timecode transitions between running and idle.

MTC/MMC/MIDI clock

MIDI timecode is a codified way of sending timecode positional data over a MIDI port. It has two "flavors" depending on whether the project is running or stopped: the full-frame message is sent when the project is stopped but the location has changed (when the project cursor is dropped on a new location), and the quarter-frame messages are sent while the project is playing.

Quarter-frame messages (QFM) are sent four times per frame on average. Each QFM contains one numeral of the timecode, and since there are eight numerals in a timecode, it takes two complete frames to send the full data for one frame. That means that the timecode data is only updated fully every other frame.

For example:

Note that the data is written "backwards", frames units first. This ensures that the MTC data lines up with a particular frame edge.
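As a sketch of that ordering, the Python below builds the eight quarter-frame messages for one timecode, following the layout in the MIDI specification: status byte F1, then a data byte holding a piece number (0 to 7) and four bits of data, with the rate code sharing the last piece with the top bit of the hours. The function name and the "30df" key are my own labels for this example.

```python
# Rate codes carried in the last quarter-frame piece
RATE_CODES = {24: 0, 25: 1, "30df": 2, 30: 3}

def quarter_frame_messages(hh, mm, ss, ff, rate=30):
    """Return the eight two-byte quarter-frame messages for one timecode."""
    nibbles = [
        ff & 0x0F, ff >> 4,          # frames, low nibble first
        ss & 0x0F, ss >> 4,          # seconds
        mm & 0x0F, mm >> 4,          # minutes
        hh & 0x0F, (RATE_CODES[rate] << 1) | (hh >> 4),   # hours plus rate
    ]
    # status byte 0xF1, then data byte 0nnn dddd (piece number, nibble)
    return [bytes([0xF1, (piece << 4) | nib])
            for piece, nib in enumerate(nibbles)]

print(" ".join(m.hex() for m in quarter_frame_messages(1, 8, 59, 26)))
# f10a f111 f12b f133 f148 f150 f161 f176
```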

Reading MTC properly is the responsibility of the timecode reader, and takes some intelligence. This usually means checking the address every time it is fully decoded, and keeping internal track of whether the timecode is advancing or not. For more details on MTC, please refer to The Complete MIDI 1.0 Detailed Specification, published by the MIDI Manufacturers Association.

A MIDI Clock is different, in that it is a one-byte message sent by the MIDI controller at a rate of 24 per quarter note. The big difference is that MIDI clock is tied to the tempo of the song, where MIDI timecode is a way of sending positional information about a project.
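Because the clock rate follows the tempo, the spacing between clock messages changes whenever the tempo does. A one-line Python sketch, assuming the tempo is given in quarter notes per minute (the function name is made up):

```python
def midi_clock_interval(bpm):
    """Seconds between MIDI clock messages at a given tempo."""
    return 60.0 / (bpm * 24)      # 24 clocks per quarter note

print(midi_clock_interval(120))   # about 0.0208 seconds
```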

MMC refers to MIDI Machine Control, and is a protocol for controlling non-musical devices via MIDI. MMC is used extensively for devices such as transport controls for tape machines, and some automation systems. A close relative, MIDI Show Control (MSC) is designed for controlling live theatrical performance automation such as lighting controllers and special effects. See the MIDI specification for more details.

Timecode math

Basic functions

Basic timecode mathematical functions fall into two categories: functions on one variable and functions on two variables. At this point, let’s define a variable to be a timecode in HH:MM:SS:FF format—multiplying a timecode times a scalar (like ‘2’) would be considered a single-variable function.

The single-variable functions would be primarily conversions: timecode to frames or timecode to samples would be two examples you are likely to run into. As a matter of fact, the timecode ↔ frames conversion is so critical that it should be considered a primitive. Multiplication of a single timecode by a scalar can be useful if you need to know the duration of multiple copies of a project, or how long the project would be if it were half as long as it is now.

The two-variable kind would be functions like addition and subtraction, where there are two separate timecodes that provide a third result. In this case it is critical that both timecode values have the same format, lest you get results that have no meaning. It would be like adding oranges and nectarines—tastes great, but less filling.

Advanced functions

You might be wondering if there is a reason that you might want to multiply or divide two timecode values. I have pondered this myself in the wee hours of the morning, somewhere in that delta-wave not-quite Dream State where you can walk through walls and fly and stuff. And after careful consideration, I have come to the conclusion that there are no advanced mathematical functions for timecode. If you happen to come across some unique meaning in taking the hyperbolic arctangent of a timecode, please let me know, otherwise don’t lose any sleep over it.

Addition & subtraction

Timecode addition and subtraction are relatively easy: if I tell you to add 01:00:00:00 to 02:00:00:00, either you would come up with 03:00:00:00, or you’d get your knuckles rapped. But there are some caveats:

    1. The formats of both of the timecode variables must be the same.
    2. The format of the timecode can affect the result (try adding 01:59:59:10 and 00:00:00:19 in 24 and 30FPS).
    3. If you are using drop-frame, you must always correct for it.
    4. There is no such thing as a negative timecode. If your timecode comes out negative, add 24 hours.

The easiest way to add two timecodes together is to start with frames and work towards hours. Add the frames together; if the result is greater than or equal to the format, roll the excess over into the seconds, and so on down the line. If your total hours come to 24 or more, subtract 24 from the hours.

Examples:
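As a minimal sketch of the frames-first procedure, here is a Python version. It assumes non-drop timecode held as (hh, mm, ss, ff) tuples in the same format; the name add_timecode is invented for this illustration. It also shows how the format changes the answer for the pair of values in caveat 2.

```python
def add_timecode(a, b, fmt):
    """Add two non-drop (hh, mm, ss, ff) timecodes that share one format."""
    hh, mm, ss, ff = (x + y for x, y in zip(a, b))
    ss, ff = ss + ff // fmt, ff % fmt      # roll extra frames into seconds
    mm, ss = mm + ss // 60, ss % 60        # then seconds into minutes
    hh, mm = hh + mm // 60, mm % 60        # then minutes into hours
    return hh % 24, mm, ss, ff             # wrap at 24 hours

print(add_timecode((1, 0, 0, 0), (2, 0, 0, 0), 30))       # (3, 0, 0, 0)
print(add_timecode((1, 59, 59, 10), (0, 0, 0, 19), 24))   # (2, 0, 0, 5)
print(add_timecode((1, 59, 59, 10), (0, 0, 0, 19), 30))   # (1, 59, 59, 29)
```

For drop-frame values the safer route is to convert both timecodes to frame counts, add those, and convert back.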

Timecode ↔ frames conversion

The timecode ↔ frames conversion is an important one to understand, as it is the basis of most timecode conversions. Basically, the timecode → frames conversion looks like this:

    frames = (((HH * 60 * 60) + (MM * 60) + SS) * format) + FF;

    if (drop_frame)
    {   // subtract drop-frame correction term: two frame numbers
        // per minute, except for every tenth minute
        total_minutes = (HH * 60) + MM;
        frames -= 2 * (total_minutes - (total_minutes / 10));
    }

And its inverse:

    total_frames = frames;

    if (drop_frame)
    {   // add drop-frame correction term (30FPS drop-frame only)
        frames_per_10min = (600 * format) - 18;   // 17982 real frames
        frames_per_min   = (60 * format) - 2;     // 1798 real frames

        blocks = frames / frames_per_10min;       // whole ten-minute blocks
        rest   = frames % frames_per_10min;       // frames into this block

        total_frames += 18 * blocks;              // 9 skips of 2 per block
        if (rest >= 2)
        {   // one more skip for each dropped minute already started
            total_frames += 2 * ((rest - 2) / frames_per_min);
        }
    }

    FF = total_frames % format;
    SS = (total_frames / format) % 60;
    MM = (total_frames / (format * 60)) % 60;
    HH = (total_frames / (format * 60 * 60)) % 24;

    if (drop_frame)
    {   // make sure that the timecode does not violate DF format
        if (((FF == 0) || (FF == 1)) && (SS == 0) && ((MM % 10) != 0))
        {
            FF += 2;
        }
    }

The above examples are not optimized for speed, but are intended to illustrate a procedure. It is highly recommended that you do some timecode conversions by hand using these algorithms in order to understand them better. It really works!

Drop frame

The drop frame corrections included in the above section work based on the idea that two frames are skipped every minute except minutes that are evenly divisible by 10. Another way to think of it is that you skip over two frames every minute and "unskip" two frames every ten minutes.

The last part deals with correcting the output format. It is possible with these calculations to come up with a result that violates the format. When the seconds are 00 and the minutes are 00, 10, 20, 30, 40 or 50, then and only then can the frames be 00 or 01. If the minutes have any non-tens value, then the frames skip over the 0 and 1 and proceed directly to 2. You can’t have a drop-frame timecode of 00:01:00:00; the sequence goes from 00:00:59:29 directly to 00:01:00:02.
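If you want to check your hand calculations, here is a runnable Python sketch of both directions. It is one possible implementation, not the only one: the reverse direction counts whole ten-minute blocks (17982 real frames each at 30FPS drop-frame) and then whole dropped minutes (1798 frames each), which is a commonly used form of the correction. The function names and the tuple layout are assumptions for this example.

```python
def timecode_to_frames(hh, mm, ss, ff, fmt=30, drop_frame=False):
    """Convert a timecode to a frame count from 00:00:00:00."""
    frames = ((hh * 60 + mm) * 60 + ss) * fmt + ff
    if drop_frame:
        minutes = hh * 60 + mm
        frames -= 2 * (minutes - minutes // 10)   # skipped frame numbers
    return frames

def frames_to_timecode(frames, fmt=30, drop_frame=False):
    """Convert a frame count back to (hh, mm, ss, ff)."""
    if drop_frame:
        # 17982 real frames per ten minutes, 1798 per dropped minute (30FPS)
        blocks, rest = divmod(frames, 600 * fmt - 18)
        frames += 18 * blocks
        if rest >= 2:
            frames += 2 * ((rest - 2) // (60 * fmt - 2))
    ff = frames % fmt
    ss = (frames // fmt) % 60
    mm = (frames // (fmt * 60)) % 60
    hh = (frames // (fmt * 3600)) % 24
    return hh, mm, ss, ff

print(timecode_to_frames(0, 1, 0, 2, drop_frame=True))    # 1800
print(frames_to_timecode(1800, drop_frame=True))          # (0, 1, 0, 2)
print(frames_to_timecode(1799, drop_frame=True))          # (0, 0, 59, 29)
print(timecode_to_frames(1, 0, 0, 0, drop_frame=True))    # 107892
```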

Samples ↔ SMPTE conversion

The samples ↔ SMPTE conversion is broken down into two parts: first, the SMPTE ↔ frames conversion (which you already know how to do!), and second, a frames ↔ samples conversion.

Oh, isn’t this nice. An easy, linear conversion.

Samples / frame = samples per second / frames per second.

Frames / sample = frames per second / samples per second.

So:

Samples = frames * samplerate / framerate

Frames = samples * framerate / samplerate

Caveat: When doing the conversion from samples → frames you will very often be left with a remainder. For example, at 44.1 kHz and 30 frames/sec, 25897456 samples = (25897456 * 30 / 44100) = 17617.317 frames. That .317 frames (466 samples) can be a problem.

Since you can’t convert a partial frame to timecode, what happens to those 466 samples?

Depending on the purpose of the conversion, there are a number of options. If you’re displaying timecode, you could show subframes. If you are attempting to locate or send timecode, then subframes are not an option. You can force the sample selection to the nearest SMPTE equivalent (in this case, 25897456 - 466, or 25896990 samples); you could delay the start of timecode by 466 samples; or you could live with the error.

Make sure that you use the actual framerate of the timecode, not just the format, or your calculations will be off-kilter. Also use the actual samplerate for the same reason.

Division in the above segments is truncated, not rounded.
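The worked example above can be reproduced with a short Python sketch. It uses exact fractions so that a framerate such as 29.97 (passed as a string) does not pick up floating-point error; the function names and the choice to return the leftover samples are assumptions for this illustration.

```python
from fractions import Fraction

def frames_to_samples(frames, samplerate, framerate):
    """Truncating frames -> samples; pass framerate as an int or a string."""
    return int(frames * samplerate / Fraction(framerate))

def samples_to_frames(samples, samplerate, framerate):
    """Return (whole frames, leftover samples) using truncating division."""
    frames = int(samples * Fraction(framerate) / samplerate)
    leftover = samples - frames_to_samples(frames, samplerate, framerate)
    return frames, leftover

print(samples_to_frames(25897456, 44100, 30))    # (17617, 466)
print(frames_to_samples(17617, 44100, 30))       # 25896990
```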

Video

Most systems that use timecode are designed to lock sound to a picture, and quite often that picture appears on a video monitor. Whether the image originated on film or videotape, the editing and post production sessions are almost always done on video, and the process of locking the audio to the video becomes very important.

Current video systems in the world today are almost entirely analog. That is changing as you read this, but for the next several years, the information contained here will be quite useful.

Video signal

The video signals from around the world have certain components in common, though their transmission systems differ. First, all the pictures that you see on the screen are drawn by a small fast-moving dot. The dot moves very rapidly across the screen from left to right creating a series of lines, and a number of those lines stack on top of each other from top to bottom to make a rectangle. That rectangle is what you see as the picture.

Well, okay, that’s not entirely true. That rectangle makes up the whole frame of information, but what you actually see as the picture is a smaller area inside of that frame known as the active picture area. Think of it like a painting: the active picture area would be the actual painting, while the entire rectangle would be both the painting and the picture frame around the painting. The frame is larger than the painting.

The video signal itself needs to contain information for the video monitor to tell it where the dot is supposed to be. It does this with horizontal and vertical sync pulses.

A horizontal sync pulse lets the video monitor know that it needs to bring the dot back to the left side of the screen to start a new line. A vertical sync pulse tells the monitor that the dot should move to the top of the screen to start a new field.

Oops, I said field, and not frame.

Frames vs. fields

If the video screen was actually completely redrawn only 30 times per second, the screen would have a nasty flicker to it. Trust me, it would be ugly. In order to keep the flicker to a minimum, broadcast video uses a trick called interlacing, which basically means that the picture is drawn in two halves, with all the odd lines being drawn first, and all the even lines being drawn in the second half. That way, the screen is redrawn twice as fast, mostly eliminating the flicker.

Because of this, the two fields are called the odd field and the even field respectively. A new frame starts with an odd field.

Just remember that two fields make up one frame.

NTSC vs. PAL

Originally, television was in black and white. Some of you may have even seen a black-and-white TV; they display a picture, but there’s no color. It’s very strange if you’re not used to it. The display rates were based on the frequency of the AC power available, so in the US they used 60 cycles (pictures) per second, while in Europe they used 50 cycles. Each cycle provided an interlock for one field, so there were 30 and 25 frames per second respectively.

Within about 20 years came the advent of color television. By this time, it was apparent that the AC line frequencies in various parts of the country weren’t quite the same, but because there were a lot of TV sets already out there, they had to keep close to the existing standards.

The National Television System Committee was the body that came up with the color television system adopted by the U.S. They made a slight change from 30 to 29.97 frames per second just to squeeze in enough information. The Europeans kept the 25 frames per second framerate, but added a color scheme where the color reference information alternates on each video line. PAL stands for Phase Alternating Line.

In general, NTSC means video at a 29.97 framerate, and PAL indicates 25 fps.

Telecine and 3:2 pulldown

Transferring film images to videotape has become quite common in the past couple of years. The ease and convenience of electronic editing has caught the attention of filmmakers, not to mention the video movie rental market.

A telecine is the machine that does the actual transfer of the film images to videotape (or computer data). It is essentially a high-speed film scanner.

Since most film is shot at 24 fps and NTSC video is displayed at 29.97 fps, if you just try to videotape a movie, the results can be pretty bad. What is needed is a way of transferring the film to videotape so that each film frame gets a full image on the video monitor with no flicker or roll bars.

This is done with a method called 3:2 pulldown. With this method, four film frames are spread across five video frames (ten video fields); this process gets repeated six times each second, so we come out just fine (6*4 = 24, 6*5 = 30). The projector is slowed down slightly to compensate for the 29.97 video rate.

The details of the 3:2 pulldown are as follows:

The four film frames are labeled A, B, C, and D.

The A frame is transferred to the first two fields in the sequence, B to the next three, C to the next two, and D to the next three. Hey, I think it should be called 2:3 pulldown, but nobody asked me. Notice that frame 3 and frame 4 in the sequence are made up of composites of two film frames. These are called composite frames, and they can be difficult to work with because they don’t represent any one film frame.
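The field pattern is easy to see in a short Python sketch. It assumes the film frames come in groups of four and represents each video frame as a pair (first field, second field); the function name is made up.

```python
def pulldown_32(film_frames):
    """Spread film frames over video frames as (field 1, field 2) pairs."""
    fields = []
    for i, frame in enumerate(film_frames):
        fields += [frame] * (2 if i % 2 == 0 else 3)   # 2, 3, 2, 3, ...
    return list(zip(fields[0::2], fields[1::2]))

print(pulldown_32("ABCD"))
# [('A', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'D')]
print(len(pulldown_32(range(24))))   # 30
```

Any pair whose two fields differ is a composite frame, and the reverse transfer amounts to regrouping the fields by their film frame.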

Figure a: example of a composite frame. Note the last numbers in the KeyKode® in the lower right corner.

The fun part is that this process is reversible. If you have a film that has been transferred to videotape and you need to get back to the original film frame, you can simply combine the correct fields from the videotape sequence.

The part that isn’t so fun is trying to match back timecode on the videotape with the proper film locations. Film rarely has SMPTE timecode on it (although it is getting more common), but it quite often has a KeyKode® number. This is a unique number which identifies every frame (hey, doesn’t that sound familiar), and as part of the information, identifies footage and frame numbers. It is usually in a format with two characters, followed by six numbers, a space, four numbers, a plus sign and two numbers. For instance: KU926782 nnnn+ff (where nnnn is the footage and ff is the frame offset). These numbers are useful to the negative cutter who actually performs the final edit on the film by physically cutting the negative (that’s why he’s called the negative cutter). He can look at these numbers on the negative and find the exact frame to cut.
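The layout described above can be picked apart with a regular expression. This Python sketch only follows the pattern given in the text; it assumes the two leading characters are uppercase letters, the names prefix and roll are placeholders rather than official field names, and the footage and frame values in the example are made up.

```python
import re

# two characters, six digits, a space, four digits, a plus sign, two digits
KEYKODE = re.compile(r"^([A-Z]{2})(\d{6}) (\d{4})\+(\d{2})$")

def parse_keykode(text):
    """Split a KeyKode-style string into its parts, or return None."""
    m = KEYKODE.match(text)
    if m is None:
        return None
    prefix, roll, footage, frame = m.groups()
    return prefix, roll, int(footage), int(frame)

print(parse_keykode("KU926782 1234+05"))   # ('KU', '926782', 1234, 5)
```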

 

Figure b: examples of timecode and KeyKode®. Note the distinct lack of correlation.

Images courtesy of YND Productions.

However, the video editor for the film is working with the video image, and is working with timecode based on the video framerate. There is an entire software industry based on this matchback solution, and the Flex file format is devoted to correlating timecode to film location code numbers. The usual process is for a window dub to be made at a lower resolution with the keycode information burned in. That way, the final video edit can be given to the negative cutter for a visual reference as well as the cut list.

Time base corrector (TBC)

In order for the video signal to be displayed properly on a monitor, the timing of the various lines has to be fairly accurate. Otherwise the image looks jittery, with vertical edges that don’t stay lined up.

This can be seen quite readily with most consumer VCRs—if you make a tape-to-tape copy, it usually looks pretty bad, and by the time you get a couple of generations down, it’s almost unwatchable. A timebase corrector (TBC) works as a sort of digital information store: it stores the information and then feeds it all out a bit later with precise timing.

A TBC is also very good when you are relying on a video sync signal from a VCR for your system clock. A VCR or VTR without a TBC can provide a clock that may be jittery; adding a TBC can make it rock-solid.

Most professional video decks have a TBC built in. All of the newer DV decks perform this function automatically.

The future

The future of television is here, and it looks great. It’s digital television, or DTV.

DTV promises to be the vision of the future. Nice clean signals, crisp colors, amazing sound; these are but a few of the promises made by DTV.

Of course, there are some problems. First of all, there are several formats available in DTV, everything from a 640 x 480 interlaced image (exactly like what you see now, only digitally transmitted) to a 1920 x 1080 progressive scan (the highest of high-definition TV), and framerates from 24 to 60 frames/sec. And not only is it likely that different TV stations in your area will carry different formats, it is likely that a single TV station will change formats for different programs.

Consider one popular format: 1280 x 720 progressive scan 60 fps (shorthand is 720p). One big problem that I see with this is that the SMPTE timecode specification doesn’t currently have a 60FPS entry, so timecode equipment for this format will be working poorly if at all.

Then consider that with DTV, there is no "video" signal to use as a clock reference, there is just a data transmission stream. Different types of data compression make that data stream unreliable to use as any kind of clock source, so how do you create a stable reference that is locked to the picture?

The jury is still out on this one.

Clock Generation

The generation of a stable system clock from an external reference is one of the great mysteries of all time. There is even historical evidence to suggest that the ancient Egyptians used a word clock reference to help synchronize their flying saucers when they were out building the pyramids. Unfortunately, the knowledge of these ancient kings has been lost to us through the ravages of time, and we, as mere men, must rely on our own knowledge and experience in putting these systems together.

By the way, I was just kidding about the Egyptians.

Genlock

Genlock is a shortening of the term generator lock. Historically, it has been used in the video industry to describe a device that locks to a video signal and generates a clock based on that signal. Nowadays, it is used more generically to describe a device that takes any external signal and generates a clock based on that reference.

The mechanism for doing this is sometimes thought of as alien technology or magic. In reality, it is based around a device known as a phase-locked-loop (PLL). A PLL is a device that basically takes a reference input and creates a clock output at some multiple of that input frequency. The actual details of how this works are not something I want to go into here—let’s just say that a good PLL design doesn’t just happen.

Video sync

Feed a video signal in, get a sample clock out. A genlock can be used to create the sample clock based on the video sync reference. When this is done properly, you can be assured that the number of samples per frame stays the same no matter how long your program is.

The caveat is that you need to do the initial recording of the audio locked to the video signal as well. Otherwise, the number of samples per frame will be different and your audio will drift in relation to your video.
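To get a feel for how quickly an unlocked clock drifts, here is a small Python sketch. It assumes the drift comes from a simple, constant rate mismatch; the function name and the example figures are illustrative only.

```python
def drift_seconds(duration_s, expected_rate, actual_rate):
    """Drift built up over duration_s when something runs at actual_rate
    instead of expected_rate (both rates in the same units)."""
    return duration_s * (actual_rate - expected_rate) / expected_rate

# One hour of picture running at 29.97 fps but treated as 30 fps
print(round(drift_seconds(3600, 30, 29.97), 3))        # -3.6
# One hour on a 44.1 kHz clock that really runs half a hertz fast
print(round(drift_seconds(3600, 44100, 44100.5), 4))   # 0.0408
```

The second case is already more than a frame of error after an hour, which is why the audio needs to be recorded against the same reference it will be played against.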

Word clock

Some systems want word clock instead of video sync coming in. Word clock is simply a TTL-level (0 to 5 volt) square wave at the samplerate; for instance, a word clock at 44.1 kHz would be a 44.1 kHz square wave.

Generally, if you are feeding word clock to an audio device, you are responsible for making sure that the word clock is clean and free of jitter. That’s not as easy as it sounds.