How Content-recognition Software Works

You're in line for a movie when a great song plays over the sound system. Although you like the song, you have no idea what it's called or who sings it. You pull out your cell phone, dial a number and hold the phone toward the speakers. In a few seconds, you receive a text message with the name of the song, the artist's name and even a link you can follow to buy a copy.

The service you called uses content-recognition software to identify the song. These programs are helpful if you want to learn about a song playing nearby. They can also help to curb copyright infringement, which is a huge issue for independent artists and corporations alike.

Peer-to-peer networks, file-sharing services and heavy hitters like YouTube give people plenty of opportunities to access content without paying for it. Sites like YouTube normally count on users to report inappropriate material, but some users don't consider clips that violate copyright law to be inappropriate. As a result, most companies still have to rely on employees to uncover proprietary footage and file a report. It's a tedious, inefficient process that may soon become unnecessary thanks to content-recognition software. In this article, we'll explore exactly how the process works and how this software can help both people and businesses.

Developing the Software

Several software companies plan to offer programs that can analyze audio and video clips, compare them to a database of content and determine whether they are from sources that are protected by copyright. Such software provides an efficient and relatively inexpensive alternative to combing through the vast amount of content on the Internet. It's also more reliable than asking your friend if he knows what song is on the radio.

You might think creating a program that recognizes video or audio content shouldn't be complicated, but it's proving to be a real challenge. For one thing, there are dozens of ways to encode a sound or video file, so a program that simply looks for matching data isn't very useful. After all, a WAV file and an MP3 file of the same song contain very different bytes, even though they sound alike. In addition, songs and videos can be encoded at different bit rates, which means that two MP3 files of the same song may not match. Software that identifies songs via cell phone has to be able to identify the track regardless of the quality of the recording or any interfering background noise.
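To see why comparing raw file data fails, here is a minimal Python sketch. It is an illustration only: the helper names (tone, encode_pcm16, encode_pcm8, file_digest) are made up for this example, and two simple PCM encodings stand in for formats like WAV and MP3, which are far more complex.

```python
import hashlib
import math
import struct

def tone(freq_hz, sample_rate=8000, seconds=0.5):
    """Return a sine tone as floats in [-1.0, 1.0]."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def encode_pcm16(samples):
    """Pack samples as signed 16-bit little-endian PCM."""
    ints = [int(s * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)

def encode_pcm8(samples):
    """Pack the same samples as unsigned 8-bit PCM (a coarser encoding)."""
    return bytes(min(255, int((s + 1.0) * 127.5)) for s in samples)

def file_digest(data):
    """Checksum of the raw bytes, the kind of test an exact-match search would use."""
    return hashlib.sha256(data).hexdigest()

samples = tone(440.0)                      # one sound ...
digest_16 = file_digest(encode_pcm16(samples))
digest_8 = file_digest(encode_pcm8(samples))  # ... stored two different ways
same_bytes = digest_16 == digest_8
print(same_bytes)  # False: the sound is the same, but the bytes are not
```

Both byte strings describe the same tone, yet an exact comparison treats them as unrelated, which is why the software has to look at the sound itself.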

There are other challenges as well. Some video pirates bring recording devices into theaters and capture movies on their own cameras. Some projectionists have been known to set up a digital video camera in the projection room, recording a first-run movie on its premiere night. Other people who bypass legal distribution might crop a video or otherwise alter it. Any program designed to find recordings like these can't rely on matching encoded data or finding identical files.

In the next section, we'll look at the process for identifying audio files and how it compensates for these challenges.

Content-recognition Software - Audio

The first step in identifying content is assembling a database of material that other files can be compared against. For a record company, this would include the company's entire music catalog. The content-recognition software analyzes each song and creates a digital tag identifying that song. These tags are called fingerprints or signatures.

The software analyzes the actual sound of the song rather than the way the file is encoded. Some programs analyze the tempo and beat of a song. Others measure the song's amplitude and frequency. Fingerprinting software usually takes several samples, each lasting just a few seconds, from a single recording. A few companies offer software that analyzes entire audio clips in order to get as complete a fingerprint as possible. At least one current product analyzes a song for landmarks -- distinctive acoustic moments in the clip -- then analyzes the sound around the landmarks. Ideally, the landmarks will be readily identifiable when scanning other music.

The programs use algorithms to analyze sound. Most rely on some form of the Fast Fourier Transform (FFT), a mathematical technique that breaks a complex signal down into its component frequencies. Applied to one short slice of audio after another, it lets the software track how the sound changes over time. These changes -- whether they're changes in tempo or in the amplitude and frequency of the sound in the clip -- are mapped out and mathematically converted into a digital fingerprint. Fingerprints are usually in numeric form.
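The idea of turning frequency analysis into a numeric fingerprint can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's method: it uses a slow, naive discrete Fourier transform (an FFT computes the same result faster), and the function names and the "strongest frequency per frame" rule are choices made for this example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of each frequency bin below Nyquist (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

def fingerprint(samples, frame_size=64):
    """Return the strongest frequency bin of each consecutive frame."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        mags = dft_magnitudes(samples[start:start + frame_size])
        # Skip bin 0, which only measures the constant (DC) offset.
        peaks.append(max(range(1, len(mags)), key=lambda k: mags[k]))
    return peaks

def tone(freq_hz, sample_rate=8000, n=256):
    """Return n samples of a sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

# A 1,000 Hz tone followed by a 2,000 Hz tone. At 8,000 samples per second
# and 64 samples per frame, each bin is 8000 / 64 = 125 Hz wide.
clip = tone(1000.0) + tone(2000.0)
quiet_clip = [0.25 * s for s in clip]   # same sound at a lower volume

print(fingerprint(clip))                             # [8, 8, 8, 8, 16, 16, 16, 16]
print(fingerprint(clip) == fingerprint(quiet_clip))  # True
```

The fingerprint is just a list of numbers, and it stays the same when the volume changes, which hints at how analyzing the sound can survive differences that would defeat a byte-for-byte comparison.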

Once a record company establishes its database, it's ready to help identify songs for potential customers or to track down cases of copyright infringement. In either case, the software analyzes the unknown audio clips the same way it analyzed the songs in the company's catalog. It creates a hash, or short code, that depends on the content of the audio file. The software assigns these digital fingerprints to the clips and then compares them to the fingerprints in the database. Next, we'll take a look at exactly how it determines whether the songs are the same.
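As a rough illustration of hashing and database lookup, the Python sketch below turns short runs of fingerprint numbers into short codes and stores them in an index. Everything here is an assumption made for the example: the function names, the choice of SHA-1 truncated to eight characters, and the made-up fingerprint numbers in the catalog.

```python
import hashlib
from collections import Counter

def short_codes(peaks, span=3):
    """Hash each run of `span` consecutive fingerprint values into a short hex code."""
    codes = []
    for i in range(len(peaks) - span + 1):
        window = ",".join(str(p) for p in peaks[i:i + span])
        codes.append(hashlib.sha1(window.encode("ascii")).hexdigest()[:8])
    return codes

def build_index(catalog):
    """Map every short code to the set of titles whose fingerprint contains it."""
    index = {}
    for title, peaks in catalog.items():
        for code in short_codes(peaks):
            index.setdefault(code, set()).add(title)
    return index

def lookup(index, peaks):
    """Count how many of the clip's codes each catalog title shares."""
    votes = Counter()
    for code in short_codes(peaks):
        for title in index.get(code, ()):
            votes[title] += 1
    return votes

# Hypothetical fingerprints (made-up numbers standing in for real analysis output).
catalog = {
    "Song A": [8, 8, 16, 12, 9, 16, 20, 8],
    "Song B": [5, 7, 7, 11, 5, 3, 14, 7],
}
index = build_index(catalog)
unknown_clip = [16, 12, 9, 16, 20]   # an excerpt from the middle of "Song A"
votes = lookup(index, unknown_clip)
print(votes.most_common(1))
```

Because the codes are built from short runs rather than the whole song, an excerpt taken from the middle of a track can still find its source in the index.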

Did You Hear That?

To ensure that content-recognition software identifies songs no matter what format they're in, programmers concentrate on analyzing only the sounds that fall within the range of human hearing, much as MP3 encoders do. One of the reasons MP3 files are relatively small is that the encoder keeps only the sounds people can hear and discards everything else. Content-recognition software doesn't rely on the full range of sounds that might be present in the original recording because it might then overlook MP3 versions of the audio track.

Identifying the Sound

Often, the sound clips being analyzed are not clean copies of a song. The song could be truncated, or it might be similar to a different song. This is where the matching algorithm comes in. Its job is to compare the fingerprints and determine whether the incoming sound clip matches a song (or portion of a song) in the database within a certain range of probability.

The identification process is similar to the way forensics experts once matched a suspect's fingerprints to those found at a crime scene. Before sophisticated computer software and advanced methods for examining fingerprints became available, experts would look for points of similarity between different fingerprints. In most cases, the specialist would need to demonstrate at least 16 points of similarity for a print to be considered a match.

There is no standard probability range for content-recognition software. Most programs allow customers to adjust the level of similarity required to declare a match. For example, you could adjust the program so that it only brings back match results if the algorithm determines that there is a 95 percent or better chance it's a match. If the incoming clip doesn't fall in that range, the program sends an error message to the user instead.
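An adjustable match threshold might look something like the following Python sketch. It is a simplified stand-in, not a real product's scoring method: the score here is just the fraction of fingerprint values that agree, used in place of a true probability, and the function names and fingerprint numbers are invented for the example.

```python
def similarity(clip, reference):
    """Best fraction of positions that agree as the clip slides along the reference."""
    if not clip or len(clip) > len(reference):
        return 0.0
    best = 0.0
    for offset in range(len(reference) - len(clip) + 1):
        agree = sum(1 for a, b in zip(clip, reference[offset:]) if a == b)
        best = max(best, agree / len(clip))
    return best

def identify(clip, database, threshold=0.95):
    """Return (title, score) for the best match at or above threshold, else None."""
    best_title, best_score = None, 0.0
    for title, reference in database.items():
        score = similarity(clip, reference)
        if score > best_score:
            best_title, best_score = title, score
    if best_score >= threshold:
        return best_title, best_score
    return None

# Hypothetical fingerprints (made-up numbers).
database = {
    "Song A": [8, 8, 16, 12, 9, 16, 20, 8, 4, 4, 13, 6],
    "Song B": [5, 7, 7, 11, 5, 3, 14, 7, 2, 9, 9, 1],
}
# Ten values from "Song A" with one of them altered, as background noise might do.
noisy_clip = [8, 16, 12, 9, 17, 20, 8, 4, 4, 13]

print(identify(noisy_clip, database))                  # None: 0.9 is below 0.95
print(identify(noisy_clip, database, threshold=0.85))  # ('Song A', 0.9)
```

The same clip is rejected or accepted depending only on the threshold. A strict setting misses noisy copies, while a loose one risks false matches.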

When the program determines a match, a partnered application can take over. The application might send information to someone who wants to know the title of a song, or it might flag a song on a Web site and e-mail the corresponding record company's legal department. Some record companies have used such software to scan file-sharing sites or to track content on Web sites that stream audio. The entire process of analysis and matching takes only a few seconds.

In the next section, we'll look at how video content presents different challenges from audio files.

Content-recognition Software - Video

Recently, Time Warner and Disney partnered with YouTube to test video content-recognition software developed by Google. The software is similar to existing audio content-recognition programs in that it analyzes content to create a fingerprint. Then it compares that information to fingerprints in a database to determine if there is a match. However, video presents unique challenges that are not easily overcome.

For example, most videos on YouTube are limited to 10 minutes or 100 megabytes. Since a clip could include any 10-minute segment from a film or television show under copyright, the content-recognition software must analyze the entire original work in such a way that it can make meaningful matches from a relatively small sample clip. Google isn't saying much about how the software manages this, but it's likely that the program analyzes overlapping chunks of the original content to create multiple fingerprints.
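Since Google hasn't described its method, the following Python sketch only illustrates the general idea of fingerprinting overlapping chunks. The function names are invented, the numbers are made up, and a plain tuple of per-second values stands in for whatever a real video fingerprint would contain.

```python
def overlapping_windows(values, window, step):
    """Yield (start_index, chunk) pairs; chunks overlap whenever step < window."""
    for start in range(0, len(values) - window + 1, step):
        yield start, values[start:start + window]

def chunk_fingerprints(frame_features, window=4, step=2):
    """Map a stand-in fingerprint of every overlapping chunk to its start position."""
    return {tuple(chunk): start
            for start, chunk in overlapping_windows(frame_features, window, step)}

def locate(clip, prints, window=4):
    """Return where in the original the clip was found, or None if nowhere."""
    for _, chunk in overlapping_windows(clip, window, 1):
        if tuple(chunk) in prints:
            return prints[tuple(chunk)]
    return None

# Hypothetical per-second features of a full-length work (made-up numbers).
film = [3, 9, 4, 7, 1, 8, 2, 6, 5, 0]
prints = chunk_fingerprints(film)

uploaded_clip = [9, 4, 7, 1, 8]         # starts one second into the film
print(locate(uploaded_clip, prints))    # 2: the chunk (4, 7, 1, 8) starts there
print(locate([0, 0, 0, 0, 0], prints))  # None
```

Because the chunks overlap, a clip that starts at an arbitrary point in the film still contains at least one complete chunk from the database, provided the clip is a little longer than a single chunk.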

Video content-recognition software must be able to identify footage even if the person who uploaded the content edited it first. For example, people can fool software that matches on color by tweaking the color saturation in a video. Cropping a video or uploading footage of a film captured on a video camera can also fool recognition software. Some pirated films are captured on cameras set up at an angle to the screen, further complicating the identification process.

One approach developers are trying is to base fingerprints on an analysis of how motion changes over the course of a video. Even this could prove ineffective if someone uploads a pirated video captured on a hand-held camera, since the camera's own movement adds motion that isn't in the original. In some cases, the probability range for matches may need to be fairly wide to flag all possible cases of piracy. Film studios may discover that they will still need a real person to review video clips to confirm a case of infringement. Still, the initial identification of potential video piracy will be much more efficient.

Video-identification software is still in the testing stage, though some companies are already holding effective demos of their programs. Challenges in identification won't end once the software is perfected, though. The sheer volume of video content presents a big problem. Movie and television studios will need to constantly update their databases with fingerprints for all the new content that comes out every day. While the process for uncovering piracy may become more efficient, it will still require constant upkeep and maintenance.
