The first step in identifying content is assembling a database of material that other files can be compared against. For a record company, this would include the company's entire music catalog. The content-recognition software analyzes each song and creates a digital tag identifying that song. Tags are called fingerprints or signatures.
The software analyzes the actual sound of the song rather than the format in which the file is encoded. Some programs analyze the tempo and beat of a song. Others measure the song's amplitude and frequency. Fingerprinting software usually takes several samples, each lasting just a few seconds, from a single recording. A few companies offer software that analyzes entire audio clips in order to get as complete a fingerprint as possible. At least one current product analyzes a song for landmarks -- distinctive acoustic moments in the clip -- then analyzes the sound around the landmarks. Ideally, the landmarks will be readily identifiable when scanning other music.
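The landmark idea can be illustrated with a minimal Python sketch. The article does not say how any product defines a landmark, so this example assumes a simple stand-in: a landmark is a short frame whose loudness (energy) is a local peak and well above the clip's average. The function names, the frame size and the `ratio` threshold are all hypothetical choices made for illustration.

```python
import math

def frame_energies(samples, frame_size):
    """Split samples into non-overlapping frames; return each frame's mean squared amplitude."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
    return energies

def find_landmarks(energies, ratio=2.0):
    """Return indices of frames that are local energy peaks and at least
    `ratio` times the average energy (an assumed definition of a landmark)."""
    if not energies:
        return []
    mean = sum(energies) / len(energies)
    landmarks = []
    for i in range(1, len(energies) - 1):
        is_peak = energies[i] > energies[i - 1] and energies[i] >= energies[i + 1]
        if is_peak and energies[i] >= ratio * mean:
            landmarks.append(i)
    return landmarks

def demo_signal():
    """Synthetic one-second clip at 8000 Hz: a quiet 440 Hz tone with two loud bursts."""
    rate = 8000
    samples = []
    for n in range(rate):
        loud = 2000 <= n < 2400 or 6000 <= n < 6400
        gain = 1.0 if loud else 0.1
        samples.append(gain * math.sin(2 * math.pi * 440 * n / rate))
    return samples

if __name__ == "__main__":
    energies = frame_energies(demo_signal(), 400)
    print(find_landmarks(energies))  # frame indices of the two loud bursts
```

A real system would then study the sound around each landmark, as described above, instead of stopping at the frame index.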
The programs use algorithms to analyze sound. Most rely on some form of the Fast Fourier Transform (FFT). This mathematical technique breaks a complex signal down into its component frequencies, which lets the software track how the sound changes over the course of a clip. These changes -- whether they're tempo changes, beats per minute or the amplitude and frequency of the sound in the clip -- are mapped out and mathematically converted into a digital fingerprint. Fingerprints are usually in numeric form.
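To make the frequency-analysis step concrete, here is a minimal Python sketch. It is not any vendor's algorithm. It assumes a deliberately simple fingerprint: cut the clip into short frames, find the strongest frequency in each frame, and record those values as a list of numbers. To stay self-contained it uses a naive discrete Fourier transform; an FFT computes the same result much faster. All function names and the frame size are hypothetical.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive O(n^2) discrete Fourier transform; returns magnitudes for bins 0..n/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags

def fingerprint(samples, frame_size=64):
    """Return the index of the strongest frequency bin in each frame, as a list of ints."""
    prints = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        mags = dft_magnitudes(samples[start:start + frame_size])
        # Skip bin 0, which only measures the constant (DC) offset of the frame.
        prints.append(max(range(1, len(mags)), key=mags.__getitem__))
    return prints

def tone(freq_hz, rate_hz, count):
    """Generate `count` samples of a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

if __name__ == "__main__":
    # At 6400 Hz with 64-sample frames, each bin is 100 Hz wide.
    clip = tone(500, 6400, 128) + tone(1200, 6400, 128)
    print(fingerprint(clip))
```

In this sketch, a clip that moves from a 500 Hz tone to a 1200 Hz tone produces a fingerprint whose values change at the same point, which is the kind of change over time the fingerprint is meant to capture.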
Once a record company establishes its database, it's ready to help identify songs for potential customers or to track down cases of copyright infringement. In either case, the software analyzes the unknown audio clips the same way it analyzed the songs in the company's catalog. It creates a hash, or short code, that depends on the content of the audio file. The software assigns digital fingerprints to the clips, which it then compares to the fingerprints in the database. Next, we'll take a look at exactly how it determines whether the songs are the same.
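The catalog-then-lookup flow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any real product: fingerprints are assumed to be lists of integers, the hash is a truncated SHA-1 digest chosen only for convenience, and the function names and song titles are hypothetical.

```python
import hashlib

def fingerprint_hash(fingerprint):
    """Condense a numeric fingerprint (list of ints) into a short hex code."""
    data = ",".join(str(v) for v in fingerprint).encode("utf-8")
    return hashlib.sha1(data).hexdigest()[:16]

def build_catalog(songs):
    """Map each song's hash to its title. `songs` is {title: fingerprint}."""
    return {fingerprint_hash(fp): title for title, fp in songs.items()}

def identify(catalog, fingerprint):
    """Return the matching title, or None if the hash is not in the catalog."""
    return catalog.get(fingerprint_hash(fingerprint))

if __name__ == "__main__":
    catalog = build_catalog({
        "Song A": [5, 5, 12, 12],  # made-up fingerprints for illustration
        "Song B": [3, 7, 7, 9],
    })
    print(identify(catalog, [5, 5, 12, 12]))
    print(identify(catalog, [1, 2, 3]))
```

An exact lookup like this one only succeeds when two fingerprints are identical, so it is best read as the simplest possible version of the comparison step.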