The first step in creating a CAPTCHA is to look at the different ways humans and machines process information. Machines follow sets of instructions. If something falls outside the realm of those instructions, the machine isn't able to compensate. A CAPTCHA designer has to take this into account when creating a test. For example, it's easy to build a program that reads metadata -- information on the Web that's invisible to humans but readable by machines. If you create a visual CAPTCHA and the image's metadata includes the solution, your CAPTCHA will be broken in no time.
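To make the metadata pitfall concrete, here is a minimal sketch of how little effort a bot needs when the answer leaks into the page. The markup, the file name captcha.png and the answer X7K2P are all hypothetical, invented for illustration; the parser is Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical markup for a badly designed CAPTCHA: the alt text leaks the answer.
LEAKY_PAGE = '<form><img src="captcha.png" alt="X7K2P"><input name="answer"></form>'


class AltTextReader(HTMLParser):
    """Collects the alt attribute of every <img> tag, as a simple bot might."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.alt_texts.append(value)


def read_alt_texts(html):
    """Return every non-empty alt text found in the given HTML string."""
    reader = AltTextReader()
    reader.feed(html)
    reader.close()
    return reader.alt_texts


if __name__ == "__main__":
    print(read_alt_texts(LEAKY_PAGE))
```

The bot never looks at the image at all. Any solution stored where a machine can read it, whether in alt text, a file name or a hidden form field, defeats the test in the same way.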
Similarly, it's unwise to build a CAPTCHA that doesn't distort letters and numbers in some way. An undistorted series of characters isn't very secure. Many computer programs can scan an image and recognize simple shapes like letters and numbers.
One way to create a CAPTCHA is to pre-determine the images and solutions it will use. This approach requires a database that includes all the CAPTCHA solutions, which can compromise the reliability of the test. According to Microsoft Research experts Kumar Chellapilla and Patrice Simard, humans should have an 80 percent success rate at solving any particular CAPTCHA, but machines should only have a 0.01 percent success rate [source: Chellapilla and Simard]. If a spammer managed to find a list of all CAPTCHA solutions, he or she could create an application that bombards the CAPTCHA with every possible answer in a brute force attack. A bot picking answers at random from a list of 10,000 would be right about once in 10,000 tries, or 0.01 percent of the time, so the database would need more than 10,000 possible CAPTCHAs to meet the qualifications of a good CAPTCHA.
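The 10,000 figure follows from simple arithmetic, sketched below. The sketch assumes the attacker knows the full list of solutions and picks one uniformly at random per attempt. The function names are made up for illustration, and exact fractions are used to avoid rounding surprises.

```python
import math
from fractions import Fraction

# 0.01 percent, written as an exact fraction: 1 in 10,000.
MAX_MACHINE_RATE = Fraction(1, 10_000)


def guess_success_rate(pool_size):
    """Chance that one uniformly random guess from the pool is correct."""
    return Fraction(1, pool_size)


def min_pool_size(max_rate):
    """Smallest pool whose single-guess success rate does not exceed max_rate."""
    return math.ceil(1 / max_rate)


if __name__ == "__main__":
    print(min_pool_size(MAX_MACHINE_RATE))
    print(guess_success_rate(500) <= MAX_MACHINE_RATE)
```

Under these assumptions a pool of 10,000 sits exactly at the 0.01 percent limit, which is why a practical database would need to be larger than that.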
Other CAPTCHA applications create random strings of letters and numbers. You aren't likely to ever get the same series twice. Using randomization makes a brute force attack impractical -- the odds of a bot entering the correct series of random letters are very low. The longer the string of characters, the less likely a bot will get lucky.
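A rough sketch of the random-string approach, and of how quickly the odds fall as the string gets longer, might look like this. The 36-character alphabet (uppercase letters plus digits) and the function names are assumptions chosen for illustration, not a description of any particular CAPTCHA product.

```python
import secrets
import string
from fractions import Fraction

# Assumed alphabet: 26 uppercase letters plus 10 digits.
ALPHABET = string.ascii_uppercase + string.digits


def generate_challenge(length):
    """Build a random challenge string using a cryptographically strong source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def guess_odds(length):
    """Chance that a single blind guess matches a challenge of this length."""
    return Fraction(1, len(ALPHABET) ** length)


if __name__ == "__main__":
    print(generate_challenge(6))
    for n in (4, 6, 8):
        print(n, guess_odds(n))
```

With this alphabet, every extra character makes a blind guess 36 times less likely to succeed. A six-character string already has more than two billion possible values.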
CAPTCHAs take different approaches to distorting words. Some stretch and bend letters in weird ways, as if you're looking at the word through melted glass. Others put the word behind a crosshatched pattern of bars to break up the shape of the letters. A few use different colors or a field of dots to achieve the same effect. In the end, the goal is the same: to make it really hard for a computer to figure out what's in the CAPTCHA.
Designers can also create puzzles or problems that are easy for humans to solve. Some CAPTCHAs rely on pattern recognition and extrapolation. For example, a CAPTCHA might include a series of shapes and ask the user which shape among several choices would logically come next. The problem with this approach is that not all humans are good with these kinds of problems and the success rate for a human user can drop below 80 percent.
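As one illustration of a pattern-recognition puzzle, the sketch below builds a repeating series of shapes and asks which shape comes next. The shape names, the repeating-pattern rule and the function name are all assumptions made for this example; real puzzle CAPTCHAs may work quite differently.

```python
import random

# Hypothetical set of shapes the puzzle can draw from.
SHAPES = ["circle", "square", "triangle", "star"]


def make_sequence_puzzle(rng, shown=6):
    """Return (sequence shown to the user, answer choices, correct answer)."""
    # Pick a repeating pattern made of 2 or 3 distinct shapes.
    pattern = rng.sample(SHAPES, rng.choice([2, 3]))
    # Repeat the pattern to fill the visible part of the series.
    sequence = [pattern[i % len(pattern)] for i in range(shown)]
    # The correct answer is whatever the pattern produces next.
    answer = pattern[shown % len(pattern)]
    # Offer every shape as a choice, in shuffled order.
    choices = SHAPES[:]
    rng.shuffle(choices)
    return sequence, choices, answer


if __name__ == "__main__":
    sequence, choices, answer = make_sequence_puzzle(random.Random())
    print("Series:", ", ".join(sequence))
    print("Which comes next?", ", ".join(choices))
```

A puzzle this simple offers only four possible answers, so a bot guessing blindly would be right about a quarter of the time. That is far above the 0.01 percent target, and it shows why puzzle CAPTCHAs need a much larger answer space.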
Next, we'll take a look at how computers can break CAPTCHAs.