You can see some of the most common extensions in these URLs:
- http://computer.howstuffworks.com/web-page.htm - This article page (and nearly all other pages) at HowStuffWorks ends in htm.
- http://www.adobe.com/products/acrobat/readermain.html - The home page for Adobe Reader ends in html.
- http://www.cbsnews.com/sections/home/main100.shtml - The home page for CBS News ends in shtml.
- http://www.microsoft.com/catalog/default.asp - Many pages on Microsoft's site end in asp.
- http://www.altavista.pl/ - The home page for the AltaVista search engine ends in pl.
- http://www.howstuffworks.com/cgi-bin/suggest.cgi - The code that processes suggestions at HowStuffWorks ends in cgi.
- http://www.howstuffworks.com/search.php - The HowStuffWorks search results page ends in php.
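As a rough illustration of how a program could pick the extension out of URLs like the ones above, here is a minimal Python sketch. The helper name `url_extension` is made up for this example; it simply takes whatever follows the last dot in the path portion of the URL.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def url_extension(url: str) -> str:
    """Return the file extension of the URL's path, without the leading dot."""
    path = urlparse(url).path  # e.g. "/web-page.htm"
    return PurePosixPath(path).suffix.lstrip(".")


for url in (
    "http://computer.howstuffworks.com/web-page.htm",
    "http://www.cbsnews.com/sections/home/main100.shtml",
    "http://www.howstuffworks.com/search.php",
):
    print(url_extension(url))
```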
When the Web started, it ran almost exclusively on UNIX machines and all pages were static. The standard file extension was html. When people started using PCs running DOS or Windows as Web servers, however, the four letters in "html" were problematic. PCs followed an 8.3 naming convention, which allowed up to eight characters in the file name and only three in the extension. So the world made room for two standard extensions: html and htm. It used to be that you could tell whether a Web site was running on UNIX or Windows by looking at the file extension, but now there is no distinction. HowStuffWorks runs off a UNIX server but uses "htm" as its extension -- it's the webmaster's choice.
The shtml extension indicates that "Server Side Includes" (SSI) are being used on the server. Pages ending in htm and html are static: the file is lifted off the server's disk and sent verbatim to the client. With SSI, a page can contain tags indicating that another file should be inserted in place of the tag. The server lifts the page off its disk, makes all the substitutions indicated and then sends the final page to the client. This approach makes it very easy to change things like headers and footers on pages across an entire site.
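The substitution step can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular Web server implements SSI. It assumes the `<!--#include file="..." -->` directive form, and the function name `expand_includes` and the file `footer.html` are invented for the example.

```python
import re
import tempfile
from pathlib import Path

# An SSI-style include directive, e.g. <!--#include file="footer.html" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page: str, root: Path) -> str:
    """Replace each include directive with the contents of the named file."""
    def substitute(match: re.Match) -> str:
        return (root / match.group(1)).read_text()

    return INCLUDE.sub(substitute, page)


# Demo: build a throwaway site directory with one shared footer.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "footer.html").write_text("<p>Copyright notice</p>")
    page = '<h1>Hello</h1>\n<!--#include file="footer.html" -->'
    print(expand_includes(page, root))
```

Because every page pulls in the same footer file, editing that one file changes the footer everywhere.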
Active Server Pages (asp) is a Microsoft technology that allows even more flexibility. A Web page can contain Visual Basic code that the server executes when it lifts a page off the disk. This code can do just about anything -- read databases, run other programs, custom-format pages based on the user's ID and so on. You have a great deal of flexibility. On the other hand, your Web pages now contain code that may have bugs in it, so it is possible for a page to "crash." With freedom comes responsibility...
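To make the idea of code embedded in a page concrete, here is a toy sketch in Python rather than Visual Basic. It is not ASP itself: the `<%= ... %>` delimiter is borrowed from ASP's output tag, while the `render` function and the page contents are assumptions made up for this illustration. It also shows how a bug in the embedded code stops the page from being produced.

```python
import re

# Delimiter modeled on ASP's <%= expression %> output tag (illustrative only).
EXPR = re.compile(r"<%=\s*(.*?)\s*%>")


def render(page: str, context: dict) -> str:
    """Evaluate each embedded expression and splice the result into the page."""
    def evaluate(match: re.Match) -> str:
        # eval() keeps the sketch short; never run untrusted code this way.
        return str(eval(match.group(1), {"__builtins__": {}}, context))

    return EXPR.sub(evaluate, page)


page = "<p>Hello, <%= user.upper() %>! You have <%= item_count %> items.</p>"
print(render(page, {"user": "ada", "item_count": 3}))

# A bug in the embedded code means the page "crashes" instead of rendering.
try:
    render("<p><%= 1 / 0 %></p>", {})
except ZeroDivisionError as err:
    print("page crashed:", err)
```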
(Note that it is now becoming common to see jsp and php extensions as well. JSP is one of the latest additions to the Java Enterprise suite of APIs. "JSP" stands for "Java Server Pages" and is effectively Java's response to ASP. The code embedded in a page is Java rather than Visual Basic. "PHP" used to stand for "Personal Home Page," but now it's really just "PHP," which is a scripting language that's mostly used with Linux.)
The pl extension indicates Perl, a scripting language. The page contains nothing but Perl script, and the script builds the page on the fly. As with asp pages, the script can do just about anything.
The cgi extension also means that a page contains code executed by the server, but the type of code can be just about anything. On HowStuffWorks, C++ code is compiled to create "cgi" files (see How CGI Scripting Works).
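For a sense of what such a program does, here is a minimal CGI-style sketch. It is written in Python rather than C++ and is not the HowStuffWorks code. Under the CGI convention, the server passes the URL's query string in the `QUERY_STRING` environment variable and sends whatever the program prints back to the browser. The form field name `suggestion` and the function `build_response` are assumptions for this example.

```python
import html
import os
from urllib.parse import parse_qs


def build_response(query_string: str) -> str:
    """Build a full CGI response: a header, a blank line, then the page."""
    params = parse_qs(query_string)
    # "suggestion" is a hypothetical form field name.
    suggestion = params.get("suggestion", ["(none)"])[0]
    body = (
        "<html><body><p>You suggested: "
        + html.escape(suggestion)
        + "</p></body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The Web server sets QUERY_STRING before running the program.
    print(build_response(os.environ.get("QUERY_STRING", "")))
```

The page never exists as a file on disk. The program assembles it on each request, which is also what "builds the page on the fly" means for Perl scripts.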