How Semantic Web Works


The World Wide Web is an interesting paradox -- it's made with computers but for people. The sites you visit every day use natural language, images and page layout to present information in a way that's easy for you to understand. Even though they are central to creating and maintaining the Web, the computers themselves really can't make sense of all this information. They can't read, see relationships or make decisions like you can.

The Semantic Web proposes to help computers "read" and use the Web. The big idea is pretty simple -- metadata added to Web pages can make the existing World Wide Web machine readable. This won't bestow artificial intelligence or make computers self-aware, but it will give machines tools to find, exchange and, to a limited extent, interpret information. It's an extension of, not a replacement for, the World Wide Web.

That probably sounds a little abstract, and it is. While some sites are already using Semantic Web concepts, a lot of the necessary tools are still in development. In this article, we'll bring the concepts and tools behind the Semantic Web down to earth by applying them to a galaxy far, far away.

Why Semantic Web?

Suppose you want to buy a "Star Wars Trilogy" boxed set online, and you have some basic criteria for your purchase. First, you want widescreen, not full-screen, DVDs, and you want the set that has the extra disc of bonus materials. Second, you want the lowest available price, but you'd prefer to buy a new set, not a used one. Finally, you don't want to pay too much for shipping and handling, but you also don't want to wait too long for delivery.

At this point in the evolution of the Web, your best bet would be to look at different retailers' Web pages, comparing prices and shipping times and rates. You could also look for a site that will compare prices and shipping options from several retailers all at once. Either way, you have to do most of the virtual legwork, then make your buying decision and place your order yourself.

With the Semantic Web, you'd have another option. You could enter your preferences into a computerized agent, which would search the Web, find the best option for you, and place your order. The agent could then open personal finance software on your computer and record the amount you spent, and it could mark the date your DVDs should arrive on your calendar. Your agent would also learn your habits and preferences, so if you had a bad experience buying from one particular site it would know not to use that site again.

The agent would do this not by looking at pictures and reading descriptions like a person does, but by searching through metadata that clearly identify and define what the agent needs to know. Metadata are simply machine-readable data that describe other data. In the Semantic Web, metadata are invisible as people read the page, but they're clearly visible to computers. Metadata can also allow more complex, focused Web searches with more accurate results. To paraphrase Tim Berners-Lee, inventor of the World Wide Web, these tools will let the Web -- currently similar to a giant book -- become a giant database.

We'll look at the tools that can make documents machine readable next.

Marking Up: XML and RDF

An RDF triple has a subject (Anakin Skywalker), an object (Luke Skywalker) and a property that unites the two.

Let's say you want to make this sentence readable to a computer:

Anakin Skywalker is Luke Skywalker's father.

It's easy for you to figure out what this sentence means -- Anakin and Luke Skywalker are both people, and there is a relationship between them. You know that a father is a type of parent, and that the sentence also means that Luke is Anakin's son. But a computer can't figure any of that out without help. To allow a computer to understand what this sentence means, you'd need to add machine-readable information that describes who Anakin and Luke are and what their relationship is. This starts with two tools -- eXtensible Markup Language (XML) and Resource Description Framework (RDF).

XML is a markup language like hypertext markup language (HTML), which you're probably somewhat familiar with from surfing the Web. HTML governs the appearance of the information you look at on the Web. XML complements (but does not replace) HTML by adding tags that describe data. These tags are invisible to the people who read the document but visible to computers. Tags are already in use on the Web, and existing bots, like the bots that collect data for search engines, can read them.

RDF does exactly what its name indicates -- using XML tags, it provides a framework to describe resources. In RDF terms, pretty much everything in the world is a resource. This framework pairs the resource (any noun, like Anakin Skywalker or the "Star Wars" trilogy) with a specific item or location on the Web so the computer knows exactly what the resource is. Clearly identifying resources keeps the computer from doing things like confusing Anakin Skywalker with Sebastian Shaw or Hayden Christensen, or the original trilogy with the One-Man "Star Wars" Trilogy.

To do this, RDF uses triples, written as XML tags, that express the information as a graph. Each triple consists of a subject, property and object, which are like the subject, verb and direct object of a sentence. (Some sources call these the subject, predicate and object.) RDF already exists on the Web -- for example, the RSS 1.0 feed format is built on it.
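
To make the subject-property-object idea concrete, here is a minimal sketch in Python. It is not RDF itself: plain strings stand in for resources, the names Triple and objects_of are made up for this example, and the second triple (about Leia) is added only to show that a graph can hold many statements.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, property, object."""
    subject: str
    prop: str
    obj: str


# Plain strings stand in for resources here; real RDF would use URIs.
# The Leia triple is an illustrative addition, not part of the article's example.
graph = [
    Triple("Anakin Skywalker", "is father of", "Luke Skywalker"),
    Triple("Anakin Skywalker", "is father of", "Leia Organa"),
]


def objects_of(graph, subject, prop):
    """Return every object linked to `subject` by `prop`."""
    return [t.obj for t in graph if t.subject == subject and t.prop == prop]


children = objects_of(graph, "Anakin Skywalker", "is father of")
print(children)
```

Because every statement has the same three-part shape, a program can answer a question like "who are Anakin's children?" by matching on two parts and reading off the third.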

So far in this example, the computer knows that there are two objects in this sentence and that there is a relationship between them. But it doesn't know what the objects are or how they relate to one another. We'll look at the tool for adding this layer of meaning next.

Knowing What's What: URIs

A URI gives a computer a specific point of reference for each item in the triple -- there's no need for interpretation or potential for misunderstanding.

Even with the framework that XML and RDF provide, a computer still needs a very direct, specific way of understanding who or what these resources are. To do this, RDF uses uniform resource identifiers (URIs) to direct the computer to a document or object that represents the resource. You're already familiar with the most common form of URI -- the uniform resource locator (URL), which on the Web typically begins with http:// or https://. A URI can point to anything on the Web and may also point to objects that are not part of the Web, like appliances in computerized homes. Mailto, ftp and telnet addresses are some other examples of URIs.

For our example, we'll use the characters' pages at the official Star Wars site as their URIs.

Now the computer knows what the subject and object are -- Anakin Skywalker is the entity represented by the first URI, and Luke Skywalker is the entity represented by the second. But you'll notice that the middle URI in our triple -- the one for the property -- doesn't point to the Star Wars site. Instead, it points to a make-believe document on the HowStuffWorks server. If that page really existed, it would be our XML namespace.
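
Here is a rough sketch of what that triple could look like once every part is a URI. The three addresses below are placeholders invented for this example (they are not the real pages on the Star Wars site, and the "#fatherOf" ending is an assumption); only the HowStuffWorks base address comes from this article. The sketch also uses Python's standard urllib.parse module to show that mailto and ftp addresses are URIs with different schemes.

```python
from urllib.parse import urlparse

# Placeholder URIs for illustration only; the real character pages differ.
ANAKIN = "http://www.starwars.com/example/anakin-skywalker"
LUKE = "http://www.starwars.com/example/luke-skywalker"
# Hypothetical property URI built on the article's make-believe namespace.
FATHER_OF = "https://www.howstuffworks.com/example/RDF/relationship#fatherOf"

# Subject, property, object: each one is now an unambiguous identifier.
triple = (ANAKIN, FATHER_OF, LUKE)


def scheme_of(uri):
    """Return the scheme (the part before the first colon) of a URI."""
    return urlparse(uri).scheme


schemes = [scheme_of(uri) for uri in triple]
other_schemes = [
    scheme_of("mailto:luke@example.org"),
    scheme_of("ftp://example.org/files"),
]
print(schemes, other_schemes)
```

Two programs that both use the same URI for Anakin can be sure they mean the same resource, because they compare identifiers character by character instead of guessing from a name.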

Unlike HTML, which uses standard tags like <b> for bold and <u> for underline, XML doesn't have standard tags. This is useful -- it lets developers create unique tags for specific purposes. But it means that a browser doesn't automatically know what the tags mean. An XML namespace is basically a document that tells applications the meaning of all the tags in another document. The creator of an XML document declares the namespace at the beginning of the document with a line of code. In our example, our namespace declaration would look like this:

<rdf:RDF xmlns:hsw="https://www.howstuffworks.com/example/RDF/relationship">

That line of code says to the computer, "Any tags you see that begin with 'hsw' use the vocabulary found in this document. You can look up any tag beginning with 'hsw' here." That way, people can create the XML tags they need for a document without conflicting with other XML documents on the Web.
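
You can watch a parser apply that rule. The sketch below feeds a tiny RDF/XML document to Python's standard xml.etree.ElementTree module. The tag name fatherOf and the two starwars.com addresses are placeholders invented for this example; the hsw namespace is the make-believe one from this article, and the rdf namespace is the standard one for RDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HSW_NS = "https://www.howstuffworks.com/example/RDF/relationship"

# The two starwars.com addresses and the tag name "fatherOf" are placeholders.
document = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:hsw="{HSW_NS}">
  <rdf:Description rdf:about="http://www.starwars.com/example/anakin-skywalker">
    <hsw:fatherOf rdf:resource="http://www.starwars.com/example/luke-skywalker"/>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(document)
description = root[0]  # the <rdf:Description> element (the subject)
prop = description[0]  # the <hsw:fatherOf> element (the property)

# ElementTree replaces each prefix with "{namespace}" in front of the tag name.
subject = description.get(f"{{{RDF_NS}}}about")
obj = prop.get(f"{{{RDF_NS}}}resource")
print(prop.tag)
print(subject, obj)
```

The parser never sees "hsw" as meaningful on its own. It swaps the prefix for the full namespace, so a fatherOf tag from this vocabulary can't be confused with a fatherOf tag that someone else defines under a different namespace.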

XML and RDF are the "official language" of the Semantic Web, but by themselves they're not enough to make the entire Web accessible to a computer. We'll look at some of the other layers next.

Languages and Vocabularies: RDFS, OWL and SKOS

An example of a very small number of the resources and connections that might be found in a Star Wars ontology. You can figure these out on your own from watching the movies and surfing the Web, but a computer must have a clear outline to make sense of it.

Another obstacle for the Semantic Web is that computers don't have the kind of vocabulary that people do. You've used language your whole life, so it's probably easy for you to see connections between different words and concepts and to infer meanings based on contexts. Unfortunately, someone can't just give a computer a dictionary, an almanac and a set of encyclopedias and let the computer learn all this on its own. In order to understand what words mean and what the relationships between words are, the computer has to have documents that describe all the words and logic to make the necessary connections.

In the Semantic Web, this comes from schemata and ontologies. These are two related tools for helping a computer understand human vocabulary. An ontology is simply a vocabulary that describes objects and how they relate to one another. A schema is a method for organizing information. As with RDF tags, access to schemata and ontologies is included in documents as metadata, and a document's creator must declare which ontologies are referenced at the beginning of the document.

Schema and ontology tools used on the Semantic Web include:

  • RDF Schema (RDFS), the RDF vocabulary description language - RDFS adds classes, subclasses and properties to resources, creating a basic language framework. For example, the resource Dagobah is a member of the class planet. A property of Dagobah could be swampy.
  • Simple Knowledge Organization System (SKOS) - SKOS classifies resources in terms of broader or narrower, allows designation of preferred and alternate labels and can let people quickly port thesauri and glossaries to the Web. For example, in a Star Wars glossary, a narrower term for Sith Lord could be Darth Sidious and a broader term could be villain. Similarly, alternate labels for Han Solo might be nerf herder and laser brain.
  • Web Ontology Language (OWL) - OWL, the most complex layer, formalizes ontologies, describes relationships between classes and uses logic to make deductions. It can also construct new classes based on existing information. OWL is available in three levels of complexity -- Lite, Description Logic (DL) and Full.
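
The broader and narrower terms from the SKOS example can be sketched with two small lookup tables. This is only an illustration of the idea under simple assumptions, not the SKOS vocabulary itself: the dictionaries and the function names all_broader and preferred_label are made up for this example, and the terms come from the glossary example above.

```python
# Each term maps to its immediately broader term, as in a thesaurus.
broader = {
    "Darth Sidious": "Sith Lord",
    "Sith Lord": "villain",
}

# Alternate labels map back to one preferred label.
alt_labels = {
    "nerf herder": "Han Solo",
    "laser brain": "Han Solo",
}


def all_broader(term, broader):
    """Walk up the hierarchy and collect every broader term, nearest first."""
    found = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against an accidental loop in the data
            break
        seen.add(term)
        found.append(term)
    return found


def preferred_label(label, alt_labels):
    """Return the preferred label for a term, or the label itself if none is listed."""
    return alt_labels.get(label, label)


ancestors = all_broader("Darth Sidious", broader)
print(ancestors)
print(preferred_label("nerf herder", alt_labels))
```

With tables like these, a search for villains could turn up Darth Sidious even though the word "villain" never appears next to his name, and a search for "nerf herder" could lead to Han Solo.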

The trouble with ontologies is that they are very difficult to create, implement and maintain. Depending on their scope, they can be enormous, defining a wide range of concepts and relationships. Some developers prefer to focus more on logic and rules than on ontologies because of these difficulties. Disagreements regarding the roles these rules should play may be one potential pitfall for the Semantic Web.

Next, we'll tie it all together by looking at our original example -- those "Star Wars Trilogy" DVDs.

Tying It All Together

In our original example, we talked about buying "Star Wars" DVDs online. Here's how the Semantic Web could make the whole process easier:

  • Each site would have text and pictures (for people to read) and metadata (for computers to read) describing the DVDs available for purchase there.
  • The metadata, using RDF triples and XML tags, would make all the attributes of the DVDs (like condition and price) machine-readable.
  • When necessary, businesses would use ontologies to give the computer the vocabulary needed to describe all of these objects and their attributes. The shopping sites could all use the same ontologies, so all of the metadata would be in a common language.
  • Each site selling the DVDs would also use appropriate security and encryption measures to protect customers' information.
  • Computerized applications or agents would read all the metadata found at different sites. The applications could also compare information, verifying that the sources were accurate and trustworthy.
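
Once every site publishes the same attributes in a common vocabulary, the agent's decision can be quite simple. The sketch below assumes the metadata has already been collected into records; the Offer fields, the seller names, the prices and the 10-day delivery limit are all invented for this example, and a real agent would also have to fetch and verify the data.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Machine-readable attributes of one boxed set for sale (hypothetical fields)."""
    seller: str
    widescreen: bool
    bonus_disc: bool
    condition: str  # "new" or "used"
    price: float
    shipping: float
    delivery_days: int


# Made-up offers standing in for metadata gathered from four sites.
offers = [
    Offer("shop-a.example", True, True, "new", 49.99, 5.00, 7),
    Offer("shop-b.example", True, True, "used", 29.99, 4.00, 5),
    Offer("shop-c.example", False, True, "new", 39.99, 0.00, 3),
    Offer("shop-d.example", True, True, "new", 44.99, 6.00, 4),
]


def choose(offers, max_delivery_days=10):
    """Pick an offer: widescreen with bonus disc, new preferred, lowest total cost."""
    eligible = [
        o for o in offers
        if o.widescreen and o.bonus_disc and o.delivery_days <= max_delivery_days
    ]
    if not eligible:
        return None
    # Sort key: new sets come first (False sorts before True), then price plus shipping.
    return min(eligible, key=lambda o: (o.condition != "new", o.price + o.shipping))


best = choose(offers)
print(best.seller)
```

The full-screen set is filtered out no matter how cheap it is, and the used set loses to a new one because you said you'd prefer new. Changing the sort key is all it would take to teach the agent a different set of priorities.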

Of course, the Web is enormous, and adding all this metadata to existing pages is a huge undertaking. We'll look at this and some of the other potential hurdles for the Semantic Web next.

W3C and the Future of the Semantic Web

Like the World Wide Web, the Semantic Web is decentralized -- no one organization or agency has control over all of its rules and content. However, some people and organizations have taken leadership roles in the development of Semantic Web guidelines and protocols. These include the World Wide Web Consortium (W3C), its director Tim Berners-Lee and its member organizations. The W3C is not a research organization, so universities, other organizations and the public also play an active role in Semantic Web development.

Some areas of the World Wide Web have already incorporated Semantic Web components. These include RSS 1.0 feeds, which are built on RDF, and the Friend-of-a-Friend (FOAF) project, which proposes to create machine-readable personal Web pages.

But much of the Semantic Web's function and practicality are still in development, and there are some pretty big obstacles to overcome. Decentralization gives developers the freedom to create precisely the tags and ontologies that they need. But it also means that different developers might use different tags to describe the same thing, which could make machine comparisons difficult. Critics also question the "identity problem" -- does a URI represent a Web page, or does it represent the concept or object the page describes? For example, is "http://www.starwars.com" meant to represent the "Star Wars" films, or just the Web page?
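
The different-tags problem is easy to picture with a small sketch. Suppose two shops describe the same attribute with two different property names. The URIs, the shop names and the mapping table below are all hypothetical, and someone would have to write and maintain a table like this by hand or with an ontology.

```python
# Hypothetical property URIs that two independent sites use for the same idea.
EQUIVALENT = {
    "http://shop-a.example/terms#price": "price",
    "http://shop-b.example/vocab#cost": "price",
}

# Triples as (subject, property, object), with made-up subjects and values.
triples = [
    ("http://shop-a.example/item/1", "http://shop-a.example/terms#price", 49.99),
    ("http://shop-b.example/item/7", "http://shop-b.example/vocab#cost", 44.99),
]


def normalize(triples, equivalent):
    """Rewrite known property names to a shared name; leave unknown ones alone."""
    return [(s, equivalent.get(p, p), o) for s, p, o in triples]


normalized = normalize(triples, EQUIVALENT)
prices = [o for _, p, o in normalized if p == "price"]
print(prices)
```

Without the mapping, a program comparing these two offers would see two unrelated properties and have nothing to compare. Every new vocabulary adds another entry that someone has to agree on.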

Some developers disagree on whether the Semantic Web should rely more heavily on rules or on ontologies. Critics also say that the project is enormously impractical. First, people don't actually think in terms of the graphs that RDF uses. Second, it seems unlikely that businesses and existing sites will actually devote the time and resources it would take to add all the necessary metadata. In the future, off-the-shelf software might include options for adding metadata when creating new documents, but that tool still might not make the project feasible on a larger scale.
