Number 211 - December 2000
Self-conscious Content
by Glenn Fleishman, from WEB WATCHER Column, Adobe Magazine September - October 2000
    Form and content, we know, are two separate concepts, but that's not necessarily true when you're dealing with the Web. Form, of course, defines appearance and structure; content is the actual meaning inside that structure. Form can help make content more understandable by providing consistency, but information without differentiation can be a rather meaningless laundry list.

    HTML, the language of Web pages, has a limitation dating back to its introduction: it compresses form and content into a single unit. The tags that you use to define the appearance of a Web page identify the content to some extent, but only loosely. A heading 1 (<H1>) tag doesn't convey much meaning about the text other than "this item is important and it's a headline, so make it big."

    But a Web page may contain categories of information with distinct meanings. On a product page, you might have a stock number, details about availability, price, and an item description. In a news article, you could have an abstract, a dateline, and keywords, as well as the article itself. But the browser, reading the HTML, doesn't know any of that--it simply knows that it must format one section like this, and another section like that.

    Enter XML: Extensible Markup Language. XML is a generic way to define a markup language, such as HTML, so that it can describe the content. (HTML, in fact, has been rewritten to be XML-compatible in a new specification called XHTML.) The ability to define content is really useful in the exchange of information between businesses. For instance, car dealers and automobile manufacturers could band together and write an XML specification (called a DTD, or Document Type Definition) that covered the details of cars: manufacturers, models, features, gas mileage, and so on. When dealers wanted to order cars, they would know that all of the items in their databases corresponded exactly to those in the manufacturers' databases. Previous efforts to create this kind of standardization have succeeded in part, but XML has pretty widespread support because it's pure structure--there's no legacy attached to it, and no specific purpose for which it's intended.
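
    To make the car example concrete, here is a minimal sketch in Python, using only the standard library. The element names (inventory, car, manufacturer, model, mpg) and the sample data are invented for illustration; they do not come from any real industry DTD. Because the tags name the content rather than its appearance, a program can ask for "models with at least this gas mileage" directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names and values are illustrative only,
# not taken from any real automotive DTD.
CAR_XML = """
<inventory>
  <car>
    <manufacturer>Example Motors</manufacturer>
    <model>Roadster</model>
    <mpg>31</mpg>
  </car>
  <car>
    <manufacturer>Example Motors</manufacturer>
    <model>Hauler</model>
    <mpg>18</mpg>
  </car>
</inventory>
"""

def models_with_min_mpg(xml_text, min_mpg):
    """Return model names whose <mpg> value is at least min_mpg."""
    root = ET.fromstring(xml_text.strip())
    return [
        car.findtext("model")
        for car in root.findall("car")
        if int(car.findtext("mpg")) >= min_mpg
    ]

print(models_with_min_mpg(CAR_XML, 25))
```

    The same query against HTML would have to guess which bold or table-cell text happened to be the mileage figure.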

    But you're saying, "Ho-hum. XML makes databases easier to manage and streamlines industry. Thrilling." That does sound pretty dull. The exciting part is that once you have XML in place as a standard for identifying content, suddenly all programs--browsers, for example, or layout programs, or database applications--have the potential to become smart about the content they're handling.

    What do I mean by smart? XML-enabled applications could act on specific information embedded in XML format on a Web page. Let's say you're interested in bidding on a Tickle Me Elmo doll in any online auction that happens to have one. You could tell your XML-enabled auction-monitor software to watch for the dolls at five specific auction sites, and it could continually search, retrieve results, and give you a stock-ticker-like display of products and availability.
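
    The auction monitor might look roughly like the Python sketch below. Everything specific in it is an assumption: the site names, the listings/item/title/bid element names, and the prices are made up, and the feeds are inline strings so the example runs offline, where a real tool would fetch them over the Web.

```python
import xml.etree.ElementTree as ET

# Hypothetical feeds: site names, element names, and bids are invented.
FEEDS = {
    "auction-site-a": (
        "<listings>"
        "<item><title>Tickle Me Elmo</title><bid>24.50</bid></item>"
        "<item><title>Lava Lamp</title><bid>9.00</bid></item>"
        "</listings>"
    ),
    "auction-site-b": (
        "<listings>"
        "<item><title>Tickle Me Elmo doll, boxed</title><bid>31.00</bid></item>"
        "</listings>"
    ),
}

def watch(feeds, keyword):
    """Return (site, title, bid) tuples for items whose title contains keyword."""
    hits = []
    for site, xml_text in feeds.items():
        for item in ET.fromstring(xml_text).findall("item"):
            title = item.findtext("title", default="")
            if keyword.lower() in title.lower():
                hits.append((site, title, float(item.findtext("bid"))))
    # Lowest current bid first, like a ticker sorted by price.
    return sorted(hits, key=lambda hit: hit[2])

for site, title, bid in watch(FEEDS, "tickle me elmo"):
    print(f"{site}: {title} at ${bid:.2f}")
```

    The monitor never needs to know how each site lays out its pages; it only needs the sites to agree on what the tags mean.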

    Or consider your favorite money-management program. If it were XML-enabled, and if you had accounts at different banks that offered XML-enabled account viewing, the program could retrieve data from all of them via the Web in a seamless fashion. None of this "export data in format X, import into the program, answer these questions, delete the stuff you don't need." Instead, the program could merge the data from its own local accounts and transactions with the remote data and display a merged set.
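
    A rough Python sketch of that merge step follows. The statement format (a statement element with transaction children carrying date and amount attributes) is a hypothetical one, not any bank's actual format, and the local records are sample data.

```python
import xml.etree.ElementTree as ET

# Transactions already held by the (hypothetical) money-management program.
LOCAL = [{"date": "2000-09-01", "account": "cash", "amount": -12.00}]

# Hypothetical bank statement; element and attribute names are invented.
BANK_XML = """<statement account="checking">
  <transaction date="2000-09-03" amount="-40.00"/>
  <transaction date="2000-08-30" amount="1500.00"/>
</statement>"""

def merge(local, bank_xml):
    """Combine local records with a bank's XML statement, sorted by date."""
    root = ET.fromstring(bank_xml)
    remote = [
        {
            "date": t.get("date"),
            "account": root.get("account"),
            "amount": float(t.get("amount")),
        }
        for t in root.findall("transaction")
    ]
    # ISO-style dates sort correctly as plain strings.
    return sorted(local + remote, key=lambda t: t["date"])

for t in merge(LOCAL, BANK_XML):
    print(t["date"], t["account"], f"{t['amount']:.2f}")
```

    No export or import step appears anywhere: the program reads the bank's data by name and folds it into its own.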

    XML is starting to appear practically everywhere in the electronic world, making it easier for Web users to work in many different programs, exchange data, and get things done more efficiently. It's really the ultimate case of form following function: content becomes paramount--which is what it's all about in the end.

    Glenn Fleishman is coauthor, with Jeff Carlson, of Real World Adobe GoLive 5 (Adobe Press). You can reach him at glenn@glennf.com.