Semantic Rubbish

M-iD, January 2005

The so-called 'semantic web' is a flawed initiative that will never catch on, despite certain techies' enthusiasm, believes Rob Buckley

A celebrity endorsement can often make a bad product sell well. Sir Tim Berners-Lee, 'inventor of the web', may be an unlikely choice to sell anything, but both he and others seem convinced that his promotion of a standard will somehow make it worth adopting. Yet, the 'semantic web' has all the hallmarks of the Sinclair C5 and spray-on hair.

The semantic web is a way for web authors to describe a page's content so that search engines know how to classify it and how it relates to other pages. It sounds simple enough, but there are a number of problems.

First, it uses a new mark-up language and classification system made up of the Resource Description Framework (RDF) and the Web Ontology Language (OWL). RDF and OWL are languages only a machine could love: they can be immensely complex for humans to generate correctly and to interpret. So anyone wanting to produce RDF and OWL mark-up will almost certainly need a tool to generate that code and to plot the interrelationships between documents.
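To give a sense of what such a tool has to produce, here is a minimal sketch in Python. It assumes a page described with a handful of Dublin Core properties in RDF/XML. The `describe_page` function and the example URL are hypothetical. The RDF and Dublin Core namespace URIs are the published ones. No OWL classification is attempted; that would add an ontology on top of this.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for RDF and the Dublin Core element set
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Make the serialiser use the conventional rdf: and dc: prefixes
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def describe_page(url: str, title: str, creator: str, subject: str) -> str:
    """Hypothetical helper: return an RDF/XML description of one web page."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:about names the resource (the page) that the statements are about
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": url})
    # One Dublin Core property element per piece of metadata
    for prop, value in (("title", title), ("creator", creator), ("subject", subject)):
        ET.SubElement(desc, f"{{{DC_NS}}}{prop}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(describe_page("http://www.example.com/semantic-rubbish",
                        "Semantic Rubbish", "Rob Buckley", "semantic web"))
```

Even this trivial record needs two namespaces and a URI for the page itself. Relating it to other documents through an ontology multiplies that work, which is why hand-coding it is unrealistic for most authors.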

Simple tools
Much of the web is created by people using simple design tools that have no site-management capabilities. Even if those tools had the necessary classification features, few of these authors would have the skills or the time to classify a document correctly within an ontology. So a large portion of the web is almost guaranteed never to join the semantic web.

The rest of the web, however, is produced by companies capable of putting together complicated sites. Many of them have access to tools that can automatically generate 'meta tags', which amount to a 'semantic web lite'. Meta tags sit at the beginning of HTML files and contain information such as the author, keywords, the page's relationships with other pages and a human-readable description of the content. Yet few companies go as far as full meta tag optimisation, because they know it does not help them achieve a high search ranking.
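By comparison, meta tags are simple to read. Below is a minimal sketch using Python's standard-library `html.parser`. It assumes tags of the form `<meta name="..." content="...">` and only collects those name/content pairs. The `MetaTagReader` class, the `read_meta_tags` helper and the sample page are illustrative, not taken from any real site.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta_tags(html: str) -> dict[str, str]:
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    return reader.meta


# Illustrative page head with author, keywords and description tags
SAMPLE = """<html><head>
<title>Semantic Rubbish</title>
<meta name="author" content="Rob Buckley">
<meta name="keywords" content="semantic web, RDF, OWL, metadata">
<meta name="description" content="Why the semantic web will not catch on">
</head><body>...</body></html>"""

if __name__ == "__main__":
    for name, content in read_meta_tags(SAMPLE).items():
        print(f"{name}: {content}")
```

Self-closing tags such as `<meta ... />` are also handled, because `HTMLParser` passes them to `handle_starttag` by default.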

The biggest search engines, including Yahoo! and Google, ignore meta tags in favour of data that they believe classifies pages better, such as a page's content, the sites that link to it and the text used in those links. Only the less sophisticated search engines use meta tags as a basis for understanding a page's content. As a result, their listings are frequently abused by companies that insert false descriptions into meta tags to gain higher rankings and, therefore, more traffic.
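Why stuffing meta tags pays off on a naive engine can be shown with a deliberately simplistic toy in Python. Neither scoring function below is how Yahoo!, Google or any real search engine ranks pages. The `Page` class, `meta_score`, `anchor_score`, `rank` and the example sites are all hypothetical. One scorer trusts the author's keywords tag; the other counts links from other sites whose anchor text mentions the query.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    url: str
    meta_keywords: list[str]
    # Anchor text of links pointing at this page from other sites
    inbound_anchor_texts: list[str] = field(default_factory=list)


def meta_score(page: Page, term: str) -> int:
    """Naive engine: trust whatever the author put in the keywords tag."""
    return sum(1 for kw in page.meta_keywords if kw.lower() == term.lower())


def anchor_score(page: Page, term: str) -> int:
    """Toy off-page signal: how many inbound links use the term."""
    return sum(1 for text in page.inbound_anchor_texts if term.lower() in text.lower())


def rank(pages: list[Page], term: str, scorer: Callable[[Page, str], int]) -> list[str]:
    """Return page URLs ordered from highest to lowest score."""
    return [p.url for p in sorted(pages, key=lambda p: scorer(p, term), reverse=True)]


honest = Page("honest.example", ["semantic web"],
              ["semantic web primer", "intro to the semantic web"])
# The spammer repeats the keyword and has no inbound links
spammer = Page("spam.example", ["semantic web"] * 20 + ["cheap pills"])
pages = [honest, spammer]

if __name__ == "__main__":
    print(rank(pages, "semantic web", meta_score))    # ['spam.example', 'honest.example']
    print(rank(pages, "semantic web", anchor_score))  # ['honest.example', 'spam.example']
```

The page author controls the first signal completely, while the second depends on what other sites say, which is harder (though not impossible) to fake.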

There is no reason to believe that, if the world switched from meta tags to RDF/OWL mark-up, search engines would be able to use it more reliably, or that unscrupulous companies would not abuse it.

So if the web at large is not going to become semantic, will intranets make the switch? Not likely. Here the problem is that the semantic web has arrived too late for most organisations. At its heart, the semantic web is a metadata and ontology standard, and any organisation with enough data to need a metadata standard and a classification system will almost certainly have chosen them already.

Enterprise content management (ECM) systems now all come with their own metadata systems, which their search engines can hook into and which do not require the data to be embedded in the document. They also have their own classification systems that organisations can customise.

Most ECM systems can also read other vendors' metadata repositories, so there is little need for a neutral standard for integration. If an external metadata standard is needed, it will probably be mandated by legislation, such as the e-GMS, Dublin Core or similar standards.

Rob Buckley – Freelance Journalist and Editor
