Taxing taxonomies

M-iD, February 2005

A growing number of information managers are implementing taxonomies in a bid to improve customer retention and employee efficiency.


“Some companies are put off by any kind of manual work to add bits to taxonomies. But I do think that once a taxonomy is there, it can be updated by various content experts around an organisation. It doesn't have to be centralised, so it's not really such a huge job,” says Ahearn.

In particular, employees outside the IT department need to be involved in constructing taxonomies if they are to reflect the way the company actually does business, he adds.

The downside to this manual approach to construction, however, is that an organisation may need more than one taxonomy. A PC maker, for instance, would need one taxonomy to filter results for its call centres, another for its hardware repairs and yet another for its sales department.

Buying in a larger base taxonomy can, therefore, often prove more cost-effective, since it can be broken down into several classification systems. But as the number of different classes of worker needing to access corporate data grows, so could the number of taxonomies.

Personal agents
This is where personal agent software from companies such as Autonomy comes in. Such software can generate personalised taxonomies by learning from the kinds of search each user performs: searches for technical details on products will produce a taxonomy suited to hardware repairs, while searches for product features will produce a more sales-friendly profile.
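As a toy illustration of the idea, the sketch below picks between two taxonomy profiles by counting how often a user's past searches contain words associated with each. It is not how Autonomy's agents work: the profile names and keyword lists (`PROFILE_KEYWORDS`) and the function `choose_profile` are hypothetical, and a real agent would learn these associations from usage rather than rely on fixed lists.

```python
from collections import Counter

# Hypothetical keyword lists for two hypothetical taxonomy profiles.
# A real personal agent would learn these associations from usage data.
PROFILE_KEYWORDS = {
    "hardware-repair": {"specification", "voltage", "driver", "firmware", "fault", "replace"},
    "sales": {"feature", "features", "price", "warranty", "compare", "bundle", "offer"},
}


def choose_profile(search_history):
    """Return the profile whose keywords occur most often in a user's past searches."""
    counts = Counter()
    for query in search_history:
        for word in query.lower().split():
            for profile, keywords in PROFILE_KEYWORDS.items():
                if word in keywords:
                    counts[profile] += 1
    if not counts:
        return None  # no evidence yet: fall back to a shared default taxonomy
    return counts.most_common(1)[0][0]


history = ["firmware fault on model X200", "replace power driver"]
print(choose_profile(history))  # hardware-repair
```

Returning `None` when there is no evidence reflects one design choice: until the agent has learned something about a user, that user sees the same shared taxonomy as everyone else.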

Torstein Thorsen, vice president of technical sales at search engine company Fast Search & Transfer, says that many companies prefer this computer-generated approach to taxonomies. “People are moving away from huge, manually created, structured taxonomies towards computer-created, flatter taxonomies: they provide more ease-of-use for normal information users.”

However, Ahearn argues that personalising taxonomies creates a problem when guiding others to the same documents: a user can no longer be sure that a document will appear in the same place in another user's search, or even in his or her own results when repeating the same search on a different machine. So automated taxonomy generation almost always needs to be combined with manual taxonomies, typically with the manual taxonomy supplying the higher-level categories under which the automated system generates sub-categories, where possible.
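One way to picture that division of labour is the minimal sketch below, which assumes a fixed, manually curated set of top-level categories and accepts machine-suggested sub-categories only when they sit under one of them. The category names, `MANUAL_TOP_LEVEL` and `attach_generated` are hypothetical, chosen for illustration rather than taken from any product.

```python
# Hypothetical manually curated top level: identical for every user and machine.
MANUAL_TOP_LEVEL = frozenset({"Products", "Support", "Sales"})


def attach_generated(taxonomy, suggestions):
    """Merge machine-suggested (parent, child) pairs into a taxonomy.

    Only suggestions whose parent is a manually defined top-level category are
    accepted, so automation can add sub-categories but never reshape the top level.
    Returns the rejected suggestions so a person can review them.
    """
    rejected = []
    for parent, child in suggestions:
        if parent in MANUAL_TOP_LEVEL:
            taxonomy.setdefault(parent, set()).add(child)
        else:
            rejected.append((parent, child))
    return rejected


taxonomy = {category: set() for category in MANUAL_TOP_LEVEL}
rejected = attach_generated(
    taxonomy,
    [("Support", "Battery faults"), ("Gadgets", "Phones"), ("Sales", "Bundles")],
)
print(sorted(taxonomy["Support"]), rejected)
# ['Battery faults'] [('Gadgets', 'Phones')]
```

Because the top level never changes, a colleague can still be told to look under "Support", however the automatically generated sub-categories beneath it evolve.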

The other main aspect of implementing a taxonomy is document categorisation. Each document needs to be classified as belonging to particular categories in any given taxonomy. With an ECM system, it can be quite easy to ensure that all new documents are categorised, since the system can require staff to tag each document with metadata as it is added.
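The enforcement step can be sketched as a check-in rule that refuses any new document lacking valid category tags. This is a minimal illustration, not the API of any ECM product: `Document`, `check_in`, `MissingMetadataError` and the category names in `TAXONOMY_CATEGORIES` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary taken from the organisation's taxonomy.
TAXONOMY_CATEGORIES = {"call-centre", "hardware-repair", "sales"}


@dataclass
class Document:
    title: str
    categories: set = field(default_factory=set)


class MissingMetadataError(ValueError):
    """Raised when a new document lacks valid taxonomy tags."""


def check_in(repository, doc):
    """Add a new document to the repository only if it carries valid category tags."""
    if not doc.categories:
        raise MissingMetadataError(f"{doc.title!r} has no category tags")
    unknown = doc.categories - TAXONOMY_CATEGORIES
    if unknown:
        raise MissingMetadataError(f"{doc.title!r} uses unknown categories: {sorted(unknown)}")
    repository.append(doc)


repo = []
check_in(repo, Document("Repair guide", {"hardware-repair"}))  # accepted
try:
    check_in(repo, Document("Untitled memo"))  # rejected: no tags
except MissingMetadataError as err:
    print(err)
```

A rule like this only runs when a document is checked in, which is why it does nothing for content that was already stored before the taxonomy existed.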

But this does not help classify existing documents. While manual tagging of existing documents is a possible solution, it is usually highly impractical for organisations of any size and age, particularly those with a high information investment. Forrester Research analyst Laura Ramos says that consequently, organisations that adopt a manual tagging approach rarely attempt to deal with legacy content.

Automated tagging, with some degree of human oversight, is therefore the most effective approach, she says. Like Google and other modern search engines, automated tagging systems attempt to 'understand' the content of documents to see how they fit into the taxonomy. 'Bayesian' techniques from Autonomy and others assume that the rarest words in a document give the best indication of its meaning, and use those words to categorise the document. Others, such as Fast, use linguistic analysis to try to extract the documents' meaning.
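The intuition that rare words carry the most meaning can be shown with a much simpler technique than the one the article names. The sketch below uses inverse document frequency (IDF) weighting: it is not Bayesian inference and not Autonomy's algorithm. It picks a document's rarest words across a corpus and scores each category by how many of them appear in that category's term list. The corpus, category term lists and function names are hypothetical.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Inverse document frequency: words found in fewer documents get higher weights."""
    n = len(corpus)
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(text.lower().split()))
    return {word: math.log(n / count) for word, count in doc_freq.items()}


def categorise(text, corpus, category_terms, top_k=3):
    """Assign the category whose terms best match the document's rarest words.

    Returns None when nothing matches, so the document can go to a human reviewer.
    Words absent from the corpus get weight 0 in this sketch.
    """
    idf = idf_weights(corpus)
    words = set(text.lower().split())
    # Rarest words first (highest IDF); ties broken alphabetically for repeatability.
    rare = sorted(words, key=lambda w: (-idf.get(w, 0.0), w))[:top_k]
    scores = {
        category: sum(idf.get(w, 0.0) for w in rare if w in terms)
        for category, terms in category_terms.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical corpus and category terms, for illustration only.
corpus = [
    "the printer firmware crashed",
    "the laptop warranty price",
    "the printer price",
    "the laptop firmware update",
]
category_terms = {
    "hardware-repair": {"firmware", "crashed", "driver"},
    "sales": {"price", "warranty", "discount"},
}
print(categorise("the printer firmware crashed", corpus, category_terms))  # hardware-repair
```

A word such as "the", which appears in every document, gets a weight of zero and so contributes nothing. Returning `None` for unmatched documents is one simple way to build in the human oversight Ramos describes.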


Rob Buckley – Freelance Journalist and Editor
