Rob Buckley – Freelance Journalist and Editor

New seekers

A new generation of search tools aims to help corporate users find valuable content and information buried deep on their hard drives.

A magnifying glass, a text box and a talkative dog: the Windows file search function is one of the most recognised, necessary and loathed parts of the operating system. It is slow, unreliable and extremely limited, a fact that becomes glaringly obvious when it is compared with today's Internet search engines, which can return excellent results almost instantaneously.

If some enterprising company had decided to sell a superior file search system to frustrated Windows users, no one would have been surprised. That a whole flurry of companies, ranging from Google and Yahoo to Microsoft's own MSN subsidiary, should do so, and all for free, is the surprise.

The first desktop search tools of note came from AltaVista and Verity in 1997, but both were soon abandoned. AltaVista launched an improved version of its Desktop Search in 2002. This provided a simple way to search the Web and the PC at the same time, but again it was more or less ignored. It was the launch of Google Desktop Search in October 2004, backed by Google's global strength, that really changed the market. By December, MSN, Ask Jeeves and Yahoo had all announced plans to release similar technology.

According to Angela Ashenden, an Ovum analyst, the motivation for this new focus on the desktop by search engine companies is reasonably clear. “It's a continuing battle for loyal customers, really. If you've got someone searching from the desktop rather than going to a web site, then they're going to be that much more loyal. Where your revenue is from advertising, getting loyal customers is key,” she says.

For MSN, however, there is an additional motivation. “Microsoft feels increasingly threatened by Google and what Google might do. The only real challenge to Microsoft comes from Google: it never knows what Google's going to do next.”

The current desktop search systems all work in essentially the same way. First, they look for files in formats they understand. They then index each file, much as a database system would, 'summarising' it and storing the result in a database on the local hard drive. When the user performs a search, the system compares the query against that database rather than against the files themselves, and that is what enables it to return relevant results.
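The index-then-query approach described above is usually built around an inverted index, which maps each word to the files that contain it. The sketch below is a minimal illustration under stated assumptions: the file paths and texts are hypothetical, the text is assumed to have already been extracted from each file, and real products add ranking, stemming and far richer format parsing.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of file paths that contain it.

    `documents` is a dict of path -> extracted text, standing in for the
    files a real tool would find and parse on the hard drive (an assumption).
    """
    index = defaultdict(set)
    for path, text in documents.items():
        for word in tokenize(text):
            index[word].add(path)
    return index


def search(index, query):
    """Return the paths containing every word in the query (an AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical files and their extracted text.
docs = {
    "C:/Docs/budget.doc": "Quarterly budget for the sales team",
    "C:/Docs/memo.txt": "Sales meeting moved to Friday",
    "C:/Mail/offsite.msg": "Agenda for the team offsite",
}
idx = build_index(docs)
print(sorted(search(idx, "sales team")))  # ['C:/Docs/budget.doc']
```

The search never reopens the files: it only intersects the sets stored in the index, which is why a query stays fast however many documents sit on the disk.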

With few exceptions, desktop search tools don't stray far from simple keyword searches. Two of those exceptions, Autonomy's IDOL Enterprise Desktop Search and FAST's Personal Search Platform, use a subset of their developers' server indexing toolkits to permit more complex searches.

The database doesn't just contain information about the file's content. It also includes the file's metadata such as 'from' and 'to' fields of emails; duration of sound files; authors, subjects and titles of Word files; and so on. That means that more complicated searches, such as “all sound files less than five minutes in duration”, are possible with the majority of desktop search tools.
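Because metadata is stored alongside content, a query such as "all sound files less than five minutes in duration" becomes a filter over indexed records. The sketch below assumes a simple record layout; the field names (`kind`, `duration_s`, `from`, and so on) are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    path: str
    kind: str                                     # e.g. "audio", "email", "document" (hypothetical categories)
    metadata: dict = field(default_factory=dict)  # hypothetical metadata fields


entries = [
    IndexEntry("C:/Music/intro.mp3", "audio", {"duration_s": 185}),
    IndexEntry("C:/Music/live_set.mp3", "audio", {"duration_s": 2710}),
    IndexEntry("C:/Docs/report.doc", "document", {"author": "J. Smith", "title": "Q3 report"}),
    IndexEntry("C:/Mail/hello.msg", "email", {"from": "alice@example.com", "to": "bob@example.com"}),
]


def find(entries, kind, predicate):
    """Return paths of entries of the given kind whose metadata satisfies predicate."""
    return [e.path for e in entries if e.kind == kind and predicate(e.metadata)]


# "All sound files less than five minutes in duration"; files with no
# recorded duration are treated as infinitely long and so excluded.
short_audio = find(entries, "audio", lambda m: m.get("duration_s", float("inf")) < 5 * 60)

# An email search on the 'from' field.
from_alice = find(entries, "email", lambda m: m.get("from") == "alice@example.com")

print(short_audio)  # ['C:/Music/intro.mp3']
print(from_alice)   # ['C:/Mail/hello.msg']
```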

However, the database only contains a snapshot of the hard drive at the time of indexing: any changes made to files after that point will not be reflected in the index, and nor will any new files. Desktop search systems therefore also include a tool that tracks changes to the local file system and updates the index database with the new information.

As well as desktop search, most tools also offer web search. Users can choose to pass their searches, where possible, to a web search engine. The tool can then aggregate the file system search results with the online search results and present them as either a unified collection or discrete sets.
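The two presentation options mentioned above, a unified collection or discrete sets, can be sketched as follows. This is an illustration only: the result lists are hypothetical, no real search engine is called, and the unified view simply interleaves the two lists (one plausible merging strategy among many) rather than reproducing how any named product ranks them.

```python
from itertools import chain, zip_longest


def aggregate(desktop_hits, web_hits, unified=True):
    """Combine local and web results.

    unified=True interleaves the two lists, desktop hit first, so neither
    source swamps the other; unified=False keeps them as separate sets.
    """
    if not unified:
        return {"desktop": list(desktop_hits), "web": list(web_hits)}
    pairs = zip_longest(desktop_hits, web_hits)  # pads the shorter list with None
    return [hit for hit in chain.from_iterable(pairs) if hit is not None]


# Hypothetical results for the same query from the local index and a web engine.
desktop = ["C:/Docs/budget.doc", "C:/Mail/offsite.msg"]
web = ["http://example.com/budgeting", "http://example.org/sales", "http://example.net/teams"]

print(aggregate(desktop, web))
# ['C:/Docs/budget.doc', 'http://example.com/budgeting', 'C:/Mail/offsite.msg',
#  'http://example.org/sales', 'http://example.net/teams']
```

Interleaving sidesteps the problem that relevance scores from a local index and from a web engine are not directly comparable. A tool that could calibrate the scores might instead sort the combined list by score.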

How each search system achieves these things is where the differences emerge. For web search, Google Desktop Search can pass queries to Google's own search engine, and Ask Jeeves, MSN and Yahoo do likewise with theirs. Others, such as Copernic and Blinkx, have no such recourse and must use other companies' search engines.

Index updating is a greater differentiator. Yahoo's desktop search requires a manual or scheduled re-index, so its database can quickly become outdated in a busy environment. Other systems, such as Google's and MSN's, run a monitor process that uses the Windows file system application programming interfaces (APIs) to receive almost instant notification of changes. Whenever a file in a format the search tool understands is created or amended, the tool adds it to a queue. Then, at the next 'convenient' moment, the tool updates its database with indexes of the new or changed files.

Depending on the tool, 'convenient' can mean different things. It certainly needs to be convenient for the computer: the tool will wait until other file system operations complete before it starts indexing. But tools such as Google's and MSN's will also wait until user activity on the PC drops to a pre-defined level, to avoid slowing the computer down during everyday use. It is a trade-off between keeping the database current and keeping the PC usable. For a typical user, however, updating the database will effectively be a real-time activity.
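The notify, queue and index-when-convenient cycle described in the last two paragraphs can be sketched as a small class. This is a hedged illustration rather than any vendor's design: the change notifications, the "busy file system" flag and the 0-to-1 user-activity measure are all assumed to be supplied by the caller, and the `idle_threshold` value is hypothetical.

```python
from collections import deque


class IncrementalIndexer:
    """Queue change notifications and apply them when the PC is quiet.

    The notification source (in practice, OS file system change APIs) and
    the activity measure are assumptions; here the caller supplies both.
    """

    def __init__(self, index_file, supported_exts, idle_threshold=0.2):
        self.index_file = index_file          # callable that (re)indexes one path
        self.supported_exts = supported_exts  # file types the tool understands
        self.idle_threshold = idle_threshold  # hypothetical max activity (0..1) at which indexing may run
        self.pending = deque()                # files waiting to be indexed, in arrival order
        self.queued = set()                   # guards against queuing the same file twice

    def on_change(self, path):
        """Called for each created or amended file; queue it if its format is understood."""
        ext = path.rsplit(".", 1)[-1].lower()
        if ext in self.supported_exts and path not in self.queued:
            self.pending.append(path)
            self.queued.add(path)

    def run_if_convenient(self, user_activity, file_ops_busy=False):
        """Index queued files only when both the disk and the user are quiet.

        Returns the number of files indexed on this call.
        """
        if file_ops_busy or user_activity > self.idle_threshold:
            return 0  # not convenient yet; the queue is kept for later
        done = 0
        while self.pending:
            path = self.pending.popleft()
            self.queued.discard(path)
            self.index_file(path)
            done += 1
        return done


indexed = []
indexer = IncrementalIndexer(indexed.append, {"doc", "txt", "msg"})
indexer.on_change("C:/Docs/memo.txt")
indexer.on_change("C:/Docs/memo.txt")   # duplicate notification, queued only once
indexer.on_change("C:/Temp/setup.exe")  # unsupported format, ignored
print(indexer.run_if_convenient(user_activity=0.9))   # 0: user is busy, so wait
print(indexer.run_if_convenient(user_activity=0.05))  # 1
print(indexed)  # ['C:/Docs/memo.txt']
```

Raising `idle_threshold` keeps the index more current at the cost of competing with the user for the machine, which is the trade-off described above.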

Aside from interfaces, the other main differences lie in the file formats each tool understands, with each vendor claiming hundreds of different formats to its credit. Surprisingly, though, PDF search isn't common: MSN, for one, requires a separately downloaded plug-in. Support for Lotus Notes and Outlook also varies: some tools search only Outlook emails, while others also search tasks and events. Unsurprisingly, MSN's search tool provides the best Outlook search capabilities.

All of these facilities make desktop search tools useful for enterprises but not must-haves. Indeed, while Verity acquired a new desktop search tool from 80-20 Software in August, it has done little to sell this as anything more than a useful add-on for its existing clients, rather than marketing it as a product for drawing in new customers.

One of the biggest problems facing all the enterprise desktop search vendors is the slow march of corporations back towards centralised IT infrastructures and, in particular, enterprise content management (ECM) systems. With all documents held on a server, there's no need for a desktop search product except on laptops and other devices that may leave the network.

It's a situation faced by Edward Cowell, technical director of search engine marketing consultancy Neutralize, who has tested virtually every desktop search tool currently available for his company. “If I like it, I keep it. Otherwise, I deactivate it.” He now has only Google Desktop Search installed and uses it mainly for web searches. “We're a network-based company. We don't store a lot of stuff on our computers. Desktop search products are kind of redundant for us.”

Combining intranet search with desktop and web search would make desktop search tools far more attractive to Cowell and others. Autonomy and FAST have already made this possible in their own tools. MSN's search tool, however, does not offer it, even though Windows' existing search facility can tie into Windows Index Server (albeit only through undocumented commands).

Google's efforts at penetrating the enterprise are similarly lacklustre, but are slowly improving. The initial version of Google Desktop Search stored its index unencrypted and indexed every document on the PC, no matter who owned it. It also indexed cached files from secure web sites. It posed such a corporate security risk that Gartner analyst Whit Andrews advised businesses to discourage the use of the tool.

Google fixed many of these problems in the latest version of the tool, splitting the indexes for different users, encrypting them and switching off indexing of certain secure file types. It has also created an enterprise version of the software, Google Desktop for Enterprise. Nikhil Bhata, product manager for Google Desktop, says that most of the features in the enterprise version are designed to “help IT administrators deploy the software easily and lock down certain functions”. More notably, in October Google signed a deal with IBM to give the software access to IBM's WebSphere Information Integrator OmniFind Edition, making simultaneous desktop, enterprise and web search possible.

In the short term, this makes Google's software more attractive to the enterprise than Microsoft's. Indeed, the consumer-oriented MSN branding on the Windows Desktop Search tool actually makes it less attractive to many IT managers. Andrews says that Google is now likely to develop considerable “mindshare” with its products and take the lead in the market.

In the long term, though, document centralisation may not be the biggest threat to desktop search tools' relevance in the enterprise. A far bigger threat is emerging from Microsoft itself: a better Windows desktop search function. WinFS, short for Windows File System, has long been touted as a holy grail by Microsoft. Originally planned for Windows 95, the idea behind it is simple: replace the standard Windows file systems (FAT, FAT32 and NTFS) with a cut-down version of the SQL Server database. Its virtues are the same as those demonstrated by desktop search tools, namely a separate, indexed database of file system content and metadata, but in this case tied into the operating system at a much deeper level. Moreover, it will provide a standard API for developers to tie desktop search and metadata into their products.

Technical obstacles have caused Microsoft to scale back its ambitions over the years, but a beta version of WinFS that sits on top of NTFS (rather than replacing it) is now available. The eventual aim is for WinFS to be available for Windows XP, and installed by default in its replacement, Windows Vista.

Corporate upgrade cycles mean that it will be some years after Vista's scheduled 2007 release before WinFS becomes the de facto search technology on most desktops. Microsoft also has few plans to tie WinFS into existing server indexing technology, which means that integrated desktop, enterprise and web search tools will still have a place on many corporate desktops, even if pure desktop and web search tools don't.

Says Lee Phillips, FAST's director of intelligence solutions, “We've tied our personal search platform very closely into the Windows file system APIs. We think, whatever changes Microsoft makes, by combining enterprise search with desktop search, we're going to be in a very good position.”

The current war over desktop search is just the beginning of a much greater war over integrated enterprise search. Even if Google wins this battle, it won't have won that war. Although many smaller companies, such as FAST, have already realised where the real battles lie, the market is likely to end up dominated by either Microsoft or Google. Microsoft has the advantage of WinFS. But with integrated search already becoming part of its long-term strategy, and with a good portion of the mindshare, Google could yet take the ultimate prize from Microsoft.
