Tech trends: metadata
- Article 2 of 3
- GIS Europe, December 1998
Files. If your hard drive is anything like mine, it's full of them. At last count, I had nearly 11,000. Do I know what they all do? No. If I needed to find a particular one in a hurry, would I be able to? No. Is there a risk of my using the wrong file or duplicating my efforts? Yes.
Computer users all over the world share this problem: finding and identifying the files they need for their tasks, whether on their own computer or on a network. Software vendors have started waking up to the problem and have devised various solutions. But what do they have in mind, and how will it affect you?
At a fundamental level, what's needed is more information about files than just a name, and a way to locate files without knowing where they sit on a system. “Metadata”, or “data about data”, is an issue that the GIS industry has been working on for longer than most of the mainstream IT companies, mainly because of the complexity of the data involved. Indeed, metadata is one of the areas being standardised by the OpenGIS Consortium, a group of companies and organisations from all over the world that is working on making GIS systems interoperable with each other. The UK's National Geospatial Data Framework, the Netherlands' RAVI and the EU's MEGRIN, among others, are working on it too.
At the moment, most organisations' metadata consists of paper records (or “data dictionaries”) of where files are stored on a network, who created them, who last modified them, which departments own them, what they record and so on. These records have to be updated manually, which can lead to difficulties. As Craig McCorriston, a statistician at West Lothian Council in Scotland whose GIS unit uses data dictionaries, points out, “if you don't systematically update a dictionary, it becomes useless and more significantly, may mislead users into using inappropriate data”.
A better system is to have your computer either automatically create metadata or force you to enter your own metadata when you save a file. You could store this information in a central location, such as a network server, that everyone in your organisation can access from his or her own machine. This can increase data-sharing between departments, because the biggest obstacle to sharing data is not knowing that it exists. Many organisations have datasets captured or bought by one department that others don't know about and so don't use. With central “metadatabases”, everyone knows what everyone else has and can use it to the fullest.
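As a rough illustration of the idea, here is a minimal Python sketch of a central “metadatabase” that fills in some fields automatically and refuses to register a file without a description. It is only a sketch under stated assumptions: SQLite stands in for the network server, and the table layout and the names open_catalogue, register_file and find_files are invented for this example rather than taken from any vendor's product.

```python
import os
import sqlite3
import tempfile
from datetime import datetime, timezone


def open_catalogue(path=":memory:"):
    """Open (or create) the hypothetical central metadata catalogue."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS file_metadata (
               path TEXT PRIMARY KEY,
               size_bytes INTEGER,
               modified_utc TEXT,
               recorded_by TEXT,
               department TEXT,
               description TEXT
           )"""
    )
    return conn


def register_file(conn, path, user, department, description):
    """Record metadata for a saved file; a blank description is refused."""
    if not description.strip():
        raise ValueError("a description is required before the file is registered")
    # Size and modification time are captured automatically from the file system.
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat()
    conn.execute(
        "INSERT OR REPLACE INTO file_metadata VALUES (?, ?, ?, ?, ?, ?)",
        (os.path.abspath(path), info.st_size, modified, user, department, description),
    )
    conn.commit()


def find_files(conn, keyword):
    """Return the paths of files whose description mentions the keyword."""
    rows = conn.execute(
        "SELECT path FROM file_metadata WHERE description LIKE ? ORDER BY path",
        (f"%{keyword}%",),
    )
    return [row[0] for row in rows]


def demo():
    """Register one made-up file, then find it by description rather than by name."""
    conn = open_catalogue()
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "arfx2.dxf")
        with open(path, "w") as handle:
            handle.write("placeholder drawing")
        register_file(conn, path, "jsmith", "Planning", "Road centrelines, 1998 survey")
        return find_files(conn, "centrelines")
```

The point of the design is that the search runs over the description, so a colleague can find the file without knowing that it is called “arfx2.dxf” or where it lives.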
Companies such as Microsoft, Novell, Banyan, Netscape and IBM have all created “directory services” for their operating systems or collaboration products. A directory service is a means of locating resources on a network, including printers, servers, users and documents. Given a name for the resource, you can almost instantly locate it on the network through a special client or a Web browser. Of course, you still have to know the name of the resource, and unless you've used a descriptive filename for your document, you'll probably have to open up twelve different documents all called “arfx2.dxf” or similar to discover whether any of them is the right file.
The problem is greater for GIS files because there's no support for them built into these directory service systems. While most can summarise or search the contents of word-processing documents, none can take a GIS file and work out, for example, what georeferencing it uses or what area it represents.
One of the few GIS companies working on the problem of metadata for standalone GIS files is ESRI. It has a two-pronged approach: SDE for Coverages and ARC/INFO 8.0. SDE for Coverages is a way for you to access files or even parts of files distributed on a file server without having to mount the server on your desktop via UNIX's NFS or Windows NT's SMB network file systems.
ARC/INFO 8.0 will include ArcCatalog, a new application for Windows NT only. It will be almost mandatory for any file you create or modify in ARC/INFO 8.0 to have extensive metadata stored with it that other users will be able to access through an ARC/INFO server and ArcCatalog.
Another problem for GIS data is the dual nature of the traditional GIS storage system. While relational databases, the mainstay of text storage in mainstream IT, are very good at storing GIS attribute information, they're very bad at storing geometry and images. This meant that GIS data would be separated into attribute data and geometry data and stored separately. This placed the onus on the user to know if and how the two were related. It also meant two files and two sets of metadata.
This has now changed with the advent of “object-oriented” relational databases. These enable you to store the geometry information with the attribute information in the same database, giving you access to the full capabilities of a standard relational database system and a central access point for all your files. Rather than having to store all your data in separate files on different file servers, hard drives and so on, you can store all your data in one database system. This has the added advantage that when you're performing queries and selections on data, you don't have to download a whole file off a server, just the features in which you're interested. You can also store metadata in the same database and link the attribute, geometry and metadata tables together so that everyone has access to all this information through one source. Some spatial enablers also automatically create tables that list all the geometries stored in the database.
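To make the single-database idea concrete, here is a small Python sketch in which attributes, geometry and metadata live side by side and a query pulls back only the features inside a window of interest. It rests on several assumptions: plain SQLite stands in for a spatially enabled database, geometry is held as well-known text with precomputed bounding-box columns instead of a true spatial type, and the table names, view name and sample schools are all made up for illustration.

```python
import sqlite3


def build_database():
    """Create one database holding metadata, attributes and geometry together."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset_metadata (
            dataset TEXT PRIMARY KEY,
            owner TEXT,
            georeferencing TEXT,
            description TEXT
        );
        CREATE TABLE features (
            id INTEGER PRIMARY KEY,
            dataset TEXT REFERENCES dataset_metadata(dataset),
            name TEXT,
            geometry_wkt TEXT,
            min_x REAL, min_y REAL, max_x REAL, max_y REAL
        );
        -- Hypothetical catalogue listing what geometry the database holds.
        CREATE VIEW geometry_catalogue AS
            SELECT dataset, COUNT(*) AS feature_count
            FROM features GROUP BY dataset;
        """
    )
    conn.execute(
        "INSERT INTO dataset_metadata VALUES (?, ?, ?, ?)",
        ("schools", "Education", "British National Grid", "School sites (invented sample)"),
    )
    # Each point's bounding box is simply the point itself.
    conn.executemany(
        "INSERT INTO features (dataset, name, geometry_wkt, min_x, min_y, max_x, max_y) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("schools", "North School", "POINT(10 10)", 10, 10, 10, 10),
            ("schools", "South School", "POINT(50 5)", 50, 5, 50, 5),
            ("schools", "East School", "POINT(90 40)", 90, 40, 90, 40),
        ],
    )
    conn.commit()
    return conn


def features_in_window(conn, min_x, min_y, max_x, max_y):
    """Return only the features overlapping the window, joined to their metadata."""
    rows = conn.execute(
        """SELECT f.name, f.geometry_wkt, m.owner, m.georeferencing
           FROM features AS f
           JOIN dataset_metadata AS m ON m.dataset = f.dataset
           WHERE f.max_x >= ? AND f.min_x <= ? AND f.max_y >= ? AND f.min_y <= ?
           ORDER BY f.id""",
        (min_x, max_x, min_y, max_y),
    )
    return rows.fetchall()
```

Calling features_in_window(build_database(), 0, 0, 60, 20) fetches just the two schools inside that window, each row already carrying its owner and georeferencing, so nothing has to be downloaded as a whole file and looked up separately.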
