8 February 2016
What needs to happen before spatial data can be upstanding web citizens? Which agreements will have to be made to fully develop the potential of spatial data on the web? This third blog in a series about spatial data on the web is about the necessities identified by the Spatial Data on the Web Working Group (SDWWG), a joint initiative of the OGC and the W3C.
In the first blog I wrote about the benefits of sharing spatial data on the web, and the need for a synthesis of domain standards, which is a task of the SDWWG. The second part of the series is about what the SDWWG wants to achieve. This third part is about the first product of the Working Group, an overview of use cases and requirements.
It is a tried and tested method for finding out how to build something: first you collect stories of how people would like things to work, or how things presently do not work well. From those stories system requirements can be derived – confirmable and testable properties of the system that is to be built. This method was used in the SDWWG too. One of the Working Group’s deliverables (described in the previous blog) is an overview of use cases and the requirements derived from them. The second public working draft has just been published. It contains the foundations for the other group deliverables.
The document contains about fifty different use cases and about sixty requirements that follow from the use cases. Requirements can be related to Working Group deliverables. The most general of those deliverables will be the Best Practices. They should give an answer to the core question: How can spatial data on the web best be published and consumed? So I want to direct your attention to a few interesting requirements for the best practices: discoverability, linkability, uncertainty, spatial relationships, metadata and coordinate systems.
What use are data that are somewhere on the web but can’t be found? Very little. Discoverability is an obvious requirement for data on the web, and for spatial data too.
A traditional way of announcing the existence of spatial data is publishing metadata in a data catalog, for instance using the CSW specification. But that does not lead to optimal discoverability. The way most people search for anything on the web is by using a search engine like Google. That is why a large part of making data discoverable is to allow the data to be indexed by a search engine. Search engines typically make use of web crawlers, automated processes that recursively follow hyperlinks and gather data along the way. So to achieve good discoverability it is not only necessary to publish metadata that do a good job of describing the data, it is also necessary to link data using hyperlinks. Furthermore, it helps when data can be requested in a format that is understandable for web crawlers, such as HTML.
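To make this concrete, here is a minimal sketch of a crawler-friendly landing page for a dataset: plain HTML, hyperlinks to the individual items, and a machine-readable description embedded as JSON-LD using the schema.org `Dataset` type. The function name `dataset_page` and all `example.org` URLs are placeholders of my own, not something the Working Group prescribes.

```python
import json
from html import escape
from typing import Sequence


def dataset_page(title: str, description: str, url: str, links: Sequence[str]) -> str:
    """Render a small HTML page that a web crawler can read and follow."""
    # Machine-readable description of the dataset (schema.org vocabulary).
    structured = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": description,
        "url": url,
    }
    # Escape "</" so the JSON can never close the <script> element early.
    json_ld = json.dumps(structured, indent=2).replace("</", "<\\/")

    # Hyperlinks give a crawler something to follow recursively.
    items = "\n".join(
        f'      <li><a href="{escape(href, quote=True)}">{escape(href)}</a></li>'
        for href in links
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        '    <script type="application/ld+json">\n'
        f"{json_ld}\n"
        "    </script>\n"
        "  </head>\n"
        "  <body>\n"
        f"    <h1>{escape(title)}</h1>\n"
        f"    <p>{escape(description)}</p>\n"
        "    <ul>\n"
        f"{items}\n"
        "    </ul>\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(dataset_page(
        "Road segments",
        "Hypothetical example dataset of road segments.",
        "https://example.org/data/roads",
        ["https://example.org/data/roads/1", "https://example.org/data/roads/2"],
    ))
```

The point of the sketch is that the same page serves both people and crawlers: the links make the individual items reachable, and the embedded description tells a search engine what it has found.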
Linkability has much in common with discoverability. It is about possibilities for making connections between elements in a dataset, between elements in different data sets, and between data sets. It is links that are the connecting threads of the web. Without them there would not be a web, but a collection of data silos. The more linked something on the web is, the greater its value. Web links can be established using HTTP(S) URIs (an example of a URI is https://en.wikipedia.org/wiki/Uniform_Resource_Identifier). When something has a URI, something else can link to that URI. At the same time, the URI can be used to get to the data that the URI identifies, using HTTP, the basic web API.
It is likely that the Best Practices will recommend assigning HTTP(S) URIs to spatial data, because it will give the power to link data directly and explicitly. But implicit links can also be valuable. By making use of common data definitions on the web, semantic links can be made. For example, when two separate datasets make use of a common definition of administrative area (like https://schema.org/AdministrativeArea) a connection between the two datasets is established.
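A small sketch may help to show the difference between the two kinds of link. In the hypothetical records below, one boundary record points explicitly at a statistics record through its URI (using the schema.org `sameAs` property), while the shared `@type` value forms the implicit, semantic link. The record layout, the helper names and the `example.org` URIs are assumptions made for illustration only.

```python
ADMIN_AREA = "https://schema.org/AdministrativeArea"

# Two independently published datasets (hypothetical records and URIs).
statistics = [
    {"@id": "https://stats.example.org/area/1", "@type": ADMIN_AREA, "population": 12000},
    {"@id": "https://stats.example.org/area/2", "@type": ADMIN_AREA, "population": 48000},
]
boundaries = [
    {
        "@id": "https://geo.example.org/boundary/a",
        "@type": ADMIN_AREA,
        "sameAs": "https://stats.example.org/area/1",  # explicit link
    },
    {"@id": "https://geo.example.org/river/7", "@type": "https://schema.org/RiverBodyOfWater"},
]


def of_type(records, type_uri):
    """Implicit link: select the records that share a common definition."""
    return [r for r in records if r.get("@type") == type_uri]


def follow_links(sources, targets, link_key="sameAs"):
    """Explicit link: pair each source with the target its URI points at."""
    index = {t["@id"]: t for t in targets}
    return [(s, index[s[link_key]]) for s in sources if s.get(link_key) in index]


if __name__ == "__main__":
    print(len(of_type(statistics + boundaries, ADMIN_AREA)), "administrative areas")
    for boundary, stats in follow_links(boundaries, statistics):
        print(boundary["@id"], "->", stats["@id"], stats["population"])
```

On the real web the explicit link would be followed with an HTTP request to the URI rather than a dictionary lookup; the in-memory index only stands in for that step.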
Both spatial and temporal aspects of data can involve uncertainty, and it can occur in different ways. In vector geometry there can be uncertainty in the numbers that form the coordinates. Textual data can be vague or ambiguous. Take the toponym ‘Kingstown’. It can be used to indicate many different places in the world. Additionally, the spatial characteristics of places can change over time, so without temporal data a place name cannot be resolved unambiguously to spatial data anyway. What’s more, some locations, like the Sahara, have no clear spatial demarcation. And what about the certainty of spatial relationships like ‘close to’ or ‘south of’?
Uncertainty is a fact of life and it exists in spatial data too. What we don’t want to happen is that spatial data are not shared because of uncertainty, or because it is not clear how to express uncertainty or how to deal with it. Data with uncertainty are by no means worthless. So it is important to find a common understanding of how uncertainty in spatiotemporal data on the web can best be expressed and processed.
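As an illustration of why uncertain data are still useful, here is one possible way of carrying coordinate uncertainty through to a relation like ‘close to’. It is only a sketch under strong assumptions: positions are in a projected system with meters as units, and each true position is assumed to lie within a known radius of the recorded one. The class `UncertainPoint` and the function `is_within` are hypothetical names, not part of any standard.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainPoint:
    x: float         # easting in meters (projected coordinates assumed)
    y: float         # northing in meters
    radius_m: float  # true position assumed to lie within this radius


def is_within(a: UncertainPoint, b: UncertainPoint, limit_m: float) -> str:
    """Answer 'is a within limit_m of b?' with yes, no or uncertain."""
    distance = math.hypot(a.x - b.x, a.y - b.y)
    slack = a.radius_m + b.radius_m  # worst-case combined position error
    if distance + slack <= limit_m:
        return "yes"        # true even in the worst case
    if distance - slack > limit_m:
        return "no"         # false even in the best case
    return "uncertain"      # the answer depends on the unknown true positions


if __name__ == "__main__":
    well = UncertainPoint(0.0, 0.0, 5.0)
    farm = UncertainPoint(30.0, 40.0, 5.0)
    for limit in (100.0, 55.0, 30.0):
        print(limit, is_within(well, farm, limit))
```

The three-valued answer is the design choice that matters here: instead of discarding imprecise data, the uncertainty is made explicit in the result.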
The value of data on the web depends on the possibilities for linking. Links make data easier to find and easier to combine. Spatial data offer a unique type of linkage: spatial relations. They can be topological relations (A borders on B, A crosses B, A contains B, etc.), mereological relations (A is part of B), distance relations (A is 10 km from B), or directional relations (A is below B, A is north of B, A is on the starboard side of B, etc.).
Spatial relations can give an extra dimension of connectivity between data. That can be done explicitly, by recording the relations in the data in a standard way. But spatial relations can also be used in operations or filters on data, in which case they are established by software procedures that use the spatial properties of data. A simple example of the latter: requesting data of things that are located within a certain radius from a point.
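The radius example can be sketched in a few lines. The code below computes great-circle distances with the haversine formula on a spherical Earth, which is an approximation, and keeps the features that fall within the requested radius. The feature layout (dictionaries with `name`, `lat` and `lon` keys) and the function names are assumptions made for this illustration; the city coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a sphere is an approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_radius(features, center_lat, center_lon, radius_km):
    """Keep the features whose position lies within radius_km of the center."""
    return [
        f for f in features
        if haversine_km(center_lat, center_lon, f["lat"], f["lon"]) <= radius_km
    ]


if __name__ == "__main__":
    cities = [
        {"name": "Utrecht", "lat": 52.0907, "lon": 5.1214},
        {"name": "Rotterdam", "lat": 51.9244, "lon": 4.4777},
    ]
    # Which of these cities lie within 40 km of central Amsterdam?
    for city in within_radius(cities, 52.3676, 4.9041, 40.0):
        print(city["name"])
```

Note that the relation is not stored anywhere in the data. It is established at query time from the coordinates alone, which is exactly the second, implicit way of using spatial relations.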
Spatial relations can be important in analysis of large data volumes (Big Data analytics). Discovery of spatial patterns in data can lead to valuable insights or to improved decision making.
There are plenty of reasons for trying to agree on good standards for this aspect of spatial data too.
Information about coordinate systems is needed for all spatial data that make use of coordinates. Without knowledge of their reference system, coordinates are rather meaningless. But coordinate systems can cause headaches, especially for those who work with geographical data. For some they are a nuisance to be avoided, and in simple applications information about the coordinate system is sometimes absent, often with a silent assumption that the coordinates should be interpreted as longitude and latitude in WGS84. WGS84 is a popular system; it is used in GPS (the satellite positioning system of the USA). But careless use of the system has its risks. Firstly, geographic information systems (GIS) usually identify coordinate systems by EPSG codes, and the EPSG definition of WGS84 (EPSG:4326) specifies the axis order latitude, longitude, a rule that is often not respected. Secondly, we have to realize that the very Earth on which we want to locate things is constantly changing. Plate tectonics cause parts of the crust to move with respect to each other at speeds of several centimeters per year. Because WGS84 is a global system that is not fixed to any single tectonic plate, locations in Europe, which move along with the Eurasian plate, shift by a few centimeters per year in WGS84 coordinates. For serious and durable data that is an unwanted effect, especially with technological advancements making measurements of location with ever higher accuracy possible. In Europe spatial data experts have therefore agreed to use another system: ETRS89, which is tied to the stable part of the Eurasian plate. Many other coordinate systems for geography are in use, each with their own benefits and limitations.
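Both risks can be made tangible with a little code. The sketch below never accepts a bare coordinate pair: the caller has to state the axis order, and the coordinate system travels with the position. A second helper does the simple arithmetic of accumulated plate motion. The names `Position`, `from_pair` and `accumulated_drift_m` are my own, and the rate of 2.5 cm per year is only an illustrative value in the range of a few centimeters per year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    lat: float  # degrees north
    lon: float  # degrees east
    crs: str    # identifier of the coordinate reference system


def from_pair(pair, axis_order: str, crs: str = "EPSG:4326") -> Position:
    """Build a Position from a pair whose axis order is stated explicitly."""
    if axis_order == "lat,lon":
        lat, lon = pair
    elif axis_order == "lon,lat":
        lon, lat = pair
    else:
        raise ValueError(f"unknown axis order: {axis_order!r}")
    # A range check catches some swapped pairs, but by no means all of them.
    if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")
    return Position(lat, lon, crs)


def accumulated_drift_m(rate_cm_per_year: float, years: float) -> float:
    """Displacement in meters after `years` of steady plate motion."""
    return rate_cm_per_year * years / 100.0


if __name__ == "__main__":
    # The same place, delivered in two different axis orders.
    print(from_pair((52.37, 4.90), "lat,lon"))
    print(from_pair((4.90, 52.37), "lon,lat"))
    # Assumed rate of 2.5 cm/year over 20 years.
    print(accumulated_drift_m(2.5, 20), "m")
```

A pair like (4.90, 52.37) passes the range check under either axis order, which is why the order has to be declared rather than guessed. And a drift that looks negligible per year adds up to decimeters within the lifetime of a dataset.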
For the single data cloud that is the world wide web, having many different coordinate systems for different places on earth is inconvenient. It limits possibilities for exchange and combination of data from different sources. Our dynamic planet will make it hard for the Working Group to come up with a recommendation for coordinate systems that is both simple and universally applicable.
The importance of metadata, data about data, should not be underestimated. Publishing metadata next to actual data has several benefits. Because metadata give a description of a dataset, potential users can determine whether data are suitable for their intended use. Metadata can also give insight into matters like quality, currency and usage restrictions. Lastly, metadata are very suitable to index and/or to record in data catalogs, making datasets easier to find.
Fortunately, much has already been developed for metadata on the web. The importance of sharing metadata is not unique to spatial data. All types of data do well with linked metadata. However, until recently the web domain and the geography domain have developed different metadata standards. Naturally the geography domain has occupied itself with developing standards for geographical metadata, like data about the spatial extent of a dataset, the coordinate systems that are used, or the spatial resolution of the data.
A promising way of uniting the two separate developments of standards for metadata is currently being developed as a European undertaking: GeoDCAT-AP links the metadata specifications from INSPIRE (which are based on OGC specifications) to the Data Catalog Vocabulary (DCAT) from the W3C. This makes the two types of metadata interoperable, and makes it easier to share geographic metadata on the web.
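To give an impression of what such combined metadata can look like, here is a rough sketch of a dataset description as JSON-LD. It uses the DCAT and Dublin Core terms vocabularies together with a geometry expressed through `locn:geometry` as a WKT literal. The record is illustrative only and has not been validated against the GeoDCAT-AP specification; the dataset, its URI and the helper names are invented for the example.

```python
import json

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "locn": "http://www.w3.org/ns/locn#",
    "gsp": "http://www.opengis.net/ont/geosparql#",
}


def bbox_to_wkt(west: float, south: float, east: float, north: float) -> str:
    """Bounding box as a closed WKT polygon, longitude before latitude."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def dataset_record(uri: str, title: str, description: str, bbox) -> dict:
    """Describe a dataset and its spatial extent as a JSON-LD document."""
    return {
        "@context": CONTEXT,
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:spatial": {
            "@type": "dct:Location",
            "locn:geometry": {
                "@type": "gsp:wktLiteral",
                "@value": bbox_to_wkt(*bbox),
            },
        },
    }


if __name__ == "__main__":
    record = dataset_record(
        "https://example.org/data/roads",  # hypothetical dataset URI
        "Road segments",
        "Hypothetical example dataset of road segments.",
        (3.3, 50.7, 7.3, 53.6),  # west, south, east, north in degrees
    )
    print(json.dumps(record, indent=2))
```

The spatial extent, a typical piece of geographical metadata, ends up inside a record that general-purpose web data catalogs can also read.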
The topics described above are only a few of those that need clear guidance before spatial data can be shared efficiently on the web. That means the SDWWG has a lot of work to do.
Should the rest of the world wait patiently until the Working Group has finished its job? Of course not! Although the situation is not ideal at the moment, it is certainly possible to share spatial data on the web right now, and to reap the unique benefits of doing so. It is also possible to adjust the way spatial data are published or consumed when new insights come to light. Just make sure URIs, once assigned, keep working.
For progress in the SDWWG it is important that the world keeps using spatial data on the web in practice. Observing that practice, and working with it, is a vital source of inspiration. So I encourage everyone to enrich the web with spatial data, and at the same time enrich spatial data with the opportunities of the world wide web.