
Frans Knibbe’s blog: Spatial data on the web; how should it work?

What needs to happen for spatial data to become exemplary, useful residents of the web? Which agreements do we need to make to gain access to the great potential of spatial data on the web? Geodan Research’s Frans Knibbe discusses the requirements drawn up by the Spatial Data on the Web Working Group (SDWWG), a joint initiative of the OGC and the W3C.

This is the third blog in a series on spatial data on the web. In part 1, I described the value of sharing spatial data on the web and the need to synthesize domain standards for optimal sharing. It’s up to the SDWWG to make sure that that happens. In part 2, I described what the SDWWG wants to achieve. In this article I will discuss the first result of this working group, the overview of usage scenarios and the requirements.

Use cases and requirements

Collecting user stories is a proven method for determining how a system should be set up: you gather stories from users that describe how something might work well, or what isn’t working right now. System requirements can then be derived from these stories: checkable and testable properties of what needs to be made. This method was also applied in the SDWWG. One of the group’s deliverables, described in the previous blog, is an overview of use cases and requirements. The second public draft has recently been released and forms the basis for the other four deliverables:

  • Best Practices
  • Time Ontology
  • Semantic Sensor Network Vocabulary
  • Coverage in Linked Data

The document contains some fifty different user scenarios (use cases), and almost sixty requirements that follow from them. The requirements for the Best Practices are particularly interesting, because those recommendations should answer the general question: how can we best publish and use spatial data on the web? A number of components of the requirements stood out:

  • Findability
  • Linkability of data
  • Uncertainty
  • Spatial relations
  • Metadata
  • Coordinate systems

Findability

What good is data that is on the web but can’t be found? Not much. Findability is a logical requirement for spatial data on the web. A traditional way of disclosing spatial data on the web is to publish metadata in a data catalog, for example via the CSW specification. This, however, is sub-optimal. When we search for something on the web, we usually use a search engine, such as Google. This means that it is crucial to publish spatial data in such a way that search engines can index it. A search engine uses a so-called crawler: an algorithm that recursively follows hyperlinks. Everything that the crawler finds can be indexed and is therefore findable. Findable data requires the publication of metadata that describes the data well (preferably in a general web way), but the data must also be interlinked so that a crawler can reach as much of it as possible. It also helps when data can be retrieved in a format that web crawlers understand, such as HTML.
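As a rough illustration of publishing metadata "in a general web way", the sketch below builds an HTML page with a schema.org Dataset description embedded as JSON-LD, a form that general-purpose crawlers can read. The function name, the example.org URL, the dataset title and the bounding box values are all hypothetical and only serve the example; this is one possible approach, not a recommendation taken from the working group's documents.

```python
import html
import json


def dataset_page(name, description, url, bbox):
    """Return an HTML page that describes a dataset and embeds schema.org JSON-LD.

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    metadata = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "spatialCoverage": {
            "@type": "Place",
            "geo": {
                "@type": "GeoShape",
                # schema.org expresses a box as two space-separated points:
                # lower corner first, then upper corner.
                "box": f"{south} {west} {north} {east}",
            },
        },
    }
    # Escape "</" so the JSON cannot close the <script> element early.
    json_ld = json.dumps(metadata, indent=2).replace("</", "<\\/")
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(name)}</title>\n"
        f'<script type="application/ld+json">\n{json_ld}\n</script>\n'
        "</head><body>\n"
        f"<h1>{html.escape(name)}</h1>\n"
        f"<p>{html.escape(description)}</p>\n"
        "</body></html>\n"
    )


# Hypothetical dataset; the bounding box is a rough, illustrative value.
page = dataset_page(
    name="Municipal boundaries (example)",
    description="Illustrative dataset description for a crawler to index.",
    url="https://example.org/datasets/municipal-boundaries",
    bbox=(50.75, 3.36, 53.56, 7.23),
)
```

The human-readable part of the page and the machine-readable description carry the same information, so both people and crawlers can work with a single URL.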

Linkability

Linkability ties in directly with findability: it is the ability to link data and to make connections between elements within a single dataset or across several datasets. These links (hyperlinks, in this case) form the threads of the web: they are absolutely essential. The better something is or can be linked, the more useful it is. Links on the web are made with HTTP(S) URIs. If something has a URI, something else can link to it, and HTTP provides a way to access the data behind it. It’s likely that the Best Practices will recommend assigning HTTP(S) URIs to spatial data, which lets you link data directly and explicitly. However, implicit links can be valuable as well. Semantic links can be made by using common data definitions on the web. When, for example, two separate datasets use a general definition of an administrative area (such as https://schema.org/AdministrativeArea), these datasets are linked.
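The difference between explicit and implicit links can be sketched with two tiny datasets. Everything below is hypothetical: the example.org and example.com URIs, the dictionary layout and the helper names are assumptions made for illustration. Only the schema.org terms (AdministrativeArea and containedInPlace) are existing definitions.

```python
from collections import defaultdict

ADMIN_AREA = "https://schema.org/AdministrativeArea"
CONTAINED_IN = "https://schema.org/containedInPlace"

# Two hypothetical datasets published by different parties.
provinces = [
    {
        "@id": "https://example.org/provinces/noord-brabant",
        "@type": ADMIN_AREA,
        "name": "Noord-Brabant",
    },
]
municipalities = [
    {
        "@id": "https://example.com/municipalities/s-hertogenbosch",
        "@type": ADMIN_AREA,
        "name": "'s-Hertogenbosch",
        # Explicit link: the URI of a resource in the other dataset.
        CONTAINED_IN: "https://example.org/provinces/noord-brabant",
    },
]


def explicit_links(dataset, known_ids):
    """Yield (subject, property, target) where a value is the URI of a known resource."""
    for resource in dataset:
        for prop, value in resource.items():
            if prop not in ("@id", "@type") and value in known_ids:
                yield (resource["@id"], prop, value)


def shared_types(*datasets):
    """Map each type URI used by more than one dataset to its resources per dataset."""
    by_type = defaultdict(lambda: defaultdict(list))
    for index, dataset in enumerate(datasets):
        for resource in dataset:
            by_type[resource["@type"]][index].append(resource["@id"])
    return {t: dict(per) for t, per in by_type.items() if len(per) > 1}


province_ids = {resource["@id"] for resource in provinces}
links = list(explicit_links(municipalities, province_ids))
implicit = shared_types(provinces, municipalities)
```

Here `links` holds the one explicit URI-to-URI link, while `implicit` shows that both datasets describe things of the same shared type, which is what makes them combinable even without a direct link.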


Uncertainty

Uncertainty occurs in both spatial and temporal aspects of data. It can take different forms. In the case of vector geometry, for example, the accuracy of the numbers that make up the coordinates is a source of uncertainty. Textual indications can also be vague or ambiguous. Take the place name “Bergen”, for example, which refers to several different places in the Netherlands alone. Moreover, the spatial properties of such a place change over time, so without time stamping, names of locations cannot always be unambiguously converted into spatial characteristics. In addition, some places, such as the Sahara, have no clear spatial delineation at all. And what about indications of spatial relations such as ‘in the vicinity of’ or ‘below the great rivers’? What we definitely don’t want is for data to go unshared because of uncertainty, or for information about uncertainty to go unshared simply because it isn’t clear how uncertainty can be expressed. Data will always have uncertainties, but that does not mean that it can’t be valuable. It is therefore important to be able to express spatial and temporal uncertainty well when data is published on the web.

Types of relations

Links make data easier to find and easier to combine, and spatial data offers a unique form of linkage: spatial relations. These can be topological relations (A borders B, A crosses B, A overlaps with B, etc.), mereological relations (municipality A is part of province B), distance relations (A is within 10km of B) or directional relations (A is left of B, A is north of B, A is to port of B, etc.). Spatial relations are therefore another way to link data. This can be done explicitly, by recording relationships in a standard way within a dataset, but spatial relations can also be used in operations or filters, where they are determined by software that analyzes the spatial properties of data. This happens, for example, when you request data within a certain radius of a given point.
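A radius query of this kind can be sketched as follows. This is a minimal example that assumes a spherical earth (the haversine formula with a mean radius of 6371 km), which is accurate enough for illustration but not for precise work. The feature layout, the function names and the two sample points are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(features, center, radius_km):
    """Return the features whose point lies within radius_km of center (lat, lon)."""
    lat0, lon0 = center
    return [
        feature for feature in features
        if haversine_km(lat0, lon0, feature["lat"], feature["lon"]) <= radius_km
    ]


# Two illustrative points in the Netherlands.
places = [
    {"name": "Amsterdam", "lat": 52.342346, "lon": 4.91305},
    {"name": "'s-Hertogenbosch", "lat": 51.69174, "lon": 5.299683},
]
center = (52.342346, 4.91305)

nearby = within_radius(places, center, 10)    # distance relation: within 10 km
regional = within_radius(places, center, 100)  # both points are within 100 km
```

The relation "A is within 10 km of B" is not stored anywhere in the data; the software derives it from the coordinates at query time.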

Spatial relations can also be important in analyses of large datasets (Big Data analytics). The discovery of spatial patterns in datasets can lead to valuable insights and better decision making, so there are plenty of reasons to ensure that this aspect of spatial data is also well-standardized.

Coordinate systems

Coordinate systems are needed for all spatial data that uses coordinates. Without knowledge of the reference system, coordinates are fairly meaningless numbers. For people who work with a special kind of spatial data - geographical data - coordinate systems can often be a major headache. In fact, they’re something that most people would rather not have to deal with. For simple applications, information about the coordinate system is often omitted, with the tacit assumption that the coordinates must be interpreted as pairs of longitude and latitude in WGS84. WGS84 is used by GPS (the USA’s satellite positioning system) and that’s one of the reasons that it’s so popular. It does, however, have its issues. First of all, GIS data often uses an EPSG code (EPSG:4326) to refer to this coordinate system, but that definition prescribes the order latitude, longitude, and in practice this order is frequently not followed.
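The axis order problem can be made concrete with a small sketch. It assumes that the publisher states the axis order explicitly and that the consumer wants longitude, latitude order (the order used by GeoJSON, for instance). The function name and the order labels are hypothetical.

```python
def to_lon_lat(pair, axis_order):
    """Normalize a WGS84 coordinate pair to (longitude, latitude).

    axis_order must be "lat,lon" (the order of the EPSG:4326 definition)
    or "lon,lat" (the order that much software assumes in practice).
    """
    if axis_order == "lat,lon":
        lat, lon = pair
    elif axis_order == "lon,lat":
        lon, lat = pair
    else:
        raise ValueError(f"unknown axis order: {axis_order!r}")
    # A range check catches some swapped pairs, but by no means all of them.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"coordinates out of range: lat={lat}, lon={lon}")
    return (lon, lat)


# The same location, delivered in the two competing orders.
from_epsg_order = to_lon_lat((52.342346, 4.91305), "lat,lon")
from_common_order = to_lon_lat((4.91305, 52.342346), "lon,lat")
```

Note the limitation: for a location in the Netherlands both numbers fall between -90 and 90, so a swapped pair passes the range check and silently ends up somewhere else on the globe. That is why the order has to be stated rather than guessed.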

Secondly, and more importantly, geographic coordinate systems are subject to plate tectonics. The earth's crust consists of plates that move with respect to each other at speeds of several centimeters per year. WGS84 is a global reference frame that is not fixed to any single plate, which means that the coordinates of a location change as the plate it sits on moves. The WGS84 coordinates of a location in Europe shift a few centimeters per year. For serious applications and sustainable data, this is an undesirable effect, especially considering the increasing accuracy of location determination due to technological progress. That’s why Europe has agreed to use a different system: ETRS89, which is fixed to the stable part of the Eurasian plate. On a worldwide web, using different coordinate systems per continent is rather inconvenient, as it makes it more difficult to seamlessly combine data from different sources. Our earth’s dynamic crust will make it even more difficult for the working group to come up with general and easily applicable recommendations for interpreting the coordinate system of geographical geometry.
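To get a feel for the size of this effect, the arithmetic can be sketched in a few lines. The rate of 2.5 cm per year is an assumed round figure within the "few centimeters per year" mentioned above, not a measured value, and the function name is hypothetical.

```python
def accumulated_shift_m(rate_cm_per_year, years):
    """Return the accumulated coordinate shift in meters for a constant rate."""
    return rate_cm_per_year * years / 100.0


# Assumed rate of 2.5 cm/year, accumulated over 10 and 30 years.
shift_after_10_years = accumulated_shift_m(2.5, 10)
shift_after_30_years = accumulated_shift_m(2.5, 30)
```

With these assumptions the shift reaches a quarter of a meter after ten years and three quarters of a meter after thirty, which is well within what modern positioning equipment can measure.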

Metadata

We should not underestimate the importance of metadata: the data that describes the data itself. Publishing metadata alongside other data has multiple positive effects. For instance, the description can help a potential user determine immediately whether the data will be suitable for the intended use. Metadata can also provide clarity on matters such as how current the data is, its quality, and its conditions of use. Finally, metadata is very suitable for indexing and inclusion in data catalogs, making datasets easier to find. It’s nice to see that so much good work is being done in the field of metadata on the web. The need for sharing metadata does not just exist for spatial data; all types of data benefit from linked metadata. It is true, however, that until recently the web domain and the geography domain developed different standards for the provision of metadata. Unsurprisingly, the geography domain has mainly focused on geographic metadata. Which geographical area does the data relate to? Which coordinate systems are used? What is the spatial resolution of the data?

A promising way to unite these two types of metadata standards is now being developed in a European context: GeoDCAT-AP links the metadata specifications of INSPIRE (based on the specifications of the OGC) to the Data Catalog Vocabulary (DCAT) of the W3C. This makes the two types of metadata interoperable, and makes it easier to share geographic metadata on the web than before.
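What such combined metadata might look like can be sketched as a DCAT description in JSON-LD that also answers the geographic questions of area and coordinate system. This is a minimal sketch that has not been validated against the GeoDCAT-AP profile: the choice of properties (a dct:spatial location with a dcat:bbox, and dct:conformsTo pointing at the reference system) is an assumption about how the profile is typically applied, and the dataset URI, title, bounding box and function names are hypothetical.

```python
import json


def bbox_to_wkt(west, south, east, north):
    """Return a closed WKT polygon (longitude latitude order) for a bounding box."""
    corners = [(west, south), (east, south), (east, north),
               (west, north), (west, south)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in corners) + "))"


def dataset_metadata(uri, title, bbox, crs_uri):
    """Build a DCAT dataset description with spatial extent and reference system."""
    west, south, east, north = bbox
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "geosparql": "http://www.opengis.net/ont/geosparql#",
        },
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # Which geographical area does the data relate to?
        "dct:spatial": {
            "@type": "dct:Location",
            "dcat:bbox": {
                "@type": "geosparql:wktLiteral",
                "@value": bbox_to_wkt(west, south, east, north),
            },
        },
        # Which coordinate system is used? (assumed property choice)
        "dct:conformsTo": {"@id": crs_uri},
    }


# Hypothetical dataset; the bounding box is a rough, illustrative value.
metadata = dataset_metadata(
    uri="https://example.org/datasets/municipal-boundaries",
    title="Municipal boundaries (example)",
    bbox=(3.36, 50.75, 7.23, 53.56),
    crs_uri="http://www.opengis.net/def/crs/EPSG/0/4258",  # ETRS89
)
serialized = json.dumps(metadata, indent=2)
```

Because the description uses general web vocabularies, a generic data catalog can process it without knowing anything about geography, while a spatially aware client can still read the extent and the reference system.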

Keep sharing spatial data on the web

If you want to share spatial data on the web effectively, the topics mentioned above are only some of the many possible improvements, so the SDWWG has plenty of work left to do. Should the rest of the world wait patiently until they’re done? Of course not! It’s already possible to share spatial data on the web effectively. The methods used to do so can be changed at any time as new insights emerge, but make sure that the URIs keep working! Even now, we’re enjoying many advantages over more old-fashioned ways of data sharing. It’s also important for the work of the SDWWG that the world keeps working with spatial data in practice: observing and cooperating with this practice is an important source of inspiration for the working group. I wholeheartedly recommend that everyone enrich the web with spatial data as much as possible, and also enhance spatial data with the possibilities presented by the web.
