
Frans Knibbe

Don’t succumb to the dark side, bring your data to the surface

30 January 2017

Spatial data describing the location and shape of things are highly valuable. They are put to meaningful use everywhere in society. Innovation brings growing opportunities to share those data in a global data network, the world wide web. It is a place where spatial data can thrive, because they can be readily combined with other data, both spatial and non-spatial.

But the web has different grades of depth and illumination. Beyond its well-lit surface, where things are easily found and accessed, there are more obscure troughs where data can reside. Those data run the risk of withering unless they are turned towards the light. The path towards the light can be taken step by step. This blog post lists a few of those steps, after looking at the role of established practices in disclosing spatial data on the web.

The deep and dark web

The terms ‘deep web’ and ‘dark web’ are often conflated, but both refer to those parts of the web that cannot easily be interacted with: things that are hard to find and hard to use. In some cases, information is hidden away on purpose. That happens when it concerns illegal things, or things that governments would rather not have out in the open. More often, the difficulty in finding and using data is unintentional. That is unfortunate, especially when it happens to spatial data, because spatial data can do a wealth of good if properly made available on the web.

[Image: Darth Vader]

The path to the dark side of the web may beckon as easily passable, but the difficult path towards the light will likely be more rewarding in the end.

Having access to spatial data of high quality is crucial, be it in rapid response to disasters or incidents, in monitoring assets and infrastructure, in tracking things, in monitoring and predicting weather, climate and pollution, in locating business opportunities or in city planning. More and more, those essential spatial data are available in our single global data space, the world wide web, whether freely accessible or selectively available to authorized parties only. Although the web is a highly distributed system, it forms a single logical medium because it is based on common open standards. Knowing how the standards work is knowing how to find and use data on the web. Good standards for sharing data are agnostic about the type of data they are used for. Using general standards allows for interacting with data of any type (spatial, medical, financial, statistical, …), and it allows for combining them at will. Unfortunately, as in other domains, the supply of spatial data is often still based on domain standards, or lacks web-friendliness. Two symptomatic concepts are the SDI and the data portal.

SDI

In the context of spatial data, people talking about an SDI are not referring to the ‘Star Wars’ defence shield, but to Spatial Data Infrastructures. The concept of an SDI was a logical step after the era of solitary Geographic Information Systems (GIS). It allowed sharing of geographical data between different organisations, even between different nations. SDIs are typically based on standards like WMS, CSW, WFS and GML. When those standards were developed, they were based on popular general standards like XML and UML and had the web as a means of interacting with data firmly in mind: all the different services making up an SDI can be accessed by HTTP(S) queries.
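To make the last point concrete, here is a minimal Python sketch of what such an HTTP(S) query can look like for a WFS. It only builds the request URL, using the key-value parameters of WFS version 2.0.0. The endpoint `https://example.org/geoserver/wfs` and the feature type `demo:buildings` are made-up placeholders, not a real service.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, type_name, count=10):
    """Build a WFS 2.0.0 GetFeature request as a plain HTTP GET URL."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,  # WFS 2.0.0 parameter name (1.x uses typeName)
        "count": count,          # WFS 2.0.0 parameter name (1.x uses maxFeatures)
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and feature type, for illustration only.
url = wfs_getfeature_url("https://example.org/geoserver/wfs", "demo:buildings", count=5)
print(url)
```

Even in this small example, the caller has to know the service type, the version, the request name and the version-specific parameter names before getting any data back.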

 

[Image: space laser satellite defense system concept]

An artist’s concept of a Space Laser Satellite Defense System, part of the Strategic Defense Initiative (SDI), also known as ‘Star Wars’

But it is becoming clear that the successful concept of an SDI is showing its age. When it comes to the ability to crawl the data in order to index them and make them discoverable, SDIs are lacking. Also, the knowledge required to interact with an SDI is a substantial barrier for web developers who are not familiar with the family of geospatial standards that underpin an SDI. Getting to know and understand those standards takes serious effort. It is an effort that web developers who are used to simple REST APIs are ever more reluctant to make. Lastly, SDIs continue to act as data silos. While data exchange within an SDI might be arranged well enough, data exchange between SDIs is not easily feasible because SDIs tend to use their own information models.

Portals and wormholes

An idea that frequently goes hand in hand with the concept of an SDI is that of a data portal, a web page or website dedicated to giving access to and interacting with data. In the case of a data portal coupled with an SDI, the portal typically features a map area and allows browsing the available datasets, which can be displayed on the map as map layers.

Like the SDI concept, data portals were a great step forward from having to install and learn to use a GIS on a local machine. And like the SDI concept, data portals are gradually becoming outdated. Data portals give access to a predefined collection of datasets. In practice, demand for data will often encompass data outside a predefined collection. And needs for interaction with the data will often involve functionality beyond that offered by the portal. With techniques being available for accessing the whole data web at once, the need for data portals is replaced with the need for low-threshold access to all data everywhere, including spatial data.

The rather elaborate way of getting to the required data by means of a portal could be replaced by more direct access. If all datasets and all data contained therein are directly accessible by means of HTTP(S) URIs which allow direct linkage between everything, navigation on the data web can move from using portals to using wormholes.

[Image: spaceship traveling through a wormhole]

In science fiction, wormholes can be used to travel instantly between faraway places anywhere in the universe

New demand calls for new supply

Techniques for the exchange of spatial data like SDIs and portals were developed a while ago. In the meantime, the web has continued developing, and so expectations of what should be available have changed. Describing data on very different themes in general terms, using common web vocabularies, can go a long way in meeting those expectations. It gives increased opportunities to find different datasets somewhere and combine them, or mash them up. Going into the intricate details and background of the data is still possible, but it is no longer a necessity for every kind of usage.

In addition, people increasingly rely on search engines to find what they are looking for on the ever-expanding web. It is hard for specialized lists, catalogues and indexes to keep up with the continuously updated smartness of a search engine like Google. We have not (yet) reached the point where something that can’t readily be found by Google can be said not to exist. But it does mean that there is an increasingly sharp division between the murky deep web and the brightly visible things on the surface. And so a pressing question needs to be answered: on which side of the divide do you want your data to be?

Start over? Or continue developing?

So perhaps the ageing concepts of Spatial Data Infrastructures and data portals could do with an overhaul. Does that mean existing SDIs should be scrapped and replaced? Certainly not. That would be a very wasteful thing to do because SDIs have many lasting qualities.

For one thing, SDIs can offer a lot of much-needed semantic interoperability. In the process of setting up SDIs, participating organizations have aligned their different views to come up with shared information models. In INSPIRE, a major European SDI, a lot of effort has gone into developing shared ways of describing many different topics, the INSPIRE data themes. This kind of shared semantics is very valuable and can be a stepping stone towards global semantic interoperability.

In response to further development of common practice and techniques on the web, INSPIRE is in the process of adapting, by adding functionality to its existing capabilities. For example, the INSPIRE registry has been made available using the Resource Description Framework (RDF), and the specification for INSPIRE metadata has been mapped to the general DCAT standard. These are examples of how an existing SDI can continue to develop towards something that is better integrated with the web in general.
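As an illustration of what metadata in a general vocabulary can look like, the following Python sketch describes a dataset using DCAT and Dublin Core terms, serialized as JSON-LD. It is not the INSPIRE-to-DCAT mapping itself. The function name `dcat_dataset`, the dataset URI, the title and the download URL are all hypothetical.

```python
import json


def dcat_dataset(uri, title, description, download_url, media_type_uri):
    """Describe a dataset with DCAT and Dublin Core terms as a JSON-LD document."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {"@id": media_type_uri},
            }
        ],
    }


# All values below are made up for illustration.
doc = dcat_dataset(
    uri="https://data.example.org/datasets/monuments",
    title="Monuments",
    description="Point locations of listed monuments.",
    download_url="https://data.example.org/datasets/monuments.geojson",
    media_type_uri="https://www.iana.org/assignments/media-types/application/geo+json",
)
print(json.dumps(doc, indent=2))
```

Because the description uses general web vocabularies rather than a geospatial-only schema, a generic metadata harvester can read it without knowing anything about SDIs.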

Furthermore, to make the distributed web services that form an SDI possible, it is likely that harmonization of different ways to store and retrieve data has already taken place. New ways of exposing data on the web can be fitted onto those existing solutions. Providing access to data on the web does not have to be a choice between technologies; it can mean offering multiple avenues of approach.

Whatever methods you or your organization are using to publish or collaborate on spatial data on the web, here are some ways in which data can be turned towards the light:

  • Make sure all data are assigned HTTP(S) URIs.
  • Make sure URIs are persistent. Ideally they should function forever.
  • Be hospitable to search engine crawlers:
    • Describe a dataset with standardized metadata.
    • Structure datasets in such a way that allows recursive fetching of data, for example by partitioning the data in a hierarchy of subsets.
    • Advertise the dataset by URI on web pages or in data registries.
  • Offer simple REST APIs next to more elaborate querying mechanisms.
  • Support web-friendly data formats such as JSON.
  • Use common data types.
  • Use general semantics (next to domain-specific semantics, if so required).
  • Expose code lists and classifications on the web.
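A few of these steps can be sketched in a handful of lines of Python. The example below gives every record its own HTTP(S) URI and serializes the records as GeoJSON, a web-friendly JSON format. The base URI `https://data.example.org`, the URI pattern and the sample record are assumptions made for illustration. Keeping such URIs persistent is an organizational commitment that no code can guarantee.

```python
import json

BASE = "https://data.example.org"  # hypothetical base URI


def feature_uri(dataset, feature_id):
    """Build the URI of a single feature from a fixed, predictable pattern."""
    return f"{BASE}/{dataset}/items/{feature_id}"


def to_geojson_feature(dataset, row):
    """Turn one record into a GeoJSON Feature identified by its URI."""
    return {
        "type": "Feature",
        "id": feature_uri(dataset, row["id"]),
        # GeoJSON coordinate order is longitude, latitude.
        "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
        "properties": {"name": row["name"]},
    }


def to_feature_collection(dataset, rows):
    """Wrap all records of a dataset in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [to_geojson_feature(dataset, row) for row in rows],
    }


# Made-up sample record.
rows = [{"id": 42, "name": "Dom Tower", "lon": 5.1214, "lat": 52.0907}]
collection = to_feature_collection("monuments", rows)
print(json.dumps(collection, indent=2))
```

Because every feature carries its own URI, other data on the web can link to it directly, and a crawler that finds the collection can follow each identifier to the individual item.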

Some of these things may be easier to do than others, but nothing requires having a high midi-chlorian count. And help is available. The web is full of advice on how to go about making data more usable. And I know of a friendly spatial data company that is willing to help too 🙂