What linked data and the semantic web mean for search

September 27, 2014 Opinion piece

Until now, most information on the web has consisted of documents intended to be read by people rather than machines.

Linked data refers to publishing data on the web so that it can be read by computers, linked to other data and have other data link to it. The data is expressed in a way that enables its meaning to be understood by the machines reading it.
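
To make the idea concrete, the short Python sketch below models linked data as subject-predicate-object triples, the basic shape of RDF. The product and offer URIs, the two publishers and the values are all hypothetical; only the schema.org property URIs are real vocabulary. It illustrates the point in the paragraph above: the retailer's offer points at the brand owner's product URI, so once the two independently published sets are merged, a program can follow that link in either direction.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


PRODUCT = "https://example.org/product/5012345678900"  # hypothetical URI
OFFER = "https://shop.example.com/offer/1"              # hypothetical URI

# Published by a hypothetical brand owner.
brand_data = {
    Triple(PRODUCT, "https://schema.org/name", "Example Doll"),
    Triple(PRODUCT, "https://schema.org/manufacturer",
           "https://example.org/org/acme"),
}

# Published independently by a hypothetical retailer. It links to the brand
# owner's data simply by using the same product URI as an object.
retailer_data = {
    Triple(OFFER, "https://schema.org/itemOffered", PRODUCT),
    Triple(OFFER, "https://schema.org/price", "19.99"),
}


def merge(*graphs: set) -> set:
    """Merge triple sets: a plain union, so shared URIs join the data."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: set, subject: str) -> dict:
    """Map each predicate asserted about subject to its object.

    Keeps only one object per predicate, which is enough for this sketch.
    """
    return {t.predicate: t.obj for t in graph if t.subject == subject}


def linked_to(graph: set, target: str) -> set:
    """Return the subjects of triples whose object is target (incoming links)."""
    return {t.subject for t in graph if t.obj == target}


if __name__ == "__main__":
    graph = merge(brand_data, retailer_data)
    print(describe(graph, PRODUCT)["https://schema.org/name"])
    for offer in linked_to(graph, PRODUCT):
        print(offer, describe(graph, offer)["https://schema.org/price"])
```

No bespoke mapping between the two datasets is needed; reusing the identifier is what makes the link.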

Better searching then becomes possible. By understanding meaning (ie semantics) rather than merely matching keywords, which can surface documents of no relevance at all, and by linking data from different sites and drawing inferences from it, searches can return more relevant and richer information, presented in a more useful way. They could even trigger automatic actions such as placing an order and authorising payment.

But while formats such as RDFa and JSON-LD, which provide ways to mark up data, and SPARQL, the language for querying RDF data, have been developed and standardised as the foundation of the semantic web, actually establishing the meaning of the terms used to assert facts is difficult and requires domain expertise. This is the current focus of GS1’s work in line with the W3C’s semantic web initiative.
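
As a minimal sketch of what marking up data with JSON-LD looks like, the Python below builds a product description using real schema.org terms (Product, gtin13, Brand, Offer, price, priceCurrency, availability) and wraps it in the script element that web pages use to embed JSON-LD. The product name, brand, GTIN and price are invented for illustration, and a real page would carry whatever properties its publisher chooses.

```python
import json


def product_jsonld(name: str, gtin13: str, brand: str,
                   price: str, currency: str, in_stock: bool) -> dict:
    """Build a JSON-LD description of a product using schema.org terms."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "gtin13": gtin13,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            # schema.org ItemAvailability values are themselves URIs.
            "availability": ("https://schema.org/InStock" if in_stock
                             else "https://schema.org/OutOfStock"),
        },
    }


def as_script_tag(doc: dict) -> str:
    """Wrap JSON-LD in the script element used to embed it in an HTML page."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(doc, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    doc = product_jsonld("Example Doll", "5012345678900", "ExampleBrand",
                         "19.99", "GBP", in_stock=True)
    print(as_script_tag(doc))
```

Because the `@context` maps each short key to a full schema.org URI, a machine reading the page can tell that `gtin13` means the same thing on every site that uses it.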

The meaning of human terms is revealed to machines through “ontologies”. An ontology is a dictionary annotated with how the things it defines relate to each other: it provides logical statements (like a simple model), not just definitions. For example, an ontology might specify that “companies have addresses” and “people have names”. Unfortunately, as can easily be imagined, creating and maintaining an ontology is not trivial.
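
The kind of logic an ontology adds can be sketched with two rules borrowed from RDF Schema (RDFS): a property’s domain implies the type of anything that uses the property, and class membership propagates up to superclasses. The `ex:` terms and data are hypothetical; `rdf:type`, `rdfs:domain` and `rdfs:subClassOf` are real RDF/RDFS terms, but this loop is a toy forward-chainer, not a conformant RDFS reasoner.

```python
RDF_TYPE = "rdf:type"
DOMAIN = "rdfs:domain"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical ontology statements mirroring the examples in the text.
ontology = {
    ("ex:hasAddress", DOMAIN, "ex:Company"),  # companies have addresses
    ("ex:hasName", DOMAIN, "ex:Person"),      # people have names
    ("ex:Retailer", SUBCLASS, "ex:Company"),  # every retailer is a company
}

# Hypothetical instance data.
data = {
    ("ex:shop42", RDF_TYPE, "ex:Retailer"),
    ("ex:acme", "ex:hasAddress", "1 High Street"),
}


def infer(triples):
    """Apply two RDFS-style rules until no new triples can be derived."""
    closed = set(triples)
    while True:
        derived = set()
        for s, p, o in closed:
            for term, rel, cls in closed:
                # Domain rule: (s p o) + (p rdfs:domain C) => (s rdf:type C)
                if rel == DOMAIN and term == p:
                    derived.add((s, RDF_TYPE, cls))
                # Subclass rule: (s rdf:type C) + (C rdfs:subClassOf D)
                #                => (s rdf:type D)
                if p == RDF_TYPE and rel == SUBCLASS and term == o:
                    derived.add((s, RDF_TYPE, cls))
        if derived <= closed:
            return closed
        closed |= derived


if __name__ == "__main__":
    # Print only the newly inferred triples.
    for triple in sorted(infer(ontology | data) - ontology - data):
        print(triple)
```

Run as a script, it prints two derived facts: both `ex:acme` and `ex:shop42` are companies, although neither was stated directly. Note that RDFS uses a domain declaration to infer types rather than to reject data, so anything with an `ex:hasAddress` is concluded to be a company.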

For those of you unfamiliar with what GS1 actually does, we are a not-for-profit standards development organisation. Put simply, our role is to define data structures and how they are used to identify things, a role we have performed since the 1970s. We provide a series of ‘keys’ for industry which identify various types of entity (products, locations, assets and so on) and which have highly developed allocation rules. We have also defined product attributes for bar coding (the Application Identifier standards), more than 1,000 product attributes for synchronisation in the Global Data Synchronisation Network, and an extensive Global Product Classification used to categorise products. For visibility systems, our standard “Core Business Vocabulary” defines values that describe the context (where, what, when, why) of “event data” exchanged via the open standard known as Electronic Product Code Information Services (EPCIS).

The definitions have been developed through a collaborative process, the Global Standards Management Process (GSMP), involving thousands of companies from around the world that use the GS1 standards. This established data dictionary is already in operation across multiple industries, which suggests that the GS1 ontology is very well suited to describing, for the purposes of the semantic web, the products and locations identified by GS1 keys.

So what does all this actually mean for the end customer? Giving computers the means to connect a vast range of datasets from any number of sources will greatly enhance general search results, linking disparate, in-depth product information with additional content such as peer reviews and where the product can be sourced (ie is in stock) near the user’s location. Transactions could potentially be automated by linking to the user’s preferences and financial data to place the order and activate payment.

To help illustrate what this looks like in practice, consider an example: a user types ‘Barbie’ into a search engine. Currently, the term is treated simply as a keyword, which may return paid-for results as well as sites whose SEO work on that keyword has pushed them high up the rankings.

With the semantic web, the term ‘Barbie’ could be identified as a toy, with ontologies telling the search agent that toys have specific attributes, such as dimensions, age range and country of manufacture, and with this data then linked in from other sources. The agent could also make other associations through linked data, such as retail stores near the user where the product is currently available at a particular price.
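
The Barbie scenario can be sketched as a query over linked triples. The Python below is a tiny matcher in the spirit of a SPARQL basic graph pattern (terms starting with `?` are variables, and solutions are joined pattern by pattern); it is not a SPARQL implementation. The graph, the `ex:` properties and the store data are all hypothetical, and the user’s location is reduced to a city name for simplicity.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical linked data gathered from several sources.
graph: Set[Triple] = {
    ("ex:doll1", "rdf:type", "ex:Toy"),
    ("ex:doll1", "ex:name", "Barbie"),
    ("ex:doll1", "ex:ageRange", "3+"),
    ("ex:offer1", "ex:itemOffered", "ex:doll1"),
    ("ex:offer1", "ex:seller", "ex:storeA"),
    ("ex:offer1", "ex:price", "19.99"),
    ("ex:storeA", "ex:city", "Leeds"),
    ("ex:offer2", "ex:itemOffered", "ex:doll1"),
    ("ex:offer2", "ex:seller", "ex:storeB"),
    ("ex:offer2", "ex:price", "18.50"),
    ("ex:storeB", "ex:city", "Bristol"),
}


def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must agree with any value it is already bound to.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def query(patterns: List[Triple], graph: Set[Triple]) -> List[Dict[str, str]]:
    """Return every variable binding that satisfies all patterns (a join)."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


if __name__ == "__main__":
    # Roughly: SELECT ?store ?price WHERE { ?toy a ex:Toy ; ex:name "Barbie" .
    #   ?offer ex:itemOffered ?toy ; ex:seller ?store ; ex:price ?price .
    #   ?store ex:city "Leeds" }
    rows = query([
        ("?toy", "rdf:type", "ex:Toy"),
        ("?toy", "ex:name", "Barbie"),
        ("?offer", "ex:itemOffered", "?toy"),
        ("?offer", "ex:seller", "?store"),
        ("?offer", "ex:price", "?price"),
        ("?store", "ex:city", "Leeds"),
    ], graph)
    for row in rows:
        print(row["?store"], row["?price"])
```

The answer combines facts that could come from different publishers (the toy’s type and name, the retailer’s offer, the store’s location) purely because they share identifiers.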

GS1 also has another potential role in the semantic web. On the web (of documents or of data) anyone can say anything they like about anything. How can anyone know who to trust? GS1 may look to operate a lightweight lookup service that provides the definitive correspondence between a GS1 standard product identifier or location identifier and the registered domain name of the brand owner or organisation. This would provide a degree of trust about whether the organisation providing the data is the one responsible for the product or location to which the data relates.
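
A lookup service of this kind might work roughly as follows. The sketch assumes, hypothetically, that the registry is keyed by GS1 Company Prefix (the variable-length leading part of a GTIN) rather than by each individual product identifier, and that a publisher counts as authoritative if its URL’s host is the registered domain or a subdomain of it. The registry entries and domains are invented. The check-digit test uses the standard GS1 mod-10 calculation for a GTIN-13.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical registry: GS1 Company Prefix -> brand owner's registered domain.
PREFIX_REGISTRY = {
    "5012345": "brand-owner.example",
    "50099999": "other-brand.example",
}


def gtin13_is_valid(gtin: str) -> bool:
    """Check the length and the GS1 mod-10 check digit of a GTIN-13."""
    if len(gtin) != 13 or not (gtin.isascii() and gtin.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3 ... from the left over the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(gtin[:12]))
    return (10 - total % 10) % 10 == int(gtin[12])


def registered_domain(gtin: str) -> Optional[str]:
    """Return the domain for the longest registered prefix of a valid GTIN."""
    if not gtin13_is_valid(gtin):
        return None
    # Company prefixes vary in length, so try the longest candidates first.
    for length in range(len(gtin) - 1, 0, -1):
        domain = PREFIX_REGISTRY.get(gtin[:length])
        if domain is not None:
            return domain
    return None


def is_authoritative(gtin: str, source_url: str) -> bool:
    """True if data about gtin is served from the brand owner's domain."""
    domain = registered_domain(gtin)
    host = urlsplit(source_url).hostname or ""
    return domain is not None and (host == domain
                                   or host.endswith("." + domain))


if __name__ == "__main__":
    print(is_authoritative("5012345678900", "https://www.brand-owner.example/p/1"))
    print(is_authoritative("5012345678900", "https://reseller.example/p/1"))
```

Data from a reseller is not rejected by such a check; it simply carries less weight than data the registry confirms as coming from the organisation responsible for the product.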

Tim Berners-Lee said the semantic web will allow “searching of the Web as though it were one giant database, rather than one giant book.” The emerging GS1 ontology is intended to support the evolution of the web to the “giant database” it will surely become.

Andrew Osborne, CTO of GS1 UK

This article first appeared on Essential Retail on 28th July 2014.
