Linked Data promises to derive additional value from existing data across many sectors, but practitioners currently lack both a straightforward methodology and the tools to experiment with it. This article gives a pragmatic overview of how general-purpose Interactive Data Transformation tools (IDTs) can be used to perform the two essential steps for bringing data into the Linked Data cloud: data cleaning and reconciliation. These steps are explained using freely available data (Cooper-Hewitt National Design Museum, New York) and tools (Google Refine), making the process repeatable and understandable for practitioners.
Keywords: Linked Open Data (LOD), Data cleaning, Atomization, Clustering, Data reconciliation
Author: Van de Walle, Rik
Publisher: ISQ (Information Standards Quarterly)
Date created: 2012-05-01
Time required: P10M
Educational use: instruction
Educational audience: professional
Interactivity type: expositive
- Cleans a dataset by finding and correcting errors and by removing duplicates and unwanted data.
- Knows the "five stars" of Open Data: put data on the Web, preferably in a structured and non-proprietary format, use URIs to name things, and link to other data.
- Uses available resources for named entity recognition, extraction, and reconciliation.
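
The cleaning outcome above, together with the atomization and clustering keywords, can be illustrated with a minimal Python sketch. It is loosely modeled on the key-collision ("fingerprint") style of clustering that tools such as Google Refine offer, but it is a simplification and not that tool's implementation. The function names, the `|` separator, and the sample cells are all assumptions made for illustration; none of them come from the article or from the Cooper-Hewitt data.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified key: ignores case, accents, punctuation and token order."""
    # Decompose accented characters, then drop everything that is not ASCII.
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Remove punctuation, then sort the unique whitespace-separated tokens.
    no_punct = ascii_only.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def atomize(cell, separator="|"):
    """Split a multi-valued cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(separator) if part.strip()]


def cluster(values):
    """Group spellings that share a fingerprint; keep groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample cells (assumption: values are separated by "|").
cells = ["Textiles | Drawings", "drawings|textiles.", "Décor", "decor "]
values = [value for cell in cells for value in atomize(cell)]
clusters = cluster(values)

for key, variants in clusters.items():
    print(key, "<-", variants)
```

Each resulting cluster is only a candidate for merging: a person still decides which spelling to keep, which mirrors the interactive review step in an IDT.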