A brief tutorial explaining how to enrich a dataset even when many fields (notably the description) contain unstructured text. To capture this potentially interesting information in a machine-processable format, named-entity recognition can be applied via an extension to Open Refine (formerly Google Refine). This tutorial builds on two previous ones, which explain how to clean an example dataset (from the Powerhouse Museum) and reconcile it with a specific controlled vocabulary (in the example, the Library of Congress Subject Headings). The textual walk-through demonstrates how to install the extension, open a new project, and extract named entities.
Keywords: Open Refine, Google Refine, Named-entity extraction
Author: Verborgh, Ruben
Date created: 2016-01-01 05:00:00.000
Time required: PT20M
Interactivity type: active