A brief tutorial, with both a screencast and written instructions, on cleaning an example dataset from the Powerhouse Museum using OpenRefine (formerly Google Refine). The walk-through covers the following steps: 1) Loading the data; 2) Inspecting the data; 3) Removing blank rows; 4) Removing duplicate rows; 5) Splitting cells with multiple values; 6) Removing blank cells; 7) Clustering values; 8) Removing duplicate category values. Links to the sample data files and to the tool itself are provided.
Keywords: OpenRefine, Google Refine, General Refine Expression Language (GREL), Data cleaning
Author: Verborgh, Ruben
Date created: 2016-01-01 05:00:00.000
Time required: PT1H
Interactivity type: active