Linked Data design principles are increasingly employed to publish and consume high-fidelity, heterogeneous statistical datasets in a distributed fashion. While vast amounts of linked statistics are available, accessing and reusing the data requires expertise in the corresponding technologies. There are no user-centered interfaces for researchers, journalists and interested people to compare statistical data retrieved from different sources on the Web. Because the RDF Data Cube vocabulary is used to describe statistical data, statistical data artifacts can be discovered and identified in a uniform way. This article presents the design and implementation of a user-centric application and service. Behind the scenes, the platform uses federated SPARQL queries to gather statistical data from distributed data stores. The R language for statistical computing is employed to perform statistical analyses and visualizations. The Shiny application and server bridge the front-end Web user interface with R on the server side in order to compare statistical macrodata, and store analysis results in RDF for future research. As a result, distributed linked statistics with accompanying provenance data can be more easily explored and analysed by interested parties.
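
To illustrate the federated SPARQL queries mentioned above, the following is a minimal sketch, not the platform's actual query. It assumes two hypothetical endpoints and dataset URIs under example.org, and further assumes that both datasets describe observations with qb:Observation, sdmx-dimension:refArea and sdmx-measure:obsValue, and use the same URIs for reference areas so that the results can be joined. The function name build_federated_query is likewise illustrative.

```python
# Prefixes for the RDF Data Cube vocabulary and the SDMX-RDF terms used below.
PREFIXES = (
    "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
    "PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>\n"
    "PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>\n"
)


def build_federated_query(sources):
    """Build a SPARQL 1.1 query that joins observations on ?refArea.

    `sources` is a list of (endpoint_url, dataset_uri) pairs. Each pair
    becomes one SERVICE block, so the query engine fetches each dataset
    from its own endpoint and joins the rows on the shared ?refArea.
    """
    blocks = []
    for i, (endpoint, dataset) in enumerate(sources):
        blocks.append(
            f"  SERVICE <{endpoint}> {{\n"
            f"    ?obs{i} a qb:Observation ;\n"
            f"      qb:dataSet <{dataset}> ;\n"
            f"      sdmx-dimension:refArea ?refArea ;\n"
            f"      sdmx-measure:obsValue ?value{i} .\n"
            f"  }}"
        )
    variables = " ".join(f"?value{i}" for i in range(len(sources)))
    body = "\n".join(blocks)
    return f"{PREFIXES}SELECT ?refArea {variables}\nWHERE {{\n{body}\n}}"


# Hypothetical endpoints and datasets, for illustration only.
SOURCES = [
    ("http://example.org/a/sparql", "http://example.org/a/dataset/population"),
    ("http://example.org/b/sparql", "http://example.org/b/dataset/mortality"),
]

query = build_federated_query(SOURCES)
print(query)
```

The resulting query string could be sent to any SPARQL 1.1 endpoint that supports the SERVICE keyword. Each row of the result would then pair one value from each dataset for the same reference area, which is the shape a statistical comparison in R would need.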

URL: http://csarven.ca/linked-statistical-data-analysis
Keywords: Data analysis, R (programming language), Shiny server, Apache Jena, Federated queries
Author: Riedl, Reinhard
Publisher: CEUR (Central Europe Workshop Proceedings)
Date created: 2013-07-07 04:00:00.000
Language: http://id.loc.gov/vocabulary/iso639-2/eng
Time required: P1H
Educational use: professionalDevelopment
Educational audience: professional
Interactivity type: expositive

  • Competencies