Introduction

This tutorial was created both to highlight the potential uses of the WorldCat dataset and to address specific portions of the LD4PE Competency Index. Early sections address topics related to “Creating RDF data” and “Storing RDF Data”. Walkthroughs of simple SPARQL queries introduce the broad topic of “Querying RDF Data” by teaching basic SPARQL syntax and how to interpret result sets. Finally, a series of exercises prompts the user to write task-specific queries, with answers provided. These exercises feature the more advanced SPARQL functions and operators that make up the competencies and benchmarks under “Querying RDF Data”.

There are a great number of SPARQL tutorials on the Web, but the majority of them make at least one of the following two assumptions, which do not always hold true in real-life cases:

  1. That the dataset the user wants to query is relatively small (e.g. “toy” examples)
  2. That if the user is querying a massive database (e.g., DBpedia), a SPARQL endpoint will be provided.

What do users do when they discover that their dataset is available in an RDF format, but as a data dump, and that it contains over twenty million triples? The WorldCat Dataset is just such a case.

There are many different tools available for storing and querying RDF data, and the right one for the job depends on how the data will ultimately be used. This tutorial represents only one possible solution; its primary intention is to allow the user to retrieve the WorldCat Dataset and start exploring it as quickly as possible and, hopefully, in a way that is simple enough for novice users.

Accessing the Dataset

Let’s say that a colleague has given you a link to a dataset: http://purl.org/dataset/WorldCat/LibraryScienceSubset.

Download detailed introductory information (PDF, 274KB)

Storing the Data

Before you can start querying the data, you need to load it into a triple store for persistent storage. The tool used in this tutorial is Apache Jena’s TDB.

Download instructions for storing the dataset (PDF, 115KB)

Querying the Data

When faced with a new and unfamiliar dataset, it is helpful to start by getting a sense of the classes and properties being used to describe the data. Without this knowledge, writing queries is virtually impossible. The following Exploratory Queries are simple, reusable, and can quickly give you an idea of what a dataset is all about.
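As a rough illustration of what such exploratory queries can look like, the sketch below holds two standard SPARQL queries as strings, one listing every class and one listing every property. The Python functions underneath only mimic what those queries return, using a handful of made-up triples, so that the logic can be checked without a triple store. The example.org resources and their values are assumptions for illustration and are not taken from the WorldCat dataset.

```python
# SPARQL text that could be run against the loaded store (standard SPARQL 1.1).
LIST_CLASSES = """
SELECT DISTINCT ?class
WHERE { ?s a ?class }
ORDER BY ?class
"""

LIST_PROPERTIES = """
SELECT DISTINCT ?property
WHERE { ?s ?property ?o }
ORDER BY ?property
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Made-up (subject, predicate, object) triples standing in for the real data.
TRIPLES = [
    ("http://example.org/book1", RDF_TYPE, "http://schema.org/Book"),
    ("http://example.org/book1", "http://schema.org/inLanguage", "fr"),
    ("http://example.org/book1", "http://schema.org/name", "Exemple"),
    ("http://example.org/person1", RDF_TYPE, "http://schema.org/Person"),
]


def distinct_classes(triples):
    """Mimic LIST_CLASSES: every distinct object of rdf:type, sorted."""
    return sorted({o for (_s, p, o) in triples if p == RDF_TYPE})


def distinct_properties(triples):
    """Mimic LIST_PROPERTIES: every distinct predicate, sorted."""
    return sorted({p for (_s, p, _o) in triples})


print(distinct_classes(TRIPLES))
print(distinct_properties(TRIPLES))
```

Saving the two result sets is worthwhile, since later queries are written by picking classes and properties out of them.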

Download exercises for exploratory queries (PDF, 174KB)

Simple Queries

The following sections contain walkthroughs of some simple queries, with both the logic and syntax of each broken down – a starting point for users new to the SPARQL query language.

Simple Query 1: Union and Shared Subjects

Start with this query: What languages are represented in this dataset?

To write this query, you need to determine one vital piece of information – what property is used when describing a resource’s language? Fortunately, you already know all the classes and properties used in the dataset, thanks to the Exploratory Queries you just ran. If you skim through the result set you saved, you see that there are two likely candidates: “http://purl.org/dc/terms/language” and “http://schema.org/inLanguage”.

To determine which property you should use in future queries to get the best results, you can first write a fairly simple query which will give you an idea of how the dataset’s creators used these properties.
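One way such a comparison could be written is sketched here. The first SPARQL string uses UNION to collect language values from either property; the second reuses the same ?s variable in both patterns, so it matches only subjects that carry both properties. These are not necessarily the queries used in the exercises. The Python functions mimic both queries over a few made-up triples; the ex: subjects and the language codes are assumptions for illustration.

```python
UNION_QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema:  <http://schema.org/>

SELECT DISTINCT ?language
WHERE {
  { ?s dcterms:language ?language }
  UNION
  { ?s schema:inLanguage ?language }
}
ORDER BY ?language
"""

SHARED_SUBJECT_QUERY = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX schema:  <http://schema.org/>

SELECT DISTINCT ?s
WHERE {
  ?s dcterms:language  ?dcLanguage ;
     schema:inLanguage ?schemaLanguage .
}
ORDER BY ?s
"""

DC_LANGUAGE = "http://purl.org/dc/terms/language"
SCHEMA_IN_LANGUAGE = "http://schema.org/inLanguage"

# Made-up triples: ex:b uses both properties, ex:a and ex:c use one each.
TRIPLES = [
    ("ex:a", SCHEMA_IN_LANGUAGE, "fr"),
    ("ex:b", SCHEMA_IN_LANGUAGE, "en"),
    ("ex:b", DC_LANGUAGE, "en"),
    ("ex:c", DC_LANGUAGE, "de"),
]


def union_of_values(triples):
    """Mimic UNION_QUERY: distinct values reached through either property."""
    wanted = (DC_LANGUAGE, SCHEMA_IN_LANGUAGE)
    return sorted({o for (_s, p, o) in triples if p in wanted})


def shared_subjects(triples):
    """Mimic SHARED_SUBJECT_QUERY: subjects described with both properties."""
    with_dc = {s for (s, p, _o) in triples if p == DC_LANGUAGE}
    with_schema = {s for (s, p, _o) in triples if p == SCHEMA_IN_LANGUAGE}
    return sorted(with_dc & with_schema)


print(union_of_values(TRIPLES))
print(shared_subjects(TRIPLES))
```

If most subjects turn up in the shared-subject result, the two properties are largely redundant and either could be used; if not, the union form is the safer choice.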

Download exercises for Simple Query 1 (PDF, 149KB)

Simple Query 2: Optional and Turning an Object into a Subject

Now that we know which properties are used to describe languages and all the possible language codes, let’s write a more specific query.  Let’s limit the type of Creative Works we are looking for to Books, and the language they are written in to French. All we need to do is string together a few triple statements.
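A minimal sketch of how these triple patterns might be strung together follows. It assumes that books are typed as schema:Book, that French is recorded as the plain literal "fr", and that titles and authors use schema:name and schema:author; the real dataset may use different values, so check them against your exploratory results. Inside the OPTIONAL block, ?author appears first as an object and then as a subject. The Python function mimics the query over made-up triples.

```python
FRENCH_BOOKS_QUERY = """
PREFIX schema: <http://schema.org/>

SELECT ?book ?title ?authorName
WHERE {
  ?book a schema:Book ;
        schema:inLanguage "fr" ;
        schema:name ?title .
  OPTIONAL {
    ?book   schema:author ?author .      # ?author is an object here
    ?author schema:name   ?authorName .  # and a subject here
  }
}
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BOOK = "http://schema.org/Book"
IN_LANGUAGE = "http://schema.org/inLanguage"
NAME = "http://schema.org/name"
AUTHOR = "http://schema.org/author"

# Made-up triples; titles and names are invented for illustration.
TRIPLES = [
    ("ex:book1", RDF_TYPE, BOOK),
    ("ex:book1", IN_LANGUAGE, "fr"),
    ("ex:book1", NAME, "Livre Un"),
    ("ex:book1", AUTHOR, "ex:person1"),
    ("ex:person1", NAME, "Auteur Exemple"),
    ("ex:book2", RDF_TYPE, BOOK),
    ("ex:book2", IN_LANGUAGE, "fr"),
    ("ex:book2", NAME, "Livre Deux"),
    ("ex:book3", RDF_TYPE, BOOK),
    ("ex:book3", IN_LANGUAGE, "en"),
    ("ex:book3", NAME, "Book Three"),
]


def french_books(triples):
    """Mimic FRENCH_BOOKS_QUERY, using None for an unbound ?authorName."""
    rows = []
    books = sorted({s for (s, p, o) in triples if p == RDF_TYPE and o == BOOK})
    for book in books:
        if (book, IN_LANGUAGE, "fr") not in triples:
            continue
        titles = [o for (s, p, o) in triples if s == book and p == NAME]
        authors = [o for (s, p, o) in triples if s == book and p == AUTHOR]
        # The object of schema:author becomes the subject of schema:name.
        author_names = [o for (s, p, o) in triples if s in authors and p == NAME]
        for title in titles:
            # OPTIONAL keeps the book even when no author name is found.
            for author_name in author_names or [None]:
                rows.append((book, title, author_name))
    return rows


for row in french_books(TRIPLES):
    print(row)
```

Without OPTIONAL, the second book would drop out of the results because it has no author triple.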

Download exercises for Simple Query 2 (PDF, 203KB)

Simple Query 3: Negation Using Not Exists and Minus

What if, on the other hand, we had wanted to write a query specifically to get the names of French books which were not translations of works in other languages (i.e., works originally written in French)? There are actually two ways to do this, both of which fall under the broader topic of NEGATION.
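The two forms of negation could look roughly like the sketch below. It assumes that translations are flagged with a property such as schema:translationOfWork and that French is recorded as the literal "fr"; the property actually used in the dataset may differ, so treat both as placeholders. The Python function mimics the shared result of both queries over made-up triples.

```python
NOT_EXISTS_QUERY = """
PREFIX schema: <http://schema.org/>

SELECT ?name
WHERE {
  ?book a schema:Book ;
        schema:inLanguage "fr" ;
        schema:name ?name .
  FILTER NOT EXISTS { ?book schema:translationOfWork ?original }
}
"""

MINUS_QUERY = """
PREFIX schema: <http://schema.org/>

SELECT ?name
WHERE {
  ?book a schema:Book ;
        schema:inLanguage "fr" ;
        schema:name ?name .
  MINUS { ?book schema:translationOfWork ?original }
}
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BOOK = "http://schema.org/Book"
IN_LANGUAGE = "http://schema.org/inLanguage"
NAME = "http://schema.org/name"
# Assumed property for marking a translation; verify against the dataset.
TRANSLATION_OF = "http://schema.org/translationOfWork"

# Made-up triples; only ex:book2 is marked as a translation.
TRIPLES = [
    ("ex:book1", RDF_TYPE, BOOK),
    ("ex:book1", IN_LANGUAGE, "fr"),
    ("ex:book1", NAME, "Livre Original"),
    ("ex:book2", RDF_TYPE, BOOK),
    ("ex:book2", IN_LANGUAGE, "fr"),
    ("ex:book2", NAME, "Livre Traduit"),
    ("ex:book2", TRANSLATION_OF, "ex:work9"),
    ("ex:book3", RDF_TYPE, BOOK),
    ("ex:book3", IN_LANGUAGE, "en"),
    ("ex:book3", NAME, "English Book"),
]


def french_originals(triples):
    """Mimic both queries: names of French books with no translation triple."""
    books = {s for (s, p, o) in triples if p == RDF_TYPE and o == BOOK}
    french = {s for (s, p, o) in triples if p == IN_LANGUAGE and o == "fr"}
    translations = {s for (s, p, _o) in triples if p == TRANSLATION_OF}
    keep = (books & french) - translations
    return sorted(o for (s, p, o) in triples if p == NAME and s in keep)


print(french_originals(TRIPLES))
```

Here both forms give the same answer because the negated pattern shares the ?book variable with the rest of the query. MINUS removes nothing when the two sides share no variable, whereas FILTER NOT EXISTS still tests the pattern.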

Download exercises for Simple Query 3 (PDF, 118KB)

Additional SPARQL Exercises

You are now ready to try writing some queries on your own.  The following section contains prompts (questions) and example queries which accomplish each task (answers).

Download SPARQL exercises (PDF, 128KB)

Download exercise answer walkthrough (PDF, 169KB)