This slide presentation covers the whole spectrum of Linked Data production and exposure. It begins with a grounding in Linked Data principles and best practices, with special emphasis on the VoID vocabulary. It then covers R2RML (for mapping relational databases to RDF), OpenRefine (for cleaning and converting spreadsheets), and GATECloud (for extracting data from natural-language text). Finally, the presentation describes ways to increase interlinking between datasets, focusing on link-discovery tools such as Silk. NOTE: These slides combine material from several lessons that made up a larger "module" of the EUCLID Project. As such, they cover a wider range of topics than most single-lesson resources.
Keywords: Linked Open Vocabularies (LOV), Vocabulary of Interlinked Datasets (VoID), Simple Knowledge Organization System (SKOS), Validation, Link Discovery, Data extraction
Author: Norton, Barry
Publisher: EUCLID Project
Date created: 2013-04-27 07:00:00.000
Time required: PT2H
- Knows portals and registries for finding RDF-based vocabularies.
- Cleans a dataset by finding and correcting errors, removing duplicates and unwanted data.
- Knows methods for generating RDF data from tabular data in formats such as Comma-Separated Values (CSV).
- Knows methods such as Direct Mapping of Relational Data to RDF (2012) for transforming data from the relational model (keys, values, rows, columns, tables) into RDF graphs.
- Knows Simple Knowledge Organization System, or SKOS (2009), an RDF vocabulary for expressing concepts that are labeled in natural languages, organized into informal hierarchies, and aggregated into concept schemes.
- Knows the "five stars" of Open Data: put data on the Web under an open license, in a structured (machine-readable) form, in a non-proprietary format, using URIs to name things, and linked to other data.
- Recognizes that owl:sameAs, while popular as a mapping property, has strong formal semantics that can entail unintended inferences.
- Understands that the properties of hierarchical subsumption within an RDF vocabulary — rdfs:subPropertyOf and rdfs:subClassOf — can also be used to express mappings between vocabularies.
- Registers datasets with relevant services for discovery.
- Uses available vocabularies for dataset description to support their discovery.
- Understands that to be "dereferenceable", a URI should be usable to retrieve a representation of the resource it identifies.
- Understands that to be "persistent", a URI must have a stable, well-documented meaning and be plausibly intended to identify a given resource in perpetuity.
- Understands trade-offs between "opaque" URIs and URIs that embed version numbers, server names, dates, application-specific file extensions, query strings, or other context that may become obsolete.
- Uses available resources for named entity recognition, extraction, and reconciliation.
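The Direct Mapping and CSV items in the list above can be illustrated together. The following is a minimal sketch, assuming a CSV file with a header row stands in for a relational table and that its primary-key column is known. It follows the Direct Mapping's IRI conventions for row nodes (`<Table/key=value>`), classes (`<Table>`) and column properties (`<Table#column>`). The base IRI `http://example.org/db/` and the function names are hypothetical, and the sketch simplifies the specification: every value becomes an untyped string literal, empty cells are treated as NULL, and foreign keys (which the Direct Mapping turns into reference triples) are ignored.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base IRI; the Direct Mapping takes the base IRI as an input.
BASE = "http://example.org/db/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri_part(text: str) -> str:
    """Percent-encode a table name, column name or key value for an IRI."""
    return quote(text, safe="")


def literal(value: str) -> str:
    """Serialise a cell as an N-Triples string literal (escapes \\, quotes, newlines)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def direct_map(table: str, csv_text: str, primary_key: str) -> list[str]:
    """Map each CSV row to N-Triples using Direct-Mapping-style IRIs.

    Simplifications (assumptions, not part of the spec): all values are
    plain string literals, empty cells are skipped as NULL, and foreign
    keys are not mapped.
    """
    triples = []
    table_iri = BASE + iri_part(table)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Row node IRI: <base/Table/pk=value>
        key = f"{iri_part(primary_key)}={iri_part(row[primary_key])}"
        subject = f"<{table_iri}/{key}>"
        # Each row is typed with the table's class IRI: <base/Table>
        triples.append(f"{subject} {RDF_TYPE} <{table_iri}> .")
        for column, value in row.items():
            if value == "":  # NULL in the relational model produces no triple
                continue
            predicate = f"<{table_iri}#{iri_part(column)}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples
```

For example, `direct_map("People", "ID,fname,addr\n7,Bob,18\n8,Sue,\n", "ID")` yields seven triples; Sue's row gets no `People#addr` triple because that cell is empty. Under the Direct Mapping, tables without a primary key map to blank nodes instead, which this sketch does not handle; R2RML is the route when custom IRIs or target vocabularies are needed.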
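To make the owl:sameAs item concrete, the sketch below applies the substitution behaviour of owl:sameAs: the links form equivalence classes (the property is symmetric and transitive), and every statement about one member is copied to the others. It is a deliberately small approximation, not a full OWL reasoner. Prefixed names such as `ex:Paris` stand in for full IRIs, all resource names are hypothetical, and predicates are never rewritten. The example shows the unintended inference produced by a single mistaken link.

```python
from itertools import product

SAME_AS = "owl:sameAs"


def same_as_closure(triples):
    """Return the non-sameAs triples entailed once owl:sameAs links are honoured.

    Minimal sketch (not a full OWL reasoner): sameAs links are merged into
    equivalence classes with a union-find structure, since owl:sameAs is
    symmetric and transitive; every other triple is then copied to each
    member of its subject's and object's class. Predicates are not rewritten.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Union the two ends of every sameAs link.
    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    # Group every linked node under its class representative.
    classes = {}
    for node in list(parent):
        classes.setdefault(find(node), set()).add(node)

    def members(x):
        # Nodes never mentioned in a sameAs link form a class of their own.
        return classes[find(x)] if x in parent else {x}

    entailed = set()
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        for s2, o2 in product(members(s), members(o)):
            entailed.add((s2, p, o2))
    return entailed
```

Given `("ex:Paris", "ex:population", "2100000")`, `("geo:Paris_Texas", "ex:population", "25000")` and a mistaken `("ex:Paris", "owl:sameAs", "geo:Paris_Texas")`, both resources end up with both population figures. A weaker mapping property such as skos:closeMatch would not license that copying.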
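The item on subsumption as a mapping mechanism can be sketched with a small forward-chaining loop over the RDFS entailment rules for rdfs:subPropertyOf and rdfs:subClassOf (rdfs5, rdfs7, rdfs9 and rdfs11). In the example, `ex:fullName`, `ex:Employee` and `ex:Manager` are hypothetical local terms mapped onto the FOAF terms `foaf:name` and `foaf:Person`; prefixed names again stand in for full IRIs, and the function name is illustrative.

```python
SUB_PROP = "rdfs:subPropertyOf"
SUB_CLASS = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"


def apply_mappings(triples):
    """Forward-chain RDFS rules rdfs5, rdfs7, rdfs9 and rdfs11 to a fixpoint."""
    graph = set(triples)
    while True:
        sub_props = {(s, o) for s, p, o in graph if p == SUB_PROP}
        sub_classes = {(s, o) for s, p, o in graph if p == SUB_CLASS}
        new = set()
        # rdfs5 / rdfs11: subPropertyOf and subClassOf are transitive.
        for a, b in sub_props:
            for c, d in sub_props:
                if b == c:
                    new.add((a, SUB_PROP, d))
        for a, b in sub_classes:
            for c, d in sub_classes:
                if b == c:
                    new.add((a, SUB_CLASS, d))
        for s, p, o in graph:
            # rdfs7: (s p o) and (p subPropertyOf q) entail (s q o).
            for sub, sup in sub_props:
                if p == sub:
                    new.add((s, sup, o))
            # rdfs9: (s type C) and (C subClassOf D) entail (s type D).
            if p == RDF_TYPE:
                for sub, sup in sub_classes:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= graph:  # nothing new: fixpoint reached
            return graph
        graph |= new
```

With `ex:fullName rdfs:subPropertyOf foaf:name`, data stated with `ex:fullName` also becomes visible as `foaf:name`, but a `foaf:name` statement is not turned into `ex:fullName`. This one-directional behaviour is what makes subsumption a more cautious mapping than owl:sameAs or an equivalence axiom.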
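For the items on describing and registering datasets, a VoID description is the usual vehicle. The sketch below computes a few basic VoID statistics (void:triples, void:distinctSubjects, void:properties, void:distinctObjects) from an in-memory list of triples and emits them as Turtle. The dataset IRI and function name are hypothetical; a description intended for discovery would normally also state access points such as void:sparqlEndpoint or void:dataDump.

```python
VOID_NS = "http://rdfs.org/ns/void#"


def void_description(dataset_iri, triples):
    """Return a Turtle VoID description with basic statistics for `triples`.

    `triples` is an iterable of (subject, predicate, object) tuples; duplicate
    triples are counted once, as in an RDF graph.
    """
    unique = set(triples)
    subjects = {s for s, _, _ in unique}
    properties = {p for _, p, _ in unique}
    objects = {o for _, _, o in unique}
    return "\n".join([
        f"@prefix void: <{VOID_NS}> .",
        "",
        f"<{dataset_iri}> a void:Dataset ;",
        f"    void:triples {len(unique)} ;",
        f"    void:distinctSubjects {len(subjects)} ;",
        f"    void:properties {len(properties)} ;",
        f"    void:distinctObjects {len(objects)} .",
    ])
```

Called as `void_description("http://example.org/dataset/demo", triples)`, it returns a small Turtle document that can be published alongside the data or submitted to a dataset registry.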