Ieben Smessaert, Arthur Vercruysse, Julián Rojas Meléndez, Pieter Colpaert,
SEMANTiCS 2025, September 3-5, 2025
Ghent University – imec – IDLab, Belgium
You will learn the motivation behind RDF-Connect, its conceptual model, and its architecture by following a running example of a knowledge graph lifecycle pipeline.
You will implement an ML-based processor and integrate it into the knowledge graph lifecycle pipeline.
A pipeline implementing a knowledge graph lifecycle process: weather data from an Austrian API service is collected, transformed into RDF, validated against a SHACL shape, enriched, and published to an RDF graph store.
We want to fetch data from the GeoSphere weather API,
and log its contents to the console.
Install the orchestrator, runner, and processors:
npm install @rdfc/orchestrator-js
npm install @rdfc/js-runner
npm install @rdfc/http-utils-processor-ts
npm install @rdfc/log-processor-ts
Add the prefixes rdfc, owl, and ex:
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ex: <http://example.org/>.
Declare the RDF-Connect pipeline
<> a rdfc:Pipeline.
Import definition via owl:imports
<> owl:imports <./node_modules/@rdfc/js-runner/index.ttl>.
Attach it to the pipeline declaration
<> a rdfc:Pipeline;
rdfc:consistsOf [
rdfc:instantiates rdfc:NodeRunner;
].
Import definition via owl:imports
<> owl:imports <./node_modules/@rdfc/http-utils-processor-ts/processors.ttl>.
Define the channel
<json> a rdfc:Reader, rdfc:Writer.
Define the processor instantiation
<fetcher> a rdfc:HttpFetch;
rdfc:url "https://dataset.api.hub.geosphere.at/v1/station/current/tawes-v1-10min?parameters=TL,RR&station_ids=11035";
rdfc:writer <json>.
Attach the processor to the runner
<> a rdfc:Pipeline;
rdfc:consistsOf [
rdfc:instantiates rdfc:NodeRunner;
rdfc:processor <fetcher> ].
Import definition via owl:imports
<> owl:imports <./node_modules/@rdfc/log-processor-ts/processor.ttl>.
Define the processor instantiation
<logger> a rdfc:LogProcessorJs;
rdfc:reader <json>;
rdfc:level "info";
rdfc:label "output".
Attach the processor to the runner
[ rdfc:instantiates rdfc:NodeRunner;
rdfc:processor <fetcher>, <logger> ].
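Assembled, the fragments above yield a complete pipeline.ttl along these lines (a sketch; compare with the solution in the task-1 branch for the authoritative version):

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ex: <http://example.org/>.

<> owl:imports <./node_modules/@rdfc/js-runner/index.ttl>,
   <./node_modules/@rdfc/http-utils-processor-ts/processors.ttl>,
   <./node_modules/@rdfc/log-processor-ts/processor.ttl>.

# The channel connecting the two processors
<json> a rdfc:Reader, rdfc:Writer.

<fetcher> a rdfc:HttpFetch;
  rdfc:url "https://dataset.api.hub.geosphere.at/v1/station/current/tawes-v1-10min?parameters=TL,RR&station_ids=11035";
  rdfc:writer <json>.

<logger> a rdfc:LogProcessorJs;
  rdfc:reader <json>;
  rdfc:level "info";
  rdfc:label "output".

<> a rdfc:Pipeline;
  rdfc:consistsOf [
    rdfc:instantiates rdfc:NodeRunner;
    rdfc:processor <fetcher>, <logger>
  ].
```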
npx rdfc pipeline.ttl
# or with debug logging:
LOG_LEVEL=debug npx rdfc pipeline.ttl
✅ Solution available in task-1 branch.
They enable the transformation, integration, and analysis of data from and to various sources and targets.
However, building, managing and reusing these pipelines can be complex and challenging.
Traditional batch processing systems suffer from latency problems due to the need to collect input data into batches before it can be processed. — Isah, H., et al., A Survey of Distributed Data Stream Processing Frameworks, IEEE Access, 2019
Current real-world data systems often require real-time or near-real-time processing of dynamic data. Stream processing allows for the continuous ingestion and processing of data as it arrives, enabling timely insights and actions.
The ability to execute applications written in different programming languages in an integrated manner offers several advantages:
Scientists want to use provenance data to answer questions such as: Which data items were involved in the generation of a given partial result? or Did this actor employ outputs from one of these two other actors? — Cuevas-Vicentin, V., et al., Scientific Workflows and Provenance: Introduction and Research Opportunities, Datenbank Spektrum, 2012
Provenance is instrumental to activities such as traceability, reproducibility, accountability, and quality assessment. — Herschel, M., et al., A Survey on Provenance: What for? What form? What from?, VLDB, 2017
Prospective provenance—the execution plan—is essentially the workflow itself: it includes a machine-readable specification with the processing steps to be performed and the data and software dependencies to carry out each computation. — Simone, L., et al., Recording provenance of workflow runs with RO-Crate, PLoS ONE, 2024
Common Workflow Language (CWL) is an open standard for describing how to run command line tools and connect them to create workflows.
Feature | RDF-Connect | CWL |
---|---|---|
Streaming | Supports both batch and streaming via gRPC streams | Primarily batch-oriented, although implementation-dependent streaming can be supported (e.g., using named pipes) |
Polyglot | Supports any language through an add-in libraries approach | Can accommodate polylingual workflows via POSIX CLI interfaces |
Provenance | Built-in prospective and retrospective provenance tracking based on PROV-O | Provenance extension available (CWLProv) based on PROV-O |
Schema expressivity | Full SHACL-based expressivity | Set of defined types and limited constraint definitions |
TODO: Diagram with generic architecture overview. I imagine a simple layered architecture having:
TODO: Diagram with RDFC ontology showing main concepts and relations (similar to the one in the ISWC paper)
A pipeline is described in RDF configuration files:
Each runner and processor comes with a SHACL shape.
These shapes serve as the glue of RDF-Connect:
Currently, runners exist for JavaScript, JVM, and Python.
Example of a runner configuration in RDF (Turtle):
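For instance, attaching the JavaScript runner with two processors to a pipeline (a sketch drawn from the task-1 example; see the repository for the authoritative configuration):

```turtle
<> a rdfc:Pipeline;
  rdfc:consistsOf [
    rdfc:instantiates rdfc:NodeRunner;
    rdfc:processor <fetcher>, <logger>
  ].
```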
DEBUG=:fetcher npx rdfc pipeline.ttl
npm install (JavaScript)
build.gradle (JVM)
uv add (or pip install) (Python)
Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/semantics-repo
Follow along on branch task-1, or jump to the slides for a recap.
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ex: <http://example.org/>.
Start the orchestrator with the configuration file:
npx rdfc pipeline.ttl
Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/semantics-repo
Install the additionally required processors:
npm install @rdfc/file-utils-processors-ts
npm install @rdfc/shacl-processor-ts
npm install @rdfc/sparql-ingest-processor-ts
Add the required dependency to your Gradle build file:
plugins { id 'java' }
repositories {
mavenCentral()
maven { url = uri("https://jitpack.io") }
}
dependencies {
implementation("com.github.rdf-connect:rml-processor-jvm:master-SNAPSHOT:all")
}
tasks.register('copyPlugins', Copy) {
from configurations.runtimeClasspath
into "$buildDir/plugins"
}
Install the jars with gradle copyPlugins.
The jvm-runner downloads itself automatically, so no manual installation is required.
If you do not want to use Gradle, you can also download the jars manually and put them in the build/plugins/ folder.
wget 'https://jitpack.io/com/github/rdf-connect/rml-processor-jvm/master-SNAPSHOT/rml-processor-jvm-master-SNAPSHOT-all.jar'
Start the orchestrator with the configuration file:
npx rdfc pipeline.ttl
🛠️ Follow Part 1 in the repo (up to Task 4)
⏰ We continue with the presentation at 13:45
🙋 Ask questions!
Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/semantics-repo
The init, transform, and produce methods
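As a rough sketch of that lifecycle (the interfaces below are hypothetical stand-ins for illustration; the real base class and channel types come from @rdfc/js-runner and differ in detail):

```typescript
// Hypothetical channel interfaces, standing in for the real
// @rdfc/js-runner reader/writer types.
interface Reader { strings(): AsyncIterable<string>; }
interface Writer { string(msg: string): Promise<void>; close(): Promise<void>; }

class UppercaseProcessor {
  constructor(private reader: Reader, private writer: Writer) {}

  // init: one-time setup before any data flows (open files, load models, ...)
  async init(): Promise<void> {}

  // transform: consume the incoming channel and emit on the outgoing one
  async transform(): Promise<void> {
    for await (const msg of this.reader.strings()) {
      await this.writer.string(msg.toUpperCase());
    }
    await this.writer.close();
  }

  // produce: for source processors that generate data themselves
  async produce(): Promise<void> {}
}
```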
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
sh:targetClass links back to the IRI used on the previous slide
Custom class for readers and writers: rdfc:Reader & rdfc:Writer
sh:name links to variable name in code
sh:path links to property in pipeline.ttl
Optional and multiple arguments with sh:minCount and sh:maxCount
(sh:maxCount != 1 results in a list of arguments)
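Putting these conventions together, a processor shape might look roughly like this (an illustrative sketch with a hypothetical shape IRI; the actual shapes ship in each processor's processors.ttl):

```turtle
ex:LoggerShape a sh:NodeShape;
  sh:targetClass rdfc:LogProcessorJs;  # IRI used to instantiate the processor
  sh:property [
    sh:name "reader";                  # variable name in the processor code
    sh:path rdfc:reader;               # property used in pipeline.ttl
    sh:class rdfc:Reader;              # custom channel class
    sh:minCount 1;
    sh:maxCount 1                      # exactly one reader argument
  ], [
    sh:name "level";
    sh:path rdfc:level;
    sh:datatype xsd:string;
    sh:maxCount 1                      # optional scalar argument
  ].
```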
TypeScript
https://github.com/rdf-connect/template-processor-ts
transform method: consume the reader channel
Set up the pyproject.toml for your pipeline
Configure specific Python version to have a deterministic path to the dependencies.
Add the rdfc-runner as a dependency.
Install the runner:
uv add rdfc_runner
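After these steps, the pyproject.toml might look roughly like this (project name and version are hypothetical; the Python pin matches the site-packages path used in the imports):

```toml
[project]
name = "my-rdfc-pipeline"
version = "0.1.0"
requires-python = "==3.13.*"   # pin so the path to dependencies is deterministic
dependencies = [
    "rdfc_runner",
]
```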
Import definition via owl:imports
<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_runner/index.ttl>.
Attach it to the pipeline declaration
<> a rdfc:Pipeline;
rdfc:consistsOf [...], [
rdfc:instantiates rdfc:PyRunner;
rdfc:processor <translator>
].
Install your local processor after building it with hatch build:
uv add ../processor/dist/rdfc_translation_processor.tar.gz
Import definition via owl:imports
<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_translation_processor/processor.ttl>.
Define the channel
<translated> a rdfc:Reader, rdfc:Writer.
Define the processor instantiation
<translator> a rdfc:TranslationProcessor;
rdfc:reader <rdf>;
rdfc:writer <translated>;
... .
Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/semantics-repo
🛠️ Follow Part 2 in the repo (Task 5 - 7)
🙋 Ask questions!
Thank you!
We sincerely hope you enjoyed this tutorial
and found it valuable.