# Configuration
> **WARNING**
> This page is still under construction and will be updated soon.
A pipeline is typically defined in a Turtle file (although any common RDF serialization is supported) that contains the pipeline configuration. A pipeline is composed of multiple components, including runners, processors, readers, and writers. Before these components can be used, they need to be imported into the pipeline configuration using `owl:imports` statements. The object of an `owl:imports` statement is a URL that points to the Turtle file containing the semantic component definitions of the package installed locally by your package manager (e.g., in your `node_modules` folder for Node.js).
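For example, such an import statement could look like the following sketch (the package name and file path are placeholders, not a real package; the actual path depends on the package and the package manager you use):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#>.

# Hypothetical import of the semantic definitions shipped with an
# installed package; the path below is a placeholder.
<> owl:imports <./node_modules/@rdfc/some-component/index.ttl>.
```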
## Configuring a Pipeline
A pipeline is usually configured in a Turtle file called `pipeline.ttl` or similar. The pipeline is defined as an instance of the `rdfc:Pipeline` class. However, a pipeline instance on its own does not do much yet. To make the pipeline functional, it needs to be configured with runners and processors.

The simplest pipeline (which does nothing) looks like this:
```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors

### Define the channels

### Define the pipeline
<> a rdfc:Pipeline.

### Define the processors
```
A pipeline configuration consists of one or more runner-processors pairs, each including a reference to a runner and to one or more processor instantiations. Such a pair is typically contained within a blank node, which is linked to the `rdfc:Pipeline` instance using the `rdfc:consistsOf` property. The runner is referenced in the blank node using the `rdfc:instantiates` property, which points to the runner's class IRI. The processors are referenced in the blank node using the `rdfc:processor` property, which points to the processor instantiations' IRIs, typically defined in the same Turtle file. One triple with the `rdfc:processor` property is needed for each processor you want to attach to the runner.

Below, we explain how to configure runners and processors in a pipeline configuration.
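The general shape of such a runner-processors pair can be sketched as follows (the runner class `rdfc:SomeRunner` and the processor IRIs are placeholders, not real components):

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.

# Hypothetical sketch: one runner with two attached processors.
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:SomeRunner;  # placeholder runner class
       rdfc:processor <processor1>;        # one rdfc:processor triple
       rdfc:processor <processor2>;        # per attached processor
   ].
```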
## Configuring a Runner
A runner is attached to a pipeline using a blank node, which is linked to the `rdfc:Pipeline` instance using the `rdfc:consistsOf` property. The blank node represents a runner-processors pair and contains the `rdfc:instantiates` property, which points to the runner's class IRI. Only one `rdfc:consistsOf` triple is needed for each runner you want to attach to the pipeline, and each processor you want to attach to the runner needs its own `rdfc:processor` triple in the same blank node.

Before the runner can be used, we need to make sure the orchestrator knows about it. Therefore, the runner needs to be installed, and its semantic definition needs to be imported into the pipeline configuration using an `owl:imports` statement. How you do this depends on the execution environment you are using.
### Example: Configuring an `rdfc:NodeRunner` for JavaScript/TypeScript Processors
The RDF-Connect ecosystem already offers a js-runner package published on npm, which provides an `rdfc:NodeRunner` class for executing JavaScript/TypeScript processors using Node.js, as well as an `rdfc:BunRunner` class for executing them using Bun. To use the `rdfc:NodeRunner` in your pipeline configuration, you first need to install the `@rdfc/js-runner` package using npm:

```shell
npm install @rdfc/js-runner
```
Next, you need to import the runner's semantic definition into your pipeline configuration using an `owl:imports` statement:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./node_modules/@rdfc/js-runner/index.ttl>.
```
Now you can attach the `rdfc:NodeRunner` to your pipeline configuration using a blank node:

```turtle
### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;
   ].
```
Next, you can add one or more processors to the runner-processors pair using the `rdfc:processor` property. You can jump directly to the next section to learn how to configure a processor, or read on to learn how to configure a runner in other environments.
### Example: Configuring an `rdfc:PyRunner` for Python Processors
The RDF-Connect ecosystem already offers a py-runner package published on PyPI, which provides an `rdfc:PyRunner` class for executing Python processors.

To manage your project environment in Python, we highly recommend using a `pyproject.toml` file and a virtual environment. This ensures that you can pin the Python version and the package versions used in your project. Pinning a specific Python version is especially important here, because the path to the installed package contains the Python version, and the `owl:imports` statement needs a deterministic path.

You can start from the following `pyproject.toml` file:
```toml
[project]
name = "your-rdf-connect-pipeline"
version = "0.0.1"
description = "RDF-Connect pipeline for your use case."
requires-python = "==3.13.*"
dependencies = [
    "rdfc-runner>=0.0.3",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["resources"]

[tool.hatch.envs.default]
type = "virtual"
path = ".venv"
system-packages = false
installer = "uv"
env-vars = { PYTHONPATH = "src" }
```
To use the `rdfc:PyRunner` in your pipeline configuration, you first need to install the `rdfc_runner` package using pip or uv:

```shell
uv add rdfc_runner
```
Next, you need to import the runner's semantic definition into your pipeline configuration using an `owl:imports` statement:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_runner/index.ttl>.
```
Now you can attach the `rdfc:PyRunner` to your pipeline configuration using a blank node:

```turtle
### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:PyRunner;
   ].
```
Next, you can add one or more processors to the runner-processors pair using the `rdfc:processor` property. You can jump directly to the next section to learn how to configure a processor, or read on to learn how to configure a runner in other environments.
### Example: Configuring an `rdfc:JvmRunner` for Java/Kotlin Processors
The RDF-Connect ecosystem already offers a jvm-runner package published through JitPack, which provides an `rdfc:JvmRunner` class for executing Java/Kotlin processors on the Java Virtual Machine (JVM). To use the `rdfc:JvmRunner` in your pipeline configuration, you need to import the runner's semantic definition from JitPack using an `owl:imports` statement:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <https://javadoc.jitpack.io/com/github/rdf-connect/jvm-runner/runner/master-SNAPSHOT/runner-master-SNAPSHOT-index.jar>.
```
Now you can attach the `rdfc:JvmRunner` to your pipeline configuration using a blank node:

```turtle
### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:JvmRunner;
   ].
```
Next, you can add one or more processors to the runner-processors pair using the `rdfc:processor` property. You can jump directly to the next section to learn how to configure a processor, or read on to learn how to configure multiple runners in the same pipeline configuration.
### Example: Configuring Multiple Runners in the Same Pipeline Configuration
You can configure multiple runner-processors pairs in the same pipeline configuration by adding multiple blank nodes to the `rdfc:consistsOf` property of the `rdfc:Pipeline` instance. This allows you to create cross-environment pipelines that execute processors written in different programming languages.

An example of a pipeline configuration with an `rdfc:NodeRunner`, an `rdfc:PyRunner`, and an `rdfc:JvmRunner` is shown below:
```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./node_modules/@rdfc/js-runner/index.ttl>.
<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_runner/index.ttl>.
<> owl:imports <https://javadoc.jitpack.io/com/github/rdf-connect/jvm-runner/runner/master-SNAPSHOT/runner-master-SNAPSHOT-index.jar>.

### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;
   ], [
       rdfc:instantiates rdfc:PyRunner;
   ], [
       rdfc:instantiates rdfc:JvmRunner;
   ].
```
Next, you can add one or more processors to each runner-processors pair using the `rdfc:processor` property. You will learn how to do this in the next section.
## Configuring a Processor
A processor performs a specific data processing task in the pipeline and is thus a crucial component. However, the processor needs to be executed, which is the responsibility of the runner; that is why a processor always needs to be attached to a specific runner in a pipeline configuration. To attach a processor to a runner, you add an `rdfc:processor` triple to the blank node that represents the runner-processors pair. The object of the `rdfc:processor` triple is the IRI of the processor instantiation, which is typically defined in the same Turtle file.

Before the processor can be used, we need to make sure the orchestrator knows about it. Therefore, the processor needs to be installed, and its semantic definition needs to be imported into the pipeline configuration using an `owl:imports` statement. How you do this, again, depends on the programming language the processor is implemented in. We will show you how to configure a processor in JavaScript/TypeScript, Python, and Java/Kotlin.
### Example: Configuring a JavaScript/TypeScript Processor
The RDF-Connect ecosystem already offers many JavaScript/TypeScript processors. We provide a handy way to discover existing processors on the Discover Existing Work page.

Imagine you want to use the `rdfc:LogProcessorJs` published on npm in your pipeline configuration. To use the `rdfc:LogProcessorJs` in your RDF-Connect pipeline, you first need to install the `@rdfc/log-processor-ts` package using npm:

```shell
npm install @rdfc/log-processor-ts
```
Next, you need to import the processor's semantic definition into your pipeline configuration using an `owl:imports` statement:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./node_modules/@rdfc/log-processor-ts/processor.ttl>.
```
Now you can attach the `rdfc:LogProcessorJs` to the `rdfc:NodeRunner` in your pipeline configuration using the `rdfc:processor` property in the blank node that represents the runner-processors pair:

```turtle
### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;
       rdfc:processor <logger>;
   ].
```
Next, you need to define the channels the processor will use to read and write data. As the log processor reads data from one channel, logs it to the RDF-Connect logging system, and writes the data to another channel, you need to define two channels in your pipeline configuration:

```turtle
### Define the channels
<channel1> a rdfc:Writer, rdfc:Reader.
<channel2> a rdfc:Writer, rdfc:Reader.
```
Finally, you need to define and configure the processor instantiation itself.

```turtle
### Define the processors
# Processor to log the output
<logger> a rdfc:LogProcessorJs;
    rdfc:reader <channel1>;
    rdfc:writer <channel2>;
    rdfc:level "info";
    rdfc:label "output".
```
The complete pipeline configuration looks like this:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./node_modules/@rdfc/js-runner/index.ttl>.
<> owl:imports <./node_modules/@rdfc/log-processor-ts/processor.ttl>.

### Define the channels
<channel1> a rdfc:Writer, rdfc:Reader.
<channel2> a rdfc:Writer, rdfc:Reader.

### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;
       rdfc:processor <logger>;
   ].

### Define the processors
# Processor to log the output
<logger> a rdfc:LogProcessorJs;
    rdfc:reader <channel1>;
    rdfc:writer <channel2>;
    rdfc:level "info";
    rdfc:label "output".
```
### Example: Configuring a Python Processor
Imagine you want to use the `rdfc:LogProcessorPy` published on PyPI in your pipeline configuration. To use the `rdfc:LogProcessorPy` in your RDF-Connect pipeline, you first need to install the `rdfc_log_processor` package using pip or uv:

```shell
uv add rdfc_log_processor
```
Next, you need to import the processor's semantic definition into your pipeline configuration using an `owl:imports` statement. We again assume you have a virtual environment set up in the `.venv` folder and that you are using Python 3.13.

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_log_processor/processor.ttl>.
```
Now you can attach the `rdfc:LogProcessorPy` to the `rdfc:PyRunner` in your pipeline configuration using the `rdfc:processor` property in the blank node that represents the runner-processors pair:

```turtle
### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:PyRunner;
       rdfc:processor <logger>;
   ].
```
Next, you need to define the channels the processor will use to read and write data. As the log processor reads data from one channel, logs it to the RDF-Connect logging system, and writes the data to another channel, you need to define two channels in your pipeline configuration:

```turtle
### Define the channels
<channel1> a rdfc:Writer, rdfc:Reader.
<channel2> a rdfc:Writer, rdfc:Reader.
```
Finally, you need to define and configure the processor instantiation itself.

```turtle
### Define the processors
# Processor to log the output
<logger> a rdfc:LogProcessorPy;
    rdfc:reader <channel1>;
    rdfc:writer <channel2>;
    rdfc:level "info";
    rdfc:label "output".
```
The complete pipeline configuration looks like this:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_runner/index.ttl>.
<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_log_processor/processor.ttl>.

### Define the channels
<channel1> a rdfc:Writer, rdfc:Reader.
<channel2> a rdfc:Writer, rdfc:Reader.

### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:PyRunner;
       rdfc:processor <logger>;
   ].

### Define the processors
# Processor to log the output
<logger> a rdfc:LogProcessorPy;
    rdfc:reader <channel1>;
    rdfc:writer <channel2>;
    rdfc:level "info";
    rdfc:label "output".
```
### Example: Configuring a Java/Kotlin Processor
Imagine you want to use the `rdfc:TestProcessor` implemented in the template-processor-jvm repository in your pipeline configuration, which logs the data from a channel and writes it to another channel. To use the `rdfc:TestProcessor` in your RDF-Connect pipeline, you first need to install the package using Maven or Gradle.

Create a `build.gradle` file with the following content:
```groovy
plugins {
    id 'java'
}

repositories {
    mavenCentral()
    maven { url = uri("https://jitpack.io") } // if your processors are on GitHub
}

dependencies {
    implementation("com.github.rdf-connect:template-processor-jvm:master-SNAPSHOT:all")
}

tasks.register('copyPlugins', Copy) {
    from configurations.runtimeClasspath
    into "$buildDir/plugins"
}

configurations.all {
    resolutionStrategy.cacheChangingModulesFor 0, 'seconds'
}
```
Then run the following command to download the dependencies:

```shell
gradle copyPlugins
```

If you do not want to use Gradle, you can also download the `rdfc:TestProcessor` JAR file directly from JitPack and place it in the `build/plugins` folder in your project directory.
Next, you need to import the processor's semantic definition into your pipeline configuration using an `owl:imports` statement:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <./build/plugins/template-processor-jvm-master-SNAPSHOT-all.jar>.
```
Now you can attach the `rdfc:TestProcessor` to the `rdfc:JvmRunner` in your pipeline configuration using the `rdfc:processor` property in the blank node that represents the runner-processors pair:

```turtle
### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:JvmRunner;
       rdfc:processor <logger>;
   ].
```
Next, you need to define the channels the processor will use to read and write data. As the processor reads data from one channel, logs it to the RDF-Connect logging system, and writes the data to another channel, you need to define two channels in your pipeline configuration:

```turtle
### Define the channels
<channel1> a rdfc:Writer, rdfc:Reader.
<channel2> a rdfc:Writer, rdfc:Reader.
```
Finally, you need to define and configure the processor instantiation itself.

```turtle
### Define the processors
# Processor to log the output
<logger> a rdfc:TestProcessor;
    rdfc:reader <channel1>;
    rdfc:writer <channel2>;
    rdfc:name "World".
```
The complete pipeline configuration looks like this:

```turtle
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.

### Import runners and processors
<> owl:imports <https://javadoc.jitpack.io/com/github/rdf-connect/jvm-runner/runner/master-SNAPSHOT/runner-master-SNAPSHOT-index.jar>.
<> owl:imports <./build/plugins/template-processor-jvm-master-SNAPSHOT-all.jar>.

### Define the channels
<channel1> a rdfc:Writer, rdfc:Reader.
<channel2> a rdfc:Writer, rdfc:Reader.

### Define the pipeline
<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:JvmRunner;
       rdfc:processor <logger>;
   ].

### Define the processors
# Processor to log the output
<logger> a rdfc:TestProcessor;
    rdfc:reader <channel1>;
    rdfc:writer <channel2>;
    rdfc:name "World".
```