Building Streaming and Cross-Environment Data Processing Pipelines with RDF-Connect

Ieben Smessaert, Arthur Vercruysse, Julián Rojas Meléndez, Pieter Colpaert

Ghent University – imec – IDLab, Belgium

Tutorial website: open.gent/r/iswc-rdfc

ISWC 2025, November 2, 2025

Download dependencies while we're waiting


# Clone the tutorial repository
git clone git@github.com:rdf-connect/nara-weather-forecast-kg-pipeline.git
cd nara-weather-forecast-kg-pipeline/pipeline/resources

# Start the development container
docker compose up -d

# Open a shell inside the container
docker compose exec devbox bash

# Prepare the Python environment for the processor
cd processor/
hatch env create
hatch shell
    
⏳ These commands preload dependencies and images in the background so everything is ready when we start coding.
✅ You’ll end up inside a fully configured devbox environment.
QR code for tutorial guideline repository
open.gent/r/iswc-rdfc-repo

Agenda

Tutorial overview

Theory and practice intertwined

You will learn the motivation behind RDF-Connect, its conceptual model, architecture, and roadmap, all while setting up, extending, and running an example RDF-Connect pipeline.

Example pipeline: An RDF KG lifecycle

A pipeline implementing a knowledge graph lifecycle process, where weather data from the Japan Meteorological Agency (JMA) API is collected, transformed into RDF, enriched, validated against a SHACL shape, and published to an RDF graph store.

Tutorial resources

Tutorial Website

The tutorial website has a complete description and motivation for the tutorial. There you may also find all the resources you need to follow along, including these slides.

Developer resources

We have prepared a GitHub repository containing a step-by-step guide (split over dedicated branches) that will allow you to start and check the result of any task of the tutorial at any time.

Assembling our very first pipeline

Pipeline 1 Diagram

We want to [fetch data from the JMA meteorological forecast API] and [log its contents] to the console.

Follow along in our step-by-step
tutorial code repository

QR code for tutorial guideline repository
open.gent/r/iswc-rdfc-repo

Step 0: Choose your environment

Run locally or in a containerized environment:


# Build and run the Docker image
cd pipeline/resources
docker compose up -d

# Access the devbox container
docker compose exec devbox bash
cd pipeline/

# You can now run commands like `npm install` or `npx rdfc pipeline.ttl`
# inside the container
  
This way you avoid having to install runtime environments for Python, Node.js, Java, etc.

Step 1: Setup

Install the orchestrator, runner, and processors:


npm install @rdfc/orchestrator-js
npm install @rdfc/js-runner
npm install @rdfc/http-utils-processor-ts
npm install @rdfc/log-processor-ts
  
🛠️ The log processor is also available in Python.
You can swap it to see cross-language interoperability in action!
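
As a rough sketch of what that swap could look like in pipeline.ttl (the Python processor class name below is hypothetical; the actual IRI and owl:imports path come from the Python package's own processor.ttl, and the rdfc:PyRunner setup is covered later in this tutorial):


# Hypothetical sketch: the Python log processor's class name and import path may differ.
<logger> a rdfc:LogProcessorPy;        # hypothetical Python variant of the log processor
    rdfc:reader <json>;
    rdfc:level "info";
    rdfc:label "output".

<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;
       rdfc:processor <fetcher>
   ], [
       rdfc:instantiates rdfc:PyRunner;  # Python runner, introduced later in this tutorial
       rdfc:processor <logger>
   ].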

Step 2: Initialize pipeline.ttl

Add the prefixes rdfc, owl, ex


@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ex: <http://example.org/>.
    

Declare the RDF-Connect pipeline


<> a rdfc:Pipeline.
    

Step 3: Add the rdfc:NodeRunner

Import definition via owl:imports


<> owl:imports <./node_modules/@rdfc/js-runner/index.ttl>.
    

Attach it to the pipeline declaration


<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;
   ].
    

Step 4: Add the rdfc:HttpFetch processor

Import definition via owl:imports


<> owl:imports <./node_modules/@rdfc/http-utils-processor-ts/processors.ttl>.
    

Define the channel


<json> a rdfc:Reader, rdfc:Writer.
    

Define the processor instantiation


<fetcher> a rdfc:HttpFetch;
    rdfc:url "https://www.jma.go.jp/bosai/forecast/data/overview_forecast/290000.json";
    rdfc:writer <json>.
    

Attach the processor to the runner


<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;
       rdfc:processor <fetcher> ].
    

Step 5: Add the rdfc:LogProcessorJs

Import definition via owl:imports


<> owl:imports <./node_modules/@rdfc/log-processor-ts/processor.ttl>.
    

Define the processor instantiation


<logger> a rdfc:LogProcessorJs;
    rdfc:reader <json>;
    rdfc:level "info";
    rdfc:label "output".
    

Attach the processor to the runner


[ rdfc:instantiates rdfc:NodeRunner;
  rdfc:processor <fetcher>, <logger> ].
    

Step 6: Run the pipeline


npx rdfc pipeline.ttl
# or with debug logging:
LOG_LEVEL=debug npx rdfc pipeline.ttl
  

✅ Solution available in task-1 branch.

Now try tasks 0 & 1 yourself!

  1. Setup
  2. Initialize pipeline.ttl
  3. Add the rdfc:NodeRunner
  4. Add the rdfc:HttpFetch processor
  5. Add the rdfc:LogProcessorJs
  6. Run the pipeline
QR code for tutorial guideline repository
open.gent/r/iswc-rdfc-repo

Data processing pipelines are crucial in modern data-centric systems

They enable the transformation, integration, and analysis of data from and to various sources and targets.

Pipelines are usually composed of multiple complex tasks

However, building, managing and reusing these pipelines can be complex and challenging.

Pipelines are ubiquitous in real-world systems

Our main motivation use case

We have faced, again and again, the challenges of managing the lifecycle of Knowledge Graphs across multiple domains.

Stream processing: a computational paradigm for continuous and dynamic data systems

Traditional batch processing systems suffer from latency problems due to the need to collect input data into batches before it can be processed. — Isah, H., et al., A Survey of Distributed Data Stream Processing Frameworks, IEEE Access, 2019

Current real-world data systems often require real-time or near-real-time processing of dynamic data. Stream processing allows for the continuous ingestion and processing of data as it arrives, enabling timely insights and actions.

Cross-environment execution: choosing the best of all worlds

The ability to execute applications written in different programming languages in an integrated manner offers several advantages:

Polyglot Architecture

Declarative, reusable and provenance-aware data processing

Scientists want to use provenance data to answer questions such as: Which data items were involved in the generation of a given partial result? or Did this actor employ outputs from one of these two other actors? — Cuevas-Vicenttín, V., et al., Scientific Workflows and Provenance: Introduction and Research Opportunities, Datenbank-Spektrum, 2012
Provenance is instrumental to activities such as traceability, reproducibility, accountability, and quality assessment. — Herschel, M., et al., A Survey on Provenance: What for? What form? What from?, The VLDB Journal, 2017
Prospective provenance—the execution plan—is essentially the workflow itself: it includes a machine-readable specification with the processing steps to be performed and the data and software dependencies to carry out each computation. — Leo, S., et al., Recording provenance of workflow runs with RO-Crate, PLoS ONE, 2024

RDF-Connect requirements overview

Aren't there like a million pipeline frameworks already?

Related work

Existing pipeline frameworks
https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems
https://github.com/pditommaso/awesome-pipeline

Enter RDF-Connect

RDF-Connect — Key features

A worthy mention:
Common Workflow Language

Common Workflow Language Logo

Common Workflow Language (CWL) is an open standard for describing how to run command line tools and connect them to create workflows.

Common Workflow Language Description

How is RDF-Connect different from CWL?

Feature | RDF-Connect | CWL
Streaming support | Event-based design that supports both batch and streaming paradigms | Primarily batch-oriented, although implementation-dependent streaming can be supported (e.g., using named pipes)
Polyglot | Supports any language through an add-in libraries approach | Can accommodate polylingual workflows via POSIX CLI interfaces
Provenance | Built-in semantic prospective and retrospective provenance tracking based on PROV-O | Retrospective provenance extension available (CWLProv) based on PROV-O
Schema expressivity | Full SHACL-based expressivity | Set of defined types and limited constraint definitions

Another worthy mention:
Workflow Run RO-Crate profiles

RO-Crates workflows

Workflow Run RO-Crate profiles provide a semantic way to describe workflows, including their executions and associated metadata.

Coffee break! ☕

Agenda

Running example: the goal

  1. Retrieve JSON data from the JMA weather forecast API
  2. Transform the data into RDF using RML
  3. Translate language-tagged literals into another language
  4. Validate the RDF data against a SHACL shape
  5. Publish the RDF data to a triple store
Running example - the goal

High-level architecture overview

RDF-Connect generic architecture

Pipeline File Structure

A pipeline is described in RDF configuration files:

  • 🔗 Channels
    Define how data flows between processors.
  • 📦 Runners
    Specify which runtime environments are needed.
  • ⚙️ Processors
    Tasks that run inside a runner.
Pipeline file structure
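
Put together, the three building blocks look as follows in a pipeline file. This is a condensed version of the pipeline assembled in the hands-on steps this morning (owl:imports statements omitted):


@prefix rdfc: <https://w3id.org/rdf-connect#>.

<json> a rdfc:Reader, rdfc:Writer.          # channel: connects a writing processor to a reading one

<> a rdfc:Pipeline;
   rdfc:consistsOf [
       rdfc:instantiates rdfc:NodeRunner;   # runner: the runtime environment (Node.js here)
       rdfc:processor <fetcher>, <logger>   # processors: tasks hosted by this runner
   ].

<fetcher> a rdfc:HttpFetch;                 # writes the fetched payload into the channel
    rdfc:url "https://www.jma.go.jp/bosai/forecast/data/overview_forecast/290000.json";
    rdfc:writer <json>.

<logger> a rdfc:LogProcessorJs;             # reads from the channel and logs the messages
    rdfc:reader <json>;
    rdfc:level "info";
    rdfc:label "output".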

RDF-Connect data model overview

RDF-Connect data model

RDF-Connect logical inference over a processor

General definitions in the RDFC ontology:


# Processor class definition
rdfc:Processor a rdfs:Class;
    rdfs:subClassOf prov:Activity.

rdfc:implementationOf a rdf:Property;
    rdfs:subPropertyOf rdfs:subClassOf.

# Property for JavaScript processors
rdfc:jsImplementationOf a rdf:Property;
    rdfs:subPropertyOf rdfc:implementationOf.
    

# JavaScript Runner definition
<myJSRunner> a rdfc:Runner;
    rdfc:handlesSubjectsOf rdfc:jsImplementationOf;
    rdfc:command "npx js-runner".
    

RDF-Connect logical inference over a processor

Concrete processor definition:


# Language-specific processor definition
ex:LogProcessorJS rdfc:jsImplementationOf rdfc:Processor;
  rdfs:label "Simple Log Processor for JavaScript";
  rdfs:comment "Logs incoming messages";
  rdfc:entrypoint <./>;
  rdfc:file <./lib/util_processors.js>;
  rdfc:class "LogProcessor".
    

# Processor instantiation in pipeline
_:p1 a ex:LogProcessorJS;
  ...
    

Following the simple entailment relations, we obtain that:

check it online


rdfc:jsImplementationOf rdfs:subPropertyOf rdfs:subClassOf.

ex:LogProcessorJS rdfc:implementationOf rdfc:Processor;
  rdfs:subClassOf rdfc:Processor;
  rdfs:subClassOf prov:Activity.

_:p1 a rdfc:Processor, prov:Activity.
    

Pipeline design of running example

Pipeline design architecture

Deep Dive: SHACL Shapes

Each runner and processor comes with a SHACL shape.
These shapes serve as the glue of RDF-Connect: they describe which arguments a processor or runner accepts and how the configuration in pipeline.ttl maps onto the implementation.

Deep Dive: SHACL Shapes Example

Deep Dive Orchestrator: Overview

Deep Dive Orchestrator: Responsibilities

  1. 📜 Initialize Pipeline
    1. ▶️ Start runners
      Launch each runner using its configured command.
    2. ⚙️ Initialize processors
      Instruct runners to start the processors they manage.
  2. 📡 Route messages
    Deliver incoming messages to the correct runner / processor.
Pipeline 1 Diagram

Deep Dive Orchestrator: Message Types

Design decision: channels connect processors one-to-one.

Message Types: Single Message

  1. Send the message
  2. Process the message
  3. Acknowledge message processed
Pipeline 1 Diagram

Message Types: Streaming Message

Pipeline 1 Diagram

Deep Dive Runner: Overview

Currently, runners exist for JavaScript, JVM, and Python.

Deep Dive Runner: Responsibilities

  1. ▶️ Start from command line
    Runners can be launched as standalone processes.
  2. 🔌 Connect with orchestrator via gRPC
    Handle control messages and data exchange.
  3. ⚙️ Manage processors
    Start, stop, and monitor the processors they host and forward log messages.

Deep Dive Runner: Example

Example of a runner configuration in RDF (Turtle):
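
A sketch along those lines, based on the generic runner definition shown earlier in the data-model slides (the actual js-runner configuration is the index.ttl shipped with the @rdfc/js-runner package):


# Sketch following the generic runner definition from the data-model slides.
<myJSRunner> a rdfc:Runner;
    # command the orchestrator uses to launch the runner as a standalone process
    rdfc:command "npx js-runner";
    # the runner handles every processor declared via rdfc:jsImplementationOf
    rdfc:handlesSubjectsOf rdfc:jsImplementationOf.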

Deep Dive Processor: Overview

Deep Dive: Centralized Logging

JavaScript Ecosystems

Java Ecosystems

Python Ecosystem

Hands-On

  1. 🌦 HTTP Fetch → Log contents
  2. 🔄️ Weather API → RDF → Log
  3. 🧩️ Weather API → RDF → Validation → Log
  4. 🚀️ Weather API → RDF → Validation → Publish → Log
  5. 🤖 Implement your own ML processor in Python
  6. ✅ Weather API → RDF → Translation → Validation → Publish → Log

Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/iswc-rdfc-repo

QR code for tutorial guideline repository

What we already did this morning

Pipeline 1 Diagram

Follow along on branch task-1, or jump to the slides for a recap.

What we already did this morning


@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix ex: <http://example.org/>.
      

Recap: Running the pipeline

Start the orchestrator with the configuration file:


npx rdfc pipeline.ttl
  
✅ You should see the HTTP contents being logged.

Hands-On: Pipeline

  1. 🌦 HTTP Fetch → Log contents
  2. 🔄️ Weather API → RDF → Log
  3. 🧩️ Weather API → RDF → Validation → Log
  4. 🚀️ Weather API → RDF → Validation → Publish → Log

Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/iswc-rdfc-repo

QR code for tutorial guideline repository

Pipeline design: Weather KG

Pipeline 2 Diagram

Weather KG Pipeline: JavaScript Setup

Install the additionally required processors:


npm install @rdfc/file-utils-processors-ts
npm install @rdfc/shacl-processor-ts
npm install @rdfc/sparql-ingest-processor-ts
  

Weather KG Pipeline: Java Setup

Add the required dependency to your Gradle build file:


plugins { id 'java' }
repositories {
    mavenCentral()
    maven { url = uri("https://jitpack.io") }
}
dependencies {
    implementation("com.github.rdf-connect:rml-processor-jvm:master-SNAPSHOT:all")
}
tasks.register('copyPlugins', Copy) {
    from configurations.runtimeClasspath
    into "$buildDir/plugins"
}
    

Install the jars with gradle copyPlugins.

The jvm-runner itself is downloaded automatically, so no manual installation is required.

If you do not want to use Gradle, you can also download the jars manually and put them in the build/plugins/ folder.


wget 'https://jitpack.io/com/github/rdf-connect/rml-processor-jvm/master-SNAPSHOT/rml-processor-jvm-master-SNAPSHOT-all.jar'
    

Run the Weather KG Pipeline

Start the orchestrator with the configuration file:


npx rdfc pipeline.ttl
  
Don't forget to start a SPARQL endpoint!
→ We provide a docker-compose.yml with a Virtuoso instance configured.

Now try it yourself!

🛠️ Follow Part 1 in the repo (up to Task 4)

⏰ You have time till lunch

🙋 Ask questions!

Pipeline 2 Diagram
QR code for tutorial guideline repository
open.gent/r/iswc-rdfc-repo

Lunch break! 🍣

Agenda

Hands-On

  1. 🌦 HTTP Fetch → Log contents
  2. 🔄️ Weather API → RDF → Log
  3. 🧩️ Weather API → RDF → Validation → Log
  4. 🚀️ Weather API → RDF → Validation → Publish → Log
  5. 🤖 Implement your own ML processor in Python
  6. ✅ Weather API → RDF → Translation → Validation → Publish → Log

Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/iswc-rdfc-repo

QR code for tutorial guideline repository

Implement a processor

  1. Define the processor and its SHACL shape in processor.ttl
  2. Implement the processor class
    • Extend the abstract class
    • Implement the init, transform, and produce methods
    • Use Readers & Writers to handle messages
  3. Publish the processor
  4. Use the processor in a pipeline

Define the processor in processor.ttl


@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
      
Python logo
JS/TS logo
Java logo

Define the processor in processor.ttl

                
@prefix rdfc: <https://w3id.org/rdf-connect#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
                
            

sh:targetClass links back to the IRI used on previous slide

Custom class for readers and writers: rdfc:Reader & rdfc:Writer

sh:name links to variable name in code

sh:path links to property in pipeline.ttl

Optional and multiple arguments with sh:minCount and sh:maxCount
(sh:maxCount != 1 results in a list of arguments)
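
Putting these pieces together, a minimal sketch of such a shape for the log processor used this morning, reusing the prefixes declared above (an illustration only; the actual shape shipped with @rdfc/log-processor-ts may differ in its details):


# Minimal illustrative sketch; the real shape in @rdfc/log-processor-ts may differ.
[ ] a sh:NodeShape;
  sh:targetClass rdfc:LogProcessorJs;   # the IRI instantiated in pipeline.ttl
  sh:property [
    sh:path rdfc:reader;                # property used in pipeline.ttl
    sh:name "reader";                   # variable name in the processor code
    sh:class rdfc:Reader;               # custom class for channel readers
    sh:minCount 1;                      # required ...
    sh:maxCount 1                       # ... and a single argument, not a list
  ], [
    sh:path rdfc:level;
    sh:name "level";
    sh:datatype xsd:string;
    sh:minCount 0;                      # optional argument
    sh:maxCount 1
  ].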

Implement the processor class

Python logo
JS/TS logo
Java logo

Start from a processor template repository

TypeScript
https://github.com/rdf-connect/template-processor-ts

Python
https://github.com/rdf-connect/template-processor-py

Java
https://github.com/rdf-connect/template-processor-jvm

Implement a Python processor

  1. In the transform method: consume the reader channel
  2. Parse the input using rdflib
  3. Identify language-tagged literals with @ja
  4. Translate them to English using an ML model
  5. Emit both original and translated triples to the writer channel
  6. Define the processor in processor.ttl
processor flow
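
To make the intended behaviour concrete, here is a hypothetical before/after for a single literal (the subject, predicate, and text are illustrative, not the actual RML mapping output):


@prefix ex: <http://example.org/>.

# Illustrative input arriving on the reader channel (subject, predicate, and text are made up)
ex:forecast290000 ex:overview "奈良県では、雨が降るでしょう。"@ja .

# Emitted on the writer channel: the original literal plus its English translation
ex:forecast290000 ex:overview "奈良県では、雨が降るでしょう。"@ja ,
                              "In Nara Prefecture, it will likely rain."@en .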

Add the Python runner

Set up the pyproject.toml for your pipeline

Configure a specific Python version to have a deterministic path to the dependencies.

Add the rdfc-runner as a dependency.

Add the Python runner

Install the runner:


uv add rdfc_runner
  

Import definition via owl:imports


<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_runner/index.ttl>.
    

Attach it to the pipeline declaration


<> a rdfc:Pipeline;
   rdfc:consistsOf [...], [
       rdfc:instantiates rdfc:PyRunner;
       rdfc:processor <translator>
   ].
    

Add your translation processor

Install your local processor after building it with hatch build:


uv add ../processor/dist/rdfc_translation_processor.tar.gz
    

Import definition via owl:imports


<> owl:imports <./.venv/lib/python3.13/site-packages/rdfc_translation_processor/processor.ttl>.
    

Define the channel


<translated> a rdfc:Reader, rdfc:Writer.
    

Define the processor instantiation


<translator> a rdfc:TranslationProcessor;
    rdfc:reader <rdf>;
    rdfc:writer <translated>;
    ... .
    

Hands-On: Processor

  1. 🤖 Implement your own ML processor in Python
  2. ✅ Weather API → RDF → Translation → Validation → Publish → Log

Follow along in the GitHub repository.
All tasks are in the README. Each branch is a solution to a task!
open.gent/r/iswc-rdfc-repo

QR code for tutorial guideline repository

Pipeline design: Weather KG With ML

Pipeline 3 Diagram

Now try it yourself!

🛠️ Follow Part 2 in the repo (Task 5 - 7)

🙋 Ask questions!

Pipeline 2 Diagram
QR code for tutorial guideline repository
open.gent/r/iswc-rdfc-repo

What is next for RDF-Connect?

We envision the following development and research roads:

Community-driven metadata initiatives

Several initiatives exist for the standardization of workflow metadata.

Workflows Community logo WRRO-Crate logo OpenMetadata logo OpenLineage logo

Workflows Community Initiative (WCI)

The WCI aims to foster collaboration and standardization in the field of scientific workflow management. It provides a common framework for describing workflows, their components, and execution metadata. By aligning RDF-Connect with WCI standards, we can enhance interoperability and facilitate the sharing of workflow metadata across different platforms and tools.

Workflows Community terminology Workflows Community registries

WorkflowHub

WorkflowHub is a platform for sharing and discovering scientific workflows. It provides a repository for workflow definitions, metadata, and execution records.

WorkflowHub

Dockstore

Dockstore is a platform for sharing reusable and scalable analytical tools and workflows. It supports a variety of workflow languages and provides features for versioning, collaboration, and execution tracking.

Dockstore

Workflow Run RO-Crate profiles

The Workflow Run RO-Crate profiles extend the RO-Crate specification to better support the description of workflow executions and their associated metadata. A semantic alignment between RDF-Connect and the Workflow Run RO-Crate profiles will enable seamless integration of workflow execution metadata into RO-Crates, facilitating better reproducibility and sharing of scientific workflows.

Workflow Run RO-Crate model

OpenMetadata platform

OpenMetadata is an open-source metadata management platform that provides a unified view of data assets across an organization. It offers features for data discovery, lineage tracking, and governance.

OpenMetadata platform OpenMetadata schemas

OpenLineage framework

OpenLineage is an open standard for metadata and lineage collection designed to instrument data pipelines and applications.

OpenLineage framework

Live monitoring capabilities

Integration of RDF-Connect with monitoring systems such as Prometheus will allow real-time tracking of pipeline execution, resource utilization, and performance metrics, enabling users to monitor and optimize their workflows effectively.

Prometheus architecture Prometheus metrics

Support for other programming languages

RDF-Connect extension to support other languages such as:

Rust Go C++

Support for federated and cloud-native execution

Remote execution of RDF-Connect Runners beyond the CLI, for instance within EOSC (European Open Science Cloud) nodes.

EOSC Node

Automation of workflow development and management

Leverage generative AI capabilities to automate Processor and Pipeline development. Also, provide UI-based pipeline management.

RDF-Connect AI RDF-Connect UI

Optimization of data flows

Zero-copy data movement: Integration with Apache Arrow (where possible) to optimize data flow performance and efficiency.

Genomics workflow with Apache Arrow R Workflow with Apache Arrow ML Workflow with Apache Arrow

Coffee break! ☕

Let's Code Something Together

Goal: build a pipeline using software that’s going to be introduced later during ISWC.
Each of you can contribute by wrapping new or existing software as a processor compatible with RDF-Connect.

Once we have a few, we’ll connect them into a shared pipeline and see what we can create together!

Do you know of any software we could try? We’ve spotted some ideas already: Jelly, RDFMutate, pycottas, rdf2vecgpu, or we could even generate data cubes.

What Could Our Pipeline Do?

We already have plenty of ideas for individual processors — but what about the bigger picture? What could a complete pipeline actually achieve?

Let’s brainstorm: combine analysis, transformation, or visualization — something fun and meaningful that shows what RDF-Connect can do when we link our work together.

Hackathon Flow 💡

  1. Pick a tool or idea — something new, weird, or cool you want to plug in.
  2. Wrap it as a processor — make it talk RDF-Connect style.
  3. Test it! — see if it runs standalone, maybe share your results.
  4. Connect it — link your processor into our shared pipeline.
  5. Celebrate 🎉 — watch the data flow and see what we built together!

👉 Don’t worry about perfection — the goal is to explore, experiment, and have fun connecting ideas.

Please create a GitHub repository for your processor and let us link them together in a pipeline.

Building Streaming and Cross-Environment Data Processing
Pipelines with RDF-Connect

Thank you!

We sincerely hope you enjoyed this tutorial
and found it valuable.

QR code to the RDF-Connect homepage
w3id.org/rdf-connect