1. Introduction
RDF-Connect is a modular framework for building and executing multilingual data processing pipelines using RDF as the configuration and orchestration layer.
It enables fine-grained, reusable processor components that exchange streaming data, allowing workflows to be described declaratively across programming languages and environments. RDF-Connect is especially well suited for data transformation, integration, and linked data publication.
2. Usage Paths
Depending on your use case, you may only need a subset of this specification:
- Pipeline Authors: Read the site or § 4.1 Pipeline for a more in-depth explanation.
- Processor Developers: Read § 4.2 Processor and § 4.3 Runner.
- Platform Maintainers: Read all sections, including implementation notes.
3. Design Goals
RDF-Connect is designed to be language-agnostic, enabling seamless integration of processor components written in diverse programming languages such as JavaScript, Python, and shell scripts. This flexibility allows developers to leverage existing tools and libraries within their language of choice, avoiding the constraints of monolithic, single-language frameworks.
A key architectural principle of RDF-Connect is that it operates in a streaming-by-default mode. Data flows between processors via a central orchestrator, supporting real-time and large-scale data processing scenarios. This streaming model ensures efficient memory usage and enables continuous transformation and publication of data as it is ingested.
The configuration of pipelines, processors, inputs, and outputs is expressed semantically using RDF. This use of semantic configuration promotes clarity, extensibility, and interoperability by describing the structure and behavior of the system in a machine-readable, standards-based format.
Processor components in RDF-Connect are designed for reusability. A processor defined once can be reused across multiple pipelines and in varying contexts, reducing duplication and encouraging modular design practices. This modularity fosters rapid prototyping and easier maintenance of complex workflows.
Transparency is a fundamental goal of the framework. By leveraging RDF vocabularies to describe pipeline components and their interactions, RDF-Connect makes it easy to inspect, document, and reason about the behavior and structure of a pipeline, both during development and after deployment.
Finally, RDF-Connect provides native support for provenance tracking using the PROV-O ontology. Each step in the pipeline can be traced to its source, enabling detailed lineage tracking. This capability is especially critical in applications involving data quality, reproducibility, and compliance.
4. Concepts
This section introduces the core concepts underpinning the RDF-Connect framework. These concepts collectively define the modular, streaming, and language-agnostic architecture that enables RDF-Connect to support sophisticated, provenance-aware data pipelines.
4.1. Pipeline
At the heart of RDF-Connect is the notion of a pipeline, which defines a structured sequence of processing steps. Each step corresponds to a processor, configured with parameters and paired with an appropriate runner that governs its execution. Pipelines specify the flow of data between processors using readers and writers, forming a streaming architecture in which data is continuously passed along and transformed. The pipeline configuration itself is expressed in RDF, making it semantically explicit and machine-interpretable.
4.2. Processor
A processor is a modular, reusable software component that performs a discrete data processing task. In a typical scenario, a processor receives input data through a reader, transforms it according to its logic, and emits the result via a writer. However, processors are also flexible enough to support single-directional tasks, such as those that only produce or only consume data. Crucially, processors are implementation-agnostic — they can be written in any programming language and integrated into pipelines via language-specific runners. This makes processors the building blocks of RDF-Connect’s cross-language interoperability.
4.3. Runner
A runner is responsible for executing a processor on behalf of the orchestrator.
Each runner targets a specific language or execution environment, enabling processors written in different languages to participate seamlessly in a pipeline.
For example, a JavaScript processor would be executed using a NodeRunner, which knows how to initialize and manage JavaScript-based components.
In this sense, a runner serves as an execution strategy, abstracting away the platform-specific details of launching and interacting with a processor.
4.4. Orchestrator
The orchestrator is the central component that interprets and runs RDF-Connect pipelines. It parses the RDF-based configuration, initializes and assigns runners, instantiates processors, and manages the data flow between components. Acting as the conductor of the system, the orchestrator ensures that processors execute in the correct order and that data is passed efficiently and correctly through the pipeline. Its role is fundamental to realizing the streaming and semantic integration goals of RDF-Connect.
4.5. Reader / Writer
Readers and writers provide the streaming interfaces that connect processors within a pipeline. A writer streams data out from a processor, while a reader receives data into a processor. Together, they define how data flows between pipeline steps in an idiomatic and composable way. This separation of concerns allows for flexible data routing and makes it easy to compose and recombine processors across different pipeline configurations.
5. SHACL as Configuration Schema
RDF-Connect uses SHACL [shacl] not only as a data validation mechanism but also as a schema language for defining the configuration interface of components such as processors and runners. These SHACL shapes enable:
- Static validation of component descriptions.
- Programmatic extraction of configuration contracts.
- Type-safe interpretation in environments like JavaScript/TypeScript.
Shapes define required and optional configuration properties, which are transformed into JSON objects at startup according to the pipeline configuration.
SHACL shape defining some required configuration for a processor
[] a sh:NodeShape ;
  sh:targetClass <FooBar> ;
  sh:property [
    sh:name "repeat" ;        # JSON field name
    sh:datatype xsd:integer ; # specify datatype
    sh:maxCount 1 ;
    sh:path :repeat ;
  ], [
    sh:name "messages" ;
    sh:datatype xsd:string ;
    sh:path :msg ;
  ].
Processor configuration
<MyProcessor> a <FooBar> ;
  :repeat 10 ;
  :msg "Hello", "World" .
This results in the following JSON structure.
{ "repeat" : 10 , "messages" : [ "Hello" , "World" ] }
5.1. Mapping SHACL to Configuration Structures
Each sh:property statement in a SHACL shape maps directly to a field in a configuration object, such as a JSON structure. This enables semantically rich, machine-validated configurations for processors and pipelines within RDF-Connect.
5.1.1. Required Fields
sh:path
This property indicates the RDF predicate that must be present on the target resource. In the context of configuration mapping, sh:path is used to extract the corresponding value from the data graph. It defines the link between the RDF representation and the configuration data being generated or interpreted.
sh:name
The sh:name property provides the external field name used in the resulting configuration object. This allows decoupling the internal RDF predicate (as defined in sh:path) from how the value appears in the configuration file.
sh:datatype or sh:class
These properties define the expected type of the configuration field. Use sh:datatype for primitive types such as xsd:string, xsd:boolean, or xsd:anyURI. When the expected value is a nested resource, i.e., an object with its own structured fields, use sh:class instead. This signals that the value is an embedded configuration object to be interpreted according to its own SHACL shape, enabling deeply nested configuration structures.
5.1.2. Optional Fields
sh:minCount
This property defines the minimum number of values that must be provided. If the actual number of values is below this threshold, a validation error is raised. A sh:minCount of 1 or higher indicates that the field is required in the configuration.
sh:maxCount
This property sets the maximum number of values allowed for a field. It also determines how the field should be interpreted structurally. If sh:maxCount is set to 1, the corresponding field is treated as a single value (T). If it is greater than 1 or left unspecified, the field is interpreted as a list or array of values (T[]). This behavior ensures that the structure of the configuration aligns with user expectations and downstream processing logic.
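As an illustration of this interpretation rule, the following TypeScript sketch derives the structural form of a field from its cardinality constraints (all names are illustrative, not part of this specification):

// Sketch: interpret a field according to sh:minCount / sh:maxCount.
interface PropertyShape {
  name: string;
  minCount?: number;
  maxCount?: number;
}

function interpretField(shape: PropertyShape, values: unknown[]): unknown {
  // sh:minCount of 1 or higher marks the field as required.
  const min = shape.minCount ?? 0;
  if (values.length < min) {
    throw new Error(`Field "${shape.name}" requires at least ${min} value(s)`);
  }
  // sh:maxCount 1: a single value (T); otherwise a list of values (T[]).
  return shape.maxCount === 1 ? values[0] : values;
}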
5.2. Nested Shapes and Component Types
Configuration fields can reference structured values instead of primitive literals. This is supported via sh:class, which indicates that the value must conform to another shape associated with a given RDF class.
For example:
sh:property [
  sh:path rdfc:input ;
  sh:name "input" ;
  sh:class rdfc:Reader ;
]
This defines a configuration field named input whose value is expected to be an instance of the class rdfc:Reader.
5.2.1. Use of sh:class and sh:targetClass
The sh:class predicate is used within a property constraint to indicate that the value of a field must be an RDF resource belonging to a specific class. This class must, in turn, have a shape associated with it that defines its expected structure. To make this connection, sh:targetClass is used on a sh:NodeShape. This associates the shape with a class so that tools can look up the shape definition when they encounter an RDF resource of that class.

In RDF-Connect, sh:targetClass allows reusable schemas to be defined for configuration components. These can then be referenced in other shapes through sh:class, enabling configuration structures that are both modular and type-safe.
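Conceptually, a tool maintains an index from target classes to shapes, so that a property constraint using sh:class can be resolved to the nested shape. A minimal sketch, with hypothetical data structures:

// Sketch: resolve nested shapes via sh:targetClass (hypothetical types).
interface NodeShape {
  targetClass: string; // IRI of the class this shape describes
  properties: { path: string; name: string; class?: string }[];
}

// Index all known shapes by their target class...
const shapeIndex = (shapes: NodeShape[]) =>
  new Map(shapes.map((s) => [s.targetClass, s]));

// ...so a field declared with sh:class can be interpreted using that shape.
function nestedShapeFor(index: Map<string, NodeShape>, classIri: string) {
  return index.get(classIri);
}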
5.2.2. Special Component Types: rdfc:Reader and rdfc:Writer
rdfc:Reader and rdfc:Writer are two special classes that represent input and output endpoints in RDF-Connect pipelines. Fields declared with sh:class rdfc:Reader or sh:class rdfc:Writer are not simply nested configuration objects. They serve as runtime injection points for streaming data.
At execution time, the runner is responsible for resolving these references into actual language-native abstractions. Depending on the programming environment, these may be JavaScript streams, async iterators, or callback-based interfaces.
For example:
sh:property [
  sh:path rdfc:input ;
  sh:name "input" ;
  sh:class rdfc:Reader ;
  sh:minCount 1 ;
]
This declares an input field that must be provided and will be automatically connected to an input stream by the runner.
This design decouples the declarative pipeline configuration from the underlying data transport logic, allowing developers to focus on processor behavior rather than infrastructure.
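As an illustration, a JavaScript-based runner might back such an injected rdfc:Reader with an async iterable that it feeds as messages arrive. The following sketch uses hypothetical names; channel closure handling is omitted for brevity:

// Sketch: a runner-side Reader fed by incoming messages for one channel.
class ChannelReader implements AsyncIterable<Uint8Array> {
  private queue: Uint8Array[] = [];
  private waiters: ((chunk: Uint8Array) => void)[] = [];

  // Called by the runner when a message for this channel arrives.
  push(chunk: Uint8Array): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter(chunk);
    else this.queue.push(chunk);
  }

  async *[Symbol.asyncIterator]() {
    for (;;) {
      yield this.queue.shift() ??
        (await new Promise<Uint8Array>((res) => this.waiters.push(res)));
    }
  }
}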
5.3. Example: Putting it all together
The following SHACL definitions and RDF instance demonstrate how a FooBar processor might be configured to append text to each incoming message and send the result to an output channel.
[] a sh:NodeShape ;
  sh:targetClass :Channels ;
  sh:property [
    sh:path rdfc:input ;
    sh:name "input" ;
    sh:class rdfc:Reader ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ], [
    sh:path rdfc:output ;
    sh:name "output" ;
    sh:class rdfc:Writer ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
  ].

[] a sh:NodeShape ;
  sh:targetClass <FooBar2> ;
  sh:property [
    sh:path :channel ;
    sh:name "channels" ;
    sh:minCount 1 ;
    sh:class :Channels ;
  ], [
    sh:path :append ;
    sh:name "append" ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string ;
  ].
<foobar> a <FooBar2> ;
  :channel [
    # `a :Channels` is not required, this is implicit from the definition
    rdfc:input <channel1> ;
    rdfc:output <channel2> ;
  ], [
    rdfc:input <channel2> ;
    rdfc:output <channel1> ;
  ] ;
  :append " World!" .
This results in the following JSON object:
{
  "channels": [
    {
      "input": { /* idiomatic input stream for channel <channel1> */ },
      "output": { /* idiomatic output stream for channel <channel2> */ }
    },
    {
      "input": { /* idiomatic input stream for channel <channel2> */ },
      "output": { /* idiomatic output stream for channel <channel1> */ }
    }
  ],
  "append": " World!"
}
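A processor consuming this structure can then be written directly against the idiomatic streams. The following hypothetical TypeScript implementation of the <FooBar2> logic appends the configured suffix to every message on each channel:

// Hypothetical processor logic for the <FooBar2> example above.
interface Channel {
  input: AsyncIterable<string>;
  output: { write(msg: string): Promise<void> };
}

interface FooBar2Config {
  channels: Channel[];
  append: string;
}

async function run(config: FooBar2Config): Promise<void> {
  await Promise.all(
    config.channels.map(async ({ input, output }) => {
      for await (const msg of input) {
        await output.write(msg + config.append); // "Hello" -> "Hello World!"
      }
    }),
  );
}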
6. RDF-Connect by Layer
Communication between the orchestrator and the runners happens using a strongly typed protocol defined in Protocol Buffers (protobuf). This enables language-independent and efficient communication across processes and machines.
The orchestrator hosts the protobuf server and is the central point of the system. It starts all runners and notifies them of the processors they should start. The orchestrator also acts as the message post office, routing each message to the correct runner, which in turn relays it to the correct processor.
6.1. Communication Protocol: Orchestrator ↔ Runner
RDF-Connect uses a bidirectional communication protocol based on Protocol Buffers (protobuf) for interaction between the Orchestrator and Runners. The orchestrator manages execution, while runners host and execute individual processors.
6.1.1. Messages Sent to Runners
The orchestrator can send the following messages to the runner:
- RPC.pipeline: Contains the complete pipeline configuration (in Turtle syntax).
- RPC.proc: Instructs the runner to register and set up a new processor.
- RPC.start: Signals that all added processors should begin execution.
- RPC.close: Instructs the runner to gracefully shut down and terminate all processors.
- RPC.msg: Delivers a message to a specific processor. Used for normal data transfer.
- RPC.streamMsg: Begins a streaming message transmission to a processor, typically for large or chunked payloads.
6.1.2. Messages Sent from Runners
The runner can send the following messages to the orchestrator:
- RPC.identify: Indicates that the runner is ready and provides its unique identifier (IRI).
- RPC.init: Confirms that a previously registered processor has successfully started.
- RPC.close: Notifies that the runner is shutting down because all processors have stopped.
- RPC.msg: Sends a message from a processor to another processor via the orchestrator.
- RPC.streamMsg: Sends a streaming message from a processor. Used when the payload is too large for a single message.
6.2. Orchestrator
The orchestrator is the central runtime entity in RDF-Connect. It reads the pipeline configuration, sets up the runners, initiates processors, and routes messages between them. It ensures the dataflow graph described by the pipeline is brought to life across isolated runtimes. The orchestrator acts as a coordinator, not an executor. Each runner is responsible for running the actual processor code, but the orchestrator ensures the pipeline as a whole behaves as intended.
Responsibilities:
- Parse the pipeline RDF.
- Load SHACL shapes for processors and runners.
- Validate and coerce configuration to structured JSON.
- Instantiate runners.
- Start processors.
- Mediate messages (data and control).
- Handle retries, streaming, and backpressure.
6.2.1. Startup Flow
6.2.1.1. Understanding the pipeline file
The orchestrator begins execution from a single pipeline RDF file. This file MUST first be expanded by resolving all owl:imports statements recursively. Once the full RDF graph is assembled, the orchestrator extracts the pipeline to execute by locating an rdfc:Pipeline instance whose subject is the pipeline file itself. The pipeline is composed of one or more runner–processor pairs, defined via the rdfc:consistsOf property.
Each pair includes:
- A reference to a runner, using the rdfc:instantiates property.
- One or more processors, referenced with the rdfc:processor property.
This example pipeline contains three processors divided over two runners. The orchestrator starts this pipeline with two runners and provides them with the correct processor configurations.
@prefix rdfc: <https://w3id.org/rdf-connect/ontology#> .

<> a rdfc:Pipeline ;
  rdfc:consistsOf [
    rdfc:instantiates rdfc:NodeRunner ;
    rdfc:processor <sender>, <echo> ;
  ], [
    rdfc:instantiates rdfc:RustRunner ;
    rdfc:processor <log> ;
  ].
6.2.1.2. Starting the runners
Each simple runner should specify two things:
- the language it supports, linked with rdfc:handlesSubjectsOf
- how the runner should be started, linked with rdfc:command
For each runner, the orchestrator appends two arguments to the configured command: the URL of the orchestrator’s running Protobuf server and the IRI identifying the runner instance. It then executes the resulting command to spawn the runner process.
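In Node.js terms this amounts to something like the following sketch (the command is split naively on whitespace for illustration):

import { spawn } from "node:child_process";

// Sketch: launch a runner from its rdfc:command, appending the URL of the
// orchestrator's protobuf server and the IRI identifying the runner.
function startRunner(command: string, serverUrl: string, runnerIri: string) {
  const [bin, ...args] = command.split(" "); // e.g. "npx js-runner"
  return spawn(bin, [...args, serverUrl, runnerIri], { stdio: "inherit" });
}

// e.g. startRunner("npx js-runner", "http://localhost:50051", "https://example.org/runner1");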
Each runner is expected to connect back to the orchestrator with the connect method, setting up a bidirectional stream of messages, dubbed the normal stream. Unless specified otherwise, messages are exchanged over this stream.
The orchestrator MUST track all active runners, specifically recording which ones have sent an RPC.identify message after startup. This message confirms that the runner is ready to accept processor assignments. The orchestrator responds to the runner with an RPC.pipeline message containing the full expanded pipeline as Turtle. This pipeline configuration is useful when a runner wants to extract processor arguments itself, instead of using the provided JSON-LD.
Once all expected runners have successfully identified themselves, the orchestrator proceeds to the next step in the pipeline initialization sequence.
6.2.1.3. Starting processors
Once all runners have been successfully identified via RPC.identify, the orchestrator proceeds to initialize the processors defined in the pipeline. This involves the extraction and transformation of processor configuration data into a format suitable for consumption by the associated runner.
Processor Arguments
Processor arguments are encoded as JSON-LD objects, providing a structured representation of RDF configuration data. JSON-LD fits these requirements: it allows encoding of typed literals and nested structures in alignment with the SHACL definitions, while still enabling extensibility, supporting advanced use cases such as capturing full SHACL paths or preserving provenance metadata.
Support for JSON-LD is optional for runners. Runners MAY choose to treat the JSON-LD as plain JSON if they do not require the semantic context or graph-aware features. However, all runners MUST accept the structure produced by the orchestrator.
Processor arguments come from the SHACL shape defined for the processor type. Each field is mapped following § 5.1 Mapping SHACL to Configuration Structures. A JSON-LD @context is generated mapping all sh:name values to the corresponding IRIs from sh:path. If the processor instance has a known RDF identifier or rdf:type, these are added to the JSON-LD using @id and @type.

The following example shows a shape with a property of class rdfc:Reader, together with a matching processor instance.
@prefix : <http://example.org/> .

[] a sh:NodeShape ;
  sh:targetClass <FooBar> ;
  sh:property [
    sh:name "reader" ;
    sh:path :reader ;
    sh:class rdfc:Reader ;
    sh:maxCount 1 ;
  ], [
    sh:name "count" ;
    sh:path :count ;
    sh:datatype xsd:integer ;
    sh:maxCount 1 ;
  ].

<SomeId> a <FooBar> ;
  :reader <ReaderId> ;
  :count 42 .
The following JSON-LD structure is built, which aligns with § 5.1 Mapping SHACL to Configuration Structures.
{ "@context" : { "reader" : "http://example.org/reader" , "count" : "http://example.org/count" } "@id" : "SomeId" , "@type" : "FooBar" , "reader" : { "@type" : "https://w3id.org/rdf-connect/ontology#Reader" , "@id" : "ReaderId" }, "count" : 42 }
Processor Definition Extraction
In addition to extracting processor instance arguments, the orchestrator MUST also extract the processor definition configuration. This definition provides implementation-specific parameters, typically required to launch the processor in a specific runtime (e.g., JavaScript entrypoints, file paths, or class names).
Processor definitions are extracted using the same SHACL-based mechanism described previously. The shape used for this extraction is associated with the programming language or runtime type and MUST be processed in the same way to produce a structured JSON-LD object.
RPC message
Once both the arguments and definition have been extracted for a processor instance, the orchestrator sends an RPC.proc message to the appropriate runner, initiating the processor launch process. The orchestrator MUST keep an internal record of all processor instances that have been dispatched to a runner, and of the runner's acknowledgment that a processor was successfully launched, as indicated by an incoming RPC.init message. No processor may be assumed to be operational until its runner has responded with RPC.init.
When all processors are successfully initialized, the orchestrator can start the pipeline.
6.2.1.4. Starting the pipeline
The orchestrator can start the pipeline by sending an RPC.start message to each runner.
The full startup flow is shown in this diagram.
6.2.2. Handling messages
The orchestrator acts as a message broker between processors. It is responsible for receiving messages from runners and forwarding them to the appropriate destination runners based on channel identifiers defined in the pipeline. Importantly, channels support many-to-many communication: multiple processors may emit to or receive from the same channel.
6.2.2.1. Normal messages
When a runner sends an RPC.msg message to the orchestrator, the message includes a channel IRI indicating its logical destination. The orchestrator MUST:
- Resolve the set of processors that are declared to consume this channel.
- Determine which runner is responsible for each of those processors.
- Forward the message to each relevant runner using RPC.msg.
These messages are sent as discrete units and fit within the allowed message size.
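A sketch of the routing step, assuming the orchestrator keeps a map from channel IRIs to the runners hosting consumers (names are hypothetical):

// Sketch: many-to-many channel routing.
// channel IRI -> runner IDs hosting processors that consume the channel.
const consumers = new Map<string, Set<string>>();

function routeMsg(
  channel: string,
  payload: Uint8Array,
  sendToRunner: (runnerId: string, msg: unknown) => void,
): void {
  // Forward as RPC.msg to every runner with a consumer on this channel.
  for (const runnerId of consumers.get(channel) ?? []) {
    sendToRunner(runnerId, { kind: "msg", channel, payload });
  }
}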
6.2.2.2. Streaming messages
When the payload of a message is large, the streaming message protocol SHOULD be used. This protocol enables large messages to be sent incrementally over a separate gRPC stream while maintaining channel-based routing.
The process is as follows:
- The sender (runner) initiates a sendStreamMessage gRPC stream to the orchestrator.
- The orchestrator generates a unique stream identifier and sends it back on this stream.
- The sender then sends an RPC.streamMsg over the normal bidirectional RPC stream, including the stream identifier and the channel IRI.
- The orchestrator resolves which processors receive messages on the given channel and forwards the stream identifier to their corresponding runners with RPC.streamMsg.
- Receiving runners connect to the orchestrator using receiveStreamMessage, passing the received stream identifier.
- Once all participants are connected, the orchestrator acts as a relay: all incoming chunks from the sending stream are forwarded to each connected receiving stream.
The orchestrator MUST close all associated receiving streams once the sending stream completes.
This mechanism ensures that high-volume or large data payloads can be distributed across the pipeline efficiently and reliably.
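From the sending runner's perspective, the handshake could look as follows. The stream API below is a hypothetical stand-in for the generated gRPC stubs:

// Sketch of the sender side of the streaming protocol (hypothetical APIs).
interface SendStream {
  firstMessage(): Promise<{ streamId: string }>; // ID assigned by orchestrator
  write(chunk: Uint8Array): Promise<void>;
  close(): Promise<void>;
}

async function sendLargePayload(
  channel: string,
  chunks: AsyncIterable<Uint8Array>,
  openSendStream: () => SendStream,           // sendStreamMessage
  sendOnNormalStream: (msg: unknown) => void, // the normal RPC stream
): Promise<void> {
  const stream = openSendStream();
  const { streamId } = await stream.firstMessage();
  // Announce the stream on the normal stream so the orchestrator can route it.
  sendOnNormalStream({ kind: "streamMsg", channel, streamId });
  for await (const chunk of chunks) await stream.write(chunk);
  await stream.close(); // completion closes all receiving streams
}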
6.3. Runner
A runner in RDF-Connect is responsible for managing and executing processors within a specific execution context—typically a programming language runtime. Each runner MUST connect with the orchestrator’s Protobuf server and follow the RDF-Connect protocol.
While minimal runners can be implemented with little overhead, they often become enriched with quality-of-life features to better support developers and processors operating in that language.
These quality-of-life features include:
- Wrapping readers and writers in idiomatic objects.
- Runners SHOULD also make it possible for processors to complete their startup before acknowledging to the orchestrator that the processor is initialized. This is useful for processors that execute longer-running operations, like reading a file or consulting an external API.
Runners MAY also coalesce or transform message types to simplify processor implementation. A streaming message may be aggregated into a single message if the underlying platform supports arbitrarily large strings or buffers, and a single message may be exposed to the processor as a streaming interface emitting a single chunk. This flexibility allows generic processors to be implemented without needing to distinguish between the streaming and single-message protocols. Conversely, before forwarding an outgoing message, it is advised that runners inspect its length to determine whether it should be sent as a single message or as a streaming message.
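A minimal sketch of that decision; the threshold is illustrative (gRPC's default maximum message size is 4 MiB, so a single message should stay safely below it):

// Sketch: choose between a single message and a streaming message by size.
const SINGLE_MESSAGE_LIMIT = 1024 * 1024; // 1 MiB, hypothetical threshold

function sendPreferred(
  payload: Uint8Array,
  sendSingle: (p: Uint8Array) => void,    // RPC.msg
  sendStreaming: (p: Uint8Array) => void, // streaming message protocol
): void {
  if (payload.byteLength <= SINGLE_MESSAGE_LIMIT) sendSingle(payload);
  else sendStreaming(payload);
}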
The following sections detail the runner startup flow and describe the expected interactions between runners and the orchestrator during initialization and execution.
6.3.1. Pipeline Configuration
A runner is defined with an instance of rdfc:Runner, which currently is instantiated by a command. It requires the command that actually starts the runner, via rdfc:command, and a link to the programming context, via rdfc:handlesSubjectsOf. The object of rdfc:handlesSubjectsOf links runners and processors to a context term, which often refers to the programming language. Each context term is related to a SHACL shape, which specifies the incoming data that the runner can use to start the processors.
rdfc:NodeRunner a rdfc:Runner ;
  rdfc:handlesSubjectsOf rdfc:jsImplementationOf ;
  rdfc:command "npx js-runner" .

# Note that rdfc:jsImplementationOf is already defined by RDF-Connect as follows
sds:implementationOf rdfs:subPropertyOf rdfs:subClassOf .
rdfc:jsImplementationOf rdfs:subPropertyOf sds:implementationOf .

# Shape that a JS processor should fulfil
[] a sh:NodeShape ;
  # We target it with jsImplementationOf
  sh:targetSubjectsOf rdfc:jsImplementationOf ;
  sh:property [
    sh:path rdfc:file ;
    sh:name "file" ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string ;
  ], [
    sh:path rdfc:class ;
    sh:name "clazz" ;
    sh:maxCount 1 ;
    sh:datatype xsd:string ;
  ].
This way, rdfc:jsImplementationOf is a predicate declared only for JavaScript processors. Because a shape is linked with that predicate, runners can expect a file location and a class name with which to start JavaScript processors.
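Under this convention, a JavaScript processor might be packaged as a file exporting a class. The exact constructor and lifecycle contract is runner-specific; the following is only a hypothetical illustration of the file/clazz pair:

// processor.ts -- hypothetical processor matching the shape above, declared
// with rdfc:file "processor.js" and rdfc:class "Echo" in the pipeline.
export class Echo {
  constructor(
    private args: {
      input: AsyncIterable<string>;
      output: { write(msg: string): Promise<void> };
    },
  ) {}

  // Forward every incoming message unchanged.
  async run(): Promise<void> {
    for await (const msg of this.args.input) {
      await this.args.output.write(msg);
    }
  }
}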
6.3.2. Connecting Flow
Each runner is started by the orchestrator using a command defined in the pipeline via rdfc:command. The orchestrator appends two arguments to this command: the URL of the orchestrator's Protobuf server and the IRI that uniquely identifies the runner.

Upon startup, the runner MUST connect to the orchestrator using the provided URL via the RPC.connect method. This establishes a bidirectional message stream referred to as the normal stream. Once connected, the runner MUST send an RPC.identify message, identifying itself with the provided IRI. The orchestrator then sends an RPC.pipeline message containing the complete, expanded pipeline definition. The runner MAY ignore this message.

Runners MAY initiate a separate log stream using the RPC.logStream method to stream log messages back to the orchestrator.
6.3.3. Starting Processors
After initialization, the orchestrator may send multiple RPC.proc messages to instruct the runner to start specific processors. Each message includes a processor IRI, a configuration object, and an argument object. Both the configuration and the arguments are provided as JSON-LD strings. The configuration object contains the fields defined by the context term, following § 5.1 Mapping SHACL to Configuration Structures, while the arguments are constructed from the SHACL shape defined for the processor type.
Runners MAY parse these JSON-LD payloads and transform known constructs into idiomatic equivalents. For example, reader and writer objects are represented as JSON-LD values with an @id field (containing the channel IRI) and an @type field indicating either https://w3id.org/rdf-connect/ontology#Reader or https://w3id.org/rdf-connect/ontology#Writer. Runners are RECOMMENDED to replace these values with appropriate typed objects in the target environment.
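A sketch of that replacement step: walk the parsed JSON-LD arguments and substitute Reader/Writer references with runtime objects (the construction functions are hypothetical stand-ins for the runner's channel machinery):

// Sketch: replace JSON-LD Reader/Writer references with typed objects.
const RDFC = "https://w3id.org/rdf-connect/ontology#";

function injectChannels(
  value: unknown,
  makeReader: (iri: string) => unknown,
  makeWriter: (iri: string) => unknown,
): unknown {
  if (Array.isArray(value)) {
    return value.map((v) => injectChannels(v, makeReader, makeWriter));
  }
  if (value !== null && typeof value === "object") {
    const node = value as Record<string, unknown>;
    if (node["@type"] === RDFC + "Reader") return makeReader(String(node["@id"]));
    if (node["@type"] === RDFC + "Writer") return makeWriter(String(node["@id"]));
    return Object.fromEntries(
      Object.entries(node).map(([k, v]) => [
        k,
        injectChannels(v, makeReader, makeWriter),
      ]),
    );
  }
  return value;
}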
When a processor has been fully initialized, the runner MUST send an RPC.init message, indicating success or failure. If any runner signals an error during initialization, the orchestrator MUST abort the pipeline execution. Once all processors are successfully initialized, the orchestrator sends an RPC.start message, instructing the runner to start the processors. After all processors complete their execution, the runner SHOULD gracefully close the normal stream to signal completion.
6.3.4. Handling messages
Apart from starting processors, the runner also acts as a mediator, making sure the correct messages are delivered to the correct processors. The orchestrator MAY send any number of RPC.msg or RPC.streamMsg messages.
6.3.4.1. Receiving normal messages
When an RPC.msg is received, the runner MUST deliver the message to the appropriate processor, using the channel IRI to determine the correct target. Message routing can follow either a push or a pull model, depending on the language environment.
Runners MAY coerce or transform messages into a different representation, including converting a normal message into a streaming message with a single chunk or converting a short streaming message into a normal message. These transformations SHOULD respect the preferences of the processor and the runner’s internal design constraints.
6.3.4.2. Receiving streaming messages
When an RPC.streamMsg is received, the runner MUST establish a streaming channel by invoking the RPC.receiveStreamMessage method with the provided stream ID. This initiates a stream of chunks from the orchestrator. The runner MUST forward these chunks to the appropriate processor, using any internal buffering or transformation logic as required.
If the runner determines that the message size is small enough, it MAY convert a streaming message into a single message object, provided that this does not violate processor expectations or exceed system limits.
6.3.4.3. Sending messages
Runners MUST also support outbound communication from processors. While runners MAY omit support for certain advanced features (such as streaming output), a full implementation is strongly encouraged.
To send a normal message, the runner uses the RPC.msg method on the normal stream. To send a streaming message, the runner first initiates the RPC.sendStreamMessage method, which returns a new stream. The orchestrator responds with a single message containing the stream ID. The runner then sends an RPC.streamMsg on the normal stream, including the channel IRI and stream ID. Finally, the runner streams the message content using the created stream. The message is considered complete when the runner closes the stream.
6.3.4.4. Channel Closure
Processors MAY indicate that a given channel is closed (i.e., no further messages will be sent).
The runner MUST propagate this information to the orchestrator via an RPC.close message. Similarly, when the orchestrator sends an RPC.close message for a channel, the runner MAY respond by closing or invalidating the corresponding data stream in the processor.
6.4. Processor
6.5. Pipeline
7. Ontology Reference
The RDF-Connect ontology provides the terms used in RDF pipeline definitions. See the full RDF-Connect Ontology for details.