Runners¶
No, not Usain Bolt. These are the kind of runners that run other things.
In fact, these run the Tube.
To run tubes, a runner requires a plan of how it’s going to run the tube.
It also has to awaken the tube and take inventory of what’s inside the tube.
Then, while the tube is running, it needs to monitor how it’s doing.
Finally, after the run is over, it needs to clean everything up.
You can grab your favorite runners from the top level of noob, like the following:
from noob import Tube, SynchronousRunner
tube = Tube.from_specification("...")
runner = SynchronousRunner(tube=tube)
and start running your tube with it n times…
results = runner.run(n=100) # run tube 100 times
or use it like an iterator of your tube:
gen = runner.iter()
for result in gen():
... # do things with result
or, use it like a function and call it a single time:
runner.init()
result = runner.process()
runner.deinit()
States¶
A track and field runner can be in a few different states, ranging from shooting up steroid pre-run, to clearing the bandage after the race.
Our runner, on the other hand, can also be in a few different states,
ranging from shooting up a Tube,
to clearing returns and garbage after the run.
Let’s take a look at some of the more notable states that you will find a TubeRunner in.
Pre-init
The runner exists. It has accepted a
Tube
Inited
The runner has run an
initmethod on all nodes and assets, bringing them into a state that can be called if given inputs.
Running
The runner is picking ready nodes off the scheduler, gathering inputs for them from its EventStore, executing them, and storing their outputs in its EventStore for the next ready node.
Deinited
There may be no more ready nodes that runner can execute, or the user stopped the runner. Either way, all nodes and assets within the Tube has undergone a teardown process, reverting the runner state to the pre-init stage.
Synchronous Runner¶
Unfortunately, there actually isn’t much happening “in sync” in SynchronousRunner.
In contrast, SynchronousRunner actually does only ever does one thing at a time.
This is called a single-threaded operation.
The word synchronous here would rather mean that every line of code “exists in the same time frame.”
Here, the order of operation is clearer. Let’s take a look at a few examples:
flowchart LR
A --> B
A --> C
B --> D
C --> D
When a SynchronousRunner first encounters a Tube like the above,
the first thing it does is performing a topological sort, using a Scheduler.
sequenceDiagram
participant runner
box Nodes
participant A@{ "type" : "entity" }
participant B@{ "type" : "entity" }
participant C@{ "type" : "entity" }
participant D@{ "type" : "entity" }
end
runner ->> A: execute
Activate A
A --> runner: signal
Deactivate A
runner ->> B: execute
Activate B
B --> runner: signal
Deactivate B
runner ->> C: execute
Activate C
C --> runner: signal
Deactivate C
runner ->> D: execute
Activate D
D --> runner: signal
Deactivate D
Based on this graph, all runners will start by executing node A.
As you can see, nodes B and C do not depend on each other.
SynchronousRunner will choose at random which one of the two will precede the other.
While one is processing, the other is simply idling.
If you’d like to multitask and process both at the same time, take a look at
Asynchronous Runner.
Once both B and C are fully processed, it will move onto node D.
After node D is finished, since the graph is complete,
we move onto the next epoch, generate another one of the graph, and repeat the process.
A strength of SynchronousRunner is a simpler architecture and thus having a more predictable control of the nodes.
It also is the only runner that currently supports dynamically enabling / disabling nodes:
runner.enable_node(node_id="a")
runner.disable_node(node_id="a")
For more details on enabled/ disabled nodes, refer to Tube.
Additionally, we strictly deal with only one epoch at a time.
Python debuggers will probably have an easier time debugging things in this runner,
so if asynchronous operation isn’t part of the core logic of your pipeline,
it could prove helpful to try running it in a SynchronousRunner first.
Scheduler¶
When a runner accepts a Tube, and enters the running stage,
it needs to know which node to execute next, since some nodes depend on others.
Scheduler determines this order of execution.
Currently, the Scheduler class is a wrapper around TopologicalSorter.
For each epoch, the Scheduler generate a new TopologicalSorter instance, and as runner executes the nodes,
the Scheduler return the next set of ready nodes to the runner. Once the ready nodes have been depleted,
the Scheduler marks the epoch complete.
Epochs¶
Todo
Document epochs