# Related Projects

## Streaming processors

- [streamz](https://github.com/python-streamz/streamz)
  - Nice examples of what pipeline systems need, even if it's not exactly how we'd do them.
  - See, e.g. ["zip" vs "combine_latest"](https://streamz.readthedocs.io/en/latest/core.html#branching-and-joining) (a minimal sketch of the difference appears after these lists).
- [RxPY](https://github.com/ReactiveX/RxPY)
- [Apache Kafka](https://kafka.apache.org/)
  - Good discussion of [design](https://kafka.apache.org/documentation/#theproducer), even if a very different application: massive scale, simple one-step processing replicated over many machines.
- [Apache Storm](https://storm.apache.org)
  - [streamparse](https://streamparse.readthedocs.io)
    - [Topology DSL](https://streamparse.readthedocs.io/en/latest/topologies.html#topology-dsl)

## Batch-based

- [Dagster](https://docs.dagster.io/)
  - [Code locations](https://docs.dagster.io/deployment/code-locations/): interesting handling of multiple conflicting dependency trees/environments in the same pipeline. Rather than specifying the dependencies (except in Dagster+, where the cloud is the dependencies!), you specify the venv; each venv is independent and integrated over RPC.
  - Some analogies in division of labor: op -> node, graph -> tube, job -> tube + runner (not sure), I/O manager -> store... (a rough sketch of the op/graph/job split appears after these lists).
- [luigi](https://github.com/spotify/luigi)
- [dask.distributed](https://distributed.dask.org/en/stable/)
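The "zip" vs "combine_latest" distinction linked under streamz is exactly the kind of joining semantics a pipeline system has to choose, so a minimal sketch helps. Assuming streamz is installed; the stream names `a` and `b` are just illustrative:

```python
from streamz import Stream

a = Stream()
b = Stream()

# zip pairs the Nth event of one stream with the Nth event of the other;
# combine_latest re-emits whenever either stream updates, using the other
# stream's most recent value.
a.zip(b).sink(lambda pair: print("zip:", pair))
a.combine_latest(b).sink(lambda pair: print("combine_latest:", pair))

a.emit(1)    # nothing yet: both combinators wait for b's first event
b.emit("x")  # zip: (1, 'x')                combine_latest: (1, 'x')
a.emit(2)    # zip buffers, waits for b     combine_latest: (2, 'x')
a.emit(3)    # zip still waiting            combine_latest: (3, 'x')
b.emit("y")  # zip: (2, 'y') (in order)     combine_latest: (3, 'y')
```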
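To ground the op/graph/job analogy noted under Dagster, here is a rough sketch (the op and graph names are invented for illustration): an op is the smallest unit of computation (analogue of a node), a graph only wires ops together (analogue of a tube), and a job binds a graph to config/resources so it can actually run (roughly tube + runner).

```python
from dagster import op, graph


@op
def extract():
    # an op is the smallest unit of computation
    return [1, 2, 3]


@op
def double(values):
    return [v * 2 for v in values]


@graph
def etl():
    # a graph just wires ops together; no execution config lives here
    double(extract())


# a job binds the graph to resources/config so it can run
etl_job = etl.to_job()

if __name__ == "__main__":
    result = etl_job.execute_in_process()
    print(result.output_for_node("double"))  # [2, 4, 6]
```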