If I Were to Create a New Stream Processing Framework Today...

Mar 11, 2024

It would probably be implemented in Rust 🙂 And it would probably leverage the Apache Arrow DataFusion ecosystem (although its streaming support doesn't look very mature yet).
It would be very resource-efficient and operate well on tiny workloads.
It would support programmatic APIs in many languages (not just Java and Python; think JavaScript, PHP, Ruby, Golang, etc.), as well as SQL.
It would have disaggregated state and leverage object stores like S3.
It would make it easy to query and introspect its state. It would be possible to easily create new versions of the state.
It would support batch queries to a certain degree (doesn’t need to be as good as Spark; I’d be fine with 75% of that), and it would use them to support backfilling and reprocessing natively.
It would have a leaderless architecture and be available as a single binary.
It would be extremely extensible. It would make it very easy to implement new connectors.
It would support internal consistency.
It would have built-in tracing and amazing observability capabilities.
It would natively support blue/green deployments and make it possible to deploy new versions without any downtime.
It would support autoscaling and automatic hot key redistribution.
It would have a great developer experience.

A man can dream, right?

Data Streaming Journey