It would probably be implemented in Rust 🙂 And it would probably leverage the Apache Arrow DataFusion ecosystem (but it looks like the streaming support is not very mature).
It would be very resource-efficient and run well even on tiny workloads.
It would support programmatic APIs in many languages (not just Java and Python; think JavaScript, PHP, Ruby, Golang, etc.), as well as SQL.
It would have a disaggregated state and leverage object stores like S3.
It would make it easy to query and introspect its state. It would be possible to easily create new versions of the state.
It would support batch queries to a certain degree (doesn’t need to be as good as Spark; I’d be fine with 75% of that), and it would use them to support backfilling and reprocessing natively.
It would have a leaderless architecture and be available as a single binary.
It would be extremely extensible and make it very easy to implement new connectors.
It would support internal consistency.
It would have built-in tracing and amazing observability capabilities.
It would natively support blue/green deployments and make it possible to deploy new versions without any downtime.
It would support autoscaling and automatic hot key redistribution.
It would have a great developer experience.
A man can dream, right?
Timeplus fulfills most of those, except that it's written in C++. The open-source Proton doesn't have the clustering part, but the enterprise one does. Single binary, JDBC/ODBC and other APIs available, can do batch and streaming, has mutable streams, native Kafka/ClickHouse connectivity, etc. You only need SQL. Would love it if you tried it: https://www.timeplus.com/install
Sounds like you are talking about Fluvio - https://github.com/infinyon/fluvio.
We've been at it for the past 5 years.
Hits 90% of those bullets.
The remaining 10% is coming soon.