2 Comments
Jun 20 · Liked by Yaroslav Tkachenko

Spark's state reader is also open source; you can try it with the Spark 4.0 beta. Databricks ships the functionality slightly earlier than Apache Spark, not only for competitive advantage but also because the two have different release cadences.


I'd recommend taking a look at Hazelcast's stream processing. It is an IMDG (in-memory data grid, i.e. state kept in memory) with stream processing capabilities added on top. Stream joins are very fast, and it solves the problem in a single system. You can keep all of your enrichment data in memory if you can afford it, or alternatively keep the hot data in the cache and overflow to a more persistent data store.

Flink, by contrast, is stream processing with the state storage externalized.
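To make the "hot data in memory, overflow to a persistent store" idea concrete, here is a minimal Python sketch of a stream-table enrichment join backed by a two-tier lookup. It does not use Hazelcast's API: `TieredLookup`, `enrich`, and the plain dict standing in for the persistent store are hypothetical names chosen for illustration, and LRU eviction is an assumed policy.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Iterator, Optional


class TieredLookup:
    """Hypothetical two-tier lookup: a bounded in-memory LRU tier in front of
    a slower persistent store (not Hazelcast's API, just the general pattern)."""

    def __init__(self, capacity: int, load_from_store: Callable[[str], Optional[dict]]):
        self.capacity = capacity
        self.load_from_store = load_from_store  # fallback used on cache misses
        self.hot: "OrderedDict[str, dict]" = OrderedDict()
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        self.misses += 1
        value = self.load_from_store(key)  # go to the persistent tier
        if value is not None:
            self.hot[key] = value
            if len(self.hot) > self.capacity:
                self.hot.popitem(last=False)  # evict least recently used entry
        return value


def enrich(events: Iterable[dict], lookup: TieredLookup, key_field: str) -> Iterator[dict]:
    """Stream-table join: attach reference data to each event by key.
    Events with no matching reference data pass through unchanged."""
    for event in events:
        ref = lookup.get(event[key_field])
        yield {**event, **(ref or {})}


# Usage: a dict stands in for the persistent store (assumption for illustration).
store = {"u1": {"country": "CA"}, "u2": {"country": "DE"}}
lookup = TieredLookup(capacity=1, load_from_store=store.get)
events = [
    {"user": "u1", "amount": 10},
    {"user": "u1", "amount": 5},
    {"user": "u2", "amount": 7},
]
out = list(enrich(events, lookup, "user"))
```

With a capacity of one, the second `u1` event is served from memory, while `u2` has to be fetched from the store and pushes `u1` out of the hot tier. In a real deployment the capacity would be sized to the hot working set, which is the "if you can afford it" trade-off mentioned above.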
