Current is a flagship event in the data streaming space. Current 2023 happened last week in San Jose, California.
Recordings are already available here (registration required), but I believe that’s temporary: Confluent typically moves them to a permanent location on the website, so stay tuned. And, honestly, I didn’t attend that many talks this time 🙂 (hallway track FTW).
Confluent Disappoints
The keynotes from Confluent had no exciting announcements for me. Everyone was expecting to hear news about the managed Flink offering. However, it’s still in beta (“Preview”) without any public SLA. I’m also quite skeptical about any managed Flink SQL offering until compiled plans are mature enough.
However, it’s clear that Confluent is very bullish on Flink and Flink SQL. I was also pleasantly surprised to realize that the new SQL workspace is backed by Flink SQL.
As I predicted, there are no indicators that Confluent will allow any Flink connectors aside from the Kafka one. The Flink <> Kafka integration does look pretty good right now, though: all topics are automatically available as tables in the Flink SQL catalog.
The new Data Portal looked nice, but it was kinda obvious: it’s built on top of many existing primitives. It also doesn’t look innovative in any way; I saw far more advanced capabilities from several data governance vendors 4-5 years ago.
The OpenAI UDF may look impressive to some, but, in the end, it’s just an enrichment via an API call… How well will it scale?
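To make the scaling concern concrete: a per-row model UDF means one network round-trip per row, with all the latency and rate limits that implies. A common mitigation is memoizing the enrichment so repeated values hit a local cache instead of the API. This is a minimal sketch of that idea; `call_model` is a hypothetical stand-in for the real API call, not anything from Confluent’s UDF.

```python
from functools import lru_cache

API_CALLS = 0


def call_model(text):
    # Hypothetical stand-in for a remote model call
    # (in reality: a network round-trip, subject to rate limits).
    global API_CALLS
    API_CALLS += 1
    return text.upper()  # pretend this is the "enrichment"


@lru_cache(maxsize=10_000)
def enrich(text):
    # Memoized wrapper: each distinct input hits the API only once.
    return call_model(text)


rows = ["ship", "bill", "ship", "ship", "bill"]
enriched = [enrich(r) for r in rows]
# 5 rows, but only 2 distinct values, so only 2 API calls are made.
```

Caching only helps when inputs repeat, of course; for high-cardinality streams you’d need batching or async I/O on top.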
The rest of the keynotes either contained Flink explainers or repeated the content from 2022.
Streaming Databases Are Here to Stay
There were several vendors offering streaming databases: Materialize, RisingWave, Timeplus, DeltaStream. I’m still not completely sold on the idea of streaming databases being the answer to most of the data streaming challenges. They can cover 50% to 80% of use cases, but they’ll never get close to 100%. But they can be surprisingly sticky!
I approached some of the vendors and asked if they planned to introduce programmatic APIs down the road. Everyone said no - SQL should give them enough market share. It makes perfect sense: streaming databases try to lower the entry barrier so that anyone with a bit of SQL knowledge can start authoring streaming data pipelines. Supporting every use case is not their goal.
In the end, if you need state and timers, you have Apache Flink.
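The "state + timers" combination is the pattern that SQL-only engines struggle to express. Here’s a minimal pure-Python sketch of the idea behind Flink’s KeyedProcessFunction - this is illustrative pseudocode of the pattern, not Flink’s actual API: keep per-key state, (re)register a timer on each event, and emit when the timer fires.

```python
from collections import defaultdict


class SessionCounter:
    """Counts events per key; an inactivity timer closes each key's
    session GAP milliseconds after that key's last event."""

    GAP = 30_000  # illustrative inactivity gap, in ms

    def __init__(self):
        self.counts = defaultdict(int)  # keyed state
        self.timers = {}                # key -> timer timestamp
        self.emitted = []               # output "stream"

    def process_element(self, key, timestamp):
        self.counts[key] += 1
        # (Re)register the inactivity timer for this key.
        self.timers[key] = timestamp + self.GAP

    def advance_watermark(self, watermark):
        # Fire every timer whose timestamp has passed: emit the
        # session count and clear the key's state.
        for key, when in list(self.timers.items()):
            if when <= watermark:
                self.emitted.append((key, self.counts.pop(key)))
                del self.timers[key]


counter = SessionCounter()
counter.process_element("user-1", 1_000)
counter.process_element("user-1", 5_000)
counter.process_element("user-2", 6_000)
counter.advance_watermark(35_500)  # fires user-1's timer (at 35_000)
print(counter.emitted)             # [('user-1', 2)]
```

In real Flink the runtime manages the keyed state, checkpoints it, and fires event-time timers against watermarks for you; the sketch just shows why "a bit of SQL" isn’t enough for this class of logic.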
Conduktor
Conduktor was the discovery of the conference for me. I didn’t know much about the company, but some of the advertised product features deeply resonate with me. For example, Optimize offers Virtual clusters, Unlimited partitions and Topics Fusion!
Restate
The best talk of the conference for me (and the best demo in a while!). Restate is one of the very few projects I’m really excited about, and I highly recommend checking out their blog (and the demo).
Stephan Ewen (co-creator of Apache Flink), Restate’s founder, ended up implementing a stream-processing system on top of Restate! Guess how many lines of code it took?
Other Talks
3 Flink Mistakes We Made So You Won't Have To: great talk from Decodable with practical recommendations on exactly-once delivery, memory and checkpoint tuning.
Evolution of Streaming Pipeline at Lyft: primarily focused on feature engineering. Interesting example of a YAML-based pipeline definition (actually quite similar to what we have in Goldsky!).
Deeply Declarative Data Pipelines: this also deeply resonates with how we think about data pipelines at Goldsky. Data pipeline as a YAML file seems to be a pretty common idea nowadays.
Query Your Streaming Data on Kafka using SQL: Why, How, and What: nice overview from Timeplus showcasing not just Timeplus, but a bunch of other data engines. Extra kudos for coffee comparisons!
Flink SQL: The Challenges to Build a Streaming SQL Engine: amazing intro into how Flink SQL works. I can confirm that a lot of the challenges mentioned in the talk are very relevant.
Operationalizing Pluralsight's Data with Materialize: a short and quite sales-y talk, but, in my opinion, a very important one. Materialize is the pioneer of the streaming database space, and so far, they’ve been mostly focused on building. Showcasing a real customer helps to validate the whole market segment, not just Materialize. I’d be very curious to see what Pluralsight thinks after operating the system for a few months.
Finally, my colleague Xiao Meng and I gave a talk called Dynamic Change Data Capture with Flink CDC and Consistent Hashing. I hope you can learn a thing or two about Postgres CDC and how you can build even the most demanding system with just a few building blocks provided by Flink.
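For readers unfamiliar with the technique in the title, here’s a minimal consistent-hashing ring sketch - the generic idea, not the implementation from our talk. Keys map to the first node clockwise on a hash ring, and virtual nodes smooth out the distribution, so adding a worker only remaps the keys that land on the new worker.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each node owns `vnodes` points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get(self, key):
        # First ring point clockwise from the key's hash (wrapping).
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]


ring = HashRing(["worker-1", "worker-2", "worker-3"])
before = {f"table-{i}": ring.get(f"table-{i}") for i in range(100)}
ring.add_node("worker-4")
after = {k: ring.get(k) for k in before}
moved = [k for k in before if before[k] != after[k]]
# Only a fraction of keys move, and every moved key lands on worker-4.
```

This stability under membership changes is what makes the technique attractive for dynamically assigning CDC workloads across workers.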
There are many more great presentations, but I just haven’t had a chance to watch them yet 🙂. Most of the recordings look really good, so I highly recommend checking them out!
Update: corrected the statement about the on-call/SLA provided by the Confluent managed Flink offering.
Well, what can I say: Yaroslav is not objective. First of all, Confluent launched Flink SQL into beta (Open Preview), not alpha. Second, it is just not true that no one is on call. We already carry the pager. We do not yet offer an SLA, but we do have internal SLOs, as we are preparing for GA and making sure that we are 100% ready to offer a 99.99% SLA.