Interview with Nikhil Benesch (Materialize)
Discussing many aspects of Materialize including consistency and correctness.
Recently, I had a chat with Nikhil Benesch, CTO of Materialize. We talked about the operational data warehouse positioning, consistency and correctness, the evolution of Materialize, and even BYOC.
Yaroslav Tkachenko
Hi Nikhil, could you introduce yourself and Materialize?
Nikhil Benesch
Hi, I'm Nikhil Benesch. I'm the CTO at Materialize. I've been with the company since the very early days when it was me and the two co-founders, Arjun and Frank. I came over from Cockroach Labs, where Arjun and I were both engineers. Arjun had gotten me really fired up about the prospect of building a database that did most of the work when a write arrived rather than doing most of the work when a read arrived. Because when you look at most real-world workloads, they tend to read the same data over and over. And it's really silly that most databases compute the same answer over and over.
Yaroslav Tkachenko
Makes sense! There are a lot of similar companies calling themselves streaming databases recently. But also, I noticed your big rebranding around the operational data warehouse. And there are also, of course, classic stream processing frameworks like Flink and Spark. So maybe you can explain a bit about positioning and how you think about Materialize. Why is it different from Flink or some other tools out there?
Nikhil Benesch
We're really excited about this new operational data warehouse positioning because it lets us talk about the problem rather than the solution. A streaming database is more of a solution. Streaming is an implementation detail. But the problem that we're actually solving is that if you try to use traditional data warehouses (Snowflake, BigQuery, etc.) today in an operational context, something that powers the day-to-day operations of your business, you run into problems. Either you can't get the data to be fresh enough, or you rack up a massive bill because you are spending lots of compute constantly recomputing the same answer. So, an operational data warehouse lets us focus on the problem that we're solving, which is operationalizing traditional data warehouses. And it lets us come at that problem from the side of the technology we most resemble.
From a user experience perspective, we are an entirely SQL-based system. You can only write SQL against Materialize. You can't write custom Java code that processes things event by event. That's not what Materialize offers in terms of user interface. So, if you come at it from a data warehouse perspective, where you care about making your data warehouse fresher, Materialize works as you would expect. Whereas if you come at Materialize from the stream processor perspective, where you're used to writing Kafka Streams code in Java, getting to really fiddle with the low-level details, you might find yourself frustrated with Materialize as it exists today because it doesn't have the same level of expressivity that stream processors do, even though at its core Materialize has a stream processor as its computation engine.
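(As a concrete sketch of that SQL-only experience: the statements below follow Materialize's documented SQL shape to the best of my understanding, but the orders table and its columns are invented for illustration.)

```sql
-- Define the result you care about once; Materialize maintains it incrementally
-- as new data arrives, rather than recomputing it on every read.
CREATE MATERIALIZED VIEW order_totals AS
    SELECT customer_id, sum(amount) AS total
    FROM orders
    GROUP BY customer_id;

-- Reads are plain SQL and return the already-maintained answer.
SELECT total FROM order_totals WHERE customer_id = 42;

-- Or stream every change to the view as it happens.
SUBSCRIBE TO order_totals;
```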
Yaroslav Tkachenko
I want to follow up a bit on “operational” as a keyword, because a lot of Big Tech companies use something like Flink for analytical workloads, e.g. things like window aggregations or feature engineering. So, does your positioning mean that you want to specifically target the operational side and maybe not so much the analytical one?
Nikhil Benesch
It's not a perfect split, this framing of operational vs. analytical. And there are some analytical workloads that Materialize is pretty good at today. So it's more a broad strokes characterization of the kinds of workloads that Materialize tends to be good at. But what we actually see with a lot of folks is they come to Materialize with their hot data for their operational workloads, and then there's a little bit of additional analysis they want to do with a larger set of data. And because they've already brought half of their data, it's often convenient to be able to run some analytical workloads inside of Materialize, even though it's not the ideal tool for that today. So we've been looking increasingly into additional features inside of Materialize that make those hybrid use cases more viable.
To a first approximation, though, the operational data warehouse captures what Materialize is best suited for, and you can stretch beyond that on a case-by-case basis.
Yaroslav Tkachenko
As far as I know, Materialize is Postgres compatible, and at least the core of the system is implemented in Rust. Could you elaborate on these choices?
Nikhil Benesch
We chose Postgres because it is an open-source, highly mature SQL implementation, and there are no IP issues with copying it. Between Postgres and MySQL, the two biggest open-source SQL implementations, we prefer the Postgres dialect. We find that the choices that the Postgres implementers have made result in a more consistent SQL interface. It's also what CockroachDB uses, and I think we'll talk a little bit more about this later. But CockroachDB made the same choice for similar reasons. So we followed in their footsteps because we liked the way that played out at CockroachDB.
As far as Rust, that decision predates Materialize. Frank McSherry, one of the co-founders of the company, wrote the core computation engine of Materialize, Timely Dataflow and Differential Dataflow, in Rust. And I don't know the exact details of why Frank picked Rust, but the combination of a strong, powerful type system and systems-language performance makes Rust a pretty killer fit for a system like Materialize. The core computation engine is highly mathematical, and a lot of the formal concepts in that computation engine are expressed really nicely as traits in Rust. And there isn't another language, as far as I'm aware, that has as expressive a type system as Rust and still gives you C-like performance. The Haskells and the OCamls of the world would not let you get anywhere close to the kind of performance that Materialize needs.
Yaroslav Tkachenko
Makes sense. For me personally, Materialize was the first company to build what you call a streaming database. Is that actually correct? Was there anyone else before? Has anyone inspired you?
Nikhil Benesch
I'm not aware of another company that was a streaming database before us. I was just checking the history, and KSQL was started in 2017, and they rebranded to ksqlDB in 2019 after Materialize was established. I like to think we had a hand in them wanting to become more of a database, but I believe we were the first to really go after the term streaming database. As far as inspiration, a number of databases and layers on top of databases have offered live subscriptions over the years. Technologies like Firebase or Supabase have support for subscriptions, but they tend to be very limited. You can only subscribe to base tables, for example, or you have to wire up the change notification yourself.
So, wanting to remove the limitations of subscriptions in existing data products was definitely a big part of the inspiration for Materialize.
Yaroslav Tkachenko
I clearly remember when Materialize was just a single binary, and now it's a big distributed cloud-native system. So, what was the biggest architectural change going from that single binary to a cloud-native system? And what was the most challenging part of that transition?
Nikhil Benesch
The biggest architectural change was the introduction of clusters. In the single binary world, you had a materialized process that did everything. If you were running it on a small machine, you could do small computations. And if you were running it on a large machine, you could do large computations, but you could only fundamentally scale that computation to the number of cores on that one computer. There was no possibility of sharding your computation across multiple computers. Nor was there any ability to tolerate failures. If your computer went down, or there was a network fault or a memory corruption, your computation was lost, and you had to restart Materialize on that computer to get it back.
We fixed all this in the cloud-native version of Materialize with the introduction of clusters. A cluster is like a virtual warehouse in Snowflake. It is an isolated pool of compute resources. And what's very cool about clusters is they are both horizontally scalable and fault tolerant. So you can run a cluster across multiple EC2 instances. They have t-shirt sizes like warehouses in Snowflake. You can have extra small clusters that are just a fraction of one EC2 instance. Or you can have a 3XL cluster, which is like four EC2 instances. And Materialize will automatically shard your computations across all of those EC2 instances. That solved one problem.
The other problem that we solved with clusters was fault tolerance. Materialize learned to have multiple replicas of a given cluster, so you can specify your replication factor. It doesn't need to be one; it can be two or three. By default, it's one: we only have one copy of your computation running. But if you say a replication factor of two, we will turn on two identical pools of compute resources, each performing identical computations. And then, if you lose a machine in either pool, the other pool is still available, still has that computation running, and you won't notice an interruption in your queries. So that was the biggest architectural shift, and we had to really hammer on the control plane of Materialize to orchestrate these clusters: handle getting multiple responses from the multiple replicas, choose whichever one came back first, and restart that cluster when it failed.
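(For illustration, cluster creation in Materialize looks roughly like the following; the syntax follows the documented CREATE CLUSTER statement as I understand it, but the size names are examples rather than a reference.)

```sql
-- A small cluster: a fraction of one machine, one replica (the default).
CREATE CLUSTER serving (SIZE = 'xsmall');

-- A larger cluster whose dataflows are sharded across several machines,
-- with two identical replicas for fault tolerance.
CREATE CLUSTER analytics (SIZE = '3xlarge', REPLICATION FACTOR = 2);
```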
The horizontal scalability, funnily enough, wasn't too large of an architectural change because Timely and Differential Dataflow already scaled across multiple cores. And it turns out that if you can scale across multiple cores, scaling across multiple machines is not that much harder. The way the stream processor is implemented under the hood, it treats cores and machines as relatively similar. The unit of sharding is a core, not a machine. So, the only difference is the introduction of a network connection between workers, rather than all the workers passing messages directly within one machine. But the way the underlying architecture is designed, nothing is really shared across threads. So, there was already a very firm boundary where workers communicated from thread to thread.
The interface there is pretty agnostic to whether you're doing thread-to-thread communication on the same machine or process-to-process communication on different machines. The big challenge, though, was less technical and more philosophical, more oriented around the people. We had to get the entire team to stop thinking about Materialize as a single binary that you could run on your machine and start thinking of it as a complex distributed system orchestrated across a fleet of machines. A lot of folks had to skill up on Kubernetes. We had to rewrite a lot of tooling and a lot of test frameworks to become Kubernetes-aware. And we still have a lot of vestiges of the original system.
If we had implemented Materialize as a cloud-native system from scratch, we would have made all of our test suites Kubernetes-based. Instead, we have three different test suites for Materialize: one that still thinks of Materialize as a single-binary system, one that is somewhat cloud-aware, and one that is 100% cloud-aware and runs against an actual AWS account. We never would have ended up with three totally distinct test suites if we'd started with a cloud-native system from the beginning.
Yaroslav Tkachenko
You and Arjun worked on CockroachDB before. How did that influence your work at Materialize?
Nikhil Benesch
I think it's really hard to say the depths to which we were influenced by CockroachDB. So much of the engineering culture and ethos at Materialize comes from CockroachDB, I think in ways that Arjun and I don't even understand. The big things that jump out to me are a focus on code quality and correctness. CockroachDB had a very strong culture of code reviews, writing good commit messages, and writing design documents for large features. And we imported a lot of those processes into Materialize. I think if you trace the lineage back, they go all the way to Google. The CockroachDB co-founders all came from Google and worked on some of the largest infrastructure projects that Google had. They brought those practices to CockroachDB, and then we brought them to Materialize.
The other thing that was part of the CockroachDB philosophy was correctness. We care a lot about correctness at Materialize, even though Frank doesn't come from CockroachDB. Timely and Differential take correctness as an axiom, so that permeates throughout the culture. We don't like to take shortcuts that would impact correctness. We would never put performance ahead of correctness. It's always correctness first, performance second, and I think we owe a lot of that culture to CockroachDB. One of the defining features of CockroachDB relative to other technologies is that you get serializability for sure and something close to strict serializability. The exact isolation level is hard to define, but it's much stronger than what you get with other distributed SQL databases like PlanetScale. And I think that really matters for folks who want to build applications that work correctly.
The other thing that I mentioned before is that CockroachDB showed how well basing a SQL dialect off of Postgres worked, and we wholesale imported that decision into Materialize as well.
Yaroslav Tkachenko
Using the Postgres dialect seems to be a no-brainer nowadays. A lot of new databases start with that by default.
Nikhil Benesch
It really is. It's fun to be part of such a large cohort that's building either on top of Postgres directly or mirroring Postgres's syntax and semantics. One of the concrete things we learned from CockroachDB is that it's better not to deviate from Postgres. CockroachDB had a couple of places where they tried to improve upon Postgres semantics and only later realized that some other Postgres feature relied on the original behavior being implemented in that particular way. At Materialize, we almost never deviate from Postgres semantics. And where we do, we need to have a very good reason for doing so and really think through the long-term implications of what that deviation might cause down the road.
Yaroslav Tkachenko
So, let's talk more about correctness. When I first learned about Materialize and compared it with traditional data processing frameworks like Spark and Flink, strong consistency and data correctness guarantees immediately stood out. And now, after operating for a while and onboarding some customers, how important do you think those guarantees are? Is it something that customers generally demand? In the end, a lot of Big Tech companies and a lot of companies in general successfully use Spark and Flink.
Nikhil Benesch
I am ever more convinced that everyone wants or needs correctness. And if you don't, it's largely that you haven't been looking hard enough. Every now and then, we'll have a customer who will say: “I can live with a little bit of incorrectness”, but it's not that they're happy about it. It's that they have resigned themselves to the fate of inconsistency. They can tolerate a little bit of deviation from the correct answer, but they often want to be 97%, 98% correct. And, of course, what that means depends on the application. But if they have some sort of reconciliation process where their streaming pipeline is compared to their batch pipeline, they want to see that 97%, 98% of the rows match the batch pipeline.
And then for a lot of the other customers we have, that's absolutely unacceptable. They need it to be a 100% match. And the only difference between customers is how closely they're looking. And what we find is that a lot of people give up on correctness because they have to, not because they want to. That said, it's a really hard problem. If you hold Spark and Flink just right, you can coax correct answers out of them, especially if you can tolerate eventual consistency. If there's some way you can pause the upstream inputs, let the stream processor converge on the right answer, and look at that, then you can get correct results out of these stream processors. And I think a fair number of companies, for their correctness-sensitive pipelines, do something like this where they pay very close attention to their watermarking and checkpointing strategy to make sure that their data is in alignment.
And then a lot of folks who are a little bit less correctness-sensitive just don't look and just hope that it works out. And there's something to that. But because we are in the business of providing data infrastructure, we want to take the time to really get it right. We think it is our responsibility to build a system that can be used correctly without too much work. And the easier we make it to get correct answers out of Materialize, the more people stand up and say: ”great, this is the system that I want because I don't have to work that much harder to get correct answers out of it”.
I do want to dig a little bit into end-to-end consistency too, because while this is not something that you can get out of Materialize today, we have a line of sight to providing end-to-end consistency when using Materialize with particular upstream sources. And it's easiest to see with Postgres. Today, if you create a Postgres source in Materialize, we ingest asynchronously. So you can write to Postgres and then go read a view in Materialize that is doing some computation on top of your Postgres source, but there's no guarantee that you will see that write that you know was committed to Postgres. It will show up in Materialize at some point in the future.
Now, in Materialize, that's often just a few seconds. We aim for low single-digit seconds of lag, end to end, but it's still not a guarantee that you will see it immediately after you know that write was committed. But we have an open work item on something called real-time recency, which is our word for giving users end-to-end consistency. When it’s enabled, every read you do against Materialize reflects all of the data, at least all of the data that was present in Postgres at some point during that operation. So, it gives you read-after-write semantics across Postgres and Materialize. You do a write to Postgres, you see it commit. Then, if you go to Materialize, you know that every single query you do against Materialize will reflect that value.
This is possible with Postgres because Postgres takes consistency very seriously, and we can plumb that consistency guarantee through to Materialize, which also takes consistency very seriously. It's less clear how to make this work with Kafka. Kafka is a lower-level tool. And if you have multiple Kafka topics, there's no guarantee of consistency across those Kafka topics unless you have your own bespoke consistency protocol where maybe you're writing transaction information to a third Kafka topic, and Materialize has no inherent knowledge of how you have stitched all of that together. So, with Kafka, it's less clear how we would be able to offer end-to-end consistency. We can do it on a single topic basis, but applications usually consist of multiple topics.
Yaroslav Tkachenko
That’s fascinating. Also, I remember seeing two-phase commit support in Kafka coming sometime soon. So that might be something to look at.
Nikhil Benesch
A two-phase commit usually works like this: the upstream system reaches out to the downstream system and records that a transaction is about to commit before it actually commits that transaction. And this is a little scary because you have to trust that the downstream system is online, because if it goes down, the upstream system can't make progress. And every two-phase commit protocol across multiple systems has a defect where you can end up in a stuck state: one system fails in an unexpected way, and manual action is required to get unstuck.
We're planning to do real-time recency in Materialize differently: instead of a two-phase commit on the upstream system, Materialize is responsible for reaching out to the upstream system. So, the upstream system is not dependent on Materialize making progress. Postgres can commit all day. And every time Materialize responds to a query, it goes and asks Postgres (or Kafka): what's the latest LSN (or offset)? When it gets a response, it waits until its own ingestion has caught up to that LSN or offset before answering. This can still be pretty efficient because you don't need to make this round trip for every single query against Materialize. You don't always need to be checking in with the upstream system.
You just need to make sure that before you respond to any query in Materialize, you have checked in with the upstream system at least once while that query is outstanding. So even if you're doing thousands of queries against Materialize a second, we batch those queries together, check the LSN for the batch in Postgres, and then release that batch. As you can see, it can still be very efficient, and it does not have any liveness implications for the upstream systems. We're pretty excited about the way we're planning to build this. Kyle Kingsbury, who runs the Jepsen project, came in and did some consulting for us about a year ago and helped us think through this. And he was the one who invented the term real-time recency.
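(A rough sketch of the read-after-write pattern real-time recency is meant to provide, using the made-up orders table from earlier; pg_current_wal_lsn() is the standard Postgres function for the current write-ahead-log position, and the mechanics inside Materialize are an assumption based on the description above, not shipped behavior.)

```sql
-- 1. The application commits a write to Postgres and sees it succeed.
BEGIN;
INSERT INTO orders (customer_id, amount) VALUES (42, 100);
COMMIT;

-- 2. With real-time recency, before releasing a batch of reads, Materialize
--    would check in with the upstream system at least once, conceptually:
--        SELECT pg_current_wal_lsn();   -- "how far has Postgres written?"
--    and hold the batch until its own ingestion has passed that LSN.

-- 3. The read is then guaranteed to reflect the committed insert.
SELECT total FROM order_totals WHERE customer_id = 42;
```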
We think it has a lot of properties to like over a two-phase commit. It doesn't require the upstream system to build anything and doesn't have these liveness faults. So we think that's the way forward.
Yaroslav Tkachenko
Okay, very exciting! Now, let's switch gears. You briefly mentioned at the beginning that Materialize is, first of all, a database. SQL is a language you use, and obviously, you can do a lot with it. I remember seeing a talk from Seth Wiesman explaining how you can build complete application features just with SQL. But it's not always enough. So, do you allow any way to extend Materialize, for example, with User-Defined Functions (UDFs)? If not, any plans for that?
Nikhil Benesch
No UDF support today. And the reason for that has largely been that when folks show up with a good use case for a SQL function that doesn't exist, we can just add it to Materialize. So, as long as the request is something that's fairly general purpose, we can just add that SQL function to Materialize without having a full-blown UDF implementation. That said, support for UDFs is on the roadmap. It's something we know that we will absolutely do eventually. The plan is to have it use WASM. So, if you can compile your UDF to WASM following the interface that we need, you can drop that module into Materialize and run it as part of your data flows.
The big challenge is determinism. You, as the creator of a UDF, need to make sure that it is a pure function of its inputs. And there are a lot of common SQL functions that are not deterministic, like generating UUIDs, for example. Materialize’s computation engine relies on computations being entirely deterministic. The correctness guarantees go out the window in the face of non-determinism. So that would be the one big caveat of UDFs in Materialize. But beyond that, there's no reason that we couldn't allow users to bring their own functions and run them in Materialize [1].
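(A small illustration of the determinism requirement; the users table is made up, and the commentary reflects the constraint described above rather than documented UDF behavior.)

```sql
-- Deterministic: the view is a pure function of the rows in `users`, so it can
-- be incrementally maintained and recomputed identically on restart or replica.
CREATE MATERIALIZED VIEW normalized_users AS
    SELECT id, lower(email) AS email FROM users;

-- Nondeterministic logic (say, generating a UUID or reading the wall clock per
-- row) would let replicas and restarts disagree about the maintained result.
-- A WASM UDF would carry the same obligation: be a pure function of its inputs.
```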
Yaroslav Tkachenko
Let's say I’m a customer, I use Materialize, and I'm quite happy, but I'm only able to implement 80% of my use cases because I need access to some low-level primitives like state and timers. So, for the remaining 20%, I still need to fall back to something like Flink. Also, both Flink and Spark keep investing in the SQL ecosystem; Flink SQL is becoming more powerful and efficient. It might be hard for someone to justify using Flink SQL and Materialize together. Do you think that's inevitable and we'll just be stuck running multiple technologies, or is there a future where we have only one?
Nikhil Benesch
I would love to see a future where there's just one. I think getting 100% of organizations to run 100% of their operational workloads on Materialize is implausible, though I'd be happy if we could get that number to 80%: 80% of organizations running 100% of their operational workloads on Materialize. That feels like a much better future to drive towards than 100% of organizations running both Materialize and Flink, with 80% of their workloads on Materialize and 20% on Flink. And Materialize has some plans to gain additional temporal capabilities like ASOF joins, which are another big reason that people need to reach for Spark or Flink today.
I think we will see the set of use cases that Materialize can handle increase substantially over time, and many organizations that do not have extremely esoteric processing needs will be able to run all of their operational workloads on Materialize. But we're definitely keeping an eye on Flink SQL and Spark SQL as well. They're impressive technologies, and they're certainly building out incremental view maintenance capabilities inside of those engines as well.
Yaroslav Tkachenko
Great answer. Final question: any comments on the BYOC (Bring Your Own Cloud) offering? It has been a pretty hot topic recently. Will we see Materialize as BYOC?
Nikhil Benesch
I have a bit of a spicy take. I think BYOC is a bit of a fad that is not the long-term answer to cloud services. I think BYOC is most valuable for relatively small companies that need to bootstrap trust in their technology and can't get enterprises, and their security reviews, to sign off on a relatively small, untrusted third-party cloud vendor. BYOC lets them go through a much more limited security review process. But I think there are a lot of downsides. The customer has to take on a large part of the burden of managing that infrastructure, and the vendor doesn't have the ability to debug problems with that infrastructure.
And on top of it, the security posture is not that much better. The customer usually deploys a highly trusted agent that has elevated permissions into their VPC, that receives commands from the vendor and deploys them in the customer’s account. A sufficiently motivated attacker can still exfiltrate all of the data in that customer's VPC. Granted, it's harder. The attack is not as easy to pull off in the BYOC model, but I don't think it fundamentally improves the security posture that much, and it makes both the customer's and vendor’s lives harder.
So, I suspect that BYOC is not the long-term form of most cloud-native products. I think the future looks a lot more like Snowflake and Materialize: you build trust with customers, and you prove to them that you can actually take better care of their data than they can themselves. If you can build that trust, and prove that you're a reliable and secure handler of their data and infrastructure, then customers would rather not deal with the infrastructure at all. And for the vendor, it's so much easier to diagnose problems with the infrastructure when it is all under your control. It's not off the table for Materialize forever. But right now, we've been able to move very quickly because we are not supporting a BYOC option where we are debugging remotely. Everything is under our control, and we can diagnose performance problems using a much wider array of tools than we would be able to otherwise.
Of course, we've had to put in a lot of work to build trust with our customers, a lot of privacy controls, a lot of security controls. I can understand why smaller companies wouldn't want to go through that to get something out the door, but I don't think BYOC is where the cloud space is headed long term.
Yaroslav Tkachenko
Thank you for your time!
[1] In fact, the work on WASM-based UDFs has started.