Demystifying Apache Pulsar Deployment Complexity
The Pulsar deployment has many moving parts, making it harder to manage than Kafka.
Every Kafka user has heard this claim, but it’s a common misconception that I’ll try to refute in this article.
Running Pulsar for Development
To start developing with Pulsar, you need to download the Pulsar distribution, unpack it, and run the pulsar standalone command. It starts all the required components as a single process.
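For example, assuming the 3.0.0 release (substitute whatever version is current), that boils down to a few shell commands:

wget https://archive.apache.org/dist/pulsar/pulsar-3.0.0/apache-pulsar-3.0.0-bin.tar.gz
tar xzf apache-pulsar-3.0.0-bin.tar.gz
cd apache-pulsar-3.0.0
# Starts the broker, a bookie, and the metadata store in one process
bin/pulsar standalone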
If you don’t have Java installed in your system, you can run it as a Docker container:
docker run -it \
-p 6650:6650 \
-p 8080:8080 \
apachepulsar/pulsar:latest \
bin/pulsar standalone
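Either way, once standalone is up (port 6650 for the binary protocol, 8080 for HTTP), you can run a quick smoke test with the bundled client CLI from the unpacked distribution (or via docker exec into the container); the topic and subscription names here are made up:

# Produce a message; the topic is created automatically on first use
bin/pulsar-client produce my-topic --messages "hello-pulsar"
# Consume it back with a subscription
bin/pulsar-client consume my-topic -s my-sub -n 1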
Sometimes it’s useful to have a multi-cluster multi-broker Pulsar instance on a developer machine. In a few evenings, I wrote a simple tool that makes it possible in two commands:
puls create --num-clusters 2 --num-bookies 2 --num-brokers 2 multi-cluster
puls start multi-cluster
Production Deployment
I won’t claim that running Pulsar is the simplest thing, but in production, the deployment complexity is very similar to Kafka’s. Pulsar’s operations are usually simpler than Kafka’s, especially when it comes to scaling.
Let's take a closer look at the moving pieces needed to run Kafka in production.
Consensus and Metadata
Historically, both Kafka and Pulsar relied on Apache Zookeeper for consensus. Kafka recently implemented the KRaft consensus protocol, which eliminates the dependency on Zookeeper. Sounds good, right? No need to run and manage a separate Zookeeper cluster.
In reality, KRaft has significant flaws, according to the documentation:
Combined mode, where a Kafka node acts as a broker and also a KRaft controller, is not currently supported for production workloads.
There is currently no support for quorum reconfiguration, meaning you cannot add more KRaft controllers, or remove existing ones.
That means you still need to manage a dedicated pool of Kafka nodes where each node has the controller role and internally runs the KRaft server.
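For illustration, a dedicated KRaft controller node is configured roughly like this in server.properties (hostnames and node IDs are placeholders):

# server.properties of a dedicated controller node
process.roles=controller
node.id=1
controller.quorum.voters=1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
listeners=CONTROLLER://controller-1:9093
controller.listener.names=CONTROLLER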
Until there is some fundamental breakthrough in the distributed systems space, all the operational problems remain about the same as with Zookeeper. The impression of KRaft’s simplicity comes from its tight fit to one specific system (Kafka) and from the quality of its documentation and training materials.
StreamNative recently released the Oxia metadata store under the Apache 2.0 license. It aims to replace Zookeeper for Pulsar, simplify operations, and take a step toward supporting more than 100 million topics in Pulsar.
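On the Pulsar side, the metadata store is just a couple of broker settings; here is a sketch with placeholder hostnames (the Oxia connection string format depends on the versions you run, so only the Zookeeper form is shown):

# broker.conf
metadataStoreUrl=zk:zk-1:2181,zk-2:2181,zk-3:2181
configurationMetadataStoreUrl=zk:zk-1:2181,zk-2:2181,zk-3:2181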
Schema Registry
A schema registry enforces data type safety and schema compatibility guarantees, which makes it a necessary component of any production system. Pulsar has a built-in schema registry. With Kafka, you must run and manage it as a separate component.
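For illustration, with the built-in registry you manage schemas through the same brokers using the admin CLI (the topic and file names below are made up):

# Show the current schema of a topic
bin/pulsar-admin schemas get persistent://public/default/orders
# Upload a schema definition from a file
bin/pulsar-admin schemas upload --filename orders-schema.json persistent://public/default/orders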
Integration with External Systems
Kafka Connect is a widely used feature for streaming data between Kafka and other systems, like databases and other messaging systems. If you want to use Kafka Connect, you must deploy and manage a separate cluster. Pulsar has a similar feature called Pulsar IO, which doesn’t require deploying and managing an extra cluster.
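As a rough sketch (the connector type, names, and config file are illustrative, and the connector NAR files must be installed in the cluster), Pulsar IO connectors are managed through the same brokers with the admin CLI:

# List the built-in connectors available to the cluster
bin/pulsar-admin sources available-sources
bin/pulsar-admin sinks available-sinks
# Create a sink that writes messages from a topic to an external database
bin/pulsar-admin sinks create \
  --sink-type jdbc-postgres \
  --name orders-pg-sink \
  --inputs persistent://public/default/orders \
  --sink-config-file pg-sink.yaml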
Geo-Replication 🌎 🌍 🌏
You may want to run multiple connected Kafka or Pulsar clusters to achieve the following goals:
Fault tolerance. If one of the clusters becomes unavailable, your system should be able to continue to operate.
Lower latency. Bringing data closer to consumers is useful for latency-sensitive applications.
With Kafka, you’ll most likely use MirrorMaker 2, which typically runs as a dedicated Kafka Connect cluster. Yes, you got it right: if you have 2 Kafka clusters and want to set up geo-replication, you now have 3 clusters to manage.
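For reference, a minimal dedicated MirrorMaker 2 setup is driven by a properties file along these lines (cluster names and addresses are placeholders) and started with bin/connect-mirror-maker.sh:

# connect-mirror-maker.properties
clusters = primary, backup
primary.bootstrap.servers = primary-kafka:9092
backup.bootstrap.servers = backup-kafka:9092
# Replicate all topics from primary to backup
primary->backup.enabled = true
primary->backup.topics = .*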
Geo-replication is built into Pulsar. Contrary to popular belief, you don’t need a per-cluster metadata store plus a global metadata store to run a multi-cluster Pulsar instance. A single global metadata store is enough, and you already have one if you have Pulsar up and running. No need to change anything! Typically, setting up geo-replication in Pulsar requires running a few admin CLI commands for each Pulsar cluster.
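Here is a sketch of those commands with made-up cluster, tenant, and namespace names (the clusters create step is run on each cluster so they know about each other):

# Register the remote cluster
bin/pulsar-admin clusters create us-west \
  --url http://pulsar-us-west:8080 \
  --broker-url pulsar://pulsar-us-west:6650
# Allow a tenant to use both clusters
bin/pulsar-admin tenants create my-tenant --allowed-clusters us-east,us-west
# Replicate a namespace across both clusters
bin/pulsar-admin namespaces set-clusters my-tenant/my-namespace --clusters us-east,us-west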
Multi-Tenancy
You may wonder why multi-tenancy is related to deployment complexity. I’ll simply quote Confluent’s documentation:
Your business may choose to support multi-tenancy on a single Dedicated Cluster for the following reasons:
Lower cost: Most use cases don’t need the full capacity of a Kafka cluster. You can minimize fixed costs by sharing a single cluster across workloads, and spread the costs across teams, even when those workloads are kept separate.
Simpler operations: Using one cluster instead of several means a narrower scope of access controls and credentials to manage. Separate clusters may have different networking configurations, API keys, schema registries, roles, and ACLs.
Greater reuse: Teams can most easily reuse existing data when they’re already using the cluster - it’s a simple access control change to grant them access. Reusing topics and events created by other teams lets teams deliver value more quickly.
It is so well written that I don’t have much to add here 🙂
Open-source Kafka doesn’t have built-in multi-tenancy. You can either come to terms with the Confluent vendor lock-in or continue desperately managing a separate Kafka cluster for each tenant. Or try Pulsar, which has built-in multi-tenancy in its open-source distribution.
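As an illustration (tenant, namespace, and role names are made up), isolating a team in Pulsar comes down to a few admin commands:

# Create a tenant for a team, restricted to specific clusters and admin roles
bin/pulsar-admin tenants create team-a \
  --admin-roles team-a-admin \
  --allowed-clusters us-east
# Create a namespace owned by that tenant
bin/pulsar-admin namespaces create team-a/orders
# Grant another team's application read access to one of its namespaces
bin/pulsar-admin namespaces grant-permission team-a/orders \
  --role team-b-app \
  --actions consume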
Summary
Let’s summarize the list of components required to run each system in production.
Kafka
Stateful cluster of nodes with the KRaft controller role - 3 nodes
Stateful broker cluster - 3 nodes
Schema Registry nodes - 3 nodes
Kafka Connect worker nodes - 3 nodes
Pulsar
Stateful metadata store cluster (Zookeeper or Oxia) - 3 nodes
Stateful storage (bookie) cluster - 3 nodes
Stateless broker nodes - 3 nodes
Interesting, isn’t it?
Pulsar brokers are stateless and don’t add any operational complexity in practice. You can seamlessly scale brokers up and down 100 times a day, which is impossible with a Kafka cluster holding any reasonable amount of data.
By the way, would you want a service that provides cheap Pulsar instances for development and testing? For $10-15 per month you could get a single-cluster instance with 3 brokers and 3 bookies, or a two-cluster instance with 2 brokers and 2 bookies for the same price.
With a 99.00% SLA and a data loss guarantee 😄, it isn’t suitable for performance testing. Just a dumb, cheap thing that can save your local machine’s resources and your money.
If you need it during working hours only, the cost could be even lower - about $5 per month. I’m considering building such a service if the demand is clear.
Bonus
The Pulsar broker has the enableRunBookieTogether configuration option, which allows running the storage nodes as part of the broker process. It simplifies deployment for applications that don’t benefit from Pulsar’s compute and storage separation.
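Here is a sketch of the relevant broker configuration; treat it as illustrative rather than a recommended production setup, and note that the auto-recovery option name is my assumption based on broker.conf files I have seen:

# broker.conf
# Run a BookKeeper bookie inside the broker process
enableRunBookieTogether=true
# Run the bookie auto-recovery service in the same process
enableRunBookieAutoRecoveryTogether=true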
Pulsar
Stateful metadata store cluster (Zookeeper or Oxia) - 3 nodes
Stateful broker node cluster (similar to Kafka) - 3 nodes
👋 Join the Apache Pulsar community on Slack or GitHub Discussions