1. Website activity tracking

According to the creators of Apache Kafka, the original use case for Kafka was to track website activity — including page views, searches, uploads or other actions users may take. This kind of activity tracking often requires a very high volume of throughput because messages are generated for each user action.

This article follows a scenario with a simple website. Users can click around, sign in, write blog articles, upload images to articles and publish those articles. When an event happens in the blog (e.g when someone logs in, when someone presses a button or when someone uploads an image to the article) a tracking event and information about the event will be placed into a record, and the record will be placed on a specified Kafka topic. One topic is named “click” and one is named “upload”.

Partitioning setup is based on the user’s id. A user with id 0, will map to partition 0, and the user with id 1 to partition 1, etc. The “click” topic will be split up into three partitions (three users) on two different machines.

  1. A user with user-id 0 clicks on a button on the website.

2. Web Shop

Think of a webshop with a ‘similar products’ feature on the site. To make this work, each action performed by a consumer is recorded and sent to Kafka. A separate application comes along and consumes these messages, filtering out the products the consumer has shown an interest in and gathering information on similar products. This ‘similar product’ information is then sent back to the webshop for it to display to the consumer in real-time.

Alternatively, since all data is persistent in Kafka, a batch job can run overnight on the ‘similar product’ information gathered by the system, generating an email for the customer with suggestions of products.

3. Application health monitoring

Servers can be monitored and set to trigger alarms in case of rapid changes in usage or system faults. Information from server agents can be combined with the server syslog and sent to a Kafka cluster. Through Kafka Streams, these topics can be joined and set to trigger alarms based on usage thresholds, containing full information for easier troubleshooting of system problems before they become catastrophic.

4. Kafka as a Database

Apache Kafka has another interesting feature not found in RabbitMQ — log compaction. Log compaction ensures that Kafka always retains the last known value for each record key. Kafka simply keeps the latest version of a record and deletes the older versions with the same key.

An example of log compaction use is when displaying the latest status of a cluster among thousands of clusters running. The current status of the cluster is written into Kafka and the topic is configured to compact the records. When this topic is consumed, it displays the latest status first and then a continuous stream of new statuses.

5. Message queue

Kafka works well as a replacement for more traditional message brokers, like RabbitMQ. Messaging decouples processes and creates a highly scalable system.

Instead of building one large application, decoupling involves taking different parts of an application and only communicating between them asynchronously with messages. That way different parts of the application can evolve independently, be written in different languages and/or maintained by separated developer teams. In comparison to many messaging systems, Kafka has better throughput. It has built-in partitioning, replication, and fault-tolerance that makes it a good solution for large-scale message processing applications.

A lot of interesting use cases and information can be found in the documentation for Apache Kafka.

6. Publish and subscribe messages

To be able to communicate with Apache Kafka you need a library that understands Apache Kafka. You need to download the client-library for the programming language that you intend to use for your applications. A client-library is an applications programming interface (API) for use in writing client applications. A client library has several methods that can be used, in this case, to communicate with Apache Kafka. The methods should be used when you, for example, connect to the Kafka broker (using the given parameters, host name for example) or when you publish a message to a topic. Both consumers and producers can be written in any language that has a Kafka client written for it.

Programming isn’t about what you know; it’s about what you can figure out.