Building Data Pipelines With Kafka

When we assign a set of consumers to a consumer group id, Kafka makes sure that each message is delivered to only one of the consumers in the group.

Let's (Actually) Get Started

Run Zookeeper and Kafka Server

Producer: Let's write messages to the Kafka server on the topic voice

Create an instance of KafkaProducer, which is thread-safe. The instance can be shared between multiple threads without any additional overhead.

Why does KafkaProducer take a key-value pair? The key can be any datatype (for example, Integer or String in most cases) and is used to designate a partition in the topic. The value can be any datatype (Avro, JSON, etc.); the producer encodes it to a byte array before sending it to the broker, and Kafka decodes it back to the original datatype before delivering it to the consumer.

We simulate a text conversation by using a file that contains a movie id and a movie summary as tab-delimited lines. For each line in the file plot_summaries.txt, the action below is performed, which also involves sending data to the broker.

The .send operation does not invoke a network call immediately; how records are batched is controlled by the batch.size and buffer.memory configurations. Complete code for the producer can be found at ListenProducer.

Consumer: Let's read data from the topic voice

KafkaConsumer is not thread-safe, so a single instance of KafkaConsumer cannot be shared across multiple threads. Spinning up a new consumer should happen after the child threads are created. Complete code for the consumer is at ListenConsumer.

There are two ways to determine the number of threads.

1. No. of threads = no. of partitions: in this case, we spin up a consumer thread for each partition in a topic. This ensures an ordering guarantee within each partition; however, there won't be any control on the no…
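The post doesn't list the startup commands. With a stock Apache Kafka distribution (the Zookeeper-based versions this post assumes), starting Zookeeper, starting the broker, and creating the voice topic looks roughly like this; the partition and replication counts are illustrative assumptions:

```shell
# Start Zookeeper (ships with the Kafka distribution)
bin/zookeeper-server-start.sh config/zookeeper.properties

# In a second terminal, start the Kafka broker
bin/kafka-server-start.sh config/server.properties

# Create the topic "voice" (partition/replication counts are assumptions)
bin/kafka-topics.sh --create --topic voice \
  --zookeeper localhost:2181 --partitions 3 --replication-factor 1
```

On Kafka 2.2 and later, `kafka-topics.sh` takes `--bootstrap-server localhost:9092` instead of `--zookeeper`.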
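A producer configuration covering the points above (key/value serializers, batch.size, buffer.memory) might look roughly like this. The bootstrap address and buffer sizes are illustrative assumptions, and the KafkaProducer construction is left as a comment so the sketch stands alone without a running broker:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Build a producer config; values here are illustrative, not prescriptive.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        // Both key (movie id) and value (summary) are sent as Strings here
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // .send() only buffers the record; these two settings bound the batching
        props.put("batch.size", "16384");       // max bytes per per-partition batch
        props.put("buffer.memory", "33554432"); // total bytes the producer may buffer
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        // A thread-safe producer would then be created once and shared:
        // KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        System.out.println(props.getProperty("batch.size"));
    }
}
```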
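The per-line action on plot_summaries.txt can be sketched as splitting each tab-delimited line into a key (movie id) and a value (summary) and handing them to the producer. The sample line and the commented send call are illustrative; only the parsing runs without a broker:

```java
public class PlotLineParser {
    // plot_summaries.txt lines have the form "<movie id>\t<movie summary>"
    static String[] parse(String line) {
        // Split on the first tab only, so tabs inside the summary survive
        int tab = line.indexOf('\t');
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) {
        String[] kv = parse("12345\tAn illustrative one-line movie summary.");
        // kv[0] is the movie id (record key), kv[1] the summary (record value);
        // the real loop would then do a buffered, non-blocking send:
        // producer.send(new ProducerRecord<>("voice", kv[0], kv[1]));
        System.out.println(kv[0] + " / " + kv[1]);
    }
}
```

Using the movie id as the key means all records for one movie land on the same partition.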
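Option 1 above (one thread per partition) can be sketched with a fixed thread pool. The partition count is an assumption, and the consumer creation is shown only as a comment, since KafkaConsumer is not thread-safe and each instance must be built inside its own thread:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPerPartition {
    // Spin up one worker per partition; returns how many workers actually ran.
    static int run(int partitions) throws InterruptedException {
        AtomicInteger started = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        for (int p = 0; p < partitions; p++) {
            final int partition = p;
            pool.submit(() -> {
                // KafkaConsumer is NOT thread-safe: each thread would create its
                // own consumer here, after the thread starts, and assign it
                // new TopicPartition("voice", partition) to keep per-partition order.
                started.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return started.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(3)); // one consumer thread per assumed partition of "voice"
    }
}
```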
