Extract particular data from Kafka topic

I'm doing real-time streaming of Twitter data and wonder whether there is a way to extract only the messages and certain values from a Kafka topic.



Solution 1:[1]

You can use ksqlDB to do this. For example:

ksql> CREATE STREAM TWEETS WITH (KAFKA_TOPIC='twitter_01', VALUE_FORMAT='Avro');

ksql> SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;

+-------------------+------------------------------------------------------------------------------------------+
|USER__SCREENNAME   |TEXT                                                                                      |
+-------------------+------------------------------------------------------------------------------------------+
|MobileGist         |This is super cool!! Great work @houchens_kim!                                            |

You can also write the results of this query to a new topic if you want:

ksql> CREATE STREAM COOL_TWEETS AS SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;

Since you tagged Python, it's worth pointing out that you can call ksqlDB through its REST API from Python. Here's an example.
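As a rough illustration (this is not the linked example), here is a minimal sketch of calling ksqlDB's /query endpoint using only the Python standard library. It assumes a ksqlDB server listening at http://localhost:8088 and that the response streams one JSON object per line inside a JSON array. The names KSQLDB_URL, build_query_request, parse_row and stream_rows are hypothetical helpers made up for this sketch.

```python
import json
import urllib.request

# Assumption: a ksqlDB server running locally on its default port.
KSQLDB_URL = "http://localhost:8088"


def build_query_request(ksql, base_url=KSQLDB_URL):
    """Build a POST request for the ksqlDB /query endpoint."""
    body = json.dumps({"ksql": ksql, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
            "Accept": "application/vnd.ksql.v1+json",
        },
        method="POST",
    )


def parse_row(line):
    """Return the column values from one streamed line, or None.

    Assumption: each line holds one JSON object, possibly wrapped in the
    punctuation of the enclosing JSON array ("[" at the start, "," at the end).
    Header lines, blank lines and anything unparseable yield None.
    """
    text = line.strip().lstrip("[").rstrip(",")
    if not text:
        return None
    try:
        message = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(message, dict):
        return None
    row = message.get("row")
    return row["columns"] if row else None


def stream_rows(ksql, base_url=KSQLDB_URL):
    """Yield the column list of each row as the server streams results."""
    with urllib.request.urlopen(build_query_request(ksql, base_url)) as response:
        for raw_line in response:
            columns = parse_row(raw_line.decode("utf-8"))
            if columns is not None:
                yield columns
```

To use it, pass the same SELECT ... EMIT CHANGES; statement shown above to stream_rows and iterate over the result; each item should be a list such as [screen name, tweet text]. A push query keeps running until the connection is closed, so the loop will not end on its own.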

Ref: Exploring ksqlDB with Twitter Data

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Robin Moffatt