Extract particular data from Kafka topic
I'm doing real-time streaming of Twitter data and wonder whether there is a way to extract only the messages and certain values from a Kafka topic.
Solution 1:[1]
You can use ksqlDB to do this. For example:
ksql> CREATE STREAM TWEETS WITH (KAFKA_TOPIC='twitter_01', VALUE_FORMAT='Avro');
ksql> SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;
+-------------------+------------------------------------------------------------------------------------------+
|USER__SCREENNAME |TEXT |
+-------------------+------------------------------------------------------------------------------------------+
|MobileGist |This is super cool!! Great work @houchens_kim! |
You can also write the results of this query to a new topic if you want:
ksql> CREATE STREAM COOL_TWEETS AS SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;
Since you tagged Python, it's worth pointing out that you can call ksqlDB from Python using its REST API.
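As a rough sketch of what that REST call could look like, the snippet below posts the same SELECT to the server's `/query` endpoint using only the Python standard library. The server address (`http://localhost:8088`), the helper names (`build_query_request`, `parse_row`, `stream_rows`) and the line-by-line parsing are assumptions for illustration: it assumes the response arrives as one JSON object per line inside a streamed JSON array, so check this against the ksqlDB version you run.

```python
import json
import urllib.request

# Assumption: ksqlDB server listening on its default local address.
KSQLDB_URL = "http://localhost:8088"


def build_query_request(sql, base_url=KSQLDB_URL):
    """Build (but do not send) a POST request for the /query endpoint."""
    body = json.dumps({"ksql": sql, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/query",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


def parse_row(line):
    """Return the column values from one streamed line, or None.

    Assumes each line holds at most one JSON object, possibly wrapped in
    the '[' , ',' or ']' punctuation of the enclosing JSON array.
    """
    text = line.strip().lstrip("[").rstrip(",]")
    if not text:
        return None
    message = json.loads(text)
    if not isinstance(message, dict):
        return None
    row = message.get("row")
    # Header and status messages carry no "row" key and are skipped.
    return row.get("columns") if row else None


def stream_rows(sql, base_url=KSQLDB_URL):
    """Send the query and yield each result row as a list of values."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        for raw_line in resp:
            row = parse_row(raw_line.decode("utf-8"))
            if row is not None:
                yield row


# Example usage (needs a running ksqlDB server with the TWEETS stream):
#   sql = ("SELECT USER->SCREENNAME, TEXT FROM TWEETS "
#          "WHERE TEXT LIKE '%cool%' EMIT CHANGES;")
#   for screen_name, text in stream_rows(sql):
#       print(screen_name, text)
```

Because the query uses EMIT CHANGES it never finishes on its own, so the loop keeps yielding rows until you stop it or close the connection.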
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Robin Moffatt |