'Table-Table Join duplicate entries
we are using kafka in production and I try to push the adoption and usage of KSQL in the same direction. But I already failed with one simple table-table join. I’ve tried with our production data first and ran in an issue. So I thought I missed something and moved back to the example from the confluent docs and ran in the same problem. I will explain my issue with the example data https://docs.confluent.io/current/ksql/docs/tutorials/basics-docker.html#table-table-join When I have created both tables and try to join the data it works, but as soon as I try to alter or add something I get new entries in my table. From every example I found at confluent or even at the youtube videos this is not suppose to happen.
Creating records
docker run --interactive --rm --network tutorials_default \
confluentinc/cp-kafkacat \
kafkacat -b kafka:39092 \
-t warehouse_location \
-K: \
-P <<EOF
1:{"warehouse_id":1,"city":"Leeds","country":"UK"}
2:{"warehouse_id":2,"city":"Sheffield","country":"UK"}
3:{"warehouse_id":3,"city":"Berlin","country":"Germany"}
EOF
docker run --interactive --rm --network tutorials_default \
confluentinc/cp-kafkacat \
kafkacat -b kafka:39092 \
-t warehouse_size \
-K: \
-P <<EOF
1:{"warehouse_id":1,"square_footage":16000}
2:{"warehouse_id":2,"square_footage":42000}
3:{"warehouse_id":3,"square_footage":94000}
EOF
Creating tables
CREATE TABLE WAREHOUSE_LOCATION (WAREHOUSE_ID INT, CITY VARCHAR, COUNTRY VARCHAR)
WITH (KAFKA_TOPIC='warehouse_location',
VALUE_FORMAT='JSON',
KEY='WAREHOUSE_ID');
CREATE TABLE WAREHOUSE_SIZE (WAREHOUSE_ID INT, SQUARE_FOOTAGE DOUBLE)
WITH (KAFKA_TOPIC='warehouse_size',
VALUE_FORMAT='JSON',
KEY='WAREHOUSE_ID');
Creating a joined table:
CREATE TABLE WH_U AS SELECT WL.WAREHOUSE_ID, WL.CITY, WL.COUNTRY, WS.SQUARE_FOOTAGE
FROM WAREHOUSE_LOCATION WL
LEFT JOIN WAREHOUSE_SIZE WS
ON WL.WAREHOUSE_ID=WS.WAREHOUSE_ID;
With this I get the expected results:
1 | Leeds | UK | 16000.0
2 | Sheffield | UK | 42000.0
3 | Berlin | Germany | 94000.0
But when I add or chnage records, this happens:
1566375174496 | 1 | 1 | Leeds | UK | 16000.0
1566375174496 | 2 | 2 | Sheffield | UK | 42000.0
1566375174496 | 3 | 3 | Berlin | Germany | 94000.0
1566375595372 | 4 | 4 | London | UK | null
1566375641291 | 4 | 4 | London | UK | 94000.0
1566375641291 | 1 | 1 | Leeds | UK | 1.0
I expected:
1566375174496 | 1 | 1 | Leeds | UK | 1.0
1566375174496 | 2 | 2 | Sheffield | UK | 42000.0
1566375174496 | 3 | 3 | Berlin | Germany | 94000.0
1566375641291 | 4 | 4 | London | UK | 94000.0
What am I missing?
SOLVED
The reason for this behaviour was a simple env in ksql server. KSQL_CACHE_MAX_BYTES_BUFFERING was set to 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|