Parquet binary column shows values in encoded format
I'm creating a Parquet file from Java using org.apache.parquet.*. Other data types work fine, but when I write a binary value and cat the file with parquet-tools, the value is shown in encoded form. Because of that, the file cannot be processed further in my system.
Code block:
case BINARY:
    recordConsumer.addBinary(stringToBinary(val));
    break;
and
private Binary stringToBinary(Object value) {
    return Binary.fromString(value.toString());
}
Schema used is:
message m {
    required INT64 id;
    required binary username;
    required boolean active;
}
When I cat:
parquet-tools cat <parquetFileName>
I see something like this:
id = 1
username = TmFtZTE=
active = true
id = 2
username = TmFtZTI=
active = false
I want to see the actual username that was passed, not the encoded string.
Solution 1:[1]
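Annotate the column as a UTF-8 string in your schema: required binary username (UTF8); Without the annotation the column is plain binary, and the values shown in the question are the Base64 form of the raw bytes (TmFtZTE= is Base64 for Name1).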
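To check that the username values printed in the question are just the Base64 form of the original strings, and not corrupted data, here is a minimal sketch using only the JDK. The class name Base64Check and the helper decode are made up for illustration; they are not part of Parquet or parquet-tools.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, not part of any Parquet API.
public class Base64Check {
    // Decode a Base64 string (as printed for the binary column) back to UTF-8 text.
    static String decode(String encoded) {
        byte[] raw = Base64.getDecoder().decode(encoded);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("TmFtZTE=")); // prints Name1
        System.out.println(decode("TmFtZTI=")); // prints Name2
    }
}
```

If the decoded values match what was written, the data in the file is intact and only the missing string annotation in the schema needs fixing.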
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Saniya Arab |