'how to read json.snappy file from athena
I have input file in s3 bucket with .json.snappy compression and I am trying to read through athena table. I tried using different serde 'org.apache.hive.hcatalog.data.JsonSerDe' & 'org.openx.data.jsonserde.JsonSerDe' but it didn't work, Athena table showing zero records when queried.
Also i tried to see sample data from s3 select but it didn't work. Is there a way to read this data.
Solution 1:[1]
After debugging for few hours I found input file was compressed with raw snappy format and then encoded with base64. Hence, Athena was not able to read this file. So as a workaround solution, I decoded and uncompressed file using python-snappy library and uploaded to s3 bucket.
import snappy
import base64
import os
snappy_file = 'input.json.snappy'
with open (snappy_file, "r") as input_file:
data = input_file.read()
# decoding base64 data
data = base64.b64decode(data)
# uncompress raw-snappy
uncompressed = snappy.uncompress(data)
print(uncompressed.decode('utf-8'))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | PB22 |