'how to read json.snappy file from athena

I have input file in s3 bucket with .json.snappy compression and I am trying to read through athena table. I tried using different serde 'org.apache.hive.hcatalog.data.JsonSerDe' & 'org.openx.data.jsonserde.JsonSerDe' but it didn't work, Athena table showing zero records when queried.

Also i tried to see sample data from s3 select but it didn't work. Is there a way to read this data.



Solution 1:[1]

After debugging for few hours I found input file was compressed with raw snappy format and then encoded with base64. Hence, Athena was not able to read this file. So as a workaround solution, I decoded and uncompressed file using python-snappy library and uploaded to s3 bucket.

import snappy
import base64
import os
snappy_file = 'input.json.snappy'
with open (snappy_file, "r") as input_file:
    data = input_file.read()
    # decoding base64 data
    data = base64.b64decode(data)
    # uncompress raw-snappy 
    uncompressed = snappy.uncompress(data)
    print(uncompressed.decode('utf-8'))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 PB22