Presto fails to import PARQUET files from S3

I have a Presto table that imports PARQUET files from S3 based on partitions, defined as follows:

create table hive.data.datadump
(
    tUnixEpoch varchar,
    tDateTime varchar,
    temperature varchar,
    series varchar,
    sno varchar,
    date date
)
WITH (
    format = 'PARQUET',
    partitioned_by = ARRAY['series','sno','date'],
    external_location = 's3a://dev/files'
);

The S3 folder structure where the parquet files are stored looks like:

s3a://dev/files/series=S5/sno=242=/date=2020-1-23

and the partition starts from series.

The original PySpark code that produces the Parquet files writes every column as String type, and I am trying to import them as strings. When I run my create script in Presto, the table is created successfully, but querying the data fails.

When I run:

select * from hive.data.datadump;

I get the following error:

[Code: 16777224, SQL State: ]  Query failed (#20200123_191741_00077_tpmd5): The column tunixepoch is declared as type string, but the Parquet file declares the column as type DOUBLE

Can you help me resolve this issue? Thank you in advance!



Solution 1:[1]

I ran into the same issue and found that it was caused by a record in my source whose datatype did not match the column the error was complaining about. I am fairly sure this is a data problem: you need to track down the exact record that does not have the right type.
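
One way to narrow this down, as a rough sketch rather than a definitive tool: Parquet stores a schema per file, so the mismatch can be located file by file instead of record by record. The helper below compares the types declared in the question's CREATE TABLE against a file's schema. The names `DECLARED`, `find_type_mismatches` and `read_file_schema` are hypothetical, and `read_file_schema` assumes the third-party `pyarrow` package is installed and that the file has been copied somewhere `pyarrow` can read it.

```python
# Column types declared in the question's CREATE TABLE, written with the
# type names pyarrow prints (varchar corresponds to "string").
# Keys are lowercased because Presto reports column names in lowercase.
DECLARED = {
    "tunixepoch": "string",
    "tdatetime": "string",
    "temperature": "string",
}


def find_type_mismatches(declared, file_schema):
    """Return {column: (declared_type, file_type)} for columns whose types differ.

    Column names are compared case-insensitively. Columns that are missing
    from the file are skipped rather than reported.
    """
    actual = {name.lower(): type_name.lower() for name, type_name in file_schema.items()}
    return {
        col: (want, actual[col])
        for col, want in declared.items()
        if col in actual and actual[col] != want
    }


def read_file_schema(path):
    """Read one Parquet file's schema as {column name: type name}.

    Assumption: pyarrow is installed. It is imported lazily so the
    comparison helper above works without it.
    """
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)
    return {field.name: str(field.type) for field in schema}


if __name__ == "__main__":
    # Hypothetical file schema mirroring the error message in the question.
    sample = {"tUnixEpoch": "double", "tDateTime": "string", "temperature": "string"}
    print(find_type_mismatches(DECLARED, sample))
    # prints {'tunixepoch': ('string', 'double')}
```

Running `find_type_mismatches(DECLARED, read_file_schema(path))` over each file under the table's location would show which files declare `tunixepoch` as `double`. Those files can then be rewritten with the column cast to string, or the table can be declared with a matching type.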

Solution 2:[2]

This may already have been solved, but for reference: the error can be caused by a column declaration mismatch between the Hive table and the Parquet file. By default, columns are matched by their order; to match them by column name instead, set the property:

hive.parquet.use-column-names=true
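
As a sketch of where this setting usually goes: it is a Hive connector property, so it belongs in the catalog's properties file on each Presto node and takes effect after a restart. The file path below is an assumption based on the default `etc/catalog` layout and on the catalog being named `hive` (as in `hive.data.datadump`); adjust it if your deployment differs.

```properties
# etc/catalog/hive.properties (assumed path; keep the existing settings as they are)
# Match Parquet columns to table columns by name rather than by position.
hive.parquet.use-column-names=true
```

Name-based matching only helps when the file and the table use the same column names. If a file really stores the column as DOUBLE under that name, the declared type still has to agree with it.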

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kishore
Solution 2 DharmanBot