'Amazon Athena partition with colon(:) is not working

When creating partition in Athena, I tried to use the date in the format (yyyy-MM-ddTHH:mm:ssZ) then I am not able to query the data

Step 1: Create table

CREATE EXTERNAL TABLE my_info (
         id STRING,
         name STRING
) PARTITIONED BY (
        part string
) STORED AS ORC LOCATION 's3://bucket1/data' tblproperties ("orc.compress"="SNAPPY"); 

Step 2: Create folder like below and added the files.

S3://bucket1/data/part=2019-11-12T14:15:16Z

Step 3: Refresh partition MSCK REPAIR TABLE my_info

Step 4: Query the data SELECT * FROM my_info

With this I am not able to query any data

If I change the folder to format (yyyy-MM-ddTHH)

without ’:’ in Step 2

s3://bucket1/data/part=2019-11-12T14

Then I am able to get the results.

Any idea about why this is not working.



Solution 1:[1]

This is because when you create the partitioned table the partitioning is implemented as part of the S3 path e.g. for s3://bucket1/data/part=2019-11-12T14:15:16Z the part=2019-11-12T14:15:16Z section is an S3 path that Athena interprets as a partition when querying the data.

S3 path names have some restrictions on the characters that can be used:

The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters and your browser might not handle them, which also requires special handling:

Ampersand ("&")  
Dollar ("$")  
ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)  
'At' symbol ("@")  
Equals ("=")  
Semicolon (";")  
Colon (":")  
Plus ("+")  
Space – Significant sequences of spaces may be lost in some uses (especially multiple spaces)  
Comma (",")  
Question mark ("?")  

In this case it's probably the colons in the path that are not being interpreted by Presto/Athena. To work around this you can use an alternative dividing character in the timestamp e.g. part=2019-11-12--14-15-16 or omit it altogether.

Solution 2:[2]

It seems you can use an URL encoded colon (%3A).

Further, if you which to use timestamp as the partition type instead of string, make sure to use a "java.sql.Timestamp compatible format" as documented for the CREATE TABLE statement.

So the final url would be s3://bucket1/data/part=2019-11-12 14%3A15%3A16/.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nathan Griffiths
Solution 2 Christoph Hösler