'Amazon Athena partition with colon(:) is not working
When creating partition in Athena, I tried to use the date in the format (yyyy-MM-ddTHH:mm:ssZ) then I am not able to query the data
Step 1: Create table
CREATE EXTERNAL TABLE my_info (
id STRING,
name STRING
) PARTITIONED BY (
part string
) STORED AS ORC LOCATION 's3://bucket1/data' tblproperties ("orc.compress"="SNAPPY");
Step 2: Create folder like below and added the files.
S3://bucket1/data/part=2019-11-12T14:15:16Z
Step 3: Refresh partition MSCK REPAIR TABLE my_info
Step 4: Query the data SELECT * FROM my_info
With this I am not able to query any data
If I change the folder to format (yyyy-MM-ddTHH)
without ’:’ in Step 2
s3://bucket1/data/part=2019-11-12T14
Then I am able to get the results.
Any idea about why this is not working.
Solution 1:[1]
This is because when you create the partitioned table the partitioning is implemented as part of the S3 path e.g. for s3://bucket1/data/part=2019-11-12T14:15:16Z
the part=2019-11-12T14:15:16Z
section is an S3 path that Athena interprets as a partition when querying the data.
S3 path names have some restrictions on the characters that can be used:
The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters and your browser might not handle them, which also requires special handling:
Ampersand ("&")
Dollar ("$")
ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)
'At' symbol ("@")
Equals ("=")
Semicolon (";")
Colon (":")
Plus ("+")
Space – Significant sequences of spaces may be lost in some uses (especially multiple spaces)
Comma (",")
Question mark ("?")
In this case it's probably the colons in the path that are not being interpreted by Presto/Athena. To work around this you can use an alternative dividing character in the timestamp e.g. part=2019-11-12--14-15-16
or omit it altogether.
Solution 2:[2]
It seems you can use an URL encoded colon (%3A).
Further, if you which to use timestamp
as the partition type instead of string, make sure to use a "java.sql.Timestamp compatible format" as documented for the CREATE TABLE statement.
So the final url would be s3://bucket1/data/part=2019-11-12 14%3A15%3A16/
.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Nathan Griffiths |
Solution 2 | Christoph Hösler |