What's the data format of Athena's .csv.metadata files?
What's the data format of the .csv.metadata files written by Amazon Athena?
Alongside the output file of every query there is a metadata file. It looks like it describes the schema of the result. I assume this is what Athena uses to create the ResultSet.ResultSetMetadata part of the response of GetQueryResults requests, and that it is somehow created by Hive or Presto.
2019-04-23 14:51:29 27 e7629796-9b91-476a-bfb7-2fe6c9595bce.csv
2019-04-23 14:51:29 56 e7629796-9b91-476a-bfb7-2fe6c9595bce.csv.metadata
2019-04-27 14:23:53 1591958 ebe432ac-db7b-4ea1-b5de-529350d1a02a.csv
2019-04-27 14:23:53 712 ebe432ac-db7b-4ea1-b5de-529350d1a02a.csv.metadata
2019-04-25 16:31:23 10152 eeb6f4ab-9ac3-4a7e-81c4-0cc155187acb.csv
2019-04-25 16:31:23 494 eeb6f4ab-9ac3-4a7e-81c4-0cc155187acb.csv.metadata
2019-04-25 22:30:56 22384376 f0160ff7-e5b3-466d-926a-a660a5208c5f.csv
2019-04-25 22:30:56 494 f0160ff7-e5b3-466d-926a-a660a5208c5f.csv.metadata
Here's a hexdump of e7629796-9b91-476a-bfb7-2fe6c9595bce.csv.metadata from the listing above:
00000000 0a 1b 32 30 31 39 30 34 32 33 5f 31 32 35 31 32 |..20190423_12512|
00000010 38 5f 30 30 30 30 31 5f 65 68 74 75 72 22 19 0a |8_00001_ehtur"..|
00000020 04 68 69 76 65 22 03 61 72 79 2a 03 61 72 79 32 |.hive".ary*.ary2|
00000030 05 61 72 72 61 79 48 03 |.arrayH.|
Its ResultSet.ResultSetMetadata looks like this:
"ResultSetMetadata": {
"ColumnInfo": [
{
"CatalogName": "hive",
"SchemaName": "",
"TableName": "",
"Name": "ary",
"Label": "ary",
"Type": "array",
"Precision": 0,
"Scale": 0,
"Nullable": "UNKNOWN",
"CaseSensitive": false
}
]
}
I realise that these files are internal to Athena, but I'm curious.
Solution 1:[1]
Metadata files are not human readable (binary format) and are meant for Athena.
From AWS documentation:
DML and DDL query metadata files are saved in binary format and are not human readable. The file extension corresponds to the related query results file. Athena uses the metadata when reading query results using the GetQueryResults action. Although these files can be deleted, we do not recommend it because important information about the query is lost.
For more details, see the "Identifying query output files" section at https://docs.aws.amazon.com/athena/latest/ug/querying.html
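That said, the 56 bytes in the question's hexdump happen to parse cleanly as Protocol Buffers wire format. AWS does not document this, so treat it as an observation about this one file rather than a stable format. The sketch below is a minimal hand-rolled decoder, assuming only varint and length-delimited fields appear; the helper names read_varint and parse_fields are made up for illustration, and what each field number means is a guess based on comparing the values with the ResultSetMetadata shown in the question.

```python
# The 56 bytes copied from the hexdump in the question.
DATA = bytes.fromhex(
    "0a 1b 32 30 31 39 30 34 32 33 5f 31 32 35 31 32 "
    "38 5f 30 30 30 30 31 5f 65 68 74 75 72 22 19 0a "
    "04 68 69 76 65 22 03 61 72 79 2a 03 61 72 79 32 "
    "05 61 72 72 61 79 48 03"
)


def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf):
    """Return a list of (field_number, wire_type, value) tuples.

    Only wire types 0 (varint) and 2 (length-delimited) are handled,
    which is all this particular file appears to contain.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unhandled wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Top level: a string that looks like a query ID, then a nested message.
top = parse_fields(DATA)
# Assumption: the second top-level field holds one column description.
column = parse_fields(top[1][2])

for field in top[:1] + column:
    print(field)
```

Under this reading, the top-level field 1 holds 20190423_125128_00001_ehtur, and the nested message carries hive, ary, ary and array in fields 1, 4, 5 and 6, followed by the integer 3 in field 9. The strings line up with CatalogName, Name, Label and Type in the GetQueryResults response, and the missing fields 2 and 3 might correspond to the empty SchemaName and TableName, but that mapping is a guess.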
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Ash |