What's the data format of Athena's .csv.metadata files?

What's the data format of the .csv.metadata files written by Amazon Athena?

Alongside the output file of every query there is a metadata file. It looks like it describes the schema of the result. I assume this is what Athena uses to create the ResultSet.ResultSetMetadata part of the response of GetQueryResults requests, and that it is somehow created by Hive or Presto.

2019-04-23 14:51:29         27 e7629796-9b91-476a-bfb7-2fe6c9595bce.csv
2019-04-23 14:51:29         56 e7629796-9b91-476a-bfb7-2fe6c9595bce.csv.metadata
2019-04-27 14:23:53    1591958 ebe432ac-db7b-4ea1-b5de-529350d1a02a.csv
2019-04-27 14:23:53        712 ebe432ac-db7b-4ea1-b5de-529350d1a02a.csv.metadata
2019-04-25 16:31:23      10152 eeb6f4ab-9ac3-4a7e-81c4-0cc155187acb.csv
2019-04-25 16:31:23        494 eeb6f4ab-9ac3-4a7e-81c4-0cc155187acb.csv.metadata
2019-04-25 22:30:56   22384376 f0160ff7-e5b3-466d-926a-a660a5208c5f.csv
2019-04-25 22:30:56        494 f0160ff7-e5b3-466d-926a-a660a5208c5f.csv.metadata

Here's a hexdump of e7629796-9b91-476a-bfb7-2fe6c9595bce.csv.metadata from the listing above:

00000000  0a 1b 32 30 31 39 30 34  32 33 5f 31 32 35 31 32  |..20190423_12512|
00000010  38 5f 30 30 30 30 31 5f  65 68 74 75 72 22 19 0a  |8_00001_ehtur"..|
00000020  04 68 69 76 65 22 03 61  72 79 2a 03 61 72 79 32  |.hive".ary*.ary2|
00000030  05 61 72 72 61 79 48 03                           |.arrayH.|

The ResultSet.ResultSetMetadata for that query looks like this:

"ResultSetMetadata": {
  "ColumnInfo": [
    {
      "CatalogName": "hive",
      "SchemaName": "",
      "TableName": "",
      "Name": "ary",
      "Label": "ary",
      "Type": "array",
      "Precision": 0,
      "Scale": 0,
      "Nullable": "UNKNOWN",
      "CaseSensitive": false
    }
  ]
}
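The 56 bytes in the hexdump happen to parse cleanly as Protocol Buffers wire format (a varint key, then either a varint or a length-prefixed payload). That is only an observation about this one file, not a documented format, and the meaning of each field number is a guess. A minimal sketch that walks the bytes under that assumption; `read_varint` and `parse_fields` are hypothetical helper names:

```python
# Bytes copied from the hexdump of the .csv.metadata file above.
METADATA_BYTES = bytes.fromhex("""
    0a 1b 32 30 31 39 30 34 32 33 5f 31 32 35 31 32
    38 5f 30 30 30 30 31 5f 65 68 74 75 72 22 19 0a
    04 68 69 76 65 22 03 61 72 79 2a 03 61 72 79 32
    05 61 72 72 61 79 48 03
""")


def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def parse_fields(buf):
    """Split buf into (field_number, wire_type, value) tuples.

    Only wire types 0 (varint) and 2 (length-delimited) are handled,
    because those are the only ones that occur in this sample.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unhandled wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


top_level = parse_fields(METADATA_BYTES)
for number, wire_type, value in top_level:
    print(number, wire_type, value)

# Assumption: the second top-level field is a nested message for one column.
column = parse_fields(top_level[1][2])
for number, wire_type, value in column:
    print("  ", number, wire_type, value)
```

Read this way, top-level field 1 is the string 20190423_125128_00001_ehtur and field 4 is a 25-byte nested message. Its string fields 1, 4, 5 and 6 hold "hive", "ary", "ary" and "array", which line up with CatalogName, Name, Label and Type in the ColumnInfo above, and field 9 holds the integer 3.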

I realise that these files are internal to Athena, but I'm curious.



Solution 1:[1]

The metadata files are stored in a binary format that is not human-readable; they are meant for Athena's own use.

From the AWS documentation:

DML and DDL query metadata files are saved in binary format and are not human readable. The file extension corresponds to the related query results file. Athena uses the metadata when reading query results using the GetQueryResults action. Although these files can be deleted, we do not recommend it because important information about the query is lost.

For more details, see the "Identifying query output files" section at https://docs.aws.amazon.com/athena/latest/ug/querying.html
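Since the documentation says Athena uses the metadata when serving GetQueryResults, the supported way to see the schema is to ask the API rather than read the file. A minimal sketch, assuming boto3 is installed and credentials are configured; `column_info` and `describe` are hypothetical helper names, and the client is passed in so the functions can be exercised without AWS access:

```python
def column_info(athena_client, query_execution_id):
    """Return the ColumnInfo list Athena reports for a finished query.

    athena_client is expected to behave like boto3.client("athena").
    """
    response = athena_client.get_query_results(
        QueryExecutionId=query_execution_id,
        MaxResults=1,  # only the metadata is wanted, so request a single row
    )
    return response["ResultSet"]["ResultSetMetadata"]["ColumnInfo"]


def describe(columns):
    """Format each column as 'name: type' for quick inspection."""
    return [f"{col['Name']}: {col['Type']}" for col in columns]


# Example usage (needs AWS access, so it is left commented out):
#
#   import boto3
#   client = boto3.client("athena")
#   # The result file name is the query execution ID.
#   columns = column_info(client, "e7629796-9b91-476a-bfb7-2fe6c9595bce")
#   print("\n".join(describe(columns)))
```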

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Ash