Reading schema & metadata from a parquet file
I am reading a third-party Parquet file using parquetjs-lite:
const parquet = require("parquetjs-lite");
:
const reader = await parquet.ParquetReader.openFile(fileName);
const cursor = reader.getCursor();
:
I am able to read the records (and the row count), but how can I get the schema and metadata info? I am after something like this from the docs (as if I had created the schema myself):
var schema = new parquet.ParquetSchema({
  name: { type: 'UTF8' },
  quantity: { type: 'INT64' },
  price: { type: 'DOUBLE' },
  date: { type: 'TIMESTAMP_MILLIS' },
  in_stock: { type: 'BOOLEAN' }
});
That is, the same kind of definition, but derived from the field names of the third-party Parquet file.
Thanks
Solution 1:[1]
If you log your cursor to the console, you can get this kind of information.
console.log(cursor.schema)
will give you the type of each column in the Parquet file.
You can use this to grab whatever information you want, e.g.
let exampleType = cursor.schema.schema["COLUMN NAME"].type
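As a rough sketch of the same idea, the snippet below collects every top-level column type into one plain object. The helper names `columnTypes` and `printSchemaInfo` are hypothetical and not part of parquetjs-lite. The `cursor.schema.schema` path is the one shown above; the call to `reader.getMetadata()` for the file's key/value metadata is an assumption about the reader's API, so check it against your installed version.

```javascript
// Hypothetical helper: maps each top-level column name to its declared type.
// It expects an object shaped like cursor.schema, i.e. with a `schema`
// property keyed by column name. Nested (group) columns may have no `type`,
// in which case the value is undefined.
function columnTypes(parquetSchema) {
  const types = {};
  for (const [name, definition] of Object.entries(parquetSchema.schema)) {
    types[name] = definition.type;
  }
  return types;
}

// Hypothetical usage: open a file, print its column types, then close it.
async function printSchemaInfo(fileName) {
  const parquet = require("parquetjs-lite");
  const reader = await parquet.ParquetReader.openFile(fileName);
  try {
    const cursor = reader.getCursor();
    console.log(columnTypes(cursor.schema));
    // Assumption: getMetadata() returns the file's key/value metadata.
    console.log(reader.getMetadata());
  } finally {
    await reader.close();
  }
}
```

Calling `printSchemaInfo(fileName)` from an async context should print an object resembling the field definitions you would otherwise pass to `new parquet.ParquetSchema(...)`.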
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Matthew Pawlak |