Category "parquet"

How do I write this Python code to use 2+ fewer nested if statements?

I have the following code which I use to loop through row groups in a parquet metadata file to find the maximum values for columns i,j,k across the whole file.

How to read partitioned parquet files from S3 using pyarrow in python

I looking for ways to read data from multiple partitioned directories from s3 using python. data_folder/serial_number=1/cur_date=20-12-2012/abcdsd0324324.snapp

Python: Obtain number of rows for ParquetDataset?

How do I obtain the number of rows of a ParquetDataset that is structured in the form of a folder containing multiple parquet files. I tried from pyarrow.parq