Delta Table / Athena And Spark
I have a Delta table that can be read from Athena.
When I try to query the same data from Spark, I get the following error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 80.0 failed 4 times, most recent failure: Lost task 0.3 in stage 80.0 (TID 449, ip-172-31-22-178.ec2.internal, executor 2): java.lang.RuntimeException: s3://<path>/BDA/DELTA/CLIENTE/_symlink_format_manifest/PERIODO=202001/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]
If I run the same query in Athena, there are no problems.
Solution 1:[1]
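This happens because your Delta table was already set up with a symlink manifest so that Athena can read it. The manifest is a plain-text list of data file paths, not a Parquet file, so Spark fails when it tries to read it as Parquet. If you want to read the table with Spark, query the Delta path directly, like this: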
%sql
select * from delta.`s3://path/tabla/`
limit
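To see why Spark rejects the manifest, it can help to decode the byte values quoted in the error message. The short Python sketch below does that; the manifest line in it is a hypothetical example (the bucket and file name are made up for illustration), assuming the manifest lists one Parquet file path per line.

```python
# Byte values copied from the error message in the question.
expected_tail = bytes([80, 65, 82, 49])    # what a Parquet reader looks for
found_tail = bytes([117, 101, 116, 10])    # what it found in the manifest

print(expected_tail)  # b'PAR1'
print(found_tail)     # b'uet\n'

# Hypothetical manifest line, for illustration only: a text line holding
# the path of one Parquet data file, terminated by a newline.
manifest_line = b"s3://example-bucket/tabla/part-00000.snappy.parquet\n"

# The last four bytes of such a line match what the error reports.
print(manifest_line[-4:] == found_tail)  # True
```

The expected bytes spell `PAR1`, the marker that ends every Parquet file, while the bytes found are the tail of `.parquet` followed by a newline. That is consistent with Spark being pointed at the text manifest instead of the data files.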
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Catherine Solano |