How do I INSERT OVERWRITE with parquet format?
I have two parquet files in Azure Data Lake Gen2 and I want to INSERT OVERWRITE one with the other. I was trying this in Azure Databricks as below.
Reading from the data lake:
// note: the "header" option applies to CSV sources and is ignored for parquet
val cs_mbr_prov_1 = spark.read.format("parquet")
  .option("header", "true")
  .load(s"${SourcePath}/cs_mbr_prov")
cs_mbr_prov_1.createOrReplaceTempView("cs_mbr_prov")

val cs_mbr_prov_stg = spark.read.format("parquet")
  .option("header", "true")
  .load(s"${SourcePath}/cs_mbr_prov_stg_1")
cs_mbr_prov_stg.createOrReplaceTempView("cs_mbr_prov_stg")
var res = spark.sql(
  s"INSERT OVERWRITE TABLE cs_br_prov " +
  s"SELECT NAMED_STRUCT('IND_ID',stg.IND_ID,'CUST_NBR',stg.CUST_NBR,'SRC_ID',stg.SRC_ID, " +
  s"'SRC_SYS_CD',stg.SRC_SYS_CD,'OUTBOUND_ID',stg.OUTBOUND_ID,'OPP_ID',stg.OPP_ID, " +
  s"'CAMPAIGN_CD',stg.CAMPAIGN_CD,'TREAT_KEY',stg.TREAT_KEY,'PROV_KEY',stg.PROV_KEY, " +
  s"'INSERTDATE',stg.INSERTDATE,'UPDATEDATE',stg.UPDATEDATE,'CONTACT_KEY',stg.CONTACT_KEY) AS key, " +
  s"stg.MEM_KEY,stg.INDV_ID,stg.MBR_ID,stg.OPP_DT,stg.SEG_ID,stg.MODA,stg.E_KEY, " +
  s"stg.TREAT_RUNDATETIME FROM cs_br_prov_stg stg")
The error I am getting:
AnalysisException: unknown requires that the data to be inserted have the same number of columns as the target table: target table has 20 column(s) but the inserted data has 9 column(s), including 0 partition column(s) having constant value(s).
But both tables have the same number of columns.
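One reading of the error: the SELECT above produces only 9 output columns, because NAMED_STRUCT packs 12 fields into a single struct column named key, plus the 8 remaining columns (1 + 8 = 9), while the target table reports 20 columns (12 + 8). If cs_br_prov actually has 20 flat columns rather than a struct column, selecting the fields individually would make the counts match. This is a sketch under that assumption; it needs an active SparkSession with the cs_br_prov and cs_br_prov_stg tables available, and the actual target schema should be checked with DESCRIBE TABLE first:

```scala
// Sketch, ASSUMING cs_br_prov has 20 flat columns (no struct "key" column).
// Select all 12 former struct fields plus the 8 remaining columns flat,
// so the inserted data has 20 columns, matching the target table.
val res = spark.sql(
  """INSERT OVERWRITE TABLE cs_br_prov
    |SELECT stg.IND_ID, stg.CUST_NBR, stg.SRC_ID, stg.SRC_SYS_CD,
    |       stg.OUTBOUND_ID, stg.OPP_ID, stg.CAMPAIGN_CD, stg.TREAT_KEY,
    |       stg.PROV_KEY, stg.INSERTDATE, stg.UPDATEDATE, stg.CONTACT_KEY,
    |       stg.MEM_KEY, stg.INDV_ID, stg.MBR_ID, stg.OPP_DT,
    |       stg.SEG_ID, stg.MODA, stg.E_KEY, stg.TREAT_RUNDATETIME
    |FROM cs_br_prov_stg stg""".stripMargin)
```

If the target really does expect a struct column, the reverse check applies: compare the SELECT's output schema (9 columns) against DESCRIBE TABLE cs_br_prov column by column, since INSERT OVERWRITE matches columns by position, not by name.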
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow