I need to extract objects from an array; where there is more than one object in the array, I need to repeat the row for every id, and if the field is null then I want to …
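The question is cut off, but the usual tool for one-output-row-per-array-element with null-safe behaviour is explode_outer. A minimal sketch, assuming hypothetical columns id and items:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("explode-demo").getOrCreate()

    # Hypothetical data: one array column per id; the second id's array is null.
    df = spark.createDataFrame(
        [(1, [{"k": "a"}, {"k": "b"}]), (2, None)],
        "id INT, items ARRAY<MAP<STRING,STRING>>",
    )

    # explode_outer emits one row per element and, unlike explode, keeps rows
    # whose array is null or empty (the exploded value comes back as null).
    df.select("id", F.explode_outer("items").alias("item")).show(truncate=False)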
Hive has a metastore, and HiveServer2 listens for SQL requests; with the help of the metastore, the query is executed and the result is passed back. The Thrift frame…
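For readers who want to exercise that Thrift interface directly, here is a minimal sketch using PyHive; the host, username, and table are placeholders, and 10000 is HiveServer2's default Thrift port:

    # pip install 'pyhive[hive]'
    from pyhive import hive

    conn = hive.connect(host="hiveserver2.example.com", port=10000, username="hive")
    cur = conn.cursor()
    cur.execute("SELECT * FROM some_db.some_table LIMIT 10")  # hypothetical table
    for row in cur.fetchall():
        print(row)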
I am from LinkedIn, and we are having a compatibility issue with spark-cdm-connector. To give a little context, I have CDM data in ADLS that I'm trying to read.
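The snippet ends before the code, but a read with spark-cdm-connector typically looks like the sketch below; the storage account, manifest path, entity, and service-principal values are all placeholders:

    # Assumes the spark-cdm-connector jar is on the classpath.
    df = (spark.read.format("com.microsoft.cdm")
          .option("storage", "myaccount.dfs.core.windows.net")   # placeholder account
          .option("manifestPath", "container/path/default.manifest.cdm.json")
          .option("entity", "MyEntity")                          # placeholder entity
          .option("appId", "<app-id>")
          .option("appKey", "<app-key>")
          .option("tenantId", "<tenant-id>")
          .load())
    df.printSchema()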
I am attempting to use a PySpark kernel from inside an EMR Notebook hosted on the AWS managed service (EMR), and I am unable to access Artifactory to install packages.
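EMR Notebooks expose install_pypi_package on the notebook's SparkContext, and its second argument can point pip at an alternate index. A sketch, assuming a hypothetical Artifactory URL; note the cluster itself must have network access to that URL, which is often the real blocker:

    # Runs inside the notebook; sc is the notebook's SparkContext.
    sc.install_pypi_package(
        "some-package==1.0.0",  # placeholder package
        "https://artifactory.example.com/artifactory/api/pypi/pypi-local/simple",
    )
    sc.list_packages()  # verify the install landed on the cluster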
I am trying to write data to an S3 bucket from my local computer: spark = SparkSession.builder \ .appName('application') \ .config("spark.hadoop.fs.s3a.…
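The config call is truncated; a minimal working shape for s3a from a local machine looks like the sketch below. The credentials and bucket are placeholders, and the hadoop-aws/aws-sdk jars must match the Hadoop build bundled with Spark:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("application")
             .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
             .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
             .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
             .getOrCreate())

    df = spark.createDataFrame([(1, "a")], ["id", "val"])
    df.write.mode("overwrite").parquet("s3a://my-bucket/some/prefix/")  # placeholder bucket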
I have a DataFrame with a column of type MapType<StringType, StringType>: |-- identity: map (nullable = true) | |-- key: string | |-- value: string
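Two common ways to work with such a column, sketched with a hypothetical key name:

    from pyspark.sql import functions as F

    # One row per map entry; alias() names the generated key/value columns.
    exploded = df.select("*", F.explode("identity").alias("key", "value"))

    # Or pull out a single entry by key ("email" is a hypothetical key).
    with_email = df.withColumn("email", F.col("identity").getItem("email"))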
I have a standalone Spark cluster configured with 3 nodes. I want to read CSV data stored in S3-compatible storage (Dell ECS) with PySpark. Here's the method and configuration …
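The configuration is cut off; for an S3-compatible endpoint like ECS, the settings that usually matter are the endpoint override and path-style access. A sketch with a placeholder endpoint and credentials:

    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.endpoint", "https://ecs.example.com:9021")  # placeholder ECS endpoint
    hadoop_conf.set("fs.s3a.access.key", "ACCESS_KEY")
    hadoop_conf.set("fs.s3a.secret.key", "SECRET_KEY")
    # Most S3-compatible stores do not support virtual-host-style bucket URLs.
    hadoop_conf.set("fs.s3a.path.style.access", "true")

    df = spark.read.option("header", "true").csv("s3a://my-bucket/data.csv")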
I am using Spark version 3.1.2, and I need to load data from a CSV with encoding UTF-16LE: df = spark.read.format("csv").option("delimiter", ",").option(…
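The truncated option is presumably the encoding; a sketch of the full read is below. The header and multiLine options are assumptions about the file, and on some Spark versions non-UTF-8 files also need multiLine or an explicit lineSep for lines to split correctly:

    df = (spark.read.format("csv")
          .option("delimiter", ",")
          .option("header", "true")        # assumption about the file layout
          .option("encoding", "UTF-16LE")
          .option("multiLine", "true")     # helps Spark honour the encoding on some versions
          .load("/path/to/file.csv"))      # placeholder path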
Given a date, I create a column in ISO 8601 week date format: from pyspark.sql import functions as F df = spark.createDataFrame([('2019-03-…
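Worth noting: date_format rejects week-based pattern letters (Y, w, u) since Spark 3.0, so one workaround is to assemble the ISO 8601 week date from EXTRACT fields. A sketch, assuming a date column dt; YEAROFWEEK is the ISO week-numbering year and DAYOFWEEK_ISO is the Monday-based day number:

    from pyspark.sql import functions as F

    df = (spark.createDataFrame([("2019-03-18",), ("2019-12-30",)], ["dt"])
          .withColumn("dt", F.to_date("dt")))

    # e.g. 2019-03-18 -> 2019-W12-1; 2019-12-30 -> 2020-W01-1 (week-based year!)
    df = df.withColumn(
        "iso_week_date",
        F.concat(
            F.expr("extract(YEAROFWEEK FROM dt)").cast("string"),
            F.lit("-W"),
            F.lpad(F.expr("extract(WEEK FROM dt)").cast("string"), 2, "0"),
            F.lit("-"),
            F.expr("extract(DAYOFWEEK_ISO FROM dt)").cast("string"),
        ),
    )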
I'm not experienced in Java or the Hadoop ecosystem. I configured my Spark cluster to connect to Amazon Keyspaces using the spark-cassandra-connector from DataStax.
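For reference, the connector settings that matter for Keyspaces are TLS on port 9142 at the regional endpoint plus service-specific credentials; the values below are placeholders, and the Starfield CA certificate usually has to be in the JVM truststore as well:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("keyspaces-demo")
             .config("spark.cassandra.connection.host", "cassandra.us-east-1.amazonaws.com")
             .config("spark.cassandra.connection.port", "9142")
             .config("spark.cassandra.connection.ssl.enabled", "true")
             .config("spark.cassandra.auth.username", "SERVICE_USERNAME")   # placeholder
             .config("spark.cassandra.auth.password", "SERVICE_PASSWORD")   # placeholder
             .getOrCreate())

    df = (spark.read.format("org.apache.spark.sql.cassandra")
          .options(table="my_table", keyspace="my_keyspace")                # placeholders
          .load())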
I am working in PySpark with the flatMap function, using split within it, but I am getting an error that says: AttributeError: 'NoneType' object has no attribute 'split'.
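That error means some records reach the lambda as None before .split() is called. A minimal sketch of the guard:

    rdd = sc.parallelize(["a b c", None, "d e"])

    # Return an empty list for None records so flatMap simply skips them.
    tokens = rdd.flatMap(lambda line: line.split(" ") if line is not None else [])
    print(tokens.collect())  # ['a', 'b', 'c', 'd', 'e']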
I was able to create a Docker-based Bitnami standalone Spark instance and run Spark jobs on it. However, I'm not able to write data to Snowflake from it.
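A sketch of a Snowflake write, assuming the spark-snowflake and snowflake-jdbc jars were supplied via --packages with versions matching the container's Spark/Scala build; all connection values are placeholders:

    sf_options = {
        "sfURL": "myaccount.snowflakecomputing.com",
        "sfUser": "USER",
        "sfPassword": "PASSWORD",
        "sfDatabase": "MY_DB",
        "sfSchema": "PUBLIC",
        "sfWarehouse": "MY_WH",
    }

    (df.write.format("net.snowflake.spark.snowflake")
       .options(**sf_options)
       .option("dbtable", "TARGET_TABLE")
       .mode("append")
       .save())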
I am querying a Hudi table using Hive running on the Spark engine in an EMR 6.3.1 cluster; the Hudi version is 0.7. I have inserted a few records and then updated them …
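For context, an upsert-style Hudi write with Hive sync enabled looks like the sketch below (field and table names are placeholders); stale or duplicated rows in Hive after an update are often a Hive-sync or input-format issue rather than a data issue:

    hudi_options = {
        "hoodie.table.name": "my_hudi_table",
        "hoodie.datasource.write.recordkey.field": "id",    # placeholder key field
        "hoodie.datasource.write.precombine.field": "ts",   # placeholder ordering field
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.table": "my_hudi_table",
    }

    (df.write.format("hudi")
       .options(**hudi_options)
       .mode("append")
       .save("s3://my-bucket/hudi/my_hudi_table/"))         # placeholder path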
I have a DataFrame with a column of ArrayType, and the array may have a different length in each row. I have provided some example code.
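The example code is cut off; the usual per-row tools for variable-length arrays are size, element_at, and posexplode. A small sketch:

    from pyspark.sql import functions as F

    df = spark.createDataFrame([(1, [10, 20, 30]), (2, [40])], ["id", "vals"])

    # Per-row length and first element (element_at is 1-based).
    df.select("id", F.size("vals").alias("len"),
              F.element_at("vals", 1).alias("first")).show()

    # One row per element, keeping each element's position.
    df.select("id", F.posexplode("vals").alias("pos", "val")).show()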
I have a Spark Scala DataFrame with two columns, text and subtext, where subtext is guaranteed to occur somewhere within text. How would I calculate the position of the subtext within the text?
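The question is Scala, but the trick is the same in either API: the locate/instr helpers take only a literal needle, so go through expr() to pass both sides as columns. A PySpark sketch for illustration:

    from pyspark.sql import functions as F

    df = spark.createDataFrame([("hello world", "world")], ["text", "subtext"])

    # locate() is 1-based and returns 0 when subtext is not found.
    df = df.withColumn("pos", F.expr("locate(subtext, text)"))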
I am running a Synapse Notebook in a ForEach activity in a Synapse pipeline. The notebook loads some data from the data lake into the database, plus some custom …
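The question is cut off, but one detail that often matters in this setup is handing a result back to the ForEach activity through the notebook's exit value. A sketch, assuming a row count is what the pipeline needs:

    from notebookutils import mssparkutils

    row_count = df.count()  # placeholder for whatever the load step produces

    # Surfaces as the activity's exitValue in the pipeline's output.
    mssparkutils.notebook.exit(str(row_count))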
I'm looking for an inexpensive way to distinguish duplicates and/or uniquely identify rows. I've been looking at the Spark built-in monotonically_increasing_id …
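Two sketches of the trade-off: monotonically_increasing_id is cheap but opaque and unstable across recomputations, while a content hash lets true duplicates identify themselves:

    from pyspark.sql import functions as F

    # Unique per row, but NOT contiguous and NOT stable if the DataFrame is recomputed.
    df = df.withColumn("row_id", F.monotonically_increasing_id())

    # Content-based: duplicate rows share a hash (xxhash64 is available since Spark 3.0).
    df = df.withColumn("row_hash", F.xxhash64(*[c for c in df.columns if c != "row_id"]))
    dupes = df.groupBy("row_hash").count().filter("count > 1")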
I'm getting the following error when I attempt to write to my data lake with Delta on Databricks: fulldf = spark.read.format("csv").option("header", True).option(…
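The error text is truncated, but a frequent cause when writing CSV-sourced data over an existing Delta table is a schema mismatch. A sketch with the relevant option; the paths are placeholders, and the truncated second read option is assumed to be inferSchema:

    fulldf = (spark.read.format("csv")
              .option("header", True)
              .option("inferSchema", True)   # assumption about the truncated option
              .load("/mnt/raw/full.csv"))    # placeholder path

    (fulldf.write.format("delta")
       .mode("overwrite")
       .option("overwriteSchema", "true")    # allow replacing a changed schema
       .save("/mnt/delta/full"))             # placeholder path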
Below is the sample code snippet used to fetch data from HBase. This worked fine with Spark 3.1.2; however, after upgrading to Spark 3.2.1, it is not working.
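The snippet itself is cut off; for reference, a read through the Apache hbase-spark connector is sketched below (table and mapping are placeholders). Connector jars are compiled against a specific Spark minor version, which is the usual source of 3.1-to-3.2 breakage, so the jar generally has to be rebuilt or upgraded to match:

    df = (spark.read.format("org.apache.hadoop.hbase.spark")
          .option("hbase.table", "my_table")                 # placeholder table
          .option("hbase.columns.mapping",
                  "id STRING :key, name STRING cf:name")     # placeholder mapping
          .option("hbase.spark.use.hbasecontext", "false")
          .load())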
Postgres has a time data type. I am trying to insert rows into Postgres from a Glue job. Given the code: applymapping1 = ApplyMapping.apply(frame = SelectFromCo…
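Spark and Glue have no TIME type, so one workaround is to carry the value as a string and let Postgres cast it on insert. A sketch with placeholder column names, not necessarily the job's actual mapping:

    from awsglue.transforms import ApplyMapping

    applymapping1 = ApplyMapping.apply(
        frame=source_dyf,  # placeholder DynamicFrame
        mappings=[
            ("id", "int", "id", "int"),
            ("event_time", "string", "event_time", "string"),  # Postgres casts on insert
        ],
    )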