I have a problem regarding merging CSV files using PySpark SQL with a Delta table. I managed to create an upsert function that updates if matched and inserts if not matched.
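A minimal sketch of that kind of Delta upsert, assuming the table lives at /data/delta/my_table and the match key is a column named id (both placeholders):

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read the incoming CSV batch (path and options are assumptions).
    updates = spark.read.option("header", True).csv("/data/incoming/batch.csv")

    # Merge into the existing Delta table: update on key match, insert otherwise.
    target = DeltaTable.forPath(spark, "/data/delta/my_table")
    (target.alias("t")
        .merge(updates.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())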
I have a PySpark DataFrame, df, with some columns as shown below. The hour column is in UTC time, and I want to create a new column that has the local time based on the time zone.
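One way to do that conversion, as a sketch: from_utc_timestamp interprets its input as UTC and shifts it into the given zone. The column names hour_utc and tz below are assumptions, and the zone can also be passed as a literal string.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("2021-02-07 18:00:00", "America/New_York")],
        ["hour_utc", "tz"],  # hypothetical column names
    )

    # Shift the UTC timestamp into each row's own time zone.
    local = df.withColumn(
        "hour_local",
        F.from_utc_timestamp(F.col("hour_utc").cast("timestamp"), F.col("tz")),
    )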
I have a requirement where I am reading data from a CSV file and writing the data to a Delta table using Scala on Windows OS. My Scala code is given below: import co…
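The Scala snippet is cut off above; as a rough sketch of the same CSV-to-Delta flow (shown here in PySpark, with placeholder paths), assuming the Delta Lake package is on the classpath and winutils/HADOOP_HOME are set up for Windows:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-delta").getOrCreate()

    # Paths are placeholders; adjust to your layout.
    df = (spark.read
          .option("header", True)
          .option("inferSchema", True)
          .csv("C:/data/input.csv"))
    df.write.format("delta").mode("overwrite").save("C:/data/delta/target")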
I am trying to link Kafka and Spark by reading data from one topic and trying to print the contents of that topic as a DataFrame, but the connection attempt fails.
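A minimal sketch of reading one topic into a streaming DataFrame and printing it to the console; the broker address and topic name are placeholders, and the matching spark-sql-kafka-0-10 package must be on the classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
              .option("subscribe", "my_topic")                      # assumed topic
              .load())

    # Kafka delivers key/value as binary; cast to string before displaying.
    query = (stream.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
             .writeStream.format("console")
             .start())
    query.awaitTermination()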
I have been studying for the above exam using Databricks' learning platform, but I have not found any external resources such as study guides or practice exams.
df1 = df.withColumn('etl_load_dt_part_new', concat_ws("-", year(df.ETL_LOAD_DT_PART), lit('12'), lit('31')).cast('date')). I am trying to add a new column named etl_load_dt_part_new.
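A runnable version of that snippet with the imports it needs; the sample row is an assumption, and the year is cast to string explicitly since concat_ws works on string columns:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("2020-06-15",)], ["ETL_LOAD_DT_PART"])  # sample row

    # Build "<year>-12-31" from ETL_LOAD_DT_PART, then cast the result to a date.
    df1 = df.withColumn(
        "etl_load_dt_part_new",
        F.concat_ws(
            "-",
            F.year(F.col("ETL_LOAD_DT_PART").cast("date")).cast("string"),
            F.lit("12"),
            F.lit("31"),
        ).cast("date"),
    )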
This is my dataset:

    from pyspark.sql import SparkSession, functions as F
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([('2021-02-07',)…
I have some data like this:

    ID   Value1  Value2  Value40
    101  3       520     2001
    102  29      530     2020

I want to take this data and convert it into a KV-style pair instead: ID, Val…
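One way to get key/value rows, sketched over the two sample rows above: stack() unpivots the value columns into (key, value) pairs per ID.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(101, 3, 520, 2001), (102, 29, 530, 2020)],
        ["ID", "Value1", "Value2", "Value40"],
    )

    # stack(n, name1, col1, ...) emits one (key, value) row per listed column.
    kv = df.select(
        "ID",
        F.expr("stack(3, 'Value1', Value1, 'Value2', Value2, "
               "'Value40', Value40) as (key, value)"),
    )
    kv.show()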
I have this statement in SQL: CASE WHEN AAAA IS NOT NULL THEN AAAA ELSE RTRIM(LEFT(BBBB, PATINDEX('%[0-9]%', BBBB) - 1)) END AS NAME. I need to convert it to PySpark.
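A sketch of one possible translation: LEFT(BBBB, PATINDEX('%[0-9]%', BBBB) - 1) keeps everything before the first digit, which the regex '^[^0-9]*' also captures; the sample data here is made up.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(None, "Smith 123"), ("Jones", "x9")], ["AAAA", "BBBB"])

    # Prefer AAAA when present; otherwise take the non-digit prefix of BBBB,
    # right-trimmed to mirror RTRIM.
    df = df.withColumn(
        "NAME",
        F.when(F.col("AAAA").isNotNull(), F.col("AAAA"))
         .otherwise(F.rtrim(F.regexp_extract("BBBB", "^[^0-9]*", 0))),
    )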
I am trying to create a table in Spark SQL by providing the schema and giving the location. However, when I run a SELECT on the table, I see only half the columns.
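For reference, a sketch of that kind of DDL (the names, types, and location are placeholders); when columns go missing, comparing the declared schema against what spark.read infers from the files at the location is a reasonable first check:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # External table with an explicit schema over existing files.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS my_db.my_table (
            id BIGINT,
            name STRING,
            amount DOUBLE
        )
        USING PARQUET
        LOCATION '/data/parquet/my_table'
    """)

    # Debugging aid: what schema do the files themselves carry?
    spark.read.parquet("/data/parquet/my_table").printSchema()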
The following example from the Azure team uses the Apache Spark connector for SQL Server to write data to a table. Question: how can we execute a stored procedure in an Azure SQL database from Spark?
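The connector and Spark's built-in JDBC source read and write tables but offer no DataFrame-level call for procedures, so one common workaround is a plain JDBC call through the JVM. The URL, credentials, and procedure name below are placeholders, and the SQL Server JDBC driver must be on the driver classpath:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
                "database=mydb;user=myuser;password=mypassword")  # placeholder

    # Open a raw JDBC connection on the driver via py4j and run the procedure.
    conn = spark._sc._gateway.jvm.java.sql.DriverManager.getConnection(jdbc_url)
    try:
        stmt = conn.createStatement()
        stmt.execute("EXEC dbo.my_stored_procedure")  # placeholder procedure
        stmt.close()
    finally:
        conn.close()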
Even though secrets are for masking confidential information, I need to see the value of a secret in order to use it outside Databricks. When I simply print the secret, the output is redacted.
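A small sketch of the usual notebook workaround: the REPL redacts the secret string itself, but printing it character by character sidesteps the masking. The scope and key names are placeholders; treat the revealed value carefully.

    # Fetch the secret in a Databricks notebook (dbutils is available there).
    secret = dbutils.secrets.get(scope="my-scope", key="my-key")  # placeholders

    # Joining with spaces defeats the literal-match redaction.
    print(" ".join(secret))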
I have a case where I may have null values in the column that needs to be summed up in a group. If I encounter a null in a group, I want the sum of that group to be null.
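Since sum() silently skips nulls, one way to make a null poison its group is to compare the row count with the non-null count, as in this sketch (the column names are made up):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("a", 1), ("a", None), ("b", 2), ("b", 3)],
        ["grp", "val"],
    )

    # Return the sum only when every row in the group is non-null; the
    # missing otherwise() yields null for groups that contain a null.
    result = df.groupBy("grp").agg(
        F.when(F.count(F.lit(1)) == F.count("val"), F.sum("val")).alias("sum_val")
    )
    result.show()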
Hi, I am trying to run Spark on my local laptop. I created a Maven project in IntelliJ IDEA, and in my main class I have one line like the one below; when I try to run the project…
I am trying to get started with Spark. I have Hadoop (3.3.1) and Spark (3.2.2) in my library. I have set SPARK_HOME, PATH, HADOOP_HOME, and LD_LIBRARY_PATH to their respective paths.
I have a large dataset like so:

    +-------+------+
    | SEQ_ID|RESULT|
    +-------+------+
    |3462099|239.52|
    |3462099|239.66|
    |3462099|239.63|
    |3462099|239.64|
    |3462099|239.57|
    |3462099|…
I am trying to start up Spark on my machine, but when I launch it using spark-shell I get an error that there is an illegal character in the path. Caused by: …
My Structured Spark Streaming program reads JSON data from Kafka and writes it to HDFS in JSON format. I am able to save the JSON to HDFS, but it saves the JSON string…
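Assuming the goal is to write parsed records rather than the raw Kafka value string, a sketch with from_json; the payload schema, broker, topic, and paths are all placeholders:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, LongType

    spark = SparkSession.builder.getOrCreate()

    # Assumed shape of the incoming JSON payload.
    schema = StructType([
        StructField("id", LongType()),
        StructField("event", StringType()),
    ])

    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "my_topic")
              .load())

    # Parse the raw string value into columns so the sink writes structured
    # JSON objects instead of one quoted string per record.
    parsed = (stream
              .select(F.from_json(F.col("value").cast("string"), schema).alias("j"))
              .select("j.*"))

    query = (parsed.writeStream.format("json")
             .option("path", "hdfs:///data/out")
             .option("checkpointLocation", "hdfs:///checkpoints/out")
             .start())
    query.awaitTermination()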
I have a process using the following SELECT statement in SQL Server: SELECT HASHBYTES('SHA1', CAST('4100119300' AS NVARCHAR(100))) AS StringConverted. This gives…
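Assuming the goal is to reproduce that hash in Spark: NVARCHAR is UTF-16LE, while Spark's sha1() hashes a string's UTF-8 bytes, so the string has to be re-encoded first. A sketch:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("4100119300",)], ["s"])

    # Hash the UTF-16LE bytes to match HASHBYTES over NVARCHAR; note Spark
    # returns lowercase hex without SQL Server's 0x prefix.
    df = df.withColumn("hash", F.sha1(F.encode(F.col("s"), "UTF-16LE")))
    df.show(truncate=False)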
I get the error below while reading data from Delta Lake. The detailed log on Azure shows it is failing to read a .tmp file from the _delta_log folder. I have tried…