I am trying to read a stream from Kafka using PySpark. I am using Spark version 3.0.0-preview2 and spark-streaming-kafka-0-10_2.12. Before this I just stat zoo
I'm testing some pyspark code in an EMR notebook before I deploy it and keep running into this strange error with Spark SQL. I have all my tables and metadata i
I would like to run spatial queries on large data sets; e.g. geopandas would be too slow. Inspiration I found here: https://anant-sharma.medium.com/apache-sedon
dbutils.fs.mount( source = f"wasbs://{blob.storage_account_container}@{blob.storage_account_name}.blob.core.windows.net/", mount_point = "/mnt/MLRExtract/"
Error: AnalysisException: Recursive view management_db.v_extract detected (cycle: management_db.v_extract -> management_db.v_extract) Query outside of the v
Having dates in one column, how to create a column containing ISO week date? ISO week date is composed of year, week number and weekday. year is not the same as
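A minimal plain-Python sketch of what an ISO week date looks like, using only the standard library rather than PySpark. The helper name `iso_week_date` and the `YYYY-Www-D` output layout are assumptions made for illustration; the point is that the ISO year can differ from the calendar year near year boundaries.

```python
from datetime import date


def iso_week_date(d: date) -> str:
    """Format a date as an ISO week date string (hypothetical helper)."""
    # isocalendar() returns (ISO year, ISO week number, ISO weekday),
    # where Monday is 1 and Sunday is 7.
    iso_year, iso_week, iso_weekday = d.isocalendar()
    return f"{iso_year}-W{iso_week:02d}-{iso_weekday}"


# 3 January 2021 is a Sunday that still belongs to the last ISO week of 2020.
print(iso_week_date(date(2021, 1, 3)))  # 2020-W53-7
# The following Monday opens ISO week 1 of 2021.
print(iso_week_date(date(2021, 1, 4)))  # 2021-W01-1
```

The same logic could be wrapped in a user-defined function or rebuilt from built-in column functions; which is preferable depends on the data volume.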
How do I avoid initializing a class within a pyspark user-defined function? Here is an example. Creating a spark session and DataFrame representing four latitu
I have a problem merging CSV files using PySpark SQL with a Delta table. I managed to create an upsert function that updates if matched and inserts if not mat
I have a PySpark DataFrame, df, with some columns as shown below. The hour column is in UTC time and I want to create a new column that has the local time based
df1=df.withColumn('etl_load_dt_part_new', concat_ws("-",year(df.ETL_LOAD_DT_PART),lit('12'),lit('31')).cast('date') ) I am trying to add a new column named e
This is my dataset: from pyspark.sql import SparkSession, functions as F spark = SparkSession.builder.getOrCreate() df = spark.createDataFrame([('2021-02-07',)
New to Spark and Synapse. I need to do some transformations, including adding columns, changing datatypes, etc. I am reading a CSV into a DataFrame. I'd like t
I am trying to create a table in Spark SQL by providing the schema and giving the location. However, when I run a select on the table, I see only half the columns. (
tree = dtModel.stages[-1] print(tree) #visualize the decision tree model AttributeError Traceback (most recent call last) Attribute
I have a dataframe with a map column. I want to collect the not null keys into a new column:
Even though secrets are for masking confidential information, I need to see the value of the secret for using it outside Databricks. When I simply print the sec
I have a case where I may have null values in the column that needs to be summed up in a group. If I encounter a null in a group, I want the sum of that group t
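A plain-Python sketch of one possible reading of the requirement above: a per-group sum that becomes null as soon as any value in the group is null. The function names and the `(key, value)` row layout are assumptions made for illustration, not taken from the question, and this is not PySpark code.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple


def null_propagating_sum(values: Iterable[Optional[float]]) -> Optional[float]:
    """Return the sum of values, or None if any value is None."""
    total = 0
    for value in values:
        if value is None:
            return None  # a single null makes the whole sum null
        total += value
    return total


def grouped_sum(
    rows: Iterable[Tuple[Hashable, Optional[float]]]
) -> Dict[Hashable, Optional[float]]:
    """Group (key, value) rows by key and apply the null-propagating sum."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: null_propagating_sum(vals) for key, vals in groups.items()}


rows = [("a", 1), ("a", 2), ("b", 3), ("b", None)]
print(grouped_sum(rows))  # {'a': 3, 'b': None}
```

This differs from the usual aggregate behavior, where nulls are skipped rather than propagated.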
I have a large dataset like so: | SEQ_ID|RESULT| +-------+------+ |3462099|239.52| |3462099|239.66| |3462099|239.63| |3462099|239.64| |3462099|239.57| |3462099|
Is there a way in PySpark to recover the two middle values behind the median when the number of values is even? For example: I have this dataframe df1 = spark.createDataFrame
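A plain-Python sketch of the idea, assuming "the two values of a median" means the two middle elements of the sorted values when their count is even. The helper name `middle_values` is hypothetical, and this illustrates the logic only, not a PySpark implementation.

```python
from typing import Sequence, Tuple


def middle_values(values: Sequence[float]) -> Tuple[float, ...]:
    """Return the middle element(s) of the sorted values.

    Two elements for an even count, one for an odd count,
    and an empty tuple for empty input.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return ()
    mid = n // 2
    if n % 2 == 0:
        # Even count: the median lies between these two values.
        return (ordered[mid - 1], ordered[mid])
    return (ordered[mid],)


print(middle_values([4, 1, 3, 2]))  # (2, 3)
print(middle_values([5, 1, 3]))     # (3,)
```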
I am trying to debug my Spark UI, and in the SQL tab of the Spark UI I am getting a red mark on the filter description. I am trying to figure out what it means. Spark UI s