I have a log file in CSV which has a column containing a list of filepaths separated by commas. I want to split those filepaths into new rows using PySpark (or Excel)…
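A minimal PySpark sketch, assuming the column is named filepaths and the file has a header (both names are placeholders):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# "log.csv" and the column name "filepaths" are placeholders.
df = spark.read.csv("log.csv", header=True)

# Split the comma-separated string into an array, then emit one row per path.
exploded = df.withColumn("filepath", F.explode(F.split("filepaths", ",")))
exploded.show(truncate=False)
```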
I am trying to extract a value from an array in SparkSQL, but I am getting the error below. Example column customer_details: {"original_customer_id":"ch_382820","fi…
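This kind of error often means customer_details is a plain JSON string rather than a parsed struct/array; a sketch using get_json_object (the table name customer_table is a placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# get_json_object works directly on the raw JSON string; no struct
# parsing needed. '$.original_customer_id' is a JSONPath expression.
df = spark.sql("""
    SELECT get_json_object(customer_details, '$.original_customer_id')
               AS original_customer_id
    FROM customer_table
""")
```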
I have two DataFrames. facts has columns data, start_date and end_date; holidays has column holiday_date. What I want is a way to produce another DataFrame that has…
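The question is cut off, but a common version of this task is counting the holidays that fall inside each fact row's date range; a sketch under that assumption:

```python
from datetime import date
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

facts = spark.createDataFrame(
    [("a", date(2024, 1, 1), date(2024, 1, 10))],
    ["data", "start_date", "end_date"],
)
holidays = spark.createDataFrame([(date(2024, 1, 6),)], ["holiday_date"])

# Range join: keep every fact row, then count the matching holidays.
result = (
    facts.join(
        holidays,
        F.col("holiday_date").between(F.col("start_date"), F.col("end_date")),
        "left",
    )
    .groupBy("data", "start_date", "end_date")
    .agg(F.count("holiday_date").alias("holiday_count"))
)
```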
I have seen methods for inserting into a Hive table, such as insertInto(table_name, overwrite=True), but I couldn't work out how to handle the scenario below. For…
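Since the scenario itself is cut off, here is one it often turns out to be: overwriting only the partitions present in the incoming data rather than the whole table. A sketch (the staging table and target name are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("staging_table")  # placeholder for the incoming data

# With dynamic mode, only the partitions that appear in df are
# replaced; the rest of table_name survives the overwrite.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df.write.mode("overwrite").insertInto("table_name")
```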
I found a similar question (link), but no answer explained how to fix the issue. I want to make a UDF that would extract words from a column. So, I want to cre…
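A minimal sketch of such a UDF, assuming "words" means alphanumeric tokens (the column name is made up):

```python
import re
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import ArrayType, StringType

spark = SparkSession.builder.getOrCreate()

@F.udf(returnType=ArrayType(StringType()))
def extract_words(text):
    # \w+ grabs runs of letters/digits/underscores; empty list for nulls.
    return re.findall(r"\w+", text) if text else []

df = spark.createDataFrame([("Hello, world!",)], ["sentence"])
df.select(extract_words("sentence").alias("words")).show(truncate=False)
```

For simple tokenization the built-in F.split can do the same job without the Python UDF overhead; the UDF is only needed when the extraction logic outgrows a regex split.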
I have Glue DBs (db1 and db2) and tables (tbl1 and tbl2) available in different AWS regions (eu-west-1 and us-east-1) respectively. My Glue job in eu-west-1 needs…
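One hedged way around the region boundary: skip the catalog integration and resolve the remote table's S3 location yourself with boto3, then read the files directly (db2/tbl2 come from the question; the Parquet format is an assumption):

```python
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Ask the us-east-1 Glue catalog where tbl2's data lives on S3.
glue = boto3.client("glue", region_name="us-east-1")
table = glue.get_table(DatabaseName="db2", Name="tbl2")["Table"]
location = table["StorageDescriptor"]["Location"]

# S3 reads work cross-region, so the eu-west-1 job can load directly.
df = spark.read.parquet(location)  # assuming the table is stored as Parquet
```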
This is my piece of code. There is a lot of business logic happening here. I have tried to explain it as understandably as possible. I have…
I have this existing table tb1 in my database. Now new data comes in and is stored in another table tb2. Earlier Account_Number 9988 was Level 2, but now…
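The question is truncated, but it reads like an upsert: rows from tb2 should replace matching tb1 rows (Account_Number 9988 moving off Level 2). A plain-DataFrame sketch, assuming both tables share a schema and Account_Number is the key:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

tb1 = spark.table("tb1")
tb2 = spark.table("tb2")

# Drop tb1 rows whose key reappears in tb2, then take tb2's version.
keys = tb2.select("Account_Number")
updated = tb1.join(keys, "Account_Number", "left_anti").unionByName(tb2)
```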
Trying to process JSON data in a column on Databricks. Below is sample data from a table (it's a weather device's records info): JSON_Info {"sampleData":"dataD…
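A from_json sketch; the schema is guessed from the visible fragment, and the sample value is made up:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([('{"sampleData":"example"}',)], ["JSON_Info"])

# Declare only the fields you need; from_json ignores the rest.
schema = StructType([StructField("sampleData", StringType())])
parsed = df.withColumn("parsed", F.from_json("JSON_Info", schema))
parsed.select("parsed.sampleData").show()
```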
Is there an elegant, easy and fast way to move data out of HBase into MongoDB? I want to migrate from HBase to MongoDB. I am new to MongoDB. Could someone please help…
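One route is Spark as the middleman: read with the hbase-spark connector, write with the MongoDB Spark connector (both connector jars must be on the classpath; the table names, column mapping, and URI below are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read HBase rows as a DataFrame via the hbase-spark connector.
df = (spark.read
      .format("org.apache.hadoop.hbase.spark")
      .option("hbase.table", "source_table")
      .option("hbase.columns.mapping",
              "id STRING :key, name STRING cf:name")
      .load())

# Write the same rows into MongoDB (Spark connector 10.x options).
(df.write
   .format("mongodb")
   .option("connection.uri", "mongodb://localhost:27017")
   .option("database", "target_db")
   .option("collection", "target_collection")
   .mode("append")
   .save())
```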
I have a table that looks like this:

common_id | table1_address | table2_address | table3_address   | table4_address
123       | null           | null           | stack building12 | null
157       | 123road stree…
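If the goal (cut off above) is one address per common_id, taking the first non-null column left to right, coalesce does it; a sketch:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(123, None, None, "stack building12", None),
     (157, "123road", None, None, None)],
    "common_id INT, table1_address STRING, table2_address STRING, "
    "table3_address STRING, table4_address STRING",
)

# coalesce returns the first non-null argument per row.
df = df.withColumn(
    "address",
    F.coalesce("table1_address", "table2_address",
               "table3_address", "table4_address"),
)
```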
I'm trying to create a Spark DataFrame from a dictionary which has data in the format {'33_45677': 0, '45_3233': 25, '56_4599': 43524}, etc.: dict_pairs = {'33…
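A minimal sketch: each (key, value) pair becomes one row (the column names are made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dict_pairs = {'33_45677': 0, '45_3233': 25, '56_4599': 43524}

# items() yields (key, value) tuples, one tuple per row.
df = spark.createDataFrame(list(dict_pairs.items()), ["id", "value"])
df.show()
```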
I am trying to turn an RDD into a DataFrame. The operation seems to succeed, but when I try to count the number of elements in the DataFrame I get an error.
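toDF() and createDataFrame are lazy, so a malformed row or schema mismatch often only surfaces at the first action such as count(); a sketch that makes the row shape explicit with Row objects:

```python
from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

rdd = spark.sparkContext.parallelize([("a", 1), ("b", 2)])

# Uniform, named fields per record avoid inference surprises at count().
df = rdd.map(lambda t: Row(name=t[0], value=t[1])).toDF()
print(df.count())
```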
Issue: I'm trying to write to a Parquet file using spark.sql; however, I encounter issues when the query has unions or subqueries. I know there's some syntax I can't seem…
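spark.sql returns an ordinary DataFrame, so unions and subqueries can stay inside the SQL string and the write happens afterwards; a sketch with placeholder tables t1/t2:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Subqueries in FROM need an alias ("filtered" here).
result = spark.sql("""
    SELECT id, val FROM t1
    UNION ALL
    SELECT id, val
    FROM (SELECT id, val FROM t2 WHERE val > 0) filtered
""")
result.write.mode("overwrite").parquet("/tmp/output")
```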
I have to first partition by a "customer group", but I also want to make sure that I have a single CSV file per "customer_group". This is because it is timeseries…
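A common recipe: repartition by the same column you partitionBy, so every customer_group lands in exactly one task and therefore one file per output directory (the table name and timestamp sort column are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.table("events")  # placeholder for the time-series data

(df.repartition("customer_group")
   .sortWithinPartitions("timestamp")   # keep the time series in order
   .write
   .partitionBy("customer_group")
   .mode("overwrite")
   .csv("/tmp/by_customer_group", header=True))
```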
I have a dataset as below:

col1 | extension_col1
2345 | 2246
2246 | 2134
2134 | 2091
2091 | Null
1234 | 1111
1111 | Null

I need to find the number of extensions available for…
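Since the references form chains of unknown length, one hedged approach is an iterative self-join that follows extension_col1 until every chain hits Null, counting hops (this assumes the data is acyclic, otherwise the loop never terminates):

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(2345, 2246), (2246, 2134), (2134, 2091), (2091, None),
     (1234, 1111), (1111, None)],
    "col1 INT, extension_col1 INT",
)

# State per row: (original value, node we are currently at, hops so far).
state = df.select("col1",
                  F.col("extension_col1").alias("current"),
                  F.lit(0).alias("extensions"))

edges = (df.withColumnRenamed("col1", "current")
           .withColumnRenamed("extension_col1", "next"))

while state.filter(F.col("current").isNotNull()).count() > 0:
    # Follow one edge; only rows still at a non-null node gain a hop.
    state = (state.join(edges, "current", "left")
                  .select("col1",
                          F.col("next").alias("current"),
                          F.when(F.col("current").isNotNull(),
                                 F.col("extensions") + 1)
                           .otherwise(F.col("extensions"))
                           .alias("extensions")))

state.select("col1", "extensions").show()   # e.g. 2345 -> 3, 1234 -> 1
```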
Can someone help me with the below? I have an input dataframe:

ID | process_type  | STP_stagewise
1  | loan_creation | Manual
1  | loan creation | NSTP
1  | reimbursement | STP
2  | …
It is suggested that you can 'generate unique increasing numeric values' by select row_number() over (order by monotonically_increasing_id()) from /* ... */. But…
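For reference, the suggested pattern looks like this in PySpark; a window ordered by monotonically_increasing_id() with no partitioning pulls all rows into one task, which is the usual caveat:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.range(5)

# Gap-free 1..N ids, but the unpartitioned window funnels every row
# through a single task, so it does not scale to large data.
w = Window.orderBy(F.monotonically_increasing_id())
df = df.withColumn("row_id", F.row_number().over(w))
df.show()
```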
|-- x: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- y: long (nullable = true)
|    |    |-- z: array (nullable = true)
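To flatten a schema like this one, explode the outer array and then the nested z array; a sketch assuming df already carries this schema:

```python
from pyspark.sql import functions as F

# df: a DataFrame with the schema printed above.
flat = (df.withColumn("elem", F.explode("x"))
          .select(F.col("elem.y").alias("y"),
                  F.explode("elem.z").alias("z_value")))
```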
|-- x: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- y: struct (nullable = true)
|    |    |-- z: struct (nullable = true)
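With structs instead of arrays at the inner level, dot notation is enough once the outer array is exploded; a sketch assuming df carries this schema:

```python
from pyspark.sql import functions as F

# df: a DataFrame with the schema printed above.
flat = (df.withColumn("elem", F.explode("x"))
          .select(F.col("elem.y").alias("y_struct"),
                  F.col("elem.z").alias("z_struct")))
```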