Imagine I have the following list of dicts in python: list = [dict1, dict2, dict3] I want to parse these dicts and transform them to a dataframe and save it as
I am using spark 3.0.2 with java 8 version. I am trying to write data on s3 path using spark job. I am getting below exception, not able to know what caused thi
I am new to Spark and BigData component - HBase, I am trying to write Python code in Pyspark and connect to HBase to read data from HBase. I'm using the followi
In python or R, there are ways to slice DataFrame using index. For example, in pandas: df.iloc[5:10,:] Is there a similar way in pyspark to slice data bas
I need to find a way to delete multiple rows from a delta table/pyspark data frame given a list of ID's to identify the rows. As far as I can tell there isn't a
I am running an EMR instance. It was working fine but suddenly it started giving below error when I am trying to access S3 files from a Python Spark script: py4
I am running an EMR instance. It was working fine but suddenly it started giving below error when I am trying to access S3 files from a Python Spark script: py4
I have a pyspark dataframe with two columns: [Row(zip_code='58542', dma='MIN'), Row(zip_code='58701', dma='MIN'), Row(zip_code='57632', dma='MIN'), Row(zip_
I have s3 or azure blob directory structure like the following parent_dir child_dir1 avro_1 avro_2 ... child_dir2 ... There
I am trying to unittest a function that writes data to S3 and then reads the same data from the same S3 location. I am trying to use a moto and boto (2.x) to ac
I'm having issues with using pyspark dataframes. I have a column called eventkey which is a concatenation of the following elements: account_type, counter_type
I am running a huge text file using PyCharm and PySpark. This is what I am trying to do: spark_home = os.environ.get('SPARK_HOME', None) os.environ["SPARK_HOM
I am new to Data Scraping. I am reading the data from a file having JSON objects as one row {"name": "Soul Sweet \u2018Taters (Step-by-Step!)", "ingredients":
I am using Spark ML library for classification problem using a logistic regression. I have vectorized input features and created training dataset and test datas
I have a presto table that imports PARQUET files based on partitions from s3 as follows: create table hive.data.datadump ( tUnixEpoch varchar, tDateTi
i have code Python for take all values contains NCO-ETD in Type with groupby ID and Date cond = 'NCO - ETD' df_ = (data.where(data.assign(new = data['Type'].str
i need some help please i have this dataframe with an even number of values for the column 'b' df1 = spark.createDataFrame([ ('c',1), ('c',2), ('c',
I have a PySpark dataframe like this: +------+------+ | A| B| +------+------+ | 1| 2| | 1| 3| | 2| 3| | 2| 5| +------+--
I am trying to connect to Snowflake with Pyspark on my local machine. My code is as follows: from pyspark.sql.types import * from pyspark.sql import SparkSessio
If I have a PySpark DataFrame with two columns, text and subtext, where subtext is guaranteed to occur somewhere within text. How would I calculate the positio