import org.apache.spark.sql.SparkSession object RDDBroadcast extends App { val spark = SparkSession.builder() .appName("SparkByExamples.com") .maste
I am using Spark 3.1.2 with Hadoop 3.2.0 to run Spark Structured Streaming (SSS) aggregation jobs on Spark on K8s. These jobs are reading files from S3 usi
I'm trying to install Ambari Server 1.7 on an Oracle Linux 6 machine, but it turned out that it's not open source anymore. The public repository can't be accessed.
I am running a Python project through a DAG in Airflow, and I encounter the following exception when the DAG runs this line from the project - df = spark.sql(quer
I am new to the Hadoop ecosystem. I have been trying to put a CSV file into HDFS inside a directory that I could create. But when I do that I get an error: put: C
Hi, I'm using Eclipse to export the JAR file of a MapReduce program. When I try to run the JAR file using the command hadoop jar WordCountdemo.jar /Demo2/WordCount
I configured HBase today, and at first I configured it correctly. However, when I ran 'start-all.sh' again to start HBase, I could not see 'HMaster' anywhere
I have an HDFS running with multiple datanodes on Cloudera. Sometimes we get an error message in the overview which is: There are 1 missing blocks. The followi
I am trying to write data to an S3 bucket from my local computer: spark = SparkSession.builder \ .appName('application') \ .config("spark.hadoop.fs.s3a.
In Scala, I am trying to count the files in an HDFS directory. I tried to get a list of the files with val files = fs.listFiles(path, false) and make a count
Apache Ambari moved into the Attic in January 2022. So Apache Ambari has been retired, and the only reliable alternative that I know of is Cloudera Manager, but Clouder
I installed Kerberos on an EC2 server, and on a second EC2 server I installed Apache Ranger (with Kerberos auth added in the core-site file, hadoop.security.authentica
I'm running into an issue using distcp to copy files - every copy fails with an IO Exception (Checksum mismatch), even if performing a simple copy within the cl
I have an HDFS directory as below. /user/staging/app_name/2022_05_06 Under this directory I have around 1000 part files. I want to loop over each of the part file
My data is stored in S3 (Parquet format) under different paths and I'm using spark.read.parquet(pathes:_*) in order to read all the paths into one dataframe. Un
When I try to write the dataframe to S3 as Parquet, I always get an error like below. In the S3 bucket, an empty folder is generated automatically every time, b
I have a unit test for Databricks code, and I want to run it locally on Windows. Unfortunately, when I run pytest with PyCharm, it throws the following exception: Exc
I've got a Dataproc cluster configured this way: { "worker_config": { "num_instances": 20 }, "secondary_worker_config": { "
I have installed Hadoop on my MacBook M1 2020 with macOS Monterey 12.3.1. I am able to successfully use hadoop and hdfs commands on my laptop. I started using h
I am trying to set up distributed HBase on 3 nodes. I have already set up Hadoop, YARN, ZooKeeper, and now HBase, but when I launch hbase shell and run the simples