Category "hadoop"

When writing Parquet files to S3: NoSuchMethodError: void org.apache.hadoop.util.SemaphoredDelegatingExecutor

When I try to write the DataFrame to S3 as Parquet, I always get an error like the one below. In the S3 bucket, an empty folder is generated automatically every time, …
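
This NoSuchMethodError usually points to mismatched hadoop-aws and hadoop-common jars on the classpath rather than to the write itself. For reference, a minimal sketch of writing a DataFrame to S3 as Parquet through the s3a connector; the bucket path and the use of environment variables for credentials are assumptions, not taken from the question:

    // Minimal sketch: write a DataFrame to S3 as Parquet via s3a.
    // The bucket path and credential handling below are placeholders.
    import org.apache.spark.sql.SparkSession

    object WriteParquetToS3 {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("write-parquet-to-s3")
          // hadoop-aws must match the hadoop-common version already on the classpath,
          // otherwise calls through SemaphoredDelegatingExecutor can fail with NoSuchMethodError.
          .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
          .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
          .getOrCreate()

        import spark.implicits._
        val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

        df.write.mode("overwrite").parquet("s3a://my-bucket/output/")

        spark.stop()
      }
    }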

Databricks local test fails with java.lang.NoSuchMethodError: org.apache.hadoop.security.HadoopKerberosName.setRuleMechanism

I have a unit test for Databricks code, and I want to run it locally on Windows. Unfortunately, when I run pytest from PyCharm, it throws the following exception: …

Dataproc secondary workers not used

I've got a Dataproc cluster configured this way: { "worker_config": { "num_instances": 20 }, "secondary_worker_config": { …

Hadoop Streaming Job showing error /bin/java: No such file or directory

I have installed Hadoop on my MacBook M1 (2020) with macOS Monterey 12.3.1. I am able to use the hadoop and hdfs commands successfully on my laptop. I started using h…

HBase Shell - org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet

I am trying to set up distributed HBase on 3 nodes. I have already set up Hadoop, YARN and ZooKeeper, and now HBase, but when I launch the HBase shell and run the simplest …

Spark: unable to load native-hadoop library for platform

I am trying to get started with Spark. I have Hadoop (3.3.1) and Spark (3.2.2) in my library. I have set SPARK_HOME, PATH, HADOOP_HOME and LD_LIBRARY_PATH to their …

CredentialProviderFactory - conf.getPassword from jceks file breaks if the password contains a $

I am trying to extract a password from a jceks file in HDFS. import org.apache.hadoop.security.alias.CredentialProviderFactory val conf = new org.apache.hadoo…
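
For context, the usual way to read a credential back out of a jceks store is Configuration.getPassword with the provider path set. A minimal sketch, assuming a hypothetical jceks location and alias:

    // Minimal sketch: read a credential from a jceks store via the Hadoop Configuration API.
    // The jceks path and the alias name below are hypothetical.
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.alias.CredentialProviderFactory

    object ReadJceksPassword {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()
        conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
          "jceks://hdfs/user/me/passwords.jceks")

        // getPassword returns the stored value as a char array, or null if the alias is missing.
        val password: Array[Char] = conf.getPassword("db.password.alias")
        println(if (password == null) "alias not found" else new String(password))
      }
    }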

Import MongoDB data into Hive Error: Splitter implementation is incompatible

I'm trying to import MongoDB data into Hive. The jar versions that I have used are: ADD JAR /root/HDL/mongo-java-driver-3.4.2.jar; ADD JAR /root/HDL/mongo-hado…

Alias yarn to yarnpkg to avoid conflict with Hadoop Yarn

I have Yarn (package manager) already installed on my machine, but I now have to install Apache Hadoop. When I tried doing that with brew install hadoop, I got …

HBase data export to S3

I am trying to export HBase table data (size ~23 TB) to S3, using HBase Export and passing S3 credentials via a jceks path. Command: hbase org.apache.hadoop…

Hive CBO: wrong results with a Hive SQL query with multiple IN conditions in the WHERE clause

I am running a SQL query in Hive and it gives different results with CBO enabled and disabled. The results are wrong when CBO is enabled (set hive.cbo.enable=…

Why is Spark 100 times faster than Hadoop MapReduce?

Why is Spark faster than Hadoop MapReduce? As per my understanding, if Spark is faster due to in-memory processing, then Hadoop also loads data into RAM, then …
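
A large part of the difference is that Spark can keep intermediate results cached in executor memory and reuse them across many operations, while a chain of MapReduce jobs writes each intermediate result back to HDFS between steps. An illustrative sketch (the input path is hypothetical):

    // Illustrative sketch: a cached Dataset is reused by several actions without
    // re-reading from disk, whereas chained MapReduce jobs would persist intermediate
    // results to HDFS between steps. The input path is a placeholder.
    import org.apache.spark.sql.SparkSession

    object CachingExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("caching-example").getOrCreate()

        val errors = spark.read.textFile("hdfs:///data/logs/*.log")
          .filter(_.contains("ERROR"))
          .cache()                       // keep the filtered data in executor memory

        println(errors.count())          // first action materializes and caches the data
        errors.limit(10).show()          // later actions reuse the in-memory copy

        spark.stop()
      }
    }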

Error while trying to create an external table in Hive

I am trying to create an external table using Hive with Hadoop, but somehow it fails. This is the error I get when I try to run my queries: 02:23:29.516 [Hive…

MapReduce Job Failed on Multi-Node Cluster

I'm new to Hadoop. I have to use MapReduce with WordCount, and I am getting some errors. I am running a 50 GB MapReduce job on a single server (8 GB RAM, 8 cores). It …

Error while installing Spark on Google Colab

I am getting an error while installing Spark on Google Colab. It says: tar: spark-2.2.1-bin-hadoop2.7.tgz: Cannot open: No such file or directory tar: Error …

Hive - double precision

I have been working with Hive and found something peculiar. Basically, when using DOUBLE as the datatype for a column, we need not specify any precision (…
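
For comparison, DOUBLE in Hive is a fixed 8-byte floating-point type and therefore takes no precision or scale, whereas DECIMAL(p, s) requires them. A small sketch, issued through spark.sql only to keep the example in Scala; the table name is hypothetical:

    // Illustrative sketch: DOUBLE takes no precision/scale, DECIMAL(p, s) does.
    // The table name is hypothetical; spark.sql is used here just to stay in Scala.
    import org.apache.spark.sql.SparkSession

    object DoubleVsDecimal {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("double-vs-decimal")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql(
          """CREATE TABLE IF NOT EXISTS demo_prices (
            |  price_double  DOUBLE,         -- no precision or scale accepted
            |  price_decimal DECIMAL(10, 2)  -- exact type with explicit precision and scale
            |)""".stripMargin)

        spark.sql("INSERT INTO demo_prices VALUES (0.1 + 0.2, 0.1 + 0.2)")
        spark.sql("SELECT * FROM demo_prices").show()

        spark.stop()
      }
    }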

How/where can I write time-series data? As Parquet to Hadoop, or to HBase or Cassandra?

I have real-time time-series sensor data. My primary goal is to keep the raw data, and I should do this so that the storage cost is minimal. My scenario is like this: …
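
One common low-cost layout for raw readings is append-only Parquet on HDFS or object storage, partitioned by sensor and day. A sketch under assumed column names and an assumed output path:

    // Illustrative sketch: store raw time-series readings as Parquet, partitioned by
    // sensor and day, which keeps storage cheap and lets queries prune by partition.
    // The schema, column names, and output path are hypothetical.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.to_date

    object StoreTimeSeries {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("store-time-series").getOrCreate()
        import spark.implicits._

        val readings = Seq(
          ("sensor-1", "2024-01-01 00:00:01", 21.5),
          ("sensor-1", "2024-01-01 00:00:02", 21.6),
          ("sensor-2", "2024-01-01 00:00:01", 19.8)
        ).toDF("sensor_id", "event_time", "value")
          .withColumn("event_date", to_date($"event_time"))

        readings.write
          .mode("append")                      // raw data is only ever appended
          .partitionBy("sensor_id", "event_date")
          .parquet("hdfs:///data/raw/sensor_readings")

        spark.stop()
      }
    }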

hdfs: command not found

I am using CentOS 7 and Hadoop 3.2.1. I have created a new user in Linux and copied the .bash_profile file from the master user to the new user. But when I try to run …

sqoop merge-key creating multiple part files instead of one, which defeats the purpose of using merge-key

Ideally, when we run an incremental import without merge-key, it will create a new file with the appended data set, but if we use merge-key then it will create a new whole data…

ERROR in DataNode execution while running Hadoop for the first time on Windows 10

I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all the files: hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml. Then I executed …