I am trying to get started with Spark. I have Hadoop (3.3.1) and Spark (3.2.2) in my library. I have set SPARK_HOME, PATH, HADOOP_HOME, and LD_LIBRARY_PATH to their respective paths.
I am trying to extract a password from a jceks file in HDFS. import org.apache.hadoop.security.alias.CredentialProviderFactory val conf = new org.apache.hadoop.conf.Configuration()
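For reference, a minimal Scala sketch of the credential-provider lookup; the keystore path and alias name are assumptions, not taken from the question:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.alias.CredentialProviderFactory

val conf = new Configuration()
// Point Hadoop at the keystore in HDFS (path and alias are hypothetical).
conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
  "jceks://hdfs/user/me/creds.jceks")

// getPassword resolves the alias through the configured providers
// and returns a char[] (or null if the alias is absent).
val chars = conf.getPassword("my.password.alias")
val password = Option(chars).map(new String(_))
```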
I'm trying to import MongoDB data into Hive. The jar versions that I have used are: ADD JAR /root/HDL/mongo-java-driver-3.4.2.jar; ADD JAR /root/HDL/mongo-hado
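For context, a hedged Scala sketch of what such a session might look like over Hive JDBC with the mongo-hadoop storage handler; the connection URL, table name, columns, and Mongo URI are all assumptions:

```scala
import java.sql.DriverManager

// HiveServer2 URL, database, table, and Mongo URI are hypothetical;
// the Hive JDBC driver must be on the classpath.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()

// Register the connector jar for this session.
stmt.execute("ADD JAR /root/HDL/mongo-java-driver-3.4.2.jar")

// Declare a Mongo-backed external table via the storage handler
// shipped with the mongo-hadoop connector.
stmt.execute(
  """CREATE EXTERNAL TABLE mongo_users (id STRING, name STRING)
    |STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
    |WITH SERDEPROPERTIES (
    |  'mongo.columns.mapping' = '{"id":"_id","name":"name"}')
    |TBLPROPERTIES ('mongo.uri' = 'mongodb://localhost:27017/test.users')""".stripMargin)

conn.close()
```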
I have Yarn (package manager) already installed on my machine, but I now have to install Apache Hadoop. When I tried doing that with brew install hadoop, I got
I am trying to export HBase table data (size: 23 TB) to S3, using HBase Export and passing the S3 credentials via a jceks path. Command: hbase org.apache.hadoop.hbase.mapreduce.Export
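As a rough illustration of how the jceks path feeds S3A, a Scala sketch; the provider path is hypothetical, while the fs.s3a.* names are the standard aliases S3A looks up:

```scala
import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
// Same property a -D flag on the export command would set.
conf.set("hadoop.security.credential.provider.path",
  "jceks://hdfs/user/hbase/s3.jceks")

// S3A resolves its keys through getPassword, so the keystore must
// contain entries under exactly these aliases.
val accessKey = conf.getPassword("fs.s3a.access.key")
val secretKey = conf.getPassword("fs.s3a.secret.key")
println(s"access key present: ${accessKey != null}")
```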
I am running one SQL query in Hive and it gives different results with CBO enabled and disabled. The results are wrong when CBO is enabled (set hive.cbo.enable=true).
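One way to narrow this down is to capture the plan under both settings and diff them; a Scala sketch over Hive JDBC, with a placeholder query, tables, and connection URL:

```scala
import java.sql.DriverManager

// URL and the t1/t2 query below are hypothetical stand-ins.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val stmt = conn.createStatement()

// Run EXPLAIN on the same query with the optimizer on and off,
// then compare the two plans to see what CBO rewrites.
for (cbo <- Seq("true", "false")) {
  stmt.execute(s"set hive.cbo.enable=$cbo")
  println(s"--- hive.cbo.enable=$cbo ---")
  val rs = stmt.executeQuery(
    "EXPLAIN SELECT t1.id, COUNT(*) FROM t1 JOIN t2 ON t1.id = t2.id GROUP BY t1.id")
  while (rs.next()) println(rs.getString(1))
}
conn.close()
```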
Why is Spark faster than Hadoop MapReduce? As per my understanding, if Spark is faster due to in-memory processing, then Hadoop also loads data into RAM, so why is Spark faster?
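The usual answer is not that Spark merely reads into RAM, but that it can keep intermediate results there across stages and jobs, whereas a chain of MapReduce jobs writes each intermediate result back to HDFS. A small illustrative sketch:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cache-demo").master("local[*]").getOrCreate()

val nums = spark.sparkContext.parallelize(1 to 1000000)

// cache() pins the computed partitions in executor memory, so the
// second action rereads them from RAM instead of recomputing the
// lineage; an equivalent two-job MapReduce pipeline would round-trip
// the intermediate data through HDFS.
val squares = nums.map(n => n.toLong * n).cache()
println(squares.sum())
println(squares.max())

spark.stop()
```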
I am trying to create an external table using Hive with Hadoop, but somehow it failed. These are the errors I get when I try to run my queries: 02:23:29.516 [Hive
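For comparison, a minimal working shape of an external-table DDL, issued here from Scala with Hive support enabled; the table name, schema, and location are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("external-table")
  .enableHiveSupport()
  .getOrCreate()

// An external table only registers metadata: the data stays at
// LOCATION and survives a DROP TABLE.
spark.sql(
  """CREATE EXTERNAL TABLE IF NOT EXISTS ext_events (id INT, payload STRING)
    |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    |STORED AS TEXTFILE
    |LOCATION 'hdfs:///user/me/events'""".stripMargin)
```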
I'm new to Hadoop. I have to use MapReduce with WordCount, and I am getting some errors. I am running a 50 GB MapReduce job on a single server (8 GB RAM, 8 cores). It
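For reference, the canonical WordCount shape against the MapReduce API, written here in Scala; the class names are mine:

```scala
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Mapper, Reducer}

class TokenizerMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()

  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
      word.set(w)
      ctx.write(word, one) // emit (word, 1) for every token
    }
}

class IntSumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    values.forEach(v => sum += v.get()) // add up the 1s for this word
    ctx.write(key, new IntWritable(sum))
  }
}
```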
I am getting an error while installing Spark on Google Colab. It says: tar: spark-2.2.1-bin-hadoop2.7.tgz: Cannot open: No such file or directory tar: Error
I have been working on Hive and found something peculiar. Basically, when using DOUBLE as a datatype for a column, we need not specify any precision (unlike DECIMAL, which takes one).
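That matches the type system: DOUBLE is a fixed-width approximate floating type with no parameters, while DECIMAL is the parameterized exact type. A small sketch, issued from Scala with Hive support; table names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("double-vs-decimal").enableHiveSupport().getOrCreate()

// DOUBLE takes no precision/scale; DECIMAL(precision, scale) does.
spark.sql("CREATE TABLE t_double  (amount DOUBLE)")
spark.sql("CREATE TABLE t_decimal (amount DECIMAL(10,2))")

// DOUBLE is a binary float, so arithmetic can drift;
// DECIMAL keeps exact scaled values.
spark.sql("SELECT CAST(0.1 AS DOUBLE) + CAST(0.2 AS DOUBLE)").show()
// -> 0.30000000000000004
spark.sql("SELECT CAST(0.1 AS DECIMAL(10,2)) + CAST(0.2 AS DECIMAL(10,2))").show()
// -> 0.30
```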
I have real-time time-series sensor data. My primary goal is to keep the raw data, and I need to do this so that the cost of storage is minimal. My scenario is like this:
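One common pattern for cheap raw retention is columnar files with heavy compression, partitioned by time; a Scala sketch with a hypothetical schema and output path:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("raw-archive").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sensor readings: (sensor id, epoch millis, value, day).
val readings = Seq(
  ("s1", 1700000000000L, 21.5, "2023-11-14"),
  ("s2", 1700000060000L, 22.1, "2023-11-14")
).toDF("sensor_id", "ts", "value", "dt")

// Partitioning by day keeps scans cheap; gzip trades CPU for a
// smaller footprint than the default snappy codec.
readings.write
  .partitionBy("dt")
  .option("compression", "gzip")
  .mode("append")
  .parquet("hdfs:///archive/raw_sensor")
```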
I am using CentOS 7 and Hadoop 3.2.1. I have created a new user in Linux. I copied the .bash_profile file from the master user to my new user. But when I try to run
Ideally, when we run an incremental import without a merge-key, it will create a new file with the appended data set, but if we use a merge-key then it will create a whole new data set
I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all the files: hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml. Then, I executed
I set up and configured a multi-node Hadoop cluster. The following appears when I start it (my Ubuntu is 16.04 and Hadoop is 3.0.2): Starting namenodes on [master] Starting datanodes
From hive -h : --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable substitution to apply to hive
I am integrating Hadoop 2.5.0 for running a MapReduce job with Spring Boot 1.2.7.RELEASE, and I get an error while including it: 1) archive contains more than 65535 entries
I have written a Reducer job in which my key and value are composite. I have a requirement to iterate twice through the values, and hence I am trying to cache the values
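One relevant gotcha: the reducer's Iterable can only be traversed once, and Hadoop reuses the same Writable instance across the iteration, so caching requires deep copies. A Scala sketch with plain Text types standing in for the composite ones:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Reducer
import scala.collection.mutable.ArrayBuffer

class TwoPassReducer extends Reducer[Text, Text, Text, Text] {
  override def reduce(key: Text, values: java.lang.Iterable[Text],
                      ctx: Reducer[Text, Text, Text, Text]#Context): Unit = {
    // The framework recycles one Text object across the iterator,
    // so every element must be copied before it is cached.
    val cached = ArrayBuffer.empty[Text]
    values.forEach(v => cached += new Text(v))

    val total = cached.size                // first pass over the cache
    cached.foreach { v =>                  // second pass over the cache
      ctx.write(key, new Text(s"${v.toString}/$total"))
    }
  }
}
```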
We tried to test the following example code for accessing HBase tables (Spark-1.3.1, HBase-1.1.1, Hadoop-2.7.0): import sys from pyspark import SparkContext
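For comparison, a Scala sketch of the same read path, which is usually easier to get working before adding the Python converters; the table name is hypothetical:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hbase-read"))

// Standard TableInputFormat scan over one table (name is hypothetical).
val hconf = HBaseConfiguration.create()
hconf.set(TableInputFormat.INPUT_TABLE, "test_table")

val rdd = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])

// Each element is a (row key, Result) pair; count() is a cheap smoke test.
println(rdd.count())
```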