Category "hadoop"

What runs first: the partitioner or the combiner?

Between the partitioner and the combiner, which runs first? I was of the opinion that it is the partitioner first, then the combiner, and then the keys are red

Why does Hadoop choose MapReduce as its computing engine?

I know MapReduce (MR) is one of the three core frameworks of Hadoop, and I am familiar with its mapper-shuffle-reducer process. My question can be separated int

Access Hive Data from Java

I need to access the data in Hive from Java. According to the documentation for the Hive JDBC Driver, the current JDBC driver can only be used to query data from def

Unable to resolve Grub Rescue issue on Ubuntu despite a lot of research [closed]

I have Windows 7 Home Premium 64-bit installed on my Dell laptop. I recently installed Ubuntu 16.04.3 LTS on a VMware instance to learn H

Avro Gradle plugin sample usage

I am trying to use the avro-gradle-plugin on GitHub, but have not had any luck getting it to work. Does anyone have any sample code on how they get it to wo

Iterate twice through values in Reducer Hadoop

I read in a couple of places that the only way to iterate twice through values in a Reducer is to cache those values. But there is also a limitation that all the

Spark-submit not working when application jar is in HDFS

I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar on my local filesystem, it works. However, when I copied m

Spark read from S3 working, but I am unable to write using the same session [duplicate]

I am using a pyspark test script to read and write files to S3. Here is how I initialize the spark-session: import findspark from pyspark.sql

What is the difference between a partition and a replica of a topic in a Kafka cluster?

What is the difference between a partition and a replica of a topic in a Kafka cluster? I mean, both store copies of the messages in a topic. Then what is the real differe

Hive queries fail when hive.execution.engine is set to MR but work when it is set to Tez?

I am using the HDP 2.1 sandbox for my work. The version of Hive as listed by the jar file is: hive-exec-0.13.0.2.1.1.0-385.jar. I have created a directory in HDFS

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

I am trying to run an MR program (version 2.7) on Windows 7 64-bit in Eclipse; while running it, the above exception occurs. I verified that using 64-bit 1.8 Java versi

Uploading files to Hadoop HDFS?

Hello everyone, I'm new to using Hadoop. It is for my college work, so I am doing some research. I have installed hadoop-2.7.3 and I'm unable to find the path where sho

Need to load data from Hadoop to Druid after applying transformations. If I use Spark, can we load data from Spark RDD or dataframe to Druid directly?

I have data present in Hive tables. I want to apply a bunch of transformations before loading that data into Druid. So there are ways, but I'm not sure about those

Spark Streaming "ERROR JobScheduler: error in job generator"

I built a Spark Streaming application that keeps receiving messages from Kafka and then writes them into an HBase table. This app runs pretty well for the first 25 mins

java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

I installed Hadoop-0.20.2 on Windows using Cygwin. If I run $ bin/hadoop version Hadoop 0.20.2 Subversion https://svn.apache.org/repos/asf/hadoop/common/branch

Use named_struct function in Hive with all the columns of a table

In Hive, you can use the named_struct function to create a list of key-value pairs; the keys are usually the column names and the values are the values i

Why is passwordless SSH not working?

I connected 3 data nodes in my cluster (passwordless SSH is working fine on all of these data nodes), and they are working fine, but when I try to connect another data node