Between the partitioner and the combiner, which runs first? I was of the opinion that the partitioner runs first, then the combiner, and then the keys are red
I know MapReduce (MR) is one of the three core frameworks of Hadoop, and I am familiar with its mapper-shuffle-reducer process. My question can be separated int
I need to access the data in Hive from Java. According to the documentation for the Hive JDBC driver, the current JDBC driver can only be used to query data from def
I have Windows 7 Home Premium 64-bit installed on my Dell laptop. Recently I installed Ubuntu 16.04.3 LTS on a VMware instance to learn H
I am trying to use the avro-gradle-plugin on GitHub, but have not had any luck getting it to work. Does anyone have sample code showing how they get it to wo
I read in a couple of places that the only way to iterate twice through the values in a Reducer is to cache those values. But there is also a limitation that all the
I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar on my local filesystem, it works. However, when I copied m
I am using a PySpark test script to read files from and write files to S3. Here is how I initialize the Spark session: import findspark from pyspark.sql
What is the difference between a partition and a replica of a topic in a Kafka cluster? Both seem to store copies of the messages in a topic, so what is the real diffre
I am using the HDP 2.1 sandbox for my work. The Hive version, as listed by the jar file, is hive-exec-0.13.0.2.1.1.0-385.jar. I have created a directory in HDFS
I am trying to run an MR program (version 2.7) on Windows 7 64-bit in Eclipse, and while running it the above exception occurs. I verified that I am using the 64-bit 1.8 Java versi
Hello everyone, I'm new to using Hadoop. It is for my college work, so I am doing some research. I have installed hadoop-2.7.3 and I'm unable to find the path where sho
I have data in Hive tables. I want to apply a bunch of transformations before loading that data into Druid. There are ways to do this, but I'm not sure about those
I built a Spark Streaming application that continuously receives messages from Kafka and writes them into an HBase table. This app runs pretty well for the first 25 mins
I installed Hadoop-0.20.2 on Windows using Cygwin. If I run $ bin/hadoop version, the output is: Hadoop 0.20.2 Subversion https://svn.apache.org/repos/asf/hadoop/common/branch
In Hive, you can use the function named_struct to create a list of key-value pairs; the keys are usually the column names and the values are the values i
I connected 3 data nodes to my cluster (passwordless login works on all of them), and they are working fine, but when I try to connect another data node