Category "hadoop"

How to tail YARN logs?

I am submitting a Spark job using the command below. I want to tail the YARN log by application ID, similar to the tail command on a Linux box. export SPARK

How to do CopyMerge in Hadoop 3.0?

I know Hadoop 2.7's FileUtil has a copyMerge function that merges multiple files into a new one. But the copyMerge function is no longer supported pe
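To make the merge behaviour concrete, here is a minimal local-filesystem sketch in Python. It is an analogue, not Hadoop's API: the name `copy_merge` and its parameters are hypothetical, loosely modelled on the Hadoop 2.x FileUtil.copyMerge idea of concatenating every file in a directory in sorted name order and optionally writing a separator string after each file. A real HDFS replacement would go through the Hadoop FileSystem API rather than `os`/`shutil`.

```python
import os
import shutil
from typing import Optional


def copy_merge(src_dir: str, dst_file: str, add_string: Optional[str] = None,
               delete_source: bool = False) -> bool:
    """Concatenate every regular file in src_dir into dst_file.

    Hypothetical local sketch: files are merged in sorted name order,
    add_string (if given) is written after each file, and src_dir is
    removed afterwards when delete_source is True. dst_file should not
    live inside src_dir.
    """
    if not os.path.isdir(src_dir):
        return False
    # Sorted order keeps part-00000, part-00001, ... in sequence.
    names = sorted(
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    )
    with open(dst_file, "wb") as out:
        for name in names:
            with open(os.path.join(src_dir, name), "rb") as part:
                shutil.copyfileobj(part, out)
            if add_string is not None:
                out.write(add_string.encode("utf-8"))
    if delete_source:
        shutil.rmtree(src_dir)
    return True
```

Sorting by file name is an assumption that matches the usual `part-NNNNN` naming of job output; if the parts need a different order, the `sorted` call is the place to change it.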

Hortonworks HA NameNodes give the error "Operation category READ is not supported in state standby"

The active NameNode (host1) of my Hadoop HA cluster suddenly switched to the standby NameNode (host2). I could not find any error in the Hadoop logs (on any server) to identify

Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS?

Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS? If I use the Spark standalone cluster manager and have my data distributed in HDFS c

Spark Elasticsearch: Multiple ES-Hadoop versions detected in the classpath

I'm new to Spark. I'm trying to run a Spark job that loads data into Elasticsearch. I've built a fat jar from my code and used it with spark-submit. spark-subm

What runs first: the partitioner or the combiner?

I was wondering which runs first, the partitioner or the combiner. I was of the opinion that the partitioner runs first, then the combiner, and then the keys are red
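As a simplified single-process model of the map side of a Hadoop job, the Python sketch below tags each map output record with its partition as it is collected, sorts by (partition, key) as happens when the buffer is spilled, and only then runs the combiner within each partition. This reflects my understanding of the classic map output buffer, not Hadoop source code; the function names and the byte-sum partitioner are hypothetical stand-ins. In real jobs the combiner may also run again when spill files are merged.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]


def hash_partition(key: str, num_partitions: int) -> int:
    # Hypothetical deterministic partitioner (Python's built-in str hash is
    # randomized per process, so a byte sum is used instead).
    return sum(key.encode("utf-8")) % num_partitions


def map_side(records: Iterable[KV], num_partitions: int,
             combiner: Callable[[str, List[int]], int]) -> Dict[int, List[KV]]:
    # 1. Partitioner: every map output record gets its partition on collect.
    tagged = [(hash_partition(k, num_partitions), k, v) for k, v in records]
    # 2. Sort by (partition, key), as when the in-memory buffer is spilled.
    tagged.sort(key=lambda t: (t[0], t[1]))
    # 3. Combiner: applied to the sorted records separately in each partition.
    grouped: Dict[int, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for part, key, value in tagged:
        grouped[part][key].append(value)
    return {
        part: [(key, combiner(key, values)) for key, values in keys.items()]
        for part, keys in grouped.items()
    }
```

Because the combiner only ever sees records that already share a partition, it can never merge values that would be sent to different reducers.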

Why does Hadoop choose MapReduce as its computing engine?

I know MapReduce (MR) is one of the three core frameworks of Hadoop, and I am familiar with its mapper-shuffle-reducer process. My question can be separated int
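For reference, the mapper-shuffle-reducer flow mentioned here can be modelled in a few lines of single-process Python using word count, the usual introductory example. This is only a sketch of the data flow under that simplification; it leaves out everything that makes Hadoop's engine distributed (input splits, spilling to disk, network transfer, task retries), and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every word in one input line.
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle phase: group all values emitted for the same key, sorted by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce phase: aggregate the grouped values for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, vs) for k, vs in shuffle(mapped).items())
```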

Access Hive Data from Java

I need to access the data in Hive from Java. According to the documentation for the Hive JDBC driver, the current JDBC driver can only be used to query data from def

Unable to resolve Grub Rescue issue on Ubuntu despite a lot of research [closed]

I have Windows 7 Home Premium 64-bit installed on my Dell laptop. Recently I installed Ubuntu 16.04.3 LTS on a VMware instance to learn H

Avro Gradle plugin sample usage

I am trying to use the avro-gradle-plugin on GitHub, but have not had any luck getting it to work. Does anyone have any sample code on how they get it to wo

Iterate twice through values in a Hadoop Reducer

I read in a couple of places that the only way to iterate twice through the values in a Reducer is to cache those values. But there is also a limitation that all the
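The caching approach can be sketched in Python. The sketch assumes Hadoop's behaviour of reusing a single Writable instance across the reducer's values iterator, which is why a cache must store copies of the values rather than references to the iterated objects. `MutableInt` and the helper functions are hypothetical stand-ins, and the cache keeps every value for a key in memory.

```python
from typing import Iterable, Iterator, List


class MutableInt:
    """Hypothetical stand-in for a reusable Writable such as IntWritable."""

    def __init__(self, value: int = 0) -> None:
        self.value = value


def reusing_values(raw: List[int]) -> Iterator[MutableInt]:
    # Mimics a one-shot reducer iterator that hands back the SAME object
    # every time, overwriting its contents before each yield.
    holder = MutableInt()
    for v in raw:
        holder.value = v
        yield holder


def cache_values(values: Iterable[MutableInt]) -> List[int]:
    # Copy each value out before the iterator overwrites the shared object.
    return [v.value for v in values]


def above_mean(values: Iterable[MutableInt]) -> List[int]:
    cached = cache_values(values)             # drain the one-shot iterator once
    mean = sum(cached) / len(cached)          # first pass over the cache
    return [v for v in cached if v > mean]    # second pass over the cache
```

Caching the objects themselves (for example `list(reusing_values(...))`) would give a list of references to one object holding only the last value, which is the usual pitfall with this pattern.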

Spark-submit not working when application jar is in hdfs

I'm trying to run a Spark application using bin/spark-submit. When I reference my application jar on my local filesystem, it works. However, when I copied m

Spark read from S3 working, but I am unable to write using the same session [duplicate]

I am using a PySpark test script to read and write files to S3. Here is how I initialize the Spark session: import findspark from pyspark.sql

What is the difference between a partition and a replica of a topic in a Kafka cluster?

What is the difference between a partition and a replica of a topic in a Kafka cluster? I mean, both store copies of the messages in a topic. Then what is the real diffre
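A toy Python model may make the distinction concrete: partitioning splits a topic's messages across several independent logs, while replication keeps identical copies of each partition's log on different brokers (the first broker in each replica list standing in for the leader). The byte-sum key hash and the round-robin replica placement are simplifications chosen for this example; Kafka's actual default partitioner and replica assignment differ, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Partition:
    # One ordered log holding a distinct subset of the topic's messages.
    messages: List[str] = field(default_factory=list)


@dataclass
class Topic:
    num_partitions: int
    replication_factor: int
    brokers: List[int]
    partitions: List[Partition] = field(init=False)
    # partition index -> brokers holding a copy (first entry ~ leader)
    assignment: Dict[int, List[int]] = field(init=False)

    def __post_init__(self) -> None:
        if self.replication_factor > len(self.brokers):
            raise ValueError("replication factor cannot exceed broker count")
        self.partitions = [Partition() for _ in range(self.num_partitions)]
        # Round-robin placement: replicas of one partition land on different brokers.
        n = len(self.brokers)
        self.assignment = {
            p: [self.brokers[(p + r) % n] for r in range(self.replication_factor)]
            for p in range(self.num_partitions)
        }

    def send(self, key: str, message: str) -> int:
        # Partitioning SPLITS data: each message goes to exactly one partition.
        p = sum(key.encode("utf-8")) % self.num_partitions
        self.partitions[p].messages.append(message)
        return p

    def broker_view(self, broker: int) -> Dict[int, List[str]]:
        # Replication COPIES data: every replica of partition p holds the same log.
        return {
            p: list(self.partitions[p].messages)
            for p, replicas in self.assignment.items()
            if broker in replicas
        }
```

In this model, adding partitions increases how many messages can be spread across brokers, while raising the replication factor only increases how many brokers hold each partition.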

Hive queries fail when hive.execution.engine is set to MR but work when it is set to Tez?

I am using the HDP 2.1 sandbox for my work. The version of Hive, as listed by the jar file, is: hive-exec-0.13.0.2.1.1.0-385.jar. I have created a directory in HDFS

Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z

I am trying to run an MR program (version 2.7) on Windows 7 64-bit in Eclipse, and the above exception occurs while it runs. I verified that I am using the 64-bit 1.8 Java versi

Uploading files to Hadoop HDFS?

Hello everyone, I'm new to using Hadoop. It is for my college work, so I am doing some research. I have installed hadoop-2.7.3 and I'm unable to find the path where sho

Need to load data from Hadoop to Druid after applying transformations. If I use Spark, can we load data from a Spark RDD or DataFrame into Druid directly?

I have data in Hive tables. I want to apply a bunch of transformations before loading that data into Druid. There are ways to do this, but I'm not sure about those

Spark Streaming "ERROR JobScheduler: error in job generator"

I built a Spark Streaming application that continuously receives messages from Kafka and then writes them into an HBase table. This app runs pretty well for the first 25 mins

java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory

I installed Hadoop-0.20.2 on Windows using Cygwin. If I run $ bin/hadoop version Hadoop 0.20.2 Subversion https://svn.apache.org/repos/asf/hadoop/common/branch
