Category "hadoop"

localhost: ERROR: Cannot set priority of datanode process 2984

I set up and configured a multi-node Hadoop cluster. This error appears when I start it. My Ubuntu is 16.04 and Hadoop is 3.0.2. Starting namenodes on [master] Starting datanodes

What is the difference between -hivevar and -hiveconf?

From hive -h : --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable subsitution to apply to hive

Gradle archive contains more than 65535 entries

I am integrating Hadoop 2.5.0 (to run a MapReduce job) with the Spring Boot 1.2.7 release, and I get an error when including it: 1) archive contains more than 65535

Iterate Twice in Map reduce

I have written a Reducer job in which my key and value are composite. I need to iterate twice through the values and hence am trying to cache the v
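A minimal sketch of the caching idea in this question, written in plain Python rather than against the Hadoop API. It assumes the values arrive through a single-pass iterator that reuses one object for every element (Hadoop's reduce-side iterator behaves this way), so each value has to be copied while it is cached. The names `reused_values`, `two_pass_mean_deviation`, and the `amount` field are hypothetical and exist only for illustration.

```python
import copy


def reused_values(records):
    """Yield every record through ONE shared holder, mimicking object reuse."""
    holder = {}
    for record in records:
        holder.clear()
        holder.update(record)
        yield holder  # same object each time; only its contents change


def two_pass_mean_deviation(values):
    """Cache copies of the values, then iterate over the cache twice."""
    # The iterator can be consumed only once, so copy each value as it arrives.
    cached = [copy.deepcopy(v) for v in values]
    total = sum(v["amount"] for v in cached)        # first pass
    mean = total / len(cached)
    return [v["amount"] - mean for v in cached]     # second pass
```

Caching without the copy (for example `list(reused_values(records))`) would store the same object repeatedly, so every cached entry would show the last record's contents. The cache also holds all values for one key in memory, which is only reasonable when a key's value list is small.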

org.apache.hadoop.hbase.io.ImmutableBytesWritable exception in HBase

We tried to test the following example code for accessing HBase tables (Spark-1.3.1, HBase-1.1.1, Hadoop-2.7.0): import sys from pyspark import SparkContext

Hive Error : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

I collected Twitter data on HDFS using Flume. I have a 3-node cluster and a MySQL metastore for Hive. When I execute the query below: select user_name.screen_name, user_n

Save Spark dataframe as dynamic partitioned table in Hive

I have a sample application working to read from csv files into a dataframe. The dataframe can be stored to a Hive table in parquet format using the method df.

pickle.PicklingError: args[0] from __newobj__ args has the wrong class with hadoop python

I am trying to delete stop words via Spark; the code is as follows: from nltk.corpus import stopwords from pyspark.context import SparkContext from

Apache Kafka cannot start multiple instances on same local machine

I am trying to set up Apache Kafka on my local machine to try it out following this official guide: https://kafka.apache.org/quickstart. However, when I tried

Output/echo a message in HQL (Hive query language)

I need to create a hive.hql as follows. HIVE.hql: select * from tabel1; select * from table2; My question is: can I echo any message to my console like " re

Reading and writing from Hive tables with Spark after aggregation

We have a Hive warehouse and want to use Spark for various tasks (mainly classification), at times writing the results back as a Hive table. For example, we wr

Hadoop configuration object not pointing to hdfs file system

I am trying to create a small Spark program in Java. I am creating a Hadoop configuration object as shown below: Configuration conf = new Configuration(false); con

Cannot connect to hive using beeline, user root cannot impersonate anonymous

I'm trying to connect to hive using beeline !connect jdbc:hive2://localhost:10000 and I'm being asked for a username and password Connecting to jdbc:hive2://l

MySQL: How to run heavy analytical queries in real time

I am running a CRM application that uses a MySQL database. My application generates lots of data in MySQL. Now I want to give my customers a reporting section wh

How to tail yarn logs?

I am submitting a Spark job using the command below. I want to tail the YARN log by application ID, similar to how the tail command works on a Linux box. export SPARK

How to do CopyMerge in Hadoop 3.0?

I know hadoop version 2.7's FileUtil has the copyMerge function that merges multiple files into a new one. But the copyMerge function is no longer supported pe
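As a rough illustration of what a copy-merge does, the sketch below concatenates every file in a directory, in name order, into one output file. It is a local-filesystem stand-in written in plain Python, not the Hadoop `FileUtil` API and not an HDFS client; the function name `copy_merge_local` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_merge_local(src_dir, dst_file):
    """Concatenate all regular files in src_dir (sorted by name) into dst_file.

    Returns the number of files merged. dst_file should live outside src_dir
    so that a previous output is not picked up as an input on a rerun.
    """
    # List the inputs before the output file is created.
    parts = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    with open(dst_file, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # stream bytes; no full read into memory
    return len(parts)
```

Sorting by name keeps outputs such as `part-00000`, `part-00001` in their natural order; whether that order matters depends on the job that produced the files.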

Hortonworks HA Namenodes gives an error "Operation category READ is not supported in state standby"

My Hadoop cluster's HA active namenode (host1) suddenly switched to the standby namenode (host2). I could not find any error in the Hadoop logs (on any server) to identify

Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS?

Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS? If I use Spark standalone cluster manager and have my data distributed in HDFS c

spark elasticsearch: Multiple ES-Hadoop versions detected in the classpath

I'm new to spark. I'm trying to run a spark job that loads data to elasticsearch. I've built a fat jar from my code and used it during spark-submit. spark-subm

What runs first: the partitioner or the combiner?

I was wondering which runs first, the partitioner or the combiner. I was of the opinion that it is the partitioner first and then the combiner, and then the keys are red
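The ordering described in this question can be sketched as a small pure-Python simulation of the map side. It is not Hadoop code. It assumes the commonly described sequence in which each map output record is assigned a partition as it is collected, records are then sorted within each partition, and the combiner runs on that sorted data. The functions `partition` and `map_side` are hypothetical, and the combiner here is a simple per-key sum.

```python
from collections import defaultdict


def partition(key, num_reducers):
    """Deterministic stand-in for a hash partitioner (sum of the key's bytes)."""
    return sum(key.encode("utf-8")) % num_reducers


def map_side(pairs, num_reducers):
    """Partition records, then sort and combine within each partition."""
    buckets = defaultdict(list)
    for key, value in pairs:                    # 1. partition every record
        buckets[partition(key, num_reducers)].append((key, value))
    combined = {}
    for part, records in buckets.items():
        sums = defaultdict(int)
        for key, value in sorted(records):      # 2. sort within the partition
            sums[key] += value                  # 3. combine: sum values per key
        combined[part] = dict(sums)
    return combined
```

Because the combiner only ever sees records from one partition, it can shrink the data sent to each reducer without changing which reducer a key goes to.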