Category "hadoop"

Why is Spark 100 times faster than Hadoop MapReduce?

Why is Spark faster than Hadoop MapReduce? As per my understanding, if Spark is faster due to in-memory processing, then Hadoop also loads data into RAM, then i

Error while trying to create external table in hive

I am trying to create an external table using Hive with Hadoop, but it fails. These are the errors I get when I try to run my queries. 02:23:29.516 [Hive

MapReduce Job Failed on MultiNode

I'm new to Hadoop. I have to use 'MapReduce' with WordCount. I am getting some errors. I am running a 50 GB 'MapReduce' job on a single server (8 GB, 8 cores). It

Error while installing Spark on Google Colab

I am getting an error while installing Spark on Google Colab. It says: tar: spark-2.2.1-bin-hadoop2.7.tgz: Cannot open: No such file or directory tar: Error

Hive - double precision

I have been working on Hive and found something peculiar. Basically, while using double as a datatype for a column, we need not have any precision specified (

How/Where can I write time series data? As Parquet format to Hadoop, or HBase, Cassandra?

I have real-time time series sensor data. My primary goal is to keep the raw data. I should do this so that the cost of storage is minimal. My scenario like th

hdfs: command not found

I am using CentOS 7 and Hadoop 3.2.1. I have created a new user in Linux. I copied the .bash_profile file from the master user to my new user. But when I try to run

sqoop merge-key creating multiple part files instead of one which doesn't serve the purpose of using merge-key

Ideally, when we run an incremental import without merge-key, it will create a new file with the appended data set, but if we use merge-key then it will create new whole data

ERROR in datanode execution while running Hadoop first time in Windows 10

I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all the files: hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml. Then, I executed

localhost: ERROR: Cannot set priority of datanode process 2984

I set up and configured a multi-node Hadoop cluster. The error appears when I start it. My Ubuntu is 16.04 and Hadoop is 3.0.2. Starting namenodes on [master] Starting datanodes

What is the difference between -hivevar and -hiveconf?

From hive -h : --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable subsitution to apply to hive

Gradle archive contains more than 65535 entries

I am integrating hadoop2.5.0 for running a MapReduce job with the spring-boot-1.2.7 release and getting errors while including this: 1) archive contains more than 65535

Iterate Twice in MapReduce

I have written a Reducer job in which my key and value are composite. I have a requirement of iterating twice through the values and hence trying to cache the v

org.apache.hadoop.hbase.io.ImmutableBytesWritable exception in HBase

We tried to test the following example code for accessing HBase tables (Spark-1.3.1, HBase-1.1.1, Hadoop-2.7.0): import sys from pyspark import SparkContext

Hive Error : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

I have got Twitter data using Flume on HDFS. I have a 3-node cluster and a MySQL metastore for Hive. When I execute the below query select user_name.screen_name, user_n

Save Spark dataframe as dynamic partitioned table in Hive

I have a sample application working to read from CSV files into a dataframe. The dataframe can be stored to a Hive table in Parquet format using the method df.

pickle.PicklingError: args[0] from newobj args has the wrong class with hadoop python

I am trying to delete stop words via Spark. The code is as follows: from nltk.corpus import stopwords from pyspark.context import SparkContext from

Apache Kafka cannot start multiple instances on same local machine

I am trying to set up Apache Kafka on my local machine to try it out following this official guide: https://kafka.apache.org/quickstart. However, when I tried

Output/echo a message in HQL (Hive Query Language)

I need to create a hive.hql as follows. HIVE.hql: select * from tabel1; select * from table2; My question is: can I echo any message to my console like " re

Reading and writing from Hive tables with Spark after aggregation

We have a Hive warehouse and wanted to use Spark for various tasks (mainly classification). At times we write the results back as a Hive table. For example, we wr
