Category "hadoop"

Error while installing Spark on Google Colab

I am getting an error while installing Spark on Google Colab. It says tar: spark-2.2.1-bin-hadoop2.7.tgz: Cannot open: No such file or directory tar: Error
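
This tar error usually just means the download step failed before extraction, typically because older Spark releases have moved off the default mirrors. A minimal sketch of a Colab-friendly workaround, assuming archive.apache.org still hosts this release:

    import os, tarfile, urllib.request

    # Old releases live on archive.apache.org; a dead mirror link leaves no
    # .tgz behind, which is why tar reports "No such file or directory".
    url = ("https://archive.apache.org/dist/spark/spark-2.2.1/"
           "spark-2.2.1-bin-hadoop2.7.tgz")
    urllib.request.urlretrieve(url, "spark-2.2.1-bin-hadoop2.7.tgz")

    with tarfile.open("spark-2.2.1-bin-hadoop2.7.tgz") as tar:
        tar.extractall()

    # Point SPARK_HOME at the extracted directory so pyspark/findspark can find it.
    os.environ["SPARK_HOME"] = os.path.abspath("spark-2.2.1-bin-hadoop2.7")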

Hive - double precision

I have been working on Hive and found something peculiar. Basically, when using double as the datatype for a column, we need not specify any precision (
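
For context: Hive's DOUBLE is an 8-byte IEEE floating-point type, so it carries no precision or scale; DECIMAL is the type that does. A small sketch through PySpark's Hive support, with hypothetical table names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # DOUBLE: no (precision, scale) to declare.
    spark.sql("CREATE TABLE IF NOT EXISTS t_double (price DOUBLE)")

    # DECIMAL: precision and scale are explicit.
    spark.sql("CREATE TABLE IF NOT EXISTS t_decimal (price DECIMAL(10, 2))")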

How/where can I write time series data? As Parquet to Hadoop, or to HBase/Cassandra?

I have real-time time series sensor data. My primary goal is to keep the raw data, and to do so at minimal storage cost. My scenario is like th
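
One low-cost baseline for keeping raw sensor data is date-partitioned Parquet on HDFS: it compresses well and lets time-range queries skip most files. A minimal PySpark sketch, with a hypothetical schema and output path:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sensor-ingest").getOrCreate()

    # Hypothetical layout: one row per sensor reading.
    df = spark.createDataFrame(
        [("s1", "2020-01-01 00:00:00", 21.5)],
        ["sensor_id", "ts", "value"],
    )

    # Day-level partitions keep files coarse enough for HDFS while still
    # allowing cheap pruning on read.
    (df.withColumn("dt", F.to_date("ts"))
       .write.mode("append")
       .partitionBy("dt")
       .parquet("hdfs:///data/raw/sensors"))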

hdfs: command not found

I am using CentOS 7 and Hadoop 3.2.1. I have created a new user in Linux and copied the .bash_profile file from the master user to the new user. But when I try to run

sqoop merge-key creating multiple part files instead of one, which defeats the purpose of using merge-key

Ideally, when we run an incremental import without merge-key, it creates a new file with the appended data set; but if we use merge-key, it will create a new whole data
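
For reference, --merge-key applies to lastmodified incremental imports and is meant to reconcile old and new rows into one data set. A hedged sketch of the invocation pattern being described, wrapped in Python purely for illustration, with hypothetical connection details:

    import subprocess

    # Hypothetical connect string, table, and columns.
    subprocess.run([
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/mydb",
        "--table", "orders",
        "--target-dir", "/data/orders",
        "--incremental", "lastmodified",
        "--check-column", "updated_at",
        "--merge-key", "id",
    ], check=True)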

ERROR in datanode execution while running Hadoop first time in Windows 10

I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all the files: hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml. Then, I executed

localhost: ERROR: Cannot set priority of datanode process 2984

I set up and configured a multi-node Hadoop cluster. This error appears when I start it. My Ubuntu is 16.04 and Hadoop is 3.0.2: Starting namenodes on [master] Starting datanodes

What is the difference between -hivevar and -hiveconf?

From hive -h: --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable substitution to apply to hive
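
The short version: --hiveconf sets a configuration property for the session (also readable in queries as ${hiveconf:name}), while --hivevar defines a substitution variable referenced as ${var} or ${hivevar:var}. A sketch of both, invoked from Python with a hypothetical table name:

    import subprocess

    # --hivevar: plain text substitution inside the query.
    subprocess.run([
        "hive", "--hivevar", "tab=sales",
        "-e", "SELECT * FROM ${hivevar:tab} LIMIT 10",
    ], check=True)

    # --hiveconf: sets a session configuration property.
    subprocess.run([
        "hive", "--hiveconf", "hive.cli.print.header=true",
        "-e", "SELECT * FROM sales LIMIT 10",
    ], check=True)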

Gradle archive contains more than 65535 entries

I am integrating hadoop-2.5.0 for running a MapReduce job with the spring-boot-1.2.7 release, and I get an error while including this: 1) archive contains more than 65535

Iterate twice in MapReduce

I have written a Reducer job in which my key and value are composite. I need to iterate twice through the values and am therefore trying to cache the v

org.apache.hadoop.hbase.io.ImmutableBytesWritable exception in HBase

We tried to test the following example code for accessing HBase tables (Spark-1.3.1, HBase-1.1.1, Hadoop-2.7.0): import sys from pyspark import SparkContext
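
This exception typically surfaces because PySpark cannot serialize the ImmutableBytesWritable keys returned by the HBase InputFormat. The Spark 1.x examples route them through string converters shipped in the examples jar; a sketch under that assumption, with a hypothetical ZooKeeper quorum and table name:

    from pyspark import SparkContext

    sc = SparkContext(appName="hbase-read")

    conf = {
        "hbase.zookeeper.quorum": "localhost",    # hypothetical
        "hbase.mapreduce.inputtable": "mytable",  # hypothetical
    }

    # The converters turn ImmutableBytesWritable/Result into plain strings
    # that PySpark can pickle.
    rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters"
                     ".ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters"
                       ".HBaseResultToStringConverter",
        conf=conf,
    )
    print(rdd.take(1))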

Hive Error : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

I have got Twitter data on HDFS using Flume. I have a 3-node cluster and a MySQL metastore for Hive. When I execute the query below: select user_name.screen_name, user_n

Save Spark dataframe as dynamic partitioned table in Hive

I have a sample application that reads from CSV files into a dataframe. The dataframe can be stored to a Hive table in Parquet format using the method df.
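
The usual recipe is to enable dynamic partitioning on the session and let partitionBy drive the directory layout. A minimal sketch, assuming Hive support is enabled and using hypothetical column and table names:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dynamic-partitions")
             .enableHiveSupport()
             .getOrCreate())

    # Both settings must be in place before the write.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    df = spark.read.csv("input.csv", header=True, inferSchema=True)  # hypothetical input

    (df.write.mode("append")
       .format("parquet")
       .partitionBy("year")           # hypothetical partition column
       .saveAsTable("mydb.events"))   # hypothetical target table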

pickle.PicklingError: args[0] from __newobj__ args has the wrong class with hadoop python

I am trying to delete stop words via Spark; the code is as follows: from nltk.corpus import stopwords from pyspark.context import SparkContext from
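
A common cause here is that the closure captures NLTK's corpus reader object, which Spark's pickler cannot serialize. Materializing the words on the driver and broadcasting them is the usual workaround; a sketch assuming the 'english' stop-word list:

    from nltk.corpus import stopwords
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()

    # Build a plain Python set on the driver, then broadcast it; shipping
    # the corpus reader itself is what triggers the PicklingError.
    stops = sc.broadcast(set(stopwords.words("english")))

    words = sc.parallelize(["the", "quick", "brown", "fox"])
    print(words.filter(lambda w: w not in stops.value).collect())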

Apache Kafka cannot start multiple instances on same local machine

I am trying to set up Apache Kafka on my local machine to try it out following this official guide: https://kafka.apache.org/quickstart. However, when I tried

output/echo a message in hql / Hive query language

I need to create a hive.hql as follows. HIVE.hql: select * from table1; select * from table2; My question is: can I echo any message to my console, like " re

reading and writing from hive tables with spark after aggregation

We have a Hive warehouse and want to use Spark for various tasks (mainly classification), at times writing the results back as a Hive table. For example, we wr
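
A minimal round trip of this pattern in PySpark, with hypothetical database and table names:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Read from the warehouse, aggregate, and persist the result as a new table.
    events = spark.table("warehouse.events")
    counts = events.groupBy("label").agg(F.count("*").alias("n"))
    counts.write.mode("overwrite").saveAsTable("warehouse.label_counts")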

Hadoop configuration object not pointing to hdfs file system

I am trying to create a small Spark program in Java. I am creating a Hadoop configuration object as shown below: Configuration conf = new Configuration(false); con
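
The snippet in the question is Java, but the underlying issue is the same in any binding: the configuration must carry fs.defaultFS (or load core-site.xml) before bare paths resolve against HDFS rather than the local file system. A PySpark-flavored sketch, with a hypothetical namenode address:

    from pyspark.sql import SparkSession

    # spark.hadoop.* settings are forwarded into the Hadoop Configuration,
    # so paths like /data/example.txt resolve against HDFS, not file://.
    spark = (SparkSession.builder
             .config("spark.hadoop.fs.defaultFS", "hdfs://namenode:9000")
             .getOrCreate())

    df = spark.read.text("/data/example.txt")  # now read from HDFS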

Cannot connect to hive using beeline, user root cannot impersonate anonymous

I'm trying to connect to Hive using beeline with !connect jdbc:hive2://localhost:10000, and I'm being asked for a username and password: Connecting to jdbc:hive2://l

MySQL: How to run heavy analytical queries in real time

I am running a CRM application which uses a MySQL database. My application generates lots of data in MySQL. Now I want to give my customers a reporting section wh