I am trying to run Hadoop 3.1.1 on my Windows 10 machine. I modified all the files: hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml. Then, I executed
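For reference, a minimal sketch of the values these files typically need for a single-node setup; the port and the Windows storage paths below are illustrative assumptions, not taken from the question:

```xml
<!-- core-site.xml: default filesystem URI (port 9000 is an assumed convention) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: single-node replication and local storage dirs (paths are hypothetical) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///C:/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///C:/hadoop/data/datanode</value>
  </property>
</configuration>
```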
I set up and configured a multi-node Hadoop cluster. The following will appear when I start it. My Ubuntu is 16.04 and Hadoop is 3.0.2: Starting namenodes on [master] Starting datanodes
From hive -h: --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable substitution to apply to hive
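A minimal sketch of how the two flags differ in practice (the variable, table, and column names are made up):

```sql
-- invoked as: hive --hivevar ds=2019-01-01 --hiveconf hive.cli.print.header=true -f query.hql
-- --hivevar values are referenced through the hivevar: namespace:
SELECT * FROM events WHERE dt = '${hivevar:ds}';
-- --hiveconf values set Hive configuration properties and are readable via hiveconf::
SELECT '${hiveconf:hive.cli.print.header}';
```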
I am integrating Hadoop 2.5.0 (for running a MapReduce job) with the Spring Boot 1.2.7 release, and I am getting an error while including it: 1) archive contains more than 65535
I have written a Reducer job in which my key and value are composite. I need to iterate twice through the values and am hence trying to cache the v
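One well-known pitfall when caching reducer values: Hadoop reuses the same Writable instance across the values iterator, so each value must be deep-copied before being stored. A minimal sketch, where CompositeKey and CompositeValue are hypothetical stand-ins for the question's composite Writable types:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapreduce.Reducer;

public class TwoPassReducer
        extends Reducer<CompositeKey, CompositeValue, CompositeKey, CompositeValue> {

    @Override
    protected void reduce(CompositeKey key, Iterable<CompositeValue> values, Context context)
            throws IOException, InterruptedException {
        // First pass: deep-copy each value, because the framework reuses one object.
        // (Assumes CompositeValue implements Writable.)
        List<CompositeValue> cached = new ArrayList<>();
        for (CompositeValue v : values) {
            cached.add(WritableUtils.clone(v, context.getConfiguration()));
        }
        // Second pass: iterate over the cached copies as often as needed.
        for (CompositeValue v : cached) {
            context.write(key, v);
        }
    }
}
```

Note that caching everything in a list only works if one key's values fit in memory; otherwise a secondary-sort design is the usual alternative.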
We tried to test the following example code for accessing HBase tables (Spark 1.3.1, HBase 1.1.1, Hadoop 2.7.0): import sys from pyspark import SparkContext
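For comparison, the HBase read pattern from Spark's bundled hbase_inputformat.py example looks roughly like this; the ZooKeeper host and table name are placeholders, and the two converter classes ship in the Spark examples jar, which must be on the classpath:

```python
from pyspark import SparkContext

sc = SparkContext(appName="HBaseInputFormat")
conf = {"hbase.zookeeper.quorum": "zk-host",
        "hbase.mapreduce.inputtable": "test_table"}

# Read an HBase table as an RDD of (rowkey, result) string pairs.
rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf)
print(rdd.count())
```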
I have got Twitter data using Flume on HDFS. I have a 3-node cluster and a MySQL metastore for Hive. When I execute the below query: select user_name.screen_name, user_n
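Assuming the Flume-ingested tweets are exposed through a JSON SerDe with a struct column (the table name and struct fields below are guesses based on the snippet), nested fields are addressed with dot notation in HiveQL:

```sql
-- user_name is assumed to be declared as a STRUCT column, e.g.
-- user_name STRUCT<screen_name:STRING, followers_count:INT>
SELECT user_name.screen_name, user_name.followers_count
FROM tweets
LIMIT 10;
```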
I have a sample application working to read from CSV files into a dataframe. The dataframe can be stored to a Hive table in parquet format using the method df.
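A minimal sketch of that write path using the SparkSession API (Spark 2.x+); the input path and database/table names are placeholders:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("csv-to-hive")
         .enableHiveSupport()
         .getOrCreate())

# Read the CSV into a dataframe, then persist it as a Parquet-backed Hive table.
df = spark.read.csv("/data/input.csv", header=True, inferSchema=True)
df.write.mode("overwrite").format("parquet").saveAsTable("mydb.sample_table")
```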
I am trying to delete stop words via Spark. The code is as follows: from nltk.corpus import stopwords from pyspark.context import SparkContext from
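A minimal working version of that idea, broadcasting the NLTK stop word list once and filtering each line; the file paths are placeholders, and it assumes the NLTK stopwords corpus has already been downloaded on the driver:

```python
from nltk.corpus import stopwords
from pyspark.context import SparkContext

sc = SparkContext(appName="remove-stopwords")

# Broadcast the stop word set so every executor gets one read-only copy.
stops = sc.broadcast(set(stopwords.words("english")))

lines = sc.textFile("/data/text.txt")
filtered = lines.map(
    lambda line: " ".join(w for w in line.split() if w.lower() not in stops.value))
filtered.saveAsTextFile("/data/text_no_stopwords")
```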
I am trying to set up Apache Kafka on my local machine to try it out following this official guide: https://kafka.apache.org/quickstart. However, when I tried
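For context, the quickstart's first steps look like this on recent releases (older versions pass --zookeeper localhost:2181 to kafka-topics.sh instead of --bootstrap-server); paths match the guide's extracted tarball layout:

```sh
# Start ZooKeeper, then a Kafka broker, each in its own terminal.
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

# Create a test topic and verify it exists.
bin/kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 \
  --bootstrap-server localhost:9092
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
```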
I need to create a hive.hql as follows. HIVE.hql: select * from table1; select * from table2; My question is: can I echo any message to my console like " re
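In the Hive CLI, a script can shell out with the ! prefix, so a progress message can be printed between statements (the message text here is illustrative):

```sql
select * from table1;
!echo "first query done";
select * from table2;
!echo "second query done";
```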
We have a Hive warehouse and wanted to use Spark for various tasks (mainly classification), at times writing the results back as a Hive table. For example, we wr
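A hedged sketch of that round trip with the SparkSession API; the database, table, and column names are invented, and the "classification" step is reduced to a trivial labeling rule for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Read from the Hive warehouse, derive a label column, write the result back
# as a new Hive table.
df = spark.table("warehouse_db.events")
scored = df.withColumn("label", F.when(F.col("score") > 0.5, 1).otherwise(0))
scored.write.mode("overwrite").saveAsTable("warehouse_db.events_labeled")
```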
I am trying to create a small Spark program in Java. I am creating a Hadoop configuration object as shown below: Configuration conf = new Configuration(false); con
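If the goal is to hand Hadoop settings to Spark, one option is to set them on the context's own Hadoop configuration rather than building a standalone Configuration; the property and namenode address below are just examples:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SmallSparkJob {
    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("small-spark-job");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Mutate the Hadoop Configuration that Spark itself uses for HDFS access.
        jsc.hadoopConfiguration().set("fs.defaultFS", "hdfs://namenode:8020");

        System.out.println(jsc.textFile("/data/input.txt").count());
        jsc.stop();
    }
}
```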
I'm trying to connect to Hive using beeline with !connect jdbc:hive2://localhost:10000 and I'm being asked for a username and password: Connecting to jdbc:hive2://l
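Assuming HiveServer2 is running with its default authentication of NONE, the prompt usually accepts any username (commonly the OS user; "hadoop" below is just an assumption) with a blank password, and the same can be passed on the command line:

```sh
# Username and password given inline; with auth NONE the password can be blank.
beeline -u jdbc:hive2://localhost:10000 -n hadoop -p ""
```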
I am running a CRM application which uses a MySQL database. My application generates lots of data in MySQL. Now I want to give my customers a reporting section wh
I am submitting a Spark job using the below command. I want to tail the YARN log using the application ID, similar to the tail command operation on a Linux box. export SPARK
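For a finished (or log-aggregated) application, YARN can dump the logs by application ID, which can then be piped through tail; the application ID below is a placeholder, and log aggregation must be enabled for this to work:

```sh
# Fetch aggregated logs for one application and show the last lines.
yarn logs -applicationId application_1520000000000_0001 | tail -n 100
```

Note that yarn logs itself has no follow mode; for a live job the per-container stdout is usually watched through the ResourceManager UI instead.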
I know Hadoop 2.7's FileUtil has the copyMerge function that merges multiple files into a new one. But the copyMerge function is no longer supported pe
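FileUtil.copyMerge was indeed removed in Hadoop 3. The HDFS shell still offers an equivalent for merging to a local file; the paths below are placeholders:

```sh
# Merge every part file under an HDFS directory into one local file.
hdfs dfs -getmerge /data/output /tmp/merged.txt
```

If the merged result must live on HDFS rather than locally, the old copyMerge loop (list, sort, and concatenate the part files through FileSystem streams) is straightforward to reimplement.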
My Hadoop cluster's HA active namenode (host1) suddenly switched to the standby namenode (host2). I could not find any error in the Hadoop logs (on any server) to identify
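When digging into an unexplained failover, the usual first checks are the current service states and the ZKFC logs; the service IDs nn1/nn2 come from hdfs-site.xml and may differ, and the log path below is an assumption that varies by distribution:

```sh
# Confirm which namenode is currently active vs. standby.
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# The ZKFC (failover controller) log on each namenode host records why a
# health check failed or a fencing/failover was triggered.
less /var/log/hadoop/hadoop-hdfs-zkfc-*.log
```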
Do I need to use Spark with YARN to achieve NODE_LOCAL data locality with HDFS? If I use the Spark standalone cluster manager and have my data distributed in HDFS c
I'm new to Spark. I'm trying to run a Spark job that loads data into Elasticsearch. I've built a fat jar from my code and used it during spark-submit. spark-subm
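A typical submit command for this setup looks like the following; the class name, jar name, master URL, input path, and Elasticsearch host are all placeholders, and the es.* settings assume the elasticsearch-hadoop connector (which accepts them with a spark. prefix when set through SparkConf):

```sh
spark-submit \
  --class com.example.LoadToEs \
  --master yarn \
  --conf spark.es.nodes=es-host \
  --conf spark.es.port=9200 \
  my-app-assembly-fat.jar /data/input
```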