I am stuck in a very odd situation related to HBase design, I would say. HBase version: 2.1.0-cdh6.2.1. So, the problem statement is: in HBase, w
I want to run the same Java Spark Streaming job (10-second micro-batches) as 2 instances (sparkStr1 and sparkStr2). Mainly, they consume the same Kafka topic (3
In Spark Structured Streaming, I am using foreachBatch() to manually maintain a sliding window consisting of the last 200000 entries. With every micro-batch I re
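A manually maintained window of "the last N entries" can be sketched in plain Scala. This is a minimal sketch under stated assumptions, not Spark API: it assumes the window lives on the driver and is updated from inside the foreachBatch callback, and that 200000 entries fit in driver memory. The class `SlidingWindow` and its methods are hypothetical names chosen for illustration.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of Spark): keeps only the most recent
// `capacity` entries; in the question's setting capacity would be 200000.
class SlidingWindow[A](val capacity: Int) {
  require(capacity > 0, "capacity must be positive")

  // FIFO buffer: oldest entries at the head, newest at the tail.
  private val buffer = mutable.Queue.empty[A]

  /** Appends one micro-batch's entries, then evicts the oldest beyond capacity. */
  def addBatch(batch: Seq[A]): Unit = {
    buffer ++= batch
    while (buffer.size > capacity) buffer.dequeue()
  }

  def size: Int = buffer.size

  /** Snapshot of the current window, oldest first. */
  def toSeq: Seq[A] = buffer.toList
}
```

Inside foreachBatch, one would (assumption) collect the micro-batch's relevant rows to the driver and pass them to `addBatch`; collecting is only reasonable when each micro-batch is small relative to driver memory.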
I have a streaming query that streams data from Azure Event Hubs to ADLS every 5 seconds, and the same streaming query uses a watermark for a 1-hour window with 5-minute w
In my project, I need to read an image dataset (each folder contains a different object, and I want to read these folders as a stream, one by one), and then need to extrac
I am trying to debug using the Spark UI, and in its SQL tab I see a red mark on the filter description; I am trying to figure out what it means. Spark UI s
I'm trying to store the tweets from my Kafka cluster in Elasticsearch. Initially, I set the output format to 'org.elasticsearch.spark.sql'. But it creat
I am using Spark on Hortonworks. When I execute the code below, I get an exception. I also have a separate Spark instance running on my system; the same code i
With Scala 2.11 and spark-streaming-kafka-0-8_2.11 I could do: import org.apache.spark.streaming.kafka.KafkaCluster val params = Map[String, Object]( "bootstr
I am a beginner in the big data field. I need to make a demo that streams data from a Kafka topic using Spark Streaming, then performs some aggregation and filtering
I am trying to run spark-submit, but it's failing with a weird message: Error: Could not find or load main class org.apache.spark.launcher.Main /opt/spark/b
I built a cluster using CDH 5.14.2 with 5 nodes; each node has 130 GB of memory and 40 CPU cores. I built a Spark Streaming application to read from multiple
I built a Spark Streaming application that continuously receives messages from Kafka and writes them into an HBase table. This app runs pretty well for the first 25 mins
I am using PostgreSQL as the database. For each batch, I want to capture one table's data, convert it to a Parquet file, and store it in S3. I tried to connect using JDB