Category "hbase"

Spark 3.2.1 fetch HBase data not working with NewAPIHadoopRDD

Below is the sample code snippet that is used for data fetch from HBase. This worked fine with Spark 3.1.2. However after upgrading to Spark 3.2.1, it is not wo

Problem to read data from HBase on AWS EMR cluster using Java Spring boot client

I'm trying to write a simple API application to read data from HBase on an AWS EMR cluster. But I get an UnknownHostException when I try to send the reques

Hbase | Hbase col qualifier hidden using Hbase shell cmds but visible via hbaserdd spark code

I am stuck in a very odd situation related to Hbase design i would say. Hbase version >> Version 2.1.0-cdh6.2.1 So, the problem statement is, in Hbase, w

Receive a Kafka message through Spark Streaming and delete Phoenix/HBase data

In my project, I have the current workflow: Kafka message => Spark Streaming/processing => Insert/Update to HBase and/or Phoenix Both the Insert and Updat

All else held equal, which is the faster querying option: Milvus, RocksDB, or Apache HBase

I have a requirement to store billions of records (with capacity up to one trillion records) in a database (total size is in terms of petabytes). The records ar

HBase Shell - org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet

I am trying to set up distributed HBase on 3 nodes. I have already set up hadoop, YARN ZooKeeper and now HBase but when I launch hbase shell and run the simples

How to split HBase row key into 2 columns in Hive table

HBase Table rowkey: 2020-02-02^ghfgewr3434555, cf:1 timestamp=1604405829275, value=true rowkey: 2020-02-02^ghfgewr3434555, cf:2 timestamp=1604405829275, value=

py4j.protocol.Py4JJavaError: An error occurred while calling o63.save. : java.lang.NoClassDefFoundError: org/apache/spark/Logging

I am new to Spark and BigData component - HBase, I am trying to write Python code in Pyspark and connect to HBase to read data from HBase. I'm using the followi

How to pass hbase-site.xml to Google Cloud Dataflow template

We have a setup where we have a Hbase cluster running on Google cloud and using Dataflow I want to write into Hbase tables. For this, I want to pass my hbase-si

HBase data export to S3

I am trying to export HBase table(size-23TB) data to S3. So while using HBase export and passing S3 credentials via jceks path Command : hbase org.apache.hadoop

How/Where can I write time series data? As Parquet format to Hadoop, or HBase, Cassandra?

I have real-time time series sensor data. My primary goal is to keep the raw data. I should do this so that the cost of storage is minimal. My scenario like th

Apache Phoenix Join Fails (Encountered exception in sub plan [0] execution)

Here are the create table statements of the tables I'm testing, which was actually from Phoenix CREATE TABLE Test.Employee( Region VARCHAR NOT NULL, LocalI

Why does persist(StorageLevel.MEMORY_AND_DISK) give different results than cache() with HBase?

I could sound naive asking this question but this is a problem that I have recently faced in my project. Need some better understanding on it. df.persist(Stora

Hbase master not able to start regionserver

I have one master and one regionserer running on one machine,now I want to add another region server to it. This new machine has all the connection config requ