'How to split HBase row key into 2 columns in Hive table
HBase Table
rowkey: 2020-02-02^ghfgewr3434555, cf:1 timestamp=1604405829275, value=true
rowkey: 2020-02-02^ghfgewr3434555, cf:2 timestamp=1604405829275, value=true
rowkey: 2020-02-02^ghfgewr3434555, cf:3 timestamp=1604405829275, value=false
rowkey: 2020-02-02^ghfgewr3434555, cf:4 timestamp=1604405829275, value=false
Transfer HBase data into Hive table like below
Hive table
date ========= Id ======== cf:no == boolean
2020-02-02 ==== ghfgewr3434555 == 1 ======= true
2020-02-02 ==== ghfgewr3434555 == 2 ======= true
2020-02-02 ==== ghfgewr3434555 == 3 ======= false
2020-02-02 ==== ghfgewr3434555 == 4 ======= false
Solution 1:[1]
I have solved this problem using 2 table/View. First 1 just coping data from HBase table and second table/view split the rowkey into 2 columns.
First Table query in Hive
CREATE EXTERNAL TABLE hbase_hive_table(
key string,
t1 boolean,
t2 boolean
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH
SERDEPROPERTIES ("hbase.columns.mapping" = "cf:1#b,cf:2#b)
TBLPROPERTIES ("hbase.table.name" = "hbase_table");
First Table/View query in Hive
CREATE VIEW IF NOT EXISTS hbase_hive_view
AS SELECT
CONCTNS.rowkey[0] AS date,
CONCTNS.rowkey[1] AS req_id,
t1,
t2
FROM
(SELECT split(key,'\\^') AS rowkey, t1, t2 FROM hbase_hive_table)
CONCTNS;
Solution 2:[2]
If you are thinking to transfer it only to be queried, you can actually create a connection in hive to that table specifying the properties
CREATE TABLE foo(rowkey STRING, a STRING, b STRING)
STORED BY ‘org.apache.hadoop.hive.hbase.HBaseStorageHandler’
WITH SERDEPROPERTIES (‘hbase.columns.mapping’ = ‘:key,f:c1,f:c2’)
TBLPROPERTIES (‘hbase.table.name’ = ‘bar’);
Proper doc here: https://blog.cloudera.com/hbase-via-hive-part-1/
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Vincent Doba |
Solution 2 | Marco Massetti |