BigTable data schema design
I am learning about BigTable and trying to design a good schema for it.
Each user has a unique ID and, over time, receives many events (the events have no ID of their own; only the timestamp is unique). I also want to use BigTable garbage collection to expire events.
The queries I will use, which won't change in the future:
1/ getAllEventsByUserId (order events by timestamp)
2/ getEventDetailByUserIdAndTimestamp
I'm confused about the row key. Should I use just user_id as the row key, or user_id#timestamp?
Based on what I know, using just user_id gives very good performance for query 1, but I don't know whether it is bad for query 2. With user_id#timestamp, query 2 is very good, but query 1 would have to scan by pattern, which I worry will cost a lot of resources.
Solution 1:[1]
As each row key must be unique, user_id cannot be your row key. Hence, you should go with user_id#timestamp instead.
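
On the worry about query 1: Bigtable stores rows sorted lexicographically by row key, so all keys that start with user_id# are contiguous, and reading them is a range read between a start key and an end key rather than a pattern match over the whole table. The following is a minimal sketch of that idea in plain Python, not real client code. FakeTable is a hypothetical in-memory stand-in for a table, and the helper names, the millisecond timestamps, and the 13-digit zero padding are assumptions made for illustration. It also assumes user IDs never contain the # character.

```python
from bisect import bisect_left, insort


def make_row_key(user_id: str, timestamp_ms: int) -> bytes:
    # Zero-pad the timestamp so lexicographic order matches numeric order.
    return f"{user_id}#{timestamp_ms:013d}".encode()


def prefix_range(user_id: str) -> tuple[bytes, bytes]:
    # '$' (0x24) is the byte right after '#' (0x23), so [start, end)
    # covers exactly the keys that begin with "<user_id>#".
    return f"{user_id}#".encode(), f"{user_id}$".encode()


class FakeTable:
    """Hypothetical in-memory table that keeps row keys sorted."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> event payload

    def put(self, key: bytes, value):
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = value

    def read_row(self, key: bytes):
        # Point lookup by exact row key.
        return self._rows.get(key)

    def read_rows(self, start_key: bytes, end_key: bytes):
        # Contiguous range read over [start_key, end_key).
        lo = bisect_left(self._keys, start_key)
        hi = bisect_left(self._keys, end_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def get_all_events_by_user_id(table: FakeTable, user_id: str) -> list:
    # Query 1: range read over the user's prefix, already ordered by timestamp.
    start, end = prefix_range(user_id)
    return [value for _, value in table.read_rows(start, end)]


def get_event_detail(table: FakeTable, user_id: str, timestamp_ms: int):
    # Query 2: exact-key lookup.
    return table.read_row(make_row_key(user_id, timestamp_ms))
```

With the google-cloud-bigtable Python client, the same two access patterns would roughly correspond to table.read_rows(start_key=..., end_key=...) for query 1 and table.read_row(row_key) for query 2. If the newest events should come first, a common variation is to store a reversed timestamp (for example, a large constant minus the timestamp) in the key.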
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Andrés |