Druid: generate missing records
I have a data table in Druid that has missing rows, and I want to fill them in by generating the missing timestamps and carrying forward the value of the preceding row.
This is the table in Druid:
| __time | distance |
|--------------------------|----------|
| 2022-05-05T08:41:00.000Z | 1337 |
| 2022-05-05T08:42:00.000Z | 1350 |
| 2022-05-05T08:44:00.000Z | 1360 |
| 2022-05-05T08:47:00.000Z | 1377 |
| 2022-05-05T08:48:00.000Z | 1400 |
I want to add the missing minutes, either by enforcing this on the Druid storage side or by querying it directly in Druid, without going through another module. The final result I want looks like this:
| __time | distance |
|--------------------------|----------|
| 2022-05-05T08:41:00.000Z | 1337 |
| 2022-05-05T08:42:00.000Z | 1350 |
| 2022-05-05T08:43:00.000Z | 1350 |
| 2022-05-05T08:44:00.000Z | 1360 |
| 2022-05-05T08:45:00.000Z | 1360 |
| 2022-05-05T08:46:00.000Z | 1360 |
| 2022-05-05T08:47:00.000Z | 1377 |
| 2022-05-05T08:48:00.000Z | 1400 |
Thank you in advance!
Solution 1:[1]
A Druid timeseries query produces a densely populated timeline at a given time granularity, such as the one-minute timeline you want. However, its current functionality either skips empty time buckets or assigns them a value of zero.
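For reference, a native timeseries query of the kind described above might look roughly like the following, built here as a Python dictionary and serialized to JSON. This is only a sketch under assumptions: the datasource name `distance_table`, the output name `distance`, and the choice of the `longLast` aggregator are hypothetical and not taken from the question. The `skipEmptyBuckets` context flag is what switches between skipping empty buckets and filling them.

```python
import json


def build_timeseries_query(skip_empty_buckets: bool) -> dict:
    """Build a native Druid timeseries query body at minute granularity.

    "distance_table" is a hypothetical datasource name; replace it with
    the real one. The aggregator choice (longLast) is also an assumption.
    """
    return {
        "queryType": "timeseries",
        "dataSource": "distance_table",  # hypothetical name
        "granularity": "minute",
        "intervals": ["2022-05-05T08:41:00.000Z/2022-05-05T08:49:00.000Z"],
        "aggregations": [
            {"type": "longLast", "name": "distance", "fieldName": "distance"}
        ],
        # "true": empty minutes are left out of the result.
        # "false": empty minutes are returned with a filler value,
        # not with the previous row's value.
        "context": {"skipEmptyBuckets": "true" if skip_empty_buckets else "false"},
    }


# JSON body that could be sent to Druid's native query endpoint.
query_json = json.dumps(build_timeseries_query(skip_empty_buckets=False), indent=2)
print(query_json)
```

Neither setting carries the previous value into the gap, which is why the result still differs from the table the question asks for.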
Other gap-filling functions, such as the LVCF (last value carried forward) you describe, seem like a great enhancement. You can join the Apache Druid community and create an issue that describes this request. That's a great way to start a conversation about requirements and how it might be achieved.
You could also add the functionality yourself and submit a PR. We're always looking for more members in the Apache Druid community.
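Until something like that exists in Druid, LVCF can be illustrated outside of Druid. The snippet below is a minimal sketch of the idea in plain Python, not a Druid feature: it assumes the rows have already been fetched and sorted by time, and the function name `fill_forward` is made up for this example. It does go through another module, which the question wanted to avoid, so treat it as an illustration of the expected semantics rather than a solution inside Druid.

```python
from datetime import datetime, timedelta


def fill_forward(rows, step=timedelta(minutes=1)):
    """Return rows with missing timestamps added, each gap row carrying
    the value of the last real row before it.

    rows: list of (datetime, value) tuples sorted by time, ascending.
    """
    filled = []
    for ts, value in rows:
        if filled:
            prev_ts, prev_value = filled[-1]
            t = prev_ts + step
            # Generate one row per missing step, reusing the previous value.
            while t < ts:
                filled.append((t, prev_value))
                t += step
        filled.append((ts, value))
    return filled


# Sample data from the question (minutes 08:43, 08:45 and 08:46 are missing).
rows = [
    (datetime(2022, 5, 5, 8, 41), 1337),
    (datetime(2022, 5, 5, 8, 42), 1350),
    (datetime(2022, 5, 5, 8, 44), 1360),
    (datetime(2022, 5, 5, 8, 47), 1377),
    (datetime(2022, 5, 5, 8, 48), 1400),
]

filled = fill_forward(rows)
for ts, distance in filled:
    print(ts.isoformat(), distance)
```

Applied to the sample rows, this yields eight rows, one per minute from 08:41 to 08:48, with the same values as the desired table in the question.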
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|------------|-----------------|
| Solution 1 | Sergio Ferragut |