Add a new record for each missing timestamp in a pandas DataFrame without replacing the original records

Consider the following pandas DataFrame:

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        1         |
|          2022-01-01 10:00:04        |        1         |
|          2022-02-01 11:30:06        |        1         |

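First, I merge repeated timestamps into a single row, summing their counters:

    import pandas as pd

    # parse the timestamps, then collapse duplicate timestamps into one row
    df['date'] = pd.to_datetime(df['date'])
    df = df.groupby(df['date']).sum().reset_index()

Result:
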
|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        2         |
|          2022-02-01 11:30:06        |        1         |

Now I need to create a new DataFrame covering the interval from 2022-01-01 10:00:01 to 2022-02-01 11:30:06 that keeps all records from the DataFrame above and, in addition, inserts records with a counter value of 0 at a fixed time step (given in seconds), starting from the timestamp of the first row.

For example, with a step of 60 seconds:

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        2         |
|          2022-01-01 10:01:01        |        0         |
|          2022-01-01 10:02:01        |        0         |
|          2022-01-01 10:03:01        |        0         |
...
|          2022-02-01 11:29:01        |        0         |
|          2022-02-01 11:30:01        |        0         |
|          2022-02-01 11:30:06        |        1         |
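The zero rows in the expected output lie on a regular grid anchored at the first timestamp, which is why the last zero row is 11:30:01 rather than 11:30:00. A short sketch of that grid, assuming a step of 60 seconds; passing a `pd.Timedelta` as `freq` is one way to express a step given in seconds.

```python
import pandas as pd

start = pd.Timestamp("2022-01-01 10:00:01")
end = pd.Timestamp("2022-02-01 11:30:06")
step_seconds = 60  # the "given time interval" from the question (assumed value)

# Grid points are start + k * step for every k with start + k * step <= end
grid = pd.date_range(start, end, freq=pd.Timedelta(seconds=step_seconds))

print(grid[:3])
print(grid[-1])  # last grid point that does not pass the final record
```

The original timestamps 10:00:04 and 11:30:06 are not on this grid, so the final result has to combine the grid with the original index rather than replace it.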

I was advised to use the following command, but it doesn't work: it drops every original record whose timestamp doesn't fall exactly on the generated time grid.

    df = df.set_index('date').asfreq('1S', fill_value=0).reset_index()
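The problem with `asfreq` can be reproduced with a small sketch. It assumes a 60-second step (`"60s"`); with the `'1S'` step from the command above, every whole-second timestamp in this particular sample happens to lie on the grid, so nothing would be lost there. `asfreq` conforms the index to a new regular grid, so values at off-grid timestamps are discarded rather than kept.

```python
import pandas as pd

# Aggregated data from the question
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-01-01 10:00:01", "2022-01-01 10:00:04", "2022-02-01 11:30:06"]
        ),
        "counter": [1, 2, 1],
    }
)

# asfreq replaces the index with a regular 60-second grid starting at the first
# timestamp; 10:00:04 and 11:30:06 are not on that grid, so they are dropped
out = df.set_index("date").asfreq("60s", fill_value=0)

print(pd.Timestamp("2022-01-01 10:00:04") in out.index)  # False
print(out["counter"].sum())  # only the 10:00:01 record survives
```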

I am grateful for your help.



Solution 1:[1]

Try:

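    import pandas as pd

    # if necessary, convert it:
    # df["date"] = pd.to_datetime(df["date"])

    # merge duplicate timestamps, summing their counters
    x = df.groupby("date").sum()

    # regular 1-minute grid from the first to the last timestamp
    grid = pd.date_range(x.index.min(), x.index.max(), freq="1min")

    # the outer join keeps both the original timestamps and the grid points;
    # timestamps that exist only on the grid get counter = 0
    x = x.reindex(x.index.join(grid, how="outer"), fill_value=0)

    print(x)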

Prints:

                         counter
    2022-01-01 10:00:01        1
    2022-01-01 10:00:04        2
    2022-01-01 10:01:01        0
    2022-01-01 10:02:01        0
    2022-01-01 10:03:01        0

    ...

    2022-02-01 11:29:01        0
    2022-02-01 11:30:01        0
    2022-02-01 11:30:06        1
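The question asks for the step in seconds, while the answer hard-codes `freq="1min"`. Below is a minimal sketch generalising the same reindexing idea to an arbitrary step; `add_zero_rows` is a hypothetical helper name, and `Index.union` is swapped in for the outer `join` (for indexes without duplicates, which `groupby` guarantees here, the two should produce the same sorted index).

```python
import pandas as pd


def add_zero_rows(df: pd.DataFrame, step_seconds: int) -> pd.DataFrame:
    """Hypothetical helper: keep every original record and add counter=0 rows
    on a grid of `step_seconds`, starting at the first timestamp."""
    # merge duplicate timestamps, summing their counters
    x = df.groupby("date").sum()
    grid = pd.date_range(
        x.index.min(), x.index.max(), freq=pd.Timedelta(seconds=step_seconds)
    )
    # union = sorted set of original timestamps plus grid points;
    # rows that exist only on the grid are filled with 0
    x = x.reindex(x.index.union(grid), fill_value=0)
    # restore `date` as a regular column, matching the question's layout
    return x.rename_axis("date").reset_index()


df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2022-01-01 10:00:01",
                "2022-01-01 10:00:04",
                "2022-01-01 10:00:04",
                "2022-02-01 11:30:06",
            ]
        ),
        "counter": [1, 1, 1, 1],
    }
)

result = add_zero_rows(df, 60)
print(result.head())
print(result.tail(3))
```

Calling `add_zero_rows(df, 60)` should reproduce the table from the question, with `date` back as an ordinary column; other steps only change the `step_seconds` argument.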

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
[1] Solution 1: Andrej Kesely