How to split a DataFrame based on a consecutive index?
I have a DataFrame 'work' with a non-consecutive index; here is an example:
Index Column1 Column2
4464 10.5 12.7
4465 11.3 12.8
4466 10.3 22.8
5123 11.3 21.8
5124 10.6 22.4
5323 18.6 23.5
I need to extract new DataFrames from this one, each containing only rows whose index values are consecutive. In this case my goal is to get
DF_1.index=[4464,4465,4466]
DF_2.index=[5123,5124]
DF_3.index=[5323]
maintaining all the columns.
Can anyone help me?
Solution 1:[1]
groupby
You can make a perfectly "consecutive" array with
np.arange(10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
If I were to subtract this from an index that is monotonically increasing, the members of each "consecutive" run would all map to the same value. This is a clever way to establish a key to group by.
list_of_df = [d for _, d in df.groupby(df.index - np.arange(len(df)))]
And print each one to prove it
print(*list_of_df, sep='\n\n')
Column1 Column2
Index
4464 10.5 12.7
4465 11.3 12.8
4466 10.3 22.8
Column1 Column2
Index
5123 11.3 21.8
5124 10.6 22.4
Column1 Column2
Index
5323 18.6 23.5
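To see why this key works, here is a small sketch (using only the example's index values) that prints the index minus np.arange — each consecutive run collapses to a single value, which is exactly what groupby keys on:

```python
import numpy as np
import pandas as pd

# The example's index: subtracting each row's position leaves one
# constant value per consecutive run.
index = pd.Index([4464, 4465, 4466, 5123, 5124, 5323], name='Index')
key = index - np.arange(len(index))
print(key.tolist())  # [4464, 4464, 4464, 5120, 5120, 5318]
```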
np.split
You can use np.flatnonzero to identify the positions where the index differences are not equal to 1, avoiding cumsum and groupby altogether:
list_of_df = np.split(df, np.flatnonzero(np.diff(df.index) != 1) + 1)
Proof
print(*list_of_df, sep='\n\n')
Column1 Column2
Index
4464 10.5 12.7
4465 11.3 12.8
4466 10.3 22.8
Column1 Column2
Index
5123 11.3 21.8
5124 10.6 22.4
Column1 Column2
Index
5323 18.6 23.5
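As a quick sketch of how the split points are found (again using the example index): np.diff marks the gaps, np.flatnonzero turns them into positions, and the +1 shifts each gap to the first row of the next run:

```python
import numpy as np

idx = np.array([4464, 4465, 4466, 5123, 5124, 5323])
print(np.diff(idx))   # [  1   1 657   1 199] -- gaps where diff != 1
breaks = np.flatnonzero(np.diff(idx) != 1) + 1
print(breaks)         # [3 5] -> np.split cuts before rows 3 and 5
```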
Solution 2:[2]
Here is an alternative:
grouper = (~(pd.Series(df.index).diff() == 1)).cumsum().values
dfs = [dfx for _ , dfx in df.groupby(grouper)]
This relies on the fact that within a consecutive run the index difference is always 1 (diff == 1), so the cumulative sum of the negated mask only increases where a run breaks.
Full example:
import pandas as pd
data = '''\
Index Column1 Column2
4464 10.5 12.7
4465 11.3 12.8
4466 10.3 22.8
5123 11.3 21.8
5124 10.6 22.4
5323 18.6 23.5
'''
from io import StringIO  # pd.compat.StringIO was removed in modern pandas
fileobj = StringIO(data)
df = pd.read_csv(fileobj, sep=r'\s+', index_col='Index')
non_sequence = pd.Series(df.index).diff() != 1
grouper = non_sequence.cumsum().values
dfs = [dfx for _ , dfx in df.groupby(grouper)]
print(dfs[0])
# Column1 Column2
#Index
#4464 10.5 12.7
#4465 11.3 12.8
#4466 10.3 22.8
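To make the grouper visible, here is a minimal sketch (same example index) showing the mask and its cumulative sum. Note that the first diff is NaN, which also compares unequal to 1 and therefore starts group 1:

```python
import pandas as pd

idx = pd.Series([4464, 4465, 4466, 5123, 5124, 5323])
mask = idx.diff() != 1          # True wherever a new run starts (incl. row 0)
grouper = mask.cumsum().values
print(grouper.tolist())  # [1, 1, 1, 2, 2, 3]
```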
Another way of seeing it, as in the full example above: name the non-sequence mask explicitly before taking the cumulative sum, which may be more readable than the one-liner.
Solution 3:[3]
You can use exec to create several dataframes and get your expected results:
df = pd.DataFrame({'Column1' : [10.5,11.3,10.3,11.3,10.6,18.6], 'Column2' : [10.5,11.3,10.3,11.3,10.6,18.6]})
df.index = [4464, 4465, 4466, 5123, 5124, 5323]
prev_index = df.index[0]
df_1 = pd.DataFrame(df.iloc[0]).T
num_df = 1
for i in df.index[1:]:
    if i == prev_index + 1:
        # note: DataFrame.append was removed in pandas 2.0; use pd.concat there
        exec('df_{} = df_{}.append(df.loc[{}])'.format(num_df, num_df, i))
    else:
        num_df += 1
        exec('df_{} = pd.DataFrame(df.loc[{}]).T'.format(num_df, i))
    prev_index = i
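Names created via exec are hard to inspect afterwards. As a sketch of the same run-tracking loop, the frames can instead be collected in a plain dict keyed by the same names (the dict layout here is my own, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'Column1': [10.5, 11.3, 10.3, 11.3, 10.6, 18.6],
                   'Column2': [10.5, 11.3, 10.3, 11.3, 10.6, 18.6]},
                  index=[4464, 4465, 4466, 5123, 5124, 5323])

runs = {}                       # name -> list of index labels in that run
num_df = 1
prev_index = df.index[0]
runs['df_1'] = [prev_index]
for i in df.index[1:]:
    if i == prev_index + 1:
        runs['df_{}'.format(num_df)].append(i)
    else:
        num_df += 1
        runs['df_{}'.format(num_df)] = [i]
    prev_index = i

# Materialize each run as its own DataFrame
dfs = {name: df.loc[labels] for name, labels in runs.items()}
print(dfs['df_1'].index.tolist())  # [4464, 4465, 4466]
```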
Solution 4:[4]
Maybe there is a more elegant way to write it down but here is what works for me:
previous_index = df.index[0]
groups = {}
for x in df.index:
    if (x - previous_index) == 1:
        groups[max(groups.keys())].append(x)
    else:
        groups[len(groups.keys())] = [x]
    previous_index = x

output_dfs = []
for key, val in groups.items():
    print(key, val)
    output_dfs.append(df[df.index.isin(val)])
Your dataframes will be stored in output_dfs
output_dfs[0].index
[4464,4465,4466]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | vlemaistre |
| Solution 4 | |