Split list by sum of sublist items

I have a list of sublists, each holding a file name and its size. I need to split that list into sublists so that the total file size of each resulting sublist is less than 500 000 000 bytes. I have tried several approaches but could not get any of them to work. My last attempt is this:

import functools
import operator

data = [[r"c:\example_path", 480000],[r"c:\example_path2", 500000], ...]

list_final = []

sum = 0
list_items_subset = []

for index, item in enumerate(data):

   sum += item[1]

   if sum < 500000000:

      list_items_subset.append(item[0])

   else:
      list_final.append(list_items_subset)

      sum = 0
      
      list_items_subset = []
      list_items_subset.append(item[0])
      sum += item[1]

print("len data init: ", len(data))
print("len items final: ", len(functools.reduce(operator.iconcat, list_final, [])))

list_final should store all the sublists of files, each with a cumulative size of less than 500 000 000 bytes. With the code above, sublists are created and appended, but some items end up in no sublist at all.

Thanks for any suggestions!



Solution 1:[1]

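Is this what you want to get? In your attempt the last, partially filled sublist is never appended to list_final once the loop ends, so its items go missing. Appending it after the loop fixes that: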

import functools
import operator

data = [[r"c:\example_path", 480000], [r"c:\example_path2", 500000]] * 10000

list_final = []

total_size = 0
list_items_subset = []

for name, size in data:
    total_size += size
    if total_size < 500000000:
        list_items_subset.append(name)

    else:
        list_final.append(list_items_subset)
        # Start a new sublist with the current item
        list_items_subset = [name]
        total_size = size

# Append the last, partially filled sublist
list_final.append(list_items_subset)
print("len data init: ", len(data))
print(len(list_final))
print("len items final: ", len(functools.reduce(operator.iconcat, list_final, [])))
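The same greedy grouping can also be wrapped in a reusable function. The following is only a sketch: the function name split_by_total_size and the limit parameter are hypothetical and not part of the original answer. It additionally assumes that an empty sublist should not be emitted when a single file is itself at or above the limit, and that empty input should give an empty result.

```python
def split_by_total_size(items, limit=500_000_000):
    """Greedily group names so each group's total size stays below `limit`.

    `items` is an iterable of [name, size] pairs. A single file whose size
    is at or above `limit` ends up alone in its own group (an assumption of
    this sketch, not something the original answer specifies).
    """
    groups = []
    current = []
    current_size = 0
    for name, size in items:
        # Flush the current group if adding this file would reach the limit.
        # The `current` check avoids appending an empty group.
        if current and current_size + size >= limit:
            groups.append(current)
            current = []
            current_size = 0
        current.append(name)
        current_size += size
    # Keep the last, partially filled group.
    if current:
        groups.append(current)
    return groups


data = [[r"c:\example_path", 480000], [r"c:\example_path2", 500000]] * 10000
groups = split_by_total_size(data)
print("len data init: ", len(data))
print("number of groups: ", len(groups))
print("len items final: ", sum(len(group) for group in groups))
```

Because every item is appended to exactly one group, the last two counts printed should match the length of data.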

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pL3b