'Django deleting querysets with paging, not catching all parts of the set

I have a bit of a strange problem that I'm not quite able to explain.

I have a django project with some old, stale objects lying around. For example, lets say my objects look something like this:

class blog_post(models.Model):
    user_account = models.ForeignKey('accounts.Account')
    text = models.CharField(max_length=255)
    authors = models.ManyToManyField(author)
    created =  models.DateTimeField(blank=True, null=True)

This is not an exact copy of my model, but is close enough.

I've created a management command to build ordered querysets of these objects, and then delete with with a Paginator

My command looks something like this:

all_accounts = Account.objects.all()
for act in all_accounts.iterator():
    stale_objects = blog_post.objects.filter(user_account=act, 
    created=django.utils.timezone.now() - datetime.timedelta(days=7))

    paginator = Paginator(stale_objects.order_by('id'), 100)
    for page in range(1, paginator.num_pages + 1):
        page_stale_objects = blog_post.objects.filter(id__in=paginator.page(page).object_list.values_list('id'))
        page_stale_objects.delete()

The problem I'm having is, after I delete these objects with my command, there are still objects that fit the queryset parameters but are not deleted. So, I have to run the command 3+ times to properly find and remove all the objects.

I first figured that my date range was just weirdly on the edge of the DateTime so was not catching objects made shortly after 1 week past my command time. This is not the case, I've removed the created=... filter from the queryset, and have the same results.

Why are my querysets not catching all objects the first time this command runs? There are not excessive objects, at the most ~30,000 rows.



Solution 1:[1]

Paging through a queryset is translated into consecutive LIMIT/OFFSET calls. So, think about the sequence:

  • get items with offset 0 and limit 20
  • delete those items
  • get next page, ie 20 items from offset 21

But wait! Once we've deleted the first set, the queryset now starts at 0 again. The items that are now from 0 to 20 are skipped.

The solution is, don't do this. Pagination is for displaying objects, not deleting them.

Solution 2:[2]

I don't understand why you are using the Paginator if you just want to delete your queryset. Correct me if I am wrong but looks like you are doing the following:

  • get a queryset
  • sort this queryset by id
  • paginate it
  • get the objects from each list
  • delete them

When you could just do this:

  • get a queryset
  • delete the queryset

This will be a huge performance boost if you have a lot of objects.

So, I suggest you just do this:

stale_objects = blog_post.objects.filter(...)
stale_objects.delete()

Hope it helps!

Solution 3:[3]

A solution is to write a loop that uses a field to filter:

import string
for letter in string.ascii_lowercase:
   qs = MyModel.objects.filter(
       name__startswith=letter,
   )
   qs.delete()

Here, we will divide the delete in 26 querysets (a to z).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Daniel Roseman
Solution 2 Jahongir Rahmonov
Solution 3 eagle28