'Django ORM: get the monthly average of price for each category
Imagine this simple model:
class Expense(models.Model):
price = models.DecimalField(decimal_places=2, max_digits=6)
description = models.CharField(max_length=300)
category = models.CharField(choices=ExpenseCategory.choices, max_length=20)
created_at = models.DateField()
I'm trying to get the monthly average of price
for each category
in the current year. My general thought was to do something like:
sub = (
Expense.objects.filter(created_at__year=date.today().year)
.annotate(month=TruncMonth("created_at"))
.values("month", "category")
.annotate(total=Sum("price"))
.order_by("month")
)
qs = Expense.objects.values("category").annotate(avg=Avg(Subquery(sub.values("total"))))
I'm basically trying to:
- Truncate the months of
created_at
- Group by
category
andmonth
- Sum the
prices
- Aggregate the
prices
for eachcategory
It works just fine if I do like:
for category in categories:
sub.filter(category=category).aggregate(avg=Avg("total"))
Solution 1:[1]
Your query can be more simple than you think. Your current attempt at the solution is:
- Truncate
created_at
to get the month - Group by
category
andmonth
- Sum
prices
- Take Avg of the sum for each category
The problem with this is taking an aggregate of an aggregate. Let's think of your problem in reverse (We will do a bit of mathematics here). You want the average of the monthly price of a category, if we consider only one category and the monthly prices to be an array M[12]
, then we can express this as:
(M[0] + M[1] + ... + M[11]) / 12
Each of the values in M can be considered to be a summation of prices
where the month matches. If we consider P[12][] to be a 2 dimensional array containing prices for each month we can rewrite above formula as:
(Sum(P[0]) + Sum(P[1] + ... + Sum(P[12])) / 12
Thinking of this further it is simply the sum of all prices in the year divided by 12! That means your query can simply be written as:
from django.db.models import ExpressionWrapper, FloatField, Sum, Value
qs = Expense.objects.filter(
created_at__year=date.today().year
).values("category").annotate(
avg=ExpressionWrapper(
Sum("price") / Value(12), output_field=FloatField()
)
)
Note: Dividing by 12 means we are assuming that we have data for the entire year, this would probably not be true for the current year so instead we should divide by the appropriate number of months. We might also want to filter upto the previous month in case we are not significantly into the current month.
Solution 2:[2]
A function within a model manager could be a good solution to this. In your models.py
add a models.Manager class called ExpenseManager
before your Expense
class:
class ExpenseManager(models.Manager):
def summarise_categories(self):
return (
super(ExpenseManager, self)
.get_queryset()
.filter(created_at__year=date.today().year)
.annotate(month=TruncMonth("created_at"))
.values("month", "category")
.annotate(total=Sum("price")/Count("created_at__month", distinct=True))
)
then in you Expense
class do:
class Expense(models.Model):
...
objects = ExpenseManager()
To get the resulting queryset aggregation you can simply call the model manager method on the model objects within a view:
summarised_categories = Expense.objects.summarise_categories()
This approach pushes all the work to the database returning a dictionary of categories and their average monthly prices for the year-to-date.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 |