'Django - export excel make it faster with openpyxl
I am using Django restframework and I am trying to export excel. My issue is the process is take a lot of time till it generates the excel file.
The final file have about 1MB with 20k lines and the generation time is about 8 minutes and this does not seem right. here is the view:
class GenerateExcelView(APIView):
filename = 'AllHours.xlsx'
wb = Workbook()
ws = wb.active
ws.title = "Workbook"
data = Report.objects.all()
row_couter = 2
for line in data:
first_name = line.employee_id
second_name = line.employee_name
age = line.description
...
ws['A{}'.format(row_counter)] = first_name
ws['B{}'.format(row_counter)] = second_name
ws['C{}'.format(row_counter)] = age
...
row_counter +=1
response = HttpResponse(save_virtual_workbook(wb), content_type='application/ms-excel')
response["Content-Disposition"] = 'attachment; filename="' + filename + '"'
return response
There are few more columns... Is it possible to change the process so it is a bit faster?
EDIT: I had wrong indentation of the loop.
Solution 1:[1]
It tends to help a lot with performance to use prefetch_related on the queryset. Given a Table with a 100 rows each row having a foreign key to another table in you example the employee. Your loop would fetch the report then for each of the 100 rows the used relations. This is due to the lazy nature of the django ORM. As you can see we are already on at least 100 Queries... not so great. If you would use:
data = Report.objects.all().prefetch_related('employee')
It would use one db query in stead of a hundred. That should improve the speed of your solution by quite a bit already.
see more: https://docs.djangoproject.com/en/3.1/ref/models/querysets/#prefetch-related
Solution 2:[2]
I have been wrestling with the same problem, and even after refactoring into raw SQL there is little improvement. The issue is the speed of openpyxl.
Their documentation suggests that using write-only mode helps, but I found it to be a small improvement at best: my benchmark on a report with 2 tabs, 18k rows on the second tab, showed a 50% reduction after the query refactor to SQL plus an openpyxl refactor to use write-only mode (which is a pain if you are doing cell formatting or special rows like headers and totals).
You can check their performance page here: https://openpyxl.readthedocs.io/en/stable/performance.html
... but I wouldn't get your hopes up.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Phteven |
Solution 2 | John R |