'Trying to style large number of rows with openpyxl, is there a 'faster' way?
I've got a function I use to check and style rows of a spreadsheet based on matching criteria with openpyxl. While this has been functional for nearly every smaller sheet as I've graduated to using large sets of rows the speed has slowed considerably.
I'm struggling to find ways to speed it up and was hoping for some help figuring out what I'm missing.
I believe my problem is that I am looping through every row, but then looping through each cell of the target row to match a specific column value. Though I am not sure what I can do to get around this issue (Shown below):
for row in sheet.iter_rows():
for cell in row:
cell.alignment = Alignment(wrapText=True)
if row[type_row].value in group1:
Entire code of the function and another function I use to actually format the cells:
import openpyxl as opxl
from openpyxl.styles import Font, Border, Side, PatternFill, Alignment, NamedStyle
def format_cell(cell, fill, font, border):
cell.fill = fill
cell.font = font
cell.border = border
def set_style_base(sheet, type_row, col_for_other, col_for_step):
group1 = ['Temp1', 'Temp2', 'Temp3']
group2 = ['ACTIVITY']
group3 = ['ACTIVITY2']
group4 = ['OPEN']
group5 = ['BREAK']
group6 = ['WARNING']
group7 = ['DIAGNOSTIC']
pass_start_fill = PatternFill(start_color='00FF00', end_color='00FF00', fill_type='solid')
pass_end_fill = PatternFill(start_color='99FF99', end_color='99FF99', fill_type='solid')
error_start_fill = PatternFill(start_color='FF0000', end_color='FF0000', fill_type='solid')
error_end_fill = PatternFill(start_color='FF6666', end_color='FF6666', fill_type='solid')
fail_start_fill = PatternFill(start_color='FF8000', end_color='FF8000', fill_type='solid')
fail_end_fill = PatternFill(start_color='FFB266', end_color='FFB266', fill_type='solid')
group2_fill = PatternFill(start_color='FFA500', end_color='FFA500', fill_type='solid')
group3_fill = PatternFill(start_color='000000', end_color='000000', fill_type='solid')
group4_fill = PatternFill(start_color='00BFFF', end_color='00BFFF', fill_type='solid')
group5_fill = PatternFill(start_color='800080', end_color='800080', fill_type='solid')
group6_fill = PatternFill(start_color='FFFF00', end_color='FFFF00', fill_type='solid')
thick = Side(border_style="thick")
border_thick = Border(top=thick, left=thick, right=thick, bottom=thick)
bold_font = Font(bold=True)
def_font = Font(bold=False)
for row in sheet.iter_rows():
for cell in row:
cell.alignment = Alignment(wrapText=True)
if row[type_row].value in group1:
if row[col_for_step].data_type == "s":
if 'START:' in row[col_for_step].value:
if row[col_for_other].data_type == "s":
if row[col_for_other].value == 'PASS':
format_cell(cell, pass_start_fill, bold_font, border_thick)
if row[col_for_other].value == 'FAIL':
format_cell(cell, fail_start_fill, bold_font, border_thick)
if row[col_for_other].value == 'ERROR':
format_cell(cell, error_start_fill, bold_font, border_thick)
if 'END:' in row[col_for_step].value:
if row[col_for_other].data_type == "s":
if row[col_for_other].value == 'PASS':
format_cell(cell, pass_end_fill, bold_font, border_thick)
if row[col_for_other].value == 'FAIL':
format_cell(cell, fail_end_fill, bold_font, border_thick)
if row[col_for_other].value == 'ERROR':
format_cell(cell, error_end_fill, bold_font, border_thick)
if row[type_row].value in group2:
format_cell(cell, group2_fill, bold_font, border_thick)
if row[type_row].value in group3:
format_cell(cell, group3_fill, def_font, border_thick)
cell.font = Font(color = 'FFA500') #Font color needs to be set
if row[type_row].value in group4:
format_cell(cell, group4_fill, bold_font, border_thick)
if row[type_row].value in group5:
format_cell(cell, group5_fill, bold_font, border_thick)
if row[type_row].value in group6:
format_cell(cell, group6_fill, def_font, border_thick)
cell.font = Font(color = '000000') #Font color needs to be set
if row[type_row].value in group7: #special case as 'DIAGNOSTIC' has no fill
cell.border = Border(top=thick, left=thick, right=thick, bottom=thick)
cell.font = Font(bold=True)
if row[type_row].value not in [group1, group2, group3, group4, group5, group6, group7]:
if row[col_for_step].data_type == "s":
if 'START:' in row[col_for_step].value:
if row[col_for_other].data_type == "s":
if row[col_for_other].value == 'PASS':
format_cell(cell, pass_start_fill, bold_font, border_thick)
if row[col_for_other].value == 'FAIL':
format_cell(cell, fail_start_fill, bold_font, border_thick)
if row[col_for_other].value == 'ERROR':
format_cell(cell, error_start_fill, bold_font, border_thick)
if 'END:' in row[col_for_step].value:
if row[col_for_other].data_type == "s":
if row[col_for_other].value == 'PASS':
format_cell(cell, pass_end_fill, bold_font, border_thick)
if row[col_for_other].value == 'FAIL':
format_cell(cell, fail_end_fill, bold_font, border_thick)
if row[col_for_other].value == 'ERROR':
format_cell(cell, error_end_fill, bold_font, border_thick)
Any direction or suggestions would be amazing. Thanks
Solution 1:[1]
VBA isn't known for its speed either. It looks to me like there will be a problem processing the same cell multiple times but the structure is unclear. Dispatching would help here a bit. It looks like you only touch two cells per row, if that's the case I'd use those as min_col
and max_col
.
This would give you something like this.
conditions = {"PASS": {"start": pass_start_fill, "end": pass_end_fill}, "FAIL": {"start": fail_start_fill, "end", fail_end_fill}
for row in ws.iter_rows(min_col=col_for_step, max_col=col_for_other):
step, other = row[0], row[-1]
if step.value == "START":
format_cell(other, fill=conditions[other.value]["start"], font=bold_font, border=border_thick])
elif step.value == "END":
format_cell(other, fill=conditions[other.value]["end"], font=bold_font, border=border_thick])
This needs extending but if you can simplify the code, it should be easier to identify bottlenecks.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Charlie Clark |