'Conditional formatting on Duplicates using pandas
I have a dataFrame with 6 columns. And I want to do conditional formatting on two columns of them. So my dataFrame looks like this
And I want to highlight duplicate values from College and College_F2 columns. After which my dataframe will look like this
The code written for this looks like this:dataFrame_file = pd.read_excel(util.comcastFile2Path(), sheet_name='Sheet1')
def dummy_color(x):
c1 = 'background-color:red'
c2 = ''
cond = dataFrame_file.stack().duplicated(keep=False).unstack()
df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
return df1
dataFrame_file.style.apply(dummy_color,axis=None,subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)
And the error this code is giving to me is
ValueError: Shape of passed values is (6, 6), indices imply (6, 2)
IDE that I am using is PyCharm. How to resolve this issue?
Thanks in advance.
Solution 1:[1]
In solution is necessary working with all DataFrame, so omited subset
parameter and in cond
filtered columns for check duplicates and also added DataFrame.reindex
for filling False
to all another columns:
def dummy_color(x):
c1 = 'background-color:red'
c2 = ''
cond = (x[['College', 'College_F2']].stack()
.duplicated(keep=False)
.unstack()
.reindex(x.columns, axis=1, fill_value=False))
df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
return df1
dataFrame_file.style.apply(dummy_color,axis=None).to_excel(util.comcastFile2Path)
Like @anky_91 mentioned simplier is use subset parameter with x
for cond
variable, I think reason is x
variable are only columns filtered by subset
list:
def dummy_color(x):
c1 = 'background-color:red'
c2 = ''
cond = x.stack().duplicated(keep=False).unstack()
df1 = pd.DataFrame(np.where(cond, c1, c2), columns=x.columns, index=x.index)
return df1
dataFrame_file.style.apply(dummy_color,axis=None, subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)
Solution 2:[2]
def dummy_color(x):
color = 'red' if (len(dataFrame_file[dataFrame_file['College'] == x]) + len(dataFrame_file[dataFrame_file['College_F2'] == x])) > 1 else ''
return 'background-color: %s' % color
dataFrame_file.style.applymap(dummy_color, subset=['College', 'College_F2']).to_excel(util.comcastFile2Path)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | jlewkovich |