How to subset a Pandas DataFrame using an OR operator whilst avoiding "FutureWarning: elementwise comparison failed;"

I have a Pandas dataframe (tempDF) of 5 columns by N rows. Each element of the dataframe is an object (string in this case). For example, the dataframe looks like (this is fake data - not real world):

[Image of the example dataframe; not reproduced here]

I have two tuples, each containing a collection of numbers stored as strings. For example:

codeset = ('6108','532','98120')
additionalClinicalCodes = ('131','1','120','130')

I want to retrieve the subset of rows from tempDF in which the column "medcode" OR the column "enttype" contains a value from the corresponding tuple above. Thus, from the example above, I would retrieve a subset containing the rows with index 8, 9 and 11.

Until I updated some packages earlier today (too many to work out which one started throwing the warning), this worked:

tempDF = tempDF[tempDF["medcode"].isin(codeset) | tempDF["enttype"].isin(additionalClinicalCodes)]

But now it is throwing the warning:

FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask |= (ar1 == a)

Looking at the API, isin states that the condition is "if ALL" is in the iterable collection. I want an "if ANY" condition.

UPDATE #1

The problem lies with using the | operator, and also with the np.logical_or method. If I remove the second isin condition, i.e. just keep tempDF[tempDF["medcode"].isin(codeset)], then no warning is thrown, but I'm only subsetting on one of the two possible conditions.



Solution 1:[1]

import numpy as np
tempDF = tempDF[np.logical_or(tempDF["medcode"].isin(codeset), tempDF["enttype"].isin(additionalClinicalCodes))]

Solution 2:[2]

I'm unable to reproduce your warning (I assume you are using an outdated numpy version). However, I believe it is related to the fact that your enttype column is a numerical type, while you're using strings in additionalClinicalCodes.
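A minimal sketch of how that mismatch could be checked and worked around. The frame below is hypothetical: it assumes enttype was read as integers, which the question does not confirm (it states all columns are strings). The idea is to inspect the dtypes and then cast the column to str before calling isin, so both sides of the comparison have the same type.

```python
import pandas as pd

# Hypothetical data: medcode holds strings, enttype holds integers.
tempDF = pd.DataFrame({
    "medcode": ["6108", "6154", "95744", "98120"],
    "enttype": [99, 131, 372, 372],
})

codeset = ("6108", "532", "98120")
additionalClinicalCodes = ("131", "1", "120", "130")

# Inspect the column types first; a numeric dtype here would signal the mismatch.
print(tempDF.dtypes)

# Cast enttype to str so it is compared against values of the same type.
mask = (
    tempDF["medcode"].isin(codeset)
    | tempDF["enttype"].astype(str).isin(additionalClinicalCodes)
)
newDF = tempDF[mask]
print(newDF)
```

If the dtypes already show object for both columns, the cast is harmless but unnecessary, and the cause of the warning is probably elsewhere.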

Solution 3:[3]

Try this:

tempDF = tempDF[tempDF["medcode"].isin(list(codeset)) | tempDF["enttype"].isin(list(additionalClinicalCodes))]

Solution 4:[4]

Boiling your question down to an executable example:

import pandas as pd

tempDF = pd.DataFrame({'medcode': ['6108', '6154', '95744', '98120'], 'enttype': ['99', '131', '372', '372']})

codeset = ('6108','532','98120')
additionalClinicalCodes = ('131','1','120','130')

newDF = tempDF[tempDF["medcode"].isin(codeset) | tempDF["enttype"].isin(additionalClinicalCodes)]
print(newDF)
print("Pandas Version")
print(pd.__version__)

This returns for me

  medcode enttype
0    6108      99
1    6154     131
3   98120     372
Pandas Version
1.4.2

Thus I am not able to reproduce your warning.

Solution 5:[5]

This is strange numpy behaviour. I think your way is the right way to do this, but if the warning bothers you, try this:

tempDF = tempDF[
    (
        tempDF.medcode.isin(codeset).astype(int) +
        tempDF.enttype.isin(additionalClinicalCodes).astype(int)
    ) >= 1
]
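Another way to avoid combining two masks by hand, sketched here as an alternative to the integer-sum approach above and not as part of the original answer: DataFrame.isin accepts a dict mapping column names to allowed values, and any(axis=1) then keeps a row if at least one column matched. The sample data is borrowed from Solution 4 and is only an assumption about what the real frame looks like.

```python
import pandas as pd

# Sample data reused from Solution 4 (illustrative only).
tempDF = pd.DataFrame({
    "medcode": ["6108", "6154", "95744", "98120"],
    "enttype": ["99", "131", "372", "372"],
})
codeset = ("6108", "532", "98120")
additionalClinicalCodes = ("131", "1", "120", "130")

# Per-column membership test; each column is checked against its own list.
hits = tempDF.isin({
    "medcode": list(codeset),
    "enttype": list(additionalClinicalCodes),
})

# Keep a row if ANY of its columns matched.
newDF = tempDF[hits.any(axis=1)]
print(newDF)
```

Whether this avoids the warning in the asker's environment is untested; it simply expresses the same "medcode OR enttype" condition in one call.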

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 safay
Solution 2 Xnot
Solution 3 Alex
Solution 4 jugi
Solution 5 Andrey Naradzetski