'get dataframe row count based on conditions
I want to get the count of dataframe rows based on conditional selection. I tried the following code.
print df[(df.IP == head.idxmax()) & (df.Method == 'HEAD') & (df.Referrer == '"-"')].count()
output:
IP 57
Time 57
Method 57
Resource 57
Status 57
Bytes 57
Referrer 57
Agent 57
dtype: int64
The output shows the count for each an every column in the dataframe. Instead I need to get a single count where all of the above conditions satisfied? How to do this? If you need more explanation about my dataframe please let me know.
Solution 1:[1]
You are asking for the condition where all the conditions are true, so len of the frame is the answer, unless I misunderstand what you are asking
In [17]: df = DataFrame(randn(20,4),columns=list('ABCD'))
In [18]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)]
Out[18]:
A B C D
12 0.491683 0.137766 0.859753 -1.041487
13 0.376200 0.575667 1.534179 1.247358
14 0.428739 1.539973 1.057848 -1.254489
In [19]: df[(df['A']>0) & (df['B']>0) & (df['C']>0)].count()
Out[19]:
A 3
B 3
C 3
D 3
dtype: int64
In [20]: len(df[(df['A']>0) & (df['B']>0) & (df['C']>0)])
Out[20]: 3
Solution 2:[2]
For increased performance you should not evaluate the dataframe using your predicate. You can just use the outcome of your predicate directly as illustrated below:
In [1]: import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(20,4),columns=list('ABCD'))
In [2]: df.head()
Out[2]:
A B C D
0 -2.019868 1.227246 -0.489257 0.149053
1 0.223285 -0.087784 -0.053048 -0.108584
2 -0.140556 -0.299735 -1.765956 0.517803
3 -0.589489 0.400487 0.107856 0.194890
4 1.309088 -0.596996 -0.623519 0.020400
In [3]: %time sum((df['A']>0) & (df['B']>0))
CPU times: user 1.11 ms, sys: 53 µs, total: 1.16 ms
Wall time: 1.12 ms
Out[3]: 4
In [4]: %time len(df[(df['A']>0) & (df['B']>0)])
CPU times: user 1.38 ms, sys: 78 µs, total: 1.46 ms
Wall time: 1.42 ms
Out[4]: 4
Keep in mind that this technique only works for counting the number of rows that comply with your predicate.
Solution 3:[3]
In Pandas, I like to use the shape
attribute to get number of rows.
df[df.A > 0].shape[0]
gives the number of rows matching the condition A > 0
, as desired.
Solution 4:[4]
You can use the method query
and get the shape
of the resulting dataframe. For example:
A B C
0 1 1 x
1 2 2 y
2 3 3 z
df.query("A == 2 & B > 1 & C != 'z'").shape[0]
Output:
1
Solution 5:[5]
import pandas as pd
data = {'title': ['Manager', 'Technical Analyst', 'Software Engineer', 'Sales Manager'], 'Description': [
'''a man or woman who controls an organization or part of an organization,a person who looks after the business affairs of a singer, actor, etc''',
'''Technical analysts, also known as chartists or technicians, employ technical analysis in their trading and research. Technical analysis looks for price patterns and trends based on historical performance to identify signals based on market sentiment and psychology.''',
'''A software engineer is a person who applies the principles of software engineering to design, develop, maintain, test, and evaluate computer software. The term programmer is sometimes used as a synonym, but may also lack connotations of engineering education or skills.''',
'''A sales manager is someone who leads and supervises sales agents and runs the day-to-day sales operations of a business. They oversee the sales strategy, set sales goals, and track sales performance'''
]}
df = pd.DataFrame(data)
data2 = {'title': ['Manager', 'Technical Analyst', 'Software Engineer', 'Sales Manager'], 'Keywords': [
['organization','business','people','arrange']
,['technicians','analysis','research','business']
,['engineering', 'design', 'develop', 'maintain']
,['supervises', 'agents','business','performance','target']
]}
df2 = pd.DataFrame(data2)
print(df2)
df2=df2.explode('Keywords')
print(df2)
print("checking df3")
df3=df.merge(df2,how='left',on='title')
print(df3)
df3['match'] = df3.apply(lambda x: x.Keywords in x.Description, axis=1)
print(df3)
df4=df3.loc[df3['match']==True].groupby(['Description']).count()
print(df4)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Jeff |
Solution 2 | Enias Cailliau |
Solution 3 | flow2k |
Solution 4 | |
Solution 5 | Giridhar |