String split with expand=True. Can anyone explain what it means?
all_data['Title']= all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)[0]
Can anyone explain what this line of code means? Especially the expand=True, [1], and [0] parts.
Solution 1:[1]
Take a look at the documentation for pandas.Series.str.split. For the expand parameter (bool, default False) it says:
Expand the split strings into separate columns.
If True, return DataFrame/MultiIndex expanding dimensionality.
If False, return Series/Index, containing lists of strings.
import pandas as pd
s = pd.Series(["this is a regular sentence"])
s.str.split(expand=True)

| | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | this | is | a | regular | sentence |
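To make the difference in return type concrete, here is a minimal sketch using the same one-element Series as above. The variable names as_lists and as_frame are only illustrative.

```python
import pandas as pd

s = pd.Series(["this is a regular sentence"])

# Default (expand=False): a Series whose single element is a list of pieces
as_lists = s.str.split()
print(type(as_lists).__name__)   # Series
print(as_lists[0])               # ['this', 'is', 'a', 'regular', 'sentence']

# expand=True: a DataFrame with one column per piece
as_frame = s.str.split(expand=True)
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 5)

# The columns are labeled with integers, so [1] selects the second piece as a Series
print(as_frame[1][0])            # is
```

This is why [1] and [0] in the question's line work: they are ordinary column selections on the DataFrame that expand=True produces.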
Solution 2:[2]
If you are using pandas, you probably also know Jupyter Notebooks. So, for simplicity and readability, let's complete the code you've posted with some sample data, in a notebook-like format:
```lang-python
import pandas as pd

raw_name = ['Bob, Mr. Ross', 'Alice, Mrs. Algae', 'Larry, Mr. Lemon', 'John, Mr. Johnson']
all_data = pd.DataFrame({'Name': raw_name})

# This is the OP's line: keep the text after ', ', then the text before '.'
all_data['Title'] = all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)[0]
all_data
```
| | Name | Title |
---|---|---|
0 | Bob, Mr. Ross | Mr |
1 | Alice, Mrs. Algae | Mrs |
2 | Larry, Mr. Lemon | Mr |
3 | John, Mr. Johnson | Mr |
Here expand=True makes str.split return a DataFrame with one column of strings per split piece. Selecting column [1] gives a Series of strings, so you can call str.split on it again. This would have been a little more complicated with a regular split (or expand=False), which returns a Series of lists.
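For comparison, here is a rough sketch of the same extraction without expand, assuming the sample names used above. When each cell holds a list, you index into it with the .str[...] accessor instead of selecting a DataFrame column. The names `names`, `title_expand` and `title_no_expand` are illustrative and not part of the original code.

```python
import pandas as pd

names = pd.Series(['Bob, Mr. Ross', 'Alice, Mrs. Algae',
                   'Larry, Mr. Lemon', 'John, Mr. Johnson'])

# expand=True: [1] and [0] select DataFrame columns
title_expand = names.str.split(', ', expand=True)[1].str.split('.', expand=True)[0]

# expand=False (the default): .str[1] and .str[0] index into each list
title_no_expand = names.str.split(', ').str[1].str.split('.').str[0]

print(title_no_expand.tolist())                            # ['Mr', 'Mrs', 'Mr', 'Mr']
print(title_expand.tolist() == title_no_expand.tolist())   # True
```

Both routes give the same titles here; the expand=True version simply keeps every intermediate result in tabular form.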
Better explained with code examples:
all_data['Name'].str.split(', ', expand=False) # or no expand at all
| | 0 |
---|---|
0 | [Bob, Mr. Ross] |
1 | [Alice, Mrs. Algae] |
2 | [Larry, Mr. Lemon] |
3 | [John, Mr. Johnson] |
all_data['Name'].str.split(', ', expand=True)
| | 0 | 1 |
---|---|---|
0 | Bob | Mr. Ross |
1 | Alice | Mrs. Algae |
2 | Larry | Mr. Lemon |
3 | John | Mr. Johnson |
all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=False)
| | 0 |
---|---|
0 | [Mr, Ross] |
1 | [Mrs, Algae] |
2 | [Mr, Lemon] |
3 | [Mr, Johnson] |
all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)
| | 0 | 1 |
---|---|---|
0 | Mr | Ross |
1 | Mrs | Algae |
2 | Mr | Lemon |
3 | Mr | Johnson |
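Putting the steps together, the one-liner can be unrolled into named intermediate results. This is only a sketch on the same sample data; first_split, rest and second_split are hypothetical variable names introduced for illustration.

```python
import pandas as pd

all_data = pd.DataFrame({'Name': ['Bob, Mr. Ross', 'Alice, Mrs. Algae',
                                  'Larry, Mr. Lemon', 'John, Mr. Johnson']})

# Step 1: split on ', ' -> DataFrame with columns 0 (first name) and 1 (the rest)
first_split = all_data['Name'].str.split(', ', expand=True)

# Step 2: [1] picks column 1, a Series with values such as 'Mr. Ross'
rest = first_split[1]

# Step 3: split on '.' -> DataFrame with columns 0 (title) and 1 (surname)
second_split = rest.str.split('.', expand=True)

# Step 4: [0] picks column 0, the title
all_data['Title'] = second_split[0]
print(all_data['Title'].tolist())             # ['Mr', 'Mrs', 'Mr', 'Mr']

# The surname column keeps the space that followed the '.'; strip it if needed
print(repr(second_split[1][0]))               # ' Ross'
print(second_split[1].str.strip().tolist())   # ['Ross', 'Algae', 'Lemon', 'Johnson']
```

Note that the rendered tables hide the leading space in the second column; it only matters if you go on to use that column.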
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | wp78de |
Solution 2 | Alejandro QA |