String split with expand=True: can anyone explain what it means?

all_data['Title']= all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)[0]

Can anyone explain what this line of code means, especially the expand=True and the [1] and [0]?



Solution 1:[1]

Take a look at the documentation for pandas.Series.str.split. It describes the expand parameter (bool, default False) as follows:

Expand the split strings into separate columns.

If True, return DataFrame/MultiIndex expanding dimensionality.

If False, return Series/Index, containing lists of strings.

    import pandas as pd

    s = pd.Series(
        [
            "this is a regular sentence",
        ]
    )
    s.str.split(expand=True)

          0   1  2        3         4
    0  this  is  a  regular  sentence
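
To make the difference in return type concrete, here is a small sketch that splits the same Series both ways and inspects what comes back. The variable names `as_lists` and `as_frame` are illustrative and do not come from the documentation.

```python
import pandas as pd

s = pd.Series(["this is a regular sentence"])

# expand=False (the default): one list of strings per row, still a Series
as_lists = s.str.split(expand=False)
print(type(as_lists).__name__)   # Series
print(as_lists[0])               # ['this', 'is', 'a', 'regular', 'sentence']

# expand=True: one column per piece, so the result is a DataFrame
as_frame = s.str.split(expand=True)
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 5)

# [1] selects column 1 of the DataFrame, then [0] selects row label 0
print(as_frame[1][0])            # is
```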

Solution 2:[2]

If you are using pandas, you are probably also familiar with Jupyter Notebooks. So, for simplicity and readability, let's complete the code you posted with some additional information in a notebook-like format:

```python
import pandas as pd

raw_name = ['Bob, Mr. Ross', 'Alice, Mrs. Algae', 'Larry, Mr. Lemon', 'John, Mr. Johnson']
all_data = pd.DataFrame({'Name': raw_name})

# This is the OP's line
all_data['Title'] = all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)[0]

all_data
```

                    Name Title
    0      Bob, Mr. Ross    Mr
    1  Alice, Mrs. Algae   Mrs
    2   Larry, Mr. Lemon    Mr
    3  John, Mr. Johnson    Mr

With expand=True, the split returns a DataFrame whose columns hold the pieces as strings. Indexing that result with [1] therefore gives a Series of strings, so another str.split can be applied to it directly. With a regular split (or expand=False), the result is a Series of lists, which makes chaining a little less direct.
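
For comparison, the same titles can also be reached without expand by indexing into the lists through the .str accessor. This is only a sketch of one possible equivalent, using a shortened version of the sample names; `names`, `title_expand` and `title_lists` are names chosen here for illustration.

```python
import pandas as pd

names = pd.Series(['Bob, Mr. Ross', 'Alice, Mrs. Algae'])

# expand=True: [1] and [0] pick columns of the intermediate DataFrames
title_expand = names.str.split(', ', expand=True)[1].str.split('.', expand=True)[0]

# expand=False: each row holds a list, so .str[i] picks the i-th list element
title_lists = names.str.split(', ').str[1].str.split('.').str[0]

print(title_expand.tolist())  # ['Mr', 'Mrs']
print(title_lists.tolist())   # ['Mr', 'Mrs']
```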

Better explained with code examples:

    all_data['Name'].str.split(', ', expand=False)  # or no expand at all

                         0
    0      [Bob, Mr. Ross]
    1  [Alice, Mrs. Algae]
    2   [Larry, Mr. Lemon]
    3  [John, Mr. Johnson]

    all_data['Name'].str.split(', ', expand=True)

           0            1
    0    Bob     Mr. Ross
    1  Alice   Mrs. Algae
    2  Larry    Mr. Lemon
    3   John  Mr. Johnson

    all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=False)

                   0
    0     [Mr, Ross]
    1   [Mrs, Algae]
    2    [Mr, Lemon]
    3  [Mr, Johnson]

    all_data['Name'].str.split(', ', expand=True)[1].str.split('.', expand=True)

         0        1
    0   Mr     Ross
    1  Mrs    Algae
    2   Mr    Lemon
    3   Mr  Johnson
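
Putting the pieces together, the original one-liner can be unpacked into named steps. This shows that [1] and [0] select columns (by label) of the intermediate DataFrames, not elements of a list. The intermediate names `parts`, `rest` and `title_parts` are hypothetical, introduced only for this sketch, and the data is a shortened version of the sample above.

```python
import pandas as pd

all_data = pd.DataFrame({'Name': ['Bob, Mr. Ross', 'Alice, Mrs. Algae']})

# Step 1: split at ', ' -> DataFrame with columns 0 and 1
parts = all_data['Name'].str.split(', ', expand=True)

# Step 2: [1] selects column 1 (a Series of strings), e.g. 'Mr. Ross'
rest = parts[1]

# Step 3: split at '.' -> a DataFrame again, with the title in column 0
title_parts = rest.str.split('.', expand=True)

# Step 4: [0] selects column 0, e.g. 'Mr'
all_data['Title'] = title_parts[0]

print(all_data['Title'].tolist())   # ['Mr', 'Mrs']
print(repr(title_parts[1][0]))      # ' Ross' (the space after the period is kept)
```

If the leading space in the second column matters, applying .str.strip() to that column removes it.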

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.split.html

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    wp78de
Solution 2    Alejandro QA