How can I split a document path into the folder name and the document name in Python?

I need to split document paths into a folder name and a document name in Python. The DataFrame is large, with many rows. When a path has no document name after the folder, the document name column should simply be left blank in the result. For example, I have a DataFrame like the following:

     no  filename
     1  \\apple\config.csv
     2  \\apple\fox.pdf
     3  \\orange\cat.xls
     4  \\banana\eggplant.pdf
     5  \\lucy
...

I expect the output shown as follows:

    foldername  documentname
    \\apple     config.csv
    \\apple     fox.pdf
    \\orange    cat.xls
    \\banana    eggplant.pdf
    \\lucy 
...

I have tried the following code, but it does not work.


    y={'Foldername':[],'Docname':[]}
    def splitnames(x):
        if "." in x:
            docname=os.path.basename(x)
            rm="\\"+docname
            newur=x.replace(rm,'')
        else:
            newur=x
            docname=""
        result=[newur,docname]
        y["Foldername"].append(result[0])
        y["Docname"].append(result[1])
        return y;

    dff=df$filename.apply(splitnames)

Thank you so much for the help!!

python pandas url

Solution 1:^[1]

I'm not sure how you're getting the paths, but you could create pathlib path objects and use their parent, name and suffix attributes to grab the folder name and the file name.

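    from io import StringIO
    from pathlib import PureWindowsPath

    import numpy as np
    import pandas as pd

    # Non-raw string: each "\\" below is a single backslash in the data.
    data = """ no  filename
         1  \\apple\\config.csv
         2  \\apple\\fox.pdf
         3  \\orange\\cat.xls
         4  \\banana\\eggplant.pdf
         5  \\lucy"""

    df = pd.read_csv(StringIO(data), sep=r'\s+')
    # PureWindowsPath splits on backslashes on any OS; plain Path only does so on Windows.
    df['filename'] = df['filename'].apply(PureWindowsPath)

    # A path whose last component has no suffix is treated as a folder.
    df['folder'] = df['filename'].apply(lambda x: x.parent if x.suffix else x)
    df['document_name'] = df['filename'].apply(lambda x: x.name if x.suffix else np.nan)

    print(df)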

       no              filename   folder document_name
    0   1     \apple\config.csv   \apple    config.csv
    1   2        \apple\fox.pdf   \apple       fox.pdf
    2   3       \orange\cat.xls  \orange       cat.xls
    3   4  \banana\eggplant.pdf  \banana  eggplant.pdf
    4   5                 \lucy    \lucy           NaN

Solution 2:^[2]

You could call apply twice, once for each new column:

    import pandas as pd

    filenames = [r'\\apple\config.csv', r'\\apple\fox.pdf', r'\\orange\cat.xls', r'\\banana\eggplant.pdf']
    df = pd.DataFrame({'filename': filenames})
    df['Foldername'] = df['filename'].apply(lambda x: r'\\' + x.split('\\')[-2])
    df['Docname'] = df['filename'].apply(lambda x: x.split('\\')[-1])

By default, the function passed to apply is expected to return a single value per element, so it is worth applying it to the specific column you want to transform and assigning each result to its own column.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
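
The sample list in this answer leaves out the \\lucy row from the question. With that row included, the [-1] lookup would return lucy as the document name rather than a blank. A minimal sketch of one way to cover that case in a single pass is shown below. The helper name split_path is hypothetical, and the rule "the last component is a document only if it contains a dot" is an assumption taken from the question's own attempt.

```python
import pandas as pd

filenames = [r'\\apple\config.csv', r'\\apple\fox.pdf', r'\\orange\cat.xls',
             r'\\banana\eggplant.pdf', r'\\lucy']
df = pd.DataFrame({'filename': filenames})


def split_path(path):
    """Hypothetical helper: return (folder, document) for one path string."""
    # Split once at the last backslash.
    folder, _, last = path.rpartition('\\')
    # Assumption: a last component containing a dot is a document name.
    if '.' in last:
        return folder, last
    # Otherwise the whole path is a folder and the document name is blank.
    return path, ''


# map() yields one (folder, document) tuple per row; zip(*...) transposes
# them into two sequences, one per new column.
df['Foldername'], df['Docname'] = zip(*df['filename'].map(split_path))
print(df)
```

Returning a plain tuple keeps the per-row work small, which may matter for a large DataFrame.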

Solution 3:^[3]

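Extending Umar.H's suggestion, you can use split from the os.path module, which returns a (head, tail) pair. Note that os.path follows the host operating system's path rules, so backslash-separated paths are only split when this runs on Windows:

    import os

    df['Docname'] = df['filename'].apply(lambda x: os.path.split(x)[1])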
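
The answers above all call apply row by row. Since the question mentions a large DataFrame, a vectorized variant may be worth trying. The sketch below uses pandas' Series.str.extract with a regular expression. The pattern and the column names are assumptions for illustration, and it again treats the last component as a document only when it contains a dot.

```python
import pandas as pd

df = pd.DataFrame({'filename': [r'\\apple\config.csv', r'\\apple\fox.pdf',
                                r'\\orange\cat.xls', r'\\banana\eggplant.pdf',
                                r'\\lucy']})

# Foldername: everything before the final backslash-separated component.
# Docname: the final component, captured only when it contains a dot.
# If the optional group does not match, Foldername takes the whole path.
pattern = r'^(?P<Foldername>.*?)(?:\\(?P<Docname>[^\\]*\.[^\\]*))?$'

parts = df['filename'].str.extract(pattern)
# Rows without a document come back as NaN; the question asks for a blank.
parts['Docname'] = parts['Docname'].fillna('')

df = df.join(parts)
print(df)
```

Because the matching is purely string-based, the result does not depend on the operating system the code runs on.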

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2	RunTheGauntlet
Solution 3	rpb
