How can I split a document path into the folder name and the document name in Python?
I need to split a document path into the folder name and the document name in Python. The dataframe is large, with many rows. For a path that has no document name, the document name column should be left blank in the result. For example, I have a dataframe like the following:
no filename
1 \\apple\config.csv
2 \\apple\fox.pdf
3 \\orange\cat.xls
4 \\banana\eggplant.pdf
5 \\lucy
...
I expect the following output:
foldername documentname
\\apple config.csv
\\apple fox.pdf
\\orange cat.xls
\\banana eggplant.pdf
\\lucy
...
I have tried the following code, but it does not work.
y={'Foldername':[],'Docname':[]}
def splitnames(x):
    if "." in x:
        docname=os.path.basename(x)
        rm="\\"+docname
        newur=x.replace(rm,'')
    else:
        newur=x
        docname=""
    result=[newur,docname]
    y["Foldername"].append(result[0])
    y["Docname"].append(result[1])
    return y;
dff=df$filename.apply(splitnames)
Thank you so much for the help!!
Solution 1:[1]
Not sure how you're getting the paths, but you could create pathlib Path objects and use their attributes (parent, name, suffix) to grab the file name and folder name:
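from io import StringIO
from pathlib import Path

import numpy as np
import pandas as pd

# In a regular (non-raw) string "\\" is one backslash, so each path starts with a single "\"
data = """ no filename
1 \\apple\\config.csv
2 \\apple\\fox.pdf
3 \\orange\\cat.xls
4 \\banana\\eggplant.pdf
5 \\lucy"""
df = pd.read_csv(StringIO(data), sep=r'\s+')
# Path treats "\" as a separator only on Windows; elsewhere use pathlib.PureWindowsPath
df['filename'] = df['filename'].apply(Path)
# A non-empty suffix (e.g. ".csv") marks the last component as a document
df['folder'] = df['filename'].apply(lambda x : x.parent if '.' in x.suffix else x)
df['document_name'] = df['filename'].apply(lambda x : x.name if '.' in x.suffix else np.nan)
print(df)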
no filename folder document_name
0 1 \apple\config.csv \apple config.csv
1 2 \apple\fox.pdf \apple fox.pdf
2 3 \orange\cat.xls \orange cat.xls
3 4 \banana\eggplant.pdf \banana eggplant.pdf
4 5 \lucy \lucy NaN
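The question shows paths with two leading backslashes and asks for a blank document name rather than NaN. A minimal sketch that works directly on the strings, so that it does not depend on how a path library interprets the leading double backslash, could look like this. The helper name split_folder_document and the rule "a last component containing a dot is a document" are assumptions made for illustration, not part of the answer above.

```python
import pandas as pd


def split_folder_document(path):
    """Split a backslash-separated path string into (folder, document)."""
    head, sep, tail = path.rpartition("\\")
    # Assumption: the last component is a document only if it contains a dot
    # and something other than the leading backslashes precedes it.
    if sep and "." in tail and head.strip("\\"):
        return head, tail
    return path, ""


# Sample rows from the question; raw strings keep both backslashes literal.
df = pd.DataFrame({"filename": [r"\\apple\config.csv", r"\\apple\fox.pdf",
                                r"\\orange\cat.xls", r"\\banana\eggplant.pdf",
                                r"\\lucy"]})

pairs = [split_folder_document(p) for p in df["filename"]]
df["foldername"] = [folder for folder, _ in pairs]
df["documentname"] = [document for _, document in pairs]
print(df)
```

Rows without a document, such as \\lucy, keep the whole path as the folder and get an empty string in documentname.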
Solution 2:[2]
You can call apply twice, once for each new column:
import pandas as pd
filenames = [r'\\apple\config.csv', r'\\apple\fox.pdf', r'\\orange\cat.xls', r'\\banana\eggplant.pdf']
df = pd.DataFrame( { 'filename':filenames })
df['Foldername'] = df['filename'].apply( lambda x : r'\\' + x.split('\\')[-2] )
df['Docname'] = df['filename'].apply( lambda x : x.split('\\')[-1] )
By default, the function passed to apply is expected to return a single value per element, so each new column gets its own call. It is also worth stating explicitly which column the function is applied to.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
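The sample list in this solution leaves out the \\lucy row, for which x.split('\\')[-2] is an empty string. One possible vectorized alternative is sketched here with Series.str.extract and named groups. The pattern is illustrative only and assumes that a document name is a final component containing a dot.

```python
import pandas as pd

df = pd.DataFrame({"filename": [r"\\apple\config.csv", r"\\orange\cat.xls", r"\\lucy"]})

# Foldername: everything before the last backslash, or the whole string when
# no document follows. Docname: an optional final component containing a dot.
pattern = r"^(?P<Foldername>.*?)(?:\\(?P<Docname>[^\\]*\.[^\\]*))?$"

parts = df["filename"].str.extract(pattern)
# Rows without a document leave the Docname group unmatched; blank them out.
parts["Docname"] = parts["Docname"].fillna("")
df = df.join(parts)
print(df)
```

Because the named groups become column names, the two columns are produced in a single pass over the data.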
Solution 3:[3]
An extension of Umar.H's suggestion is to use split from the os library:
import os
df['Docname'] = df['filename'].apply(lambda x : os.path.split(x)[1])
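os.path.split follows the path rules of the operating system it runs on, so backslashes are only treated as separators on Windows. As a rough sketch, the standard-library ntpath module applies the Windows rules on any platform and returns both parts at once. The helper name split_with_ntpath and the dot check are assumptions for illustration. The sample uses the single-backslash form printed in Solution 1's output, since strings with two literal leading backslashes may be interpreted as a UNC share prefix and split differently.

```python
import ntpath

import pandas as pd

df = pd.DataFrame({"filename": [r"\apple\config.csv", r"\orange\cat.xls", r"\lucy"]})


def split_with_ntpath(path):
    """Return (folder, document) using Windows path rules on any OS."""
    folder, last = ntpath.split(path)
    # Assumption: a last component without a dot is a folder, not a document.
    if "." in last:
        return folder, last
    return path, ""


pairs = [split_with_ntpath(p) for p in df["filename"]]
df["Foldername"] = [folder for folder, _ in pairs]
df["Docname"] = [document for _, document in pairs]
print(df)
```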
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | RunTheGauntlet |
| Solution 3 | rpb |