Remove HTML tags from a string

I have written code to download files from FTP sites. Here is the start of my code:

library(RCurl)  # provides getURL()

filenames <- getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE)
# Earlier attempt: filenames <- strsplit(filenames, "<!></><>")
# Strip anything that looks like an HTML tag, including tags that span lines
filenames <- gsub("<(.|\n)*?>", "", filenames)
filenames <- unlist(filenames)
filenames

and I get output like this:

"\n\n \n Index of /data\n \n \nIndex of /data\nNameLast modifiedSize\nParent Directory  - \n2D/24-Feb-2021 15:57 - \n3DRefl/13-Oct-2020 16:30 - \n3DRhoHV/13-Oct-2020 16:30 - \n3DZdr/13-Oct-2020 16:30 - \nProbSevere/13-Oct-2020 16:30 - \nRIDGEII/11-Feb-2021 19:02 - \nheartbeat-50m11-Mar-2022 10:07
48M\nheartbeat-500m11-Mar-2022 10:07 477M\n\n\n\n"

Can anyone tell me how to remove the tags? I used the method below:

filenames <- gsub("<(.|\n)*?>","", filenames)



Solution 1:[1]

Using stringr, remove the newline characters that remain after the tags are stripped:

library(stringr)
str_remove_all(filenames, '\n')
[1] "  Index of /data  Index of /dataNameLast modifiedSizeParent Directory  - 2D/24-Feb-2021 15:57 - 3DRefl/13-Oct-2020 16:30 - 3DRhoHV/13-Oct-2020 16:30 - 3DZdr/13-Oct-2020 16:30 - ProbSevere/13-Oct-2020 16:30 - RIDGEII/11-Feb-2021 19:02 - heartbeat-50m11-Mar-2022 10:0748Mheartbeat-500m11-Mar-2022 10:07 477M"
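
If the goal is one entry per line rather than a single long string, a possible alternative is to split on the newlines instead of deleting them. The sketch below uses base R only. The `html` string is a made-up sample that mimics a directory listing, not the real server response, and the names `text` and `entries` are illustrative.

```r
# Hypothetical sample mimicking an HTML directory listing (assumption, not real data)
html <- "<html>\n<head><title>Index of /data</title></head>\n<body>\n<a href=\"2D/\">2D/</a>\n<a href=\"heartbeat-50m\">heartbeat-50m</a>\n</body></html>"

# Remove anything that looks like a tag; "[^>]*" also matches newlines inside a tag
text <- gsub("<[^>]*>", "", html)

# Split on newlines, trim surrounding whitespace, and drop empty entries
entries <- trimws(unlist(strsplit(text, "\n", fixed = TRUE)))
entries <- entries[nzchar(entries)]
entries
```

This keeps each listing row as a separate element of a character vector, which may be easier to filter afterwards than the concatenated string.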

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Nad Pat