fread does not read character vector

I am trying to download and parse the SEC EDGAR master index in R with the following code:

name <- paste0("https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx")
master <- readLines(url(name))
master <- master[grep("SC 13(D|G)", master)]
master <- gsub("#", "", master)
master_table <- fread(textConnection(master), sep = "|")

The final line fails with the error below. I verified that textConnection() works as expected (I can read from it with readLines()), but fread() raises an error, and read.table() runs into the same problem.

Error in fread(textConnection(master), sep = "|") :  input= must be a single character string containing a file name, a system command containing at least one space, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or, the input data itself containing at least one \n or \r

What am I doing wrong?



Solution 1:[1]

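1) The error occurs because fread() does not accept a connection object such as textConnection(). As the message says, input= must be a single character string: a file name, a system command, a URL, or the data itself containing at least one newline. Collapsing the lines into a single string therefore removes the need for textConnection(). A few other simplifications: in the first line we don't need paste0(), and in the next line we don't need url(...), because readLines() accepts a URL directly. We can also omit the gsub() if we specify na.strings in fread. To illustrate the example in less time, the input is limited to the first 1000 lines.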

library(data.table)

name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
master <- readLines(name, 1000)
master <- master[grep("SC 13(D|G)", master)]
master <- paste(master, collapse = "\n")
master_table <- fread(master, sep = "|", na.strings = "")
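To try this approach without downloading the index, here is a minimal offline sketch. The sample lines are made up and only mimic the pipe-delimited layout of EDGAR's master.idx (CIK|Company Name|Form Type|Date Filed|Filename). Two small additions are assumptions of mine rather than part of the answer: header = FALSE, because the filter also drops the header line, and a trailing "\n", so the collapsed string still counts as data when only one line matches (without any newline, fread() would try to interpret it as a file name or command).

```r
library(data.table)

# Made-up lines that mimic the layout of EDGAR's master.idx:
# CIK|Company Name|Form Type|Date Filed|Filename
master <- c(
  "CIK|Company Name|Form Type|Date Filed|Filename",
  "1000045|EXAMPLE CORP A|SC 13G|2016-02-12|edgar/data/1000045/a.txt",
  "1000097|EXAMPLE CORP B|10-K|2016-03-01|edgar/data/1000097/b.txt",
  "1000180|EXAMPLE CORP C|SC 13D|2016-01-20|edgar/data/1000180/c.txt"
)

# Keep only Schedule 13D/13G filings (this also drops the header line).
master <- master[grep("SC 13(D|G)", master)]

# Collapse to one string; the trailing "\n" keeps it recognisable as data
# even if only a single line survived the filter.
input <- paste0(paste(master, collapse = "\n"), "\n")

master_table <- fread(input, sep = "|", header = FALSE, na.strings = "")
print(master_table)
```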

2) A second approach, which may be faster, is to download the file first and then have fread() read the output of a command that keeps only the matching lines:

name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
download.file(name, "master.txt")
master_table <- fread('findstr /R /C:"SC 13[DG]" master.txt', sep = "|", na.strings = "")

The above is for Windows. On Linux or macOS, where grep is available, replace the last line with:

master_table <- fread("grep 'SC 13[DG]' master.txt", sep = "|", na.strings = "")
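A self-contained sketch of the download-then-filter idea, assuming a POSIX shell with grep (so not the Windows findstr variant). Instead of downloading from sec.gov it writes a few made-up index lines to a temporary file, passes the command through fread()'s cmd= argument, and uses shQuote() so that a path containing spaces still reaches grep intact.

```r
library(data.table)

# Stand-in for the downloaded master.idx: a temporary file with made-up lines.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "CIK|Company Name|Form Type|Date Filed|Filename",
  "1000045|EXAMPLE CORP A|SC 13G|2016-02-12|edgar/data/1000045/a.txt",
  "1000097|EXAMPLE CORP B|10-K|2016-03-01|edgar/data/1000097/b.txt",
  "1000180|EXAMPLE CORP C|SC 13D|2016-01-20|edgar/data/1000180/c.txt"
), path)

# grep filters the lines before fread() parses anything;
# shQuote() protects the file path inside the shell command.
cmd <- paste("grep 'SC 13[DG]'", shQuote(path))
master_table <- fread(cmd = cmd, sep = "|", header = FALSE, na.strings = "")
print(master_table)

unlink(path)  # remove the temporary file
```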

Solution 2:[2]

I'm not quite sure of the broader context, in particular whether you need to use fread(), but

s <- scan(text=master, sep="|", what=character())

works well and is fast (0.1 seconds).
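scan() returns every field in one flat character vector, so the table still has to be rebuilt. A minimal base-R sketch, assuming exactly five fields per line as in master.idx; the sample lines are made up. quote = "" is my addition: it turns off quote processing, since the index fields are not quoted and company names can contain apostrophes.

```r
# Made-up lines in the master.idx layout, already filtered to SC 13D/13G.
master <- c(
  "1000045|EXAMPLE CORP A|SC 13G|2016-02-12|edgar/data/1000045/a.txt",
  "1000180|O'NEIL HOLDINGS|SC 13D|2016-01-20|edgar/data/1000180/c.txt"
)

# All fields end up in one flat character vector.
s <- scan(text = master, sep = "|", what = character(), quote = "", quiet = TRUE)

# Five fields per line: refill row by row to get the table back.
m <- matrix(s, ncol = 5, byrow = TRUE,
            dimnames = list(NULL, c("CIK", "Company", "Form", "Date", "File")))
df <- as.data.frame(m, stringsAsFactors = FALSE)
print(df)
```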

Solution 3:[3]

I would like to add the solution that was eventually implemented in fread (see https://github.com/Rdatatable/data.table/issues/1423): the data can now be passed directly through the text= argument. Perhaps this saves others a bit of time.

So the solution becomes simpler:

library(data.table)

name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
master <- readLines(name, 1000)
master <- master[grep("SC 13(D|G)", master)]
master <- paste(master, collapse = "\n")
master_table <- fread(text = master, sep = "|")
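Because the filter also removes the header line, fread() falls back to generic column names (V1 to V5). A small sketch that supplies names via col.names; the names are my own choice and the sample lines are made up. It also relies on text= accepting a character vector of lines (as returned by readLines()), which the data.table documentation describes, so the paste() step can be skipped.

```r
library(data.table)

# Made-up lines in the master.idx layout.
master <- c(
  "1000045|EXAMPLE CORP A|SC 13G|2016-02-12|edgar/data/1000045/a.txt",
  "1000097|EXAMPLE CORP B|10-K|2016-03-01|edgar/data/1000097/b.txt",
  "1000180|EXAMPLE CORP C|SC 13D|2016-01-20|edgar/data/1000180/c.txt"
)
master <- master[grep("SC 13(D|G)", master)]

# text= takes the lines directly; header = FALSE because the header line is gone,
# and col.names supplies readable (self-chosen) column names.
master_table <- fread(text = master, sep = "|", header = FALSE,
                      col.names = c("CIK", "CompanyName", "FormType",
                                    "DateFiled", "Filename"))
print(master_table)
```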

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 hannes101