fread does not read character vector
I am trying to download a list using R with the following code:
name <- paste0("https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx")
master <- readLines(url(name))
master <- master[grep("SC 13(D|G)", master)]
master <- gsub("#", "", master)
master_table <- fread(textConnection(master), sep = "|")
The final line returns an error. I verified that textConnection works as expected and I could read from it using readLines, but fread returns an error. read.table runs into the same problem.
Error in fread(textConnection(master), sep = "|") : input= must be a single character string containing a file name, a system command containing at least one space, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or, the input data itself containing at least one \n or \r
What am I doing wrong?
Solution 1:[1]
1) In the first line we don't need paste. In the next line we don't need url(...). Also we have limited the input to 1000 lines to illustrate the example in less time. We can omit the gsub if we specify na.strings in fread. Also collapsing the input to a single string allows elimination of textConnection in fread.
library(data.table)
name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
master <- readLines(name, 1000)
master <- master[grep("SC 13(D|G)", master)]
master <- paste(master, collapse = "\n")
master_table <- fread(master, sep = "|", na.strings = "")
2) A second approach, which may be faster, is to download the file first and then fread it as shown.
name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
download.file(name, "master.txt")
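# /R /C: makes findstr treat "SC 13[DG]" as one regular expression; without /C: it splits the search string on the space
master_table <- fread('findstr /R /C:"SC 13[DG]" master.txt', sep = "|", na.strings = "")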
The above is for Windows. For Linux with bash replace the last line with:
master_table <- fread("grep 'SC 13[DG]' master.txt", sep = "|", na.strings = "")
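As far as I know, more recent data.table versions also have an explicit cmd argument for shell commands, which makes the intent clearer than passing the command as the first argument. Here is a minimal sketch of the grep variant, assuming a Unix-like system with grep on the PATH. The rows are made up for illustration and written to a temporary file that stands in for master.txt:

```r
library(data.table)

# Hypothetical rows in the pipe-delimited layout of master.idx
# (CIK|Company Name|Form Type|Date Filed|Filename); not real filings.
sample_lines <- c(
  "1000001|EXAMPLE CORP|SC 13D|2016-01-15|edgar/data/1000001/example-a.txt",
  "1000002|SAMPLE HOLDINGS INC|10-K|2016-02-01|edgar/data/1000002/example-b.txt",
  "1000003|DEMO PARTNERS LP|SC 13G|2016-03-10|edgar/data/1000003/example-c.txt"
)

# Temporary file standing in for the downloaded master.txt
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Build the shell command; shQuote() protects the path from the shell
grep_cmd <- paste("grep 'SC 13[DG]'", shQuote(tmp))

# header = FALSE because grep returns data lines only, with no header row
master_table <- fread(cmd = grep_cmd, sep = "|", header = FALSE, na.strings = "")
print(master_table)
```

With no header, fread names the columns V1 to V5, so the form type ends up in V3.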
Solution 2:[2]
I'm not quite sure of the broader context, in particular whether you need to use fread(), but
s <- scan(text=master, sep="|", what=character())
works well and is fast (0.1 seconds).
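Note that scan() returns one flat character vector of fields rather than a table. Here is a minimal sketch of reshaping it, assuming every line has exactly five fields. The sample rows and column names are made up for illustration:

```r
# Hypothetical rows in the pipe-delimited layout of master.idx
# (CIK|Company Name|Form Type|Date Filed|Filename); not real filings.
master <- c(
  "1000001|EXAMPLE CORP|SC 13D|2016-01-15|edgar/data/1000001/example-a.txt",
  "1000003|DEMO PARTNERS LP|SC 13G|2016-03-10|edgar/data/1000003/example-c.txt"
)

# quote = "" stops apostrophes in company names from being read as quotes
s <- scan(text = master, sep = "|", what = character(), quote = "", quiet = TRUE)

# scan() returns the fields line by line, so fill the matrix by row
m <- matrix(s, ncol = 5, byrow = TRUE)

master_df <- as.data.frame(m, stringsAsFactors = FALSE)
names(master_df) <- c("cik", "company", "form_type", "date_filed", "filename")
print(master_df)
```

All columns come back as character, so cik and date_filed would still need converting if you want numbers or dates.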
Solution 3:[3]
I would like to add the final solution, which was implemented in fread: https://github.com/Rdatatable/data.table/issues/1423.
Perhaps this also saves others a bit of time.
fread now takes the input through its text argument, so the solution becomes simpler:
library(data.table)
name <- "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/master.idx"
master <- readLines(name, 1000)
master <- master[grep("SC 13(D|G)", master)]
master <- paste(master, collapse = "\n")
master_table <- fread(text = master, sep = "|")
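As a side note, the fread documentation describes text as a character vector of one or more lines, such as the output of readLines(), so the paste(..., collapse = "\n") step should not be needed with text =. Here is a minimal offline sketch, using made-up rows in the master.idx layout. The sample data and column names are assumptions for illustration:

```r
library(data.table)

# Hypothetical rows in the pipe-delimited layout of master.idx
# (CIK|Company Name|Form Type|Date Filed|Filename); not real filings.
master <- c(
  "1000001|EXAMPLE CORP|SC 13D|2016-01-15|edgar/data/1000001/example-a.txt",
  "1000002|SAMPLE HOLDINGS INC|10-K|2016-02-01|edgar/data/1000002/example-b.txt",
  "1000003|DEMO PARTNERS LP|SC 13G|2016-03-10|edgar/data/1000003/example-c.txt"
)

# Keep only the SC 13D / SC 13G lines, as in the question
master <- master[grep("SC 13(D|G)", master)]

# Pass the character vector straight to text=; header = FALSE because
# the filtered lines carry no header row
master_table <- fread(
  text = master, sep = "|", header = FALSE,
  col.names = c("cik", "company", "form_type", "date_filed", "filename")
)
print(master_table)
```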
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | |
Solution 3 | hannes101 |