What do these error messages mean when running pivot_wider() in RStudio?
I'm new to R. Can anyone help me?
I import a CSV extract of Stack Overflow data:

```r
library(tidyverse)
s <- read_csv("https://www.ics.uci.edu/~duboisc/stackoverflow/answers.csv")
```
Then I split the values in the `tags` column into separate rows:

```r
ss1 <- separate_rows(s, tags)
```
Then I apply `pivot_wider()` to the `tags` column:

```r
ss2 <- pivot_wider(ss1, names_from = tags, values_from = qs)
```
The following error messages are shown:

```
Error: Internal error in `compact_rep()`: Negative `n` in `compact_rep()`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: Values are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = length` to identify where the duplicates arise
* Use `values_fn = {summary_fun}` to summarise duplicates
2: In nrow * ncol : NAs produced by integer overflow
```
I have searched for the keywords in these messages but cannot work out what the errors mean overall. Can anyone help? Thanks.
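Warning 1 can be reproduced on a tiny made-up tibble. When several rows share the same tag and no other column tells `pivot_wider()` which output row each value belongs to, those values collide in one cell. A minimal sketch, assuming a recent tidyr where `values_fn` accepts a plain function:

```r
library(tibble)
library(tidyr)

# Made-up data: two questions tagged "php" and two tagged "lisp", and no other
# column to tell pivot_wider() which output row each value belongs to.
toy <- tibble(
  qs   = c(1, 5, 2, 3),
  tags = c("php", "php", "lisp", "lisp")
)

# Both "php" values land in the same cell, so tidyr warns that values are
# not uniquely identified and returns list-columns.
wide_list <- pivot_wider(toy, names_from = tags, values_from = qs)

# As the warning suggests, values_fn = length counts the values in each cell.
wide_count <- pivot_wider(toy, names_from = tags, values_from = qs,
                          values_fn = length)
wide_count   # one row: php = 2, lisp = 2
```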
Solution 1:[1]
@Anoushiravan R:
Thank you very much for your kind suggestion again.
With your suggestion, I get the following error:
```r
ss1 <- s %>%
  separate_rows(tags) %>%
  select(qs, tags) %>%
  group_by(tags) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  mutate(tags = if_else(tags == "", "unknown", tags))
ss2 <- ss1 %>% pivot_wider(names_from = tags, values_from = qs, names_repair = "minimal")
```

```
Error: cannot allocate vector of size 5.4 Gb
```
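The 5.4 Gb figure matches the size of the wide table shown in Solution 2's output. A back-of-the-envelope check, assuming every tag column is stored as an 8-byte double and taking the 68,384 × 10,522 dimensions from that printed tibble (one of the columns is the integer `id`):

```r
# Dimensions taken from the tibble printed in Solution 2: 68,384 rows and
# 10,522 columns, one of which is the integer `id` column.
n_rows <- 68384
n_value_cols <- 10522 - 1

# Numeric literals without an L suffix are doubles, so this product cannot overflow.
cells <- n_rows * n_value_cols
gib <- cells * 8 / 1024^3   # 8 bytes per <dbl> cell, converted to GiB

cells < .Machine$integer.max  # TRUE: the cell count still fits in an R integer
round(gib, 2)                 # 5.36, consistent with "cannot allocate vector of size 5.4 Gb"
```

Because the cell count stays below `.Machine$integer.max`, this version of the pipeline hits a memory limit rather than an integer overflow.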
Before that, I always got another error message: `In nrow * ncol : NAs produced by integer overflow`.
I then googled `In nrow * ncol : NAs produced by integer overflow` and found that it may be related to the console pane; see https://github.com/wrathematics/float/issues/17.
I also removed all the objects/datasets from the global environment and restarted RStudio; now I get the same result as yours.
Because I want to include ALL columns in the result, I removed `select(qs, tags) %>%` from your suggestion, which gives the following code and errors:
```r
ss1 <- s %>%
  separate_rows(tags) %>%
  group_by(tags) %>%
  mutate(id = row_number()) %>%
  ungroup() %>%
  mutate(tags = if_else(tags == "", "unknown", tags))
View(ss1)
ss2 <- ss1 %>% pivot_wider(names_from = tags, values_from = qs, names_repair = "minimal")
```

```
Error: Internal error in `compact_rep()`: Negative `n` in `compact_rep()`.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
In nrow * ncol : NAs produced by integer overflow
```
The `In nrow * ncol : NAs produced by integer overflow` warning appears again.
I googled the first, main error, ``Error: Internal error in `compact_rep()`: Negative `n` in `compact_rep()` ``, and could not find a good answer.
I also tried different combinations with `group_by()` but could not get a satisfactory result. Anyway, thank you very much for your help.
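A likely explanation for the overflow once `select(qs, tags)` is removed: by default `pivot_wider()` treats every column other than `names_from` and `values_from` as an id column, so each distinct combination of the remaining columns (plus `id`) becomes its own output row. With about 10,521 tag columns, the row count only has to pass roughly 204,000 before `nrow * ncol` exceeds `.Machine$integer.max` (2,147,483,647). The helper below, `estimate_wide_cells()`, is hypothetical (not part of tidyr) and only sketches a size check to run before pivoting; the toy column `title` stands in for whatever extra columns the real data has:

```r
library(tibble)
library(dplyr)

# Hypothetical helper (not part of tidyr): estimate how many value cells
# pivot_wider() would create with its default id_cols = every other column.
estimate_wide_cells <- function(data, names_from, values_from) {
  id_cols <- setdiff(names(data), c(names_from, values_from))
  n_rows <- if (length(id_cols) == 0) 1 else nrow(distinct(data[id_cols]))
  n_cols <- n_distinct(data[[names_from]])
  # Multiply as doubles so the estimate itself cannot overflow
  as.numeric(n_rows) * as.numeric(n_cols)
}

# Toy data: `title` stands in for the extra columns kept when select() is dropped
toy <- tibble(
  title = c("a", "b", "c"),
  qs    = c(1, 5, 2),
  tags  = c("php", "php", "lisp"),
  id    = c(1L, 2L, 1L)
)

cells <- estimate_wide_cells(toy, names_from = "tags", values_from = "qs")
cells                            # 6: 3 distinct (title, id) rows x 2 tags
cells > .Machine$integer.max     # FALSE here; TRUE would suggest an overflow
```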
Solution 2:[2]
OK, I edited my solution; I hope this is what you were looking for. This time I used `separate_rows()`, as you suggested, to split the values stacked in each row of the `tags` column. Run the following code and let me know if you need anything else.
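```r
library(tidyverse)

s %>%
  separate_rows(tags) %>%                # one row per tag
  select(qs, tags) %>%                   # keep only the columns needed for the pivot
  group_by(tags) %>%
  mutate(id = row_number()) %>%          # number the rows within each tag
  ungroup() %>%
  mutate(tags = if_else(tags == "", "unknown", tags)) %>%  # name empty tags
  pivot_wider(names_from = tags, values_from = qs, names_repair = "minimal")
```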
```
# A tibble: 68,384 x 10,522
      id   php error    gd image processing  lisp scheme subjective clojure cocoa touch
   <int> <dbl> <dbl> <dbl> <dbl>      <dbl> <dbl>  <dbl>      <dbl>   <dbl> <dbl> <dbl>
 1     1     0     0     0     0          0    10     10         10      10     0     0
 2     2     0     0     0     0          0    10     10         10      10     0     0
 3     3     1     0     1     1          1    10     10         10      10     0     0
 4     4     1     2     0     1          1    10     10         10      10     1     1
 5     5     1     2     0     1          1    10     10         10      10     0     0
 6     6     2     2     1     1          1    10     10         10      10     1     1
 7     7     2     2     1     0          1    10     10         10      10     1     1
 8     8     2     2     0     0          1    10     10         10      10     3     3
 9     9     0     2     0     0          1    10     10         10      10     3     3
10    10     0     2     0     0          1    10     10         10      10     3     3
# ... with 68,374 more rows, and 10,510 more variables
```
Since the data here is fairly large, I suggest you first run the code up to (but not including) the `pivot_wider()` line, and then run the `pivot_wider()` line separately. I don't know why, but only this way do I get the desired output; otherwise I receive an error.
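Why the `id` column helps: numbering rows within each tag with `row_number()` gives every (`id`, tag) pair a unique key, so `pivot_wider()` no longer has colliding values to put into list-columns. A toy sketch of the same pipeline on made-up data:

```r
library(dplyr)
library(tidyr)

# Made-up data, including an empty tag
toy <- tibble(
  qs   = c(1, 5, 2, 3),
  tags = c("php", "php", "lisp", "")
)

wide <- toy %>%
  group_by(tags) %>%
  mutate(id = row_number()) %>%   # 1, 2, ... within each tag
  ungroup() %>%
  mutate(tags = if_else(tags == "", "unknown", tags)) %>%
  pivot_wider(names_from = tags, values_from = qs)

wide   # two rows (id = 1, 2); lisp and unknown are NA in the id = 2 row
```

Note that the values in one output row do not come from the same question: row `id = 1` simply holds the first `qs` seen for each tag. Whether that layout is meaningful depends on what the wide table is used for.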
Solution 3:[3]
This is a bug in R, or a limitation; whatever we call it, there is no direct solution for it. This is the essence of the error:
```r
a <- 1000000L
b <- 2000000L
a * b
```
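It yields `NA` with the warning `In a * b : NAs produced by integer overflow`, because the product (2 × 10^12) is larger than the biggest integer R can store, 2,147,483,647.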
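In user code the usual way around this limit is to convert one operand to double before multiplying. This only illustrates the arithmetic; it does not change what happens inside `pivot_wider()`:

```r
a <- 1000000L
b <- 2000000L

.Machine$integer.max              # 2147483647, the largest R integer
prod_int <- a * b                 # NA, with a warning about integer overflow
prod_dbl <- as.numeric(a) * b     # 2e+12: doubles have a much wider range
```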
I have circumvented the issue with a different approach. It is not as neat or direct as using `separate_rows()` and then `pivot_wider()`, but it works!
This is the idea:

1. Find all the unique (hash)tags and save them in a vector.
2. Loop through the vector and `str_detect()` each element in the original text.
3. You will have a logical vector for each tag as the result of step 2; `bind_cols()` them.

Steps 2 and 3 are actually implemented in the same loop.
For step 1, you can use `separate_rows()`, then `distinct()` on the `tags` column, and then `pull()` it out of the tibble.
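A minimal sketch of these steps on made-up data, assuming the tags within each string are separated by single spaces (the real `tags` column may use a different separator). The space padding before the fixed-string match is an assumption added here so that a tag such as "php" does not also match inside a longer, hypothetical tag like "phpunit":

```r
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)

# Made-up data: space-separated tags per question
toy <- tibble(
  qs   = c(10, 0, 3),
  tags = c("php error", "lisp scheme", "php lisp")
)

# Step 1: unique tags as a character vector
all_tags <- toy %>%
  separate_rows(tags) %>%
  distinct(tags) %>%
  pull(tags)

# Steps 2 and 3 in one loop: one logical vector per tag
padded <- paste0(" ", toy$tags, " ")
tag_cols <- list()
for (tag in all_tags) {
  tag_cols[[tag]] <- str_detect(padded, fixed(paste0(" ", tag, " ")))
}

# Bind the logical columns to the original rows
result <- bind_cols(toy, as_tibble(tag_cols))
result
```

Each tag column is a separate vector of length `nrow(toy)`, so no single vector of rows × columns elements has to be built, which is presumably why this route avoids the overflow. Total memory still grows with rows × tags, although logical values take 4 bytes each in R instead of 8 for doubles.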
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | sspoldtwo |
Solution 2 | |
Solution 3 | Shaahin |