'Multiples rows to one row in R [closed]
In R, I have a data frame with several values. I would like to have a data frame that transforms the data frame into a data frame with just one row with all the values. I have a data frame like this:
df <- data.frame(A = c("time", "time", "time"),
B = c("place", "place", "place"),
C = c(NA, 1, NA),
D = c(NA, NA, 2),
E = c(3, NA, NA),
`F` = c(4,4, NA),
G = c(NA, 5, NA))
A B C D E F G
1 time place NA NA 3 4 NA
2 time place 1 NA NA 4 5
3 time place NA 2 NA NA NA
And I want a dataframe like this:
A B C D E F G
1 time place 1 2 3 4 5
I tried using the function reshape like here: Turning one row into multiple rows in r
And tried the function unique but than I lose a lot of data: Selecting unique rows in matrix using R
Solution 1:[1]
We can group by 'A', 'B' and select the first non-NA element across
other columns
library(dplyr)
df1 %>%
group_by(A, B) %>%
summarise(across(everything(), ~ .[order(is.na(.))][1]), .groups = 'drop')
-output
# A tibble: 1 x 8
# A B C D E F G H
# <chr> <chr> <int> <int> <int> <int> <int> <lgl>
#1 time place 1 2 3 4 5 NA
Or with coalesce
library(purrr)
df1 %>%
group_by(A, B) %>%
summarise(across(everything(), ~ reduce(., coalesce)), .groups = 'drop')
data
df1 <- structure(list(A = c("time", "time", "time"), B = c("place",
"place", "place"), C = c(NA, 1L, NA), D = c(NA, NA, 2L), E = c(3L,
NA, NA), F = c(4L, NA, NA), G = c(NA, 5L, NA), H = c(NA, NA,
NA)), class = "data.frame", row.names = c(NA, -3L))
Solution 2:[2]
Solution using colMeans
to account for diverging values:
1.Create test data
df <- structure(list(
A = c("time", "time", "time"),
B = c("place", "place", "place"),
C = c(NA, 1L, NA),
D = c(NA, NA, 2L),
E= c(3L, NA, NA),
F = c(4L, 4L, NA),
G = c(NA, 5L, NA)),
row.names = c(NA, -3L),
class = c("data.table", "data.frame"))
2.Use unique
for first two columns and colMeans
for the rest, cast to data.frame:
cbind(unique(df[, 1:2]), as.data.frame.list(colMeans(df[,3:7], na.rm = TRUE)))
Returns:
A B C D E F G
1: time place 1 2 3 4 5
Solution 3:[3]
You can use na.omit() with summarise, a la:
library(tidyverse)
df %>% group_by(A, B) %>%
summarise(C = mean(na.omit(C)),
D = mean(na.omit(D)),
E = mean(na.omit(E)),
F = mean(na.omit(F)),
G = mean(na.omit(G)))
Your example data has only unique values in each column C-G, so following your comment, I have used mean() to pick up the mean of non-NA observations.
data:
df <- structure(list(A = c("time", "time", "time"), B = c("place",
"place", "place"), C = c(NA, 1L, NA), D = c(NA, NA, 2L), E = c(3L,
NA, NA), F = c(4L, 4L, NA), G = c(NA, 5L, NA)), class = "data.frame", row.names = c(NA,
-3L))
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | akrun |
Solution 2 | dario |
Solution 3 |