How to look at differences between 2 columns in R
I need to write some code that looks at the difference between the "est_age" and "Known_age" columns in my data set. Then I need to know what percentage were an exact match and what percentage differed by 1, 2, and 3.
Fish_id Reader est_age Known_age
1 BKF 5 7
2 BKF 7 16
3 BKF 4 5
4 BKF 12 12
5 BKF 6 10
6 BKF 5 6
7 BKF 8 12
8 BKF 5 5
9 BKF 7 7
10 BKF 6 7
11 BKF 8 8
12 BKF 5 7
13 BKF 5 7
14 BKF 5 14
15 BKF 6 6
16 BKF 6 7
17 BKF 6 6
18 BKF 6 5
19 BKF 15 18
20 BKF 8 7
21 BKF 7 4
22 BKF 8 12
23 BKF 7 8
24 BKF 9 7
25 BKF 5 8
26 BKF 11 23
27 BKF 6 5
28 BKF 4 4
29 BKF 6 7
30 BKF 7 12
31 BKF 6 6
32 BKF 5 5
33 BKF 5 8
34 BKF 11 10
Solution 1:[1]
I'm not quite sure what you're trying to accomplish. Remember: the more precise your question is, the more helpful the answers you get. Also try to provide an easy-to-use reproducible example (see the R package reprex).
You could have provided your data in a format like this:
df2 <- structure(
  list(
    Fish_id = 1:34,
    Reader = rep("BKF", 34),
    est_age = c(5L, 7L, 4L, 12L, 6L, 5L, 8L, 5L, 7L, 6L, 8L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 15L, 8L, 7L, 8L, 7L, 9L, 5L, 11L, 6L, 4L, 6L, 7L, 6L, 5L, 5L, 11L),
    Known_age = c(7L, 16L, 5L, 12L, 10L, 6L, 12L, 5L, 7L, 7L, 8L, 7L, 7L, 14L, 6L, 7L, 6L, 5L, 18L, 7L, 4L, 12L, 8L, 7L, 8L, 23L, 5L, 4L, 7L, 12L, 6L, 5L, 8L, 10L)
  ),
  row.names = c(NA, -34L),
  class = c("tbl_df", "tbl", "data.frame")
)
The task is easily done with the data-wrangling package "dplyr":
library(dplyr)
df2 %>%
  mutate(diff = abs(Known_age - est_age)) %>%   # calculate (absolute) differences
  count(diff) %>%                               # count number of observations per difference
  mutate(n_rel = scales::percent(n / sum(n)))   # calculate percentage (needs the scales package installed)
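If you would rather not load any packages, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the vector names est_age, known_age, abs_diff, counts, pct and within are chosen here for the example, and the values are copied from the table in the question. It reports the share of fish whose estimate is off by exactly 0, 1, 2 or 3 years, plus the cumulative share within each of those limits.

```r
# Values copied from the question's table (34 fish, reader BKF)
est_age <- c(5, 7, 4, 12, 6, 5, 8, 5, 7, 6, 8, 5, 5, 5, 6, 6, 6, 6, 15, 8,
             7, 8, 7, 9, 5, 11, 6, 4, 6, 7, 6, 5, 5, 11)
known_age <- c(7, 16, 5, 12, 10, 6, 12, 5, 7, 7, 8, 7, 7, 14, 6, 7, 6, 5, 18, 7,
               4, 12, 8, 7, 8, 23, 5, 4, 7, 12, 6, 5, 8, 10)

# Absolute difference between estimated and known age for each fish
abs_diff <- abs(known_age - est_age)

# Count only differences of 0, 1, 2 and 3; explicit factor levels keep a
# category in the table even if no fish falls into it
counts <- table(factor(abs_diff, levels = 0:3))

# Percentage of ALL fish (not just those with a difference <= 3)
pct <- round(100 * counts / length(abs_diff), 1)
print(pct)

# Cumulative share: estimate within 0, 1, 2 or 3 years of the known age
within <- sapply(0:3, function(k) mean(abs_diff <= k))
names(within) <- paste0("within_", 0:3)
print(round(100 * within, 1))
```

Dividing by length(abs_diff) rather than sum(counts) keeps the percentages relative to all 34 fish, so they do not add up to 100% when some differences are larger than 3.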
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | MarkusN |