Is there an R function for setting rows on aggregate data?

The data I am working with is from eBird, and I am looking to sort species occurrences by both name and year. There are over 30k individual observations, each with its own number of birds. For example, in the raw data below, someone observed 2 Cooper's Hawks on Jan 1, 2021.

The raw data looks like this:

specificName   individualCount  eventDate    year
Cooper's Hawk  1                (1/1/2018)   2018
Cooper's Hawk  1                (1/1/2020)   2020
Cooper's Hawk  2                (1/1/2021)   2021

Ideally, I would group all the Cooper's Hawk records (specificName) by the year they were observed and sum their individualCount values. That way I can make statistical comparisons between the number of birds observed in 2018, 2019, 2020, and 2021.

I created a separate column for the year:

year <- as.POSIXct(ebird.df$eventDate, format = "%m/%d/%Y")
ebird.df$year <- as.numeric(format(year, "%Y"))
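If eventDate really contains the parentheses shown in the raw data above, as.POSIXct() with format "%m/%d/%Y" returns NA, because the format has to match the string from its first character and "(" is not a month digit. A minimal sketch, assuming the dates are stored as strings like "(1/1/2018)", strips the parentheses before parsing; the small ebird.df built here is a hypothetical stand-in for the real export.

```r
# Hypothetical stand-in for the real eBird data (dates stored as "(m/d/yyyy)")
ebird.df <- data.frame(
  specificName    = c("Cooper's Hawk", "Cooper's Hawk", "Cooper's Hawk"),
  individualCount = c(1, 1, 2),
  eventDate       = c("(1/1/2018)", "(1/1/2020)", "(1/1/2021)")
)

# Remove the surrounding parentheses, then parse month/day/year
clean_dates <- gsub("[()]", "", ebird.df$eventDate)
event_date  <- as.Date(clean_dates, format = "%m/%d/%Y")

# Extract the four-digit year as a number
ebird.df$year <- as.numeric(format(event_date, "%Y"))
ebird.df$year
```

as.Date() is used here because only the calendar date matters; as.POSIXct() would also work once the parentheses are gone.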

Then I aggregated with the following:

aggdata <- aggregate(ebird.df$individualCount, by = list(ebird.df$specificName, ebird.df$year), FUN = sum)
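Naming the elements of the by list makes aggregate() use those names for the grouping columns instead of Group.1 and Group.2, which makes later reshaping easier to read. A small sketch with made-up counts (not the real eBird totals):

```r
# Made-up example data for illustration only
ebird.df <- data.frame(
  specificName    = rep("Cooper's Hawk", 4),
  individualCount = c(1, 3, 1, 2),
  year            = c(2018, 2018, 2020, 2021)
)

# Named list elements become the column names of the result;
# the summed values still land in a column called "x"
aggdata <- aggregate(ebird.df$individualCount,
                     by = list(specificName = ebird.df$specificName,
                               year = ebird.df$year),
                     FUN = sum)
aggdata
```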

There are hundreds of bird species, so the Cooper's Hawk rows start at row 115. The output looks like this:

          Group.1 Group.2   x
115 Cooper's Hawk    2018  86
116 Cooper's Hawk    2019 152
117 Cooper's Hawk    2020 221
118 Cooper's Hawk    2021 116

My question is: how do I get the data into a table that looks like the following?

Species Name   2018  2019  2020  2021
Cooper's Hawk    86   152   221   116

I want to eventually run some basic ecology stats on the data using vegan, but one problem at a time, I guess, lol.
Thanks!



Solution 1:[1]

There are errors in the data and code in the question, so we used the code and reproducible data given in the Note at the end.

Using xtabs, we get an xtabs table directly from ebird.df like this. No add-on packages are needed.

xtabs(individualCount ~ specificName + year, ebird.df)
##                year
## specificName    2018 2020 2021
##   Cooper's Hawk    1    1    2
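The sample data have no 2019 record, so 2019 does not appear as a column. If every year should show up even when nothing was recorded, one option (assuming the years of interest are 2018 to 2021) is to make year a factor with explicit levels; xtabs() keeps unused factor levels by default (drop.unused.levels = FALSE), so the missing year appears as a column of zeros. The sketch repeats the Note's data so it runs on its own.

```r
Lines <- "specificName,individualCount,eventDate,year
\"Cooper's Hawk\",1,(1/1/2018),2018
\"Cooper's Hawk\",1,(1/1/2020),2020
\"Cooper's Hawk\",2,(1/1/2021),2021"
ebird.df <- read.csv(text = Lines, strip.white = TRUE)

# Declare every year of interest, including years with no observations
ebird.df$year <- factor(ebird.df$year, levels = 2018:2021)

# Unused levels are kept, so 2019 shows up with a count of 0
tab <- xtabs(individualCount ~ specificName + year, ebird.df)
tab
```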

Optionally convert it to a data.frame:

xtabs(individualCount ~ specificName + year, ebird.df) |> 
  as.data.frame.matrix()
##               2018 2020 2021
## Cooper's Hawk    1    1    2
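as.data.frame.matrix() keeps the species in the row names. If a regular species column is preferred, closer to the "Species Name" layout in the question, the row names can be moved into their own column; the column name specificName is just a choice here. cbind() on a data frame does not mangle the numeric year names into X2018 and so on.

```r
Lines <- "specificName,individualCount,eventDate,year
\"Cooper's Hawk\",1,(1/1/2018),2018
\"Cooper's Hawk\",1,(1/1/2020),2020
\"Cooper's Hawk\",2,(1/1/2021),2021"
ebird.df <- read.csv(text = Lines, strip.white = TRUE)

wide <- xtabs(individualCount ~ specificName + year, ebird.df) |>
  as.data.frame.matrix()

# Move the species names from the row names into a column of their own
wide <- cbind(specificName = rownames(wide), wide)
rownames(wide) <- NULL
wide
```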

We did not need aggdata, but if you need it for some other reason, it can be computed with the formula method of aggregate (aggregate.formula) like this:

aggregate(individualCount ~ specificName + year, ebird.df, sum)
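If aggdata already exists in long format (one row per species and year), base R's reshape() can pivot it to one row per species instead of going back to the raw data. A sketch, assuming the grouping columns are named specificName and year as in the aggregate.formula result above; note that reshape() prefixes each new column with the value column name, e.g. individualCount.2018.

```r
Lines <- "specificName,individualCount,eventDate,year
\"Cooper's Hawk\",1,(1/1/2018),2018
\"Cooper's Hawk\",1,(1/1/2020),2020
\"Cooper's Hawk\",2,(1/1/2021),2021"
ebird.df <- read.csv(text = Lines, strip.white = TRUE)

# Long format: one row per species/year combination
aggdata <- aggregate(individualCount ~ specificName + year, ebird.df, sum)

# Wide format: one row per species, one column per year
wide <- reshape(aggdata, idvar = "specificName", timevar = "year",
                direction = "wide")
wide
```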

Note

Lines <- "specificName,individualCount,eventDate,year
\"Cooper's Hawk\",1,(1/1/2018),2018
\"Cooper's Hawk\",1,(1/1/2020),2020
\"Cooper's Hawk\",2,(1/1/2021),2021"
ebird.df <- read.csv(text = Lines, strip.white = TRUE)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1