Is there a faster way to find synonyms for a large list of taxa in R?
I have a list of about 96,000 species names for which I need to collect all synonyms. I have tried the synonyms() function from the 'taxize' package, which returns the information I need, but my list is too long for it to run to completion. I have looked into the 'taxizedb' package, which some users have suggested is faster, but I am not sure which of its functions would accomplish what I am trying to do.
Any suggestions would be greatly appreciated! Thanks!
Code so far:
library("taxize")
library("tidyverse")
#load in list of species (~96,000)
#vspli <- read.csv(file="AllBHLspecieslist.csv", header=TRUE) #my code
vspli <- c("Acer obtusatum", "Acer interius", "Acer opalus", "Acer saccharum", "Acer palmatum") #workable example
#Use taxize to search for synonyms
synlist1 <- synonyms(vspli, db="itis", rows=1) #currently this line crashes before completion when run on the full list of 96k species
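For reference, a minimal sketch of running a lookup in batches, so that a failure partway through does not discard the results already collected. This is only an illustration of the batching idea, not a tested fix for the crash: `lookup_fun` and `run_in_batches` are hypothetical names, and `lookup_fun` is a dummy that stands in for a call such as synonyms(batch, db="itis", rows=1) so the sketch runs offline.

```r
# Hypothetical stand-in for a remote lookup such as
# synonyms(batch, db = "itis", rows = 1); returns one list element per name.
lookup_fun <- function(batch) {
  setNames(lapply(batch, function(nm) paste("synonyms of", nm)), batch)
}

# Run `fun` over fixed-size batches and keep the results of every batch
# that succeeds; a failing batch is reported and skipped.
run_in_batches <- function(names, batch_size, fun) {
  batches <- split(names, ceiling(seq_along(names) / batch_size))
  results <- list()
  for (b in batches) {
    out <- tryCatch(fun(b), error = function(e) {
      message("batch failed: ", conditionMessage(e))
      NULL
    })
    results <- c(results, out)
  }
  results
}

vspli <- c("Acer obtusatum", "Acer interius", "Acer opalus",
           "Acer saccharum", "Acer palmatum")
res <- run_in_batches(vspli, batch_size = 2, fun = lookup_fun)
names(res)
```

With a real lookup, each batch's output could also be written to disk (for example with saveRDS()) before moving on to the next one.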
Solution 1:[1]
In case anyone comes across this later: I found the 'taxadb' package, which got through this problem much faster. Here is the code in case it proves useful:
library(taxadb)
library(dplyr) #for %>%, select(), mutate(), left_join(), filter() and collect()
#create local itis database
td_create("itis",overwrite=FALSE)
allnames<-read.csv(file="AllBHLspecieslist.csv", header=TRUE)
#get IDS for each scientific name
syn1<-allnames %>%
select(Scientific.Name) %>%
mutate(ID=get_ids(Scientific.Name,"itis"))
#Deal with NAs (one name corresponds to more than 1 ITIS code) (~10k names)
syn1_NA<-as.data.frame(syn1$Scientific.Name[is.na(syn1$ID)])
colnames(syn1_NA)<-c("name")
NA_IDS<-NULL
for(i in unique(syn1_NA$name)){
tmp<-as.data.frame(filter_name(i, 'itis')["acceptedNameUsageID"]) #select the column by name rather than by position
tmp$name<-paste0(i)
NA_IDS<-rbind(NA_IDS,tmp)
}
#join with original names
colnames(syn1)<-c("name","ID")
IDS<-left_join(syn1,NA_IDS,by="name") #I think it's a left join; double check this
#extract just the unique IDs
all_ids<-unique(c(IDS[,"ID"],IDS[,"acceptedNameUsageID"]))
#drop NAs
IDS<-data.frame(ID=all_ids[!is.na(all_ids)])
#extract all names with synonyms in ITIS that are at the species level [literally all of them]
#set query
ITIS<-taxa_tbl("itis") %>%
select(scientificName,taxonRank,acceptedNameUsageID,taxonomicStatus) %>%
filter(taxonRank == "species")
#see query
ITIS %>% show_query()
#retrieve results
ITIS_names<-ITIS %>% collect()
#filter to only those that match ITIS codes for all my species
ITIS_names<-ITIS_names %>%
filter(acceptedNameUsageID %in% IDS$ID)
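As a possible follow-up, here is a minimal sketch of collapsing a table shaped like ITIS_names into one row per accepted name, with its synonyms listed alongside. It uses base R only and a toy data frame so it runs without the local database. Every name and ID in the toy table is a made-up placeholder, and the status labels "accepted" and "synonym" are an assumption about how taxonomicStatus is coded, so check them against the real table first.

```r
# Toy stand-in for ITIS_names; every name and ID here is a made-up placeholder.
ITIS_names <- data.frame(
  scientificName      = c("Genus alpha", "Genus alphus", "Genus beta", "Genus gamma"),
  taxonRank           = "species",
  acceptedNameUsageID = c("ITIS:0001", "ITIS:0001", "ITIS:0002", "ITIS:0003"),
  taxonomicStatus     = c("accepted", "synonym", "accepted", "accepted"),
  stringsAsFactors    = FALSE
)

# Accepted name for each accepted ID (assumes the label "accepted")
accepted <- ITIS_names[ITIS_names$taxonomicStatus == "accepted",
                       c("acceptedNameUsageID", "scientificName")]
colnames(accepted) <- c("acceptedNameUsageID", "acceptedName")

# All other names, grouped by the accepted ID they point to
syn_rows  <- ITIS_names[ITIS_names$taxonomicStatus != "accepted", ]
syn_by_id <- split(syn_rows$scientificName, syn_rows$acceptedNameUsageID)

# One row per accepted name; synonyms collapsed into a single string
# (an empty string when an accepted name has no synonyms)
accepted$synonyms <- vapply(
  accepted$acceptedNameUsageID,
  function(id) paste(syn_by_id[[id]], collapse = "; "),
  character(1)
)
accepted
```

The result can then be joined back to the original name list on the ID column to see which synonyms belong to which input name.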
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | mfertakos |