Yuppiiii!!! You know how you feel, when your work is getting recognized, right? ;)
Some time back, I was working on an issue with - yes, again - with PhyloProfile, where I need to sort a list of taxa based on their taxonomy distance. From Bastian - the guy who knows everything, I got to know taxize, a greate library from rOpenSci project for playing with NCBI taxonomy database. taxize has a function called
class2tree() which create a tree object from a given list of species.
>library(taxize) > spnames <- c('Homo_sapiens', + 'Pan_troglodytes', + 'Macaca_mulatta', + 'Mus_musculus', + 'Rattus_norvegicus', + 'Bos_taurus', + 'Canis_lupus', + 'Ornithorhynchus_anatinus', + 'Xenopus_tropicalis', + 'Takifugu_rubripes') > out <- classification(spnames, db='ncbi') > tr <- class2tree(out) > plot(tr)
and this is the tree
tr we got
This tree has many unresolved splits, since
class2tree() remove all unrank levels from the taxonomy ranks leading to a missing information for separating those splits (unrank levels are ranks that are named as “no rank” on the following table).
> out$Homo_sapiens name rank id 1 cellular organisms no rank 131567 2 Eukaryota superkingdom 2759 3 Opisthokonta no rank 33154 4 Metazoa kingdom 33208 5 Eumetazoa no rank 6072 6 Bilateria no rank 33213 7 Deuterostomia no rank 33511 8 Chordata phylum 7711 9 Craniata subphylum 89593 10 Vertebrata no rank 7742 11 Gnathostomata no rank 7776 12 Teleostomi no rank 117570 13 Euteleostomi no rank 117571 14 Sarcopterygii no rank 8287 15 Dipnotetrapodomorpha no rank 1338369 16 Tetrapoda no rank 32523 17 Amniota no rank 32524 18 Mammalia class 40674 19 Theria no rank 32525 20 Eutheria no rank 9347 21 Boreoeutheria no rank 1437010 22 Euarchontoglires superorder 314146 23 Primates order 9443 24 Haplorrhini suborder 376913 25 Simiiformes infraorder 314293 26 Catarrhini parvorder 9526 27 Hominoidea superfamily 314295 28 Hominidae family 9604 29 Homininae subfamily 207598 30 Homo genus 9605 31 Homo sapiens species 9606
Bastian opened an issue in taxize github repository. At that time, I could already solve the issue (mostly) with my Perl code. But while writing the manuscript for PhyloProfile, we decided to convert the Perl code into R, so that we don’t have many scripts in many different languages in one program :-D
Eventually, I’ve not only improved the sorting result while implementing the algorithm in R (by using APE tree object), but I could also rewrite the
class2tree() function to include the full taxonomy information for creating the species tree. By that I can sucessfully reconstruct the NCBI taxonomy tree, yahooo!!
I have learned so many practical things with this first contribution. Thank you so much, Bastian ;-)