Wednesday, January 30, 2008

Taxonomy: it's time to change!

I just read a post by Christopher Taylor about a 'taxonomic problem' with Drosophila melanogaster. The curiosity of the case prompted me to write this post, which I have had in mind for a long time.

One of the main advantages of a classification is information retrieval. Biological classification allows us to find the relationships and characters of a given species. There is a set of laws to manage the classification, known as the codes (there are different codes for animals, plants, bacteria, etc.).

In this age of the internet and massive databases, one could think that taxonomy is ready to make a direct jump, but unfortunately that seems not to be the case. The codes are really old, and many taxonomists are afraid that changing the rules would collapse the system.

I do not think that taxonomists would let the database designers propose a new system, but I think that a deep collaboration, coupled with some changes in the structure of the codes, would speed up taxonomic research. The main objective of taxonomists is to describe and classify species, not to be lawyers!

I have identified the problems that I think are the most important.

Linnean names

Linnean names are nice; they provided some form of order in a time of great explorations around the world (18th and 19th centuries), when Europeans were surprised by the diversity they found. Many things have changed since that time. We now have several phylogenetic methods, which show how arbitrary a clade name can be (I like this post about the subject), and of course the tree of life has far more divisions than there are Linnean ranks. We also have databases to store names, and search algorithms to find them quickly.

A rank-free taxonomy seems more adequate for storing our information about the phylogeny of species. That is not an embrace of the 'PhyloCode', as I prefer a taxonomy based on specific characters, or better, on a combination of characters and topology, instead of a topology-based taxonomy (note that apomorphy-based definitions are also based on topology). Also, ranked names have a special utility: they can serve as landmarks when browsing the tree of life.

The nested nature of biological classification allows a rank-free taxonomy without the pains imagined by [1], and using “landmarks” helps in searching and writing abstracts and titles of papers and, of course, in searching for a particular clade, as you can see using GenBank. Of course, genus and species can be (and I think should be) obligatory. That allows continuity with Linnean taxonomy.
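As a rough sketch of how landmarks could work in a rank-free, nested classification, here is a minimal Python example. The clade list is only a small illustrative subset, the choice of which names count as landmarks is my own assumption, and the names PARENT, LANDMARKS, lineage, landmark_path and nearest_landmark are hypothetical, not part of any existing database.

```python
# Hypothetical rank-free classification: each clade only knows its parent.
# The clade list is a small illustrative subset, not a complete classification.
PARENT = {
    "Arthropoda": None,
    "Insecta": "Arthropoda",
    "Diptera": "Insecta",
    "Drosophilidae": "Diptera",
    "Drosophila": "Drosophilidae",
    "Drosophila melanogaster": "Drosophila",
}

# Names flagged as landmarks (an assumption made for this example).
LANDMARKS = {"Arthropoda", "Insecta", "Diptera"}


def lineage(name):
    """Return the clades from `name` up to the root, innermost first."""
    path = []
    while name is not None:
        path.append(name)
        name = PARENT[name]
    return path


def landmark_path(name):
    """Return only the landmark clades that contain `name`, outermost first."""
    return [clade for clade in reversed(lineage(name)) if clade in LANDMARKS]


def nearest_landmark(name):
    """Return the smallest landmark clade that strictly contains `name`."""
    for clade in lineage(name)[1:]:
        if clade in LANDMARKS:
            return clade
    return None
```

No ranks are stored anywhere: nesting alone gives the full lineage, and the landmark set only decides which names are shown when a short, familiar path is wanted for a title or a search.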

If Linnean names are mostly landmarks, then many of the laws for synonyms, and so on, need to be changed toward a more practical usage.

Synonyms

Perhaps the most annoying characteristics of taxonomy are synonymy, rank changes, and several related problems. These problems are actually a burden of the current codes, and can be removed by changing the laws, without harming the current classification.

Synonyms are really bad for databases, as the same entity is labeled with two different names or (worse!) the same name is applied to different entities. In other cases, the scope of the synonyms seems to overlap.
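To make the database problem concrete, here is a small Python sketch that scans a table of name usages and reports both situations. Everything in it is a placeholder: the names ('Aus bus' and so on), the taxon identifiers and the function names are invented for illustration, and I am assuming that each entity has a stable identifier that is independent of its name.

```python
from collections import defaultdict

# (name, taxon_id) pairs as they might appear in a names table.
# All names and identifiers are made-up placeholders.
NAME_USAGES = [
    ("Aus bus", "taxon-1"),
    ("Aus cus", "taxon-1"),  # second name for the same entity
    ("Dus eus", "taxon-2"),
    ("Dus eus", "taxon-3"),  # same name applied to a different entity
    ("Fus gus", "taxon-4"),
]


def index_usages(usages):
    """Build both lookups: names per taxon and taxa per name."""
    names_by_taxon = defaultdict(set)
    taxa_by_name = defaultdict(set)
    for name, taxon_id in usages:
        names_by_taxon[taxon_id].add(name)
        taxa_by_name[name].add(taxon_id)
    return names_by_taxon, taxa_by_name


def find_synonyms(usages):
    """Taxa that carry more than one name."""
    names_by_taxon, _ = index_usages(usages)
    return {t: sorted(n) for t, n in names_by_taxon.items() if len(n) > 1}


def find_ambiguous_names(usages):
    """Names that point to more than one taxon."""
    _, taxa_by_name = index_usages(usages)
    return {n: sorted(t) for n, t in taxa_by_name.items() if len(t) > 1}
```

The first case is only an inconvenience for retrieval, because a query can be expanded to all the names of a taxon. The second is worse, because nothing in the name itself says which entity is meant.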

Of course, no matter what phylocoders say, the same problem applies to their 'phylogenetic' names: when a new revision is published, the names continue but with a completely different meaning. At least in traditional taxonomy it is possible to reject some names.

It seems better to relax some of the naming rules, so that a new classification would be clearly different from the original one. If a huge family is discovered to be massively paraphyletic, I think there is no point in allowing the original name to survive; it is simply a brutally wrong name.

For example, “reptiles” for a long time included many stem amniotes, therapsids (“mammal-like reptiles”), anapsids (such as turtles), lepidosaurs (lizards and snakes) and crocodiles. They are amniotes, but the group excludes birds and mammals. I can't see a reason to retain that ugly name. It is simple: synonymize it with its monophyletic equivalent (Amniota), and never use it again. That way a modern paper cannot be confused with the old ones. The only valid use of the name would be if someone showed that the original reptiles are monophyletic.

Names can be used instead to distinguish different possible classifications. For example, the different arrangements of Arthropoda receive different names: Atelocerata (myriapods, like centipedes, and insects) vs. Pancrustacea (crustaceans and insects), Mandibulata (myriapods, crustaceans and insects) vs. Schizoramia (crustaceans and arachnids). These names identify different entities, each one attached to a different phylogenetic proposal (note that for phylocoders, Atelocerata, Pancrustacea and Mandibulata can be the same entity!).

Types

The reptile and arthropod examples are possible because there are no types fixing the names of those groups. At lower levels, family and genus names are ruled by the typification of names, which brings more confusion than solutions to current research.

Here is an example: Lygaeidae was a large family of bugs (Insecta: Heteroptera). For a long time it was believed to be paraphyletic [2], but this was only demonstrated by Henry [3]. He proposed a whole new classification of lygaeids, elevating no fewer than 7 subfamilies to family rank and restricting Lygaeidae to only 3 subfamilies. Is the new Lygaeidae the same as that of, say, 20 or 30 years ago? Of course not. So typification created name stability (as topology does for phylocoders), but at the cost of name utility.

As in the case of reptiles, there is no Lygaeidae any more. Any new reference in the literature to 'Lygaeidae' only creates confusion with the initial meaning of the group (which is a synonym of the superfamily Lygaeoidea). Of course you can use 'Lygaeidae sensu Henry', but that is only a clumsy (and error-prone) way to give a new name.

Another nice example was provided by the previously mentioned post by Chris. It is about Drosophila. In this case, usage can go against taxonomic practice. For me, the solution is simple: no more Drosophila. But there is a huge number of users of the name who surely don't care about taxonomy, and whose publications outnumber the taxonomic publications on Drosophila; that is where a commission can rule. The practical solution is to keep Drosophila for the molecular people. Solutions in cases of conflict would then be guided by practical options, rather than by some type described long ago.

This, of course, allows classification changes to be made without 'using the types' (as long as the original characters were examined!). It also frees taxonomists from depending on some poorly known species (or even specimen) to name genera and families.

Speaking about types...

Type specimens have a particular property: they are in the first world, but they were collected in the third world. The reason for this is historical, but the consequences are felt more acutely today. Museums are measured by their number of 'type specimens', and there are particular policies for borrowing those specimens (only one loan at a time, certifications, curator's permission...). Also, some taxonomists, especially in the distant past, simply named an astonishing number of new species, giving only a superficial description (some color, some illustrations of genital parts) and basing the whole 'description' on a type species designation. This old practice has continued in several obscure papers in the third world.

So typing, although it seems a reasonable way to be objective, does more harm than good. I guess that typing will never be gone, but at the moment there are several nice prospects for getting free of loan policies. Approaches like [4], with a great emphasis on characters and images, can change the situation. Researchers far away from the type specimens can see high-quality pictures of several parts of a specimen. A side consequence is that the concept of a single type is lost: it is impossible to photograph every part from a single specimen, and it is possible that the specimen ends up destroyed. The new typing would then be more responsible, as it would be based on several different specimens.

Moreover, 'de-typification' increases the value of general collections, which lies more in the quantity and quality (e.g. fresh specimens) of the available material than in a particular specimen collected in 1816, saved from a fire in 1874, badly damaged by poor curation in 1903...

Actually, much phylogenetic work is done without using type specimens: the bulk of molecular phylogenies, and I guess several morphological ones. I think they do it in a very objective and testable way. If they can live without types, why can't classical taxonomists?

The data matrix

A question that follows from the previous section is: how can research without types be objective? The answer is that instead of focusing on a particular specimen, phylogeneticists use a data matrix of taxa and characters.

A taxonomy without types enforces the use of well-delimited characters; it is the only way to show the reality of a new designation. Look at some recent revisions with a phylogenetic analysis, and compare them with a revision without one (for examples of both, see the publications of the AMNH). The character matrix allows a quick examination of several characters: it is possible to see which state each character has in each taxon. New technologies (see [4]) couple specimens, characters and images for each cell entry. By default a matrix provides a multi-entry key, so the identification tools are better.
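The multi-entry key idea can be sketched in a few lines of Python. The taxa, characters and states below are invented for illustration, the function name identify is hypothetical, and I am assuming that an unscored cell is written as "?" and should never exclude a taxon.

```python
# Hypothetical taxon-by-character matrix. Taxa, characters, and states are
# invented for illustration; "?" marks a cell that has not been scored.
MATRIX = {
    "Taxon A": {"wings": "present", "antennal segments": "4", "ocelli": "present"},
    "Taxon B": {"wings": "present", "antennal segments": "5", "ocelli": "absent"},
    "Taxon C": {"wings": "absent", "antennal segments": "4", "ocelli": "?"},
}


def identify(matrix, observations):
    """Return the taxa compatible with the observed character states.

    Characters can be given in any order, which is what makes the key
    multi-entry. Unscored cells ("?") never exclude a taxon.
    """
    candidates = []
    for taxon, states in matrix.items():
        compatible = all(
            states.get(character, "?") in (state, "?")
            for character, state in observations.items()
        )
        if compatible:
            candidates.append(taxon)
    return candidates
```

For instance, identify(MATRIX, {"wings": "absent"}) leaves only "Taxon C", while observing only four antennal segments keeps both "Taxon A" and "Taxon C" as candidates. Unlike a dichotomous key, the user can start from whatever character is visible on the specimen at hand.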

There are some nice things about using a character matrix. The first is an increasing interest in providing well-defined characters [4]. As characters are used for phylogenetic analysis, they would be stricter. Other features, like color patterns and length measurements, would be restricted to a simpler description. Another advantage is that the matrix provides a quick classification of a new species.

A nice real example is provided by dinosaur paleontologists. They publish quick, short reports in high-profile journals (like Nature or Science) with brief descriptions, but as they have a great database of characters, several points of the anatomy of the newly described fossil are immediately 'published', long before the detailed description appears in a more specialized journal.

Thinking on databasing

Of course, using a character matrix has a direct consequence for storage: well-defined characters and images fit smoothly into a database [4].

I think that the new challenges of the 'biodiversity crisis', as well as the 'taxonomic crisis', can be addressed by thinking about data storage. How can we store the data more efficiently? How can we link taxonomic and publication data? How would changes in our knowledge of phylogeny affect previously published data, and how can the harm be minimized?
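One possible answer to the linking question, sketched with Python's built-in sqlite3 module: publications point to a stable taxon identifier rather than to a name, so a renaming only touches the names table. The schema, the table and column names, and the placeholder data are all my own assumptions for illustration, not a description of any existing taxonomic database.

```python
import sqlite3

# In-memory database with a hypothetical schema: names and publications
# both refer to a stable taxon id, never to each other directly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (id INTEGER PRIMARY KEY, note TEXT);
CREATE TABLE name (
    taxon_id INTEGER REFERENCES taxon(id),
    label TEXT,
    is_current INTEGER
);
CREATE TABLE publication (id INTEGER PRIMARY KEY, citation TEXT);
CREATE TABLE mention (
    publication_id INTEGER REFERENCES publication(id),
    taxon_id INTEGER REFERENCES taxon(id)
);
""")

# Placeholder data only.
conn.execute("INSERT INTO taxon VALUES (1, 'placeholder taxon')")
conn.execute("INSERT INTO name VALUES (1, 'Aus bus', 1)")
conn.execute("INSERT INTO publication VALUES (1, 'Placeholder citation, 1990')")
conn.execute("INSERT INTO mention VALUES (1, 1)")


def rename(conn, taxon_id, new_label):
    """Give a taxon a new current name; old names are kept as history."""
    conn.execute("UPDATE name SET is_current = 0 WHERE taxon_id = ?", (taxon_id,))
    conn.execute("INSERT INTO name VALUES (?, ?, 1)", (taxon_id, new_label))


def publications_for(conn, label):
    """Find publications about the taxon that has ever carried `label`."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.citation
        FROM name n
        JOIN mention m ON m.taxon_id = n.taxon_id
        JOIN publication p ON p.id = m.publication_id
        WHERE n.label = ?
        ORDER BY p.citation
        """,
        (label,),
    ).fetchall()
    return [citation for (citation,) in rows]


rename(conn, 1, "Cus bus")
```

After the renaming, a search under either the old or the new name reaches the same publication, because the link between publication and taxon was never rewritten. That is one way of minimizing the harm a classification change does to previously published data.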

It is important for a new taxonomy to keep the great advances made since Linnaeus's time; starting from scratch is clearly the wrong solution. But taxonomists should also be able to make some concessions in their practice, and update it to the new data architecture of the world.

It is time for taxonomy to become a useful discussion about actual data. Facing the massive extinction that humans are causing around the world, it seems weird that a taxonomic study should begin by searching for old papers from the 18th century whose only utility is that they provide a name; the descriptions, characters and other things in those papers are of low value. (By the way, taxonomy is the only field of science that continues to use such old data. For historians, old texts are the source of investigation; for taxonomists it is a more lawyer-like activity of searching for an 'old case'. Catalogs are nice curiosities, and surely valuable for historians, but what is their actual value for taxonomists? They are important only because they point to the papers that establish a name.)

If taxonomy is the main objective, then useful data storage is the main objective. Bookkeeping and law courts are not part of knowing biodiversity.

References
[1] Dominguez, E., Wheeler, Q. 1997. Taxonomic stability is ignorance. Cladistics 13: 367-372. doi: 10.1111/j.1096-0031.1997.tb00325.x
[2] Schuh, R. T., Slater, J. A. 1995. True Bugs of the World. Cornell University Press, Ithaca, New York.
[3] Henry, T. J. 1997. Phylogenetic analysis of family groups within the infraorder Pentatomomorpha (Hemiptera: Heteroptera), with emphasis on the Lygaeoidea. Annals of the Entomological Society of America 90: 275-301
[4] Ramírez, M. J. et al. 2007. Linking of digital images to phylogenetic data matrices using a morphological ontology. Systematic Biology 56: 283-294. doi: 10.1080/10635150701313848

Friday, January 18, 2008

DataTube? I can't wait!!

I just read some wonderful news at WiredScience: Google will be hosting open scientific data on the web [http://research.google.com]! WiredScience says that the interface will be similar to YouTube's, with annotations and comments.

I can't wait to see many morphological matrices, and morphological pics! I think that the excellent proposal of Ramírez et al. [1] can be coupled with that project :).

[1] Ramírez, M. J. et al. 2007. Linking of digital images to phylogenetic data matrices using a morphological ontology. Systematic Biology 56: 283-294. doi: 10.1080/10635150701313848