Computer cladistics / ¡Cladística a la lata!: Taxonomy: it's time to change!

Wednesday, January 30, 2008

Taxonomy: it's time to change!

I just read a post by Christopher Taylor about a 'taxonomic problem' with Drosophila melanogaster. The curiosity of the case prompted me to write this post, which I have had in mind for a long time.

One of the main advantages of a classification is information retrieval. Biological classification lets us find the relationships and characters of a given species. A set of laws governs the classification, known as the codes (there are different codes for animals, plants, bacteria, etc.).

In this age of the internet and massive databases, one could think that taxonomy is ready to make a direct jump, but unfortunately that seems not to be the case. The codes are really old, and many taxonomists are afraid that changing the rules would collapse the system.

I do not think that taxonomists would let database designers propose a new system, but I think that a deep collaboration, coupled with some changes in the structure of the codes, will speed up taxonomic research. The main objective of taxonomists is to describe and classify species, not to be lawyers!

I have identified some problems that I think are the most important.

Linnean names

Linnean names are nice: they provided some form of order in a time of great explorations around the world (18th-19th centuries), when Europeans were surprised by the diversity they found. Many things have changed since that time. By now we have several phylogenetic methods, which show how arbitrary a clade name can be (I like this post about the subject), and of course the tree of life has far more divisions than there are Linnean ranks. Now we have databases to store names, and search algorithms to find them quickly.

A rank-free taxonomy seems more adequate to store our information about the phylogeny of species. That is not an embrace of the 'phylocode', as I prefer a taxonomy based on specific characters, or better, a combination of characters and topology, instead of a topology-based taxonomy (note that apomorphy-based definitions are also based on topology). Also, ranked names have a special utility: they can serve as landmarks when browsing the tree of life.

The nested nature of biological classification allows a rank-free taxonomy without the pains imagined by [1], and using “landmarks” helps in searching (and writing) abstracts and paper titles, and of course in the search for a particular clade, as you can see using GenBank. Of course, genus and species can be (and I think should be) obligatory. That allows continuity with Linnean taxonomy.
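As a rough illustration of how nesting plus landmarks could work in software, here is a minimal Python sketch. The `Clade` class, the `landmark` flag and the helper functions are hypothetical names invented for this example, and the nesting shown is deliberately incomplete toy data; which names count as landmarks is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clade:
    """A named clade in a rank-free, nested classification (hypothetical model)."""
    name: str
    parent: Optional["Clade"] = None
    landmark: bool = False  # True for widely known names kept as browsing aids


def lineage(clade):
    """Return the clade names from the given clade up to the root."""
    names = []
    node = clade
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


def nearest_landmark(clade):
    """Return the closest enclosing landmark name, or None if there is none."""
    node = clade.parent
    while node is not None:
        if node.landmark:
            return node.name
        node = node.parent
    return None


# Toy data: many intermediate clades are omitted on purpose.
arthropoda = Clade("Arthropoda", landmark=True)
insecta = Clade("Insecta", arthropoda, landmark=True)
heteroptera = Clade("Heteroptera", insecta)
lygaeoidea = Clade("Lygaeoidea", heteroptera)

print(lineage(lygaeoidea))
print(nearest_landmark(lygaeoidea))
```

No rank is stored anywhere: the position of a name is given only by its parent, and a landmark is just a flag that a search interface could use to orient the reader.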

If Linnean names are mere landmarks, then many of the laws for synonyms, and so on, need to be changed toward a more practical usage.

Synonyms

Perhaps the most annoying characteristics of taxonomy are synonymy, rank changes, and several related problems. These problems are actually a burden of the current codes, and can be fixed by changing the laws, without harming the current classification.

Synonyms are really bad for databases, as the same entity is labeled with two different names, or (worse!) the same name is applied to different entities; in other cases, the ranges of the synonyms overlap.
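To make the two failure modes concrete, here is a minimal Python sketch that scans a list of name usages and reports both of them. The record layout, the function name `find_conflicts` and all names and identifiers in the data are invented placeholders, not real taxa or a real database schema.

```python
from collections import defaultdict

# Each record: (name as published, source, identifier of the entity meant).
# All values are invented placeholders.
usages = [
    ("Alphidae", "Author 1", "entity-1"),
    ("Betidae", "Author 2", "entity-1"),   # second name for the same entity
    ("Alphidae", "Author 3", "entity-2"),  # same name, different entity
]


def find_conflicts(records):
    """Return (synonyms, ambiguous_names) found in the usage records.

    synonyms maps an entity to the several names applied to it;
    ambiguous_names maps a name to the several entities it was applied to.
    """
    names_by_entity = defaultdict(set)
    entities_by_name = defaultdict(set)
    for name, _source, entity in records:
        names_by_entity[entity].add(name)
        entities_by_name[name].add(entity)
    synonyms = {e: sorted(n) for e, n in names_by_entity.items() if len(n) > 1}
    ambiguous = {n: sorted(e) for n, e in entities_by_name.items() if len(e) > 1}
    return synonyms, ambiguous


synonyms, ambiguous = find_conflicts(usages)
print(synonyms)
print(ambiguous)
```

The second report is the more damaging one for retrieval: a query by name alone cannot tell which entity is meant unless the source is stored alongside it.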

Of course, no matter what phylocoders say, the same problem applies to their 'phylogenetic' names: as each new revision is published, the names continue but with a completely different meaning. At least in traditional taxonomy, it is possible to reject some names.

It seems better to relax some of the naming rules, so that a new classification would be clearly different from the original one. If a huge family is discovered to be massively paraphyletic, I think there is no point in allowing the original name to survive; it is simply a brutally wrong name.

For example, “reptiles” for a long time included many stem amniotes, therapsids (“mammal-like reptiles”), anapsids (such as turtles), lepidosaurs (lizards and snakes) and crocodiles. They are amniotes, but exclude birds and mammals. I can see no reason to retain that ugly name. It is simple: synonymize it with its monophyletic equivalent (Amniota), and never use it again. That way, a modern paper cannot be confused with the old ones. The only valid use of the name would be if someone showed that the original reptiles are monophyletic.

Names can be used instead to show different possible classifications. For example, the different arrangements of Arthropoda receive different names: Atelocerata (myriapods, such as centipedes, and insects) vs. Pancrustacea (crustaceans and insects); Mandibulata (myriapods, crustaceans and insects) vs. Schizoramia (crustaceans and arachnids). These names identify different entities, each one attached to a different phylogenetic proposal (note that for phylocoders, Atelocerata, Pancrustacea and Mandibulata can be the same entity!).

Types

The reptile and arthropod examples are possible because there are no types fixing the names of those groups. At family level, family names and genus names are ruled by the typification of names, which brings more confusion than solutions to current research.

Here is an example: Lygaeidae was a large family of bugs (Insecta: Heteroptera). For a long time it was believed to be paraphyletic [2], but this was only demonstrated by Henry [3]. He proposed a whole new classification of lygaeids, elevating to family rank no fewer than 7 subfamilies, and restricting the meaning of Lygaeidae to only 3 subfamilies. Is the new Lygaeidae the same as that of, say, 20 or 30 years ago? Of course not. So typification created name stability (as topology does for phylocoders), but at the cost of losing the name's utility.

As in the case of reptiles, there are no Lygaeidae any more. Any new reference in the literature to 'Lygaeidae' only creates confusion with the initial meaning of the group (which is a synonym of the superfamily Lygaeoidea). Of course you can use 'Lygaeidae sensu Henry', but that is only a clumsy (and error-prone) way of giving a new name.

Another nice example was provided by the previously mentioned post by Chris. It is about Drosophila. In this case, usage can go against taxonomic practice. For me, the solution is simple: no more Drosophila. But there is a huge number of users of the name who surely don't care about taxonomy, and whose publications outnumber the taxonomic publications on Drosophila; this is where a commission can rule. The practical solution is to keep Drosophila for the molecular people. Solutions in cases of conflict would then be guided by practical options, rather than by some old described type.

This, of course, allows classification changes to be made without 'using the types' (as long as the original characters were examined!). And it frees taxonomists from depending on some poorly known species (or even specimen) to name genera and families.

Speaking about types...

Type specimens have a particular property: they are in the first world, but they were collected in the third world. The reason for this is historical, but its consequences are felt more acutely today. Museums are measured by their number of 'type specimens', and there are particular policies for borrowing those specimens (borrowing only one at a time, certifications, curators' permission...). Also, some taxonomists, especially in the distant past, simply named a wonderful number of new species, giving only a superficial description (some color, some illustrations of genital parts) and basing the whole 'description' on a type designation. This old practice continues in several obscure papers in the third world.

So typification, although it seems a reasonable way to be objective, is rather harmful. I guess that typification will never go away, but at the moment there are several nice prospects for getting free of borrowing policies. Approaches like [4], with a great emphasis on characters and images, can change the situation. Researchers far away from type specimens can see high-quality pictures of several parts of a specimen. A side consequence is that the concept of type gets lost: it is impossible to photograph every part from a single specimen, and it is possible that the specimen ends up destroyed. The new typification would then be more responsible, as it would be based on several different specimens.

Moreover, a detypification increases the value of general collections, which lies more in the quantity and quality (e.g. fresh specimens) of the material available than in a particular specimen collected in 1816, saved from a fire in 1874, badly damaged by poor curation in 1903...

Actually, there is much phylogenetic work done without using type specimens, that is, the bulk of molecular phylogenies, and I guess several morphological ones. I think that they do it in a very objective and testable way. If they can live without types, why can't classical taxonomists?

The data matrix

A second question, following from the previous section, is how non-typified research can be objective. The answer is that instead of focusing on a particular specimen, phylogeneticists use a data matrix of taxa and characters.

A type-free taxonomy enforces the use of well-delimited characters; it is the only way to show the reality of the new designation. Look at some recent revisions with a phylogenetic analysis, and compare them with a revision without one (for examples of both, see the publications of the AMNH). The character matrix allows a quick examination of several characters: it is possible to see which state each character has in each taxon. New technologies (see [4]) couple specimens, characters and images for each cell entry. By default a matrix provides a multi-entry key, so the identification tools are better.
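The claim that a matrix works as a multi-entry key can be sketched in a few lines of Python. The taxa, characters and states below are invented placeholders, and `identify` is a hypothetical helper written for this illustration, not part of any published tool.

```python
# Rows are taxa, columns are characters, cells are character states.
# All taxa, characters and states are invented placeholders.
matrix = {
    "Taxon A": {"wings": "present", "antennae": "long", "tarsi": "3-segmented"},
    "Taxon B": {"wings": "present", "antennae": "short", "tarsi": "3-segmented"},
    "Taxon C": {"wings": "absent", "antennae": "long", "tarsi": "2-segmented"},
}


def identify(matrix, observed):
    """Return the taxa whose coded states agree with every observed character.

    The observations may cover any subset of characters, in any order,
    which is what distinguishes a multi-entry key from a fixed dichotomous key.
    A missing cell (absent or coded "?") never excludes a taxon.
    """
    matches = []
    for taxon, states in matrix.items():
        if all(states.get(ch, "?") in (st, "?") for ch, st in observed.items()):
            matches.append(taxon)
    return sorted(matches)


print(identify(matrix, {"wings": "present"}))
print(identify(matrix, {"antennae": "short", "wings": "present"}))
```

Each additional observation narrows the candidate list, and the user is free to start with whichever character is easiest to see on the specimen at hand.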

There are some nice things about using a character matrix. The first is an increasing interest in providing well-defined characters [4]. As characters are used for phylogenetic analysis, they would be stricter. Other characteristics, like color patterns or length measurements, would be restricted to a simpler description. Another advantage is that it provides a quick classification of a new species.

A nice real example is provided by dinosaur paleontologists. They publish quick, short reports in high-profile journals (like Nature or Science) with brief descriptions, but as they have a great database of characters, several points of the anatomy of the newly described fossil are immediately 'published', long before the detailed description in a more specialized journal.

Thinking about databasing

Of course, using a character matrix has a direct consequence for storage: well-defined characters and images enter smoothly into a database [4].
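One hedged way to picture this is a small relational layout in which every matrix cell carries its supporting specimen and image. The schema, table names, file path and the `cell_evidence` helper are assumptions made for this sketch using Python's standard `sqlite3` module; they are not the design used in [4].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxa (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE characters (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- One row per matrix cell, with the evidence that supports the coded state.
CREATE TABLE cells (
    taxon_id INTEGER NOT NULL REFERENCES taxa(id),
    character_id INTEGER NOT NULL REFERENCES characters(id),
    state TEXT NOT NULL,
    specimen TEXT,
    image_path TEXT,
    PRIMARY KEY (taxon_id, character_id)
);
""")

# Placeholder data: the taxon, specimen code and image path are invented.
conn.execute("INSERT INTO taxa (id, name) VALUES (1, 'Taxon A')")
conn.execute("INSERT INTO characters (id, description) VALUES (1, 'wings')")
conn.execute(
    "INSERT INTO cells VALUES (?, ?, ?, ?, ?)",
    (1, 1, "present", "specimen-0001", "images/specimen-0001_wing.jpg"),
)


def cell_evidence(conn, taxon, character):
    """Return (state, specimen, image_path) for one cell, or None if uncoded."""
    return conn.execute(
        """SELECT c.state, c.specimen, c.image_path
           FROM cells AS c
           JOIN taxa AS t ON t.id = c.taxon_id
           JOIN characters AS ch ON ch.id = c.character_id
           WHERE t.name = ? AND ch.description = ?""",
        (taxon, character),
    ).fetchone()


print(cell_evidence(conn, "Taxon A", "wings"))
```

With a layout like this, anyone who doubts a coded state can go from the cell straight to the specimen and picture behind it, without borrowing anything.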

I think that the new challenges of the 'biodiversity crisis', as well as of the 'taxonomic crisis', can be met by thinking about data storage. How can we store the data more efficiently? How can we link taxonomic and publication data? How could changes in our knowledge of phylogeny affect previously published data, and how can the harm be minimized?

It is important for a new taxonomy to keep the great advances made since Linnaeus's time; starting from scratch is clearly the wrong solution. But taxonomists should also be able to make some concessions in their practice, and update it to the new data architecture of the world.

It is time for taxonomy to become a useful discussion about actual data. Facing the massive extinction that humankind is producing around the world, it seems weird that a taxonomic study should begin by searching for old papers from the 18th century whose only utility is that they provide a name; the descriptions, characters and other things in those papers are of low value. (By the way, taxonomy is the only field of science that continues using such old data. For historians, old texts are the source of investigation; for taxonomists, it is a more lawyer-like activity of searching for an 'old case'. Catalogs are nice curiosities, and surely valuable for historians, but what is their actual value for taxonomists? They are important only because they point to papers that establish a name.)

If taxonomy is the main objective, then useful data storage is the main objective. Book keeping and law courts are not part of knowing biodiversity.

References
[1] Dominguez, E., Wheeler, Q. 1997. Taxonomic stability is ignorance. Cladistics 13: 367-372. doi: 10.1111/j.1096-0031.1997.tb00325.x
[2] Schuh, R. T., Slater, J. A. 1995. True Bugs of the World. Cornell Univ. New York
[3] Henry, T. J. 1997. Phylogenetic analysis of family groups within the infraorder Pentatomomorpha (Hemiptera: Heteroptera), with emphasis on the Lygaeoidea. Annals of the Entomological Society of America 90: 275-301
[4] Ramírez, M. J. et al. 2007. Linking of digital images to phylogenetic data matrices using a morphological ontology. Systematic Biology 56: 283-294. doi: 10.1080/10635150701313848

7 comments:

Christopher Taylor said...: My experience would suggest completely the opposite - as the amount of available data increases, then the concept of the type specimen becomes all the more important. In fact, it becomes critical. Written descriptions can become inadequate or misleading. What was thought to be one taxon can become two or more. In these cases, the type can be vital to working out what the original author was dealing with. I have had to look at a number of species that had not been touched since the original description, and for which the description was completely inadequate. Without the type specimens (which, yes, were not in the best condition, but were still adequate for identification) I would have been unable to do anything. It is important to understand that the type specimen is not required to supply all the information on a species, or even to be the best exemplar of a taxon (though it's all to the good if it is) - it serves to tie the name to an actual entity.

I do agree that the issue of type specimens not being in their country of origin is a serious issue. However, I think this calls for a change in how the type specimens are handled rather than a drop in the type specimen concept altogether. The Western Australian Museum (to which I am affiliated), for instance, has a policy that types of new taxa described by its workers should, where possible, be deposited in the country of origin, and I think more institutes need to adopt such a policy. High-resolution photographs of type specimens can also free up a lot of demand on the specimen itself.

It seems better to relax some of the naming rules, so that a new classification would be clearly different from the original one. If a huge family is discovered to be massively paraphyletic, I think there is no point in allowing the original name to survive; it is simply a brutally wrong name.

But the question becomes, how different does a new classification have to be to justify all new names? For instance, a number of people have complained to me about the rapidly multiplying mass of higher clade names for mammals, as each new concept receives a different name. For instance, when Cetacea was found to have arisen from within Artiodactyla, we were presented with the new Cetartiodactyla. Was this better than subsuming Cetacea within Artiodactyla? If the door is opened for new names for minor changes in concept, we will end up with an endless confusion of overlapping and largely superfluous names.

If taxonomy is the main objective, then useful data storage is the main objective. Book keeping and law courts are not part of knowing biodiversity.

Book-keeping is an integral part of knowing biodiversity, because it is vital to useful data storage. Taxonomy and taxonomic concepts should be free to change because our understanding of biodiversity should be free to change. But each new review does not start from scratch - it builds upon what has been done before.; January 31, 2008 12:06 a.m.
Mike Keesey said...: I agree with some of your points but not others. Regrettably, I only have time to focus on those others.

That is not an embrace of the 'phylocode', as I prefer a taxonomy based on specific characters, or better, a combination of characters and topology

The PhyloCode certainly allows for the latter, in the form of apomorphy-based definitions. The former is useless. Consider: "the clade of organisms with wings used for powered flight". You have to add a specimen specifier (or a proxy specimen specifier, like a species), or it's ambiguous between Avialae, Pterosauria (or a slightly more inclusive clade), "Apo-Chiroptera", and "Apo-Pterygota". (Maybe some others, I dunno.) Character specifiers simply do not work without context.

And apomorphy-based definitions have some special problems of their own. For one thing, it's often difficult to define characters as binary. (Is there a sharp line between "powered flight" and other forms of wing-assisted locomotion?) For another, many characters do not fossilize, leaving extinct taxa in limbo. (Did Microraptor engage in powered flight?) Finally, basal populations may be polymorphic. (See Gauthier and de Queiroz's [2001] discussion of Ornithurae and Confuciusornis.)

I think these sorts of unresolved definitions are okay, because they give us a goal to strive toward and a frame of reference for certain issues. But it hardly seems like a good idea to enforce that for all definitions. Often (perhaps usually, as Sereno has argued), node- and branch-based definitions are much more stable.

Of course, no matter what phylocoders say, the same problem applies to their 'phylogenetic' names: as each new revision is published, the names continue but with a completely different meaning.

The PhyloCode makes no bones about it -- it has a whole article on synonymy. I think there's an important difference between rank-based synonymy and phylogeny-based synonymy, though. Rank-based synonymy involves a subjective element. Two researchers may agree in every detail on the phylogeny of apes and humans and still have differing uses of "Hominidae". But under a phylogeny-based code, agreement on phylogeny means agreement on nomenclature.

As for revision to phylogeny -- the only ways to get a completely stable system are A) mandate content and never let it change, no matter what we discover, or B) become omniscient. B is impossible and A is highly undesirable. So this kind of instability is actually a good thing.

And in the latest draft, there are some provisions for emending definitions, in cases where this kind of instability leads to larger problems. Some types of emendation ("second-order") require CPN approval, while others ("first-order") do not.

These names identify different entities, each one attached to a different phylogenetic proposal (note that for phylocoders, Atelocerata, Pancrustacea and Mandibulata can be the same entity!).

Not necessarily true--definitions can be given restrictive clauses or otherwise formed in such a way as to render them null under certain hypotheses. For example, you could define Atelocerata as "the final common ancestor of [insert insect species] and [insert myriapod species] and all descendants thereof, provided that it does not include [insert decapod species] or [insert spider species]."

As for types--specimens are necessary to taxonomy, if not as types then as specifiers. As my powered flight example shows, you simply cannot create a taxonomic definition without some sort of reference to actual organisms.; January 31, 2008 1:28 a.m.
Mike Keesey said...: Whoops, it's not "first-order" and "second-order" emendations. (I think those were Sereno's original terms for the concepts, though.) It's "restricted" (requiring approval) and "unrestricted" (not requiring approval--basically bookkeeping).

Some relevant links to the PhyloCode:
Arts. 15.8-15.15 (Emendations)
Art. 14 (Synonymy)
Art. 11.8 (qualifying clauses and potentially inapplicable definitions)
Art. 9.10 (rule on interpreting apomorphy specifiers); January 31, 2008 1:44 a.m.
Salva said...: First of all, thanks for the comments :)

Chris, you made a good point that I totally missed, about the old typified material and descriptions ;). I hope that current peer review standards, a more phylogeny-oriented taxonomy and more high-quality images on the internet can help to produce better descriptions. I think that we need more time for examining new data, instead of continuously re-examining old material. Of course, re-examination is always a good thing, but I think that at the moment we have little time to do it :'(

It is excellent news that there are some institutions that actually care about depositing types in the country of origin :)

Mike, I have somewhat clashing feelings about the PhyloCode: some days I love it, other days I hate it... For me the worst thing is the ambiguity of names, which I think creates a false idea of stability. Names with a possible null set are, for me, a better option than stem-, node- or apo-based definitions.

I hope that the people who work on the PhyloCode go more in that direction ;).; February 01, 2008 10:20 p.m.
Mike Keesey said...: For me the worst thing is the ambiguity of names, which I think creates a false idea of stability.

Well, the names are unambiguously attached to a definition, which may refer to ambiguous content. In the transitional system, names are unambiguously attached to ambiguous definitions, which always refer to ambiguous content. Surely the PhyloCode is a step up? (And if you can think of something better, I'd like to hear it. No, phenetics is not better.)

Names with a possible null set are, for me, a better option than stem-, node- or apo-based definitions.

There's no difference between them. Consider this branch-based (="stem-based") definition, taken from Art. 11.8 Example 3:

"Halecostomi [is] the most inclusive clade containing Amia calva Linnaeus 1766 and Perca fluviatilis Linnaeus 1758 but not Lepisosteus osseus Linnaeus 1758."

This is a straightforward branch-based definition except that it has two internal specifiers. If L. osseus is actually descended from the final common ancestor of A. calva and P. fluviatilis, then Halecostomi is null.

For the other definitional types, it's possible in theory to create null definitions without qualifying clauses, but in practice this would almost certainly never happen. But any type of definition can be opened to nullification by a qualifying clause. My Atelocerata example is a node-based definition with a qualifying clause.
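The way a definition becomes null under one hypothesis but not another can be checked mechanically. The following Python sketch is only an illustration: the function names are invented, trees are stored as plain child-to-parent dictionaries, and the two toy topologies are hypothetical alternatives chosen to exercise the Halecostomi definition quoted above, not claims about actual fish phylogeny.

```python
def ancestors(parent, node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def mrca(parent, a, b):
    """Return the most recent common ancestor of a and b."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    return None


def contains(parent, clade, taxon):
    """True if taxon lies inside the clade rooted at the given node."""
    return clade in ancestors(parent, taxon)


def node_based(parent, internal, excluded=()):
    """Least inclusive clade holding all internal specifiers.

    The excluded taxa act as a qualifying clause: if any of them falls
    inside the clade, the name does not apply and None is returned.
    """
    clade = internal[0]
    for taxon in internal[1:]:
        clade = mrca(parent, clade, taxon)
    if any(contains(parent, clade, x) for x in excluded):
        return None
    return clade


def branch_based(parent, internal, external):
    """Most inclusive clade holding all internal but no external specifiers."""
    clade = node_based(parent, internal, external)
    if clade is None:
        return None
    # Climb while the next clade up still excludes every external specifier.
    while clade in parent and not any(
        contains(parent, parent[clade], x) for x in external
    ):
        clade = parent[clade]
    return clade


# Two hypothetical topologies, written as child -> parent.
tree_1 = {"Amia calva": "n1", "Perca fluviatilis": "n1",
          "n1": "root", "Lepisosteus osseus": "root"}
tree_2 = {"Amia calva": "n1", "Lepisosteus osseus": "n1",
          "n1": "root", "Perca fluviatilis": "root"}

inside = ["Amia calva", "Perca fluviatilis"]
outside = ["Lepisosteus osseus"]
print(branch_based(tree_1, inside, outside))
print(branch_based(tree_2, inside, outside))
```

Under the first topology the name lands on the clade labeled `n1`; under the second, L. osseus falls inside the smallest clade holding both internal specifiers, so the function returns `None`, which is the null case described above.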

And even if the original definition does not include a qualifying clause, an unrestricted emendation may be published giving it one, without having to go through CPN approval. See Arts. 15.11-15.15 for more on unrestricted emendations.

Remember, the PhyloCode is still in draft form, and [valid] criticisms are welcomed. (There's an open meeting this summer where more details may be hammered out, in fact.); February 01, 2008 10:50 p.m.
Mike Keesey said...: Err, "traditional (i.e., rank-based) system", not "transitional system".; February 01, 2008 11:04 p.m.