Friday, April 18, 2008

Vouchers, types and specimens

Prompted by the closure of the Utrecht Herbarium, Chris Taylor wrote two interesting posts about the nature of type specimens and vouchers (of molecular studies). In both cases Chris remarks on the importance of a comparison specimen for resolving taxonomic problems.

In a more implicit way, the post about vouchers shows that molecular tools can be very important, but that, one way or another, to do a proper 'molecular taxonomy' you need to rely on a proper morphological (classic) taxonomy.

In a post that I wrote some time ago, Chris pointed out some of those remarks to me, and actually I think that my dislike is not of the types themselves, but of the bad practice around them: specifically, that many specialists from the old days (and maybe some current ones) proposed hundreds of new species based on telegraphic descriptions.

The other thing I do not like about type specimens is that there seems to be a 'single-specimen taxonomy'. I think that every specimen in a collection is equally valuable. For me, more important than the type specimen information is the 'examined specimens' list, which usually includes individuals of different sexes and ontogenetic stages.

But I think that the new generation of taxonomists is moving in the right direction. First, there are new approaches intended to store and provide access to information on a per-specimen basis. I especially like [1]: even if the data are not made public (I prefer public data, but I understand that a taxonomist wants to keep his/her data private after several years of work, for at least the same amount of time!), it allows a quick reexamination of the material.

The second point, sadly not very popular in third world countries (and even in first world ones), is that every, I repeat, every taxonomic contribution should be made under a rigorous phylogenetic framework. That is: a character discussion, a published data matrix, and a classification based on the phylogenetic tree.

Character discussions allow a more objective definition of the examined characters, and a terminology that is coherent within the study, because a phylogenetic character has to be the same character across several species.

The data matrix shows in an easily readable form (especially if you have a phylogenetic data editor; there are several free ones on the net) the characters examined for a particular terminal (that is, a set of examined specimens), which shrinks the usually lengthy (and boring to read) description section of the paper. That section can be reserved for characters that are not used in the cladistic analysis (for example, colors, measurements, proportions), for explanations of particular character scorings, for biological information (such as distribution and ecological aspects), for links to figures/pictures (electronic or in print), and for the list of examined specimens. When a character is scored, the author of the matrix is saying that he/she saw that character in the species; the matrix also records the characters that were not seen (usually typed as '?'). Note that the matrix is an excellent way to merge the information stored with the several specimens examined, so linking data from the specimens and from the matrix is direct [1].
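As a rough illustration of the idea, here is a minimal Python sketch of such a matrix. The terminals, the character names and the scorings are all invented for the example, and the two helper functions (`unseen_characters`, `scored_fraction`) are hypothetical names, not part of any phylogenetic data editor.

```python
# Hypothetical characters and terminals, invented only for illustration.
CHARACTERS = ["eyes", "tibial spines", "wing venation"]

# One row per terminal; each position is the state scored for that character.
# '?' marks a character that the author did not see in the examined specimens.
MATRIX = {
    "Species A": "01?",
    "Species B": "110",
    "Species C": "0?1",
}


def unseen_characters(matrix, characters):
    """Return, for each terminal, the names of the characters scored as '?'."""
    return {
        terminal: [characters[i] for i, state in enumerate(row) if state == "?"]
        for terminal, row in matrix.items()
    }


def scored_fraction(matrix):
    """Fraction of the cells of the matrix that were actually observed."""
    cells = [state for row in matrix.values() for state in row]
    return sum(state != "?" for state in cells) / len(cells)


if __name__ == "__main__":
    for terminal, missing in unseen_characters(MATRIX, CHARACTERS).items():
        print(terminal, "not seen:", missing)
    print("scored cells:", round(scored_fraction(MATRIX), 2))
```

Even a toy like this makes explicit which observations the author is claiming and which ones are missing, something that is hard to extract from a prose description.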

Finally, a cladogram-based classification keeps the information from the analysis tightly bound to the taxonomy. If someone proposes a new genus, the cladogram will show that it is a really distinct group, and not a highly apomorphic group inside a previously established genus. A classification needs evidence that supports each proposed taxon, and the cladogram is the well-known way to find evidence of grouping.

[1] Ramírez, M. J. et al. 2007. Linking of digital images to phylogenetic data matrices using a morphological ontology. Systematic Biology 56: 283-294. doi: 10.1080/10635150701313848

Wednesday, April 16, 2008

We can't get characters, but we can get states

Another piece for the seminars, this time about phylogenetics.

Ramírez, M. J. 2007. Homology as a parsimony problem: a dynamic homology approach for morphological data. Cladistics 23: 588-612. DOI: 10.1111/j.1096-0031.2007.00162.x

I read it shortly after it was posted early online, and I did not want to talk about it, but as it was proposed for the seminar, here is my own view of the paper.

Homology for some morphological structures is sometimes straightforward within a group, but as we move to more inclusive scopes, the interpretation becomes blurred. For example, we know well that the legs of insects are all the same legs, and we also know that jointed legs are homologous within all Arthropoda, but which is the equivalent of the second pair of insect legs in Myriapoda? In vertebrates, the homology of cranial bones is nearly direct within each 'class', but comparison of the cranial bones of fishes (especially the fossil ones) with the cranial bones of the other 'classes' is fairly complicated.

So Martín [Ramírez] gives us a two-step way to deal with such cases. The first step is a formalization of the classic way of dealing with characters: a comparison of the possible states and their implications. But he highlights that the whole decision should be made in a context that evaluates several possible alternatives and assigns a specific cost to each one, so that, among the possible alternatives, the most parsimonious one is preferred. In this vein, his work is very similar to Agnarsson and Coddington [1], and in my opinion easier to grasp.

But in [1] you make the choice and then go on to the standard cladistic analysis. Martín does not make the decision; he wants the simultaneous analysis to select the best possible arrangement, in a framework directly derived from molecular 'dynamic homology' [2, 3]. Under a strict dynamic framework each topology would indicate a specific arrangement for the morphology, but as Martín notes, in direct contrast with DNA, not all arrangements are valid. So he limits the scope to a set of previously defined morphological 'alignments' and chooses the most parsimonious one.
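To make the idea concrete, here is a small Python sketch of choosing, among predefined alternative codings, the one that gives the fewest steps over a set of candidate trees. It is not Ramírez's implementation: it assumes unordered characters counted with Fitch optimization, rooted binary trees written as nested tuples, and equal costs for all alternatives. The terminals, the trees, the codings and the function names are all invented for the example.

```python
def fitch_steps(tree, states):
    """Count the steps of one unordered character on a rooted binary tree.

    `tree` is a terminal name (str) or a pair (left, right) of subtrees;
    `states` maps each terminal name to its state.
    """
    def visit(node):
        if isinstance(node, str):          # a terminal: its own state, no steps
            return {states[node]}, 0
        left_set, left_steps = visit(node[0])
        right_set, right_steps = visit(node[1])
        common = left_set & right_set
        if common:                          # descendants agree: no extra step
            return common, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


def best_arrangement(trees, fixed, alternatives):
    """Return (cost, tree name, coding name) with the fewest total steps."""
    results = []
    for tree_name, tree in trees.items():
        for alt_name, alt_chars in alternatives.items():
            cost = sum(fitch_steps(tree, c) for c in fixed + alt_chars)
            results.append((cost, tree_name, alt_name))
    return min(results)


# Hypothetical example: four terminals, two candidate trees.
TREES = {
    "tree1": (("A", "B"), ("C", "D")),
    "tree2": (("A", "C"), ("B", "D")),
}
# A character whose homology is not in doubt.
FIXED = [{"A": 0, "B": 0, "C": 1, "D": 1}]
# Two alternative codings ('alignments') of a doubtful structure.
ALTERNATIVES = {
    "X": [{"A": 1, "B": 1, "C": 0, "D": 0}],
    "Y": [{"A": 1, "B": 0, "C": 1, "D": 0}],
}

if __name__ == "__main__":
    print(best_arrangement(TREES, FIXED, ALTERNATIVES))
```

Note that in this scheme the coding is selected only because it fits the shortest tree, which is exactly the point discussed in the following paragraphs.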

Although Martín's description of the problem is more adequate than that of [1], Agnarsson and Coddington are far better at keeping homology decisions and parsimony analysis separate. When choosing homologous characters, the main objective is to find characters that are the same, and you can use several tools of morphological analysis to do it. If there are some doubts, then it seems better to leave potential unions separated, or fused but with a lower weight than the other, well-established characters [4]. You can use a particular weighting scheme to find the homologs, but it is not necessary to use the same one in the construction of the cladogram.

As is seen in every character discussion, you can have plenty of reasons to decide about a character (sometimes such a discussion includes what happens with alternative codings), but claiming that the choice was made because it fits the best cladogram found... does not seem to be a good reason.

And it is not a good reason! Why? Because a character claim based solely on the cladogram is just like homoplasy: you can only speak about it because of the cladogram, so it is an ad hoc hypothesis [5]. 'Dynamic homology', whether in the molecular sense or in the morphological one proposed by Ramírez, is ad hoc. It is not a coincidence that Martín found that, under his method, the justification of parsimony as minimization of ad hoc hypotheses is not easily followed, and that methods based directly on homoplasy, like implied weights [6], produce strange results.
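To see why implied weights depend directly on homoplasy, recall the fit function usually given for the method of [6]: each character gets a fit of k / (k + h), where h is its number of extra steps on the tree and k is a concavity constant, and the tree with the highest total fit is preferred. The sketch below only illustrates that formula; k = 3 and the homoplasy values are arbitrary, and the function names are hypothetical.

```python
def implied_fit(extra_steps, k=3.0):
    """Fit of one character with `extra_steps` steps of homoplasy: k / (k + h)."""
    return k / (k + extra_steps)


def total_fit(homoplasy_per_character, k=3.0):
    """Total fit of a tree, given the extra steps of each character on it."""
    return sum(implied_fit(h, k) for h in homoplasy_per_character)


if __name__ == "__main__":
    # Two hypothetical trees with the same total homoplasy (4 extra steps),
    # distributed differently among three characters.
    concentrated = [0, 0, 4]
    spread = [1, 1, 2]
    print("concentrated:", round(total_fit(concentrated), 3))
    print("spread:      ", round(total_fit(spread), 3))
```

Under equal weights both trees would tie; under implied weights the tree that concentrates the homoplasy in one character has the higher fit. So the result depends on how extra steps are counted for each character, and that count is not stable when the characters themselves change from tree to tree.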

I think that the paper has great value for its first part, which can be integrated with the proposal of [1]. But, as with most justifications of 'dynamic homology', Martín trades a fully coherent minimization of ad hoc hypotheses of homoplasy [5] for a 'minimization of steps'.

[1] Agnarsson, I., Coddington, J.A. 2007. Quantitative tests of primary homology. Cladistics 24: 51-61. DOI: 10.1111/j.1096-0031.2007.00168.x
[2] Wheeler, W.C. 1996. Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 12: 1-9. DOI: 10.1111/j.1096-0031.1996.tb00189.x
[3] Wheeler, W.C. et al. 2006. Dynamic homology and phylogenetic systematics: a unified approach using POY. AMNH, New York. Freely available: http://research.amnh.org/scicomp/pdfs/wheeler/Wheeler_etal2006b.pdf
[4] Neff, N. 1986. A rational basis for a priori character weighting. Syst. Zool. 35: 110-123. JSTOR link: http://www.jstor.org/pss/2413295
[5] Farris, J.S. 1983. The logical basis of phylogenetic analysis. In: Advances in Cladistics, vol. 2 (Platnick, N.I., Funk, V.A., Eds.). Columbia, New York. Pp. 7-36.
[6] Goloboff, P.A. 1993. Estimating character weights during tree search. Cladistics 9: 83-91. DOI: 10.1111/j.1096-0031.1993.tb00209.x

Addendum
Of course I do not deny the role of previous analyses and of checking different alternative codings. That is part of the tools with which morphologists make their homology decisions.

Monday, April 14, 2008

C--

Well, this post is more related to programming than to biogeography... but if you are interested, I have put some of my recent experiences with C++ there...