Saturday, December 22, 2007

Homology and parsimony

Today is a translation day ;). This post is a translation of my previous post in Spanish on the subject.

This post was partly inspired by a discussion by Ebach and Williams on their blog Systematics & Biogeography (they have many posts on the subject!). Here I want to show the relationship between parsimony (or cladistic analysis) and homology.


There is a long tradition of discussion about the definition of homology. I use a definition similar to the traditional one: two structures are homologs when both are considered the same structure in different organisms. There are countless arguments for accepting two structures as being the same: God's plan, natural order, morphotype, or the one I endorse, common ancestry.

If two structures are homologs, they are the same structure, inherited from a common ancestor of the two organisms examined. Inheritance implies several things by definition, especially from genetics and developmental biology. Moreover, the structures need not be highly similar, even if they are the same, because they can be strongly modified. Nevertheless both are the same, which implies that structures change through time, so our assumptions include several processes that promote differentiation, such as genetic interactions and population dynamics.

Cladistic analysis has no direct interest in these phenomena associated with homology. The processes are of interest in other fields such as evolutionary biology, developmental biology, genetics, molecular biology, population genetics, ecology, etc. But lack of interest does not imply that they can be ignored! In character definition for a cladistic analysis (i.e., the selection of homologous characters), several of those factors must be taken into account to propose character limits and character coding.

Unfortunately, the fact that process is not of direct interest to cladistics encourages the erroneous idea that process is irrelevant to character definition. The so-called pattern cladists (such as Nelson and Platnick [1] and, more recently, Pleijel [2], Brower [3], and Ebach and Williams) think that cladistics is free of evolutionary thinking, but that position is totally wrong: the use of homologous characters implies a framework based on origin by common ancestry, in which every character, and not only its apomorphic state, is the same structure [4, 5].

There is more: the implications of evolution are large for several characters, especially when the character is well known, so the idea of Kluge [6], who claims that the only assumption of cladistic analysis is 'descent with modification', is also wrong. When you include inheritance and evolution, many things are included in the definition of a character. Of course, for some characters we have only morphological knowledge, and then assuming a simple 'descent with modification' seems correct. But for other characters (as in many vertebrates) we know the development and genetics of the structure, and in such cases the assumptions are far more complex than descent with modification.

Parsimony: the algorithm

The basic principle of the parsimony algorithm is fairly simple. Suppose you want to know the character state at node 'x', whose descendants are 'y' and 'z', and we know their character states. Then the state of 'x' is the intersection of the states of 'y' and 'z'; if the intersection is empty, the state of 'x' is the union of the states of 'y' and 'z'. Using a union implies a character change (a step). Of course, full optimization of states is more complex, but for the present discussion we only need these basic steps.
State assignment using the parsimony algorithm: (A) and (B) the state of ancestor 'x' is equal to the intersection of the states of descendants 'y' and 'z', in this case white; (C) 'y' and 'z' do not share any state, so the state assigned to 'x' is the union of the states of its descendants.
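
The intersection/union rule described above can be sketched in a few lines of Python. This is only a minimal illustration of that single pass (it is often called the Fitch down-pass), not a full optimization. The function names, the nested-tuple tree encoding, and the toy 'white'/'black' data are my own assumptions, chosen for the example.

```python
def fitch_node(y_states, z_states):
    """Return (states of x, steps added) from the state sets of y and z."""
    shared = y_states & z_states
    if shared:
        # Non-empty intersection: no change is needed at this node.
        return shared, 0
    # Empty intersection: take the union and count one step.
    return y_states | z_states, 1


def fitch_downpass(tree, tip_states):
    """Apply the rule from the tips to the root.

    tree: a tip name (str) or a 2-tuple of subtrees (hypothetical encoding).
    tip_states: dict mapping each tip name to its set of observed states.
    Returns (state set at the root, total number of steps).
    """
    if isinstance(tree, str):
        return set(tip_states[tree]), 0
    left, right = tree
    l_states, l_steps = fitch_downpass(left, tip_states)
    r_states, r_steps = fitch_downpass(right, tip_states)
    states, step = fitch_node(l_states, r_states)
    return states, l_steps + r_steps + step


# Toy data: four terminals scored for one two-state character.
tips = {"A": {"white"}, "B": {"white"}, "C": {"black"}, "D": {"black"}}
grouped = (("A", "B"), ("C", "D"))  # terminals with equal states are sisters
mixed = (("A", "C"), ("B", "D"))    # terminals with equal states are split

print(fitch_downpass(grouped, tips)[1])  # number of steps on the first tree
print(fitch_downpass(mixed, tips)[1])    # number of steps on the second tree
```

On this toy data the tree that keeps equal states together needs one step and the other tree needs two, which is the sense in which fewer steps keep terminals with the same state contiguous.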

As you can see, the algorithm is 'independent' of the data used: you can use any kind of character (in computer science this problem is known as the coloring problem, and the 'characters' are the colors of a map) and any kind of terminal. This is a common feature of every method formalized as an algorithm. It is therefore necessary to provide a proper justification for using the algorithm on a particular problem.

Parsimony: cladistic analysis

With the evolutionary concept of homology, its union with algorithmic parsimony is direct. If a character, no matter its state, is the same in two organisms that share a common ancestor, then the character was inherited from that common ancestor, and so the common ancestor must have had the character.

Moreover, if both organisms share the same form of the character, that is, the same state, then that state must have been present in their common ancestor (the character is the same!). But if the organisms have different forms of the character, we do not know which form was present in the common ancestor, so we assume that it could be either of the two states; in that case, if both organisms really share the same character, a transformation must have happened.

This is exactly the description of the parsimony algorithm. In cladistics the use of the parsimony algorithm is justified because the characters used are homologs. In this context the algorithm maximizes our homology propositions when it minimizes transformations: this lets most terminals with the same state be contiguous. The ad hoc hypotheses of homoplasy are thereby minimized [7].

Parsimony and homology tests

It is common among cladists to say that parsimony is a test of homology: congruence. I disagree because, as I argue here, the basis of parsimony is assuming from the very beginning that the characters are homologs! Any homology test must come before a rigorous cladistic analysis.

Homology tests can take several forms: morphological arguments (usually put under the 'similarity' label), anatomical position, structural organization, ontogeny, genetics. In most cases the 'test' is a conjunction of these procedures, and for these reasons defining a character requires a strong theoretical background. After examining all of those alternatives you have a good character. It is for these reasons that homoplasy is an ad hoc hypothesis: homoplasy is only justified in relation to the cladogram.

Of course, character revision is always welcome, and homoplastic characters may demand a closer examination, but it is equally valid to examine every character. The coding of some characters may be dubious; in such cases it is possible to use different codings as an exploratory device (similar to the proposal of Ramirez [8] for morphology and Wheeler [9] for molecules). But beware: such a coding is not supported by the cladogram, because the argument used to defend it is the same one used for homoplasy: it is justified only in relation to the cladogram, so the coding is ad hoc. In ambiguous cases I prefer a weighting scheme like the one proposed by Neff [10]: because we know little about the character, and have some doubts about its coding, it is better that it has a lower weight than characters we know better.

Bonus: a historical speculation

Here I have presented the ideas of homology and parsimony separately and then fused them. I did it that way to clarify the argument, but historically their development was intertwined from the beginning. If you read Wagner [11] on the algorithmic side and Hennig [12] on the logical side, it is clear that the two positions are very close. Both visions were fused in an excellent fashion by Farris and his collaborators [7, 13, 14], whose ideas (especially those in [14]) agree on many of the points presented here. So from the beginning cladistic analysis and the parsimony algorithm have walked together.

Wagner, Hennig and Farris developed their ideas in a morphological context. At the same time Dayhoff [15] was experimenting with several algorithms for molecular sequences. At least today, molecular biologists' ideas of homology differ from the morphological concept. I do not know what ideas of homology molecular biologists held in the 60s, but it seems that she did not believe that two equal bases (in Dayhoff's case, two amino acids) in two organisms imply common origin; the idea of point mutations precludes it. It is worth noting that Dayhoff never found a way to assign states to ancestors.

[1] Nelson, G., Platnick, N. 1981. Systematics and biogeography. Columbia Univ., New York.
[2] Pleijel, F. 1995. On character coding for phylogeny reconstruction. Cladistics 11: 309-315.
[3] Brower, A.V.Z. 2000. Evolution is not a necessary assumption of cladistics. Cladistics 16: 143-154.
[4] Fitzhugh, K. 2006. The philosophical basis of character coding for the inference of phylogenetic hypotheses. Zoologica scripta 35: 261-286.
[5] Grant, T., Kluge, A.G. 2004. Transformation series as an ideographic character concept. Cladistics 20: 23-31.
[6] Kluge, A.G. 2003. On the deduction of species relationships: a précis. Cladistics 19: 233-239.
[7] Farris, J.S. 1983. The logical basis of phylogenetic analysis. In: Platnick, N.I., Funk, V.A. (Eds.), Advances in cladistics, vol. 2. Columbia Univ., New York, Pp. 7-36.
[8] Ramirez, M.J. 2007. Homology as a parsimony problem: a dynamic homology approach for morphological data. Cladistics 23: 588-612.
[9] Wheeler, W. 1996. Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 12: 1-9.
[10] Neff, N.A. 1986. A rational basis for a priori character weighting. Systematic zoology 35: 110-123.
[11] Wagner, W.H. 1961. Problems in the classification of ferns. In: Recent advances in botany, vol. 1, Univ. Toronto, Toronto, Pp. 841-844.
[12] Hennig, W. 1966. Phylogenetic systematics. Univ. Illinois, Urbana.
[13] Kluge, A.G., Farris, J.S. 1969. Quantitative phyletics and the evolution of anurans. Systematic zoology 18: 1-32.
[14] Farris, J.S., Kluge, A.G., Eckardt, M.J. 1970. A numerical approach to phylogenetic systematics. Systematic zoology 19: 172-189.
[15] Dayhoff, M.O. 1969. Computer analysis of protein evolution. Scientific american 221: 87-95.

Relational databases for phylogenetic data

Today is a translation day ;). This post is the Spanish version of my previous post in 'English' on the subject.

A good while ago, when I was studying systems engineering (= computer science), the reigning paradigm for databases was the relational database, but the rapid growth of the internet, more powerful computers, and effective and popular search engines--like Google--put text-based databases on top. Text-based databases are the paradigm under which databases for phylogenetic data, such as GenBank and TreeBASE, have been developed.

In text-based databases the emphasis is placed on standardizing the input files, which must contain all the data needed to run a search. The reader surely knows the files that can be downloaded from GenBank, or the matrices/trees downloaded from TreeBASE. They have many fields, such as a classification and the identification of the sequences or characters; a string matching algorithm (letter/word matching? I do not know how to say this in Spanish :P) is used to find exact or similar matches to the user's query.

So the main source of research for text-based databases is string matching algorithms. Rod Page's blog iPhylo has several posts, and links to papers and manuscripts (here or here), on the subject. But I think that text-based databases are a mistaken approach.

String matching is a good tool for searching for keywords across the web, or for groups of letters, as in sequence searches in GenBank (I suppose that is the main reason why it is a text-based database!), but it seems problematic for a taxonomy/phylogeny database. I want to highlight some points:
  • Ambiguity: even if the entry for uploading files to the servers is based on 'cut and paste' templates (or uses step-by-step wizards), it is the uploader's responsibility to fill it in properly. As Page points out, in many TreeBASE matrices the terminal names are not scientific names.
  • Irrelevant information: when you download a file, it is usually full of information the user does not want, while other information is not available. Moreover, since searches are based on text matches, things like notes, comments and other fields can confuse the algorithms (remember, these are string matching algorithms!).
  • Formatting: being a text-based database, a compromise has to be made to keep the files readable (= standardized) by several programs, which makes the fields rigid, and it is very difficult to include new fields and information. For example, geographic information in GenBank is not mandatory--as far as I know--even for the datasets used in phylogeographic analyses! Geographic information and examined material seem impossible for TreeBASE if its implementation stays tied to the nexus format.
  • Taxonomy: as a consequence of the rigid formatting, alternative classifications and searches using synonyms cannot be implemented, or require additional searches in alternative databases.
  • Phylogeny: have you ever tried to search for a 'tree structure' in TreeBASE?

Relational databases are a different concept from text-based databases. They are based on many independent tables connected through key fields (keys), often using secondary tables to link different tables. The search engine is based on exploring the specific keys (in complex searches) and specific fields within each table.

My own idea of what the structure of the database should be, in simplified form and based on my own interests and the searches I have run, is this:

Of course, a proper design requires many years of development, with loads of interviews with several taxonomists, to yield a product that can be used properly around the world! (Although in his posts he supports text-based databases, this post by Rod Page gives several very interesting ideas about the multidisciplinary work of a database for phylogenies; of course there are many things on which I do not agree xD).

This is the description of the different tables in my sketch. The 'taxon' table stores the nomenclature of the current name of a taxon, whether a species or a supra-specific entity; it can include the diagnosis (linked to the character entry table!), the 'phylogenetic concept', links to drawings or photographs, the type specimen (with a link to the specimen table), and things of that kind. The 'synonyms' and 'classification' tables are secondary tables that store only the relationship between two taxa: the synonyms (the reason for the synonymy can be included), and the next more inclusive taxon of a particular taxon (it can include the author of that proposed inclusion). As each taxon is treated independently, you can include as many synonyms as you wish, or multiple proposed classifications.

The 'specimen' table can store specific information about the examined material, links to photographs of the specimen, the locality where it was collected, and so on.

There is also a character section! The 'character' table stores the name, the description, and perhaps a bibliography and drawings or photos of the character. With 'character equivalences' it is possible to store equivalent characters used in other studies, allowing cross-references between studies that have different sets of terminals. The 'character entry' table stores the specific information for a character and a taxon; it can be a cell of a morphological character matrix, or a specific fragment of a molecular sequence.

This design can help with the things where text-based databases fail. Ambiguity is reduced, because each entry is unique and specific: when uploading a file to the database, the specific nature of the entry must be identified. Text-based databases can be developed with a standard like that, but in this case it is possible to include many synonyms and specific character names. A curatorial process for the taxonomy becomes possible without destroying the integrity of the dataset, and as the phylogenetic results must be entered in the form of a classification, the gap between phylogeny and classification [1] will be reduced.

When running a search the user can retrieve only the specific information wanted: for example, all the head characters of Hymenoptera. Using the equivalences, the characters can be organized in a relatively adequate way, and using the classification table it is possible to find head characters used in different studies, characters that may be included if an alternative classification is used, or characters that may be present because they belong to more inclusive classifications (for example, characters used to define Hexapoda or Arthropoda). In this way one obtains a true information retrieval system based on classification [2].

As Nixon et al. [3] show, a single file format is not a good thing for a database; rather, a table structure is preferable, with report mechanisms that can produce entries in different formats, for example retrieving sequences in GenBank format, in TNT format, and in POY (fasta) format, or a distribution file ready to use with NDM.

The classification table can help locate studies that support or reject a particular classification; the user can find the evidence for one grouping and for the alternative classification. Algorithms can be developed and used to convert the structure of a tree into a search query--Page has posted on the subject fairly often ;)--. But this is more powerful than text-based searches, since the database itself holds the information (the hierarchical classification) needed to run the search.

I hope someday to become a millionaire (I doubt it :P) or to receive funding--hopefully ;)--to carry out this enormous task, or at least that someone on the web has similar ideas. Until then the only path seems to be continued suffering with the 'taxonomic' and 'phylogenetic' databases out there on the web...

PS. As you can see, this database would perhaps put more of a burden on the researcher who uploads the data... but after having been in the field for months, examining material for hours, day after day, and writing reports and manuscripts, uploading the information is just one part of the whole research!

[1] Franz, N.M. 2005. On the lack of good scientific reasons for the growing phylogeny/classification gap. Cladistics 21: 495-500.
[2] Farris, J.S. 1979. The information content of the phylogenetic system. Systematic zoology 28: 483-519.
[3] Nixon, K.C., Carpenter, J.M., Borgardt, S.J. 2001. Beyond NEXUS: universal cladistic data objects. Cladistics 17: S53-S59.

Thursday, December 20, 2007

Relational databases for phylogenetic data

A long time ago, when I was studying computer science, the only paradigm for databases was the relational database; then the fast growth of the internet, more powerful computers, and popular search engines--like Google--put text-based databases on top. Text-based databases are the paradigm used to build some databases for phylogenetic data, like GenBank and TreeBASE.

In text-based databases the emphasis is put on somewhat standardized input files that hold all the data needed to run a search. You surely know the files retrieved from GenBank, or the matrices/trees from TreeBASE: they have several fields, like the classification and the identification of the sequence or character. A string matching algorithm is used to find exact and similar matches to the user's query.

So the main line of research for a text-based database is implementing matching algorithms. Rod Page's blog iPhylo has several posts, and links to papers and manuscripts (here, here), on that subject. But I think that text-based databases are the wrong approach.

I want to remark on some points:
  • Ambiguity: even if the entry for uploads is based on a 'cut and paste' template (or a wizard), the uploader is left with the responsibility of filling it in adequately. As Page remarks, in many TreeBASE matrices the terminal names are not proper scientific names.
  • Irrelevant information: when you retrieve a file, it is usually full of information that you don't want, while other information is not provided. As searches are based on text matches, the notes, comments and other fields seem to serve only to confuse the search algorithms (remember, they are string matching algorithms!).
  • Formatting: because the database is text based, the compromise needed to make the files readable by several programs makes the entry fields rigid, and it is difficult to modify them or to include new fields/information. For example, in GenBank geographic information is not mandatory--as far as I know--even for phylogeographic datasets! Geographic and examined-material information for TreeBASE seems impossible to implement using the nexus format.
  • Taxonomy: as a consequence of the rigid format, alternative taxonomies and synonym searches cannot be implemented, or require searches on alternative databases.
  • Phylogeny: why not try a 'tree structure' search on TreeBASE?

String matching is a nice tool for searching for keywords around the net, or for some string structures, as in sequence matching in GenBank (I guess that this is the main reason to make it a text-based database!), but it seems to be problematic for taxonomy/phylogeny databases.

Relational databases are a different concept from text-based databases. They are based on several independent tables connected through key fields, in several cases using secondary tables to match keys from different tables. The search engine is based on specific keys (for complex searches) and specific fields from each table.

My own idea of the structure of the database, as a sketch based on my usual queries, is like this:

Of course, a proper database design will need several years of development, with tons of interviews with taxonomists, to yield a product that many people around the world could use in the right way! (Although he has several posts endorsing string matching databases, this post by Rod Page provides several nice ideas about the integrative work of a phylogenetic database; of course there are several things that I don't like xD).

Here I explain some of the tables. The 'taxon' table stores the current nomenclature of a taxon, be it a species or a supra-specific entity: its name and author; it may also include the diagnosis (linked to character entry!), a 'phylogenetic concept', links to pictures, the type specimen (with a link to specimen!), and things like that. The 'synonyms' and 'classification' tables are secondary tables that store only a relationship between two taxa: synonyms (which could include the reason for the synonymy), and the next more inclusive taxon of a particular taxon (which might include the author of this inclusion). As each taxon is independent, you could include as many synonyms as you know, or as many classifications as have been proposed.

The 'specimen' table could store specific information about the examined material, links to pictures of the specimen, the locality where the specimen was collected, and so on.

There is also a character section. The 'character' table stores the name, description, and maybe bibliography and pictures of the character. With 'character equivalences' it is possible to store equivalent characters used in other studies, allowing cross-references between studies with different taxon scopes. The 'character entry' table stores the specific information for a character and a taxon; it could be a single cell from a morphology matrix, or a specific sequence fragment.
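
The tables described so far could be sketched with Python's built-in sqlite3 module. This is only a rough illustration under my own assumptions: the table list follows the description above, but every column name, the spelling of the table names (for instance 'character_def' for the 'character' table), and the made-up taxon and author names are hypothetical, and a real design would need far more fields.

```python
import sqlite3

# Hypothetical schema: column names are assumptions, not a finished design.
SCHEMA = """
CREATE TABLE taxon (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, author TEXT);
CREATE TABLE synonym (            -- secondary table: taxon <-> taxon
    taxon_id INTEGER NOT NULL REFERENCES taxon(id),
    synonym_id INTEGER NOT NULL REFERENCES taxon(id),
    reason TEXT);
CREATE TABLE classification (     -- secondary table: taxon -> next inclusive taxon
    taxon_id INTEGER NOT NULL REFERENCES taxon(id),
    parent_id INTEGER NOT NULL REFERENCES taxon(id),
    proposed_by TEXT);
CREATE TABLE specimen (
    id INTEGER PRIMARY KEY,
    taxon_id INTEGER REFERENCES taxon(id),
    locality TEXT, is_type INTEGER DEFAULT 0);
CREATE TABLE character_def (      -- the 'character' table of the post
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT);
CREATE TABLE character_equivalence (
    character_id INTEGER NOT NULL REFERENCES character_def(id),
    equivalent_id INTEGER NOT NULL REFERENCES character_def(id));
CREATE TABLE character_entry (    -- one matrix cell or sequence fragment
    character_id INTEGER NOT NULL REFERENCES character_def(id),
    taxon_id INTEGER NOT NULL REFERENCES taxon(id),
    state TEXT NOT NULL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up names, for illustration only.
conn.executemany("INSERT INTO taxon (id, name, author) VALUES (?, ?, ?)", [
    (1, "Aus bus", "Author A"), (2, "Aus cus", "Author B"),
    (3, "Aus", "Author A"), (4, "Xus", "Author C"),
])
conn.execute("INSERT INTO synonym VALUES (1, 2, 'hypothetical reason')")
# Two competing classifications of the same species can coexist.
conn.executemany("INSERT INTO classification VALUES (?, ?, ?)", [
    (1, 3, "Author A"), (1, 4, "Author C"),
])


def synonyms_of(name):
    """List the names stored as synonyms of the given taxon name."""
    rows = conn.execute(
        "SELECT s.name FROM synonym AS y "
        "JOIN taxon AS t ON t.id = y.taxon_id "
        "JOIN taxon AS s ON s.id = y.synonym_id "
        "WHERE t.name = ? ORDER BY s.name", (name,))
    return [row[0] for row in rows]


def parents_of(name):
    """List every proposed next inclusive taxon, with who proposed it."""
    rows = conn.execute(
        "SELECT p.name, c.proposed_by FROM classification AS c "
        "JOIN taxon AS t ON t.id = c.taxon_id "
        "JOIN taxon AS p ON p.id = c.parent_id "
        "WHERE t.name = ? ORDER BY p.name", (name,))
    return rows.fetchall()


print(synonyms_of("Aus bus"))
print(parents_of("Aus bus"))
```

Because synonymy and classification live in their own secondary tables, adding another synonym or another proposed placement is just one more row, without touching the taxon record itself.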

This design could help with the queries/actions in which text-based databases fail. Ambiguity is reduced, as each entry is unique: the uploader has to identify the particular nature of each entry in a specific way. Text-based databases could be developed with that standard in mind, but here it is possible to keep strict species naming, as many synonyms as you want, and specific character names. A curation process for the taxonomy becomes possible without harming the rest of the data, and as phylogenetic results would be entered in the form of a classification, the gap between phylogeny and classification [1] would be reduced.

When you perform a search you could retrieve only the specific information that you want: for example, all head characters from Hymenoptera. Using the equivalences, the characters could be more or less organized, and using the classification table you could retrieve head characters used in many different studies, alternative characters that match under alternative classifications, or characters that could be present because they are scored for more inclusive groups (for example, a character used to define Hexapoda or Arthropoda): a true information retrieval system based on classification [2].
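
A search like the Hymenoptera example could walk the classification table with a recursive query. The sketch below is only one possible way to do it, using SQLite recursive common table expressions through Python's sqlite3 module. The cut-down tables, the 'body_region' column, the simplified classification (intermediate ranks are omitted) and the character names are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE classification (taxon_id INTEGER, parent_id INTEGER);
CREATE TABLE character_def (id INTEGER PRIMARY KEY, name TEXT, body_region TEXT);
CREATE TABLE character_entry (character_id INTEGER, taxon_id INTEGER, state TEXT);
""")
# Toy classification; intermediate ranks are left out on purpose.
conn.executemany("INSERT INTO taxon VALUES (?, ?)", [
    (1, "Arthropoda"), (2, "Hexapoda"), (3, "Hymenoptera"),
    (4, "Apidae"), (5, "Formicidae"), (6, "Diptera"),
])
conn.executemany("INSERT INTO classification VALUES (?, ?)", [
    (2, 1), (3, 2), (4, 3), (5, 3), (6, 2),
])
# Hypothetical characters and scorings.
conn.executemany("INSERT INTO character_def VALUES (?, ?, ?)", [
    (1, "head character A", "head"), (2, "head character B", "head"),
    (3, "wing character", "wing"), (4, "head character C", "head"),
])
conn.executemany("INSERT INTO character_entry VALUES (?, ?, ?)", [
    (1, 4, "0"), (2, 5, "1"), (3, 4, "1"), (4, 2, "0"), (1, 6, "1"),
])

# Characters scored for the named taxon or anything classified inside it.
WITHIN_GROUP = """
WITH RECURSIVE clade(id) AS (
    SELECT id FROM taxon WHERE name = ?
    UNION
    SELECT c.taxon_id FROM classification AS c JOIN clade ON c.parent_id = clade.id
)
SELECT DISTINCT d.name FROM character_entry AS e
JOIN clade ON e.taxon_id = clade.id
JOIN character_def AS d ON d.id = e.character_id
WHERE d.body_region = ? ORDER BY d.name
"""

# Characters scored for the named taxon or any more inclusive group.
INCLUSIVE_GROUPS = """
WITH RECURSIVE lineage(id) AS (
    SELECT id FROM taxon WHERE name = ?
    UNION
    SELECT c.parent_id FROM classification AS c JOIN lineage ON c.taxon_id = lineage.id
)
SELECT DISTINCT d.name FROM character_entry AS e
JOIN lineage ON e.taxon_id = lineage.id
JOIN character_def AS d ON d.id = e.character_id
WHERE d.body_region = ? ORDER BY d.name
"""


def characters(sql, taxon_name, region="head"):
    """Run one of the two queries and return the character names."""
    return [row[0] for row in conn.execute(sql, (taxon_name, region))]


print(characters(WITHIN_GROUP, "Hymenoptera"))
print(characters(INCLUSIVE_GROUPS, "Hymenoptera"))
```

The first query returns the head characters scored inside Hymenoptera; the second returns those scored for the more inclusive groups (here only the one scored for Hexapoda). The search uses only the hierarchy stored in the database, with no text matching on names inside files.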

As Nixon et al. [3] remark, a single file format is not a good thing for a database; a table structure is preferable, with report tools that can produce entries in different formats, for example retrieving sequences in GenBank format, in TNT format, and in POY (fasta) format, or a distribution file ready to use in NDM.

The classification table could help retrieve studies that support or reject a particular classification; you could find the evidence for one grouping and for the alternative classification. Particular algorithms need to be developed to translate a tree structure into a query--Page has posted about the subject frequently ;)--. But this is more powerful than a text-based search because the database itself holds the information (the hierarchical classification) needed to perform the search.

I hope that someday I become rich (I doubt it) or receive a grant--I hope ;)--to carry out this huge task, or at least that someone around the net has similar ideas. Until then the only path is continued suffering with the 'taxonomic' and 'phylogenetic' databases around the net...

PS. As you may note, a database like that puts some burden on the researcher who uploads the data... but if you go on field trips for months, examine material for hours, day after day, and write reports and manuscripts, then uploading the information is part of the whole research!
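
The idea of report tools can be sketched as a set of small writer functions that all take the same rows retrieved from the tables. This is a minimal illustration under my own assumptions: only a FASTA writer and a plain tab-separated writer are shown, the function and variable names are hypothetical, and writers for GenBank, TNT or NDM files would be further functions registered in the same way.

```python
from typing import Callable, Dict, List, Tuple

# One retrieved row: (taxon name, sequence). Hypothetical shape for the example.
Row = Tuple[str, str]


def to_fasta(rows: List[Row]) -> str:
    """Write rows as FASTA: a '>' header line followed by the sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in rows)


def to_tsv(rows: List[Row]) -> str:
    """Write rows as a plain tab-separated table."""
    return "".join(f"{name}\t{seq}\n" for name, seq in rows)


# Registry of report writers; new formats are added here, not in the tables.
WRITERS: Dict[str, Callable[[List[Row]], str]] = {
    "fasta": to_fasta,
    "tsv": to_tsv,
}


def report(rows: List[Row], fmt: str) -> str:
    """Render the same rows in the requested output format."""
    try:
        writer = WRITERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return writer(rows)


# Made-up sequences, for illustration only.
rows = [("Taxon_a", "ACGT"), ("Taxon_b", "ACGA")]
print(report(rows, "fasta"))
print(report(rows, "tsv"))
```

The stored data never depend on any one file format; supporting a new program means adding one writer function.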

[1] Franz, N.M. 2005. On the lack of good scientific reasons for the growing phylogeny/classification gap. Cladistics 21: 495-500.
[2] Farris, J.S. 1979. The information content of the phylogenetic system. Systematic zoology 28: 483-519.
[3] Nixon, K.C., Carpenter, J.M., Borgardt, S.J. 2001. Beyond NEXUS: universal cladistic data objects. Cladistics 17: S53-S59.

Friday, December 14, 2007

Very good news about TNT

I am not in the city right now, so I cannot write very long posts :'(... but I could not let this great news about TNT pass. The Willi Hennig Society has taken over the funding and support of TNT (a program by Pablo Goloboff, Steve Farris and Kevin Nixon), so from now on the program is free! You only have to meet some very simple conditions: personal use, and citing the program--and of course the funding by the WHS--when publishing results.

In case you do not know, TNT is the fastest program for phylogenetic analysis under parsimony; it implements many of the most recent and efficient search algorithms [1, 2], and an excellent and powerful macro/script language. If you do cladistic/phylogenetic analyses, this is the ideal program.

I adore the character editor! It is very easy to use, and much more intuitive than WinClada, NDE or Mesquite (yep! It is much better than Mesquite's!).

You can download it here: http://www.zmuc.dk/public/phylogeny/TNT/
read the license and enjoy it :)

When I return to Bogotá I will complete the citations ;)

[1] Nixon, K.C. 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407-414.
[2] Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428.

Really huge news about TNT

I'm out of the city, so no computer for long posts :'(... but I'm happy to give some flash news about TNT: the Willi Hennig Society has taken over the sponsorship of TNT (software by Pablo Goloboff, Steve Farris and Kevin Nixon), so from now on the program is free! There are some simple conditions: personal use, and a citation of the program--and of the sponsor, that is, the WHS ;)--in published results!

In case you don't know, TNT is the fastest program for phylogenetic analysis under parsimony; it implements several new and efficient heuristic algorithms [1, 2] and a powerful script/macro language. If you are doing cladistics/phylogenetics, you should surely dream of this program!

I love the matrix editor! It is easy to use, and more straightforward than WinClada, NDE or Mesquite (yep!... far better than Mesquite!).

You can download it at: http://www.zmuc.dk/public/phylogeny/TNT/
read the license agreement and enjoy :)

I will give the proper citations when I return to Bogotá :P

[1] Nixon, K.C. 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15: 407-414.
[2] Goloboff, P. A. 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15: 415-428.