9 results for Genome annotation
at Universidad Politécnica de Madrid
Abstract:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of two other major areas, namely Linguistics and Computer Science and Engineering, and it aims to develop computational models of human language (or natural language, as it is termed in this area). Possibly its best-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools still have some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and integration of several linguistic tools into an appropriate software architecture could most likely solve the limitation stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). In the latter case, however, all these tools should produce annotations for a common level, and those annotations would have to be combined in order to correct their respective errors and inaccuracies. Yet the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
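The idea of combining annotations produced for a common level can be illustrated with a minimal sketch. It assumes, purely for illustration, that several POS taggers have already been mapped onto one shared tagset and annotate exactly the same token sequence (which is precisely what limitation (3) makes difficult in practice). The function name `combine_annotations` and the tag labels are hypothetical, and simple majority voting is only one possible combination strategy, not necessarily the one adopted in this work.

```python
from collections import Counter


def combine_annotations(tagger_outputs):
    """Combine token-aligned tag sequences by majority vote.

    tagger_outputs: list of tag sequences, one per tagger, all of the
    same length and all expressed in the same (assumed) shared tagset.
    Returns one tag per token, or None where the vote is tied, so that
    the ambiguity stays visible instead of being resolved arbitrarily.
    """
    if not tagger_outputs:
        return []
    length = len(tagger_outputs[0])
    if any(len(seq) != length for seq in tagger_outputs):
        raise ValueError("all taggers must annotate the same tokens")

    combined = []
    for tags in zip(*tagger_outputs):  # tags proposed for one token
        counts = Counter(tags)
        best_count = max(counts.values())
        winners = [tag for tag, count in counts.items() if count == best_count]
        combined.append(winners[0] if len(winners) == 1 else None)
    return combined


if __name__ == "__main__":
    # Three hypothetical taggers annotating the tokens "the", "can", "rusts".
    outputs = [
        ["DET", "NOUN", "VERB"],
        ["DET", "VERB", "VERB"],
        ["DET", "NOUN", "NOUN"],
    ]
    print(combine_annotations(outputs))
```

In this sketch each tagger makes at most one error, and the vote recovers a consistent sequence. A tie yields `None`, leaving the unit to be disambiguated by other means.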
In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by another, higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should also help solve the aforementioned problems and limitations of linguistic annotation tools.
Thus, to summarise, the main aim of the present work was to somehow combine these separate approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Abstract:
We present two new algorithms which perform automatic parallelization via source-to-source transformations. The objective is to exploit goal-level, unrestricted independent and-parallelism. The proposed algorithms use as targets new parallel execution primitives which are simpler and more flexible than the well-known &/2 parallel operator. This makes it possible to generate better parallel expressions by exposing more potential parallelism among the literals of a clause than is possible with &/2. The difference between the two algorithms stems from whether the order of the solutions obtained is preserved or not. We also report on a preliminary evaluation of an implementation of our approach. We compare the performance obtained to that of previous annotation algorithms and show that relevant improvements can be obtained.
Abstract:
Antimicrobial peptides constitute an important factor in the defense of plants against pathogens, and bacterial resistance to these peptides has previously been shown to be an important virulence factor in Dickeya dadantii, the causal agent of soft-rot disease of vegetables. In order to understand the bacterial response to antimicrobial peptides, a transcriptional microarray analysis was performed upon treatment with a sub-lethal concentration of thionins, a widespread plant peptide. In all, 36 genes were found to be overexpressed, and were classified according to their deduced function as i) transcriptional regulators, ii) transport, and iii) modification of the bacterial membrane. One gene encoding a uricase was found to be repressed. The majority of these genes are known to be under the control of the PhoP/PhoQ system. Five genes representing the different functions induced were selected for further analysis. The results obtained indicate that the presence of antimicrobial peptides induces a complex response which includes peptide-specific elements and general stress-response elements contributing differentially to the virulence in different hosts.
Abstract:
Rhizobium leguminosarum (Rl) is an alpha-proteobacterium able to establish a diazotrophic symbiosis with different legumes. Despite the importance of this symbiosis in the global balance of the nitrogen cycle, very few rhizobial genomes have been sequenced that provide new knowledge about the genetic features contributing to important symbiotic processes. Only three complete Rl sequences have been published: Rl bv. viciae 3841 and two genomes of Rl bv. trifolii (WSM1325 and WSM2304), both clover symbionts. The genomic sequence of Rlv UPM791 has been determined by 454 sequencing. This genome has an approximate size of 7.8 Mb, organized into one chromosome and 5 extrachromosomal replicons, which include a 405 kb symbiotic plasmid. This new genome has been analyzed with regard to symbiotic and adaptive functions in comparison with the complete genomes of Rlv 3841 and Rl bv. trifolii WSM1325 and WSM2304. While plasmids pUPM791a and b are conserved, the symbiotic plasmid pUPM791c shows a very low degree of conservation compared with those described in the other Rl strains. One of the factors involved in the establishment of the symbiosis is the intercellular communication system known as Quorum Sensing (QS). Analysis of the Rlv UPM791 genome allowed the identification of two LuxRI-type systems mediated by N-acyl-homoserine lactone (AHL) signals. HPLC-MS analysis made it possible to associate the C6-HSL, C7-HSL and C8-HSL signals with the rhiRI system, encoded in the symbiotic plasmid, whereas the cinRI system, located in the chromosome, produces 3OH-C14:1-HSL. A third synthase (TraI) encoded in the symbiotic plasmid has been identified, but its corresponding regulator is truncated because of a frameshift.
In addition, three LuxR-orphan regulators lacking an associated LuxI synthase have been found. The potential effect of AHL-type signals has been studied by means of a quorum quenching strategy, which interferes with the QS systems of the bacterium. This strategy is based on the introduction of the aiiA gene from Bacillus subtilis, which constitutively expresses an AHL-degrading lactonase enzyme. To carry out the analysis under symbiotic conditions, a double-labelling identification system has been developed based on the markers gusA and celB, which encode a β-glucuronidase and a thermostable β-galactosidase, respectively. The results obtained indicate that Rlv UPM791 predominates over strain Rlv 3841 in nodule formation on pea plants. The low stability of the plasmid encoding aiiA did not allow a definitive conclusion to be drawn about the effect of the AiiA lactonase on competitiveness. In order to analyze the significance and regulation of the production of AHL-type signal molecules, mutants defective in each of the two QS systems have been generated. A detailed analysis of AHL production, biofilm formation and symbiosis with pea, vetch and lentil plants has been carried out. The effect of the deletions of the rhiI and rhiR genes in Rlv UPM791 is more drastic in the absence of plasmid pUPM791d. Mutations in cinI or cinRIS show, respectively, either an absence of signals or production exclusively of the low-molecular-weight signals produced by the rhiRI system. These mutations had an important effect on symbiosis. The rhiRI system is needed for normal symbiotic behaviour. Furthermore, cinRIS mutants generated white, inefficient nodules, whereas the cinI mutant was unable to produce nodules on any of the legumes used.
This mutation resulted in the destabilization of the symbiotic plasmid through a cinI-dependent mechanism that has not been clarified. Overall, the results obtained indicate the existence of a QS-dependent regulation model significantly different from those previously described in other R. leguminosarum strains, in which no relevant symbiotic phenotype had been observed. The regulation of AHL production in Rlv UPM791 is a complex process involving genes located on plasmids UPM791c and UPM791d, in addition to the 3-OH-C14:1-HSL signal. Finally, an RND-type transporter, homologous to mexAB-oprM of P. aeruginosa and involved in the extrusion of long-chain AHLs, has been identified. Mutation of this transporter had no appreciable effect on symbiosis.
ABSTRACT Rhizobium leguminosarum (Rl) is a soil alpha-proteobacterium that establishes a diazotrophic symbiosis with different legumes. Despite the importance of this symbiosis to the global nitrogen cycling balance, very few rhizobial genomes have been sequenced so far which provide new insights into the genetic features contributing to symbiotically relevant processes. Only three complete sequences of Rl strains have been published: Rl bv. viciae 3841, harboring six plasmids (7.75 Mb), and two Rl bv. trifolii strains (WSM1325 and WSM2304), both clover symbionts, harboring 5 and 4 plasmids, respectively (7.41 and 6.87 Mb). The genomic sequence of Rlv UPM791 was determined by means of 454 sequencing. Illumina and Sanger reads were used to improve the assembly, leading to 17 final contigs. This genome has an estimated size of 7.8 Mb organized in one chromosome and five extrachromosomal replicons, including a 405 kb symbiotic plasmid. Four of these plasmids are already closed, whereas there are still gaps in the smallest one (pUPM791d) due to the presence of insertion elements and repeated sequences, which hinder the assembly.
The annotation has been carried out using the Manatee pipeline. This new genome has been analyzed with regard to symbiotic and adaptive functions in comparison with the Rlv 3841 complete genome and with those of Rl bv. trifolii strains WSM1325 and WSM2304. While plasmids pUPM791a and b are conserved, the symbiotic plasmid pUPM791c exhibited the lowest degree of conservation as compared to those from the other Rl strains. One of the factors involved in the symbiotic process is the intercellular communication system known as Quorum Sensing (QS). This mechanism allows bacteria to carry out diverse biological processes in a coordinated way through the production and detection of extracellular signals that regulate the transcription of different target genes. Analysis of the Rlv UPM791 genome allowed the identification of two LuxRI-like systems mediated by N-acyl-homoserine lactones (AHLs). HPLC-MS analysis allowed the assignment of the C6-HSL, C7-HSL and C8-HSL signals to the rhiRI system, encoded in the symbiotic plasmid, whereas the cinRI system, located in the chromosome, produces 3OH-C14:1-HSL, previously described as “bacteriocin small”. A third synthase (TraI) is also encoded in the symbiotic plasmid, but its cognate regulator TraR is not functional due to a frameshift mutation. Three additional LuxR orphans with no associated LuxI-type synthase were also found. The potential effect of AHLs has been studied by means of a quorum quenching approach to interfere with the QS systems of the bacteria. This approach is based on introducing into strains Rl UPM791 and Rl 3841 the Bacillus subtilis gene aiiA, which constitutively expresses an AHL-degrading lactonase; this led to a virtual absence of AHLs even when AiiA-expressing cells were only a fraction of the total population. No significant effect of AiiA-mediated AHL removal on competitiveness for growth on solid surfaces was observed.
For analysis under symbiotic conditions we have set up a two-label system to identify nodules produced by two different strains in pea roots, based on the markers gusA and celB, encoding a β-glucuronidase and a thermostable β-galactosidase, respectively. The results obtained show that Rlv UPM791 outcompetes Rlv 3841 for nodule formation in pea plants, and that the presence of the AiiA plasmid does not significantly affect the relative competitiveness of the two Rlv strains. However, the low stability of the pME6863 plasmid, encoding aiiA, did not allow a clear conclusion about the effect of the AiiA lactonase on competitiveness. In order to further analyze the significance and regulation of the production of AHL signal molecules, mutants deficient in each of the two QS systems were constructed. A detailed analysis of the effect of these mutations on AHL production, biofilm formation and symbiosis with pea, vetch and lentil plants has been carried out. The effect of deletions of the Rlv UPM791 rhiI and rhiR genes is more pronounced in the absence of plasmid pUPM791d, as no signal is detected in UPM791.1, which lacks this plasmid. Mutations in cinI or cinRIS show either no signals, or only the small ones produced by the rhiRI system, suggesting that cinR might be regulating the rhiRI system. These mutations had a strong effect on symbiosis. Analysis of rhi mutants revealed that the rhiRI system is required for normal symbiotic performance, as a drastic reduction of symbiotic fitness is observed when rhiI is deleted, and rhiR is essential for nitrogen fixation in the absence of plasmid pUPM791d. Furthermore, cinRIS mutants resulted in white and inefficient nodules, whereas the cinI mutant was unable to form nodules on any legume tested. The latter mutation is associated with the destabilization of the symbiotic plasmid through a mechanism that has not yet been elucidated.
Overall, the results obtained indicate the existence of a model of QS-dependent regulation significantly different from that previously described in other R. leguminosarum strains, where no relevant symbiotic phenotype had been observed. The regulation of AHL production in Rlv UPM791 is a complex process involving the symbiotic plasmid (pUPM791c) and the smallest plasmid (pUPM791d), with a key role for the 3-OH-C14:1-HSL signal. Finally, we searched for potential AHL transporters in the Rlv UPM791 genome. These signals diffuse freely across membranes, but in the case of the long-chain AHLs an active efflux system might be required, as has been described for C12-HSL in Pseudomonas aeruginosa. We have identified a putative AHL transporter of the RND family homologous to P. aeruginosa mexAB-oprM. A mutant strain deficient in this transporter has been generated, and TLC analysis shows absence of 3OH-C14:1-HSL in its supernatant. This deficiency was complemented by the reintroduction of an intact copy of the genes via plasmid transfer. The mutation in the mexAB genes had no significant effects on the symbiotic performance of R. leguminosarum bv. viciae.
Abstract:
Currently, the Web provides an immense set of services (WS-*, RESTful, OGC WFS), which are normally exposed through different standards that allow these services to be located and invoked. These services are generally described using textual information, without a formal description; that is, the description of the services is purely syntactic. To facilitate the use and understanding of these services, they need to be annotated formally through the description of metadata. The objective of this thesis is to propose an approach for the semantic annotation of Web services in the geospatial domain. This approach makes it possible to automate some of the stages of the annotation process through the combined use of ontological resources and external services. This process has been evaluated satisfactorily with a set of services in the geospatial domain. The main contribution of this work is the partial automation of the semantic annotation process for RESTful and WFS services, which improves the state of the art in this area. A detailed list of the contributions is: • A model for representing Web services from the syntactic and semantic points of view, taking into account the schema and the instances. • A method for annotating Web services using ontologies and external resources. • A system that implements the proposed annotation process. • A test bed for the semantic annotation of RESTful and OGC WFS services.
Abstract The Web contains an immense collection of Web services (WS-*, RESTful, OGC WFS), normally exposed through standards that tell us how to locate and invoke them. These services are usually described using mostly textual information and without proper formal descriptions, that is, existing service descriptions mostly stay on a syntactic level.
If we want to make such services potentially easier to understand and use, we may want to annotate them formally, by means of descriptive metadata. The objective of this thesis is to propose an approach for the semantic annotation of services in the geospatial domain. Our approach automates some stages of the annotation process by using a combination of third-party resources and services. It has been successfully evaluated with a set of geospatial services. The main contribution of this work is the partial automation of the semantic annotation process for RESTful and WFS services, which improves the current state of the art in this area. A more detailed list of contributions is: • A model for representing Web services. • A method for annotating Web services using ontological and external resources. • A system that implements the proposed annotation process. • A gold standard for the semantic annotation of RESTful and OGC WFS services, and algorithms for evaluating the annotations.
Abstract:
This paper describes a framework for annotation on travel blogs based on subjectivity (FATS). The framework can automatically annotate, sentence by sentence, sections from blogs (posts) about travelling in the Spanish language. FATS is used in this experiment to annotate components from travel blogs in order to create a corpus of 300 annotated posts. Each subjective element in a sentence is annotated as positive or negative as appropriate. Currently, correct annotations add up to about 95 per cent in our subset of the travel domain. By means of an iterative process of annotation we can create a subjectively annotated, domain-specific corpus.
Abstract:
The use of microsatellite markers in large-scale genetic studies is limited by their low throughput and high cost and labor requirements. Here, we provide a panel of 45 multiplex PCRs for fast and cost-efficient genome-wide fluorescence-based microsatellite analysis in grapevine. The developed multiplex PCR panel (with up to 15-plex) enables the scoring of 270 loci covering the whole grapevine genome (9 to 20 loci/chromosome) using only 45 PCRs and sequencer runs. The 45 multiplex PCRs were validated using a diverse grapevine collection of 207 accessions, selected to represent most of the cultivated Vitis vinifera genetic diversity. Particular attention was paid to quality control throughout the whole process (assay replication, null allele detection, ease of scoring). Genetic diversity summary statistics and features of electrophoretic profiles for each studied marker are provided, as are the genotypes of 25 common cultivars that could be used as references in other studies.
Abstract:
Phaseolus vulgaris L. (common bean, known in Spanish as frijol común or judía) is a legume in great demand for human nutrition and a very important agricultural product. However, bean production is limited by environmental pressures such as drought. In Mexico, 85% of the bean harvest is produced in the spring-summer season, mainly in the semiarid highland regions with an annual rainfall between 250 and 400 mm. Despite the implementation of technology in the field, natural factors prevent farmers from reaching the desired yields. The Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias (INIFAP), as a governmental research institute in Mexico, aims to improve strategic crops, one of them being P. vulgaris. Drought-related studies focus especially on the selection of tolerant genotypes, which are subjected to stress conditions while parameters such as yield and seed weight are monitored, along with some indicators such as harvest index. These efforts have resulted in varieties with greater drought tolerance, such as Pinto Villa and Pinto Saltillo. In recent years, notable progress has been made in understanding the molecular basis of plant responses to stress. Various studies have shown that plants under drought stress undergo changes in the expression of genes involved in signalling, regulation of transcription and translation, water transport and direct functions in cell protection. It has also been observed that water deficit is caused by extreme temperatures and high salt concentrations, so that at the molecular level stress responses have points of specificity and points of crosstalk. Drought can generate secondary stresses, such as nutritional, oxidative and osmotic stress.
However, many of the components involved in responses to water deficit still need to be identified and characterized; the characterization of these genes will allow a better understanding of the biochemical and physiological mechanisms involved in stress tolerance. Currently, with the support of molecular biology, some genes that confer advantages for adaptation to unfavourable environments have been identified. The objective of the present work is therefore to identify genetic markers associated with phenotypic traits, with emphasis on tolerance to water stress in P. vulgaris. Once the markers associated with water stress have been established, it is feasible to consider their use for marker-assisted selection in bean lines or varieties of interest to breeders. A total of 282 F3:5 families derived from the cross between the cultivars Pinto Villa and Pinto Saltillo were evaluated. The families were sown under a 17x17 simple lattice design; the experiment was carried out in the spring-summer cycles of 2010 and 2011 and the autumn-winter cycle of 2010 at the Campo Experimental Bajío of INIFAP, with two replicates for each moisture treatment (full irrigation and terminal drought). All genotypes were phenotyped (phenotypic variables) and genotyped using molecular markers. The statistical analyses were based on principal component analysis (Eigen Analysis Selection Index Method, ESIM), association between SNP markers and phenotype (SNPassoc package for R) and analysis of variance (ANOVA). The ESIM values showed that the variables Yield, Days to flowering, Days to physiological maturity and Harvest index were outstanding under terminal drought, so it is suggested that they be taken into consideration in drought studies in P. vulgaris as indicators for evaluating resistance.
Nine families were identified that stood out for their ESIM values (PV/PS6, 22, 131, 137, 149, 154, 201, 236 and 273), in addition to showing higher yield values than the parents. These genotypes are interesting candidates for studies to identify loci associated with the stress response, and as potential parents in the development of new bean varieties. In the SNPassoc association analyses, 83 significant SNPs (p<0.0003) associated with the phenotypic traits were identified, giving a total of 222 associations, among which the codominant genetic model predominates for the variables Days to flowering, Reproductive period and Total biomass. Thirty-seven SNPs were assigned to different biological functions through functional annotation analysis, of which 12 SNPs (9, 18, 28, 39, 61, 69, 80, 106, 115, 128, 136 and 142) stand out for their association with the phenotype, and their functional annotation indicates that they lie in genes related to drought tolerance, such as kinase activity, starch, carbohydrate and proline metabolic activity, response to oxidative stress, as well as LEA genes and possible transcription factors. In the ANOVA analyses, 72 associations between the SNPs and the phenotypic variables were identified (F<3.94E-04). The 72 associations correspond to 30 SNPs and 7 phenotypic variables, among which Weight of 100 seeds and Reproductive period predominate. For the traits Yield, Harvest index and Days to physiological maturity, associations were found with six SNPs (17, 34, 37, 50, 93 and 107), of which SNP37 and SNP107 were assigned the biological annotation protein binding. In addition, SNP106 and SNP128, associated with Reproductive period, lie in genes with kinase activity and starch metabolic activity, respectively. For the AFLP-type markers, 271 associations were identified (F<2.34E-04).
The associations correspond to 86 AFLPs with all the phenotypic variables evaluated, among which Weight of 100 seeds, Days to flowering and Reproductive period predominate. Because the biological annotation of AFLPs cannot be determined, they are proposed as potential markers related to drought resistance in bean. The candidate AFLPs require further studies, such as sequencing of the respective alleles, identification of these sequences in the reference genome and their biological annotation, among other analyses; in this way, the candidate markers for validation for assisted selection could be established. The present work proposes both genotypes and genetic markers, which must be validated before being used in the P. vulgaris breeding programme, with the aim of developing new drought-tolerant lines or varieties.
ABSTRACT Phaseolus vulgaris L. (common bean or judia) is a legume in great demand for human consumption and an important agricultural product. However, common bean production is limited by environmental stresses, such as drought. In Mexico, 85% of the common bean crop is produced in the spring-summer season, mainly in semiarid highland regions with a rainfall between 250 and 400 mm per year. In spite of improvements in crop technology, natural factors hamper the achievement of optimal yields. The National Institute for Forestry, Agriculture and Livestock (INIFAP) is a government research institute in Mexico whose main objective is the genetic improvement of strategic crops, such as P. vulgaris L. Drought-tolerance studies focus particularly on the selection of tolerant bean genotypes, which are subjected to stress conditions while parameters such as yield and seed weight, plus some agronomic indicators such as harvest index, are monitored.
These efforts have yielded cultivars with higher drought tolerance, such as Pinto Villa and Pinto Saltillo. Significant progress has recently been made in understanding the molecular basis of plant stress responses. Several studies have shown that plants under drought stress present changes in the expression of genes related to cell signalling, transcriptional and translational regulation, water transport and cell protection. In addition, it has been observed that extreme temperatures and high salt concentrations can cause a water deficiency so that, at the molecular level, stress responses have specific and crossover points. Drought can cause secondary stresses, such as nutritional, oxidative and osmotic stress. More of the components involved in the response to water deficit need to be identified; the characterization of these genes will allow a better understanding of the biochemical and physiological mechanisms involved in stress tolerance. Currently, with the support of molecular biology techniques, some genes that confer an advantage for crop adaptation to unfavourable environments have been identified. The objective of this study is to identify genetic markers associated with phenotypic traits, with emphasis on water stress tolerance in P. vulgaris. The establishment of molecular markers linked to drought tolerance would make possible their use for marker-assisted selection in bean breeding programs. Two hundred and eighty-two F3:5 families derived from a cross between the drought-resistant cultivars Pinto Villa and Pinto Saltillo were evaluated. The families were sown under a 17x17 simple lattice design. The experiment was conducted in the spring-summer seasons of 2010 and 2011 and the autumn-winter season of 2010 at the Bajio Experimental Station of INIFAP with two treatments (full irrigation and terminal drought). All families were phenotyped and genotyped using molecular markers.
Statistical analysis was based on principal component analysis (Eigen Analysis Selection Index Method, ESIM), association analysis between SNP markers and phenotype (SNPassoc package for R) and analysis of variance (ANOVA). The ESIM values showed that seed yield, days to flowering, days to physiological maturity and harvest index were outstanding traits in the terminal drought treatment, so they could be considered suitable parameters for drought-tolerance evaluation in P. vulgaris. Nine families with outstanding ESIM values were identified (PV/PS6, 22, 131, 137, 149, 154, 201, 236 and 273); in addition, these families showed higher seed yield than the parental cultivars. These families are promising candidates for studies focused on the identification of loci associated with the stress response, and as potential parental cultivars for the development of new varieties of common bean. In the SNPassoc analysis, 83 SNPs were found to be significantly associated (p<0.0003) with phenotypic traits, giving a total of 222 associations, most of which involved the traits days to flowering, reproductive period and total biomass under a codominant genetic model. The functional annotation analysis showed 37 SNPs with different biological functions, 12 of which (9, 18, 28, 39, 61, 69, 80, 106, 115, 128, 136 and 142) stand out for their association with phenotype. The functional annotation suggested a connection with genes related to drought tolerance, such as kinase activity, starch, carbohydrate and proline metabolic processes, responses to oxidative stress, as well as LEA genes and putative transcription factors. In the ANOVA analysis, 72 associations between SNPs and phenotypic traits (F<3.94E-04) were identified. These associations corresponded to 30 SNP markers and seven phenotypic traits. Weight of 100 seeds and reproductive period were the traits with the most associations.
Seed yield, harvest index and days to physiological maturity were associated with six SNPs (17, 34, 37, 50, 93 and 107); SNP37 and SNP107 were located in protein binding genes. SNP106 and SNP128 were associated with the reproductive period and belonged to genes with kinase activity and genes related to the starch metabolic process, respectively. In the case of AFLP markers, 271 associations (F<2.34E-04) were identified. The associations involved 86 AFLPs and all phenotypic traits, the most frequently associated being weight of 100 seeds, days to flowering and reproductive period. Even though it is not possible to perform a functional annotation for AFLP markers, they are proposed as potential markers related to drought resistance in common bean. AFLP candidates require additional studies, such as sequencing of the respective alleles, identification of these sequences in the reference genome and gene annotation, before their use in marker-assisted selection. Although it requires further validation, this work proposes both genotypes and genetic markers that could be used in breeding programs of P. vulgaris in order to develop new lines or cultivars with enhanced drought tolerance.
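The marker-trait ANOVA described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline (the study reports using the SNPassoc package for R). It only shows, in plain Python, how a one-way ANOVA F statistic might be computed for one trait grouped by the genotype classes of a single marker, treating each class as its own group in the spirit of a codominant model. The function name, the genotype labels and the seed-weight values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def one_way_anova_f(genotypes, phenotypes):
    """Return (F, df_between, df_within) for a trait grouped by genotype class.

    Hypothetical helper for illustration; each distinct genotype label
    (e.g. "AA", "AB", "BB") is treated as a separate group.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)

    n = len(phenotypes)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two genotype classes and n > k")

    grand_mean = sum(phenotypes) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}

    # Variation explained by genotype class (between groups).
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2
                     for g, v in groups.items())
    # Residual variation inside each genotype class (within groups).
    ss_within = sum((y - means[g]) ** 2
                    for g, v in groups.items() for y in v)

    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up data: weight of 100 seeds (g) for nine families at one marker.
genotypes = ["AA", "AA", "AA", "AB", "AB", "AB", "BB", "BB", "BB"]
seed_weight = [30.0, 31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0]

f_stat, df_b, df_w = one_way_anova_f(genotypes, seed_weight)
print(f_stat, df_b, df_w)
```

To compare against a significance threshold such as those quoted in the abstract, the F statistic would still have to be converted to a p-value with the F distribution (for example with `scipy.stats.f.sf`, if SciPy is available) and the test repeated for every marker-trait pair.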
Resumo:
Nowadays, a large amount of genomic and transcriptomic data is available for forest species, including ambitious projects aiming at the complete sequencing and annotation of different gymnosperm genomes [1, 2]. Pinus canariensis is a conifer endemic to the Canary Islands with re-sprouting capability and resilience against fire and mechanical damage, as a result of its adaptation to volcanic environments. Additionally, this species has a high proportion of axial parenchyma compared with other conifers, and this tissue connects with the radial parenchyma, allowing the transport of reserves. The innermost tracheids stop accumulating water [3] and become filled with resins and polyphenols synthesized by the axial parenchyma; this is the so-called "torch-heartwood" [4], which resists decay. This wood commands very high prices due to its particular resistance to rot. These features make P. canariensis an interesting model species for the analysis of these developmental processes in conifers. In this study we aim to perform a complete transcriptome annotation during xylogenesis in Pinus canariensis, using next-generation sequencing (NGS; Roche 454 pyrosequencing), in order to provide a genomic resource for further analysis, including expression profiling and the identification of candidate genes for important adaptive features.