948 resultados para local sequence alignment problem


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The number of sequences generated by genome projects has increased exponentially, but gene characterization has not followed at the same rate. Sequencing and analysis of full-length cDNAs is an important step in gene characterization that has been used nowadays by several research groups. In this work, we have selected Schistosoma mansoni clones for full-length sequencing, using an algorithm that investigates the presence of the initial methionine in the parasite sequence based on the positions of alignment start between two sequences. BLAST searches to produce such alignments have been performed using parasite expressed sequence tags produced by Minas Gerais Genome Network against sequences from the database Eukaryotic Cluster of Orthologous Groups (KOG). This procedure has allowed the selection of clones representing 398 proteins which have not been deposited as S. mansoni complete CDS in any public database. Dedicated sequencing of 96 of such clones with reads from both 5' and 3' ends has been performed. These reads have been assembled using PHRAP, resulting in the production of 33 full-length sequences that represent novel S. mansoni proteins. These results shall contribute to construct a more complete view of the biology of this important parasite.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

When underwater vehicles perform navigation close to the ocean floor, computer vision techniques can be applied to obtain quite accurate motion estimates. The most crucial step in the vision-based estimation of the vehicle motion consists on detecting matchings between image pairs. Here we propose the extensive use of texture analysis as a tool to ameliorate the correspondence problem in underwater images. Once a robust set of correspondences has been found, the three-dimensional motion of the vehicle can be computed with respect to the bed of the sea. Finally, motion estimates allow the construction of a map that could aid to the navigation of the robot

Relevância:

30.00% 30.00%

Publicador:

Resumo:

DNA sequence variation has been associated with quantitative changes in molecular phenotypes such as gene expression, but its impact on chromatin states is poorly characterized. To understand the interplay between chromatin and genetic control of gene regulation, we quantified allelic variability in transcription factor binding, histone modifications, and gene expression within humans. We found abundant allelic specificity in chromatin and extensive local, short-range, and long-range allelic coordination among the studied molecular phenotypes. We observed genetic influence on most of these phenotypes, with histone modifications exhibiting strong context-dependent behavior. Our results implicate transcription factors as primary mediators of sequence-specific regulation of gene expression programs, with histone modifications frequently reflecting the primary regulatory event.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The statistical analysis of literary style is the part of stylometry that compares measurable characteristicsin a text that are rarely controlled by the author, with those in other texts. When thegoal is to settle authorship questions, these characteristics should relate to the author’s style andnot to the genre, epoch or editor, and they should be such that their variation between authors islarger than the variation within comparable texts from the same author.For an overview of the literature on stylometry and some of the techniques involved, see for exampleMosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) orLebart, Salem and Berry (1998).Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be“the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writterslike Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translatedseveral times into Spanish, Italian and French, with modern English translations by Rosenthal(1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465,but it was not printed until 1490.There is an intense and long lasting debate around its authorship sprouting from its first edition,where its introduction states that the whole book is the work of Martorell (1413?-1468), while atthe end it is stated that the last one fourth of the book is by Galba (?-1490), after the death ofMartorell. Some of the authors that support the theory of single authorship are Riquer (1990),Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer(1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).Neither of the two candidate authors left any text comparable to the one under study, and thereforediscriminant analysis can not be used to help classify chapters by author. By using sample textsencompassing about ten percent of the book, and looking at word length and at the use of 44conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that mightindicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba andGinebra (2000) estimates that stylistic boundary to be near chapter 383.Following the lead of the extensive literature, this paper looks into word length, the use of the mostfrequent words and into the use of vowels in each chapter of the book. Given that the featuresselected are categorical, that leads to three contingency tables of ordered rows and therefore tothree sequences of multinomial observations.Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3describes the problem of the estimation of a suden change-point in those sequences, in the followingsections we propose various ways to estimate change-points in multinomial sequences; the methodin section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma modelsonto the sequence of Chi-square distances between each row profiles and the average profile, theone in Section 6 fits models onto the sequence of values taken by the first component of thecorrespondence analysis as well as onto sequences of other summary measures like the averageword length. In Section 7 we fit models onto the marginal binomial sequences to identify thefeatures that distinguish the chapters before and after that boundary. Most methods rely heavilyon the use of generalized linear models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The global emergence and spread of malaria parasites resistant to antimalarial drugs is the major problem in malaria control. The genetic basis of the parasite's resistance to the antimalarial drug chloroquine (CQ) is well-documented, allowing for the analysis of field isolates of malaria parasites to address evolutionary questions concerning the origin and spread of CQ-resistance. Here, we present DNA sequence analyses of both the second exon of the Plasmodium falciparum CQ-resistance transporter (pfcrt) gene and the 5' end of the P. falciparum multidrug-resistance 1 (pfmdr-1) gene in 40 P. falciparum field isolates collected from eight different localities of Odisha, India. First, we genotyped the samples for the pfcrt K76T and pfmdr-1 N86Y mutations in these two genes, which are the mutations primarily implicated in CQ-resistance. We further analyzed amino acid changes in codons 72-76 of the pfcrt haplotypes. Interestingly, both the K76T and N86Y mutations were found to co-exist in 32 out of the total 40 isolates, which were of either the CVIET or SVMNT haplotype, while the remaining eight isolates were of the CVMNK haplotype. In total, eight nonsynonymous single nucleotide polymorphisms (SNPs) were observed, six in the pfcrt gene and two in the pfmdr-1 gene. One poorly studied SNP in the pfcrt gene (A97T) was found at a high frequency in many P. falciparum samples. Using population genetics to analyze these two gene fragments, we revealed comparatively higher nucleotide diversity in the pfcrt gene than in the pfmdr-1 gene. Furthermore, linkage disequilibrium was found to be tight between closely spaced SNPs of the pfcrt gene. Finally, both the pfcrt and the pfmdr-1 genes were found to evolve under the standard neutral model of molecular evolution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND: The aromatase inhibitor letrozole, as compared with tamoxifen, improves disease-free survival among postmenopausal women with receptor-positive early breast cancer. It is unknown whether sequential treatment with tamoxifen and letrozole is superior to letrozole therapy alone. METHODS: In this randomized, phase 3, double-blind trial of the treatment of hormone-receptor-positive breast cancer in postmenopausal women, we randomly assigned women to receive 5 years of tamoxifen monotherapy, 5 years of letrozole monotherapy, or 2 years of treatment with one agent followed by 3 years of treatment with the other. We compared the sequential treatments with letrozole monotherapy among 6182 women and also report a protocol-specified updated analysis of letrozole versus tamoxifen monotherapy in 4922 women. RESULTS: At a median follow-up of 71 months after randomization, disease-free survival was not significantly improved with either sequential treatment as compared with letrozole alone (hazard ratio for tamoxifen followed by letrozole, 1.05; 99% confidence interval [CI], 0.84 to 1.32; hazard ratio for letrozole followed by tamoxifen, 0.96; 99% CI, 0.76 to 1.21). There were more early relapses among women who were assigned to tamoxifen followed by letrozole than among those who were assigned to letrozole alone. The updated analysis of monotherapy showed that there was a nonsignificant difference in overall survival between women assigned to treatment with letrozole and those assigned to treatment with tamoxifen (hazard ratio for letrozole, 0.87; 95% CI, 0.75 to 1.02; P=0.08). The rate of adverse events was as expected on the basis of previous reports of letrozole and tamoxifen therapy. CONCLUSIONS: Among postmenopausal women with endocrine-responsive breast cancer, sequential treatment with letrozole and tamoxifen, as compared with letrozole monotherapy, did not improve disease-free survival. The difference in overall survival with letrozole monotherapy and tamoxifen monotherapy was not statistically significant. (ClinicalTrials.gov number, NCT00004205.)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The high occurrence of nosocomial multidrug-resistant (MDR) microorganisms is considered a global health problem. Here, we report the draft genome sequence of a MDR Pseudomonas aeruginosa strain isolated in Brazil that belongs to the endemic clone ST277. The genome encodes important resistance determinant genes and consists of 6.7 Mb with a G+C content of 66.86% and 6,347 predicted coding regions including 60 RNAs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Local adaptation provides an opportunity to study the genetic basis of adaptation and investigate the allelic architecture of adaptive genes. We study delay of germination 1 (DOG1), a gene controlling natural variation in seed dormancy in Arabidopsis thaliana and investigate evolution of dormancy in 41 populations distributed in four regions separated by natural barriers. Using F(ST) and Q(ST) comparisons, we compare variation at DOG1 with neutral markers and quantitative variation in seed dormancy. Patterns of genetic differentiation among populations suggest that the gene DOG1 contributes to local adaptation. Although Q(ST) for seed dormancy is not different from F(ST) for neutral markers, a correlation with variation in summer precipitation supports that seed dormancy is adaptive. We characterize dormancy variation in several F(2) -populations and show that a series of functionally distinct alleles segregate at the DOG1 locus. Theoretical models have shown that the number and effect of alleles segregatin at quantitative trait loci (QTL) have important consequences for adaptation. Our results provide support to models postulating a large number of alleles at quantitative trait loci involved in adaptation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper focus on the problem of locating single-phase faults in mixed distribution electric systems, with overhead lines and underground cables, using voltage and current measurements at the sending-end and sequence model of the network. Since calculating series impedance for underground cables is not as simple as in the case of overhead lines, the paper proposes a methodology to obtain an estimation of zero-sequence impedance of underground cables starting from previous single-faults occurred in the system, in which an electric arc occurred at the fault location. For this reason, the signal is previously pretreated to eliminate its peaks voltage and the analysis can be done working with a signal as close as a sinus wave as possible

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Methicillin-resistant Staphylococcus aureus (MRSA) is a major cause of nosocomial infections worldwide. To differentiate reliably among S. aureus isolates, we recently developed double locus sequence typing (DLST) based on the analysis of partial sequences of clfB and spa genes. In the present study, we evaluated the usefulness of DLST for epidemiological investigations of MRSA by routinely typing 1242 strains isolated in Western Switzerland. Additionally, particular local and international collections were typed by pulsed field gel electrophoresis (PFGE) and DLST to check the compatibility of DLST with the results obtained by PFGE, and for international comparisons. Using DLST, we identified the major MRSA clones of Western Switzerland, and demonstrated the close relationship between local and international clones. The congruence of 88% between the major PFGE and DLST clones indicated that our results obtained by DLST were compatible with earlier results obtained by PFGE. DLST could thus easily be incorporated in a routine surveillance procedure. In addition, the unambiguous definition of DLST types makes this method more suitable than PFGE for long-term epidemiological surveillance. Finally, the comparison of the results obtained by DLST, multilocus sequence typing, PFGE, Staphylococcal cassette chromosome mec typing and the detection of Panton-Valentine leukocidin genes indicated that no typing scheme should be used on its own. It is only the combination of data from different methods that gives the best chance of describing precisely the epidemiology and phylogeny of MRSA.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Recent advances on high-throughput technologies have produced a vast amount of protein sequences, while the number of high-resolution structures has seen a limited increase. This has impelled the production of many strategies to built protein structures from its sequence, generating a considerable amount of alternative models. The selection of the closest model to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials and combination of both.Results: Here, we present and demonstrate a theory to split the knowledge-based potentials in scoring terms biologically meaningful and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Besides, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors.Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-art methods and have been successful to identify wrongly modeled regions of many near-native conformations.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background: Single Nucleotide Polymorphisms, among other type of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation is found at databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in biomedical literature. The identification of the relevant documents and the extraction of the information from them are hampered by the large size of literature databases and the lack of widely accepted standard notation for biomedical entities. Thus, automatic systems for the identification of citations of allelic variants of genes in biomedical texts are required. Results: Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes http://ibi.imim.es/osirisform.html. Here we describe the development of a new version of OSIRIS (OSIRISv1.2, http://ibi.imim.es/OSIRISv1.2.html webcite) which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB: a database that collects data on human gene sequence variations. The new entity recognition module is based on a pattern-based search algorithm for the identification of variation terms in the texts and their mapping to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, resulting in 99% precision, 82% recall, and an F-score of 0.89. As an example, the application of the system for collecting literature citations for the allelic variants of genes related to the diseases intracranial aneurysm and breast cancer is presented. Conclusion: OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and therefore is suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. The application of OSIRISv1.2 in combination with controlled vocabularies like MeSH provides a way to identify associations of biomedical interest, such as those that relate SNPs with diseases.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In the last decades, the globalized competition among cities and regions made them develop new strategies for branding and promoting their territory to attract tourists, investors, companies and residents. Major sports events - such as the Olympic Games, the FIFA World Cup or World and Continental Championships - have played an integral part in these strategies. Believing, with or without evidence, in the capacity of those events to improve the visibility and the economy of the host destination, many cities, regions and even countries have engaged in establishing sports events hosting strategies. The problem of the globalized competition in the sports events "market" is that many cities and regions do not have the resources - either financial, human or in terms of infrastructure - to compete in hosting major sports events. Consequently, many cities or regions have to turn to second-tier sports events. To organise those smaller events means less media coverage and more difficulty in finding sponsors, while the costs - both financial and in terms of services - stay high for the community. This paper analyses how Heritage Sporting Events (HSE) might be an opportunity for cities and regions engaged in sports events hosting strategies. HSE is an emerging concept that to date has been under-researched in the academic literature. Therefore, this paper aims to define the concept of HSE through an exploratory research study. A multidisciplinary literature review reveals two major characteristics of HSEs: the sustainability in the territory and the authenticity of the event constructed through a differentiation process. These characteristics, defined through multiple variables, give us the opportunity to observe the construction process of a sports event into a heritage object. This paper argues that HSEs can be seen as territorial resources that can represent a competitive advantage for host destinations. In conclusion, academics are invited to further research HSEs to better understand their construction process and their impacts on the territory, while local authorities are invited to consider HSEs for the branding and the promotion of their territory.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

From a managerial point of view, the more effcient, simple, and parameter-free (ESP) an algorithm is, the more likely it will be used in practice for solving real-life problems. Following this principle, an ESP algorithm for solving the Permutation Flowshop Sequencing Problem (PFSP) is proposed in this article. Using an Iterated Local Search (ILS) framework, the so-called ILS-ESP algorithm is able to compete in performance with other well-known ILS-based approaches, which are considered among the most effcient algorithms for the PFSP. However, while other similar approaches still employ several parameters that can affect their performance if not properly chosen, our algorithm does not require any particular fine-tuning process since it uses basic "common sense" rules for the local search, perturbation, and acceptance criterion stages of the ILS metaheuristic. Our approach defines a new operator for the ILS perturbation process, a new acceptance criterion based on extremely simple and transparent rules, and a biased randomization process of the initial solution to randomly generate different alternative initial solutions of similar quality -which is attained by applying a biased randomization to a classical PFSP heuristic. This diversification of the initial solution aims at avoiding poorly designed starting points and, thus, allows the methodology to take advantage of current trends in parallel and distributed computing. A set of extensive tests, based on literature benchmarks, has been carried out in order to validate our algorithm and compare it against other approaches. These tests show that our parameter-free algorithm is able to compete with state-of-the-art metaheuristics for the PFSP. Also, the experiments show that, when using parallel computing, it is possible to improve the top ILS-based metaheuristic by just incorporating to it our biased randomization process with a high-quality pseudo-random number generator.