987 results for "Statistical measures"


Relevance: 70.00%

Abstract:

Construction of multiple sequence alignments is a fundamental task in bioinformatics. Multiple sequence alignments are used as a prerequisite in many bioinformatics methods, and the quality of such methods can therefore depend critically on the quality of the alignment. However, automatic construction of a multiple sequence alignment for a set of remotely related sequences does not always produce a biologically relevant alignment. There is therefore a need for an objective approach to evaluating the quality of automatically aligned sequences. The profile hidden Markov model is a powerful approach in comparative genomics. In a profile hidden Markov model, the symbol probabilities are estimated at each conserved alignment position. This can increase the dimension of the parameter space and cause an overfitting problem. These two research problems are both related to conservation. We have developed statistical measures for quantifying the conservation of multiple sequence alignments. Two types of methods are considered: those identifying conserved residues in an alignment position, and those calculating positional conservation scores. The positional conservation score was exploited in a statistical prediction model for assessing the quality of multiple sequence alignments. The residue conservation score was used as part of the emission probability estimation method proposed for profile hidden Markov models. The predicted alignment quality scores correlated highly with the correct alignment quality scores, indicating that our method is reliable for assessing the quality of any multiple sequence alignment. Comparison of the emission probability estimation method with the maximum likelihood method showed that the number of estimated parameters in the model was dramatically decreased while the same level of accuracy was maintained.
To conclude, we have shown that conservation can be successfully used both in the statistical model for alignment quality assessment and in the estimation of emission probabilities in profile hidden Markov models.
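The abstract does not specify how its positional conservation scores are computed; as an illustrative sketch only (an assumed measure, not necessarily the authors'), one widely used score of this kind is the normalized Shannon entropy of an alignment column:

```python
import math
from collections import Counter

def positional_conservation(column, alphabet_size=20):
    """Entropy-based conservation of one alignment column:
    returns 1 - H/H_max, so 1.0 = fully conserved, ~0.0 = near-uniform."""
    counts = Counter(residue for residue in column if residue != "-")  # ignore gaps
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return 1.0 - entropy / math.log(alphabet_size)

# A fully conserved column scores 1.0; a mixed column scores lower.
print(positional_conservation("AAAA"))  # 1.0
print(positional_conservation("ACDE"))
```

Scores like this can then feed a downstream quality model, as the abstract describes for its own conservation measures.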

Relevance: 70.00%

Abstract:

A statistical modeling approach is proposed for use in searching large microarray data sets for genes that have a transcriptional response to a stimulus. The approach is unrestricted with respect to the timing, magnitude, or duration of the response, or the overall abundance of the transcript. The statistical model accommodates systematic heterogeneity in expression levels, and the corresponding data analyses provide gene-specific information together with a means of evaluating its statistical significance. To illustrate this strategy, we derived a model depicting the profile expected for a periodically transcribed gene and used it to look for budding yeast transcripts that adhere to this profile. Using objective criteria, this method identifies 81% of the known periodic transcripts and, in total, 1,088 genes that show significant periodicity in at least one of the three data sets analyzed. However, only one-quarter of these genes show significant oscillations in at least two data sets and can be classified as periodic with high confidence. The method provides estimates of the mean activation and deactivation times, induced and basal expression levels, and statistical measures of the precision of these estimates for each periodic transcript.
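The paper's model estimates activation/deactivation times and expression levels; a much simpler hedged sketch of the underlying idea, fitting a periodic profile to a transcript's time series by linear least squares (synthetic data and a sinusoidal profile assumed here, not the paper's model), might look like:

```python
import numpy as np

def fit_periodic(t, y, period):
    """Least-squares fit of y ~ b0 + a*cos(w t) + b*sin(w t) for a known
    cell-cycle period; returns (basal level, amplitude, phase)."""
    w = 2 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    b0, a, b = beta
    return b0, np.hypot(a, b), np.arctan2(b, a)

t = np.arange(0, 120, 7.0)                          # sampling times (minutes)
y = 2.0 + 1.5 * np.cos(2 * np.pi * t / 60 - 0.8)    # synthetic periodic transcript
b0, amp, phase = fit_periodic(t, y, period=60)
print(round(b0, 2), round(amp, 2), round(phase, 2))  # 2.0 1.5 0.8
```

With noisy data, the residual variance of such a fit gives the kind of precision measure the abstract mentions.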

Relevance: 60.00%

Abstract:

AIM: To determine cytomegalovirus (CMV) frequency in neonatal intrahepatic cholestasis by serology, histological revision (searching for cytomegalic cells), immunohistochemistry, and polymerase chain reaction (PCR), and to verify the relationships among these methods. METHODS: The study comprised 101 non-consecutive infants submitted for hepatic biopsy between March 1982 and December 2005. Serological results were obtained from the patients' files, and the other methods were performed on paraffin-embedded liver samples from the hepatic biopsies. The following statistical measures were calculated: frequency, sensitivity, specificity, positive predictive value, negative predictive value, and accuracy. RESULTS: The frequencies of positive results were as follows: serology, 7/64 (11%); histological revision, 0/84; immunohistochemistry, 1/44 (2%); and PCR, 6/77 (8%). Only one patient had positive immunohistochemical findings and a positive PCR. The following statistical measures were calculated between PCR and serology: sensitivity, 33.3%; specificity, 88.89%; positive predictive value, 28.57%; negative predictive value, 90.91%; and accuracy, 82.35%. CONCLUSION: The frequency of positive CMV varied among the tests. Serology presented the highest positive frequency. When compared to PCR, the sensitivity and positive predictive value of serology were low. (C) 2009 The WJG Press and Baishideng. All rights reserved.
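The reported measures follow from a standard 2×2 contingency table comparing serology against PCR. The counts below are back-calculated from the reported percentages for illustration only (TP = 2, FP = 5, FN = 4, TN = 40 reproduce all five figures), not taken directly from the paper:

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Standard 2x2 diagnostic-test measures, as percentages."""
    return {
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
        "ppv":         100 * tp / (tp + fp),   # positive predictive value
        "npv":         100 * tn / (tn + fn),   # negative predictive value
        "accuracy":    100 * (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts consistent with the reported percentages:
m = diagnostic_measures(tp=2, fp=5, fn=4, tn=40)
print({k: round(v, 2) for k, v in m.items()})
# {'sensitivity': 33.33, 'specificity': 88.89, 'ppv': 28.57, 'npv': 90.91, 'accuracy': 82.35}
```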

Relevance: 60.00%

Abstract:

Histamine is an important biogenic amine that acts through a family of four G-protein-coupled receptors (GPCRs), the H1 to H4 receptors (H1R-H4R). The actions of histamine at H4R are related to immunological and inflammatory processes, particularly in the pathophysiology of asthma, and H4R ligands with antagonistic properties could be useful as anti-inflammatory agents. In this work, molecular modeling and QSAR studies were performed on a set of 30 compounds, indole and benzimidazole derivatives, acting as H4R antagonists. The QSAR models were built and optimized using a genetic algorithm function and partial least squares regression (WOLF 5.5 program). The best QSAR model, constructed with the training set (N = 25), presented the following statistical measures: r² = 0.76, q² = 0.62, LOF = 0.15, and LSE = 0.07, and was validated using the LNO and y-randomization techniques. Four of the five compounds in the test set were well predicted by the selected QSAR model, which presented an external prediction power of 80%. These findings can be quite useful in the design of new anti-H4 compounds with improved biological response.
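As a hedged illustration of the r² and cross-validated q² statistics quoted above (using ordinary least squares on synthetic descriptors rather than the paper's GA/PLS workflow in WOLF 5.5):

```python
import numpy as np

def r2_q2(X, y):
    """Fitted r^2 and leave-one-out cross-validated q^2 for an
    ordinary least-squares QSAR model (illustrative sketch)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum((y - X1 @ beta) ** 2) / ss_tot
    press = 0.0
    for i in range(len(y)):                    # leave-one-out loop
        mask = np.arange(len(y)) != i
        b, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
        press += (y[i] - X1[i] @ b) ** 2
    return r2, 1 - press / ss_tot

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 3))                   # 25 "compounds", 3 descriptors
y = X @ [1.0, -0.5, 0.3] + rng.normal(scale=0.1, size=25)
r2, q2 = r2_q2(X, y)
print(round(r2, 2), round(q2, 2))              # q2 is always <= r2
```

The gap between r² and q² (0.76 vs. 0.62 in the paper) is the usual sign of how much a model's fit degrades under cross-validation.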

Relevance: 60.00%

Abstract:

Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We recently evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach in delineating breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.


Relevance: 60.00%

Abstract:

The thin-layer drying behaviour of bananas in a heat pump dehumidifier dryer was examined. Four pre-treatments (blanching, chilling, freezing, and combined blanching and freezing) were applied to the bananas, which were dried at 50 °C with an air velocity of 3.1 m s⁻¹ and an inlet-air relative humidity of 10-35%. Three drying models were examined: the simple model, the two-term exponential model, and the Page model. All models were evaluated using three statistical measures: the correlation coefficient, the root mean square error, and the mean absolute percent error. Moisture diffusivity was calculated from the diffusion equation for an infinite cylinder using the slope method. The rate of drying was higher for the pre-treatments involving freezing. The sample that was only blanched showed no improvement in drying rate; in fact, a longer drying time resulted, owing to water absorption during blanching. There was no change in the rate for the chilled sample compared with the control. While all models fitted the drying data closely, the simple model showed the greatest deviation from the experimental results. The two-term exponential model was found to be the best model for describing the drying curves of bananas because its parameters better represent the physical characteristics of the drying process. Moisture diffusivities of bananas were in the range 4.3-13.2 × 10⁻¹⁰ m² s⁻¹. (C) 2002 Published by Elsevier Science Ltd.
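Of the three models, the Page model, MR = exp(-k·tⁿ), can be fitted by simple linearisation, ln(-ln MR) = ln k + n·ln t. The sketch below uses synthetic data and illustrates only the fitting procedure, not the paper's actual parameter values:

```python
import numpy as np

def fit_page_model(t, mr):
    """Fit the Page thin-layer drying model MR = exp(-k * t**n)
    by linearisation: ln(-ln MR) = ln k + n * ln t."""
    x, y = np.log(t), np.log(-np.log(mr))
    n, lnk = np.polyfit(x, y, 1)      # slope = n, intercept = ln k
    return np.exp(lnk), n

t = np.array([0.5, 1, 2, 4, 6, 8, 10])     # drying time (hours, synthetic)
mr = np.exp(-0.3 * t ** 1.2)               # synthetic moisture ratio data
k, n = fit_page_model(t, mr)
print(round(k, 3), round(n, 3))            # 0.3 1.2
rmse = np.sqrt(np.mean((mr - np.exp(-k * t ** n)) ** 2))  # one of the fit measures
```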

Relevance: 60.00%

Abstract:

This paper aims to study the relationships between the chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski distance and the cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from the twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA "word length." It is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four test correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to the characteristics of each correlation method, and that the clusterings are a consequence of the input data and not artifacts of the methods.
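A minimal sketch of the first two steps of such a pipeline, computing DNA word-frequency histograms for a fixed word length and comparing them with a Pearson correlation (toy sequences, not the 421 chromosomes analysed in the paper):

```python
import numpy as np
from itertools import product

def word_histogram(seq, k=3):
    """Normalised frequency histogram over all 4**k DNA words of length k."""
    words = ["".join(w) for w in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    h = np.zeros(len(words))
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        if w in index:                # skip windows with ambiguous bases
            h[index[w]] += 1
    return h / max(h.sum(), 1)

a = word_histogram("ACGTACGTACGTTTTACGT")
b = word_histogram("ACGTACGAACGTTTAACGT")
r = float(np.corrcoef(a, b)[0, 1])    # Pearson similarity of the two histograms
print(round(r, 3))
```

The resulting pairwise correlation (or distance) matrix is what an MDS method would then embed in three dimensions.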

Relevance: 60.00%

Abstract:

A Work Project, presented as part of the requirements for the award of a Master's degree in Economics from NOVA – School of Business and Economics.

Relevance: 60.00%

Abstract:

Coherence resonance in semiconductor lasers with optical feedback is studied via the Lang-Kobayashi model with external nonwhite noise in the pumping current. The temporal correlation and the amplitude of the noise strongly influence the system, leading to an optimal coherent response for suitable values of both the noise amplitude and the correlation time. This phenomenon is quantitatively characterized by means of several statistical measures.
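Nonwhite noise with a finite correlation time, as used here for the pumping current, is commonly modelled as an Ornstein-Uhlenbeck process; a hedged sketch of generating such noise (the discretisation and parameter names are assumptions, not taken from the paper):

```python
import numpy as np

def ou_noise(n, dt, amplitude, tau, seed=0):
    """Ornstein-Uhlenbeck ('nonwhite') noise: exponentially correlated with
    correlation time tau and stationary standard deviation ~= amplitude."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(size=n)
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = (x[i - 1] * (1 - dt / tau)
                + amplitude * np.sqrt(2 * dt / tau) * xi[i])
    return x

noise = ou_noise(n=100_000, dt=0.01, amplitude=0.5, tau=1.0)
print(round(float(noise.var()), 2))  # close to amplitude**2 = 0.25
```

Sweeping `amplitude` and `tau` of such a noise source is how an optimal coherent response would be located numerically.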

Relevance: 60.00%

Abstract:

Defining the limits of an urban agglomeration is essential for both fundamental and applied studies in quantitative and theoretical geography, and a simple, consistent way of defining such urban clusters is important for performing statistical analyses and comparisons. Traditionally, agglomerations are defined using a rather qualitative approach based on various statistical measures; the definition generally varies from one country to another, and the data taken into account differ. In this paper, we explore the use of the City Clustering Algorithm (CCA) for defining agglomerations in Switzerland. This algorithm provides a systematic and easy way to define an urban area based only on population data, and allows the spatial resolution used for defining the urban clusters to be specified. The results from different resolutions are compared and analysed, and the effect of filtering the data is investigated. Different scales and parameters highlight different phenomena. The study of Zipf's law using the visual rank-size rule shows that it holds only for some specific urban clusters, inside a narrow range of the spatial resolution of the CCA. The scale at which one main cluster emerges can also be found in the analysis using Zipf's law. Studying the urban clusters at different scales with the lacunarity measure, a measure complementary to the fractal dimension, makes it possible to identify the change of scale at a given range.
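A minimal sketch of the CCA idea, merging populated neighbouring cells of a gridded population dataset into urban clusters (toy grid; the real algorithm also involves the coarse-graining resolution discussed above):

```python
from collections import deque

def cca_clusters(grid, threshold=0):
    """City Clustering Algorithm sketch: cells with population > threshold
    are merged with their populated 4-neighbours via breadth-first search."""
    rows, cols = len(grid), len(grid[0])
    seen, clusters = set(), []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] > threshold and (r, c) not in seen:
                queue, cluster = deque([(r, c)]), []
                seen.add((r, c))
                while queue:
                    i, j = queue.popleft()
                    cluster.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and grid[ni][nj] > threshold
                                and (ni, nj) not in seen):
                            seen.add((ni, nj))
                            queue.append((ni, nj))
                clusters.append(cluster)
    return clusters

grid = [[5, 3, 0, 0],      # toy population grid (thousands, hypothetical)
        [0, 2, 0, 7],
        [0, 0, 0, 1]]
sizes = sorted(len(c) for c in cca_clusters(grid))
print(sizes)  # [2, 3]: two distinct urban clusters
```

Ranking the resulting cluster populations by size is the input for the rank-size (Zipf) analysis described above.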

Relevance: 60.00%

Abstract:

1. Few examples of habitat-modelling studies of rare and endangered species exist in the literature, although from a conservation perspective predicting their distribution would be particularly useful. Paucity of data and lack of valid absences are the probable reasons for this shortcoming. Analytic solutions that accommodate the lack of absences include ecological niche factor analysis (ENFA) and generalized linear models (GLM) with simulated pseudo-absences. 2. In this study we tested a new approach to generating pseudo-absences, based on a preliminary ENFA habitat suitability (HS) map, for the endangered species Eryngium alpinum. This method of generating pseudo-absences was compared with two others: (i) a GLM with pseudo-absences generated completely at random, and (ii) an ENFA only. 3. The influence of two different spatial resolutions (i.e. grain) was also assessed, to tackle the dilemma of quality (grain) vs. quantity (number of occurrences). Each combination of the three methods with the two grains generated a distinct HS map. 4. Four evaluation measures were used to compare these HS maps: total deviance explained, best kappa, Gini coefficient, and minimal predicted area (MPA). The last is a new evaluation criterion proposed in this study. 5. The results showed that (i) GLM models using ENFA-weighted pseudo-absences provide better results, except for the MPA value, and that (ii) the quality (spatial resolution and locational accuracy) of the data appears to be more important than the quantity (number of occurrences). Furthermore, the proposed MPA value is suggested as a useful measure of model evaluation when used to complement classical statistical measures. 6. Synthesis and applications. We suggest that the use of ENFA-weighted pseudo-absences is a possible way to enhance the quality of GLM-based potential distribution maps, and that data quality (i.e. spatial resolution) prevails over quantity (i.e. number of data points). Increased accuracy of potential distribution maps could help to better identify suitable areas for species protection and reintroduction.

Relevance: 60.00%

Abstract:

In many industrial applications, accurate and fast surface reconstruction is essential for quality control. Variation in surface finishing parameters, such as surface roughness, can reflect defects in a manufacturing process, non-optimal product operational efficiency, and reduced life expectancy of the product. This thesis considers the reconstruction and analysis of high-frequency variation, that is, roughness, on planar surfaces. Standard industrial roughness measures are calculated from surface topography. A fast and non-contact way to obtain surface topography is to apply photometric stereo to estimate surface gradients and to reconstruct the surface by integrating the gradient fields. Alternatively, visual methods, such as statistical measures, fractal dimension, and distance transforms, can be used to characterize surface roughness directly from gray-scale images. In this thesis, the accuracy of distance transforms, statistical measures, and fractal dimension is evaluated for estimating surface roughness from gray-scale images and topographies, and the results are contrasted with standard industry roughness measures. In distance transforms, the key idea is that distances calculated along a highly varying surface are greater than distances calculated along a smoother surface. Statistical measures and fractal dimension are common surface roughness measures. In the experiments, the skewness and variance of the brightness distribution, the fractal dimension, and distance transforms exhibited strong linear correlations with standard industry roughness measures. One of the key strengths of the photometric stereo method is its ability to acquire the higher-frequency variation of surfaces. In this thesis, the reconstruction of planar surfaces with high-frequency variation is studied in the presence of imaging noise and blur.
Two Wiener filter-based methods are proposed, one of which is optimal in the sense of surface power spectral density given the spectral properties of the imaging noise and blur. Experiments show that the proposed methods preserve the inherent high-frequency variation in the reconstructed surfaces, whereas traditional reconstruction methods typically handle incorrect measurements by smoothing, which dampens the high-frequency variation.
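As a hedged sketch of the image-based statistical measures mentioned above, the variance and skewness of the grey-level distribution can be computed directly from a gray-scale image (the synthetic images and their parameters are illustrative, not the thesis's data):

```python
import numpy as np

def roughness_stats(image):
    """Variance and skewness of the grey-level distribution,
    used here as image-based surface roughness indicators."""
    x = np.asarray(image, dtype=float).ravel()
    mu, sigma = x.mean(), x.std()
    skew = float(np.mean(((x - mu) / sigma) ** 3)) if sigma > 0 else 0.0
    return float(sigma ** 2), skew

rng = np.random.default_rng(1)
smooth = rng.normal(128, 2, size=(64, 64))    # low grey-level spread: smooth surface
rough = rng.normal(128, 25, size=(64, 64))    # high grey-level spread: rough surface
print(roughness_stats(smooth)[0] < roughness_stats(rough)[0])  # True
```

In practice such measures would be calibrated against contact-measured roughness values, as the thesis does with its linear correlations.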

Relevance: 60.00%

Abstract:

A large number of urban surface energy balance models now exist, with different assumptions about the important features of the surface and the exchange processes that need to be incorporated. To date, no comparison of these models has been conducted; in contrast, models for natural surfaces have been compared extensively as part of the Project for Intercomparison of Land-surface Parameterization Schemes. Here, the methods and first results from an extensive international comparison of 33 models are presented. The overall aim is to understand the complexity required to model energy and water exchanges in urban areas: to identify those modeling approaches that minimize the errors in the simulated fluxes of the urban energy balance, and to determine the degree of model complexity required for accurate simulations. The degree of complexity included in the models is outlined, and its impact on model performance is discussed. During the comparison there have been significant developments in the models, with resulting improvements in performance (root-mean-square error falling by up to two-thirds). Evaluation is based on a dataset containing net all-wave radiation, sensible heat, and latent heat flux observations for an industrial area in Vancouver, British Columbia, Canada. There is evidence that some classes of models perform better for individual fluxes, but no model performs best or worst for all fluxes. In general, the simpler models perform as well as the more complex models on all statistical measures. The schemes generally have the best overall capability to model net all-wave radiation and the least capability to model latent heat flux.
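The root-mean-square error used above to track model improvement is computed directly from simulated and observed fluxes; a minimal sketch with hypothetical flux values in W m⁻² (not data from the comparison):

```python
import numpy as np

def rmse(sim, obs):
    """Root-mean-square error between simulated and observed fluxes."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

obs = [410.0, 395.0, 420.0]              # hypothetical observed net radiation
sim = [400.0, 400.0, 400.0]              # hypothetical model output
print(round(rmse(sim, obs), 2))          # 13.23
```

A two-thirds fall in this quantity, as reported during the comparison, means the typical flux error shrank to one-third of its original magnitude.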

Relevance: 60.00%

Abstract:

A model for estimating the turbulent kinetic energy dissipation rate in the oceanic boundary layer, based on insights from rapid-distortion theory, is presented and tested. This model provides a possible explanation for the very high dissipation levels found by numerous authors near the surface. It is conceived that turbulence, injected into the water by breaking waves, is subsequently amplified through its distortion by the mean shear of the wind-induced current and straining by the Stokes drift of surface waves. The partition of the turbulent shear stress into a shear-induced part and a wave-induced part is taken into account. In this picture, dissipation enhancement results from the same mechanism responsible for Langmuir circulations. Apart from a dimensionless depth and an eddy turn-over time, the dimensionless dissipation rate depends on the wave slope and wave age, which may be encapsulated in the turbulent Langmuir number La_t. For large La_t, or for any La_t at large depth, the dissipation rate tends to the usual surface-layer scaling, whereas when La_t is small it is strongly enhanced near the surface, growing asymptotically as ε ∝ La_t^{-2} when La_t → 0. Results from this model are compared with observations from the WAVES and SWADE data sets, assuming that this is the dominant dissipation mechanism acting in the ocean surface layer, and statistical measures of the corresponding fit indicate a substantial improvement over previous theoretical models. Comparisons are also carried out against more recent measurements, showing good order-of-magnitude agreement even when shallow-water effects are important.