63 results for statistical methods


Relevance: 60.00%

Abstract:

Background: Combining the different sources of information to improve the available biological knowledge is a current challenge in bioinformatics. Kernel-based methods are among the most powerful approaches for integrating heterogeneous data types. Kernel-based data integration consists of two basic steps: first, an appropriate kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results: We analyze the integration of data from several sources of information using kernel PCA, from the point of view of dimensionality reduction. Moreover, we improve the interpretability of kernel PCA by adding to the plot a representation of the input variables from each dataset. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify the samples with higher or lower values of the variables analyzed. Conclusions: Integrating the different datasets and representing samples and variables simultaneously gives a better understanding of the biological knowledge.
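
The two-step scheme described above can be illustrated with a rough sketch (not the authors' implementation; the data, kernel choice and combination weights are all hypothetical): one RBF kernel per data source, an unweighted average as the combination, and kernel PCA on the combined, centered kernel.

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    # Gaussian (RBF) kernel matrix from pairwise squared distances.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def center_kernel(K):
    # Double-centering of the kernel matrix, required before kernel PCA.
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

rng = np.random.default_rng(0)
X1 = rng.normal(size=(30, 5))   # hypothetical data source 1
X2 = rng.normal(size=(30, 8))   # hypothetical data source 2

# Step 1: one kernel per data set.  Step 2: combine them (here an
# unweighted average; the weights could themselves be optimized).
K = center_kernel(0.5 * rbf_kernel(X1) + 0.5 * rbf_kernel(X2))

# Kernel PCA: eigendecomposition of the combined, centered kernel;
# the leading components give low-dimensional sample coordinates.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
scores = eigvecs[:, order[:2]] * np.sqrt(np.abs(eigvals[order[:2]]))
print(scores.shape)
```

The `scores` array is the two-dimensional sample representation that the abstract proposes to enrich with variable directions.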

Relevance: 60.00%

Abstract:

Nowadays it is difficult to talk about statistical procedures for the quantitative analysis of data without referring to computing applied to research. These computing resources are often based on software packages designed to assist the researcher in the data analysis phase. One of the most refined and complete packages currently available is SPSS (Statistical Package for the Social Sciences). SPSS is a suite of programs for carrying out the statistical analysis of data. It is a fairly powerful statistical application, of which several versions have been developed since its origins in the 1970s. The computer output presented in this manual belongs to version 11.0.1. Although the appearance has changed since the early versions, the way the program works remains very similar across versions. Before starting to use the SPSS applications, it is important to become familiar with some of the windows we will use most. On entering SPSS, the first thing we find is the Data Editor. This window basically displays the data we enter. The Data Editor includes two views, Data View and Variable View, which can be selected from the two tabs at the bottom of the window. Data View contains the general menu and the data matrix. This matrix is structured with cases in the rows and variables in the columns.

Relevance: 60.00%

Abstract:

Background: In longitudinal studies where subjects experience recurrent incidents over a period of time, such as respiratory infections, fever or diarrhea, statistical methods are required that take the within-subject correlation into account. Methods: For repeated-events data with censored failure times, the independent increment (AG), marginal (WLW) and conditional (PWP) models are three multiple-failure models that generalize Cox's proportional hazards model. In this paper we review the efficiency, accuracy and robustness of all three models under simulated scenarios with varying degrees of within-subject correlation, censoring levels, maximum number of possible recurrences and sample size. We also study the performance of the methods on a real dataset from a cohort study of bronchial obstruction. Results: We find substantial differences between the methods, and no single method is optimal. AG and PWP seem preferable to WLW for low correlation levels, but the situation is reversed for high correlations. Conclusions: All methods are robust to censoring, worsen as the number of recurrences increases, and share a bias problem which, among other consequences, makes asymptotic normal confidence intervals not fully reliable, although these are well developed theoretically.
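
The within-subject correlation these models must handle can be made concrete with a minimal frailty simulation (hypothetical parameters, not the authors' simulation design): a shared subject-level frailty scales every gap time for that subject, making recurrences positively correlated rather than independent.

```python
import numpy as np

rng = np.random.default_rng(42)
n_subjects, max_recurrences = 2000, 3

# A shared gamma frailty (mean 1) scales each subject's event rate and
# thereby induces positive correlation among that subject's gap times.
frailty = rng.gamma(shape=5.0, scale=0.2, size=n_subjects)
gaps = rng.exponential(1.0 / frailty[:, None],
                       size=(n_subjects, max_recurrences))

# Correlation between the first and second gap times: positive under
# frailty, ~0 if recurrences were independent -- the dependence that a
# naive Cox fit (or a working-independence assumption) ignores.
r = np.corrcoef(gaps[:, 0], gaps[:, 1])[0, 1]
print(r > 0)
```

With these parameters the theoretical correlation between two gap times of the same subject is about 0.2, which is the kind of dependence the simulated scenarios in the abstract vary.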

Relevance: 60.00%

Abstract:

Once the data have been entered into the SPSS statistical package (Statistical Package for the Social Sciences) as a data matrix, it is time to consider optimizing that matrix in order to get the most out of the data, depending on the type of analysis to be carried out. For this purpose, SPSS itself provides a series of utilities that can be very useful. These basic utilities can be grouped by functionality into: utilities for editing data, utilities for modifying variables, and the help options the program offers. Some of these utilities are presented below.

Relevance: 60.00%

Abstract:

The aim of this study is to define a new statistic, PVL, based on the relative distance between the likelihood associated with the simulation replications and the likelihood of the conceptual model. Our results, from several simulation experiments of a clinical trial, show that the range of the PVL statistic can be a good measure of stability for establishing when a computational model verifies the underlying conceptual model. PVL also improves the analysis of simulation replications because a single statistic is associated with all of them. Several verification scenarios, obtained by altering the simulation model, further show the usefulness of PVL. Additional simulation experiments suggest that a 0-20% range may define adequate limits for the verification problem, when considered from the viewpoint of an equivalence test.
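
Since the abstract only characterizes PVL as a relative distance between likelihoods, the sketch below is a loose, hypothetical reading of that idea, not the paper's definition: a Gaussian conceptual model, 20 simulation replications, and the suggested 0-20% band used as an acceptance region.

```python
import numpy as np

def log_lik_normal(x, mu, sigma):
    # Gaussian log-likelihood of a sample under the conceptual model.
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (x - mu)**2 / (2 * sigma**2)))

rng = np.random.default_rng(1)
mu0, sigma0 = 10.0, 2.0  # hypothetical conceptual (reference) model
ll_ref = log_lik_normal(rng.normal(mu0, sigma0, 10_000), mu0, sigma0)

# Pooled log-likelihood of 20 simulation replications, and a relative
# distance (%) to the reference value -- one possible reading of PVL.
reps = [rng.normal(mu0, sigma0, 10_000) for _ in range(20)]
ll_reps = np.mean([log_lik_normal(r, mu0, sigma0) for r in reps])
pvl = 100 * abs(ll_reps - ll_ref) / abs(ll_ref)

verified = pvl <= 20.0  # the 0-20% band suggested by the abstract
print(verified)
```

Because the replications here are drawn from the conceptual model itself, the statistic falls well inside the band; altering the simulation model (as in the paper's verification scenarios) would push it out.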

Relevance: 60.00%

Abstract:

Discriminant analysis is a statistical method for determining which variables, measured on objects or individuals, best explain the assignment of those objects or individuals to the groups to which they belong. It is a technique that allows us to check to what extent the independent variables considered in a study classify the subjects or objects correctly. The main elements of the discriminant analysis procedure and its application are shown and explained using the SPSS statistical package, version 18: the development of the statistical model, the conditions for applying the analysis, the estimation and interpretation of the discriminant functions, the classification methods, and the validation of the results.
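
The abstract walks through the procedure in SPSS; as a language-neutral sketch of the same idea (scikit-learn instead of SPSS, and two invented groups), linear discriminant analysis can be fitted and its classification accuracy checked in a few lines:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two hypothetical groups that differ in the means of two predictors.
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(100, 2)),
               rng.normal([3.0, 3.0], 1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# Proportion of cases the discriminant function classifies correctly,
# analogous to the percentage in SPSS's classification results table.
acc = lda.score(X, y)
print(acc >= 0.9)
```

On properly validated data one would hold out a test set or use cross-validation, as the abstract's "validation of the results" step suggests.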

Relevance: 60.00%

Abstract:

Compositional data (concentrations) are common in the geosciences. Neglecting their character may lead to erroneous conclusions, and spurious correlation (K. Pearson, 1897) has disastrous consequences. On the basis of the pioneering work by J. Aitchison in the 1980s, a methodology free of these drawbacks is now available. The geometry of the simplex allows compositions to be represented in orthogonal coordinates, to which the usual statistical methods can be applied, thus facilitating computation and analysis. The use of (log-)ratios precludes interpreting single concentrations in isolation from their relative character. A hydrochemical data set is used to illustrate the point.

Relevance: 30.00%

Abstract:

This paper presents and compares two approaches to estimating the origin (upstream or downstream) of voltage sags registered in distribution substations. The first approach applies a single rule to features extracted from the impedances during the fault, whereas the second method exploits the variability of the waveforms from a statistical point of view. Both approaches have been tested on voltage sags registered in distribution substations, and their advantages, drawbacks and comparative results are presented.

Relevance: 30.00%

Abstract:

Background: Recent advances in high-throughput technologies have produced a vast amount of protein sequences, while the number of high-resolution structures has seen only a limited increase. This has impelled the production of many strategies to build protein structures from their sequences, generating a considerable number of alternative models. Selecting the model closest to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, by knowledge-based potentials, or by a combination of both. Results: Here we present and demonstrate a theory to split knowledge-based potentials into biologically meaningful scoring terms and to combine them into new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple, linear application that can be further improved by optimizing the combination of Z-scores. Using the simplest composite score we obtained predictions similar to those of state-of-the-art methods. Besides, our approach has the advantage of identifying the terms most relevant to the stability of the protein structure. Finally, we also use the composite Z-scores to assess the conformation of models and to detect local errors. Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores detect near-native structures as accurately as state-of-the-art methods and successfully identify wrongly modeled regions of many near-native conformations.
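
The idea of standardizing scoring terms and combining them linearly can be sketched as below; the three energy terms, their scales and the combination weights are all hypothetical stand-ins, not the paper's actual potentials.

```python
import numpy as np

rng = np.random.default_rng(7)
n_models = 200
# Hypothetical raw scores of three knowledge-based energy terms (say,
# pairwise, solvation, torsion) for a set of candidate models; scales
# and offsets are invented, and lower means more native-like.
terms = rng.normal(size=(n_models, 3)) * [5.0, 2.0, 1.0] + [100.0, 40.0, 10.0]

# Z-score each term across the model set, then combine linearly; the
# weights are placeholders for what the paper proposes to optimize.
z = (terms - terms.mean(axis=0)) / terms.std(axis=0)
weights = np.array([0.5, 0.3, 0.2])
composite = z @ weights

# The model with the lowest composite Z-score is the predicted near-native.
best = int(np.argmin(composite))
print(best, composite.shape)
```

Z-scoring puts terms measured on very different scales onto a common footing, which is what makes a simple linear combination meaningful at all.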

Relevance: 30.00%

Abstract:

Ground clutter caused by anomalous propagation (anaprop) can seriously affect radar rain-rate estimates, particularly in fully automatic radar processing systems, and, if not filtered, can produce frequent false alarms. A statistical study of anomalous propagation detected by two operational C-band radars in the northern Italian region of Emilia-Romagna is discussed, paying particular attention to its diurnal and seasonal variability. The analysis shows a high incidence of anaprop in summer, mainly in the morning and evening, due to the humid and hot summer climate of the Po Valley, particularly in the coastal zone. A comparison between different techniques and datasets for retrieving the vertical profile of the refractive index gradient in the boundary layer is then presented, focusing on their capability to detect anomalous propagation conditions. Furthermore, beam path trajectories are simulated using a multilayer ray-tracing model, and the influence of the propagation conditions on the beam trajectory and shape is examined. High-resolution radiosounding data are identified as the best available dataset for reproducing the local propagation conditions accurately, while lower-resolution standard TEMP data suffer from interpolation degradation and Numerical Weather Prediction model data (Lokal Modell) are able to retrieve a tendency to superrefraction but not to detect ducting conditions. From the ray tracing of the centre and of the lower and upper limits of the radar antenna's 3-dB half-power main beam lobe, it is concluded that ducting layers produce a change in the measured volume and in the power distribution that can lead to an additional error in the reflectivity estimate and, subsequently, in the estimated rainfall rate.
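
The link between the refractive index gradient and the propagation regime can be sketched with commonly used N-gradient thresholds; the -157 N-units/km ducting limit follows from the modified refractivity M = N + 157 z (z in km), while -79 is a conventional superrefraction bound, not a value taken from this study.

```python
def propagation_condition(dn_dz):
    """Classify propagation from the refractivity gradient dN/dz (N-units/km).

    Ducting occurs below -157 N-units/km, where the modified refractivity
    gradient dM/dz = dN/dz + 157 becomes negative; -79 is a conventional
    superrefraction bound, and a standard atmosphere sits near -40.
    """
    if dn_dz < -157:
        return "ducting"
    if dn_dz < -79:
        return "superrefraction"
    return "normal"

print(propagation_condition(-40))   # normal
print(propagation_condition(-120))  # superrefraction
print(propagation_condition(-200))  # ducting
```

This is the classification that the abstract reports the NWP model data can partly recover (a tendency to superrefraction) but not fully (ducting).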

Relevance: 30.00%

Abstract:

In this paper, some steganalytic techniques designed to detect the existence of messages hidden by histogram shifting methods are presented. First, techniques to identify specific histogram shifting methods, based on visible marks on the histogram or on abnormal statistical distributions, are suggested. Then, we present a general technique capable of detecting all of the histogram shifting methods analyzed. This technique is based on the effect of histogram shifting on the "volatility" of the histogram of differences, and on the study of its reduction whenever new data are hidden.
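
The "histogram of differences" and its volatility can be sketched as follows; both the shifting step and the volatility measure below are simplified, hypothetical stand-ins, since the paper's exact definitions are not given in this abstract.

```python
import numpy as np

def diff_histogram_volatility(img):
    # Histogram of horizontal pixel differences; "volatility" is measured
    # here as the summed absolute change between adjacent histogram bins.
    d = np.diff(img.astype(int), axis=1).ravel()
    hist, _ = np.histogram(d, bins=np.arange(-255, 257))
    return int(np.abs(np.diff(hist)).sum())

rng = np.random.default_rng(3)
cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

# Crude stand-in for one histogram shifting step: values above a threshold
# are shifted up by one to free a bin (boundary handling ignored here).
stego = cover.copy()
mask = stego >= 128
stego[mask] = np.minimum(stego[mask].astype(int) + 1, 255).astype(np.uint8)

v_cover = diff_histogram_volatility(cover)
v_stego = diff_histogram_volatility(stego)
print(v_cover, v_stego)
```

Comparing the statistic on cover and stego versions of the same image is the kind of comparison a detector built on this idea would make; the abstract's claim is that embedding systematically reduces it.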

Relevance: 30.00%

Abstract:

Motivation: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well suited to dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites. Results: Here, we describe BadiRate, a new software tool that estimates family turnover rates, as well as the number of elements at internal phylogenetic nodes, by likelihood-based and parsimony methods. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypotheses such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations and have also illustrated its functionality by analyzing a representative empirical dataset.


Relevance: 30.00%

Abstract:

The present study evaluates the performance of four methods for estimating the regression coefficients used to make statistical decisions about intervention effectiveness in single-case designs. Ordinary least squares estimation is compared to two correction techniques dealing with general trend and to one that eliminates autocorrelation whenever it is present. Type I error rates and statistical power are studied for experimental conditions defined by the presence or absence of a treatment effect (change in level or in slope), general trend, and serial dependence. The results show that empirical Type I error rates do not approximate the nominal ones in the presence of autocorrelation or general trend when ordinary or generalized least squares are applied. The techniques controlling for trend show lower false-alarm rates but prove insufficiently sensitive to existing treatment effects. Consequently, using the statistical significance of the regression coefficients to detect treatment effects is not recommended for short data series.
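
The Type I error inflation that motivates this comparison can be reproduced with a small simulation (all parameter values are illustrative, not the study's actual conditions): AR(1) noise with no treatment effect, a two-phase dummy predictor, and a nominal 5% OLS test of the level-change coefficient.

```python
import numpy as np

rng = np.random.default_rng(11)

def ols_slope_t(y, x):
    # OLS slope t statistic for the simple regression y = b0 + b1*x.
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

# No treatment effect: y is pure AR(1) noise; x is a two-phase dummy
# (baseline vs. intervention), as in a level-change single-case design.
n, phi, n_sims, t_crit = 20, 0.6, 2000, 2.101  # two-sided 5% t, df = 18
x = np.repeat([0.0, 1.0], n // 2)
rejections = 0
for _ in range(n_sims):
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal()
    if abs(ols_slope_t(e, x)) > t_crit:
        rejections += 1

type1 = rejections / n_sims
print(type1)  # empirical false-alarm rate, well above the nominal .05
```

With positive autocorrelation the empirical rejection rate is several times the nominal level, which is the core of the abstract's warning about short series.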