930 results for Data Interpretation, Statistical


Relevance:

30.00%

Publisher:

Abstract:

Aims/hypothesis: We investigated whether children who are heavier at birth have an increased risk of type 1 diabetes. Methods: Relevant studies published before February 2009 were identified from literature searches using MEDLINE, Web of Science and EMBASE. Authors of all studies containing relevant data were contacted and asked to provide individual patient data or conduct pre-specified analyses. Risk estimates of type 1 diabetes by category of birthweight were calculated for each study, before and after adjustment for potential confounders. Meta-analysis techniques were then used to derive combined ORs and investigate heterogeneity between studies. Results: Data were available for 29 predominantly European studies (five cohort, 24 case-control studies), including 12,807 cases of type 1 diabetes. Overall, studies consistently demonstrated that children with birthweight from 3.5 to 4 kg had an increased risk of diabetes of 6% (OR 1.06 [95% CI 1.01-1.11]; p=0.02) and children with birthweight over 4 kg had an increased risk of 10% (OR 1.10 [95% CI 1.04-1.19]; p=0.003), compared with children weighing 3.0 to 3.5 kg at birth. This corresponded to a linear increase in diabetes risk of 3% per 500 g increase in birthweight (OR 1.03 [95% CI 1.00-1.06]; p=0.03). Adjustments for potential confounders such as gestational age, maternal age, birth order, Caesarean section, breastfeeding and maternal diabetes had little effect on these findings. Conclusions/interpretation: Children who are heavier at birth have a significant and consistent, but relatively small increase in risk of type 1 diabetes. © 2010 Springer-Verlag.
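
As an orientation for readers, combined odds ratios of this kind are commonly obtained by inverse-variance pooling of study-level log odds ratios; the sketch below shows that generic fixed-effect form and is not necessarily the exact model used in the study.

```latex
% Generic inverse-variance (fixed-effect) pooling of study-level log odds ratios.
% \hat{\theta}_i = \log(\mathrm{OR}_i) with standard error s_i in study i.
\hat{\theta}_{\text{pooled}} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i},
\qquad w_i = \frac{1}{s_i^2},
\qquad \mathrm{SE}\!\left(\hat{\theta}_{\text{pooled}}\right) = \frac{1}{\sqrt{\sum_i w_i}} .
% The combined OR and its 95% CI follow by exponentiating
% \hat{\theta}_{\text{pooled}} \pm 1.96\,\mathrm{SE}(\hat{\theta}_{\text{pooled}}).
```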



Relevance:

30.00%

Publisher:

Abstract:

Motivation: The inference of regulatory networks from large-scale expression data holds great promise because of the potentially causal interpretation of these networks. However, owing to the difficulty of establishing reliable methods based on observational data, knowledge about the possibilities and limitations of such inference methods in this context remains incomplete.

Results: In this article, we conduct a statistical analysis investigating differences and similarities of four network inference algorithms, ARACNE, CLR, MRNET and RN, with respect to local network-based measures. We employ ensemble methods that allow inferability to be assessed down to the level of individual edges. Our analysis reveals the bias of these inference methods with respect to the inference of various network components and, hence, provides guidance in the interpretation of regulatory networks inferred from expression data. Further, as an application, we predict the total number of regulatory interactions in human B cells and hypothesize about the role of Myc and its targets in molecular information processing.
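
As a rough illustration of the simplest of the four compared approaches, the sketch below infers a relevance-network-style (RN) graph from a hypothetical expression matrix using a Gaussian mutual-information approximation; ARACNE, CLR and MRNET add pruning or normalisation steps that are not shown, and all names and thresholds are illustrative.

```python
# Minimal relevance-network (RN) style sketch: score every gene pair by an
# estimate of mutual information and keep edges above a threshold.
# ARACNE, CLR and MRNET apply additional pruning or normalisation on top
# of such pairwise scores; none of that is reproduced here.
import numpy as np

def gaussian_mi_matrix(expr):
    """expr: samples x genes matrix. Pairwise MI under a Gaussian
    approximation, MI = -0.5 * log(1 - rho^2)."""
    rho = np.corrcoef(expr, rowvar=False)
    np.fill_diagonal(rho, 0.0)
    return -0.5 * np.log(np.clip(1.0 - rho**2, 1e-12, None))

def infer_edges(expr, threshold):
    mi = gaussian_mi_matrix(expr)
    n_genes = mi.shape[0]
    return [(i, j, mi[i, j]) for i in range(n_genes)
            for j in range(i + 1, n_genes) if mi[i, j] > threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(200, 10))                      # hypothetical expression data
    expr[:, 1] = expr[:, 0] + 0.3 * rng.normal(size=200)   # one planted interaction
    print(infer_edges(expr, threshold=0.5))                # recovers the (0, 1) edge
```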

Relevance:

30.00%

Publisher:

Abstract:

Neptune’s Cave in the Velfjord–Tosenfjord area of Nordland, Norway, is described, together with its various organic deposits. Samples of attached barnacles, loose marine molluscs, animal bones and organic sediments were dated, with radiocarbon ages of 9840±90 and 9570±80 yr BP being derived for the barnacles and molluscs, based on the superseded but locally used marine reservoir age of 440 years. A growth temperature of c. 7.5°C in undiluted seawater is deduced from the δ13C and δ18O values of both types of marine shell, which is consistent with their early Holocene age. From the dates, and an assessment of local Holocene uplift and Weichselian deglaciation, a scenario is constructed that could explain the situation and condition of the various deposits. The analysis uses assumed local isobases and a sea-level curve to give results that are consistent with previous data, that equate the demise of the barnacles to the collapse of a tidewater glacier in Tosenfjord, and that constrain the minimum extent of local Holocene uplift. An elk fell into the cave in the mid-Holocene at 5100±70 yr BP, after which a much later single ‘bog-burst’ event at 1780±70 yr BP could explain the transport of the various loose deposits further into the cave.
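
As a hedged aside on how the quoted reservoir age enters such chronologies, the first-order correction simply subtracts it from the apparent marine radiocarbon age; full calibration uses a marine calibration curve rather than this simple offset.

```latex
% First-order marine reservoir correction (illustrative only).
t_{\text{corrected}} \approx t_{\text{measured}} - R,
\qquad \text{e.g. } 9840 \pm 90\ \text{yr BP} - 440\ \text{yr} \approx 9400 \pm 90\ \text{yr BP}.
```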

Relevance:

30.00%

Publisher:

Abstract:

The monitoring of multivariate systems that exhibit non-Gaussian behavior is addressed. Existing work advocates the use of independent component analysis (ICA) to extract the underlying non-Gaussian data structure. Since some of the source signals may be Gaussian, the use of principal component analysis (PCA) is proposed to capture the Gaussian and non-Gaussian source signals. A subsequent application of ICA then allows the extraction of non-Gaussian components from the retained principal components (PCs). A further contribution is the utilization of a support vector data description to determine a confidence limit for the non-Gaussian components. Finally, a statistical test is developed for determining how many non-Gaussian components are encapsulated within the retained PCs, and associated monitoring statistics are defined. The utility of the proposed scheme is demonstrated by a simulation example, and the analysis of recorded data from an industrial melter.
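
A minimal sketch of the general PCA-then-ICA monitoring idea is given below, using scikit-learn with a one-class SVM standing in for the support vector data description; the paper's test for the number of non-Gaussian components and its exact monitoring statistics are not reproduced, and all data and parameter choices are illustrative.

```python
# Sketch of the PCA -> ICA -> one-class boundary idea with scikit-learn.
# A OneClassSVM with an RBF kernel stands in for the support vector data
# description; the statistical test for the number of non-Gaussian
# components described in the abstract is not reproduced here.
import numpy as np
from sklearn.decomposition import PCA, FastICA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 8))
X_train[:, 0] += rng.laplace(size=500)        # one non-Gaussian source (illustrative)

pca = PCA(n_components=5).fit(X_train)        # retain PCs spanning the signal subspace
scores = pca.transform(X_train)

ica = FastICA(n_components=3, random_state=0).fit(scores)   # extract non-Gaussian parts
ics_train = ica.transform(scores)

boundary = OneClassSVM(kernel="rbf", nu=0.01).fit(ics_train)  # confidence region

X_new = rng.normal(size=(10, 8)) + 3.0        # hypothetical abnormal operating data
ics_new = ica.transform(pca.transform(X_new))
print(boundary.predict(ics_new))              # -1 flags points outside the limit
```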

Relevance:

30.00%

Publisher:

Abstract:

To date, the processing of wildlife location data has relied on a diversity of software and file formats. Data management and the subsequent spatial and statistical analyses have been undertaken in multiple steps, involving many time-consuming import/export phases. Recent technological advancements in tracking systems have made large, continuous, high-frequency datasets of wildlife behavioral data available, such as those derived from the global positioning system (GPS) and other animal-attached sensor devices. These data can be further complemented by a wide range of other information about the animals’ environment. Management of these large and diverse datasets for modelling animal behaviour and ecology can prove challenging, slowing down analysis and increasing the probability of mistakes in data handling. We address these issues by critically evaluating the requirements for good management of GPS data for wildlife biology. We highlight that dedicated data management tools and expertise are needed. We explore current research in wildlife data management. We suggest a general direction of development, based on a modular software architecture with a spatial database at its core, where interoperability, data model design and integration with remote-sensing data sources play an important role in successful GPS data handling.
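
A minimal sketch of the "database at the core" idea follows, using Python's built-in sqlite3 as a simplified stand-in for a true spatial database such as PostGIS; the table and column names are invented for illustration.

```python
# Minimal sketch of keeping GPS fixes in a database at the core of the
# workflow; sqlite3 is a simplified stand-in for a real spatial database
# (e.g. PostGIS), and table/column names are purely illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gps_fix (
        animal_id   TEXT NOT NULL,
        acquired_at TEXT NOT NULL,     -- ISO 8601 timestamp
        lon         REAL NOT NULL,
        lat         REAL NOT NULL,
        dop         REAL               -- dilution of precision from the collar
    )""")
conn.executemany(
    "INSERT INTO gps_fix VALUES (?, ?, ?, ?, ?)",
    [("elk_01", "2011-06-01T04:00:00Z", 12.1007, 65.4031, 2.1),
     ("elk_01", "2011-06-01T05:00:00Z", 12.1042, 65.4055, 1.8)])

# Analyses then query the single shared store instead of shuffling files around.
for row in conn.execute(
        "SELECT animal_id, COUNT(*) FROM gps_fix GROUP BY animal_id"):
    print(row)
```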

Relevance:

30.00%

Publisher:

Abstract:

Background. The assembly of the tree of life has seen significant progress in recent years, but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots, and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Results. Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 10³ to ca. 10⁴ nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic picture, with some regions requiring more than 2.8 × 10⁵ nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Conclusions. Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current data availability and future data requirements, we make clear recommendations for forthcoming research.
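
For readers unfamiliar with supermatrix construction, the toy sketch below concatenates per-locus alignments and pads missing taxa with gaps; the taxon and locus names are invented, and the actual study mined and aligned GenBank data for 14 loci.

```python
# Toy illustration of supermatrix assembly: concatenate per-locus alignments,
# padding taxa that lack data for a locus with gap characters. Taxon and
# locus names are invented for this example.
loci = {
    "rbcL": {"Gracilariaceae": "ATGTCACC", "Bangiaceae": "ATGTCTCC"},
    "psaA": {"Gracilariaceae": "GGTTACAA"},            # missing for Bangiaceae
}

taxa = sorted({t for alignment in loci.values() for t in alignment})
supermatrix = {t: "" for t in taxa}
for locus_name, alignment in loci.items():
    length = len(next(iter(alignment.values())))
    for t in taxa:
        supermatrix[t] += alignment.get(t, "-" * length)

for t, seq in supermatrix.items():
    print(t, seq)   # Bangiaceae receives gaps for the locus it is missing
```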

Relevance:

30.00%

Publisher:

Abstract:

This paper examines the stability of the benefit transfer function across 42 recreational forests in the British Isles. A working definition of reliable function transfer is put forward, and a suitable statistical test is provided. A novel split-sample method is used to test the sensitivity of the models' log-likelihood values to the removal of contingent valuation (CV) responses collected at individual forest sites. We find that a stable function improves our measure of transfer reliability, but not by much. We conclude that, in empirical studies on transferability, considerations of function stability are secondary to the availability and quality of site attribute data. Modellers can thus weigh the advantages of transfer function stability against the value of additional information on recreation site attributes. (c) 2008 Elsevier GmbH. All rights reserved.
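
As a hedged sketch of the kind of likelihood-based comparison that underlies such stability tests, one can contrast a pooled model with models fitted separately to a site subset and its complement; the paper's split-sample procedure is more elaborate than this generic form.

```latex
% Generic likelihood-ratio comparison of a pooled model against models fitted
% separately to a site subset s and its complement -s; a sketch of the kind of
% test behind transfer-function stability, not the paper's exact statistic.
\mathrm{LR} = -2\left[\ln L_{\text{pooled}} - \left(\ln L_{s} + \ln L_{-s}\right)\right]
\;\sim\; \chi^2_{k},
\qquad k = \text{number of parameter restrictions imposed by pooling.}
```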

Relevance:

30.00%

Publisher:

Abstract:

This paper investigates the performance of the tests proposed by Hadri and by Hadri and Larsson for testing for stationarity in heterogeneous panel data under model misspecification. The panel tests are based on the well-known KPSS test (cf. Kwiatkowski et al.), which considers two models: stationarity around a deterministic level and stationarity around a deterministic trend. As far as we know, there is no study of the statistical properties of the test when the wrong model is used. We also consider the case of the simultaneous presence of the two types of model in a panel. We employ two asymptotics: joint asymptotics, where T, N → ∞ simultaneously, and fixed-T asymptotics, where T is fixed and N is allowed to grow indefinitely. We use Monte Carlo experiments to investigate the effects of misspecification for sample sizes usually used in practice. The results indicate that the assumption that T is fixed rather than asymptotic leads to tests with smaller size distortions, particularly for relatively small T with large N panels (micro-panels), than the tests derived under the joint asymptotics. We also find that choosing a deterministic trend when a deterministic level is true does not significantly affect the properties of the test, but choosing a deterministic level when a deterministic trend is true leads to extreme over-rejections. Therefore, when unsure about which model has generated the data, we suggest using the model with a trend. We also propose a new statistic for testing for stationarity in mixed panel data where the mixture is known. The performance of this new test is very good for both the T asymptotic and the T fixed cases. The statistic for T asymptotic is slightly undersized when T is very small (
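
For orientation, the univariate KPSS-type statistic underlying these panel tests has the standard form below; Hadri's panel statistic is essentially a standardised cross-sectional average of such statistics.

```latex
% KPSS-type statistic for series i: e_{it} are residuals from regressing y_{it}
% on a level (or level and trend), S_{it} = \sum_{s=1}^{t} e_{is} the partial
% sums, and \hat{\sigma}_i^2 a consistent long-run variance estimate.
\eta_i = \frac{1}{T^2 \hat{\sigma}_i^2} \sum_{t=1}^{T} S_{it}^2 ,
\qquad
\text{Hadri's panel statistic averages } \eta_i \text{ over } i = 1,\dots,N
\text{ and standardises the mean.}
```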

Relevance:

30.00%

Publisher:

Abstract:

This paper presents an analysis of entropy-based molecular descriptors. Specifically, we use real chemical structures as well as synthetic isomeric structures and investigate, by means of a statistical analysis, properties of and relations among descriptors with respect to the data set used. Our numerical results provide evidence that synthetic chemical structures are notably different from real chemical structures and, hence, should not be used to investigate molecular descriptors. Instead, an analysis based on real chemical structures is favorable. Further, we find strong hints that molecular descriptors can be partitioned into distinct classes capturing complementary information.
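
As a toy example of one simple member of this descriptor family, the sketch below computes the Shannon entropy of a molecular graph's degree distribution; the particular descriptor and the example graphs are purely illustrative.

```python
# Toy entropy-based descriptor: Shannon entropy of the degree distribution of
# a hydrogen-suppressed molecular graph. This is only one simple member of the
# descriptor family analysed in the paper.
import math
from collections import Counter

def degree_entropy(adjacency):
    """adjacency: dict mapping atom index -> list of bonded atom indices."""
    degrees = [len(neigh) for neigh in adjacency.values()]
    counts = Counter(degrees)
    n = len(degrees)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Isobutane-like skeleton: a central atom bonded to three terminal atoms.
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
# n-Butane-like skeleton: a simple chain of four atoms.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(degree_entropy(isobutane), degree_entropy(n_butane))  # isomers, different values
```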

Relevance:

30.00%

Publisher:

Abstract:

Nonlinear principal component analysis (PCA) based on neural networks has drawn significant attention as a monitoring tool for complex nonlinear processes, but there remains a difficulty with determining the optimal network topology. This paper exploits the advantages of the Fast Recursive Algorithm, where the number of nodes, the location of centres, and the weights between the hidden layer and the output layer can be identified simultaneously for the radial basis function (RBF) networks. The topology problem for the nonlinear PCA based on neural networks can thus be solved. Another problem with nonlinear PCA is that the derived nonlinear scores may not be statistically independent or follow a simple parametric distribution. This hinders its applications in process monitoring since the simplicity of applying predetermined probability distribution functions is lost. This paper proposes the use of a support vector data description and shows that transforming the nonlinear principal components into a feature space allows a simple statistical inference. Results from both simulated and industrial data confirm the efficacy of the proposed method for solving nonlinear principal component problems, compared with linear PCA and kernel PCA.
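
The sketch below is not the Fast Recursive Algorithm described here; it only illustrates, on invented data, the general flavour of obtaining nonlinear scores from an RBF expansion (centres chosen by k-means) followed by a linear projection, with confidence limits then obtainable from a data description model as in the earlier sketch.

```python
# Illustrative nonlinear score extraction (not the Fast Recursive Algorithm):
# expand the data through RBF basis functions with k-means centres, then apply
# linear PCA to the expanded features to obtain nonlinear scores.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
t = rng.uniform(-1, 1, size=400)
X = np.column_stack([t, t**2 + 0.05 * rng.normal(size=400)])   # curved structure

centres = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X).cluster_centers_
Phi = rbf_kernel(X, centres, gamma=2.0)       # hidden-layer RBF activations
scores = PCA(n_components=2).fit_transform(Phi)

print(scores[:3])   # nonlinear scores; a confidence limit could then be set
                    # with a data description model as in the earlier sketch
```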

Relevance:

30.00%

Publisher:

Abstract:

The purpose of this study is to survey the use of networks and network-based methods in systems biology. The study starts with an introduction to graph theory and basic measures that allow structural properties of networks to be quantified. The authors then present important network classes and gene networks, as well as methods for their analysis. In the last part of the study, the authors review approaches that aim to analyse the functional organisation of gene networks and the use of networks in medicine. In addition, the authors advocate networks as a systematic approach to general problems in systems biology, because networks are capable of assuming multiple roles that are very beneficial for connecting experimental data with a functional interpretation in biological terms.
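
A small illustration of the kind of basic structural measures such an introduction covers, computed with networkx on an invented toy graph:

```python
# Basic structural network measures of the kind introduced in the survey,
# computed on a small invented graph with networkx.
import networkx as nx

g = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])

print(dict(g.degree()))                    # degree of each node
print(nx.average_clustering(g))            # mean clustering coefficient
print(nx.average_shortest_path_length(g))  # characteristic path length
```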

Relevance:

30.00%

Publisher:

Abstract:

This paper studies single-channel speech separation, assuming unknown, arbitrary temporal dynamics for the speech signals to be separated. A data-driven approach is described, which matches each mixed speech segment against a composite training segment to separate the underlying clean speech segments. To advance the separation accuracy, the new approach seeks and separates the longest mixed speech segments with matching composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the constituent training segments, and hence the error of separation. For convenience, we call the new approach Composition of Longest Segments, or CLOSE. The CLOSE method includes a data-driven approach to model long-range temporal dynamics of speech signals, and a statistical approach to identify the longest mixed speech segments with matching composite training segments. Experiments are conducted on the Wall Street Journal database, for separating mixtures of two simultaneous large-vocabulary speech utterances spoken by two different speakers. The results are evaluated using various objective and subjective measures, including the challenge of large-vocabulary continuous speech recognition. It is shown that the new separation approach leads to significant improvement in all these measures.
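
The toy sketch below illustrates only the core matching idea (finding the pair of training segments whose sum best explains a mixed segment, here with fixed-length random stand-ins for spectra); the CLOSE method additionally searches over segment lengths and uses a statistical model to select the longest reliable matches, which is not shown.

```python
# Toy illustration of the segment-matching idea only: find the pair of training
# segments (one per speaker) whose sum best explains a mixed segment in the
# magnitude-spectrum domain. CLOSE additionally searches over segment lengths
# and selects the longest reliably matching segments, which is not shown here.
import numpy as np

rng = np.random.default_rng(3)
train_a = [np.abs(rng.normal(size=(20, 64))) for _ in range(50)]  # speaker A segments
train_b = [np.abs(rng.normal(size=(20, 64))) for _ in range(50)]  # speaker B segments

true_a, true_b = train_a[7], train_b[31]
mixture = true_a + true_b                     # idealised additive mixture

best = min(((i, j) for i in range(len(train_a)) for j in range(len(train_b))),
           key=lambda ij: np.sum((mixture - (train_a[ij[0]] + train_b[ij[1]]))**2))
print(best)                                   # (7, 31): the matching composite

est_a = train_a[best[0]]                      # separated estimate for speaker A
est_b = train_b[best[1]]                      # separated estimate for speaker B
```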