914 results for statistical data analysis


Relevance:

100.00%

Publisher:

Abstract:

Background: Aortic aneurysm and dissection are important causes of death in older people. Ruptured aneurysms show catastrophic fatality rates approaching 80%. Few population-based mortality studies have been published worldwide and none in Brazil. The objective of the present study was to use multiple-cause-of-death methodology to analyze mortality trends related to aortic aneurysm and dissection in the state of Sao Paulo between 1985 and 2009. Methods: We analyzed mortality data from the Sao Paulo State Data Analysis System, selecting all death certificates on which aortic aneurysm and dissection were listed as a cause of death. The variables sex, age, season of the year, and underlying, associated or total mentions of causes of death were studied using standardized mortality rates, proportions and historical trends. Statistical analyses were performed with chi-square goodness-of-fit tests, Kruskal-Wallis H tests, and analysis of variance. The joinpoint regression model was used to evaluate changes in trends of age-standardized rates. A p value less than 0.05 was regarded as significant. Results: Over the 25-year period, there were 42,615 deaths related to aortic aneurysm and dissection, of which 36,088 (84.7%) were identified as the underlying cause and 6,527 (15.3%) as an associated cause of death. Dissection and ruptured aneurysms were recorded as the underlying cause in 93% of these deaths. Over the entire period, a significant increasing trend in age-standardized death rates was observed in both men and women, with non-significant decreases from 1996/2004 until 2009. Abdominal aortic aneurysms and aortic dissections prevailed among men, and aortic dissections and aortic aneurysms of unspecified site among women. In 1985 and 2009, the male-to-female death rate ratios were 2.86 and 2.19, respectively, corresponding to a 23.4% narrowing of the difference between rates.
For aortic dissection, ruptured aneurysms and non-ruptured aneurysms, the overall mean ages at death were 63.2, 68.4 and 71.6 years, respectively. When listed as the underlying cause, the main associated causes of death were hemorrhages (43.8%/40.5%/13.9%), hypertensive diseases (49.2%/22.43%/24.5%) and atherosclerosis (14.8%/25.5%/15.3%); when listed as an associated cause, the principal underlying causes of death were diseases of the circulatory (55.7%) and respiratory (13.8%) systems and neoplasms (7.8%). A significant seasonal variation, with the highest frequency in winter, occurred in deaths with aortic dissection, ruptured or non-ruptured aneurysm as the underlying cause. Conclusions: This study introduces multiple-cause-of-death methodology to enhance epidemiologic knowledge of aortic aneurysm and dissection in Sao Paulo, Brazil. The results highlight the importance of mortality statistics and the need for epidemiologic studies to understand the unique trends in this population.
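The seasonal analysis above rests on a chi-square goodness-of-fit test against the null hypothesis of equal deaths per season. A minimal sketch, with hypothetical counts rather than the study's data:

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical seasonal death counts (illustration only, not the study's data):
# order: summer, autumn, winter, spring
observed = [240, 260, 310, 250]
expected = [sum(observed) / 4] * 4  # null hypothesis: no seasonal variation

chi2 = chi_square_gof(observed, expected)
print(round(chi2, 2))  # 10.94, with 3 degrees of freedom
print(chi2 > 7.815)    # True: exceeds the chi-square critical value (df=3, alpha=0.05)
```

With these illustrative counts the winter excess is large enough to reject the null of uniform seasonal distribution at the 5% level.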


Aims. We studied four young star clusters to characterise their anomalous extinction or variable reddening and to assess whether these could be due to contamination by dense clouds or to circumstellar effects. Methods. We evaluated the extinction law (R(V)) by adopting two methods: (i) the use of theoretical expressions based on the colour excess of stars with known spectral type; and (ii) the analysis of two-colour diagrams, where the slope of the observed colour distribution was compared to the normal distribution. An algorithm to reproduce the reddened colours of the zero-age main sequence (ZAMS) was developed to derive the average visual extinction (A(V)) that provides the closest fit to the observational data. The structure of the clouds was evaluated by means of a statistical fractal analysis, designed to compare their geometric structure with the spatial distribution of the cluster members. Results. The cluster NGC 6530 is the only object of our sample affected by anomalous extinction. On average, the other clusters suffer normal extinction, but several of their members, mainly in NGC 2264, seem to have high R(V), probably because of circumstellar effects. The ZAMS fitting provides A(V) values in good agreement with those found in the literature. The fractal analysis shows that NGC 6530 has a centrally concentrated distribution of stars that differs from the substructures found in the density distribution of the cloud projected in the A(V) map, suggesting that the original cloud was changed by the cluster formation. However, the fractal dimension and statistical parameters of Berkeley 86, NGC 2244, and NGC 2264 indicate a good cloud-cluster correlation when compared with other works based on an artificial distribution of points.
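Method (i) above depends on the total-to-selective extinction ratio, R(V) = A(V) / E(B-V), where the colour excess E(B-V) is the observed minus the intrinsic colour of a star of known spectral type. A minimal sketch with hypothetical photometry (the magnitudes below are invented for illustration):

```python
def colour_excess(observed_colour, intrinsic_colour):
    """E(B-V): observed minus intrinsic (B-V) for a star of known spectral type."""
    return observed_colour - intrinsic_colour

def total_to_selective_ratio(a_v, e_bv):
    """R(V) = A(V) / E(B-V); about 3.1 for the normal diffuse interstellar medium."""
    return a_v / e_bv

# Hypothetical star: observed (B-V) = 0.95, intrinsic (B-V) = 0.30 for its spectral type
e_bv = colour_excess(0.95, 0.30)           # 0.65 mag
a_v = 2.0                                  # assumed known visual extinction (hypothetical)
r_v = total_to_selective_ratio(a_v, e_bv)
print(round(r_v, 2))                       # 3.08: consistent with normal extinction
```

Values of R(V) well above ~3.1 are what the abstract calls anomalous extinction.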


A common interest in gene expression data analysis is to identify, from a large pool of candidate genes, the genes that present significant changes in expression level between a treatment and a control biological condition. Usually, this is done using a statistic and a cutoff value that separate the differentially from the nondifferentially expressed genes. In this paper, we propose a Bayesian approach to identify differentially expressed genes by sequentially calculating credibility intervals from predictive densities, which are constructed using the sampled mean treatment effect of all genes under study, excluding the treatment effects of genes previously identified as showing statistical evidence of difference. We compare our Bayesian approach with the standard ones based on the t-test and modified t-tests via a simulation study using small sample sizes, which are common in gene expression data analysis. The results provide evidence that the proposed approach performs better than the standard ones, especially in cases with mean differences and increases in treatment variance relative to control variance. We also apply the methodologies to a well-known publicly available data set on the bacterium Escherichia coli.
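The standard baseline the paper compares against is a per-gene two-sample t statistic with a fixed cutoff. A minimal sketch of that baseline (hypothetical expression values; Welch's form of the statistic is used here):

```python
import statistics

def welch_t(x, y):
    """Welch two-sample t statistic."""
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / ((vx / len(x) + vy / len(y)) ** 0.5)

# Hypothetical log-expression values for one gene, 4 replicates per condition
control = [7.1, 6.8, 7.3, 7.0]
treatment = [8.4, 8.1, 8.6, 8.0]
t = welch_t(treatment, control)
# Flag the gene when |t| exceeds a chosen cutoff (e.g. ~2.45 for alpha = 0.05, df ~ 6)
print(abs(t) > 2.45)  # True
```

The paper's point is that with such small sample sizes the per-gene variance estimates are noisy, which is what motivates borrowing strength across genes in the Bayesian approach.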


In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic model of radiation carcinogenesis: latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP, Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest, as it includes a destructive process of tumour cells after an initial treatment, or the capacity of an individual exposed to irradiation to repair altered cells that would result in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the individual's repair system. Markov chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. We also discuss model selection and present an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. in the technical report cited above.
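For orientation, the promotion time cure model that this work generalises takes N ~ Poisson(theta) initiated cells, so a subject is cured (the event never occurs) with probability exp(-theta). A minimal Monte Carlo sketch of that special case, with a hypothetical theta:

```python
import math
import random

def simulate_cure_fraction(theta, n_subjects=100_000, seed=42):
    """Monte Carlo cure fraction under the promotion time cure model:
    N ~ Poisson(theta) initiated cells; a subject is cured when N = 0."""
    rng = random.Random(seed)
    cured = 0
    for _ in range(n_subjects):
        # Poisson draw by inversion (Knuth's method); adequate for small theta
        limit, k, prod = math.exp(-theta), 0, 1.0
        while True:
            prod *= rng.random()
            if prod <= limit:
                break
            k += 1
        if k == 0:
            cured += 1
    return cured / n_subjects

theta = 1.5  # hypothetical mean number of initiated cells
print(round(simulate_cure_fraction(theta), 3))  # Monte Carlo estimate
print(round(math.exp(-theta), 3))               # theoretical cure fraction, ~0.223
```

The compound weighted Poisson model of the article keeps this cured-fraction structure while allowing over- or under-dispersion in the latent count.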



In the past decade, the advent of efficient genome sequencing tools and high-throughput experimental biotechnology has led to enormous progress in the life sciences. Among the most important innovations is microarray technology, which allows the expression of thousands of genes to be quantified simultaneously by measuring hybridization from a tissue of interest to probes on a small glass or plastic slide. These data are characterized by a fair amount of random noise, a predictor dimension in the thousands, and a sample size in the dozens. One of the most exciting areas to which microarray technology has been applied is the challenge of deciphering complex diseases such as cancer. In these studies, samples are taken from two or more groups of individuals with heterogeneous phenotypes, pathologies, or clinical outcomes. These samples are hybridized to microarrays in an effort to find a small number of genes that are strongly correlated with the grouping of individuals. Even though analysis methods are now well developed and close to reaching a standard organization (through the effort of international projects such as the Microarray Gene Expression Data (MGED) Society [1]), it is not infrequent to encounter a clinician's question for which no compelling statistical method exists. The contribution of this dissertation to deciphering disease is the development of new approaches aimed at handling open problems posed by clinicians in specific experimental designs. Chapter 1 starts from a necessary biological introduction, then reviews microarray technologies and all the important steps of an experiment, from the production of the array through quality controls to the preprocessing steps used in the data analysis in the rest of the dissertation.
Chapter 2 provides a critical review of standard analysis methods, stressing most of the problems that motivate this work. Chapter 3 introduces a method to address the issue of unbalanced designs in microarray experiments, where the experimental design is a crucial starting point for obtaining reasonable results. In a two-class problem, an equal or similar number of samples should be collected for the two classes; in some cases, however, e.g. rare pathologies, the approach to take is less evident. We propose to address this issue with a modified version of SAM [2]. MultiSAM consists of a reiterated application of SAM, comparing the less populated class (LPC) with 1,000 random samplings of the same size from the more populated class (MPC). A list of differentially expressed genes is generated for each SAM application. After 1,000 reiterations, each probe is given a "score" ranging from 0 to 1,000 based on its recurrence as differentially expressed across the 1,000 lists. The performance of MultiSAM was compared to that of SAM and LIMMA [3] over two data sets simulated via beta and exponential distributions. The results of all three algorithms over low-noise data sets appear acceptable. However, on a real unbalanced two-channel data set regarding Chronic Lymphocytic Leukemia, LIMMA finds no significant probe and SAM finds 23 significantly changed probes but cannot separate the two classes, while MultiSAM finds 122 probes with score >300 and separates the data into two clusters by hierarchical clustering. We also report extra-assay validation in terms of differentially expressed genes. Although the standard algorithms perform well over low-noise simulated data sets, MultiSAM seems to be the only one able to reveal subtle differences in gene expression profiles on real unbalanced data. Chapter 4 describes a method to address similarity evaluation in a three-class problem by means of the Relevance Vector Machine [4].
In fact, looking at microarray data in a prognostic and diagnostic clinical framework, differences are not the only quantities that can play a crucial role: in some cases similarities can give useful, and sometimes even more important, information. Given three classes, the goal could be to establish, with a certain level of confidence, whether the third is more similar to the first or to the second. In this work we show that the Relevance Vector Machine (RVM) [4] can be a possible solution to the limitations of standard supervised classification. In fact, RVM offers many advantages compared, for example, with its well-known precursor, the Support Vector Machine (SVM). Among these advantages, the estimate of the posterior probability of class membership is a key feature for addressing the similarity issue. This is a highly important, but often overlooked, capability of any practical pattern recognition system. We focused on a three-class tumor-grade problem, with 67 samples of grade 1 (G1), 54 samples of grade 3 (G3), and 100 samples of grade 2 (G2). The goal is to find a model able to separate G1 from G3, and then to evaluate the third class, G2, as a test set to obtain, for each G2 sample, the probability of belonging to class G1 or class G3. The analysis showed that breast cancer samples of grade 2 have a molecular profile more similar to that of grade 1 samples. This result had been conjectured in the literature, but no measure of significance had been given before.
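The MultiSAM scoring loop described in Chapter 3 can be sketched as follows. For brevity this sketch substitutes a plain per-probe t statistic for the actual SAM statistic, and the data and cutoffs are hypothetical:

```python
import random
import statistics

def t_stat(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / ((vx / len(x) + vy / len(y)) ** 0.5)

def multisam_scores(lpc, mpc, n_iter=1000, cutoff=3.0, seed=0):
    """Score each probe by how often it is called differentially expressed when the
    less populated class (LPC) is compared with a size-matched random subsample of
    the more populated class (MPC). lpc, mpc: dict probe -> expression values."""
    rng = random.Random(seed)
    probes = list(lpc)
    scores = {p: 0 for p in probes}
    k = len(next(iter(lpc.values())))        # LPC sample size
    n_mpc = len(next(iter(mpc.values())))
    for _ in range(n_iter):
        idx = rng.sample(range(n_mpc), k)    # one balanced MPC subsample per iteration
        for p in probes:
            sub = [mpc[p][i] for i in idx]
            if abs(t_stat(lpc[p], sub)) > cutoff:
                scores[p] += 1               # recurrence across the n_iter gene lists
    return scores

# Two hypothetical probes: one truly shifted in the LPC, one null
lpc = {"geneA": [9.0, 9.2, 8.9], "geneB": [5.0, 5.1, 4.9]}
mpc = {"geneA": [7.0, 7.1, 6.9, 7.2, 7.0, 6.8, 7.3, 7.1],
       "geneB": [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.1, 4.9]}
scores = multisam_scores(lpc, mpc)
print(scores["geneA"] > 300, scores["geneB"] > 300)  # True False
```

The repeated size-matched subsampling is what lets the shifted probe accumulate a high recurrence score while null probes stay near zero.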


The candidate tackled an important issue in contemporary management: the role of CSR and sustainability. The research proposal focused on longitudinal and inductive research, directed at tracing the evolution of CSR and contributing to new institutional theory, in particular the institutional work framework, and to the relation between institutions and discourse analysis. The documental analysis covers the whole evolution of CSR, focusing also on a number of important networks and associations. Some of the methodologies employed in the thesis were adopted as a consequence of data analysis, in a truly inductive research process. The thesis is composed of two sections. The first section mainly describes the research process and the results of the analyses. The candidate employed several research methods: a longitudinal content analysis of documents, a vocabulary study with statistical techniques such as cluster analysis and factor analysis, and a rhetorical analysis of justifications. The second section relates the analysis results to theoretical frameworks and contributions. The candidate engaged with several frameworks: Actor-Network Theory, institutional work and boundary work, and institutional logics. The chapters focus on different issues: a historical reconstruction of CSR; a reflection on the symbolic adoption of recurrent labels; two case studies of Italian networks, comparing institutional and boundary work; a theoretical model of institutional change based on contradiction and institutional complexity; and the application of the model to CSR and sustainability, proposing sustainability as a possible institutional logic.


Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in the number of copies based on statistical considerations, rather than merely detect trends in the data. We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.
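The hidden Markov machinery underlying such copy-number calls can be illustrated with the forward-backward algorithm, which yields the posterior probability of each state (loss, neutral, gain) at each probe. A minimal sketch with hypothetical emission and transition parameters; the paper's actual model infers its parameters by Metropolis-within-Gibbs rather than fixing them:

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_states(obs, n_states, mu, sigma, trans, init):
    """Forward-backward: P(state at t | all observations) for a discrete HMM
    with Gaussian emissions (unscaled recursions; fine for short sequences)."""
    n = len(obs)
    emit = [[gauss_pdf(o, mu[s], sigma) for s in range(n_states)] for o in obs]
    fwd = [[init[s] * emit[0][s] for s in range(n_states)]]
    for t in range(1, n):
        fwd.append([emit[t][s] * sum(fwd[-1][r] * trans[r][s] for r in range(n_states))
                    for s in range(n_states)])
    bwd = [[1.0] * n_states for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = [sum(trans[s][r] * emit[t + 1][r] * bwd[t + 1][r] for r in range(n_states))
                  for s in range(n_states)]
    post = []
    for t in range(n):
        w = [fwd[t][s] * bwd[t][s] for s in range(n_states)]
        z = sum(w)
        post.append([v / z for v in w])
    return post

# Hypothetical parameters: states loss / neutral / gain with log2-ratio means -1, 0, 0.58
states = ["loss", "neutral", "gain"]
mu, sigma = [-1.0, 0.0, 0.58], 0.25
trans = [[0.90, 0.09, 0.01],   # sticky transitions favour extended altered regions
         [0.05, 0.90, 0.05],
         [0.01, 0.09, 0.90]]
init = [0.1, 0.8, 0.1]
obs = [0.02, -0.05, 0.55, 0.60, 0.49, 0.00, -0.04]  # suggests a short gained region
post = posterior_states(obs, 3, mu, sigma, trans, init)
calls = [states[max(range(3), key=lambda s: p[s])] for p in post]
print(calls)  # ['neutral', 'neutral', 'gain', 'gain', 'gain', 'neutral', 'neutral']
```

The posterior probabilities, rather than the hard calls printed here, are what support statements about localized amplifications and deletions.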


Visualization and exploratory analysis are important parts of any data analysis and are made more challenging when the data are voluminous and high-dimensional. One such example is environmental monitoring data, which are often collected over time and at multiple locations, resulting in a geographically indexed multivariate time series. Financial data, although not necessarily containing a geographic component, present another source of high-volume multivariate time series data. We present the mvtsplot function, which provides a method for visualizing multivariate time series data. We outline the basic design concepts and provide some examples of its usage by applying it to a database of ambient air pollution measurements in the United States and to a hypothetical portfolio of stocks.


Many seemingly disparate approaches for marginal modeling have been developed in recent years. We demonstrate that many current approaches for marginal modeling of correlated binary outcomes produce likelihoods that are equivalent to the proposed copula-based models herein. These general copula models of underlying latent threshold random variables yield likelihood based models for marginal fixed effects estimation and interpretation in the analysis of correlated binary data. Moreover, we propose a nomenclature and set of model relationships that substantially elucidates the complex area of marginalized models for binary data. A diverse collection of didactic mathematical and numerical examples are given to illustrate concepts.
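The latent-threshold construction the abstract refers to can be illustrated with a Gaussian copula: Y_j = 1 if Z_j <= Phi^{-1}(p_j), where (Z_1, Z_2) are standard normal with correlation rho, so the marginal means are preserved while the joint success probability is inflated above its independence value. A minimal simulation sketch with hypothetical parameters:

```python
import random
from statistics import NormalDist

def correlated_binary_pair(p1, p2, rho, rng):
    """One draw of two correlated binary outcomes via a Gaussian-copula latent
    threshold: Y_j = 1 if Z_j <= Phi^{-1}(p_j), (Z1, Z2) bivariate standard normal."""
    nd = NormalDist()
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
    return int(z1 <= nd.inv_cdf(p1)), int(z2 <= nd.inv_cdf(p2))

rng = random.Random(1)
draws = [correlated_binary_pair(0.3, 0.3, 0.6, rng) for _ in range(50_000)]
m1 = sum(y1 for y1, _ in draws) / len(draws)
m2 = sum(y2 for _, y2 in draws) / len(draws)
both = sum(y1 * y2 for y1, y2 in draws) / len(draws)
print(round(m1, 2), round(m2, 2))  # marginal means stay near 0.30 each
print(both > 0.09)                 # True: joint P(1,1) exceeds the 0.09 independence value
```

Preserving the marginal means while the dependence lives in the latent layer is exactly what makes such models marginally interpretable.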


A time series is a sequence of observations made over time. Examples in public health include daily ozone concentrations, weekly admissions to an emergency department or annual expenditures on health care in the United States. Time series models are used to describe the dependence of the response at each time on predictor variables including covariates and possibly previous values in the series. Time series methods are necessary to account for the correlation among repeated responses over time. This paper gives an overview of time series ideas and methods used in public health research.
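A minimal illustration of the correlation these methods must account for: an AR(1) series Y_t = phi * Y_{t-1} + eps_t has lag-1 autocorrelation of approximately phi, so treating its observations as independent understates standard errors. A sketch with a hypothetical phi:

```python
import random

def simulate_ar1(phi, n, seed=7):
    """Y_t = phi * Y_{t-1} + eps_t, with eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y

def lag1_autocorr(y):
    m = sum(y) / len(y)
    num = sum((y[t] - m) * (y[t - 1] - m) for t in range(1, len(y)))
    den = sum((v - m) ** 2 for v in y)
    return num / den

y = simulate_ar1(phi=0.7, n=20_000)  # hypothetical autoregressive coefficient
r1 = lag1_autocorr(y)
print(round(r1, 2))  # close to phi = 0.7
```

A sample autocorrelation this far from zero is the signature that time series methods, not ordinary independent-data methods, are required.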


Nitrogen and water are essential for plant growth and development. In this study, we designed experiments to produce gene expression data from poplar roots under nitrogen starvation and water deprivation conditions. We found that a low concentration of nitrogen led first to increased root elongation, followed by lateral root proliferation and eventually increased root biomass. To identify genes regulating root growth and development under nitrogen starvation and water deprivation, we designed a series of data analysis procedures through which we successfully identified biologically important genes. Differentially expressed gene (DEG) analysis identified the genes that are differentially expressed under nitrogen starvation or drought. Protein domain enrichment analysis identified enriched themes (in the same domains) that are highly interactive during the treatment. Gene Ontology (GO) enrichment analysis allowed us to identify biological processes changed during nitrogen starvation. Based on the above analyses, we examined the local gene regulatory network (GRN) and identified a number of transcription factors. After testing, one of them proved to be a highly ranked transcription factor in the hierarchy that affects root growth under nitrogen starvation. Because analyzing gene expression data manually is tedious and time-consuming, we also automated the analysis as a computational pipeline that can identify DEGs and perform protein domain analysis in a single run. It is implemented in Perl and R scripts.
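The DEG step of such a pipeline can be sketched as a per-gene fold-change and t-statistic filter. The study's actual pipeline is in Perl and R, so the following Python analogue, with invented gene names and values, is purely illustrative:

```python
import statistics

def welch_t(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / ((vx / len(x) + vy / len(y)) ** 0.5)

def find_degs(expr, control_cols, treated_cols, t_cutoff=3.0, lfc_cutoff=1.0):
    """Flag genes whose |t| statistic and |log2 fold change| both exceed the cutoffs.
    expr: dict gene -> list of log2 expression values across all samples."""
    degs = []
    for gene, values in expr.items():
        ctrl = [values[i] for i in control_cols]
        trt = [values[i] for i in treated_cols]
        lfc = statistics.mean(trt) - statistics.mean(ctrl)  # log2 fold change
        if abs(lfc) >= lfc_cutoff and abs(welch_t(trt, ctrl)) >= t_cutoff:
            degs.append(gene)
    return degs

# Hypothetical log2 expression values: 3 control then 3 nitrogen-starved samples
expr = {
    "geneX": [5.0, 5.1, 4.9, 7.2, 7.0, 7.1],  # induced under nitrogen starvation
    "geneY": [8.0, 8.1, 7.9, 8.0, 8.2, 7.9],  # unchanged
}
print(find_degs(expr, control_cols=[0, 1, 2], treated_cols=[3, 4, 5]))  # ['geneX']
```

Chaining this step with domain and GO enrichment is what the automated pipeline performs in a single run.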


Dr. Rossi discusses the common errors made when fitting statistical models to data. The presentation focuses on the planning, data analysis, and interpretation phases of a statistical analysis and highlights the errors commonly made by researchers in each of these phases. The implications of these errors are discussed, along with the methods that can be used to prevent them from occurring. A prescription for carrying out a correct statistical analysis is also presented.


Cluster randomized trials (CRTs) use clusters as the unit of randomization; a cluster is usually defined as a collection of individuals sharing some common characteristic. Common examples of clusters include entire dental practices, hospitals, schools, school classes, villages, and towns. Additionally, repeated measurements taken on the same individual at different time points are also considered to form a cluster. In dentistry, CRTs are applicable because patients may be treated as clusters containing several individual teeth. CRTs require certain methodological procedures during sample size calculation, randomization, data analysis, and reporting, which are often ignored in dental research publications. In general, due to the similarity of observations within clusters, each individual within a cluster provides less information than an individual in a non-clustered trial. Therefore, clustered designs require larger sample sizes than non-clustered randomized designs, and special statistical analyses that account for the fact that observations within clusters are correlated. The purpose of this article is to highlight, with relevant examples, the important methodological characteristics of cluster randomized designs as they may be applied in orthodontics, and to explain the problems that may arise if clustered observations are erroneously treated and analysed as independent (non-clustered).
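The sample-size inflation described above is usually quantified by the design effect, DEFF = 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intracluster correlation coefficient. A minimal sketch with hypothetical trial numbers:

```python
def design_effect(cluster_size, icc):
    """DEFF = 1 + (m - 1) * ICC: variance inflation due to within-cluster correlation."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(n_individual, cluster_size, icc):
    """Sample size for an individually randomized trial, inflated for clustering."""
    return n_individual * design_effect(cluster_size, icc)

# Hypothetical orthodontic trial: 200 patients suffice under individual randomization;
# with 10 patients per dental practice and a within-practice ICC of 0.05:
deff = design_effect(10, 0.05)
print(round(deff, 2))                               # 1.45
print(round(clustered_sample_size(200, 10, 0.05)))  # 290
```

Even a modest ICC of 0.05 inflates the required sample size by 45% here, which is why ignoring clustering in the analysis produces spuriously narrow confidence intervals.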