194 resultados para Statistics - Analysis
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Resumo:
In recent years, we have experienced increasing interest in the understanding of the physical properties of collisionless plasmas, mostly because of the large number of astrophysical environments (e. g. the intracluster medium (ICM)) containing magnetic fields that are strong enough to be coupled with the ionized gas and characterized by densities sufficiently low to prevent the pressure isotropization with respect to the magnetic line direction. Under these conditions, a new class of kinetic instabilities arises, such as firehose and mirror instabilities, which have been studied extensively in the literature. Their role in the turbulence evolution and cascade process in the presence of pressure anisotropy, however, is still unclear. In this work, we present the first statistical analysis of turbulence in collisionless plasmas using three-dimensional numerical simulations and solving double-isothermal magnetohydrodynamic equations with the Chew-Goldberger-Low laws closure (CGL-MHD). We study models with different initial conditions to account for the firehose and mirror instabilities and to obtain different turbulent regimes. We found that the CGL-MHD subsonic and supersonic turbulences show small differences compared to the MHD models in most cases. However, in the regimes of strong kinetic instabilities, the statistics, i.e. the probability distribution functions (PDFs) of density and velocity, are very different. In subsonic models, the instabilities cause an increase in the dispersion of density, while the dispersion of velocity is increased by a large factor in some cases. Moreover, the spectra of density and velocity show increased power at small scales explained by the high growth rate of the instabilities. Finally, we calculated the structure functions of velocity and density fluctuations in the local reference frame defined by the direction of magnetic lines. The results indicate that in some cases the instabilities significantly increase the anisotropy of fluctuations. These results, even though preliminary and restricted to very specific conditions, show that the physical properties of turbulence in collisionless plasmas, as those found in the ICM, may be very different from what has been largely believed.
Resumo:
Background: Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods: Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries. Results: Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues. Conclusion: To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor.
Resumo:
Background: Dermatomyositis (DM) and polymyositis (PM) are rare systemic autoimmune rheumatic diseases with high fatality rates. There have been few population-based mortality studies of dermatomyositis and polymyositis in the world, and none have been conducted in Brazil. The objective of the present study was to employ multiple-cause of-death methodology in the analysis of trends in mortality related to dermatomyositis and polymyositis in the state of Sao Paulo, Brazil, between 1985 and 2007. Methods: We analyzed mortality data from the Sao Paulo State Data Analysis System, selecting all death certificates on which DM or PM was listed as a cause of death. The variables sex, age and underlying, associated or total mentions of causes of death were studied using mortality rates, proportions and historical trends. Statistical analysis were performed by chi-square and H Kruskal-Wallis tests, variance analysis and linear regression. A p value less than 0.05 was regarded as significant. Results: Over a 23-year period, there were 318 DM-related deaths and 316 PM-related deaths. Overall, DM/PM was designated as an underlying cause in 55.2% and as an associated cause in 44.8%; among 634 total deaths females accounted for 71.5%. During the study period, age-and gender-adjusted DM mortality rates did not change significantly, although PM as an underlying cause and total mentions of PM trended lower (p < 0.05). The mean ages at death were 47.76 +/- 20.81 years for DM and 54.24 +/- 17.94 years for PM (p = 0.0003). For DM/PM, respectively, as underlying causes, the principal associated causes of death were as follows: pneumonia (in 43.8%/33.5%); respiratory failure (in 34.4%/32.3%); interstitial pulmonary diseases and other pulmonary conditions (in 28.9%/17.6%); and septicemia (in 22.8%/15.9%). For DM/PM, respectively, as associated causes, the following were the principal underlying causes of death: respiratory disorders (in 28.3%/26.0%); circulatory disorders (in 17.4%/20.5%); neoplasms (in 16.7%/13.7%); infectious and parasitic diseases (in 11.6%/9.6%); and gastrointestinal disorders (in 8.0%/4.8%). Of the 318 DM-related deaths, 36 involved neoplasms, compared with 20 of the 316 PM-related deaths (p = 0.03). Conclusions: Our study using multiple cause of deaths found that DM/PM were identified as the underlying cause of death in only 55.2% of the deaths, indicating that both diseases were underestimated in the primary mortality statistics. We observed a predominance of deaths in women and in older individuals, as well as a trend toward stability in the mortality rates. We have confirmed that the risk of death is greater when either disease is accompanied by neoplasm, albeit to lesser degree in individuals with PM. The investigation of the underlying and associated causes of death related to DM/PM broaden the knowledge of the natural history of both diseases and could help integrate mortality data for use in the evaluation of control measures for DM/PM.
Resumo:
The existence of juxtaposed regions of distinct cultures in spite of the fact that people's beliefs have a tendency to become more similar to each other's as the individuals interact repeatedly is a puzzling phenomenon in the social sciences. Here we study an extreme version of the frequency-dependent bias model of social influence in which an individual adopts the opinion shared by the majority of the members of its extended neighborhood, which includes the individual itself. This is a variant of the majority-vote model in which the individual retains its opinion in case there is a tie among the neighbors' opinions. We assume that the individuals are fixed in the sites of a square lattice of linear size L and that they interact with their nearest neighbors only. Within a mean-field framework, we derive the equations of motion for the density of individuals adopting a particular opinion in the single-site and pair approximations. Although the single-site approximation predicts a single opinion domain that takes over the entire lattice, the pair approximation yields a qualitatively correct picture with the coexistence of different opinion domains and a strong dependence on the initial conditions. Extensive Monte Carlo simulations indicate the existence of a rich distribution of opinion domains or clusters, the number of which grows with L(2) whereas the size of the largest cluster grows with ln L(2). The analysis of the sizes of the opinion domains shows that they obey a power-law distribution for not too large sizes but that they are exponentially distributed in the limit of very large clusters. In addition, similarly to other well-known social influence model-Axelrod's model-we found that these opinion domains are unstable to the effect of a thermal-like noise.
Resumo:
Thanks to recent advances in molecular biology, allied to an ever increasing amount of experimental data, the functional state of thousands of genes can now be extracted simultaneously by using methods such as cDNA microarrays and RNA-Seq. Particularly important related investigations are the modeling and identification of gene regulatory networks from expression data sets. Such a knowledge is fundamental for many applications, such as disease treatment, therapeutic intervention strategies and drugs design, as well as for planning high-throughput new experiments. Methods have been developed for gene networks modeling and identification from expression profiles. However, an important open problem regards how to validate such approaches and its results. This work presents an objective approach for validation of gene network modeling and identification which comprises the following three main aspects: (1) Artificial Gene Networks (AGNs) model generation through theoretical models of complex networks, which is used to simulate temporal expression data; (2) a computational method for gene network identification from the simulated data, which is founded on a feature selection approach where a target gene is fixed and the expression profile is observed for all other genes in order to identify a relevant subset of predictors; and (3) validation of the identified AGN-based network through comparison with the original network. The proposed framework allows several types of AGNs to be generated and used in order to simulate temporal expression data. The results of the network identification method can then be compared to the original network in order to estimate its properties and accuracy. Some of the most important theoretical models of complex networks have been assessed: the uniformly-random Erdos-Renyi (ER), the small-world Watts-Strogatz (WS), the scale-free Barabasi-Albert (BA), and geographical networks (GG). The experimental results indicate that the inference method was sensitive to average degree k variation, decreasing its network recovery rate with the increase of k. The signal size was important for the inference method to get better accuracy in the network identification rate, presenting very good results with small expression profiles. However, the adopted inference method was not sensible to recognize distinct structures of interaction among genes, presenting a similar behavior when applied to different network topologies. In summary, the proposed framework, though simple, was adequate for the validation of the inferred networks by identifying some properties of the evaluated method, which can be extended to other inference methods.
Resumo:
Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over-or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data is from different cell lines of glioblastoma tumors assayed with Agilent microarrays.
Resumo:
In 2003-2004, several food items were purchased from large commercial outlets in Coimbra, Portugal. Such items included meats (chicken, pork, beef), eggs, rice, beans and vegetables (tomato, carrot, potato, cabbage, broccoli, lettuce). Elemental analysis was carried out through INAA at the Technological and Nuclear Institute (ITN, Portugal), the Nuclear Energy Centre for Agriculture (CENA, Brazil), and the Nuclear Engineering Teaching Lab of the University of Texas at Austin (NETL, USA). At the latter two, INAA was also associated to Compton suppression. It can be concluded that by applying Compton suppression (1) the detection limits for arsenic, copper and potassium improved; (2) the counting-statistics error for molybdenum diminished; and (3) the long-lived zinc had its 1115-keV photopeak better defined. In general, the improvement sought by introducing Compton suppression in foodstuff analysis was not significant. Lettuce, cabbage and chicken (liver, stomach, heart) are the richest diets in terms of human nutrients.
Resumo:
In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.
Resumo:
In a sample of censored survival times, the presence of an immune proportion of individuals who are not subject to death, failure or relapse, may be indicated by a relatively high number of individuals with large censored survival times. In this paper the generalized log-gamma model is modified for the possibility that long-term survivors may be present in the data. The model attempts to separately estimate the effects of covariates on the surviving fraction, that is, the proportion of the population for which the event never occurs. The logistic function is used for the regression model of the surviving fraction. Inference for the model parameters is considered via maximum likelihood. Some influence methods, such as the local influence and total local influence of an individual are derived, analyzed and discussed. Finally, a data set from the medical area is analyzed under the log-gamma generalized mixture model. A residual analysis is performed in order to select an appropriate model.
Resumo:
This paper proposes a regression model considering the modified Weibull distribution. This distribution can be used to model bathtub-shaped failure rate functions. Assuming censored data, we consider maximum likelihood and Jackknife estimators for the parameters of the model. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and we also present some ways to perform global influence. Besides, for different parameter settings, sample sizes and censoring percentages, various simulations are performed and the empirical distribution of the modified deviance residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended for a martingale-type residual in log-modified Weibull regression models with censored data. Finally, we analyze a real data set under log-modified Weibull regression models. A diagnostic analysis and a model checking based on the modified deviance residual are performed to select appropriate models. (c) 2008 Elsevier B.V. All rights reserved.
Resumo:
In this study, regression models are evaluated for grouped survival data when the effect of censoring time is considered in the model and the regression structure is modeled through four link functions. The methodology for grouped survival data is based on life tables, and the times are grouped in k intervals so that ties are eliminated. Thus, the data modeling is performed by considering the discrete models of lifetime regression. The model parameters are estimated by using the maximum likelihood and jackknife methods. To detect influential observations in the proposed models, diagnostic measures based on case deletion, which are denominated global influence, and influence measures based on small perturbations in the data or in the model, referred to as local influence, are used. In addition to those measures, the local influence and the total influential estimate are also employed. Various simulation studies are performed and compared to the performance of the four link functions of the regression models for grouped survival data for different parameter settings, sample sizes and numbers of intervals. Finally, a data set is analyzed by using the proposed regression models. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
A comparative study between microsatellite and allozyme markers was conducted on the genetic structure and mating system in natural populations of Euterpe edulis Mart. Three cohorts, including seedlings, saplings, and adults, were examined in 4 populations using 10 allozyme loci and 10 microsatellite loci. As expected, microsatellite markers had a much higher degree of polymorphism than allozymes, but estimates of multilocus outcrossing rate ((t) over cap (m) = 1.00), as well as estimates of genetic structure (F(IS), G(ST)), were similar for the 2 sets of markers. Estimates of R(ST), for microsatellites, were higher than those of GST, but results of both statistics revealed a close agreement for the genetic structure of the species. This study provides support for the important conclusion that allozymes are still useful and reliable markers to estimate population genetic parameters. Effects of sample size on estimates from hypervariable loci are also discussed in this paper.
Resumo:
In this paper, we introduce a Bayesian analysis for survival multivariate data in the presence of a covariate vector and censored observations. Different ""frailties"" or latent variables are considered to capture the correlation among the survival times for the same individual. We assume Weibull or generalized Gamma distributions considering right censored lifetime data. We develop the Bayesian analysis using Markov Chain Monte Carlo (MCMC) methods.
A bivariate regression model for matched paired survival data: local influence and residual analysis
Resumo:
The use of bivariate distributions plays a fundamental role in survival and reliability studies. In this paper, we consider a location scale model for bivariate survival times based on the proposal of a copula to model the dependence of bivariate survival data. For the proposed model, we consider inferential procedures based on maximum likelihood. Gains in efficiency from bivariate models are also examined in the censored data setting. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the bivariate regression model for matched paired survival data. Sensitivity analysis methods such as local and total influence are presented and derived under three perturbation schemes. The martingale marginal and the deviance marginal residual measures are used to check the adequacy of the model. Furthermore, we propose a new measure which we call modified deviance component residual. The methodology in the paper is illustrated on a lifetime data set for kidney patients.
Resumo:
The multivariate skew-t distribution (J Multivar Anal 79:93-113, 2001; J R Stat Soc, Ser B 65:367-389, 2003; Statistics 37:359-363, 2003) includes the Student t, skew-Cauchy and Cauchy distributions as special cases and the normal and skew-normal ones as limiting cases. In this paper, we explore the use of Markov Chain Monte Carlo (MCMC) methods to develop a Bayesian analysis of repeated measures, pretest/post-test data, under multivariate null intercept measurement error model (J Biopharm Stat 13(4):763-771, 2003) where the random errors and the unobserved value of the covariate (latent variable) follows a Student t and skew-t distribution, respectively. The results and methods are numerically illustrated with an example in the field of dentistry.