957 results for Gases, Rare--Statistical methods.
Abstract:
The Gumbel distribution is perhaps the most widely applied statistical distribution for problems in engineering. We propose a generalization, referred to as the Kumaraswamy Gumbel distribution, and provide a comprehensive treatment of its structural properties. We obtain the analytical shapes of the density and hazard rate functions. We calculate explicit expressions for the moments and generating function. The variation of the skewness and kurtosis measures is examined, and the asymptotic distribution of the extreme values is investigated. Explicit expressions are also derived for the moments of order statistics. Maximum likelihood, parametric bootstrap, and a Bayesian procedure are proposed for estimating the model parameters. We obtain the expected information matrix. An application of the new model to a real dataset illustrates the potential of the proposed model. Two bivariate generalizations of the model are proposed.
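The abstract does not restate the construction, but a minimal sketch is possible assuming the standard Kumaraswamy-G family applied to a Gumbel baseline: if G(x) is the Gumbel cumulative distribution function with location mu and scale sigma, the generalized distribution function and density would take the form

```latex
G(x) = \exp\!\left\{-e^{-(x-\mu)/\sigma}\right\}, \qquad
F(x) = 1 - \left[1 - G(x)^{a}\right]^{b}, \qquad
f(x) = a\,b\,g(x)\,G(x)^{a-1}\left[1 - G(x)^{a}\right]^{b-1},
```

with shape parameters a, b > 0 and g(x) the Gumbel density; setting a = b = 1 recovers the ordinary Gumbel distribution.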
Abstract:
Many discussions have enlarged the literature in Bibliometrics since the Hirsch proposal, the so-called h-index. Ranking papers according to their citations, this index quantifies a researcher solely by the largest number h of his or her papers that are each cited at least h times. A closed formula for the h-index distribution that can be applied to distinct databases is not yet known. In fact, to obtain such a distribution, knowledge of the citation distribution of the authors and its specificities is required. Instead of dealing with randomly chosen researchers, here we address different groups based on distinct databases. The first group is composed of physicists and biologists, with data extracted from the Institute of Scientific Information (ISI). The second group is composed of computer scientists, with data extracted from the Google Scholar system. In this paper, we obtain a general formula for the h-index probability density function (pdf) for groups of authors by using generalized exponentials in the context of escort probability. Our analysis includes the use of several statistical methods to estimate the necessary parameters. An exhaustive comparison among the candidate distributions is also used to describe the way citations are distributed among authors. The h-index pdf should be used to classify groups of researchers from a quantitative point of view, which is of particular interest as a replacement for obscure qualitative methods. (C) 2011 Elsevier B.V. All rights reserved.
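As a concrete illustration of the definition used above (not taken from the paper itself), the h-index of a citation record can be computed directly from sorted per-paper citation counts; the sketch below assumes a plain list of counts as input.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    # Sort counts in decreasing order and find the last rank whose
    # citation count still meets or exceeds that rank.
    sorted_counts = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(sorted_counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Example: five papers with these citation counts give h = 3.
print(h_index([10, 8, 5, 2, 1]))  # -> 3
```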
Abstract:
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Importantly, only a few metrics of complex networks have been used so far, and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method, the ROUGE score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable for producing good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the ROUGE score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and for treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
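The abstract does not spell out the ranking procedure, but a generic network-based extractive summarizer of the kind it builds on can be sketched as follows: sentences become nodes, edges connect sentences sharing words, and a centrality metric (betweenness here, one of the measures named above) ranks sentences for extraction. The function name and the word-overlap edge rule are illustrative assumptions, not the authors' exact method.

```python
import itertools
import networkx as nx

def summarize(sentences, n_keep=2):
    """Rank sentences by betweenness centrality in a word co-occurrence graph."""
    # One node per sentence; connect sentences that share any word.
    words = [set(s.lower().split()) for s in sentences]
    g = nx.Graph()
    g.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        if words[i] & words[j]:
            g.add_edge(i, j)
    # Keep the most central sentences, preserving their original order.
    scores = nx.betweenness_centrality(g)
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_keep])
    return [sentences[i] for i in top]

print(summarize([
    "Complex networks model texts.",
    "Network metrics rank sentences.",
    "Unrelated filler sentence here.",
    "Metrics such as betweenness rank texts well.",
]))
```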
Abstract:
Genes involved in host-pathogen interactions are often strongly affected by positive natural selection. The Duffy antigen, coded by the Duffy antigen receptor for chemokines (DARC) gene, serves as a receptor for Plasmodium vivax in humans and for Plasmodium knowlesi in some nonhuman primates. In the majority of sub-Saharan Africans, a single-nucleotide variant in the GATA-1 binding site of the gene promoter is responsible for the nonexpression of the Duffy antigen on red blood cells and consequently for resistance to invasion by P. vivax. The Duffy antigen also acts as a receptor for chemokines and is expressed in red blood cells and many other tissues of the body. Because of this dual role, we sequenced a 3,000-bp region encompassing the entire DARC gene as well as part of its 5' and 3' flanking regions in a phylogenetic sample of primates and used statistical methods to evaluate the nature of selection pressures acting on the gene during its evolution. We analyzed both coding and regulatory regions of the DARC gene. The regulatory analysis showed accelerated rates of substitution at several sites near known motifs. Our tests of positive selection in the coding region using maximum likelihood by branch sites and maximum likelihood by codon sites did not yield statistically significant evidence for the action of positive selection. However, the maximum likelihood test in which the gene was subdivided into different structural regions showed that the known binding region for P. vivax/P. knowlesi is under very different selective pressures than the remainder of the gene. In fact, most of the gene appears to be under strong purifying selection, but this is not evident in the binding region. We suggest that the binding region is under the influence of two opposing selective pressures, positive selection possibly exerted by the parasite and purifying selection exerted by chemokines.
Abstract:
Chaabene, H, Hachana, Y, Franchini, E, Mkaouer, B, Montassar, M, and Chamari, K. Reliability and construct validity of the karate-specific aerobic test. J Strength Cond Res 26(12): 3454-3460, 2012-The aim of this study was to examine the absolute and relative reliability and external responsiveness of the karate-specific aerobic test (KSAT). This study comprised 43 male karatekas; 19 of them participated in the first study, to establish test-retest reliability, and 40, selected on the basis of their karate experience and level of practice, participated in the second study, to identify the external responsiveness of the KSAT. The latter group was divided into 2 categories: a national-level group (Gn) and a regional-level group (Gr). Analysis showed excellent test-retest reliability of time to exhaustion (TE), with an intraclass correlation coefficient ICC(3,1) >0.90, a standard error of measurement (SEM) <5% (3.2%), and a mean difference (bias) +/- the 95% limits of agreement of -9.5 +/- 78.8 seconds. There was a significant difference between the test and retest sessions in peak lactate concentration (Peak [La]) (9.12 +/- 2.59 vs. 8.05 +/- 2.67 mmol.L-1; p < 0.05) but not in peak heart rate (HRpeak) or rating of perceived exertion (RPE) (196 +/- 9 vs. 194 +/- 9 b.min(-1) and 7.6 +/- 0.93 vs. 7.8 +/- 1.15; p > 0.05), respectively. National-level karate athletes (1,032 +/- 101 seconds) performed better than regional-level athletes (841 +/- 134 seconds) in TE during the KSAT (p < 0.001). Thus, the KSAT provided good external responsiveness. The area under the receiver operating characteristic curve was >0.70 (0.86; 95% confidence interval: 0.72-0.95). A significant difference was detected in Peak [La] between the national-level (6.09 +/- 1.78 mmol.L-1) and regional-level (8.48 +/- 2.63 mmol.L-1) groups, but not in HRpeak (194 +/- 8 vs. 195 +/- 8 b.min(-1)) or RPE (7.57 +/- 1.15 vs. 7.42 +/- 1.1), respectively. The results of this study indicate that the KSAT provides excellent absolute and relative reliability. The KSAT can effectively distinguish karate athletes of different competitive levels and may thus be suitable for field assessment of the aerobic fitness of karate practitioners.
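For reference, the reliability statistics quoted above are conventionally related as follows (standard definitions assumed here, not restated in the abstract), with SD the between-subject standard deviation of TE and d-bar and SD_d the mean and standard deviation of the test-retest differences:

```latex
\mathrm{SEM} = SD\,\sqrt{1-\mathrm{ICC}}, \qquad
\text{95\% limits of agreement} = \bar{d} \pm 1.96\,SD_{d}.
```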
Abstract:
OBJECTIVE: To describe a series of patients with lacrimal drainage system obstruction associated with radioiodine therapy (RIT) for the treatment of thyroid carcinoma, and to review the clinical data and the response to surgical treatment of this rare complication. METHODS: A retrospective analysis was performed of the ophthalmological findings of patients with a history of thyroid carcinoma previously submitted to thyroidectomy and RIT who were referred for lacrimal drainage surgery. RESULTS: Seventeen patients with thyroid carcinoma treated with thyroidectomy and RIT presented symptomatic nasolacrimal duct obstruction after a mean period of 13.2 months from cancer treatment. Eleven patients had bilateral epiphora, 8 with lacrimal sac mucocele. Patient ages ranged from 30 to 80 years, with 10 aged 49 years or younger. The mean cumulative dose of radioiodine administered was 571 mCi (range: 200-1200 mCi). Symptoms of nasal obstruction and salivary gland enlargement occurred in 53% of the patients. All patients underwent dacryocystorhinostomy. It was also observed that the 3 youngest patients had greater intraoperative bleeding and lacrimal sac dilation. Complete resolution of epiphora and dacryocystitis occurred in 82.4%, and resolution was partial in 17.6% (3 patients maintained a unilateral complaint after correction of the bilateral obstruction). Mean follow-up was 6 months (range: 2-24 months). CONCLUSIONS: A high cumulative dose of radioiodine and nasal and salivary gland dysfunction are associated with lacrimal drainage system obstruction. A higher percentage of younger patients presenting with dacryocystitis is observed when compared with idiopathic dacryostenosis. The uptake of radioactive iodine by the nasolacrimal duct mucosa, with subsequent inflammation, edema, and fibrosis, appears to be directly related to the nasolacrimal duct obstruction. Awareness of this complication is important for the proper workup and management of these patients.
Abstract:
The use of statistical methods to analyze large databases of text has been useful in unveiling patterns of human behavior and establishing historical links between cultures and languages. In this study, we identified literary movements by treating books published from 1590 to 1922 as complex networks, whose metrics were analyzed with multivariate techniques to generate six clusters of books. These clusters correspond to time periods coinciding with relevant literary movements over the last five centuries. The most important factor contributing to the distinctions between literary styles was the average shortest path length, in particular the asymmetry of its distribution. Furthermore, over time there has emerged a trend toward larger average shortest path lengths, which is correlated with increased syntactic complexity, and toward a more uniform use of words, reflected in a smaller power-law coefficient for the word-frequency distribution. Changes in literary style were also found to be driven by opposition to earlier writing styles, as revealed by the analysis performed with geometrical concepts. The approaches adopted here are generic and may be extended to analyze a number of features of languages and cultures.
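As an illustration of the key measurement mentioned above, the per-node shortest-path-length distribution and its asymmetry can be computed with standard tools; the sketch below uses an arbitrary toy graph and assumes a word-adjacency network has already been built for each book.

```python
import networkx as nx
from scipy.stats import skew

def path_length_stats(g):
    """Mean and skewness of the distribution of shortest path lengths."""
    lengths = [d for _, targets in nx.shortest_path_length(g)
               for _, d in targets.items() if d > 0]
    return sum(lengths) / len(lengths), skew(lengths)

# Toy network standing in for one book's word-adjacency graph.
g = nx.erdos_renyi_graph(50, 0.1, seed=1)
mean_l, asym = path_length_stats(g)
print(f"average shortest path length = {mean_l:.2f}, skewness = {asym:.2f}")
```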
Abstract:
Background Statistical methods for estimating usual intake require at least two short-term dietary measurements in a subsample of the target population. However, the percentage of individuals with a second dietary measurement (the replication rate) may influence the precision of estimates such as percentiles and the proportions of individuals below intake cut-offs. Objective To investigate the precision of usual food intake estimates using different replication rates and different sample sizes. Participants/setting Adolescents participating in the continuous National Health and Nutrition Examination Survey 2007-2008 (n=1,304) who completed two 24-hour recalls. Statistical analyses performed The National Cancer Institute method was used to estimate the usual intake of dark green vegetables in the original sample comprising 1,304 adolescents with a replication rate of 100%. A bootstrap with 100 replications was performed to estimate CIs for percentiles and for the proportions of individuals below intake cut-offs. Using the same bootstrap replications, four sets of data sets were sampled with different replication rates (80%, 60%, 40%, and 20%). For each data set created, the National Cancer Institute method was applied and percentiles, CIs, and proportions of individuals below cut-offs were calculated. Precision was checked by comparing each CI obtained from the data sets with different replication rates with the CI obtained from the original data set. Further, we sampled 1,000, 750, 500, and 250 individuals from the original data set and performed the same analytical procedures. Results Percentiles of intake and the percentage of individuals below the cut-off points were similar across replication rates and sample sizes, but the CIs widened as the replication rate decreased. Wider CIs were observed at replication rates of 40% and 20%. Conclusions The precision of the usual intake estimates decreased when low replication rates were used. However, even with different sample sizes, replication rates >40% may not lead to an important loss of precision. J Acad Nutr Diet. 2012;112:1015-1020.
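As a generic illustration of how bootstrap replications yield the confidence intervals discussed above (the percentile bootstrap in its simplest form, not the National Cancer Institute method itself), consider estimating a CI for a percentile of intake; the simulated gamma intakes are purely illustrative.

```python
import numpy as np

def bootstrap_percentile_ci(intakes, pct=50, n_boot=100, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for a given percentile of the intake distribution."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        # Resample individuals with replacement and recompute the percentile.
        resample = rng.choice(intakes, size=len(intakes), replace=True)
        estimates.append(np.percentile(resample, pct))
    lower = np.percentile(estimates, 100 * alpha / 2)
    upper = np.percentile(estimates, 100 * (1 - alpha / 2))
    return np.percentile(intakes, pct), (lower, upper)

# Simulated daily intakes (arbitrary units) standing in for recall data.
intakes = np.random.default_rng(1).gamma(shape=2.0, scale=30.0, size=1304)
point, ci = bootstrap_percentile_ci(intakes)
print(f"median intake = {point:.1f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
```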
Abstract:
Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretical measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. As a case study, the methodology is also illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results revealed no statistically significant difference in predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
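A minimal sketch of the kind of evaluation described above, fitting an ordinary (naive) logistic regression to simulated credit data and reporting sensitivity, specificity, and accuracy; the simulated covariates, coefficients, and 0.5 threshold are illustrative assumptions, and the state-dependent sample selection model is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))                            # simulated client features
p_default = 1 / (1 + np.exp(-(x @ [1.0, -0.8, 0.5] - 1.5)))
y = rng.binomial(1, p_default)                         # 1 = default, 0 = good payer

model = LogisticRegression().fit(x, y)
y_hat = (model.predict_proba(x)[:, 1] >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y, y_hat).ravel()
print(f"sensitivity = {tp / (tp + fn):.3f}")
print(f"specificity = {tn / (tn + fp):.3f}")
print(f"accuracy    = {(tp + tn) / n:.3f}")
```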
Abstract:
In this article, we propose a new Bayesian flexible cure rate survival model, which generalises the stochastic model of Klebanov et al. [Klebanov LB, Rachev ST and Yakovlev AY. A stochastic model of radiation carcinogenesis - latent time distributions and their properties. Math Biosci 1993; 113: 51-75], and has much in common with the destructive model formulated by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP, Brazil, 2009 (accepted in Lifetime Data Analysis)]. In our approach, the accumulated number of lesions or altered cells follows a compound weighted Poisson distribution. This model is more flexible than the promotion time cure model in terms of dispersion. Moreover, it possesses an interesting and realistic interpretation of the biological mechanism of the occurrence of the event of interest, as it includes a destructive process of tumour cells after an initial treatment or the capacity of an individual exposed to irradiation to repair altered cells that result in cancer induction. In other words, what is recorded is only the damaged portion of the original number of altered cells not eliminated by the treatment or repaired by the repair system of an individual. Markov chain Monte Carlo (MCMC) methods are then used to develop Bayesian inference for the proposed model. Some discussions on model selection and an illustration with a cutaneous melanoma data set analysed by Rodrigues et al. [Rodrigues J, de Castro M, Balakrishnan N and Cancho VG. Destructive weighted Poisson cure rate models. Technical Report, Universidade Federal de Sao Carlos, Sao Carlos-SP, Brazil, 2009 (accepted in Lifetime Data Analysis)] are also presented.
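For context on the baseline being generalized, the standard promotion time cure model (a well-known form, assumed here rather than taken from the abstract) treats the number N of initiated cells as Poisson with mean theta and their activation times as i.i.d. with distribution F, giving

```latex
S_{\mathrm{pop}}(t) \;=\; \mathbb{E}\!\left[\{1 - F(t)\}^{N}\right] \;=\; \exp\{-\theta F(t)\},
\qquad
p_0 \;=\; \lim_{t\to\infty} S_{\mathrm{pop}}(t) \;=\; e^{-\theta}.
```

The proposed model replaces this Poisson count by a compound weighted Poisson count for the cells surviving the destructive process, which is the source of the additional dispersion flexibility mentioned above.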
Abstract:
Abstract Background Several mathematical and statistical methods have been proposed in the last few years to analyze microarray data. Most of these methods involve complicated formulas and software implementations that require advanced computer programming skills. Researchers from other areas may experience difficulties when attempting to use these methods in their research. Here we present a user-friendly toolbox which allows large-scale gene expression analysis to be carried out by biomedical researchers with limited programming skills. Results Here, we introduce a user-friendly toolbox called GEDI (Gene Expression Data Interpreter), an extensible, open-source, and freely available tool that we believe will be useful to a wide range of laboratories and to researchers with no background in Mathematics and Computer Science, allowing them to analyze their own data by applying both classical and advanced approaches developed and recently published by Fujita et al. Conclusion GEDI is an integrated user-friendly viewer that combines the state-of-the-art SVR, DVAR and SVAR algorithms, previously developed by us. It facilitates the application of SVR, DVAR and SVAR beyond the mathematical formulas presented in the corresponding publications, and allows one to better understand the results by means of the available visualizations. Both running the statistical methods and visualizing the results are carried out within the graphical user interface, rendering these algorithms accessible to the broad community of researchers in Molecular Biology.
Abstract:
Background The genetic mechanisms underlying interindividual blood pressure variation reflect the complex interplay of both genetic and environmental variables. The current standard statistical methods for detecting genes involved in the regulation of complex traits are based on univariate analysis. Few studies have focused on the search for, and understanding of, quantitative trait loci responsible for gene × environment interactions or on multiple-trait analysis. Composite interval mapping has been extended to multiple traits and may be an interesting approach to such a problem. Methods We used multiple-trait analysis for quantitative trait locus mapping of loci having different effects on systolic blood pressure with NaCl exposure. The animals studied were 188 rats, the progeny of an F2 intercross between a hypertensive and a normotensive strain, genotyped at 179 polymorphic markers across the rat genome. To accommodate the correlational structure of measurements taken in the same animals, we applied univariate and multivariate strategies for analyzing the data. Results We detected a new quantitative trait locus in a region close to marker R589 on chromosome 5 of the rat genome, not previously identified through serial analysis of individual traits. In addition, we were able to justify analytically the parametric restrictions, in terms of regression coefficients, responsible for the gain in precision with the adopted analytical approach. Conclusion Future work should focus on fine mapping and the identification of the causative variant responsible for this quantitative trait locus signal. The multivariate strategy might be valuable in the study of genetic determinants of interindividual variation in antihypertensive drug effectiveness.
Abstract:
Abstract Background To understand the molecular mechanisms underlying important biological processes, a detailed description of the networks of gene products involved is required. In order to define and understand such molecular networks, several statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. First, information flow needs to be inferred, in addition to the correlation between genes. Second, we usually try to identify large networks from a large number of genes (parameters) using a smaller number of microarray experiments (samples). Because this situation is rather frequent in Bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results We have applied the SVAR model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets. Conclusion The proposed SVAR method is able to model gene regulatory networks in the frequent situation in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.
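A minimal sketch of the underlying idea, assuming a first-order sparse VAR fitted gene by gene with an L1 penalty; the penalty value and coefficient threshold are illustrative, and the paper's exact estimator and false-discovery-rate test are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_var_edges(expr, alpha=0.1, tol=1e-8):
    """Infer directed edges j -> i when gene j at time t-1 predicts gene i at t."""
    # expr: (time points x genes) matrix; fewer rows than columns is fine for Lasso.
    past, present = expr[:-1], expr[1:]
    edges = []
    for i in range(expr.shape[1]):
        coef = Lasso(alpha=alpha).fit(past, present[:, i]).coef_
        edges.extend((j, i) for j in np.flatnonzero(np.abs(coef) > tol))
    return edges

# Toy data: 10 time points, 20 genes, with gene 0 driving gene 1.
rng = np.random.default_rng(0)
expr = rng.normal(size=(10, 20))
expr[1:, 1] += 0.9 * expr[:-1, 0]
print(sparse_var_edges(expr))
```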
Abstract:
The objective of this thesis is to improve the understanding of the processes and mechanisms that affect the distribution of polychlorinated biphenyls (PCBs) and organic carbon in coastal sediments. Because of the strong association of hydrophobic organic contaminants (HOCs) such as PCBs with organic matter in the aquatic environment, these two entities are naturally linked. The coastal environment is the most complex and dynamic part of the ocean when it comes to the cycling of both organic matter and HOCs. This environment is characterised by the largest fluxes and most diverse sources of both entities. A wide array of methods was used to study these processes throughout this thesis. At the field sites in the Stockholm archipelago of the Baltic proper, bottom sediments and settling particulate matter were retrieved using sediment coring devices and sediment traps at morphometrically and seismically well-characterized locations. In the laboratory, the samples were analysed for PCBs, stable carbon isotope ratios, carbon-nitrogen atom ratios, and standard sediment properties. From the fieldwork in the Stockholm archipelago and the subsequent laboratory work it was concluded that the inner Stockholm archipelago has a low (≈ 4%) trapping efficiency for freshwater-derived organic carbon. The corollary is a large potential for long-range waterborne transport of OC and OC-associated nutrients and hydrophobic organic pollutants from urban Stockholm to more pristine offshore Baltic Sea ecosystems. Theoretical work was carried out using Geographical Information Systems (GIS) and statistical methods on a database of 4214 individual sediment samples, each with reported individual PCB congener concentrations. From this work it was concluded that continental shelf sediments are key global inventories and ultimate sinks of PCBs. Depending on the congener, 10-80% of the cumulative historical emissions to the environment are accounted for in continental shelf sediments. It was further concluded that the many infamous and highly contaminated surface sediments of urban harbours and estuaries of contaminated rivers cannot be important as a secondary source sustaining the concentrations observed in remote sediments. Of the global shelf PCB inventory, < 1% is in sediments near population centres while ≥ 90% is in remote areas (> 10 km from any dwellings). The remote sub-basin of the North Atlantic Ocean contains approximately half of the global shelf sediment inventory for most of the PCBs studied.
Abstract:
Master's in Oceanography