40 resultados para exploratory spatial data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

O objetivo deste estudo foi avaliar as taxas de mortalidade por câncer de boca no período de 1991-2001, no município de Bauru-SP. A fonte de informação utilizada para o reconhecimento e seleção da população-alvo foram Certidões de Óbito dos Cartórios do município de Bauru com dados relativos ao período 1991-2001. Foram coletadas informações referentes a sexo, idade, localização da lesão e endereço. A coleta dos endereços visou à identificação no mapa do município de Bauru da localização geográfica do domicílio. Utilizando ferramentas do geoprocessamento, foi feita a inserção no mapa dos casos identificados. Foram registrados 67 casos de morte por câncer de boca na cidade de Bauru entre 1991 e 2001, com maiores taxas no sexo masculino e sexta década de vida. A análise da distribuição espacial mostra que a maioria dos casos encontra-se próxima à linha férrea que corta o município e foi responsável, em grande parte, pela ocupação territorial pela população, sendo esta também uma área que abrange os bairros mais antigos do município. O câncer de boca constitui importante causa de óbito no município, requerendo um planejamento de ações georreferenciadas pelo sistema local de saúde.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Due to the imprecise nature of biological experiments, biological data is often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contaminations in laboratorial samples. It is the case of gene expression data, where the equipments and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. This evaluation analyzes the effectiveness of the techniques investigated in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objetivou-se verificar a prevalência de deficiência auditiva referida pela população urbana de quatro localidades do Estado de São Paulo, Brasil, e estudar as causas atribuídas e variáveis sócio-demográficas. Foi realizado um estudo transversal de base populacional com dados referentes à população com 12 anos ou mais residente nas quatro localidades, em 2001 e 2002. Participaram 5.250 sujeitos selecionados por amostragem probabilística, estratificada e selecionada por conglomerados, em dois estágios. A análise dos dados foi exploratória, incluindo análise bivariada e regressão logística múltipla. A prevalência de deficiência auditiva foi 5,21%, mais acentuada nas faixas etárias acima de 59 anos (18,7%), que referiram doenças nos 15 dias anteriores à entrevista (8,4%), com transtorno mental comum (8,85%) e que fizeram uso de medicamentos nos últimos 3 dias (8,45%). O estudo dos fatores que se associam à deficiência auditiva direcionam intervenções de saúde para que atendam as reais necessidades da população, principalmente na atenção primária. Há necessidade de mais estudos populacionais com enfoque na audição, visto que esta é uma área escassa de publicações no Brasil.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes. Methods: Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries. Results: Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues. Conclusion: To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Dermatomyositis (DM) and polymyositis (PM) are rare systemic autoimmune rheumatic diseases with high fatality rates. There have been few population-based mortality studies of dermatomyositis and polymyositis in the world, and none have been conducted in Brazil. The objective of the present study was to employ multiple-cause of-death methodology in the analysis of trends in mortality related to dermatomyositis and polymyositis in the state of Sao Paulo, Brazil, between 1985 and 2007. Methods: We analyzed mortality data from the Sao Paulo State Data Analysis System, selecting all death certificates on which DM or PM was listed as a cause of death. The variables sex, age and underlying, associated or total mentions of causes of death were studied using mortality rates, proportions and historical trends. Statistical analysis were performed by chi-square and H Kruskal-Wallis tests, variance analysis and linear regression. A p value less than 0.05 was regarded as significant. Results: Over a 23-year period, there were 318 DM-related deaths and 316 PM-related deaths. Overall, DM/PM was designated as an underlying cause in 55.2% and as an associated cause in 44.8%; among 634 total deaths females accounted for 71.5%. During the study period, age-and gender-adjusted DM mortality rates did not change significantly, although PM as an underlying cause and total mentions of PM trended lower (p < 0.05). The mean ages at death were 47.76 +/- 20.81 years for DM and 54.24 +/- 17.94 years for PM (p = 0.0003). For DM/PM, respectively, as underlying causes, the principal associated causes of death were as follows: pneumonia (in 43.8%/33.5%); respiratory failure (in 34.4%/32.3%); interstitial pulmonary diseases and other pulmonary conditions (in 28.9%/17.6%); and septicemia (in 22.8%/15.9%). For DM/PM, respectively, as associated causes, the following were the principal underlying causes of death: respiratory disorders (in 28.3%/26.0%); circulatory disorders (in 17.4%/20.5%); neoplasms (in 16.7%/13.7%); infectious and parasitic diseases (in 11.6%/9.6%); and gastrointestinal disorders (in 8.0%/4.8%). Of the 318 DM-related deaths, 36 involved neoplasms, compared with 20 of the 316 PM-related deaths (p = 0.03). Conclusions: Our study using multiple cause of deaths found that DM/PM were identified as the underlying cause of death in only 55.2% of the deaths, indicating that both diseases were underestimated in the primary mortality statistics. We observed a predominance of deaths in women and in older individuals, as well as a trend toward stability in the mortality rates. We have confirmed that the risk of death is greater when either disease is accompanied by neoplasm, albeit to lesser degree in individuals with PM. The investigation of the underlying and associated causes of death related to DM/PM broaden the knowledge of the natural history of both diseases and could help integrate mortality data for use in the evaluation of control measures for DM/PM.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recurrences are close returns of a given state in a time series, and can be used to identify different dynamical regimes and other related phenomena, being particularly suited for analyzing experimental data. In this work, we use recurrence quantification analysis to investigate dynamical patterns in scalar data series obtained from measurements of floating potential and ion saturation current at the plasma edge of the Tokamak Chauffage Alfveacuten Breacutesilien [R. M. O. Galva approximate to o , Plasma Phys. Controlled Fusion 43, 1181 (2001)]. We consider plasma discharges with and without the application of radial electric bias, and also with two different regimes of current ramp. Our results indicate that biasing improves confinement through destroying highly recurrent regions within the plasma column that enhance particle and heat transport.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Alternative splicing of gene transcripts greatly expands the functional capacity of the genome, and certain splice isoforms may indicate specific disease states such as cancer. Splice junction microarrays interrogate thousands of splice junctions, but data analysis is difficult and error prone because of the increased complexity compared to differential gene expression analysis. We present Rank Change Detection (RCD) as a method to identify differential splicing events based upon a straightforward probabilistic model comparing the over-or underrepresentation of two or more competing isoforms. RCD has advantages over commonly used methods because it is robust to false positive errors due to nonlinear trends in microarray measurements. Further, RCD does not depend on prior knowledge of splice isoforms, yet it takes advantage of the inherent structure of mutually exclusive junctions, and it is conceptually generalizable to other types of splicing arrays or RNA-Seq. RCD specifically identifies the biologically important cases when a splice junction becomes more or less prevalent compared to other mutually exclusive junctions. The example data is from different cell lines of glioblastoma tumors assayed with Agilent microarrays.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A rapid method for classification of mineral waters is proposed. The discrimination power was evaluated by a novel combination of chemometric data analysis and qualitative multi-elemental fingerprints of mineral water samples acquired from different regions of the Brazilian territory. The classification of mineral waters was assessed using only the wavelength emission intensities obtained by inductively coupled plasma optical emission spectrometry (ICP OES), monitoring different lines of Al, B, Ba, Ca, Cl, Cu, Co, Cr, Fe, K, Mg, Mn, Na, Ni, P, Pb, S, Sb, Si, Sr, Ti, V, and Zn, and Be, Dy, Gd, In, La, Sc and Y as internal standards. Data acquisition was done under robust (RC) and non-robust (NRC) conditions. Also, the combination of signal intensities of two or more emission lines for each element were evaluated instead of the individual lines. The performance of two classification-k-nearest neighbor (kNN) and soft independent modeling of class analogy (SIMCA)-and preprocessing algorithms, autoscaling and Pareto scaling, were evaluated for the ability to differentiate between the various samples in each approach tested (combination of robust or non-robust conditions with use of individual lines or sum of the intensities of emission lines). It was shown that qualitative ICP OES fingerprinting in combination with multivariate analysis is a promising analytical tool that has potential to become a recognized procedure for rapid authenticity and adulteration testing of mineral water samples or other material whose physicochemical properties (or origin) are directly related to mineral content.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Despite modern weed control practices, weeds continue to be a threat to agricultural production. Considering the variability of weeds, a classification methodology for the risk of infestation in agricultural zones using fuzzy logic is proposed. The inputs for the classification are attributes extracted from estimated maps for weed seed production and weed coverage using kriging and map analysis and from the percentage of surface infested by grass weeds, in order to account for the presence of weed species with a high rate of development and proliferation. The output for the classification predicts the risk of infestation of regions of the field for the next crop. The risk classification methodology described in this paper integrates analysis techniques which may help to reduce costs and improve weed control practices. Results for the risk classification of the infestation in a maize crop field are presented. To illustrate the effectiveness of the proposed system, the risk of infestation over the entire field is checked against the yield loss map estimated by kriging and also with the average yield loss estimated from a hyperbolic model.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aims: We aimed to evaluate if the co-localisation of calcium and necrosis in intravascular ultrasound virtual histology (IVUS-VH) is due to artefact, and whether this effect can be mathematically estimated. Methods and results: We hypothesised that, in case calcium induces an artefactual coding of necrosis, any addition in calcium content would generate an artificial increment in the necrotic tissue. Stent struts were used to simulate the ""added calcium"". The change in the amount and in the spatial localisation of necrotic tissue was evaluated before and after stenting (n=17 coronary lesions) by means of a especially developed imaging software. The area of ""calcium"" increased from a median of 0.04 mm(2) at baseline to 0.76 mm(2) after stenting (p<0.01). In parallel the median necrotic content increased from 0.19 mm(2) to 0.59 mm(2) (p<0.01). The ""added"" calcium strongly predicted a proportional increase in necrosis-coded tissue in the areas surrounding the calcium-like spots (model R(2)=0.70; p<0.001). Conclusions: Artificial addition of calcium-like elements to the atherosclerotic plaque led to an increase in necrotic tissue in virtual histology that is probably artefactual. The overestimation of necrotic tissue by calcium strictly followed a linear pattern, indicating that it may be amenable to mathematical correction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we compare three residuals to assess departures from the error assumptions as well as to detect outlying observations in log-Burr XII regression models with censored observations. These residuals can also be used for the log-logistic regression model, which is a special case of the log-Burr XII regression model. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and the empirical distribution of each residual is displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the modified martingale-type residual in log-Burr XII regression models with censored data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A four-parameter extension of the generalized gamma distribution capable of modelling a bathtub-shaped hazard rate function is defined and studied. The beauty and importance of this distribution lies in its ability to model monotone and non-monotone failure rate functions, which are quite common in lifetime data analysis and reliability. The new distribution has a number of well-known lifetime special sub-models, such as the exponentiated Weibull, exponentiated generalized half-normal, exponentiated gamma and generalized Rayleigh, among others. We derive two infinite sum representations for its moments. We calculate the density of the order statistics and two expansions for their moments. The method of maximum likelihood is used for estimating the model parameters and the observed information matrix is obtained. Finally, a real data set from the medical area is analysed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Joint generalized linear models and double generalized linear models (DGLMs) were designed to model outcomes for which the variability can be explained using factors and/or covariates. When such factors operate, the usual normal regression models, which inherently exhibit constant variance, will under-represent variation in the data and hence may lead to erroneous inferences. For count and proportion data, such noise factors can generate a so-called overdispersion effect, and the use of binomial and Poisson models underestimates the variability and, consequently, incorrectly indicate significant effects. In this manuscript, we propose a DGLM from a Bayesian perspective, focusing on the case of proportion data, where the overdispersion can be modeled using a random effect that depends on some noise factors. The posterior joint density function was sampled using Monte Carlo Markov Chain algorithms, allowing inferences over the model parameters. An application to a data set on apple tissue culture is presented, for which it is shown that the Bayesian approach is quite feasible, even when limited prior information is available, thereby generating valuable insight for the researcher about its experimental results.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The stock market suffers uncertain relations throughout the entire negotiation process, with different variables exerting direct and indirect influence on stock prices. This study focuses on the analysis of certain aspects that may influence these values offered by the capital market, based on the Brazil Index of the Sao Paulo Stock Exchange (Bovespa), which selects 100 stocks among the most traded on Bovespa in terms of number of trades and financial volume. The selected variables are characterized by the companies` activity area and the business volume in the month of data collection, i.e. April/2007. This article proposes an analysis that joins the accounting view of the stock price variables that can be influenced with the use of multivariate qualitative data analysis. Data were explored through Correspondence Analysis (Anacor) and Homogeneity Analysis (Homals). According to the research, the selected variables are associated with the values presented by the stocks, which become an internal control instrument and a decision-making tool when it comes to choosing investments.