989 results for Structure mining
Abstract:
Precision of released figures is not only an important quality feature of official statistics, it is also essential for a good understanding of the data. In this paper we show a case study of how precision could be conveyed if the multivariate nature of data has to be taken into account. In the official release of the Swiss earnings structure survey, the total salary is broken down into several wage components. We follow Aitchison's approach for the analysis of compositional data, which is based on logratios of components. We first present different multivariate analyses of the compositional data whereby the wage components are broken down by economic activity classes. Then we propose a number of ways to assess precision
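As an illustration of the logratio idea, the following minimal sketch computes the centered logratio (clr) transform, one of the standard logratio transforms in Aitchison's framework. The abstract does not say which logratio transform the paper uses, so the choice of clr is an assumption, and the wage shares below are hypothetical values, not survey data.

```python
import math

def clr(composition):
    """Centered logratio transform of a composition with strictly positive parts."""
    if any(x <= 0 for x in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]

# Hypothetical shares of three wage components (not taken from the survey).
shares = [0.80, 0.15, 0.05]
clr_shares = clr(shares)

# The same composition expressed in percent gives the same clr coordinates,
# because the transform depends only on ratios between parts.
clr_percent = clr([80.0, 15.0, 5.0])
```

The clr coordinates sum to zero and do not depend on the scale of the input, which is why analyses of compositions are usually carried out on logratios rather than on the raw shares.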
Abstract:
BACKGROUND Spain shows the highest bladder cancer incidence rates in men among European countries. The most important risk factors are tobacco smoking and occupational exposure to a range of different chemical substances, such as aromatic amines. METHODS This paper describes the municipal distribution of bladder cancer mortality and attempts to "adjust" this spatial pattern for the prevalence of smokers, using the autoregressive spatial model proposed by Besag, York and Mollié, with relative risk of lung cancer mortality as a surrogate. RESULTS It has been possible to compile and ascertain the posterior distribution of relative risk for bladder cancer adjusted for lung cancer mortality, on the basis of a single Bayesian spatial model covering all of Spain's 8077 towns. Maps were plotted depicting smoothed relative risk (RR) estimates, and the distribution of the posterior probability of RR>1 by sex. Towns that registered the highest relative risks for both sexes were mostly located in the Provinces of Cadiz, Seville, Huelva, Barcelona and Almería. The highest-risk area in Barcelona Province corresponded to very specific municipal areas in the Bages district, e.g., Suría, Sallent, Balsareny, Manresa and Cardona. CONCLUSION Mining/industrial pollution and the risk entailed in certain occupational exposures could in part be dictating the pattern of municipal bladder cancer mortality in Spain. Population exposure to arsenic is a matter that calls for attention. It would be of great interest if the relationship between the chemical quality of drinking water and the frequency of bladder cancer could be studied.
Abstract:
Lymphatic filarial (LF) parasites have been under anti-filarial drug pressure for more than half a century. Currently, annual mass drug administration (MDA) of diethylcarbamazine (DEC) or ivermectin in combination with albendazole (ALB) has been used globally to eliminate LF. Long-term chemotherapies exert significant pressure on the genetic structure of parasitic populations. We investigated the genetic variation among 210 Wuchereria bancrofti populations that were under three different chemotherapy strategies, namely MDA with DEC alone (group I, n = 74), MDA with DEC and ALB (group II, n = 60) and selective therapy (ST) with DEC (group III, n = 34) to understand the impact of these three drug regimens on the parasite genetic structure. Randomly amplified polymorphic DNA profiles were generated for the three groups of parasite populations; the gene diversity, gene flow and genetic distance values were determined and phylogenetic trees were constructed. Analysis of these parameters indicated that parasite populations under ST with a standard dose of DEC (group III) were genetically more diverse (H = 0.2660) than parasite populations under MDA with DEC alone (group I, H = 0.2197) or with DEC + ALB (group II, H = 0.2317). These results indicate that the MDA may reduce the genetic diversity of W. bancrofti populations when compared to the genetic diversity of parasite populations under ST.
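For readers unfamiliar with the H values quoted above, here is a minimal sketch of Nei's gene diversity, computed per locus as one minus the sum of squared allele frequencies and then averaged over loci. It assumes allele frequencies are already known; estimating them from dominant RAPD band patterns requires further assumptions that the abstract does not describe. The frequencies below are hypothetical.

```python
def locus_diversity(allele_freqs):
    """Nei's gene diversity at one locus: 1 - sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)

def mean_diversity(loci):
    """Average gene diversity over a list of loci."""
    return sum(locus_diversity(freqs) for freqs in loci) / len(loci)

# Hypothetical two-allele loci (not data from the study).
loci = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
H = mean_diversity(loci)
```

A locus fixed for one allele contributes zero, and a two-allele locus contributes at most 0.5, so a lower average H reflects more loci at or near fixation.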
Abstract:
In response to Catani et al., we show that corticospinal pathways adhere via sharp turns to two local grid orientations; that our studies have three times the diffusion resolution of those compared; and that the noted technical concerns, including crossing angles, do not challenge the evidence of mathematically specific geometric structure. Thus, the geometric thesis gives the best account of the available evidence.
Abstract:
Over the past decade, significant interest has been expressed in relating the spatial statistics of surface-based reflection ground-penetrating radar (GPR) data to those of the imaged subsurface volume. A primary motivation for this work is that changes in the radar wave velocity, which largely control the character of the observed data, are expected to be related to corresponding changes in subsurface water content. Although previous work has indeed indicated that the spatial statistics of GPR images are linked to those of the water content distribution of the probed region, a viable method for quantitatively analyzing the GPR data and solving the corresponding inverse problem has not yet been presented. Here we address this issue by first deriving a relationship between the 2-D autocorrelation of a water content distribution and that of the corresponding GPR reflection image. We then show how a Bayesian inversion strategy based on Markov chain Monte Carlo sampling can be used to estimate the posterior distribution of subsurface correlation model parameters that are consistent with the GPR data. Our results indicate that if the underlying assumptions are valid and we possess adequate prior knowledge regarding the water content distribution, in particular its vertical variability, this methodology allows not only for the reliable recovery of lateral correlation model parameters but also for estimates of parameter uncertainties. In the case where prior knowledge regarding the vertical variability of water content is not available, the results show that the methodology still reliably recovers the aspect ratio of the heterogeneity.
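The sampling step can be illustrated with a minimal random-walk Metropolis sketch. This is not the authors' inversion: the target below is a hypothetical one-parameter toy posterior (a Gaussian centred on a "correlation length" of 2.0 with standard deviation 0.5, restricted to positive values), chosen only to show how Markov chain Monte Carlo sampling yields both a parameter estimate and its uncertainty.

```python
import math
import random

def log_posterior(corr_length):
    """Hypothetical toy log-posterior (up to a constant) for one model parameter."""
    if corr_length <= 0:
        return -math.inf  # a correlation length must be positive
    return -0.5 * ((corr_length - 2.0) / 0.5) ** 2

def metropolis(log_target, start, step, n_samples, rng):
    """Random-walk Metropolis sampler with a Gaussian proposal."""
    samples = []
    current = start
    current_lp = log_target(current)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = metropolis(log_posterior, start=1.0, step=0.5, n_samples=20000, rng=rng)
kept = samples[2000:]  # discard burn-in

post_mean = sum(kept) / len(kept)
post_sd = math.sqrt(sum((s - post_mean) ** 2 for s in kept) / (len(kept) - 1))
```

The spread of the retained samples (`post_sd`) plays the role of the parameter uncertainty mentioned in the abstract; in a real inversion the log-posterior would compare a modelled autocorrelation with the one measured from the GPR image.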
Abstract:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The combination of both positive and negative acquisition results was performed during data mining to simplify the process and interrogate a larger lipidome into a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
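The data-reduction step, randomly selecting a fixed number of spectra from a region of interest, might look like the sketch below. The function name, the use of integer spectrum indices, and the ROI size of 1000 are all hypothetical; the abstract states only that the selection was random and gave a 10-fold reduction.

```python
import random

def subsample_spectra(roi_indices, reduction_factor=10, seed=None):
    """Randomly keep 1/reduction_factor of the spectra in a region of interest."""
    rng = random.Random(seed)
    n_keep = max(1, len(roi_indices) // reduction_factor)
    # Sampling without replacement, so no spectrum is selected twice.
    return sorted(rng.sample(roi_indices, n_keep))

# Hypothetical ROI containing 1000 spectra, identified by index.
roi = list(range(1000))
subset = subsample_spectra(roi, reduction_factor=10, seed=42)
```

Fixing the seed makes the selection reproducible, which helps when the same subset has to be reused across the later multivariate analyses.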
Abstract:
This project investigates how to find predictors using clustering techniques and reviews free data mining software. The research is based on a case study in which, in addition to the free KDD software used by the scientific community, a new free tool for pre-processing the data is presented. The predictors are intended for the e-learning domain, since the data from which they are inferred are student grades from different e-learning environments. The case study not only tests clustering algorithms but also proposes additional goals.
Abstract:
To evaluate whether environmental heterogeneity contributes to the genetic heterogeneity in Anopheles triannulatus, larval habitat characteristics across the Brazilian states of Roraima and Pará and genetic sequences were examined. A comparison with Anopheles goeldii was utilised to determine whether high genetic diversity was unique to An. triannulatus. Student's t-test and analysis of variance found no differences in habitat characteristics between the species. Analysis of population structure of An. triannulatus and An. goeldii revealed distinct demographic histories in a largely overlapping geographic range. Cytochrome oxidase I sequence parsimony networks found geographic clustering for both species; however, nuclear marker networks depicted An. triannulatus with a more complex history of fragmentation, secondary contact and recent divergence. Evidence of Pleistocene expansions suggests both species are more likely to be genetically structured by geographic and ecological barriers than demography. We hypothesise that niche partitioning is a driving force for diversity, particularly in An. triannulatus.
Abstract:
Human T-cell lymphotropic virus type 1 (HTLV-1) is mainly associated with two diseases: tropical spastic paraparesis/HTLV-1-associated myelopathy (TSP/HAM) and adult T-cell leukaemia/lymphoma. This retrovirus infects 5-10 million individuals throughout the world. Previously, we developed a database that annotates sequence data from GenBank and the present study aimed to describe the clinical, molecular and epidemiological scenarios of HTLV-1 infection through the stored sequences in this database. A total of 2,545 registered complete and partial sequences of HTLV-1 were collected and 1,967 (77.3%) of those sequences represented unique isolates. Among these isolates, 93% contained geographic origin information and only 39% were related to any clinical status. A total of 1,091 sequences contained information about the geographic origin and viral subtype and 93% of these sequences were identified as subtype “a”. Ethnicity data are very scarce. Regarding clinical status data, 29% of the sequences were generated from TSP/HAM and 67.8% from healthy carrier individuals. Although the data mining enabled some inferences about specific aspects of HTLV-1 infection to be made, due to the relative scarcity of data of available sequences, it was not possible to delineate a global scenario of HTLV-1 infection.
Abstract:
Species range shifts in response to climate and land use change are commonly forecasted with species distribution models based on species occurrence or abundance data. Although appealing, these models ignore the genetic structure of species, and the fact that different populations might respond in different ways because of adaptation to their environment. Here, we introduced ancestry distribution models, that is, statistical models of the spatial distribution of ancestry proportions, for forecasting intra-specific changes based on genetic admixture instead of species occurrence data. Using multi-locus genotypes and extensive geographic coverage of distribution data across the European Alps, we applied this approach to 20 alpine plant species considering a global increase in temperature from 0.25 to 4 °C. We forecasted the magnitudes of displacement of contact zones between plant populations potentially adapted to warmer environments and other populations. While a global trend of movement in a north-east direction was predicted, the magnitude of displacement was species-specific. For a temperature increase of 2 °C, contact zones were predicted to move by 92 km on average (minimum of 5 km, maximum of 212 km) and by 188 km for an increase of 4 °C (minimum of 11 km, maximum of 393 km). Intra-specific turnover, measuring the extent of change in global population genetic structure, was generally found to be moderate for 2 °C of temperature warming. For 4 °C of warming, however, the models indicated substantial intra-specific turnover for ten species. These results illustrate that, in spite of unavoidable simplifications, ancestry distribution models open new perspectives to forecast population genetic changes within species and complement more traditional distribution-based approaches.