978 results for Blog datasets
Abstract:
BACKGROUND: The goals of our study are to determine the most appropriate model for alcohol consumption as an exposure for burden of disease, to analyze the effect of the chosen alcohol consumption distribution on the estimation of alcohol Population-Attributable Fractions (PAFs), and to characterize the chosen distribution by exploring whether there is a global relationship within the distribution. METHODS: To identify the best model, the Log-Normal, Gamma, and Weibull prevalence distributions were examined using data from 41 surveys from Gender, Alcohol and Culture: An International Study (GENACIS) and from the European Comparative Alcohol Study. To assess the effect of these distributions on the estimated alcohol PAFs, we calculated the alcohol PAF for diabetes, breast cancer, and pancreatitis using the three above-named distributions and using the more traditional approach based on categories. The relationship between the mean and the standard deviation of the Gamma distribution was estimated using data from 851 datasets for 66 countries from GENACIS and from the STEPwise approach to Surveillance from the World Health Organization. RESULTS: The Log-Normal distribution provided a poor fit to the survey data; the Gamma and Weibull distributions provided better fits. Additionally, our analyses showed no marked differences between the alcohol PAF estimates based on the Gamma or Weibull distributions and PAFs based on categorical alcohol consumption estimates. The standard deviation of the alcohol distribution was highly dependent on the mean: a unit increase in mean consumption was associated with an increase in the standard deviation of 1.258 (95% CI: 1.223 to 1.293) (R2 = 0.9207) for women and 1.171 (95% CI: 1.144 to 1.197) (R2 = 0.9474) for men. CONCLUSIONS: Although the Gamma and Weibull distributions provided similar results, the Gamma distribution is recommended for modeling alcohol consumption from population surveys due to its fit, flexibility, and the ease with which it can be modified. The results showed that a large share of the variance of the standard deviation of the alcohol consumption Gamma distribution was explained by mean alcohol consumption, allowing alcohol consumption to be modeled through a Gamma distribution using only average consumption.
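As a back-of-the-envelope illustration of the concluding point, here is a minimal sketch, assuming Python with SciPy, of how a consumption distribution could be reconstructed from mean consumption alone via the fitted mean-to-standard-deviation ratios reported above; the 1.258 and 1.171 coefficients come from the abstract, while the function name and example numbers are invented.

```python
from scipy import stats

# Fitted ratios from the abstract: SD ~ 1.258 * mean (women), 1.171 * mean (men).
SD_RATIO = {"women": 1.258, "men": 1.171}

def gamma_from_mean(mean_grams_per_day, sex):
    """Build a Gamma consumption distribution from mean consumption alone.

    For a Gamma distribution, mean = k * theta and var = k * theta**2,
    so k = (mean / sd)**2 and theta = sd**2 / mean.
    """
    sd = SD_RATIO[sex] * mean_grams_per_day
    k = (mean_grams_per_day / sd) ** 2      # shape parameter
    theta = sd ** 2 / mean_grams_per_day    # scale parameter
    return stats.gamma(a=k, scale=theta)

# Illustrative example: share of women drinking more than 40 g/day
# in a population whose mean consumption is 12 g/day.
dist = gamma_from_mean(12.0, "women")
print(f"P(consumption > 40 g/day) = {dist.sf(40.0):.3f}")
```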
Abstract:
Cancer genomes frequently contain somatic copy number alterations (SCNA) that can significantly perturb the expression level of affected genes and thus disrupt pathways controlling normal growth. In melanoma, many studies have focussed on the copy number and gene expression levels of the BRAF, PTEN and MITF genes, but little has been done to identify new genes using these parameters at the genome-wide scale. Using karyotyping, SNP and CGH arrays, and RNA-seq, we have identified SCNA affecting gene expression ('SCNA-genes') in seven human metastatic melanoma cell lines. We showed that the combination of these techniques is useful to identify candidate genes potentially involved in tumorigenesis. Since few of these alterations were recurrent across our samples, we used a protein network-guided approach to determine whether any pathways were enriched in SCNA-genes in one or more samples. From this unbiased genome-wide analysis, we identified 28 significantly enriched pathway modules. Comparison with two large, independent melanoma SCNA datasets showed less than 10% overlap at the individual gene level, but network-guided analysis revealed 66% shared pathways, including all but three of the pathways identified in our data. Frequently altered pathways included WNT, cadherin signalling, angiogenesis and melanogenesis. Additionally, our results emphasize the potential of the EPHA3 and FRS2 gene products, involved in angiogenesis and migration, as possible therapeutic targets in melanoma. Our study demonstrates the utility of network-guided approaches, for both large and small datasets, to identify pathways recurrently perturbed in cancer.
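The abstract does not spell out the enrichment statistic; a common building block for this kind of network-guided pathway analysis is a hypergeometric over-representation test of SCNA-genes within each pathway module. A hedged sketch in Python follows; the gene sets, placeholder gene names, and background size are illustrative, not data from the study.

```python
from scipy.stats import hypergeom

def pathway_enrichment(scna_genes, pathway_genes, background_size):
    """P-value of observing at least this many SCNA-genes in a pathway,
    under random sampling from the background gene set."""
    overlap = len(scna_genes & pathway_genes)
    # hypergeom.sf(k - 1, M, n, N) = P(X >= k) with M background genes,
    # n pathway genes, and N SCNA-genes drawn.
    return hypergeom.sf(overlap - 1, background_size,
                        len(pathway_genes), len(scna_genes))

# Toy input (gene symbols other than EPHA3/FRS2/MITF are placeholders).
scna = {"EPHA3", "FRS2", "MITF", "GENE4"}
wnt_pathway = {"EPHA3", "FRS2", "GENE7", "GENE8", "GENE9"}
print(pathway_enrichment(scna, wnt_pathway, background_size=20000))
```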
Abstract:
Mammals are characterized by specific phenotypic traits that include lactation, hair, and relatively large brains with unique structures. Individual mammalian lineages have, in turn, evolved characteristic traits that distinguish them from others. These include obvious anatomical differences but also differences related to reproduction, life span, cognitive abilities, behavior, and disease susceptibility. However, the molecular basis of the diverse mammalian phenotypes and the selective pressures that shaped their evolution remain largely unknown. In the first part of my thesis, I analyzed the genetic factors associated with the origin of a unique mammalian phenotype, lactation, and I studied the selective pressures that forged the transition from oviparity to viviparity. Using a comparative genomics approach and evolutionary simulations, I showed that the emergence of lactation, as well as the appearance of the casein gene family, significantly reduced selective pressure on the major egg-yolk proteins (the vitellogenin family). This led to a progressive loss of vitellogenins, which - in oviparous species - act as storage proteins for lipids, amino acids, phosphorus and calcium in the isolated egg. The passage to internal fertilization and placentation in therian mammals rendered vitellogenins completely dispensable, which ended in the loss of the whole gene family in this lineage. As illustrated by the vitellogenin study, changes in gene content are one possible underlying factor for the evolution of mammalian-specific phenotypes. However, more subtle genomic changes, such as mutations in protein-coding sequences, can also greatly affect phenotypes. In particular, it has been proposed that changes at the level of gene regulation could underlie many (or even most) phenotypic differences between species. In the second part of my thesis, I participated in a major comparative study of mammalian tissue transcriptomes, with the goal of understanding how evolutionary forces affected expression patterns over the past 200 million years of mammalian evolution. I showed that, while comparisons of gene expression agree with the known species phylogeny, the rate of expression evolution varies greatly among lineages. Species with low effective population size, such as monotremes and hominoids, showed significantly accelerated rates of gene expression evolution. The most likely explanation for the high rate of gene expression evolution in these lineages is the accumulation of mildly deleterious mutations in regulatory regions, due to the low efficiency of purifying selection. Thus, our observations are in agreement with the nearly neutral theory of molecular evolution. I also describe substantial differences in evolutionary rates between tissues, with brain being the most constrained (especially in primates) and testis significantly accelerated. The rate of gene expression evolution also varies significantly between chromosomes. In particular, I observed an acceleration of gene expression changes on the X chromosome, probably as a result of adaptive processes associated with the origin of therian sex chromosomes. Lastly, I identified several individual genes, as well as co-regulated expression modules, that have undergone lineage-specific expression changes and likely underlie various phenotypic innovations in mammals.
The methods developed during my thesis, as well as the comprehensive gene content analyses and transcriptomics datasets made available by our group, will likely prove to be useful for further exploratory analyses of the diverse mammalian phenotypes.
Abstract:
Background: Systematic approaches for identifying proteins involved in different types of cancer are needed. Experimental techniques such as microarrays are being used to characterize cancer, but validating their results can be a laborious task. Computational approaches are used to prioritize genes putatively involved in cancer, usually by further analyzing experimental data. Results: We implemented a systematic method using the PIANA software that predicts cancer involvement of genes by integrating heterogeneous datasets. Specifically, we produced lists of genes likely to be involved in cancer by relying on: (i) protein-protein interactions; (ii) differential expression data; and (iii) structural and functional properties of cancer genes. The integrative approach that combines multiple sources of data obtained positive predictive values ranging from 23% (on a list of 811 genes) to 73% (on a list of 22 genes), outperforming the use of any of the data sources alone. We analyzed a list of 20 cancer gene predictions, finding that most of them have recently been linked to cancer in the literature. Conclusion: Our approach to identifying and prioritizing candidate cancer genes can be used to produce lists of genes likely to be involved in cancer. Our results suggest that differential expression studies yielding high numbers of candidate cancer genes can be filtered using protein interaction networks.
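For context, the positive predictive values quoted above are simply the fraction of predicted genes that turn out to be known cancer genes; a minimal sketch of that computation, with invented gene lists:

```python
def positive_predictive_value(predicted_genes, known_cancer_genes):
    """Fraction of predicted genes already annotated as cancer genes."""
    hits = sum(1 for g in predicted_genes if g in known_cancer_genes)
    return hits / len(predicted_genes)

# Toy example: a short candidate list evaluated against a reference set.
candidates = ["TP53", "BRCA1", "GENE_X", "MYC"]
reference = {"TP53", "BRCA1", "MYC", "KRAS"}
print(positive_predictive_value(candidates, reference))  # 0.75
```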
Abstract:
Long-range Terrestrial Laser Scanning (TLS) is widely used in studies on rock slope instabilities. TLS point clouds allow the creation of high-resolution digital elevation models for detailed mapping of landslide morphologies and the measurement of the orientation of main discontinuities. Multi-temporal TLS datasets enable the quantification of slope displacements and rockfall volumes. We present three case studies using TLS for the investigation and monitoring of rock slope instabilities in Norway: 1) the analysis of 3D displacement of the Oksfjellet rock slope failure (Troms, northern Norway); 2) the detection and quantification of rockfalls along the sliding surfaces and at the front of the Kvitfjellet rock slope instability (Møre og Romsdal, western Norway); 3) the analysis of discontinuities and rotational movements of an unstable block at Stampa (Sogn og Fjordane, western Norway). These case studies highlight the possibilities but also limitations of TLS in investigating and monitoring unstable rock slopes.
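As a rough illustration of how multi-temporal TLS datasets yield displacement estimates, the sketch below computes a simple cloud-to-cloud nearest-neighbour distance between two co-registered scans; this is a toy stand-in (assuming Python with NumPy/SciPy and synthetic points), not the workflow used in these case studies, which typically relies on more robust change-detection metrics such as M3C2.

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_distance(epoch1, epoch2):
    """Nearest-neighbour distance from each point of epoch 2 to epoch 1.

    A crude proxy for surface change between two co-registered TLS scans.
    """
    tree = cKDTree(epoch1)
    distances, _ = tree.query(epoch2)
    return distances

# Synthetic example: a flat 10 m x 10 m patch displaced 0.05 m along z.
rng = np.random.default_rng(0)
scan1 = rng.uniform(0, 10, size=(1000, 3))
scan1[:, 2] = 0.0
scan2 = scan1.copy()
scan2[:, 2] += 0.05
print(cloud_to_cloud_distance(scan1, scan2).mean())  # ~0.05 m
```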
Abstract:
The development of susceptibility maps for debris flows is of primary importance due to population pressure in hazardous zones. However, hazard assessment by process-based modelling at a regional scale is difficult due to the complex nature of the phenomenon, the variability of local controlling factors, and the uncertainty in modelling parameters. A regional assessment must therefore rely on a simplified approach that is not highly parameter-dependent and that can provide zonation with minimum data requirements. A distributed empirical model has thus been developed for regional susceptibility assessments using essentially a digital elevation model (DEM). The model is called Flow-R, for Flow path assessment of gravitational hazards at a Regional scale (available free of charge at www.flow-r.org), and has been successfully applied to case studies in various countries with variable data quality. It provides a substantial basis for a preliminary susceptibility assessment at a regional scale. The model was also found relevant for assessing other natural hazards such as rockfall, snow avalanches and floods. The model allows for automatic source area delineation, given user criteria, and for the assessment of the propagation extent based on various spreading algorithms and simple frictional laws. We developed a new spreading algorithm, an improved version of Holmgren's direction algorithm, that is less sensitive to small variations of the DEM and avoids over-channelization, and so produces more realistic extents (see the sketch below). The choice of datasets and algorithms is left to the user, which makes the model adaptable to various applications and levels of dataset availability. Among the possible datasets, the DEM is the only one strictly required for both the source area delineation and the propagation assessment; its quality is of major importance for the accuracy of the results. We consider a 10 m DEM resolution a good compromise between processing time and quality of results. However, valuable results have still been obtained from lower-quality DEMs with 25 m resolution.
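The following is a hedged sketch of the basic Holmgren multiple-flow-direction weighting with a height offset added to the central cell, which is the kind of modification described above as reducing sensitivity to small DEM variations; parameter names, default values, and the toy DEM are ours, and this is not the Flow-R implementation itself.

```python
import numpy as np

def holmgren_weights(dem, row, col, x=4.0, dh=1.0, cellsize=10.0):
    """Flow proportions from a cell to its 8 neighbours (Holmgren 1994),
    with a height offset dh added to the central cell, intended to smooth
    the response to small DEM variations.

    p_i = tan(beta_i)**x / sum_j tan(beta_j)**x over downslope neighbours.
    """
    z0 = dem[row, col] + dh
    weights = {}
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            dist = cellsize * np.hypot(dr, dc)
            tan_beta = (z0 - dem[row + dr, col + dc]) / dist
            if tan_beta > 0:  # spread to downslope neighbours only
                weights[(dr, dc)] = tan_beta ** x
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total else {}

# Toy DEM: elevation decreases toward the lower-right corner,
# so flow from the centre cell spreads in that direction.
dem = np.add.outer(np.arange(5, 0, -1), np.arange(5, 0, -1)).astype(float)
print(holmgren_weights(dem, 2, 2))
```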
Abstract:
Introduction: Responses to external stimuli are typically investigated by averaging peri-stimulus electroencephalography (EEG) epochs in order to derive event-related potentials (ERPs) across the electrode montage, under the assumption that signals related to the external stimulus are fixed in time across trials. We demonstrate the applicability of a single-trial model based on patterns of scalp topographies (De Lucia et al, 2007) that can be used for ERP analysis at the single-subject level. The model is able to classify new trials (or groups of trials) with minimal a priori hypotheses, using information derived from a training dataset. The features used for the classification (the topography of responses and their latency) can be interpreted neurophysiologically, because a difference in scalp topography indicates a different configuration of brain generators. Above-chance classification accuracy on test datasets implicitly demonstrates the suitability of this model for EEG data. Methods: The data analyzed in this study were acquired from two separate visual evoked potential (VEP) experiments. The first entailed passive presentation of checkerboard stimuli to each of the four visual quadrants (hereafter, "Checkerboard Experiment") (Plomp et al, submitted). The second entailed active discrimination of novel versus repeated line drawings of common objects (hereafter, "Priming Experiment") (Murray et al, 2004). Four subjects per experiment were analyzed, using approx. 200 trials per experimental condition. These trials were randomly separated into training (90%) and testing (10%) datasets in 10 independent shuffles. To perform the ERP analysis, we estimated the statistical distribution of voltage topographies with a Mixture of Gaussians (MofGs), which reduces the original dataset to a small number of representative voltage topographies. We then evaluated statistically the degree of presence of these template maps across trials and whether and when this presence differed across experimental conditions. Based on these differences, single trials or sets of a few single trials were classified as belonging to one or the other experimental condition. Classification performance was assessed using the area under the Receiver Operating Characteristic (ROC) curve. Results: For the Checkerboard Experiment, contrasts entailed left vs. right visual field presentations for upper and lower quadrants, separately. The average posterior probabilities, indicating the presence of the computed template maps in time and across trials, revealed significant differences starting at ~60-70 ms post-stimulus. The average ROC curve area across all four subjects was 0.80 and 0.85 for upper and lower quadrants, respectively, and was in all cases significantly higher than chance (unpaired t-test, p<0.0001). In the Priming Experiment, we contrasted initial versus repeated presentations of visual object stimuli. The posterior probabilities revealed significant differences starting at 250 ms post-stimulus onset. Classification accuracy with single-trial test data was at chance level; we therefore considered sub-averages of five single trials. We found that for three out of four subjects, classification rates were significantly above chance level (unpaired t-test, p<0.0001). Conclusions: The main advantage of the present approach is that it is based on topographic features that are readily interpretable along neurophysiological lines.
As these maps were previously normalized by the overall strength of the field potential on the scalp, a change in their presence across trials and between conditions necessarily reflects a change in the underlying generator configurations. The temporal periods of statistical difference between conditions were estimated for each training dataset across ten shuffles of the data. Across the ten shuffles and in both experiments, we observed a high level of consistency in the temporal periods over which the two conditions differed. With this method we are able to analyze ERPs at the single-subject level, providing a novel tool to compare normal electrophysiological responses against single cases that cannot be considered part of any cohort of subjects. This aspect promises to have a strong impact on both basic and clinical research.
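A schematic analogue of this pipeline, assuming Python with scikit-learn and synthetic data in place of real single-trial topographies (the original work uses its own single-trial model, not scikit-learn): fit a Mixture of Gaussians to trial topographies, treat the posterior probabilities of the template maps as features, and quantify condition discrimination with the ROC area.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

# Synthetic stand-in for single-trial voltage topographies:
# 200 trials x 64 electrodes, two conditions with a small topographic shift.
X = rng.normal(size=(200, 64))
y = np.repeat([0, 1], 100)
X[y == 1, :8] += 0.8

# 1) Reduce topographies to a few template maps via a Mixture of Gaussians.
mog = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(X)
posteriors = mog.predict_proba(X)  # presence of each template map per trial

# 2) Classify conditions from template-map posteriors; assess with ROC area.
train = rng.permutation(200)[:180]
test = np.setdiff1d(np.arange(200), train)
clf = LogisticRegression().fit(posteriors[train], y[train])
scores = clf.predict_proba(posteriors[test])[:, 1]
print("ROC AUC:", roc_auc_score(y[test], scores))
```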
Abstract:
BACKGROUND: Structural mutations (SMs) play a major role in cancer development. In some cancers, such as breast and ovarian, DNA double-strand breaks (DSBs) occur more frequently in transcribed regions, while in other cancer types, such as prostate, there is a consistent depletion of breakpoints in transcribed regions. Despite such regularity, little is understood about the mechanisms driving these effects. A few studies have suggested that protein binding may be relevant, e.g. in analyses of androgen receptor binding and active chromatin in specific cell types. We hypothesized that this behavior might be general, i.e. that a correlation between protein-DNA binding (and open chromatin) and breakpoint locations is common across divergent cancers. RESULTS: We investigated this hypothesis by comprehensively analyzing the relationship among 457 ENCODE protein binding ChIP-seq experiments, 125 DNase I and 24 FAIRE experiments, and 14,600 SMs from 8 diverse cancer datasets covering 147 samples. In most cancers, including breast and ovarian, we found enrichment of protein binding and open chromatin in the vicinity of SM breakpoints at distances up to 200 kb. Furthermore, for all cancer types we observed enhanced enrichment in regions distant from genes compared with regions proximal to genes, suggesting that the SM-induction mechanism is independent of the tendency of DSBs to occur near transcribed regions. We also observed a stronger effect for sites with more than one protein bound. CONCLUSIONS: Protein binding and open chromatin state are associated with nearby SM breakpoints in many cancer datasets. These observations suggest a consistent mechanism underlying SM locations across different cancers.
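The genome-wide analysis can be pictured with a toy, single-chromosome version: count breakpoints falling within 200 kb of binding sites and compare against uniformly shuffled breakpoint positions. The sketch below (Python/NumPy; all coordinates invented) illustrates the idea, not the study's actual pipeline.

```python
import numpy as np

def proximity_enrichment(breakpoints, sites, window=200_000,
                         chrom_len=100_000_000, n_perm=1000, seed=0):
    """Fold enrichment of breakpoints within `window` of binding sites,
    relative to uniformly shuffled breakpoint positions (toy
    one-chromosome version of the genome-wide analysis)."""
    rng = np.random.default_rng(seed)
    sites = np.sort(sites)

    def near(points):
        # Distance to the nearest site, via the two flanking candidates.
        idx = np.searchsorted(sites, points)
        left = sites[np.clip(idx - 1, 0, len(sites) - 1)]
        right = sites[np.clip(idx, 0, len(sites) - 1)]
        return np.minimum(np.abs(points - left),
                          np.abs(points - right)) <= window

    observed = near(breakpoints).mean()
    expected = np.mean([near(rng.integers(0, chrom_len,
                                          len(breakpoints))).mean()
                        for _ in range(n_perm)])
    return observed / expected

bps = np.array([1_000_000, 5_050_000, 60_000_000])
chip = np.array([950_000, 5_000_000, 30_000_000])
print(proximity_enrichment(bps, chip))
```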
Abstract:
Data mining can be defined as the extraction of previously unknown and potentially useful information from large datasets. The main principle is to devise computer programs that run through databases and automatically seek deterministic patterns. It is applied in different fields, e.g., remote sensing, biometry, and speech recognition, but has seldom been applied to forensic case data. The intrinsic difficulty in using such data lies in its heterogeneity, which stems from the many different sources of information. The aim of this study is to highlight potential uses of pattern recognition that would provide relevant results from a criminal intelligence point of view. The role of data mining within a global crime analysis methodology is to detect all types of structure in a dataset. Once filtered and interpreted, those structures can point to previously unseen criminal activities. The interpretation of patterns for intelligence purposes is the final stage of the process; it allows the researcher to validate the whole methodology and to refine each step if necessary. An application to cutting agents found in illicit drug seizures was performed. A combinatorial approach was taken, using the presence and absence of products. Methods from graph theory were used to extract patterns from data consisting of links between products and the place and date of seizure. A data mining process carried out using graphing techniques is called "graph mining". Patterns were detected that then had to be interpreted and compared with prior knowledge to establish their relevance. The illicit drug profiling process is in fact an intelligence process that uses predefined illicit drug classes to classify new samples. The methods proposed in this study could be used a priori to compare structures from pre-detection and post-detection patterns. This new knowledge of a repeated structure may provide valuable complementary information for profiling and become a source of intelligence.
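As a minimal illustration of the graph-mining idea, the sketch below (Python with networkx; product and seizure identifiers are invented) links cutting agents to seizure events in a bipartite graph and reads off connected components, one simple kind of structure an analyst would then interpret against prior knowledge.

```python
import networkx as nx

# Toy bipartite graph linking cutting agents to seizure events
# (all identifiers are invented).
G = nx.Graph()
seizures = {
    "seizure_A": {"caffeine", "paracetamol"},
    "seizure_B": {"caffeine", "paracetamol"},
    "seizure_C": {"lidocaine"},
}
for seizure, products in seizures.items():
    for product in products:
        G.add_edge(seizure, product)

# Connected components group seizures sharing cutting agents: a simple
# "graph mining" structure pointing to possibly related activities.
for component in nx.connected_components(G):
    print(sorted(component))
```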
Abstract:
The huge conservation interest that mammals attract and the large datasets that have been collected on them have propelled a diversity of global mammal prioritization schemes, but no comprehensive global mammal conservation strategy. We highlight some of the potential discrepancies between the schemes presented in this theme issue, including: conservation of species versus areas, reactive versus proactive conservation approaches, conservation knowledge versus action, levels of aggregation of indicators of trend, and issues of scale. We propose that recently collected global mammal data and many of the mammal prioritization schemes now available could be incorporated into a comprehensive global strategy for the conservation of mammals. The task of developing such a strategy should be coordinated by a super partes, authoritative institution (e.g. the International Union for Conservation of Nature, IUCN). The strategy would help funding agencies, conservation organizations and national institutions rapidly identify a number of short-term and long-term global conservation priorities, and act complementarily to achieve them.
Abstract:
This final degree project consists of creating a blog built around a single theme: the P.I.R (Psicólogo Interno Residente, or resident intern psychologist). This qualification was created in December 1998 by Royal Decree 2490/1998. It is obtained either by completing specialized training in a resident intern psychologist position within the health system, or through an application process, as provided for in the transitional provisions, for those who were practicing before the qualification was created.
Abstract:
In recent years, the potential of type-2 fuzzy sets for managing high levels of uncertainty in the subjective knowledge of experts or in numerical information has attracted attention mainly in control and pattern classification systems. One of the main challenges in designing a type-2 fuzzy logic system is how to estimate the parameters of the type-2 fuzzy membership function (T2MF) and the Footprint of Uncertainty (FOU) from imperfect and noisy datasets. This paper presents an automatic approach for learning and tuning Gaussian interval type-2 membership functions (IT2MFs), with application to multi-dimensional pattern classification problems. T2MFs and their FOUs are tuned according to the uncertainties in the training dataset by a combination of genetic algorithm (GA) and cross-validation techniques. In our GA-based approach, the chromosome has fewer genes than in other GA methods, and chromosome initialization is more precise. The proposed approach applies the interval type-2 fuzzy logic system (IT2FLS) to the problem of nodule classification in a lung Computer Aided Detection (CAD) system. The designed IT2FLS is compared with its type-1 fuzzy logic system (T1FLS) counterpart. The results demonstrate that the IT2FLS outperforms the T1FLS by more than 30% in terms of classification accuracy.
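A minimal sketch of a Gaussian IT2MF in the common uncertain-mean formulation, where the upper and lower membership functions delimit the FOU; this illustrates the object being tuned, not the paper's GA procedure, and all parameter values are invented.

```python
import numpy as np

def gaussian(x, m, sigma):
    return np.exp(-0.5 * ((x - m) / sigma) ** 2)

def it2mf_uncertain_mean(x, m1, m2, sigma):
    """Upper/lower membership of a Gaussian IT2MF whose mean lies in
    [m1, m2]; the band between the two curves is the FOU."""
    upper = np.where(x < m1, gaussian(x, m1, sigma),
             np.where(x > m2, gaussian(x, m2, sigma), 1.0))
    lower = np.minimum(gaussian(x, m1, sigma), gaussian(x, m2, sigma))
    return upper, lower

# Illustrative evaluation over a coarse grid.
x = np.linspace(-4, 8, 7)
up, lo = it2mf_uncertain_mean(x, m1=1.0, m2=3.0, sigma=1.5)
print(np.round(up, 3), np.round(lo, 3))
```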
Abstract:
An important problem in descriptive and prescriptive research in decision making is to identify regions of rationality, i.e., the areas for which heuristics are and are not effective. To map the contours of such regions, we derive probabilities that heuristics identify the best of m alternatives (m > 2) characterized by k attributes or cues (k > 1). The heuristics include a single variable (lexicographic), variations of elimination-by-aspects, equal weighting, hybrids of the preceding, and models exploiting dominance. We use twenty simulated and four empirical datasets for illustration. We further provide an overview by regressing heuristic performance on factors characterizing environments. Overall, sensible heuristics generally yield similar choices in many environments. However, selection of the appropriate heuristic can be important in some regions (e.g., if there is low inter-correlation among attributes/cues). Since our work assumes a hit or miss decision criterion, we conclude by outlining extensions for exploring the effects of different loss functions.
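The choice probabilities derived analytically in the paper can also be approximated by simulation; below is a hedged sketch (Python/NumPy, all settings invented) of the hit rate of a single-variable (lexicographic) rule when its cue correlates imperfectly with the true value of the alternatives.

```python
import numpy as np

def lexicographic_hit_rate(m=3, cue_validity=0.7, n_trials=20_000, seed=0):
    """Probability that choosing the alternative with the best cue also
    picks the alternative with the best true value, when the cue
    correlates with the value at `cue_validity`."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_trials):
        value = rng.normal(size=m)
        noise = rng.normal(size=m)
        cue = cue_validity * value + np.sqrt(1 - cue_validity**2) * noise
        hits += int(np.argmax(cue) == np.argmax(value))
    return hits / n_trials

print(lexicographic_hit_rate())  # well above the 1/m chance level
```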
Abstract:
Recently, kernel-based machine learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. This paper describes the use of kernel methods for processing large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot spot detection and monitoring network optimization, are discussed as well.
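A minimal sketch of the mapping task, assuming Python with scikit-learn and synthetic monitoring data: kernel ridge regression with an RBF kernel interpolates a continuous field from scattered stations. The paper's actual methods (e.g. support vector algorithms with network-specific tuning) are not reproduced here.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic stand-in for a monitoring network: scattered (x, y) stations
# with a measured continuous variable (e.g. a contamination level).
rng = np.random.default_rng(2)
stations = rng.uniform(0, 100, size=(150, 2))
measured = np.sin(stations[:, 0] / 15) + 0.1 * rng.normal(size=150)

# RBF-kernel ridge regression interpolates the field over new locations;
# in practice the kernel width and regularization are cross-validated.
model = KernelRidge(kernel="rbf", gamma=0.01, alpha=0.1).fit(stations, measured)
grid = np.array([[10.0, 20.0], [50.0, 50.0], [90.0, 80.0]])
print(model.predict(grid))
```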
Abstract:
An important policy issue in recent years concerns the number of people claiming disability benefits for reasons of incapacity for work. We distinguish between work disability, which may have its roots in economic and social circumstances, and health disability, which arises from clearly diagnosed medical conditions. Although there is a link between work and health disability, economic conditions, and in particular the business cycle and variations in the risk of unemployment over time and across localities, may play an important part in explaining both the stock of disability benefit claimants and inflows to and outflows from that stock. We employ a variety of cross-country and country-specific household panel data sets, as well as administrative data, to test whether disability benefit claims rise when unemployment is higher, and also to investigate the impact of unemployment rates on flows on and off the benefit rolls. We find strong evidence that local variations in unemployment have an important explanatory role for disability benefit receipt, with higher total enrolments, lower outflows from rolls and, often, higher inflows into disability rolls in regions and periods of above-average unemployment. Although general subjective measures of self-reported disability and longstanding illness are also positively associated with unemployment rates, inclusion of self-reported health measures does not eliminate the statistical relationship between unemployment rates and disability benefit receipt; indeed, including general measures of health often strengthens that underlying relationship. Intriguingly, we also find some evidence from the United Kingdom and the United States that the prevalence of self-reported objective specific indicators of disability is often pro-cyclical; that is, the incidence of specific forms of disability is pro-cyclical, whereas claims for disability benefits given specific health conditions are counter-cyclical. Overall, the analysis suggests that, for a range of countries and datasets, levels of claims for disability benefits are not simply related to changes in the incidence of health disability in the population and are strongly influenced by prevailing economic conditions. We discuss the policy implications of these various findings.
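The core empirical test can be pictured as a panel regression of disability benefit receipt on local unemployment with region and time fixed effects; a hedged sketch with statsmodels and synthetic data follows (all variable names and magnitudes invented, not the authors' specification).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic region-by-year panel where claim rates respond to unemployment.
rng = np.random.default_rng(3)
panel = pd.DataFrame(
    [(r, t) for r in range(20) for t in range(10)],
    columns=["region", "year"],
)
panel["unemployment"] = rng.uniform(3, 12, len(panel))
panel["claim_rate"] = (2.0 + 0.15 * panel["unemployment"]
                       + rng.normal(0, 0.3, len(panel)))

# Two-way fixed effects: region and year dummies absorb time-invariant
# regional traits and common shocks, as in cross-locality comparisons.
fit = smf.ols("claim_rate ~ unemployment + C(region) + C(year)",
              data=panel).fit()
print(fit.params["unemployment"])  # recovers the ~0.15 effect
```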