946 results for Legacy datasets
Abstract:
HIV virulence, i.e. the time of progression to AIDS, varies greatly among patients. As for other rapidly evolving pathogens of humans, it is difficult to know if this variance is controlled by the genotype of the host or that of the virus because the transmission chain is usually unknown. We apply the phylogenetic comparative approach (PCA) to estimate the heritability of a trait from one infection to the next, which indicates the control of the virus genotype over this trait. The idea is to use viral RNA sequences obtained from patients infected by HIV-1 subtype B to build a phylogeny, which approximately reflects the transmission chain. Heritability is measured statistically as the propensity for patients close in the phylogeny to exhibit similar infection trait values. The approach reveals that up to half of the variance in set-point viral load, a trait associated with virulence, can be heritable. Our estimate is significant and robust to noise in the phylogeny. We also check for the consistency of our approach by showing that a trait related to drug resistance is almost entirely heritable. Finally, we show the importance of taking into account the transmission chain when estimating correlations between infection traits. The fact that HIV virulence is, at least partially, heritable from one infection to the next has clinical and epidemiological implications. The difference between earlier studies and ours comes from the quality of our dataset and from the power of the PCA, which can be applied to large datasets and accounts for within-host evolution. The PCA opens new perspectives for approaches linking clinical data and evolutionary biology because it can be extended to study other traits or other infectious diseases.
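The statistical core of the approach, testing whether patients adjacent in the phylogeny have more similar trait values than random patient pairs, can be sketched as a label-sampling permutation test. This is a hypothetical illustration, not the authors' implementation (which uses formal phylogenetic comparative methods); `pairs` lists phylogenetically adjacent patients and `traits` their set-point viral loads:

```python
import random
import statistics

def phylo_signal(pairs, traits, n_perm=1000, seed=0):
    """Crude phylogenetic-signal score: mean absolute trait difference
    between phylogenetically adjacent patient pairs, compared with the
    same statistic after shuffling trait values across tips."""
    rng = random.Random(seed)
    observed = statistics.mean(abs(traits[a] - traits[b]) for a, b in pairs)
    tips = list(traits)
    null = []
    for _ in range(n_perm):
        shuffled = tips[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(tips, shuffled))
        null.append(statistics.mean(
            abs(traits[relabel[a]] - traits[relabel[b]]) for a, b in pairs))
    # One-sided p-value: fraction of label samplings at least as "heritable"
    # (i.e. with neighbours at least as similar) as the observed data.
    p = sum(1 for x in null if x <= observed) / n_perm
    return observed, p
```

A small observed difference with a small p-value indicates that phylogenetic neighbours share trait values, the signature of (partial) viral genetic control of the trait.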
Abstract:
Abstract. Terrestrial laser scanning (TLS) is one of the most promising surveying techniques for rock slope characterization and monitoring. Landslide and rockfall movements can be detected by means of comparison of sequential scans. One of the most pressing challenges of natural hazards is combined temporal and spatial prediction of rockfall. An outdoor experiment was performed to ascertain whether the TLS instrumental error is small enough to enable detection of precursory displacements of millimetric magnitude. It consists of a known displacement of three objects relative to a stable surface. Results show that millimetric changes cannot be detected by the analysis of the unprocessed datasets. Displacement measurements are improved considerably by applying Nearest Neighbour (NN) averaging, which reduces the error (1σ) by up to a factor of 6. This technique was applied to displacements prior to the April 2007 rockfall event at Castellfollit de la Roca, Spain. The maximum precursory displacement measured was 45 mm, approximately 2.5 times the standard deviation of the model comparison, hampering the distinction between actual displacement and instrumental error using conventional methodologies. Encouragingly, the precursory displacement was clearly detected by applying the NN averaging method. These results show that millimetric displacements prior to failure can be detected using TLS.
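The principle behind NN averaging is that the error of a mean of k independent measurements drops by a factor of √k, so averaging roughly 36 neighbours yields the reported factor-of-6 reduction. A toy sketch under simplifying assumptions (Gaussian instrumental noise, a 1-D ordered profile standing in for a real 3-D point cloud):

```python
import random
import statistics

def nn_average(values, k):
    """Replace each measurement by the mean of a window of ~k neighbours
    (a sliding window over an ordered profile, as a 1-D stand-in for
    nearest-neighbour averaging in a point cloud)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - k // 2)
        window = values[lo:lo + k]
        out.append(sum(window) / len(window))
    return out

rng = random.Random(42)
truth = [0.0] * 10_000                            # flat, stable surface
noisy = [v + rng.gauss(0, 3.0) for v in truth]    # 3 mm instrumental noise (1 sigma)

sigma_raw = statistics.stdev(noisy)
sigma_avg = statistics.stdev(nn_average(noisy, 36))  # ~36 neighbours -> ~6x smaller
```

With this synthetic data the raw standard deviation is near 3 mm while the averaged profile's is near 0.5 mm, which is why sub-noise precursory displacements become detectable.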
Abstract:
Background: Microarray data are frequently used to characterize the expression profile of a whole genome and to compare the characteristics of that genome under several conditions. Geneset analysis methods have been described previously to analyze the expression values of several genes related by known biological criteria (metabolic pathway, pathology signature, co-regulation by a common factor, etc.) at the same time; analyzing such related values together increases the power to discover the underlying biological mechanisms. Results: As several methods assume different null hypotheses, we propose to reformulate the main question that biologists seek to answer. To determine which genesets are associated with expression values that differ between two experiments, we focused on three ad hoc criteria: expression levels, the direction of individual gene expression changes (up- or down-regulation), and correlations between genes. We introduce the FAERI methodology, tailored from a two-way ANOVA to examine these criteria. The significance of the results was evaluated according to the self-contained null hypothesis, using label sampling or by inferring the null distribution from normally distributed random data. Evaluations performed on simulated data revealed that FAERI outperforms currently available methods for each type of set tested. We then applied the FAERI method to analyze three real-world datasets on hypoxia response. FAERI was able to detect more genesets than other methodologies, and the genesets selected were coherent with current knowledge of cellular response to hypoxia. Moreover, the genesets selected by FAERI were confirmed when the analysis was repeated on two additional related datasets. Conclusions: The expression values of genesets are associated with several biological effects. The underlying mathematical structure of the genesets allows for analysis of data from several genes at the same time.
Focusing on expression levels, the direction of the expression changes, and correlations, we showed that two-step data reduction allowed us to significantly improve the performance of geneset analysis using a modified two-way ANOVA procedure, and to detect genesets that current methods fail to detect.
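The role of a two-way (gene × condition) ANOVA in this setting is to separate a uniform shift of the whole geneset from genes moving in opposite directions. A minimal sketch of the interaction sum of squares, the term that captures such discordance (a generic illustration, not the FAERI code, which also applies two-step data reduction and significance testing):

```python
def interaction_ss(data):
    """data[g][c]: expression of gene g under condition c (one value per
    cell, replicate-free sketch). Returns the gene-by-condition interaction
    sum of squares of a two-way ANOVA decomposition: zero when all genes
    shift identically across conditions, positive when they disagree."""
    G, C = len(data), len(data[0])
    grand = sum(map(sum, data)) / (G * C)
    gene_mean = [sum(row) / C for row in data]
    cond_mean = [sum(data[g][c] for g in range(G)) / G for c in range(C)]
    return sum((data[g][c] - gene_mean[g] - cond_mean[c] + grand) ** 2
               for g in range(G) for c in range(C))
```

For example, two genes both going from 0 to 1 give an interaction of 0 (a coherent geneset shift), while one going up and the other down gives a positive interaction, flagging discordant directions of change.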
Abstract:
One of the important questions in biological evolution is to know if certain changes along protein coding genes have contributed to the adaptation of species. This problem is known to be biologically complex and computationally very expensive. It, therefore, requires efficient Grid or cluster solutions to overcome the computational challenge. We have developed a Grid-enabled tool (gcodeml) that relies on the PAML (codeml) package to help analyse large phylogenetic datasets on both Grids and computational clusters. Although we report on results for gcodeml, our approach is applicable and customisable to related problems in biology or other scientific domains.
Abstract:
Given a set of images of scenes containing different object categories (e.g. grass, roads), our objective is to discover these objects in each image and to use these object occurrences to perform scene classification (e.g. beach scene, mountain scene). We achieve this by using a supervised learning algorithm that is able to learn from few images, in order to facilitate the user's task. We use a probabilistic model to recognise the objects and then classify the scene based on their object occurrences. Experimental results are shown and evaluated to demonstrate the validity of our proposal. Object recognition performance is compared to the approaches of He et al. (2004) and Marti et al. (2001) using their own datasets. Furthermore, an unsupervised method is implemented in order to evaluate the advantages and disadvantages of our supervised classification approach versus an unsupervised one.
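The "scene from object occurrences" step can be pictured as classifying a normalised histogram of recognised object labels. The vocabulary, prototype histograms, and L1 distance below are illustrative assumptions, not the paper's probabilistic model:

```python
def object_histogram(region_labels, vocab):
    """Normalised histogram of object occurrences recognised in one image."""
    n = len(region_labels)
    return [region_labels.count(v) / n for v in vocab]

def classify_scene(hist, prototypes):
    """Assign the scene class whose prototype histogram is nearest (L1)."""
    def l1(u, v):
        return sum(abs(a - b) for a, b in zip(u, v))
    return min(prototypes, key=lambda cls: l1(hist, prototypes[cls]))

# Hypothetical object vocabulary and per-scene prototype histograms.
VOCAB = ["grass", "road", "sand", "sea"]
PROTOTYPES = {
    "beach scene": [0.0, 0.0, 0.5, 0.5],
    "mountain scene": [0.7, 0.3, 0.0, 0.0],
}
```

An image whose recognised regions are mostly sand and sea thus maps to the beach prototype regardless of region count, which is what makes object occurrences a compact scene descriptor.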
Abstract:
Neuroimaging studies analyzing neurophysiological signals are typically based on comparing averages of peri-stimulus epochs across experimental conditions. This approach can, however, be problematic in the case of high-level cognitive tasks, where response variability across trials is expected to be high, and in cases where subjects cannot be considered part of a group. The main goal of this thesis has been to address this issue by developing a novel approach for analyzing electroencephalography (EEG) responses at the single-trial level. This approach takes advantage of the spatial distribution of the electric field on the scalp (topography) and exploits repetitions across trials for quantifying the degree of discrimination between experimental conditions through a classification scheme. In the first part of this thesis, I developed and validated this new method (Tzovara et al., 2012a,b). Its general applicability was demonstrated with three separate datasets, two in the visual modality and one in the auditory. This development then made it possible to target two new lines of research, one in basic and one in clinical neuroscience, which represent the second and third parts of this thesis respectively. For the second part of this thesis (Tzovara et al., 2012c), I employed the developed method for assessing the timing of exploratory decision-making. Using single-trial topographic EEG activity during presentation of a choice's payoff, I could predict the subjects' subsequent decisions. This prediction was due to a topographic difference which appeared on average at ~516 ms after the presentation of the payoff and was subject-specific. These results exploit for the first time the temporal correlates of individual subjects' decisions and additionally show that the underlying neural generators start differentiating their responses already ~880 ms before the button press.
Finally, in the third part of this project, I focused on a clinical study with the goal of assessing the degree of intact neural function in comatose patients. Auditory EEG responses were assessed through a classical mismatch negativity paradigm during the very early phase of coma, which is currently under-investigated. By taking advantage of the decoding method developed in the first part of the thesis, I could quantify the degree of auditory discrimination at the single-patient level (Tzovara et al., in press). Our results showed for the first time that even patients who do not survive the coma can discriminate sounds at the neural level during the first hours after coma onset. Importantly, an improvement in auditory discrimination during the first 48 hours of coma was predictive of awakening and survival, with 100% positive predictive value. - The analysis of neurophysiological signals in neuroimaging is typically based on comparing neurophysiological responses to different experimental conditions, averaged over many repetitions of a task. However, this approach can be problematic in the case of high-level cognitive functions, where response variability across trials can be very high, or where individual subjects cannot be considered part of a group. The main goal of this thesis is to address this issue by developing a new approach for analyzing electroencephalography (EEG) responses at the single-trial level. This approach is based on modelling the distribution of the electric field on the scalp (topography) and takes advantage of repetitions across trials to quantify, through a classification scheme, the degree of discrimination between experimental conditions. In the first part of this thesis, I developed and validated this new method (Tzovara et al., 2012a,b). Its general applicability was demonstrated with three datasets, two in the visual modality and one in the auditory. This development made it possible to target two new lines of research, the first in cognitive neuroscience and the other in clinical neuroscience, representing the second and third parts of this project respectively. In particular, for the cognitive part, I applied this method to assess the temporal information of decision-making (Tzovara et al., 2012c). Based on single-trial topographic EEG activity during presentation of the payoff associated with a choice, we could predict the subjects' subsequent decisions (in terms of exploration/exploitation). This prediction relies on a topographic difference that appears on average ~516 ms after presentation of the payoff. These results exploit for the first time the temporal correlates of decisions at the level of each individual subject, and show that the neural generators of these decisions begin to differentiate their responses as early as ~880 ms before the subjects press the button. Finally, for the last part of this project, I focused on a clinical study in order to assess the degree of intact neural function in comatose patients. Auditory EEG responses were examined with a classical mismatch negativity paradigm during the early phase of coma, which is currently under-investigated. Using the decoding method developed in the first part of the thesis, I was able to quantify the degree of auditory discrimination at the level of each patient (Tzovara et al., in press). Our results show for the first time that even comatose patients who will not survive can discriminate sounds at the neural level during the acute phase of coma. Moreover, an improvement in auditory discrimination during the first 48 hours of coma was observed only in patients who subsequently awoke (100% positive predictive value for awakening).
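The single-trial quantification can be sketched as cross-validated decoding: hold one trial out, assign it to the condition whose mean topography (recomputed without it) lies nearest, and report accuracy across trials. This nearest-mean sketch is a simplified stand-in for the published method, which models voltage topographies probabilistically:

```python
def mean_vec(trials):
    """Component-wise mean of equal-length topography vectors."""
    n = len(trials)
    return [sum(t[i] for t in trials) / n for i in range(len(trials[0]))]

def dist(u, v):
    """Squared Euclidean distance between two topographies."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loo_accuracy(trials_a, trials_b):
    """Leave-one-trial-out accuracy of a nearest class-mean decoder:
    each held-out trial counts as correct when it lies closer to the
    mean topography of its own condition than to the other's."""
    correct, total = 0, 0
    for own, other in ((trials_a, trials_b), (trials_b, trials_a)):
        m_other = mean_vec(other)
        for i, t in enumerate(own):
            m_own = mean_vec(own[:i] + own[i + 1:])
            correct += dist(t, m_own) < dist(t, m_other)
            total += 1
    return correct / total
```

Accuracy near chance (0.5 for two conditions) means no single-trial discrimination; accuracy reliably above chance quantifies the degree of discrimination for that subject or patient.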
Abstract:
Background: The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, have the power of identifying conserved, functionally important molecular processes. Validation of discoveries can now often be performed in readily available public data, which frequently requires cross-platform studies. Cross-platform and cross-species analyses require matching probes on different microarray formats. This can be achieved using the information in microarray annotations and additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat database format thus provides a simple and robust solution to flexibly integrate various sources of information and a basis for the combined analysis of heterogeneous gene expression profiles. Results: We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments.
Second, building on these basic functions and relying on the combination of information from several databases, it provides tools to easily perform cross-species analyses of gene expression data. Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping of orthologous probes using different orthology databases. We also show how to perform an explorative comparison of disease-related transcriptional changes in human patients and in a genetic mouse model. Conclusion: The R package annotationTools provides a simple solution to handle microarray annotation and orthology tables, as well as other flat molecular biology databases. Thereby, it allows easy integration and analysis of heterogeneous microarray experiments across different technological platforms or species.
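The flat-file chaining that annotationTools performs in R can be illustrated with plain tab-separated tables: an annotation table maps a probe to a gene symbol, and an orthology table maps the human symbol to its mouse counterpart. The file contents and column layouts below are invented for the example, and Python stands in for the package's R functions:

```python
import csv
import io

# Hypothetical flat-file databases (tab-separated), as shipped by many projects.
annotation_txt = "probe\tsymbol\n1007_s_at\tDDR1\n1053_at\tRFC2\n"
orthology_txt = "human_symbol\tmouse_symbol\nDDR1\tDdr1\nRFC2\tRfc2\n"

def load_table(text, key, value):
    """Read a flat-file database into a lookup dictionary."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key]: row[value] for row in reader}

probe2symbol = load_table(annotation_txt, "probe", "symbol")
human2mouse = load_table(orthology_txt, "human_symbol", "mouse_symbol")

def map_probe(probe):
    """Chain lookups: human probe -> gene symbol -> mouse ortholog."""
    symbol = probe2symbol.get(probe)
    return human2mouse.get(symbol) if symbol else None
```

Chaining further tables (e.g. mouse symbol back to a probe on a second platform) follows the same pattern, which is what makes cross-platform and cross-species matching a sequence of flat-file lookups.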
Abstract:
The analysis of multi-modal and multi-sensor images is nowadays of paramount importance for Earth Observation (EO) applications. There exist a variety of methods that aim at fusing the different sources of information to obtain a compact representation of such datasets. However, for change detection existing methods are often unable to deal with heterogeneous image sources and very few consider possible nonlinearities in the data. Additionally, the availability of labeled information is very limited in change detection applications. For these reasons, we present the use of a semi-supervised kernel-based feature extraction technique. It incorporates a manifold regularization accounting for the geometric distribution and jointly addressing the small sample problem. An exhaustive example using Landsat 5 data illustrates the potential of the method for multi-sensor change detection.
Abstract:
A recent finding of the structural VAR literature is that the response of hours worked to a technology shock depends on the assumption about the order of integration of hours. In this work we relax this assumption, allowing for fractional integration and long memory in the processes for hours and productivity. We find that the sign and magnitude of the estimated impulse responses of hours to a positive technology shock depend crucially on the assumptions applied to identify them. Responses estimated with short-run identification are positive and statistically significant in all datasets analyzed. Long-run identification results in negative, often statistically insignificant, responses. We check the validity of these assumptions with the Sims (1989) procedure, concluding that both types of assumptions are appropriate to recover the impulse responses of hours in a fractionally integrated VAR. However, the application of long-run identification results in a substantial increase in sampling uncertainty. JEL Classification numbers: C22, E32. Keywords: technology shock, fractional integration, hours worked, structural VAR, identification
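Short-run identification amounts to assuming a recursive impact ordering: with productivity ordered first, the impact matrix B is the lower-triangular Cholesky factor of the reduced-form residual covariance Σ (so hours do not affect productivity on impact, and the first structural shock is read as the technology shock). A 2×2 worked sketch with an invented covariance:

```python
def impact_matrix(sigma):
    """Lower-triangular B with B @ B.T == sigma (2x2 Cholesky factor),
    the impact responses under short-run (recursive) identification."""
    a = sigma[0][0] ** 0.5
    b = sigma[1][0] / a
    c = (sigma[1][1] - b * b) ** 0.5
    return [[a, 0.0], [b, c]]

# Hypothetical reduced-form residual covariance of (productivity, hours).
sigma = [[4.0, 2.0], [2.0, 3.0]]
B = impact_matrix(sigma)
# The first column of B gives the impact responses of productivity and
# hours to the first structural shock under this ordering.
```

Long-run identification instead restricts the cumulative (infinite-horizon) effect of non-technology shocks on productivity to zero, which is why the two schemes can deliver impulse responses of opposite sign on the same data.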
Abstract:
BACKGROUND: The goals of our study are to determine the most appropriate model for alcohol consumption as an exposure for burden of disease, to analyze the effect of the chosen alcohol consumption distribution on the estimation of the alcohol Population-Attributable Fractions (PAFs), and to characterize the chosen alcohol consumption distribution by exploring whether there is a global relationship within the distribution. METHODS: To identify the best model, the Log-Normal, Gamma, and Weibull prevalence distributions were examined using data from 41 surveys from Gender, Alcohol and Culture: An International Study (GENACIS) and from the European Comparative Alcohol Study. To assess the effect of these distributions on the estimated alcohol PAFs, we calculated the alcohol PAF for diabetes, breast cancer, and pancreatitis using the three above-named distributions and using the more traditional approach based on categories. The relationship between the mean and the standard deviation of the Gamma distribution was estimated using data from 851 datasets for 66 countries from GENACIS and from the STEPwise approach to Surveillance from the World Health Organization. RESULTS: The Log-Normal distribution provided a poor fit for the survey data, with the Gamma and Weibull distributions providing better fits. Additionally, our analyses showed that there were no marked differences in the alcohol PAF estimates based on the Gamma or Weibull distributions compared to PAFs based on categorical alcohol consumption estimates. The standard deviation of the alcohol distribution was highly dependent on the mean, with a one-unit increase in mean consumption associated with an increase in the standard deviation of 1.258 (95% CI: 1.223 to 1.293) (R2 = 0.9207) for women and 1.171 (95% CI: 1.144 to 1.197) (R2 = 0.9474) for men.
CONCLUSIONS: Although the Gamma distribution and the Weibull distribution provided similar results, the Gamma distribution is recommended to model alcohol consumption from population surveys due to its fit, flexibility, and the ease with which it can be modified. The results showed that a large degree of variance of the standard deviation of the alcohol consumption Gamma distribution was explained by the mean alcohol consumption, allowing for alcohol consumption to be modeled through a Gamma distribution using only average consumption.
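This relationship makes the Gamma distribution fully determined by mean consumption: with σ = r·μ, the shape is k = μ²/σ² = 1/r² and the scale is θ = σ²/μ. A small sketch using the reported slope for men (1.171) as r and an invented mean consumption:

```python
def gamma_from_mean(mean, sd_ratio):
    """Gamma (shape, scale) for a consumption distribution whose standard
    deviation is sd_ratio times its mean."""
    sd = sd_ratio * mean
    shape = (mean / sd) ** 2      # k = mu^2 / sigma^2 = 1 / sd_ratio^2
    scale = sd ** 2 / mean        # theta = sigma^2 / mu
    return shape, scale

# e.g. a population of men drinking on average 20 g of pure alcohol per day
# (the figure is illustrative, not from the study):
k, theta = gamma_from_mean(20.0, 1.171)
# The implied Gamma reproduces the requested moments:
# k * theta == mean, and k * theta**2 == sd**2.
```

Because the shape depends only on the ratio r, a single survey mean suffices to specify the whole exposure distribution, which is the practical payoff stated in the conclusions.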
Abstract:
Cancer genomes frequently contain somatic copy number alterations (SCNA) that can significantly perturb the expression level of affected genes and thus disrupt pathways controlling normal growth. In melanoma, many studies have focussed on the copy number and gene expression levels of the BRAF, PTEN and MITF genes, but little has been done to identify new genes using these parameters at the genome-wide scale. Using karyotyping, SNP and CGH arrays, and RNA-seq, we have identified SCNA affecting gene expression ('SCNA-genes') in seven human metastatic melanoma cell lines. We showed that the combination of these techniques is useful to identify candidate genes potentially involved in tumorigenesis. Since few of these alterations were recurrent across our samples, we used a protein network-guided approach to determine whether any pathways were enriched in SCNA-genes in one or more samples. From this unbiased genome-wide analysis, we identified 28 significantly enriched pathway modules. Comparison with two large, independent melanoma SCNA datasets showed less than 10% overlap at the individual gene level, but network-guided analysis revealed 66% shared pathways, including all but three of the pathways identified in our data. Frequently altered pathways included WNT, cadherin signalling, angiogenesis and melanogenesis. Additionally, our results emphasize the potential of the EPHA3 and FRS2 gene products, involved in angiogenesis and migration, as possible therapeutic targets in melanoma. Our study demonstrates the utility of network-guided approaches, for both large and small datasets, to identify pathways recurrently perturbed in cancer.
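Pathway enrichment of this kind is typically scored with a hypergeometric tail test: how surprising is it that k of the n SCNA-genes fall in a pathway containing K of the N genes in the genome? The sketch below is a generic overrepresentation test, not the study's network-guided scoring:

```python
from math import comb

def enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n): probability that at
    least k of n sampled genes land in a pathway of K genes out of N."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total
```

For instance, with a toy genome of N = 10 genes, all n = 5 altered genes landing inside a K = 5 gene pathway has probability 1/252, so the pathway would be called significantly enriched.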
Abstract:
Mammals are characterized by specific phenotypic traits that include lactation, hair, and relatively large brains with unique structures. Individual mammalian lineages have, in turn, evolved characteristic traits that distinguish them from others. These include obvious anatomical differences but also differences related to reproduction, life span, cognitive abilities, behavior, and disease susceptibility. However, the molecular basis of the diverse mammalian phenotypes and the selective pressures that shaped their evolution remain largely unknown. In the first part of my thesis, I analyzed the genetic factors associated with the origin of a unique mammalian phenotype, lactation, and I studied the selective pressures that forged the transition from oviparity to viviparity. Using a comparative genomics approach and evolutionary simulations, I showed that the emergence of lactation, as well as the appearance of the casein gene family, significantly reduced selective pressure on the major egg-yolk proteins (the vitellogenin family). This led to a progressive loss of vitellogenins, which - in oviparous species - act as storage proteins for lipids, amino acids, phosphorus and calcium in the isolated egg. The passage to internal fertilization and placentation in therian mammals rendered vitellogenins completely dispensable, which ended in the loss of the whole gene family in this lineage. As illustrated by the vitellogenin study, changes in gene content are one possible underlying factor for the evolution of mammalian-specific phenotypes. However, more subtle genomic changes, such as mutations in protein-coding sequences, can also greatly affect the phenotypes. In particular, it was proposed that changes at the level of gene regulation could underlie many (or even most) phenotypic differences between species.
In the second part of my thesis, I participated in a major comparative study of mammalian tissue transcriptomes, with the goal of understanding how evolutionary forces affected expression patterns in the past 200 million years of mammalian evolution. I showed that, while comparisons of gene expression are in agreement with the known species phylogeny, the rate of expression evolution varies greatly among lineages. Species with low effective population size, such as monotremes and hominoids, showed significantly accelerated rates of gene expression evolution. The most likely explanation for the high rate of gene expression evolution in these lineages is the accumulation of mildly deleterious mutations in regulatory regions, due to the low efficiency of purifying selection. Thus, our observations are in agreement with the nearly neutral theory of molecular evolution. I also describe substantial differences in evolutionary rates between tissues, with brain being the most constrained (especially in primates) and testis significantly accelerated. The rate of gene expression evolution also varies significantly between chromosomes. In particular, I observed an acceleration of gene expression changes on the X chromosome, probably as a result of adaptive processes associated with the origin of therian sex chromosomes. Lastly, I identified several individual genes as well as co-regulated expression modules that have undergone lineage-specific expression changes and likely underlie various phenotypic innovations in mammals. The methods developed during my thesis, as well as the comprehensive gene content analyses and transcriptomics datasets made available by our group, will likely prove to be useful for further exploratory analyses of the diverse mammalian phenotypes.
Abstract:
Background: Systematic approaches for identifying proteins involved in different types of cancer are needed. Experimental techniques such as microarrays are being used to characterize cancer, but validating their results can be a laborious task. Computational approaches are used to prioritize between genes putatively involved in cancer, usually based on further analyzing experimental data. Results: We implemented a systematic method using the PIANA software that predicts cancer involvement of genes by integrating heterogeneous datasets. Specifically, we produced lists of genes likely to be involved in cancer by relying on: (i) protein-protein interactions; (ii) differential expression data; and (iii) structural and functional properties of cancer genes. The integrative approach that combines multiple sources of data obtained positive predictive values ranging from 23% (on a list of 811 genes) to 73% (on a list of 22 genes), outperforming the use of any of the data sources alone. We analyze a list of 20 cancer gene predictions, finding that most of them have been recently linked to cancer in the literature. Conclusion: Our approach to identifying and prioritizing candidate cancer genes can be used to produce lists of genes likely to be involved in cancer. Our results suggest that differential expression studies yielding high numbers of candidate cancer genes can be filtered using protein interaction networks.
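The integrative filtering can be sketched as scoring each candidate by how many independent evidence sources support it and keeping genes above a support threshold; the gene names and evidence sets below are invented for illustration and do not reproduce PIANA's actual scoring:

```python
def prioritize(candidates, sources, min_support=2):
    """Rank candidate genes by the number of supporting evidence sources
    (e.g. PPI proximity to cancer genes, differential expression,
    cancer-gene-like structural/functional properties)."""
    scored = [(gene, sum(gene in s for s in sources)) for gene in candidates]
    kept = [(g, s) for g, s in scored if s >= min_support]
    return sorted(kept, key=lambda gs: -gs[1])

# Hypothetical evidence sets.
ppi_near_cancer = {"GENE1", "GENE2", "GENE4"}
diff_expressed = {"GENE1", "GENE3", "GENE4"}
cancer_like_props = {"GENE1", "GENE2"}

ranked = prioritize(["GENE1", "GENE2", "GENE3", "GENE4"],
                    [ppi_near_cancer, diff_expressed, cancer_like_props])
```

Raising `min_support` mirrors the trade-off reported in the abstract: shorter lists with higher positive predictive value versus longer, noisier lists.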
Abstract:
Long-range Terrestrial Laser Scanning (TLS) is widely used in studies on rock slope instabilities. TLS point clouds allow the creation of high-resolution digital elevation models for detailed mapping of landslide morphologies and the measurement of the orientation of main discontinuities. Multi-temporal TLS datasets enable the quantification of slope displacements and rockfall volumes. We present three case studies using TLS for the investigation and monitoring of rock slope instabilities in Norway: 1) the analysis of 3D displacement of the Oksfjellet rock slope failure (Troms, northern Norway); 2) the detection and quantification of rockfalls along the sliding surfaces and at the front of the Kvitfjellet rock slope instability (Møre og Romsdal, western Norway); 3) the analysis of discontinuities and rotational movements of an unstable block at Stampa (Sogn og Fjordane, western Norway). These case studies highlight the possibilities but also limitations of TLS in investigating and monitoring unstable rock slopes.
Abstract:
The development of susceptibility maps for debris flows is of primary importance due to population pressure in hazardous zones. However, hazard assessment by process-based modelling at a regional scale is difficult due to the complex nature of the phenomenon, the variability of local controlling factors, and the uncertainty in modelling parameters. A regional assessment must consider a simplified approach that is not highly parameter-dependent and that can provide zonation with minimum data requirements. A distributed empirical model has thus been developed for regional susceptibility assessments using essentially a digital elevation model (DEM). The model is called Flow-R for Flow path assessment of gravitational hazards at a Regional scale (available free of charge at www.flow-r.org) and has been successfully applied to different case studies in various countries with variable data quality. It provides a substantial basis for a preliminary susceptibility assessment at a regional scale. The model was also found relevant for assessing other natural hazards such as rockfall, snow avalanches and floods. The model allows for automatic source area delineation, given user criteria, and for the assessment of the propagation extent based on various spreading algorithms and simple frictional laws. We developed a new spreading algorithm, an improved version of Holmgren's direction algorithm, that is less sensitive to small variations of the DEM, avoids over-channelization, and so produces more realistic extents. The choice of datasets and algorithms is open to the user, which makes the model adaptable to various applications and levels of dataset availability. Amongst the possible datasets, the DEM is the only one that is really needed for both the source area delineation and the propagation assessment; its quality is of major importance for the accuracy of the results. We consider a 10 m DEM resolution to be a good compromise between processing time and quality of results. However, valuable results have still been obtained on the basis of lower-quality DEMs with 25 m resolution.
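The spreading step distributes flow among downslope neighbours; in Holmgren's original algorithm the proportion routed to neighbour i is proportional to (tan βi)^x, where the exponent x controls how channelized the flow is (x = 1 spreads widely, large x concentrates flow into the steepest direction). A minimal sketch of that weighting (the Flow-R modification that reduces DEM sensitivity is not reproduced here):

```python
def holmgren_weights(drops, dists, x=4.0):
    """Fraction of flow routed to each neighbour cell.
    drops: elevation drop to each neighbour (<= 0 means upslope, gets none);
    dists: horizontal distance to each neighbour (1 or sqrt(2) cell sizes);
    x:     Holmgren exponent controlling flow concentration."""
    tan_beta = [max(0.0, dz) / d for dz, d in zip(drops, dists)]
    weights = [t ** x for t in tan_beta]
    total = sum(weights)
    return [w / total for w in weights] if total else [0.0] * len(weights)
```

For two downslope neighbours with slopes 2 and 1, x = 1 splits flow 2:1, while x = 4 routes 16/17 of it to the steeper cell, illustrating how the exponent trades spreading against channelization.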