45 results for Neural Network Models for Competing Risks Data
Abstract:
MOTIVATION: Combinatorial interactions of transcription factors with cis-regulatory elements control the dynamic progression through successive cellular states and thus underpin all metazoan development. The construction of network models of cis-regulatory elements, therefore, has the potential to generate fundamental insights into cellular fate and differentiation. Haematopoiesis has long served as a model system to study mammalian differentiation, yet modelling based on experimentally informed cis-regulatory interactions has so far been restricted to pairs of interacting factors. Here, we have generated a Boolean network model based on detailed cis-regulatory functional data connecting 11 haematopoietic stem/progenitor cell (HSPC) regulator genes. RESULTS: Despite its apparent simplicity, the model exhibits surprisingly complex behaviour, which we charted using strongly connected components and shortest-path analysis in its Boolean state space. This analysis predicts that HSPCs display heterogeneous expression patterns and possess many intermediate states that can act as 'stepping stones' for the HSPC to achieve a final differentiated state. Importantly, an external perturbation or 'trigger' is required to exit the stem cell state, with distinct triggers characterizing maturation into the various lineages. By focusing on intermediate states occurring during erythrocyte differentiation, our model predicted a novel negative regulation of Fli1 by Gata1, which we confirmed experimentally, thus validating the model. In conclusion, we demonstrate that an advanced mammalian regulatory network model based on experimentally validated cis-regulatory interactions allowed us to make novel, experimentally testable hypotheses about the transcriptional mechanisms that control differentiation of mammalian stem cells. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
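As an illustration of the state-space analysis described above, the following sketch enumerates all states of a small synchronous Boolean network and then applies strongly-connected-component and shortest-path analysis. The three-gene update rules are hypothetical toys (loosely echoing the predicted repression of Fli1 by Gata1), not the paper's 11-gene HSPC model; Python with networkx is assumed.

```python
# Toy synchronous Boolean network and its full state space. The rules below
# only illustrate the technique; they are NOT the paper's 11-gene HSPC model.
from itertools import product
import networkx as nx

genes = ("Gata1", "Fli1", "X")

def update(state):
    g, f, x = state
    return (int(g or x),        # Gata1 self-maintains or is activated by X
            int(f and not g),   # Fli1 is repressed by Gata1 (cf. the abstract)
            int(x and not f))   # X is repressed by Fli1

# Build the Boolean state space: one deterministic successor per state.
G = nx.DiGraph()
for state in product((0, 1), repeat=len(genes)):
    G.add_edge(state, update(state))

# SCCs with a self-loop or more than one state are attractors/limit sets.
attractors = [c for c in nx.strongly_connected_components(G)
              if len(c) > 1 or any(G.has_edge(s, s) for s in c)]
print("attractors:", attractors)

# Shortest paths chart the 'stepping stones' from a given expression state.
print("trajectory:", nx.shortest_path(G, source=(0, 1, 1))[(1, 0, 0)])
```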
Abstract:
The paper presents an approach to the mapping of precipitation data. The main goal is to perform spatial predictions and simulations of precipitation fields using geostatistical methods (ordinary kriging, kriging with external drift) as well as machine learning algorithms (neural networks). More practically, the objective is to reproduce both the spatial patterns and the extreme values simultaneously. This objective is best reached by models integrating geostatistics and machine learning algorithms. To demonstrate how such models work, two case studies have been considered: first, a 2-day accumulation of heavy precipitation and, second, a 6-day accumulation of extreme orographic precipitation. The first example is used to compare the performance of two optimization algorithms (conjugate gradients and Levenberg-Marquardt) in training a neural network to reproduce extreme values. Hybrid models, which combine geostatistical and machine learning algorithms, are also treated in this context. The second dataset is used to analyze the contribution of radar Doppler imagery when used as external drift or as input in the models (kriging with external drift and neural networks). Model assessment is carried out by comparing independent validation errors as well as by analyzing data patterns.
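A minimal sketch of the hybrid idea described above: a neural network models the large-scale trend, and kriging of its residuals restores local spatial structure. The data are synthetic; scikit-learn's 'lbfgs' solver stands in for the conjugate-gradient and Levenberg-Marquardt optimizers compared in the paper (scikit-learn offers neither), and the hand-rolled simple kriging with a Gaussian covariance replaces a full geostatistical workflow.

```python
# Synthetic 2-D rainfall data: an MLP captures the trend, simple kriging of
# the residuals restores local spatial structure.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))                    # station coordinates
z = np.sin(X[:, 0] / 15) + 0.01 * X[:, 1] + rng.normal(0, 0.1, 200)

net = MLPRegressor(hidden_layer_sizes=(20,), solver="lbfgs",
                   max_iter=2000, random_state=0).fit(X, z)
resid = z - net.predict(X)                                # residual field

def simple_krige(Xtr, r, X0, sill, rng_par=20.0, nugget=1e-6):
    """Simple kriging of residuals with a Gaussian covariance model."""
    d = lambda A, B: np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    C = sill * np.exp(-(d(Xtr, Xtr) / rng_par) ** 2) + nugget * np.eye(len(Xtr))
    c0 = sill * np.exp(-(d(Xtr, X0) / rng_par) ** 2)
    return np.linalg.solve(C, c0).T @ r                   # kriging weights @ r

X0 = np.array([[50.0, 50.0]])                             # prediction location
print("hybrid prediction:",
      net.predict(X0) + simple_krige(X, resid, X0, sill=resid.var()))
```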
Abstract:
In occupational exposure assessment of airborne contaminants, exposure levels can be estimated through repeated measurements of the pollutant concentration in air, through expert judgment, or through exposure models that use information on the conditions of exposure as input. In this report, we propose an empirical hierarchical Bayesian model to unify these approaches. Prior to any measurement, the hygienist conducts an assessment to generate prior distributions of exposure determinants. Monte Carlo samples from these distributions feed two level-2 models: a physical two-compartment model and a non-parametric neural network model trained with existing exposure data. The outputs of these two models are weighted according to the expert's assessment of their relevance to yield predictive distributions of the long-term geometric mean and geometric standard deviation of the worker's exposure profile (level-1 model). Bayesian inferences are then drawn iteratively from subsequent measurements of worker exposure. Any traditional decision strategy based on a comparison with occupational exposure limits (e.g. mean exposure, exceedance strategies) can then be applied. Data on 82 workers exposed to 18 contaminants in 14 companies were used to validate the model with cross-validation techniques. A user-friendly program running the model is available upon request.
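A minimal sketch of the updating scheme described above, under simplifying assumptions: Monte Carlo prior samples of a lognormal exposure profile (geometric mean GM, geometric standard deviation GSD) are drawn from two stubbed "level-2 models" with hypothetical normal priors, mixed by an assumed expert weight, and re-weighted by the likelihood of new measurements. This importance-sampling update is a stand-in for the report's full hierarchical model.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50_000
w_phys = 0.6                                   # expert's model weight (assumed)
from_phys = rng.random(n) < w_phys             # mix the two level-2 models
log_gm = np.where(from_phys,
                  rng.normal(np.log(0.5), 0.6, n),   # physical model output
                  rng.normal(np.log(1.2), 0.8, n))   # neural network output
log_gsd = np.abs(rng.normal(0.7, 0.2, n))      # log(GSD) > 0

measurements = np.array([0.8, 1.5, 0.6])       # worker's samples, mg/m^3 (toy)
loglik = stats.norm.logpdf(np.log(measurements)[:, None],
                           loc=log_gm, scale=log_gsd).sum(axis=0)
w = np.exp(loglik - loglik.max())
w /= w.sum()                                   # posterior importance weights

post_gm = np.exp(np.average(log_gm, weights=w))
oel = 2.0                                      # hypothetical exposure limit
exceed = np.average(1 - stats.norm.cdf((np.log(oel) - log_gm) / log_gsd),
                    weights=w)
print(f"posterior GM ~ {post_gm:.2f} mg/m^3, P(exceed OEL) ~ {exceed:.1%}")
```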
Abstract:
Epidemiological data indicate that in 75% of subjects with major psychiatric disorders, onset occurs in the age range of 17-24 years. An estimated 35-50% of college and university students drop out prematurely due to insufficient coping skills under chronic stress, while 85% of students receiving a psychiatric diagnosis withdraw from college/university before completing their education. In this study, we aimed to develop standardized means of identifying students with insufficient coping skills under chronic stress and at risk for mental health problems. A sample of 1,217 college students from 3 different sites in the U.S. and Switzerland completed 2 self-report questionnaires: the Coping Strategies Inventory "COPE" and the Zurich Health Questionnaire "ZHQ", which assesses "regular exercises", "consumption behavior", "impaired physical health", "psychosomatic disturbances", and "impaired mental health". The data were subjected to structure analyses by means of a Neural Network approach. We found 2 highly stable and reproducible COPE scales that explained the observed inter-individual variation in coping behavior sufficiently well and in a socio-culturally independent way. The scales reflected basic coping behavior in terms of "activity-passivity" and "defeatism-resilience", in the sense of stable, socio-culturally independent personality traits. Correlation analyses carried out for external validation revealed a close relationship between high scores on the defeatism scale and impaired physical and mental health. This underlined the role of insufficient coping behavior as a risk factor for physical and mental health problems. The combined COPE and ZHQ instruments appear to constitute powerful screening tools for insufficient coping skills under chronic stress and for risks of mental health problems.
Abstract:
Recently, kernel-based machine learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot-spot detection and monitoring network optimization, are discussed as well.
Abstract:
The localization of Last Glacial Maximum (LGM) refugia is crucial information for understanding a species' history and predicting its reaction to future climate changes. However, many phylogeographical studies lack sampling designs intensive enough to localize these refugia precisely. The hairy land snail Trochulus villosus has a small range centred on Switzerland, which could be intensively covered by sampling 455 individuals from 52 populations. Based on mitochondrial DNA sequences (COI and 16S), we identified two divergent lineages with distinct geographical distributions. Bayesian skyline plots suggested that both lineages expanded at the end of the LGM. To find where the origin populations were located, we applied the principles of ancestral character reconstruction and identified a candidate refugium for each mtDNA lineage: the French Jura and Central Switzerland, both ice-free during the LGM. Additional refugia, however, could not be excluded, as suggested by the microsatellite analysis of a population subset. Modelling the LGM niche of T. villosus, we showed that suitable climatic conditions were expected in the inferred refugia, but potentially also in the nunataks of the alpine ice shield. In a model selection approach, we compared several alternative recolonization scenarios by estimating the Akaike information criterion for their respective maximum-likelihood migration rates. The 'two refugia' scenario received by far the best support given the distribution of genetic diversity in T. villosus populations. Provided that fine-scale sampling designs and various analytical approaches are combined, it is possible to refine our understanding of species responses to environmental changes.
Abstract:
The research considers the problem of spatial data classification using machine learning algorithms: probabilistic neural networks (PNN) and support vector machines (SVM). As a benchmark model, the simple k-nearest neighbor algorithm is considered. PNN is a neural network reformulation of the well-known nonparametric principles of probability density modeling using a kernel density estimator and Bayesian optimal or maximum a posteriori decision rules. PNN is well suited to problems where not only predictions but also quantification of accuracy and integration of prior information are necessary. An important property of PNN is that it can easily be used in decision support systems dealing with problems of automatic classification. The support vector machine is an implementation of the principles of statistical learning theory for classification tasks. Recently, SVMs have been successfully applied to different environmental topics: classification of soil types and hydro-geological units, optimization of monitoring networks, and susceptibility mapping of natural hazards. In the present paper, both simulated and real-data case studies (low- and high-dimensional) are considered. The main attention is paid to the detection and learning of spatial patterns by the applied algorithms.
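A minimal sketch of a PNN as characterized above: one Parzen kernel density estimate per class combined with Bayes' rule, yielding class probabilities rather than bare labels. The data and the bandwidth h are synthetic/hypothetical; in practice h would be tuned, e.g. by cross-validation.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, h=0.5):
    """Parzen density per class + Bayes rule = probabilistic classification."""
    classes = np.unique(y_train)
    post = []
    for c in classes:
        Xc = X_train[y_train == c]
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        dens = np.exp(-d2 / (2 * h ** 2)).mean(axis=1)   # Gaussian Parzen window
        post.append(np.mean(y_train == c) * dens)        # prior * density
    post = np.array(post)
    return classes[post.argmax(axis=0)], post / post.sum(axis=0)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
labels, posteriors = pnn_predict(X, y, np.array([[1.0, 1.0]]))
print(labels, posteriors.T)   # class label plus its probability (PNN's strength)
```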
Abstract:
This work is concerned with the development and application of novel unsupervised learning methods, with two target applications in mind: the analysis of forensic case data and the classification of remote sensing images. First, a method based on a symbolic optimization of the inter-sample distance measure is proposed to improve the flexibility of spectral clustering algorithms and applied to the analysis of forensic case data. This distance is optimized using a loss function related to the preservation of neighborhood structure between the input space and the space of principal components, and solutions are found using genetic programming. Results are compared to a variety of state-of-the-art clustering algorithms. Subsequently, a new large-scale clustering method based on a joint optimization of feature extraction and classification is proposed and applied to various databases, including two hyperspectral remote sensing images. The algorithm makes use of a functional model (e.g., a neural network) for clustering, which is trained by stochastic gradient descent. Results indicate that such a technique can easily scale to huge databases, can avoid the so-called out-of-sample problem, and can compete with or even outperform existing clustering algorithms on both artificial data and real remote sensing images. This is verified on small databases as well as very large problems.
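A minimal sketch of the scaling idea in the second part of the abstract: a clustering model trained by stochastic gradient descent on mini-batches, so it streams through arbitrarily large datasets and can label unseen samples. Online k-means stands in here for the thesis's neural-network model; the streamed Gaussian blobs are synthetic.

```python
import numpy as np

def sgd_kmeans(stream, k=3, dim=2, lr0=0.5, steps=2000, batch=64, seed=0):
    """Online k-means: centroids are updated from mini-batches only."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=(k, dim))                  # centroids = model parameters
    for t in range(steps):
        X = stream(batch)                          # draw one mini-batch
        a = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)   # assignments
        lr = lr0 / (1 + 0.01 * t)                  # decaying learning rate
        for j in range(k):
            if np.any(a == j):                     # SGD step towards batch mean
                C[j] += lr * (X[a == j].mean(0) - C[j])
    return C

rng = np.random.default_rng(3)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
stream = lambda n: centres[rng.integers(0, 3, n)] + rng.normal(0, 0.5, (n, 2))
print(sgd_kmeans(stream))   # recovers the three cluster centres
# Unseen points are labelled by nearest centroid: no out-of-sample problem.
```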
Abstract:
Brain fluctuations at rest are not random but are structured in spatial patterns of correlated activity across different brain areas. The question of how resting-state functional connectivity (FC) emerges from the brain's anatomical connections has motivated several experimental and computational studies to understand structure-function relationships. However, the mechanistic origin of resting state is obscured by large-scale models' complexity, and a close structure-function relation is still an open problem. Thus, a realistic but simple enough description of relevant brain dynamics is needed. Here, we derived a dynamic mean field model that consistently summarizes the realistic dynamics of a detailed spiking and conductance-based synaptic large-scale network, in which connectivity is constrained by diffusion imaging data from human subjects. The dynamic mean field approximates the ensemble dynamics, whose temporal evolution is dominated by the longest time scale of the system. With this reduction, we demonstrated that FC emerges as structured linear fluctuations around a stable low firing activity state close to destabilization. Moreover, the model can be further and crucially simplified into a set of motion equations for statistical moments, providing a direct analytical link between anatomical structure, neural network dynamics, and FC. Our study suggests that FC arises from noise propagation and dynamical slowing down of fluctuations in an anatomically constrained dynamical system. Altogether, the reduction from spiking models to statistical moments presented here provides a new framework to explicitly understand the building up of FC through neuronal dynamics underpinned by anatomical connections and to drive hypotheses in task-evoked studies and for clinical applications.
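A minimal sketch of the moments-based link described above, under simplifying assumptions: for a linearization dx = Jx dt + sigma dW around a stable fixed point, the stationary covariance P solves the Lyapunov equation J P + P Jᵀ + sigma² I = 0, and model FC is the correlation matrix derived from P. The 4-node structural matrix is a random toy, not human diffusion-imaging data.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(4)
n, g, sigma = 4, 0.5, 0.1
SC = rng.random((n, n)) * (1 - np.eye(n))        # toy structural connectivity
SC /= np.abs(np.linalg.eigvals(SC)).max()        # normalize spectral radius
J = -np.eye(n) + g * SC                          # Jacobian: leak + coupling
assert np.all(np.linalg.eigvals(J).real < 0), "fixed point must be stable"

# Stationary covariance P solves J P + P J^T + sigma^2 I = 0.
P = solve_continuous_lyapunov(J, -sigma ** 2 * np.eye(n))
FC = P / np.sqrt(np.outer(np.diag(P), np.diag(P)))   # covariance -> correlation
print(np.round(FC, 2))
```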
Abstract:
The cross-recognition of peptides by cytotoxic T lymphocytes is a key element in immunology and in particular in peptide-based immunotherapy. Here we develop three-dimensional (3D) quantitative structure-activity relationships (QSARs) to predict cross-recognition by Melan-A-specific cytotoxic T lymphocytes of peptides bound to HLA A*0201 (hereafter referred to as HLA A2). First, we predict the structure of a set of self- and pathogen-derived peptides bound to HLA A2 using a previously developed ab initio structure prediction approach [Fagerberg et al., J. Mol. Biol., 521-46 (2006)]. Second, shape and electrostatic energy calculations are performed on a 3D grid to produce similarity matrices, which are combined with a genetic neural network method [So et al., J. Med. Chem., 4347-59 (1997)] to generate 3D-QSAR models. The models are extensively validated using several different approaches. During model generation, the leave-one-out cross-validated correlation coefficient (q²) is used as the fitness criterion, and all obtained models are evaluated based on their q² values. Moreover, the best model obtained for a partitioned data set is evaluated by its correlation coefficient (r = 0.92 for the external test set). The physical relevance of all models is tested using a functional dependence analysis, and the robustness of the models obtained for the entire data set is confirmed using y-randomization. Finally, the validated models are tested for their utility in the setting of rational peptide design: their ability to discriminate between peptides that only contain side-chain substitutions in a single secondary anchor position is evaluated. In addition, the predicted cross-recognition of the mono-substituted peptides is confirmed experimentally in chromium-release assays. These results underline the utility of 3D-QSARs in peptide mimetic design and suggest that the properties of the unbound epitope are sufficient to capture most of the information that determines cross-recognition.
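A minimal sketch of the fitness criterion used above: the leave-one-out cross-validated q² = 1 - PRESS / SS_total. A plain linear model on synthetic data stands in for the genetic-neural-network 3D-QSAR models of the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 4))           # descriptors (stand-in for similarity data)
y = X @ np.array([1.0, -0.5, 0.3, 0.0]) + rng.normal(0, 0.3, 30)   # activities

press = 0.0                            # predictive residual sum of squares
for train, test in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train], y[train])
    err = y[test][0] - model.predict(X[test])[0]
    press += err ** 2

q2 = 1 - press / ((y - y.mean()) ** 2).sum()   # q^2 = 1 - PRESS / SS_total
print(f"LOO q^2 = {q2:.3f}")
```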
Abstract:
The paper deals with the development and application of a methodology for automatic mapping of pollution/contamination data. The General Regression Neural Network (GRNN) is considered in detail and is proposed as an efficient tool to solve this problem. The automatic tuning of isotropic and anisotropic GRNN models using a cross-validation procedure is presented. Results are compared with a k-nearest-neighbours interpolation algorithm using an independent validation data set. The quality of the mapping is controlled by analysis of the raw data and of the residuals using variography. Maps of probabilities of exceeding a given decision level and 'thick' isoline visualization of the uncertainties are presented as examples of decision-oriented mapping. The real case study is based on mapping of radioactively contaminated territories.
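A minimal sketch of an isotropic GRNN as described above: Nadaraya-Watson kernel regression whose single bandwidth h is tuned automatically by leave-one-out cross-validation. The data are synthetic; the anisotropic variant would replace h with a direction-dependent metric.

```python
import numpy as np

def grnn(Xtr, ytr, Xte, h):
    """Nadaraya-Watson regression with an isotropic Gaussian kernel."""
    d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * h ** 2))
    return (W @ ytr) / (W.sum(axis=1) + 1e-12)

def loo_error(X, y, h):
    """Leave-one-out MSE, computed in one pass by zeroing the diagonal."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * h ** 2))
    np.fill_diagonal(W, 0.0)
    return np.mean((y - (W @ y) / (W.sum(axis=1) + 1e-12)) ** 2)

rng = np.random.default_rng(6)
X = rng.uniform(0, 10, (150, 2))                   # sampling locations
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=150)   # contamination level (toy)

hs = np.linspace(0.1, 3.0, 30)                     # automatic tuning of h
h_opt = hs[np.argmin([loo_error(X, y, h) for h in hs])]
print("h* =", h_opt, "prediction at (5,5):",
      grnn(X, y, np.array([[5.0, 5.0]]), h_opt))
```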
Abstract:
OBJECTIVE: To evaluate the effect of adjuvant chemotherapy (AC) on mortality after radical nephroureterectomy (RNU) for upper tract urothelial carcinoma (UTUC) with positive lymph nodes (LNs) and to identify patient subgroups that are most likely to benefit from AC. PATIENTS AND METHODS: We retrospectively analysed data of 263 patients with LN-positive UTUC who underwent full surgical resection. In all, 107 patients (40.7%) received three to six cycles of AC, while 156 (59.3%) were treated with RNU alone. UTUC-related mortality was evaluated using competing-risks regression models. RESULTS: In all patients (all T stages, N+), administration of AC had no significant impact on UTUC-related mortality on univariable (P = 0.49) and multivariable (P = 0.11) analysis. Further stratified analyses showed that only N+ patients with pT3-4 disease benefited from AC. In this subgroup, AC reduced UTUC-related mortality by 34% (P = 0.019). The absolute difference in mortality was 10% after the first year and increased to 23% after 5 years. On multivariable analysis, administration of AC was associated with significantly reduced UTUC-related mortality (subhazard ratio 0.67, P = 0.022). Limitations of this study are the retrospective non-randomised design, selection bias, absence of a central pathological review and different AC protocols. CONCLUSIONS: AC seems to reduce mortality in patients with pT3-4 LN-positive UTUC after RNU. This subgroup of LN-positive patients could serve as the target population for a prospective randomised trial of AC.
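A minimal sketch of the competing-risks machinery behind the analysis above: a nonparametric (Aalen-Johansen) cumulative-incidence estimate of UTUC-related death, treating other-cause death as a competing event. The data are synthetic, and the paper's Fine-Gray subhazard regression is not reproduced here.

```python
import numpy as np

def cumulative_incidence(time, event, cause=1):
    """Aalen-Johansen CIF; event: 0 = censored, 1 = cause, 2 = competing."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    at_risk, surv, cif = len(time), 1.0, 0.0
    curve = []
    for t, e in zip(time, event):
        if e == cause:
            cif += surv / at_risk        # cause-specific hazard * S(t-)
        if e != 0:
            surv *= 1.0 - 1.0 / at_risk  # overall event-free survival
        at_risk -= 1
        curve.append((t, cif))
    return np.array(curve)

rng = np.random.default_rng(7)
t = rng.exponential(5, 200)                            # follow-up (years, toy)
e = rng.choice([0, 1, 2], 200, p=[0.3, 0.45, 0.25])    # censor/UTUC/other death
cif = cumulative_incidence(t, e, cause=1)
print("5-year UTUC-related mortality ~", round(cif[cif[:, 0] <= 5, 1][-1], 3))
```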
Abstract:
Among the largest resources for biological sequence data is the large amount of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts, but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error-prone coding regions have shown good performance in detecting and predicting such regions while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon-usage-bias-based error correction into one hidden Markov model (HMM), thus generalizing this error-correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
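A minimal sketch of the core computation behind an HMM-based gene-structure method: Viterbi decoding in log space. The two-state coding/noncoding model and its per-nucleotide emissions are toy stand-ins for the paper's far richer mRNA-plus-error-correction HMM, which works on codons and models insertions/deletions.

```python
import numpy as np

states = ("noncoding", "coding")
logA = np.log([[0.95, 0.05],                 # transition probabilities
               [0.10, 0.90]])
logB = np.log([[0.30, 0.20, 0.20, 0.30],     # noncoding emissions: A C G T
               [0.20, 0.30, 0.30, 0.20]])    # coding emissions (toy GC bias)
logpi = np.log([0.5, 0.5])                   # initial state distribution

def viterbi(obs):
    """Most probable state path, computed in log space."""
    V = logpi + logB[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = V[:, None] + logA           # scores[i, j]: move state i -> j
        back.append(scores.argmax(axis=0))
        V = scores.max(axis=0) + logB[:, o]
    path = [int(V.argmax())]
    for bp in reversed(back):                # backtrack the best path
        path.append(int(bp[path[-1]]))
    return path[::-1]

seq = "ATATGCGCGCGGCCATAT"
obs = ["ACGT".index(c) for c in seq]
print("".join("NC"[s] for s in viterbi(obs)))   # N = noncoding, C = coding
```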
Abstract:
Ubiquitin ligases play a pivotal role in substrate recognition and ubiquitin transfer, yet little is known about the regulation of their catalytic activity. Nedd4 (neural-precursor-cell-expressed, developmentally down-regulated 4)-2 is an E3 ubiquitin ligase composed of a C2 domain, four WW domains (protein-protein interaction domains containing two conserved tryptophan residues) that bind PY motifs (L/PPXY) and a ubiquitin ligase HECT (homologous with E6-associated protein C-terminus) domain. In the present paper, we show that the WW domains of Nedd4-2 bind (weakly) to a PY motif (LPXY) located within its own HECT domain and inhibit auto-ubiquitination. Pulse-chase experiments demonstrated that mutation of the HECT PY-motif decreases the stability of Nedd4-2, suggesting that it is involved in stabilization of this E3 ligase. Interestingly, the HECT PY-motif mutation does not affect ubiquitination or down-regulation of a known Nedd4-2 substrate, ENaC (epithelial sodium channel). ENaC ubiquitination, in turn, appears to promote Nedd4-2 self-ubiquitination. These results support a model in which the inter- or intra-molecular WW-domain-HECT PY-motif interaction stabilizes Nedd4-2 by preventing self-ubiquitination. Substrate binding disrupts this interaction, allowing self-ubiquitination of Nedd4-2 and subsequent degradation, resulting in down-regulation of Nedd4-2 once it has ubiquitinated its target. These findings also point to a novel mechanism by which a ubiquitin ligase regulates its own ubiquitination and stability differentially from those of its substrates.
Abstract:
Cerebral microangiopathy (CMA) has been associated with executive dysfunction and fronto-parietal neural network disruption. Advances in magnetic resonance imaging allow more detailed analyses of gray matter (e.g., voxel-based morphometry, VBM) and white matter (e.g., diffusion tensor imaging, DTI) than traditional visual rating scales. The current study investigated patients with early CMA and healthy control subjects with all three approaches. Neuropsychological assessment focused on executive functions, the cognitive domain most discussed in CMA. The DTI and age-related white matter changes rating scales revealed convergent results showing widespread white matter changes in early CMA. Correlations were found in frontal and parietal areas exclusively with speeded, but not with speed-corrected, executive measures. The VBM analyses showed reduced gray matter in frontal areas. All three approaches confirmed the hypothesized fronto-parietal network disruption in early CMA. Innovative methods (DTI) converged with results from conventional methods (visual rating) while allowing greater spatial and tissue accuracy; they are thus valid additions to the analysis of neural correlates of cognitive dysfunction. We found a clear distinction between speeded and non-speeded executive measures in relation to imaging parameters. Cognitive slowing is related to disease severity in early CMA and is therefore important for early diagnostics.