816 resultados para Pattern Analysis Statistical Modeling and Computational Learning (PASCAL)
Resumo:
The thesis has covered various aspects of modeling and analysis of finite mean time series with symmetric stable distributed innovations. Time series analysis based on Box and Jenkins methods are the most popular approaches where the models are linear and errors are Gaussian. We highlighted the limitations of classical time series analysis tools and explored some generalized tools and organized the approach parallel to the classical set up. In the present thesis we mainly studied the estimation and prediction of signal plus noise model. Here we assumed the signal and noise follow some models with symmetric stable innovations.We start the thesis with some motivating examples and application areas of alpha stable time series models. Classical time series analysis and corresponding theories based on finite variance models are extensively discussed in second chapter. We also surveyed the existing theories and methods correspond to infinite variance models in the same chapter. We present a linear filtering method for computing the filter weights assigned to the observation for estimating unobserved signal under general noisy environment in third chapter. Here we consider both the signal and the noise as stationary processes with infinite variance innovations. We derived semi infinite, double infinite and asymmetric signal extraction filters based on minimum dispersion criteria. Finite length filters based on Kalman-Levy filters are developed and identified the pattern of the filter weights. Simulation studies show that the proposed methods are competent enough in signal extraction for processes with infinite variance.Parameter estimation of autoregressive signals observed in a symmetric stable noise environment is discussed in fourth chapter. Here we used higher order Yule-Walker type estimation using auto-covariation function and exemplify the methods by simulation and application to Sea surface temperature data. We increased the number of Yule-Walker equations and proposed a ordinary least square estimate to the autoregressive parameters. Singularity problem of the auto-covariation matrix is addressed and derived a modified version of the Generalized Yule-Walker method using singular value decomposition.In fifth chapter of the thesis we introduced partial covariation function as a tool for stable time series analysis where covariance or partial covariance is ill defined. Asymptotic results of the partial auto-covariation is studied and its application in model identification of stable auto-regressive models are discussed. We generalize the Durbin-Levinson algorithm to include infinite variance models in terms of partial auto-covariation function and introduce a new information criteria for consistent order estimation of stable autoregressive model.In chapter six we explore the application of the techniques discussed in the previous chapter in signal processing. Frequency estimation of sinusoidal signal observed in symmetric stable noisy environment is discussed in this context. Here we introduced a parametric spectrum analysis and frequency estimate using power transfer function. Estimate of the power transfer function is obtained using the modified generalized Yule-Walker approach. Another important problem in statistical signal processing is to identify the number of sinusoidal components in an observed signal. We used a modified version of the proposed information criteria for this purpose.
Resumo:
Analyses of ecological data should account for the uncertainty in the process(es) that generated the data. However, accounting for these uncertainties is a difficult task, since ecology is known for its complexity. Measurement and/or process errors are often the only sources of uncertainty modeled when addressing complex ecological problems, yet analyses should also account for uncertainty in sampling design, in model specification, in parameters governing the specified model, and in initial and boundary conditions. Only then can we be confident in the scientific inferences and forecasts made from an analysis. Probability and statistics provide a framework that accounts for multiple sources of uncertainty. Given the complexities of ecological studies, the hierarchical statistical model is an invaluable tool. This approach is not new in ecology, and there are many examples (both Bayesian and non-Bayesian) in the literature illustrating the benefits of this approach. In this article, we provide a baseline for concepts, notation, and methods, from which discussion on hierarchical statistical modeling in ecology can proceed. We have also planted some seeds for discussion and tried to show where the practical difficulties lie. Our thesis is that hierarchical statistical modeling is a powerful way of approaching ecological analysis in the presence of inevitable but quantifiable uncertainties, even if practical issues sometimes require pragmatic compromises.
Resumo:
In recent decades, two prominent trends have influenced the data modeling field, namely network analysis and machine learning. This thesis explores the practical applications of these techniques within the domain of drug research, unveiling their multifaceted potential for advancing our comprehension of complex biological systems. The research undertaken during this PhD program is situated at the intersection of network theory, computational methods, and drug research. Across six projects presented herein, there is a gradual increase in model complexity. These projects traverse a diverse range of topics, with a specific emphasis on drug repurposing and safety in the context of neurological diseases. The aim of these projects is to leverage existing biomedical knowledge to develop innovative approaches that bolster drug research. The investigations have produced practical solutions, not only providing insights into the intricacies of biological systems, but also allowing the creation of valuable tools for their analysis. In short, the achievements are: • A novel computational algorithm to identify adverse events specific to fixed-dose drug combinations. • A web application that tracks the clinical drug research response to SARS-CoV-2. • A Python package for differential gene expression analysis and the identification of key regulatory "switch genes". • The identification of pivotal events causing drug-induced impulse control disorders linked to specific medications. • An automated pipeline for discovering potential drug repurposing opportunities. • The creation of a comprehensive knowledge graph and development of a graph machine learning model for predictions. Collectively, these projects illustrate diverse applications of data science and network-based methodologies, highlighting the profound impact they can have in supporting drug research activities.
Resumo:
This work presents a method for predicting resource availability in opportunistic grids by means of use pattern analysis (UPA), a technique based on non-supervised learning methods. This prediction method is based on the assumption of the existence of several classes of computational resource use patterns, which can be used to predict the resource availability. Trace-driven simulations validate this basic assumptions, which also provide the parameter settings for the accurate learning of resource use patterns. Experiments made with an implementation of the UPA method show the feasibility of its use in the scheduling of grid tasks with very little overhead. The experiments also demonstrate the method`s superiority over other predictive and non-predictive methods. An adaptative prediction method is suggested to deal with the lack of training data at initialization. Further adaptative behaviour is motivated by experiments which show that, in some special environments, reliable resource use patterns may not always be detected. Copyright (C) 2009 John Wiley & Sons, Ltd.
Resumo:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies
Resumo:
Dissertação para obtenção do Grau de Doutor em Ciências da Educação
Resumo:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks as biological and biomedical discovery, biotechnology and drug development. However, as it happens with other omics data, the analysis of metabolomics datasets provides multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, from the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available in CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
Resumo:
We study the properties of the well known Replicator Dynamics when applied to a finitely repeated version of the Prisoners' Dilemma game. We characterize the behavior of such dynamics under strongly simplifying assumptions (i.e. only 3 strategies are available) and show that the basin of attraction of defection shrinks as the number of repetitions increases. After discussing the difficulties involved in trying to relax the 'strongly simplifying assumptions' above, we approach the same model by means of simulations based on genetic algorithms. The resulting simulations describe a behavior of the system very close to the one predicted by the replicator dynamics without imposing any of the assumptions of the mathematical model. Our main conclusion is that mathematical and computational models are good complements for research in social sciences. Indeed, while computational models are extremely useful to extend the scope of the analysis to complex scenarios hard to analyze mathematically, formal models can be useful to verify and to explain the outcomes of computational models.
Resumo:
To investigate their role in receptor coupling to G(q), we mutated all basic amino acids and some conserved hydrophobic residues of the cytosolic surface of the alpha(1b)-adrenergic receptor (AR). The wild type and mutated receptors were expressed in COS-7 cells and characterized for their ligand binding properties and ability to increase inositol phosphate accumulation. The experimental results have been interpreted in the context of both an ab initio model of the alpha(1b)-AR and of a new homology model built on the recently solved crystal structure of rhodopsin. Among the twenty-three basic amino acids mutated only mutations of three, Arg(254) and Lys(258) in the third intracellular loop and Lys(291) at the cytosolic extension of helix 6, markedly impaired the receptor-mediated inositol phosphate production. Additionally, mutations of two conserved hydrophobic residues, Val(147) and Leu(151) in the second intracellular loop had significant effects on receptor function. The functional analysis of the receptor mutants in conjunction with the predictions of molecular modeling supports the hypothesis that Arg(254), Lys(258), as well as Leu(151) are directly involved in receptor-G protein interaction and/or receptor-mediated activation of the G protein. In contrast, the residues belonging to the cytosolic extensions of helices 3 and 6 play a predominant role in the activation process of the alpha(1b)-AR. These findings contribute to the delineation of the molecular determinants of the alpha(1b)-AR/G(q) interface.
Resumo:
Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.
Resumo:
We present a new approach to model and classify breast parenchymal tissue. Given a mammogram, first, we will discover the distribution of the different tissue densities in an unsupervised manner, and second, we will use this tissue distribution to perform the classification. We achieve this using a classifier based on local descriptors and probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature. We studied the influence of different descriptors like texture and SIFT features at the classification stage showing that textons outperform SIFT in all cases. Moreover we demonstrate that pLSA automatically extracts meaningful latent aspects generating a compact tissue representation based on their densities, useful for discriminating on mammogram classification. We show the results of tissue classification over the MIAS and DDSM datasets. We compare our method with approaches that classified these same datasets showing a better performance of our proposal
Resumo:
Familial idiopathic basal ganglia calcification, also known as ""Fahr`s disease"" (FD), is a neuropsychiatric disorder with autosomal dominant pattern of inheritance and characterized by symmetric basal ganglia calcifications and, occasionally, other brain regions. Currently, there are three loci linked to this devastating disease. The first one (IBGC1) is located in 14q11.2-21.3 and the other two have been identified in 2q37 (IBGC2) and 8p21.1-q11.13 (IBGC3). Further studies identified a heterozygous variation (rs36060072) which consists in the change of the cytosine to guanine located at MGEA6/CTAGE5 gene, present in all of the affected large American family linked to IBGC1. This missense substitution, which induces changes of a proline to alanine at the 521 position (P521A), in a proline-rich and highly conserved protein domain was considered a rare variation, with a minor allele frequency (MAF) of 0.0058 at the US population. Considering that the population frequency of a given variation is an indirect indicative of potential pathogenicity, we screened 200 chromosomes in a random control set of Brazilian samples and in two nuclear families, comparing with our previous analysis in a US population. In addition, we accomplished analyses through bioinformatics programs to predict the pathogenicity of such variation. Our genetic screen found no P521A carriers. Polling these data together with the previous study in the USA, we have now a MAF of 0.0036, showing that this mutation is very rare. On the other hand, the bioinformatics analysis provided conflicting findings. There are currently various candidate genes and loci that could be involved with the underlying molecular basis of FD etiology, and other groups suggested the possible role played by genes in 2q37, related to calcium metabolism, and at chromosome 8 (NRG1 and SNTG1). Additional mutagenesis and in vivo studies are necessary to confirm the pathogenicity for variation in the P521A MGEA6.
Resumo:
Specific choices about how to represent complex networks can have a substantial impact on the execution time required for the respective construction and analysis of those structures. In this work we report a comparison of the effects of representing complex networks statically by adjacency matrices or dynamically by adjacency lists. Three theoretical models of complex networks are considered: two types of Erdos-Renyi as well as the Barabasi-Albert model. We investigated the effect of the different representations with respect to the construction and measurement of several topological properties (i.e. degree, clustering coefficient, shortest path length, and betweenness centrality). We found that different forms of representation generally have a substantial effect on the execution time, with the sparse representation frequently resulting in remarkably superior performance. (C) 2011 Elsevier B.V. All rights reserved.