985 resultados para Matrix-Variate Statistical Distributions
Resumo:
When Recurrent Neural Networks (RNN) are going to be used as Pattern Recognition systems, the problem to be considered is how to impose prescribed prototype vectors ξ^1,ξ^2,...,ξ^p as fixed points. The synaptic matrix W should be interpreted as a sort of sign correlation matrix of the prototypes, In the classical approach. The weak point in this approach, comes from the fact that it does not have the appropriate tools to deal efficiently with the correlation between the state vectors and the prototype vectors The capacity of the net is very poor because one can only know if one given vector is adequately correlated with the prototypes or not and we are not able to know what its exact correlation degree. The interest of our approach lies precisely in the fact that it provides these tools. In this paper, a geometrical vision of the dynamic of states is explained. A fixed point is viewed as a point in the Euclidean plane R2. The retrieving procedure is analyzed trough statistical frequency distribution of the prototypes. The capacity of the net is improved and the spurious states are reduced. In order to clarify and corroborate the theoretical results, together with the formal theory, an application is presented
Resumo:
2000 Mathematics Subject Classification: 62H30
Resumo:
Extensive data sets on water quality and seagrass distributions in Florida Bay have been assembled under complementary, but independent, monitoring programs. This paper presents the landscape-scale results from these monitoring programs and outlines a method for exploring the relationships between two such data sets. Seagrass species occurrence and abundance data were used to define eight benthic habitat classes from 677 sampling locations in Florida Bay. Water quality data from 28 monitoring stations spread across the Bay were used to construct a discriminant function model that assigned a probability of a given benthic habitat class occurring for a given combination of water quality variables. Mean salinity, salinity variability, the amount of light reaching the benthos, sediment depth, and mean nutrient concentrations were important predictor variables in the discriminant function model. Using a cross-validated classification scheme, this discriminant function identified the most likely benthic habitat type as the actual habitat type in most cases. The model predicted that the distribution of benthic habitat types in Florida Bay would likely change if water quality and water delivery were changed by human engineering of freshwater discharge from the Everglades. Specifically, an increase in the seasonal delivery of freshwater to Florida Bay should cause an expansion of seagrass beds dominated by Ruppia maritima and Halodule wrightii at the expense of the Thalassia testudinum-dominated community that now occurs in northeast Florida Bay. These statistical techniques should prove useful for predicting landscape-scale changes in community composition in diverse systems where communities are in quasi-equilibrium with environmental drivers.
Resumo:
This dissertation presents a study of the D( e, e′p)n reaction carried out at the Thomas Jefferson National Accelerator Facility (Jefferson Lab) for a set of fixed values of four-momentum transfer Q 2 = 2.1 and 0.8 (GeV/c)2 and for missing momenta pm ranging from pm = 0.03 to pm = 0.65 GeV/c. The analysis resulted in the determination of absolute D(e,e′ p)n cross sections as a function of the recoiling neutron momentum and it's scattering angle with respect to the momentum transfer [vector] q. The angular distribution was compared to various modern theoretical predictions that also included final state interactions. The data confirmed the theoretical prediction of a strong anisotropy of final state interaction contributions at Q2 of 2.1 (GeV/c)2 while at the lower Q2 value, the anisotropy was much less pronounced. At Q2 of 0.8 (GeV/c)2, theories show a large disagreement with the experimental results. The experimental momentum distribution of the bound proton inside the deuteron has been determined for the first time at a set of fixed neutron recoil angles. The momentum distribution is directly related to the ground state wave function of the deuteron in momentum space. The high momentum part of this wave function plays a crucial role in understanding the short-range part of the nucleon-nucleon force. At Q2 = 2.1 (GeV/c)2, the momentum distribution determined at small neutron recoil angles is much less affected by FSI compared to a recoil angle of 75°. In contrast, at Q2 = 0.8 (GeV/c)2 there seems to be no region with reduced FSI for larger missing momenta. Besides the statistical errors, systematic errors of about 5–6 % were included in the final results in order to account for normalization uncertainties and uncertainties in the determi- nation of kinematic veriables. The measurements were carried out using an electron beam energy of 2.8 and 4.7 GeV with beam currents between 10 to 100 &mgr; A. The scattered electrons and the ejected protons originated from a 15cm long liquid deuterium target, and were detected in conicidence with the two high resolution spectrometers of Hall A at Jefferson Lab.^
Resumo:
Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data in public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and cloud software platforms. Given user-specified requirements in terms of time, monetary cost, and risk tolerance, Cumulon automatically makes intelligent decisions on implementation alternatives, execution parameters, as well as hardware provisioning and configuration settings -- such as what type of machines and how many of them to acquire. Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests best bidding strategies for them. Cumulon explores two alternative approaches toward supporting such markets, with different trade-offs between system and optimization complexity. Experimental study is conducted to show the efficiency of Cumulon's execution engine, as well as the optimizer's effectiveness in finding the optimal plan in the vast plan space.
Resumo:
Spectral unmixing (SU) is a technique to characterize mixed pixels of the hyperspectral images measured by remote sensors. Most of the existing spectral unmixing algorithms are developed using the linear mixing models. Since the number of endmembers/materials present at each mixed pixel is normally scanty compared with the number of total endmembers (the dimension of spectral library), the problem becomes sparse. This thesis introduces sparse hyperspectral unmixing methods for the linear mixing model through two different scenarios. In the first scenario, the library of spectral signatures is assumed to be known and the main problem is to find the minimum number of endmembers under a reasonable small approximation error. Mathematically, the corresponding problem is called the $\ell_0$-norm problem which is NP-hard problem. Our main study for the first part of thesis is to find more accurate and reliable approximations of $\ell_0$-norm term and propose sparse unmixing methods via such approximations. The resulting methods are shown considerable improvements to reconstruct the fractional abundances of endmembers in comparison with state-of-the-art methods such as having lower reconstruction errors. In the second part of the thesis, the first scenario (i.e., dictionary-aided semiblind unmixing scheme) will be generalized as the blind unmixing scenario that the library of spectral signatures is also estimated. We apply the nonnegative matrix factorization (NMF) method for proposing new unmixing methods due to its noticeable supports such as considering the nonnegativity constraints of two decomposed matrices. Furthermore, we introduce new cost functions through some statistical and physical features of spectral signatures of materials (SSoM) and hyperspectral pixels such as the collaborative property of hyperspectral pixels and the mathematical representation of the concentrated energy of SSoM for the first few subbands. Finally, we introduce sparse unmixing methods for the blind scenario and evaluate the efficiency of the proposed methods via simulations over synthetic and real hyperspectral data sets. The results illustrate considerable enhancements to estimate the spectral library of materials and their fractional abundances such as smaller values of spectral angle distance (SAD) and abundance angle distance (AAD) as well.
Resumo:
This work represents an original contribution to the methodology for ecosystem models' development as well as the rst attempt of an end-to-end (E2E) model of the Northern Humboldt Current Ecosystem (NHCE). The main purpose of the developed model is to build a tool for ecosystem-based management and decision making, reason why the credibility of the model is essential, and this can be assessed through confrontation to data. Additionally, the NHCE exhibits a high climatic and oceanographic variability at several scales, the major source of interannual variability being the interruption of the upwelling seasonality by the El Niño Southern Oscillation, which has direct e ects on larval survival and sh recruitment success. Fishing activity can also be highly variable, depending on the abundance and accessibility of the main shery resources. This context brings the two main methodological questions addressed in this thesis, through the development of an end-to-end model coupling the high trophic level model OSMOSE to the hydrodynamics and biogeochemical model ROMS-PISCES: i) how to calibrate ecosystem models using time series data and ii) how to incorporate the impact of the interannual variability of the environment and shing. First, this thesis highlights some issues related to the confrontation of complex ecosystem models to data and proposes a methodology for a sequential multi-phases calibration of ecosystem models. We propose two criteria to classify the parameters of a model: the model dependency and the time variability of the parameters. Then, these criteria along with the availability of approximate initial estimates are used as decision rules to determine which parameters need to be estimated, and their precedence order in the sequential calibration process. Additionally, a new Evolutionary Algorithm designed for the calibration of stochastic models (e.g Individual Based Model) and optimized for maximum likelihood estimation has been developed and applied to the calibration of the OSMOSE model to time series data. The environmental variability is explicit in the model: the ROMS-PISCES model forces the OSMOSE model and drives potential bottom-up e ects up the foodweb through plankton and sh trophic interactions, as well as through changes in the spatial distribution of sh. The latter e ect was taken into account using presence/ absence species distribution models which are traditionally assessed through a confusion matrix and the statistical metrics associated to it. However, when considering the prediction of the habitat against time, the variability in the spatial distribution of the habitat can be summarized and validated using the emerging patterns from the shape of the spatial distributions. We modeled the potential habitat of the main species of the Humboldt Current Ecosystem using several sources of information ( sheries, scienti c surveys and satellite monitoring of vessels) jointly with environmental data from remote sensing and in situ observations, from 1992 to 2008. The potential habitat was predicted over the study period with monthly resolution, and the model was validated using quantitative and qualitative information of the system using a pattern oriented approach. The nal ROMS-PISCES-OSMOSE E2E ecosystem model for the NHCE was calibrated using our evolutionary algorithm and a likelihood approach to t monthly time series data of landings, abundance indices and catch at length distributions from 1992 to 2008. To conclude, some potential applications of the model for shery management are presented and their limitations and perspectives discussed.
Resumo:
The main aim of this study was to evaluate the impact of the urban pollution plume from the city of Manaus by emissions from mobile and stationary sources in the atmospheric pollutants concentrations of the Amazon region, by using The Weather Research and Forecasting with Chemistry (WRF-Chem) model. The air pollutants analyzed were CO, NOx, SO2, O3, PM2.5, PM10 and VOCs. The model simulations have been configured with a grid spacing of 3 km, with 190 x and 136 y grid points in horizontal spacing, centered in the city of Manaus during the period of 17 and 18 of March 2014. The anthropogenic emissions inventories have gathered from mobile sources that were estimated the emissions of light and heavy-duty vehicles classes. In addition, the stationary sources have considered the thermal power plants by the type of energy sources used in the region as well as the emissions from the refinery located in Manaus. Various scenarios have been defined with numerical experiments that considered only emissions by biogenic, mobile and stationary sources, and replacement fuel from thermal power plant, along with a future scenario consisting with twice as much anthropogenic emissions. A qualitative assessment of simulation with base scenario has also been carried out, which represents the conditions of the region in its current state, where several statistical methods were used in order to compare the results of air pollutants and meteorological fields with observed ground-based data located in various points in the study grid. The qualitative analysis showed that the model represents satisfactorily the variables analyzed from the point of view of the adopted parameters. Regarding the simulations, defined from the base scenarios, the numerical experiments indicate relevant results such as: it was found that the stationary sources scenario, where the thermal power plants are predominant, resulted in the highest concentrations, for all air pollutants evaluated, except for carbon monoxide when compared to the vehicle emissions scenario; The replacement of the energy matrix of current thermal power plants for natural gas have showed significant reductions in pollutants analyzed, for instance, 63% reductions of NOx in the contribution of average concentration in the study grid; A significant increase in the concentrations of chemical species was observed in a futuristic scenario, reaching up to a 81% increase in peak concentrations of SO2 in the study area. The spatial distributions of the scenarios have showed that the air pollution plume from Manaus is predominantly west and southwest, where it can reach hundreds of kilometers to areas dominated by original soil covering.
Resumo:
In a microscopic setting, humans behave in rich and unexpected ways. In a macroscopic setting, however, distinctive patterns of group behavior emerge, leading statistical physicists to search for an underlying mechanism. The aim of this dissertation is to analyze the macroscopic patterns of competing ideas in order to discern the mechanics of how group opinions form at the microscopic level. First, we explore the competition of answers in online Q&A (question and answer) boards. We find that a simple individual-level model can capture important features of user behavior, especially as the number of answers to a question grows. Our model further suggests that the wisdom of crowds may be constrained by information overload, in which users are unable to thoroughly evaluate each answer and therefore tend to use heuristics to pick what they believe is the best answer. Next, we explore models of opinion spread among voters to explain observed universal statistical patterns such as rescaled vote distributions and logarithmic vote correlations. We introduce a simple model that can explain both properties, as well as why it takes so long for large groups to reach consensus. An important feature of the model that facilitates agreement with data is that individuals become more stubborn (unwilling to change their opinion) over time. Finally, we explore potential underlying mechanisms for opinion formation in juries, by comparing data to various types of models. We find that different null hypotheses in which jurors do not interact when reaching a decision are in strong disagreement with data compared to a simple interaction model. These findings provide conceptual and mechanistic support for previous work that has found mutual influence can play a large role in group decisions. In addition, by matching our models to data, we are able to infer the time scales over which individuals change their opinions for different jury contexts. We find that these values increase as a function of the trial time, suggesting that jurors and judicial panels exhibit a kind of stubbornness similar to what we include in our model of voting behavior.
Resumo:
Laplacian-based descriptors, such as the Heat Kernel Signature and the Wave Kernel Signature, allow one to embed the vertices of a graph onto a vectorial space, and have been successfully used to find the optimal matching between a pair of input graphs. While the HKS uses a heat di↵usion process to probe the local structure of a graph, the WKS attempts to do the same through wave propagation. In this paper, we propose an alternative structural descriptor that is based on continuoustime quantum walks. More specifically, we characterise the structure of a graph using its average mixing matrix. The average mixing matrix is a doubly-stochastic matrix that encodes the time-averaged behaviour of a continuous-time quantum walk on the graph. We propose to use the rows of the average mixing matrix for increasing stopping times to develop a novel signature, the Average Mixing Matrix Signature (AMMS). We perform an extensive range of experiments and we show that the proposed signature is robust under structural perturbations of the original graphs and it outperforms both the HKS and WKS when used as a node descriptor in a graph matching task.
Resumo:
A non-linear least-squares methodology for simultaneously estimating parameters of selectivity curves with a pre-defined functional form, across size classes and mesh sizes, using catch size frequency distributions, was developed based on the model of Kirkwood and Walker [Kirkwood, G.P., Walker, T.L, 1986. Gill net selectivities for gummy shark, Mustelus antarcticus Gunther, taken in south-eastern Australian waters. Aust. J. Mar. Freshw. Res. 37, 689-697] and [Wulff, A., 1986. Mathematical model for selectivity of gill nets. Arch. Fish Wiss. 37, 101-106]. Observed catches of fish of size class I in mesh m are modeled as a function of the estimated numbers of fish of that size class in the population and the corresponding selectivities. A comparison was made with the maximum likelihood methodology of [Kirkwood, G.P., Walker, T.I., 1986. Gill net selectivities for gummy shark, Mustelus antarcticus Gunther, taken in south-eastern Australian waters. Aust. J. Mar. Freshw. Res. 37, 689-697] and [Wulff, A., 1986. Mathematical model for selectivity of gill nets. Arch. Fish Wiss; 37, 101-106], using simulated catch data with known selectivity curve parameters, and two published data sets. The estimated parameters and selectivity curves were generally consistent for both methods, with smaller standard errors for parameters estimated by non-linear least-squares. The proposed methodology is a useful and accessible alternative which can be used to model selectivity in situations where the parameters of a pre-defined model can be assumed to be functions of gear size; facilitating statistical evaluation of different models and of goodness of fit. (C) 1998 Elsevier Science B.V.
Resumo:
Species distribution and ecological niche models are increasingly used in biodiversity management and conservation. However, one thing that is important but rarely done is to follow up on the predictive performance of these models over time, to check if their predictions are fulfilled and maintain accuracy, or if they apply only to the set in which they were produced. In 2003, a distribution model of the Eurasian otter (Lutra lutra) in Spain was published, based on the results of a country-wide otter survey published in 1998. This model was built with logistic regression of otter presence-absence in UTM 10 km2 cells on a diverse set of environmental, human and spatial variables, selected according to statistical criteria. Here we evaluate this model against the results of the most recent otter survey, carried out a decade later and after a significant expansion of the otter distribution area in this country. Despite the time elapsed and the evident changes in this species’ distribution, the model maintained a good predictive capacity, considering both discrimination and calibration measures. Otter distribution did not expand randomly or simply towards vicinity areas,m but specifically towards the areas predicted as most favourable by the model based on data from 10 years before. This corroborates the utility of predictive distribution models, at least in the medium term and when they are made with robust methods and relevant predictor variables.
Resumo:
The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements and the finite rank regime. The first one appears since the early birth of spin-glasses. The second one instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establish the information theoretical limits in the reconstruction of a fixed low rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4 instead we study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically to perform matrix factorization through it.
Resumo:
The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given avsample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need of split-merge reversible jump moves typically associated with poor performance, we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes, and study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, extracting the characteristic traits common to all the data-groups, and propose a novel vector autoregressive model to study of growth curves of Singaporean kids. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit-circle.
Resumo:
Congenital muscular dystrophy with laminin α2 chain deficiency (MDC1A) is one of the most severe forms of muscular disease and is characterized by severe muscle weakness and delayed motor milestones. The genetic basis of MDC1A is well known, yet the secondary mechanisms ultimately leading to muscle degeneration and subsequent connective tissue infiltration are not fully understood. In order to obtain new insights into the molecular mechanisms underlying MDC1A, we performed a comparative proteomic analysis of affected muscles (diaphragm and gastrocnemius) from laminin α2 chain-deficient dy(3K)/dy(3K) mice, using multidimensional protein identification technology combined with tandem mass tags. Out of the approximately 700 identified proteins, 113 and 101 proteins, respectively, were differentially expressed in the diseased gastrocnemius and diaphragm muscles compared with normal muscles. A large portion of these proteins are involved in different metabolic processes, bind calcium, or are expressed in the extracellular matrix. Our findings suggest that metabolic alterations and calcium dysregulation could be novel mechanisms that underlie MDC1A and might be targets that should be explored for therapy. Also, detailed knowledge of the composition of fibrotic tissue, rich in extracellular matrix proteins, in laminin α2 chain-deficient muscle might help in the design of future anti-fibrotic treatments. All MS data have been deposited in the ProteomeXchange with identifier PXD000978 (http://proteomecentral.proteomexchange.org/dataset/PXD000978).