973 results for statistical speaker models
Abstract:
For climate risk management, cumulative distribution functions (CDFs) are an important source of information. They are ideally suited to compare probabilistic forecasts of primary data (e.g. rainfall) or secondary data (e.g. crop yields). Summarised as CDFs, such forecasts allow an easy quantitative assessment of possible alternative actions. Although the degree of uncertainty associated with CDF estimation could influence decisions, such information is rarely provided. Hence, we propose Cox-type regression models (CRMs) as a statistical framework for making inferences on CDFs in climate science. CRMs were designed for modelling probability distributions rather than just mean or median values. This makes the approach appealing for risk assessments, where probabilities of extremes are often more informative than measures of central tendency. CRMs are semi-parametric approaches originally designed for modelling risks arising from time-to-event data. Here we extend this original concept beyond time-dependent measures to other variables of interest. We also provide tools for estimating CDFs and surrounding uncertainty envelopes from empirical data. These statistical techniques intrinsically account for non-stationarities in time series that might be the result of climate change. This feature makes CRMs attractive candidates for investigating the feasibility of developing rigorous global circulation model (GCM)-CRM interfaces for the provision of user-relevant forecasts. To demonstrate the applicability of CRMs, we present two examples of El Niño/Southern Oscillation (ENSO)-based forecasts: the onset date of the wet season (Cairns, Australia) and total wet season rainfall (Quixeramobim, Brazil). This study emphasises the methodological aspects of CRMs rather than discussing merits or limitations of the ENSO-based predictors.
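A minimal sketch of this idea, assuming the Python lifelines library as one possible implementation: rainfall plays the role of the "duration" variable and a binary ENSO-phase indicator is the covariate; the data, column names and encoding are invented for illustration and are not the authors' implementation.

```python
# Sketch: estimating a CDF from a Cox-type model with lifelines.
# "Duration" here is wet-season rainfall (mm), not time.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({
    "rainfall_mm": [412.0, 530.5, 298.1, 601.3, 350.0, 477.2],
    "observed":    [1, 1, 1, 1, 1, 1],      # all values fully observed
    "enso_phase":  [0, 1, 0, 1, 0, 1],      # 0 = El Nino, 1 = La Nina (illustrative)
})

cph = CoxPHFitter()
cph.fit(df, duration_col="rainfall_mm", event_col="observed")

# Survival function S(x) per ENSO phase; the CDF is F(x) = 1 - S(x).
new = pd.DataFrame({"enso_phase": [0, 1]})
cdf = 1.0 - cph.predict_survival_function(new)
print(cdf.head())
```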
Abstract:
Aiming to obtain empirical models for the estimation of Syrah leaf area, a set of 210 fruiting shoots was randomly collected during the 2013 growing season in an adult experimental vineyard located in Lisbon, Portugal. Samples of 30 fruiting shoots were taken periodically from the inflorescences visible stage to veraison (7 sampling dates). At the lab, primary and lateral leaves were separated from each shoot and numbered according to node insertion. For each leaf, the length of the central and lateral veins was recorded and the leaf area was then measured with a leaf area meter. For single leaf area estimation, the best statistical model uses the sum of the lengths of the two lateral leaf veins as explanatory variable. For the estimation of leaf area per shoot, we followed the approach of Lopes & Pinto (2005), based on three explanatory variables: the number of primary leaves and the areas of the largest and smallest leaves. The best statistical model for the estimation of primary leaf area per shoot uses a calculated variable obtained by multiplying the average of the largest and smallest primary leaf areas by the number of primary leaves. For lateral leaf area estimation, another model using the same type of calculated variable is presented. All models explain a very high proportion of the variability in leaf area. Our results confirm the previously reported strong importance of the three measured variables (number of leaves and area of the largest and smallest leaf) as predictors of shoot leaf area. The proposed models can be used to accurately predict Syrah primary and lateral leaf area per shoot in any phase of the growing cycle. They are inexpensive, practical, non-destructive methods which do not require specialized staff or expensive equipment.
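A sketch of how such models can be fitted, assuming a simple linear form (the abstract does not state the functional form) and invented measurements:

```python
# Sketch: single-leaf area model using the sum of the two lateral vein
# lengths as the explanatory variable (illustrative data, linear form assumed).
import numpy as np

vein_sum = np.array([8.2, 10.5, 12.1, 14.8, 16.0, 19.3])      # cm
leaf_area = np.array([31.0, 48.7, 62.5, 90.1, 104.2, 148.9])  # cm^2

# Fit leaf_area = a * vein_sum + b by least squares.
a, b = np.polyfit(vein_sum, leaf_area, 1)
r = np.corrcoef(vein_sum, leaf_area)[0, 1]
print(f"area = {a:.2f} * vein_sum + {b:.2f}  (r^2 = {r**2:.3f})")

# Shoot-level sketch following the Lopes & Pinto (2005) idea: mean of the
# largest and smallest leaf areas multiplied by the number of primary leaves.
n_leaves, largest, smallest = 12, 148.9, 31.0
shoot_estimate = n_leaves * (largest + smallest) / 2.0
print(f"estimated primary leaf area per shoot = {shoot_estimate:.0f} cm^2")
```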
Abstract:
The purpose of this research was to apply a test that measures different multiple intelligences to children from two different elementary schools, to determine whether there are differences between the Academicist Pedagogical Model (traditional approach) established by the Costa Rican Ministry of Public Education and the Cognitive Pedagogical Model (MPC) (constructivist approach). A total of 29 boys and 20 girls aged 8 to 12 from two different public schools in Heredia (Laboratorio School and San Isidro School) participated in this study. The instrument used was a Multiple Intelligences Test for school-age children (Vega, 2006), which consists of 15 items subdivided into seven categories: linguistic, logical-mathematical, visual, kinaesthetic, musical, interpersonal, and intrapersonal. Descriptive and inferential statistics (two-way ANOVA) were used for the analysis of the data. Significant differences were found in linguistic intelligence (F = 9.47; p < 0.01) between the MPC school (3.24±1.24 points) and the academicist school (2.31±1.10 points). Differences were also found between sexes (F = 5.26; p < 0.05), between girls (3.25±1.02 points) and boys (2.52±1.30 points). In addition, musical intelligence showed statistically significant differences between sexes (F = 7.97; p < 0.05). In conclusion, the pedagogical models used in Costa Rican public schools must be updated based on the new learning trends.
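For readers unfamiliar with the design, a two-way ANOVA of this kind can be sketched with statsmodels; the scores, factor levels and sample sizes below are illustrative, not the study data.

```python
# Sketch: two-way ANOVA (school model x sex) on an intelligence score.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "score":  [3.5, 3.0, 2.8, 3.6, 2.4, 2.0, 2.9, 1.9, 3.2, 2.6, 2.2, 2.7],
    "school": ["MPC", "MPC", "MPC", "MPC", "MPC", "ACA",
               "ACA", "ACA", "MPC", "ACA", "ACA", "MPC"],
    "sex":    ["F", "M", "F", "F", "M", "M",
               "F", "M", "M", "F", "M", "F"],
})

model = ols("score ~ C(school) * C(sex)", data=df).fit()
# F statistics and p-values for each factor and their interaction.
print(sm.stats.anova_lm(model, typ=2))
```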
Abstract:
Ecological models written in a mathematical language L(M), or model language, with a given style or methodology, can be considered as texts. It is possible to apply statistical linguistic laws to them, and the experimental results demonstrate that a mathematical model behaves in the same way as a literary text in any natural language. Such a text has the following characteristics: (a) the variables, their transformed functions and parameters are the lexic units (LUN) of ecological models; (b) the syllables are constituted by a LUN, or a chain of them, separated by operating or ordering LUNs; (c) the flow equations are words; and (d) the distribution of words (LUN and CLUN) according to their lengths follows a Poisson distribution, in accordance with Chebanov's law. It is founded on Vakar's formula, which is calculated in the same way as the linguistic entropy for L(M). We apply these ideas to practical examples using the MARIOLA model. In this paper we study the problem of the lengths of simple lexic units (LUN), composed lexic units (CLUN) and words of text models, expressing these lengths in numbers of primitive symbols and syllables. The use of these linguistic laws makes it possible to indicate the degree of information given by an ecological model.
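A sketch of the Poisson check behind Chebanov's law, reading the law as "word length minus one is Poisson distributed" (one common formulation, assumed here) with invented counts:

```python
# Sketch: testing whether word lengths (in syllables) of a model text
# follow a shifted Poisson distribution, as Chebanov's law asserts.
import numpy as np
from scipy import stats

# Lengths, in syllables, of the "words" (flow equations) of a model text.
lengths = np.array([1, 2, 2, 3, 1, 4, 2, 3, 2, 1, 5, 3, 2, 2, 4, 3, 1, 2, 3, 2])

# Shifted Poisson: (length - 1) ~ Poisson(lam); the MLE of lam is the mean.
lam = np.mean(lengths - 1)
print(f"lambda = {lam:.3f}")

# Compare observed and expected frequencies for lengths 1..5.
k = np.arange(0, 5)
expected = len(lengths) * stats.poisson.pmf(k, lam)
observed = np.array([(lengths - 1 == i).sum() for i in k])
print("observed:", observed)
print("expected:", np.round(expected, 2))
```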
Abstract:
Suppose two or more variables are jointly normally distributed. If there is a common relationship between these variables, it is important to quantify it through a parameter called the correlation coefficient, which measures its strength, can be used to develop a predictive equation and, ultimately, allows testable conclusions to be drawn about the parent population. This research focuses on the correlation coefficient ρ for the bivariate and trivariate normal distributions under the assumption of equal variances and equal covariances. In particular, we derive the maximum likelihood estimators (MLEs) of the distribution parameters, assuming all of them are unknown, and we study the properties and asymptotic distribution of the MLE ρ̂. Having established this asymptotic normality, we construct confidence intervals for the correlation coefficient ρ and test hypotheses about ρ. Through a series of simulations, the performance of the new estimators was studied and compared with that of estimators already available in the literature. The results indicate that the MLE performs better than, or similarly to, the others.
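The equal-variance bivariate case admits a well-known closed-form MLE; the sketch below computes it on simulated data and builds a rough normal-approximation confidence interval. Both the closed form and the variance approximation (1 − ρ²)²/n are standard textbook expressions assumed for illustration, not necessarily the exact estimators derived in the paper.

```python
# Sketch: MLE of rho for a bivariate normal with equal variances,
# plus a normal-approximation 95% confidence interval.
import numpy as np

rng = np.random.default_rng(0)
rho_true, n = 0.6, 500
cov = [[1.0, rho_true], [rho_true, 1.0]]
x, y = rng.multivariate_normal([0, 0], cov, size=n).T

# Equal-variance MLE: rho_hat = 2*Sxy / (Sxx + Syy) on centred data.
xc, yc = x - x.mean(), y - y.mean()
rho_hat = 2 * np.sum(xc * yc) / (np.sum(xc**2) + np.sum(yc**2))

# Asymptotic CI using Var(rho_hat) ~ (1 - rho^2)^2 / n (illustrative assumption).
se = (1 - rho_hat**2) / np.sqrt(n)
print(f"rho_hat = {rho_hat:.3f}, "
      f"95% CI = ({rho_hat - 1.96*se:.3f}, {rho_hat + 1.96*se:.3f})")
```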
Abstract:
An important aspect of constructing discrete velocity models (DVMs) for the Boltzmann equation is to obtain the right number of collision invariants. It is well known that DVMs can have extra, so-called spurious, collision invariants in addition to the physical ones. A DVM with only physical collision invariants, and so without spurious ones, is called normal. For binary mixtures, the concept of supernormal DVMs was also introduced, meaning that, in addition to the DVM itself being normal, its restriction to any single species is also normal. Here we introduce generalizations of this concept to DVMs for multicomponent mixtures. We also present some general algorithms for constructing such models and give some concrete examples of such constructions. One of our main results is that for any given number of species, and any given rational mass ratios, we can construct a supernormal DVM. The DVMs are constructed in such a way that, for half-space problems such as the Milne and Kramers problems, but also nonlinear ones, we obtain structures similar to those of the classical discrete Boltzmann equation for a single species, and can therefore apply results obtained for the classical Boltzmann equation.
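The counting criterion behind normality can be illustrated numerically: collision invariants form the null space of a collision matrix, so their number equals the number of velocities minus its rank. The toy model below (the plane 4-velocity model with its single admissible collision) is our own illustration, not one of the constructions of the paper.

```python
# Sketch: counting collision invariants of a DVM. A function phi on the
# velocities is a collision invariant iff phi_i + phi_j = phi_k + phi_l
# for every collision (i,j) -> (k,l), i.e. iff phi lies in the null space
# of the collision matrix Lam below.
import numpy as np

velocities = np.array([(1, 0), (-1, 0), (0, 1), (0, -1)])
# Single admissible collision: (1,0)+(-1,0) -> (0,1)+(0,-1)
# (momentum and energy are both conserved).
collisions = [(0, 1, 2, 3)]

Lam = np.zeros((len(collisions), len(velocities)))
for r, (i, j, k, l) in enumerate(collisions):
    Lam[r, [i, j]] += 1.0
    Lam[r, [k, l]] -= 1.0

n_invariants = len(velocities) - np.linalg.matrix_rank(Lam)
print(n_invariants)  # 3: mass and both momentum components; since all
# speeds are equal here, energy is not independent of mass (degenerate model).
```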
Abstract:
This thesis is concerned with change point analysis for time series, i.e. with the detection of structural breaks in time-ordered, random data. This long-standing research field has regained popularity over the last few years and, like statistical analysis in general, is still undergoing a transformation towards high-dimensional problems. We focus on the fundamental »change in the mean« problem and provide extensions of the classical non-parametric Darling-Erdős-type cumulative sum (CUSUM) testing and estimation theory within high-dimensional Hilbert space settings. In the first part we contribute to (long-run) principal component based testing methods for Hilbert space valued time series under a rather broad (abrupt, epidemic, gradual, multiple) change setting and under dependence. For the dependence structure we consider either traditional m-dependence assumptions or more recently developed m-approximability conditions, which cover, e.g., MA, AR and ARCH models. We derive Gumbel and Brownian bridge type approximations of the distribution of the test statistic under the null hypothesis of no change, and consistency conditions under the alternative. A new formulation of the test statistic using projections on subspaces allows us to simplify the standard proof techniques and to weaken common assumptions on the covariance structure. Furthermore, we propose to adjust the principal components by an implicit estimation of a (possible) change direction. This approach adds flexibility to projection based methods, weakens typical technical conditions and provides better consistency properties under the alternative. In the second part we contribute to estimation methods for common changes in the means of panels of Hilbert space valued time series. We analyze weighted CUSUM estimates within a recently proposed »high-dimensional low sample size (HDLSS)« framework, where the sample size is fixed but the number of panels increases. We derive sharp conditions on »pointwise asymptotic accuracy« or »uniform asymptotic accuracy« of those estimates in terms of the weighting function. In particular, we prove that a covariance-based correction of Darling-Erdős-type CUSUM estimates is required to guarantee uniform asymptotic accuracy under moderate dependence conditions within panels, and that these conditions are fulfilled, e.g., by any MA(1) time series. As a counterexample we show that for AR(1) time series close to the non-stationary case the dependence is too strong and uniform asymptotic accuracy cannot be ensured. Finally, we conduct simulations to demonstrate that our results are practically applicable and that our methodological suggestions are advantageous.
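The scalar special case of the CUSUM machinery can be sketched as follows; the data, the naive variance estimate and the argmax estimator are illustrative and much simpler than the Hilbert-space setting of the thesis.

```python
# Sketch: classical CUSUM test and estimator for a single change in the
# mean of a univariate series.
import numpy as np

rng = np.random.default_rng(1)
n, tau = 400, 250
x = rng.normal(0.0, 1.0, n)
x[tau:] += 0.8                       # mean shift after the true change point

s = np.cumsum(x)
k = np.arange(1, n + 1)
cusum = np.abs(s - (k / n) * s[-1])  # |S_k - (k/n) S_n|
sigma = x.std(ddof=1)                # naive scale estimate (inflated by the shift)
stat = cusum.max() / (sigma * np.sqrt(n))
tau_hat = int(cusum.argmax()) + 1    # argmax-based change point estimate

print(f"test statistic = {stat:.3f}, estimated change point = {tau_hat}")
```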
Abstract:
Species distribution and ecological niche models are increasingly used in biodiversity management and conservation. However, an important but rarely performed task is to follow up on the predictive performance of these models over time, to check whether their predictions are fulfilled and remain accurate, or whether they apply only to the dataset from which they were produced. In 2003, a distribution model of the Eurasian otter (Lutra lutra) in Spain was published, based on the results of a country-wide otter survey published in 1998. This model was built with logistic regression of otter presence-absence in 10 × 10 km UTM cells on a diverse set of environmental, human and spatial variables, selected according to statistical criteria. Here we evaluate this model against the results of the most recent otter survey, carried out a decade later and after a significant expansion of the otter distribution area in this country. Despite the time elapsed and the evident changes in this species’ distribution, the model maintained a good predictive capacity, considering both discrimination and calibration measures. Otter distribution did not expand randomly or simply towards neighbouring areas, but specifically towards the areas predicted as most favourable by the model based on data from 10 years before. This corroborates the utility of predictive distribution models, at least in the medium term, when they are built with robust methods and relevant predictor variables.
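A sketch of the general workflow (logistic presence-absence model, discrimination via AUC, a crude calibration table) on simulated data with scikit-learn; the predictors and sample sizes are invented, and this is not the published otter model.

```python
# Sketch: presence-absence distribution model with logistic regression,
# evaluated for discrimination (AUC) and calibration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 1000
X = rng.normal(size=(n, 3))            # e.g. environmental, human, spatial predictors
logit = 0.8 * X[:, 0] - 1.2 * X[:, 1] + 0.3
presence = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression().fit(X, presence)
p = model.predict_proba(X)[:, 1]

print(f"AUC (discrimination): {roc_auc_score(presence, p):.3f}")
# Simple calibration check: observed frequency vs mean prediction by decile.
order = np.argsort(p)
for chunk in np.array_split(order, 10):
    print(f"predicted {p[chunk].mean():.2f}  observed {presence[chunk].mean():.2f}")
```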
Abstract:
The present work aims to provide a deeper understanding of thermally driven turbulence and to address some modelling aspects related to the physics of the flow. For this purpose, two idealized systems are investigated by Direct Numerical Simulation: rotating and non-rotating Rayleigh-Bénard convection. The preliminary study of the flow topologies shows how the coherent structures organise into different patterns depending on the rotation rate. From a statistical perspective, the analysis of the turbulent kinetic energy and temperature variance budgets makes it possible to identify the flow regions where the production, the transport, and the dissipation of turbulent fluctuations occur. To provide a multi-scale description of the flows, a theoretical framework based on the Kolmogorov and Yaglom equations is applied for the first time to Rayleigh-Bénard convection. The analysis shows how the spatial inhomogeneity modulates the dynamics at different scales and wall-distances. Inside the core of the flow, the space of scales can be divided into an inhomogeneity-dominated range at large scales, an inertial-like range at intermediate scales and a dissipative range at small scales. This classic scenario breaks down close to the walls, where the inhomogeneous mechanisms and the viscous/diffusive processes are important at every scale and entail more complex dynamics. The same theoretical framework is extended to the filtered velocity and temperature fields of non-rotating Rayleigh-Bénard convection. The analysis of the filtered Kolmogorov and Yaglom equations reveals the influence of the residual scales on the filtered dynamics both in physical and scale space, highlighting the effect of the relative position between the filter length and the crossover that separates the inhomogeneity-dominated range from the quasi-homogeneous range. The assessment of the filtered and residual physics proves to be instrumental for the correct use of existing Large-Eddy Simulation models and for the development of new ones.
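The basic filtering operation behind the filtered Kolmogorov and Yaglom analysis can be sketched in one dimension; the box filter, the synthetic signal and the energy split below are purely illustrative.

```python
# Sketch: decomposing a 1-D signal into filtered (resolved) and residual
# parts with a box filter, the elementary operation behind filtered-field
# (LES-type) analyses.
import numpy as np

rng = np.random.default_rng(3)
n, width = 2048, 33                    # filter width in grid points
u = np.cumsum(rng.normal(size=n))      # rough synthetic "velocity" signal
u -= u.mean()

kernel = np.ones(width) / width
u_filtered = np.convolve(u, kernel, mode="same")
u_residual = u - u_filtered

print(f"resolved energy fraction: {np.mean(u_filtered**2) / np.mean(u**2):.3f}")
print(f"residual energy fraction: {np.mean(u_residual**2) / np.mean(u**2):.3f}")
```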
Abstract:
Collecting and analysing data is an important element in any field of human activity and research. In sports too, collecting and analysing statistical data is attracting growing interest. Some example use cases are: improving technical/tactical aspects for team coaches, defining game strategies based on the opposing team's play, or evaluating the performance of players. Other advantages relate to making referee decisions more precise and impartial: a wrong decision can change the outcome of an important match. Finally, it can be useful to provide better representations and graphic effects that make the game more engaging for the audience during the match. Nowadays it is possible to delegate this type of task to automatic software systems that use cameras or even hardware sensors to collect images or data and process them. One of the most efficient methods of collecting data is to process the video images of the sporting event through machine learning techniques applied to computer vision. As in other domains in which computer vision can be applied, the main tasks in sports are object detection, player tracking, and pose estimation of athletes. The goal of the present thesis is to apply different CNN models to analyse volleyball matches. Starting from video frames of a volleyball match, we reproduce a bird's eye view of the playing court onto which all the players are projected, also reporting, for each player, the type of action he or she is performing.
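The bird's eye view projection step can be sketched with a planar homography in OpenCV; the court-corner correspondences and player coordinates below are invented, and the thesis pipeline may differ.

```python
# Sketch: projecting player positions from the camera image to a bird's
# eye view of the court with a planar homography.
import numpy as np
import cv2

# Pixel coordinates of the four court corners in the camera frame...
img_pts = np.float32([[320, 220], [960, 215], [1150, 640], [130, 650]])
# ...and the same corners in court coordinates (metres, 9 x 18 court).
court_pts = np.float32([[0, 0], [9, 0], [9, 18], [0, 18]])

H = cv2.getPerspectiveTransform(img_pts, court_pts)

# Feet positions of detected players (e.g. bottom centres of bounding boxes).
players = np.float32([[[400, 500]], [[700, 430]], [[900, 600]]])
court_positions = cv2.perspectiveTransform(players, H)
print(court_positions.reshape(-1, 2))   # player positions on the court plane
```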
Abstract:
The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independently and identically distributed) hypothesis on the noise elements and the finite-rank regime. The first has been present since the early days of spin-glasses. The second concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean-field spin-glass where the couplings are i.i.d. Gaussian random variables. The second amounts to establishing the information-theoretic limits for the reconstruction of a fixed low-rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming noise with an inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4, by contrast, we study the spiked Wigner model where the prior on the signal to be reconstructed is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically possible to perform matrix factorization through it.
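A sketch of the rank-one spiked Wigner model and its spectral (PCA) estimator, which is informative above the BBP threshold λ > 1 for Gaussian noise; this standard baseline is given for illustration and is not the decimation procedure of chapter 6.

```python
# Sketch: spiked Wigner model Y = (lambda/N) x x^T + W / sqrt(N) and
# spectral estimation of the spike via the top eigenvector.
import numpy as np

rng = np.random.default_rng(4)
N, lam = 1000, 2.0
x = rng.choice([-1.0, 1.0], size=N)      # Rademacher spike, |x|^2 = N

G = rng.normal(size=(N, N))
W = (G + G.T) / np.sqrt(2)               # GOE-like symmetric noise
Y = (lam / N) * np.outer(x, x) + W / np.sqrt(N)

eigvals, eigvecs = np.linalg.eigh(Y)
v = eigvecs[:, -1]                       # eigenvector of the largest eigenvalue
overlap = abs(v @ x) / np.sqrt(N)        # non-trivial when lambda > 1
print(f"overlap between estimate and spike: {overlap:.3f}")
```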
Abstract:
The study of random probability measures is a lively research topic that has attracted interest from different fields in recent years. In this thesis, we consider random probability measures in the context of Bayesian nonparametrics, where the law of a random probability measure is used as a prior distribution, and in the context of distributional data analysis, where the goal is to perform inference given a sample from the law of a random probability measure. The contributions contained in this thesis can be subdivided according to three different topics: (i) the use of almost surely discrete repulsive random measures (i.e., whose support points are well separated) for Bayesian model-based clustering, (ii) the proposal of new laws for collections of random probability measures for Bayesian density estimation of partially exchangeable data subdivided into different groups, and (iii) the study of principal component analysis and regression models for probability distributions seen as elements of the 2-Wasserstein space. Specifically, for point (i) above we propose an efficient Markov chain Monte Carlo algorithm for posterior inference, which sidesteps the need for split-merge reversible jump moves, typically associated with poor performance; we propose a model for clustering high-dimensional data by introducing a novel class of anisotropic determinantal point processes; and we study the distributional properties of the repulsive measures, shedding light on important theoretical results which enable more principled prior elicitation and more efficient posterior simulation algorithms. For point (ii) above, we consider several models suitable for clustering homogeneous populations, inducing spatial dependence across groups of data, and extracting the characteristic traits common to all the data-groups, and we propose a novel vector autoregressive model to study the growth curves of Singaporean children. Finally, for point (iii), we propose a novel class of projected statistical methods for distributional data analysis for measures on the real line and on the unit circle.
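For measures on the real line, the 2-Wasserstein geometry reduces to quantile functions, which projected methods of the kind mentioned in point (iii) exploit; a sketch with simulated samples follows (the Gaussian closed form is used only as a sanity check).

```python
# Sketch: 2-Wasserstein distance between two samples on the real line,
# computed through quantile functions.
import numpy as np

rng = np.random.default_rng(5)
a = rng.normal(0.0, 1.0, 800)
b = rng.normal(1.5, 2.0, 800)

# For measures on R: W2^2 = int_0^1 (Fa^{-1}(t) - Fb^{-1}(t))^2 dt;
# with equal-size samples this is approximated by sorted samples.
w2 = np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))

# Closed form for Gaussians: W2^2 = (m1 - m2)^2 + (s1 - s2)^2.
print(f"empirical W2 = {w2:.3f}, "
      f"Gaussian closed form = {np.sqrt(1.5**2 + 1.0**2):.3f}")
```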
Abstract:
In this thesis, we investigate the role of applied physics in epidemiological surveillance through the application of mathematical models, network science and machine learning. The spread of a communicable disease depends on many biological, social, and health factors. The large masses of data available make it possible, on the one hand, to monitor the evolution and spread of pathogenic organisms and, on the other hand, to study the behavior of people, their opinions and habits. Presented here are three lines of research in which an attempt was made to solve real epidemiological problems through data analysis and the use of statistical and mathematical models. In Chapter 1, we applied language-inspired Deep Learning models to transform influenza protein sequences into vectors encoding their information content. We then attempted to reconstruct the antigenic properties of different viral strains using regression models and to identify the mutations responsible for vaccine escape. In Chapter 2, we constructed a compartmental model to describe the spread of a bacterium within a hospital ward. The model was informed and validated on time series of clinical measurements, and a sensitivity analysis was used to assess the impact of different control measures. Finally, in Chapter 3, we reconstructed the network of retweets among COVID-19-themed Twitter users in the early months of the SARS-CoV-2 pandemic. By means of community detection algorithms and centrality measures, we characterized users’ attention shifts in the network, showing that scientific communities, initially the most retweeted, lost influence over time to national political communities. In the Conclusion, we highlight the importance of the work done in light of the main contemporary challenges for epidemiological surveillance. In particular, we present reflections on the importance of nowcasting and forecasting, the relationship between data and scientific research, and the need to unite the different scales of epidemiological surveillance.
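A compartmental model of the kind used in Chapter 2 can be sketched with a generic SIR system; the parameters and structure below are illustrative, not those of the hospital-ward model.

```python
# Sketch: minimal SIR compartmental model integrated with scipy.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    S, I, R = y
    dS = -beta * S * I          # new infections
    dI = beta * S * I - gamma * I
    dR = gamma * I              # recoveries
    return [dS, dI, dR]

beta, gamma = 0.35, 0.1         # transmission and recovery rates (illustrative)
y0 = [0.99, 0.01, 0.0]          # initial S, I, R fractions
sol = solve_ivp(sir, (0, 160), y0, args=(beta, gamma), dense_output=True)

t = np.linspace(0, 160, 9)
S, I, R = sol.sol(t)
print("day  S     I     R")
for row in zip(t, S, I, R):
    print("%4.0f %.3f %.3f %.3f" % row)
```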
Abstract:
Quantum clock models are statistical mechanical spin models which may be regarded as a bridge between the one-dimensional quantum Ising model and the one-dimensional quantum XY model. This thesis aims to provide an exhaustive review of these models using both analytical and numerical techniques. We present some important duality transformations which allow us to recast clock models into different forms, involving for example parafermions and lattice gauge theories. Thus, the notion of topological order enters the game, opening new scenarios for possible applications, such as topological quantum computing. The second part of this thesis is devoted to the numerical analysis of clock models. We explore their phase diagram under different setups, with and without chirality, starting with a transverse field and then adding a longitudinal field as well. The most important observables we take into account for diagnosing criticality are the energy gap, the magnetisation, the entanglement entropy and the correlation functions.
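The gap computation can be sketched for the transverse-field Ising chain, the simplest member of this family, by exact diagonalization; the lattice size and couplings below are illustrative.

```python
# Sketch: energy gap of the 1-D transverse-field Ising chain by exact
# diagonalization with sparse matrices. At h/J = 1 the gap closes in
# the thermodynamic limit.
import numpy as np
from scipy.sparse import csr_matrix, identity, kron
from scipy.sparse.linalg import eigsh

sx = csr_matrix(np.array([[0., 1.], [1., 0.]]))
sz = csr_matrix(np.array([[1., 0.], [0., -1.]]))

def site_op(single, site, L):
    # Embed a single-site operator at position `site` in an L-site chain.
    out = identity(1, format="csr")
    for s in range(L):
        out = kron(out, single if s == site else identity(2, format="csr"),
                   format="csr")
    return out

def tfi_gap(L, J=1.0, h=0.5):
    dim = 2 ** L
    H = csr_matrix((dim, dim))
    for i in range(L - 1):
        H = H - J * (site_op(sz, i, L) @ site_op(sz, i + 1, L))
    for i in range(L):
        H = H - h * site_op(sx, i, L)
    e = np.sort(eigsh(H, k=2, which="SA", return_eigenvectors=False))
    return e[1] - e[0]

# Note: for h < J the lowest "gap" is the exponentially small splitting
# between the two quasi-degenerate ferromagnetic ground states.
for h in (0.5, 1.0, 1.5):
    print(f"h = {h}: gap = {tfi_gap(10, h=h):.4f}")
```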
Abstract:
To compare time to and risk of biochemical recurrence (BR) after radical prostatectomy in two chronologically different groups of patients, using the standard and the modified Gleason system (MGS). Cohort 1 comprised biopsies of 197 patients graded according to the standard Gleason system (SGS) in the period 1997/2004, and cohort 2 comprised biopsies of 176 patients graded according to the modified system in the period 2005/2011. Time to BR was analyzed with Kaplan-Meier product-limit analysis, and prediction of shorter time to recurrence with univariate and multivariate Cox proportional hazards models. Patients in cohort 2 reflected time-related changes: a striking increase in clinical stage T1c, systematic use of extended biopsies, and a lower percentage of total cancer length in millimeters across all cores. The MGS used in cohort 2 yielded fewer biopsies with Gleason score ≤ 6 and more biopsies with the intermediate Gleason score 7. Kaplan-Meier curves of time to BR showed statistical significance for the MGS in cohort 2, but not for the SGS in cohort 1. Only the MGS predicted shorter time to BR on univariate analysis, and on multivariate analysis it was an independent predictor. The results favor the view that the 2005 International Society of Urological Pathology modified system is a refinement of Gleason grading and is valuable for contemporary clinical practice.
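A sketch of the Kaplan-Meier and log-rank comparison with the lifelines library; the durations, events and group labels below are invented, not the study data.

```python
# Sketch: Kaplan-Meier curves and a log-rank test for time to biochemical
# recurrence in two grading cohorts.
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.DataFrame({
    "months":   [12, 30, 45, 60, 24, 80, 15, 33, 70, 90, 20, 55],
    "recurred": [1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0],
    "cohort":   ["SGS"] * 6 + ["MGS"] * 6,
})

for name, grp in df.groupby("cohort"):
    km = KaplanMeierFitter()
    km.fit(grp["months"], event_observed=grp["recurred"], label=name)
    print(km.survival_function_.tail(1))   # survival at the last event time

g1, g2 = df[df.cohort == "SGS"], df[df.cohort == "MGS"]
res = logrank_test(g1["months"], g2["months"],
                   event_observed_A=g1["recurred"],
                   event_observed_B=g2["recurred"])
print(f"log-rank p-value: {res.p_value:.3f}")
```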