902 results for Markov chains hidden Markov models Viterbi algorithm Forward-Backward algorithm maximum likelihood
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables that are not directly involved in clustering the data. An approach is proposed in the model-based clustering context to select a number of clusters that both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data by maximum likelihood, excluding the external variables, which are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.
Abstract:
Hyperspectral remote sensing exploits the electromagnetic scattering patterns of different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixture of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in recent years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the corresponding abundance fractions.
Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24, 25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known, and hyperspectral unmixing then falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which does not hold for hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises the applicability of ICA to hyperspectral images, as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors by an unmixing matrix that minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene lie in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. The minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures.
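The linear mixing model described above can be illustrated with a minimal sketch. The function names (`mix_pixel`, `unmix_two`) and the three-band spectra are hypothetical and chosen only for illustration; the two-endmember least-squares step is a toy case of unmixing with known signatures and a sum-to-one constraint, not any of the cited methods.

```python
def mix_pixel(endmembers, abundances):
    """Linear mixing model: pixel = sum_j abundances[j] * endmembers[j]."""
    if len(endmembers) != len(abundances):
        raise ValueError("one abundance fraction per endmember is required")
    if any(a < 0 for a in abundances) or abs(sum(abundances) - 1.0) > 1e-9:
        raise ValueError("abundances must be non-negative and sum to one")
    n_bands = len(endmembers[0])
    return [sum(a * m[b] for a, m in zip(abundances, endmembers))
            for b in range(n_bands)]


def unmix_two(endmembers, pixel):
    """Least-squares abundances for exactly two known endmembers.

    With a0 + a1 = 1, the model reads pixel - m1 = a0 * (m0 - m1),
    so a0 is a one-dimensional least-squares projection coefficient.
    """
    m0, m1 = endmembers
    d = [x - y for x, y in zip(m0, m1)]
    r = [p - y for p, y in zip(pixel, m1)]
    a0 = sum(di * ri for di, ri in zip(d, r)) / sum(di * di for di in d)
    return [a0, 1.0 - a0]


# Made-up reflectance spectra in three bands (illustrative values only).
signatures = [[0.1, 0.4, 0.8], [0.6, 0.5, 0.2]]
pixel = mix_pixel(signatures, [0.3, 0.7])
estimate = unmix_two(signatures, pixel)
print(pixel, estimate)
```

Because the abundance fractions always sum to one, knowing one fraction fixes the other in this two-endmember case, which is the statistical dependence that undermines the ICA independence assumption.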
The MVT type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of $n$ data points in a $d$-dimensional space with a computational complexity of $O(n^{\lfloor d/2 \rfloor + 1})$, where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$ and $n$ is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log( ) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requirement that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. The PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (a large number of random vectors) [35, 42, 43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative count records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. The N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data.
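The skewer-projection step of PPI described above can be sketched as follows. This is a simplified illustration under stated assumptions: the MNF preprocessing step is omitted, the data are noise-free, and the function name `pixel_purity_scores`, the skewer count, and the spectra are all hypothetical.

```python
import random


def pixel_purity_scores(pixels, n_skewers=500, seed=0):
    """Count how often each pixel is an extreme of a random projection."""
    rng = random.Random(seed)
    n_bands = len(pixels[0])
    scores = [0] * len(pixels)
    for _ in range(n_skewers):
        # A skewer is a random direction in spectral space.
        skewer = [rng.gauss(0.0, 1.0) for _ in range(n_bands)]
        proj = [sum(s * x for s, x in zip(skewer, p)) for p in pixels]
        # Both extremes of the projection are recorded.
        scores[proj.index(max(proj))] += 1
        scores[proj.index(min(proj))] += 1
    return scores


# Three made-up endmember signatures, present in the data as pure pixels.
endmembers = [[1.0, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.3, 0.8]]
# Mixed pixels built from abundance fractions that sum to one.
abundances = [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3], [0.1, 0.7, 0.2]]
mixed = [[sum(a * m[b] for a, m in zip(ab, endmembers)) for b in range(3)]
         for ab in abundances]
pixels = endmembers + mixed

scores = pixel_purity_scores(pixels)
print(scores)
```

In this noise-free setting the mixed pixels lie strictly inside the simplex, so only the three pure pixels accumulate counts. With real data, noise and the absence of pure pixels blur this separation.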
ORASIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptive learner, demixer, knowledge base or spectral library, and spatial postprocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given threshold. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalization. The selected vectors are then projected onto this subspace and a simplex is found by an MVT process. ORASIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating signal and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selecting the set of signal eigenvalues that best represents the data, in the least-squares sense [48, 49]; we note, however, that the proposed algorithm, vertex component analysis (VCA), works with both projected and unprojected data. The extraction of the endmembers exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. Like the PPI and N-FINDR algorithms, VCA also assumes the presence of pure pixels in the data.
The algorithm iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme of the projection. The algorithm iterates until all endmembers are exhausted. VCA performs much better than PPI and better than or comparable to N-FINDR; yet it has a computational complexity between one and two orders of magnitude lower than N-FINDR. The chapter is structured as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.
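The iterative orthogonal projection described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes noise-free data, a known number of endmembers, and pure pixels present in the data, and it skips the signal-subspace estimation step. The function name `extract_endmembers` and the four-band spectra are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_components(v, basis):
    """Subtract from v its components along an orthonormal basis."""
    for q in basis:
        c = dot(v, q)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return v


def extract_endmembers(pixels, n_endmembers, seed=0):
    """Return indices of pixels picked as endmembers by iterative projection."""
    rng = random.Random(seed)
    n_bands = len(pixels[0])
    basis = []   # orthonormal basis of the span of the endmembers found so far
    found = []
    for _ in range(n_endmembers):
        # Random direction made orthogonal to the endmembers already found.
        f = remove_components([rng.gauss(0.0, 1.0) for _ in range(n_bands)], basis)
        # The extreme of the projection is taken as the new endmember.
        proj = [abs(dot(f, p)) for p in pixels]
        k = proj.index(max(proj))
        found.append(k)
        # Gram-Schmidt step: extend the basis with the new endmember.
        v = remove_components(list(pixels[k]), basis)
        norm = dot(v, v) ** 0.5
        basis.append([vi / norm for vi in v])
    return found


# Made-up, linearly independent signatures in four bands, followed by mixtures.
endmembers = [[0.9, 0.1, 0.2, 0.3], [0.2, 0.8, 0.1, 0.4], [0.1, 0.3, 0.7, 0.6]]
abundances = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.34, 0.33, 0.33]]
mixed = [[sum(a * m[b] for a, m in zip(ab, endmembers)) for b in range(4)]
         for ab in abundances]
pixels = endmembers + mixed

found = extract_endmembers(pixels, 3)
print(sorted(found))
```

Each iteration costs one pass over the pixels, which is consistent with the low computational complexity the abstract claims relative to volume-based searches.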
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Submitted in partial fulfillment of the requirements for the degree of PhD in Mathematics, in the speciality of Statistics, at the Faculdade de Ciências e Tecnologia
Abstract:
Digital Businesses have become a major driver of economic growth and have seen an explosion of new startups. At the same time, the sector also includes mature enterprises that have become global giants in a relatively short period of time. Digital Businesses have unique characteristics that make running and managing a Digital Business very different from running a traditional offline business. Digital Businesses respond to online users who are highly interconnected and networked. This enables a rapid flow of word of mouth, at a pace far greater than ever envisioned when dealing with traditional products and services. The relatively low cost of adding incremental users has led to a variety of innovations in the pricing of digital products, including various forms of free and freemium pricing models. This thesis explores the unique characteristics and complexities of Digital Businesses and their implications for the design of Digital Business Models and Revenue Models. The thesis proposes an Agent Based Modeling Framework that can be used to develop Simulation Models that simulate the complex dynamics of Digital Businesses and the interactions between users of a digital product. Such Simulation Models can be used for a variety of purposes, such as simple forecasting, analysing the impact of market disturbances, analysing the impact of changes in pricing models, and optimising the pricing for maximum revenue generation or for a balance between growth in usage and revenue generation. These models can be developed for a mature enterprise with a large historical record of user growth as well as for early stage enterprises without much historical data. Through three case studies, the thesis demonstrates the applicability of the Framework and its potential applications.
Abstract:
Pressures on the Brazilian Amazon forest have been accentuated by agricultural activities practiced by families encouraged to settle in this region in the 1970s by the colonization program of the government. The aim of this study was to analyze the temporal and spatial evolution of land cover and land use (LCLU) in the lower Tapajós region, in the state of Pará. We contrast 11 watersheds that are generally representative of the colonization dynamics in the region. For this purpose, Landsat satellite images from three different years, 1986, 2001, and 2009, were analyzed with Geographic Information Systems. Individual images were subjected to an unsupervised classification using the Maximum Likelihood Classification algorithm available in GRASS. The classes retained for the representation of LCLU in this study were: (1) slightly altered old-growth forest, (2) succession forest, (3) crop land and pasture, and (4) bare soil. The analysis and observation of general trends in the 11 watersheds show that LCLU is changing very rapidly. The average deforestation of old-growth forest in all the watersheds was estimated at more than 30% for the period from 1986 to 2009. The local-scale analysis of watersheds reveals the complexity of LCLU, notably in relation to large changes in the temporal and spatial evolution of watersheds. Proximity to the sprawling city of Itaituba is related to the highest rate of deforestation in two watersheds. The opening of roads such as the Transamazonian highway is associated with the second highest rate of deforestation in three watersheds.
Abstract:
Here we focus on factor analysis from a best practices point of view, by investigating the factor structure of neuropsychological tests and using the results obtained to illustrate how to choose a reasonable solution. The sample (n=1051 individuals) was randomly divided into two groups: one for exploratory factor analysis (EFA) and principal component analysis (PCA), to investigate the number of factors underlying the neurocognitive variables; the second to test the "best fit" model via confirmatory factor analysis (CFA). For the exploratory step, three extraction methods (maximum likelihood, principal axis factoring and principal components) and two rotation methods (orthogonal and oblique) were used. The analysis methodology made it possible to explore how different cognitive/psychological tests correlated/discriminated between dimensions, indicating that to capture latent structures in similar sample sizes and measures, with approximately normal data distribution, reflective models with oblimin rotation might prove the most adequate.
Abstract:
Ever since the appearance of the ARCH model [Engle (1982a)], an impressive array of variance specifications belonging to the same class of models has emerged [e.g., Bollerslev's (1986) GARCH; Nelson's (1990) EGARCH]. This recent field has developed very successfully. Nevertheless, several empirical studies seem to show that the performance of such models is not always appropriate [Boulier (1992)]. In this paper we propose a new specification: the Quadratic Moving Average Conditional Heteroskedasticity (QMACH) model. Its statistical properties, such as the kurtosis and the symmetry, as well as two estimators (Method of Moments and Maximum Likelihood), are studied. Two statistical tests are presented: the first tests for homoskedasticity and the second discriminates between the ARCH and QMACH specifications. A Monte Carlo study is presented in order to illustrate some of the theoretical results. An empirical study is undertaken for the DM-US exchange rate.
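As background for the class of models this abstract builds on, the following minimal sketch simulates a standard ARCH(1) process, in which the conditional variance is a0 + a1 times the squared previous innovation. It does not implement the QMACH specification, which the abstract does not define. The function names and parameter values are assumptions chosen for illustration.

```python
import random


def simulate_arch1(n, a0=0.2, a1=0.5, seed=0):
    """Simulate eps_t = sigma_t * z_t with sigma_t^2 = a0 + a1 * eps_{t-1}^2."""
    rng = random.Random(seed)
    eps_prev = 0.0
    series = []
    for _ in range(n):
        sigma2 = a0 + a1 * eps_prev ** 2   # conditional variance
        eps_prev = (sigma2 ** 0.5) * rng.gauss(0.0, 1.0)
        series.append(eps_prev)
    return series


def sample_kurtosis(x):
    """Fourth central moment divided by the squared variance."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


series = simulate_arch1(20000)
kurt = sample_kurtosis(series)
print(kurt)
```

Although each innovation is conditionally Gaussian, the simulated series has heavier tails than a Gaussian (kurtosis above 3), which is why kurtosis is a natural property to study for any specification in this family.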
Abstract:
Background and aims: Family-centred care is an expected standard in PICU, but parent-reported outcomes are rarely measured. The Dutch validated EMPATHIC questionnaire provides accurate measures of parental perceptions of family-centred care in PICU. A French version would provide an important resource for quality control and benchmarking with other PICUs. The study aimed to translate the EMPATHIC questionnaire into French and to assess its cultural adaptation. Methods: In September 2012, following approval from the developer, translation and cultural adaptation were performed using a structured method (Wild et al. 2005). This included forward-backward translation and reconciliation by an official translator, harmonization assessed by the research team, and cognitive debriefing with the target user population. In this last step, a convenience sample of parents with PICU experience assessed the comprehensibility and cultural relevance of the 65-item French EMPATHIC questionnaire. The PICUs in Lausanne, Switzerland and Lille, France participated. Results: Seventeen parents, including 13 native French speakers and 4 speakers of French as a second language, tested the cognitive equivalence and cultural relevance of the French EMPATHIC questionnaire. The mean agreement for comprehensibility of all 65 items reached 90.2%. Three items fell below the cut-off of 80% agreement and were revised for inclusion in the final French version. Conclusions: The translation and cultural adaptation made it possible to highlight a few cultural differences that did not interfere with the main construct of the EMPATHIC questionnaire. Reliability and validity testing with a new sample of parents is needed to strengthen the psychometric properties of the French EMPATHIC questionnaire.
Abstract:
It has been argued that by truncating the sample space of the negative binomial and of the inverse Gaussian-Poisson mixture models at zero, one is allowed to extend the parameter space of the model. Here that is proved to be the case for the more general three-parameter Tweedie-Poisson mixture model. It is also proved that the distributions in the extended part of the parameter space are not the zero truncation of mixed Poisson distributions and that, other than for the negative binomial, they are not mixtures of zero-truncated Poisson distributions either. By extending the parameter space one can improve the fit when the frequency of one is larger and the right tail is heavier than is allowed by the unextended model. Considering the extended model also allows one to use the basic maximum likelihood based inference tools when parameter estimates fall in the extended part of the parameter space, and hence when the m.l.e. does not exist under the unextended model. This extended truncated Tweedie-Poisson model is shown to be useful in the analysis of word and species frequency count data.
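Zero truncation, the operation this abstract relies on, can be sketched for a generic count distribution: the probability of zero is removed and the remaining probabilities are rescaled by 1 - P(X = 0). The example below applies it to a plain Poisson distribution only; it does not implement the Tweedie-Poisson mixture or its extended parameter space. The function names are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability mass function."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zero_truncated_pmf(pmf, y, *params):
    """P(Y = y) = P(X = y) / (1 - P(X = 0)) for y >= 1, and 0 otherwise."""
    if y < 1:
        return 0.0
    return pmf(y, *params) / (1.0 - pmf(0, *params))


# The truncated probabilities over y = 1, 2, ... should sum to one.
total = sum(zero_truncated_pmf(poisson_pmf, y, 2.0) for y in range(1, 60))
print(total)
```

Count data such as word or species frequencies record only items observed at least once, which is why the zero class is unobservable and a truncated model is the natural choice.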
Abstract:
Background and Aims: The spatial separation of stigmas and anthers (herkogamy) in flowering plants functions to reduce self-pollination and avoid interference between pollen dispersal and receipt. Little is known about the evolutionary relationships among the three main forms of herkogamy (approach, reverse and reciprocal herkogamy, the last also known as distyly) or about transitions to and from a non-herkogamous condition. This problem was examined in Exochaenium (Gentianaceae), a genus of African herbs that exhibits considerable variation in floral morphology, including the three forms of herkogamy. Methods: Using maximum parsimony and maximum likelihood methods, the evolutionary history of herkogamic and non-herkogamic conditions was reconstructed from a molecular phylogeny of 15 species of Exochaenium and four outgroup taxa, based on three chloroplast regions, the nuclear ribosomal internal transcribed spacer (ITS1 and 2) and the 5.8S gene. Ancestral character states were determined and the reconstructions were used to evaluate competing models for the origin of reciprocal herkogamy. Key results: Reciprocal herkogamy originated once in Exochaenium from an ancestor with approach herkogamy. Reverse herkogamy and the non-herkogamic condition homostyly were derived from heterostyly. Distylous species possessed pendent, slightly zygomorphic flowers, and the single transition to reverse herkogamy was associated with the hawkmoth pollination syndrome. Reductions in flower size characterized three of four independent transitions from reciprocal herkogamy to homostyly. Conclusions: The results support Lloyd and Webb's model in which distyly originated from an ancestor with approach herkogamy. They also demonstrate the lability of sex organ deployment and implicate pollinators, or their absence, as playing an important role in driving transitions among herkogamic and non-herkogamic conditions.
Abstract:
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationship among the P. falciparum SERAs, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion (the implied mutational pattern at two key active sites in the SERA proteins) were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.
Abstract:
Understanding the different background landscapes in which malaria transmission occurs is fundamental to understanding malaria epidemiology and to designing effective local malaria control programs. Geology, geomorphology, vegetation, climate, land use, and anopheline distribution were used as a basis for an ecological classification of the state of Roraima, Brazil, in the northern Amazon Basin, focused on the natural history of malaria and transmission. We applied unsupervised maximum likelihood classification, principal components analysis, and weighted overlay with equal contribution to fine-scale thematic maps, which resulted in clustered regions. We used ecological niche modeling techniques to develop a fine-scale picture of malaria vector distributions in the state. Eight ecoregions were identified, including 5 types of dense tropical rain forest and 3 types of savannah, and malaria-related aspects are discussed based on this classification. Ecoregions formed by dense tropical rain forest were named montane (ecoregion I), submontane (II), plateau (III), lowland (IV), and alluvial (V). Ecoregions formed by savannah were divided into steppe (VI, campos de Roraima), savannah (VII, cerrado), and wetland (VIII, campinarana). Such ecoregional mappings are important tools in integrated malaria control programs that aim to identify specific characteristics of malaria transmission, classify transmission risk, and define priority areas and appropriate interventions. For some areas, extension of these approaches to still-finer resolutions will provide an improved picture of malaria transmission patterns.