926 results for maximum-likelihood approach
Abstract:
This paper proposes an optimization relaxation approach based on the analogue Hopfield Neural Network (HNN) for cluster refinement of pre-classified Polarimetric Synthetic Aperture Radar (PolSAR) image data. We consider the initial classification provided by the maximum-likelihood classifier based on the complex Wishart distribution, which is then supplied to the HNN optimization approach. The goal is to improve the classification results obtained by the Wishart approach. The classification improvement is verified by computing a cluster separability coefficient and a measure of homogeneity within the clusters. During the HNN optimization process, for each iteration and for each pixel, two consistency coefficients are computed, taking into account two types of relations between the pixel under consideration and its corresponding neighbors. Based on these coefficients and on the information coming from the pixel itself, the pixel under study is re-classified. Different experiments are carried out to verify that the proposed approach outperforms other strategies, achieving the best results in terms of separability together with a trade-off with homogeneity that preserves relevant structures in the image. Performance is also measured in terms of central processing unit (CPU) time.
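For context, a minimal sketch of the standard complex Wishart distance, d(X, V) = ln|V| + tr(V⁻¹X), that maximum-likelihood PolSAR classifiers of this kind conventionally minimize; the paper's HNN refinement stage itself is not reproduced here, and the matrices below are hypothetical pixel and class-centre coherency matrices.

```python
import numpy as np

def wishart_distance(X, V):
    """Wishart distance between a pixel coherency matrix X and a
    class-centre coherency matrix V: ln|V| + tr(V^{-1} X)."""
    _, logdet = np.linalg.slogdet(V)
    return logdet + np.trace(np.linalg.solve(V, X)).real

def classify_pixel(X, class_centres):
    """Assign the pixel to the class with the smallest Wishart distance."""
    d = [wishart_distance(X, V) for V in class_centres]
    return int(np.argmin(d))

# Hypothetical 2x2 Hermitian positive-definite matrices for illustration:
X = np.array([[2.0, 0.3 + 0.1j], [0.3 - 0.1j, 1.0]])
centres = [np.eye(2), np.array([[2.1, 0.2], [0.2, 0.9]])]
print(classify_pixel(X, centres))
```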
Abstract:
Wireless sensor networks are posed as the new communication paradigm where the use of small, low-complexity, and low-power devices is preferred over costly centralized systems. The spectrum of potential applications of sensor networks is very wide, including monitoring, surveillance, and localization, among others. Localization is a key application in sensor networks, and the use of simple, efficient, and distributed algorithms is of paramount practical importance. Combining convex optimization tools with consensus algorithms, we propose a distributed localization algorithm for scenarios where received signal strength indicator readings are used. We approach the localization problem by formulating an alternative problem that uses distance estimates locally computed at each node. The formulated problem is solved in a relaxed version using a semidefinite relaxation technique. Conditions under which the relaxed problem yields the same solution as the original problem are given, and a distributed consensus-based implementation of the algorithm is proposed based on an augmented Lagrangian approach and primal-dual decomposition methods. Although suboptimal, the proposed approach is well suited for implementation in real sensor networks, i.e., it is scalable, robust against node failures, and requires only local communication among neighboring nodes. Simulation results show that running an additional local search around the found solution can yield performance close to the maximum likelihood estimate.
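As a simplified, centralized illustration of the underlying localization problem (no SDP relaxation or consensus step, which are the paper's contributions): given distance estimates derived from RSSI readings to anchors at known positions, a node's position can be found by nonlinear least squares. Anchor positions and distances below are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def localize(anchors, dists, x0):
    """Estimate a node position from anchor positions and
    RSSI-derived distance estimates via nonlinear least squares."""
    def residuals(x):
        # mismatch between modelled and estimated anchor distances
        return np.linalg.norm(anchors - x, axis=1) - dists
    return least_squares(residuals, x0).x

anchors = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
dists = np.array([7.1, 7.0, 7.2])   # hypothetical noisy estimates
print(localize(anchors, dists, x0=np.array([5.0, 5.0])))
```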
Abstract:
The wide range of morphological variation in the “loxurina group” makes taxon identification difficult, and despite several reviews, serious taxonomic confusion remains. We make use of DNA data in conjunction with morphological appearance and available information on species distribution to delimit the boundaries of the “loxurina” group species previously established based on morphology. A fragment of 635 base pairs within the mtDNA gene cytochrome oxidase I (COI) was analysed for seven species of the “loxurina group”. Phylogenetic relationships among the included taxa were inferred using maximum parsimony and maximum likelihood methods. Penaincisalia sigsiga (Bálint et al.), P. cillutincarae (Draudt), P. atymna (Hewitson) and P. loxurina (C. Felder & R. Felder) were easily delimited, as the morphological, geographic and molecular data were congruent. Penaincisalia ludovica (Bálint & Wojtusiak) and P. loxurina astillero (Johnson) represent the same entity and constitute a subspecies of P. loxurina. However, incongruence among morphological, genetic, and geographic data is shown in P. chachapoya (Bálint & Wojtusiak) and P. tegulina (Bálint et al.). Our results highlight that an integrative approach is needed to clarify the taxonomy of these neotropical taxa, but more genetic and geographical studies are still required.
Abstract:
In simultaneous analyses of multiple data partitions, the trees relevant when measuring support for a clade are the optimal tree and the best tree lacking the clade (i.e., the most reasonable alternative). The parsimony-based method of partitioned branch support (PBS) forces each data set to arbitrate between the two relevant trees. This value is the amount each data set contributes to clade support in the combined analysis, and it can be very different from the support apparent in separate analyses. The approach used in PBS can also be employed in a likelihood framework: a simultaneous analysis of all data retrieves the maximum likelihood tree, and the best tree without the clade of interest is also found. Each data set is fitted to the two trees and the log-likelihood difference calculated, giving partitioned likelihood support (PLS) for each data set. These calculations can be performed regardless of the complexity of the ML model adopted. The significance of PLS can be evaluated using a variety of resampling methods, such as the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, or likelihood weights, although the appropriateness and assumptions of these tests remain debated.
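The PLS bookkeeping itself is a simple difference; a minimal sketch, assuming the per-partition log-likelihoods on the two relevant trees have already been computed with an external phylogenetics program (the partition names and values below are hypothetical). A positive value means the partition supports the clade; a negative value means it conflicts.

```python
def partitioned_likelihood_support(lnL_opt, lnL_alt):
    """PLS per data partition: log-likelihood on the ML tree minus
    log-likelihood on the best tree lacking the clade of interest.
    lnL_opt, lnL_alt: dicts mapping partition name -> log-likelihood."""
    return {p: lnL_opt[p] - lnL_alt[p] for p in lnL_opt}

# Hypothetical per-partition scores from a prior ML analysis:
pls = partitioned_likelihood_support(
    {"COI": -1502.3, "28S": -890.1},
    {"COI": -1507.9, "28S": -888.4},
)
print(pls)  # COI supports the clade (+5.6); 28S conflicts (-1.7)
```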
Abstract:
Objective: Inpatient length of stay (LOS) is an important measure of hospital activity, health care resource consumption, and patient acuity. This research aims to develop an incremental expectation-maximization (EM) based learning approach for a mixture-of-experts (ME) system for on-line prediction of LOS. The use of a batch-mode learning process in most existing artificial neural networks to predict LOS is unrealistic, as the data become available over time and their patterns change dynamically. In contrast, an on-line process is capable of providing an output whenever a new datum becomes available. This on-the-spot information is therefore more useful and practical for making decisions, especially when one deals with a tremendous amount of data. Methods and material: The proposed approach is illustrated using a real example of gastroenteritis LOS data. The data set was extracted from a retrospective cohort study on all infants born in 1995-1997 and their subsequent admissions for gastroenteritis. The total number of admissions in this data set was n = 692. Linked hospitalization records of the cohort were retrieved retrospectively to derive the outcome measure, patient demographics, and associated co-morbidities information. A comparative study of the incremental learning and the batch-mode learning algorithms is considered. The performances of the learning algorithms are compared based on the mean absolute difference (MAD) between the predictions and the actual LOS, and the proportion of predictions with MAD < 1 day (Prop(MAD < 1)). The significance of the comparison is assessed through a regression analysis. Results: The incremental learning algorithm provides better on-line prediction of LOS when the system has gained sufficient training from more examples (MAD = 1.77 days and Prop(MAD < 1) = 54.3%), compared to that using batch-mode learning. The regression analysis indicates a significant decrease of MAD (p-value = 0.063) and a significant (p-value = 0.044) increase of Prop(MAD < 1).
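The two evaluation metrics reduce to straightforward array arithmetic; a minimal sketch, assuming predictions and actual LOS are both given in days (values below are hypothetical, not the study's data).

```python
import numpy as np

def evaluate_los(predicted, actual):
    """Mean absolute difference (days) and proportion of
    predictions within one day of the actual LOS."""
    ad = np.abs(np.asarray(predicted) - np.asarray(actual))
    return ad.mean(), (ad < 1.0).mean()

mad, prop = evaluate_los([2.5, 1.0, 4.2], [2.0, 1.3, 6.0])
print(f"MAD = {mad:.2f} days, Prop(MAD<1) = {prop:.1%}")
```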
Abstract:
This work represents an original contribution to the methodology for ecosystem model development, as well as the first attempt at an end-to-end (E2E) model of the Northern Humboldt Current Ecosystem (NHCE). The main purpose of the developed model is to build a tool for ecosystem-based management and decision making, which is why the credibility of the model is essential; this can be assessed through confrontation with data. Additionally, the NHCE exhibits high climatic and oceanographic variability at several scales, the major source of interannual variability being the interruption of the upwelling seasonality by the El Niño Southern Oscillation, which has direct effects on larval survival and fish recruitment success. Fishing activity can also be highly variable, depending on the abundance and accessibility of the main fishery resources. This context raises the two main methodological questions addressed in this thesis, through the development of an end-to-end model coupling the high-trophic-level model OSMOSE to the hydrodynamic and biogeochemical model ROMS-PISCES: i) how to calibrate ecosystem models using time series data, and ii) how to incorporate the impact of the interannual variability of the environment and fishing. First, this thesis highlights some issues related to the confrontation of complex ecosystem models with data and proposes a methodology for a sequential multi-phase calibration of ecosystem models. We propose two criteria to classify the parameters of a model: model dependency and the time variability of the parameters. These criteria, along with the availability of approximate initial estimates, are then used as decision rules to determine which parameters need to be estimated and their precedence order in the sequential calibration process. Additionally, a new Evolutionary Algorithm designed for the calibration of stochastic models (e.g. Individual Based Models) and optimized for maximum likelihood estimation has been developed and applied to the calibration of the OSMOSE model to time series data. The environmental variability is explicit in the model: the ROMS-PISCES model forces the OSMOSE model and drives potential bottom-up effects up the food web through plankton and fish trophic interactions, as well as through changes in the spatial distribution of fish. The latter effect was taken into account using presence/absence species distribution models, which are traditionally assessed through a confusion matrix and its associated statistical metrics. However, when considering the prediction of the habitat against time, the variability in the spatial distribution of the habitat can be summarized and validated using the emerging patterns from the shape of the spatial distributions. We modeled the potential habitat of the main species of the Humboldt Current Ecosystem using several sources of information (fisheries, scientific surveys and satellite monitoring of vessels) jointly with environmental data from remote sensing and in situ observations, from 1992 to 2008. The potential habitat was predicted over the study period at monthly resolution, and the model was validated using quantitative and qualitative information on the system using a pattern-oriented approach. The final ROMS-PISCES-OSMOSE E2E ecosystem model for the NHCE was calibrated using our evolutionary algorithm and a likelihood approach to fit monthly time series data of landings, abundance indices and catch-at-length distributions from 1992 to 2008.
To conclude, some potential applications of the model for fishery management are presented, and their limitations and perspectives are discussed.
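The thesis's evolutionary algorithm is specific to stochastic simulators like OSMOSE and is not reproduced here; the following is a generic (mu+lambda) evolution-strategy sketch maximizing a log-likelihood, only to illustrate the calibration idea. The toy objective is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def es_maximize(loglik, x0, sigma=0.3, mu=5, lam=20, gens=100):
    """Minimal (mu+lambda) evolution strategy maximizing a
    log-likelihood; a generic stand-in for the thesis's EA."""
    pop = x0 + sigma * rng.standard_normal((mu, x0.size))
    for _ in range(gens):
        parents = pop[rng.integers(mu, size=lam)]
        children = parents + sigma * rng.standard_normal(parents.shape)
        pool = np.vstack([pop, children])
        scores = np.array([loglik(x) for x in pool])
        pop = pool[np.argsort(scores)[-mu:]]   # keep the best mu
    return pop[-1]

# Hypothetical toy objective: Gaussian log-likelihood peaked at (1, 2)
best = es_maximize(lambda x: -np.sum((x - [1.0, 2.0])**2), np.zeros(2))
print(best)
```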
Abstract:
Perez-Losada et al. [1] analyzed 72 complete genomes corresponding to nine mammalian (67 strains) and two avian (5 strains) polyomavirus species using maximum likelihood and Bayesian methods of phylogenetic inference. Because the data for 2 of the genomes in their work are no longer available in GenBank, in this work we analyze the phylogenetic relationships of the remaining 70 complete genomes, corresponding to nine mammalian (65 strains) and two avian (5 strains) polyomavirus species, using a dynamical language model approach developed by our group (Yu et al., [26]). This distance method does not require sequence alignment for deriving species phylogeny based on overall similarities of the complete genomes. Our best tree separates the bird polyomaviruses (avian polyomaviruses and goose hemorrhagic polyomaviruses) from the mammalian polyomaviruses, which supports the idea of splitting the genus into two subgenera. Such a split is consistent with the different viral life strategies of each group. In the mammalian polyomavirus subgenus, mouse polyomaviruses (MPV), simian viruses 40 (SV40), BK viruses (BKV) and JC viruses (JCV) are grouped as different branches, as expected. The topology of our best tree is quite similar to that of the tree constructed by Perez-Losada et al.
Abstract:
Crash prediction models are used for a variety of purposes, including forecasting the expected future performance of various transportation system segments with similar traits. The influence of intersection features on safety has been examined extensively because intersections experience a relatively large proportion of motor vehicle conflicts and crashes compared to other segments in the transportation system. The effects of left-turn lanes at intersections in particular have seen mixed results in the literature. Some researchers have found that left-turn lanes are beneficial to safety while others have reported detrimental effects on safety. This inconsistency is not surprising given that the installation of left-turn lanes is often endogenous, that is, influenced by crash counts and/or traffic volumes. Endogeneity creates problems in econometric and statistical models and is likely to account for the inconsistencies reported in the literature. This paper reports on a limited-information maximum likelihood (LIML) estimation approach to compensate for endogeneity between left-turn lane presence and angle crashes. The effects of endogeneity are mitigated using the approach, revealing the unbiased effect of left-turn lanes on crash frequency for a dataset of Georgia intersections. The research shows that without accounting for endogeneity, left-turn lanes ‘appear’ to contribute to crashes; however, when endogeneity is accounted for in the model, left-turn lanes reduce angle crash frequencies, as expected by engineering judgment. Other endogenous variables may lurk in crash models as well, suggesting that the method may be used to correct simultaneity problems with other variables and in other transportation modeling contexts.
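A hedged sketch of LIML estimation for an endogenous regressor. The paper does not specify software; the `linearmodels` package is one tooling assumption, and all column names, the file, and the instruments below are hypothetical. This is a generic linear LIML illustration, not the paper's exact crash-frequency specification.

```python
# Sketch of LIML estimation with the `linearmodels` package
# (tooling assumption; dataset and variable names are hypothetical).
import pandas as pd
from linearmodels.iv import IVLIML

df = pd.read_csv("intersections.csv")    # hypothetical dataset
model = IVLIML(
    dependent=df["angle_crashes"],
    exog=df[["aadt_major", "aadt_minor", "num_approaches"]],
    endog=df["left_turn_lane"],          # endogenous regressor
    instruments=df[["terrain", "corridor_program"]],  # hypothetical IVs
)
print(model.fit().summary)
```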
Abstract:
This paper discusses the statistical analyses used to derive bridge live load models for Hong Kong from 10 years of weigh-in-motion (WIM) data. The statistical concepts required and the terminology adopted in the development of bridge live load models are introduced. This paper includes studies of representative vehicles drawn from the large amount of WIM data in Hong Kong. Different load-affecting parameters such as gross vehicle weights, axle weights, axle spacings, and average daily numbers of trucks are first analyzed by various stochastic processes in order to obtain the mathematical distributions of these parameters. As a prerequisite to determining accurate bridge design loadings in Hong Kong, this study not only takes advantage of code formulation methods used internationally but also presents a new method for modelling the collected WIM data using a statistical approach.
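A sketch of the first analysis step the abstract describes: fitting a parametric distribution to one load-affecting parameter (gross vehicle weight). The Gumbel form and the data file are assumptions for illustration; extreme-value distributions are commonly used in load modeling, but the paper's actual distribution choices are not specified here.

```python
import numpy as np
from scipy import stats

gvw = np.loadtxt("gvw_tonnes.txt")       # hypothetical WIM records
loc, scale = stats.gumbel_r.fit(gvw)     # fit an extreme-value form
# e.g. a characteristic load: the 95th-percentile gross vehicle weight
print(stats.gumbel_r.ppf(0.95, loc=loc, scale=scale))
```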
Abstract:
Many traffic situations require drivers to cross or merge into a stream having higher priority. Gap acceptance theory enables us to model such processes to analyse traffic operation. This discussion demonstrates that a numerical search fine-tuned by statistical analysis can be used to determine the most likely critical gap for a sample of drivers, based on their largest rejected gap and accepted gap. This method shares some common features with the Maximum Likelihood Estimation technique (Troutbeck 1992) but lends itself well to contemporary analysis tools such as spreadsheets and is particularly analytically transparent. This method is considered not to bias the estimation of the critical gap due to very small or very large rejected gaps. However, it requires a sample large enough that largest rejected gap/accepted gap pairs are reasonably represented within a fairly narrow highest-likelihood search band.
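A minimal sketch of the maximum-likelihood formulation this method is related to (Troutbeck 1992): each driver's critical gap is assumed to lie between their largest rejected gap r_i and accepted gap a_i, so with a lognormal critical-gap distribution F the likelihood is the product of [F(a_i) - F(r_i)]. The data and starting values below are hypothetical; the abstract's own spreadsheet-friendly numerical search is a variant of this idea.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def critical_gap_mle(rejected, accepted):
    """Maximum-likelihood critical gap: each driver's critical gap lies
    between the largest rejected gap and the accepted gap; assume a
    lognormal critical-gap distribution across drivers."""
    def neg_loglik(theta):
        mu, sigma = theta
        F = lambda t: stats.lognorm.cdf(t, s=sigma, scale=np.exp(mu))
        p = np.clip(F(accepted) - F(rejected), 1e-12, None)
        return -np.sum(np.log(p))
    res = minimize(neg_loglik, x0=[np.log(4.0), 0.3],
                   bounds=[(None, None), (1e-3, None)])
    mu, sigma = res.x
    return np.exp(mu + sigma**2 / 2)     # mean critical gap (seconds)

rejected = np.array([3.1, 2.8, 4.0, 3.5])   # hypothetical gaps (s)
accepted = np.array([5.2, 4.4, 6.1, 5.0])
print(critical_gap_mle(rejected, accepted))
```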
Abstract:
This paper presents an approach to building an observation likelihood function from a set of sparse, noisy training observations taken from known locations by a sensor with no obvious geometric model. The basic approach is to fit an interpolant to the training data, representing the expected observation, and to assume additive sensor noise. This paper takes a Bayesian view of the problem, maintaining a posterior over interpolants rather than simply the maximum-likelihood interpolant, giving a measure of uncertainty in the map at any point. This is done using a Gaussian process framework. To validate the approach experimentally, a model of an environment is built using observations from an omni-directional camera. After a model has been built from the training data, a particle filter is used to localise while traversing this environment.
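A minimal sketch of the idea using scikit-learn's Gaussian process regressor, which maintains a posterior over interpolants as the abstract describes. The kernel choice, noise level, and training data are assumptions for illustration, not the paper's configuration; the resulting function could weight particles in a particle filter.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from scipy.stats import norm

# Hypothetical training set: scalar observations z taken at known 2-D poses X.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([0.2, 0.7, 0.5, 0.9])

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.01))
gp.fit(X, z)

def observation_likelihood(z_obs, pose):
    """p(z | pose): Gaussian around the GP posterior mean, with the
    posterior std capturing map uncertainty at that pose."""
    mean, std = gp.predict(np.atleast_2d(pose), return_std=True)
    return norm.pdf(z_obs, loc=mean[0], scale=std[0])

print(observation_likelihood(0.6, [0.5, 0.5]))   # weight for a particle
```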
Abstract:
Although the relationship between socioeconomic status (SES) and health is well documented for developed countries, less evidence has been presented for developing countries. The aim of this paper is to analyse this relationship at the household level for Fiji, a developing country in the South Pacific, using original household survey data. To allow for the endogeneity of SES in the household health production function, we utilize a simultaneous equation approach where estimates are obtained by full information maximum likelihood. By restricting our sample to one relatively small island, and including area and district hospital effects, physical geography effects are unpacked from income effects. We measure SES as permanent income, which is constructed using principal components analysis. An alternative specification considers transitory household income. We find that a 1% increase in wealth (our measure of permanent income) would lead to a 15% decrease in the probability of an incapacitating illness occurring intra-household. Although the presence of a strong relationship indicates that relatively small improvements in SES can significantly improve health at the household level, it is argued that the design of appropriate policy would also require an understanding of the various mechanisms through which the relationship operates.
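A sketch of the permanent-income (wealth) index construction via the first principal component of household asset indicators, which is the standard way PCA is used for this purpose. The asset file and column contents are hypothetical; the paper's exact indicator set is not reproduced.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

assets = pd.read_csv("household_assets.csv")   # hypothetical 0/1 indicators
Z = StandardScaler().fit_transform(assets)
wealth = PCA(n_components=1).fit_transform(Z).ravel()
assets["wealth"] = wealth      # first principal component as permanent income
print(assets["wealth"].head())
```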
Abstract:
The giant freshwater prawn (Macrobrachium rosenbergii), or GFP, is one of the most important freshwater crustacean species in the inland aquaculture sector of many tropical and subtropical countries. Since the 1990s, there has been rapid global expansion of freshwater prawn farming, especially in Asian countries, with an average annual rate of increase of 48% between 1999 and 2001 (New, 2005). In Vietnam, GFP is cultured in a variety of culture systems, typically in integrated or rotational rice-prawn culture (Phuong et al., 2006), and has become one of the most common farmed aquatic species in the country, due to its ability to grow rapidly and to attract high market prices and high demand. Despite potential for expanded production, the sustainability of freshwater prawn farming in the region is currently threatened by low production efficiency and the vulnerability of farmed stocks to disease. Commercial large-scale and small-scale GFP farms in Vietnam have experienced relatively low stock productivity, large size and weight variation, a low proportion of edible meat (large head-to-body ratio), and scarcity of good quality seed stock. The current situation highlights the need for a systematic stock improvement program for GFP in Vietnam aimed at improving economically important traits in this species. This study reports on a breeding program for fast growth employing combined (between and within) family selection in giant freshwater prawn in Vietnam. The base population was synthesized using a complete diallel cross including 9 crosses from two local stocks (DN and MK strains) and a third, exotic stock (Malaysian strain - MY). In the next three selection generations, matings were conducted between genetically unrelated brood stock to produce full-sib and (paternal) half-sib families. All families were produced and reared separately until juveniles in each family were tagged as a batch using visible implant elastomer (VIE) at a body size of approximately 2 g. After tags were verified, 60 to 120 juveniles chosen randomly from each family were released into two common earthen ponds of 3,500 m2 each for a grow-out period of 16 to 18 weeks. Selection applied at harvest on body weight was a combined (between and within) family selection approach. In total, 81, 89, 96 and 114 families were produced for the Selection line in the F0, F1, F2 and F3 generations, respectively. In addition to the Selection line, 17 to 42 families were produced for the Control group in each generation. Results reported here are based on a data set consisting of 18,387 body and 1,730 carcass records, as well as full pedigree information collected over four generations. Variance and covariance components were estimated by restricted maximum likelihood, fitting a multi-trait animal model. Experiments assessing the performance of VIE tags in juvenile GFP of different size classes, and in individuals tagged with different numbers of tags, showed that juvenile GFP at 2 g were of suitable size for VIE tagging, with no negative effects evident on growth or survival. Tag retention rates were above 97.8% and tag readability rates were 100%, with a correct assignment rate of 95% through to mature animal size of up to 170 g. Across generations, estimates of heritability for body traits (body weight, body length, cephalothorax length, abdominal length, cephalothorax width and abdominal width) and carcass weight traits (abdominal weight, skeleton-off weight and telson-off weight) were moderate, ranging from 0.14 to 0.19 and 0.17 to 0.21, respectively.
Body trait heritabilities estimated for females were significantly higher than for males, whereas carcass weight trait heritabilities estimated for females and males were not significantly different (P > 0.05). Maternal and common environmental effects for body traits accounted for 4 to 5% of the total variance and were greater in females (7 to 10%) than in males (4 to 5%). Genetic correlations among body traits were generally high in both sexes. Genetic correlations between body and carcass weight traits were also high in the mixed sexes. Average selection response (% per generation) for body weight (transformed to square root), estimated as the difference between the Selection and the Control group, was 7.4% calculated from least squares means (LSMs), 7.0% from estimated breeding values (EBVs) and 4.4% calculated from EBVs between two consecutive generations. Favourable correlated selection responses (estimated from LSMs) were detected for the other body traits (12.1%, 14.5%, 10.4%, 15.5% and 13.3% for body length, cephalothorax length, abdominal length, cephalothorax width and abdominal width, respectively) over three selection generations. Data in the second selection generation showed positive correlated responses for carcass weight traits (8.8%, 8.6% and 8.8% for abdominal weight, skeleton-off weight and telson-off weight, respectively). Data in the third selection generation showed that heritabilities for body traits were moderate, ranging from 0.06 to 0.11 and 0.11 to 0.22 at weeks 10 and 18, respectively. Body trait heritabilities estimated at week 10 were not significantly lower than at week 18. Genetic correlations between body traits within age, and for body traits between ages, were generally high. Overall, our results suggest that growth rate responds well to the application of family selection and that carcass weight traits can be improved in parallel using this approach. Moreover, selection for high growth rate in GFP can be undertaken successfully before full market size has been reached. The outcome of this study was the production of an improved culture strain of GFP for the Vietnamese culture industry that will be trialed in real farm production environments to confirm the genetic gains identified in the experimental stock improvement program.
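The headline quantities reduce to simple ratios; a minimal sketch of the two computations, with hypothetical variance components and group means (not the study's data).

```python
def heritability(var_additive, var_common_env, var_residual):
    """Narrow-sense heritability: additive variance over total
    phenotypic variance (components as estimated by REML)."""
    return var_additive / (var_additive + var_common_env + var_residual)

def selection_response_pct(lsm_selection, lsm_control):
    """Selection response per generation, as a percentage of the
    Control group's least squares mean."""
    return 100.0 * (lsm_selection - lsm_control) / lsm_control

# Hypothetical values for illustration only:
print(heritability(0.18, 0.05, 0.77))        # -> 0.18
print(selection_response_pct(6.43, 6.00))    # -> ~7.2% per generation
```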
Abstract:
Commodity price modeling is normally approached in terms of structural time-series models, in which the different components (states) have a financial interpretation. The parameters of these models can be estimated using maximum likelihood. This approach results in a non-linear parameter estimation problem and thus a key issue is how to obtain reliable initial estimates. In this paper, we focus on the initial parameter estimation problem for the Schwartz-Smith two-factor model commonly used in asset valuation. We propose the use of a two-step method. The first step considers a univariate model based only on the spot price and uses a transfer function model to obtain initial estimates of the fundamental parameters. The second step uses the estimates obtained in the first step to initialize a re-parameterized state-space-innovations based estimator, which includes information related to future prices. The second step refines the estimates obtained in the first step and also gives estimates of the remaining parameters in the model. This paper is part tutorial in nature and gives an introduction to aspects of commodity price modeling and the associated parameter estimation problem.
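To fix notation, a sketch simulating the Schwartz-Smith two-factor model the abstract refers to: the log spot price is the sum of a mean-reverting (Ornstein-Uhlenbeck) short-term deviation chi_t and a random-walk long-term level xi_t. The parameter values and initial state are hypothetical; this illustrates the model structure only, not the paper's two-step estimator.

```python
import numpy as np

def simulate_schwartz_smith(kappa=1.5, sigma_chi=0.3, mu=0.02,
                            sigma_xi=0.15, dt=1/252, n=1000, seed=0):
    """Simulate spot prices S_t = exp(chi_t + xi_t): chi is a
    mean-reverting (OU) short-term factor (exact discretisation),
    xi is a drifting random-walk long-term factor."""
    rng = np.random.default_rng(seed)
    chi, xi = 0.0, np.log(60.0)           # hypothetical initial state
    out = np.empty(n)
    a = np.exp(-kappa * dt)
    s_chi = sigma_chi * np.sqrt((1 - a**2) / (2 * kappa))
    for t in range(n):
        chi = a * chi + s_chi * rng.standard_normal()
        xi = xi + mu * dt + sigma_xi * np.sqrt(dt) * rng.standard_normal()
        out[t] = chi + xi
    return np.exp(out)                    # spot price path

print(simulate_schwartz_smith()[:5])
```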
Abstract:
Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that, compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences with low divergence, at greater computational speed. Our findings provide strong evidence for the scalability and potential use of alignment-free methods in large-scale phylogenomics.
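A minimal sketch of the basic D2 statistic underlying these methods: the inner product of the k-mer count vectors of two sequences. The cosine-style normalization turning it into a dissimilarity is one choice among several used in the D2 family (the paper's exact variants are not reproduced), and k and the sequences below are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k=5):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq1, seq2, k=5):
    """D2 statistic: inner product of the two k-mer count vectors."""
    c1, c2 = kmer_counts(seq1, k), kmer_counts(seq2, k)
    return sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())

def d2_distance(seq1, seq2, k=5):
    """Cosine-style dissimilarity derived from D2 (one normalisation
    choice among several used in the D2 family)."""
    return 1 - d2(seq1, seq2, k) / sqrt(d2(seq1, seq1, k) * d2(seq2, seq2, k))

print(d2_distance("ACGTACGTACGT", "ACGTTCGTACGA", k=3))
```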