957 resultados para Kernel density estimator
Resumo:
Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.
Resumo:
We generalize the popular ensemble Kalman filter to an ensemble transform filter, in which the prior distribution can take the form of a Gaussian mixture or a Gaussian kernel density estimator. The design of the filter is based on a continuous formulation of the Bayesian filter analysis step. We call the new filter algorithm the ensemble Gaussian-mixture filter (EGMF). The EGMF is implemented for three simple test problems (Brownian dynamics in one dimension, Langevin dynamics in two dimensions and the three-dimensional Lorenz-63 model). It is demonstrated that the EGMF is capable of tracking systems with non-Gaussian uni- and multimodal ensemble distributions. Copyright © 2011 Royal Meteorological Society
Resumo:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Reports of triatomine infestation in urban areas have increased. We analysed the spatial distribution of infestation by triatomines in the urban area of Diamantina, in the state of Minas Gerais, Brazil. Triatomines were obtained by community-based entomological surveillance. Spatial patterns of infestation were analysed by Ripley’s K function and Kernel density estimator. Normalised difference vegetation index (NDVI) and land cover derived from satellite imagery were compared between infested and uninfested areas. A total of 140 adults of four species were captured (100 Triatoma vitticeps, 25 Panstrongylus geniculatus, 8 Panstrongylus megistus, and 7 Triatoma arthurneivai specimens). In total, 87.9% were captured within domiciles. Infection by trypanosomes was observed in 19.6% of 107 examined insects. The spatial distributions of T. vitticeps, P. geniculatus, T. arthurneivai, and trypanosome-positive triatomines were clustered, occurring mainly in peripheral areas. NDVI values were statistically higher in areas infested by T. vitticeps and P. geniculatus. Buildings infested by these species were located closer to open fields, whereas infestations of P. megistus and T. arthurneivai were closer to bare soil. Human occupation and modification of natural areas may be involved in triatomine invasion, exposing the population to these vectors.
Resumo:
2000 Mathematics Subject Classification: 62G07, 60F10.
Resumo:
INTRODUCTION: The prevalence and intensity of geohelminth infections and schistosomiasis remain high in the rural areas of Zona da Mata, Pernambuco (ZMP), Brazil, where these parasites still represent a significant public health problem. The present study aimed to spatially assess the occurrences of schistosomiasis and geohelminthiasis in the ZMP. METHODS: The ZMP has a population of 1,132,544 inhabitants, formed by 43 municipalities. An ecological study was conducted, using secondary data relating to positive human cases and parasite loads of schistosomiasis and positive human cases of geohelminthiasis that were worked up in Excel 2007. We used the coordinates of the municipal headquarters to represent the cities which served as the unit of analysis of this study. The Kernel estimator was used to spatially analyze the data and identify distribution patterns and case densities, with analysis done in ArcGIS software. RESULTS: Spatial analysis from the Kernel intensity estimator made it possible to construct density maps showing that the northern ZMP was the region with the greatest number of children infected with parasites and the populations most intensely infected by Schistosoma mansoni. In relation to geohelminths, there was higher spatial distribution of cases of Ascaris lumbricoides and Trichuris trichiura in the southern ZMP, and greater occurrence of hookworms in the northern/central ZMP. CONCLUSIONS: Despite several surveys and studies showing occurrences of schistosomiasis and geohelminthiasis in the ZMP, no preventive measures that are known to have been effective in decreasing these health hazards have yet been implemented in the endemic area.
Resumo:
For the standard kernel density estimate, it is known that one can tune the bandwidth such that the expected L1 error is within a constant factor of the optimal L1 error (obtained when one is allowed to choose the bandwidth with knowledge of the density). In this paper, we pose the same problem for variable bandwidth kernel estimates where the bandwidths are allowed to depend upon the location. We show in particular that for positive kernels on the real line, for any data-based bandwidth, there exists a densityfor which the ratio of expected L1 error over optimal L1 error tends to infinity. Thus, the problem of tuning the variable bandwidth in an optimal manner is ``too hard''. Moreover, from the class of counterexamples exhibited in the paper, it appears thatplacing conditions on the densities (monotonicity, convexity, smoothness) does not help.
Resumo:
We propose robust estimators of the generalized log-gamma distribution and, more generally, of location-shape-scale families of distributions. A (weighted) Q tau estimator minimizes a tau scale of the differences between empirical and theoretical quantiles. It is n(1/2) consistent; unfortunately, it is not asymptotically normal and, therefore, inconvenient for inference. However, it is a convenient starting point for a one-step weighted likelihood estimator, where the weights are based on a disparity measure between the model density and a kernel density estimate. The one-step weighted likelihood estimator is asymptotically normal and fully efficient under the model. It is also highly robust under outlier contamination. Supplementary materials are available online.
Resumo:
Ce mémoire porte sur la présentation des estimateurs de Bernstein qui sont des alternatives récentes aux différents estimateurs classiques de fonctions de répartition et de densité. Plus précisément, nous étudions leurs différentes propriétés et les comparons à celles de la fonction de répartition empirique et à celles de l'estimateur par la méthode du noyau. Nous déterminons une expression asymptotique des deux premiers moments de l'estimateur de Bernstein pour la fonction de répartition. Comme pour les estimateurs classiques, nous montrons que cet estimateur vérifie la propriété de Chung-Smirnov sous certaines conditions. Nous montrons ensuite que l'estimateur de Bernstein est meilleur que la fonction de répartition empirique en terme d'erreur quadratique moyenne. En s'intéressant au comportement asymptotique des estimateurs de Bernstein, pour un choix convenable du degré du polynôme, nous montrons que ces estimateurs sont asymptotiquement normaux. Des études numériques sur quelques distributions classiques nous permettent de confirmer que les estimateurs de Bernstein peuvent être préférables aux estimateurs classiques.
Resumo:
Using the classical Parzen window estimate as the target function, the kernel density estimation is formulated as a regression problem and the orthogonal forward regression technique is adopted to construct sparse kernel density estimates. The proposed algorithm incrementally minimises a leave-one-out test error score to select a sparse kernel model, and a local regularisation method is incorporated into the density construction process to further enforce sparsity. The kernel weights are finally updated using the multiplicative nonnegative quadratic programming algorithm, which has the ability to reduce the model size further. Except for the kernel width, the proposed algorithm has no other parameters that need tuning, and the user is not required to specify any additional criterion to terminate the density construction procedure. Two examples are used to demonstrate the ability of this regression-based approach to effectively construct a sparse kernel density estimate with comparable accuracy to that of the full-sample optimised Parzen window density estimate.
Resumo:
The FE ('fixed effects') estimator of technical inefficiency performs poorly when N ('number of firms') is large and T ('number of time observations') is small. We propose estimators of both the firm effects and the inefficiencies, which have small sample gains compared to the traditional FE estimator. The estimators are based on nonparametric kernel regression of unordered variables, which includes the FE estimator as a special case. In terms of global conditional MSE ('mean square error') criterions, it is proved that there are kernel estimators which are efficient to the FE estimators of firm effects and inefficiencies, in finite samples. Monte Carlo simulations supports our theoretical findings and in an empirical example it is shown how the traditional FE estimator and the proposed kernel FE estimator lead to very different conclusions about inefficiency of Indonesian rice farmers.
Resumo:
The objective of this project was to study the influence of surcharge pressure and moisture content on the compressive behavior and bulk density of soybeans. Three varieties were selected with varying dimensions and shapes. Moisture contents of 10.5, 15.0, and 20% were tested at nine surcharge pressures in the range from 0 to 82.8 kPa. Results indicated that the bulk densities of different soybean varieties have similar behavior with respect to pressure level and moisture content but that the magnitude of bulk density was influenced by variety, Bulk density was influenced by both pressure level and moisture content. The four-element Burger model was found to adequately describe the bulk density of soybeans as a function of pressure for all varieties and moisture levels.
Resumo:
The identification of disease clusters in space or space-time is of vital importance for public health policy and action. In the case of methicillin-resistant Staphylococcus aureus (MRSA), it is particularly important to distinguish between community and health care-associated infections, and to identify reservoirs of infection. 832 cases of MRSA in the West Midlands (UK) were tested for clustering and evidence of community transmission, after being geo-located to the centroids of UK unit postcodes (postal areas roughly equivalent to Zip+4 zip code areas). An age-stratified analysis was also carried out at the coarser spatial resolution of UK Census Output Areas. Stochastic simulation and kernel density estimation were combined to identify significant local clusters of MRSA (p<0.025), which were supported by SaTScan spatial and spatio-temporal scan. In order to investigate local sampling effort, a spatial 'random labelling' approach was used, with MRSA as cases and MSSA (methicillin-sensitive S. aureus) as controls. Heavy sampling in general was a response to MRSA outbreaks, which in turn appeared to be associated with medical care environments. The significance of clusters identified by kernel estimation was independently supported by information on the locations and client groups of nursing homes, and by preliminary molecular typing of isolates. In the absence of occupational/ lifestyle data on patients, the assumption was made that an individual's location and consequent risk is adequately represented by their residential postcode. The problems of this assumption are discussed, with recommendations for future data collection.
Resumo:
Agent-based technology is playing an increasingly important role in today’s economy. Usually a multi-agent system is needed to model an economic system such as a market system, in which heterogeneous trading agents interact with each other autonomously. Two questions often need to be answered regarding such systems: 1) How to design an interacting mechanism that facilitates efficient resource allocation among usually self-interested trading agents? 2) How to design an effective strategy in some specific market mechanisms for an agent to maximise its economic returns? For automated market systems, auction is the most popular mechanism to solve resource allocation problems among their participants. However, auction comes in hundreds of different formats, in which some are better than others in terms of not only the allocative efficiency but also other properties e.g., whether it generates high revenue for the auctioneer, whether it induces stable behaviour of the bidders. In addition, different strategies result in very different performance under the same auction rules. With this background, we are inevitably intrigued to investigate auction mechanism and strategy designs for agent-based economics. The international Trading Agent Competition (TAC) Ad Auction (AA) competition provides a very useful platform to develop and test agent strategies in Generalised Second Price auction (GSP). AstonTAC, the runner-up of TAC AA 2009, is a successful advertiser agent designed for GSP-based keyword auction. In particular, AstonTAC generates adaptive bid prices according to the Market-based Value Per Click and selects a set of keyword queries with highest expected profit to bid on to maximise its expected profit under the limit of conversion capacity. Through evaluation experiments, we show that AstonTAC performs well and stably not only in the competition but also across a broad range of environments. The TAC CAT tournament provides an environment for investigating the optimal design of mechanisms for double auction markets. AstonCAT-Plus is the post-tournament version of the specialist developed for CAT 2010. In our experiments, AstonCAT-Plus not only outperforms most specialist agents designed by other institutions but also achieves high allocative efficiencies, transaction success rates and average trader profits. Moreover, we reveal some insights of the CAT: 1) successful markets should maintain a stable and high market share of intra-marginal traders; 2) a specialist’s performance is dependent on the distribution of trading strategies. However, typical double auction models assume trading agents have a fixed trading direction of either buy or sell. With this limitation they cannot directly reflect the fact that traders in financial markets (the most popular application of double auction) decide their trading directions dynamically. To address this issue, we introduce the Bi-directional Double Auction (BDA) market which is populated by two-way traders. Experiments are conducted under both dynamic and static settings of the continuous BDA market. We find that the allocative efficiency of a continuous BDA market mainly comes from rational selection of trading directions. Furthermore, we introduce a high-performance Kernel trading strategy in the BDA market which uses kernel probability density estimator built on historical transaction data to decide optimal order prices. Kernel trading strategy outperforms some popular intelligent double auction trading strategies including ZIP, GD and RE in the continuous BDA market by making the highest profit in static games and obtaining the best wealth in dynamic games.