814 resultados para data gathering algorithm
Resumo:
This paper proposes a new iterative algorithm for OFDM joint data detection and phase noise (PHN) cancellation based on minimum mean square prediction error. We particularly highlight the problem of "overfitting" such that the iterative approach may converge to a trivial solution. Although it is essential for this joint approach, the overfitting problem was relatively less studied in existing algorithms. In this paper, specifically, we apply a hard decision procedure at every iterative step to overcome the overfitting. Moreover, compared with existing algorithms, a more accurate Pade approximation is used to represent the phase noise, and finally a more robust and compact fast process based on Givens rotation is proposed to reduce the complexity to a practical level. Numerical simulations are also given to verify the proposed algorithm.
The TAMORA algorithm: satellite rainfall estimates over West Africa using multi-spectral SEVIRI data
Resumo:
A multi-spectral rainfall estimation algorithm has been developed for the Sahel region of West Africa with the purpose of producing accumulated rainfall estimates for drought monitoring and food security. Radar data were used to calibrate multi-channel SEVIRI data from MSG, and a probability of rainfall at several different rain-rates was established for each combination of SEVIRI radiances. Radar calibrations from both Europe (the SatPrecip algorithm) and Niger (TAMORA algorithm) were used. 10 day estimates were accumulated from SatPrecip and TAMORA and compared with kriged gauge data and TAMSAT satellite rainfall estimates over West Africa. SatPrecip was found to produce large overestimates for the region, probably because of its non-local calibration. TAMORA was negatively biased for areas of West Africa with relatively high rainfall, but its skill was comparable to TAMSAT for the low-rainfall region climatologically similar to its calibration area around Niamey. These results confirm the high importance of local calibration for satellite-derived rainfall estimates. As TAMORA shows no improvement in skill over TAMSAT for dekadal estimates, the extra cloud-microphysical information provided by multi-spectral data may not be useful in determining rainfall accumulations at a ten day timescale. Work is ongoing to determine whether it shows improved accuracy at shorter timescales.
Resumo:
Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.
Resumo:
Current commercially available Doppler lidars provide an economical and robust solution for measuring vertical and horizontal wind velocities, together with the ability to provide co- and cross-polarised backscatter profiles. The high temporal resolution of these instruments allows turbulent properties to be obtained from studying the variation in radial velocities. However, the instrument specifications mean that certain characteristics, especially the background noise behaviour, become a limiting factor for the instrument sensitivity in regions where the aerosol load is low. Turbulent calculations require an accurate estimate of the contribution from velocity uncertainty estimates, which are directly related to the signal-to-noise ratio. Any bias in the signal-to-noise ratio will propagate through as a bias in turbulent properties. In this paper we present a method to correct for artefacts in the background noise behaviour of commercially available Doppler lidars and reduce the signal-to-noise ratio threshold used to discriminate between noise, and cloud or aerosol signals. We show that, for Doppler lidars operating continuously at a number of locations in Finland, the data availability can be increased by as much as 50 % after performing this background correction and subsequent reduction in the threshold. The reduction in bias also greatly improves subsequent calculations of turbulent properties in weak signal regimes.
Resumo:
A large amount of biological data has been produced in the last years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more adequate for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover. it shows a clustering algorithm to solve this formulation that uses GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three known other algorithms. The proposed algorithm presented the best clustering results confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
Resumo:
Conventional procedures employed in the modeling of viscoelastic properties of polymer rely on the determination of the polymer`s discrete relaxation spectrum from experimentally obtained data. In the past decades, several analytical regression techniques have been proposed to determine an explicit equation which describes the measured spectra. With a diverse approach, the procedure herein introduced constitutes a simulation-based computational optimization technique based on non-deterministic search method arisen from the field of evolutionary computation. Instead of comparing numerical results, this purpose of this paper is to highlight some Subtle differences between both strategies and focus on what properties of the exploited technique emerge as new possibilities for the field, In oder to illustrate this, essayed cases show how the employed technique can outperform conventional approaches in terms of fitting quality. Moreover, in some instances, it produces equivalent results With much fewer fitting parameters, which is convenient for computational simulation applications. I-lie problem formulation and the rationale of the highlighted method are herein discussed and constitute the main intended contribution. (C) 2009 Wiley Periodicals, Inc. J Appl Polym Sci 113: 122-135, 2009
Resumo:
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen {\it et~al}, 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2),$ where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands markers and highlights the need for a faster. algorithm. Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the ``DNAcopy'' package of the Bioconductor project (Gentleman {\it et~al}, 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
Resumo:
A tandem mass spectral database system consists of a library of reference spectra and a search program. State-of-the-art search programs show a high tolerance for variability in compound-specific fragmentation patterns produced by collision-induced decomposition and enable sensitive and specific 'identity search'. In this communication, performance characteristics of two search algorithms combined with the 'Wiley Registry of Tandem Mass Spectral Data, MSforID' (Wiley Registry MSMS, John Wiley and Sons, Hoboken, NJ, USA) were evaluated. The search algorithms tested were the MSMS search algorithm implemented in the NIST MS Search program 2.0g (NIST, Gaithersburg, MD, USA) and the MSforID algorithm (John Wiley and Sons, Hoboken, NJ, USA). Sample spectra were acquired on different instruments and, thus, covered a broad range of possible experimental conditions or were generated in silico. For each algorithm, more than 30,000 matches were performed. Statistical evaluation of the library search results revealed that principally both search algorithms can be combined with the Wiley Registry MSMS to create a reliable identification tool. It appears, however, that a higher degree of spectral similarity is necessary to obtain a correct match with the NIST MS Search program. This characteristic of the NIST MS Search program has a positive effect on specificity as it helps to avoid false positive matches (type I errors), but reduces sensitivity. Thus, particularly with sample spectra acquired on instruments differing in their Setup from tandem-in-space type fragmentation, a comparably higher number of false negative matches (type II errors) were observed by searching the Wiley Registry MSMS.
Resumo:
We propose a general procedure for solving incomplete data estimation problems. The procedure can be used to find the maximum likelihood estimate or to solve estimating equations in difficult cases such as estimation with the censored or truncated regression model, the nonlinear structural measurement error model, and the random effects model. The procedure is based on the general principle of stochastic approximation and the Markov chain Monte-Carlo method. Applying the theory on adaptive algorithms, we derive conditions under which the proposed procedure converges. Simulation studies also indicate that the proposed procedure consistently converges to the maximum likelihood estimate for the structural measurement error logistic regression model.
Resumo:
Frequencies of meiotic configurations in cytogenetic stocks are dependent on chiasma frequencies in segments defined by centromeres, breakpoints, and telomeres. The expectation maximization algorithm is proposed as a general method to perform maximum likelihood estimations of the chiasma frequencies in the intervals between such locations. The estimates can be translated via mapping functions into genetic maps of cytogenetic landmarks. One set of observational data was analyzed to exemplify application of these methods, results of which were largely concordant with other comparable data. The method was also tested by Monte Carlo simulation of frequencies of meiotic configurations from a monotelodisomic translocation heterozygote, assuming six different sample sizes. The estimate averages were always close to the values given initially to the parameters. The maximum likelihood estimation procedures can be extended readily to other kinds of cytogenetic stocks and allow the pooling of diverse cytogenetic data to collectively estimate lengths of segments, arms, and chromosomes.
Resumo:
Numerical modelling methodologies are important by their application to engineering and scientific problems, because there are processes where analytical mathematical expressions cannot be obtained to model them. When the only available information is a set of experimental values for the variables that determine the state of the system, the modelling problem is equivalent to determining the hyper-surface that best fits the data. This paper presents a methodology based on the Galerkin formulation of the finite elements method to obtain representations of relationships that are defined a priori, between a set of variables: y = z(x1, x2,...., xd). These representations are generated from the values of the variables in the experimental data. The approximation, piecewise, is an element of a Sobolev space and has derivatives defined in a general sense into this space. The using of this approach results in the need of inverting a linear system with a structure that allows a fast solver algorithm. The algorithm can be used in a variety of fields, being a multidisciplinary tool. The validity of the methodology is studied considering two real applications: a problem in hydrodynamics and a problem of engineering related to fluids, heat and transport in an energy generation plant. Also a test of the predictive capacity of the methodology is performed using a cross-validation method.
Resumo:
Includes bibliographies (p. 12).
Resumo:
"May 1980."