Biblioteca Digital

854 results for data gathering algorithm

Genetic algorithm for shipping route estimation with long-range tracking data : automatic reconstruction of shipping routes based on the historical ship positions for maritime safety applications

Relevance:

40.00%

Publisher:

Abstract:

Ship tracking systems allow maritime organizations concerned with safety at sea to obtain information on the current location and route of merchant vessels. Thanks to space technology, the geographical coverage of ship tracking platforms has increased significantly in recent years, from radar-based near-shore traffic monitoring towards a worldwide picture of the maritime traffic situation. The long-range tracking systems currently in operation allow ship position data to be stored over many years: a valuable source of knowledge about the shipping routes between different ocean regions. The outcome of this Master's project is a software prototype for estimating the most frequently used shipping route between any two geographical locations. The analysis is based on the historical ship positions acquired with long-range tracking systems. The proposed approach applies a Genetic Algorithm to a training set of relevant ship positions extracted from the long-term tracking database of the European Maritime Safety Agency (EMSA). The analysis of some representative shipping routes is presented, and the quality of the results and their operational applications are assessed by a maritime safety expert.

Fuzzy logic algorithm to extract specific interaction forces from atomic force microscopy data

Relevance:

40.00%

Publisher:

Abstract:

The atomic force microscope is not only a very convenient tool for studying the topography of different samples, but it can also be used to measure specific binding forces between molecules. For this purpose, one type of molecule is attached to the tip and the other one to the substrate. Bringing the tip towards the substrate allows the molecules to bind together. Retracting the tip breaks the newly formed bond. The rupture of a specific bond appears in the force-distance curves as a spike from which the binding force can be deduced. In this article we present an algorithm to automatically process force-distance curves in order to obtain bond strength histograms. The algorithm is based on a fuzzy logic approach that permits an evaluation of "quality" for every event and makes the detection procedure much faster than manual selection. The software has been applied to measure the binding strength between tubulin and microtubule-associated proteins.

Adaptive SIR algorithm for Bayesian multilevel inference on categorical data

Relevance:

40.00%

Publisher:

A new algorithm for OFDM joint data detection and phase noise cancellation

Relevance:

40.00%

Publisher:

Abstract:

This paper proposes a new iterative algorithm for OFDM joint data detection and phase noise (PHN) cancellation based on minimum mean square prediction error. We particularly highlight the problem of "overfitting", in which the iterative approach may converge to a trivial solution. Although it is essential for this joint approach, the overfitting problem has received relatively little attention in existing algorithms. In this paper, specifically, we apply a hard decision procedure at every iterative step to overcome the overfitting. Moreover, compared with existing algorithms, a more accurate Padé approximation is used to represent the phase noise, and finally a more robust and compact fast process based on Givens rotation is proposed to reduce the complexity to a practical level. Numerical simulations are also given to verify the proposed algorithm.

The TAMORA algorithm: satellite rainfall estimates over West Africa using multi-spectral SEVIRI data

Relevance:

40.00%

Publisher:

Abstract:

A multi-spectral rainfall estimation algorithm has been developed for the Sahel region of West Africa with the purpose of producing accumulated rainfall estimates for drought monitoring and food security. Radar data were used to calibrate multi-channel SEVIRI data from MSG, and a probability of rainfall at several different rain-rates was established for each combination of SEVIRI radiances. Radar calibrations from both Europe (the SatPrecip algorithm) and Niger (TAMORA algorithm) were used. 10 day estimates were accumulated from SatPrecip and TAMORA and compared with kriged gauge data and TAMSAT satellite rainfall estimates over West Africa. SatPrecip was found to produce large overestimates for the region, probably because of its non-local calibration. TAMORA was negatively biased for areas of West Africa with relatively high rainfall, but its skill was comparable to TAMSAT for the low-rainfall region climatologically similar to its calibration area around Niamey. These results confirm the high importance of local calibration for satellite-derived rainfall estimates. As TAMORA shows no improvement in skill over TAMSAT for dekadal estimates, the extra cloud-microphysical information provided by multi-spectral data may not be useful in determining rainfall accumulations at a ten day timescale. Work is ongoing to determine whether it shows improved accuracy at shorter timescales.

eRules: a modular adaptive classification rule learning algorithm for data streams

Relevance:

40.00%

Publisher:

Abstract:

Advances in hardware and software in the past decade allow fast data streams to be captured, recorded and processed at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time, as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.

A generalised background correction algorithm for a Halo Doppler lidar and its application to data from Finland

Relevance:

40.00%

Publisher:

Abstract:

Current commercially available Doppler lidars provide an economical and robust solution for measuring vertical and horizontal wind velocities, together with the ability to provide co- and cross-polarised backscatter profiles. The high temporal resolution of these instruments allows turbulent properties to be obtained from studying the variation in radial velocities. However, the instrument specifications mean that certain characteristics, especially the background noise behaviour, become a limiting factor for the instrument sensitivity in regions where the aerosol load is low. Turbulent calculations require an accurate estimate of the contribution from velocity uncertainty estimates, which are directly related to the signal-to-noise ratio. Any bias in the signal-to-noise ratio will propagate through as a bias in turbulent properties. In this paper we present a method to correct for artefacts in the background noise behaviour of commercially available Doppler lidars and reduce the signal-to-noise ratio threshold used to discriminate between noise, and cloud or aerosol signals. We show that, for Doppler lidars operating continuously at a number of locations in Finland, the data availability can be increased by as much as 50 % after performing this background correction and subsequent reduction in the threshold. The reduction in bias also greatly improves subsequent calculations of turbulent properties in weak signal regimes.

Investigation of a new GRASP-based clustering algorithm applied to biological data

Relevance:

40.00%

Publisher:

Abstract:

A large amount of biological data has been produced in recent years. Important knowledge can be extracted from these data by the use of data analysis techniques. Clustering plays an important role in data analysis, by organizing similar objects from a dataset into meaningful groups. Several clustering algorithms have been proposed in the literature. However, each algorithm has its bias, being more suitable for particular datasets. This paper presents a mathematical formulation to support the creation of consistent clusters for biological data. Moreover, it presents a clustering algorithm that solves this formulation using GRASP (Greedy Randomized Adaptive Search Procedure). We compared the proposed algorithm with three other known algorithms. The proposed algorithm produced the best clustering results, as confirmed statistically. (C) 2009 Elsevier Ltd. All rights reserved.
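The abstract does not give the paper's mathematical formulation, so the following is only a minimal sketch of the general GRASP pattern (greedy randomized construction followed by local search, repeated over several restarts). The k-medoids objective on one-dimensional points, the function names, and the parameters `alpha` and `iterations` are illustrative assumptions, not the authors' method.

```python
import random


def cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (assumed objective)."""
    return sum(min(abs(p - points[m]) for m in medoids) for p in points)


def construct(points, k, alpha, rng):
    """Greedy randomized construction: draw each medoid from a restricted candidate list."""
    medoids = []
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        scored = sorted((cost(points, medoids + [i]), i) for i in candidates)
        lo, hi = scored[0][0], scored[-1][0]
        # Keep candidates whose cost is within a fraction alpha of the best one.
        rcl = [i for c, i in scored if c <= lo + alpha * (hi - lo)]
        medoids.append(rng.choice(rcl))
    return medoids


def local_search(points, medoids):
    """Swap a medoid for a non-medoid while this strictly lowers the cost."""
    improved = True
    while improved:
        improved = False
        for pos in range(len(medoids)):
            for i in range(len(points)):
                if i in medoids:
                    continue
                trial = medoids[:pos] + [i] + medoids[pos + 1:]
                if cost(points, trial) < cost(points, medoids):
                    medoids, improved = trial, True
    return medoids


def grasp(points, k, iterations=20, alpha=0.3, seed=0):
    """Repeat construction and local search, keeping the best solution found."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        solution = local_search(points, construct(points, k, alpha, rng))
        if best is None or cost(points, solution) < cost(points, best):
            best = solution
    return sorted(best)


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
    print(grasp(data, k=2))  # indices of the chosen medoids
```

Setting `alpha` to 0 makes the construction purely greedy, while 1 makes it purely random; values in between trade solution quality for diversity across restarts.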

Genetic Algorithm for the Determination of Linear Viscoelastic Relaxation Spectrum from Experimental Data

Relevance:

40.00%

Publisher:

Abstract:

Conventional procedures employed in the modeling of viscoelastic properties of polymers rely on the determination of the polymer's discrete relaxation spectrum from experimentally obtained data. In the past decades, several analytical regression techniques have been proposed to determine an explicit equation which describes the measured spectra. Taking a different approach, the procedure introduced here is a simulation-based computational optimization technique built on a non-deterministic search method from the field of evolutionary computation. Instead of comparing numerical results, the purpose of this paper is to highlight some subtle differences between the two strategies and to focus on which properties of the technique open up new possibilities for the field. To illustrate this, test cases show how the technique can outperform conventional approaches in terms of fitting quality. Moreover, in some instances, it produces equivalent results with far fewer fitting parameters, which is convenient for computational simulation applications. The problem formulation and the rationale of the highlighted method are discussed here and constitute the main intended contribution. (C) 2009 Wiley Periodicals, Inc. J Appl Polym Sci 113: 122-135, 2009

Validation of a modified snow cover retrieval algorithm from historical 1-km AVHRR data over the European Alps

Relevance:

40.00%

Publisher:

A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data

Relevance:

40.00%

Publisher:

Abstract:

Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the p-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the p-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
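To make the quadratic cost of the full permutation approach concrete, here is a rough sketch of a permutation p-value for a maximal arc statistic. It is not the DNAcopy implementation and not the hybrid method from the abstract: the statistic is a simplified, unscaled stand-in for the t-statistic (the variance estimate is omitted), and the function names and the `n_perm` and `seed` parameters are assumptions made for illustration.

```python
import random
from itertools import accumulate


def max_arc_statistic(x):
    """Largest standardized mean difference between an arc x[i:j] and the rest.

    Every pair (i, j) is scanned, so one evaluation costs O(N^2).
    """
    n = len(x)
    prefix = [0.0] + list(accumulate(x))
    total = prefix[-1]
    best = 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the arc must leave a non-empty complement
            arc_sum = prefix[j] - prefix[i]
            inside = arc_sum / k
            outside = (total - arc_sum) / (n - k)
            stat = abs(inside - outside) / (1.0 / k + 1.0 / (n - k)) ** 0.5
            best = max(best, stat)
    return best


def permutation_p_value(x, n_perm=200, seed=0):
    """Share of shuffled series whose statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = max_arc_statistic(x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_arc_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    series = [0.0] * 10 + [3.0] * 10 + [0.0] * 10
    print(permutation_p_value(series))
```

Each permutation repeats the full O(N^2) scan, which is why this brute-force version becomes impractical for arrays with tens of thousands of markers.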

Model-based Clustering of Methylation Array Data: A Recursive-partitioning Algorithm for High-dimensional Data Arising as a Mixture of Beta Distributions

Relevance:

40.00%

Publisher:

Testing an alternative search algorithm for compound identification with the 'Wiley Registry of Tandem Mass Spectral Data, MSforID'

Relevance:

40.00%

Publisher:

Abstract:

A tandem mass spectral database system consists of a library of reference spectra and a search program. State-of-the-art search programs show a high tolerance for variability in compound-specific fragmentation patterns produced by collision-induced decomposition and enable sensitive and specific 'identity search'. In this communication, performance characteristics of two search algorithms combined with the 'Wiley Registry of Tandem Mass Spectral Data, MSforID' (Wiley Registry MSMS, John Wiley and Sons, Hoboken, NJ, USA) were evaluated. The search algorithms tested were the MSMS search algorithm implemented in the NIST MS Search program 2.0g (NIST, Gaithersburg, MD, USA) and the MSforID algorithm (John Wiley and Sons, Hoboken, NJ, USA). Sample spectra were either acquired on different instruments, and thus covered a broad range of possible experimental conditions, or generated in silico. For each algorithm, more than 30,000 matches were performed. Statistical evaluation of the library search results revealed that, in principle, both search algorithms can be combined with the Wiley Registry MSMS to create a reliable identification tool. It appears, however, that a higher degree of spectral similarity is necessary to obtain a correct match with the NIST MS Search program. This characteristic of the NIST MS Search program has a positive effect on specificity as it helps to avoid false positive matches (type I errors), but reduces sensitivity. Thus, particularly with sample spectra acquired on instruments differing in their setup from tandem-in-space type fragmentation, a comparably higher number of false negative matches (type II errors) were observed by searching the Wiley Registry MSMS.

A stochastic approximation algorithm with Markov chain Monte-Carlo method for incomplete data estimation problems

Relevance:

40.00%

Publisher:

Abstract:

We propose a general procedure for solving incomplete data estimation problems. The procedure can be used to find the maximum likelihood estimate or to solve estimating equations in difficult cases such as estimation with the censored or truncated regression model, the nonlinear structural measurement error model, and the random effects model. The procedure is based on the general principle of stochastic approximation and the Markov chain Monte-Carlo method. Applying the theory on adaptive algorithms, we derive conditions under which the proposed procedure converges. Simulation studies also indicate that the proposed procedure consistently converges to the maximum likelihood estimate for the structural measurement error logistic regression model.

A maximum likelihood algorithm for genome mapping of cytogenetic loci from meiotic configuration data.

Relevance:

40.00%

Publisher:

Abstract:

Frequencies of meiotic configurations in cytogenetic stocks are dependent on chiasma frequencies in segments defined by centromeres, breakpoints, and telomeres. The expectation maximization algorithm is proposed as a general method to perform maximum likelihood estimations of the chiasma frequencies in the intervals between such locations. The estimates can be translated via mapping functions into genetic maps of cytogenetic landmarks. One set of observational data was analyzed to exemplify application of these methods, results of which were largely concordant with other comparable data. The method was also tested by Monte Carlo simulation of frequencies of meiotic configurations from a monotelodisomic translocation heterozygote, assuming six different sample sizes. The estimate averages were always close to the values given initially to the parameters. The maximum likelihood estimation procedures can be extended readily to other kinds of cytogenetic stocks and allow the pooling of diverse cytogenetic data to collectively estimate lengths of segments, arms, and chromosomes.
