869 resultados para similarity estimation
Resumo:
The information provided by the alignment-independent GRid Independent Descriptors (GRIND) can be condensed by the application of principal component analysis, obtaining a small number of principal properties (GRIND-PP), which is more suitable for describing molecular similarity. The objective of the present study is to optimize diverse parameters involved in the obtention of the GRIND-PP and validate their suitability for applications, requiring a biologically relevant description of the molecular similarity. With this aim, GRIND-PP computed with a collection of diverse settings were used to carry out ligand-based virtual screening (LBVS) on standard conditions. The quality of the results obtained was remarkable and comparable with other LBVS methods, and their detailed statistical analysis allowed to identify the method settings more determinant for the quality of the results and their optimum. Remarkably, some of these optimum settings differ significantly from those used in previously published applications, revealing their unexplored potential. Their applicability in large compound database was also explored by comparing the equivalence of the results obtained using either computed or projected principal properties. In general, the results of the study confirm the suitability of the GRIND-PP for practical applications and provide useful hints about how they should be computed for obtaining optimum results.
Resumo:
To date, state-of-the-art seismic material parameter estimates from multi-component sea-bed seismic data are based on the assumption that the sea-bed consists of a fully elastic half-space. In reality, however, the shallow sea-bed generally consists of soft, unconsolidated sediments that are characterized by strong to very strong seismic attenuation. To explore the potential implications, we apply a state-of-the-art elastic decomposition algorithm to synthetic data for a range of canonical sea-bed models consisting of a viscoelastic half-space of varying attenuation. We find that in the presence of strong seismic attenuation, as quantified by Q-values of 10 or less, significant errors arise in the conventional elastic estimation of seismic properties. Tests on synthetic data indicate that these errors can be largely avoided by accounting for the inherent attenuation of the seafloor when estimating the seismic parameters. This can be achieved by replacing the real-valued expressions for the elastic moduli in the governing equations in the parameter estimation by their complex-valued viscoelastic equivalents. The practical application of our parameter procedure yields realistic estimates of the elastic seismic material properties of the shallow sea-bed, while the corresponding Q-estimates seem to be biased towards too low values, particularly for S-waves. Given that the estimation of inelastic material parameters is notoriously difficult, particularly in the immediate vicinity of the sea-bed, this is expected to be of interest and importance for civil and ocean engineering purposes.
Resumo:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
Resumo:
Background: A number of studies have used protein interaction data alone for protein function prediction. Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. Results: The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences from which interaction data was available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. Conclusion: Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone.
Resumo:
We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the Music Information Retrieval (MIR) community along in the past, as it provides a direct and objective way to evaluate music similarity algorithms.This article first presents a series of experiments carried outwith two state-of-the-art methods for cover song identification.We have studied several components of these (such as chroma resolution and similarity, transposition, beat tracking or Dynamic Time Warping constraints), in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best-performing ones are finally applied to the newly proposed method. Multipleevaluations of this one confirm a large increase in identificationaccuracy when comparing it with alternative state-of-the-artapproaches.
Resumo:
Human arteries affected by atherosclerosis are characterized by altered wall viscoelastic properties. The possibility of noninvasively assessing arterial viscoelasticity in vivo would significantly contribute to the early diagnosis and prevention of this disease. This paper presents a noniterative technique to estimate the viscoelastic parameters of a vascular wall Zener model. The approach requires the simultaneous measurement of flow variations and wall displacements, which can be provided by suitable ultrasound Doppler instruments. Viscoelastic parameters are estimated by fitting the theoretical constitutive equations to the experimental measurements using an ARMA parameter approach. The accuracy and sensitivity of the proposed method are tested using reference data generated by numerical simulations of arterial pulsation in which the physiological conditions and the viscoelastic parameters of the model can be suitably varied. The estimated values quantitatively agree with the reference values, showing that the only parameter affected by changing the physiological conditions is viscosity, whose relative error was about 27% even when a poor signal-to-noise ratio is simulated. Finally, the feasibility of the method is illustrated through three measurements made at different flow regimes on a cylindrical vessel phantom, yielding a parameter mean estimation error of 25%.
Resumo:
The quantification of wall motion in cerebral aneurysms is becoming important owing to its potential connection to rupture, and as a way to incorporate the effects of vascular compliance in computational fluid dynamics (CFD) simulations.Most of papers report values obtained with experimental phantoms, simulated images, or animal models, but the information for real patients is limited. In this paper, we have combined non-rigid registration (IR) with signal processing techniques to measure pulsation in real patients from high frame rate digital subtraction angiography (DSA). We have obtained physiological meaningful waveforms with amplitudes in therange 0mm-0.3mm for a population of 18 patients including ruptured and unruptured aneurysms. Statistically significant differences in pulsation were found according to the rupture status, in agreement with differences in biomechanical properties reported in the literature.
Resumo:
The problem of jointly estimating the number, the identities, and the data of active users in a time-varying multiuser environment was examined in a companion paper (IEEE Trans. Information Theory, vol. 53, no. 9, September 2007), at whose core was the use of the theory of finite random sets on countable spaces. Here we extend that theory to encompass the more general problem of estimating unknown continuous parameters of the active-user signals. This problem is solved here by applying the theory of random finite sets constructed on hybrid spaces. We doso deriving Bayesian recursions that describe the evolution withtime of a posteriori densities of the unknown parameters and data.Unlike in the above cited paper, wherein one could evaluate theexact multiuser set posterior density, here the continuous-parameter Bayesian recursions do not admit closed-form expressions. To circumvent this difficulty, we develop numerical approximationsfor the receivers that are based on Sequential Monte Carlo (SMC)methods (“particle filtering”). Simulation results, referring to acode-divisin multiple-access (CDMA) system, are presented toillustrate the theory.
Resumo:
Wireless “MIMO” systems, employing multiple transmit and receive antennas, promise a significant increase of channel capacity, while orthogonal frequency-division multiplexing (OFDM) is attracting a good deal of attention due to its robustness to multipath fading. Thus, the combination of both techniques is an attractive proposition for radio transmission. The goal of this paper is the description and analysis of a new and novel pilot-aided estimator of multipath block-fading channels. Typical models leading to estimation algorithms assume the number of multipath components and delays to be constant (and often known), while their amplitudes are allowed to vary with time. Our estimator is focused instead on the more realistic assumption that the number of channel taps is also unknown and varies with time following a known probabilistic model. The estimation problem arising from these assumptions is solved using Random-Set Theory (RST), whereby one regards the multipath-channel response as a single set-valued random entity.Within this framework, Bayesian recursive equations determine the evolution with time of the channel estimator. Due to the lack of a closed form for the solution of Bayesian equations, a (Rao–Blackwellized) particle filter (RBPF) implementation ofthe channel estimator is advocated. Since the resulting estimator exhibits a complexity which grows exponentially with the number of multipath components, a simplified version is also introduced. Simulation results describing the performance of our channel estimator demonstrate its effectiveness.
Resumo:
Analytical results harmonisation is investigated in this study to provide an alternative to the restrictive approach of analytical methods harmonisation which is recommended nowadays for making possible the exchange of information and then for supporting the fight against illicit drugs trafficking. Indeed, the main goal of this study is to demonstrate that a common database can be fed by a range of different analytical methods, whatever the differences in levels of analytical parameters between these latter ones. For this purpose, a methodology making possible the estimation and even the optimisation of results similarity coming from different analytical methods was then developed. In particular, the possibility to introduce chemical profiles obtained with Fast GC-FID in a GC-MS database is studied in this paper. By the use of the methodology, the similarity of results coming from different analytical methods can be objectively assessed and the utility in practice of database sharing by these methods can be evaluated, depending on profiling purposes (evidential vs. operational perspective tool). This methodology can be regarded as a relevant approach for database feeding by different analytical methods and puts in doubt the necessity to analyse all illicit drugs seizures in one single laboratory or to implement analytical methods harmonisation in each participating laboratory.
Resumo:
In this paper, we introduce a pilot-aided multipath channel estimator for Multiple-Input Multiple-Output (MIMO) Orthogonal Frequency Division Multiplexing (OFDM) systems. Typical estimation algorithms assume the number of multipath components and delays to be known and constant, while theiramplitudes may vary in time. In this work, we focus on the more realistic assumption that also the number of channel taps is unknown and time-varying. The estimation problem arising from this assumption is solved using Random Set Theory (RST), which is a probability theory of finite sets. Due to the lack of a closed form of the optimal filter, a Rao-Blackwellized Particle Filter (RBPF) implementation of the channel estimator is derived. Simulation results demonstrate the estimator effectiveness.
Resumo:
BACKGROUND: Tests for recent infections (TRIs) are important for HIV surveillance. We have shown that a patient's antibody pattern in a confirmatory line immunoassay (Inno-Lia) also yields information on time since infection. We have published algorithms which, with a certain sensitivity and specificity, distinguish between incident (< = 12 months) and older infection. In order to use these algorithms like other TRIs, i.e., based on their windows, we now determined their window periods. METHODS: We classified Inno-Lia results of 527 treatment-naïve patients with HIV-1 infection < = 12 months according to incidence by 25 algorithms. The time after which all infections were ruled older, i.e. the algorithm's window, was determined by linear regression of the proportion ruled incident in dependence of time since infection. Window-based incident infection rates (IIR) were determined utilizing the relationship 'Prevalence = Incidence x Duration' in four annual cohorts of HIV-1 notifications. Results were compared to performance-based IIR also derived from Inno-Lia results, but utilizing the relationship 'incident = true incident + false incident' and also to the IIR derived from the BED incidence assay. RESULTS: Window periods varied between 45.8 and 130.1 days and correlated well with the algorithms' diagnostic sensitivity (R(2) = 0.962; P<0.0001). Among the 25 algorithms, the mean window-based IIR among the 748 notifications of 2005/06 was 0.457 compared to 0.453 obtained for performance-based IIR with a model not correcting for selection bias. Evaluation of BED results using a window of 153 days yielded an IIR of 0.669. Window-based IIR and performance-based IIR increased by 22.4% and respectively 30.6% in 2008, while 2009 and 2010 showed a return to baseline for both methods. CONCLUSIONS: IIR estimations by window- and performance-based evaluations of Inno-Lia algorithm results were similar and can be used together to assess IIR changes between annual HIV notification cohorts.
Resumo:
Rockfall propagation areas can be determined using a simple geometric rule known as shadow angle or energy line method based on a simple Coulomb frictional model implemented in the CONEFALL computer program. Runout zones are estimated from a digital terrain model (DTM) and a grid file containing the cells representing rockfall potential source areas. The cells of the DTM that are lowest in altitude and located within a cone centered on a rockfall source cell belong to the potential propagation area associated with that grid cell. In addition, the CONEFALL method allows estimation of mean and maximum velocities and energies of blocks in the rockfall propagation areas. Previous studies indicate that the slope angle cone ranges from 27° to 37° depending on the assumptions made, i.e. slope morphology, probability of reaching a point, maximum run-out, field observations. Different solutions based on previous work and an example of an actual rockfall event are presented here.
Resumo:
AbstractFor a wide range of environmental, hydrological, and engineering applications there is a fast growing need for high-resolution imaging. In this context, waveform tomographic imaging of crosshole georadar data is a powerful method able to provide images of pertinent electrical properties in near-surface environments with unprecedented spatial resolution. In contrast, conventional ray-based tomographic methods, which consider only a very limited part of the recorded signal (first-arrival traveltimes and maximum first-cycle amplitudes), suffer from inherent limitations in resolution and may prove to be inadequate in complex environments. For a typical crosshole georadar survey the potential improvement in resolution when using waveform-based approaches instead of ray-based approaches is in the range of one order-of- magnitude. Moreover, the spatial resolution of waveform-based inversions is comparable to that of common logging methods. While in exploration seismology waveform tomographic imaging has become well established over the past two decades, it is comparably still underdeveloped in the georadar domain despite corresponding needs. Recently, different groups have presented finite-difference time-domain waveform inversion schemes for crosshole georadar data, which are adaptations and extensions of Tarantola's seminal nonlinear generalized least-squares approach developed for the seismic case. First applications of these new crosshole georadar waveform inversion schemes on synthetic and field data have shown promising results. However, there is little known about the limits and performance of such schemes in complex environments. To this end, the general motivation of my thesis is the evaluation of the robustness and limitations of waveform inversion algorithms for crosshole georadar data in order to apply such schemes to a wide range of real world problems.One crucial issue to making applicable and effective any waveform scheme to real-world crosshole georadar problems is the accurate estimation of the source wavelet, which is unknown in reality. Waveform inversion schemes for crosshole georadar data require forward simulations of the wavefield in order to iteratively solve the inverse problem. Therefore, accurate knowledge of the source wavelet is critically important for successful application of such schemes. Relatively small differences in the estimated source wavelet shape can lead to large differences in the resulting tomograms. In the first part of my thesis, I explore the viability and robustness of a relatively simple iterative deconvolution technique that incorporates the estimation of the source wavelet into the waveform inversion procedure rather than adding additional model parameters into the inversion problem. Extensive tests indicate that this source wavelet estimation technique is simple yet effective, and is able to provide remarkably accurate and robust estimates of the source wavelet in the presence of strong heterogeneity in both the dielectric permittivity and electrical conductivity as well as significant ambient noise in the recorded data. Furthermore, our tests also indicate that the approach is insensitive to the phase characteristics of the starting wavelet, which is not the case when directly incorporating the wavelet estimation into the inverse problem.Another critical issue with crosshole georadar waveform inversion schemes which clearly needs to be investigated is the consequence of the common assumption of frequency- independent electromagnetic constitutive parameters. This is crucial since in reality, these parameters are known to be frequency-dependent and complex and thus recorded georadar data may show significant dispersive behaviour. In particular, in the presence of water, there is a wide body of evidence showing that the dielectric permittivity can be significantly frequency dependent over the GPR frequency range, due to a variety of relaxation processes. The second part of my thesis is therefore dedicated to the evaluation of the reconstruction limits of a non-dispersive crosshole georadar waveform inversion scheme in the presence of varying degrees of dielectric dispersion. I show that the inversion algorithm, combined with the iterative deconvolution-based source wavelet estimation procedure that is partially able to account for the frequency-dependent effects through an "effective" wavelet, performs remarkably well in weakly to moderately dispersive environments and has the ability to provide adequate tomographic reconstructions.