729 resultados para Weighting


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents a graph-based method to weight medical concepts in documents for the purposes of information retrieval. Medical concepts are extracted from free-text documents using a state-of-the-art technique that maps n-grams to concepts from the SNOMED CT medical ontology. In our graph-based concept representation, concepts are vertices in a graph built from a document, edges represent associations between concepts. This representation naturally captures dependencies between concepts, an important requirement for interpreting medical text, and a feature lacking in bag-of-words representations. We apply existing graph-based term weighting methods to weight medical concepts. Using concepts rather than terms addresses vocabulary mismatch as well as encapsulates terms belonging to a single medical entity into a single concept. In addition, we further extend previous graph-based approaches by injecting domain knowledge that estimates the importance of a concept within the global medical domain. Retrieval experiments on the TREC Medical Records collection show our method outperforms both term and concept baselines. More generally, this work provides a means of integrating background knowledge contained in medical ontologies into data-driven information retrieval approaches.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In studies of germ cell transplantation, measureing tubule diameters and counting cells from different populations using antibodies as markers are very important. Manual measurement of tubule sizes and cell counts is a tedious and sanity grinding work. In this paper, we propose a new boundary weighting based tubule detection method. We first enhance the linear features of the input image and detect the approximate centers of tubules. Next, a boundary weighting transform is applied to the polar transformed image of each tubule region and a circular shortest path is used for the boundary detection. Then, ellipse fitting is carried out for tubule selection and measurement. The algorithm has been tested on a dataset consisting of 20 images, each having about 20 tubules. Experiments show that the detection results of our algorithm are very close to the results obtained manually. © 2013 IEEE.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent years, considerable research efforts have been directed to micro-array technologies and their role in providing simultaneous information on expression profiles for thousands of genes. These data, when subjected to clustering and classification procedures, can assist in identifying patterns and providing insight on biological processes. To understand the properties of complex gene expression datasets, graphical representations can be used. Intuitively, the data can be represented in terms of a bipartite graph, with weighted edges corresponding to gene-sample node couples in the dataset. Biologically meaningful subgraphs can be sought, but performance can be influenced both by the search algorithm, and, by the graph-weighting scheme and both merit rigorous investigation. In this paper, we focus on edge-weighting schemes for bipartite graphical representation of gene expression. Two novel methods are presented: the first is based on empirical evidence; the second on a geometric distribution. The schemes are compared for several real datasets, assessing efficiency of performance based on four essential properties: robustness to noise and missing values, discrimination, parameter influence on scheme efficiency and reusability. Recommendations and limitations are briefly discussed. Keywords: Edge-weighting; weighted graphs; gene expression; bi-clustering

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose a new weighting function which is computationally simple and an approximation to the theoretically derived optimum weighting function shown in the literature. The proposed weighting function is perceptually motivated and provides improved vector quantization performance compared to several weighting functions proposed so far, for line spectrum frequency (LSF) parameter quantization of both clean and noisy speech data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present a measurement of the top quark mass with t-tbar dilepton events produced in p-pbar collisions at the Fermilab Tevatron $\sqrt{s}$=1.96 TeV and collected by the CDF II detector. A sample of 328 events with a charged electron or muon and an isolated track, corresponding to an integrated luminosity of 2.9 fb$^{-1}$, are selected as t-tbar candidates. To account for the unconstrained event kinematics, we scan over the phase space of the azimuthal angles ($\phi_{\nu_1},\phi_{\nu_2}$) of neutrinos and reconstruct the top quark mass for each $\phi_{\nu_1},\phi_{\nu_2}$ pair by minimizing a $\chi^2$ function in the t-tbar dilepton hypothesis. We assign $\chi^2$-dependent weights to the solutions in order to build a preferred mass for each event. Preferred mass distributions (templates) are built from simulated t-tbar and background events, and parameterized in order to provide continuous probability density functions. A likelihood fit to the mass distribution in data as a weighted sum of signal and background probability density functions gives a top quark mass of $165.5^{+{3.4}}_{-{3.3}}$(stat.)$\pm 3.1$(syst.) GeV/$c^2$.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In order to describe the atmospheric turbulence which limits the resolution of long-exposure images obtained using ground-based large telescopes, a simplified model of a speckle pattern, reducing the complexity of calculating field-correlations of very high order, is presented. Focal plane correlations are used instead of correlations in the spatial frequency domain. General tripple correlations for a point source and for a binary are calculated and it is shown that they are not a strong function of the binary separation. For binary separations close to the diffraction limit of the telescope, the genuine triple correlation technique ensures a better SNR than the near-axis Knox-Thompson technique. The simplifications allow a complete analysis of the noise properties at all levels of light.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clustering has been the most popular method for data exploration. Clustering is partitioning the data set into sub-partitions based on some measures say the distance measure, each partition has its own significant information. There are a number of algorithms explored for this purpose, one such algorithm is the Particle Swarm Optimization(PSO) which is a population based heuristic search technique derived from swarm intelligence. In this paper we present an improved version of the Particle Swarm Optimization where, each feature of the data set is given significance accordingly by adding some random weights, which also minimizes the distortions in the dataset if any. The performance of the above proposed algorithm is evaluated using some benchmark datasets from Machine Learning Repository. The experimental results shows that our proposed methodology performs significantly better than the previously performed experiments.