7 results for feature based cost

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)


Relevance:

90.00%

Abstract:

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used, or used in a specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way of making substantial progress in the field of WSD.

Relevance:

90.00%

Abstract:

We introduce a flexible technique for interactive exploration of vector field data through classification derived from user-specified feature templates. Our method is founded on the observation that, while similar features within the vector field may be spatially disparate, they share similar neighborhood characteristics. Users generate feature-based visualizations by interactively highlighting well-accepted and domain-specific representative feature points. Feature exploration begins with the computation of attributes that describe the neighborhood of each sample within the input vector field. Compilation of these attributes forms a representation of the vector field samples in the attribute space. We project the attribute points onto the canonical 2D plane to enable interactive exploration of the vector field using a painting interface. The projection encodes the similarities between vector field points within the distances computed between their associated attribute points. The proposed method runs at interactive rates for an enhanced user experience and is completely flexible, as showcased by the simultaneous identification of diverse feature types.

Relevance:

80.00%

Abstract:

The environment where galaxies are found heavily influences their evolution. Close groupings, like the ones in the cores of galaxy clusters or compact groups, evolve in ways far more dramatic than their isolated counterparts. We have conducted a multi-wavelength study of Hickson Compact Group 7 (HCG 7), consisting of four giant galaxies: three spirals and one lenticular. We use Hubble Space Telescope (HST) imaging to identify and characterize the young and old star cluster populations. We find young massive clusters (YMCs) mostly in the three spirals, while the lenticular features a large, unimodal population of globular clusters (GCs) but no detectable clusters with ages less than a few Gyr. The spatial and approximate age distributions of the ~300 YMCs and ~150 GCs thus hint at a regular star formation history in the group over a Hubble time. While at first glance the HST data show the galaxies as undisturbed, our deep ground-based, wide-field imaging that extends the HST coverage reveals faint signatures of stellar material in the intragroup medium (IGM). We do not, however, detect the IGM in H I or Chandra X-ray observations, signatures that would be expected to arise from major mergers. Despite this fact, we find that the H I gas content of the individual galaxies and of the group as a whole is a third of the expected abundance. The appearance of quiescence is challenged by spectroscopy that reveals an intense ionization continuum in one galaxy nucleus, and post-burst characteristics in another. Our spectroscopic survey of dwarf galaxy members yields a single dwarf elliptical galaxy in an apparent stellar tidal feature. Based on all this information, we suggest an evolutionary scenario for HCG 7, whereby the galaxies convert most of their available gas into stars without the influence of major mergers and ultimately result in a dry merger.
As the conditions governing compact groups are reminiscent of galaxies at intermediate redshift, we propose that HCGs are appropriate for studying galaxy evolution at z ~ 1-2.

Relevance:

50.00%

Abstract:

This paper presents the formulation of a combinatorial optimization problem with the following characteristics: (i) the search space is the power set of a finite set structured as a Boolean lattice; (ii) the cost function forms a U-shaped curve when applied to any lattice chain. This formulation applies to feature selection in the context of pattern recognition. The known approaches to this problem are branch-and-bound algorithms and heuristics that partially explore the search space. Branch-and-bound algorithms are equivalent to the full search, while heuristics are not. This paper presents a branch-and-bound algorithm that differs from the other known ones by exploring the lattice structure and the U-shaped chain curves of the search space. The main contribution of this paper is the architecture of this algorithm, which is based on the representation and exploration of the search space by new lattice properties proven here. Several experiments with well-known public data indicate the superiority of the proposed method over sequential floating forward selection (SFFS), a popular heuristic that gives good results in very short computational time. In all experiments, the proposed method obtained better or equal results in similar or even smaller computational time. (C) 2009 Elsevier Ltd. All rights reserved.
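
The U-shaped property in (ii) can be illustrated with a minimal sketch. This is not the paper's branch-and-bound algorithm: it only walks a single chain of nested subsets and stops at the first cost increase, which is safe under the assumption that the cost never decreases again further up the chain. The names `build_chain`, `chain_minimum` and `toy_cost` are hypothetical, and the cost function is a made-up example.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Subset = FrozenSet[str]


def build_chain(features: Sequence[str]) -> List[Subset]:
    """Nested subsets {}, {f1}, {f1, f2}, ...: one chain of the Boolean lattice."""
    return [frozenset(features[:k]) for k in range(len(features) + 1)]


def chain_minimum(
    chain: Sequence[Subset], cost: Callable[[Subset], float]
) -> Tuple[Subset, float, int]:
    """Walk the chain and stop at the first cost increase.

    Assumes the cost is U-shaped along the chain (non-increasing, then
    non-decreasing), so nothing past the first increase can be better.
    Returns the best subset, its cost, and the number of cost evaluations.
    """
    best_subset = chain[0]
    best_cost = cost(best_subset)
    evaluations = 1
    for subset in chain[1:]:
        current = cost(subset)
        evaluations += 1
        if current > best_cost:
            break  # the rest of the chain is pruned
        best_subset, best_cost = subset, current
    return best_subset, best_cost, evaluations


def toy_cost(subset: Subset) -> float:
    """Hypothetical U-shaped cost with its minimum at subsets of size 2."""
    return float((len(subset) - 2) ** 2)


chain = build_chain(["a", "b", "c", "d"])
best_subset, best_cost, evaluations = chain_minimum(chain, toy_cost)
print(sorted(best_subset), best_cost, evaluations)  # ['a', 'b'] 0.0 4
```

In this toy run the full set of four features is never evaluated, because the cost already rose at size 3.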

Relevance:

40.00%

Abstract:

This paper proposes a filter-based algorithm for feature selection. The filter is based on the partitioning of the set of features into clusters. The number of clusters, and consequently the cardinality of the subset of selected features, is automatically estimated from data. The computational complexity of the proposed algorithm is also investigated. A variant of this filter that considers feature-class correlations is also proposed for classification problems. Empirical results involving ten datasets illustrate the performance of the developed algorithm, which in general has obtained competitive results in terms of classification accuracy when compared to state-of-the-art algorithms that find clusters of features. We show that, if computational efficiency is an important issue, then the proposed filter may be preferred over its counterparts, thus becoming eligible to join a pool of feature selection algorithms to be used in practice. As an additional contribution of this work, a theoretical framework is used to formally analyze some properties of feature selection methods that rely on finding clusters of features. (C) 2011 Elsevier Inc. All rights reserved.
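
The general idea of selecting features through clusters of features can be sketched as follows. This is a deliberately simpler stand-in, not the algorithm proposed in the paper: it groups features greedily by absolute Pearson correlation against a fixed threshold (the paper estimates the number of clusters from data) and keeps the first feature of each group. The names `pearson` and `select_representatives`, the threshold value and the data are all hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def select_representatives(
    columns: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[int]:
    """Greedy clustering of feature columns; returns one index per cluster.

    Assumption: a feature joins the cluster of the current seed when the
    absolute correlation with the seed reaches `threshold`. The seed itself
    is kept as the cluster representative.
    """
    remaining = list(range(len(columns)))
    selected: List[int] = []
    while remaining:
        seed = remaining[0]
        cluster = [
            j
            for j in remaining
            if j == seed or abs(pearson(columns[seed], columns[j])) >= threshold
        ]
        selected.append(seed)
        remaining = [j for j in remaining if j not in cluster]
    return selected


# Toy data: column 1 is an exact multiple of column 0, column 2 is unrelated.
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
print(select_representatives(columns))  # [0, 2]
```

Being a filter, the sketch looks only at the features themselves; a variant that also weighs feature-class correlations, as the abstract mentions, would need the class labels as an extra input.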

Relevance:

40.00%

Abstract:

A novel poly(p-xylylene), PPX, derivative bearing phenyl side groups was electrochemically synthesized in 85% yield. The polymer, poly(2-phenyl-p-xylylene) (PPPX), presented a major fraction (88%) soluble in common organic solvents. It proved to be thermally resistant up to 140 °C. UV-VIS analysis revealed an Egap of ~3.0 eV. Gas sensors made from thin films of CSA-doped PPPX deposited on interdigitated electrodes exhibited significant changes in electrical conductance upon exposure to five carbonyl compounds: acetaldehyde, propionaldehyde, benzaldehyde, acetone and butanone. Three-dimensional plots of relative response vs. time of half-response vs. time of half-recovery showed good discrimination between the five carbonyl compounds tested. (C) 2008 Elsevier B.V. All rights reserved.

Relevance:

40.00%

Abstract:

Microfluidic paper-based analytical devices (μPADs) are a new class of point-of-care diagnostic devices that are inexpensive, easy to use, and designed specifically for use in developing countries. (To listen to a podcast about this feature, please go to the Analytical Chemistry multimedia page at pubs.acs.org/page/ancham/audio/index.html.)